Essentials of Machine Learning Algorithms (with Python and R Codes)

Introduction
"Google's self-driving cars and robots get a lot of press, but the company's real future is in machine learning, the technology that enables computers to get smarter and more personal."
Eric Schmidt (Google Chairman)

We are probably living in the most defining period of human history: the period when computing moved from large mainframes to PCs to the cloud. But what makes it defining is not what has happened, but what is coming our way in the years to come.
What makes this period exciting for someone like me is the democratization of the tools and techniques, which followed the boost in computing. Today, as a data scientist, I can build data-crunching machines with complex algorithms for a few dollars per hour. But reaching here wasn't easy! I had my dark days and nights.

Who can benefit the most from this guide?
What I am giving out today is probably the most valuable guide I have ever created.
The idea behind creating this guide is to simplify the journey of aspiring data scientists and machine learning enthusiasts across the world. Through this guide, I will enable you to work on machine learning problems and gain from experience. I am providing a high-level understanding of various machine learning algorithms along with R & Python codes to run them. These should be sufficient to get your hands dirty.

I have deliberately skipped the statistics behind these techniques, as you don't need to understand them at the start. So, if you are looking for a statistical understanding of these algorithms, you should look elsewhere. But if you are looking to equip yourself to start building machine learning projects, you are in for a treat.

Broadly, there are 3 types of Machine Learning Algorithms
1. Supervised Learning
How it works: This algorithm consists of a target/outcome variable (or dependent variable) which is to be predicted from a given set of predictors (independent variables). Using this set of variables, we generate a function that maps inputs to desired outputs. The training process continues until the model achieves a desired level of accuracy on the training data. Examples of Supervised Learning: Regression, Decision Tree, Random Forest, KNN, Logistic Regression etc.

2. Unsupervised Learning
How it works: In this algorithm, we do not have any target or outcome variable to predict/estimate. It is used for clustering a population into different groups, which is widely used for segmenting customers into different groups for specific interventions. Examples of Unsupervised Learning: Apriori algorithm, K-means.

3. Reinforcement Learning
How it works: Using this algorithm, the machine is trained to make specific decisions. It works this way: the machine is exposed to an environment where it trains itself continually using trial and error. The machine learns from past experience and tries to capture the best possible knowledge to make accurate business decisions. Example of Reinforcement Learning: Markov Decision Process.

List of Common Machine Learning Algorithms
Here is the list of commonly used machine learning algorithms. These algorithms can be applied to almost any data problem:
1. Linear Regression
2. Logistic Regression
3. Decision Tree
4. SVM
5. Naive Bayes
6. KNN
7. K-Means
8. Random Forest
9. Dimensionality Reduction Algorithms
10. Gradient Boost & Adaboost

1. Linear Regression
It is used to estimate real values (cost of houses, number of calls, total sales etc.) based on continuous variable(s). Here, we establish a relationship between the independent and dependent variables by fitting a best-fit line. This best-fit line is known as the regression line and is represented by the linear equation Y = a*X + b.
The best way to understand linear regression is to relive this experience of childhood. Let us say you ask a child in fifth grade to arrange the people in his class in increasing order of weight, without asking them their weights! What do you think the child will do? He/she would likely look at (visually analyze) the height and build of people and arrange them using a combination of these visible parameters. This is linear regression in real life! The child has actually figured out that height and build are correlated to weight by a relationship, which looks like the equation above.
In this equation:
Y: Dependent Variable
a: Slope
X: Independent Variable
b: Intercept

The coefficients a and b are derived by minimizing the sum of the squared differences between the data points and the regression line.
Look at the example below. Here we have identified the best-fit line with the linear equation y = 0.2811x + 13.9. Now, using this equation, we can find the weight of a person if we know their height.
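To make the idea concrete, here is a minimal sketch of such a fit in Python with NumPy. The height/weight pairs below are made up for illustration (the data behind y = 0.2811x + 13.9 is not reproduced here); np.polyfit performs the least-squares minimization for us.

# Least-squares fit of Y = a*X + b on hypothetical height/weight data
import numpy as np

heights = np.array([147, 155, 160, 168, 175, 183])  # heights in cm (made up)
weights = np.array([49, 53, 56, 60, 64, 66])        # weights in kg (made up)

# polyfit with degree 1 minimizes the sum of squared differences
# between the data points and the line, returning slope a and intercept b
a, b = np.polyfit(heights, weights, deg=1)
print('slope:', a, 'intercept:', b)

# Predict weight from a new height using Y = a*X + b
print('predicted weight at 170 cm:', a * 170 + b)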

Linear Regression is mainly of two types: Simple Linear Regression and Multiple Linear Regression. Simple Linear Regression is characterized by one independent variable, while Multiple Linear Regression (as the name suggests) is characterized by multiple (more than 1) independent variables. While finding the best-fit line, you can also fit a curve rather than a straight line; this is known as polynomial or curvilinear regression.
Python Code

# Import Library
# Import other necessary libraries like pandas, numpy...
from sklearn import linear_model
# Load Train and Test datasets
# Identify feature and response variable(s); values must be numeric and numpy arrays
x_train = input_variables_values_training_datasets
y_train = target_variables_values_training_datasets
x_test = input_variables_values_test_datasets
# Create linear regression object
linear = linear_model.LinearRegression()
# Train the model using the training sets and check score
linear.fit(x_train, y_train)
linear.score(x_train, y_train)
# Equation coefficient and intercept
print('Coefficient: \n', linear.coef_)
print('Intercept: \n', linear.intercept_)
# Predict Output
predicted = linear.predict(x_test)

R Code

# Load Train and Test datasets
# Identify feature and response variable(s); values must be numeric
x_train <- input_variables_values_training_datasets
y_train <- target_variables_values_training_datasets
x_test <- input_variables_values_test_datasets
x <- cbind(x_train, y_train)
# Train the model using the training sets and check score
linear <- lm(y_train ~ ., data = x)
summary(linear)
# Predict Output
predicted <- predict(linear, x_test)

2. Logistic Regression
Don't get confused by its name! It is a classification, not a regression, algorithm. It is used to estimate discrete values (binary values like 0/1, yes/no, true/false) based on a given set of independent variable(s). In simple words, it predicts the probability of occurrence of an event by fitting data to a logit function. Hence, it is also known as logit regression. Since it predicts a probability, its output values lie between 0 and 1 (as expected).
Again, let us try and understand this through a simple example.
Let's say your friend gives you a puzzle to solve. There are only 2 outcome scenarios: either you solve it or you don't. Now imagine that you are being given a wide range of puzzles/quizzes in an attempt to understand which subjects you are good at. The outcome of this study would be something like this: if you are given a trigonometry-based tenth grade problem, you are 70% likely to solve it. On the other hand, if it is a fifth grade history question, the probability of getting an answer is only 30%. This is what Logistic Regression provides you.
Coming to the math, the log odds of the outcome are modeled as a linear combination of the predictor variables.

odds = p / (1-p) = probability of event occurrence / probability of event non-occurrence
ln(odds) = ln(p / (1-p))

logit(p) = ln(p / (1-p)) = b0 + b1X1 + b2X2 + b3X3 + .... + bkXk

Above, p is the probability of the presence of the characteristic of interest. It chooses parameters that maximize the likelihood of observing the sample values, rather than minimizing the sum of squared errors (as in ordinary regression).
Now, you may ask, why take a log? For the sake of simplicity, let's just say that this is one of the best mathematical ways to replicate a step function. I can go into more detail, but that would defeat the purpose of this article.
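If you want to see the link between the two scales, here is a tiny sketch: the logistic (sigmoid) function is the inverse of the logit, squashing any log-odds value back into a probability between 0 and 1.

# The logit maps probabilities to log-odds; the sigmoid maps them back
import numpy as np

def logit(p):
    return np.log(p / (1 - p))   # log-odds of probability p

def sigmoid(z):
    return 1 / (1 + np.exp(-z))  # inverse of the logit

z = logit(0.70)                  # log-odds for a 70% chance
print(z)                         # ~0.847
print(sigmoid(z))                # recovers 0.70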

Python Code

# Import Library
from sklearn.linear_model import LogisticRegression
# Assumed you have X (predictor) and y (target) for the training data set
# and x_test (predictor) of the test dataset
# Create logistic regression object
model = LogisticRegression()
# Train the model using the training sets and check score
model.fit(X, y)
model.score(X, y)
# Equation coefficient and intercept
print('Coefficient: \n', model.coef_)
print('Intercept: \n', model.intercept_)
# Predict Output
predicted = model.predict(x_test)

R Code

x <- cbind(x_train, y_train)
# Train the model using the training sets and check score
logistic <- glm(y_train ~ ., data = x, family = 'binomial')
summary(logistic)
# Predict Output (type = 'response' returns predicted probabilities)
predicted <- predict(logistic, x_test, type = 'response')

Furthermore...
There are many different steps that could be tried in order to improve the model (a short sketch of the last two follows this list):
including interaction terms
removing features
regularization techniques
using a non-linear model
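As a quick illustration, here is a hedged sketch with scikit-learn: PolynomialFeatures adds squares and interaction terms of the predictors, and the C parameter controls L2 regularization strength. X and y are assumed to be the training data, as in the snippets above; the values are illustrative, not recommendations.

# Regularization plus interaction terms via a pipeline (illustrative values)
from sklearn.linear_model import LogisticRegression
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import PolynomialFeatures

# degree=2 adds squares and pairwise interaction terms of the predictors;
# smaller C means a stronger L2 penalty on the coefficients
model = make_pipeline(PolynomialFeatures(degree=2),
                      LogisticRegression(penalty='l2', C=0.5))
model.fit(X, y)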

3. Decision Tree
This is one of my favorite algorithms and I use it quite frequently. It is a type of supervised learning algorithm that is mostly used for classification problems. Surprisingly, it works for both categorical and continuous dependent variables. In this algorithm, we split the population into two or more homogeneous sets. This is done based on the most significant attributes/independent variables, to make the groups as distinct as possible. For more details, you can read: Decision Tree Simplified.

source: statsexchange
In the image above, you can see that the population is classified into four different groups based on multiple attributes, to identify whether they will play or not. To split the population into different heterogeneous groups, it uses various techniques like Gini, Information Gain, Chi-square and entropy.
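To get a feel for one of these criteria, here is a minimal sketch of Gini impurity, computed by hand for a candidate split; the class counts are made up for illustration.

# Gini impurity of a node, given class counts such as [play, don't play]
def gini(counts):
    total = sum(counts)
    return 1 - sum((c / total) ** 2 for c in counts)

# A parent node with 30 'play' and 20 'don't play' cases
print(gini([30, 20]))                            # 0.48

# A candidate split into two children; lower weighted impurity = better split
left, right = [25, 5], [5, 15]
weighted = (30 * gini(left) + 20 * gini(right)) / 50
print(weighted)                                  # ~0.32, so the split helps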
The best way to understand how a decision tree works is to play Jezzball, a classic game from Microsoft (image below). Essentially, you have a room with moving walls and you need to create walls such that the maximum area gets cleared off without the balls.

So, every time you split the room with a wall, you are trying to create 2 different populations within the same room. Decision trees work in a very similar fashion, by dividing a population into groups that are as different as possible.
More: Simplified Version of Decision Tree Algorithms

Python Code
# Import Library
# Import other necessary libraries like pandas, numpy...
from sklearn import tree
# Assumed you have X (predictor) and y (target) for the training data set
# and x_test (predictor) of the test dataset
# Create tree object
model = tree.DecisionTreeClassifier(criterion='gini')
# For classification, you can set the criterion to 'gini' or 'entropy'
# (information gain); by default it is 'gini'
# model = tree.DecisionTreeRegressor() for regression
# Train the model using the training sets and check score
model.fit(X, y)
model.score(X, y)
# Predict Output
predicted = model.predict(x_test)

R Code

library(rpart)
x <- cbind(x_train, y_train)
# Grow tree
fit <- rpart(y_train ~ ., data = x, method = "class")
summary(fit)
# Predict Output
predicted <- predict(fit, x_test)


4. SVM (Support Vector Machine)
It is a classification method. In this algorithm, we plot each data item as a point in n-dimensional space (where n is the number of features you have), with the value of each feature being the value of a particular coordinate.
For example, if we only had two features, like the Height and Hair length of an individual, we'd first plot these two variables in two-dimensional space, where each point has two coordinates (these coordinates are known as Support Vectors).

Now, we will find some line that splits the data between the two differently classified groups of data. This will be the line such that the distances from the closest point in each of the two groups will be the farthest away.

In the example shown above, the line which splits the data into two differently classified groups is the black line, since the two closest points are the farthest apart from the line. This line is our classifier. Then, depending on which side of the line the testing data lands, that is the class we can assign to the new data.
More: Simplified Version of Support Vector Machine


Think of this algorithm as playing JezzBall in n-dimensional space. The tweaks in the game are:
You can draw lines/planes at any angle (rather than just horizontal or vertical as in the classic game)
The objective of the game is to segregate balls of different colors in different rooms.
And the balls are not moving.

Python Code
# Import Library
from sklearn import svm
# Assumed you have X (predictor) and y (target) for the training data set
# and x_test (predictor) of the test dataset
# Create SVM classification object
model = svm.SVC()
# There are various options associated with it; this is simple for classification.
# You can refer to the link for more detail.
# Train the model using the training sets and check score
model.fit(X, y)
model.score(X, y)
# Predict Output
predicted = model.predict(x_test)

R Code

library(e1071)
x <- cbind(x_train, y_train)
# Fitting model
fit <- svm(y_train ~ ., data = x)
summary(fit)
# Predict Output
predicted <- predict(fit, x_test)


5. Naive Bayes
It is a classification technique based on Bayes' theorem, with an assumption of independence between predictors. In simple terms, a Naive Bayes classifier assumes that the presence of a particular feature in a class is unrelated to the presence of any other feature. For example, a fruit may be considered to be an apple if it is red, round, and about 3 inches in diameter. Even if these features depend on each other or upon the existence of the other features, a Naive Bayes classifier would consider all of these properties to independently contribute to the probability that this fruit is an apple.
A Naive Bayesian model is easy to build and particularly useful for very large data sets. Along with simplicity, Naive Bayes is known to outperform even highly sophisticated classification methods.
Bayes' theorem provides a way of calculating the posterior probability P(c|x) from P(c), P(x) and P(x|c). Look at the equation below:

P(c|x) = P(x|c) * P(c) / P(x)

Here,
P(c|x) is the posterior probability of the class (target) given the predictor (attribute).
P(c) is the prior probability of the class.
P(x|c) is the likelihood, which is the probability of the predictor given the class.
P(x) is the prior probability of the predictor.

Example: Let's understand it using an example. Below I have a training data set of weather and the corresponding target variable 'Play'. Now, we need to classify whether players will play or not based on the weather condition. Let's follow the steps below to perform it.
Step 1: Convert the data set to a frequency table.
Step 2: Create a Likelihood table by finding the probabilities, like Overcast probability = 0.29 and probability of playing = 0.64.


Step 3: Now, use the Naive Bayesian equation to calculate the posterior probability for each class. The class with the highest posterior probability is the outcome of the prediction.
Problem: Players will play if the weather is sunny. Is this statement correct?
We can solve it using the method discussed above: P(Yes|Sunny) = P(Sunny|Yes) * P(Yes) / P(Sunny)
Here we have P(Sunny|Yes) = 3/9 = 0.33, P(Sunny) = 5/14 = 0.36, P(Yes) = 9/14 = 0.64
Now, P(Yes|Sunny) = 0.33 * 0.64 / 0.36 = 0.60, which has the higher probability.
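You can verify this arithmetic in a couple of lines; the counts below come straight from the worked example.

# Checking P(Yes|Sunny) = P(Sunny|Yes) * P(Yes) / P(Sunny)
p_sunny_given_yes = 3 / 9   # 3 of the 9 'Yes' days were sunny
p_yes = 9 / 14              # 9 of the 14 days had Play = Yes
p_sunny = 5 / 14            # 5 of the 14 days were sunny

print(round(p_sunny_given_yes * p_yes / p_sunny, 2))  # 0.6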
Naive Bayes uses a similar method to predict the probability of different classes based on various attributes. This algorithm is mostly used in text classification and with problems having multiple classes.

Python Code
# Import Library
from sklearn.naive_bayes import GaussianNB
# Assumed you have X (predictor) and y (target) for the training data set
# and x_test (predictor) of the test dataset
# Create Naive Bayes classification object; there are other distributions
# for multinomial classes, like Bernoulli Naive Bayes. Refer to the link.
model = GaussianNB()
# Train the model using the training sets and check score
model.fit(X, y)
# Predict Output
predicted = model.predict(x_test)

R Code

library(e1071)
x <- cbind(x_train, y_train)
# Fitting model
fit <- naiveBayes(y_train ~ ., data = x)
summary(fit)
# Predict Output
predicted <- predict(fit, x_test)

6. KNN (K-Nearest Neighbors)
It can be used for both classification and regression problems. However, it is more widely used in classification problems in the industry. K-nearest neighbors is a simple algorithm that stores all available cases and classifies new cases by a majority vote of their k neighbors. The case is assigned to the class most common amongst its K nearest neighbors, as measured by a distance function.
These distance functions can be Euclidean, Manhattan, Minkowski and Hamming distance. The first three functions are used for continuous variables and the fourth one (Hamming) for categorical variables. If K = 1, then the case is simply assigned to the class of its nearest neighbor. At times, choosing K turns out to be a challenge while performing KNN modeling.
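For concreteness, here is a minimal sketch of these distance functions on two made-up points; the Minkowski distance generalizes the first two, and Hamming simply counts mismatching categorical values.

# Distance functions used by KNN, on made-up points
import numpy as np

a = np.array([1.0, 2.0, 3.0])
b = np.array([4.0, 0.0, 3.0])

euclidean = np.sqrt(np.sum((a - b) ** 2))        # straight-line distance
manhattan = np.sum(np.abs(a - b))                # sum of absolute differences
minkowski = np.sum(np.abs(a - b) ** 3) ** (1/3)  # general form, here with p=3
print(euclidean, manhattan, minkowski)

# Hamming distance for categorical variables: count of mismatches
u, v = ['red', 'round', 'small'], ['red', 'oval', 'small']
print(sum(x != y for x, y in zip(u, v)))         # 1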
More: Introduction to k-nearest neighbors: Simplified.


KNN can easily be mapped to our real lives. If you want to learn about a person of whom you have no information, you might like to find out about his close friends and the circles he moves in, and gain access to his/her information!
Things to consider before selecting KNN:
KNN is computationally expensive.
Variables should be normalized, else higher-range variables can bias it.
Work more on the pre-processing stage before going for KNN, e.g. outlier and noise removal.

Python Code
# Import Library
from sklearn.neighbors import KNeighborsClassifier
# Assumed you have X (predictor) and y (target) for the training data set
# and x_test (predictor) of the test dataset
# Create KNeighbors classifier object
model = KNeighborsClassifier(n_neighbors=6)  # default value for n_neighbors is 5
# Train the model using the training sets and check score
model.fit(X, y)
# Predict Output
predicted = model.predict(x_test)

R Code


# knn() lives in the 'class' package; it takes the training matrix, test
# matrix and class labels directly, and fits and predicts in a single step
library(class)
predicted <- knn(train = x_train, test = x_test, cl = y_train, k = 5)
summary(predicted)

7. K-Means
It is a type of unsupervised algorithm which solves the clustering problem. Its procedure follows a simple and easy way to classify a given data set through a certain number of clusters (assume k clusters). Data points inside a cluster are homogeneous, and heterogeneous to peer groups.
Remember figuring out shapes from ink blots? K-means is somewhat similar to this activity. You look at the shape and spread to decipher how many different clusters/populations are present!

How K-means forms clusters:


1. K-means picks k points for the clusters, known as centroids.
2. Each data point forms a cluster with the closest centroid, i.e. k clusters.
3. It finds the centroid of each cluster based on the existing cluster members. Here we have new centroids.
4. As we have new centroids, repeat steps 2 and 3: find the closest distance for each data point from the new centroids and get associated with the new k clusters. Repeat this process until convergence occurs, i.e. the centroids do not change (a minimal sketch of this loop follows below).
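The loop in steps 2 to 4 takes only a few lines of NumPy. This is a bare-bones sketch, with random 2-D data and a fixed number of iterations instead of a convergence test, both assumptions made for brevity.

# A bare-bones k-means loop following the steps above
import numpy as np

rng = np.random.default_rng(0)
X = rng.normal(size=(100, 2))                        # made-up 2-D data
k = 3
centroids = X[rng.choice(len(X), k, replace=False)]  # step 1: pick k points

for _ in range(10):                                  # steps 2-4 (fixed iterations)
    # step 2: assign each point to its closest centroid
    dists = np.linalg.norm(X[:, None, :] - centroids[None, :, :], axis=2)
    labels = dists.argmin(axis=1)
    # step 3: recompute each centroid as the mean of its members
    centroids = np.array([X[labels == j].mean(axis=0) for j in range(k)])
print(centroids)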

How to determine the value of K:
In K-means, we have clusters, and each cluster has its own centroid. The sum of the squared differences between the centroid and the data points within a cluster constitutes the within sum of squares value for that cluster. Also, when the within sum of squares values for all the clusters are added, it becomes the total within sum of squares value for the cluster solution.
We know that as the number of clusters increases, this value keeps decreasing, but if you plot the result you may see that the sum of squared distances decreases sharply up to some value of k, and then much more slowly after that. Here, we can find the optimum number of clusters.
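This elbow can be located programmatically. A minimal sketch with scikit-learn, where X is assumed to be your data matrix: the total within sum of squares of a fitted KMeans object is exposed as its inertia_ attribute.

# Elbow method: total within-cluster sum of squares for a range of k
from sklearn.cluster import KMeans

for k in range(1, 10):
    km = KMeans(n_clusters=k, random_state=0).fit(X)
    print(k, km.inertia_)  # inertia_ = total within-cluster sum of squares
# Plot these values against k; the 'elbow' where the decrease
# slows down sharply suggests the optimum number of clusters.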

Python Code
# Import Library
from sklearn.cluster import KMeans
# Assumed you have X (attributes) for the training data set
# and x_test (attributes) of the test dataset
# Create KMeans object
model = KMeans(n_clusters=3, random_state=0)
# Train the model using the training set
model.fit(X)
# Predict Output
predicted = model.predict(x_test)

R Code

library(cluster)
fit <- kmeans(X, 3)  # 3 cluster solution

8. Random Forest
Random Forest is a trademarked term for an ensemble of decision trees. In Random Forest, we have a collection of decision trees (hence known as a 'Forest'). To classify a new object based on its attributes, each tree gives a classification, and we say the tree 'votes' for that class. The forest chooses the classification having the most votes (over all the trees in the forest).
Each tree is planted & grown as follows (see the parameter sketch after this list):
1. If the number of cases in the training set is N, then a sample of N cases is taken at random, but with replacement. This sample will be the training set for growing the tree.
2. If there are M input variables, a number m << M is specified such that at each node, m variables are selected at random out of the M, and the best split on these m is used to split the node. The value of m is held constant while the forest grows.
3. Each tree is grown to the largest extent possible. There is no pruning.
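These three steps map directly onto scikit-learn parameters. A hedged sketch follows (the values are illustrative assumptions, not tuned recommendations):

# How the tree-growing steps map to RandomForestClassifier parameters
from sklearn.ensemble import RandomForestClassifier

model = RandomForestClassifier(
    n_estimators=500,     # number of trees in the forest
    bootstrap=True,       # step 1: sample N cases with replacement per tree
    max_features='sqrt',  # step 2: m << M variables tried at each split
    max_depth=None        # step 3: grow each tree fully, no pruning
)
model.fit(X, y)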

For more details on this algorithm, comparing it with decision trees and tuning model parameters, I would suggest you read these articles:
1. Introduction to Random Forest: Simplified
2. Comparing a CART model to Random Forest (Part 1)
3. Comparing a Random Forest to a CART model (Part 2)
4. Tuning the parameters of your Random Forest model

Python Code

# Import Library
from sklearn.ensemble import RandomForestClassifier
# Assumed you have X (predictor) and y (target) for the training data set
# and x_test (predictor) of the test dataset
# Create Random Forest object
model = RandomForestClassifier()
# Train the model using the training sets and check score
model.fit(X, y)
# Predict Output
predicted = model.predict(x_test)

R Code

library(randomForest)
x <- cbind(x_train, y_train)
# Fitting model (y_train is the response, matching the other snippets)
fit <- randomForest(y_train ~ ., data = x, ntree = 500)
summary(fit)
# Predict Output
predicted <- predict(fit, x_test)

9. Dimensionality Reduction Algorithms
In the last 4-5 years, there has been an exponential increase in data capture at every possible stage. Corporates, government agencies and research organisations are not only coming up with new
