RANDOM FOREST
Ensemble learning technique (a collection of many individual models, here decision trees, combined into one stronger model).
It is a bagging (bootstrap aggregating) technique rather than a boosting one; as with any ensemble, the whole is more than
the sum of the individual parts. Forest --> collection of trees. Many decision trees (each tree = one model trained on a
different random subset of the data and a different random subset of the features) -> combine the outputs of all these
trees -> predict each sample by the most frequent output (majority vote) across the individual decision tree (DT) models.
Hence it compensates for the deficiencies of the individual models.
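A minimal from-scratch sketch of the procedure above, assuming scikit-learn and NumPy are available and using synthetic data. Each tree is trained on a bootstrap sample of the rows, and `max_features="sqrt"` gives it a random subset of features at every split (scikit-learn's forests randomise features per split, a slight variation on "one feature subset per tree"). The helper names `fit_forest` and `predict_forest` are hypothetical, chosen only for this illustration.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.tree import DecisionTreeClassifier


def fit_forest(X, y, n_trees=25, seed=0):
    """Hypothetical helper: train n_trees decision trees on bootstrap samples."""
    rng = np.random.default_rng(seed)
    n = len(X)
    trees = []
    for i in range(n_trees):
        rows = rng.integers(0, n, size=n)      # bootstrap: draw rows with replacement
        tree = DecisionTreeClassifier(
            max_features="sqrt",               # random feature subset at each split
            random_state=i,
        )
        tree.fit(X[rows], y[rows])
        trees.append(tree)
    return trees


def predict_forest(trees, X):
    """Hypothetical helper: majority vote, the most frequent class per sample wins."""
    votes = np.array([t.predict(X) for t in trees])   # shape (n_trees, n_samples)
    return np.array([np.bincount(col).argmax() for col in votes.T])


X, y = make_classification(n_samples=500, n_features=8, random_state=0)
trees = fit_forest(X[:400], y[:400])
pred = predict_forest(trees, X[400:])
print("forest accuracy:", (pred == y[400:]).mean())
```

Each individual tree overfits its own bootstrap sample; the vote across 25 differently trained trees smooths those individual errors out.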
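The term ensemble is used when more than one machine learning model/algorithm is combined and their outputs are aggregated
into a single prediction.
For regression, the outputs of all models are averaged; for classification, the outcomes of all models are collected and the
majority vote is taken as the decision.
Random forest has a classifier for classification and a regressor for regression.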
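A short usage sketch of scikit-learn's `RandomForestClassifier` and `RandomForestRegressor` on synthetic data (dataset sizes and `n_estimators=50` are arbitrary choices for illustration). It also checks that the regressor's prediction equals the average of its individual trees' predictions, which is the "averaging" described above.

```python
import numpy as np
from sklearn.datasets import make_classification, make_regression
from sklearn.ensemble import RandomForestClassifier, RandomForestRegressor

# Classification: the forest outputs one class label per sample
Xc, yc = make_classification(n_samples=300, n_features=6, random_state=1)
clf = RandomForestClassifier(n_estimators=50, random_state=1).fit(Xc, yc)
print("class predictions:", clf.predict(Xc[:5]))

# Regression: the forest output is the mean of the individual trees' outputs
Xr, yr = make_regression(n_samples=300, n_features=6, noise=0.1, random_state=1)
reg = RandomForestRegressor(n_estimators=50, random_state=1).fit(Xr, yr)
tree_mean = np.mean([t.predict(Xr[:5]) for t in reg.estimators_], axis=0)
print("forest:       ", reg.predict(Xr[:5]))
print("mean of trees:", tree_mean)
```

One detail: scikit-learn's classifier combines trees by averaging their predicted class probabilities (soft voting) rather than counting hard votes; the two usually agree, but not in every case.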
file:///D:/KOMAL/SIMPLILEARN/MY%20COURSES/IN%20PROGRESS/My%20Codes_ML_DS/pdf%20conversion/htmls/komal_RF1_sayan_Titanic.html 1/9
9/21/2018 Random forest
In [10]: data.head()
Out[10]:
   Name                              Age  Class/Dept           Ticket          Joined       Job         Boat [Body]  Unnamed: 7
0  AB??-AL-MUN??, Mr N??s??f Q??sim  27   3rd Class Passenger  2699?18 15s 9d  Cherbourg    ?           15?          NaN
2  ABBOTT, Mrs Rhoda Mary 'Rosa'     39   3rd Class Passenger  CA2673?20 5s    Southampton  ?           A?           NaN
3  ABBOTT, Mr Rossmore Edward        16   3rd Class Passenger  CA2673?20 5s    Southampton  Jeweller ?  ?[190]       NaN
4  ABBOTT, Mr Eugene Joseph          13   3rd Class Passenger  CA2673?20 5s    Southampton  Scholar ?   ??           NaN
In [11]: data['Name'] = data['Name'].apply(cleanup)
data['Boat [Body]'] = data['Boat [Body]'].apply(cleanup)
data['Age'] = pd.to_numeric(data['Age'], errors='coerce')  # non-numeric ages become NaN
data = data[["Name", "Age", "Class/Dept", "Boat [Body]"]]
data.head()
Out[11]:
Name Age Class/Dept Boat [Body]
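The `cleanup` function used in In [11] is not shown in the printout. A hypothetical sketch consistent with the later outputs (for example, `AB??-AL-MUN??` becoming `AB -AL-MUN ` and `15?` becoming `15`): every non-ASCII character is replaced by a space and the result is trimmed. The regex approach is an assumption, not the author's code.

```python
import re

import pandas as pd


def cleanup(value):
    """Hypothetical: replace each non-ASCII character with a space, then trim.

    Missing values (NaN) are returned unchanged.
    """
    if pd.isna(value):
        return value
    return re.sub(r"[^\x00-\x7F]", " ", str(value)).strip()


print(cleanup("15\u00a0"))        # trailing non-breaking space removed
print(cleanup("MUN\u00c0, Mr"))   # accented letter becomes a space
```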
In [12]: data["Crew/Pass"]=data["Class/Dept"].apply(checkPass)
data.head()
Out[12]:
   Name                           Age   Class/Dept           Boat [Body]  Crew/Pass
2  ABBOTT, Mrs Rhoda Mary 'Rosa'  39.0  3rd Class Passenger  A            Passenger
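`checkPass` is also not shown. A hypothetical sketch: any `Class/Dept` value containing "Passenger" is labelled `Passenger`, everything else `Crew`. The sample department string `"Deck Crew"` is illustrative only.

```python
import pandas as pd


def checkPass(class_dept):
    """Hypothetical: 'Passenger' if the Class/Dept text says so, otherwise 'Crew'."""
    return "Passenger" if "Passenger" in str(class_dept) else "Crew"


df = pd.DataFrame({"Class/Dept": ["3rd Class Passenger", "Deck Crew"]})
df["Crew/Pass"] = df["Class/Dept"].apply(checkPass)
print(df)
```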
In [13]: data["Class"]=data["Class/Dept"].apply(checkClass)
data.head()
Out[13]:
   Name                         Age   Class/Dept           Boat [Body]  Crew/Pass  Class
0  AB -AL-MUN , Mr N s f Q sim  27.0  3rd Class Passenger  15           Passenger  3rd
1  ABBING, Mr Anthony           42.0  3rd Class Passenger               Passenger  3rd
4  ABBOTT, Mr Eugene Joseph     13.0  3rd Class Passenger               Passenger  3rd
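A hypothetical sketch of `checkClass`, based only on the four values that appear later in the `compare("Class", data)` output (`1st`, `2nd`, `3rd`, `Crew`): a passenger's class is read from the start of the `Class/Dept` text, and any other value is treated as crew.

```python
def checkClass(class_dept):
    """Hypothetical: map Class/Dept text to '1st', '2nd', '3rd' or 'Crew'."""
    text = str(class_dept)
    for label in ("1st", "2nd", "3rd"):
        if text.startswith(label):
            return label
    return "Crew"


print(checkClass("1st Class Passenger"), checkClass("3rd Class Passenger"), checkClass("Deck Crew"))
```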
In [14]: data["Adult/Child"]=data["Age"].apply(checkAdult)
data.head()
Out[14]:
   Name                Age   Class/Dept           Boat [Body]  Crew/Pass  Class  Adult/Child
1  ABBING, Mr Anthony  42.0  3rd Class Passenger               Passenger  3rd    Adult
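A hypothetical sketch of `checkAdult`. The outputs show 16 and 13 as `Child` and 27, 39 and 42 as `Adult`; the cutoff of 18 is an assumption that fits those rows. Since `compare("Adult/Child", data)` later shows only two groups, missing ages must land in one of them; here NaN falls through to `Adult` because `NaN < 18` is `False`, which is also an assumption.

```python
def checkAdult(age, threshold=18):
    """Hypothetical: 'Child' below the threshold age, otherwise 'Adult'.

    A NaN age compares as False, so it is labelled 'Adult' (assumed behaviour).
    """
    return "Child" if age < threshold else "Adult"


print(checkAdult(16.0), checkAdult(27.0), checkAdult(float("nan")))
```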
In [16]: data["Gender"]=data["Name"].apply(checkGender)
data.head()
Out[16]:
   Name                           Age   Class/Dept           Boat [Body]  Crew/Pass  Class  Adult/Child  Gender
2  ABBOTT, Mrs Rhoda Mary 'Rosa'  39.0  3rd Class Passenger  A            Passenger  3rd    Adult        Female
3  ABBOTT, Mr Rossmore Edward     16.0  3rd Class Passenger  [190]        Passenger  3rd    Child        Male
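A hypothetical sketch of `checkGender`: names are stored as `SURNAME, Title Given names`, so the word after the comma is taken as the title and a small set of female titles maps to `Female`. The title set is an assumption and would need extending for rarer titles.

```python
FEMALE_TITLES = {"Mrs", "Miss", "Ms", "Mme", "Mlle", "Lady"}  # assumed list


def checkGender(name):
    """Hypothetical: infer gender from the title that follows the surname's comma."""
    parts = str(name).split(",", 1)
    words = parts[1].split() if len(parts) == 2 else []
    title = words[0].rstrip(".") if words else ""
    return "Female" if title in FEMALE_TITLES else "Male"


print(checkGender("ABBOTT, Mrs Rhoda Mary 'Rosa'"), checkGender("ABBOTT, Mr Rossmore Edward"))
```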
Out[17]:
   Name                           Age   Class/Dept           Boat [Body]  Crew/Pass  Class  Adult/Child  Gender  Survival
0  AB -AL-MUN , Mr N s f Q sim    27.0  3rd Class Passenger  15           Passenger  3rd    Adult        Male    1
1  ABBING, Mr Anthony             42.0  3rd Class Passenger               Passenger  3rd    Adult        Male    1
2  ABBOTT, Mrs Rhoda Mary 'Rosa'  39.0  3rd Class Passenger  A            Passenger  3rd    Adult        Female  1
3  ABBOTT, Mr Rossmore Edward     16.0  3rd Class Passenger  [190]        Passenger  3rd    Child        Male    0
4  ABBOTT, Mr Eugene Joseph       13.0  3rd Class Passenger               Passenger  3rd    Child        Male    1
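The cell that creates `Survival` is not captured. The five rows above are consistent with a rule that sets `Survival` to 0 only when `Boat [Body]` holds a bracketed body number such as `[190]`, and to 1 otherwise; the sketch below (hypothetical name `checkSurvival`) implements that inferred rule. Note that such a rule also counts people with an empty cell (no lifeboat, no recovered body) as survivors, which may explain why the rates that follow sit near 90%, far above the roughly one-in-three survival rate usually reported for the Titanic.

```python
import pandas as pd


def checkSurvival(boat_body):
    """Hypothetical inferred rule: 0 if the cell holds a body number like '[190]', else 1."""
    return 0 if "[" in str(boat_body) else 1


boats = pd.Series(["15", "", "A", "[190]", ""])  # Boat [Body] values from Out[17]
print(boats.apply(checkSurvival).tolist())       # [1, 1, 1, 0, 1]
```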
In [18]: data.groupby(['Crew/Pass'])['Survival'].sum()*100/data.groupby(['Crew/Pass'])['Survival'].count()
Out[18]: Crew/Pass
Crew 90.217391
Passenger 90.310651
Name: Survival, dtype: float64
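Because `Survival` is 0/1, `sum()*100/count()` is simply the group mean times 100. The `compare` helper used in the next cells is not shown; a hypothetical sketch that produces output in the same shape (a float Series named `Survival`, indexed by group) is below, with a small made-up frame for illustration.

```python
import pandas as pd


def compare(column, df):
    """Hypothetical: percentage of Survival == 1 within each group of `column`."""
    return df.groupby(column)["Survival"].mean() * 100


sample = pd.DataFrame({
    "Gender": ["Female", "Male", "Male", "Female"],
    "Survival": [1, 0, 1, 1],
})
result = compare("Gender", sample)
print(result)
```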
In [19]: compare("Class",data)
Out[19]: Class
1st 89.714286
2nd 88.395904
3rd 91.396333
Crew 90.217391
Name: Survival, dtype: float64
In [20]: compare("Gender",data)
Out[20]: Gender
Female 95.840555
Male 88.557743
Name: Survival, dtype: float64
In [21]: compare("Adult/Child",data)
Out[21]: Adult/Child
Adult 89.699955
Child 95.964126
Name: Survival, dtype: float64
In [22]: trainingData=data[["Age","Crew/Pass","Class","Adult/Child","Gender","Survival"]]
trainingData.head()
Out[22]:
Age Crew/Pass Class Adult/Child Gender Survival
catData=trainingData[["Crew/Pass","Class","Adult/Child","Gender"]].apply(catToNum)
trainingData[["Crew/Pass","Class","Adult/Child","Gender"]]=catData
trainingData.head()
C:\Users\hariz\Anaconda3\lib\site-packages\pandas\core\frame.py:3137: SettingWithCopyWarning:
A value is trying to be set on a copy of a slice from a DataFrame.
Try using .loc[row_indexer,col_indexer] = value instead
Age Crew/Pass Class Adult/Child Gender Survival
0 27.0 1 2 0 1 1
1 42.0 1 2 0 1 1
2 39.0 1 2 0 0 1
3 16.0 1 2 1 1 0
4 13.0 1 2 1 1 1
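`catToNum` is not shown either. The encoded rows (3rd → 2, Passenger → 1, Male → 1, Child → 1) are consistent with alphabetical category codes, so the hypothetical sketch below uses pandas' `astype("category").cat.codes`. It also calls `.copy()` when selecting the columns: the SettingWithCopyWarning above appears because `trainingData` is a slice of `data`, and an explicit copy makes it an independent frame. The sample rows are made up.

```python
import pandas as pd


def catToNum(series):
    """Hypothetical: encode a categorical column as integer codes in sorted order."""
    return series.astype("category").cat.codes


sample = pd.DataFrame({
    "Age": [27.0, 39.0, 16.0],
    "Class": ["3rd", "3rd", "Crew"],
    "Gender": ["Male", "Female", "Male"],
    "Survival": [1, 1, 0],
})

# .copy() makes trainingData independent of `sample`, so assigning into it
# does not raise SettingWithCopyWarning.
trainingData = sample[["Age", "Class", "Gender", "Survival"]].copy()
cat_cols = ["Class", "Gender"]
trainingData[cat_cols] = trainingData[cat_cols].apply(catToNum)
print(trainingData)
```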
Out[24]: 2426
In [26]: test.head()
Out[26]:
Age Crew/Pass Class Adult/Child Gender Survival
1990 30.0 0 3 0 1 1
485 32.0 0 3 0 1 1
1591 17.0 1 1 1 1 1
1704 31.0 0 3 0 1 1
2318 34.0 0 3 0 1 1
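The cells that create `train` and `test` are not captured. The shuffled index in `test.head()` suggests a random split; a sketch assuming scikit-learn's `train_test_split` is below. `test_size=0.2` and `random_state=42` are assumptions, and the ten rows are copied from the printed outputs above.

```python
import pandas as pd
from sklearn.model_selection import train_test_split

# Ten encoded rows copied from the printed outputs (stand-in for trainingData)
trainingData = pd.DataFrame({
    "Age": [27.0, 42.0, 39.0, 16.0, 13.0, 30.0, 32.0, 17.0, 31.0, 34.0],
    "Crew/Pass": [1, 1, 1, 1, 1, 0, 0, 1, 0, 0],
    "Class": [2, 2, 2, 2, 2, 3, 3, 1, 3, 3],
    "Adult/Child": [0, 0, 0, 1, 1, 0, 0, 1, 0, 0],
    "Gender": [1, 1, 0, 1, 1, 1, 1, 1, 1, 1],
    "Survival": [1, 1, 1, 0, 1, 1, 1, 1, 1, 1],
})

# Rows are shuffled before splitting, so the test index is not sequential
train, test = train_test_split(trainingData, test_size=0.2, random_state=42)
print(len(train), len(test))  # 8 2
```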
In [28]: clf
In [33]: checkAccuracy(clf)
Out[33]: 0.8930041152263375
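The definitions of `clf` and `checkAccuracy` are not captured. A sketch under assumptions: `clf` is a scikit-learn `RandomForestClassifier`, and `checkAccuracy` fits it on `train` and returns accuracy on `test`. The helper body, the feature list, and the synthetic data with an arbitrary target rule are all hypothetical, so the score will not match the 0.893 above.

```python
import numpy as np
import pandas as pd
from sklearn.ensemble import RandomForestClassifier
from sklearn.metrics import accuracy_score
from sklearn.model_selection import train_test_split

# Synthetic stand-in with the same encoded columns as trainingData (toy values only)
rng = np.random.default_rng(0)
n = 400
frame = pd.DataFrame({
    "Age": rng.integers(1, 70, n).astype(float),
    "Crew/Pass": rng.integers(0, 2, n),
    "Class": rng.integers(0, 4, n),
    "Adult/Child": rng.integers(0, 2, n),
    "Gender": rng.integers(0, 2, n),
})
# Arbitrary toy target so the model has a pattern to learn
frame["Survival"] = ((frame["Gender"] == 0) | (frame["Adult/Child"] == 1)).astype(int)

train, test = train_test_split(frame, test_size=0.2, random_state=0)
features = ["Age", "Crew/Pass", "Class", "Adult/Child", "Gender"]


def checkAccuracy(clf):
    """Hypothetical: fit clf on the training split, return accuracy on the test split."""
    clf.fit(train[features], train["Survival"])
    return accuracy_score(test["Survival"], clf.predict(test[features]))


clf = RandomForestClassifier(n_estimators=100, random_state=0)
print(checkAccuracy(clf))
```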
In [34]: # There are known issues when installing xgboost on Windows, so the code below is commented out.
In [37]: #checkAccuracy(clf)
In [38]: #clf
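If xgboost cannot be installed, scikit-learn's `GradientBoostingClassifier` is one boosting alternative that ships with scikit-learn. It is a different implementation from xgboost, swapped in here only to illustrate boosting (trees built sequentially, each correcting the previous ones) as a contrast to the bagged trees of a random forest; the synthetic data and settings are arbitrary.

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import GradientBoostingClassifier
from sklearn.model_selection import train_test_split

X, y = make_classification(n_samples=500, n_features=5, random_state=0)
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=0)

# Boosting: shallow trees are added one after another, each fitted to the
# errors left by the ensemble so far
gb = GradientBoostingClassifier(random_state=0).fit(X_train, y_train)
print("boosting accuracy:", gb.score(X_test, y_test))
```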