
Assignment - Decision tree & Random Forest

Decision tree and Pruning


 Reading the data and setting a seed so the results are reproducible.

rm(list=ls())

library(readr)

Data <- read.csv("Data_data.csv")

str(Data)

set.seed(123)

 Splitting the data into train and test sets.


Data_split <- sample(2, nrow(Data), replace = TRUE, prob = c(0.7, 0.3))
train1 <- Data[Data_split == 1, ]
test1 <- Data[Data_split == 2, ]
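
 A quick sanity check on the split (a minimal sketch; the 70/30 proportions are only approximate, since sample() assigns each row independently):
nrow(train1)
nrow(test1)
prop.table(table(Data_split))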
 Building the decision tree model using the rpart library.
library(rpart)
data_model <- rpart(Class ~ ., data = train1, method = 'class', minsplit = 3, minbucket = 1)
plot(data_model)
text(data_model)

Output: the fitted decision tree plot.
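
 For a more readable rendering of the same tree, the rpart.plot package can be used (an optional sketch; it assumes rpart.plot is installed, which this assignment does not otherwise require):
library(rpart.plot)
rpart.plot(data_model)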


 To get the column index of the class variable.
col_index <- grep("Class", names(Data))
 Model prediction on the test data.
data_model_predict <- predict(data_model, test1[,-col_index], type = 'class')
mn1 <- mean(data_model_predict == test1$Class)
mn1
table(pred = data_model_predict, true = test1$Class)

       true
pred    acc good unacc vgood
  acc   111    4    15     6
  good    4   10     1     0
  unacc   2    0   341     0
  vgood   2    3     0    16
 Pruning implementation: pick the cp value with the lowest cross-validated error (xerror) from the cp table and prune the tree with it.
printcp(data_model)
opt <- which.min(data_model$cptable[, "xerror"])
cp <- data_model$cptable[opt, "CP"]
Pruned_data_model <- prune(data_model, cp)
plot(Pruned_data_model)
text(Pruned_data_model)
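
 As a quick check (a sketch reusing the test split from above; pruned_pred is a name introduced here for illustration), the pruned tree can be scored the same way as the unpruned one and compared with mn1:
pruned_pred <- predict(Pruned_data_model, test1[,-col_index], type = 'class')
mean(pruned_pred == test1$Class)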
Random Forest
 Loading the library and building the model.
library(randomForest)
set.seed(100)
rf <- randomForest(Class ~ ., data = train1)
rf

Call:
 randomForest(formula = Class ~ ., data = train1)
               Type of random forest: classification
                     Number of trees: 500
No. of variables tried at each split: 2

        OOB estimate of  error rate: 3.3%
Confusion matrix:
        acc good unacc vgood class.error
acc     258    3     3     1  0.02641509
good      4   46     0     2  0.11538462
unacc    17    1   835     0  0.02110199
vgood     8    1     0    34  0.20930233
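
 The printed OOB estimate can also be pulled out programmatically (a small sketch; err.rate is the per-tree error matrix that randomForest stores on the fitted object, with the cumulative OOB error in its first column):
tail(rf$err.rate[, "OOB"], 1)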

 Prediction and confusion matrix on the test data.


p1 <- predict(rf, test1)
library(caret)
confusionMatrix(p1, test1$Class)

Confusion Matrix and Statistics

          Reference
Prediction acc good unacc vgood
     acc   116    4     8     6
     good    2   11     1     0
     unacc   0    0   348     0
     vgood   1    2     0    16

Overall Statistics

               Accuracy : 0.9534
                 95% CI : (0.9315, 0.9699)
    No Information Rate : 0.6932
    P-Value [Acc > NIR] : < 2.2e-16

                  Kappa : 0.9006
 Mcnemar's Test P-Value : NA

Statistics by Class:

                     Class: acc Class: good Class: unacc Class: vgood
Sensitivity              0.9748     0.64706       0.9748      0.72727
Specificity              0.9545     0.99398       1.0000      0.99391
Pos Pred Value           0.8657     0.78571       1.0000      0.84211
Neg Pred Value           0.9921     0.98802       0.9461      0.98790
Prevalence               0.2311     0.03301       0.6932      0.04272
Detection Rate           0.2252     0.02136       0.6757      0.03107
Detection Prevalence     0.2602     0.02718       0.6757      0.03689
Balanced Accuracy        0.9647     0.82052       0.9874      0.86059

 Plot: number of trees vs. OOB error.
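
 The plot above comes from the plot method for randomForest objects, which draws the OOB (and per-class) error against the number of trees; a minimal sketch, with a legend added by hand since plot.randomForest does not draw one:
plot(rf)
legend("topright", colnames(rf$err.rate), lty = 1:5, col = 1:5)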

 Tuning the model with tuneRF to find the optimal number of variables tried at each split (mtry).


t <- tuneRF(train1[,-col_index], train1[,col_index], stepFactor = 0.5, plot = TRUE,
            ntreeTry = 400, trace = TRUE, improve = 0.05)
 Running random forest with the optimal number of trees and variables per split.
rf <- randomForest(Class ~ ., data = train1, ntree = 400, mtry = 4,
                   importance = TRUE, proximity = TRUE)
rf

Call:
 randomForest(formula = Class ~ ., data = train1, ntree = 400, mtry = 4, importance = TRUE, proximity = TRUE)
               Type of random forest: classification
                     Number of trees: 400
No. of variables tried at each split: 4

        OOB estimate of  error rate: 1.81%
Confusion matrix:
        acc good unacc vgood class.error
acc     257    4     3     1  0.03018868
good      1   48     0     3  0.07692308
unacc     8    1   844     0  0.01055100
vgood     0    1     0    42  0.02325581

 Prediction and confusion matrix on the test data.
p2 <- predict(rf, test1)
confusionMatrix(p2, test1$Class)

Confusion Matrix and Statistics

          Reference
Prediction acc good unacc vgood
     acc   109    0     8     0
     good    3   15     2     0
     unacc   3    0   347     0
     vgood   4    2     0    22

Overall Statistics

               Accuracy : 0.9573
                 95% CI : (0.936, 0.973)
    No Information Rate : 0.6932
    P-Value [Acc > NIR] : < 2.2e-16

                  Kappa : 0.9096
 Mcnemar's Test P-Value : NA

Statistics by Class:

                     Class: acc Class: good Class: unacc Class: vgood
Sensitivity              0.9160     0.88235       0.9720      1.00000
Specificity              0.9798     0.98996       0.9810      0.98783
Pos Pred Value           0.9316     0.75000       0.9914      0.78571
Neg Pred Value           0.9749     0.99596       0.9394      1.00000
Prevalence               0.2311     0.03301       0.6932      0.04272
Detection Rate           0.2117     0.02913       0.6738      0.04272
Detection Prevalence     0.2272     0.03883       0.6796      0.05437
Balanced Accuracy        0.9479     0.93616       0.9765      0.99391
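
 Since the tuned forest was built with importance = TRUE, the per-variable importance can also be inspected (a short sketch using the randomForest accessors; proximity = TRUE additionally enables MDSplot(rf, train1$Class) for a low-dimensional view of the proximity matrix):
importance(rf)
varImpPlot(rf)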
