
Assignment - Decision tree & Random Forest

Decision tree and Pruning


 Reading the data and setting a seed so the results are reproducible.

rm(list=ls())

library(readr)

Data <- read.csv("Data_data.csv")

str(Data)

set.seed(123)

 Splitting the data into train and test sets.


Data_split <- sample(2, nrow(Data), replace = TRUE, prob = c(0.7, 0.3))
train1 <- Data[Data_split == 1, ]
test1 <- Data[Data_split == 2, ]
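
 A quick sanity check on the split (a minimal sketch; the 70/30 proportions are only approximate, since sample() assigns each row independently):
nrow(train1)
nrow(test1)
prop.table(table(Data_split))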
 Building the decision tree model using the rpart library.
library(rpart)
data_model <- rpart(Class ~ ., data = train1, method = 'class', minsplit = 3, minbucket = 1)
plot(data_model)
text(data_model)

Output: the fitted decision tree plot.
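
 For a more readable rendering of the same tree, the rpart.plot package can be used (an optional sketch; it assumes rpart.plot is installed, which this assignment does not otherwise require):
library(rpart.plot)
rpart.plot(data_model)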


 To get the column index of the class variable.
col_index <- grep("Class", names(Data))
 Model prediction on the test data.
data_model_predict <- predict(data_model, test1[,-col_index], type = 'class')
mn1 <- mean(data_model_predict == test1$Class)
mn1
table(pred = data_model_predict, true = test1$Class)

       true
pred    acc good unacc vgood
  acc   111    4    15     6
  good    4   10     1     0
  unacc   2    0   341     0
  vgood   2    3     0    16
 Pruning implementation: pick the cp value with the lowest cross-validated error (xerror) from the cp table and prune the tree with it.
printcp(data_model)
opt <- which.min(data_model$cptable[, "xerror"])
cp <- data_model$cptable[opt, "CP"]
Pruned_data_model <- prune(data_model, cp)
plot(Pruned_data_model)
text(Pruned_data_model)
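
 As a quick check (a sketch reusing the test split from above; pruned_pred is a name introduced here for illustration), the pruned tree can be scored the same way as the unpruned one and compared with mn1:
pruned_pred <- predict(Pruned_data_model, test1[,-col_index], type = 'class')
mean(pruned_pred == test1$Class)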
Random Forest
 Loading the library and building the model.
library(randomForest)
set.seed(100)
rf <- randomForest(Class ~ ., data = train1)
rf

Call:
 randomForest(formula = Class ~ ., data = train1)
               Type of random forest: classification
                     Number of trees: 500
No. of variables tried at each split: 2

        OOB estimate of  error rate: 3.3%
Confusion matrix:
        acc good unacc vgood class.error
acc     258    3     3     1  0.02641509
good      4   46     0     2  0.11538462
unacc    17    1   835     0  0.02110199
vgood     8    1     0    34  0.20930233
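
 The printed OOB estimate can also be pulled out programmatically (a small sketch; err.rate is the per-tree error matrix that randomForest stores on the fitted object, with the cumulative OOB error in its first column):
tail(rf$err.rate[, "OOB"], 1)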

 Prediction and confusion matrix on the test data.


p1 <- predict(rf, test1)
library(caret)
confusionMatrix(p1, test1$Class)

Confusion Matrix and Statistics

          Reference
Prediction acc good unacc vgood
     acc   116    4     8     6
     good    2   11     1     0
     unacc   0    0   348     0
     vgood   1    2     0    16

Overall Statistics

               Accuracy : 0.9534
                 95% CI : (0.9315, 0.9699)
    No Information Rate : 0.6932
    P-Value [Acc > NIR] : < 2.2e-16

                  Kappa : 0.9006
 Mcnemar's Test P-Value : NA

Statistics by Class:

                     Class: acc Class: good Class: unacc Class: vgood
Sensitivity              0.9748     0.64706       0.9748      0.72727
Specificity              0.9545     0.99398       1.0000      0.99391
Pos Pred Value           0.8657     0.78571       1.0000      0.84211
Neg Pred Value           0.9921     0.98802       0.9461      0.98790
Prevalence               0.2311     0.03301       0.6932      0.04272
Detection Rate           0.2252     0.02136       0.6757      0.03107
Detection Prevalence     0.2602     0.02718       0.6757      0.03689
Balanced Accuracy        0.9647     0.82052       0.9874      0.86059

 Plot: number of trees vs. OOB error.
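
 The plot above comes from the plot method for randomForest objects, which draws the OOB (and per-class) error against the number of trees; a minimal sketch, with a legend added by hand since plot.randomForest does not draw one:
plot(rf)
legend("topright", colnames(rf$err.rate), lty = 1:5, col = 1:5)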

 Tuning the model with tuneRF to find the optimal number of variables tried at each split (mtry).


t <- tuneRF(train1[,-col_index], train1[,col_index], stepFactor = 0.5, plot = TRUE,
            ntreeTry = 400, trace = TRUE, improve = 0.05)
 Running random forest with the optimal number of trees and variables per split.
rf <- randomForest(Class ~ ., data = train1, ntree = 400, mtry = 4,
                   importance = TRUE, proximity = TRUE)
rf

Call:
 randomForest(formula = Class ~ ., data = train1, ntree = 400, mtry = 4, importance = TRUE, proximity = TRUE)
               Type of random forest: classification
                     Number of trees: 400
No. of variables tried at each split: 4

        OOB estimate of  error rate: 1.81%
Confusion matrix:
        acc good unacc vgood class.error
acc     257    4     3     1  0.03018868
good      1   48     0     3  0.07692308
unacc     8    1   844     0  0.01055100
vgood     0    1     0    42  0.02325581

 Prediction and confusion matrix on the test data.
p2 <- predict(rf, test1)
confusionMatrix(p2, test1$Class)

Confusion Matrix and Statistics

          Reference
Prediction acc good unacc vgood
     acc   109    0     8     0
     good    3   15     2     0
     unacc   3    0   347     0
     vgood   4    2     0    22

Overall Statistics

               Accuracy : 0.9573
                 95% CI : (0.936, 0.973)
    No Information Rate : 0.6932
    P-Value [Acc > NIR] : < 2.2e-16

                  Kappa : 0.9096
 Mcnemar's Test P-Value : NA

Statistics by Class:

                     Class: acc Class: good Class: unacc Class: vgood
Sensitivity              0.9160     0.88235       0.9720      1.00000
Specificity              0.9798     0.98996       0.9810      0.98783
Pos Pred Value           0.9316     0.75000       0.9914      0.78571
Neg Pred Value           0.9749     0.99596       0.9394      1.00000
Prevalence               0.2311     0.03301       0.6932      0.04272
Detection Rate           0.2117     0.02913       0.6738      0.04272
Detection Prevalence     0.2272     0.03883       0.6796      0.05437
Balanced Accuracy        0.9479     0.93616       0.9765      0.99391
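
 Since the tuned forest was built with importance = TRUE, the per-variable importance can also be inspected (a short sketch using the randomForest accessors; proximity = TRUE additionally enables MDSplot(rf, train1$Class) for a low-dimensional view of the proximity matrix):
importance(rf)
varImpPlot(rf)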
