Website:www.tertiarycourses.com.my
Email: enquiry@tertiaryinfotech.com
About the Trainer
Dr Ghazaleh Babanejad received her PhD from the
Faculty of Computer Science and Information Technology,
University Putra Malaysia. Her PhD thesis is on
recommender systems, in the field of skyline queries over
dynamic and incomplete databases. She also works in the
Data Science field as a trainer and Data Scientist, and has
worked on Machine Learning and Process Mining projects.
She holds several international certificates, including
Practical Machine Learning (Johns Hopkins University),
Mining Massive Datasets (Stanford University), Process
Mining (Eindhoven University), Hadoop (University of San
Diego), and MongoDB for DBAs (MongoDB Inc). She has
more than 5 years of experience as a lecturer and
database administrator.
Agenda
Module 1 Introduction to Machine Learning
- What is Machine Learning
- Installing mlr package
- Supervised vs Unsupervised Learning
- Regression vs Classification
Module 2 Datasets
- Iris Dataset
- Boston Housing Price Dataset
- Mtcars Dataset
Module 3 Preprocessing
- Sampling
- Impute missing values
- Normalize columns
- Split data into train and test set
Module 4 Regression based Models
https://github.com/rkrtiwari/rMachineLearning
Module 1
Getting Started
What is Machine Learning?
• Machine Learning is about building
programs with tunable parameters that
are adjusted automatically so as to
improve their behavior by adapting to
previously seen data
• Machine Learning is a subfield of
Artificial Intelligence
Why Machine Learning?
http://www.goratings.org/
Machine Learning
• Supervised Learning
– Classification
– Regression
• Unsupervised Learning
– Clustering
– Dimensionality Reduction
R Packages for ML
• rpart
• randomForest
• e1071
• glmnet
• nnet
• class
• FNN
• xgboost
• lda
Installing and Loading R ML Packages
install.packages("mlr")
library(mlr)
Module 2
Datasets
Iris Flower Dataset
iris2.1=subset(iris,
select=c("Sepal.Length","Sepal.Width"))
# select only these 2 columns
iris2.2=iris[1:100, ]
# select the first 100 rows
Sampling
## random sampling
data(airquality)
aqr=airquality
summary(aqr)
summary(iris2.4)
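The three preprocessing steps listed in the agenda (sampling, imputing missing values, normalizing columns) can be sketched in base R as follows. This is a minimal illustration, not the trainer's own code: the sample size of 50, the seed, mean imputation, and the names `aqr.sample`, `aqr.imp`, and `aqr.norm` are all assumptions made for the example.

```r
# Module 3 steps on the built-in airquality data (base R only)
data(airquality)
aqr <- airquality

# 1. Random sampling: draw 50 rows without replacement (50 is an assumed size)
set.seed(42)                                  # assumed seed, for reproducibility
aqr.sample <- aqr[sample(nrow(aqr), 50), ]

# 2. Impute missing values: replace each NA with its column mean
aqr.imp <- aqr
for (col in names(aqr.imp)) {
  miss <- is.na(aqr.imp[[col]])
  aqr.imp[[col]][miss] <- mean(aqr.imp[[col]], na.rm = TRUE)
}

# 3. Normalize columns: centre to mean 0 and scale to standard deviation 1
aqr.norm <- as.data.frame(scale(aqr.imp))

colSums(is.na(aqr))      # missing values per column before imputation
colSums(is.na(aqr.imp))  # all zero after imputation
```

Mean imputation is only one option; the median or a model-based estimate may suit skewed columns better.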
Train and test set
## create train and test set
nr <- nrow(iris)
inTrain <- sample(1:nr, 0.6*nr)
iris.train <- iris[inTrain,]
iris.test <- iris[-inTrain,]
Module 4
Regression Models
What is Supervised Learning
• In Supervised Learning, we have a
dataset consisting of both features and
labels.
• The input data (X) is associated with a
target label (y)
Supervised Learning Examples
• Spam Email Filter
• Tumor Classification
Classification Steps
# Step 1: Load classifier library
library(package)   # placeholder: substitute the package you need
# Step 3: Training
model <- classifier(y ~ ., data = train)   # placeholder function name
# Step 4: Prediction
pred <- predict(model, newdata = test)
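As one concrete instance of the generic steps above, here is a sketch that uses a decision tree from the `rpart` package on the iris data. The choice of `rpart`, the seed, and the added splitting step are assumptions for illustration; any classifier with a formula interface follows the same pattern.

```r
# Step 1: load the classifier library (rpart ships with standard R installs)
library(rpart)

# Assumed intermediate step: split the data into train and test sets
set.seed(1)
nr <- nrow(iris)
inTrain <- sample(1:nr, 0.6 * nr)
train <- iris[inTrain, ]
test <- iris[-inTrain, ]

# Step 3: training
model <- rpart(Species ~ ., data = train)

# Step 4: prediction (type = "class" returns labels rather than probabilities)
pred <- predict(model, newdata = test, type = "class")

# Compare predictions with the true labels
table(truth = test$Species, predicted = pred)
mean(pred == test$Species)   # accuracy on the test set
```

Note that `predict()` methods take the new rows through the `newdata` argument.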
Multiple Linear Regression
Split the Boston dataset
### Splitting data
library(MASS)   # the Boston data set lives in the MASS package
data(Boston)
nr <- nrow(Boston)
inTrain <- sample(1:nr, 0.6*nr)
bh.train <- Boston[inTrain,]
bh.test <- Boston[-inTrain,]
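Making tasks and learner
### Making Tasks
library(mlr)
# Assumed by analogy with the classification sections (lines missing in this copy)
regr.task = makeRegrTask(data = bh.train, target = "medv")
regr.lrn = makeLearner("regr.lm")
mod = train(regr.lrn, regr.task)
names(mod)
getLearnerModel(mod)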
Make predictions
regr.pred = predict(mod, newdata =
bh.test)
regr.pred
performance(regr.pred, measures =
list(rmse))
head(getPredictionTruth(regr.pred))
head(getPredictionResponse(regr.pred))
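To make the `rmse` measure concrete, the sketch below fits the same kind of model with base R's `lm()` and computes the root mean squared error by hand. It is an illustration under assumptions (the seed and the names `fit`, `pred`, and `rmse.test` are not from the slides), and its value should be comparable to, not necessarily identical with, what mlr reports for a different random split.

```r
library(MASS)   # provides the Boston data set

set.seed(1)
data(Boston)
nr <- nrow(Boston)
inTrain <- sample(1:nr, 0.6 * nr)
bh.train <- Boston[inTrain, ]
bh.test <- Boston[-inTrain, ]

# Multiple linear regression of medv on all other columns
fit <- lm(medv ~ ., data = bh.train)

# Predict on the held-out rows
pred <- predict(fit, newdata = bh.test)

# Root mean squared error: square root of the mean squared residual
rmse.test <- sqrt(mean((bh.test$medv - pred)^2))
rmse.test
```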
Visualize Results
plotLearnerPrediction(regr.lrn,
features="lstat", task=regr.task)
Ex: Multiple Linear Regression
Use a multiple linear regression model to
predict the median house price (medv) using
the Boston dataset
Time: 5 mins
Logistic Regression Classifier
Split the Iris Dataset
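data(iris)
# iris2: two-class subset of iris (logistic regression needs a binary target)
iris2=subset(iris, subset=iris$Species
%in% c("versicolor","virginica"))
iris2$Species=factor(iris2$Species)
nr <- nrow(iris2)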
inTrain <- sample(1:nr, 0.6*nr)
ir2.train <- iris2[inTrain,]
ir2.test <- iris2[-inTrain,]
Making task and learner
log.task = makeClassifTask(id = "ir2", data =
ir2.train, target= "Species")
log.task
log.lrn = makeLearner("classif.logreg")
# add predict.type = "prob" if you want probabilities
# (probabilities are needed for the ROC curve)
log.lrn
Train the model
mod = train(log.lrn, log.task)
mod
names(mod)
getLearnerModel(mod)
Make Prediction
log.pred = predict(mod, newdata =
ir2.test)
log.pred
performance(log.pred, measures =
list(mmce, acc))
head(getPredictionTruth(log.pred))
head(getPredictionResponse(log.pred))
Verify Model
### Confusion Matrix
calculateConfusionMatrix(log.pred)
df =
generateThreshVsPerfData(log.pred,
measures = list(fpr, tpr,mmce))
plotROCCurves(df)
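The `mmce` and `acc` measures and the confusion matrix can be reproduced by hand, which helps when checking what they mean. The sketch below uses base R's `glm()` instead of mlr; the seed, the 0.5 threshold, and the restriction to the two petal features (the same two plotted under Visualize Results) are assumptions for illustration.

```r
data(iris)
iris2 <- subset(iris, Species %in% c("versicolor", "virginica"))
iris2$Species <- factor(iris2$Species)

set.seed(1)
nr <- nrow(iris2)
inTrain <- sample(1:nr, 0.6 * nr)
ir2.train <- iris2[inTrain, ]
ir2.test <- iris2[-inTrain, ]

# Logistic regression on two features (assumed choice)
fit <- glm(Species ~ Petal.Length + Petal.Width,
           data = ir2.train, family = binomial)

# Predicted probability of the second factor level ("virginica")
prob <- predict(fit, newdata = ir2.test, type = "response")
pred <- factor(ifelse(prob > 0.5, "virginica", "versicolor"),
               levels = levels(ir2.test$Species))

# Confusion matrix: rows are true classes, columns are predicted classes
conf <- table(truth = ir2.test$Species, predicted = pred)
acc <- sum(diag(conf)) / sum(conf)   # share of correct predictions
mmce <- 1 - acc                      # mean misclassification error
conf
```

Moving the 0.5 threshold up or down trades false positives against false negatives, which is the trade-off a ROC curve traces out.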
Visualize Results
plotLearnerPrediction(log.lrn,
features=c("Petal.Length","Petal.Width"),
task=log.task)
Ex: Logistic Regression Classifier
Use Logistic regression to build a model to
predict am variable using mtcars dataset
Time: 5 mins
Linear Discriminant Analysis
Split the Iris Dataset
nr <- nrow(iris2)
inTrain <- sample(1:nr, 0.6*nr)
ir2.train <- iris2[inTrain,]
ir2.test <- iris2[-inTrain,]
Making task and learner
lda.task = makeClassifTask(id = "ir2", data =
ir2.train, target= "Species")
lda.task
lda.lrn = makeLearner("classif.lda")
lda.lrn
Train the model
mod = train(lda.lrn, lda.task)
mod
names(mod)
getLearnerModel(mod)
Make Prediction
lda.pred = predict(mod, newdata =
ir2.test)
lda.pred
performance(lda.pred, measures =
list(mmce, acc))
head(getPredictionTruth(lda.pred))
head(getPredictionResponse(lda.pred))
Verify Model
### Confusion Matrix
calculateConfusionMatrix(lda.pred)
Visualize Results
plotLearnerPrediction(lda.lrn,
features=c("Petal.Length","Petal.Width"),
task=lda.task)
Ex: Linear Discriminant
Use LDA to build a model to predict gear
variable using mtcars dataset
Time: 5 mins
Decision Tree
Split the Iris Dataset
nr <- nrow(iris2)
inTrain <- sample(1:nr, 0.6*nr)
ir2.train <- iris2[inTrain,]
ir2.test <- iris2[-inTrain,]
Making task and learner
rpart.task = makeClassifTask(id = "ir2", data
= ir2.train, target= "Species")
rpart.task
rpart.lrn = makeLearner("classif.rpart")
rpart.lrn
Train the model
mod = train(rpart.lrn, rpart.task)
mod
names(mod)
getLearnerModel(mod)
Make Prediction
rpart.pred = predict(mod, newdata =
ir2.test)
rpart.pred
performance(rpart.pred, measures =
list(mmce, acc))
head(getPredictionTruth(rpart.pred))
head(getPredictionResponse(rpart.pred))
Verify Model
### Confusion Matrix
calculateConfusionMatrix(rpart.pred)
Visualize Results
plotLearnerPrediction(rpart.lrn,
features=c("Petal.Length","Petal.Width"),
task=rpart.task)
Ex: Decision Tree
Use Decision Tree to build a model to
predict gear variable using mtcars dataset
Time: 5 mins
Random Forest
Split the Iris Dataset
nr <- nrow(iris2)
inTrain <- sample(1:nr, 0.6*nr)
ir2.train <- iris2[inTrain,]
ir2.test <- iris2[-inTrain,]
Making task and learner
rf.task = makeClassifTask(id = "ir2", data =
ir2.train, target= "Species")
rf.task
rf.lrn = makeLearner("classif.randomForest")
rf.lrn
Train the model
mod = train(rf.lrn, rf.task)
mod
names(mod)
getLearnerModel(mod)
Make Prediction
rf.pred = predict(mod, newdata =
ir2.test)
rf.pred
performance(rf.pred, measures =
list(mmce, acc))
head(getPredictionTruth(rf.pred))
head(getPredictionResponse(rf.pred))
Verify Model
### Confusion Matrix
calculateConfusionMatrix(rf.pred)
Visualize Results
plotLearnerPrediction(rf.lrn,
features=c("Petal.Length","Petal.Width"),
task=rf.task)
Ex: Random Forest
Use Random Forest to build a model to
predict gear variable using mtcars
dataset
Time: 5 mins
Gradient Booster
Split the Iris Dataset
nr <- nrow(iris2)
inTrain <- sample(1:nr, 0.6*nr)
ir2.train <- iris2[inTrain,]
ir2.test <- iris2[-inTrain,]
Making task and learner
gbm.task = makeClassifTask(id = "ir2", data
= ir2.train, target= "Species")
gbm.task
gbm.lrn = makeLearner("classif.gbm")
gbm.lrn
Train the model
mod = train(gbm.lrn, gbm.task)
mod
names(mod)
getLearnerModel(mod)
Make Prediction
gbm.pred = predict(mod, newdata =
ir2.test)
gbm.pred
performance(gbm.pred, measures =
list(mmce, acc))
head(getPredictionTruth(gbm.pred))
head(getPredictionResponse(gbm.pred))
Verify Model
### Confusion Matrix
calculateConfusionMatrix(gbm.pred)
Visualize Results
plotLearnerPrediction(gbm.lrn,
features=c("Petal.Length","Petal.Width"),
task=gbm.task)
Ex: Gradient Boost
Use GBM tree to build a model to predict
gear variable using mtcars dataset
Time: 5 mins
XG Boost
Split the Iris Dataset
nr <- nrow(iris2)
inTrain <- sample(1:nr, 0.6*nr)
ir2.train <- iris2[inTrain,]
ir2.test <- iris2[-inTrain,]
Making task and learner
xg.task = makeClassifTask(id = "ir2", data =
ir2.train, target= "Species")
xg.task
xg.lrn = makeLearner("classif.xgboost")
xg.lrn
Train the model
mod = train(xg.lrn, xg.task)
mod
names(mod)
getLearnerModel(mod)
Make Prediction
xg.pred = predict(mod, newdata =
ir2.test)
xg.pred
performance(xg.pred, measures =
list(mmce, acc))
head(getPredictionTruth(xg.pred))
head(getPredictionResponse(xg.pred))
Verify Model
### Confusion Matrix
calculateConfusionMatrix(xg.pred)
Visualize Results
plotLearnerPrediction(xg.lrn,
features=c("Petal.Length","Petal.Width"),
task=xg.task)
Ex: XG Boost
Use XG boost tree to build a model to
predict gear variable using mtcars
dataset
Time: 5 mins
Naïve Bayes
Split the Iris Dataset
nr <- nrow(iris2)
inTrain <- sample(1:nr, 0.6*nr)
ir2.train <- iris2[inTrain,]
ir2.test <- iris2[-inTrain,]
Making task and learner
nb.task = makeClassifTask(id = "ir2", data =
ir2.train, target= "Species")
nb.task
nb.lrn = makeLearner("classif.naiveBayes")
nb.lrn
Train the model
mod = train(nb.lrn, nb.task)
mod
names(mod)
getLearnerModel(mod)
Make Prediction
nb.pred = predict(mod, newdata =
ir2.test)
nb.pred
performance(nb.pred, measures =
list(mmce, acc))
head(getPredictionTruth(nb.pred))
head(getPredictionResponse(nb.pred))
Verify Model
### Confusion Matrix
calculateConfusionMatrix(nb.pred)
Visualize Results
plotLearnerPrediction(nb.lrn,
features=c("Petal.Length","Petal.Width"),
task=nb.task)
Ex: naiveBayes
Use naiveBayes to build a model to
predict gear variable using mtcars
dataset
Time: 5 mins
k Nearest Neighbour
Split the Iris Dataset
nr <- nrow(iris2)
inTrain <- sample(1:nr, 0.6*nr)
ir2.train <- iris2[inTrain,]
ir2.test <- iris2[-inTrain,]
Making task and learner
knn.task = makeClassifTask(id = "ir2", data =
ir2.train, target= "Species")
knn.task
knn.lrn = makeLearner("classif.knn")
knn.lrn
Train the model
mod = train(knn.lrn, knn.task)
mod
names(mod)
getLearnerModel(mod)
Make Prediction
knn.pred = predict(mod, newdata =
ir2.test)
knn.pred
performance(knn.pred, measures =
list(mmce, acc))
head(getPredictionTruth(knn.pred))
head(getPredictionResponse(knn.pred))
Verify Model
### Confusion Matrix
calculateConfusionMatrix(knn.pred)
Visualize Results
plotLearnerPrediction(knn.lrn,
features=c("Petal.Length","Petal.Width"),
task=knn.task)
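k nearest neighbours has no real training phase: it stores the training rows and labels each test row by a majority vote among its k closest training rows. The sketch below shows this directly with `knn()` from the `class` package; the seed and `k = 3` are assumed values chosen for the example.

```r
library(class)   # provides knn(); part of the recommended R packages

data(iris)
iris2 <- subset(iris, Species %in% c("versicolor", "virginica"))
iris2$Species <- factor(iris2$Species)

set.seed(1)
nr <- nrow(iris2)
inTrain <- sample(1:nr, 0.6 * nr)
ir2.train <- iris2[inTrain, ]
ir2.test <- iris2[-inTrain, ]

# knn() takes the feature columns and the training labels; k = 3 is assumed
pred <- knn(train = ir2.train[, 1:4], test = ir2.test[, 1:4],
            cl = ir2.train$Species, k = 3)

err <- mean(pred != ir2.test$Species)   # misclassification rate (mmce)
err
```

Because the vote depends on distances, features measured on very different scales are usually normalized first.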
Ex: k Nearest Neighbour
Use kNN to build a model to predict gear
variable using mtcars dataset
Time: 5 mins
Support Vector Machines
Split the Iris Dataset
### Splitting data
data(iris)
iris2=subset(iris, subset=iris$Species
%in% c("versicolor","virginica"))
iris2$Species=factor(iris2$Species)
nr <- nrow(iris2)
inTrain <- sample(1:nr, 0.6*nr)
ir2.train <- iris2[inTrain,]
ir2.test <- iris2[-inTrain,]
Making task and learner
svm.task = makeClassifTask(id = "ir2", data
= ir2.train, target= "Species")
svm.task
svm.lrn = makeLearner("classif.svm")
svm.lrn
Train the model
mod = train(svm.lrn, svm.task)
mod
names(mod)
getLearnerModel(mod)
Make Prediction
svm.pred = predict(mod, newdata =
ir2.test)
svm.pred
performance(svm.pred, measures =
list(mmce, acc))
head(getPredictionTruth(svm.pred))
head(getPredictionResponse(svm.pred))
Verify Model
### Confusion Matrix
calculateConfusionMatrix(svm.pred)
Visualize Results
plotLearnerPrediction(svm.lrn,
features=c("Petal.Length","Petal.Width"),
task=svm.task)
Ex: support vector machines
Use svm to build a model to predict am
variable using mtcars dataset
Time: 5 mins
Unsupervised Learning
What is Unsupervised Learning
• In Unsupervised Learning, we have a
dataset consisting of features but
without labels.
• The most common method is cluster
analysis, which is used for exploratory
data analysis to find hidden patterns or
grouping in data.
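A minimal cluster-analysis sketch with base R's `kmeans()` illustrates the idea: the labels are dropped, the algorithm groups the rows, and only afterwards are the groups compared with the known species. The seed and `nstart = 10` are assumptions for the example.

```r
data(iris)
features <- iris[, -5]          # drop the Species label; k-means needs numeric columns

set.seed(1)
km <- kmeans(features, centers = 3, nstart = 10)

km$centers                      # one row of feature means per cluster
table(cluster = km$cluster, species = iris$Species)
```

Cluster numbers are arbitrary, so the table is read by looking at which species dominates each cluster rather than by matching numbers to names.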
Unsupervised Learning Examples
• Image grouping
• Grouping of drug molecules
K-means clustering
Split the Iris Dataset
### Splitting data
data(iris)
nr <- nrow(iris)
inTrain <- sample(1:nr, 0.6*nr)
ir2.train <- iris[inTrain,-5]
ir2.test <- iris[-inTrain,-5]
ir2.Class<- iris[-inTrain,5]
## clustering only deals with numeric data
Making task and learner
kmeans.task = makeClusterTask(id = "ir2",
data = ir2.train)
kmeans.task
kmeans.lrn =
makeLearner("cluster.kmeans",
centers=3)
# specify how many clusters you want
kmeans.lrn
Train the model
mod = train(kmeans.lrn, kmeans.task)
mod
names(mod)
getLearnerModel(mod)
Make Prediction
kmeans.pred = predict(mod, newdata =
ir2.test)
kmeans.pred
head(getPredictionResponse(kmeans.pred))
Visualize Results
plotLearnerPrediction(kmeans.lrn,
features=c("Petal.Length","Petal.Width"),
task=kmeans.task)
Ex: kMeans
Use kmeans to cluster the mtcars
dataset into 5 groups
Time: 5 mins
Dimensionality Reduction - PCA
cor(iris[,1:4])
names(iris)
plot(iris$Sepal.Length, col = iris$Species)
plot(iris$Sepal.Width, col = iris$Species)
pc2 <- prcomp(iris[, 1:4])   # assumed: the line creating pc2 is missing in this copy
summary(pc2)
vars <- apply(pc2$x, 2, var)
props <- vars / sum(vars)
cumsum(props)
barplot(cumsum(props))
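The variance calculation above can be checked against what `prcomp()` already stores. The sketch below is self-contained and uses its own object name `pc`; scaling the columns with `scale. = TRUE` is an assumed choice, not something the slides specify.

```r
pc <- prcomp(iris[, 1:4], scale. = TRUE)   # scale. = TRUE is an assumed choice

vars <- apply(pc$x, 2, var)     # variance of each principal component score
props <- vars / sum(vars)       # proportion of variance explained
cumsum(props)                   # cumulative proportion, as in the bar plot

# The same variances are available directly as squared standard deviations
all.equal(unname(vars), pc$sdev^2)
```

The cumulative proportions show how many components are needed to retain a chosen share of the total variance.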
Neural Network (optional)
One Layer MLP
Split the Iris Dataset
### Splitting data
data(iris)
nr <- nrow(iris)
inTrain <- sample(1:nr, 0.6*nr)
ir2.train <- iris[inTrain,]
ir2.test <- iris[-inTrain,]
Making task and learner
nn.task = makeClassifTask(id = "ir2", data =
ir2.train, target= "Species")
nn.task
nn.lrn = makeLearner("classif.nnet")
Train the model
mod = train(nn.lrn, nn.task)
mod
names(mod)
getLearnerModel(mod)
Make Prediction
nn.pred = predict(mod, newdata =
ir2.test)
nn.pred
performance(nn.pred, measures =
list(mmce, acc))
head(getPredictionTruth(nn.pred))
head(getPredictionResponse(nn.pred))
Verify Model
### Confusion Matrix
calculateConfusionMatrix(nn.pred)
Visualize Results
plotLearnerPrediction(nn.lrn,
features=c("Petal.Length","Petal.Width"),
task=nn.task)
Ex: Neural Network
Use neural nets to build a model to
predict gear variable using mtcars
dataset
Time: 5 mins
Summary
Parting Message
Q&A
Feedback
https://goo.gl/EDezXH
Thank You!
Ghazaleh Babanejad
ghazaleh.babanejad@gmail.com
01123005257