
Machine Learning / Data Mining in Learning Analytics

Prof. Dr. Mohamed Amine Chatti
M.Sc. Arham Muslim

Social Computing Group, University of Duisburg-Essen
www.uni-due.de/soco/
Course Content
• What? Data, Environments, Context
• Why? Objectives
• How? Methods: Machine Learning / Data Mining, Information Visualization & Visual Analytics, Social Network Analysis (SNA), Recommender Systems, Learner Modeling, Big Data (Hadoop Ecosystem), Personalization
• Who? Stakeholders, Privacy
Machine Learning in Learning Analytics 2


Motivation
• LA Objectives
• Predicting student performance
• Intelligent feedback
• Personalization / Recommendation
• Detecting at-risk students
• Grouping students
• Student modeling
• …

⇒ These objectives translate into classification, prediction, and clustering tasks, which are addressed by machine learning / data mining methods


Machine Learning (ML)
• A field of study that gives computers the ability to learn without being explicitly programmed (Arthur Samuel, 1959)

• A computer program is said to learn from experience 𝐸 with respect to some class of tasks 𝑇 and performance measure 𝑃 if its performance at tasks in 𝑇, as measured by 𝑃, improves with experience 𝐸 (Tom M. Mitchell, 1997)

• Machine learning systems learn how to combine input to produce useful predictions on never-before-seen data (Google)


Machine Learning
• Machine Learning is using data to answer questions
• Training: use data to build a model
• Prediction: use the model to answer questions

Data → Training → Model → Predictions


The 7 Steps of Machine Learning
1. Data gathering
2. Data preparation
• E.g. feature engineering – map raw data to features, split into training and test sets
3. Model selection
4. Training
5. Evaluation
6. Parameter tuning
• E.g. adding or removing features
7. Prediction
• E.g. “spam” or “not spam”
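Step 2 (data preparation) can be sketched in plain Python. A minimal shuffled train/test split, assuming a toy list of labeled examples (the data and the 80/20 ratio are illustrative, not from the slides):

```python
import random

def train_test_split(examples, test_ratio=0.2, seed=42):
    """Shuffle the examples and split them into training and test sets."""
    rng = random.Random(seed)   # fixed seed for reproducibility
    shuffled = examples[:]      # copy so the input list is untouched
    rng.shuffle(shuffled)
    n_test = int(len(shuffled) * test_ratio)
    return shuffled[n_test:], shuffled[:n_test]   # (training set, test set)

# Toy labeled examples: ({features}, label) -- invented for illustration
data = [({"weight": w}, "high" if w > 80 else "low") for w in range(60, 120, 6)]
train, test = train_test_split(data, test_ratio=0.2)
print(len(train), len(test))   # 8 training and 2 test examples out of 10
```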


Data Mining
• Data mining is the analysis step of the knowledge discovery in databases (KDD) process
• The process is iterative: if the results are not satisfying, change the process and try again (change parameters, use more data, different data representations, …)
• KDD process:
Databases → (data cleaning, data integration) → Data Warehouse → (selection, projection, transformation) → Task-relevant Data → (data mining) → Patterns → (visualization, evaluation) → Knowledge


Data Mining
• Data mining (knowledge discovery in Databases)
• Extraction of interesting (non-trivial, implicit, previously unknown and potentially useful)
information or patterns from data in large databases
• Alternative names: Knowledge discovery in databases (KDD), knowledge extraction,
data/pattern analysis

• Roots of data mining


• Statistics
• Machine Learning
• Database Systems
• Information Visualization

• Data Mining = extraction of patterns from data


• Patterns
• Regularities – examples: clusters, frequent itemsets
• Irregularities – examples: outliers



Machine Learning vs. Data Mining
• Machine learning focuses on prediction, based on known properties learned
from the training data (supervised learning)

• Data mining focuses on the discovery of (previously) unknown properties in


the data (unsupervised learning)

• Both employ the same methods and overlap significantly



Machine Learning Tasks – Dataset 1
Data about 860 recently deceased persons to study the effects of drinking, smoking, and body weight on life expectancy

Drinker Smoker Weight Age


Yes Yes 120 44
No No 70 96
Yes No 72 88
Yes Yes 55 52
No Yes 94 56
No No 62 93
… … … …

Questions:
• What is the effect of smoking and drinking on a person’s body weight?
• Do people that smoke also drink?
• What factors influence a person’s life expectancy the most?
• Can we identify groups of people having a similar lifestyle?



Machine Learning Tasks – Dataset 2
Data about 240 students to investigate relationships among course
grades and the student’s overall performance in the Bachelor program
Linear algebra   Logic   Programming   Operations research   Workflow systems   …   Duration   Results
9 8 8 9 9 … 36 Cum laude
7 6 - 8 8 … 42 Passed
- - 5 4 6 … 54 Failed
8 6 6 6 5 … 38 Passed
6 7 6 - 8 … 39 Passed
9 9 9 9 8 … 38 Cum laude
5 5 - 6 6 … 52 Failed
… … … … … … … …
Questions:
• Will new student X pass or fail?
• Are the marks of courses highly correlated?
• Which electives do excellent students (cum laude) take?
• Which courses significantly delay the moment of graduation?
• Why do students drop out?
• Can one identify groups of students having a similar study behavior?
Machine Learning Tasks – Dataset 3
Data about 240 customer orders in a coffee bar
recorded by the cash register

Cappuccino   Latte   Espresso   Americano   Ristretto   Tea   Muffin   Bagel


1 0 0 0 0 0 1 0
0 2 0 0 0 0 1 1
0 0 1 0 0 0 0 0
1 0 0 0 0 0 0 0
0 0 0 0 0 1 2 0
0 0 0 1 1 0 0 0
… … … … … … … …

Questions:
• Which products are frequently purchased together?
• When do people buy a particular product?
• Is it possible to characterize typical customer groups?

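The co-purchase question can be approached with simple pair counting. A minimal sketch over the six example orders shown above (plain co-occurrence counting, not full association rule mining):

```python
from collections import Counter
from itertools import combinations

products = ["Cappuccino", "Latte", "Espresso", "Americano", "Ristretto", "Tea", "Muffin", "Bagel"]
orders = [
    [1, 0, 0, 0, 0, 0, 1, 0],
    [0, 2, 0, 0, 0, 0, 1, 1],
    [0, 0, 1, 0, 0, 0, 0, 0],
    [1, 0, 0, 0, 0, 0, 0, 0],
    [0, 0, 0, 0, 0, 1, 2, 0],
    [0, 0, 0, 1, 1, 0, 0, 0],
]

pair_counts = Counter()
for order in orders:
    bought = [p for p, qty in zip(products, order) if qty > 0]
    # count every unordered pair of products appearing in the same order
    for pair in combinations(sorted(bought), 2):
        pair_counts[pair] += 1

print(pair_counts.most_common(3))
```

With more orders, pairs with high counts are candidates for "frequently purchased together".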


Machine Learning Tasks
• Classification
• Mining patterns that can classify future data into known classes
• E.g. Will new student X pass or fail?

• Clustering
• Identifying a set of similar groups in the data
• E.g. Can we identify groups of people having a similar lifestyle?

• Association Rule Mining


• Mining any rule of the form X -> Y, where X and Y are sets of data items
• E.g. Which products are frequently purchased together?



Key ML Terminology
• Example is a particular instance of data, 𝒙 (a data set consists of examples /
instances)
• Features are input variables describing our data: 𝒙𝒊
• In the spam detector example, the features could include the following:
• subject line
• words in the email text
• sender's email address
• time of day the email was sent
• Label is the thing (class, number) we're predicting: 𝒚
• Labeled example has {features, label}: (𝒙, 𝒚)
• Unlabeled example has {features, ?}: (𝒙, ? )



Key ML Terminology
• Models
• Model maps examples to predicted labels: 𝒚′
• Training means creating or learning the model: you show the model labeled
examples and enable the model to gradually learn the relationships between features
and label
• Prediction / Inference means applying the trained model to unlabeled examples: you
use the trained model to make useful predictions
• Classification vs. Regression
• A classification model predicts discrete values
• Is a given email message spam or not spam?
• Is this an image of a dog, a cat, or a hamster?
• A regression model predicts continuous values
• What is the value of a house in Duisburg?
• What is the probability that a user will click on this ad?



Variables
• Data set (sample or table) consists of examples / instances (row in a table)
• Variables are often referred to as features / attributes (column in a table)
• Two types:
• Categorical (discrete) variables
• Has only a finite set of values
• Ordinal (high-med-low, grades) or
• Nominal (true-false, color, profession)
• Numerical (continuous) variables
• Have real numbers as values (e.g. temperature, height, weight)
• Ordered, cannot be enumerated easily



Machine Learning Tasks – Dataset 1
Data about 860 recently deceased persons to study the effects of drinking, smoking, and body weight on life expectancy

Drinker Smoker Weight Age


Yes Yes 120 44
No No 70 96
Yes No 72 88
Yes Yes 55 52
No Yes 94 56
No No 62 93
… … … …

• Example / Instance?
• Features / Attributes?
• Labeled / Unlabeled data?
• Discrete / Continuous variables?
• Nominal / Ordinal variables?



Machine Learning Tasks – Dataset 2
Data about 240 students to investigate relationships among course
grades and the student’s overall performance in the Bachelor program

Linear algebra   Logic   Programming   Operations research   Workflow systems   …   Duration   Results
9 8 8 9 9 … 36 Cum laude
7 6 - 8 8 … 42 Passed
- - 5 4 6 … 54 Failed
8 6 6 6 5 … 38 Passed
6 7 6 - 8 … 39 Passed
9 9 9 9 8 … 38 Cum laude
5 5 - 6 6 … 52 Failed
… … … … … … … …

• Example / Instance?
• Features / Attributes?
• Labeled / Unlabeled data?
• Discrete / Continuous variables?
• Nominal / Ordinal variables?



Machine Learning Tasks – Dataset 3
Data about 240 customer orders in a coffee bar
recorded by the cash register

Cappuccino   Latte   Espresso   Americano   Ristretto   Tea   Muffin   Bagel


1 0 0 0 0 0 1 0
0 2 0 0 0 0 1 1
0 0 1 0 0 0 0 0
1 0 0 0 0 0 0 0
0 0 0 0 0 1 2 0
0 0 0 1 1 0 0 0
… … … … … … … …

• Example / Instance?
• Features / Attributes?
• Labeled / Unlabeled data?
• Discrete / Continuous variables?
• Nominal / Ordinal variables?



Types of Machine Learning
• Supervised Learning
• Teaches machines by example
• Looking for something specific (supervised)
• Trying to predict a specific class or quantity
• Have training examples with labels (labeled data)
• Methods
• Classification: Predicts categorical class labels (label is categorical variable)
• Regression: Predicts unknown or missing values (label is numerical variable)



Types of Machine Learning
• Unsupervised Learning
• Trying to “understand” the data
• Looking for structure or unusual patterns
• Not looking for something specific (unsupervised)
• Does not require labeled data (unlabeled data)
• Methods
• Clustering
• Frequent Pattern Mining
• Outlier Detection



Classification



Classification
• Setting
• Class labels are known for a small set of “training data”
• Task
• Find models/functions/rules that
• Describe and distinguish classes
• Predict class membership for “new” objects
• Classification = supervised learning
• Training set contains labeled items
• New data is classified based on the training set
• Classifier predicts class labels
Classification – Examples
• Predict a new applicant’s loan eligibility
• Input: previous customers (age, salary, profession, location) labeled with customer type (Good/Bad)
• The classifier learns decision rules, e.g. IF Salary > 5 L AND Prof. = Exec THEN Good
• The rules are then applied to the new applicant’s data

• Predict risk potential
• Training data:

ID  Age  Car Type  Risk
1   23   Family    High
2   17   Sportive  High
3   43   Sportive  High
4   68   Family    Low
5   32   Truck     Low

• Simple classifier:
if Age > 50 then Risk = Low;
if Age ≤ 50 and Car Type = Truck then Risk = Low;
if Age ≤ 50 and Car Type ≠ Truck then Risk = High;


Classification – Phases
• Usually, the given data set is divided into training and test sets
• Training set is used to train the classifier and build the model
• Test set is used to evaluate the classifier

• Goal: previously unseen data should be assigned a class as accurately as possible

• Two Phases:
• Training Phase (Model Construction)
• Prediction Phase (Inference)



Classification – Training Phase (Model Construction)
• The classifier is built from the training data:

ID  Age  Car Type  Risk
1   23   Family    High
2   17   Sportive  High
3   43   Sportive  High
4   68   Family    Low
5   32   Truck     Low

• Learned classifier (decision rules):
if Age > 50 then Risk = Low;
if Age ≤ 50 and Car Type = Truck then Risk = Low;
if Age ≤ 50 and Car Type ≠ Truck then Risk = High;


Classification – Prediction Phase (Inference)
• The trained classifier assigns a class label to unknown data
• Unknown data: (Age = 60, Family)
• Applying the learned rules: Age > 50 ⇒ Risk = Low


Classification
• Major Classification Methods
• Bayesian Classifiers
• Decision Tree Classifiers
• Nearest Neighbor Classifiers

• Logistic Regression
• Support Vector Machines (SVM)
• Neural Networks



Bayesian Classifiers
Classification



Bayesian Classifiers – Basics
• A probabilistic framework for solving classification problems

• Performs probabilistic prediction; i.e. predicts class membership


probabilities

• Foundation: based on Bayes’ theorem

• Performance: a simple Bayesian classifier, the naïve Bayes classifier, has performance comparable to decision tree classifiers



Bayes‘ Theorem
• Probability theory:
• Conditional probability: P(A|B) = P(A ∧ B) / P(B) (“probability of A given B”)
• Product rule: P(A ∧ B) = P(A|B) ⋅ P(B)

• Bayes’ theorem
• P(A ∧ B) = P(A|B) ⋅ P(B)
• P(B ∧ A) = P(B|A) ⋅ P(A)
• Since P(A ∧ B) = P(B ∧ A) ⇒ P(A|B) ⋅ P(B) = P(B|A) ⋅ P(A) ⇒

P(A|B) = P(B|A) ⋅ P(A) / P(B)
Bayesian Classifiers – Components

P(C|X) = P(X|C) ⋅ P(C) / P(X)

• Let X be a data example (“evidence”): class label is unknown
• Let C be the hypothesis that X belongs to class C
• Classification is to determine P(C|X) (posterior probability): the probability that X belongs to class C given the observed data example X
• P(C) (prior probability): the initial probability
• E.g., X will buy a computer, regardless of age, income, …
• P(X): probability that the example is observed
• P(X|C) (likelihood): the probability of observing the example X, given that the hypothesis holds
• E.g., given that X will buy a computer, the probability that X is 31..40 with medium income


Bayes Classifier
• Let D be a training set of examples and their associated class labels, where each example is represented by an n-dimensional attribute vector X = (x1, x2, …, xn)
• Suppose there are m classes C1, C2, …, Cm
• Classification: assign X to the class with the maximum posterior probability, i.e., the maximal P(Ci|X) (e.g. P(Spam|X) vs. P(Not spam|X))
• This can be derived from Bayes’ theorem:

P(Ci|X) = P(X|Ci) ⋅ P(Ci) / P(X)

• Since P(X) is constant for all classes, only P(X|Ci) ⋅ P(Ci) needs to be maximized


Bayes Classifier
• Estimate the a-priori probabilities P(Ci) of the classes Ci by the observed frequency of the individual class labels Ci in the training set, i.e.,

P(Ci) = N_Ci / N

• How to estimate the values of P(X|Ci)?


Naïve Bayes Classifier
• A simplifying assumption: attributes are conditionally independent given the class (i.e., no dependence relation between attributes):

P(X|Ci) = ∏k=1..n P(xk|Ci) = P(x1|Ci) ⋅ P(x2|Ci) ⋅ … ⋅ P(xn|Ci)

• If the k-th attribute is categorical:
P(xk|Ci) is estimated as the relative frequency of samples having value xk for the k-th attribute in class Ci in the training set
• If the k-th attribute is continuous:
P(xk|Ci) can be estimated through a Gaussian distribution with mean μ and standard deviation σ:

P(xk|Ci) = g(xk, μ_Ci, σ_Ci), where g(x, μ, σ) = 1 / (√(2π) σ) ⋅ e^(−(x−μ)² / (2σ²))

• Computationally easy in both cases
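For continuous attributes, the Gaussian density above is a one-liner in code. A sketch of g(x, μ, σ) in plain Python (the function name is mine):

```python
import math

def gaussian_density(x, mu, sigma):
    """g(x, mu, sigma) = 1 / (sqrt(2*pi) * sigma) * exp(-(x - mu)^2 / (2 * sigma^2))"""
    return math.exp(-((x - mu) ** 2) / (2 * sigma ** 2)) / (math.sqrt(2 * math.pi) * sigma)

# Density of a standard normal at its mean: 1 / sqrt(2*pi) ≈ 0.3989
print(round(gaussian_density(0.0, 0.0, 1.0), 4))
```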


Naïve Bayes Example
Outlook Temp Humidity Wind Play
Sunny Hot High False Yes
Sunny Hot High False No
Sunny Hot High False Yes
Sunny Hot High False No
Sunny Mild High True No
Overcast Mild High False Yes
Overcast Mild Normal False Yes
Overcast Mild Normal False Yes
Overcast Mild Normal False Yes
Rain Cool Normal True Yes
Rain Cool Normal True Yes
Rain Mild High True No
Rain Cool Normal True No
Rain Cool Normal True Yes



Naïve Bayes Example
• Count occurrences and relative frequencies per class in the training data:

Outlook    Yes  No    Temp  Yes  No    Humidity  Yes  No    Wind   Yes  No    Play: Yes  No
Sunny      2    3     Hot   2    2     High      3    4     False  6    2           9    5
Overcast   4    0     Mild  4    2     Normal    6    1     True   3    3
Rain       3    2     Cool  3    1
Sunny      2/9  3/5   Hot   2/9  2/5   High      3/9  4/5   False  6/9  2/5         9/14 5/14
Overcast   4/9  0/5   Mild  4/9  2/5   Normal    6/9  1/5   True   3/9  3/5
Rain       3/9  2/5   Cool  3/9  1/5


Naïve Bayes Example
• Using the relative frequencies estimated from the training data, we want to predict “Play” for the following day:

Outlook  Temp  Humidity  Wind   Play
Sunny    Cool  High      True   ?

• Bayes classifier: choose the class Ci that maximizes P(Ci|X), i.e., compare
P(Yes | Sunny, Cool, High, True) and P(No | Sunny, Cool, High, True)


Naïve Bayes Example
• Naïve Bayes classifier: choose the class Ci that maximizes P(Ci|X), which is equivalent to maximizing P(X|Ci) ⋅ P(Ci), which by the Naïve Bayes equation equals ∏k=1..n P(xk|Ci) ⋅ P(Ci)

P(Yes | Sunny, Cool, High, True) ∝
P(Sunny|Yes) ⋅ P(Cool|Yes) ⋅ P(High|Yes) ⋅ P(True|Yes) ⋅ P(Yes)
= 2/9 ⋅ 3/9 ⋅ 3/9 ⋅ 3/9 ⋅ 9/14 = 0.0053

P(No | Sunny, Cool, High, True) ∝
P(Sunny|No) ⋅ P(Cool|No) ⋅ P(High|No) ⋅ P(True|No) ⋅ P(No)
= 3/5 ⋅ 1/5 ⋅ 4/5 ⋅ 3/5 ⋅ 5/14 = 0.0206

• Since 0.0206 > 0.0053, the prediction is Play = No
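The computation above can be reproduced directly from the frequency table. A sketch with the conditional probabilities hard-coded from the counts, comparing unnormalized scores as on the slide:

```python
# Conditional probabilities read off the frequency table (priors: Yes 9/14, No 5/14)
p_yes = {"Sunny": 2/9, "Cool": 3/9, "High": 3/9, "True": 3/9, "prior": 9/14}
p_no = {"Sunny": 3/5, "Cool": 1/5, "High": 4/5, "True": 3/5, "prior": 5/14}

def score(probs, attrs):
    """Unnormalized naive Bayes score: product of likelihoods times the class prior."""
    s = probs["prior"]
    for a in attrs:
        s *= probs[a]
    return s

attrs = ["Sunny", "Cool", "High", "True"]
score_yes = score(p_yes, attrs)   # ≈ 0.0053
score_no = score(p_no, attrs)     # ≈ 0.0206
print("Play =", "Yes" if score_yes > score_no else "No")   # Play = No
```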
Decision Tree Classifiers
Classification



Decision Tree Classifiers
• Learned function is represented as a tree
• A flow-chart-like tree structure
• Internal nodes represent a test on an attribute
• Branches represent the outcomes of the test
• Leaf nodes represent class labels

item id  age    car type  risk potential
1        young  sportive  high
2        young  family    low
3        young  sportive  high
4        old    family    low
5        old    sportive  low
6        old    sportive  low

• Learned tree: age? (young → car type?; old → low risk); car type? (sportive → high risk; family → low risk)
• Learned tree can be transformed into IF-THEN rules
• IF age > 60 THEN risk = low
• IF age ≤ 60 AND car_type = sportive THEN risk = high
• Classification steps
• Decision tree generation
• Traverse the tree to classify an unknown sample
• e.g. age = 18, car type = sportive → risk = high
• Advantages
• Decision trees are intuitive to most users


Decision Tree Generation
• Basic algorithm
• Tree is created in a top-down recursive divide-and-conquer manner
• Attributes may be categorical or continuous-valued
• At the start, all training examples are assigned to the root node
• Recursively partition/split the examples at each node
• Goal: find splits which lead to groups that are as homogeneous as possible
• Split at the attribute that results in minimal heterogeneity
• Example: use “age” or “car type” as split condition?
• Conditions for stopping partitioning
• All examples for a given node belong to the same class
• There are no remaining attributes for further partitioning
• There are no examples left


How to determine the Best Split?
• Nodes with homogeneous class distribution (pure nodes) are preferred
• Need a measure of node impurity

C0: 5 / C1: 5 → Non-homogeneous, high degree of impurity
C0: 9 / C1: 1 → Homogeneous, low degree of impurity


Decision Tree - Example
• Predict if Hasan will play tennis today
• Build a decision tree
• Divide & conquer:
• Split into subsets
• Are they pure? (all yes or all no)
• If yes: stop
• If not: repeat

Training Examples (9 yes / 5 no)
Day  Outlook   Humidity  Wind    Play
D1   Sunny     High      Weak    No
D2   Sunny     High      Strong  No
D3   Overcast  High      Weak    Yes
D4   Rain      High      Weak    Yes
D5   Rain      Normal    Weak    Yes
D6   Rain      Normal    Strong  No
D7   Overcast  Normal    Strong  Yes
D8   Sunny     High      Weak    No
D9   Sunny     Normal    Weak    Yes
D10  Rain      Normal    Weak    Yes
D11  Sunny     Normal    Strong  Yes
D12  Overcast  High      Strong  Yes
D13  Overcast  Normal    Weak    Yes
D14  Rain      High      Strong  No

New Data
D15  Rain      High      Weak    ?


Decision Tree - Example
• First split on Outlook (9 yes / 5 no at the root):
• Sunny: 2 yes / 3 no → split further
• Overcast: 4 yes / 0 no → pure subset
• Rain: 3 yes / 2 no → split further


Decision Tree - Example
• Sunny branch (D1, D2, D8, D9, D11): split on Humidity
• High: D1, D2, D8 (0 yes / 3 no)
• Normal: D9, D11 (2 yes / 0 no)
• Overcast branch (D3, D7, D12, D13): 4 yes / 0 no → pure subset
• Rain branch (D4, D5, D6, D10, D14): 3 yes / 2 no → split further


Decision Tree - Example
• Rain branch: split on Wind
• Weak: D4, D5, D10 (3 yes / 0 no)
• Strong: D6, D14 (0 yes / 2 no)
Decision Tree - Example
• Resulting decision tree:
• Outlook = Overcast → yes
• Outlook = Sunny → Humidity (High → no; Normal → yes)
• Outlook = Rain → Wind (Weak → yes; Strong → no)
• New data D15 (Rain, High, Weak) → yes
Which attribute to split on?
• Outlook (9 yes / 5 no): Sunny 2 yes / 3 no, Overcast 4 yes / 0 no, Rain 3 yes / 2 no
• Wind (9 yes / 5 no): Weak 6 yes / 2 no, Strong 3 yes / 3 no

• Want to measure the “purity” of the split
• More certain about yes/no after the split
• Pure set (4 yes / 0 no) ⇒ completely certain (100%)
• Impure (3 yes / 3 no) ⇒ completely uncertain (50%)
• Must be symmetric: 4 yes / 0 no is as pure as 0 yes / 4 no


Split Strategies
• Given
• A set T of training objects
• A (disjoint, complete) partitioning T1, T2, …, Tm of T
• The relative frequencies pi of class Ci in T and in the partitions T1, T2, …, Tm
• Candidate splits (9 yes / 5 no at the root):
• Outlook: Sunny 2 yes / 3 no, Overcast 4 yes / 0 no, Rain 3 yes / 2 no
• Humidity: High 3 yes / 4 no, Normal 6 yes / 1 no
• Wind: Weak 6 yes / 2 no, Strong 3 yes / 3 no
• Wanted
• A measure for the heterogeneity of a set S of training objects with respect to class membership
• A split of T into partitions T1, T2, …, Tm such that the heterogeneity is minimized
• Proposals: Entropy / information gain, Gini index


Entropy
• The entropy of a set T of training examples is defined as follows (for k classes Ci with probabilities pi):

entropy(T) = − Σi=1..k pi ⋅ log2 pi

• entropy(T) = 0 if pi = 1 for any class Ci
• entropy(T) = 1 if there are k = 2 classes with pi = 1/2 for each i

• Interpretation: assume item X belongs to T
• How many bits are needed to tell whether X is positive or negative?
• Impure (3 yes / 3 no): entropy(T) = −(3/6) log2(3/6) − (3/6) log2(3/6) = 1 bit
• Pure set (4 yes / 0 no): entropy(T) = −(4/4) log2(4/4) − (0/4) log2(0/4) = 0 bits
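The entropy formula translates directly into code. A sketch reproducing the two examples above (the 0 ⋅ log2(0) term is treated as 0):

```python
import math

def entropy(counts):
    """Entropy of a class distribution given as a list of class counts."""
    total = sum(counts)
    result = 0.0
    for c in counts:
        if c > 0:                 # 0 * log2(0) is taken as 0
            p = c / total
            result -= p * math.log2(p)
    return result

print(entropy([3, 3]))   # impure: 1.0 bits
print(entropy([4, 0]))   # pure: 0.0 bits
```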


Information Gain
• The entropy of a set T of training objects (for k classes Ci with probabilities pi):

entropy(T) = − Σi=1..k pi ⋅ log2 pi

• Let A be the attribute that induced the partitioning T1, T2, …, Tm of T. The information gain of attribute A wrt. T is defined as follows:

information gain(T, A) = entropy(T) − Σi=1..m (|Ti| / |T|) ⋅ entropy(Ti)


Entropy / Information Gain – Example
• Root (9 yes / 5 no): entropy(T) = −(9/14) log2(9/14) − (5/14) log2(5/14) = 0.94
• Split on Wind:
• Weak (6 yes / 2 no): entropy(Wind_Weak) = −(6/8) log2(6/8) − (2/8) log2(2/8) = 0.811
• Strong (3 yes / 3 no): entropy(Wind_Strong) = −(3/6) log2(3/6) − (3/6) log2(3/6) = 1.0

information gain(T, Wind) = 0.94 − (8/14) ⋅ 0.811 − (6/14) ⋅ 1.0 = 0.048
Entropy / Information Gain – Example
• Root (9 yes / 5 no): entropy(T) = 0.94
• Split on Humidity:
• High (3 yes / 4 no): entropy(Humidity_High) = −(3/7) log2(3/7) − (4/7) log2(4/7) = 0.985
• Normal (6 yes / 1 no): entropy(Humidity_Normal) = −(6/7) log2(6/7) − (1/7) log2(1/7) = 0.592

information gain(T, Humidity) = 0.94 − (7/14) ⋅ 0.985 − (7/14) ⋅ 0.592 = 0.151
Entropy / Information Gain – Example
• Root (9 yes / 5 no): entropy(T) = 0.94
• Split on Outlook:
• Sunny (2 yes / 3 no): entropy(Outlook_Sunny) = −(2/5) log2(2/5) − (3/5) log2(3/5) = 0.971
• Overcast (4 yes / 0 no): entropy(Outlook_Overcast) = −(4/4) log2(4/4) − (0/4) log2(0/4) = 0
• Rain (3 yes / 2 no): entropy(Outlook_Rain) = −(3/5) log2(3/5) − (2/5) log2(2/5) = 0.971

information gain(T, Outlook) = 0.94 − (5/14) ⋅ 0.971 − (4/14) ⋅ 0 − (5/14) ⋅ 0.971 = 0.246
Entropy / Information Gain – Example
• Comparing the three candidate splits at the root (entropy = 0.94, 9 yes / 5 no):

information gain(T, Outlook) = 0.94 − (5/14) ⋅ 0.971 − (4/14) ⋅ 0 − (5/14) ⋅ 0.971 = 0.246
information gain(T, Humidity) = 0.94 − (7/14) ⋅ 0.985 − (7/14) ⋅ 0.592 = 0.151
information gain(T, Wind) = 0.94 − (8/14) ⋅ 0.811 − (6/14) ⋅ 1.0 = 0.048

• Result: “Outlook” yields the highest information gain, so the root splits on Outlook (Sunny → ?, Overcast → yes, Rain → ?)
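The three gains can be verified in a few lines, reusing the entropy definition (the results match the slide values up to rounding of intermediate results):

```python
import math

def entropy(counts):
    """Entropy of a class distribution given as a list of class counts."""
    total = sum(counts)
    return -sum((c / total) * math.log2(c / total) for c in counts if c > 0)

def information_gain(parent, partitions):
    """entropy(T) minus the size-weighted entropies of the partitions T_i."""
    n = sum(parent)
    return entropy(parent) - sum(sum(t) / n * entropy(t) for t in partitions)

root = [9, 5]  # 9 yes / 5 no at the root
gain_outlook = information_gain(root, [[2, 3], [4, 0], [3, 2]])
gain_humidity = information_gain(root, [[3, 4], [6, 1]])
gain_wind = information_gain(root, [[6, 2], [3, 3]])
print(round(gain_outlook, 2), round(gain_humidity, 2), round(gain_wind, 2))
```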
Entropy / Information Gain – Example
• Final decision tree (day sets in braces):
• {1, …, 14}: split on Outlook
• Sunny {1, 2, 8, 9, 11} → split on Humidity: High {1, 2, 8} → no; Normal {9, 11} → yes
• Overcast {3, 7, 12, 13} → yes
• Rain {4, 5, 6, 10, 14} → split on Wind: Weak {4, 5, 10} → yes; Strong {6, 14} → no
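The final tree can be written directly as nested conditions. A sketch predicting the new example D15 (Rain, High, Weak) from the slides:

```python
def predict_play(outlook, humidity, wind):
    """Traverse the learned decision tree and return the predicted class label."""
    if outlook == "Overcast":
        return "Yes"
    if outlook == "Sunny":
        return "Yes" if humidity == "Normal" else "No"
    # outlook == "Rain": decision depends on Wind
    return "Yes" if wind == "Weak" else "No"

print(predict_play("Rain", "High", "Weak"))   # D15 → Yes
```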


Nearest Neighbor Classifiers
Classification



Nearest Neighbor Classifiers
• Instance-based learning
• Store training examples and delay the processing (“lazy evaluation”) until a new
instance must be classified
• Typical approach: k-nearest neighbor (kNN)

• Eager evaluation
• Create models from data (training phase) and then use these models for classification
(test phase)
• Examples: Decision tree, Bayes classifier



k-Nearest Neighbor (kNN)
• Intuition: Nearby things should have the
same class
• Distance function
• Defines the (dis-)similarity for pairs of objects

• Algorithm
• New object x
• Compute distance to every training example
• Select k-Neighborhood (x): k closest instances
• Label x with most frequent class in k-
Neighborhood (x) (majority vote)

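The algorithm above fits in a few lines. A minimal kNN sketch with Euclidean distance and majority vote; the training points ((age, max speed) → risk) are made-up illustrations in the spirit of the next slide's example, not data from the slides:

```python
import math
from collections import Counter

def knn_classify(train, query, k):
    """Label the query with the most frequent class among its k nearest neighbors."""
    by_distance = sorted(train, key=lambda item: math.dist(item[0], query))
    votes = Counter(label for _, label in by_distance[:k])
    return votes.most_common(1)[0][0]   # majority vote

# Hypothetical training examples: ((age, max speed), risk)
train = [((20, 160), "high"), ((25, 200), "high"), ((60, 100), "low"), ((55, 120), "low")]
print(knn_classify(train, (25, 210), k=1))   # nearest neighbor → high
print(knn_classify(train, (25, 210), k=3))   # majority of 3 neighbors → high
```

Note that the raw features are on different scales; in practice they would be normalized before computing distances.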


kNN Example
• Example: risk potential of a 25-year-old car driver with a max speed of 210? (1-NN?, 3-NN?)



kNN – Parameter k
• Problem of choosing an appropriate value for parameter k
• k too small: high sensitivity against outliers
• k too large: decision set contains many objects from other classes
• Empirically, 1 ≪ 𝑘 < 10 yields a high classification accuracy in many cases



Classification – Summary
• Classification = supervised learning
• Training set contains labeled items
• New data is classified based on the training set
• Classifier predicts class labels

• Bayesian Classifiers
• A statistical classifier: performs probabilistic prediction; i.e. predicts class membership probabilities
• Based on Bayes’ theorem

• Decision Tree Classifiers


• Learned function is represented as a tree
• Tree is created in a top-down recursive divide-and-conquer manner

• Nearest Neighbor Classifiers


• k-Nearest Neighbor (kNN) approach
• Intuition: Nearby things should have the same class
• Distance function: Defines the similarity for pairs of objects

