
Course: M0614 / Data Mining & OLAP

Term: February 2010

Classification and Prediction


Session 08
Learning Outcomes

By the end of this session, students are expected to be able to
apply classification by decision tree induction, Bayesian
classification, classification by back propagation, and
lazy learners in data mining. (C3)

Acknowledgments
These slides have been adapted from Han, J.,
Kamber, M., & Pei, J. Data Mining: Concepts
and Techniques and Tan, P.-N., Steinbach, M.,
& Kumar, V. Introduction to Data Mining.

Outline
Bayesian classification

Bayesian Classification: Why?
A statistical classifier: performs probabilistic prediction, i.e., predicts
class membership probabilities
Foundation: Based on Bayes' theorem.
Performance: A simple Bayesian classifier, the naïve Bayesian classifier,
has performance comparable to decision tree and selected neural
network classifiers
Incremental: Each training example can incrementally
increase/decrease the probability that a hypothesis is correct; prior
knowledge can be combined with observed data
Standard: Even when Bayesian methods are computationally
intractable, they can provide a standard of optimal decision making
against which other methods can be measured



Bayes' Theorem: Basics
Let X be a data sample ("evidence"): class label is unknown
Let H be a hypothesis that X belongs to class C
Classification is to determine P(H|X) (posterior probability), the probability
that the hypothesis holds given the observed data sample X
P(H) (prior probability): the initial probability
E.g., H = "X will buy a computer", regardless of age, income, ...
P(X): probability that the sample data is observed
P(X|H) (likelihood): the probability of observing the sample X, given that
the hypothesis holds
E.g., given that X will buy a computer, the probability that X is aged 31..40
with medium income



Bayes' Theorem
Given training data X, the posterior probability of a hypothesis H, P(H|X),
follows Bayes' theorem:

P(H|X) = P(X|H) P(H) / P(X)

Informally, this can be written as
posterior = likelihood × prior / evidence
Predicts that X belongs to Ci iff the probability P(Ci|X) is the highest among
all the P(Ck|X) for the k classes
Practical difficulty: requires initial knowledge of many probabilities and has
significant computational cost



Example of Bayes' Theorem
Given:
A doctor knows that meningitis causes stiff neck 50% of the time
Prior probability of any patient having meningitis is 1/50,000
Prior probability of any patient having stiff neck is 1/20

If a patient has stiff neck, what's the probability he/she has meningitis?

P(M|S) = P(S|M) P(M) / P(S) = (0.5 × 1/50,000) / (1/20) = 0.0002
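The same arithmetic as a minimal Python sketch; the numbers come from the slide, and the function name bayes_posterior is illustrative rather than part of the original material:

```python
# Bayes' rule for the meningitis example: P(M|S) = P(S|M) P(M) / P(S)
def bayes_posterior(likelihood, prior, evidence):
    """Return P(H|X) = P(X|H) * P(H) / P(X)."""
    return likelihood * prior / evidence

p_m_given_s = bayes_posterior(likelihood=0.5,     # P(S|M)
                              prior=1 / 50_000,   # P(M)
                              evidence=1 / 20)    # P(S)
print(p_m_given_s)  # 0.0002
```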
Bayesian Classifiers
Consider each attribute and class label as random variables

Given a record with attributes (A1, A2, ..., An)

Goal is to predict class C
Specifically, we want to find the value of C that maximizes P(C | A1, A2, ..., An)

Can we estimate P(C | A1, A2, ..., An) directly from data?


Bayesian Classifiers
Approach:
Compute the posterior probability P(C | A1, A2, ..., An) for all values of C
using Bayes' theorem:

P(C | A1 A2 ... An) = P(A1 A2 ... An | C) P(C) / P(A1 A2 ... An)

Choose the value of C that maximizes P(C | A1, A2, ..., An)

Equivalent to choosing the value of C that maximizes P(A1, A2, ..., An | C) P(C)

How to estimate P(A1, A2, ..., An | C)?


Naïve Bayes Classifier
Assume independence among attributes Ai when the class is given:
P(A1, A2, ..., An | Cj) = P(A1|Cj) P(A2|Cj) ... P(An|Cj)

Can estimate P(Ai|Cj) for all Ai and Cj.

New point is classified to Cj if P(Cj) Π P(Ai|Cj) is maximal (a decision-rule sketch follows below).
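A minimal sketch of this decision rule in Python, assuming the class priors and per-attribute conditionals have already been estimated; all variable and function names here are illustrative, not from the slides:

```python
import math

def predict(record, priors, conditionals):
    """Return the class c maximizing P(c) * prod_i P(a_i | c).

    priors:       {class: P(class)}
    conditionals: {class: {attribute: {value: P(value | class)}}}
    record:       {attribute: value}
    """
    best_class, best_score = None, -math.inf
    for c, prior in priors.items():
        score = prior
        for attr, value in record.items():
            # unseen values get probability 0, as in the plain estimator
            score *= conditionals[c][attr].get(value, 0.0)
        if score > best_score:
            best_class, best_score = c, score
    return best_class
```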


How to Estimate Probabilities from Data?

Training data (categorical: Refund, Marital Status; continuous: Taxable Income; class: Evade):

Tid  Refund  Marital Status  Taxable Income  Evade
1    Yes     Single          125K            No
2    No      Married         100K            No
3    No      Single          70K             No
4    Yes     Married         120K            No
5    No      Divorced        95K             Yes
6    No      Married         60K             No
7    Yes     Divorced        220K            No
8    No      Single          85K             Yes
9    No      Married         75K             No
10   No      Single          90K             Yes

Class prior: P(C) = Nc/N
e.g., P(No) = 7/10, P(Yes) = 3/10

For discrete attributes:
P(Ai | Ck) = |Aik| / Nck
where |Aik| is the number of instances having attribute value Ai and belonging to class Ck

Examples (see the counting sketch below):
P(Status=Married|No) = 4/7
P(Refund=Yes|Yes) = 0
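A minimal counting sketch for the discrete attributes of this table (Taxable Income is handled on the next slide); the row literals are copied from the table, and the helper structures are illustrative:

```python
from collections import Counter, defaultdict

rows = [  # (Refund, Marital Status, Evade)
    ("Yes", "Single",   "No"),  ("No", "Married",  "No"),
    ("No",  "Single",   "No"),  ("Yes", "Married", "No"),
    ("No",  "Divorced", "Yes"), ("No", "Married",  "No"),
    ("Yes", "Divorced", "No"),  ("No", "Single",   "Yes"),
    ("No",  "Married",  "No"),  ("No", "Single",   "Yes"),
]

class_counts = Counter(evade for _, _, evade in rows)
priors = {c: n / len(rows) for c, n in class_counts.items()}  # P(No)=0.7, P(Yes)=0.3

cond = defaultdict(Counter)  # (attribute, class) -> Counter of values
for refund, marital, evade in rows:
    cond[("Refund", evade)][refund] += 1
    cond[("Marital", evade)][marital] += 1

p_married_given_no = cond[("Marital", "No")]["Married"] / class_counts["No"]    # 4/7
p_refund_yes_given_yes = cond[("Refund", "Yes")]["Yes"] / class_counts["Yes"]   # 0
```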
How to Estimate Probabilities from Data?
For continuous attributes:
Discretize the range into bins
one ordinal attribute per bin
violates independence assumption
Two-way split: (A < v) or (A > v)
choose only one of the two splits as new attribute
Probability density estimation:
Assume attribute follows a normal distribution
Use data to estimate parameters of distribution
(e.g., mean and standard deviation)
Once probability distribution is known, can use it to estimate the
conditional probability P(Ai|c)
How to Estimate Probabilities from Data?

Normal distribution (one for each (Ai, cj) pair):

P(Ai | cj) = 1 / (sqrt(2π) σij) · exp( −(Ai − μij)² / (2 σij²) )

For (Income, Class=No), from the training data above:
If Class=No: sample mean = 110, sample variance = 2975

P(Income = 120 | No) = 1 / (sqrt(2π) · 54.54) · exp( −(120 − 110)² / (2 · 2975) ) = 0.0072
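A minimal sketch of this normal-density estimate, using the class-conditional sample mean and variance from the slide; the function name is illustrative:

```python
import math

def gaussian_density(x, mean, variance):
    """P(Ai = x | c) under the normal assumption with the given mean/variance."""
    return math.exp(-(x - mean) ** 2 / (2 * variance)) / math.sqrt(2 * math.pi * variance)

p_income_120_given_no = gaussian_density(120, mean=110, variance=2975)
print(round(p_income_120_given_no, 4))  # ≈ 0.0072
```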
Example of Naïve Bayes Classifier
Given a test record:
X = (Refund = No, Marital Status = Married, Income = 120K)

Estimated probabilities (from the training data):
P(Refund=Yes|No) = 3/7, P(Refund=No|No) = 4/7
P(Refund=Yes|Yes) = 0, P(Refund=No|Yes) = 1
P(Marital Status=Single|No) = 2/7
P(Marital Status=Divorced|No) = 1/7
P(Marital Status=Married|No) = 4/7
P(Marital Status=Single|Yes) = 2/7
P(Marital Status=Divorced|Yes) = 1/7
P(Marital Status=Married|Yes) = 0
For Taxable Income:
If class=No: sample mean = 110, sample variance = 2975
If class=Yes: sample mean = 90, sample variance = 25

P(X|Class=No) = P(Refund=No|Class=No) × P(Married|Class=No) × P(Income=120K|Class=No)
              = 4/7 × 4/7 × 0.0072 = 0.0024
P(X|Class=Yes) = P(Refund=No|Class=Yes) × P(Married|Class=Yes) × P(Income=120K|Class=Yes)
               = 1 × 0 × 1.2 × 10^-9 = 0

Since P(X|No)P(No) > P(X|Yes)P(Yes), P(No|X) > P(Yes|X)
=> Class = No (see the sketch below)
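A minimal sketch reproducing this comparison for the test record, reusing the estimates listed above:

```python
# X = (Refund=No, Married, Income=120K)
p_x_given_no  = (4/7) * (4/7) * 0.0072   # ≈ 0.0024
p_x_given_yes = 1.0   * 0.0   * 1.2e-9   # = 0
p_no, p_yes = 7/10, 3/10                 # class priors from the training table

score_no  = p_x_given_no  * p_no
score_yes = p_x_given_yes * p_yes
print("No" if score_no > score_yes else "Yes")  # No
```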
Naïve Bayes Classifier

If one of the conditional probabilities is zero, the entire
expression becomes zero
Probability estimation (Nic: count of class-c records with attribute value Ai, Nc: size of class c):

Original:   P(Ai | C) = Nic / Nc
Laplace:    P(Ai | C) = (Nic + 1) / (Nc + c)       c: number of classes
m-estimate: P(Ai | C) = (Nic + m·p) / (Nc + m)     p: prior probability, m: parameter
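A minimal sketch of the three estimators, with argument names matching the slide's notation (n_ic, n_c, c, p, m); the function names are illustrative:

```python
def original_estimate(n_ic, n_c):
    return n_ic / n_c

def laplace_estimate(n_ic, n_c, c):
    # c: number of classes, as defined on the slide
    return (n_ic + 1) / (n_c + c)

def m_estimate(n_ic, n_c, m, p):
    # p: prior probability, m: weight parameter
    return (n_ic + m * p) / (n_c + m)

# With the Laplace correction, P(Refund=Yes|Yes) = 0/3 becomes (0+1)/(3+2) = 0.2,
# so a single zero count no longer wipes out the whole product.
print(laplace_estimate(n_ic=0, n_c=3, c=2))  # 0.2
```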
Example of Naïve Bayes Classifier

Name           Give Birth  Can Fly  Live in Water  Have Legs  Class
human          yes         no       no             yes        mammals
python         no          no       no             no         non-mammals
salmon         no          no       yes            no         non-mammals
whale          yes         no       yes            no         mammals
frog           no          no       sometimes      yes        non-mammals
komodo         no          no       no             yes        non-mammals
bat            yes         yes      no             yes        mammals
pigeon         no          yes      no             yes        non-mammals
cat            yes         no       no             yes        mammals
leopard shark  yes         no       yes            no         non-mammals
turtle         no          no       sometimes      yes        non-mammals
penguin        no          no       sometimes      yes        non-mammals
porcupine      yes         no       no             yes        mammals
eel            no          no       yes            no         non-mammals
salamander     no          no       sometimes      yes        non-mammals
gila monster   no          no       no             yes        non-mammals
platypus       no          no       no             yes        mammals
owl            no          yes      no             yes        non-mammals
dolphin        yes         no       yes            no         mammals
eagle          no          yes      no             yes        non-mammals

A: attributes, M: mammals, N: non-mammals

Test record: Give Birth = yes, Can Fly = no, Live in Water = yes, Have Legs = no, Class = ?

P(A|M) = 6/7 × 6/7 × 2/7 × 2/7 = 0.06
P(A|N) = 1/13 × 10/13 × 3/13 × 4/13 = 0.0042
P(A|M)P(M) = 0.06 × 7/20 = 0.021
P(A|N)P(N) = 0.0042 × 13/20 = 0.0027

P(A|M)P(M) > P(A|N)P(N) => Mammals
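A minimal sketch verifying the class-conditional products above for this test record:

```python
# Test record: Give Birth=yes, Can Fly=no, Live in Water=yes, Have Legs=no
p_a_given_m = (6/7) * (6/7) * (2/7) * (2/7)        # ≈ 0.06
p_a_given_n = (1/13) * (10/13) * (3/13) * (4/13)   # ≈ 0.0042

score_m = p_a_given_m * 7/20    # ≈ 0.021
score_n = p_a_given_n * 13/20   # ≈ 0.0027
print("mammals" if score_m > score_n else "non-mammals")  # mammals
```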
Example of Naïve Bayesian Classifier: Training Dataset

Class: C1: buys_computer = yes, C2: buys_computer = no

Data sample X = (age <= 30, income = medium, student = yes, credit_rating = fair)

age      income  student  credit_rating  buys_computer
<=30     high    no       fair           no
<=30     high    no       excellent      no
31...40  high    no       fair           yes
>40      medium  no       fair           yes
>40      low     yes      fair           yes
>40      low     yes      excellent      no
31...40  low     yes      excellent      yes
<=30     medium  no       fair           no
<=30     low     yes      fair           yes
>40      medium  yes      fair           yes
<=30     medium  yes      excellent      yes
31...40  medium  no       excellent      yes
31...40  high    yes      fair           yes
>40      medium  no       excellent      no
Example of Naïve Bayesian Classifier (cont.)
P(Ci): P(buys_computer = yes) = 9/14 = 0.643
P(buys_computer = no) = 5/14= 0.357

Compute P(X|Ci) for each class


P(age = <=30 | buys_computer = yes) = 2/9 = 0.222
P(age = <= 30 | buys_computer = no) = 3/5 = 0.6
P(income = medium | buys_computer = yes) = 4/9 = 0.444
P(income = medium | buys_computer = no) = 2/5 = 0.4
P(student = yes | buys_computer = yes) = 6/9 = 0.667
P(student = yes | buys_computer = no) = 1/5 = 0.2
P(credit_rating = fair | buys_computer = yes) = 6/9 = 0.667
P(credit_rating = fair | buys_computer = no) = 2/5 = 0.4

X = (age <= 30 , income = medium, student = yes, credit_rating = fair)

P(X|Ci) : P(X|buys_computer = yes) = 0.222 x 0.444 x 0.667 x 0.667 = 0.044


P(X|buys_computer = no) = 0.6 x 0.4 x 0.2 x 0.4 = 0.019
P(X|Ci)*P(Ci) : P(X|buys_computer = yes) * P(buys_computer = yes) = 0.028
P(X|buys_computer = no) * P(buys_computer = no) = 0.007

Therefore, X belongs to class (buys_computer = yes)
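A minimal sketch reproducing this computation from the conditional probabilities estimated above:

```python
# X = (age <= 30, income = medium, student = yes, credit_rating = fair)
p_yes, p_no = 9/14, 5/14

p_x_given_yes = (2/9) * (4/9) * (6/9) * (6/9)   # ≈ 0.044
p_x_given_no  = (3/5) * (2/5) * (1/5) * (2/5)   # ≈ 0.019

print(p_x_given_yes * p_yes)   # ≈ 0.028
print(p_x_given_no  * p_no)    # ≈ 0.007  -> predict buys_computer = yes
```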


Naïve Bayes: Summary
Robust to isolated noise points

Handle missing values by ignoring the instance during probability


estimate calculations

Robust to irrelevant attributes

Independence assumption may not hold for some attributes


Use other techniques such as Bayesian Belief Networks (BBN)
Naïve Bayesian Classifier: Comments
Advantages
Easy to implement
Good results obtained in most of the cases
Disadvantages
Assumption: class conditional independence, therefore loss of accuracy
Practically, dependencies exist among variables
E.g., in hospitals, a patient's profile (age, family history, etc.),
symptoms (fever, cough, etc.), and disease (lung cancer, diabetes, etc.)
Dependencies among these cannot be modeled by a naïve Bayesian
classifier
How to deal with these dependencies?
Bayesian Belief Networks



Continued in Session 09
Classification and Prediction (cont.)

