Lecture 3, 4, 5
Naïve Bayes Classifier
By
Dr Musi Ali
Mustansar.ali@uettaxila.edu.pk
Twitter: @musiali007
04/30/17 1
Outline
Classification
Text Categorisation
Probability
Bayesian Classifier
Conclusion
Categorization/Classification
Given:
A description of an instance, $x \in X$, where $X$ is the instance language
Learning for Categorization
$\forall \langle x, c(x) \rangle \in D : h(x) = c(x)$
Sample Category Learning Problem
Instance language: <size, color, shape>
size ∈ {small, medium, large}
color ∈ {red, blue, green}
shape ∈ {square, circle, triangle}
C = {positive, negative}
D:
Example  Size   Color  Shape     Category
1        small  red    circle    positive
2        large  red    circle    positive
3        small  red    triangle  negative
4        large  blue   circle    negative
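As a minimal sketch of what learning means for this problem, the four training examples in D can be encoded directly and a candidate hypothesis checked for consistency, that is, h(x) = c(x) for every example. The hypothesis `h` below (red circles are positive) and the helper `is_consistent` are illustrative assumptions, not something stated on the slide.

```python
# Training set D from the sample problem: (size, color, shape) -> category.
D = [
    (("small", "red", "circle"), "positive"),
    (("large", "red", "circle"), "positive"),
    (("small", "red", "triangle"), "negative"),
    (("large", "blue", "circle"), "negative"),
]

def h(x):
    """Hypothetical hypothesis: positive iff the instance is a red circle."""
    size, color, shape = x
    return "positive" if color == "red" and shape == "circle" else "negative"

def is_consistent(hypothesis, examples):
    """True when hypothesis(x) == c(x) for every <x, c(x)> in examples."""
    return all(hypothesis(x) == c for x, c in examples)

print(is_consistent(h, D))
```

Many other hypotheses are also consistent with these four examples, which is why consistency on D alone does not settle which hypothesis generalises best.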
Text Categorization
Assigning documents to a fixed set of categories.
Applications:
Web pages: recommending, Yahoo-like classification
Newsgroup messages: recommending, spam filtering
News articles: personalized newspaper
Email messages: routing, prioritizing, folderizing, spam filtering
Is this spam?
=================================================
Click Below to order:
http://www.wholesaledaily.com/sales/nmd.htm
=================================================
Document Classification
Testing data: planning language proof intelligence
Text Categorization Examples
Methods (1)
Manual classification
Used by Yahoo!, Looksmart, about.com, ODP, Medline
Very accurate when the job is done by experts
Consistent when the problem size and team are small
Difficult and expensive to scale
Automatic document classification
Hand-coded rule-based systems
Used by CS dept's spam filter, Reuters, CIA, Verity, …
E.g., assign a category if the document contains a given Boolean combination of words
Commercial systems have complex query languages
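A hand-coded rule of the kind described above can be sketched in a few lines. The rule itself (`click` AND `order`, OR `wholesale`) and the function names are hypothetical, chosen only to illustrate a Boolean combination of words; real commercial query languages are far richer.

```python
def tokenize(text):
    """Lowercase the text and split it on whitespace into a set of tokens."""
    return set(text.lower().split())

def spam_rule(words):
    # Hypothetical hand-written rule: ("click" AND "order") OR "wholesale".
    return ("click" in words and "order" in words) or "wholesale" in words

def classify(text):
    """Assign the category 'spam' when the Boolean rule fires."""
    return "spam" if spam_rule(tokenize(text)) else "not spam"

print(classify("Click below to order"))
print(classify("Lecture notes for next week"))
```

Because the tokenizer only splits on whitespace, a token such as `order:` would not match `order`; refinements like this are part of what makes such rules expensive to maintain.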
Methods (2)
Accuracy is often very high if a query has been carefully refined over time by a subject expert
Building and maintaining these queries is expensive
Text Categorization: attributes
Bayesian Methods
Axioms of Probability Theory
Conditional Probability
[Figure: Venn diagram of events A and B with overlap A ∧ B]
Independence
$P(A \wedge B) = P(A)\,P(B)$
Joint Distribution
The joint probability distribution for a set of random variables $X_1, \ldots, X_n$ gives the probability of every combination of values: $P(X_1, \ldots, X_n)$. If all variables are discrete with $v$ values, this is an n-dimensional array with $v^n$ entries, and all $v^n$ entries must sum to 1.
              positive            negative
          circle   square     circle   square
red        0.20     0.02       0.05     0.30
blue       0.02     0.01       0.20     0.20
The probability of all possible conjunctions (assignments of values to
some subset of variables) can be calculated by summing the
appropriate subset of values from the joint distribution.
$P(\text{red} \wedge \text{circle}) = 0.20 + 0.05 = 0.25$
$P(\text{red}) = 0.20 + 0.02 + 0.05 + 0.30 = 0.57$
Therefore, all conditional probabilities can also be calculated.
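The summations above can be reproduced mechanically. The sketch below stores the joint table as a dictionary and sums matching entries; the helper name `prob` and the choice of conditional probability to compute are illustrative assumptions.

```python
# Joint distribution P(category, color, shape) copied from the table.
joint = {
    ("positive", "red", "circle"): 0.20,
    ("positive", "red", "square"): 0.02,
    ("positive", "blue", "circle"): 0.02,
    ("positive", "blue", "square"): 0.01,
    ("negative", "red", "circle"): 0.05,
    ("negative", "red", "square"): 0.30,
    ("negative", "blue", "circle"): 0.20,
    ("negative", "blue", "square"): 0.20,
}

def prob(category=None, color=None, shape=None):
    """Sum the joint entries matching every specified value (None = any)."""
    return sum(
        p
        for (cat, col, shp), p in joint.items()
        if category in (None, cat) and color in (None, col) and shape in (None, shp)
    )

p_red_circle = prob(color="red", shape="circle")
p_red = prob(color="red")
# Conditional probability as a ratio of two sums from the joint table.
p_pos_given_red_circle = prob("positive", "red", "circle") / p_red_circle

print(round(p_red_circle, 2), round(p_red, 2), round(p_pos_given_red_circle, 2))
```

This reproduces the 0.25 and 0.57 computed above and gives 0.80 for P(positive | red ∧ circle).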
Joint Distribution, Example
Probabilistic Classification
Motivational stuff
$P(H \mid E) = \frac{P(E \mid H)\,P(H)}{P(E)}$
Bayesian Categorization
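Determine the category of $x_k$ by computing, for each $y_i$:
$P(Y = y_i \mid X = x_k) = \frac{P(Y = y_i)\,P(X = x_k \mid Y = y_i)}{P(X = x_k)}$
where the denominator sums over all $m$ categories:
$P(X = x_k) = \sum_{i=1}^{m} P(Y = y_i)\,P(X = x_k \mid Y = y_i)$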
Bayesian Categorization (cont.)
Need to know:
Priors: $P(Y = y_i)$
Conditionals: $P(X = x_k \mid Y = y_i)$
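Given the priors and conditionals, the posterior for each category follows from Bayes' rule with the evidence term obtained by summing over categories. The sketch below is one way to write that; the function name `posterior` is an assumption, and the numbers are derived from the red-circle entries of the joint distribution table earlier in the lecture.

```python
def posterior(priors, likelihoods):
    """Return P(Y = y_i | X = x_k) for every category y_i.

    priors[y]      = P(Y = y)
    likelihoods[y] = P(X = x_k | Y = y) for the single observed instance x_k
    """
    # P(X = x_k) = sum_i P(Y = y_i) * P(X = x_k | Y = y_i)
    evidence = sum(priors[y] * likelihoods[y] for y in priors)
    return {y: priors[y] * likelihoods[y] / evidence for y in priors}

# x_k = "red circle": P(positive) = 0.25, P(negative) = 0.75 from the joint table.
priors = {"positive": 0.25, "negative": 0.75}
likelihoods = {"positive": 0.20 / 0.25, "negative": 0.05 / 0.75}

post = posterior(priors, likelihoods)
print({y: round(p, 2) for y, p in post.items()})
```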
Naïve Bayesian Categorization
$P(X_i = x_{ij} \mid Y = y_k) = \frac{n_{ijk} + mp}{n_k + m}$
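The smoothed estimate above is straightforward to compute. In the sketch below the parameter meanings follow the usual reading of the m-estimate (p as a prior guess and m as its weight in equivalent examples), which is an assumption since the slide does not define them; the function name is also hypothetical.

```python
def m_estimate(n_ijk, n_k, p, m):
    """Smoothed estimate of P(X_i = x_ij | Y = y_k).

    n_ijk: number of class-y_k examples with X_i = x_ij
    n_k:   number of class-y_k examples
    p:     prior estimate of the probability (assumed meaning)
    m:     weight of the prior, in equivalent examples (assumed meaning)
    """
    return (n_ijk + m * p) / (n_k + m)

# Sample problem: no positive example is a triangle (0 of 2).
# With 3 possible shapes, take p = 1/3 and m = 3.
print(m_estimate(0, 2, 1 / 3, 3))
# With m = 0 the estimate reduces to the raw relative frequency.
print(m_estimate(0, 2, 1 / 3, 0))
```

Smoothing keeps an unseen feature value from forcing the whole product of conditionals to zero.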
Naïve Bayes: Learning
From training corpus, extract Vocabulary
Calculate required $P(c_j)$ and $P(x_k \mid c_j)$ terms
For each $c_j$ in $C$ do
$docs_j \leftarrow$ subset of documents for which the target class is $c_j$
$P(c_j) \leftarrow \frac{|docs_j|}{|\text{total \# documents}|}$
Naïve Bayes: Classifying
$c_{NB} = \underset{c_j \in C}{\operatorname{argmax}}\; P(c_j) \prod_{i \in \text{positions}} P(x_i \mid c_j)$
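The learning and classifying steps can be put together in a short sketch. The toy corpus, the function names, and the add-one smoothing of the word probabilities are assumptions made for illustration; only the prior estimate and the argmax rule come from the slides.

```python
from collections import Counter, defaultdict

def train(docs):
    """docs: list of (list_of_words, category). Returns (priors, cond, vocab)."""
    vocab = {w for words, _ in docs for w in words}
    by_class = defaultdict(list)
    for words, c in docs:
        by_class[c].append(words)
    priors, cond = {}, {}
    for c, class_docs in by_class.items():
        # P(c_j) = |docs_j| / |total # documents|
        priors[c] = len(class_docs) / len(docs)
        counts = Counter(w for words in class_docs for w in words)
        n = sum(counts.values())
        # Assumed add-one smoothing: (count + 1) / (n + |Vocabulary|)
        cond[c] = {w: (counts[w] + 1) / (n + len(vocab)) for w in vocab}
    return priors, cond, vocab

def classify(words, priors, cond, vocab):
    """Return argmax over c_j of P(c_j) * product over positions of P(x_i | c_j)."""
    def score(c):
        s = priors[c]
        for w in words:
            if w in vocab:  # words never seen in training are skipped
                s *= cond[c][w]
        return s
    return max(priors, key=score)

# Hypothetical toy corpus, for illustration only.
corpus = [
    ("click below to order".split(), "spam"),
    ("order cheap wholesale now".split(), "spam"),
    ("lecture on bayes classifier".split(), "ham"),
    ("notes for the next lecture".split(), "ham"),
]
model = train(corpus)
print(classify("click to order now".split(), *model))
print(classify("bayes lecture notes".split(), *model))
```

The direct product is fine for short documents like these; for long documents the product of many small probabilities becomes numerically fragile.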
Things We'd Like to Do
Spam Classification
Given an email, predict whether it is spam or not
Weather
Based on temperature, humidity, etc., predict the weather
Bayesian Classification Formulation
Problem statement:
Given features $X_1, X_2, \ldots, X_n$
Predict a label Y
Another Application
Digit Recognition
[Figure: a digit image goes into the classifier, which outputs 5]
The Bayes Classifier
Let's expand this for our digit recognition task:
Model Parameters
not available)
The Naïve Bayes Model
Naïve Bayes Training
Now that we've decided to use a Naïve Bayes classifier, we need to train it with some data.
[Figure: Training Data]
Training in Naïve Bayes is easy:
Estimate P(Y=v) as the fraction of records with Y=v
For binary digits, training amounts to averaging all of the training fives together and all of the training sixes together.
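The averaging described here can be sketched as a per-pixel mean within each class, which estimates the probability that a pixel is on given the digit. The tiny 4-pixel "images", the labels, and the function name are hypothetical; real digit images have many more pixels.

```python
def train_pixel_probs(images, labels):
    """Estimate P(X_i = 1 | Y = v) as the per-pixel mean of class-v images.

    images: list of equal-length lists of 0/1 pixel values
    labels: list of class labels, one per image
    """
    probs = {}
    for v in set(labels):
        members = [img for img, y in zip(images, labels) if y == v]
        # Column-wise average = fraction of class-v images with pixel i on.
        probs[v] = [sum(col) / len(members) for col in zip(*members)]
    return probs

# Hypothetical training data: two "fives" and two "sixes".
images = [[1, 1, 0, 0], [1, 0, 0, 0], [0, 1, 1, 1], [0, 0, 1, 1]]
labels = [5, 5, 6, 6]

probs = train_pixel_probs(images, labels)
print(probs[5], probs[6])
```

Estimates of exactly 0 or 1, as produced by this unsmoothed average, would usually be smoothed before being used in a product.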
Naïve Bayes Classification
Naïve Bayes Assumption
Exclusive-OR Example
X1 X2 P(Y=0|X1,X2) P(Y=1|X1,X2)
0 0 1 0
0 1 0 1
1 0 0 1
1 1 1 0
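To see why this example is a problem for Naïve Bayes, the sketch below estimates the model from the four XOR records, assuming each input combination is equally likely (an assumption, since the slide gives only the conditionals). Within each class every feature value occurs half the time, so the factorised model cannot distinguish the classes.

```python
# The four XOR records, weighted equally: ((x1, x2), y) with y = x1 XOR x2.
data = [((0, 0), 0), ((0, 1), 1), ((1, 0), 1), ((1, 1), 0)]

def naive_bayes_posterior(x, data):
    """P(Y = y | x) under the Naive Bayes factorisation, estimated from data."""
    scores = {}
    for y in {label for _, label in data}:
        rows = [features for features, label in data if label == y]
        score = len(rows) / len(data)  # P(Y = y)
        for i, value in enumerate(x):
            # P(X_i = value | Y = y) as a relative frequency
            score *= sum(1 for r in rows if r[i] == value) / len(rows)
        scores[y] = score
    total = sum(scores.values())
    return {y: s / total for y, s in scores.items()}

for x, _ in data:
    print(x, naive_bayes_posterior(x, data))
```

Every input receives a posterior of 0.5 for both classes, whereas the true conditionals in the table are 0 or 1.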
Actually, the Naïve Bayes assumption is almost never true.
Underflow Prevention
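One common way to prevent underflow, sketched here as an assumption about what this slide covers, is to add log probabilities instead of multiplying probabilities. Because the logarithm is monotonic, the class with the highest log score is also the class with the highest product. The probability values below are hypothetical.

```python
import math

probs = [0.01] * 1000  # hypothetical per-word conditional probabilities

product = 1.0
for p in probs:
    product *= p  # shrinks below the smallest float and becomes 0.0

# Summing logs keeps the score in a comfortable numeric range.
log_score = sum(math.log(p) for p in probs)

print(product, log_score)
```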
Recap
Conclusions
Questions?
Appendix: Mathematical Formulation
Appendix: Joint Distribution of Naïve Bayes (NB)
The numerator is equivalent to the joint probability model
Appendix: Conditional Independence of NB
Appendix: NB Final Model
04/30/17 55