Adra Marc
Introduction
Large amount of data on the internet
Information extraction: data mining
Classification: one family of NLP problems
What is a Classifier?
[Diagram: Text → Class]
A function that outputs a class for each text given as input
The text is represented as a vector of features
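A minimal sketch of representing a text as a vector of features, assuming a bag-of-words representation (word counts over a fixed vocabulary) and plain whitespace tokenisation. The `to_feature_vector` helper and the vocabulary are hypothetical illustrations, not taken from the original slides.

```python
from collections import Counter


def to_feature_vector(text: str, vocabulary: list[str]) -> list[int]:
    """Map a text to a vector of word counts over a fixed vocabulary."""
    # Assumption: lowercase and split on whitespace; real systems usually
    # also strip punctuation and may apply stemming.
    counts = Counter(text.lower().split())
    # One feature per vocabulary word; words outside the vocabulary are ignored.
    return [counts[word] for word in vocabulary]


vocabulary = ["free", "money", "meeting", "report"]  # hypothetical vocabulary
print(to_feature_vector("Free money free prizes", vocabulary))  # [2, 1, 0, 0]
```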
Supervised learning
Prior information: the class probabilities are computed from a pre-classified sample
Relies on independence between features, which does not hold in reality
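A minimal sketch of a Naive Bayes text classifier trained on a pre-classified sample, assuming the multinomial variant with add-one (Laplace) smoothing and whitespace tokenisation. The helper names `train` and `predict` and the toy spam/ham sample are hypothetical. Class priors come from label frequencies in the sample, and each word adds its own term to the score, which is where the independence assumption enters.

```python
import math
from collections import Counter, defaultdict


def train(docs: list[tuple[str, str]]):
    """Estimate priors P(c) and per-class word counts from labelled texts."""
    class_counts = Counter(label for _, label in docs)
    word_counts = defaultdict(Counter)  # word_counts[c][w] = count of w in class c
    vocab = set()
    for text, label in docs:
        tokens = text.lower().split()
        word_counts[label].update(tokens)
        vocab.update(tokens)
    priors = {c: n / len(docs) for c, n in class_counts.items()}
    return priors, word_counts, vocab


def predict(text, priors, word_counts, vocab):
    """Return argmax_c of log P(c) + sum_i log P(w_i | c), words treated as independent."""
    best_class, best_score = None, -math.inf
    for c, prior in priors.items():
        total = sum(word_counts[c].values())
        score = math.log(prior)
        for w in text.lower().split():
            if w not in vocab:
                continue  # skip words never seen in training
            # Add-one smoothing keeps unseen (word, class) pairs from zeroing the product.
            score += math.log((word_counts[c][w] + 1) / (total + len(vocab)))
        if score > best_score:
            best_class, best_score = c, score
    return best_class


training = [  # hypothetical pre-classified sample
    ("free money now", "spam"),
    ("win free prizes", "spam"),
    ("project meeting tomorrow", "ham"),
    ("send the report", "ham"),
]
model = train(training)
print(predict("free prizes tomorrow", *model))  # spam
```

Summing log-probabilities instead of multiplying raw probabilities avoids numerical underflow on long texts.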
Performance
A study by Yiming Yang on the Reuters datasets
Good but not outstanding performance
Still surprising for a classifier whose assumptions do not hold
To summarize
Learns from a corpus of texts
Optimal if the features can be assumed independent
Good performance, but can be improved
Empirical Rules
Use the shape of the text to classify it
Require a thorough analysis of the texts to classify
Efficient for spam filtering
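A minimal sketch of an empirical, rule-based spam filter that looks only at the shape of a message. The three rules, their thresholds and the "at least two rules" vote are hypothetical examples chosen for illustration, not tuned values; finding good rules is exactly the text analysis work mentioned above.

```python
import re


def looks_like_spam(text: str) -> bool:
    """Flag a message using hand-written rules on the shape of the text.

    The rules and thresholds are hypothetical, not tuned on real data.
    """
    letters = [ch for ch in text if ch.isalpha()]
    upper_ratio = sum(ch.isupper() for ch in letters) / len(letters) if letters else 0.0
    rules = [
        upper_ratio > 0.5,                     # mostly capital letters
        text.count("!") >= 3,                  # many exclamation marks
        re.search(r"\$\d", text) is not None,  # dollar amounts such as "$500"
    ]
    return sum(rules) >= 2  # flag when at least two rules fire


print(looks_like_spam("WIN $500 NOW!!!"))  # True
```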
Bayesian Networks
[Diagram: Bayesian network over the features F1, F2, F3, F4]
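The difference with Naive Bayes can be written as a factorisation. Naive Bayes makes every feature depend on the class $C$ only; a Bayesian network can also let a feature depend on other features, its parents $\mathrm{Pa}(F_i)$. The edges of the original diagram are not recoverable, so the form below (class plus feature parents, as in augmented Naive Bayes) is an assumption about the kind of network intended.

```latex
\text{Naive Bayes:}\quad
P(C \mid F_1,\dots,F_4) \propto P(C)\prod_{i=1}^{4} P(F_i \mid C)

\text{Augmented network:}\quad
P(C \mid F_1,\dots,F_4) \propto P(C)\prod_{i=1}^{4} P\bigl(F_i \mid C, \mathrm{Pa}(F_i)\bigr)
```

When every $\mathrm{Pa}(F_i)$ is empty, the second line reduces to the first.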
Find separating hyperplanes
Minimise the distance of misclassified items from the decision boundary
Better performance than SVM
[Figure: scatter plot of items; axis: Feature 1]
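The title of this slide is lost, but "minimise the distance of misclassified items from the decision boundary" matches the classic perceptron criterion. A minimal sketch using the perceptron as a swapped-in illustration (not necessarily the method the slide refers to), assuming labels in {+1, -1}; the function names and toy points are hypothetical. It only converges when the classes are linearly separable, hence the `epochs` cap.

```python
def train_perceptron(points, labels, epochs=100, lr=1.0):
    """Find a hyperplane w.x + b = 0 separating labels +1 and -1.

    Each update is a gradient step on the perceptron criterion
    -sum(y * (w.x + b)) over misclassified points, which is proportional
    to their total distance from the decision boundary.
    """
    w = [0.0] * len(points[0])
    b = 0.0
    for _ in range(epochs):
        errors = 0
        for x, y in zip(points, labels):
            activation = sum(wi * xi for wi, xi in zip(w, x)) + b
            if y * activation <= 0:  # misclassified, or exactly on the boundary
                w = [wi + lr * y * xi for wi, xi in zip(w, x)]
                b += lr * y
                errors += 1
        if errors == 0:  # every point is on its correct side
            break
    return w, b


def classify(x, w, b):
    """Return +1 or -1 depending on which side of the hyperplane x lies."""
    return 1 if sum(wi * xi for wi, xi in zip(w, x)) + b > 0 else -1


points = [(2.0, 3.0), (3.0, 3.5), (-1.0, -2.0), (-2.0, -1.5)]  # hypothetical data
labels = [1, 1, -1, -1]
w, b = train_perceptron(points, labels)
print(classify((2.5, 2.5), w, b))  # 1
```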
Conclusion
Good out-of-the-box performance of the Naive Bayes classifier
Simple to implement and to use
Wide range of possible improvements
Remains convenient for simple use cases
Thank you
Any Questions?