Академический Документы
Профессиональный Документы
Культура Документы
• Given a collection of records (training set ) Each record contains a set of attributes, one of the
attributes is the class.
• Find a model for class attribute as a function of the values of other attributes.
• Goal: previously unseen records should be assigned a class as accurately as possible.
A test set is used to determine the accuracy of the model. Usually, the given data set is divided
into training and test sets, with training set used to build the model and test set used to validate it.
2. Illustrating Classification Task
3. Classification Techniques
2
Another Example of Decision Tree
3
• Once the groupings have been calculated for each input variable, select the single input variable that
maximizes similarity within groupings and differences between groupings.
This process is repeated in each group that contains a convincing percentage of information in the
original data. The process is not terminated until all divisible groups have been divided.
The expected information from such a message pertaining to class membership is the sum over the
classes in proportion to their frequencies in 5; that is,
When applied to the set of training cases, Info(T) measures the average amount of information needed
to identify the class of a case in set T. This quantity is also known as the entropy of the set T. Now
consider a similar measurement after T has been partitioned (denoted by Tj) in accordance with the n
outcomes of a test X. The expected information requirement is the weighted sum over the n subsets:
The quantity
expresses the useful portion of the generated information by the split (that appears useful for
classification). The gain ratio selects a test to maximize the ratio above, subject to the constraint that the
information gain must be large — at least as large as the average gain over all tests examined.
5
3.1.2. Decision Tree Classification Task
(A)
6
(B)
(C)
(D)
7
(E)
(F)