Abstract: A popular and particularly efficient method for making a decision tree for classification from symbolic data is the ID3 algorithm. Revised algorithms for numerical data have been proposed, some of which divide a numerical range into several intervals or fuzzy intervals. Their decision trees, however, are not easy to understand. We propose a new version of the ID3 algorithm that generates an understandable fuzzy decision tree using fuzzy sets defined by a user. We apply it to diagnosis of potential transformers by analyzing gas in oil.

1. Introduction

Knowledge acquisition from data is very important in knowledge engineering. A popular and efficient method is the ID3 algorithm proposed by J.R. Quinlan [1],[2] in 1979, which makes a decision tree for classification from symbolic data.

The decision tree consists of nodes for testing attributes, edges for branching by the values of symbols, and leaves for deciding the class names to be assigned. The ID3 algorithm applies to a set of data and generates a decision tree which minimizes the expected number of tests for classifying the data.

For numerical data, revised algorithms have been proposed which divide the numerical range of an attribute into several intervals. To make a decision tree flexible, some algorithms fuzzify the intervals [3],[4]. Their decision trees, however, are not easy to understand because

(1) we cannot know how the range of an attribute is divided into intervals,

(2) the range of an attribute may be divided into different intervals at different test nodes,

(3) one attribute may appear more than once in one sequence of tests.

Moreover, we need a long sequence of tests since the decision tree is binary.

As for numerical data, many tuning methods for fuzzy rules have also been proposed in the research on fuzzy control, the so-called neuro-fuzzy techniques [5]. Since such a method generates rules that contain all combinations of all fuzzy sets in the attributes, it has several difficulties:

(1) very many fuzzy rules are generated,

(2) the fuzzy sets in the rules are not understandable since they are tuned to fit the training data,
repeat from 2 recursively.

The information gain G(A_i, D) for the attribute A_i by a fuzzy set of data D is defined by

    G(A_i, D) = I(D) - E(A_i, D),    (1)

where

    I(D) = -\sum_{k=1}^{n} p_k \log_2 p_k,    (2)

    E(A_i, D) = \sum_{j=1}^{m} p_{ij} \cdot I(D_{F_{ij}}),    (3)

    p_k = |D^{C_k}| / |D|,    (4)

    p_{ij} = |D_{F_{ij}}| / \sum_{l=1}^{m} |D_{F_{il}}|.    (5)

As for assigning the class name to the leaf node, we propose three methods as follows:

(a) The node is assigned the class name that has the greatest membership value; that is, data other than the selected ones are ignored.

(b) If the condition (a) in step 2 of the algorithm holds, do the same as in method (a). If not, the node is considered to be empty; that is, the data are ignored.

(c) The node is assigned all class names with their membership values; that is, all data are taken into account.

Now, we will illustrate one cycle of the algorithm. Note that the example includes inconsistent data, say, the fourth and sixth data. Fuzzy sets low, middle and high in the attribute height, fuzzy sets light, middle and heavy in weight, and fuzzy sets light and dark in hair color are defined as

    low    = {1/160, 0.8/165, 0.5/170, 0.2/175},
    middle = {0.5/165, 1/170, 0.5/175},
    high   = {0.2/165, 0.5/170, 0.8/175, 1/180},
    light  = {1/60, 0.8/65, 0.5/70, 0.2/75},
    middle = {0.5/65, 1/70, 0.5/75},
    heavy  = {0.2/65, 0.5/70, 0.8/75, 1/80},
    light  = {1/blond, 0.3/red},
    dark   = {0.6/red, 1/black}.

Note that we can also define fuzzy sets with continuous membership functions.

First, we calculate the information I(D). Since we have |D| = 5.5, |D^{C1}| = 2.2 and |D^{C2}| = 3.3, we have

    I(D) = -(2.2/5.5) \log_2 (2.2/5.5) - (3.3/5.5) \log_2 (3.3/5.5) = 0.971 (bits).

Next, we calculate the expected information for all A_i. For height, using step 3.2 of the algorithm, we have the fuzzy sets of data D_{height,low}, D_{height,middle} and D_{height,high}:

    low  mid  high |  H    W   HC     C
    1    0    0    |  160  60  blond  C1
    [the remaining rows of the table are not legible in the source]
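The calculation of I(D) above can be sketched in Python. The fuzzy cardinalities 2.2 and 3.3 are taken from the worked example; the function name is ours.

```python
import math

def information(class_cardinalities):
    """I(D) = -sum_k p_k log2 p_k with p_k = |D^Ck| / |D| (Eqs. 2 and 4).

    class_cardinalities maps a class name to its fuzzy cardinality
    |D^Ck|, i.e. the sum of membership values of the data in that class.
    """
    total = sum(class_cardinalities.values())
    return -sum((c / total) * math.log2(c / total)
                for c in class_cardinalities.values() if c > 0)

# Worked example: |D^C1| = 2.2, |D^C2| = 3.3, so |D| = 5.5.
i_d = information({"C1": 2.2, "C2": 3.3})
print(round(i_d, 3))  # 0.971, matching the example
```

The gain G(A_i, D) of Eq. (1) would then be this value minus the p_ij-weighted average of the same function applied to each restricted data set D_{F_ij}.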
Then for low, we have the fuzzy set of data

    D_{height,low} = {1/(160, 60, blond, C1), ...}.

[Figure: the generated fuzzy decision tree, with test nodes such as hair color and edges labeled by fuzzy sets such as light; the details are not legible in the source.]
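The restriction of a fuzzy set of data by one fuzzy set of an attribute (step 3.2) can be sketched as follows. This is a sketch under an assumption: we combine a datum's membership with its grade in the fuzzy set by multiplication, which is consistent with the element 1/(160, 60, blond, C1) of D_{height,low} above; the function and variable names are ours.

```python
def restrict(fuzzy_data, fuzzy_set, attr_index):
    """Divide a fuzzy set of data by one fuzzy set of an attribute.

    fuzzy_data: list of (membership, datum) pairs, where datum is a
    tuple of attribute values; fuzzy_set: dict mapping an attribute
    value to its grade. Assumes product for combining grades (min
    would be an alternative); elements with zero membership are dropped.
    """
    out = []
    for mu, datum in fuzzy_data:
        grade = fuzzy_set.get(datum[attr_index], 0.0)
        if mu * grade > 0:
            out.append((mu * grade, datum))
    return out

low = {160: 1.0, 165: 0.8, 170: 0.5, 175: 0.2}  # fuzzy set from the example
D = [(1.0, (160, 60, "blond", "C1"))]           # first example datum
print(restrict(D, low, 0))  # [(1.0, (160, 60, 'blond', 'C1'))]
```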
[The figure shows leaf certainties such as C1 = 0.9, C2 = 0.1 and C1 = 0.3, C2 = 0.7 being multiplied by the membership values along the edges and summed over paths, yielding the totals C1 = 0.69 and C2 = 0.31.]

Fig. 3. Fuzzy Reasoning in Fuzzy Decision Tree
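The reasoning illustrated in Fig. 3 can be sketched in Python. The combination rule (multiply certainties along each path, add per class over paths) is the one used in the paper; the tree layout, attribute name and numbers in this sketch are hypothetical illustrations.

```python
# Product-sum reasoning over a fuzzy decision tree: for each root-to-leaf
# path, multiply the edge membership values by the class certainties at
# the leaf; then add the results per class over all paths.

def classify(node, datum, path_mu=1.0, totals=None):
    if totals is None:
        totals = {}
    if "classes" in node:                      # leaf: {class: certainty}
        for cls, cert in node["classes"].items():
            totals[cls] = totals.get(cls, 0.0) + path_mu * cert
        return totals
    attr = node["attr"]
    for fuzzy_set, child in node["edges"]:     # fuzzy_set: value -> grade
        mu = fuzzy_set.get(datum[attr], 0.0)
        if mu > 0:                             # follow every matching edge
            classify(child, datum, path_mu * mu, totals)
    return totals

# Hypothetical one-level tree on the attribute "height".
tree = {"attr": "height",
        "edges": [({160: 1.0, 165: 0.8}, {"classes": {"C1": 0.9, "C2": 0.1}}),
                  ({165: 0.5, 170: 1.0}, {"classes": {"C1": 0.3, "C2": 0.7}})]}
print(classify(tree, {"height": 165}))  # C1: 0.8*0.9 + 0.5*0.3, C2: 0.8*0.1 + 0.5*0.7
```

A datum whose value matches several fuzzy edges contributes through all of them, which is what distinguishes this inference from crisp decision-tree traversal.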
To combine the membership values along the path of edges and the certainty of the class attached to the leaf node, we also adopt multiplication. Finally, for the operation to aggregate the certainties of the same class from different paths of edges, we adopt addition from among several alternatives. When the total certainty value exceeds unity, we can normalize the values. In Fig. 3, we have the results C1 with 0.69 and C2 with 0.31.

We newly formulated this inference method. However, we have found that it is essentially the same as a reasoning method of fuzzy rules for fuzzy control, the so-called ×-+ method.

4. Application to Diagnosis

We apply our algorithms to the diagnosis of potential transformers which contain oil [6]. When a part of a transformer is damaged, several kinds of gases are generated and dissolved in the oil. Then a small amount of oil is sampled and analyzed. Thus we have the amounts of 10 gases: CH4, C2H6, C2H4, C3H8, C3H6, C2H2, H2, CO, CO2 and other gases (we treat the other gases together as one kind of gas).

Several potential transformers are destroyed and their causes checked every year. Now we have 220 data that have already been checked. We have two classifications of causes: one is rough (4 classes) and the other detailed (17 classes). Note that the checked transformers varied in size, maker, usage condition and degree of checking. Indeed, several data are inconsistent with each other.

We divide them in half: one half is for generating a decision tree and the other half is for checking it. We apply our algorithm to these data, with fuzzy sets on each gas defined by experts and shown in Fig. 4, to generate two fuzzy decision trees, one for rough causes and one for detailed causes.

Then we check the results of the above three methods (a), (b) and (c). The results for rough causes are shown in Table 1 and those for detailed causes in Table 2, where the threshold θ_r ranges from 0.6 to 1 by 0.1 and θ_n is fixed to 1. In Table 2, (a-n) means checking the inferred classes with the n greatest certainties.

We have the best results with method (c): the best correction rates are 87.0% for rough causes, checking the inferred class with the greatest certainty, and 63.5% for detailed causes, checking the inferred classes with the 3 greatest certainties.
Table 2. Correction rates for detailed causes

    θ_r    0.6     0.7     0.8     0.9     1.0
    (a-1)  34.7%   32.2%   28.7%   24.4%   18.3%
    (a-2)  40.9%   38.3%   33.9%   27.8%   20.0%
    (a-3)  40.9%   38.3%   34.8%   28.7%   20.9%
    (b-1)  41.7%   41.7%   40.9%   41.7%   41.7%
    (b-2)  52.2%   51.3%   49.6%   50.4%   49.6%
    (b-3)  54.8%   53.9%   52.2%   53.0%   52.2%
    (c-1)  39.1%   40.9%   40.9%   42.6%   44.4%
    (c-2)  55.7%   57.4%   54.8%   54.8%   56.5%
    (c-3)  62.6%   63.5%   62.6%   63.5%   63.5%
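The (x-n) check used in the tables (a datum counts as correct when its true class is among the n inferred classes with the greatest certainties) could be computed as follows; the function and variable names are ours.

```python
def correction_rate(predictions, truths, n):
    """Fraction of data whose true class is among the n classes with
    the greatest inferred certainties (the (x-n) check in the tables).

    predictions: list of {class: certainty} dicts, one per datum;
    truths: list of true class names, aligned with predictions.
    """
    correct = 0
    for certs, truth in zip(predictions, truths):
        top = sorted(certs, key=certs.get, reverse=True)[:n]
        if truth in top:
            correct += 1
    return correct / len(truths)

preds = [{"C1": 0.69, "C2": 0.31}, {"C1": 0.2, "C2": 0.8}]
print(correction_rate(preds, ["C2", "C2"], 1))  # 0.5
```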
Fig. 4. Fuzzy sets for each gas.

Table 1. Correction rates for rough causes

    θ_r  |  0.6  |  0.7  |  0.8  |  0.9  |  1.0
    [the data rows are not legible in the source]

5. Conclusion

We proposed a new algorithm to generate a fuzzy decision tree from numerical data using fuzzy sets defined by a user. Next, we formulated an inference method for such a fuzzy decision tree. Finally, we applied it to the diagnosis of potential transformers by analyzing gas in oil.

Since we can easily transform a fuzzy decision tree into a set of fuzzy rules, this can also be considered a method to generate fuzzy rules from a set of numerical data. But it is not so good with respect to the number of fuzzy rules. We have already formulated another, better rule generation method based on the fuzzy ID3 algorithm.

References

[1] J.R. Quinlan (1979): Discovering Rules by Induction from Large Collections of Examples, in D. Michie (ed.): Expert Systems in the Micro Electronics Age, Edinburgh University Press.

[2] J.R. Quinlan (1986): Induction of Decision Trees, Machine Learning, Vol. 1, pp. 81-106.

[3] T. Tani and M. Sakoda (1991): Fuzzy Oriented Expert System to Determine Heater Outlet Temperature Applying Machine Learning, 7th Fuzzy System Symposium (Japan Society for Fuzzy Theory and Systems), pp. 659-662 (in Japanese).

[4] S. Sakurai and D. Araki (1992): Application of Fuzzy Theory to Knowledge Acquisition, 15th Intelligent System Symposium (Society of Instrument and Control Engineers), pp. 169-174 (in Japanese).

[5] H. Ichihashi (1993): Tuning Fuzzy Rules by Neuro-Like Approach, Journal of Japan Society for Fuzzy Theory and Systems, Vol. 5, No. 2, pp. 191-203 (in Japanese).

[6] F. Kawachi and T. Matsuura (1990): Development of Expert System for Diagnosis by Gas in Oil and Its Evaluation in Practice Usage, Technical Meeting on Electrical Insulation Material (The Institute of Electrical Engineers of Japan), EIM-90-40 (in Japanese).