
Fuzzy Decision Trees by Fuzzy ID3 Algorithm

and Its Application to Diagnosis Systems


Motohide Umano1, Hirotaka Okamoto2, Itsuo Hatono1, Hiroyuki Tamura1
Fumio Kawachi3, Sukehisa Umedzu3, Junichi Kinoshita3
1: Department of Systems Engineering, Osaka University
1-1, Machikaneyama-cho, Toyonaka, Osaka 560, Japan
Tel: +81-6-844-1151 ext. 4629, Fax: +81-6-857-7664
Internet: umano@sys.es.osaka-u.ac.jp
2: Department of Precision Engineering, Osaka University, Japan
Currently he is at Kawasaki Steel Corporation, Japan
3: Kansai Technical Engineering Co., Ltd., Japan

Abstract: A popular and particularly efficient method for making a decision tree for classification from symbolic data is the ID3 algorithm. Revised algorithms for numerical data have been proposed, some of which divide a numerical range into several intervals or fuzzy intervals. Their decision trees, however, are not easy to understand. We propose a new version of the ID3 algorithm that generates an understandable fuzzy decision tree using fuzzy sets defined by a user. We apply it to diagnosis of potential transformers by analyzing the gas in their oil.

1. Introduction

Knowledge acquisition from data is very important in knowledge engineering. A popular and efficient method is the ID3 algorithm proposed by J.R. Quinlan [1],[2] in 1979, which makes a decision tree for classification from symbolic data.

The decision tree consists of nodes for testing attributes, edges for branching by symbolic values, and leaves for deciding the class names to be assigned. The ID3 algorithm applies to a set of data and generates a decision tree that minimizes the expected number of tests needed to classify the data.

For numerical data, revised algorithms have been proposed that divide the numerical range of an attribute into several intervals. To make the decision tree more flexible, some algorithms fuzzify these intervals [3],[4]. Their decision trees, however, are not easy to understand because

(1) we cannot know how the range of an attribute is divided into intervals,

(2) the range of an attribute may be divided into different intervals at different test nodes,

(3) one attribute may appear more than once in one sequence of tests.

Moreover, we need a long sequence of tests since the decision tree is binary.

As for numerical data, many tuning methods for fuzzy rules have also been proposed in fuzzy control research, the so-called neuro-fuzzy technique [5]. Since it generates rules that contain all combinations of all the fuzzy sets over the attributes, it has several difficulties:

(1) very many fuzzy rules are generated,

(2) the fuzzy sets in the rules are not understandable since they are tuned to fit the training data,



(3) the more attributes there are, the less the error converges.

Thus we propose a new algorithm to generate a fuzzy decision tree from data using fuzzy sets defined by a user, and we apply it to diagnosis of potential transformers by analyzing the gas in their oil [6].

2. Fuzzy ID3 Algorithm

The ID3 algorithm [1],[2] applies to a set of data and generates a decision tree for classifying the data. Our algorithm, called the fuzzy ID3 algorithm, is extended to apply to a fuzzy set of data (several data with membership grades) and generates a fuzzy decision tree using fuzzy sets defined by a user for all attributes. A fuzzy decision tree consists of nodes for testing attributes, edges for branching by test values of the user-defined fuzzy sets, and leaves for deciding class names with certainties. An example of a fuzzy decision tree is shown in Fig. 1.

[Fig. 1. An example of a fuzzy decision tree. Nodes test attributes such as hair color and height, edges are labeled with the user-defined fuzzy sets (light/dark, low/middle/high), and each leaf carries certainties for the classes C1 and C2.]

Our algorithm is very similar to ID3, except that ID3 selects the test attribute based on an information gain computed from the probabilities of ordinary data, while ours computes it from the membership values of the data.

Assume that we have a set of data D, where each datum has t numerical values for the attributes A1, A2, ..., At and one assigned class from C = {C1, C2, ..., Cn}, and that fuzzy sets Fi1, Fi2, ..., Fim are defined for the attribute Ai (the value of m may differ for each attribute). Let D^{Ck} be the fuzzy subset of D whose class is Ck, and let |D| be the sum of the membership values in a fuzzy set of data D. Then the algorithm to generate a fuzzy decision tree is the following:

1. Generate the root node that has the set of all data, i.e., a fuzzy set of all data with membership value 1.

2. If a node t with a fuzzy set of data D satisfies any of the following conditions:

   (1) the proportion of the data set of a class Ck is greater than or equal to a threshold θr, that is, |D^{Ck}| / |D| ≥ θr,

   (2) the number of data is less than a threshold θn, that is, |D| < θn,

   (3) there are no attributes left for further classification,

   then it is a leaf node and is assigned a class name (more detailed methods are described below).

3. If it does not satisfy the above conditions, it is not a leaf and a test node is generated as follows:

   3.1 For the Ai (i = 1, 2, ..., t), calculate the information gains G(Ai, D), described below, and select the test attribute Amax that maximizes them.

   3.2 Divide D into fuzzy subsets D1, D2, ..., Dm according to Amax, where the membership value of a datum in Dj is the product of its membership value in D and the membership value of Fmax,j for its value of Amax.

   3.3 Generate new nodes t1, t2, ..., tm for the fuzzy subsets D1, D2, ..., Dm and label the edges that connect the nodes tj and t with the fuzzy sets Fmax,j.

   3.4 Replace D by Dj (j = 1, 2, ..., m) and repeat from 2 recursively.
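The procedure can be sketched compactly in code. The following is a minimal Python sketch of the loop described above, not the authors' implementation: the data representation (a fuzzy set of data as a list of (membership, example) pairs, examples as dicts, user-defined fuzzy sets as dicts from crisp values to grades) and the default thresholds are our own assumptions, and leaves simply record every class with its certainty (one of the labeling options discussed below).

```python
import math

def info(class_sums):
    """I(D): fuzzy entropy computed from per-class membership sums."""
    total = sum(class_sums.values())
    return -sum(s / total * math.log2(s / total) for s in class_sums.values() if s > 0)

def class_sums(D, classes):
    return {c: sum(mu for mu, x in D if x["class"] == c) for c in classes}

def split(D, attr, fuzzy_sets):
    """Step 3.2: membership in D_j is the product mu_D(x) * F_attr,j(x[attr])."""
    return {j: [(mu * F.get(x[attr], 0.0), x) for mu, x in D]
            for j, F in fuzzy_sets[attr].items()}

def gain(D, attr, fuzzy_sets, classes):
    """G(A, D) = I(D) - E(A, D); the weights are the normalized subset sizes."""
    subsets = split(D, attr, fuzzy_sets)
    sizes = {j: sum(mu for mu, _ in Dj) for j, Dj in subsets.items()}
    total = sum(sizes.values())
    if total == 0:
        return 0.0
    expected = sum(sizes[j] / total * info(class_sums(Dj, classes))
                   for j, Dj in subsets.items() if sizes[j] > 0)
    return info(class_sums(D, classes)) - expected

def build(D, attrs, fuzzy_sets, classes, theta_r=0.9, theta_n=1.0):
    size = sum(mu for mu, _ in D)
    if size == 0:
        return {}
    sums = class_sums(D, classes)
    # Step 2: leaf if the largest class proportion reaches theta_r, the data set
    # is smaller than theta_n, or no attributes are left to test.
    if not attrs or size < theta_n or max(sums.values()) / size >= theta_r:
        return {c: s / size for c, s in sums.items()}   # every class with its certainty
    # Step 3: choose the attribute with the largest information gain and recurse.
    best = max(attrs, key=lambda a: gain(D, a, fuzzy_sets, classes))
    rest = [a for a in attrs if a != best]
    return {"test": best,
            "edges": {j: build(Dj, rest, fuzzy_sets, classes, theta_r, theta_n)
                      for j, Dj in split(D, best, fuzzy_sets).items()}}

# Hypothetical usage: tree = build(D, ["height", "weight", "hair color"], fuzzy_sets, ["C1", "C2"])
```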

The information gain G(Ai, D) for the attribute Ai over a fuzzy set of data D is defined by

    G(A_i, D) = I(D) - E(A_i, D),                              (1)

where

    I(D) = - \sum_{k=1}^{n} p_k \log_2 p_k,                    (2)

    E(A_i, D) = \sum_{j=1}^{m} p_{ij} \cdot I(D_{F_ij}),       (3)

    p_k = |D^{C_k}| / |D|,                                     (4)

    p_{ij} = |D_{F_ij}| / \sum_{j=1}^{m} |D_{F_ij}|.           (5)

As for assigning the class name to a leaf node, we propose three methods:

(a) The node is assigned the class name that has the greatest membership value; that is, the data of the other classes are ignored.

(b) If condition (1) in step 2 of the algorithm holds, do the same as method (a). If not, the node is considered to be empty; that is, the data are ignored.

(c) The node is assigned all class names with their membership values; that is, all data are taken into account.

Now we illustrate one cycle of the algorithm. At the beginning of some cycle, we have a fuzzy set of data D:

    μ      height  weight  hair color  class
    1      160     60      blond       C1
    0.8    180     80      black       C2
    0.2    170     75      black       C2
    0.7    175     60      red         C1
    1      160     75      black       C2
    0.3    175     60      red         C2
    1      165     60      blond       C2
    0.5    180     70      blond       C1

Note that this includes inconsistent data, for example the fourth and sixth data. The fuzzy sets low, middle and high for the attribute height, the fuzzy sets light, middle and heavy for weight, and the fuzzy sets light and dark for hair color are defined as

    low    = {1/160, 0.8/165, 0.5/170, 0.2/175},
    middle = {0.5/165, 1/170, 0.5/175},
    high   = {0.2/165, 0.5/170, 0.8/175, 1/180},
    light  = {1/60, 0.8/65, 0.5/70, 0.2/75},
    middle = {0.5/65, 1/70, 0.5/75},
    heavy  = {0.2/65, 0.5/70, 0.8/75, 1/80},
    light  = {1/blond, 0.3/red},
    dark   = {0.6/red, 1/black}.

Note that we can also define fuzzy sets with continuous membership functions.

First, we calculate the information I(D). Since we have |D| = 5.5, |D^{C1}| = 2.2 and |D^{C2}| = 3.3, we obtain

    I(D) = -(2.2/5.5) \log_2 (2.2/5.5) - (3.3/5.5) \log_2 (3.3/5.5) = 0.971 (bits).

Next, we calculate the expected information for all the Ai. For height, using step 3.2 of the algorithm, we have the fuzzy sets of data D_height,low, D_height,middle and D_height,high:

    low    middle  high   height  weight  hair color  class
    1      0       0      160     60      blond       C1
    0      0       0.8    180     80      black       C2
    0.1    0.2     0.1    170     75      black       C2
    0.14   0.35    0.56   175     60      red         C1
    1      0       0      160     75      black       C2
    0.06   0.15    0.24   175     60      red         C2
    0.8    0.5     0.2    165     60      blond       C2
    0      0       0.5    180     70      blond       C1

Each membership value is calculated as the product of μ in D and the membership value of the fuzzy set low, middle or high for the value of height in D.
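The worked example above can be reproduced mechanically. The following short Python sketch is our own check (not the authors' code); it recomputes |D^{C1}| = 2.2, |D^{C2}| = 3.3, I(D), and the rows of the D_height table from the data and fuzzy sets just given.

```python
import math

# The fuzzy sets for height and the data set D as defined above.
low    = {160: 1.0, 165: 0.8, 170: 0.5, 175: 0.2}
middle = {165: 0.5, 170: 1.0, 175: 0.5}
high   = {165: 0.2, 170: 0.5, 175: 0.8, 180: 1.0}

# (mu, height, weight, hair color, class) for the eight data in D.
data = [
    (1.0, 160, 60, "blond", "C1"), (0.8, 180, 80, "black", "C2"),
    (0.2, 170, 75, "black", "C2"), (0.7, 175, 60, "red",   "C1"),
    (1.0, 160, 75, "black", "C2"), (0.3, 175, 60, "red",   "C2"),
    (1.0, 165, 60, "blond", "C2"), (0.5, 180, 70, "blond", "C1"),
]

# |D^{C1}| = 2.2, |D^{C2}| = 3.3, |D| = 5.5 and I(D) = 0.971 bits.
sums = {c: sum(mu for mu, *_, cls in data if cls == c) for c in ("C1", "C2")}
total = sum(sums.values())
I_D = -sum(s / total * math.log2(s / total) for s in sums.values())
print({c: round(s, 2) for c, s in sums.items()}, round(I_D, 3))

# Step 3.2 for height: each row of the D_height table is mu * F(height).
for mu, h, w, hair, cls in data:
    print([round(mu * F.get(h, 0.0), 2) for F in (low, middle, high)], h, w, hair, cls)
```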

Then for low, we have |D_height,low| = 3.1, |D^{C1}_height,low| = 1.14 and |D^{C2}_height,low| = 1.96, so that

    I(D_height,low) = -(1.14/3.1) \log_2 (1.14/3.1) - (1.96/3.1) \log_2 (1.96/3.1) = 0.949 (bits).

For middle, we have |D_height,middle| = 1.2, |D^{C1}_height,middle| = 0.35 and |D^{C2}_height,middle| = 0.85, and I(D_height,middle) = 0.871 (bits).

For high, we have |D_height,high| = 2.4, |D^{C1}_height,high| = 1.06 and |D^{C2}_height,high| = 1.34, and I(D_height,high) = 0.990 (bits).

Now we can calculate the expected information after testing by height as

    E(height, D) = (3.1/6.7) × 0.949 + (1.2/6.7) × 0.871 + (2.4/6.7) × 0.990 = 0.950 (bits).

Thus we have the information gain for the attribute height as

    G(height, D) = I(D) - E(height, D) = 0.971 - 0.950 = 0.021 (bits).

By a similar analysis for weight and hair color, we obtain

    G(weight, D) = 0.118,
    G(hair color, D) = 0.164.

Since we select the attribute that maximizes the gain, we take hair color as the test attribute. Now we have the subtree shown in Fig. 2.

[Fig. 2. Generated subtree. The root tests hair color; the edge light leads to D1 = {1/(160, 60, blond, C1), 0.21/(175, 60, red, C1), 0.09/(175, 60, red, C2), 1/(165, 60, blond, C2), 0.5/(180, 70, blond, C1)} and the edge dark leads to D2 = {0.8/(180, 80, black, C2), 0.2/(170, 75, black, C2), 0.42/(175, 60, red, C1), 1/(160, 75, black, C2), 0.18/(175, 60, red, C2)}.]

For the fuzzy sets of data D1 and D2, we apply the same process until the leaf condition (1), (2) or (3) in step 2 of the algorithm holds. For these data, we obtain the fuzzy decision tree shown in Fig. 1.

The fuzzy decision trees generated by this algorithm are very easy to understand, because an attribute appears at most once on any path from the root to a leaf node, and the test for an attribute is the same even when it appears at different nodes, i.e., branching is always by the fuzzy sets defined by the user.
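The expected information and the gain quoted above can be re-derived with a few lines of Python. This is our own check (not from the paper), using only the subset sizes and per-class membership sums computed in the example.

```python
import math

def info(class_sums):
    total = sum(class_sums)
    return -sum(s / total * math.log2(s / total) for s in class_sums if s > 0)

# Per-class membership sums [C1, C2] of the three fuzzy subsets for height,
# taken from the figures above (|D_height,low| = 3.1, |D_height,middle| = 1.2,
# |D_height,high| = 2.4, so the total is 6.7).
subsets = {"low": [1.14, 1.96], "middle": [0.35, 0.85], "high": [1.06, 1.34]}
sizes = {j: sum(v) for j, v in subsets.items()}
total = sum(sizes.values())

E_height = sum(sizes[j] / total * info(v) for j, v in subsets.items())
gain_height = info([2.2, 3.3]) - E_height            # I(D) = 0.971 bits
print(f"E = {E_height:.3f} bits, G = {gain_height:.3f} bits")   # E = 0.950, G = 0.021
```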
3. Inference by Fuzzy Decision Tree

Inference in an ordinary decision tree is executed by starting from the root node and repeatedly testing the attribute at the current node and branching along the edge given by its value until a leaf node is reached; the class attached to that leaf is the result.

On the other hand, in a fuzzy decision tree we may have to branch along more than one edge, each with a certainty. An example is shown in Fig. 3, where Ai stands for the attribute to be tested, Fij on an edge is a fuzzy set, and the numerical value on an edge is the membership value of Fij for the value of the attribute.
[Fig. 3. Fuzzy reasoning in a fuzzy decision tree. The membership values along each path and the class certainties at the leaves are multiplied, the products for the same class are added over all paths, and the example yields C1 = 0.69 and C2 = 0.31.]

Now we must decide three operations. First, as the operation to aggregate the membership values along a path of edges, we adopt multiplication from among many alternatives. Second, as the operation combining the total membership value of the path of edges with the certainty of the class attached to the leaf node, we also adopt multiplication. Finally, as the operation to aggregate the certainties of the same class obtained from different paths of edges, we adopt addition from among several alternatives. When the total certainty value exceeds unity, we can normalize the values. In Fig. 3, we obtain the results C1 with 0.69 and C2 with 0.31.

We formulated this inference method anew. However, we have found that it is essentially the same as a reasoning method for fuzzy rules in fuzzy control, the so-called ×-×-+ method.
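The inference just described can be sketched as follows. This is our own illustration, not the authors' code; it reuses the hypothetical tree representation from the earlier building sketch, in which internal nodes carry a tested attribute and fuzzy-labeled edges, and leaves map class names to certainties.

```python
def classify(node, x, fuzzy_sets, path_mu=1.0, result=None):
    """Multiply memberships along each path and the leaf certainties (×-×),
    then add the contributions of the same class over all paths (+)."""
    result = {} if result is None else result
    if "test" not in node:                       # leaf node
        for c, certainty in node.items():
            result[c] = result.get(c, 0.0) + path_mu * certainty
        return result
    attr = node["test"]
    for j, child in node["edges"].items():       # branch along every matching edge
        mu = fuzzy_sets[attr][j].get(x[attr], 0.0)
        if mu > 0.0:
            classify(child, x, fuzzy_sets, path_mu * mu, result)
    return result

# Hypothetical usage: classify(tree, {"hair color": "red", "height": 175, "weight": 60}, fuzzy_sets)
# If the summed certainties exceed 1, they can be normalized as noted above.
```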
4. Application to Diagnosis

We apply our algorithms to diagnosis of potential transformers, which contain oil [6]. When a part of a transformer is damaged, several kinds of gases are generated and dissolved in the oil. A small amount of the oil is then sampled and analyzed. Thus we obtain the amounts of 10 gases: CH4, C2H6, C2H4, C3H8, C3H6, C2H2, H2, CO, CO2 and other gases (we treat the other gases as one kind of gas).

Several potential transformers fail and have their causes examined every year, and we now have 220 data that have already been checked. We use two classifications of causes, one rough (4 classes) and the other detailed (17 classes). Note that the checked transformers varied in size, maker, usage conditions and degree of checking; in fact, several data are inconsistent with each other.

We divide the data in half: one half is used for generating a decision tree and the other half for checking it. We apply our algorithm to these data with fuzzy sets on each gas, which were defined by experts and are shown in Fig. 4, to generate two fuzzy decision trees, one for the rough causes and one for the detailed causes.

Then we check the results of the three methods (a), (b) and (c) above. The results for the rough causes are shown in Table 1 and those for the detailed causes in Table 2, where the threshold θr ranges from 0.6 to 1 in steps of 0.1 and θn is fixed at 1. In Table 2, (a-n) means checking the inferred classes with the n greatest certainties (and similarly for (b-n) and (c-n)).

We obtain the best result with method (c). The best correction rates are 87.0% for the rough causes, checking the inferred class with the greatest certainty, and 63.5% for the detailed causes, checking the inferred classes with the 3 greatest certainties.
[Fig. 4. Fuzzy sets for each gas, defined by experts.]

[Table 1. Correction rates for rough causes, for θr from 0.6 to 1.0.]

Table 2. Correction rates for detailed causes

           θr = 0.6   0.7     0.8     0.9     1.0
    (a-1)   34.7%    32.2%   28.7%   24.4%   18.3%
    (a-2)   40.9%    38.3%   33.9%   27.8%   20.0%
    (a-3)   40.9%    38.3%   34.8%   28.7%   20.9%
    (b-1)   41.7%    41.7%   40.9%   41.7%   41.7%
    (b-2)   52.2%    51.3%   49.6%   50.4%   49.6%
    (b-3)   54.8%    53.9%   52.2%   53.0%   52.2%
    (c-1)   39.1%    40.9%   40.9%   42.6%   44.4%
    (c-2)   55.7%    57.4%   54.8%   54.8%   56.5%
    (c-3)   62.6%    63.5%   62.6%   63.5%   63.5%

5. Conclusion

We proposed a new algorithm to generate a fuzzy decision tree from numerical data using fuzzy sets defined by a user. Next, we formulated an inference method for such a fuzzy decision tree. Finally, we applied them to diagnosis of potential transformers by analyzing the gas in their oil.

Since we can easily transform a fuzzy decision tree into a set of fuzzy rules, this can also be regarded as a method to generate fuzzy rules from a set of numerical data. However, it is not so good with respect to the number of fuzzy rules. We have already formulated another, better rule generation method based on the fuzzy ID3 algorithm, which will be described in a forthcoming paper.

References

[1] J.R. Quinlan (1979): Discovering Rules by Induction from Large Collections of Examples, in D. Michie (ed.): Expert Systems in the Micro Electronic Age, Edinburgh University Press.

[2] J.R. Quinlan (1986): Induction of Decision Trees, Machine Learning, Vol. 1, pp. 81-106.

[3] T. Tani and M. Sakoda (1991): Fuzzy Oriented Expert System to Determine Heater Outlet Temperature Applying Machine Learning, 7th Fuzzy System Symposium (Japan Society for Fuzzy Theory and Systems), pp. 659-662 (in Japanese).

[4] S. Sakurai and D. Araki (1992): Application of Fuzzy Theory to Knowledge Acquisition, 15th Intelligent System Symposium (Society of Instrument and Control Engineers), pp. 169-174 (in Japanese).

[5] H. Ichihashi (1993): Tuning Fuzzy Rules by Neuro-Like Approach, Journal of Japan Society for Fuzzy Theory and Systems, Vol. 5, No. 2, pp. 191-203 (in Japanese).

[6] F. Kawachi and T. Matsuura (1990): Development of Expert System for Diagnosis by Gas in Oil and Its Evaluation in Practice Usage, Technical Meeting on Electrical Insulation Material (The Institute of Electrical Engineers of Japan), EIM-90-40 (in Japanese).


