Expert Systems with Applications 15 (1998) 357–366

The revision of inductive learning theory within incomplete and imprecise observations 1
A.M. Martínez-Enríquez a,*, G. Escalada-Imaz b

a Centro de Investigación y de Estudios Avanzados, CINVESTAV-IPN, Av. Instituto Politécnico Nacional No. 2508, CP 07000 Mexico City, Mexico
b Institut d'Investigació en Intel·ligència Artificial, IIIA-CSIC, Campus Universitat Autònoma de Barcelona, 08193 Bellaterra, Spain

Abstract

The system described here is concerned with the revision of inductive learning theory, i.e. the induced concept representation is updated and extended by taking into account new observations, as yet unlearned by the system. One of the aims of this paper is to show that it is feasible to classify observations that have a great number of unknown values (incomplete information), and to estimate those values by pertinently exploiting the known ones. Imprecise information is also envisaged. The system works in two stages. Stage One applies a non-incremental unsupervised induction method, a learning cycle made up of three steps: (i) verification of the input information; (ii) classification of the observations; (iii) organization of the information into hierarchies of clusters, according to concept formation. Stage Two is an incremental unsupervised induction method, which classifies new observations and allows for the later inclusion of currently unknown values. In both stages, the system explores the elements of each cluster in order to discover their relationships. Another interesting point of our proposal is that the algorithms involved have polynomial complexity, so that large databases can be handled. The whole mechanism is illustrated throughout this paper by reference to experimental results obtained on the Periodic Table of Elements, PTE. © 1998 Published by Elsevier Science Ltd. All rights reserved.

Keywords: Inductive learning theory; Concept formation; Conceptual clustering; Algorithms

1. Introduction

The system described here is related to the revision of inductive learning theory (Charoux et al., 1996), i.e. the induced concept representation is updated and extended by taking into account observations as yet unlearned by the system. One of the aims of this paper is to show that it is feasible to classify adequately observations that have a great number of unknown values, as well as to allow for imprecise information. To achieve this purpose, a method is proposed, and implemented in a system, in which non-incremental and incremental unsupervised induction learning are well integrated.

Systems for inductive learning can be classified according either to the nature of the input information or to the way the systems carry out the learning process. Regarding the former, there are two types of learning: unsupervised and supervised.
1 This paper was developed through a collaboration between IIIA-CSIC and CINVESTAV-IPN. Financing was provided by CSIC (Spain) and CONACyT (Mexico).
* Corresponding author. Tel.: 52-5-7477000/01, ext. 334; fax: 52-5-7477002; e-mail: ammartin@mail.cinvestav.mx.

In unsupervised learning, each observation within a set is described by the same set of attributes; this forms the nature of the input information. The result is a set of clusters, each defined by a different, particular concept. By contrast, in supervised learning the clusters are furnished ready-made to the system, and the system's only function is to classify observations added later and to incorporate each of them into the relevant cluster.

There are two ways in which a system may perform the learning process: non-incremental and incremental. In the non-incremental way, the system deals with a finite set of observations (Michalski, 1980; Michalski and Stepp, 1983a, 1983b; Mizoguchi and Shimura, 1980). Incremental systems, on the other hand, deal with one observation at a time, e.g. ADECLUS (Decaestecker, 1993), COBWEB (Fisher, 1987), CLASSIT (Gennari et al., 1989), UNIMEM (Lebowitz, 1987); the learning process alters some of the clusters each time it handles a new observation. In non-incremental systems it is possible to determine the complexity of the algorithms, which is useful for solving real applications; however, to consider a new observation, such a system must process the whole set of observations again. By contrast, incremental systems can deal with individual new observations at any time. Usually, the number of known


observations is greater at the beginning of the process, and each new observation that arrives increases the execution time and memory space required for its processing. It is therefore difficult to determine the complexity of the algorithms in incremental systems.

In the real world, it is necessary to organize, classify, and infer concepts when there is a large set of observations similar to one another. It is therefore more convenient and suitable to process the whole set of known observations first, using a non-incremental method. However, it is fundamental to recognize that the real world is dynamic and ever changing; a good learning system must thus include heuristics to improve and update its knowledge. For these reasons, the system presented in this paper combines both strategies.

The treatment performed by the learning system presented here can be briefly summarized as follows. A finite set of observations, hereafter referred to as V, is initially partitioned by a non-incremental strategy, producing a set of clusters S. Each individual cluster, hereafter referred to as C, is characterized by a concept P(C), and all the clusters are organized into a hierarchy. When a new observation O arrives, the current clusters, their corresponding concepts, and the hierarchical structure are updated according to this new information. A new observation is thus integrated into the hierarchy following a top-down path-testing strategy. To place the observation into one of the successors of a cluster node, its features do not need to match all the features of a particular cluster (subcluster); rather, the system only checks whether the values of some relevant attributes of the concept are similar to the description of the new observation, and chooses the best location (node) for that information. When the system cannot place the new observation into an already established cluster, it repeats the learning cycle, modifying the hierarchical tree if necessary, acting as an incremental learning system (Decaestecker, 1993; Fisher, 1987; Lebowitz, 1987).

The rest of the paper is organized as follows. Section 2 outlines the non-incremental process. Section 3 details the adapted incremental learning that uses concept formation. Section 4 explains the different measures used to evaluate the classification of incompletely defined observations, and explains how to forecast the unknown values; it also describes the algorithms that were used, as well as their complexity. Section 5 analyzes the results obtained in one of the application domains: the Periodic Table of Elements, PTE. Section 6 provides some conclusions and suggestions for future research.

2. Non-incremental learning

A typical case within non-incremental unsupervised learning is defined (Michalski, 1980) as follows.

Given: a set of observations V = {O1, ..., On}, a set of attributes D = {A1, ..., Ar}, and a descriptive table of the observations, V|D.
Goal: to partition the given set of observations into S = {C1, ..., Ci, ..., Cc}, and to discover for each cluster Ci some general features (a concept P(Ci)) shared by its elements.

To achieve this goal, the following tasks are executed in sequence: verification of the input information, classification of the observations, and concept formation.

Throughout the verification process, redundant attributes are eliminated, since they might unbalance the outcome of the classification process. Two simple cases of redundant attributes are: (a) similar attributes, when the values of an attribute Ak can be computed by a defined function of another attribute Ai referred to by a different name, i.e. Ai ≠ Ak and, for all O ∈ V, Ak(O) = f(Ai(O)); (b) constant attributes, when an attribute A always takes the same value for all the observations, i.e. A(O) = constant for all O ∈ V. The verification algorithm is simple, and is explained in Martínez-Enríquez and Escalada-Imaz (1996).
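By way of illustration, the verification step can be sketched in a few lines of Python. The dict-based data layout and the function name are our own assumptions, and the sketch restricts "similar attributes" to the identity case, whereas the paper allows any defined function f:

def verify(table):
    """Return the attribute names that survive verification.

    table maps each observation name to a dict of attribute values,
    e.g. {"Carbon6": {"density": 2.62, ...}, ...} (assumed layout).
    """
    attrs = list(next(iter(table.values())).keys())
    kept = []
    for a in attrs:
        column = [obs[a] for obs in table.values()]
        # (b) constant attribute: A(O) is the same for every observation
        if len(set(map(repr, column))) == 1:
            continue
        # (a) similar attribute, identity case only: the column duplicates
        # an already kept attribute referred to by a different name
        if any(all(obs[a] == obs[b] for obs in table.values()) for b in kept):
            continue
        kept.append(a)
    return kept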

The classification process is based upon clustering techniques that employ (dis)similarity measures among the observations, first described in Anderberg (1973), Mizoguchi and Shimura (1980) and Vogel and Wong (1979). Our system considers four cases: (i) the similarity function is predetermined by the user; (ii) A(O) takes a unique and well-known value; (iii) the input information is incomplete, i.e. some A(O) are unknown, or imprecise, i.e. some A(O) are represented by an interval or a subset of values; (iv) A(O) has more than one value. The complexity of this algorithm is O(|D| · |V|²).

The type of an attribute can be either numerical or symbolic, and in both cases it may consist of one single value or multiple values. For instance, carbon is described as follows:

(Carbon6 (boiling point 4470) (melting point 4100) (density 2.62) (atomic weight 12.011) (crystal structure HEXAGONAL) (acid–base BASIC) (covalent radius 0.77) (atomic radius 0.91) (atomic volume 4.58) (ionization energy 11.26) (specific heat 0.71) (electronegativity 2.55) (heat of vaporization 355.799) (heat of fusion Nil) (electrical conductivity 6.1e-4) (thermal conductivity 1.29) (physical state SOLID) (oxidation states {±4, 2}) (electron structure {S2 P2}))

The aim of the concept formation process is to define each cluster by searching for properties shared by the elements belonging to the same cluster. In our proposal, concept formation follows the idea put forward by Michalski (1987, 1990), where a concept is made up of two components: (a) the Base Concept Representation (BCR), and (b) the Inferential Concept Interpretation (ICI) (Sections 3.1 and 3.2).

Some classic techniques of conceptual clustering (Michalski, 1980, 1983; Michalski and Stepp, 1983a, 1983b) are used to construct the BCR. The ICI tries to discover how observations belonging to the same cluster relate to one another; all these relationships are implicit in the input data. The clusters and their associated concepts are represented by a hierarchical organization, where the root represents the class and its concept, (C, P(C)), and the successor nodes are subclasses (sub-concepts), (sC, P(sC)), representing observations that are more alike. The complexity of this algorithm is O(|D|² · |V|²) (Martínez-Enríquez and Escalada-Imaz, 1996).

2.1. Global classification

The principle of the algorithm is simple. Each recursion is associated with a set of observations V and the table of dissimilarity distances DISSIM(i, j). The system selects the observation Ok with the maximum number of neighbors within a distance smaller than the maximum distance established by the user (Maximum_Dissimilarity). This observation Ok is designated as the cluster representative, and its neighbors form the cluster C. The remaining observations (V\C) are submitted to the same process until V becomes empty. The algorithm is as follows:

Global-Classification
Begin
  Read(V, D, V|D);
  For all Oi, Oj ∈ V, i ≠ j: DISSIM(i, j) := (1/r) · Σ_{k=1,...,r} d(Ak(Oi), Ak(Oj));
  Return(Class(V, V|D, ∅))
End

Class(V, V|D, S)
Begin
  If V = ∅ then Return(S);
  Select Ok ∈ V such that, for all Oi ∈ V:
    Card{Oj : DISSIM(k, j) ≤ Maximum_Dissimilarity} ≥ Card{Oj : DISSIM(i, j) ≤ Maximum_Dissimilarity};
  Determine C := {Oj : DISSIM(k, j) ≤ Maximum_Dissimilarity};
  Add S := S ∪ {C} and Remove V := V\C;
  Class(V, V|D, S)
End

Proposition. The complexity of the Global-Classification algorithm is O(|D| · |V|²), where the elementary operations are arithmetical operations, comparisons between variables, assignments of variables, and tests. Since the computational cost is low, this algorithm can be applied in real applications with large databases.

Proof (succinct). A summary of the complexity of the Global-Classification algorithm is as follows. Global-Classification is executed only once. The complexity of Read(V, D, V|D) is O(|D| · |V|). The computation of DISSIM(i, j), realized only once, costs O(|D| · |V|²), where the elementary operations are arithmetical. The cost of Return is constant. The complexity of Class is O(|D| · |V|²), since the maximum number of executions of Class is |V| and the cost of each instruction is: the first instruction is constant; the second instruction costs O(|D| · |V|); the accumulated cost of the calls to Determine, Add and Remove over all Class calls is O(|V|).
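A compact Python rendering of the Global-Classification scheme may help; it assumes a precomputed dissimilarity table (a dict of dicts) and returns (representative, cluster) pairs. The names and data layout are illustrative assumptions, not the authors' code:

def global_classification(V, dissim, max_dissimilarity):
    """Greedy partition: repeatedly pick the observation with the most
    neighbours within max_dissimilarity, make it the cluster
    representative, and remove its cluster from V."""
    remaining = set(V)
    clusters = []
    while remaining:
        rep = max(remaining, key=lambda oi: sum(
            dissim[oi][oj] <= max_dissimilarity for oj in remaining))
        cluster = {oj for oj in remaining
                   if dissim[rep][oj] <= max_dissimilarity}
        clusters.append((rep, cluster))
        remaining -= cluster
    return clusters

Note that the naive max() rescan on every round makes this sketch somewhat looser than the paper's stated bound; the O(|D| · |V|²) claim presupposes counting neighbours incrementally rather than rescanning the table.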

2.2. Dissimilarity function, DISSIM

The DISSIM function satisfies the following properties:
(i) DISSIM(Oi, Oj) ∈ [0, 1];
(ii) DISSIM(Oi, Oi) = 0;
(iii) DISSIM(Oi, Oj) = DISSIM(Oj, Oi).

The dissimilarity function is calculated by:

DISSIM(Oi, Oj) = (1/r) · Σ_{k=1,...,r} d(Ak(Oi), Ak(Oj))

where d(Ak(Oi), Ak(Oj)) ∈ [0, 1] (a normalized Manhattan distance).
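The aggregate table can be precomputed once, as in the following sketch (the attribute-level distance d is passed in; the layout is assumed as above):

def dissim_table(table, d):
    """DISSIM(Oi, Oj) = (1/r) * sum over attributes of d(Ak(Oi), Ak(Oj)).

    Symmetry and a zero diagonal follow whenever d itself has them.
    """
    names = list(table)
    out = {}
    for oi in names:
        out[oi] = {}
        for oj in names:
            attrs = table[oi].keys()
            out[oi][oj] = sum(d(a, table[oi][a], table[oj][a])
                              for a in attrs) / len(attrs)
    return out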

If d(Ak(Oi), Ak(Oj)) = 0 (resp. 1), then Oi and Oj are similar (resp. dissimilar) with respect to attribute Ak. Four cases arise regarding the value of d(Ak(Oi), Ak(Oj)).

2.2.1. Particular cases: dissimilarity function predetermined by the user

2.2.1.1. By an expression. Let D(date) = {1, ..., 12}. The user can specify, for example:

If |date(Oi) − date(Oj)| ≤ 5 then d(date(Oi), date(Oj)) = |date(Oi) − date(Oj)|/5, else d(date(Oi), date(Oj)) = (12 − |date(Oi) − date(Oj)|)/5.

This expression yields 1/5 as the distance between two consecutive months (e.g. January and December).

2.2.1.2. Fixing a specific value. For colour-eyes ∈ {green, blue, black, brown}, the user may fix, for example:

d(colour-eyes(blue), colour-eyes(green)) = 0.3
d(colour-eyes(black), colour-eyes(brown)) = 0.3


2.2.2. Ideal cases: complete and precise information

When Ak(Oi) and Ak(Oj) are known and each one is single valued, the distance is a function of the attribute type:

Numerical: d(Ak(Oi), Ak(Oj)) = |Ak(Oi) − Ak(Oj)| / (Maximum(Ak) − Minimum(Ak))

Ordered: d(Ak(Oi), Ak(Oj)) = |P(Ak(Oi)) − P(Ak(Oj))| / (Card(D(Ak)) − 1)

where P(A(O)) = n if A(O) is the nth value of the scale of A, and Card(D(A)) is the cardinality of the domain D(A).

Symbolic: if Ak(Oi) = Ak(Oj) then d(Ak(Oi), Ak(Oj)) = 0, else d(Ak(Oi), Ak(Oj)) = 1.
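These three single-valued cases translate directly into code; in practice the ranges and scales would come from the descriptive table. A sketch with illustrative signatures:

def d_numeric(x, y, lo, hi):
    # normalized over the attribute's range [Minimum(Ak), Maximum(Ak)]
    return abs(x - y) / (hi - lo)

def d_ordered(x, y, scale):
    # positions on an ordered scale, e.g. ("low", "medium", "high")
    return abs(scale.index(x) - scale.index(y)) / (len(scale) - 1)

def d_symbolic(x, y):
    return 0.0 if x == y else 1.0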

2.2.3. Real cases

2.2.3.1. Incomplete information. If some A(O) are not yet known, those values are computed by:
1. Similar attributes: when Ak has other attributes referred to by different names, the unknown values are computed from those similar attributes, i.e. Ai ≠ Ak and, for all O ∈ V, Ak(O) = f(Ai(O)).
2. An expression: if t(A) = Numeric, the unknown values of A are computed by arithmetic or geometric average expressions.
3. Otherwise, if the two previous strategies cannot be applied, the system considers Oi and Oj to be dissimilar: d(Ak(Oi), Ak(Oj)) = 1.

2.2.3.2. Imprecise information. Sometimes it is difficult to fix a single attribute value for a particular observation. Hence A(O) is represented by an interval or a subset of values of the domain D(A); the distance between Oi and Oj is then:

Numeric: d(A(Oi), A(Oj)) = |Average(A(Oi)) − Average(A(Oj))| / (Maximum(A) − Minimum(A))

Symbolic: d(A(Oi), A(Oj)) = 1 − Card(A(Oi) ∩ A(Oj)) / Maximum{Card(A(Oi)), Card(A(Oj))}
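Both imprecise distances can be sketched the same way; d_set also covers the multiple-valued case of Section 2.2.4 below and reproduces its chlorine–fluorine computation (function names and signatures are ours):

def d_interval(xs, ys, lo, hi):
    # imprecise numeric values: compare the averages of the two
    # intervals (or value subsets), normalized over the domain range
    avg = lambda v: sum(v) / len(v)
    return abs(avg(xs) - avg(ys)) / (hi - lo)

def d_set(xs, ys):
    # imprecise or multiple-valued symbolic attributes
    return 1.0 - len(set(xs) & set(ys)) / max(len(xs), len(ys))

# chlorine's valence {-1, 1, 3, 5, 7} against fluorine's {-1}:
# d_set gives 1 - 1/5 = 0.8, as in the example of Section 2.2.4.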

2.2.4. The multiple-valued cases

When an observation has more than one value for a given attribute, the distance with respect to this attribute is calculated by the previous expression (Section 2.2.3.2). For instance, the valence of chlorine is {±1, 3, 5, 7} and that of fluorine is {−1}; hence the distance between chlorine and fluorine is 1 − 1/5 = 0.8. The bias introduced by these distances is minimized in practice, because the computed distance increases as the difference between the attribute values increases.

In the classification process three scenarios may occur: (a) Some observations are elements of a well-defined cluster; in such a case the system being described accomplishes concept formation (Sections 3.1 and 3.2). (b) An observation with a complete description does not appear to match any other observation, which leads to the creation of a new cluster containing one single element (Section 4). (c) Some observations do not contain sufficient information, because of unknown values (Section 4); these observations cannot be classified solely by the (dis)similarity function against complete information (i.e. taking into account all attributes). In such a case, those observations are first stored. Once the partition is defined and concept formation is established, the stored observations with incomplete information, and any new observations, are then processed (Section 3.3). This situation reflects the second stage of our system: the incremental method.

3. Adapted incremental classification

In the second stage, i.e. the incremental method, the problem is defined as follows.

Given: a partition S = {C1, ..., Ci, ..., Cc} and the concept P(Ci) associated with each cluster Ci.
Goal: First, classify a new observation O described by O|D, where D is the same set of attributes used to describe the initial set of observations V. Second, update the concept P(C), which at the same time will uncover new relationships among the elements of C ∪ {O}. Third, approximate the unknown values of O.

The main purpose is to improve the classification of observations that have incomplete information. For instance francium (Fr), owing to its very short half-life (22 min), has only 11 known attribute values out of the 19 describing the whole set of observations V. Thus, it is difficult to classify Fr simply by a similarity function measured against other elements that have complete information. The underlying idea is therefore to classify Fr not only by the similarity function, but also by taking into account the likeness of Fr to a given concept P(C). The strategy carried out by this system is first, to measure


the similarity between the description of the new observation and that of the representative of each cluster (as detailed in Section 2.2). Second, if these similarity distances are greater than the Maximum_Dissimilarity, the system searches for the concept P(C) most similar to the description O|D, taking into account only the known values (Section 3.3). Third, the new observation is incorporated into the cluster (concept P(C)) most similar to it (Section 4). Finally, if the new observation forms sequences (increasing or decreasing series and compound symbol–numerical attributes) with the members of a particular cluster (Section 3.1), the system always predicts the unknown values by searching for some correlation with any of the already established sequences. This kind of prediction task follows a different heuristic from the one used by probabilistic learning systems (Cheesman et al., 1989). In order to find the concept P(C) most similar to the new observation, our system begins with the most restrictive requirement or heuristic used to infer the concept representation.

3.1. Inferential concept interpretation, ICI

The main purpose of the ICI is to explain the relationships between attributes, relationships which are implicit in the input data. Some suitable heuristics carried out by the proposed method are the following.

3.1.1. Ordered sequences, whether the set of attributes uses numbers or symbols

If there are k ≥ 2 attributes {A1, ..., Ak} which increase (decrease) according to the same sequences Sj = (O1, ..., Os), s ≥ 3, Sj ⊆ C, j ≥ 1, then the concept P(C) is represented by:

(S1, ..., Sj, ({A1, ..., Ai}, I), ({Ai+1, ..., Ak}, D)) ∈ P(C)

This means that the s-tuple (A(O1), ..., A(Os)), for every attribute A in {A1, ..., Ai}, forms a monotonically increasing sequence (and a decreasing one for each attribute in {Ai+1, ..., Ak}). Hence, if a new observation O can be included in a non-varying ordered sequence which describes a cluster, it is possible that this new observation will become a member of that cluster. For example, the concepts P(C4) and P(C9) contain:

((Cs, Rb, K, Na, Li), ({1, 2, 10, 11, 12, 13, 14}, I), ({4, 7, 8, 9}, D)) ∈ P(C4)
((Cs, Rb, Li, K, Na), ({15, 16}, I)) ∈ P(C4)
((Ca, Sr, Ba), ({3, 4, 7, 8, 9}, I), ({2, 10, 11, 12, 13, 14, 15, 16}, D)) ∈ P(C9)

where (Cs, Rb, K, Na, Li) and (Cs, Rb, Li, K, Na) are the series following the increasing and decreasing sequences, respectively, over those sets of attributes.

Assume now that francium (Fr) is the new observation. As mentioned before, it is described by only 11 of the 19 attributes used by the whole set of observations V; Fr is thus undefined for the following eight attributes: {3, 7, 8, 9, 10, 11, 13, 14}. When our system tries to include Fr in concepts P(C4) or P(C9), the sequences shown in Table 1 are observed. In addition, the system computes the correlation between the different attributes in order to confirm each sequence, as shown in Table 1. Since the range of correlation for C9 is greater than the range for C4, Fr seems to be more similar to C9 than to C4.

Table 1. Ordered sequences followed by francium
Class | Sequence | Attributes | Range of correlation
C4 | (Fr, Cs, Rb, K, Na, Li) | ({2, 12}, I) ({4}, D) | [0.85, 0.98]
C4 | (Fr, Cs, Rb, Li, K, Na) | ({15, 16}, I) | 0.95
C9 | (Ca, Sr, Ba, Fr) | ({2, 12, 15, 16}, D) ({4}, I) | [0.92, 0.99]

Given that the correlation coefficient determines the degree of linear dependence between two variables (X, Y), it is possible to calculate the unknown values by using the coefficient of correlation between attributes: if the coefficient of correlation is almost ±1, then Y = aX + b; when 0 < r ≤ 1 (resp. −1 ≤ r < 0), every positive increase in X produces a positive increase (resp. decrease) in Y. Since other sequences exist relating to the same series, for example sequences in which Fr does not appear because of its unknown values, the system quickly predicts some of those values simply by positioning the new observation in the sequences. Table 2 shows some of the values predicted from the sequences of C4 and C9.
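The two mechanisms used here, monotone-sequence detection and prediction through a strongly correlated attribute, can be sketched with NumPy; this is a simplification of the paper's heuristics, with our own names and data layout:

import numpy as np

def monotone_attributes(rows):
    """rows[i][k] = value of attribute k for the ith observation of an
    ordered sequence; returns {k: "I"} or {k: "D"} for attributes that
    strictly increase or decrease along the whole sequence."""
    out = {}
    for k, col in enumerate(np.asarray(rows, dtype=float).T):
        diffs = np.diff(col)
        if np.all(diffs > 0):
            out[k] = "I"
        elif np.all(diffs < 0):
            out[k] = "D"
    return out

def predict_by_correlation(x_known, y_known, x_new):
    """If corr(X, Y) is almost +/-1, fit Y = aX + b on the cluster's
    members and read off the unknown value of the new observation."""
    a, b = np.polyfit(x_known, y_known, 1)
    return a * x_new + b

For francium, x_known could hold the boiling points of the cluster's members and y_known an attribute unknown for Fr; the fitted line plays the role of Y = aX + b above.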

Table 2. Predicted values of Fr, using the ordered sequences
Class | 3 Density | 7 Covalent radius | 8 Atomic radius | 9 Atomic volume | 10 Ionization energy | 11 Specific heat | 13 Heat of vaporization | 14 Heat of fusion
C4 | 1.9 | 2.35 | 3.3 | 71.1 | 5.4 | 0.24 | 67.7 | 2.1
C9 | 3.5 | 1.98 | 2.8 | 39.2 | 5.2 | 0.204 | 142 | 7.7


Table 3. Constant and sub-domain attributes
Class | 5 Crystal structure | 6 Acid–base | 17 Physical state | 18 Oxidation states | 19 Electron structure | Number of similar attributes
C9 | {CCEC, CCEF} | Basic | Solid | 2 | S2 | 2
C4 | {CCEC, CCEF} | Basic | {Liquid, Solid} | 1 | S1 | 5
Fr | CCCE | Basic | Liquid | 1 | S1 | —

3.1.2. Correlation between attributes

Let A and A′ be two attributes describing a new observation and the members of a cluster, and assume that the new observation belongs to the cluster. If the correlation between these attributes is almost ±1, and if one of them, A, is associated with any of the ordered sequences related to the cluster, then the other one, A′, supports the inclusion of the new observation in the cluster. For instance, Fr can be ordered as (Cs, Fr, Rb, K, Na, Li) by attribute 1 (the boiling point). The correlation between attributes 1 and 2 is 0.9867, and the correlation between attributes 1 and 12 is 0.84; both are almost ±1. As we see in Table 1, attributes 2 (melting point) and 12 (electronegativity) form increasing sequences. This means that the boiling point is taken to be the factor that decides whether or not this new observation should be incorporated into that given cluster.

3.1.3. Combining both symbol and number attributes

Sometimes an attribute value is constant over more than one cluster: A(Ci) = A(Cj) = ... = A(Ck) = constant, so any of those clusters can be characterized by this constant: (A, A(O)) ∈ P(Ci), ..., (A, A(O)) ∈ P(Ck). The system therefore joins all those clusters and searches for differences among them. Frequently, the constant values are symbolic and the differences are sought among numerical attributes; combining symbolic and numerical attributes is thus a way of defining a cluster. For instance, there are 74 solids among the 106 elements of the PTE, so (physical state, Solid) is a common feature found in more than one cluster. When the system observes this fact, it searches for differences among the clusters with solid elements. It discovers, for example, that cluster C10 = {Cu, Ag, Au} has the highest sub-domains in thermal and electrical conductivity in comparison with all of those clusters; hence this knowledge is included in the concept (i.e. good electrical and thermal conductors):

(Physical state, Solid)(Thermal conductivity, Maximum, [3.17, 4.26])(Electrical conductivity, Maximum, [0.452, 0.63]) ∈ P(C10)

Hence, when a new observation is included in a cluster, the system verifies that all the cluster's compound properties are preserved; otherwise the concept is reformulated. For instance, aluminum (Al13) is solid, has the next greatest values in thermal and electrical conductivity, and is similar to clusters C10 = {Ag, Cu, Au} and C11 = {Ga, In}. In that case, the system proves that aluminum maintains the compound knowledge of the concept P(C10) if Al is included in C10.
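A sketch of the compound-concept check just described, i.e. verifying that a cluster's (symbolic constant, numeric extreme) pair still holds after an insertion; the flat representation of compound features is an assumption of ours:

def compound_holds(cluster, descr, sym_attr, sym_value, num_attr, lb, ub):
    """True iff every member keeps the symbolic constant and every
    member's numeric value stays inside the extreme sub-domain [lb, ub];
    if this fails after adding a new member, the concept must be
    reformulated."""
    return all(descr[o][sym_attr] == sym_value and
               lb <= descr[o][num_attr] <= ub
               for o in cluster)

For aluminum and C10, the check would be run with sym_value = "SOLID" and the thermal-conductivity bounds [3.17, 4.26] quoted above.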

3.2. Base concept representation, BCR

For the BCR, the system uses the classical heuristics described in conceptual clustering (Michalski, 1983; Michalski and Stepp, 1983a, 1983b):

3.2.1. Constants

When each element of a cluster has the same attribute value, A(C) = constant, where A(C) is the set of values {A(O) : O ∈ C}, the concept P(C) includes the feature (A, A(O)), i.e. the value of attribute A for cluster C is A(O), where O ∈ C.

3.2.2. Sub-domain

When A(C) ⊂ D(A) (the domain of values for attribute A), the cluster is characterized by sub-domains, as follows. If the values of attribute A throughout cluster C are {v1, ..., vj} = A(C), the coded feature is (A, {v1, ..., vj}) ∈ P(C). For example, if D(physical-state) = {solid, gas, liquid, synthetically} and physical-state(C) = {solid, liquid}, then (physical-state, {solid, liquid}) ∈ P(C). When the system tries to classify an observation, it compares the concept P(C) with O|D: A(O) ∈ A(C) ⊂ D(A), such that (A, A(C)) ∈ P(C). For instance, Fr matches C9 in two of its attribute values and C4 in five, which makes Fr seem more similar to C4 (Table 3).
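The two BCR heuristics reduce to a few set operations per attribute; a minimal sketch, with the dict layout and names assumed as before:

def base_concept(cluster, descr, domains):
    """Collect constant and sub-domain features for one cluster.

    domains maps each attribute name to the set D(A) of its values.
    """
    concept = {}
    for attr, domain in domains.items():
        values = {descr[o][attr] for o in cluster}
        if len(values) == 1:
            concept[attr] = ("constant", next(iter(values)))
        elif values < domain:          # proper subset of D(A)
            concept[attr] = ("sub-domain", values)
    return concept

With D(physical-state) = {solid, gas, liquid, synthetically} and a cluster covering {solid, liquid}, the function records exactly the sub-domain feature of Section 3.2.2.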

3.3. Incremental classification algorithm

The general strategy is as follows: first, incorporate O into the root concept P(G); second, search for the sub-concept P(sC*) most similar to O. If there is no concept C* similar to O (i.e. the threshold is not attained), then create a new cluster C := {O}, with its own concept P(C). Otherwise select the most similar cluster C*, include O in C* and its associated subclasses sC*, update P(C*) as well as the corresponding concepts P(sC*), and forecast the unknown values.

3.3.1. Search for the most similar concept P(sC)

Each recursion of the algorithm is associated with a node of the tree G; each node represents a subclass sC, its concept P(sC), and the remaining attribute values D(sC) not reported in the concept of that subclass. When the description of the new observation O|D matches the concept P(sC), O|D and the grade of membership m(sC) are updated; the remainder of the description, O|D := O|D \ {A(O) : A(O) ∈ P(sC)}, is submitted to the same process until O|D becomes empty. Each step of the MatchO algorithm is attached to the definition of the concept of a cluster's subclass P(sC); thus, the system matches O|D against P(sC). The similarity between the new observation and the concept takes into account the following conditions.

3.3.1.1. Ordered sequences. If (Aj(O1), ..., Aj(Os)), for 1 ≤ j ≤ k, k ≥ 2, form increasing (decreasing) series for a sequence S = (O1, ..., Os), s ≥ 2, such that O ∈ (O1, ..., Os) ⊆ C, then the grade of membership m and the description O|D are updated:

mC(O) := mC(O) + k/r;  O|D := O|D \ {A1(O), ..., Ak(O)}

where r is the number of known attribute values of the new observation O.

3.3.1.2. Correlation between attributes. Let A and A′ be two attributes whose correlation is near ±1, where A′ is involved in a sequence S = (O1, ..., Os), s ≥ 2. If the new observation O agrees with (A′(O1), ..., A′(Os)), O ∈ (O1, ..., Os), then the grade of membership m and the description O|D are updated:

mC(O) := mC(O) + 1/r;  O|D := O|D \ A(O)

3.3.1.3. Combining symbolic and numerical attributes. When a new observation O|D matches a compound symbol–numeric characteristic included in P(C), the grade of membership increases for each Ai belonging to the compound attribute. Assume

{(A, A(C)), (A*, Min|Max, [lb, ub])} ∈ P(C)

where t(A) = Symbolic, t(A*) = Numeric, A(C) is a constant value such that A(C) = A(Cj) = ... = A(Ck), and [lb, ub] is the extreme sub-domain of A* over those clusters. If (A(O), A*(O)) of the new observation O match (A, A(C)), (A*, Min|Max, [lb, ub]), i.e. A(O) = A(C) and A*(O) ∈ [lb, ub], then

mC(O) := mC(O) + n/r;  O|D := O|D \ {A(O), A1*(O), ..., A(n−1)*(O)}

where n = 1 + Card{A* : A*(O) ∈ A*(C)}. In general, multiple couples (A, A(C)), (A*, Min|Max, [lb, ub]) can belong to a concept P(C).

3.3.1.4. Constant and sub-domain. For all A such that A(O) = A(C) = constant (resp. sub-domain, A(O) ∈ A(C) ⊂ D(A)), the grade of membership increases:

mC(O) := mC(O) + 1/r;  O|D := O|D \ A(O)

The algorithm searching for a similar concept can be expressed as follows:

Search-Concept(O|D, G)
Begin
  Score := ∅; r := |O|D|;
  For each node C ∈ G such that C is a successor of the root Do begin
    m := 0.0; MatchO(O|D, C, m);
    If m ≠ 0 then Score := Score ∪ (C, m)
  end;
  Return(maximum(Score))
End

MatchO(O|D, sC, ms)
Begin
  m := ms;
  If |sC| ≥ 2 then Begin
    (*search for ordered sequences*)
    If (Aj(O1), ..., Aj(Os)), for 1 ≤ j ≤ k, k ≥ 2, form increasing (decreasing) series for a sequence S = (O1, ..., Os), s ≥ 2, such that O ∈ (O1, ..., Os) ⊆ C,
      then begin m := m + k/r; O|D := O|D \ {A1(O), ..., Ak(O)} end;
    (*correlated attributes*)
    If A, A′ are two attributes whose correlation is almost ±1, where A′ is involved in a sequence S = (O1, ..., Os), s ≥ 2, and O agrees with (A′(O1), ..., A′(Os)), O ∈ (O1, ..., Os),
      then begin m := m + 1/r; O|D := O|D \ A(O) end
  end;
  (*combining symbolic and numerical attributes*)
  If (A, A(C)), (A*, Min|Max, [lb, ub]) ∈ P(sC), where t(A) = Symbolic, A(O) = A(C) = constant, t(A*) = Numeric, and A*(O) ∈ [lb, ub],
    then begin m := m + n/r; O|D := O|D \ {A(O), A1*(O), ..., A(n−1)*(O)} end;
  (*constant and sub-domain*)
  For all A ∈ D such that A(O) ∈ A(sC) ⊆ D(A)
    Do begin m := m + 1/r; O|D := O|D \ A(O) end;
  If O|D ≠ ∅ and m ≥ ms and Arc(sC) ≠ ∅ Then begin
    ms := m;
    While Arc(sC) ≠ ∅ and m ≥ ms
      Do begin sC′ := Successor(Arc(sC)); Arc(sC) := Arc(sC) \ sC′; MatchO(O|D, sC′, m) end
  end;
  ms := m
End


Proposition. The worst-case complexity of the Search-Concept algorithm is O(|D|² · |V|²).

Proof (succinct). The MatchO algorithm is executed at most 2|V| − 1 times, corresponding to the maximum number of nodes of a tree whose root is V and whose successors of each node are disjoint nodes. The cost per step is as follows: the cost of computing sequences is directly implied by the complexity of sorting |D| ordered lists, so this calculation has a global complexity of O(|D|² · |V|); the compound concept has a complexity of O(|D| · |V|); the computing cost of the constant and sub-domain concepts is O(|D| · |V|); and the accumulated cost of the instruction Score := Score ∪ (C, m) over all calls to MatchO is O(|V|).
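The descent itself can be pictured with a small recursive sketch. For brevity it explores every child and keeps the best score, whereas MatchO above prunes branches whose membership falls below the best found so far; the tree and scoring interfaces are assumptions of ours:

def search_concept(o_descr, node, match):
    """Return (score, node) for the subclass whose concept best matches
    the new observation; match(o_descr, node) plays MatchO's role."""
    best = (match(o_descr, node), node)
    for child in getattr(node, "children", []):
        candidate = search_concept(o_descr, child, match)
        if candidate[0] >= best[0]:
            best = candidate
    return best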

4. Membership function for classifying new observations

The grade of membership mC(O) represents how similar a new observation O is to a cluster's concept P(C). It is calculated by dividing the number of attributes of the new observation that agree with the concept, Ai(O) ∈ P(C), by the number r of known attributes describing the observation O|D:

mCk(O) = (number of Ai(O) ∈ P(Ck)) / r, where i = 1, ..., r

For instance, francium is described by 11 attributes, and its attribute values agree with the concept P(C4): five are constants or sub-domains, five follow ordered sequences, and the eleventh has a correlation of almost ±1 with two of the five ordered sequences. Thus the membership grade of Fr for C4 is 1. Likewise, seven of Fr's 11 known properties are similar to the defined concept of cluster C9 = {Sr, Ca, Ba}, so Fr's membership grade for this cluster is 0.63. Consequently, applying the principles of this approach, Fr is incorporated into C4 = {Li, Na, K, Rb, Cs}. In one of the trials, in which all elements of the PTE were processed in a non-incremental way, Fr was incorporated into a cluster whose elements have many unknown values, even though those observations are not in fact similar to Fr.

Nevertheless, not all observations can be processed, since some carry too little information; hence it is necessary

to establish a threshold on the quantity of information, I(O). The system thus requires that any new observation to be classified has more known attribute values than unknown ones. The quantity of information is measured by dividing the number r of known attribute values of the new observation by the number m of attributes describing the whole set of classified observations V:

I(O) = r/m

For instance, I(Fr) = 11/19 = 0.5789. The system uses I(O) ≥ 0.5 by default, but the user can change this value. When I(O) is below this threshold the classification process does not take place, e.g. berkelium Bk97 (7/19), californium Cf98 (8/19), einsteinium Es99 (6/19), fermium Fm100 (6/19), mendelevium Md101 (6/19), nobelium No102 (6/19), lawrencium Lw103 (4/19).

4.1. Validity of the classification

In order to validate the classification of the new observation, the system multiplies the quantity of information I(O) by the grade of membership mCk(O); the product must be greater than or equal to 0.5. Otherwise the observation is considered to be isolated; in other words, a new cluster is made up of only one element:

Classification_validity(O) = I(O) × mCk(O) ≥ 0.5
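Putting the two measures together gives a one-line acceptance test; the francium figures quoted above make a convenient check (a sketch, not the authors' code):

def quantity_of_information(known, total):
    return known / total                     # I(O) = r / m

def classification_is_valid(i_o, membership):
    return i_o * membership >= 0.5           # I(O) x mCk(O) >= 0.5

# Francium: I = 11/19 = 0.5789 and mC4(Fr) = 1.0 -> 0.5789, accepted.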

For instance, hydrogen has 18 of 19 known attributes, so its quantity of information is I(H) = 0.947, and its grade of membership to the most similar cluster is mC8(H) = 0.5, with C8 = {N, O, F, S, Cl, Br}, making the classification validity of hydrogen equal to 0.473. Therefore, hydrogen cannot be incorporated into cluster C8; it can be regarded as the single element of a new cluster, or as an isolated observation. This agrees with the findings of chemistry. A similar case is astatine (At85), which has I(At) = 10/19 and mC8(At) = mC3(At) = 0.9, and mC10(At) = 8/10. However, it is possible to predict some unknown values of astatine using either the ordered sequences of these clusters, C3, C8 and C10, or the coefficients of correlation of the attributes; those predicted values are more or less similar to each other.

On the other hand, once a cluster has been formed, the system triggers the concept formation process. Thus, in the case of C13 = {H}, the system observes an interesting fact:

(physical state, Gas)(covalent radius, Minimum, 0.32)(heat of fusion, Minimum, 0.058) ∈ P(C13)

5. Analysis of results

This paper has presented only the results concerning the PTE, but the system has also been applied to the diagnosis of pneumonia (Martínez-Enríquez and Escalada-Imaz, 1994). By the non-incremental method, the system processed 84 of the 104 elements of the PTE.

Table 4. The partition resulting from a similarity function, and classification of new observations with unknown values
Partition resulting from a similarity function (non-incremental induction method) | Associated group of the PTE | New observations (incremental method): O_atomic number (I(O), mCk) | Some predicted unknown values (according to ordered sequences)
C1 {Tl Sc Y La Ti Zr Ce Pr Nd Sm Eu Gd Tb Dy Ho Er Tm Yb Lu} | IIIB, IIIA, IVA, lanthanum series | Am95 (0.842, 0.875); Pm61 (0.789, 0.8); Pa91 (0.684, 0.846); Th90 (0.947, 0.83); Np93 (0.789, 0.8); U92 (0.947, 0.722); Cm96 (0.63, 1.0); Ac89 (0.63, 1.0); Pu94 (0.894, 0.7); Ra88 (0.68, 0.846) | (11, [0.19, 0.27]) (14, [7,8,6]) (15, [0.0166, 0.035]) (8, [2.0, 2.7])
C2 {Kr Ne Ar Xe Rn} | VIIIB | He2 (0.684, 0.92) |
C3 {Be Mg (Zn Cd) Co (Sn) Se Te} | IIA, IIB, VIIIA, IVB, VIB | C6 (0.947, 0.888); S16 (0.947, 0.777); Po84 (0.8942, 0.87) |
C4 {Li Na K Rb Cs} | IA | Fr87 (0.5789, 1.0) |
C5 {Ni Pd Pt Rh Ir Si Ge Pb} | VIIIA, IVB | Th90 (0.947, 0.83); U92 (0.947, 0.888); Pu94 (0.894, 0.82); Np93 (0.789, 0.8); Pa91 (0.684, 0.846); Am95 (0.842, 0.62); Cm96 (0.63, 0.8) |
C6 {(V Nb) (Cr Mo) (Mn) (Fe)} | VA, VIA, VIIA, VIIIA (4th period) | |
C7 {(Hf) (Ta) (W) (Tc Re) (Ru Os)} | {IV, V, VI, VII, VIII}A (6th period) | |
C8 {N O F Cl Br} | VB, VIB, VIIB | He2 (0.684, 0.6); S16 (0.947, 0.66) |
C9 {Ca Sr Ba} | IIA | Ra88 (0.68, 0.92) | (7, 1.98) (8, 2.78) (11, 0.204) (13, 142) (14, 7.75) (15, 0.03)
C10 {B (P As Bi Sb) Hg I} | IIIB, VB, IIB, VIIB | Po84 (0.842, 0.75); S16 (0.947, 0.66) | (11, [0.12, 0.1]) (13, [77, 104]) (14, [11,19,8])
C11 {Cu Ag Au} | IB | Al13 (1.0, 0.6) | (13, 34.7)
C12 {Ga In} | IIIB | Al13 (1.0, 0.78) |

Five of those elements presented dissimilarity from the remaining elements beyond the established limit. These observations, {H, S, U, Th, C}, were tested using the adapted incremental method. Hydrogen could not be added to any cluster, since I(H) × mCk(H) is less than 0.5. The authors consider that the partition resulting from this system agrees with the concepts established in chemistry, e.g. the similarity between elements positioned diagonally, the transition between metals and non-metals (C10), and isolated elements such as hydrogen, or sulfur, whose grade of membership to the different clusters is low.

Table 4 shows those results. The first column presents the partition resulting from the similarity function, in a non-incremental unsupervised induction; the cluster representative observation is indicated in bold print. Clusters with only one element (hydrogen, astatine) are not displayed. The second column shows the groups and periods of the PTE which resemble the clusters. The third column indicates 12 observations with incomplete information, processed by the adapted incremental method. The observations in bold print (e.g. promethium Pm61) are each similar to only one cluster, though not necessarily the same cluster.


The other elements are similar to more than one cluster; the cluster having the greatest value of the membership function is chosen for each particular element, e.g. americium is incorporated into C1, since mC1(Am95) = 0.875 > mC5(Am95) = 0.62. The fourth column gives the predicted values in cases where an ordered sequence exists; it is also possible to calculate unknown values using the coefficient of correlation. Seven elements with information too scant to be processed do not appear, e.g. berkelium and californium.

6. Conclusions

This paper has presented non-incremental and incremental unsupervised induction procedures that are well integrated within one system. The non-incremental strategy can be effectively applied to process a large set of observations because of the low complexity of its algorithms. The incremental induction strategy uses the concept formation realized by the non-incremental induction in order to deal with observations that have incomplete information. In both cases, the system is able to discover important relationships implicit in the input data, which are useful for rapidly forecasting unknown values. In this way, it is possible to manage an unknown number of observations, as well as clusters not previously established.

The hierarchical organization resulting from the concept formation carried out by our system is similar to that of other systems. However, the concept formation of our system differs in its inferential concept interpretation (ICI), because the system is able to discover knowledge that was implicit in the input data (e.g. sequences, increasing or decreasing series, and compound symbol–numerical attributes). Furthermore, the system can be applied to a wide range of real-world problems exhibiting different features, such as impreciseness (observations with multiple values for the same attribute), incompleteness (unknown values), and large numbers of observations. At present, the system has been satisfactorily tested on two real-world problems: discovering the properties and relationships of chemical elements, and the diagnosis of pneumonia (Martínez-Enríquez and Escalada-Imaz, 1994).

Future work will increase the availability of statistical tools to predict and explore data, and will test other domains using large databases. In addition, some extensions can be developed for the system presented here. For example, it would be interesting to experiment with a similarity function between a given cluster of observations and a new single observation. Another extension could deal with the similarity between two given hierarchies.
Acknowledgements

The authors wish to thank the reviewers for their thorough reading, insightful comments, and suggested improvements.

References

Anderberg, M. (1973). Cluster analysis for applications. New York: Academic Press.
Charoux, B., Philipp, S., & Cocquerez, J.-P. (1996). Système de vision mettant en œuvre une coopération des segmentations guidée par l'interprétation. In Proceedings of the 10th RFIA'96, Congrès de Reconnaissance des Formes et Intelligence Artificielle (pp. 735–744). Rennes, France.
Cheesman, P., Kelly, J., Self, M., Stutz, J., Taylor, W., & Freeman, D. (1989). AutoClass: a Bayesian classification system. In Proceedings of the 5th International Workshop on Machine Learning (pp. 256–266). San Mateo, CA: Morgan Kaufmann.
Decaestecker, C. (1993). Apprentissage et outils statistiques en classification conceptuelle incrémentale. Revue d'Intelligence Artificielle, 7(1), 33–71.
Fisher, D. (1987). Knowledge acquisition via incremental conceptual clustering. Machine Learning, 2, 139–172.
Gennari, J., Langley, P., & Fisher, D. (1989). Models of incremental concept formation. Artificial Intelligence, 40, 11–62.
Lebowitz, M. (1987). Experiments with incremental concept formation: UNIMEM. Machine Learning, 2, 103–138.
Martínez-Enríquez, A., & Escalada-Imaz, G. (1994). An approach to unsupervised inductive learning. Report IIIA-CSIC.
Martínez-Enríquez, A.M., & Escalada-Imaz, G. (1996). Un système d'apprentissage basé sur des heuristiques pour déterminer des relations implicites entre objets. In RFIA'96, 10e Congrès de Reconnaissance des Formes et Intelligence Artificielle (pp. 735–744). Rennes, France.
Michalski, R. (1980). Knowledge acquisition through conceptual clustering: a theoretical framework and algorithm for partitioning data into conjunctive concepts. International Journal of Policy Analysis and Information Systems, 4(3), 219–243.
Michalski, R. (1983). A theory and methodology of inductive learning. In Machine learning: an artificial intelligence approach (Vol. I). Palo Alto, CA: Tioga.
Michalski, R. (1987). How to learn imprecise concepts: a method for employing a two-tiered knowledge representation in learning. In Proceedings of the Fourth International Workshop on Machine Learning (pp. 50–58). University of California.
Michalski, R.S. (1990). Learning flexible concepts: fundamental ideas and methodology. In Machine learning: an artificial intelligence approach (Vol. III). San Mateo, CA: Morgan Kaufmann.
Michalski, R., & Stepp, R. (1983a). Learning from observation: conceptual clustering. In Machine learning: an artificial intelligence approach (Vol. I). Palo Alto, CA: Tioga.
Michalski, R., & Stepp, R. (1983b). Automated construction of classifications: conceptual clustering versus numerical taxonomy. IEEE Transactions on Pattern Analysis and Machine Intelligence, 5(4), 396–409.
Mizoguchi, R., & Shimura, M. (1980). A nonparametric algorithm for detecting clusters using hierarchical structure. IEEE Transactions on Pattern Analysis and Machine Intelligence, 4, 292–300.
Vogel, M., & Wong, A. (1979). PFS clustering method. IEEE Transactions on Pattern Analysis and Machine Intelligence, 3, 237–245.

