Genetic Algorithm for Multi-Criterion Classification and Clustering in Data Mining

Ashish Ghosh
Machine Intelligence Unit and Center for Soft Computing Research
Indian Statistical Institute, 203, B.T. Road, Kolkata 700108, India
Email: ash@isical.ac.in

Rajib Mall
Department of Computer Science & Engineering
Indian Institute of Technology, Kharagpur 721302, India
Email: rajib@cse.iitkgp.ernet.in
Abstract: This paper focuses on multi-criteria tasks such as classification and clustering in the context of data mining. The cost functions associated with rule mining tasks, such as rule interestingness, predictive accuracy and comprehensibility, can be treated as multiple objectives. Similarly, the complementary measures of cluster compactness and connectedness are treated as two objectives for cluster analysis. We have carried out extensive simulations for these tasks using different real-life and artificially created datasets. The experimental results presented here show that multi-objective genetic algorithms (MOGAs) have a clear edge over single-objective ones for the classification task, whereas for the clustering task they produce comparable results.

Received: November 05, 2005 | Revised: April 15, 2006 | Accepted: June 12, 2006
predicting attributes. Classification rules can be considered as a particular kind of prediction rules where the rule antecedent ("IF" part) contains predicting attributes and the rule consequent ("THEN" part) contains a predicted value for the goal attribute. An example of a classification rule is:

IF (Attendance > 75%) AND (total_marks > 60%) THEN (result = "pass").

In the classification task the data being mined is divided into two mutually exclusive and exhaustive sets, the training set and the test set. The DM algorithm has to discover rules by accessing the training set, and the predictive performance of these rules is evaluated on the test set (not seen during training). A measure of predictive accuracy is discussed in a later section; the reader may also refer to [12].

Clustering: In contrast to the classification task, in the clustering process the data-mining algorithm must, in some sense, discover the classes by partitioning the data set into clusters, which is a form of unsupervised learning [13]. Examples that are similar to each other tend to be assigned to the same cluster, whereas examples different from each other belong to different clusters. Applications of GAs for clustering are discussed in [14, 15].

3.1 Genetic Algorithms (GAs) for Classification

Genetic algorithms are probabilistic search algorithms. At each step of such an algorithm, a set of N potential solutions (called individuals I_k ∈ Ω, where Ω represents the space of all possible individuals) is maintained in an attempt to approximate, as well as possible, a solution of the optimization problem [19, 20, 21]. This population P = {I_1, I_2, ..., I_N} is modified according to the natural evolutionary process. After initialization, selection S: I^N → I^N and recombination R: I^N → I^N are executed in a loop until some termination criterion is reached. Each run of the loop is called a generation, and P(t) denotes the population at generation t.

The selection operator is intended to improve the average quality of the population by giving individuals of higher quality a higher probability of being copied into the next generation. Selection thereby focuses the search on promising regions of the search space. The quality of an individual is measured by a fitness function f: P → R. Recombination changes the genetic material in the population, either by crossover or by mutation, in order to obtain new points in the search space.
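To make the generational loop concrete, the following is a minimal sketch under our own assumptions (binary tournament selection and abstract crossover/mutation hooks); it illustrates the generic scheme described above, not the paper's exact procedure.

```python
# Generic generational GA loop: selection and recombination are
# applied to population P(t) until the termination criterion
# (here, a fixed number of generations) is reached.
import random

def evolve(init_individual, fitness, crossover, mutate,
           N=100, generations=500, pm=0.01):
    population = [init_individual() for _ in range(N)]   # P(0)
    for t in range(generations):
        def select():
            # Binary tournament: the fitter of two random individuals
            a, b = random.sample(population, 2)
            return a if fitness(a) >= fitness(b) else b
        offspring = []
        while len(offspring) < N:
            child = crossover(select(), select())        # recombination
            if random.random() < pm:
                child = mutate(child)                    # mutation
            offspring.append(child)
        population = offspring                           # P(t+1)
    return max(population, key=fitness)
```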
3.1.1 Genetic Representations

Each individual in the population represents a candidate rule of the form "IF Antecedent THEN Consequent". The antecedent of this rule can be formed by a conjunction of at most n − 1 attributes, where n is the number of attributes being mined. Each condition is of the form Ai = Vij, where Ai is the i-th attribute and Vij is the j-th value of the i-th attribute's domain. The consequent consists of a single condition of the form G = gl, where G is the goal attribute and gl is the l-th value of the goal attribute's domain.

A string of fixed size encodes an individual, with n genes representing the values that each attribute can assume in the rule, as shown below. In addition, each gene except the n-th also contains a Boolean flag (fp/fa) that indicates whether or not the i-th condition is present in the rule antecedent. Hence, although all individuals have the same genome length, different individuals represent rules of different lengths.

A1j | A2j | A3j | A4j | ... | A(n-1)j | gl

For instance, suppose that a given individual represents two attribute values, where the attributes are branch and semester and their corresponding values can be EE, CS, IT, ET and 1st, 2nd, 3rd, 4th, 5th, 6th, 7th, 8th respectively. Then conditions involving these attributes would be encoded in the genome by four and eight bits respectively. This can be represented as follows:

0110 01010000

to be interpreted as

IF (branch = CS OR IT) AND (semester = 2nd OR 4th).

Hence this encoding scheme allows the representation of conditions with internal disjunctions, i.e. with the logical OR operator within a condition. Obviously this encoding scheme can easily be extended to represent rule antecedents with several conditions (linked by a logical AND).
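The decoding of such a genome into a readable rule can be sketched as follows; this is our illustration using the branch/semester example above (the helper names and the dictionary-based genome layout are assumptions, not the authors' implementation).

```python
# Sketch: decoding a rule genome with internal disjunctions.
# One bit per domain value of each antecedent attribute, plus a
# presence flag (fp/fa) per gene; names are illustrative.
DOMAINS = {
    "branch": ["EE", "CS", "IT", "ET"],
    "semester": ["1st", "2nd", "3rd", "4th", "5th", "6th", "7th", "8th"],
}

def decode_rule(genes, goal_value, goal_attr="result"):
    """genes maps attribute -> (presence flag, bit string over its domain)."""
    conditions = []
    for attr, (present, bits) in genes.items():
        if not present:                    # fa: condition absent
            continue
        values = [v for v, b in zip(DOMAINS[attr], bits) if b == "1"]
        conditions.append(f"({attr} = {' OR '.join(values)})")
    return f"IF {' AND '.join(conditions)} THEN ({goal_attr} = {goal_value})"

genome = {"branch": (True, "0110"), "semester": (True, "01010000")}
print(decode_rule(genome, "pass"))
# IF (branch = CS OR IT) AND (semester = 2nd OR 4th) THEN (result = pass)
```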
In the case of continuous attributes the binary encoding mechanism gets slightly more complex. A common approach is to use bits to represent the value of a continuous attribute in binary notation. For instance, the binary string 00001101 represents the value 13 of a given integer-valued attribute.

Similarly, the goal attribute is also encoded in the individual. This is one possibility. The second possibility is to associate all individuals of the population with the same predicted class, which is never modified during the execution of the algorithm. Hence, if we want to discover a set of classification rules predicting k different classes, we would need to run the evolutionary algorithm at least k times, so that in the i-th run, i = 1, 2, ..., k, the algorithm discovers only rules predicting the i-th class [22].

3.1.2 Fitness Function

As discussed in Section 2.1, the discovered rules should have (a) high predictive accuracy, (b) comprehensibility and (c) interestingness. In this subsection we discuss how these criteria can be defined and used in the fitness evaluation of individuals in GAs.

1. Comprehensibility Metric: There are various ways to quantitatively measure rule comprehensibility. A standard way of measuring comprehensibility is to count the number of rules and the number of conditions in these rules; if these numbers increase, comprehensibility decreases. If a rule R can have at most M conditions, the comprehensibility of a rule C(R) can be defined as:

C(R) = M − (number of conditions of R).   (1)

2. Predictive Accuracy: As already mentioned, our rules are of the form IF A THEN C. The antecedent part of the rule is a conjunction of conditions. A very simple way to measure the predictive accuracy of a rule is

PredAcc = |A & C| / |A|,   (2)

where |A & C| is the number of records satisfying both A and C, and |A| is the number of records satisfying A.

3. Interestingness: The computation of the degree of interestingness of a rule consists of two terms, one referring to the antecedent of the rule and the other to the consequent. The degree of interestingness of the rule antecedent is calculated by an information-theoretical measure, a normalized version of the measure proposed in [25, 26], defined as follows:

RInt = 1 − [ (Σ_{i=1}^{n−1} InfoGain(Ai)) / (n − 1) ] / log2(|dom(G)|),   (3)

where n is the number of attributes in the antecedent and |dom(G)| is the domain cardinality (i.e. the number of possible values) of the goal attribute G occurring in the consequent. The log term is included in formula (3) to normalize the value of RInt, so that this measure takes a value between 0 and 1. The InfoGain is given by:

InfoGain(Ai) = Info(G) − Info(G | Ai),   (4)

with

Info(G) = − Σ_{l=1}^{mk} p(gl) log2(p(gl)),   (5)

Info(G | Ai) = Σ_{j=1}^{ni} p(vij) ( − Σ_{l=1}^{mk} p(gl | vij) log2(p(gl | vij)) ),   (6)

where mk is the number of possible values of the goal attribute Gk, ni is the number of possible values of the attribute Ai, p(X) denotes the probability of X and p(X | Y) denotes the conditional probability of X given Y.
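The quantities in equations (4)-(6) can be computed directly from counts. Here is a small sketch (ours, with an assumed record layout, not the authors' code):

```python
# Sketch of Eqs. (4)-(6): information gain of an antecedent
# attribute with respect to the goal attribute, from counts.
import math
from collections import Counter

def entropy(values):
    # Eq. (5): Info(G) = -sum_l p(g_l) log2 p(g_l)
    counts, n = Counter(values), len(values)
    return -sum((c / n) * math.log2(c / n) for c in counts.values())

def info_gain(records, attr, goal):
    # Eq. (4): InfoGain(A_i) = Info(G) - Info(G | A_i)
    cond = 0.0
    for v in set(r[attr] for r in records):
        subset = [r[goal] for r in records if r[attr] == v]
        cond += (len(subset) / len(records)) * entropy(subset)  # Eq. (6)
    return entropy([r[goal] for r in records]) - cond

records = [{"branch": "CS", "result": "pass"},
           {"branch": "CS", "result": "pass"},
           {"branch": "EE", "result": "fail"},
           {"branch": "EE", "result": "pass"}]
print(info_gain(records, "branch", "result"))  # ~0.311
```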
The overall fitness is computed as the weighted arithmetic mean

f(x) = (w1 C(R) + w2 PredAcc + w3 RInt) / (w1 + w2 + w3),   (7)

where w1, w2 and w3 are user-defined weights.
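Putting equations (1), (2) and (7) together, the fitness of a candidate rule might be evaluated as in the following sketch (the rule and record representations are our assumptions; RInt is taken as a precomputed input):

```python
# Sketch of the weighted rule fitness of Eq. (7). A rule antecedent
# is a dict {attribute: set of allowed values}; records are dicts.

def comprehensibility(antecedent, max_conditions):
    # Eq. (1): C(R) = M - number of conditions of R
    return max_conditions - len(antecedent)

def predictive_accuracy(antecedent, goal_attr, goal_value, records):
    # Eq. (2): |A & C| / |A|
    covered = [r for r in records
               if all(r[a] in vals for a, vals in antecedent.items())]
    if not covered:
        return 0.0
    return sum(r[goal_attr] == goal_value for r in covered) / len(covered)

def fitness(antecedent, goal_attr, goal_value, records,
            max_conditions, rint, w1=1.0, w2=1.0, w3=1.0):
    # Eq. (7): weighted arithmetic mean of the three criteria
    c = comprehensibility(antecedent, max_conditions)
    acc = predictive_accuracy(antecedent, goal_attr, goal_value, records)
    return (w1 * c + w2 * acc + w3 * rint) / (w1 + w2 + w3)

records = [{"branch": "CS", "semester": "2nd", "result": "pass"},
           {"branch": "EE", "semester": "1st", "result": "fail"}]
rule = {"branch": {"CS", "IT"}, "semester": {"2nd", "4th"}}
print(fitness(rule, "result", "pass", records, max_conditions=3, rint=0.5))
```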
3.1.3 Genetic Operators

The crossover operator we consider here follows the idea of uniform crossover [27, 28]. After crossover is completed, the algorithm checks whether any invalid individual has been created; if so, a repair operator is used to produce valid individuals.

The mutation operator randomly transforms the value of an attribute into another value belonging to the same domain of the attribute.

Besides crossover and mutation, the insert and remove operators directly try to control the size of the rules being evolved, and thereby influence the comprehensibility of the rules. These operators randomly insert and remove a condition in the rule antecedent. They are not part of the regular GA; we have introduced them here for suitability in our rule generation scheme.

3.2 Genetic Algorithm for Data Clustering

A lot of research has been conducted on applying GAs to the problem of k-clustering, where the required number of clusters is known [29]. Adaptation to the k-clustering problem requires choices of individual representation, fitness function, operators, and parameter values.
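A common choice of individual representation is the group-number encoding of Figure 1, in which gene i stores the cluster label of object i. The following is our illustrative sketch of that encoding, not code from the paper:

```python
# Sketch: group-number encoding for k-clustering. Position i of
# the chromosome holds the cluster label of object O(i+1), e.g.
# [1, 1, 2, 1, 2, 2] encodes {{O1, O2, O4}, {O3, O5, O6}}.
import random

def random_individual(n_objects, k):
    return [random.randint(1, k) for _ in range(n_objects)]

def decode(chromosome):
    clusters = {}
    for obj, label in enumerate(chromosome, start=1):
        clusters.setdefault(label, []).append(f"O{obj}")
    return list(clusters.values())

print(decode([1, 1, 2, 1, 2, 2]))
# [['O1', 'O2', 'O4'], ['O3', 'O5', 'O6']]
```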
numbers are different. Given that the parents represent the same solution, we would expect the children also to represent this solution. Instead, both children represent the clustering {O1, O2, O3, O4, O5, O6}, which does not resemble either of the parents.

The crossover operators for the matrix representation are as follows. Alippi and Cucchiara [33] used a single-point asexual crossover to avoid the problem of redundancy (Figure 3). The tails of two rows of the matrix are swapped, starting from a randomly selected crossover point. This operator may produce a clustering with fewer than k groups.

Bezdek et al. [30] used a sexual 2-point crossover (Figure 4). A crossover point and a distance (the number of columns to be swapped) are randomly selected; these determine which columns are swapped between the parents. This operator is context insensitive and may produce offspring with fewer than k groups.

Mutation

Mutation introduces new genetic material into the population. In a clustering context this corresponds to moving an object from one cluster to another. How this is done depends on the representation.
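For the group-number encoding, such a mutation can be sketched as reassigning one randomly chosen object to a different cluster (an illustration under our assumptions, not necessarily the paper's exact operator):

```python
# Sketch: mutation for the group-number encoding -- move one
# randomly chosen object to a different cluster.
import random

def mutate(chromosome, k):
    child = chromosome[:]
    i = random.randrange(len(child))
    child[i] = random.choice([c for c in range(1, k + 1) if c != child[i]])
    return child

print(mutate([1, 1, 2, 1, 2, 2], k=2))  # e.g. [1, 1, 2, 1, 1, 2]
```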
In a single-criterion optimization the main goal is to find the globally optimal solutions. However, in a multi-criteria optimization problem there is more than one objective function, each of which may have a different individual optimal solution. If there is a sufficient difference in the optimal solutions corresponding to the different objectives, we say that the objective functions are conflicting. Multi-criteria optimization with such conflicting objective functions gives rise to a set of optimal solutions, instead of one optimal solution, known as the Pareto-optimal solutions [36].

Let us illustrate the Pareto-optimal solutions with the time and space complexity of an algorithm, shown in Figure 6. In this problem we have to minimize both the time and the space requirements. The point p represents a solution which has minimal time but high space complexity. On the other hand, the point r represents a solution with high time complexity but minimum space complexity. Considering both objectives, neither solution is optimal, so in this case we cannot say that solution p is better than r. In fact, there exist many solutions, such as q, that belong to the Pareto-optimal set, and one cannot order these solutions by performance when both objectives are considered. All the solutions on the curve are known as Pareto-optimal solutions. From Figure 6 it is also clear that there exist solutions, such as t, which do not belong to the Pareto-optimal set.
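The Pareto-optimal set can be characterized by the usual dominance relation. The following small sketch (ours; the numeric values are made up to mirror Figure 6) filters the non-dominated points for two minimization objectives:

```python
# Sketch: Pareto dominance for two minimization objectives
# (time, space). p, q, r lie on the trade-off curve; t is dominated.

def dominates(a, b):
    """a dominates b: no worse in every objective, better in at least one."""
    return (all(x <= y for x, y in zip(a, b))
            and any(x < y for x, y in zip(a, b)))

def pareto_front(points):
    return [p for p in points
            if not any(dominates(q, p) for q in points if q != p)]

solutions = {"p": (1.0, 9.0),   # minimal time, high space
             "q": (4.0, 4.0),   # intermediate trade-off
             "r": (9.0, 1.0),   # high time, minimal space
             "t": (8.0, 8.0)}   # dominated by q
front = pareto_front(list(solutions.values()))
print([name for name, pt in solutions.items() if pt in front])
# ['p', 'q', 'r']
```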
This dataset has 12960 records and nine attributes with categorical values. The ninth attribute is treated as the class attribute, and there are five classes: not_recom (NR), recommended (R), very_recom (VR), priority (P), and spec_prior (SP). The attributes and corresponding values are listed in Table 1.

Results

Experiments have been performed using MATLAB 5.3 on a Linux server. The following parameters are used, as shown in Table 2:

P: population size
Pc: probability of crossover
Pm: probability of mutation
Rm: removal operator
Ri: insert operator

For each of the datasets the simple genetic algorithm had 100 individuals in the population and was run for 500 generations. The parameter values Pc, Pm, Rm, and Ri were sufficient to find some good individuals. The following computational protocol is used in the basic simple genetic algorithm as well as in the proposed multi-objective genetic algorithm for rule generation. The data set is divided into two parts, a training set and a test set; here we have used 30% of the data as the training set and the rest as the test set. We assign the same predicted class to all individuals of the population, and it is never modified during the run of the algorithm. Hence, for each class we run the algorithms separately and obtain the corresponding rules.

Rules generated by MOGA have been compared with those of SGA, and all rules are listed in the following tables. Tables 3 and 4 show the results generated by SGA and MOGA, respectively, from the zoo dataset. Table 3 has three columns, namely class#, mined rules, and fitness value. Similarly, Table 4 has five columns, which include class#, mined rules, and the predictive accuracy, comprehensibility and interestingness measures.

Tables 5 and 6 show the results generated by SGA and MOGA, respectively, from the nursery dataset. Table 5 has three columns, namely class#, mined rules, and fitness value. Similarly, Table 6 has five columns, which include class#, mined rules, and the predictive accuracy, comprehensibility and interestingness measures.

5.2 MOGA for Clustering

Conventional genetic-algorithm-based data clustering utilizes a single criterion that may not conform to the diverse shapes of the underlying data. This section provides a novel approach to data clustering based on the explicit optimization of a partitioning with respect to multiple complementary clustering objectives [5]. It has been shown that this approach may be more robust to the variety of cluster structures found in different data sets, and may be able to identify certain cluster structures that cannot be discovered by other methods. MOGA for data clustering uses two complementary objectives based on cluster compactness and connectedness. Let us define the objective functions separately.

Compactness

Cluster compactness can be measured by the overall deviation of a partitioning. This is simply computed as the overall summed distance between data items and their corresponding cluster centers:

comp(S) = Σ_{ck ∈ S} Σ_{i ∈ ck} d(i, μk),   (12)

where S is the set of all clusters, μk is the centroid of cluster ck and d(·,·) is the chosen distance function (e.g. Euclidean distance). As an objective, overall deviation should be minimized. This criterion is similar to the popular criterion of intra-cluster variance, which squares the distance value d(·,·) and is more strongly biased towards spherically shaped clusters.

Connectedness

This measure evaluates the degree to which neighboring data points have been placed in the same cluster. It is computed as

Conn(S) = Σ_{i=1}^{N} Σ_{j=1}^{L} x_{i, nn_i(j)},   (13)

where x_{r,s} = 1/j if there is no cluster ck with r, s ∈ ck, and x_{r,s} = 0 otherwise; nn_i(j) is the j-th nearest neighbor of datum i, and L is a parameter determining the number of neighbors that contribute to the connectivity measure. As an objective, connectivity should be minimized. After defining these two objectives, the algorithms defined in Section 4.2 can be applied to optimize them simultaneously. The genetic operators such as crossover and mutation are the same as in the single-objective genetic algorithm for data clustering.
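The two objectives of equations (12) and (13) can be computed for a candidate partitioning as in the following sketch (our illustration with NumPy, not the authors' MATLAB code; the data layout is assumed):

```python
# Sketch of the two clustering objectives: overall deviation,
# Eq. (12), and connectivity, Eq. (13). Both are to be minimized.
import numpy as np

def overall_deviation(data, labels):
    # Eq. (12): summed distance of each point to its cluster centroid
    total = 0.0
    for k in np.unique(labels):
        members = data[labels == k]
        centroid = members.mean(axis=0)
        total += np.linalg.norm(members - centroid, axis=1).sum()
    return total

def connectivity(data, labels, L=3):
    # Eq. (13): penalty 1/j when the j-th nearest neighbor of a
    # point lies in a different cluster
    dists = np.linalg.norm(data[:, None, :] - data[None, :, :], axis=2)
    penalty = 0.0
    for i in range(len(data)):
        neighbors = np.argsort(dists[i])[1:L + 1]  # skip the point itself
        for j, nb in enumerate(neighbors, start=1):
            if labels[nb] != labels[i]:
                penalty += 1.0 / j
    return penalty

rng = np.random.default_rng(0)
data = np.vstack([rng.normal(0, 0.5, (20, 2)), rng.normal(5, 0.5, (20, 2))])
labels = np.array([0] * 20 + [1] * 20)
print(overall_deviation(data, labels), connectivity(data, labels))
```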
5.2.1 Experimental Details

Parameters taken for the simulations are 0.6 ≤ Pc ≤ 0.8 and 0.001 ≤ Pm ≤ 0.01. We have carried out extensive simulations using labeled data sets for easy validation of our results. Table 7 shows the results obtained from both SGA-based clustering and the proposed MOGA-based clustering.

The population size was taken as 200; the selection, crossover and mutation operators described above were used for the simulation. MOGA-based clustering generates solutions that are comparable to or better than those of the simple genetic algorithm. In the case of the IRIS data set
both the connectivity and compactness achieved a near-optimal solution, whereas in the other two datasets, wine and WBCD, the results of the two objectives conflicted strongly with each other. As expected, the computational time requirement for MOGA is higher than that of the single-objective approaches.

Acknowledgment

Dr. S. Dehuri is grateful to the Center for Soft Computing Research, Indian Statistical Institute, Kolkata for providing a fellowship to carry out this work.

REFERENCES

[1] U. M. Fayyad, G. Piatetsky-Shapiro and P. Smyth (1996). From Data Mining to Knowledge Discovery: An Overview. In: U. M. Fayyad, G. Piatetsky-Shapiro, P. Smyth and R. Uthurusamy (eds.), Advances in Knowledge Discovery and Data Mining, pp. 1-34, AAAI/MIT Press.

[2] R. J. Brachman and T. Anand (1996). The Process of Knowledge Discovery in Databases: A Human Centered Approach. In: U. M. Fayyad, G. Piatetsky-Shapiro, P. Smyth and R. Uthurusamy (eds.), Advances in Knowledge Discovery and Data Mining, Chapter 2, pp. 37-57, AAAI/MIT Press.

[3] W. Frawley, G. Piatetsky-Shapiro, C. Matheus (1991). Knowledge Discovery in Databases: An Overview. AAAI/MIT Press.

[4] J. Han, M. Kamber (2001). Data Mining: Concepts and Techniques. Morgan Kaufmann.

[5] A. A. Freitas (2002). Data Mining and Knowledge Discovery with Evolutionary Algorithms. Springer-Verlag, New York.

[6] J. Handl, J. Knowles (2004). Multi-objective Clustering with Automatic Determination of the Number of Clusters. Tech. Report TR-COMPSYSBIO-2004-02, UMIST, Manchester.

[7] A. Ghosh, B. Nath (2004). Multi-objective Rule Mining Using Genetic Algorithms. Information Sciences, Vol. 163, pp. 123-133.

[12] J. D. Kelly Jr. and L. Davis (1991). A Hybrid Genetic Algorithm for Classification. Proc. 12th Int. Joint Conf. on AI, pp. 645-650.

[13] A. K. Jain and R. C. Dubes (1988). Algorithms for Clustering Data. Prentice Hall, Englewood Cliffs, NJ.

[14] R. Krovi (1991). Genetic Algorithms for Clustering: A Preliminary Investigation. IEEE Press, pp. 504-544.

[15] K. Krishna and M. Murty (1999). Genetic K-means Algorithm. IEEE Transactions on Systems, Man, and Cybernetics, Part B, pp. 433-439.

[16] J. M. Adamo (2001). Data Mining for Association Rules and Sequential Patterns. Springer-Verlag, New York.

[17] R. Agrawal, R. Srikant (1994). Fast Algorithms for Mining Association Rules. In: Proceedings of the 20th International Conference on Very Large Databases, Santiago, Chile.

[18] R. Agrawal, T. Imielinski, A. Swami (1993). Mining Association Rules Between Sets of Items in Large Databases. In: Proceedings of the ACM SIGMOD Conference on Management of Data, pp. 207-216.
[19] A. M. Ayad (2000). A New Algorithm for Incremental Mining of Constrained Association Rules. Master's Thesis, Department of Computer Sciences and Automatic Control, Alexandria University.

[19] C. Lance (1995). Practical Handbook of Genetic Algorithms. CRC Press.

[20] L. Davis (ed.) (1991). Handbook of Genetic Algorithms. Van Nostrand Reinhold, New York.

[21] D. E. Goldberg (1989). Genetic Algorithms in Search, Optimization and Machine Learning. Addison-Wesley, New York.

[22] C. Z. Janikow (1993). A Knowledge-Intensive Genetic Algorithm for Supervised Learning. Machine Learning, 13, pp. 189-228.

[23] A. Giordana and F. Neri (1995). Search-Intensive Concept Induction. Evolutionary Computation, 3(4), pp. 375-416.

[24] E. Noda, A. A. Freitas and H. S. Lopes (1999). Discovering Interesting Prediction Rules with a Genetic Algorithm. Proc. Conference on Evolutionary Computation (CEC-99), pp. 1322-1329, Washington D.C., USA.

[25] A. A. Freitas (1998). On Objective Measures of Rule Surprisingness. Proc. of 2nd European Symposium on Principles of Data Mining and Knowledge Discovery (PKDD-98), Lecture Notes in Artificial Intelligence, 1510, pp. 1-9.

[26] T. M. Cover and J. A. Thomas (1991). Elements of Information Theory. John Wiley & Sons.

[27] G. Syswerda (1989). Uniform Crossover in Genetic Algorithms. Proc. of 3rd Int. Conf. on Genetic Algorithms, pp. 2-9.

[28] A. P. Engelbrecht (2002). Computational Intelligence: An Introduction. John Wiley & Sons.

[29] A. K. Jain, M. N. Murty, P. J. Flynn (1999). Data Clustering: A Review. ACM Computing Surveys, vol. 31, no. 3.

[30] J. C. Bezdek, S. Boggavarapu, L. O. Hall and A. Bensaid (1994). Genetic Algorithm Guided Clustering. In: Proceedings of the First IEEE Conference on Evolutionary Computation, IEEE World Congress on Computational Intelligence, pp. 34-39.

[31] Y. Lu, S. Lu, F. Fotouhi, Y. Deng, and S. Brown (2003). Fast Genetic K-means Algorithm and its Application in Gene Expression Data Analysis. Technical Report TR-DB-06.

[32] J. Bhuyan (1995). A Combination of Genetic Algorithm and Simulated Evolution Techniques for Clustering. In: C. Jinshong Hwang and Betty W. Hwang (eds.), Proceedings of the 1995 ACM Computer Science Conference, pp. 127-134.

[33] C. Alippi and R. Cucchiara (1992). Cluster Partitioning in Image Analysis Classification: A Genetic Algorithm Approach. In: CompEuro 1992 Proceedings: Computer Systems and Software Engineering, pp. 139-144.

[34] J. D. Schaffer (1984). Some Experiments in Machine Learning Using Vector Evaluated Genetic Algorithms. Doctoral Dissertation, Vanderbilt University, Nashville, TN.

[35] J. D. Schaffer (1985). Multiple Objective Optimization with Vector Evaluated Genetic Algorithms. Proceedings of the First International Conference on Genetic Algorithms, pp. 93-100.

[36] K. Deb (2001). Multi-objective Optimization Using Evolutionary Algorithms. Wiley, New York.

[37] R. E. Steuer (1986). Multiple Criteria Optimization: Theory, Computation and Application. Wiley, New York.

[38] R. L. Keeney and H. Raiffa (1976). Decisions with Multiple Objectives. John Wiley, New York.

[39] H. Eschenauer, J. Koski and A. Osyczka (1990). Multicriteria Design Optimization. Springer-Verlag, Berlin.

[40] A. Ghosh and S. Dehuri (2004). Evolutionary Algorithms for Multi-criterion Optimization: A Survey. International Journal of Computing and Information Sciences, vol. 2, pp. 38-57.

[41] C. A. Coello Coello, D. A. Van Veldhuizen, and G. B. Lamont (2002). Evolutionary Algorithms for Solving Multi-Objective Problems. Kluwer Academic Publishers, New York.

S. Dehuri received his M.Sc. in Mathematics from Sambalpur University in 1998, his M.Tech. in Computer Science from Utkal University in 2001, and his Ph.D. from Utkal University in 2006. He is currently Reader and Head of the Department of Information and Communication Technology, F. M. University, Balasore, Orissa, India.
Figure 1: Chromosomes representing the clustering {{O1, O2, O4}, {O3, O5, O6}} for the encoding schemes (a) group number and (b) matrix:

(a)  1 1 2 1 2 2

(b)  1 1 0 1 0 0
     0 0 1 0 1 1

[Crossover example on the group-number encoding: parents 1 1 1 2 2 2 and 2 2 2 1 1 1 yield offspring 1 1 1 1 1 1 and 2 2 2 2 2 2, the degenerate case discussed in the text.]

[Figure 3 (single-point asexual crossover on the matrix encoding: parent and offspring matrices with the crossover point marked) and Figure 4 (2-point crossover: parent matrices and Child 1, Child 2) are not reproduced here.]

Figure 5: Column mutation

Figure 6: Trade-off between time and space (solutions p, q, r lie on the Pareto-optimal curve)

Table 3: Rules Generated by SGA from Zoo Dataset

Table 4: Rules Generated by MOGA from Zoo Dataset