
Data Clustering using Particle Swarm Optimization

DW van der Merwe, AP Engelbrecht

Department of Computer Science
University of Pretoria
tjippie@fouriersystems.co.za, engel@driesie.cs.up.ac.za

Abstract- This paper proposes two new approaches to using PSO to cluster data. It is shown how PSO can be used to find the centroids of a user specified number of clusters. The algorithm is then extended to use K-means clustering to seed the initial swarm. This second algorithm basically uses PSO to refine the clusters formed by K-means. The new PSO algorithms are evaluated on six data sets, and compared to the performance of K-means clustering. Results show that both PSO clustering techniques have much potential.

1 Introduction

Data clustering is the process of grouping together similar multi-dimensional data vectors into a number of clusters or bins. Clustering algorithms have been applied to a wide range of problems, including exploratory data analysis, data mining [4], image segmentation [12] and mathematical programming [1, 16]. Clustering techniques have been used successfully to address the scalability problem of machine learning and data mining algorithms, where prior to, and during training, training data is clustered, and samples from these clusters are selected for training, thereby reducing the computational complexity of the training process, and even improving generalization performance [6, 15, 14, 3].

Clustering algorithms can be grouped into two main classes of algorithms, namely supervised and unsupervised. With supervised clustering, the learning algorithm has an external teacher that indicates the target class to which a data vector should belong. For unsupervised clustering, a teacher does not exist, and data vectors are grouped based on distance from one another. This paper focuses on unsupervised clustering.

Many unsupervised clustering algorithms have been developed. Most of these algorithms group data into clusters independent of the topology of input space. These algorithms include, among others, K-means [7, 8], ISODATA [2], and learning vector quantizers (LVQ) [5]. The self-organizing feature map (SOM) [11], on the other hand, performs a topological clustering, where the topology of the original input space is maintained. While clustering algorithms are usually supervised or unsupervised, efficient hybrids have been developed that perform both supervised and unsupervised learning, e.g. LVQ-II [5].

Recently, particle swarm optimization (PSO) [9, 10] has been applied to image clustering [13]. This paper explores the applicability of PSO to cluster data vectors. In the process of doing so, the objective of the paper is twofold:

- to show that the standard PSO algorithm can be used to cluster arbitrary data, and

- to develop a new PSO-based clustering algorithm where K-means clustering is used to seed the initial swarm.

The rest of the paper is organized as follows: Section 2 presents an overview of the K-means algorithm. PSO is overviewed in section 3. The two PSO clustering techniques are discussed in section 4. Experimental results are summarized in section 5.

2 K-Means Clustering

One of the most important components of a clustering algorithm is the measure of similarity used to determine how close two patterns are to one another. K-means clustering groups data vectors into a predefined number of clusters, based on Euclidean distance as similarity measure. Data vectors within a cluster have small Euclidean distances from one another, and are associated with one centroid vector, which represents the "midpoint" of that cluster. The centroid vector is the mean of the data vectors that belong to the corresponding cluster.

For the purpose of this paper, define the following symbols:

- N_d denotes the input dimension, i.e. the number of parameters of each data vector

- N_o denotes the number of data vectors to be clustered

- N_c denotes the number of cluster centroids (as provided by the user), i.e. the number of clusters to be formed

- z_p denotes the p-th data vector

- m_j denotes the centroid vector of cluster j



- n_j is the number of data vectors in cluster j

- C_j is the subset of data vectors that form cluster j.

Using the above notation, the standard K-means algorithm is summarized as

1. Randomly initialize the N_c cluster centroid vectors

2. Repeat

   (a) For each data vector, assign the vector to the class with the closest centroid vector, where the distance to the centroid is determined using

       d(z_p, m_j) = \sqrt{ \sum_{k=1}^{N_d} (z_{pk} - m_{jk})^2 }    (1)

       where k subscripts the dimension

   (b) Recalculate the cluster centroid vectors, using

       m_j = \frac{1}{n_j} \sum_{\forall z_p \in C_j} z_p    (2)

   until a stopping criterion is satisfied.

The K-means clustering process can be stopped when any one of the following criteria are satisfied: when the maximum number of iterations has been exceeded, when there is little change in the centroid vectors over a number of iterations, or when there are no cluster membership changes. For the purposes of this study, the algorithm is stopped when a user-specified number of iterations has been exceeded.
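The summary above maps directly onto a few lines of array code. The sketch below is a minimal NumPy illustration of the algorithm as just described, using the fixed iteration budget adopted in this study; the function name, the random-restart-free initialization from data vectors, and the handling of empty clusters are our own choices, not details from the paper.

```python
import numpy as np

def kmeans(data, n_clusters, max_iter=100, seed=0):
    """Minimal K-means: assign each vector to its nearest centroid (eq. (1)),
    then recompute each centroid as the mean of its members (eq. (2))."""
    rng = np.random.default_rng(seed)
    # Step 1: randomly initialize the N_c centroids from the data vectors.
    centroids = data[rng.choice(len(data), n_clusters, replace=False)].copy()
    for _ in range(max_iter):                      # user-specified iteration budget
        # Euclidean distance from every data vector to every centroid (eq. (1)).
        dists = np.linalg.norm(data[:, None, :] - centroids[None, :, :], axis=2)
        labels = dists.argmin(axis=1)              # step 2(a): closest centroid wins
        # Step 2(b): recompute each centroid as the mean of its cluster;
        # keep the old centroid if a cluster happens to be empty.
        for j in range(n_clusters):
            members = data[labels == j]
            if len(members) > 0:
                centroids[j] = members.mean(axis=0)
    return centroids, labels

# Example usage on random 2-dimensional data.
data = np.random.default_rng(1).uniform(-1, 1, size=(400, 2))
centroids, labels = kmeans(data, n_clusters=3)
```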
3 Particle Swarm Optimization

Particle swarm optimization (PSO) is a population-based stochastic search process, modeled after the social behavior of a bird flock [9, 10]. The algorithm maintains a population of particles, where each particle represents a potential solution to an optimisation problem.

In the context of PSO, a swarm refers to a number of potential solutions to the optimization problem, where each potential solution is referred to as a particle. The aim of the PSO is to find the particle position that results in the best evaluation of a given fitness (objective) function. Each particle represents a position in N_d dimensional space, and is "flown" through this multi-dimensional search space, adjusting its position toward both

- the particle's best position found thus far, and

- the best position in the neighborhood of that particle.

Each particle i maintains the following information:

- x_i: the current position of the particle;

- v_i: the current velocity of the particle;

- y_i: the personal best position of the particle.

Using the above notation, a particle's position is adjusted according to

    v_{i,k}(t+1) = w v_{i,k}(t) + c_1 r_{1,k}(t) (y_{i,k}(t) - x_{i,k}(t)) + c_2 r_{2,k}(t) (\hat{y}_k(t) - x_{i,k}(t))    (3)

    x_i(t+1) = x_i(t) + v_i(t+1)    (4)

where w is the inertia weight, c_1 and c_2 are the acceleration constants, r_{1,k}(t), r_{2,k}(t) ~ U(0,1), and k = 1, ..., N_d. The velocity is thus calculated based on three contributions: (1) a fraction of the previous velocity, (2) the cognitive component which is a function of the distance of the particle from its personal best position, and (3) the social component which is a function of the distance of the particle from the best particle found thus far (i.e. the best of the personal bests).

The personal best position of particle i is calculated as

    y_i(t+1) = \begin{cases} y_i(t) & \text{if } f(x_i(t+1)) \geq f(y_i(t)) \\ x_i(t+1) & \text{if } f(x_i(t+1)) < f(y_i(t)) \end{cases}    (5)

for a fitness function f that is being minimized.

Two basic approaches to PSO exist, based on the interpretation of the neighborhood of particles. Equation (3) reflects the gbest version of PSO where, for each particle, the neighborhood is simply the entire swarm. The social component then causes particles to be drawn toward the best particle in the swarm. In the lbest PSO model, the swarm is divided into overlapping neighborhoods, and the best particle of each neighborhood is determined. For the lbest PSO model, the social component of equation (3) changes to

    c_2 r_{2,k}(t) (\hat{y}_{j,k}(t) - x_{i,k}(t))    (6)

where \hat{y}_j is the best particle in the neighborhood of the i-th particle.

The PSO is usually executed with repeated application of equations (3) and (4) until a specified number of iterations has been exceeded. Alternatively, the algorithm can be terminated when the velocity updates are close to zero over a number of iterations.
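As an illustration of equations (3)-(5), the sketch below performs one gbest PSO update of an entire swarm on a generic minimization problem. It is a minimal reading of the update rules rather than code from the paper; the sphere objective in the usage example is a placeholder, while w = 0.72 and c_1 = c_2 = 1.49 are the values the paper reports in section 5.

```python
import numpy as np

def pso_step(x, v, y, y_hat, f, w=0.72, c1=1.49, c2=1.49, rng=None):
    """One gbest PSO iteration over a whole swarm.
    x, v, y : arrays of shape (n_particles, n_dims) -- positions, velocities,
    personal bests; y_hat : global best position, shape (n_dims,)."""
    if rng is None:
        rng = np.random.default_rng()
    r1 = rng.random(x.shape)                      # r_1,k(t) ~ U(0,1), per dimension
    r2 = rng.random(x.shape)                      # r_2,k(t) ~ U(0,1), per dimension
    v = w * v + c1 * r1 * (y - x) + c2 * r2 * (y_hat - x)   # eq. (3)
    x = x + v                                                # eq. (4)
    improved = f(x) < f(y)                        # eq. (5): keep the better personal best
    y = np.where(improved[:, None], x, y)
    y_hat = y[f(y).argmin()]                      # best of the personal bests
    return x, v, y, y_hat

# Usage on a simple sphere function (placeholder objective).
f = lambda pts: np.sum(pts ** 2, axis=1)
rng = np.random.default_rng(0)
x = rng.uniform(-1, 1, (10, 5))
v = np.zeros_like(x)
y = x.copy()
y_hat = y[f(y).argmin()]
for _ in range(100):
    x, v, y, y_hat = pso_step(x, v, y, y_hat, f, rng=rng)
```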
4 PSO Clustering

In the context of clustering, a single particle represents the N_c cluster centroid vectors. That is, each particle x_i is constructed as follows:

    x_i = (m_{i1}, \ldots, m_{ij}, \ldots, m_{iN_c})    (7)
where m_{ij} refers to the j-th cluster centroid vector of the i-th particle in cluster C_{ij}. Therefore, a swarm represents a number of candidate clusterings for the current data vectors. The fitness of particles is easily measured as the quantization error,

    J_e = \frac{ \sum_{j=1}^{N_c} \left[ \sum_{\forall z_p \in C_{ij}} d(z_p, m_j) / |C_{ij}| \right] }{ N_c }    (8)

where d is defined in equation (1), and |C_{ij}| is the number of data vectors belonging to cluster C_{ij}, i.e. the frequency of that cluster.
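A direct NumPy reading of equation (8) is sketched below: the average, over clusters, of the mean distance between a cluster's data vectors and its centroid. The function name and the decision to average only over non-empty clusters are our own choices, not specified in the paper.

```python
import numpy as np

def quantization_error(centroids, data):
    """Quantization error J_e of eq. (8) for one particle (one set of centroids)."""
    dists = np.linalg.norm(data[:, None, :] - centroids[None, :, :], axis=2)
    labels = dists.argmin(axis=1)                 # each vector joins its closest centroid
    per_cluster = []
    for j in range(len(centroids)):
        member_dists = dists[labels == j, j]      # d(z_p, m_j) for all z_p in C_ij
        if member_dists.size > 0:                 # skip empty clusters (our choice)
            per_cluster.append(member_dists.mean())
    return float(np.mean(per_cluster))            # average of the per-cluster means
```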
This section first presents a standard gbest PSO for clustering data into a given number of clusters in section 4.1, and then shows how K-means and the PSO algorithm can be combined to further improve the performance of the PSO clustering algorithm in section 4.2.

4.1 gbest PSO Cluster Algorithm

Using the standard gbest PSO, data vectors can be clustered as follows:

1. Initialize each particle to contain N_c randomly selected cluster centroids.

2. For t = 1 to t_max do

   (a) For each particle i do

   (b) For each data vector z_p

       i. calculate the Euclidean distance d(z_p, m_{ij}) to all cluster centroids C_{ij}

       ii. assign z_p to cluster C_{ij} such that d(z_p, m_{ij}) = \min_{\forall c = 1, \ldots, N_c} \{ d(z_p, m_{ic}) \}

       iii. calculate the fitness using equation (8)

   (c) Update the global best and local best positions

   (d) Update the cluster centroids using equations (3) and (4).

where t_max is the maximum number of iterations.
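Putting the pieces together, the sketch below is one way to realize the loop above: each particle is a flat array of N_c concatenated centroids, the fitness is the quantization error of equation (8), and the particles are moved with the gbest updates of equations (3) and (4). The default settings (10 particles, 100 iterations, w = 0.72, c_1 = c_2 = 1.49) mirror the setup reported in section 5, but the code itself is an illustrative reading, not the authors' original implementation.

```python
import numpy as np

def fitness(particle, data, n_clusters):
    """Quantization error (eq. (8)) of one particle, i.e. one candidate clustering."""
    centroids = particle.reshape(n_clusters, -1)
    dists = np.linalg.norm(data[:, None, :] - centroids[None, :, :], axis=2)
    labels = dists.argmin(axis=1)
    means = [dists[labels == j, j].mean()
             for j in range(n_clusters) if np.any(labels == j)]
    return float(np.mean(means))

def pso_cluster(data, n_clusters, n_particles=10, t_max=100,
                w=0.72, c1=1.49, c2=1.49, seed=0):
    """gbest PSO clustering: each particle is a flattened set of N_c centroids."""
    rng = np.random.default_rng(seed)
    dim = n_clusters * data.shape[1]
    # Step 1: initialize each particle with N_c randomly selected data vectors.
    x = np.stack([data[rng.choice(len(data), n_clusters, replace=False)].ravel()
                  for _ in range(n_particles)])
    v = np.zeros_like(x)
    y = x.copy()                                            # personal best positions
    scores = np.array([fitness(p, data, n_clusters) for p in y])
    best = scores.min()
    y_hat = y[scores.argmin()].copy()                       # global best position
    for _ in range(t_max):
        for i in range(n_particles):
            f_i = fitness(x[i], data, n_clusters)           # steps (a)-(b): eq. (8)
            if f_i < scores[i]:                             # step (c): personal best
                scores[i], y[i] = f_i, x[i].copy()
            if f_i < best:                                  # step (c): global best
                best, y_hat = f_i, x[i].copy()
            r1, r2 = rng.random(dim), rng.random(dim)
            v[i] = w * v[i] + c1 * r1 * (y[i] - x[i]) + c2 * r2 * (y_hat - x[i])  # eq. (3)
            x[i] = x[i] + v[i]                              # eq. (4), step (d)
    return y_hat.reshape(n_clusters, data.shape[1])
```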
The population-based search of the PSO algorithm reduces the effect that initial conditions have, as opposed to the K-means algorithm; the search starts from multiple positions in parallel. Section 5 shows that the PSO algorithm performs better than the K-means algorithm in terms of quantization error.
4.2 Hybrid PSO and K-Means Clustering Algorithm

The K-means algorithm tends to converge faster (after less function evaluations) than the PSO, but usually with a less accurate clustering [13]. This section shows that the performance of the PSO clustering algorithm can further be improved by seeding the initial swarm with the result of the K-means algorithm. The hybrid algorithm first executes the K-means algorithm once. In this case the K-means clustering is terminated when (1) the maximum number of iterations is exceeded, or when (2) the average change in centroid vectors is less than 0.0001 (a user specified parameter). The result of the K-means algorithm is then used as one of the particles, while the rest of the swarm is initialized randomly. The gbest PSO algorithm as presented above is then executed.
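The seeding step itself is small. The sketch below shows one way to build the hybrid algorithm's initial swarm, using scikit-learn's KMeans purely as a convenient stand-in for the K-means pass (the paper's own K-means with the 0.0001 change threshold would serve the same role). The function name and swarm layout follow the pso_cluster sketch of section 4.1 and are assumptions, not the authors' code.

```python
import numpy as np
from sklearn.cluster import KMeans   # stand-in for the K-means pass described above

def seeded_swarm(data, n_clusters, n_particles=10, seed=0):
    """Build the hybrid algorithm's initial swarm: one particle carries the
    K-means result, the remaining particles are initialized randomly."""
    rng = np.random.default_rng(seed)
    km = KMeans(n_clusters=n_clusters, n_init=10, random_state=seed).fit(data)
    kmeans_particle = km.cluster_centers_.ravel()           # flattened centroids
    random_particles = [
        data[rng.choice(len(data), n_clusters, replace=False)].ravel()
        for _ in range(n_particles - 1)
    ]
    return np.stack([kmeans_particle] + random_particles)

# This swarm simply replaces the random initialization of x in the
# pso_cluster sketch of section 4.1; the gbest PSO then runs unchanged.
```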

5 Experimental Results

This section compares the results of the K-means, PSO and Hybrid clustering algorithms on six classification problems. The main purpose is to compare the quality of the respective clusterings, where quality is measured according to the following three criteria:

- the quantization error as defined in equation (8);

- the intra-cluster distances, i.e. the distance between data vectors within a cluster, where the objective is to minimize the intra-cluster distances;

- the inter-cluster distances, i.e. the distance between the centroids of the clusters, where the objective is to maximize the distance between clusters.

The latter two objectives respectively correspond to crisp, compact clusters that are well separated.
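The paper does not spell out here exactly how the two distance criteria are aggregated, so the sketch below shows one common reading: the mean pairwise distance between data vectors that share a cluster as the intra-cluster measure, and the mean distance between pairs of centroids as the inter-cluster measure. Both aggregation choices should be treated as assumptions.

```python
import numpy as np
from itertools import combinations

def intra_cluster_distance(data, labels):
    """One reading of the intra-cluster criterion: mean pairwise Euclidean
    distance between data vectors in the same cluster (smaller = more compact)."""
    dists = []
    for j in np.unique(labels):
        members = data[labels == j]
        dists += [np.linalg.norm(a - b) for a, b in combinations(members, 2)]
    return float(np.mean(dists))

def inter_cluster_distance(centroids):
    """One reading of the inter-cluster criterion: mean distance between
    pairs of cluster centroids (larger = better separated)."""
    pair_dists = [np.linalg.norm(a - b) for a, b in combinations(centroids, 2)]
    return float(np.mean(pair_dists))
```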

For all the results reported, averages over 30 simulations are given. All algorithms are run for 1000 function evaluations, and the PSO algorithms used 10 particles. For PSO, w = 0.72 and c_1 = c_2 = 1.49. These values were chosen to ensure good convergence [17].

The classification problems used for the purpose of this paper are

- Artificial problem 1: This problem follows the classification rule

      class = \begin{cases} 1 & \text{if } (z_1 \geq 0.7) \text{ or } ((z_1 \leq 0.3) \text{ and } (z_2 \geq -0.2 - z_1)) \\ 0 & \text{otherwise} \end{cases}    (9)

  with z_1, z_2 ~ U(-1, 1). A total of 400 data vectors were randomly created (a data-generation sketch is given after this list). This problem is illustrated in figure 1.

  Figure 1: Artificial rule classification problem defined in equation (9)

- Artificial problem 2: This is a 2-dimensional problem with 4 unique classes. The problem is interesting in that only one of the inputs is really relevant to the formation of the classes. A total of 600 patterns were drawn from four independent bivariate normal distributions, where classes were distributed according to equation (10), for i = 1, ..., 4, where \mu is the mean vector and \Sigma is the covariance matrix; m_1 = -3, m_2 = 0, m_3 = 3 and m_4 = 6. The problem is illustrated in figure 2.

  Figure 2: Four-class artificial classification problem defined in equation (10)

- Iris plants database: This is a well-understood database with 4 inputs, 3 classes and 150 data vectors.

- Wine: This is a classification problem with "well behaved" class structures. There are 13 inputs, 3 classes and 178 data vectors.

- Breast cancer: The Wisconsin breast cancer database contains 9 relevant inputs and 2 classes. The objective is to classify each data vector into benign or malignant tumors.

- Automotives: This is an 11-dimensional data set representing different attributes of more than 500 automobiles from a car selling agent.
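For concreteness, the data set for Artificial problem 1 can be generated directly from equation (9); the sketch below is our illustration of that rule, not the authors' generator.

```python
import numpy as np

rng = np.random.default_rng(0)
z = rng.uniform(-1.0, 1.0, size=(400, 2))        # 400 vectors with z1, z2 ~ U(-1, 1)
z1, z2 = z[:, 0], z[:, 1]
# Classification rule of equation (9).
labels = ((z1 >= 0.7) | ((z1 <= 0.3) & (z2 >= -0.2 - z1))).astype(int)
```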
Table 1 summarizes the results obtained from the three clustering algorithms for the problems above. The values reported are averages over 30 simulations, with standard deviations to indicate the range of values to which the algorithms converge. First, consider the fitness of solutions, i.e. the quantization error. For all the problems, except for Artificial 2, the Hybrid algorithm had the smallest average quantization error. For the Artificial 2 problem, the PSO clustering algorithm has a better quantization error, but not significantly better than the Hybrid algorithm. It is only for the Wine and Iris problems that the standard K-means clustering is not significantly worse than the PSO and Hybrid algorithms. However, for the Wine problem, both K-means and the PSO algorithms are significantly worse than the Hybrid algorithm.

When considering inter- and intra-cluster distances, the latter ensures compact clusters with little deviation from the cluster centroids, while the former ensures larger separation between the different clusters. With reference to these criteria, the PSO approaches succeeded most in finding clusters with larger separation than the K-means algorithm, with the Hybrid PSO algorithm doing so for 4 of the 6 problems. It is also the PSO approaches that succeeded in forming the more compact clusters. The Hybrid PSO formed the most compact clusters for 4 problems, the standard PSO for 1 problem, and the K-means algorithm for 1 problem.

The results above show a general improvement of performance when the PSO is seeded with the outcome of the K-means algorithm.

Figure 3 summarizes the effect of varying the number of clusters for the different algorithms for the first artificial problem. It is expected that the quantization error should go down with an increase in the number of clusters, as illustrated. Figure 3 also shows that the Hybrid PSO algorithm consistently performs better than the other two approaches with an increase in the number of clusters.

Figure 4 illustrates the convergence behavior of the algorithms for the first artificial problem. The K-means algorithm exhibited a faster, but premature convergence to a large quantization error, while the PSO algorithms had slower convergence, but to lower quantization errors. As indicated (refer to the circles) in figure 4, the K-means algorithm converged after 12 function evaluations, the Hybrid PSO algorithm after 82 function evaluations, and the standard PSO after 120 function evaluations.

Table 1: Results of the K-means, PSO and Hybrid clustering algorithms (means ± standard deviations over 30 simulations)

Problem         Algorithm   Quantization Error   Intra-cluster Distance   Inter-cluster Distance
Artificial 1    K-means     0.984±0.032          3.678±0.085              1.771±0.046
                PSO         0.769±0.031          3.826±0.091              1.142±0.052
                Hybrid      0.768±0.048          3.823±0.083              1.151±0.043
Artificial 2    K-means     0.264±0.001          0.911±0.027              0.796±0.022
                PSO         0.252±0.001          0.873±0.023              0.815±0.019
                Hybrid      0.250±0.001          0.869±0.018              0.814±0.011
Iris            K-means     0.649±0.146          3.374±0.245              0.887±0.091
                PSO         0.774±0.094          3.489±0.186              0.881±0.086
                Hybrid      0.633±0.143          3.304±0.204              0.852±0.097
Wine            K-means     1.139±0.125          4.202±0.223              1.010±0.146
                PSO         1.493±0.095          4.911±0.353              2.977±0.241
                Hybrid      1.078±0.085          4.199±0.514              2.799±0.111
Breast-cancer   K-means     1.999±0.054          6.599±0.332              1.824±0.251
                PSO         2.536±0.197          7.285±0.351              3.545±0.204
                Hybrid      1.890±0.125          6.551±0.436              3.335±0.097
Automotive      K-means     1030.714±44.69       11032.355±342.2          1037.920±22.14
                PSO         971.553±44.11        13675.675±341.3          988.818±22.44
                Hybrid      902.414±43.81        11895.797±340.7          952.892±21.55

Figure 3: Effect of different number of clusters on Artificial Problem 1

Figure 4: Algorithm convergence for Artificial Problem 1

6 Conclusions
This paper investigated the application of the PSO to cluster data vectors. Two algorithms were tested, namely a standard gbest PSO and a Hybrid approach where the individuals of the swarm are seeded by the result of the K-means algorithm. The two PSO approaches were compared against K-means clustering, which showed that the PSO approaches have better convergence to lower quantization errors, and in general, larger inter-cluster distances and smaller intra-cluster distances.
Future studies will extend the fitness function to also explicitly optimize the inter- and intra-cluster distances. More elaborate tests on higher dimensional problems and larger numbers of patterns will be done. The PSO clustering algorithms will also be extended to dynamically determine the optimal number of clusters.

Bibliography

[1] HC Andrews, Introduction to Mathematical Techniques in Pattern Recognition, John Wiley & Sons, New York, 1972.

[2] G Ball, D Hall, A Clustering Technique for Summarizing Multivariate Data, Behavioral Science, Vol. 12, pp 153-155, 1967.

[3] AP Engelbrecht, Sensitivity Analysis of Multilayer Neural Networks, PhD Thesis, Department of Computer Science, University of Stellenbosch, Stellenbosch, South Africa, 1999.

[4] IE Evangelou, DG Hadjimitsis, AA Lazakidou, C Clayton, Data Mining and Knowledge Discovery in Complex Image Data using Artificial Neural Networks, Workshop on Complex Reasoning on Geographical Data, Cyprus, 2001.

[5] LV Fausett, Fundamentals of Neural Networks, Prentice Hall, 1994.

[6] D Fisher, Knowledge Acquisition via Incremental Conceptual Clustering, Machine Learning, Vol. 2, pp 139-172, 1987.

[7] E Forgy, Cluster Analysis of Multivariate Data: Efficiency versus Interpretability of Classification, Biometrics, Vol. 21, pp 768-769, 1965.

[8] JA Hartigan, Clustering Algorithms, John Wiley & Sons, New York, 1975.

[9] J Kennedy, RC Eberhart, Particle Swarm Optimization, Proceedings of the IEEE International Joint Conference on Neural Networks, Vol. 4, pp 1942-1948, 1995.

[10] J Kennedy, RC Eberhart, Y Shi, Swarm Intelligence, Morgan Kaufmann, 2002.

[11] T Kohonen, Self-Organizing Maps, Springer Series in Information Sciences, Vol. 30, Springer-Verlag, 1995.

[12] T Lillesand, R Keifer, Remote Sensing and Image Interpretation, John Wiley & Sons, 1994.

[13] M Omran, A Salman, AP Engelbrecht, Image Classification using Particle Swarm Optimization, Proceedings of the 4th Asia-Pacific Conference on Simulated Evolution and Learning, Singapore, 2002.

[14] G Potgieter, Mining Continuous Classes using Evolutionary Computing, M.Sc Thesis, Department of Computer Science, University of Pretoria, Pretoria, South Africa, 2002.

[15] JR Quinlan, C4.5: Programs for Machine Learning, Morgan Kaufmann, San Mateo, 1993.

[16] MR Rao, Cluster Analysis and Mathematical Programming, Journal of the American Statistical Association, Vol. 22, pp 622-626, 1971.

[17] F van den Bergh, An Analysis of Particle Swarm Optimizers, PhD Thesis, Department of Computer Science, University of Pretoria, Pretoria, South Africa, 2002.

