Abstract

This study presents a comparative study of the implementation of clustering algorithms for the classification of analog modulated communication signals. A number of key features are used for characterizing the analog modulation types, and four different clustering algorithms are used for classifying the analog signals. These most representative clustering techniques are K-means clustering, fuzzy C-means clustering, mountain clustering and subtractive clustering. The performance of these clustering algorithms is compared, their advantages and disadvantages are examined, and a validity analysis is performed. The study is supported with computer simulations.
© 2005 Elsevier Ltd. All rights reserved.
one location to another. Usually, it is required to identify and monitor these signals for many applications, both defense and civilian. Civilian applications may include monitoring non-licensed transmitters, while defense applications may be electronic surveillance (Azzouz & Nandi, 1996) or warfare purposes like threat detection analysis and warning. Modulation recognition is extremely important in communication intelligence applications for several reasons. Firstly, applying the signal to an improper demodulator may partially or completely damage the signal information content. Secondly, knowing the correct modulation type helps to recognize the threat and to determine a suitable jamming waveform. At the moment, the most attractive application area is radio and other re-configurable communication systems.

In this paper, it is investigated how conventional clustering techniques work on modulation classification. For comparison, K-means clustering, fuzzy C-means clustering, mountain clustering and subtractive clustering techniques were selected and evaluated on a data set obtained from analog modulated communication signals. These modulations are amplitude modulated signals (AM), double side band modulated signals (DSB), upper side band signals (USB), lower side band signals (LSB) and frequency modulated signals (FM). Two key features, the standard deviation of the direct phase component of the intercepted signal and the signal spectrum symmetry around the carrier, are employed for forming the data points. A comparative study is carried out based on computer simulations. The analysis of modulation classification requires appropriate definitions of similarity measures to characterize differences between modulation types. However, such a comparative study, incorporating the characteristics of the modulation types, has not been discussed before. The advantages and disadvantages of the examined unsupervised clustering techniques, which are K-means clustering, fuzzy C-means clustering, mountain clustering and subtractive clustering, are investigated and simulation results are given.

2. Clustering

Clustering in N-dimensional Euclidean space R^N is the process of partitioning a given set of n points into a number, say K, of groups or clusters in such a way that patterns in the same cluster are similar in some sense and patterns in different clusters are dissimilar in the same sense. Let the set of n points {X1, X2, X3, …, Xn} be represented by the set S and the K clusters be represented by C1, C2, …, CK. Then Ci ≠ Ø for i = 1, 2, …, K; Ci ∩ Cj = Ø for i = 1, 2, …, K, j = 1, 2, …, K and i ≠ j; and ∪_{i=1}^{K} Ci = S. In this study, we examine four of the most representative clustering techniques, which are frequently used in radial basis function networks and fuzzy modeling (Jang & Sun, 1997). These are K-means clustering, fuzzy C-means clustering, the mountain clustering method and subtractive clustering. More detailed discussions of clustering techniques are presented in (Duda & Hart, 1973; Schürmann, 1996).

2.1. K-means clustering

K-means clustering, also known as C-means clustering, has been applied to a variety of areas, including image segmentation, speech data compression, data mining and so on. The steps of the K-means algorithm are first described in brief.

Step 1 Choose K initial cluster centers z1, z2, …, zK randomly from the n points {X1, X2, X3, …, Xn}.
Step 2 Assign point Xi, i = 1, 2, …, n to cluster Cj, j ∈ {1, 2, …, K} if ||Xi − zj|| < ||Xi − zp||, p = 1, 2, …, K and j ≠ p.
Step 3 Compute new cluster centers as follows

z_i^new = (1/n_i) Σ_{X_j ∈ C_i} X_j,   i = 1, 2, …, K   (1)

where n_i is the number of elements belonging to cluster Ci.
Step 4 If ||z_i^new − z_i|| < ε for i = 1, 2, …, K, then terminate. Otherwise continue from step 2.

Note that in case the process does not terminate at step 4 normally, it is executed for a maximum number of iterations.

2.2. Fuzzy C-means clustering

Fuzzy C-means clustering is a data clustering algorithm in which each data point belongs to a cluster to a degree specified by a membership grade. Bezdek proposed this algorithm in 1973 (Bezdek, 1973) as an improvement over the earlier K-means clustering described in the previous section. FCM partitions a collection of n vectors Xi, i = 1, 2, …, n into c fuzzy groups and finds a cluster center in each group such that a cost function of a dissimilarity measure is minimized. The steps of the FCM algorithm are described in brief.

Step 1 Choose the cluster centers ci, i = 1, 2, …, c randomly from the n points {X1, X2, X3, …, Xn}.
Step 2 Compute the membership matrix U using the following equation

μ_ij = 1 / Σ_{k=1}^{c} (d_ij / d_kj)^{2/(m−1)}   (2)

where d_ij = ||c_i − x_j|| is the Euclidean distance between the ith cluster center and the jth data point, and m is the fuzziness index.
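The K-means procedure of Section 2.1 can be sketched as follows. This is an illustrative Python/NumPy version of Steps 1–4, not the authors' original MATLAB implementation; the two-feature data set and parameter values below are hypothetical.

```python
import numpy as np

def k_means(X, K, eps=1e-6, max_iter=100, seed=0):
    """Sketch of the K-means steps: random initial centers,
    nearest-center assignment, center update, convergence test."""
    rng = np.random.default_rng(seed)
    # Step 1: choose K initial centers z1..zK randomly from the n points
    z = X[rng.choice(len(X), size=K, replace=False)]
    for _ in range(max_iter):
        # Step 2: assign each point Xi to the cluster with the nearest center
        d = np.linalg.norm(X[:, None, :] - z[None, :, :], axis=2)
        labels = d.argmin(axis=1)
        # Step 3: recompute each center as the mean of its members, Eq. (1)
        z_new = np.array([X[labels == j].mean(axis=0) if np.any(labels == j)
                          else z[j] for j in range(K)])
        # Step 4: stop when every center moves less than eps
        if np.all(np.linalg.norm(z_new - z, axis=1) < eps):
            z = z_new
            break
        z = z_new
    return z, labels

# Hypothetical two-feature data: two well-separated groups of points
rng = np.random.default_rng(1)
X = np.vstack([rng.normal(0, 0.1, (50, 2)), rng.normal(5, 0.1, (50, 2))])
centers, labels = k_means(X, K=2)
```

As the discussion in Section 6 notes, the result of this procedure depends on the randomly chosen initial centers, and K must be specified in advance.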
644 H. Guldemır, A. Sengur / Expert Systems with Applications 30 (2006) 642–649
Step 3 Compute the cost function according to the following equation. Stop the process if it is below a certain threshold

J(U, c_1, …, c_c) = Σ_{i=1}^{c} J_i = Σ_{i=1}^{c} Σ_{j=1}^{n} μ_ij^m d_ij²   (3)

Step 4 Compute the new c fuzzy cluster centers ci, i = 1, 2, …, c using the following equation

c_i = Σ_{j=1}^{n} μ_ij^m X_j / Σ_{j=1}^{n} μ_ij^m   (4)

and go to step 2.

2.3. Mountain clustering

The mountain clustering method, as proposed by Yager and Filev (Yager & Filev, 1994), is a relatively simple and effective approach to the approximate estimation of cluster centers on the basis of a density measure called the mountain function. The following is a brief description of the mountain clustering algorithm.

Step 1 Initialize the cluster centers by forming a grid on the data space, where the intersections of the grid lines constitute the candidates for cluster centers, denoted as a set C. A finer gridding increases the number of potential cluster centers, but it also increases the computation required.
Step 2 Construct a mountain function that represents a density measure of the data set. The height of the mountain function at a point c ∈ C is computed as

m(c) = Σ_{i=1}^{N} exp(−||c − x_i||² / (2σ²))   (5)

where x_i is the ith data point and σ is a design constant.
Step 3 Select the cluster centers by sequentially destructing the mountain function. First find the point in the candidate centers C that has the greatest value for the mountain function; this becomes the first cluster center c1. Obtaining the next cluster center requires eliminating the effect of the just-identified center, which is typically surrounded by a number of grid points that also have high density scores. This is realized by revising the mountain function as follows

m_new(c) = m(c) − m(c1) exp(−||c − c1||² / (2β²))   (6)

After subtraction, the second cluster center is again selected as the point in C that has the largest value for the new mountain function. This process of revising the mountain function and finding the next cluster centers continues until a sufficient number of cluster centers are attained.

2.4. Subtractive clustering

The mountain clustering method is simple and effective. However, its computation grows exponentially with the dimension of the problem. An alternative approach is subtractive clustering, proposed by Chiu (Chiu, 1994), in which the data points themselves are considered as the candidates for cluster centers. The algorithm continues as follows

Step 1 Consider a collection of n data points {X1, X2, X3, …, Xn} in an M-dimensional space. Since each data point is a candidate for a cluster center, a density measure at data point Xi is defined as

D_i = Σ_{j=1}^{n} exp(−||X_i − X_j||² / (r_a/2)²)   (7)

where r_a is a positive constant. Hence, a data point will have a high density value if it has many neighboring data points. The radius r_a defines a neighborhood; data points outside this radius contribute only slightly to the density measure.
Step 2 After the density measure of each data point has been calculated, the data point with the highest density measure is selected as the first cluster center. Let X_c1 be the point selected and D_c1 its density measure. Next, the density measure of each data point Xi is revised as follows

D_i = D_i − D_c1 exp(−||X_i − X_c1||² / (r_b/2)²)   (8)

where r_b is a positive constant.
Step 3 After the density calculation for each data point is revised, the next cluster center X_c2 is selected and all of the density calculations for the data points are revised again. This process is repeated until a sufficient number of cluster centers are generated.

3. Feature clustering and classification

The first step in any classification system is to identify the features that will be used to classify the data. Feature extraction is a form of data reduction, and the choice of feature set can affect the performance of the classification system. Some classifications can be determined from a single feature; however, most are confirmed by examining several features at once (Sengur & Guldemir, 2003, 2005). Algorithms that do this statistically are known as clustering
algorithms (Gerhard, 2000). Each piece of data, called a case, corresponds to an observation of a modulated signal, and the features extracted from that observation are called parameters. Clustering algorithms work by examining a large number of cases and finding groups of cases with similar parameters. These groups are called clusters and are considered to belong to the same category in the classification.

4. Signal generation and implementation

In the modulation schemes, two types of signals are used. These signals are a real voice signal and a simulated voice signal, both band-limited to 4 kHz. The simulated voice signal is produced by a first order autoregressive process of the form (Dubuc, Boudreau, Patenaude, & Inkol, 1999)

y[k] = 0.95 y[k−1] + n[k]   (9)

where n[k] is a white Gaussian noise.

A modulated signal s(t) can be expressed by a function of the form

s(t) = a_c a(t) cos(2π f_c t + φ(t) + θ_0)   (10)

where a(t) is the signal envelope, f_c is the carrier frequency, φ(t) is the phase, θ_0 is the initial phase and a_c controls the carrier power. Particular modulation types are obtained by encoding the baseband message into a(t) and φ(t). The modulation types were restricted to the types commonly used in analog communication. AM, DSB, SSB and FM signals are expressed as follows, respectively

s(t) = [1 + m x(t)] cos(2π f_c t)   (11)

s(t) = x(t) cos(2π f_c t)   (12)

s(t) = x(t) cos(2π f_c t) ∓ y(t) sin(2π f_c t)   (13)

s(t) = cos(2π f_c t + K_f ∫_{−∞}^{t} x(τ) dτ)   (14)

where m is the modulation index, x(t) is the modulating signal, f_c is the carrier frequency, y(t) is the Hilbert transform of x(t), and K_f is the frequency deviation coefficient of the FM signal. In the expression given for SSB, the negative sign is used for upper side-band (USB) signal generation and the positive sign is used for lower side-band (LSB) signal generation.

In order to increase the accuracy of the classification, a number of simulations have been done with theoretically produced modulated signals with different parameters, such as various signal-to-noise ratios and modulation indices. Sixty simulated signals of each of the modulation types DSB, LSB and USB have been generated. One hundred and twenty signals for AM with modulation indices of 0.3 and 1, and 180 signals for FM with frequency modulation indices of 1, 5, and 10 are generated. In total, 480 modulated signals are used for the classification. These signals are generated and processed using Matlab functions in the Communication Toolbox. An additive white Gaussian noise with an SNR of between 0 and 60 dB is used in the modeling of the theoretically produced analog modulated signals.

In the simulations, a first order autoregressive 4 kHz band-limited voice signal, sampled at 10 kHz, resampled at 44 kHz and modulated by a 15 kHz sinusoidal carrier, is used. Fig. 1a shows the theoretically produced signal. In order to incorporate the classification system in a real application, the system is also tested with the real voice signal shown in Fig. 1b.

Fig. 1. (a) Theoretically produced first order autoregressive signal; (b) real voice signal.

The experimental results comparing the examined clustering algorithms are provided for a data set which is generated from the five analog modulated signals. This is a non-overlapping two-dimensional data set, where the number of clusters is five. The data set is generated as follows: the source signal is modulated using the analog modulation schemes. An additive white Gaussian noise (AWGN) is introduced to the modulated signal such that the signal has a signal-to-noise ratio randomly distributed in the 0 to 60 dB range, and the features are extracted from these modulated signals.
Two key features are used in this study for generating the data set. The first feature is the standard deviation of the direct phase component of the intercepted signal, and it is calculated as follows (Azzouz & Nandi, 1996)

σ_dp = √[ (1/C) Σ_{A_n(i) > t_a} φ_NL²(i) − ( (1/C) Σ_{A_n(i) > t_a} φ_NL(i) )² ]   (15)

carrier frequency, f_c, and f_cn is defined as

f_cn = (f_c N_s / f_s) − 1   (19)

here, it is assumed that the carrier frequency is known. The P versus σ_dp features for the data sets from the autoregressive voice signal and the real voice signal are shown in Fig. 2.

Fig. 2. P versus σ_dp: (a) for first order autoregressive voice signal; (b) for real voice signal.
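The σ_dp computation of Eq. (15) can be sketched as follows. This is an illustrative Python version under stated assumptions: the extraction of the non-linear phase φ_NL and the normalized amplitude A_n from the intercepted signal (defined in Azzouz & Nandi, 1996) is taken as given, and the function and variable names are ours.

```python
import numpy as np

def sigma_dp(phase_nl, amplitude, t_a):
    """Eq. (15): standard deviation of the direct (non-linear) phase,
    computed only over samples whose normalized amplitude A_n(i) exceeds
    the threshold t_a; C is the number of such samples."""
    mask = amplitude > t_a          # keep samples above the amplitude threshold
    phi = phase_nl[mask]
    C = phi.size
    return np.sqrt(np.sum(phi**2) / C - (np.sum(phi) / C) ** 2)

# Hypothetical inputs: a constant phase over the retained samples gives 0
amp = np.array([0.2, 0.9, 1.1, 1.3, 0.1])
phi = np.array([0.5, 0.5, 0.5, 0.5, 0.5])
print(sigma_dp(phi, amp, t_a=0.8))   # → 0.0
```

The amplitude threshold discards weak-signal samples whose phase estimates are unreliable, which is why the sums in Eq. (15) run only over A_n(i) > t_a.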
the variation of the DB index, XB index and PBM index with the number of clusters in the range 2–8, when the fuzzy C-means algorithm is used for clustering. The optimum values of the indices are presented in boldface in the table.

6. Results and discussion

In this study, the four most popular clustering algorithms are examined and a comparative study on the analog modulated communication signals is performed. The performances of the clustering algorithms are tested with MATLAB based computer simulations. The results are shown in Figs. 3–7 for the real voice signal. The K-means algorithm is widely used in pattern recognition applications, but it may converge to values that are not optimal. Also, global solutions of large problems cannot be found with a reasonable amount of computation. In this study, the K-means algorithm converged to different values depending on the initial cluster centers. On the other hand, the number of clusters in the data sets must be specified before the process. The fuzzy C-means algorithm produced the best results even when the initial cluster centers were changed, but there is no guarantee that FCM always converges to an optimal solution. Mountain clustering is based on what a human does in visually forming clusters of a data set. Here, the σ parameter affects the height as well as the smoothness of the mountain function. The surface plot of the mountain function with σ = 0.05 is shown in Fig. 4. The mountain clustering application results are satisfactory. However, its computation grows exponentially with the dimension of the problem because the method must evaluate the mountain function over all grid points. The subtractive clustering method aims to overcome this problem by
Table 4 Mountain K-means
Table 5 Mountain fuzzy c-means
Table 3 Performance of the k-means algorithm
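The subtractive clustering procedure of Section 2.4, discussed above as the remedy for the mountain method's grid cost, can be sketched as follows. This is an illustrative Python version of Eqs. (7) and (8); the radii r_a, r_b, the fixed number of centers and the toy data are assumptions, not the paper's settings.

```python
import numpy as np

def subtractive_clustering(X, n_centers, ra=0.5, rb=0.75):
    """Sketch of Chiu's subtractive clustering: every data point is a
    candidate center; densities per Eq. (7), revision per Eq. (8)."""
    # Eq. (7): density of each point from its neighbours within radius ra
    d2 = np.sum((X[:, None, :] - X[None, :, :]) ** 2, axis=2)
    D = np.sum(np.exp(-d2 / (ra / 2) ** 2), axis=1)
    centers = []
    for _ in range(n_centers):
        c = int(np.argmax(D))               # point with highest density
        centers.append(X[c])
        # Eq. (8): subtract the influence of the just-selected center
        D = D - D[c] * np.exp(-d2[:, c] / (rb / 2) ** 2)
    return np.array(centers)

# Hypothetical data: two tight groups; one center is found in each group
X = np.array([[0.0, 0.0], [0.1, 0.0], [0.0, 0.1],
              [5.0, 5.0], [5.1, 5.0], [5.0, 5.1]])
centers = subtractive_clustering(X, n_centers=2)
```

Because only the n data points are evaluated, the cost grows with n rather than exponentially with the dimension, which is the advantage over mountain clustering noted in the discussion.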