
2012 International Conference on Communication, Information & Computing Technology (ICCICT), Oct. 19-20, Mumbai, India

Background Removal using K-Means Clustering as a Preprocessing Technique for DWT Based Face Recognition

Surabhi A.R., Student, Dept. of Electronics and Communication Engg., M.S. Ramaiah Institute of Technology, Bangalore-560054, INDIA, surabhiram.1991@gmail.com
Shwetha T. Parekh, Student, Dept. of Electronics and Communication Engg., M.S. Ramaiah Institute of Technology, Bangalore-560054, INDIA, shwetha.parekh@gmail.com
K. Manikantan, Associate Professor, Dept. of Electronics and Communication Engg., M.S. Ramaiah Institute of Technology, Bangalore-560054, INDIA, kmanikantan@msrit.edu
S. Ramachandran, Professor, Dept. of Electronics and Communication Engg., S.J.B. Institute of Technology, Bangalore-560060, INDIA, ramachandr@gmail.com

Corresponding Author

Abstract: Face Recognition (FR) under varying background conditions is challenging, and extracting background invariant features is an effective approach to solve this problem. In this paper, we propose a novel method for background removal based on the k-means clustering algorithm, which lays the ground for DWT-based feature extraction to enhance the performance of a FR system. Individual stages of the FR system are examined and an attempt is made to improve each stage. A Binary Particle Swarm Optimization (BPSO)-based feature selection algorithm is used to search the feature vector space for the optimal feature subset. Experimental results, obtained by applying the proposed algorithm on ORL, UMIST, Extended Yale B and ColorFERET databases, show that the proposed system outperforms other FR systems. A significant increase in the overall recognition rate and a substantial reduction in the number of features are observed.

I. INTRODUCTION

Face Recognition (FR) is one of the most important research areas today mainly because of its applications in the fields of authentication, security and intelligence. Many approaches to FR have been developed [1]. The main component of a FR system is the feature extractor. Fourier Transform (FT), Discrete Cosine Transform (DCT), Discrete Wavelet Transform (DWT) and Principal Component Analysis (PCA) are such feature extractors. The effectiveness of an extractor is determined by its ability to extract the best discriminant features of the face [2]. Feature Selection involves the derivation of a subset of the extracted features. Binary Particle Swarm Optimization (BPSO), Genetic Algorithm (GA), Bacterial Foraging (BF), Artificial Neural Networks (ANN) and Joint Boosting Feature Selection [3] are a few feature selectors.

Image quality is subject to factors such as illumination, camera quality, resolution and background. It is desirable to neutralise the effects caused by such factors. Image preprocessing techniques provide the user with subjective tools and thus bring out the contrast of details contained in the images. The preprocessing techniques used in this paper are explained in Section II-A.

The Discrete Wavelet Transform (DWT) is used as the feature extractor and the BPSO is used for feature selection. Preprocessed FERET images obtained using the proposed model are found to give extremely good FR accuracies. A suitable combination of the above mentioned preprocessing techniques is applied to Extended Yale B, ORL and UMIST databases and the results are tabulated.

For enhanced face recognition, this paper proposes the following novel ideas.
1) Background removal using k-means clustering for FERET database.
2) A feature extractor and feature selector model independent of the dataset used.

The rest of the paper is organised as follows. The preprocessing techniques, Discrete Wavelet Transform and BPSO concepts are explained in Section II. In Section III we introduce the concept of k-means clustering and its use in background removal for FERET images. We also explain the proposed preprocessing flow for the databases ORL, UMIST, Extended Yale B and FERET that achieves high recognition accuracies. The proposed models are supported by experimental results thus establishing the success of the models. Finally, Section IV contains the conclusion.

Fig. 1. General block diagram of proposed Face Recognition system

978-1-4577-2078-9/12/$26.00 ©2011 IEEE

II. FUNDAMENTAL CONCEPTS

The general face recognition system is as shown in Fig. 1. The properties of the image are first suitably enhanced using image preprocessing techniques as in Section II-A. It is required that the important features be extracted from the images in order to perform recognition. This paper employs the DWT extractor as explained in Section II-B. The performance of a FR system depends mainly on the features selected. BPSO is an optimal feature selection tool. The algorithm and working of the same are explained in Section II-C.

A. Image Enhancement Techniques

Image enhancement is a basic image processing task that enables us to have subjective judgement over images. These methods provide better contrast of details contained in the images. Background Removal removes the unwanted components of the image. This is achieved by various methods. k-means clustering for image segmentation and thus background removal for FERET images is proposed in this paper.

1) Logarithmic Transformation: The logarithmic operator is a simple point processor where the mapping function is a logarithmic curve. In other words, each pixel value is replaced with its logarithm. Most implementations take either the natural logarithm or the base 10 logarithm. The logarithmic mapping function is given by

$$F = c \log(1 + r) \tag{1}$$

The value of the constant $c$ in Eq. (1) is flexible and can be used according to requirements. $r$ is the pixel intensity. Fig. 2 shows the result of applying log transform. It maps a narrow range of low-level intensities into a wide range of intensities and maps a wide range of high-level intensities into a narrow range of high-level intensities, thus increasing the overall brightness of the image [4].

Fig. 2. Logarithmic Transform

2) Histogram Equalization: Histogram equalization is an effective contrast enhancement technique. It works especially well in regions of low contrast by distributing the pixels over a wider intensity range [5]. Histogram equalization accomplishes this by effectively spreading out the most frequent intensity values. The method is useful in images with backgrounds and foregrounds that are both bright or both dark. A disadvantage of this method is that it is indiscriminate. It may increase the contrast of background noise, while decreasing the usable signal. Fig. 3 shows the result of HE.

Fig. 3. Histogram Equalization

3) Scale Normalization: Scale Normalization is a preprocessing technique used to remove unwanted components in an image. In this paper, scale normalization is used to retain only that region of the image where the face is present. It is achieved as follows. An edge detection operation is performed on the image. The resulting binary image is scanned horizontally from top and bottom to find the first edges. These rows are stored as Rmin and Rmax. Next, the image is scanned vertically from left and right sides to find the first edges. These columns are stored as Cmin and Cmax. The scale normalized image consists of the rows Rmin:Rmax and columns Cmin:Cmax of the original image [6].

4) Digital Unsharp Masking: This technique is an extremely interesting and highly simplistic way to increase the sharpness of an image. A smoothened version of an image is subtracted from the original image to give what is called a mask. A weighted portion of this mask is added back to the original image, thus enhancing the edges and the sharpness of the image [4]. Let $f$ be the image and $f_{LP}$ be the low-pass filtered version. The mask is given by $f - f_{LP}$. The resulting image is $g = f + k \cdot \text{mask}$, where $k$ is a constant greater than 1 for high boost filtering. Fig. 4 shows the result of unsharp masking.

Fig. 4. Unsharp Masking with k=1

B. Discrete Wavelet Transform

Discrete Wavelet Transform is a mathematical tool capable of providing both the spatial and frequency representations of the image simultaneously. This capability of DWT promotes its use in face recognition. The decomposition of data into different ranges allows us to isolate the frequency components introduced by intrinsic deformations due to expressions or extrinsic factors (like illumination) into certain sub bands [2]. There exists a large set of wavelet families depending upon the choice of the mother wavelet. The low frequency components give a global description while the high frequency components concentrate on the finer details of the image. As shown in Fig. 5, at the end of each level of wavelet decomposition, four new images called LL, LH, HL and HH are created from the original image. The LL image is a reduced version of the original and retains most details. The LH image contains horizontal edge features, while the HL image contains vertical edge features. The HH image contains only high frequency information and is typically noisy. In wavelet decomposition, only the LL image is used to produce the next level of decomposition as in Fig. 5.

Fig. 5. 2-D Wavelet Decomposition of a face image. (a) Original image (b) 1-level wavelet decomposition (c) 2-level wavelet decomposition

In this paper, face recognition using DWT is based on the facial features extracted from the Reverse Biorthogonal Wavelet Transform.

C. Binary Particle Swarm Optimization

Particle Swarm Optimization, first introduced by Kennedy and Eberhart, is an algorithm based on the social behavior of bees and birds [7], [8]. This method searches the problem space iteratively and converges to an optimum solution. The position of each particle is updated by a velocity vector using prior knowledge about the best position of the particle and that of the swarm as a whole. In each iteration, each particle is evaluated based on the value returned by a Fitness Function. In Binary Particle Swarm Optimization (BPSO) [9], the particle position is coded as a binary string. The parameters used are Swarm Size N=30, cognitive factor $c_1$=2, social factor $c_2$=2, Inertia Weight $\omega$=0.6, Number of Iterations=100.

1) BPSO Algorithm: The algorithm for BPSO operation is as follows:
Step 1: Initialize parameters $c_1$, $c_2$, $r_1$ and $r_2$
Step 2: Generate N particles with random positions and velocities
Step 3: Evaluate particle's fitness using fitness function given in Eq. (4)
Step 4: If fitness > fitness of particle's LBest, update LBest.
Step 5: If fitness > fitness of present GBest, update GBest.
Step 6: If stopping criteria are satisfied, terminate process and return feature vector. Else, update the velocity of each particle using Eq. (2) and position using Eq. (3)

The velocities of the particles are updated as

$$V_{i+1} = \omega V_i + r_1 c_1 (LBest - X_i) + r_2 c_2 (GBest - X_i) \tag{2}$$

where $V_{i+1}$ is the updated velocity vector, $V_i$ is the present velocity vector, LBest is the Local Best Position of the particle, GBest is the best position attained by the entire swarm, $\omega$ is the inertia factor.

The positions of the particles are updated as

$$\text{if } rand < \frac{1}{1 + e^{-V_{i+1}}}, \quad X_{i+1} = 1 \quad \text{else} \quad X_{i+1} = 0 \tag{3}$$

where $X_{i+1}$ is the updated position and rand is a string of random numbers between 0 and 1.

2) Fitness Function: The fitness function mainly aims at increasing the class separation, which optimizes the recognition process. The class means and global means are calculated as follows

$$M_i = \frac{1}{N_i} \sum_{j=1}^{N_i} W_j^{(i)}, \quad i = 1, 2, 3, \ldots, L$$

where $W_j^{(i)}$, $j = 1, 2, \ldots, N_i$ represents the sample images from class $w_i$ and the grand mean $M_0$ is given by

$$M_0 = \frac{1}{N} \sum_{i=1}^{L} N_i M_i$$

where $N$ is the total number of images for all the classes. Thus the fitness function $F$ is computed as follows

$$F = \sqrt{\sum_{i=1}^{L} (M_i - M_0)^T (M_i - M_0)} \tag{4}$$

where $T$ denotes transpose of the matrix.

In this paper, the BPSO is used as a feature selector and the parameters are defined above.

3) Euclidean Classifier: The Euclidean Distance Classifier is used to measure the similarity between test vector and the reference vectors in the gallery [2]. The reference vectors are the feature vectors obtained from the training images by applying DWT and BPSO feature selection. The Euclidean distance between any two vectors in space is given as

$$D = \sqrt{\sum_{i=1}^{N} (p_i - q_i)^2} \tag{5}$$

where $p_i$ or $q_i$ is the coordinate of $p$ or $q$ respectively in dimension $i$.
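
The sub-band split described in Section II-B can be illustrated on a tiny image. The following is a minimal sketch under stated assumptions, not the implementation used in this paper: it uses the Haar wavelet for brevity instead of the reverse biorthogonal wavelet, the function names `haar_dwt2` and `decompose` are hypothetical, the image is a list of rows with even height and width, and the LH and HL labels follow the convention of Section II-B (LH responds to horizontal edges, HL to vertical edges).

```python
def haar_dwt2(image):
    """One level of 2-D Haar decomposition on a list-of-rows image.

    Assumes even height and width. Returns (LL, LH, HL, HH), each half-size.
    """
    rows, cols = len(image), len(image[0])
    ll, lh, hl, hh = [], [], [], []
    for r in range(0, rows, 2):
        ll_row, lh_row, hl_row, hh_row = [], [], [], []
        for c in range(0, cols, 2):
            a, b = image[r][c], image[r][c + 1]          # top-left, top-right
            d, e = image[r + 1][c], image[r + 1][c + 1]  # bottom-left, bottom-right
            ll_row.append((a + b + d + e) / 2.0)  # approximation (local sum, scaled)
            lh_row.append((a + b - d - e) / 2.0)  # top minus bottom: horizontal edges
            hl_row.append((a - b + d - e) / 2.0)  # left minus right: vertical edges
            hh_row.append((a - b - d + e) / 2.0)  # diagonal detail
        ll.append(ll_row)
        lh.append(lh_row)
        hl.append(hl_row)
        hh.append(hh_row)
    return ll, lh, hl, hh


def decompose(image, levels):
    """Apply haar_dwt2 repeatedly to the LL band and return the final LL band."""
    ll = image
    for _ in range(levels):
        ll, _lh, _hl, _hh = haar_dwt2(ll)
    return ll
```

Only the LL band is passed to the next level, which mirrors the multi-level decomposition of Fig. 5. Each level halves both image dimensions, so the number of usable levels depends on the image size.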

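
The BPSO procedure of Section II-C, with the velocity update of Eq. (2), the sigmoid position update of Eq. (3) and the fitness function of Eq. (4), can be sketched as follows. This is an illustrative sketch rather than the MATLAB code behind the reported results. The names `fitness` and `bpso`, the initial velocity range of [-1, 1], the per-dimension draws of r1 and r2 and the fixed random seed are assumptions. Each class is assumed to be a list of equal-length feature vectors, and a particle is a binary mask over the feature positions.

```python
import math
import random


def fitness(mask, classes):
    """Eq. (4): scatter of the class means about the grand mean,
    computed only on the features whose mask bit is 1."""
    selected = [d for d, bit in enumerate(mask) if bit]
    if not selected:
        return 0.0
    total = sum(len(samples) for samples in classes)
    # Class means M_i restricted to the selected features.
    means = [[sum(s[d] for s in samples) / len(samples) for d in selected]
             for samples in classes]
    # Grand mean M_0 = (1/N) * sum_i N_i * M_i.
    grand = [sum(len(samples) * m[j] for samples, m in zip(classes, means)) / total
             for j in range(len(selected))]
    return math.sqrt(sum((m[j] - grand[j]) ** 2
                         for m in means for j in range(len(selected))))


def bpso(classes, swarm_size=30, iterations=100, c1=2.0, c2=2.0, w=0.6, seed=0):
    """Return (best_mask, best_fitness) found by a binary particle swarm."""
    rng = random.Random(seed)
    dims = len(classes[0][0])
    pos = [[rng.randint(0, 1) for _ in range(dims)] for _ in range(swarm_size)]
    vel = [[rng.uniform(-1.0, 1.0) for _ in range(dims)] for _ in range(swarm_size)]
    lbest = [p[:] for p in pos]
    lbest_fit = [fitness(p, classes) for p in pos]
    g = max(range(swarm_size), key=lambda i: lbest_fit[i])
    gbest, gbest_fit = lbest[g][:], lbest_fit[g]
    for _ in range(iterations):
        for i in range(swarm_size):
            for d in range(dims):
                r1, r2 = rng.random(), rng.random()
                # Eq. (2): inertia term plus pulls towards LBest and GBest.
                vel[i][d] = (w * vel[i][d]
                             + r1 * c1 * (lbest[i][d] - pos[i][d])
                             + r2 * c2 * (gbest[d] - pos[i][d]))
                # Eq. (3): the bit becomes 1 with probability sigmoid(velocity).
                threshold = 1.0 / (1.0 + math.exp(-vel[i][d]))
                pos[i][d] = 1 if rng.random() < threshold else 0
            fit = fitness(pos[i], classes)
            if fit > lbest_fit[i]:
                lbest[i], lbest_fit[i] = pos[i][:], fit
            if fit > gbest_fit:
                gbest, gbest_fit = pos[i][:], fit
    return gbest, gbest_fit
```

The default arguments follow the parameter values listed in Section II-C (swarm size 30, c1 = c2 = 2, inertia weight 0.6, 100 iterations). A call such as `bpso(classes)` returns the mask with the highest fitness seen by any particle.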
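
The Euclidean classifier of Section II-C, Eq. (5), amounts to a nearest-neighbour search over the face gallery. The sketch below is a minimal illustration. The names `euclidean_distance` and `classify`, and the representation of the gallery as a list of (label, feature vector) pairs, are assumptions made for the example.

```python
import math


def euclidean_distance(p, q):
    """Eq. (5): distance between two equal-length feature vectors."""
    return math.sqrt(sum((pi - qi) ** 2 for pi, qi in zip(p, q)))


def classify(test_vector, gallery):
    """Return the label of the gallery vector closest to test_vector.

    gallery is a list of (label, feature_vector) pairs.
    """
    best_label, _ = min(gallery,
                        key=lambda item: euclidean_distance(test_vector, item[1]))
    return best_label
```

In the FR model, the test vector and the gallery vectors would both be the selected DWT features, so every vector has the same length.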

III. DISCUSSION OF PROPOSED FR SYSTEMS AND EXPERIMENTAL RESULTS

FERET database images have significant, slightly varying background which interferes in the FR process. Hence, techniques to overcome this effect are detailed in Experiment 1. ORL and UMIST databases contain images that have homogeneous background. Experiment 2 details the proposed FR model for these databases. Extended Yale B images have a wide variation of illumination and thus they require an illumination correction technique. This is implemented in Experiment 3.

A. Experiment 1: FERET Database

The FERET database is a standard dataset that includes 11338 facial images of 994 individuals [10]. It has the following subsets: fa and fb are frontal images, ra, rb, rc, rd and re are random orientations, ql, qr, pl, pr, hl and hr are quarter left, quarter right, profile left, profile right, half left and half right respectively. We have chosen 20 images each for 35 random subjects such that there are 4 images from ra to re and 2 images each from the remaining subsets. We have used the smaller version of the images, that is, 384 × 256 pixels, for our experiments.

The following sections describe the proposed background removal technique, the preprocessing model and the FR model.

1) Proposed Background Removal Technique Using k-means Clustering: Separating dynamic objects, such as people, from a relatively static background scene is a very important preprocessing step in many applications. Accurate and efficient background removal is critical for face recognition as the background information [11] may interfere with the useful facial information. In this section, the proposed background removal using k-means clustering [12] is explained.

Clustering is a process of organizing objects into groups whose members are similar in some way. A cluster is therefore a collection of objects which are similar to one another and are dissimilar to objects belonging to other clusters [13]. The types of clustering are distance-based and concept-based. Here we employ a distance-based clustering algorithm to achieve image segmentation. k-means Clustering, Fuzzy C-Means, Hierarchical Clustering and Mixture of Gaussians are a few clustering algorithms [14]. k-means is the simplest algorithm that classifies data based on their distance from each other [15]. The application of k-means clustering in order to achieve image segmentation such that the background is isolated as one segment is the main idea behind this concept. The process flow is described in Fig. 6. The k-means clustering algorithm is applied to FERET database images. The proposed algorithm is as follows:
Step 1: k cluster centers are randomly assigned. The values of these centroids are random gray-level intensities.
Step 2: Each pixel in the image is assigned to that cluster whose centroid value is closest to the pixel intensity value.
Step 3: The Euclidean Distance measurement is used to determine the closeness between the pixel and the centroids.
Step 4: The values of the cluster centroids are recalculated by averaging the intensity values of all the pixels in each cluster.
Steps 2-4 are repeated until no pixel changes its cluster. Thus, at the end of this procedure, the image has k segments.

Fig. 6. Process flow for k-means clustering

The total squared Euclidean distance between the pixels and the centroids of their clusters is given by

$$V = \sum_{i=1}^{k} \sum_{x_j \in S_i} (x_j - \mu_i)^2$$

where there are k clusters $S_i$, $i = 1, 2, \ldots, k$ and $\mu_i$ is the centroid or mean point of all the points $x_j \in S_i$.

The k-means clustering method [16] described above was applied to FERET database. The segmented images with different values of k are as shown in Fig. 7. It is observed that, for k=3, the image is divided into 3 segments in such a way that the background is a single segment. This forms the foundation for using this method. A binary image is created such that all non-white pixels of the segmented image are given the value 1 and all white pixels of the segmented image are given the value 0. Reconstruction of the image is performed from the original image for the regions with value 1 in the binary image. The resulting image is found to have been stripped of the background very effectively. This step has a large effect on the recognition accuracy.

Fig. 7. Result of clustering for different values of k. (a) Original (b) k=3 (c) k=4 (d) k=5

2) Proposed Preprocessing on FERET images: The images from FERET database are subjected to k-means clustering with k=3. As explained in Section III-A1, background removal is performed. To this image, scale normalization is applied so that only the facial part of the image is retained. Successive applications of Logarithmic Transformation, Unsharp Masking and Histogram Equalization are performed. This is illustrated in Fig. 8.
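
The k-means background removal of Section III-A1 can be sketched on gray-level intensities as follows. This is a minimal sketch under stated assumptions, not the authors' code. The names `kmeans_gray` and `remove_background` are hypothetical, the initial centroids are drawn from the distinct gray levels present in the image (so the image must contain at least k distinct values), the background is taken to be the cluster with the brightest centroid, and background pixels are set to 0 in the output.

```python
import random


def kmeans_gray(image, k=3, seed=0, max_iter=100):
    """Cluster pixel intensities into k groups. Returns (labels, centroids)."""
    rng = random.Random(seed)
    pixels = [v for row in image for v in row]
    # Step 1: random gray levels as initial centroids (assumption: drawn
    # from the distinct values in the image so no cluster starts empty).
    centroids = rng.sample(sorted(set(pixels)), k)
    assignment = None
    for _ in range(max_iter):
        # Steps 2-3: nearest centroid; in one dimension the Euclidean
        # distance reduces to the absolute difference.
        new_assignment = [min(range(k), key=lambda c: abs(v - centroids[c]))
                          for v in pixels]
        if new_assignment == assignment:
            break  # no pixel changed its cluster
        assignment = new_assignment
        # Step 4: recompute each centroid as the mean of its pixels.
        for c in range(k):
            members = [v for v, a in zip(pixels, assignment) if a == c]
            if members:
                centroids[c] = sum(members) / len(members)
    width = len(image[0])
    labels = [assignment[i:i + width] for i in range(0, len(pixels), width)]
    return labels, centroids


def remove_background(image, k=3, seed=0):
    """Zero out the pixels of the cluster assumed to be the background."""
    labels, centroids = kmeans_gray(image, k, seed)
    background = max(range(k), key=lambda c: centroids[c])  # assumption: brightest
    return [[0 if lab == background else v for v, lab in zip(row, lab_row)]
            for row, lab_row in zip(image, labels)]
```

Because the initial centroids are random, different seeds can lead to different segmentations on real images. Which cluster corresponds to the background would need to be checked per dataset.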

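
The scale normalization step used in the FERET preprocessing chain (Section II-A3) crops the image to the rows Rmin:Rmax and columns Cmin:Cmax spanned by the detected edges. The sketch below illustrates the idea only. The edge detector is a deliberately simple neighbour-difference threshold chosen for brevity, since the paper does not state which edge operator was used, and the names `edge_map` and `scale_normalize` and the default threshold of 30 are assumptions.

```python
def edge_map(image, threshold=30):
    """Binary edge image: a pixel is marked when it differs from its right
    or lower neighbour by more than threshold (a stand-in edge detector)."""
    rows, cols = len(image), len(image[0])
    edges = [[0] * cols for _ in range(rows)]
    for r in range(rows):
        for c in range(cols):
            right = abs(image[r][c] - image[r][c + 1]) if c + 1 < cols else 0
            down = abs(image[r][c] - image[r + 1][c]) if r + 1 < rows else 0
            if max(right, down) > threshold:
                edges[r][c] = 1
    return edges


def scale_normalize(image, threshold=30):
    """Crop image to the bounding box of its edge pixels."""
    edges = edge_map(image, threshold)
    edge_rows = [r for r, row in enumerate(edges) if any(row)]
    edge_cols = [c for c in range(len(edges[0])) if any(row[c] for row in edges)]
    if not edge_rows:
        return image  # no edges found: leave the image unchanged
    r_min, r_max = edge_rows[0], edge_rows[-1]
    c_min, c_max = edge_cols[0], edge_cols[-1]
    return [row[c_min:c_max + 1] for row in image[r_min:r_max + 1]]
```

With this particular detector an edge is marked on the pixel just before an intensity jump, so the crop keeps a one-pixel margin on the top and left sides.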
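
Two of the enhancement steps in the preprocessing chain, the logarithmic transformation of Eq. (1) and unsharp masking (g = f + k·mask), can be sketched as follows. This is an illustrative sketch under stated assumptions: the natural logarithm is used, a 3×3 mean filter stands in for the low-pass filter (the paper does not specify the filter), and the names `log_transform`, `box_blur` and `unsharp_mask` are hypothetical. Histogram equalization is not shown.

```python
import math


def log_transform(image, c=0.25):
    """Eq. (1): F = c * log(1 + r), applied to every pixel intensity r."""
    return [[c * math.log(1 + r) for r in row] for row in image]


def box_blur(image):
    """3x3 mean filter; border pixels average over the neighbours that exist."""
    rows, cols = len(image), len(image[0])
    out = []
    for r in range(rows):
        out_row = []
        for c in range(cols):
            window = [image[rr][cc]
                      for rr in range(max(0, r - 1), min(rows, r + 2))
                      for cc in range(max(0, c - 1), min(cols, c + 2))]
            out_row.append(sum(window) / len(window))
        out.append(out_row)
    return out


def unsharp_mask(image, k=1.0):
    """g = f + k * (f - f_LP), with box_blur providing f_LP."""
    blurred = box_blur(image)
    return [[f + k * (f - lp) for f, lp in zip(row, lp_row)]
            for row, lp_row in zip(image, blurred)]
```

The default c = 0.25 matches the value reported for the experiments, and k = 1 matches Fig. 4. The output of `log_transform` is not rescaled to the 0-255 range here, which a full pipeline would likely need before display.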

Fig. 8. Preprocessing steps for FERET images

Fig. 9. Preprocessing blocks for different databases. (a) FERET (b) ORL and UMIST (c) Extended Yale B

3) Proposed FR model for FERET database: The face recognition model is implemented for FERET database with the preprocessing block as shown in Fig. 9(a). A fixed number of images from each class are chosen to be the training set. The FERET images are subjected to the preprocessing techniques as in Fig. 8. The one-dimensional wavelet transform is applied to each of these images and features are selected using BPSO. This collection of images after feature extraction and selection forms the face gallery. A test image is randomly picked from the remaining preprocessed FERET images and the rbio DWT is applied to it. The transformed test image is multiplied with the feature vector returned by the BPSO in the training stage. Using the Euclidean Classifier Eq. (5), the test image is compared with each of the images in the face gallery. One with the least distance is returned as the best match.

Initially, we choose a value randomly for c (of Eq. (1)) between 0.25 and 2 and for DWT level of decomposition between 5 and 8. One wavelet family among the four popular families namely, haar, symlet, biorthogonal and reverse biorthogonal, is chosen randomly. Then, by trial and error method, we found that the following parameter values resulted in the highest recognition rate: c=0.25, level of decomposition=7, wavelet family=Reverse Biorthogonal (rbio). The results are shown in Fig. 10(b), (c), (d). Thus fixing these values, an experiment was conducted for different training:testing ratios. As seen in Fig. 10(a), the recognition rate remains almost constant at the maximum value beyond the ratio 8:12. Hence, we find the ratio 8:12 is optimum for minimum computation time. The average recognition rate of 86.14% thus obtained is better than existing systems [17].

Fig. 10. Results for Experiment 1: FERET. (a) RR v/s Training:Testing Ratio (b) RR v/s No. of Decomposition Levels (c) RR v/s constant c (d) RR v/s Types of Wavelets (Level 7)

Fig. 11. RR v/s Training:Testing Ratios. (a) Extended Yale B (b) ORL (c) UMIST

TABLE I
PARAMETERS AND RESULTS OF EXPERIMENTS

                                   ORL     UMIST   EXTENDED YALE B   FERET
DWT level (rbio1.3)                5       5       7                 7
Number of features selected        194     355     62                260
Average testing time/image (ms)    9.16    7.19    46                40
Training to Testing Ratio          4:6     7:12    3:16              8:12
Peak Recognition Rate (%)          98.33   99.16   97.9              94.2
Average Recognition Rate (%)       97.68   98.51   96.32             86.14

B. Experiment 2: ORL and UMIST Databases

ORL database contains different images of 40 distinct subjects [18]. Images are taken at different times, varying the facial expressions and facial details. All the images are taken against a dark homogeneous background with the subjects in an upright, frontal position. The size of each image is 112 × 92 pixels, with 256 gray levels per pixel.

The UMIST Face Database contains grayscale images of size 112 × 92 pixels [19]. It has images of 20 unique subjects.

The number of images per person varies from 19 to 36. It contains images with varying angles from left profile to right profile. We have chosen 19 images per person.

In this experiment, we apply the general face recognition model in Fig. 1 with the preprocessing techniques in the order log transform and unsharp masking for ORL and UMIST databases as shown in Fig. 9(b). As ORL and UMIST database images do not contain much background information, k-means fails to isolate it and hence is not used. The preprocessed images are given to a feature extractor which is DWT (reverse biorthogonal wavelet). As in Experiment 1 described above, we determine the values of c and the decomposition level as 0.25 and 5 respectively by trial and error method. The reverse biorthogonal wavelet was found to give the best results. Using these values, the recognition rates for various training:testing ratios for both ORL and UMIST databases are shown in Fig. 11(b), (c). The training:testing ratios were fixed at 4:6 for ORL and 7:12 for UMIST as there is no significant improvement in the recognition rate beyond these values. It is seen that the wavelet family and the value of c are consistent with those of FERET. Due to the smaller size of ORL and UMIST images, decomposition levels higher than 5 did not provide higher accuracies. The results are tabulated in Table I.

C. Experiment 3: Extended Yale B Database

Extended Yale B contains 16128 images of 28 human subjects under 9 poses and 64 illumination conditions [20]. We have used 19 images from subset 5 (Pose 1) for each of the 28 subjects. The size of the images is 640 × 480 pixels.

In this experiment, we apply the general face recognition model in Fig. 1 with the preprocessing techniques for the Extended Yale B database as shown in Fig. 9(c). To normalize the illumination variance, histogram equalization is used before performing log transform and unsharp masking. As in the previous experiments, we fix c=0.25 and level of decomposition=7 by trial and error. Again, the reverse biorthogonal wavelet was found to give the best results. Thus fixing these parameters, results for different training:testing ratios were obtained as shown in Fig. 11(a). Extended Yale B (Pose 1) images differ from each other only in terms of illumination and have no pose variations. Illumination variations are to an extent neutralised by histogram equalization. Thus, the recognition rate does not significantly increase beyond the ratio 3:16. The results are tabulated in Table I.

IV. CONCLUSION

A novel approach for a flexible FR system is proposed, which uses the combination of k-means clustering for preprocessing, DWT-based feature extraction and a BPSO-based feature selection, implemented using MATLAB [21]. k-means clustering has played a key role in efficient background removal, which is the main contributor for the high recognition rates being obtained in ColorFERET database. A successful attempt has been made to equally handle all image variations (facial expressions, pose and illumination). The proposed method exhibits extremely good performance under frontal poses with variations in facial expressions and facial details (ORL and UMIST databases). The experimental results indicate that the proposed method has performed well under severe illumination variations with top recognition rates having reached 97.9% for Subset 5 of Extended Yale B considering only Pose 1. It is also successful in tackling the most challenging task of pose variance in FR with average recognition rate of 86.14% for ColorFERET considering all 13 poses. Thus, the proposed method has proven to be a promising technique under arbitrary variations in illumination, poses and backgrounds.

This paper uses a simple Euclidean Classifier. By using other classifiers such as SVM, Random Forest etc., the performance of the FR system is expected to improve substantially.

REFERENCES

[1] W. Zhao, R. Chellappa, P. J. Phillips, A. Rosenfeld, "Face Recognition: A Literature Survey," ACM Computing Surveys, Vol. 35, No. 4, pp. 399-458, 2003.
[2] Rabab M. Ramadan, Rehab F. Abdel-Kader, "Face Recognition Using Particle Swarm Optimization-Based Selected Features," International Journal of Signal Processing, Image Processing and Pattern Recognition, Vol. 2, No. 2, 2009.
[3] Rong Xiao, Wujun Li, Yuan Dong Tian, Xiaoou Tang, "Joint Boosting Feature Selection for Robust Face Recognition," Proceedings of Computer Vision and Pattern Recognition, IEEE Computer Society, pp. 1415-1422, 2006.
[4] R. Gonzalez, R. Woods, Digital Image Processing, Addison Wesley Publishing Company, 3rd Edition, 2009.
[5] Rajesh Garg, Bhawna Mittal, Sheetal Garg, "Histogram Equalization Techniques for Image Enhancement," Proceedings of IJECT, Vol. 2, Issue 3, 2011.
[6] Muhammad Almas Anjum, M. Y. Javed, A. Basit, "Face Recognition Using Double Dimension Reduction," World Academy of Science, Engineering and Technology, 2005.
[7] J. Kennedy, R. Eberhart, "Particle Swarm Optimization," Proceedings of IEEE International Conference on Neural Networks, pp. 1942-1948, 1995.
[8] J. Kennedy, R. Eberhart, "A New Optimizer using Particles Swarm Theory," Proceedings of 6th International Symposium on Micro Machine, Human Science, pp. 39-43, 1995.
[9] J. Kennedy, R. C. Eberhart, "A Discrete Binary Version of the Particle Swarm Algorithm," Proceedings of IEEE International Conference on Systems, Man, and Cybernetics, Vol. 5, pp. 4104-4108, 1997.
[10] FERET Database: http://face.nist.gov/colorferet.
[11] Susanta Mukhopadhyay, Bhabatosh Chanda, "Multiscale Morphological Segmentation of Gray-Scale Images," IEEE Transactions on Image Processing, Vol. 12, No. 5, 2003.
[12] Nikhil R. Pal, Sankar K. Pal, "A Review On Image Segmentation Techniques," Pattern Recognition, Vol. 26, No. 9, pp. 1277-1294, 1993.
[13] Wikipedia contributors, "K-means Clustering."
[14] Wikipedia contributors, "Cluster analysis."
[15] Suman Tatiraju, Avi Mehta, "Image Segmentation using k-means clustering, EM and Normalized Cuts," 2008.
[16] Madhuri A. Dalal, Nareshkumar D. Harale, Umesh L. Kulkarni, "An iterative improved k-means Clustering," Proceedings of International Conference on Advances in Computer Engineering, 2011.
[17] G. M. Deepa, R. Keerthi, N. Meghana, K. Manikantan, "Face recognition using spectrum-based feature extraction," Applied Soft Computing Journal, Vol. 12, Issue 9, pp. 2913-2923, 2012.
[18] ORL Database: http://www.cl.cam.ac.uk/Research/DTG/attarchive/facedatabase.html.
[19] UMIST Database: http://www.sheffield.ac.uk/eee/research/iel/research/face.
[20] Extended Yale B Database: http://cvc.yale.edu/projects/yalefacesB/subsets.html.
[21] MATLAB: www.mathworks.com.

