
Published in IET Biometrics
Received on 28th August 2012
Revised on 14th May 2013
Accepted on 24th May 2013
doi: 10.1049/iet-bmt.2012.0049
ISSN 2047-4938
Efficient person authentication based on multi-level
fusion of ear scores
Latha Lakshmanan
Computer Science and Engineering Department, Kumaraguru College of Technology, Chinnavedampatti, Ganapathy,
Coimbatore 641006, TN, India
E-mail: surlatha@yahoo.com
Abstract: A two-stage geometric approach that is both scale and rotation invariant is implemented for extracting the unique features present in the surface of an ear image. As occlusion because of ear rings and hair significantly affects the efficiency of the ear recognition process, only the middle portion of the ear is considered in this work. The resultant matching scores are compared against a threshold to make a decision for authenticating a person. It is found that the fused scores obtained from the two levels of feature extraction enhance the recognition accuracy compared with that of the individual stages. Finally, the particle swarm optimisation technique is applied on the matching scores in order to optimise the fusion parameters such as the decision threshold and weights. This results in further improved verification rates compared with the fusion of scores without optimisation. Thus, the proposed method works on partial ear images, demonstrates the presence of more unique features in the middle part of the ear (as seen from the increase in recognition accuracy), and also reduces the computation time.
1 Introduction
Ear recognition is a relatively new field of interest in the perspective of human authentication [1]. The geometry and shape of an ear have been observed to vary significantly among individuals. The ear is made up of features such as the outer rim (helix) and ridges (anti-helix), the lobe (bottom-most part), the concha (hollow part of the ear) and the tragus (small prominence of cartilage), as shown in Fig. 1.
Earlier studies reveal that ear structures have unique features, even among identical twins, especially in the concha and lobe areas. The forensic science literature reports that the growth of the ear is highly linear after the first 4 months of birth [2]. From the age of eight years to around the age of seventy, the ear structure remains constant for a person, after which it again increases. The proposed ear authentication system is based on a geometric approach that uses features present in both the outer and inner portions of an ear image, which together represent the shape and geometry of the ear.
The advantages and drawbacks of an ear recognition system can be summarised as follows:
i Advantages:
- less variation of the ear structure because of ageing when compared with the face [3];
- high stability of the ear pattern throughout a person's life [3];
- uniqueness of the outer ear shape, which does not change because of emotion etc. [4];
- the limited surface of the ear allows faster processing compared with the face [4];
- the lack of expressive variation (as in the face) reduces the intra-class variations [5];
- the ear is easy to capture even at a distance, and the process is non-invasive [3]; and
- the appearance of the ear is not altered by facial make-up, spectacles or beards [4].
ii Drawbacks:
- the ear is often occluded by hair, caps and ear rings, which poses a challenge to effective ear recognition [4];
- pose variation of ear images and the presence of eyeglasses may reduce the system performance [3]; and
- the ear may need to be recognised at a distance in an outdoor environment, where significant changes in illumination and shadows pose severe challenges to robust ear segmentation [4].
2 Review of ear recognition algorithms
Recognising people by their ear has recently received
significant attention in the literature and a variety of
techniques have been published. An overview of different
methods employing two-dimensional (2D) ear recognition is
discussed below.
2.1 Iannarelli's 12-point system
Iannarelli [2] proposed a way of identifying people using
photographs of ears of prisoners. It was developed as a
manual system by using more than 10 000 ears. The system was built on 12 measurements derived from the intersection of vertical, horizontal, diagonal and anti-diagonal lines, drawn from the centre point of the ear to intersect the internal and external curves on the surface of the pinna. These points represent the unique features of an ear and were measured by the 12 distances indicated in Fig. 2. Iannarelli used right ears and normalised them by resizing and rotating to match a template. He tried to identify an ear against the templates of the prisoners. The system requires that the photographs are in a standardised form. The alignment process is typically performed manually, and each measurement is taken in units of 4 mm and assigned an integer distance value. These values, in addition to sex and race, were then used for identification. The weakness of this system is that if the normalisation of an image is slightly off, all 12 measurements would be completely wrong.
2.2 Geometric methods
Geometric methods use angles and distances between selected points on the ear as the discriminative properties, particularly lengths of ear features measured at prescribed angles. Choras [4, 5] described two geometric methods of recognition. In the concentric circle method (CCM), edge detection was performed on a manually cropped ear image. The centre of mass of the edge points was used as the centre of several concentric circles with different radii. For each radius, the detected edges intersect the circle at multiple points. From this, a feature vector was constructed with the radius, the number of intersections and the sum of linear distances between the intersection points. In the contour-tracing method (CTM), the number of endings, bifurcations and points intersecting the circles, and the coordinates of these contour features, were determined for each contour. Choras performed experiments using 12 subjects, with 20 images per subject, and obtained a recognition rate of 90% for CCM and 97.5% for CTM.
Choras [6] details three methods for ear recognition. The first is the triangle ratio method (TRM), in which the system first detects the edges of a cropped ear image and then uses the two farthest edge points to define the maximal chord. These two points of the maximal line were used to create two triangles. Thirteen properties of these triangles, including the length of each side and the height of each triangle, were used in the feature vector. The next method is the shape ratio method (SRM), which uses the ratio of contour length to the distance between contour endpoints. The last method is angle-based contour representation, where each contour was summarised as a pair of angular differences. Measurements from TRM and SRM achieved 100% rank-one recognition using 104 probe-gallery pairs selected from a set of 80 subjects with ten images each. The third method achieved 90.4% rank-one recognition on the same 104 pairs of images.
Shailaja and Gupta [7] used a geometric method of determining a max-line, similar to the maximal chord used by Choras. Their system used n lines normal to this max-line to divide the max-line into n + 1 identically spaced parts. The recognition was carried out by extracting features in two stages. The system first compared the individual distances between each angle in the first feature vectors of the probe and gallery images, and then the number of matching feature points in the second feature vectors of both images. Two images are said to be matched if the matching scores exceed a certain threshold value. On a set of 80 subjects, with one probe and one gallery image per subject, this method gave an EER of 5% and a genuine acceptance rate of 77% at 0.1% false acceptance rate (FAR). Another structural feature extraction method was proposed by Mu et al. [8], named the long-axis-based shape and structural feature extraction method. The shape feature vector of the outer ear and the structural feature vector of the inner ear form the local feature vectors in this method. The unique feature vector extracted by combining the two vectors was matched against the template, and a recognition rate of 85% was obtained by this system.
2.3 Ensemble-based approach
Most ear biometric research makes use of the entire ear structure. Recently, a few methods have been developed that analyse the ear in parts, called ensembles. Such a method divides the ear into multiple regions, extracts features from each region, compares features from the corresponding parts of the test and sample images and finally combines the results. Results from the previous literature show that ensemble-based systems perform better than systems employing full ears.
Fig. 1 Structure of the ear
Fig. 2 Illustration of Iannarelli's system [2]: (a) ear anatomy: (1) helix rim, (2) lobule, (3) anti-helix, (4) concha, (5) tragus, (6) antitragus, (7) crus of helix, (8) triangular fossa and (9) incisura intertragica; (b) the 12 measurements used

It is to be noted that the size and locations of the ensemble
parts affect the recognition performance of a system.
Sometimes recognition needs to be performed with occluded ears, with uncooperative subjects or in uncontrolled situations. In particular, hair and jewellery may occlude and modify the shape of the ear. An ear-ring affects a smaller area, whereas hair covers a larger area by surrounding the ear. Such scenarios can employ ensembles of ear regions, which reduce the errors caused by occlusion by localising them to certain parts, while the other parts match correctly. Additionally, an ensemble can weigh the parts differently to suppress weaker parts and emphasise stronger ones.
Burge and Burger [9, 10] suggested the use of thermogram images to detect and mask the occlusions in an ear image. The method used the surface heat of subjects to create ear images. Hair usually occupies an ambient temperature between 27.2 and 29.7°C, whereas the ear ranges from 30 to 37.2°C, so hair was removed by segmenting the lower-temperature areas. Yuan et al. [11] divided the ear into three separate subregions and trained feature spaces of each subregion using a method of non-negative matrix factorisation. This produced a set of thin images containing only parts of the ear, and they achieved 93% rank-one recognition. This method was then modified to perform recognition on images with occlusion. The occluded dataset contains four images from each of 24 subjects: one image with no occlusion and three images with occlusion because of hair ranging from 10 to 40%. Matching scores were determined by using the dot product between the probe point and each gallery point in this space, scaled by the probability that the region is not occluded. They report higher recognition rates by using subregions than the entire image.
Yuan and Mu [12] proposed an ear recognition approach based on local information fusion. They evaluated images that were partially occluded at the top, middle, bottom and left or right parts of the ear. First, a normalised ear image was separated into 28 subwindows. Then, the concept of neighbourhood preserving embedding was used for feature extraction on each subwindow, and the most discriminative subwindows were selected according to their recognition rates. These subwindows were combined by a weighted majority voting method for fusing the outputs at the decision level. The recognition rate obtained shows the superior performance of this multi-classifier approach over whole-ear-based methods.
2.4 Statistical methods
Statistical methods of recognition treat ear images as points in a multi-dimensional space. Methods such as principal component analysis (PCA) or independent component analysis (ICA) are used to analyse the variance in the multi-dimensional space and reduce the number of dimensions while retaining the ability to discriminate between points. The comparison is then done using the distance from a probe point to the gallery points in this reduced space. Extraction of eigen-ears using the PCA technique reduces the size of the images that need to be validated. Eigen-ears define the set of vectors that best describe the ear. This method is initially complex, particularly when defining the feature vectors, but subsequently finding another image describing the ear features can be done easily.
Zhang et al. [13] used ICA and neural networks for ear recognition. They extracted probe and gallery images manually with no rotation. The system used a fast-ICA method to create a basis space from the gallery. A radial-basis function neural network used the scores of the probe images as inputs. This system achieved rank-one recognition of 94.11% on a set of 17 subjects using four gallery images and two probe images per subject. Nosrati et al. [14] applied a 2D wavelet transform to geometrically normalised (aligned) ear images and extracted the ear features in three directions (horizontal, vertical and diagonal). Feature sets from the three directions were then combined into a single feature matrix using weighted sum fusion. This technique allowed for changes in the ear images simultaneously along the three directions. The PCA technique was then applied to reduce the size of the feature matrix used in classification. They achieved a recognition rate of 90.5% on the University of Science and Technology Beijing (USTB) database.
Zhang and Mu [15] incorporated the original width/height ratio of each ear into a classification. They manually rotated and cropped the ears in the dataset and then separated the ears into five groups depending on their aspect ratio. Using both PCA and ICA methods, they trained subspaces for each of these five groups. To perform recognition, they first rotated and cropped a probe ear, then performed recognition with PCA or ICA. This yielded five sets of results, which were fused by a support vector machine. The system was tested on three dataset libraries. Library I contains 60 subjects with three images per subject; from this, the system used two images as gallery and one as a probe. Using PCA, the system gave an 85% recognition rate, and using ICA, it gave a 91.67% rank-one recognition rate. Library II contains 77 subjects with four images per subject; with Library II, the system used three images as gallery and one as a probe. PCA achieved a rank-one recognition rate of 81.82%, whereas ICA had a rank-one recognition rate of 92.21%. Library III contains 17 subjects with six images per subject. The method achieved a rank-one recognition rate of 94.12% with PCA and 100% with ICA.
Kumar and Wu [16] proposed an automated approach for segmentation of the curved region of interest of the ear by using morphological operators and Fourier descriptors. They investigated a new feature extraction approach by adopting localised grey-level orientation extracted using Gabor filters and also examined the local grey-level phase information using complex Gabor filters. The experimental results achieved average rank-one recognition accuracies of 96.27 and 95.93%, respectively, on publicly available databases of 125 and 221 subjects. The performance of these 2D techniques is greatly affected by pose variation and imaging conditions. However, an ear can be imaged in three dimensions (3D) using a range sensor, which provides a registered colour and range image pair [17]. A range image is relatively insensitive to illumination, and it contains surface shape information related to the anatomical structure, which makes it possible to develop robust 3D ear biometrics. Kumar and Passi [18] presented a new approach for the adaptive combination of multiple biometrics to dynamically ensure the desired level of security. The method used a hybrid particle swarm optimisation (PSO) to achieve an adaptive combination of multiple biometrics from the matching scores. The experimental results showed performance improvement and confirmed that the score-level approach generated fairly stable performance compared with the decision-level approach. It was shown that parameter tuning can be performed by computing the optimal parameters (fusion rules, weights and decision threshold) for every possible requirement of
the input security level. The verification time of the method was found to be comparable with that of other non-adaptive multi-modal biometric systems.
From the above survey, it is found that the real challenge lies in selecting the type of features along with the quality of the extracted features. As a geometrical approach gives information about local parts of the ear image, it is considered more suitable for feature extraction than a global approach. Geometrical features bring out individual distinctions better than colour or texture information. Furthermore, computing geometrical parameters is fast and may therefore be used in real-life applications. Hence, the proposed work is based on geometric methods of feature extraction.
3 Proposed ear-based system
A suitable way of finding the feature points from ear images under partial occlusion is proposed here. An ear authentication system generally comprises three stages, namely preprocessing, feature extraction and matching. In the preprocessing stage, occlusion because of the presence of ear rings and hair is removed. Then, the process of feature extraction is implemented in two stages: in the first stage, features from the outer ear shape are extracted, and in the second stage, features from the inner portion are extracted. The distances between the corresponding feature sets of the compared images, together with a fixed threshold, are used for finding the matching scores. The distance scores thus computed from the inner and outer edges are combined by weighted sum fusion. The proposed ear-based authentication system is shown in Fig. 3, and the individual components are discussed below.

Fig. 3 Block diagram of the ear authentication system
3.1 Preprocessing
As occluded ear images pose a challenge to effective ear recognition, occlusion because of the presence of ear rings at the bottom and hair at the top of the ear, which together constitute roughly one-third of the total area of the ear image, needs to be removed for good results. Thus, only the middle part of the ear image, of size 151 × 146, is retained from the original size of 400 × 300, as shown in Fig. 4b. The edges of this portion of the ear image are identified using the Canny edge detector [19]. Edge detection is performed to create an edge map using gradient information. This method finds edges by looking for local maxima of the gradient of the input image.
The gradient is calculated using the derivative of a Gaussian filter and is given by (1) and (2)

gradient magnitude, G = √(G_x² + G_y²)    (1)

gradient angle, θ = arctan(G_y / G_x)    (2)
The method uses two thresholds to detect strong and weak edges, and includes the weak edges in the output only if they are connected to strong edges. The method is therefore more robust to noise and more likely to detect true weak edges. The edges of the ear image are represented in white and the remaining area is made black by image binarisation. Figs. 4a-c illustrate this process on a sample ear image.
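As a rough illustration of this preprocessing step, the following Python sketch (not the authors' code) crops the middle region and computes a Canny edge map with OpenCV; the file name, crop offsets and hysteresis thresholds are assumed values, since the paper does not specify them.

import cv2

# load the ear image in grey scale; expected size 400 x 300 as in the paper
img = cv2.imread("ear.png", cv2.IMREAD_GRAYSCALE)
assert img is not None, "ear.png not found"

# keep only the middle 151 x 146 region; the offsets that centre this window
# are not given in the paper and are chosen here purely for illustration
top, left = 125, 77
middle = img[top:top + 151, left:left + 146]

# Canny applies Gaussian smoothing, gradient estimation (eqs. (1)-(2)) and
# hysteresis thresholding: weak edges survive only if linked to strong ones
edges = cv2.Canny(middle, threshold1=50, threshold2=150)
cv2.imwrite("ear_edges.png", edges)  # binary edge map: white edges on black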
3.2 Feature extraction based on hierarchical approach
The proposed method extracts unique features from the middle part of the ear in two steps. The first step extracts features from the outer edge, and the second step extracts features from the inner edges, using a reference line and a set of normal lines [7]. The reference line is the 45° slanting line drawn from the lower right corner towards the upper left corner of the ear, and the normal lines are drawn perpendicular to this reference line, as shown in Figs. 4d and e. The unique features extracted in the two stages are represented in two vectors, named the first feature vector V_1 and the second feature vector V_2, respectively.

Fig. 4 Illustration of the feature extraction process: (a) input image, (b) cropped image, (c) edge image, (d) image with a reference line, (e) image with normal lines
The normal lines divide the reference line into (n + 1) equal parts, where n is a positive integer and cn is the centre of the reference line. Let p_1, p_2, p_3, ..., p_n be the intersecting points of the outer edge and the normal lines, as shown in Fig. 5. Let θ_i be the angle made by the line segment joining p_i and cn and the line segment joining u and cn, where u is the top left endpoint of the reference line. The vector V_1 has a fixed number of feature points, as the number of normal lines is fixed by the system; that is, the first feature vector consists of n angles corresponding to the n points of the outer edge. The second feature vector V_2 is calculated similarly, but the number of points of interest varies, unlike V_1, as there are more edges present inside the ear that intersect with the normal lines. The number of normal lines n should be properly chosen, as a small value of n does not produce good recognition results, whereas a high value of n increases the recognition accuracy but also increases the storage requirements of the feature vectors as well as the computation time. Thus, the value of n is chosen by trial and error, and n = 19 is found to produce better results [7].

Fig. 5 Cropped ear image with reference and normal lines
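To make the first stage concrete, the sketch below (an illustration under stated assumptions, not the authors' implementation) computes the n angles of V_1 from a binary edge map. The way each intersection point p_i is located, by sampling along the i-th normal line in one direction until the first edge pixel, is an assumption; the paper defines only the geometry of the reference line, the normal lines and the angles at cn.

import numpy as np

def first_feature_vector(edges, n=19):
    h, w = edges.shape
    u = np.array([0.0, 0.0])            # top-left endpoint of the reference line
    lr = np.array([h - 1.0, w - 1.0])   # lower-right endpoint
    cn = (u + lr) / 2.0                 # centre of the reference line
    length = np.linalg.norm(u - lr)
    direction = (u - lr) / length       # unit vector along the reference line
    normal = np.array([-direction[1], direction[0]])  # perpendicular to it

    angles = []
    for i in range(1, n + 1):
        # foot of the i-th normal line on the reference line
        base = lr + direction * (i / (n + 1.0)) * length
        p_i = None
        # walk along the normal until the first edge pixel (assumed outer edge)
        for t in np.arange(0.0, float(max(h, w)), 0.5):
            rc = np.round(base + t * normal).astype(int)
            r, c = int(rc[0]), int(rc[1])
            if 0 <= r < h and 0 <= c < w and edges[r, c]:
                p_i = rc.astype(float)
                break
        if p_i is None:                 # this normal line met no edge pixel
            angles.append(0.0)
            continue
        # theta_i: angle at cn between the segments cn -> p_i and cn -> u
        v1, v2 = p_i - cn, u - cn
        cos = np.dot(v1, v2) / (np.linalg.norm(v1) * np.linalg.norm(v2) + 1e-12)
        angles.append(float(np.degrees(np.arccos(np.clip(cos, -1.0, 1.0)))))
    return np.array(angles)             # V1: n angles, scale/rotation invariant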
3.3 Feature matching
In the first stage of the proposed hierarchical approach, the matching process finds the angular distance between the feature points of two images, and these values are summed to give the distance score of the images being compared. Let V_11 = [θ_1, θ_2, θ_3, ..., θ_n] and V_12 = [α_1, α_2, α_3, ..., α_n] be the first feature vectors of the two images being compared. The difference, D_1, between the two images can be calculated as

D_1 = Σ_{i=1}^{n} |θ_i − α_i|    (3)

where | | represents the absolute value.
In the second stage, since the number of feature points inside the ear varies, it is denoted by y and z for the two images being compared, and it is assumed that y > z. Let V_21 = [β_1, β_2, β_3, ..., β_y] and V_22 = [φ_1, φ_2, φ_3, ..., φ_z] be the second feature vectors of the two images. Let m_2 be the number of points that are matched between the two images for a specific normal line; it is calculated as

m_2 = Σ_{i=1}^{y} Σ_{j=1}^{z} |β_i − φ_j|    (4)
The number of matching points, m_2, is first initialised to zero. Then, based on the difference between pairs of points on the two images, m_2 is updated as

m_2 = m_2 + 1, if |β_i − φ_j| < threshold
m_2 = m_2,     if |β_i − φ_j| > threshold    (5)
This process is repeated for all feature points corresponding to each normal line in the ear image. Thus, two points are said to be matched if their angles are nearly the same with respect to each normal line. As the size of the second feature vector is not fixed, the percentage of matched points, M_2, is calculated as

M_2 = m_2 / min(v_1, v_2)    (6)

where v_1 and v_2 are the sizes of the second feature vectors of the two images. The value of M_2 represents the total number of matching points between the two images. These scores are then converted into distance scores, D_2, by subtracting them from the highest matching score, so that D_2 can be combined with the D_1 scores of the first stage.
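A compact sketch of the two matching scores follows. It assumes the feature vectors are NumPy arrays of angles in degrees; the per-point pairing in the second stage is simplified to a positional pairing, whereas the paper matches points normal line by normal line, and the angular threshold and the maximum matching score used to convert M_2 into D_2 are assumed values.

import numpy as np

def stage1_distance(v11, v12):
    # D1 (eq. (3)): sum of absolute angular differences between V11 and V12
    return float(np.sum(np.abs(v11 - v12)))

def stage2_distance(v21, v22, threshold=2.0, max_score=1.0):
    # m2 (eq. (5)): count point pairs whose angles are nearly the same
    m2 = 0
    for b, f in zip(v21, v22):            # simplified positional pairing
        if abs(b - f) < threshold:
            m2 += 1
    M2 = m2 / min(len(v21), len(v22))     # eq. (6): fraction of matched points
    return max_score - M2                 # converted to a distance score D2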
4 Fusion of distance scores
Initially, the verification rates are found separately using the matching scores obtained through the two stages of feature extraction. Then, the final distance score, s_ear, is obtained by adding the distances D_1 and D_2 using weighted sum fusion, given by

s_ear = w_1 D_1 + w_2 D_2    (7)

where w_1 and w_2 are the weights assigned to the scores of the two stages of feature extraction. Equal weights are applied, as the two stages produce almost equal verification rates.
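A minimal sketch of eq. (7) together with the threshold decision mentioned in the abstract; the decision threshold value used here is an assumption for illustration only.

def fuse_and_decide(d1, d2, w1=0.5, w2=0.5, decision_threshold=0.8):
    # eq. (7): weighted sum of the two stage distance scores
    s_ear = w1 * d1 + w2 * d2
    # accept as genuine if the fused distance falls below the threshold
    return s_ear, s_ear < decision_threshold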
5 Optimisation of fusion parameters using PSO
PSO is an evolutionary search algorithm motivated by the social behaviour of a flock of birds trying to fly to a favourable environment. PSO is employed to find the solution for the adaptive selection of a combination of individual points, which are referred to as the particles in a multi-dimensional search space [20]. Each particle (representing a bird in the flock), characterised by its position and velocity, represents a possible solution in the search space. The behaviour of particles in PSO imitates the way in which birds communicate with each other while flying. During this communication, each bird reviews its new position in space with respect to the best position it has covered so far. Birds in the flock also identify the bird that has reached the best position. On knowing this information, the other birds update their velocity (which depends on a bird's local best position as well as the position of the best bird in the flock) and fly towards the best bird. This process of regular communication and velocity updating repeats until the flock finds a favourable position.
The PSO algorithm is implemented to find the optimal threshold that maximises accuracy. Each particle in PSO stores the best position visited by it so far, called pbest (local best position), in its memory. The particle also interacts with all its neighbours, stores the best position visited by any particle in the search space and experiences a pull towards this position, called gbest (global best position). The values of pbest and gbest are updated after each iteration if a more dominating solution is found by the particle or by the population, respectively. This process continues iteratively until either the desired result is achieved or the computational power is exhausted [18].
Each particle in the k-dimensional space is defined as X_a = (x_a1, x_a2, ..., x_ak), where the subscript a is the particle number and the second subscript denotes the dimension. The previous best position is represented as P_a = (p_a1, p_a2, ..., p_ak) and the velocity along each dimension as V_a = (v_a1, v_a2, ..., v_ak). The particles move to a new position in the multi-dimensional solution space depending on the particle's best positions [local best position (p_ak) and global best position (p_gk)]. The positions p_ak and p_gk are updated after each iteration, whenever a suitable lower-cost solution is located by the particle. The velocity vector of each particle determines the details of its movement. The velocity and position updates (8) and (9) of a particle of the PSO for the instant (t + 1) can be represented as follows

v_ak(t + 1) = ε v_ak(t) + c_1 r_1 [p_ak(t) − x_ak(t)] + c_2 r_2 [p_gk(t) − x_ak(t)]    (8)

x_ak(t + 1) = x_ak(t) + v_ak(t + 1)    (9)
where ε is the inertia weight, which ranges between 0 and 1, provides a balance between the global and local search abilities of the algorithm and controls the memory of the particle. The acceleration coefficients c_1 and c_2 are positive constants, and r_1 and r_2 are two random numbers in the range (0, 1), usually assigned the same value to give equal weights; they determine the relative influence of pbest and gbest. The particles are also initialised randomly between 0 and 1. The weights represented by the particles are applied to the training data, the errors are found by computing the FAR and FRR, and the fitness of each particle is then calculated. The error rate E is given by

E = C_1 FAR + C_2 FRR    (10)
where C_1 is the cost of falsely accepting an imposter, C_2 is the cost of falsely rejecting a genuine individual, FAR is the global false acceptance rate and FRR is the global false rejection rate. Equation (10) can be minimised by choosing the appropriate decision threshold and can be rewritten in terms of a single cost as

C_2 = 2 − C_1    (11)

which makes (10)

E = C_1 FAR + (2 − C_1) FRR    (12)

Equation (12) results from the fact that the global error is simply the sum of the two error rates FAR and FRR if cost is ignored, that is, if the costs are assumed to be 1. In this algorithm, the FAR is first evolved for each trait, and then the thresholds are calculated based on the FAR and updated. The global FAR and FRR for the optimum threshold are then calculated from the individual values of FAR and FRR.
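The following minimal PSO sketch optimises a single fusion parameter, the decision threshold, by minimising the cost-weighted error of eq. (12); extending each particle to carry the weights (w_1, w_2) as extra dimensions is straightforward. The callbacks far_at() and frr_at(), which would evaluate FAR and FRR on the training scores at a given threshold, as well as the swarm size, iteration count and coefficients, are assumptions for illustration.

import random

def pso_threshold(far_at, frr_at, c1_cost=1.0, n_particles=20, iters=100,
                  inertia=0.7, c1=1.5, c2=1.5):
    # positions (candidate thresholds) initialised randomly in [0, 1]
    x = [random.random() for _ in range(n_particles)]
    v = [0.0] * n_particles                  # velocities

    def fitness(t):                          # eq. (12) with C1 = c1_cost
        return c1_cost * far_at(t) + (2.0 - c1_cost) * frr_at(t)

    pbest = x[:]                             # local best positions
    pbest_f = [fitness(t) for t in x]
    g = min(range(n_particles), key=lambda i: pbest_f[i])
    gbest, gbest_f = pbest[g], pbest_f[g]    # global best position and cost

    for _ in range(iters):
        for i in range(n_particles):
            r1, r2 = random.random(), random.random()
            # velocity update, eq. (8); position update, eq. (9)
            v[i] = (inertia * v[i] + c1 * r1 * (pbest[i] - x[i])
                    + c2 * r2 * (gbest - x[i]))
            x[i] = min(1.0, max(0.0, x[i] + v[i]))  # keep threshold in [0, 1]
            f = fitness(x[i])
            if f < pbest_f[i]:               # update pbest, then gbest
                pbest[i], pbest_f[i] = x[i], f
                if f < gbest_f:
                    gbest, gbest_f = x[i], f
    return gbest, gbest_f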
6 Experimental results and discussion
The proposed method uses the USTB database containing 308 ear images of 77 persons [21]. The images are taken under different illuminations and angles. From this set, 280 images are taken up for the analysis, and 1000 intra- and inter-class comparisons are made. The different stages of the proposed ear authentication system are illustrated in Fig. 6.

Fig. 6 Different stages of ear image verification
The distance scores obtained from the comparison of images are plotted in Fig. 7, which shows that the fusion of scores produces more separation between the intra- and inter-classes than the individual scores. In addition, most of the genuine scores obtained from the features of the outer ear are found to occupy a smaller range compared with those of the inner ear. Similarly, the imposter scores obtained from the outer ear show wider variation than those from the inner ear. These observations indicate the presence of more unique features in the outer ear shape.
The performance of the two stages of feature extraction and their fusion is shown in the receiver operating characteristic (ROC) graphs of Fig. 8, which are plotted by finding the genuine acceptance, false acceptance and false rejection rates using the distance scores. Here, equal weights are used for combining the scores from the two stages.
Table 1 compares the performance of the proposed system when it is applied to the complete ear image and to the middle portion of the ear. When the proposed system is implemented on a machine with an Intel Core 2 Duo T6600 processor operating at 2.20 GHz and 3 GB RAM, it is found that the computation time with a feature set from the middle portion of the ear is reduced to half of the time taken for processing the complete ear image. It is also seen that the verification rate improves significantly when the two stages of feature extraction are combined by weighted sum fusion.
Fig. 7 Inter- and intra-class variations for scores: (a) first stage of feature extraction, (b) second stage of feature extraction, (c) fusion of the scores of the two stages
Table 2 shows the verification rates obtained by varying the weights applied to the scores from the two stages of feature extraction. It is observed that the optimal threshold values found by PSO enhance the accuracy of the system, as shown in the last column of the table. It is also seen that the weight combination w_1 = 0.6 and w_2 = 0.4 gives the highest recognition accuracy. Thus, the use of PSO gives better performance results by finding the optimal threshold for the given weight values.
The role of partial ear images with varying levels of occlusion is studied by taking an ear image from the USTB database. A portion of the ear image is removed by cropping (from 10 to 50% of the area) at the top and bottom locations, as shown in Fig. 9. Each of these partial images is separately fed into the proposed system and the authentication accuracy is found.
The experimental results listed in Table 3 demonstrate that the verification rate remains considerably high for ear images with up to 30% occlusion. But when 40% of the ear portion is removed from the top and bottom put together, the verification rate drops abruptly. The results show that partial ear images can reduce the effect of noise (because of hair and ear rings) to some extent, especially for slight occlusion; but when the occlusion is heavier, it degrades the performance of the system.
The performance of the proposed method with two stages of feature extraction is compared with existing methods, as shown by the ROC graphs in Fig. 10. As seen from the graph, the proposed fusion method performs well, in spite of the lower performance of the individual stages of the fusion compared with the other methods.

Fig. 10 Comparison of ROC curves for the existing and proposed methods
Fig. 9 Partial ear images with varying levels of occlusion: (a) original image, (b) 10%, (c) 20%, (d) 30%, (e) 40%, (f) 50%
Table 1 Performance comparison of the proposed ear authentication system

Method used                                  | Verification rate, % | Error rate | Time taken, s
feature extraction using complete ear image  | 95.6                 | 0.055      | 40.24
feature extraction from middle ear (stage 1) | 93.2                 | 0.07       | 22.82
feature extraction from middle ear (stage 2) | 91.5                 | 0.077      | 23.34
fusion of scores from stages 1 and 2         | 98.3                 | 0.03       | 24.52
Table 3 Effect of partial ears on authentication accuracy

Portion of ear removed (top and bottom), % | 10   | 20   | 30   | 40   | 50
Verification rate, %                       | 99.6 | 98.8 | 98.3 | 86.4 | 71.6
Fig. 8 ROC curves for ear scores plotted between (a) FAR and GAR, (b) FAR and FRR
Table 2 Performance comparison of the proposed system before and after applying PSO

Fusion rule applied | Weights used (w1, w2) | GAR (before PSO), % | Optimal threshold | GAR (after PSO), %
weighted sum        | (0.5, 0.5)            | 98.3                | 0.798             | 98.9
weighted sum        | (0.6, 0.4)            | 98.6                | 0.891             | 99.2
weighted sum        | (0.4, 0.6)            | 97.8                | 0.674             | 98.4
weighted sum        | (0.7, 0.3)            | 98.1                | 0.921             | 98.8
weighted sum        | (0.3, 0.7)            | 96.5                | 0.562             | 97.3
Table 4 compares the performance of the proposed method with a few existing methods of ear recognition. As can be seen, the proposed hierarchical method gives enhanced performance, which is because of the removal of occlusion in the ear image and the fusion of the scores of the two stages of feature extraction.

Table 4 Performance comparison of existing ear recognition systems with the proposed system

Authors                                     | Number of images taken | Database used | Verification rate, %
Prakash et al. [22]                         | 150                    | IITK          | 95.2
Kumar and Wu [16]                           | 465                    | IITD          | 96.3
Shailaja and Gupta [7]                      | 160                    | Self dataset  | 95
Kisku et al. [23]                           | 102                    | IITK          | 93.4
proposed method (with 30% ear occlusion)    | 280                    | USTB          | 98.3
proposed method (with optimised parameters) | 280                    | USTB          | 99.2
7 Conclusion
A two-stage geometric approach that is both scale and rotation invariant is adopted in this work for extracting the unique features of an ear image. As occlusion because of ear rings and hair significantly affects the efficiency of the ear recognition process, only the middle portion of the ear, which is normally free from occlusion, is considered in the analysis. It is found that the computation time is reduced by 40%, as the size of the ear image used for authentication is reduced to half of its original size. The results also show an increase in the verification rate with partial ear images, which signifies the presence of more unique features in the middle part of the ear than in the other parts. Finally, the particle swarm optimisation technique is applied on the matching scores in order to optimise the fusion parameters such as the decision threshold and weights. Use of this optimal fusion rule results in further improved verification rates compared with the fusion of scores without optimisation. Thus, the fusion of the distance scores obtained from the two-stage feature extraction is found to enhance the system accuracy significantly.
8 Acknowledgment
The authors acknowledge the University of Science and
Technology Beijing (USTB) for providing the ear image
database for this research work.
9 References
1 Stan, Z.L. (Ed.): 'Encyclopedia of biometrics' (Springer, 2009), Vol. 2
2 Iannarelli, A.: 'Ear identification, forensic identification series' (Paramount Publishing Company, Fremont, CA, 1989)
3 Pflug, A., Busch, C.: 'Ear biometrics: a survey of detection, feature extraction and recognition methods', IET Biometrics, 2012, 1, (2), pp. 114–129
4 Choras, M.: 'Ear biometrics based on geometrical feature extraction', Electron. Lett. Comput. Vis. Image Anal., 2005, 5, (3), pp. 84–95
5 Choras, M.: 'Ear biometrics in passive human identification systems'. Proc. Sixth Int. Workshop on Pattern Recognition in Information Systems, 2006, pp. 2004–2009
6 Choras, M.: 'Image feature extraction methods for ear biometrics: a survey'. Sixth Int. Conf. on Computer Information Systems and Industrial Management Applications, 2007, pp. 261–265
7 Shailaja, D., Gupta, P.: 'A simple geometric approach for ear recognition'. Proc. Ninth Int. Conf. Information Technology, 2006, pp. 164–167
8 Mu, Z., Yuan, L., Xu, Z., Xi, D., Qi, S.: 'Shape and structural feature based ear recognition' (Springer, Berlin, Heidelberg, 2004) (LNCS, 3338), pp. 663–670
9 Burge, M., Burger, W.: 'Ear biometrics in computer vision'. Proc. 15th Int. Conf. on ICPR, Barcelona, Spain, 2000, Vol. 2, pp. 822–826
10 Burge, M., Burger, W.: 'Biometrics: personal identification in networked society. Chapter: Ear biometrics' (Kluwer Academic, Boston, 1999), pp. 273–285
11 Yuan, L., Mu, Z., Xu, Z.: 'Using ear biometrics for personal recognition', in Li, S.Z., Sun, Z., Tan, T., Pankanti, S., Chollet, G., Zhang, D. (Eds.): 'IWBRS 2005' (Springer, Heidelberg, 2005) (LNCS, 3781), pp. 221–228
12 Yuan, L., Mu, Z.C.: 'Ear recognition based on local information fusion', Pattern Recognit. Lett., 2012, 33, (2), pp. 182–190
13 Zhang, H., Mu, Z., Qu, W., Liu, L.-M., Zhang, C.-Y.: 'A novel approach for ear recognition based on ICA and RBF network'. Proc. Int. Conf. on Machine Learning and Cybernetics, 2005, Vol. 7, pp. 4511–4515
14 Nosrati, M., Faez, K., Faradji, F.: 'Using 2D wavelet and principal component analysis for personal identification based on 2D ear structure'. Proc. Int. Conf. on Intelligent and Advanced Systems, Kuala Lumpur, Malaysia, 25–28 November 2007, pp. 616–620
15 Zhang, H., Mu, Z.: 'Compound structure classifier system for ear recognition'. Int. Conf. on Automation and Logistics, Qingdao, 1–3 September 2008, pp. 2306–2309
16 Kumar, A., Wu, C.: 'Automated human identification using ear imaging', Pattern Recognit., 2012, 45, (3), pp. 956–968, doi: 10.1016/j.patcog.2011.06.005
17 Zhou, J., Cadavid, S., Abdel-Mottaleb, M.: 'An efficient 3-D ear recognition system employing local and holistic features', IEEE Trans. Inf. Forensics Sec., 2012, 7, (3), pp. 978–991
18 Kumar, A., Passi, A.: 'Comparison and combination of iris matchers for reliable personal authentication', Pattern Recognit., 2010, 43, (3), pp. 1016–1026
19 Gonzalez, R.C., Woods, R.E.: 'Digital image processing' (Prentice-Hall, Upper Saddle River, NJ, 2008, 3rd edn.)
20 Eberhart, R.C., Kennedy, J.: 'Swarm intelligence' (Morgan Kaufmann, San Diego, CA, 2001)
21 Ear recognition laboratory at University of Science and Technology Beijing, http://www.ustb.edu.cn/resb/en/index.htm (visited on 20 May 2010)
22 Prakash, S., Jayaraman, U., Gupta, P.: 'Ear localization from side face images using distance transform and template matching'. Proc. IEEE Int. Workshop on Image Processing Theory, Tools and Applications, Sousse, Tunisia, November 2008, pp. 1–8
23 Kisku, D.R., Gupta, P., Mehrotra, H., Sing, J.K.: 'Multimodal belief fusion for face and ear biometrics', Intell. Inf. Manage. Sci. Res., 2009, 1, (3), pp. 166–171
