
High Speed Face Recognition Based on Discrete Cosine Transforms and Neural Networks

Zhengjun Pan    Hamid Bolouri
Science & Technology Research Centre, and
Department of Computer Science
Faculty of Engineering and Information Sciences
University of Hertfordshire
Hatfield, Herts, AL10 9AB, UK
{Z.Pan, H.Bolouri}@herts.ac.uk

September 27, 1999

Abstract

High information redundancy and correlation in face images result in inefficiencies when such images are used directly for recognition. In this paper, discrete cosine transforms are used to reduce image information redundancy, because only a subset of the transform coefficients is necessary to preserve the most important facial features such as hair outline, eyes and mouth. We demonstrate experimentally that when DCT coefficients are fed into a backpropagation neural network for classification, a high recognition rate can be achieved by using a very small proportion of transform coefficients. This makes DCT-based face recognition much faster than other approaches.

Key words: Face recognition, neural networks, feature extraction, discrete cosine transform.

1 Introduction

High information redundancy present in face images results in inefficiencies when these images are used directly for recognition, identification and classification. Typically one builds a computational model to transform pixel images into face features, which generally should be robust to variations of illumination, scale and orientation, and then uses these features for recognition. Several techniques for facial feature extraction [3] have been proposed. They include methods based on geometrical features, statistical features [12], feature points [11, 14, 30] and neural networks [17, 18, 21, 31, 32].
The geometrical approach represents faces in terms of structural measures that include parameters such as ratios of distances, angles, and areas between elementary features such as eyes, nose and mouth, or facial templates such as nose width and length, mouth position and chin type [1]. These features are then used to recognise unknown images, mainly by matching to the nearest neighbour in a stored database, although neural networks are also used by some researchers as a nonlinear classifier [20]. The system performance depends on the normalisation used to determine the head location, translation, rotation and scale.
Statistical features are usually generated by algebraic methods such as principal components analysis (PCA) [15, 16, 27], the closely related Karhunen-Loeve transform [10, 28], or singular value decomposition [8]. These features take the form of a set of orthogonal basis vectors, such as the principal components or eigenvectors (referred to as eigenfaces). Once the eigenvectors are chosen, any image in the gallery (the set of training images) can be approximately reconstructed as a linear combination of eigenfaces, and its components are stored in memory. For an unknown image, its components are calculated by projecting it onto the face space (the space generated by the eigenfaces) and looking for the closest match.
The well known method of Gabor jets with graph matching and the Dynamic Link Architecture (DLA) extracts feature points from a face image by finding optimum points of the Gabor filter response. Recognition is carried out by graph matching to find the closest stored graph in the database [11, 14]. As the matching process has to compare an image with all the faces in the database, its recognition cost might be too expensive to be practical for large databases and real-time applications.
As for neural network recognition systems, due to the difficulty of selecting a representation that could capture features robustly, most approaches avoid the feature extraction procedure by feeding the pixel images directly to neural networks and making use of the ability of neural networks as an information processing tool [18]. Nevertheless, Lawrence et al. [13] applied a self-organising map (SOM) as a feature extractor, and the generated features were then used as the input of a convolutional neural network for recognition, an architecture much like the neocognitron [19]. Training either the SOM or the convolutional neural network is tremendously computationally expensive.
Recognition systems are mainly compared on the basis of their recognition rate, training time and recognition time. The recognition rate is the most important index of a recognition system. The smaller the training time, the more resources are available to exploit other techniques to improve performance; for example, instead of one MLP, one can apply ensemble or bootstrapping techniques to reduce the generalisation error, or self-renovate the system after misclassification. Real-time applications require a short recognition time; for example, one could not ask everybody to wait minutes to pass an access control system.
For the face recognition techniques mentioned above, most approaches have to build a database to store the features of the known faces in order to compare the features extracted from an unknown image with those in the database, while others, like the convolutional neural network approach, are too expensive to train.

In this paper, we present a new approach to high speed face recognition using discrete cosine transforms (DCTs) as a way of information packing. Redundancy removal to facilitate data processing and image categorisation is not a new idea [2, 24]. For example, the Karhunen-Loeve transform (KLT) is widely applied in image analysis for dimensionality reduction [3, 9, 15, 28]. Although the KLT, not the DCT, is the optimal transform in an information packing sense [9], we apply discrete cosine transforms instead of KLTs to face images. This is because the KLT is data dependent and obtaining the KLT basis images is, in general, a nontrivial computational task, whereas there exist fast algorithms for computing 2D discrete cosine transforms [4], which makes DCTs extremely competitive in terms of computational complexity.

2 Information Packing

2.1 Discrete cosine transform
The cosine transform, like the Fourier transform, uses sinusoidal basis functions. The difference is that the cosine transform basis functions are not complex; they use only cosine functions and no sine functions. The discrete cosine transform of an N × M image f(x, y) is defined by

C(u, v) = \frac{2}{\sqrt{MN}}\, \alpha(u)\, \alpha(v) \sum_{x=0}^{M-1} \sum_{y=0}^{N-1} f(x, y) \cos\left[\frac{(2x+1)u\pi}{2M}\right] \cos\left[\frac{(2y+1)v\pi}{2N}\right]    (1)

for u = 0, 1, 2, ..., M-1; v = 0, 1, 2, ..., N-1, and the inverse transform is defined by

f(x, y) = \frac{2}{\sqrt{MN}} \sum_{u=0}^{M-1} \sum_{v=0}^{N-1} \alpha(u)\, \alpha(v)\, C(u, v) \cos\left[\frac{(2x+1)u\pi}{2M}\right] \cos\left[\frac{(2y+1)v\pi}{2N}\right]    (2)

for x = 0, 1, 2, ..., M-1; y = 0, 1, 2, ..., N-1, where \alpha(\cdot) is defined as

\alpha(w) = \begin{cases} \frac{1}{\sqrt{2}}, & \text{for } w = 0, \\ 1, & \text{otherwise.} \end{cases}
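For concreteness, the transform pair of Eqs. (1) and (2) can be computed with off-the-shelf routines. The sketch below is our own illustration (not part of the original system); it uses SciPy's separable 1-D DCT-II with orthonormal scaling, which reproduces the normalisation used above.

```python
# Minimal sketch: orthonormal 2-D DCT and its inverse, matching Eqs. (1)-(2).
import numpy as np
from scipy.fftpack import dct, idct

def dct2(image):
    """2-D DCT-II with 'ortho' normalisation, i.e. Eq. (1)."""
    return dct(dct(image, norm='ortho', axis=0), norm='ortho', axis=1)

def idct2(coeffs):
    """Inverse 2-D DCT, i.e. Eq. (2)."""
    return idct(idct(coeffs, norm='ortho', axis=0), norm='ortho', axis=1)

if __name__ == "__main__":
    f = np.random.rand(112, 92)        # stand-in for a 92 x 112 face image
    C = dct2(f)
    assert np.allclose(idct2(C), f)    # the transform pair is lossless
```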
Since the basis images of the DCT are input independent and its information packing ability closely approximates that of the optimal KLT [7], most practical transform coding systems are based on the DCT, which provides a good compromise between information packing ability and computational complexity. In fact, the properties of the DCT have proved to be of such practical value that it has become an international standard in the Joint Photographic Experts Group (JPEG) image compression method. In this standard, a two-dimensional DCT is applied to 8 × 8 blocks of pixels in the image. The 64 (8 × 8 = 64) coefficients produced by the DCT are then quantized to provide the final compression. DCTs have also been successfully used to generate keys for image retrieval from a large database [26].

Compared to other transforms, the DCT has the advantages of having been implemented in a single integrated circuit (because of its input independency), of packing the most information into the fewest coefficients for most natural images, and of minimizing the blocklike appearance, called blocking artifact, that results when the boundaries between subimages become visible. This last property is particularly important in comparisons with the other sinusoidal transforms.
Another advantage of the DCT is that most DCT coefficients of real world images turn out to be very small in magnitude, especially as u and v approach the image/subimage width and height respectively. Truncating, or removing, these small coefficients from the representation introduces only small errors in the reconstructed image. For some image classes (e.g. faces), most of the information resides in the coefficients with small u and v (i.e., the upper-left corner of Figure 1(b)). This characteristic simplifies optimum coefficient selection for applications such as image recognition.

2.2 How many coefficients should we use?


To get an impression that a face can roughly be reconstructed from only a few coefficients, we give several error measures to compare the difference between two images. The commonly used error measures are the mean square error (mse) and the peak signal-to-noise ratio (psnr) [29]. Suppose f(x, y) and \hat{f}(x, y) are the original and the reconstructed images, respectively. Then the mean-square error is defined by

mse = \frac{1}{MN} \sum_{x=0}^{M-1} \sum_{y=0}^{N-1} \left[ f(x, y) - \hat{f}(x, y) \right]^2    (3)

and the psnr is defined by

psnr = 10 \cdot \log_{10} \frac{(L-1)^2}{mse}    (4)

where L is the number of gray levels, e.g., L = 256 for 8-bit images.
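Both error measures are straightforward to implement; the short helper below is our own illustration for 8-bit images (L = 256).

```python
# Error measures of Eqs. (3) and (4).
import numpy as np

def mse(original, reconstructed):
    diff = original.astype(float) - reconstructed.astype(float)
    return np.mean(diff ** 2)                       # Eq. (3)

def psnr(original, reconstructed, gray_levels=256):
    return 10.0 * np.log10((gray_levels - 1) ** 2 / mse(original, reconstructed))  # Eq. (4)
```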
Figure 1 shows a 92 × 112 8-bit face image in (a) and the log magnitude of its discrete cosine transform in (b), obtained via the following transform:

\hat{C}(u, v) = 255 \cdot \frac{\log\big(1 + 0.01 \cdot |C(u, v)|\big)}{\log\big(1 + 0.01 \cdot \max_{u,v}\{|C(u, v)|\}\big)}.

As shown in Figure 1(b), the DCT coefficients with large magnitudes (lighter pixels) are mainly concentrated in the upper-left corner (corresponding to low spatial frequency DCT components of the image). The remaining coefficients are very small (almost black). Figures 2(a), (b), (c), (d) and (e) show reconstructions of Figure 1(a) using 35, 100, 500, 1000 and 2500 DCT coefficients respectively (from a total of 92 × 112 = 10,304 available DCT coefficients). In each case, the coefficients were selected by starting at the top left and scanning the specified number of coefficients in the order illustrated in Figure 1(c). The reconstructed images were obtained by setting the remaining coefficients to zero before taking the inverse DCT. The errors of the reconstructed images against the original image and the percentage of DCT coefficients used are given in Table 1. As illustrated in Figure 2(a), using only 35 coefficients, i.e. 0.34% of the full set, is sufficient to allow one to recognise the image as a face. The experiments in Section 5 demonstrate that in fact these few coefficients carry most of the information necessary for face recognition.
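A reconstruction experiment of this kind can be sketched as follows. The exact scanning order of Figure 1(c) is not reproduced here; as an assumption, coefficients are ranked low-frequency-first by u + v, which gives the same qualitative behaviour.

```python
# Sketch: keep only the first k DCT coefficients (low frequencies first),
# zero the rest and invert, as in the reconstructions of Figure 2.
import numpy as np
from scipy.fftpack import dct, idct

def reconstruct_from_k(image, k):
    C = dct(dct(image, norm='ortho', axis=0), norm='ortho', axis=1)
    u, v = np.meshgrid(np.arange(C.shape[0]), np.arange(C.shape[1]), indexing='ij')
    order = np.argsort((u + v).ravel(), kind='stable')   # low-frequency-first ranking
    keep = np.zeros(C.size, dtype=bool)
    keep[order[:k]] = True                               # retain the first k coefficients
    C_truncated = np.where(keep.reshape(C.shape), C, 0.0)
    return idct(idct(C_truncated, norm='ortho', axis=0), norm='ortho', axis=1)
```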

2.3 Subimage size selection


In most applications, images are subdivided so that the correlation (redundancy) between adjacent subimages is reduced to some acceptable level. The subimage size is a significant factor affecting the reconstruction error

Figure 1: A 92 × 112 8-bit face image and the log magnitude of its discrete cosine transform.


Figure 2: Effect of increasing the number of coefficients on reconstructed images: (a), (b), (c), (d) and (e) are the reconstructed images using 35, 100, 500, 1000 and 2500 coefficients of the discrete cosine transform of the image in Figure 1(a), respectively; (f), (g), (h), (i) and (j) are the corresponding scaled differences between the reconstructed images and the original image; the corresponding errors and the percentage of discrete cosine transform coefficients used are given in Table 1.

no. of coefficients    mse       psnr    % total coefficients

35                     338.08    22.84   0.34
100                    211.84    24.87   0.97
500                    85.22     28.83   4.85
1000                   49.94     31.15   9.70
2500                   17.75     35.64   24.26

Table 1: Errors of the reconstructed images against the original image.

and computational complexity. In general, the subimage size should be an integer power of 2, to simplify the computation of the subimage transform [4]. If the image dimensions are not divisible by the subimage size, the image may be zero-padded up to the next multiple of the subimage size.
Figures 3 and 4 illustrate graphically the impact of subimage size on reconstruction error. The data plotted were obtained by dividing the image of Figure 1(a) into subimages of size n × n, for n = 4, 8, 16, 32, 64, and then reconstructing the image using only 1/16 of the resulting coefficients. The corresponding numbers of blocks for these subimage sizes are 23 × 28, 12 × 14, 6 × 7, 3 × 3, 2 × 2, respectively. Generally, it is true that both the level of compression and the computational complexity increase as the subimage size increases [7]. However, the reconstruction error reaches its optimum when the subimage size is 16 × 16 in our case. The reason is that our image size is not a power of 2, so we have to zero-pad some subimages, which may introduce reconstruction errors for large subimage sizes.
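For illustration, block-wise transformation with zero-padding can be sketched as below (our own sketch; block handling details in the original system may differ).

```python
# Sketch: divide an image into n x n subimages, zero-padding the right and
# bottom borders when the image dimensions are not multiples of n, and take
# the DCT of every block.
import numpy as np
from scipy.fftpack import dct

def blockwise_dct(image, n):
    h, w = image.shape
    H, W = -(-h // n) * n, -(-w // n) * n          # round up to multiples of n
    padded = np.zeros((H, W))
    padded[:h, :w] = image                         # zero-pad the borders
    blocks = {}
    for by in range(0, H, n):
        for bx in range(0, W, n):
            block = padded[by:by + n, bx:bx + n]
            blocks[(by // n, bx // n)] = dct(dct(block, norm='ortho', axis=0),
                                             norm='ortho', axis=1)
    return blocks       # e.g. a 112 x 92 image with n = 16 yields 7 x 6 blocks
```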


Figure 3: Illustration of the effect of the size of subimages on reconstructed images: (a), (b), (c), (d) and (e) are the reconstructed images obtained by dividing the image in Figure 1(a) into subimages of size 4 × 4, 8 × 8, 16 × 16, 32 × 32, 64 × 64, respectively, and then retaining 1/16 (6.25%) of the DCT coefficients; (f), (g), (h), (i) and (j) are the corresponding scaled differences between the reconstructed images and the original image; the corresponding errors are illustrated in Table 4.

[Plot: mean square error (mse) and peak signal-to-noise ratio (psnr) versus subimage size, from 4 × 4 to 64 × 64.]

Figure 4: Reconstruction error versus subimage size.

3 System description

The main idea of our approach is to apply the DCT to reduce the information redundancy and to use the packed information for classification. For a face image, the system first computes the DCT coefficients of the image or its subimages, then selects only a limited number of the coefficients and feeds them as input into a classifier, here a multi-layer perceptron (MLP). DCT computation and subimage division are performed in the manner described in Section 2. A diagrammatic description of our DCT-based system for face recognition is shown in Figure 5.

Images → DCT (coefficient selection) → Data representation (2-D → 1-D, bipolar) → MLP classification (winner-take-all)

Figure 5: A diagrammatic description of our DCT-based system for face recognition.

3.1 Coefficient selection


The coefficient allocation method is usually fixed in the system once determined and is applied to all images/subimages. The locations of the transform coefficients retained for each image/subimage remain unchanged from one image to another, similar to the zonal mask allocation method in the image compression literature [7]. In our experiments, we also tried the threshold mask allocation method [7], which selects the transform coefficients on the basis of magnitude instead of location. Although the transform coefficients of largest magnitude make the most significant contribution to the reconstructed image quality, their locations are not easy to record since they vary from one image to another. Therefore, threshold mask allocation requires the storage of one set of coefficient locations per image. As a result, for the same total number of coefficients stored, threshold mask allocation results in much poorer recognition performance than our method (approximately 30% worse in our experiments).

The scanning strategy for converting the two-dimensional DCT array into a one-dimensional array (for feeding into the MLP) can affect the classification rate. Alternate diagonal lines, or reverse diagonal lines in turn, are alternatives to the method illustrated in Figure 1(c). These scanning strategies are all based on the fact that most of the image information is concentrated in the upper-left corner of the DCT transform, as demonstrated in Figure 1(b).
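One plausible realisation of such a fixed, zonal selection is a JPEG-style zig-zag scan starting at the top-left coefficient. The sketch below is our own illustration; the exact scan of Figure 1(c) may differ in detail.

```python
# Sketch: zonal coefficient selection via a zig-zag scan of the 2-D DCT array,
# always keeping the same n low-frequency locations for every image.
import numpy as np

def zigzag_indices(rows, cols):
    """(u, v) pairs in zig-zag order, starting from the top-left corner."""
    return sorted(((u, v) for u in range(rows) for v in range(cols)),
                  key=lambda p: (p[0] + p[1],
                                 p[0] if (p[0] + p[1]) % 2 else p[1]))

def select_coefficients(dct_coeffs, n):
    """Keep the first n coefficients of the scan as a 1-D feature vector."""
    locations = zigzag_indices(*dct_coeffs.shape)[:n]
    return np.array([dct_coeffs[u, v] for (u, v) in locations])
```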

3.2 Data representation


After the number and locations of the transform coefficients are selected, the selected coefficients are arranged in a one-dimensional format and fed into a classifier for recognition. The classifier used in our system is a feedforward neural network with only one hidden layer. A quick backpropagation algorithm [25] is used as the training algorithm.

Learning is improved by representing the input and output in bipolar form [6]. In our approach, the number of outputs of the MLP is the number of face subjects. The target outputs are set to bipolar form, i.e., if the training sample is classified as the i-th subject, then the target output is 1 for the output neuron o_i and -1 for all other output neurons.

BP theory suggests that the training procedure can be sped up if all input and output vectors are in the same range. However, DCT coefficients at different locations usually have different orders of magnitude. If we converted them into the domain [-1, 1] with one factor for all locations, some of them could become very small. This irregularity could make the training of the neural network very hard. Hence, the transform converting a coefficient into [-1, 1] should differ from one coefficient to another. However, the upper and lower bounds of the coefficients at each location cannot be determined exactly from the training images, since the corresponding coefficients of unknown images could exceed those bounds.

In our approach, we use the training images to estimate the upper and lower bounds in the following way. First, no scaling is applied to locations whose coefficients are already in the range [-1, 1]. This avoids over-emphasising the importance of small coefficients. The remaining terms are uniformly scaled to [-1, 1] using the global maximum and minimum. In this way, even for unknown images, the coefficients are converted roughly into [-1, 1]. To formulate this idea, suppose x_1, x_2, \dots, x_n are the coefficients retained from the coefficient selection procedure and \{(x_1^{(j)}, x_2^{(j)}, \dots, x_n^{(j)});\ j = 1, 2, \dots, p\} are the coefficients retained from the training images, where n is the number of DCT coefficients retained and p is the number of training images. Then the upper bounds (b_i) and lower bounds (a_i) can be determined by

b_i = \gamma \cdot \max\{1,\ x_i^{(1)}, x_i^{(2)}, \dots, x_i^{(p)}\}, \quad i = 1, 2, \dots, n,    (5)

and

a_i = \gamma \cdot \min\{-1,\ x_i^{(1)}, x_i^{(2)}, \dots, x_i^{(p)}\}, \quad i = 1, 2, \dots, n,    (6)

where \gamma > 1 is a factor to extend the bounds. Then the input vectors \{(z_1^{(j)}, z_2^{(j)}, \dots, z_n^{(j)});\ j = 1, 2, \dots, p\} of the neural network can be determined by

z_i^{(j)} = 2 \cdot \frac{x_i^{(j)} - a_i}{b_i - a_i} - 1, \quad i = 1, 2, \dots, n.    (7)

For an unknown image, the scaling factors obtained for the training set are applied to the retained coefficients to obtain the input vector to the MLP.
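In code, the scaling of Eqs. (5)-(7) amounts to the following (our own sketch; γ = 1.1 is the value used later in Section 5.1).

```python
# Sketch of the bipolar input scaling of Eqs. (5)-(7).
import numpy as np

def fit_bounds(train_coeffs, gamma=1.1):
    """train_coeffs: (p, n) array of the n retained coefficients for p training images."""
    b = gamma * np.maximum(1.0, train_coeffs.max(axis=0))    # upper bounds, Eq. (5)
    a = gamma * np.minimum(-1.0, train_coeffs.min(axis=0))   # lower bounds, Eq. (6)
    return a, b

def to_bipolar(coeffs, a, b):
    """Map retained coefficients into roughly [-1, 1], Eq. (7); also applied to unknown images."""
    return 2.0 * (coeffs - a) / (b - a) - 1.0
```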

4 ORL database
The ORL database was built at the Olivetti Research Laboratory in Cambridge, UK, and is available free of charge from http://www.cam-orl.co.uk/facedatabase.html. The database consists of 400 different images, 10 for each of 40 distinct subjects. There are 4 female and 36 male subjects. For some subjects, the images were taken at different times, varying the lighting, facial expression (open/closed eyes, smiling/not smiling) and facial details (glasses/no glasses). All the images were taken against a dark homogeneous background with the subjects in an upright, frontal position, with tolerance for limited side movement and limited tilt of up to about 20 degrees. The size of each image is 92 × 112 pixels, with 256 grey levels per pixel. Thumbnails of all images can be viewed at http://www.cam-orl.co.uk/facesataglance.html.

5 Simulations
5.1 Experimental Setup
In the following experiments, the weights and biases of the MLP are initialised to random values in [-0.5, 0.5]. Three learning parameters, ε_max, ε_0 and decay, used in Quickprop [5, 25] are set to 0.02, 0.008 and 0.0001, respectively. The maximum number of training epochs is 1000. The multiplication factor γ in Equations 5 and 6 is set to 1.1. No attempt was made to optimise these parameters. To reduce the influence of the presentation order of the training samples, the training samples were shuffled once randomly in every training loop. The neural network used is a multi-layer perceptron with one hidden layer. For the ORL database, the number of outputs of the MLP is always 40, and a winner-take-all strategy was used for classification.
To allow comparisons, the same training and test set sizes are used as in [13, 22, 23], i.e., the first 5 images of each subject are used for training and the remaining 5 images are used for testing. Hence there are 200 training images and 200 test images in total, and no overlap exists between the training and test images. Due to the small size of the available data, a validation set was not used, and the best-so-far recognition rate on the testing images is reported as the testing recognition rate.

In each of the following statistical results, 30 random runs are carried out with randomly initialised weights and biases for each MLP. The T-tests are based on the 0.05 level of significance, which means that the T-test statistic has to exceed 1.645 for experimental results to be classified as statistically different from the reference case (35 DCT coefficients, 75 hidden neurons).
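As a rough, non-authoritative stand-in for the classifier stage (the original uses a Quickprop-trained MLP with bipolar targets, which scikit-learn does not provide), the overall setup can be approximated as below; X_train/X_test are assumed to hold the scaled coefficient vectors and y_train/y_test the 40 subject labels.

```python
# Approximate stand-in for the MLP classifier: one hidden layer of 75 neurons,
# up to 1000 epochs; argmax over the 40 outputs plays the role of winner-take-all.
import numpy as np
from sklearn.neural_network import MLPClassifier

def train_and_test(X_train, y_train, X_test, y_test):
    clf = MLPClassifier(hidden_layer_sizes=(75,), max_iter=1000, random_state=0)
    clf.fit(X_train, y_train)
    predictions = clf.predict(X_test)       # winner-take-all classification
    return np.mean(predictions == y_test)   # testing recognition rate
```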

number of      number of        mean    σ        max     min     T-test      significance
coefficients   hidden neurons   (%)              (%)     (%)     statistic
20             60               87.67   0.0094   89.0    86.0    21.203      yes
25             60               91.55   0.0111   93.5    89.0    4.929       yes
30             60               92.52   0.0085   94.0    90.5    1.513       no
30             75               92.53   0.0092   94.0    90.5    1.388       no
35             60               92.57   0.0097   94.5    90.5    1.204       no
35             75               92.87   0.0096   94.5    91.0    —           —
40             60               91.03   0.0126   93.5    88.0    6.354       yes
40             75               91.67   0.0113   93.5    89.5    4.440       yes
45             75               91.22   0.0121   93.0    88.0    5.868       yes
50             60               91.65   0.0164   94.5    88.0    3.524       yes
50             75               92.30   0.0119   94.0    90.0    2.046       yes
60             60               89.32   0.0126   92.0    87.0    12.270      yes
60             75               89.15   0.0117   91.5    87.0    13.475      yes
70             60               88.60   0.0130   92.0    86.5    14.502      yes
70             75               88.63   0.0163   92.0    85.5    12.272      yes
80             75               86.93   0.0153   89.5    84.0    18.004      yes
90             75               84.82   0.0137   87.0    82.5    26.398      yes
100            75               84.47   0.0178   87.5    81.0    22.799      yes

Table 2: Recognition performance on testing images versus number of DCT coefficients retained.

5.2 Number of coefficients


Table 2 shows the recognition performance on the testing images for different numbers of retained DCT coefficients. Here the DCT coefficients are calculated for the whole image rather than for subimages. The results demonstrate that the recognition rate decreases when more DCT coefficients are retained. The reader may at first imagine this effect to be caused by having too many or too few hidden neurons. However, the number of hidden neurons is sufficiently large that in all cases the training set can be fully learnt. Furthermore, as shown in Table 3 and discussed in Section 5.3, reducing the number of hidden neurons reduces the performance too. This confirms the idea that the more specific pixel information introduced by using more DCT coefficients can decrease the recognition rate, since hair, face outline, eyes and mouth have been determined to be the most important facial features for perceiving and remembering faces [3].

In our case, the best average recognition rate is 92.87%, obtained by retaining 35 DCT coefficients and using an MLP with 75 hidden neurons. Figure 6 shows the evolution of the best recognition rate on testing images using 35 DCT coefficients and 75 hidden neurons. The complete training run takes less than 1 minute on a PC with a 450MHz CPU, while a convolutional neural network approach would need several hours for training [13]. Note that 30 coefficients and 60 hidden neurons produce statistically identical recognition performance, so the above is not our best-case computational load.

5.3 Number of hidden neurons


Table 3 illustrates the mean, maximum, minimum and standard deviation of the recognition rate on testing images for different numbers of hidden neurons in the MLP trained on 35 DCT coefficients. The T-tests test the hypothesis that the recognition rate is statistically different from that with 75 hidden neurons, which gives the best mean recognition rate in the case of 35 DCT coefficients. The results demonstrate that the recognition rate is not very sensitive to the number of hidden neurons once there are enough hidden neurons.

5.4 Subimage size


To reduce the computational load of the DCT, images can be divided into subimages. Table 4 shows the testing recognition rate against subimage size, number of DCT coefficients retained, and number of hidden neurons

[Plot: recognition rate (mean, maximum and minimum of the best-so-far values over 30 runs) versus training epoch, from 0 to 1000 epochs.]

Figure 6: Evolution of recognition rate with 35 DCT coefficients retained and 75 hidden neurons.

number of        mean    σ        max     min     T-test      significance
hidden neurons   (%)              (%)     (%)     statistic
20               88.83   0.0182   92.0    85.5    10.754      yes
25               90.48   0.0135   94.0    88.5    7.902       yes
30               91.35   0.0113   94.0    89.0    5.615       yes
35               91.68   0.0124   94.5    89.0    4.156       yes
40               92.00   0.0116   94.5    89.0    3.165       yes
45               92.20   0.0095   94.0    90.0    2.717       yes
50               91.90   0.0109   93.5    89.5    3.658       yes
55               92.28   0.0106   94.0    90.5    2.260       yes
60               92.57   0.0097   94.5    90.5    1.204       no
65               92.55   0.0100   94.5    90.5    1.264       no
70               92.68   0.0069   94.5    91.5    0.880       no
75               92.87   0.0096   94.5    91.0    —           —
80               92.83   0.0103   95.0    91.0    0.156       no
85               92.55   0.0091   94.0    91.0    1.325       no
90               92.52   0.0091   94.5    91.0    1.449       no
95               92.63   0.0074   94.0    91.5    1.085       no
100              92.72   0.0090   94.0    90.5    0.624       no

Table 3: Recognition performance on testing images for different numbers of hidden neurons in the MLP trained on 35 DCT coefficients.

in the MLP. For 8 × 8 subimages, only the 3 best performances are listed. A more extensive listing is provided for 16 × 16 subimages. Note that to achieve recognition rates comparable to the full-image case, a larger number of coefficients, and twice as many hidden neurons, are necessary. Thus, for a given recognition rate, the use of subimages does not reduce the computational load. The reason may be that our original face image size is not very large.

5.5 Comparison of different recognition approaches based on the ORL database

The ORL database has been used to test several face recognition approaches [13, 22, 23]. The recognition rates of the best models and their training/classification times (if available) are shown in Table 5. The classification time of the Hidden Markov Model (P2D-HMM) [23] is based on a model with parameters (3-6-6-6-3, 12, 8, 9, 6) and the classification of full-resolution images, while 4-times smaller images (reduced to a resolution of 23 × 28
subimage   number of      no. of hidden   mean    σ        max     min     T-test      significance
size       coefficients   neurons         (%)              (%)     (%)     statistic
8 × 8      168            100             89.93   0.0168   92.5    87.5    8.331       yes
8 × 8      168            150             89.83   0.0135   92.5    87.5    10.021      yes
8 × 8      168            250             88.62   0.0164   92.0    85.0    12.239      yes
16 × 16    42             60              91.80   0.0145   94.0    88.5    3.373       yes
16 × 16    42             75              91.97   0.0095   93.5    90.0    3.671       yes
16 × 16    42             100             92.22   0.0103   94.5    89.5    2.540       yes
16 × 16    42             120             92.10   0.0074   94.0    91.0    3.487       yes
16 × 16    42             150             92.65   0.0123   95.0    90.5    0.774       no
16 × 16    84             100             92.15   0.0114   95.0    89.5    2.648       yes
16 × 16    126            100             88.88   0.0178   91.5    84.0    10.799      yes
16 × 16    168            100             89.93   0.0168   92.5    87.5    8.331       yes
16 × 16    672            100             74.60   0.0221   78.5    70.0    41.585      yes

Table 4: Recognition performance on testing images versus subimage size, number of DCT coefficients retained and number of hidden neurons in the MLP.

pixels by averaging 4 × 4 windows) were used in the Convolutional Neural Network (CNN) approach [13]. For comparison, the performance of an MLP applied to the similarly reduced images is also shown. This MLP has one hidden layer with 60 hidden neurons, and the numbers of input and output neurons are 644 and 40, respectively. The other learning parameters and the target output and input vectors for training the MLP are the same as in Section 5.1.
                              recognition rate
approach                  best      mean     σ        training time    recognition time    relative speed*
HMM [22]                  87%       —        —        —                —                   —
eigenfaces (PCA) [23]     90%       —        —        —                —                   —
P2D-HMM [23]              95%       —        —        —                4 minutes†          1/192
convolutional NN [13]     98.5%‡    96.2%§   0.004§   4 hours¶         < 0.5 seconds¶      1
MLP‖                      84.0%     77.2%    0.0353   10 minutes**     0.0014 seconds**    89
DCT+MLP                   95.0%     92.9%    0.0096   < 1 minute**     0.0002 seconds**    625

* see text for details.
† on a Sun Sparc II workstation, p. 92 of [23]. '—' means data not available.
‡ not included in the calculation of the mean; personal communications.
§ average of 3 simulations; the value of the standard deviation is from personal communications.
¶ on an SGI Indy MIPS R4400 100MHz system.
‖ based on 30 random runs and a 644-60-40 fully connected MLP applied to the reduced images.
** on a 450MHz IBM-compatible PC with 128M RAM.

Table 5: Performance comparison of different approaches to recognition applied to the ORL database.

As shown in Table 5, the recognition rate of our DCT-based system is comparable to the best reported results (the CNN and the P2D-HMM). However, the training and classification times of our DCT-based method are much shorter than those of the other approaches. It is difficult to compare the speed of algorithms executed on different computing platforms because of the interactions of a large number of factors such as CPU speed, memory and cache size, compiler efficiency and even the programmer's skill. The relative recognition speeds given in Table 5 are extrapolated from benchmark evaluations using the MATLAB benchmark utility and the published SPEC CPUfp92 data (available from http://www.spec.org). According to these benchmarks, the 450MHz Pentium used in our experiments is approximately 3-4 times faster than an SGI Indy MIPS R4400 100MHz system, and approximately 9-10 times faster than a Sparc II. The right-hand column in Table 5 shows the relative recognition speed of the various methods, normalised to account for the above differences in processing speed. Note that the classification time of our DCT-based method is around 600 times faster than that of the convolutional neural network approach. The classification speed of the convolutional neural network approach is itself about 200 times faster than that of the P2D-HMM approach (Lawrence et al. [13] report the CNN to be 500 times faster than the P2D-HMM, but their comparison ignores processor speed differences).
The above speed comparison is conservative. For example, note from Figure 6 that the number of training epochs could be reduced to a quarter or less without significant loss of classification performance. Furthermore, for the above comparison, the input images to the CNN and the MLP were a quarter of full resolution. For N × N images, the computational cost of these approaches is proportional to O(N^2). For comparison, the computational complexity of the fast DCT (where N is a power of 2) is only O(N log(N)).

6 Conclusions

In this paper, we have presented a very fast and efficient approach to face recognition which combines image compression and neural network techniques. The compression is achieved by applying a Discrete Cosine Transform to the face images and truncating the unimportant components. For face images, high frequency DCT components are negligibly small and can be truncated without loss of the most important facial features, such as hair, eyes, and mouth outline and location. In our approach, the compressed transform coefficients, rather than the pixel data, are used for neural network classification. The experiments reported above demonstrate that, for the ORL database, using only 0.34% of all the available DCT coefficients produces a recognition rate comparable to the best results reported to date, while the processing speed is more than 2 orders of magnitude faster.

References

1. R. Brunelli and T. Poggio, "Face recognition: Features versus templates," IEEE Transactions on Pattern Analysis and Machine Intelligence, vol. 15, no. 10, pp. 1042-1052, 1993.
2. B. Chalmond and S. Girard, "Nonlinear modeling of scattered multivariate data and its application to shape change," IEEE Transactions on Pattern Analysis and Machine Intelligence, vol. 21, no. 5, pp. 422-432, 1999.
3. R. Chellappa, C. L. Wilson, and S. Sirohey, "Human and machine recognition of faces: A survey," Proceedings of the IEEE, vol. 83, no. 5, pp. 705-740, 1995.
4. C. Christopoulos, J. Bormans, A. Skodras, and J. Cornelis, "Efficient computation of the two-dimensional fast cosine transform," in SPIE Hybrid Image and Signal Processing IV, (Orlando, Florida, USA), pp. 229-237, 1994.
5. S. E. Fahlman, "An empirical study of learning speed in back-propagation networks," Technical Report CMU-CS-88-162, Department of Computer Science, Carnegie Mellon University, September 1988. ftp://ftp.cs.cmu.edu/afs/cs/project/connect/tr/qp-tr.ps.Z.
6. L. Fausett, Fundamentals of Neural Networks: Architectures, Algorithms and Applications. Englewood Cliffs, NJ: Prentice Hall, 1994.
7. R. Gonzalez and R. Woods, Digital Image Processing. Reading, MA: Addison-Wesley, 1992.
8. Z. Hong, "Algebraic feature extraction of image for recognition," Pattern Recognition, vol. 24, pp. 211-219, 1991.
9. J. Karhunen and J. Joutsensalo, "Generalization of principal component analysis, optimization problems and neural networks," Neural Networks, vol. 8, no. 4, pp. 549-562, 1995.
10. M. Kirby and L. Sirovich, "Application of the Karhunen-Loeve procedure for the characterization of human faces," IEEE Transactions on Pattern Analysis and Machine Intelligence, vol. 12, no. 1, pp. 103-108, 1990.
11. M. Lades, J. Vorbruggen, J. Buhmann, J. Lange, C. von der Malsburg, R. Wurtz, and W. Konen, "Distortion invariant object recognition in the dynamic link architecture," IEEE Transactions on Computers, vol. 42, no. 3, pp. 300-311, 1993.
12. A. Lanitis, C. Taylor, and T. Cootes, "Automatic interpretation and coding of face images using flexible models," IEEE Transactions on Pattern Analysis and Machine Intelligence, vol. 19, no. 7, pp. 743-756, 1997.
13. S. Lawrence, C. Lee Giles, A. Tsoi, and A. Back, "Face recognition: A convolutional neural network approach," IEEE Transactions on Neural Networks, vol. 8, no. 1, pp. 98-113, 1997.
14. T. Maurer and C. von der Malsburg, "Tracking and learning graphs and pose on image sequences of faces," in Proceedings of the 2nd International Conference on Automatic Face and Gesture Recognition, (Killington, USA), pp. 176-181, IEEE Computer Society Press, 1996.
15. B. Moghaddam and A. Pentland, "Probabilistic visual learning for object representation," IEEE Transactions on Pattern Analysis and Machine Intelligence, vol. 19, no. 7, pp. 696-710, 1997.
16. B. Moghaddam, W. Wahid, and A. Pentland, "Beyond eigenfaces: Probabilistic matching for face recognition," in Proceedings of the International Conference on Automatic Face and Gesture Recognition, (Nara, Japan), Apr. 1998.
17. C. Nebauer, "Evaluation of convolutional neural networks for visual recognition," IEEE Transactions on Neural Networks, vol. 9, no. 4, pp. 685-696, 1998.
18. A. J. O'Toole, H. Abdi, and D. Valentin, "Face recognition," in Handbook of Brain Theory and Neural Networks (M. Arbib, ed.), pp. 388-390, Cambridge (MA): M.I.T. Press, 1995.
19. Z. Pan, T. Sabisch, R. Adams, and H. Bolouri, "Staged training of neocognitron by evolutionary algorithms," in Proc. of IEEE Congress on Evolutionary Computation (CEC'99), (Washington D.C., USA), pp. 1965-1972, 1999.
20. M. Reinders, R. Koch, and J. Gerbrands, "Locating facial features in image sequences using neural networks," in Proceedings of the 2nd International Conference on Automatic Face and Gesture Recognition, (Killington, USA), pp. 230-235, IEEE Computer Society Press, 1996.
21. H. Rowley, S. Baluja, and T. Kanade, "Neural network-based face detection," IEEE Transactions on Pattern Analysis and Machine Intelligence, vol. 20, no. 1, pp. 23-38, 1998.
22. F. Samaria and A. Harter, "Parameterisation of a stochastic model for human face identification," in Proceedings of the 2nd IEEE Workshop on Applications of Computer Vision, (Sarasota, Florida, USA), Dec. 1994.
23. F. Samaria, Face Recognition using Hidden Markov Models. PhD thesis, Trinity College, University of Cambridge, Cambridge, 1994.
24. E. Saund, "Dimensionality-reduction using connectionist networks," IEEE Transactions on Pattern Analysis and Machine Intelligence, vol. 11, no. 3, pp. 304-314, 1989.
25. W. Schiffmann, M. Joost, and R. Werner, "Optimization of the backpropagation algorithm for training multilayer perceptrons," Technical Report, Institute of Physics, University of Koblenz, ftp://ftp.cis.ohio-state.edu/pub/neuroprose/schiff.bp_speedup.ps.Z, 1994.
26. M. Shneier and M. Abdel-Mottaleb, "Exploiting the JPEG compression scheme for image retrieval," IEEE Transactions on Pattern Analysis and Machine Intelligence, vol. 18, no. 8, pp. 849-853, 1996.
27. M. Turk and A. Pentland, "Eigenfaces for recognition," Journal of Cognitive Neuroscience, vol. 3, pp. 71-86, 1991.
28. M. Uenohara and T. Kanade, "Use of Fourier and Karhunen-Loeve decomposition for fast pattern matching with a large set of templates," IEEE Transactions on Pattern Analysis and Machine Intelligence, vol. 19, no. 8, pp. 891-898, 1997.
29. S. E. Umbaugh, Computer Vision and Image Processing: A practical approach using CVIPtools. Prentice-Hall International, Inc., 1998.
30. A. Yuille, D. Cohen, and P. Hallinan, "Feature extraction from faces using deformable templates," in Proceedings of the IEEE Computer Soc. Conference on Computer Vision and Pattern Recognition, pp. 104-109, 1989.
31. J. Zhang, Y. Yan, and M. Lades, "Face recognition: Eigenface, elastic matching, and neural nets," Proceedings of the IEEE, vol. 85, no. 9, pp. 1423-1435, 1997.
32. M. Zhang and J. Fulcher, "Face recognition using artificial neural network group-based adaptive tolerance (GAT) trees," IEEE Transactions on Neural Networks, vol. 7, no. 3, pp. 555-567, 1996.

