
Chapter 1
Introduction

In this report, we focus on image-based face recognition. Given a picture taken with a
digital camera, we would like to know whether there is any person in it, where his/her face is
located, and who he/she is. Towards this goal, we generally separate the face recognition
procedure into three steps: Face Detection, Feature Extraction, and Face Recognition (shown in Fig. 1.1).







Figure 1.1: Configuration of a general face recognition structure



1.1. Face Detection:

The main function of this step is to determine (1) whether human faces appear in a given
image, and (2) where these faces are located. The expected outputs of this step are patches
containing each face in the input image. In order to make the subsequent face recognition system
more robust and easier to design, face alignment is performed to normalize the scales and orientations of
these patches. Besides serving as pre-processing for face recognition, face detection can be
used for region-of-interest detection, retargeting, video and image classification, etc.




Figure 1.2: A successful face detection in an image with a frontal view of a human face [2].

1.2. Feature Extraction:

After the face detection step, human-face patches are extracted from images. Directly
using these patches for face recognition has some disadvantages. First, each patch usually
contains over 1000 pixels, which is too large to build a robust recognition system. Second,
face patches may be taken from different camera alignments, with different facial expressions and
illuminations, and may suffer from occlusion and clutter. To overcome these drawbacks, feature
extraction is performed to do information packing, dimension reduction, salience extraction,
and noise cleaning. After this step, a face patch is usually transformed into a vector with fixed
dimension or a set of fiducial points and their corresponding locations. We will discuss this step
in more detail in Section 6. In some literature, feature extraction is included in either face
detection or face recognition.


1.3. Face Recognition:

After formulating the representation of each face, the last step is to recognize the
identities of these faces. In order to achieve automatic recognition, a face database needs to be
built. For each person, several images are taken and their features are extracted and stored in the
database. Then, when an input face image comes in, we perform face detection and feature
extraction, and compare its features to each face class stored in the database. Many studies and
algorithms have been proposed to deal with this classification problem, and we'll discuss them in
later sections. There are two general applications of face recognition: one is called identification
and the other is called verification. In face identification, given a face image, we want the system
to tell who he/she is, or the most probable identity; while in face verification, given a face image
and a guessed identity, we want the system to tell whether the guess is true or false. In figure 1.3,
we show an example of how these three steps work on an input image.



Figure 1.3: An example of how the three steps work on an input image. (a) The input
image and the result of face detection (the red rectangle). (b) The extracted face patch. (c) The
feature vector after feature extraction. (d) Comparing the input vector with the stored vectors in
the database by classification techniques and determining the most probable class (the red
rectangle). Here each face patch is expressed as a d-dimensional vector; the remaining symbols
denote the stored vectors and the number of faces stored in the k-th class. [2]










Chapter 2
Fundamentals of Digital Image Processing
Introduction

In the design and analysis of image processing systems, it is convenient and often
mathematically necessary to characterize the image to be processed. There are two basic
mathematical characterizations of interest: deterministic and statistical. In deterministic image
representation, a mathematical image function is defined and point properties of the image are
considered. For a statistical image representation, the image is specified by average properties.
An image may be defined as a two-dimensional function f(x, y), where x and y are spatial (plane)
coordinates, and the amplitude of f at any pair of coordinates (x, y) is called the intensity or gray
level of the image at that point. When x, y, and the amplitude values of f are all finite, discrete
quantities, we call the image a digital image. The field of digital image processing refers to
processing digital images by means of a digital computer. Note that a digital image is composed
of a finite number of elements, each of which has a particular location and value. These elements
are referred to as picture elements, image elements, pels, and pixels. Pixel is the term most
widely used to denote the elements of a digital image [38].
Figure 2.1: Image definition (the image function f(x, y) over spatial axes X and Y, measured from the origin).


2.1 Human and Computer Vision

We cannot think of image processing without considering the human visual system. This
seems to be a trivial statement, but it has far-reaching consequences. We observe and evaluate
the images that we process with our visual system. Without taking this elementary fact into
consideration, we may be badly misled in the interpretation of images. It is obvious that a deeper
knowledge of the human visual system would be of immense help for computer vision. This is not
the place to give an overview of the human visual system; the intention is rather to make us aware
of the elementary relations between human and computer vision. Here, we will make only some
introductory remarks and discuss the human visual system in terms of a few important points:

1. The human visual system interprets the context in its estimate of length. Consequently,
we should be very careful in our visual estimates of lengths and areas in images.

2. The human visual system is extremely powerful in recognizing objects, but is less well
suited for accurate measurements of gray values, distances, and areas.

In comparison, the power of computer vision systems is marginal and should make us
feel humble. A digital image processing system can only perform elementary or well-defined,
fixed image processing tasks such as real-time quality control in industrial production. A
computer vision system has also succeeded in steering a car at high speed on a highway, even
with changing lanes. However, we are still worlds away from a universal digital image
processing system which is capable of understanding images as human beings do and of
reacting intelligently and flexibly in real time. Another connection between human and
computer vision is worth noting: important developments in computer vision have been made
through progress in understanding the human visual system [38].




2.2 Representation of Image
We denote images by two-dimensional functions of the form f(x, y). The value or
amplitude of f at spatial coordinates (x, y) is a positive scalar quantity whose physical meaning is
determined by the source of the image. Most of the images in which we are interested in this
report are monochromatic images, whose values are said to span the gray scale. When an image is
generated from a physical process, its values are proportional to energy radiated by a physical
source (e.g., electromagnetic waves). As a consequence, f(x, y) must be nonzero and finite; that
is,

0 < f(x, y) < ∞          (2.2-1)

The function f(x, y) may be characterized by two components: A) the amount of source
illumination incident on the scene being viewed, and B) the amount of illumination reflected by
the objects in the scene. Appropriately, these are called the illumination and reflectance
components and are denoted by i(x, y) and r(x, y), respectively. The two functions combine as a
product to form f(x, y):

f(x, y) = i(x, y) r(x, y)          (2.2-2)
where
0 < i(x, y) < ∞          (2.2-3)
and
0 < r(x, y) < 1          (2.2-4)

Equation (2.2-4) indicates that reflectance is bounded by 0 (total absorption) and 1 (total
reflectance). The nature of i(x, y) is determined by the illumination source, and r(x, y) is
determined by the characteristics of the imaged objects. These expressions also apply to images
formed via transmission of the illumination through a medium, such as a chest X-ray. In this case,
we would deal with a transmissivity instead of a reflectivity function, but the limits would be the
same as in Eq. (2.2-4), and the image function formed would be modeled as the product in
Eq. (2.2-2).

We call the intensity of a monochrome image at any coordinates (x0, y0) the gray level ℓ of the
image at that point. That is,

ℓ = f(x0, y0)          (2.2-5)

From Eqs. (2.2-2) through (2.2-4), it is evident that ℓ lies in the range

Lmin ≤ ℓ ≤ Lmax          (2.2-6)

The interval [Lmin, Lmax] is called the gray scale. Common practice is to shift this interval
numerically to the interval [0, L - 1], where ℓ = 0 is considered black and ℓ = L - 1 is
considered white on the gray scale. All intermediate values are shades of gray varying from
black to white.








Figure 2.2: Digital Image Acquisition [38].




2.3 Image Sampling and Quantization

We can follow different ways to acquire images, but our objective in all of them is the
same: to generate digital images from sensed data. The output of most sensors is a continuous
voltage waveform whose amplitude and spatial behavior are related to the physical phenomenon
being sensed. To create a digital image, we need to convert the continuous sensed data into
digital form. This involves two processes: sampling and quantization [38].


Figure 2.3: Image Sampling and Quantization.

In Figure 2.3, f(x, y) denotes the continuous image, f_s(m, n) denotes the sampled image, and
u(m, n) denotes the digital image.

2.3.1 Basic Concept of Image Sampling and Quantization

Image sampling refers to the process of converting the continuous spatial distribution of
light intensity of an object into a discrete array of samples of the light intensity distribution. That
is, digitizing the coordinate values is called sampling [38]. Conversely, the continuous image can
be obtained from its sample values when the number of samples obtained is greater than a certain
value. Consider the rectangular array of sensors shown in Figure 2.4. Each element senses the
distribution of light intensity of the object being imaged at a discrete spatial location.


Figure 2.4: A rectangular sampling grid [38]

The next step in the image acquisition process is to convert this analog voltage into a
digital value that can be represented in a suitable digital format. A simple way to quantize an
image sample is as follows. Divide the input range of analog values uniformly into a given
number of decision intervals and determine the corresponding output levels as lying midway
between consecutive input decision boundaries. Number the output levels sequentially starting
from the lowest to the highest and represent them in binary format. Then map the given analog
value to the nearest output level and read out its binary number. This type of quantization is
known as uniform quantization. That is, digitizing the amplitude values is called quantization
[38].
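As a rough illustration of the uniform quantization described above, the following is a minimal
sketch (assuming 8-bit-style output levels, NumPy as the array library, and a helper name,
quantize_uniform, of our own choosing) that maps analog sample values to the nearest of L
equally spaced output levels:

    import numpy as np

    def quantize_uniform(samples, v_min, v_max, num_levels=256):
        # Width of each decision interval across the analog input range.
        step = (v_max - v_min) / num_levels
        # Index of the decision interval each sample falls into (0 .. num_levels-1).
        indices = np.clip(((samples - v_min) / step).astype(int), 0, num_levels - 1)
        # Output levels lie midway between consecutive input decision boundaries.
        levels = v_min + (indices + 0.5) * step
        return indices, levels

    # Example: quantize a few analog voltages in [0, 1] to 8 levels.
    idx, lv = quantize_uniform(np.array([0.03, 0.49, 0.97]), 0.0, 1.0, num_levels=8)

Reading out the binary representation of idx then gives the digital values described in the text.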


Figure 2.5: Generating a digital image (a) Continuous image. (b) A scan line from A to B in the
continuous image. (c) Sampling & quantization. (d) Digital scan line [38]

The one-dimensional function shown in Fig. 2-5(b) is a plot of amplitude (gray level)
values of the continuous image along the line segment AB in Fig. 2-5(a). The random variations
are due to image noise. To sample this function, we take equally spaced samples along line AB,
as shown in Fig. 2-5(c). The location of each sample is given by a vertical tick mark in the bottom
part of the figure. The samples are shown as small white squares superimposed on the function.
The set of these discrete locations gives the sampled function. However, the values of the
samples still span (vertically) a continuous range of gray-level values. In order to form a digital
function, the gray-level values also must be converted (quantized) into discrete quantities. The
right side of Fig. 2-5(c) shows the gray-level scale divided into eight discrete levels, ranging
from black to white. The vertical tick marks indicate the specific value assigned to each of the
eight gray levels. The continuous gray levels are quantized simply by assigning one of the eight
discrete gray levels to each sample. The assignment is made depending on the vertical proximity
of a sample to a vertical tick mark. The digital samples resulting from both sampling and
quantization are shown in Fig. 2-5(d). Starting at the top of the image and carrying out this
procedure line by line produces a two-dimensional digital image [38]. The resulting image after
sampling and quantization is shown in Figure 2.6:


Figure 2.6: (a) Continuous image projected onto a sensor array. (b) Result of image sampling
and quantization [38]

2.4 Digital Image Representation

The result of sampling and quantization is a matrix of real numbers. Assume that an
image f(x, y) is sampled so that the resulting digital image has M rows and N columns. The
values of the coordinates (x, y) now become discrete quantities. For notational clarity and
convenience, we shall use integer values for these discrete coordinates. Thus, the values of
the coordinates at the origin are (x, y) = (0, 0). The next coordinate values along the first row of
the image are represented as (x, y) = (0, 1). It is important to keep in mind that the notation (0, 1)
is used to signify the second sample along the first row. It does not mean that these are the actual
values of physical coordinates when the image was sampled. Figure 2-7 shows the coordinate
convention used to represent digital images.


Figure 2.7: Coordinate convention used to represent digital images [38]

So now we can write the complete M x N digital image in the following compact matrix form:

f(x, y) =  [ f(0, 0)      f(0, 1)      ...   f(0, N-1)
             f(1, 0)      f(1, 1)      ...   f(1, N-1)
             ...          ...          ...   ...
             f(M-1, 0)    f(M-1, 1)    ...   f(M-1, N-1) ]          (2.4-1)
This digitization process requires decisions about values for M, N, and for the number, L, of
discrete gray levels allowed for each pixel. There are no requirements on M and N,
other than that they have to be positive integers. However, due to processing, storage, and
sampling hardware considerations, the number of gray levels typically is an integer power of 2:

L = 2^k          (2.4-2)

We assume that the discrete levels are equally spaced and that they are integers in the interval
[0, L - 1]. Sometimes the range of values spanned by the gray scale is called the dynamic range of
an image, and we refer to images whose gray levels span a significant portion of the gray scale as
having a high dynamic range. When an appreciable number of pixels exhibit this property, the
image will have high contrast. Conversely, an image with low dynamic range tends to have a
dull, washed-out gray look [38]. The number, b, of bits required to store a digitized image is

b = M x N x k          (2.4-3)

When M = N, this equation becomes

b = N^2 k          (2.4-4)
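As a quick worked example of Eq. (2.4-3) (a minimal sketch; the helper name image_storage_bits
is our own), the snippet below computes the storage needed for a 1024 x 1024 image with 8-bit
pixels, the case mentioned again in Section 2.7:

    def image_storage_bits(M, N, k):
        # Eq. (2.4-3): b = M * N * k bits for an M x N image with 2^k gray levels.
        return M * N * k

    bits = image_storage_bits(1024, 1024, 8)
    print(bits, "bits =", bits // 8, "bytes")   # 8388608 bits = 1048576 bytes (1 MB)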

2.5 Outline of a Typical Face Recognition System

Figure 2.8: Outline of a typical face recognition system (face image acquisition, pre-processing
and normalization, PCA-based feature extraction with weights saved in a knowledge base, and a
testing algorithm that matches feature vectors against the knowledge base to output the
recognized face).

The design of the Face Recognition System (FRS) can be subdivided into two main
parts. The first part is image processing and the second part is the eigenfaces technique.
The image processing part contains:
Face image acquisition,
Image scaling,
Image filtering,
Converting the image into a grayscale image,
Image segmentation.

The second part consists of two main subparts. These are:
Learning,
Recognition.

The first part of the FRS consists of several image processing techniques. First, face
image acquisition is performed using a digital camera or webcam. Then image scaling and
filtering are applied, followed by image segmentation. The next step is recognition, which is
performed using the eigenfaces technique.

The acquisition module: This is the entry point of the face recognition process. It is the
module where the face image under consideration is presented to the system. In other
words, the user is asked to present a face image to the face recognition system in this
module. An acquisition module can request a face image from several different
environments: The face image can be an image file that is located on a magnetic disk; it
can be captured by a digital camera or webcam.

The pre-processing module: The following pre-processing steps may be implemented in
a face recognition system.

Image scaling: Image scaling includes zooming and shrinking. One can use image
enlarging to zoom in on part of an image for closer examination, and image shrinking for
saving disk space, fitting a large image into a small display, or pasting several images
into one image of the same size. Generally there are two techniques for image zooming:
replication and interpolation. Replication enlarges an image by repeating a single pixel several
times in both directions; it is the simplest technique, since it just copies the same pixel
repeatedly depending on the scaling factor. The interpolation technique is harder to implement,
but the result is smoother: where replication copies a pixel, interpolation smoothes the transition,
so the curves produced by replication are more jagged than in the interpolation case. Here,
scaling is usually done to change the acquired image size to a default image size; a short sketch
of the replication approach is given below.
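A minimal sketch of the replication (nearest-neighbour) zooming described above, assuming
NumPy and an integer scaling factor (the function name zoom_by_replication is our own):

    import numpy as np

    def zoom_by_replication(image, factor):
        # Repeat each pixel 'factor' times along the rows and then along the columns.
        return np.repeat(np.repeat(image, factor, axis=0), factor, axis=1)

    # Example: a 2 x 2 gray-level patch enlarged to 4 x 4.
    patch = np.array([[10, 200],
                      [60, 120]], dtype=np.uint8)
    zoomed = zoom_by_replication(patch, 2)   # each pixel becomes a 2 x 2 block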


Spatial filtering: The use of spatial masks for image processing is usually called spatial
filtering, and the masks themselves are called spatial filters. When a linear spatial filter is used,
the basic approach is to sum the products of the mask coefficients and the intensities of the
pixels under the mask at a specific location in the image.

Figure 2.9 shows a general 3x3 mask. Denoting the gray levels of the pixels under the mask
at any location by z1, z2, ..., z9, the response of the linear mask is

R = w1 z1 + w2 z2 + ... + w9 z9,

where wi is the mask coefficient placed over the pixel with gray level zi.

w1   w2   w3
w4   w5   w6
w7   w8   w9

Figure 2.9: A 3 x 3 mask with arbitrary coefficients

If the center of the mask is at location (x, y) in the image, the gray level of the pixel
located at (x, y) will be replaced by the response R. The mask is then moved to the next pixel
location in the image and the process is repeated. This continues until all pixel locations have
been covered. The value of R is computed by using partial neighborhoods for pixels that are
located on the border of the image [38]. A sketch of this sliding-mask process is given below.
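The following sketch illustrates linear spatial filtering with a sliding 3x3 mask on a gray-level
image; it is a minimal example assuming NumPy and simple zero padding at the borders rather
than the partial-neighborhood scheme mentioned above (the function name apply_mask_3x3 is
our own):

    import numpy as np

    def apply_mask_3x3(image, mask):
        # Zero-pad the image so the mask response can be computed at border pixels too.
        padded = np.pad(image.astype(float), 1, mode='constant')
        out = np.zeros_like(image, dtype=float)
        rows, cols = image.shape
        for x in range(rows):
            for y in range(cols):
                # Response R = sum of mask coefficients times pixel intensities under the mask.
                neighborhood = padded[x:x + 3, y:y + 3]
                out[x, y] = np.sum(mask * neighborhood)
        return out

    # Example: a 3x3 averaging (smoothing) mask applied to a random gray-level image.
    mask = np.ones((3, 3)) / 9.0
    filtered = apply_mask_3x3(np.random.randint(0, 256, (8, 8)), mask)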









2.6 Fundamental steps in Digital image processing














Figure 2.10: Fundamental steps in Digital image processing [38]

There are some fundamental steps in digital image processing. These are summarized below.

Image acquisition is the first process of image processing. Note that acquisition could be
as simple as being given an image that is already in digital form. Generally, the image
acquisition stage involves preprocessing, such as scaling.

Image enhancement is among the simplest and most appealing areas of digital image
processing. Basically, the idea behind enhancement techniques is to bring out detail that is
obscured, or simply to highlight certain features of interest in an image. A familiar example of
enhancement is increasing the contrast of an image because "it looks better". It is important to
keep in mind that enhancement is a very subjective area of image processing. Enhancement is
also a convenient way to gain familiarity with basic image processing concepts; a good example
of this is the Fourier transform.


Image restoration is an area that also deals with improving the appearance of an image.
However, unlike enhancement, which is subjective, image restoration is objective, in the sense that
restoration techniques tend to be based on mathematical or probabilistic models of image degradation.
Enhancement, on the other hand, is based on human subjective preferences regarding what constitutes a
"good" enhancement result.

Color image processing is an area that has been gaining in importance because of the significant
increase in the use of digital images on the Internet.

Wavelets are the foundation for representing images at various degrees of resolution. They are used
for image data compression and for pyramidal representation, in which images are subdivided into smaller
regions.

Compression deals with techniques for reducing the storage required to save an image or the
bandwidth required to transmit it. Image compression is closely tied to transmission over the Internet;
we are familiar with it through file extensions such as .jpg, which use the JPEG image compression standard.

Morphological processing deals with tools for extracting image components that are useful in the
representation and description of shape. This material marks a transition from
processes that output images to processes that output image attributes.

Segmentation procedures partition an image into its constituent parts or objects. In general,
autonomous segmentation is one of the most difficult tasks in digital image processing. A rugged
segmentation procedure brings the process a long way toward successful solution of imaging problems
that require objects to be identified individually.

Representation and description almost always follow the output of a segmentation stage, which
usually is raw pixel data, constituting either the boundary of a region (i.e., the set of pixels separating one
image region from another) or all the points in the region itself. In either case, converting the data to a
form suitable for computer processing is necessary. The first decision that must be made is whether the data
should be represented as a boundary or as a complete region. Boundary representation is appropriate when
the focus is on external shape characteristics, such as corners and inflections. Regional representation is
appropriate when the focus is on internal properties, such as texture or skeletal shape.

Recognition is the process that assigns a label (e.g., "vehicle") to an object based on its
descriptors [7].

2.7 Components of an Image Processing System

Figure 2.11: Components of a general purpose image processing system [38]

Image sensors comprise two elements. The first is a physical device that is sensitive to the
energy radiated by the object we wish to image. The second is a digitizer, a device for
converting the output of the physical sensing device into digital form.

Specialized image processing hardware usually consists of the digitizer plus hardware that
performs other primitive operations, such as an arithmetic logic unit (ALU).


The computer can range from a PC to a supercomputer; sometimes specialized computers are
used to achieve a required level of performance.

Image processing software consists of specialized modules that perform specific tasks.
A well-designed package also allows the user to write code.

Mass storage capability is a must in image processing applications. An image of size
1024 x 1024 pixels, in which the intensity of each pixel is an 8-bit quantity, requires one
megabyte of storage space if the image is not compressed.

Image displays in use today are mainly color TV monitors. They are driven by the outputs of
image and graphics display cards that are an integral part of the computer system.

Hardcopy devices for recording images include laser printers, cameras, and similar devices;
images are displayed on film transparencies or stored in a digital medium.

Networking is an almost default function in any computer system today. Because of the large
amount of image data involved, the key consideration is bandwidth. In dedicated networks this is
not a major problem, but communication with remote sites can be inefficient; this can be improved
by using optical fiber or other broadband technologies.
Conclusion
In this chapter we discussed image fundamentals, image sampling and quantization, the basic
concepts of image sampling and quantization, digital image representation, the fundamental steps
in digital image processing, the components of an image processing system, pattern classes and
patterns, and the outline of a typical face recognition system.





Chapter 3
Fundamentals of Pattern Recognition

Before going into the details of techniques and algorithms for face recognition, we'd like to
make a digression here to talk about pattern recognition. The discipline of pattern recognition
includes all kinds of recognition tasks, such as speech recognition, object recognition, data
analysis, and face recognition. In this section, we won't discuss those specific applications,
but introduce the basic structure, general ideas, and general concepts behind them.

The general structure of pattern recognition is shown in Fig. 3.1. In order to build a
system for recognition, we always need data sets for building categories and for comparing
similarities between the test data and each category. A test datum is usually called a query in
the image retrieval literature, and we will use this term throughout this report. From Fig. 3.1, we
can easily notice the symmetric structure. Starting from the data-set side, we first perform
dimension reduction on the stored raw data. The methods of dimension reduction can be
categorized into data-driven methods and domain-knowledge methods, which will be discussed
later. After dimension reduction, each raw datum in the data set is transformed into a set of
features, and the classifier is mainly trained on these feature representations. When a query comes
in, we perform the same dimension reduction procedure on it and feed its features into the trained
classifier. The output of the classifier will be the optimal class label (sometimes with the
classification accuracy) or a rejection note (return to manual classification).

3.1 Notation

There are several conventional notations in the pattern recognition and machine learning
literature. We usually denote a matrix with an upper-case character and a vector with a
lower-case one. Each sample in a training data set with N samples is expressed together with its
label for the supervised learning case (the label is known for each sample) and without a label for
the unsupervised case. The input query is represented without the training-set indicator to
distinguish it from the training samples. When doing linear projection for dimension reduction,
we denote the projection vector with a lower-case character and the projection matrix with an
upper-case character.




Figure 3.1: The general structure of a pattern recognition system.



3.2 Different kinds of pattern recognition (four categories)

Following the definition of Jain et al. [5], techniques for pattern recognition can be
classified into four categories: template matching, statistical approaches, syntactic approaches,
and neural networks. The template matching category builds several templates for each label
class and compares these templates with the test pattern to reach a decision. The
statistical approach is the main category discussed in this report; it extracts
knowledge from training data and uses different kinds of machine learning tools for dimension
reduction and recognition. Fig. 3.2 shows the categories of the statistical approach.

The syntactic approach is often called rule-based pattern recognition; it is built
on human knowledge or some physical rules. For example, word classification and word
correction require the help of grammars. The term "knowledge" refers to the rules that the
recognition system uses to perform certain actions. Finally, the well-known neural network is a
framework based on a recognition unit called the perceptron. With different numbers of
perceptrons, layers, and optimization criteria, neural networks can have several variations
and be applied to a wide range of recognition cases.


Figure 3.2: Various approaches in statistical pattern recognition. [5]


3.3 Dimension Reduction: Domain-knowledge Approach and Data-driven Approach

Dimension reduction is one of the most important steps in pattern recognition and
machine learning. It's difficult to use raw data (e.g., face patches) directly for pattern
recognition, not only because the significant parts of the data haven't been extracted, but also
because of the extremely high dimensionality of the raw data. The significant parts (for
recognition purposes, or the parts of more interest) usually occupy just a small portion of the raw
data and cannot be extracted directly by simple methods such as cropping and sampling. For
example, a one-channel audio signal usually contains over 10,000 samples per second, so there
will be over 1,800,000 samples for a three-minute-long song. Directly using the raw signal for
music genre recognition is prohibitive, and we may instead seek to extract useful music features
such as pitch, tempo, and instrument information, which can better express our auditory
perception. The goal of dimension reduction is to extract useful information and reduce the
dimensionality of the input data to the classifier, in order to decrease the cost of computation and
alleviate the curse of dimensionality.

There are two main categories of dimension reduction techniques: domain-knowledge
approaches and data-driven approaches. Domain-knowledge approaches perform dimension
reduction based on knowledge of the specific pattern recognition case. For example, in image
processing and audio signal processing, the discrete Fourier transform (DFT), the discrete cosine
transform (DCT), and the discrete wavelet transform are frequently used because human visual
and auditory perception have a higher response at low frequencies than at high frequencies.
Another significant example is the use of language models in text retrieval, which capture the
contextual structure of language.

In contrast to the domain-knowledge approaches, the data-driven approaches directly
extract useful features from the training data by some kind of machine learning technique. For
example, the eigenface method determines the most important projection bases by principal
component analysis; these bases depend on the training data set rather than being fixed bases
like the DFT or DCT.
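As a rough illustration of such a data-driven approach, the sketch below (a minimal example,
assuming NumPy and a matrix X whose rows are vectorized training face patches; it is not the
exact procedure used later in this report) computes PCA projection bases from training data and
projects a query onto them:

    import numpy as np

    def pca_bases(X, num_components):
        # Center the training data (rows of X are vectorized face patches).
        mean = X.mean(axis=0)
        Xc = X - mean
        # Right singular vectors of the centered data are the principal directions.
        _, _, Vt = np.linalg.svd(Xc, full_matrices=False)
        return mean, Vt[:num_components]          # projection matrix of shape (k, d)

    def project(x, mean, W):
        # Dimension reduction: map a d-dimensional patch to a k-dimensional feature vector.
        return W @ (x - mean)

    # Example with random stand-in data: 100 training patches of 32 x 32 = 1024 pixels.
    X = np.random.rand(100, 1024)
    mean, W = pca_bases(X, num_components=20)
    feature = project(np.random.rand(1024), mean, W)   # 20-dimensional feature vector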

3.4 Two tasks: Unsupervised Learning and Supervised Learning

There are two general tasks in pattern recognition and machine learning: supervised
learning and unsupervised learning. The main difference between these two tasks is whether the
label of each training sample is known or unknown. When the labels are known, during the
learning phase we try to model the relation between the feature vectors and their corresponding
labels; this kind of learning is called supervised learning. On the other hand, if the label of each
training sample is unknown, what we try to learn is the distribution of the possible categories of
feature vectors in the training data set; this kind of learning is called unsupervised learning. In
fact, there is another learning task called semi-supervised learning, in which only part of the
training data has labels, but this kind of learning is beyond the scope of this report.



3.5 Conclusion

The tasks and cases discussed in the previous sections give an overview of pattern
recognition. To gain more insight into the performance of pattern recognition techniques, we need
to pay attention to some important factors. In template matching, the number of templates for
each class and the adopted distance metric directly affect the recognition result. In statistical
pattern recognition, there are four important factors: the size of the training data N, the
dimensionality of each feature vector d, the number of classes C, and the complexity of the
classifier h. In the syntactic approach, we expect that the more rules are considered, the higher
the recognition performance we can achieve, while the system also becomes more complicated;
moreover, it is sometimes hard to translate and organize human knowledge into algorithms.
Finally, in neural networks, the number of layers, the number of perceptrons (neurons) used, the
dimensionality of the feature vectors, and the number of classes all have effects on the recognition
performance. More interestingly, neural networks have been shown to have close relationships
with statistical pattern recognition techniques [5].




















Chapter 4:
Face detection

Nowadays, some applications of face recognition don't require face detection. In some
cases, the face images stored in the database are already normalized and there is a standard image
input format, so there is no need for a detection step. An example of this is a criminal database,
where the law enforcement agency stores faces of people with a criminal record. If there is a
new subject and the police have his or her passport photograph, face detection is not necessary.
However, the typical input image of a computer vision system is not that well behaved: it can
contain many items or many faces. In these cases face detection is mandatory. It is also unavoidable if
we want to develop an automated face tracking system; for example, video surveillance systems
try to include face detection, tracking and recognition. So it is reasonable to consider face
detection as part of the broader face recognition problem.

Face detection must deal with several well-known challenges [2]. They are usually present
in images captured in uncontrolled environments, such as surveillance video systems. These
challenges can be attributed to several factors:

Pose variation: The ideal scenario for face detection would be one in which only frontal
images were involved. But, as stated, this is very unlikely under general uncontrolled conditions.
Moreover, the performance of face detection algorithms drops severely when there are large pose
variations. It is a major research issue. Pose variation can happen due to the subject's movements
or the camera's angle.

Feature occlusion: The presence of elements like beards, glasses or hats introduces high
variability. Faces can also be partially covered by objects or other faces.

Facial expression: Facial features also vary greatly because of different facial gestures.


Imaging conditions: Different cameras and ambient conditions can affect the quality of
an image, affecting the appearance of a face.

There are some problems closely related to face detection besides feature extraction and
face classification. For instance, face location is a simplified form of face detection: its goal
is to determine the location of a face in an image where there is only one face. We can
differentiate between face detection and face location, since the latter is a simplified problem of
the former. Methods like locating head boundaries [3] were first used in this scenario and then
exported to more complicated problems. Facial feature detection concerns detecting and locating
relevant features such as the nose, eyebrows, lips, ears, etc. Some feature extraction
algorithms are based on facial feature detection. There is much literature on this topic, which is
discussed later. Face tracking is another problem which is sometimes a consequence of face
detection. Many systems' goal is not only to detect a face, but to be able to locate it in real
time. Once again, video surveillance systems are a good example.


4.1. Face detection problem structure

Face detection is a concept that includes many sub-problems. Some systems detect and
locate faces at the same time; others first perform a detection routine and then, if positive, they
try to locate the face. Then, some tracking algorithms may be needed (see Figure 4.1).

Figure 4.1: Face detection processes

Face detection algorithms usually share common steps. First, some data dimension
reduction is done in order to achieve an admissible response time. Some pre-processing may
also be done to adapt the input image to the algorithm's prerequisites. Then, some algorithms
analyze the image as it is, while others try to extract certain relevant facial regions. The next
phase usually involves extracting facial features or measurements. These are then weighted,
evaluated or compared to decide whether there is a face and where it is. Finally, some algorithms
have a learning routine and include new data in their models.

Face detection is, therefore, a two-class problem where we have to decide whether there is a
face in a picture or not. This approach can be seen as a simplified face recognition problem: face
recognition has to classify a given face, and there are as many classes as candidates.
Consequently, many face detection methods are very similar to face recognition algorithms; or,
put another way, techniques used in face detection are often used in face recognition.

4.2 Approaches to face detection

It is not easy to give a taxonomy of face detection methods; there isn't a globally accepted
grouping criterion, and the categories usually mix and overlap. In this section, two classification
criteria will be presented. One of them differentiates between distinct scenarios: depending on the
scenario, different approaches may be needed. The other criterion divides the detection
algorithms into four categories.

Detection depending on the scenario

Controlled environment: This is the most straightforward case. Photographs are taken under
controlled light, background, etc. Simple edge detection techniques can be used to detect faces
[1].

Color images: Typical skin colors can be used to find faces. This can be weak if lighting
conditions change. Moreover, human skin color varies a lot, from nearly white to almost black.
However, several studies show that the major difference lies in intensity, so chrominance is
a good feature [2]. It is not easy to establish a solid human skin color representation, but
there are attempts to build robust face detection algorithms based on skin color [4].

Images in motion: Real-time video gives the chance to use motion detection to localize faces.
Nowadays, most commercial systems must locate faces in videos. There is a continuing
challenge to achieve the best detection results with the best possible performance [6]. Another
approach based on motion is eye blink detection, which has many uses aside from face detection
[7, 8].

Detection methods divided into categories
Yang, Kriegman and Ahuja presented a classification that is well accepted [2]. Methods are
divided into four categories; these categories may overlap, so an algorithm could belong to two
or more of them. The classification can be made as follows:

Knowledge-based methods: Rule-based methods that encode our knowledge of human faces.

Feature-invariant methods: Algorithms that try to find invariant features of a face despite its
angle or position.

Template matching methods: These algorithms compare input images with stored patterns of
faces or features.

Appearance-based methods: Template matching methods whose pattern database is learned
from a set of training images.

Let us examine them in detail.


4.3 Knowledge-based methods

These are rule-based methods. They try to capture our knowledge of faces and translate
it into a set of rules. It is easy to guess some simple rules: for example, a face usually has two
symmetric eyes, and the eye area is darker than the cheeks. Facial features could be the distance
between the eyes or the color intensity difference between the eye area and the lower zone. The
big problem with these methods is the difficulty of building an appropriate set of rules. There
could be many false positives if the rules were too general; on the other hand, there could be many
false negatives if the rules were too detailed. A solution is to build hierarchical knowledge-based
methods to overcome these problems. However, this approach alone is very limited and is unable
to find many faces in a complex image.

Other researchers have tried to find invariant features for face detection. The idea is
to overcome the limits of our instinctive knowledge of faces. One early algorithm was developed
by Han, Liao, Yu and Chen in 1997 [9]. The method is divided into several steps. First, it tries to
find eye-analogue pixels and removes unwanted pixels from the image. After performing the
segmentation process, each eye-analogue segment is considered a candidate for one of the eyes.
Then, a set of rules is applied to determine the potential pairs of eyes. Once the eyes are
selected, the algorithm calculates the face area as a rectangle, whose four vertices are determined
by a set of functions. The potential faces are then normalized to a fixed size and orientation, and
the face regions are verified using a back-propagation neural network. Finally, a cost function is
applied to make the final selection. They report a success rate of 94%, even in photographs with
many faces. These methods are efficient with simple inputs. But what happens if a man is wearing
glasses?

There are other features that can deal with that problem. For example, there are
algorithms that detect face-like textures or the color of human skin. It is very important to choose
the best color model to detect faces. Some recent studies use more than one color model; for
example, RGB and HSV have been used together successfully [10]. In that paper, the authors
chose the following parameters:

0.4 ≤ r ≤ 0.6,  0.22 ≤ g ≤ 0.33,  r > g > (1 − r)/2          (4.1)
0 ≤ H ≤ 0.2,  0.3 ≤ S ≤ 0.7,  0.22 ≤ V ≤ 0.8          (4.2)

Both conditions are used to detect skin-color pixels. However, these methods alone are
usually not enough to build a good face detection algorithm. Skin color can vary significantly if
lighting conditions change. Therefore, skin color detection is used in combination with other
methods, such as local symmetry or structure and geometry. A rough sketch of this kind of
skin-color test is given below.
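The following is a minimal, hedged sketch of a skin-color pixel test based on rules (4.1) and
(4.2); it assumes NumPy, an RGB image with values in [0, 1], and HSV channels already computed
with H, S, V also scaled to [0, 1] (the function name is ours, the thresholds are those quoted from
[10], and combining the two rules with a logical AND is an assumption of this sketch):

    import numpy as np

    def skin_mask(rgb, hsv):
        # Normalized chromaticity: r = R/(R+G+B), g = G/(R+G+B).
        R, G, B = rgb[..., 0], rgb[..., 1], rgb[..., 2]
        total = R + G + B + 1e-6
        r, g = R / total, G / total
        # Rule (4.1) on normalized RGB.
        rgb_rule = ((r >= 0.4) & (r <= 0.6) & (g >= 0.22) & (g <= 0.33)
                    & (r > g) & (g > (1 - r) / 2))
        # Rule (4.2) on HSV.
        H, S, V = hsv[..., 0], hsv[..., 1], hsv[..., 2]
        hsv_rule = (H <= 0.2) & (S >= 0.3) & (S <= 0.7) & (V >= 0.22) & (V <= 0.8)
        # Assumption: a pixel is labeled skin only when both rules agree.
        return rgb_rule & hsv_rule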




4.4 Template matching

Template matching methods try to define a face as a function: we try to find a standard
template for all faces. Different features can be defined independently; for example, a face
can be divided into eyes, face contour, nose and mouth. A face model can also be built from edges,
but these methods are limited to faces that are frontal and unoccluded. A face can also be
represented as a silhouette. Other templates use the relation between face regions in terms of
brightness and darkness. These standard patterns are compared with the input image to detect
faces. This approach is simple to implement, but on its own it is inadequate for face detection: it
cannot achieve good results with variations in pose, scale and shape. However, deformable
templates have been proposed to deal with these problems.

Appearance-based methods

The templates in appearance-based methods are learned from example images.
In general, appearance-based methods rely on techniques from statistical analysis and machine
learning to find the relevant characteristics of face images. Some appearance-based methods
work in a probabilistic framework: an image or feature vector is treated as a random variable with
some probability of belonging to the face class or not. Another approach is to define a discriminant
function between the face and non-face classes. These methods are also used in feature extraction
for face recognition and will be discussed later. The most relevant methods and tools are the
following:

Eigenface-based: Sirovich and Kirby [13] developed a method for efficiently representing
faces using PCA (Principal Component Analysis). The goal of this approach is to represent a
face within a coordinate system; the vectors that make up this coordinate system are referred to
as eigenpictures. Later, Turk and Pentland used this approach to develop an eigenface-based
algorithm for recognition [12].

Distribution-based: These systems were first proposed for object and pattern detection
by Sung [14]. The idea is to collect a sufficiently large number of sample views of the pattern class
we wish to detect, covering all sources of image variation we wish to handle. Then an
appropriate feature space is chosen; it must represent the pattern class as a distribution of all its
permissible image appearances. The system matches the candidate picture against the
distribution-based canonical face model. Finally, a trained classifier identifies instances of the
target pattern class among background image patterns, based on a set of distance measurements
between the input pattern and the distribution-based class representation in the chosen feature
space. Algorithms like PCA or Fisher's discriminant can be used to define the subspace
representing facial patterns.

Neural Networks: Many pattern recognition problems like object recognition, character
recognition, etc. have been tackled successfully by neural networks. These systems can be used in
face detection in different ways. Some early studies used neural networks to learn the face
and non-face patterns [93]. They defined the detection problem as a two-class problem; the real
challenge was to represent the "images not containing faces" class. Another approach is to use
neural networks to find a discriminant function to classify patterns using distance measures.
Some approaches have tried to find an optimal boundary between face and non-face pictures
using a constrained generative model.

Support Vector Machines: SVMs are linear classifiers that maximize the margin between
the decision hyperplane and the examples in the training set, so that the optimal hyperplane
should minimize the classification error on unseen test patterns. This classifier was first applied
to face detection by Osuna [15].
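A minimal sketch of a face/non-face SVM classifier in the spirit described above, assuming
scikit-learn is available and that feature vectors for face and non-face patches have already been
extracted (the variable names and the 20-dimensional random placeholder features are ours, not
data from this report):

    import numpy as np
    from sklearn.svm import SVC

    # Placeholder training data: rows are feature vectors, labels are 1 (face) / 0 (non-face).
    X_train = np.random.rand(200, 20)
    y_train = np.random.randint(0, 2, 200)

    # A linear SVM maximizes the margin between the two classes.
    clf = SVC(kernel='linear', C=1.0)
    clf.fit(X_train, y_train)

    # Classify a new candidate patch's feature vector.
    is_face = clf.predict(np.random.rand(1, 20))[0] == 1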

Sparse Network of Winnows: SNoWs were first used for face detection by Yang [16]. They
defined a sparse network of two linear units, or target nodes, one representing face patterns and
the other non-face patterns. The SNoW had an incrementally learned feature space. New
labeled cases served as a positive example for one target and as a negative example for the
other. The system proved to be effective at the time, and less time-consuming.

Naive Bayes Classifiers: Schneiderman and Kanade described an object recognition
algorithm that modeled and estimated a Bayesian classifier [17]. They computed the probability
of a face being present in the picture by counting the frequency of occurrence of a series of
patterns over the training images, emphasizing patterns like the intensity around the eyes.
The classifier captured the joint statistics of local appearance and position of the face, as well as
the statistics of local appearance and position in the visual world. Overall, their algorithm
showed good results on frontal face detection. Bayes classifiers have also been used as a
complementary part of other detection algorithms.

Hidden Markov Models: This statistical model has been used for face detection. The
challenge is to build a proper HMM so that the output probability can be trusted. The states of
the model are the facial features, which are often defined as strips of pixels, and the probabilistic
transitions between states usually correspond to the boundaries between these pixel strips. As
with Bayesian classifiers, HMMs are commonly used along with other methods to build detection
algorithms.

Information-Theoretical Approach: Markov Random Fields (MRFs) can be used to model
the contextual constraints of a face pattern and correlated features. The Markov process maximizes
the discrimination between classes (an image has a face or not) using the Kullback-Leibler
divergence; therefore, this method can be applied to face detection.

Inductive Learning: This approach has also been used to detect faces. Algorithms like Quinlan's
C4.5 or Mitchell's FIND-S have been used for this purpose [18].

4.5 Face tracking

Many face recognition systems take a video sequence as input. Those systems may
need to be capable of not only detecting but also tracking faces. Face tracking is essentially a
motion estimation problem. It can be performed using many different methods, e.g.,
head tracking, feature tracking, image-based tracking, and model-based tracking. There are
different ways to classify these algorithms [19]:

Head tracking/Individual feature tracking: The head can be tracked as a whole entity, or certain
features can be tracked individually.


2D/3D: Two-dimensional systems track a face and output an image-space location where the face
is. Three-dimensional systems, on the other hand, perform 3D modeling of the face; this approach
allows pose or orientation variations to be estimated.

The basic face tracking process seeks to locate a given image region in a picture and then
compute the differences between frames to update the location of the face. There are many issues
that must be faced: partial occlusions, illumination changes, computational speed and facial
deformations.

One example of a face tracking algorithm is the one proposed by Baek [20]. The
state vector of a face includes the center position, the size of the rectangle containing the face, the
average color of the face area, and their first derivatives. New candidate faces are evaluated
by a Kalman estimator. In tracking mode, if the face is not new, the face from the previous frame
is used as a template; the position of the face is predicted by the Kalman estimator, and the face
region is searched around that position by an SSD (sum of squared differences) algorithm using
the mentioned template. When SSD finds the region, the color information is incorporated into
the Kalman estimator to confine the face region exactly, and the state vector of that face is
updated. The results remained robust when some faces overlapped or when color changes
happened.
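The following is a minimal, hedged sketch of the SSD template search step mentioned above,
assuming NumPy, a grayscale frame, and a search window around the predicted position (the
function name ssd_search and the window handling are our own simplifications, not the exact
procedure of [20]):

    import numpy as np

    def ssd_search(frame, template, center, radius):
        # Search for the template around 'center' = (row, col) within +/- 'radius' pixels.
        th, tw = template.shape
        best_pos, best_ssd = None, np.inf
        for r in range(center[0] - radius, center[0] + radius + 1):
            for c in range(center[1] - radius, center[1] + radius + 1):
                if r < 0 or c < 0 or r + th > frame.shape[0] or c + tw > frame.shape[1]:
                    continue
                patch = frame[r:r + th, c:c + tw].astype(float)
                # Sum of squared differences between the candidate patch and the template.
                ssd = np.sum((patch - template) ** 2)
                if ssd < best_ssd:
                    best_ssd, best_pos = ssd, (r, c)
        return best_pos, best_ssd

    # Example: search for a 16 x 16 face template in a 120 x 160 frame.
    frame = np.random.rand(120, 160)
    template = frame[40:56, 60:76].astype(float)
    pos, score = ssd_search(frame, template, center=(42, 62), radius=8)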














Chapter 5
Kohonen SOM

5.1 Introduction to Neural Networks
Neural networks are a fascinating concept. The first neural network, the Perceptron, was
created in 1956 by Frank Rosenblatt. Thirteen years later, in 1969, a publication known as
Perceptrons, by Minsky and Papert, dealt a devastating blow to neural network research. The
book formalized the concept of neural networks, but also pointed out some serious limitations of
the original architecture. Specifically, it showed that the Perceptron could not perform the basic
logical computation of XOR (exclusive-or). This nearly brought neural network research to a
halt. Fortunately, research has since resumed, and at the time of this writing, neural network
popularity has increased dramatically as a tool for providing solutions to difficult problems [21].

Designed around the brain paradigm of Artificial Intelligence, neural networks attempt to
model the biological brain. Neural networks are very different from most standard computer
science concepts. In a typical program, data is stored in some structure such as frames, which are
then stored in a centralized database, as with an Expert System or a Natural Language Processor.
In neural networks, however, information is distributed throughout the network [21]. This
mirrors the biological brain, which stores its information (memories) throughout its synapses.

Neural networks have features that make them appealing both to connectionist researchers
and to individuals needing ways to solve complex problems. Neural networks can be extremely
fast and efficient, which facilitates the handling of large amounts of data [21]. The reason for this
is that each node in a neural network is essentially its own autonomous entity; each performs only
a small computation in the grand scheme of the problem. The aggregate of all these nodes, the
entire network, is where the true capability lies. This architecture allows for parallel
implementation, much like the biological brain, which performs nearly all of its tasks in parallel.
Neural networks are also fault tolerant. This is a fundamental property: a small portion of bad
data or a sector of non-functional nodes will not cripple the network; it will learn to adjust. This
does have a limit, though, as too much bad data will cause the network to produce incorrect
results. The same can be said of the human brain. A slight head trauma may produce no
noticeable effects, but a major head trauma (or a slight head trauma to the correct spot) may leave
the person incapacitated. Also, as pointed out in many studies, we can still understand a sentence
when the letters of a word are out of order, which shows that our brain can cope with some
corrupt input data.

How do neural networks learn? First, we must define learning. Biologically, learning is an
experience that changes the state of an organism such that the new state leads to improved
performance in subsequent situations. Mechanically, it is similar: computational methods for
acquiring and organizing new knowledge that will lead to new skills [200]. Before the network
can become useful, it must learn about the information at hand. After training, it can be used for
pragmatic purposes. In general, there are two flavors of learning [22]:

Supervised Learning: The correct answers are known, and this information is used to
train the network for the given problem. This type of learning utilizes both input vectors and
output vectors. The input vectors provide the starting data, and the output vectors can be
compared with the network's outputs to determine an error. In a special type of supervised
learning, reinforcement learning, the network is only told whether its output is right or wrong.
Back-propagation algorithms make use of this style.

Unsupervised Learning: The correct answers are not known (or simply not told to the
network). The network must try on its own to discover patterns in the input data. Only the input
vectors are used; generated output vectors are not used for learning. Also, and possibly most
importantly, no human interaction is needed for unsupervised learning. This can be an extremely
important feature, especially when dealing with a large and/or complex data set that would be
time-consuming or difficult for a human to process. Self-Organizing Maps utilize this style of
learning.


As can be seen from the above descriptions, one neural network can have many features that
differ from another neural network; because of this, there are many different types of neural
networks. One in particular, the Self-Organizing Map, is discussed next.

5.2 Introduction to Kohonen Self-Organizing Maps

Kohonen Self-Organizing Maps (or just Self-Organizing Maps, or SOMs for short) are a
type of neural network. They were developed in 1982 by Teuvo Kohonen, a professor
emeritus of the Academy of Finland. Self-Organizing Maps are aptly named. "Self-Organizing"
because no supervision is required: SOMs learn on their own through unsupervised
competitive learning. "Maps" because they attempt to map their weights to conform to the
given input data. The nodes in a SOM network attempt to become like the inputs presented to
them; in this sense, this is how they learn. They can also be called Feature Maps, as in Self-
Organizing Feature Maps. Retaining the principal 'features' of the input data is a fundamental
principle of SOMs, and one of the things that makes them so valuable. Specifically, the
topological relationships between input data are preserved when mapped to a SOM network.
This has a pragmatic value for representing complex data. For example [23]:




Fig 5.1 A map of the world quality-of-life


Yellows and oranges represent wealthy nations, while purples and blues are the poorer
nations. From this view, it can be difficult to visualize the relationships between
countries. However, represented by a SOM as shown in Figure 5.2, it is much easier to see what
is going on.




Fig 5.2 Self-Organizing Map of Countries [23]
Here we can see the United States, Canada, and Western European countries, on the left
side of the network, being the wealthiest countries. The poorest countries, then, can be found on
the opposite side of the map (at the point farthest away from the richest countries), represented
by the purples and blues.

Figure 5.2 is a hexagonal grid. Each hexagon represents a node in the neural network.
This is typically called a unified distance matrix, or a u-matrix [23], and is probably the most
popular method of displaying SOMs.

Another intrinsic property of SOMs is vector quantization, a data compression technique. SOMs provide a way of representing multi-dimensional data in a much lower dimensional space, typically one or two dimensions [23]. This aids their visualization benefit, as humans are more proficient at comprehending data in lower dimensions than in higher ones.

The above examples show how SOMs are a valuable tool for dealing with complex or vast amounts of data. In particular, they are extremely useful for visualizing and representing such complex or large quantities of data in a manner that is easily understood by the human brain.


5.3 Structure of a SOM

The structure of a SOM is fairly simple, and is best understood with the use of an illustration
such as Figure 5.3.


Figure 5.3 A 4x4 SOM network (4 nodes down, 4 nodes across) [22]

It is easy to dismiss this structure as trivial, but there are a few key things to notice. First, each map node is connected to each input node. For this small 4x4 network with three input nodes, that is 4x4x3 = 48 connections. Secondly, notice that map nodes are not connected to each other. The nodes are organized as a 2-D grid because this makes it easy to visualize the results, and the representation is also convenient when the SOM algorithm is applied. In this configuration, each map node has a unique (i, j) coordinate, which makes it easy to reference a node in the network and to calculate distances between nodes. Because the only connections are to the input nodes, the map nodes are oblivious to the values of their neighbors; a map node updates its weights (explained next) based only on what the input vector tells it.

The following relationships describe what a node essentially is:

1. network → mapNode → float weights[numWeights]
2. inputVectors → inputVector → float weights[numWeights]

Relationship 1 says that the network (the 4x4 grid above) contains map nodes, and that a single map node contains an array of floats: its weights. The meaning of numWeights will become more apparent in the application discussion. The only other common item a map node should contain is its (i, j) position in the network. Relationship 2 says that the collection of input vectors (or input nodes) contains individual input vectors, and that each input vector contains an array of floats: its weights. Note that numWeights is the same for both weight vectors; the weight vectors must be the same length for map nodes and input vectors or the algorithm will not work.
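For illustration only, a minimal Python sketch of these two relationships; the field name numWeights and the (i, j) position come from the text above, while the dataclass layout and random initialization are assumptions:

```python
from dataclasses import dataclass, field
from typing import List
import random

@dataclass
class MapNode:
    i: int                                   # row position in the 2-D grid
    j: int                                   # column position in the 2-D grid
    weights: List[float] = field(default_factory=list)

@dataclass
class InputVector:
    weights: List[float] = field(default_factory=list)

def build_network(rows: int, cols: int, num_weights: int) -> List[MapNode]:
    """Create a rows x cols grid of map nodes with random initial weights."""
    return [MapNode(i, j, [random.random() for _ in range(num_weights)])
            for i in range(rows) for j in range(cols)]

# Example: a 4x4 network whose nodes each hold 3 weights, matching the
# 4x4x3 = 48 connections mentioned above.
network = build_network(4, 4, num_weights=3)
```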





5.4 The SOM Algorithm
The Self-Organizing Map algorithm can be broken up into 6 steps [23].
1). Each node's weights are initialized.
2). A vector is chosen at random from the set of training data and presented to the network.
3). Every node in the network is examined to find the one whose weights are most like the input vector. The winning node is commonly known as the Best Matching Unit (BMU) (Equation 5.1).
4). The radius of the neighborhood of the BMU is calculated. This value starts large, typically set to the radius of the network, and diminishes each time-step (Equations 5.2, 5.3).
5). Any nodes found within the radius of the BMU calculated in step 4) are adjusted to make them more like the input vector (Equations 5.4, 5.5). The closer a node is to the BMU, the more its weights are altered (Equation 5.6).
6). Repeat from step 2) for N iterations.
The equations utilized by the algorithm are as follows.

Calculate the BMU (squared distance between the input vector and a node's weight vector):

$d^2 = \sum_{i=1}^{n} (I_i - W_i)^2$    ..... (5.1)

where I = current input vector, W = node's weight vector, n = number of weights.

Radius of the neighborhood:

$\sigma(t) = \sigma_0 \exp(-t/\lambda)$    ..... (5.2)

where t = current iteration, $\lambda$ = time constant (Equation 5.3), and $\sigma_0$ = radius of the map.

Time constant:

$\lambda = \text{numIterations} / \text{mapRadius}$    ..... (5.3)

New weight of a node:

$W(t+1) = W(t) + \Theta(t)\,L(t)\,\big(I(t) - W(t)\big)$    ..... (5.4)

Learning rate:

$L(t) = L_0 \exp(-t/\lambda)$, where $L_0$ is the initial learning rate    ..... (5.5)

Distance-from-BMU influence:

$\Theta(t) = \exp\!\left(-\dfrac{\text{distFromBMU}^2}{2\sigma^2(t)}\right)$    ..... (5.6)

There are some things to note about these formulas. Equation 5.1 is simply the Euclidean distance formula, squared. It is squared because we are not concerned with the actual numerical distance from the input; we just need some uniform scale on which to compare each node to the input vector. This equation provides that, eliminating the need for a computationally expensive square root operation for every node in the network.

Equations 5.2 and 5.5 utilize exponential decay. At t = 0 they are at their maximum; as t (the current iteration number) increases, they approach zero. This is exactly what we want. In 5.2, the radius should start out as the radius of the lattice and approach zero, at which point the neighborhood is simply the BMU node itself (see Figure 5.4).

The time constant in Equation 5.3 is almost arbitrary; any constant value could be chosen. This choice provides a good value, though, because it depends directly on the map size and the number of iterations to perform.

Equation 5.4 is the main learning function. W(t+1) is the new, educated, weight value of the given node. Over time, this equation makes a given node's weights more like the currently selected input vector, I; a node that is very different from the current input vector will change more than a node that is already similar to it. The difference between the node's weights and the input vector is scaled by the current learning rate of the SOM, L(t), and by Θ(t). Θ(t), Equation 5.6, is used to make nodes closer to the BMU learn more than nodes on the outskirts of the current neighborhood radius; nodes outside the neighborhood radius are skipped completely. distFromBMU is the actual number of nodes between the current node and the BMU, easily calculated as:














Fig 5.4 Weight adaptation of a node by reducing neighborhood radius [23]

$\text{distFromBMU}^2 = (\text{bmuI} - \text{nodeI})^2 + (\text{bmuJ} - \text{nodeJ})^2$


This can be done since the node network is just a 2-D grid of nodes. With this in mind, nodes on the very fringe of the neighborhood radius will learn by some factor Θ(t) less than 1.0. As distFromBMU decreases, Θ(t) approaches 1.0; the BMU itself has a distFromBMU of 0, which gives Θ(t) its maximum value of 1.0. Again, this Euclidean distance remains squared to avoid the square root operation.
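To tie the pieces together, the following is a compact sketch of the training loop described by Equations 5.1-5.6, written in Python/NumPy rather than as any particular existing implementation; the grid size, iteration count and initial learning rate L0 are illustrative assumptions:

```python
import numpy as np

def train_som(inputs, rows=8, cols=8, num_iterations=1000, L0=0.1):
    """Train a rows x cols SOM on `inputs`, an (n_samples, n_features) array."""
    n_features = inputs.shape[1]
    weights = np.random.rand(rows, cols, n_features)           # step 1: initialize
    map_radius = max(rows, cols) / 2.0                         # sigma_0
    time_constant = num_iterations / map_radius                # Eq. 5.3

    # (i, j) coordinates of every node, used to compute distFromBMU
    coords = np.stack(np.meshgrid(np.arange(rows), np.arange(cols),
                                  indexing='ij'), axis=-1)

    for t in range(num_iterations):
        x = inputs[np.random.randint(len(inputs))]             # step 2: random input
        d2 = np.sum((weights - x) ** 2, axis=2)                # Eq. 5.1 (squared distance)
        bmu = np.unravel_index(np.argmin(d2), d2.shape)        # step 3: BMU

        sigma = map_radius * np.exp(-t / time_constant)        # Eq. 5.2: shrinking radius
        L = L0 * np.exp(-t / time_constant)                    # Eq. 5.5: learning rate

        dist2 = np.sum((coords - np.array(bmu)) ** 2, axis=2)  # squared distFromBMU
        theta = np.exp(-dist2 / (2 * sigma ** 2))              # Eq. 5.6: influence
        theta[dist2 > sigma ** 2] = 0.0                        # skip nodes outside radius

        weights += theta[..., None] * L * (x - weights)        # Eq. 5.4: weight update
    return weights
```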

There is a lot of variation in the equations used with the SOM algorithm, and a lot of research into the optimal parameters. Points of particularly heavy debate are the number of iterations, the learning rate, and the neighborhood radius. It has been suggested by Kohonen himself, however, that the training should be split into two phases. Phase 1 reduces the learning coefficient from 0.9 to 0.1, and the neighborhood radius from half the diameter of the lattice down to the immediately surrounding nodes. Phase 2 reduces the learning rate from 0.1 to 0.0, but over double or more the number of iterations used in Phase 1; in Phase 2, the neighborhood radius remains fixed at 1 (the BMU only). Analyzing these parameters, Phase 1 allows the network to quickly 'fill out the space', while Phase 2 performs the 'fine-tuning' of the network to a more accurate representation. The sketch below illustrates one possible schedule.
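A minimal way to express such a two-phase schedule in code; the linear decay and the specific iteration counts are assumptions made for illustration, while the end points (0.9 to 0.1, then 0.1 to 0.0) and the fixed Phase 2 radius follow the text:

```python
def two_phase_schedule(iteration, phase1_iters=1000, phase2_iters=2000, lattice_diameter=8):
    """Return (learning_rate, neighborhood_radius) for a given iteration."""
    if iteration < phase1_iters:
        # Phase 1: learning rate 0.9 -> 0.1, radius from half the lattice diameter down to 1
        frac = iteration / phase1_iters
        learning_rate = 0.9 - frac * (0.9 - 0.1)
        radius = (lattice_diameter / 2.0) * (1 - frac) + 1.0 * frac
    else:
        # Phase 2: learning rate 0.1 -> 0.0 over (at least) twice as many iterations,
        # radius fixed at 1 (the BMU only)
        frac = min(1.0, (iteration - phase1_iters) / phase2_iters)
        learning_rate = 0.1 * (1 - frac)
        radius = 1.0
    return learning_rate, radius
```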













Chapter 6
Feature Extraction and Face Classification
Humans can recognize faces from the time we are about five years old. It seems to be an automated and dedicated process in our brains [24], though it is a much debated issue [25, 26]. What is clear is that we can recognize people we know, even when they are wearing glasses or hats. We can also recognize men who have grown a beard, and it is not very difficult for us to look at our grandmother's wedding photo and recognize her, although she was 23 years old at the time. All these processes seem trivial, but they represent a challenge for computers. In fact, face recognition's core problem is to extract information from photographs. This feature extraction process can be defined as the procedure of extracting relevant information from a face image. This information must be valuable to the later step of identifying the subject with an acceptable error rate. The feature extraction process must be efficient in terms of computing time and memory usage, and the output should also be optimized for the classification step.
Feature extraction involves several steps: dimensionality reduction, feature extraction and feature selection. These steps may overlap, and dimensionality reduction can be seen as a consequence of the feature extraction and selection algorithms; both algorithms could also be defined as special cases of dimensionality reduction.

Dimensionality reduction is an essential task in any pattern recognition system. The performance of a classifier depends on the number of sample images, the number of features and the classifier complexity. One could think that the false positive ratio of a classifier does not increase as the number of features increases. However, added features may actually degrade the performance of a classification algorithm (see Figure 6.1). This can happen when the number of training samples is small relative to the number of features.


Figure 6.1: PCA algorithm performance
This problem is called the curse of dimensionality, or the peaking phenomenon.

A generally accepted way of avoiding this phenomenon is to use at least ten times as many training samples per class as the number of features. This requirement should be satisfied when building a classifier; the more complex the classifier, the larger this ratio should be [27]. For example, a classifier working on a 20-dimensional feature vector would call for at least 200 training samples per class. This curse is one of the reasons why it is important to keep the number of features as small as possible. The other main reason is speed: the classifier will be faster and will use less memory. Moreover, a large set of features can produce false positives when those features are redundant. Ultimately, the number of features must be carefully chosen; too few or redundant features can lead to a loss of accuracy in the recognition system.

We can make a distinction between feature extraction and feature selection. Both terms are often used interchangeably, but it is advisable to distinguish them. A feature extraction algorithm extracts features from the data: it creates new features based on transformations or combinations of the original data. In other words, it transforms or combines the data in order to select a proper subspace of the original feature space. A feature selection algorithm, on the other hand, selects the best subset of the input feature set, discarding non-relevant features. Feature selection is often performed after feature extraction, so features are extracted from the face images and then an optimum subset of these features is selected. The dimensionality reduction process can be embedded in some of these steps, or performed before them. This is arguably the most broadly accepted view of the feature extraction process (see Figure 6.2).

Figure 6.2: Feature extraction processes [27]

6.1 Feature extraction methods
There are many feature extraction algorithms; they will be discussed later in this paper. Most of them are also used in areas other than face recognition: researchers in face recognition have modified and adapted many algorithms and methods to their purpose. For example, PCA was invented by Karl Pearson in 1901 [28], but was proposed for pattern recognition 64 years later [29]. Finally, it was applied to face representation and recognition in the early 1990s [30]. See Table 6.1 for a list of some feature extraction algorithms used in face recognition.
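As a concrete illustration of such an extraction algorithm, the following is a minimal PCA (eigenfaces-style) sketch in Python/NumPy; the number of components is an arbitrary choice for the example, not a value prescribed by the text:

```python
import numpy as np

def pca_features(face_vectors, num_components=20):
    """Project flattened face images onto their top principal components.

    face_vectors: (n_samples, n_pixels) array, one flattened face per row.
    Returns (features, mean_face, components) so new faces can be projected too.
    """
    mean_face = face_vectors.mean(axis=0)
    centered = face_vectors - mean_face
    # SVD of the centered data gives the principal axes without forming
    # the full n_pixels x n_pixels covariance matrix.
    _, _, vt = np.linalg.svd(centered, full_matrices=False)
    components = vt[:num_components]            # (num_components, n_pixels)
    features = centered @ components.T          # low-dimensional representation
    return features, mean_face, components

def project(face_vector, mean_face, components):
    """Extract the same features from a single new face image."""
    return (face_vector - mean_face) @ components.T
```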













Table 6.1: Feature extraction algorithms.

Method | Notes
Principal Component Analysis (PCA) | Eigenvector-based, linear map
Kernel PCA | Eigenvector-based, non-linear map, uses kernel methods
Weighted PCA | PCA using weighted coefficients
Linear Discriminant Analysis (LDA) | Eigenvector-based, supervised linear map
Kernel LDA | LDA-based, uses kernel methods
Semi-supervised Discriminant Analysis (SDA) | Semi-supervised adaptation of LDA
Independent Component Analysis (ICA) | Linear map, separates non-Gaussian distributed features
Neural Network based methods | Diverse neural networks using PCA, etc.
Multidimensional Scaling (MDS) | Nonlinear map, sample size limited, noise sensitive
Self-Organizing Map (SOM) | Nonlinear map, based on a grid of neurons in the feature space
Active Shape Models (ASM) | Statistical method, searches boundaries

6.2 Feature selection methods
The aim of feature selection algorithms is to select the subset of the extracted features that causes the smallest classification error. The importance of this error is what makes feature selection dependent on the classification method used.

The most straightforward approach to this problem would be to examine every possible subset and choose the one that fulfils the criterion function. However, this can become an unaffordable task in terms of computational time. Some effective approaches to this problem are based on algorithms such as branch and bound. See Table 6.2 for the selection methods proposed in [27].





Table 6.2: Feature selection methods.

Method | Definition | Comments
Exhaustive search | Evaluate all possible subsets of features. | Optimal, but too complex.
Branch and bound | Use the branch and bound algorithm. | Can be optimal.
Best individual features | Evaluate and select features individually. | Not very effective; simple algorithm.
Sequential Forward Selection (SFS) | Evaluate growing feature sets (starts with the best feature). | Retained features can't be discarded. Faster than SBS.
Sequential Backward Selection (SBS) | Evaluate shrinking feature sets (starts with all the features). | Deleted features can't be re-evaluated.
Plus l, take away r selection | First do SFS, then SBS. | Must choose l and r values.

More feature selection algorithms have been proposed recently. Feature selection is an NP-hard problem, so researchers aim for a satisfactory algorithm rather than an optimal one. The idea is to create an algorithm that selects the most satisfying feature subset while minimizing the dimensionality and complexity. Some approaches have used a resemblance coefficient [31] or a satisfactory rate [32] as the criterion, together with a quantum genetic algorithm (QGA).
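For illustration, a minimal sketch of one of the simplest methods from Table 6.2, Sequential Forward Selection; the `evaluate` callable stands for whatever criterion function (for instance, cross-validated classification accuracy) the designer chooses, and is not defined here:

```python
def sequential_forward_selection(features, evaluate, max_features):
    """Greedy SFS: grow the feature subset one feature at a time.

    features:     list of candidate feature indices
    evaluate:     callable taking a subset and returning a score (higher is better)
    max_features: stop once the subset reaches this size
    """
    selected = []
    remaining = list(features)
    while remaining and len(selected) < max_features:
        # pick the single feature whose addition gives the best score
        best = max(remaining, key=lambda f: evaluate(selected + [f]))
        selected.append(best)
        remaining.remove(best)
    return selected
```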

6.3 Face Classification
Once the features are extracted and selected, the next step is to classify the image. Appearance-based face recognition algorithms use a wide variety of classification methods, and sometimes two or more classifiers are combined to achieve better results. Most model-based algorithms, on the other hand, match the samples against a model or template; a learning method can then be used to improve the algorithm. One way or another, classifiers have a big impact on face recognition. Classification methods are used in many areas such as data mining, finance, signal decoding, voice recognition, natural language processing and medicine, so there is an extensive literature on the subject. Here, classifiers will be addressed from a general pattern recognition point of view.

Classification algorithms usually involve some form of learning: supervised, unsupervised or semi-supervised. Unsupervised learning is the most difficult approach, as there are no tagged examples. However, many face recognition applications include a tagged set of subjects, so most face recognition systems implement supervised learning methods. There are also cases where the labeled dataset is small, or where the acquisition of new tagged samples is infeasible; in those cases semi-supervised learning is required.

6.3.1 Classifiers
According to Jain, Duin and Mao [27], there are three concepts that are key to building a classifier: similarity, probability and decision boundaries. We will present the classifiers from that point of view.

Similarity
This approach is intuitive and simple: patterns that are similar should belong to the same class. It has been used in the face recognition algorithms implemented later in this work. The idea is to establish a metric that defines similarity and a representation of the same-class samples. For example, the metric can be the Euclidean distance, and the representation of a class can be the mean vector of all the patterns belonging to that class. The 1-NN decision rule can then be used with these parameters, and its classification performance is usually good. This approach is similar to a k-means clustering algorithm in unsupervised learning. Other techniques can also be used, for example Vector Quantization, Learning Vector Quantization or Self-Organizing Maps. Another example of this approach is template matching. Researchers classify face recognition algorithms based on different criteria, and some publications define template matching as a category of face recognition algorithms [27]. However, we can see template matching simply as another classification method, where unlabeled samples are compared to stored patterns.
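A minimal sketch of this similarity idea: represent each class by the mean of its feature vectors and assign a new face to the nearest mean under the Euclidean distance (a simplified variant of the 1-NN rule described above; NumPy arrays are assumed for the inputs):

```python
import numpy as np

def fit_class_means(features, labels):
    """Return {label: mean feature vector} for each enrolled person."""
    return {label: features[labels == label].mean(axis=0)
            for label in np.unique(labels)}

def classify(feature_vector, class_means):
    """Assign the label whose mean vector is closest in Euclidean distance."""
    return min(class_means,
               key=lambda label: np.linalg.norm(feature_vector - class_means[label]))
```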


Probability
Some classifiers are built on a probabilistic approach. Bayes' decision rule is often used, and the rule can be modified to take into account different factors that could lead to misclassification. Bayesian decision rules can give an optimal classifier, and the Bayes error can be the best criterion with which to evaluate features. Therefore, a posteriori probability functions can be optimal.
Table 6.3: Similarity-based classifiers.

Two well-known non-parametric estimates are the k-NN rule and the Parzen classifier.
They both have one parameter to be set, the number of neighbors k, or the smoothing parameter
(bandwidth) of the Parzen kernel, both of which can be optimized. Moreover, both these
classifiers require the computation of the distances between a test pattern and all the samples in
the training set. This large number of computations can be avoided by vector quantization, branch-and-bound and other such techniques.

Decision boundaries
This approach can become equivalent to a Bayesian classifier, depending on the chosen metric. The main idea behind this approach is to minimize a criterion (a measure of error) between the candidate pattern and the testing patterns. One example is Fisher's Linear Discriminant (FLD and LDA are often used interchangeably), which is closely related to PCA. FLD attempts to model the difference between the classes of data, and can be used to minimize the mean square error or the mean absolute error. Other algorithms use neural networks; the multilayer perceptron is one of them. They allow non-linear decision boundaries. However, neural networks can be trained in many different ways, so they can lead to diverse classifiers. They can also provide a confidence in the classification, which gives an approximation of the posterior probabilities. Assuming the use of a Euclidean distance criterion, a classifier can make use of all three classification concepts explained here. A special type of classifier is the decision tree. It is trained by an iterative selection of the individual features that are most salient at each node of the tree. During classification, only the features needed for classification are evaluated, so feature selection is implicitly built in. The decision boundary is built iteratively. Well-known decision trees such as C4.5 or CART are available. See Table 6.4 for some decision boundary-based methods, including the ones proposed in [36]:
Table 6.4: Classifiers using decision boundaries.

Another widely used method is the support vector classifier. It is a two-class classifier, although it has been extended to the multiclass case. The optimization criterion is the width of the margin between the classes, that is, the distance between the hyperplane and the support vectors; these support vectors define the classification function. Since Support Vector Machines (SVMs) are originally two-class classifiers, a method is needed for solving multiclass problems. There are two main strategies [37]:
1. One-vs-all approach. One SVM per class is trained; each one separates a single class from all the others.
2. Pairwise approach. Each SVM separates two classes. A bottom-up decision tree can be used, with each tree node representing an SVM [33]; the class of the incoming face appears at the top of the tree.

Another problem is how to handle non-linear decision boundaries. A solution is to map the samples to a high-dimensional feature space using a kernel function [34].
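A sketch of the one-vs-all strategy, written against a generic binary classifier interface; the factory and its `fit`/`score` methods are hypothetical stand-ins for any two-class SVM implementation, not a specific library API:

```python
import numpy as np

class OneVsAllClassifier:
    """Train one binary classifier per class; predict the class whose
    classifier returns the highest decision score."""

    def __init__(self, make_binary_classifier):
        self.make_binary_classifier = make_binary_classifier  # factory for a two-class SVM
        self.models = {}

    def fit(self, features, labels):
        for label in np.unique(labels):
            target = (labels == label).astype(int)     # this class vs. all the others
            model = self.make_binary_classifier()
            model.fit(features, target)
            self.models[label] = model
        return self

    def predict(self, feature_vector):
        # the classifier with the highest decision score wins
        return max(self.models,
                   key=lambda label: self.models[label].score(feature_vector))
```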


6.3.2 Classifier combination
The classifier combination problem can be defined as the problem of finding a combination function that accepts N-dimensional score vectors from M classifiers and outputs N final classification scores [35]. There can be several reasons to combine classifiers in face recognition:

The designer has several classifiers, each developed with a different approach. For example, there could be a classifier designed to recognize faces using eyebrow templates; it could be combined with another classifier that uses a different recognition approach, which could lead to better recognition performance.

There can be different training sets, collected under different conditions and representing different features. Each training set could be well suited to a certain classifier, and those classifiers could be combined.

A single training set can give different results with different classifiers. A combination of classifiers can be used to achieve the best results.

There are different combination schemes; they may differ from each other in their architectures and in the choice of combiner. Combiners in pattern recognition usually operate on a fixed number of classifiers, which allows the strengths of each classifier to be exploited. The common approach is to design a function that weights each classifier's output score; there must then be a decision boundary with which a decision is made based on that function. Combination methods can also be grouped by the stage at which they operate. A combiner can operate at the feature level: the features of all classifiers are combined to form a new feature vector, and then a new classification is made. The other option is to operate at the score level, as stated before. This approach separates the classification knowledge from the combiner, and this type of combiner is popular because of that level of abstraction. However, combiners can differ depending on the nature of the classifiers' output. The output can be a single class or group of classes (abstract information level); a more precise output can be an ordered list of candidate classes (rank level); and a classifier can give an even more informative output by attaching a weight or confidence measure to each class (measurement level). If the combination involves very specialized classifiers, each of them usually has a different output, and combining different output scales and confidence measures can be a tricky problem. However, the outputs will be similar if all the classifiers use the same architecture. Combiners can be grouped in three categories according to their architecture:
Parallel: All classifiers are executed independently. The combiner is then applied.
Serial: Classifiers run one after another. Each classifier polishes previous results.
Hierarchical: Classifiers are combined into a tree-like structure.
Combiner functions can be very simple or very complex. A low-complexity combination could require only one function to be trained, whose input is the scores for a single class. The highest complexity is achieved by defining multiple functions, one per class, which take all scores as parameters, so that more information is used in the combination. Higher-complexity combiners can potentially produce better results, but the complexity level is limited by the amount of training samples and the available computing time. Thus it is very important to choose the complexity level that best complies with these requirements and restrictions. Some combiners can also be trainable; trainable combiners can lead to better results at the cost of requiring additional training data.
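A sketch of a simple parallel, score-level combiner: each classifier independently outputs a score vector over the N classes, and a fixed weighted sum decides the final class. The weights used in the example are design choices, not values taken from the text:

```python
import numpy as np

def combine_scores(score_vectors, weights=None):
    """Parallel score-level combination.

    score_vectors: list of M arrays, each of length N (one score per class,
                   produced independently by each of the M classifiers).
    weights:       optional per-classifier weights; equal weights by default.
    Returns the index of the winning class.
    """
    scores = np.asarray(score_vectors, dtype=float)       # shape (M, N)
    if weights is None:
        weights = np.full(len(scores), 1.0 / len(scores))
    combined = np.asarray(weights, dtype=float) @ scores  # weighted sum over classifiers
    return int(np.argmax(combined))

# Example: two classifiers scoring three enrolled subjects.
print(combine_scores([[0.2, 0.7, 0.1], [0.3, 0.4, 0.3]], weights=[0.6, 0.4]))
```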






















Chapter 7
Implementation
7.1 Introduction
The feature extraction and recognition process is implemented in MATLAB 7.12.0 (R2011a) using a Kohonen self-organizing map. We use the Indian Face Database for training and testing. The database contains a set of face images taken in February 2002 on the IIT Kanpur campus. There are eleven different images of each of 40 distinct subjects; for some subjects, some additional photographs are included. All the images were taken against a bright homogeneous background with the subjects in an upright, frontal position. The files are in JPEG format. The size of each image is 640x480 pixels, with 256 grey levels per pixel. The images are organized in two main directories, males and females. In each of these directories there are sub-directories named with serial numbers, each corresponding to a single individual, and each of these contains the eleven images of that subject, with names of the form abc.jpg, where abc is the image number for that subject. The following orientations of the face are included: looking front, looking left, looking right, looking up, looking up towards the left, looking up towards the right, and looking down. The available emotions are: neutral, smile, laughter, and sad/disgust. The overall face recognition process is discussed below.

7.2 Kohonen Self-Organizing Network:
The network is shown in Figure 7.1. For a face, the input is a single (normalized) image vector, and the output layer is an 8-by-8 grid. After training, each training image maps to exactly one winning node of the output layer. In the testing phase, the test image acts as the input and wins an output node; by checking the similarity between the test image's winning node and the winning nodes of the training images, the corresponding face is retrieved from the training database.
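The document's implementation is in MATLAB; purely as an illustration of the matching scheme just described, the following Python sketch reuses the train_som function sketched in Section 5.4 and assumes face images have already been flattened and normalized:

```python
import numpy as np

def winning_node(weights, face_vector):
    """Return the (i, j) coordinate of the BMU for one face vector."""
    d2 = np.sum((weights - face_vector) ** 2, axis=2)
    return np.unravel_index(np.argmin(d2), d2.shape)

def enroll(weights, train_faces, train_labels):
    """Record the winning node of every training face."""
    return [(winning_node(weights, f), label)
            for f, label in zip(train_faces, train_labels)]

def recognize(weights, enrolled, test_face):
    """Label the test face with the identity of the nearest enrolled winning node."""
    ti, tj = winning_node(weights, test_face)
    _, label = min(enrolled,
                   key=lambda e: (e[0][0] - ti) ** 2 + (e[0][1] - tj) ** 2)
    return label
```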


Figure 7.1: Our proposed network
7.3 Kohonen SOM:
When we train the Kohonen SOM, we obtain certain characteristics of the network and its weights. These are discussed in the figures below.
SOM Topology:
For SOM training, the weight vector associated with each neuron moves to become the center of
a cluster of input vectors. In addition, neurons that are adjacent to each other in the topology
should also move close to each other in the input space; therefore it is possible to visualize a high-dimensional input space in the two dimensions of the network topology.
In figure 7.2, each of the hexagons represents a neuron. The grid is 8-by-8, so there are a
total of 64 neurons in this network. There are four elements in each input vector, so the input
space is four-dimensional. The weight vectors (cluster centers) fall within this space.

Fig.7.2: The Topology of Self-Organizing Map
Because this SOM has a two-dimensional topology, you can visualize in two dimensions the
relationships among the four-dimensional cluster centers. One visualization tool for the SOM is
the weight distance matrix (also called the U-matrix).

Initial SOM Neighbor Connections:
Initially, all the nodes are connected to their neighbors within the hexagonal region. Figure 7.3 shows these connections.






Fig.7.3: Initial Neighbor Connections








U-Matrix:
In Figure 7.4, the blue hexagons represent the neurons. The red lines connect neighboring
neurons. The colors in the regions containing the red lines indicate the distances between
neurons. The darker colors represent larger distances, and the lighter colors represent smaller
distances. A band of dark segments crosses from the lower-center region to the upper-right
region. The SOM network appears to have clustered the faces into two distinct groups.


Fig.7.4: Neighbor Weights Distance of SOM
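For reference, a sketch of how such a neighbor-distance (U-matrix) view can be computed from the trained weight grid. For brevity only 4-neighborhoods on a rectangular grid are considered, whereas a hexagonal lattice like the one plotted above would use six neighbors:

```python
import numpy as np

def u_matrix(weights):
    """Average distance from each node's weight vector to its grid neighbors.

    weights: (rows, cols, n_features) array of trained SOM weights.
    Returns a (rows, cols) array; large values mark boundaries between clusters.
    """
    rows, cols, _ = weights.shape
    umat = np.zeros((rows, cols))
    for i in range(rows):
        for j in range(cols):
            dists = []
            for di, dj in ((-1, 0), (1, 0), (0, -1), (0, 1)):
                ni, nj = i + di, j + dj
                if 0 <= ni < rows and 0 <= nj < cols:
                    dists.append(np.linalg.norm(weights[i, j] - weights[ni, nj]))
            umat[i, j] = np.mean(dists)
    return umat
```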








Weights for an Input:
Figure 7.5 shows the 64 weights in a two-dimensional graph. The dark cell indicates the weight of the winning node.

Fig.7.5: Weight Matrix for an input.
Winning Nodes:
In Figure 7.6, the blue hexagons represent the winning nodes for 9 training faces.



Fig.7.6: The Winning nodes
Weight Positions:
Fig. 7.7 shows the weight distribution. In this case, the 64 weights are distributed in the range 164.0 to 193.0.


Fig. 7.7: Weight Distribution
Testing:
Figure 7.8 shows the test image and the matching training image (from right to left).

Fig. 7.8: Testing the Trained Network








Chapter 8
Results and Performance Analysis
This chapter presents the results produced using the Kohonen Self-Organizing Map (SOM). First the experimental setup is presented, and then the results are discussed.
8.1 Experimental Setup
In this experiment we use face images from the Indian Face Database described in Section 7.1: eleven different images of each of 40 distinct subjects (with some additional photographs for some subjects), taken against a bright homogeneous background with the subjects in an upright, frontal position. The images are 640x480-pixel JPEG files with 256 grey levels per pixel, organized into male and female directories, and they cover seven head orientations (front, left, right, up, up towards the left, up towards the right, down) and four expressions (neutral, smile, laughter, sad/disgust). A sample of 15 images from the Indian Face Database is shown in Figure 8.1.


Fig.8.1: A portion of Indian Face Database
To report the performance of the system, the acceptance ratio (%) and the execution time (in milliseconds) of the SOM are calculated. The acceptance ratio is computed using the following equation:

$\text{Acceptance ratio} = \dfrac{\text{No. of successfully recognized images}}{\text{No. of test images}} \times 100\,\%$    ..... (8.1)

For example, recognizing 30 of 32 test images gives an acceptance ratio of 93.75% (see Table 8.1). This measure is used to analyze the performance of the system.
8.2 Performance on Indian Face Database
The performance of the face recognition system on the Indian Face Database is presented in Table 8.1. From the table it can be observed that the system shows consistent performance with the Kohonen Self-Organizing Network; in all cases the acceptance ratio stays above 93%. The execution time grows roughly linearly with the number of training images.


Table 8.1: Acceptance ratio and execution time for varying numbers of training images.

No. of Train Images | No. of Test Images | No. of Successfully Recognized Images | No. of Unsuccessfully Recognized Images | Acceptance Ratio (%) | Execution Time (ms)
10 | 5  | 5  | 0 | 100   | 22
20 | 15 | 14 | 0 | 100   | 28
30 | 25 | 24 | 1 | 96    | 35
40 | 32 | 30 | 2 | 93.75 | 45









Figures 8.2 and 8.3 show line charts of the acceptance ratio and the execution time as the number of test images varies.

Figure 8.2: Line chart of acceptance ratio for varying numbers of test images

Figure 8.3: Line chart of execution time (ms) for varying numbers of test images



8.3 Conclusion
Through this experiment it has become clear that face recognition using the Kohonen SOM makes the system robust in accuracy and fast in execution time. The method achieves an acceptance ratio of no less than 93% with an execution time of only a few milliseconds. Hence it can be applied to security measures at airports, passport verification, criminal-list verification in police departments, visa processing, verification of electoral identification, and card security at ATMs, where reliability and recognition time are critical issues.























References:
[1] R. Louban. Image Processing of Edge and Surface Defects: Theoretical Basis of Adaptive Algorithms with Numerous Practical Applications, volume 123, chapter Edge Detection, pages 9-29. Springer Berlin Heidelberg, 2009.
[2] M.-H. Yang, D. Kriegman, and N. Ahuja. Detecting faces in images: A survey. IEEE Transactions on Pattern Analysis and Machine Intelligence, 24(1):34-58, January 2002.
[3] K.-M. Lam and H. Yan. Fast algorithm for locating head boundaries. Journal of Electronic Imaging, 3(4):351-359, October 1994.
[4] S. K. Singh, D. S. Chauhan, M. Vatsa, and R. Singh. A robust skin color based face detection algorithm. Tamkang Journal of Science and Engineering, 6(4):227-234, 2003.
[5] A. K. Jain, R. P. W. Duin, and J. C. Mao. Statistical pattern recognition: a review. IEEE Transactions on Pattern Analysis and Machine Intelligence, 22(1):4-37, 2000.
[6] M. C. Nechyba, L. Brandy, and H. Schneiderman. Lecture Notes in Computer Science, volume 4625/2009, chapter PittPatt Face Detection and Tracking for the CLEAR 2007 Evaluation, pages 126-137. Springer Berlin Heidelberg, 2009.
[7] M. Divjak and H. Bischof. Real-time video-based eye blink analysis for detection of low blink-rate during computer use. In 1st International Workshop on Tracking Humans for the Evaluation of their Motion in Image Sequences (THEMIS), pages 99-107, Leeds, UK, September 2008.
[8] S. Kawato and N. Tetsutani. Detection and tracking of eyes for gaze-camera control. In Proc. 15th Int. Conf. on Vision Interface, pages 348-353, 2002.
[9] C.-C. Han, H.-Y. M. Liao, K.-C. Yu, and L.-H. Chen. Lecture Notes in Computer Science, volume 1311, chapter Fast face detection via morphology-based pre-processing, pages 469-476. Springer Berlin Heidelberg, 1997.
[10] K. Wan-zeng and Z. Shan-an. Multi-face detection based on downsampling and modified subtractive clustering for color images. Journal of Zhejiang University SCIENCE A, 8(1):72-78, January 2007.
[11] L. Sirovich and M. Kirby. Low-dimensional procedure for the characterization of human faces. Journal of the Optical Society of America A: Optics, Image Science and Vision, 4(3):519-524, March 1987.

[12] M. Turk and A. Pentland. Eigenfaces for recognition. Journal of Cognitive Neuroscience, 3(1):71-86, 1991.
[13] M. Kirby and L. Sirovich. Application of the Karhunen-Loeve procedure for the characterization of human faces. IEEE Transactions on Pattern Analysis and Machine Intelligence, 12(1):103-108, 1990.

[14] K.-K. Sung. Learning and Example Selection for Object and Pattern Detection. PhD thesis, Massachusetts Institute of Technology, 1996.
[15] E. Osuna, R. Freund, and F. Girosi. Training support vector machines: An application to face detection. In Proceedings of the IEEE Conf. Computer Vision and Pattern Recognition, pages 130-136, June 1997.
[16] M.-H. Yang, D. Roth, and N. Ahuja. A SNoW-based face detector. In Advances in Neural Information Processing Systems 12, pages 855-861, 2000.
[17] H. Schneiderman and T. Kanade. Probabilistic modeling of local appearance and spatial relationships for object recognition. In Proc. IEEE Conf. Computer Vision and Pattern Recognition, pages 45-51, 1998.
[18] N. Duta and A. Jain. Learning the human face concept from black and white pictures. In Proc. International Conf. Pattern Recognition, pages 1365-1367, 1998.
[19] W. Zhao, R. Chellappa, A. Rosenfeld, and P. Phillips. Face recognition: A literature survey. ACM Computing Surveys, pages 399-458, 2003.
[20] K. B. et al. AI 2005, chapter Multiple Face Tracking Using Kalman Estimator Based Color SSD Algorithm, pages 1229-1232. LNAI 3809. Springer-Verlag Heidelberg, 2005.
[21] B. Krose and P. van der Smagt. An Introduction to Neural Networks. The University of Amsterdam, 1996.
[22] T. Kohonen. Self-Organizing Maps. Berlin: Springer Verlag, p. VII, 1995.
[23] H. Rowley, S. Baluja, and T. Kanade. Neural Network-Based Face Detection. In Computer Vision and Pattern Recognition, 1996.
[24] S. Bentin, T. Allison, A. Puce, E. Perez, and G. McCarthy. Electrophysiological studies of face perception in humans. Journal of Cognitive Neuroscience, 8(6):551-565, 1996.

[25] R. Diamond and S. Carey. Why faces are and are not special: an effect of expertise. Journal of Experimental Psychology: General, 115(2):107-117, 1986.
[26] J. S. Bruner and R. Tagiuri. The perception of people. Handbook of Social Psychology, 2(17), 1954.
[27] A. Jain, R. Duin, and J. Mao. Statistical pattern recognition: A review. IEEE Transactions on Pattern Analysis and Machine Intelligence, 22(1):4-37, January 2000.
[28] K. Pearson. On lines and planes of closest fit to systems of points in space. Philosophical Magazine, 2(6):559-572, 1901.
[29] S. Watanabe. Karhunen-Loeve expansion and factor analysis: theoretical remarks and applications. In Proc. 4th Prague Conference on Information Theory, 1965.
[30] L. Sirovich and M. Kirby. Low-dimensional procedure for the characterization of human faces. Journal of the Optical Society of America A: Optics, Image Science and Vision, 4(3):519-524, March 1987.
[31] G. Zhang, L. Hu, and W. Jin. Discovery Science, volume 3245/2004 of LNCS, chapter Resemblance Coefficient and a Quantum Genetic Algorithm for Feature Selection, pages 125-153. Springer Berlin / Heidelberg, 2004.
[32] G. Zhang, W. Jin, and L. Hu. CIS, volume 3314/2005 of LNCS, chapter A Novel Feature Selection Approach and Its Application, pages 665-671. Springer-Verlag Berlin Heidelberg, 2004.
[33] G. Guo, S. Li, and K. Chan. Face recognition by support vector machines. In Proc. of the IEEE International Conference on Automatic Face and Gesture Recognition, pages 196-201, Grenoble, France, March 2000.
[34] K. Jonsson, J. Matas, J. Kittler, and Y. Li. Learning support vectors for face verification and recognition. In Proc. of the IEEE International Conference on Automatic Face and Gesture Recognition, pages 208-213, Grenoble, France, March 2000.
[35] S. Tulyakov. Review of classifier combination methods. Studies in Computational Intelligence (SCI), 90:361-386, 2008.
[36] R. Brunelli and T. Poggio. Face recognition: Features versus templates. IEEE Transactions on Pattern Analysis and Machine Intelligence, 15(10):1042-1052, October 1993.

[37] B. Heisele, P. Ho, and T. Poggio. Face recognition with support vector machines: Global versus component-based approach. In Proc. of the Eighth IEEE International Conference on Computer Vision, ICCV 2001, volume 2, pages 688-694, Vancouver, Canada, July 2001.
[38] R. C. Gonzalez and R. E. Woods. Digital Image Processing, 2nd Edition. Prentice Hall, 2002.
