Multiplicative Updates
Clyde Shavers, Robert Li, Gary Lebby
North Carolina A&T State University, Department of Electrical Engineering, Greensboro, North Carolina
Abstract - This paper implements an approach to face detection based on support vector machines (SVMs). Our approach uses the multiplicative updates algorithm proposed in [1][2] instead of the conventional approach that uses the Lagrangian to determine the λ-coefficients of the SVM decision function. We implement the simplest SVM, i.e. the hyperplane decision surface is constrained to pass through the origin. This SVM implementation does not incorporate the sum constraint or box constraint for hard and soft margin classifiers that do not pass through the origin, respectively. Even so, our results yield a ninety percent (or higher) detection rate for each trial.

I Introduction

The face detection system implemented is based on support vector machines, using the multiplicative updates algorithm (rather than quadratic programming) to determine the λ-coefficients. Our goal is simply to detect whether an image presented to the system contains a face object or not.

Facial images from the Olivetti Research Lab (ORL) database are used to train the system to detect faces. No preprocessing (i.e. illumination correction and histogram equalization) is performed on the images, as is done in other face detection studies such as [3]. A data set composed of 200 examples is constructed from the ORL database. An example from the dataset is composed of an image vector and a target value of +1 for any sample that is a face object and –1 for any sample that is a non-face object. We use 80% of the examples from the dataset to train the SVM and the remaining 20% are used for testing.

Figure 1 provides an overview of the various approaches to face detection. For a detailed discussion of these approaches see [4][5]. Section II provides some background on SVMs, describing geometrically how SVMs work, along with a description of the proposed multiplicative updates algorithm.

[Figure 1 Face detection approaches: low-level analysis (edges, gray-levels, color), constellation analysis, active shape models (snakes, deformable templates, point distribution models (PDMs)), and image-based approaches (linear subspace methods, neural networks)]
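The dataset construction and 80/20 split described above can be sketched as follows. This is a minimal sketch, not the authors' code: the array contents are random placeholders standing in for flattened 20 x 20 subimages, and the 160-face/40-non-face counts are taken from section III.

```python
import numpy as np

rng = np.random.default_rng(0)

# Placeholder data: 160 face vectors and 40 non-face vectors,
# each standing in for a flattened 20 x 20 subimage (400 elements).
X = rng.random((200, 400))
y = np.concatenate([np.ones(160), -np.ones(40)])  # +1 face, -1 non-face

# Shuffle, then keep 80% for training and 20% for testing.
order = rng.permutation(len(y))
split = int(0.8 * len(y))
X_train, y_train = X[order[:split]], y[order[:split]]
X_test, y_test = X[order[split:]], y[order[split:]]
```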
II Background

Statistical learning involves finding a set of functions that best approximate the unknown input-output dependency of a system based on a limited number of observations.

[Figure: true risk R(α) versus model capacity, showing the underfitting and overfitting regions; a system producing output y is modeled by a function approximator f(x, λ)]

The SVM is an implementation of such a set of functions known as hyperplanes. These hyperplanes form a decision surface used to classify samples of two distinct classes. More specifically, the SVM implements the decision function that is used to construct these hyperplanes (see Figure 5). In the linearly separable case, this decision function is given by

f(x) = <w ⋅ x> + b .  (1)

[Figure 5 Linearly Separable Samples: classes y = +1 and y = −1 separated by the hyperplanes <wx> + b = +1, 0, −1, with margin M and support vectors such as x2 and x3]

The support vectors are the training samples used to realize the linear SVM decision function in equation (1); they are the samples (e.g. x2 and x3) that satisfy y_j [<w x_j> + b] = 1, j = 1, …, N_SV (see Figure 5). The SVM must also satisfy the constraint y_i [<w x_i> + b] ≥ 1, i = 1, …, l, where l indicates the number of training samples.

Linearly Separable Case:

The primal Lagrangian is

L_p(w, b, λ) = (1/2) wᵀw − Σ_{i=1}^{l} λ_i { y_i [wᵀx_i + b] − 1 } .  (2)

We locate the optimal saddle point of the primal Lagrangian by taking its derivatives with respect to the variables w and b and setting them equal to zero. We can then rewrite the primal Lagrangian as the dual Lagrangian. Notice that the Lagrangian is now written in terms of the λ_i-coefficients:

L_d(λ) = Σ_{i=1}^{l} λ_i − (1/2) Σ_{i=1}^{l} Σ_{j=1}^{l} y_i y_j λ_i λ_j x_iᵀx_j .  (3)

The λ_i-coefficients may be found via quadratic programming (QP) or another traditional method. Instead of a traditional method, this paper proposes the multiplicative updates technique described in section E to locate these coefficients.

Assuming the λ_i-coefficients have been determined (using either the traditional approach or the proposed multiplicative updates algorithm), the decision function can be expressed as

f(x) = Σ_{i=1}^{m} λ_i y_i Φ(x_i) ⋅ Φ(x) + b ,  (4)
where Φ is a mapping function that maps the samples from
input space to a higher dimension feature space. The
Another solution considered in the nonlinearly
mapping function used is generally one of the three kernel
separable case is to add a slack variable to the linear
functions described in section D. The decision function may
decision function (i.e. linear SVM). Figure 7 shows an
be rewritten as
example of how the decision function behaves when a slack
m variable is added.
f ( x) = ∑ λ y K (x, x ) + b
i =1
i i i ( 5)
m
i F (x) = sgn ( f (x) ) = sgn
∑ λ y K (x, x ) + b
i i i ( 6).
i =1
If the selected kernel function can support the bias term b,
then the indicator function can be rewritten as Figure 7 Non-separable overlapping samples
m
i F (x) = sgn ( f (x) ) = sgn
∑ λ y K (x, x )
i i i ( 7)
i =1 D. Kernel Function
Selection of an appropriate kernel function is used to
A pictorial representation of this function is given as
map the samples to a higher dimensional feature space
the SVM architecture shown in Figure 2.
where the samples may be separated by a linear hyperplane.
These kernel functions are designated as K (x i , x j ) ,
Nonlinearly Separable Case:
and Φ(x i ) ⋅ Φ (x j ) = K (x i , x j ) .
In the nonlinearly separable case, the samples cannot
be properly classified (or separated) with a linear decision Three general kernel functions are:
function given in (3.1). However, samples are not
Polynomial (of degree d)
necessarily overlapping (i.e. inter-mingled) as they are in
the non-separable case discussed later.
[( ) ]
K ( x, x i ) = x T x i + 1
d
( 8)
A solution to the nonlinearly separable case is to use an
appropriate kernel function (e.g. Gaussian RBF or a high Gaussian RBF (Radial Basis Function)
degree polynomial, i.e. d > 1) to map the samples to a
x−x 2
K (x, x i ) = exp
higher dimensional feature space where they are linearly i
( 9)
separated (see Figure 6). Kernel functions are given in 2σ 2
section D.
Sigmoidal
[( ) ]
K (x, x i ) = tanh x T x i + b ( 10)
x =Φ(x )
i i
E. Multiplicative Updates
The SVM implements the decision
function f (x) = sgn( K (w*, x) + b) , where w* is a solution
vector (or coefficient vector). The solution vector is re-
written in terms of the λi - coefficients, i.e.
Figure 6 Nonlinearly Separable Samples
w* = ∑λ y x
i
i i i .
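The three kernels in equations (8)-(10) are straightforward to implement. The sketch below is a hypothetical NumPy rendering; the parameter defaults d=2, sigma=1, b=1 are illustrative choices, not values taken from the paper.

```python
import numpy as np

def poly_kernel(x, xi, d=2):
    """Polynomial kernel of degree d, eq. (8): (x'xi + 1)^d."""
    return (np.dot(x, xi) + 1.0) ** d

def rbf_kernel(x, xi, sigma=1.0):
    """Gaussian RBF kernel, eq. (9): exp(-||x - xi||^2 / (2 sigma^2))."""
    diff = np.asarray(x, dtype=float) - np.asarray(xi, dtype=float)
    return np.exp(-np.dot(diff, diff) / (2.0 * sigma ** 2))

def sigmoid_kernel(x, xi, b=1.0):
    """Sigmoidal kernel, eq. (10): tanh(x'xi + b)."""
    return np.tanh(np.dot(x, xi) + b)
```

Each kernel plays the role of the inner product Φ(x_i) ⋅ Φ(x_j) in feature space, so the mapping Φ never has to be computed explicitly.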
There are various methods for calculating these coefficients. In this paper, the coefficients are calculated using the proposed multiplicative updates algorithm, given as

λ_i ← λ_i [ ( 1 + √( 1 + 4 (A⁺λ)_i (A⁻λ)_i ) ) / ( 2 (A⁺λ)_i ) ]  (11)

A⁺_ij = { A_ij if A_ij > 0 ; 0 otherwise }  (12)

A⁻_ij = { −A_ij if A_ij < 0 ; 0 otherwise }  (13)

A_ij = y_i y_j K(x_i, x_j)  (14)

In this experiment we have chosen the simplest SVM implementation, where the decision surface (i.e. separating hyperplane) is constrained to pass through the origin as in [1]. The SVM in this experiment is implemented as

f(x) = Σ_{i=1}^{l} λ_i y_i K(x, x_i) ,  (15)

where the hyperplane decision surface passes through the origin.

During SVM training, input samples (i.e. vectors) from the training set are input to the multiplicative updates algorithm. The algorithm processes the samples and generates a set of λ-coefficients that are used as scaling factors. A coefficient is generated for each corresponding training vector. Only the vectors that are support vectors receive a non-zero valued coefficient; the coefficients for all other vectors are equal to zero. The support vectors themselves become the archetypes for facial images. Test vectors (i.e. vectors not previously observed) presented to the SVM are essentially compared to these archetypes for classification.

The support vectors are the vectors closest to the decision surface. These vectors influence the contours of the decision surface.

During SVM testing, test vectors are presented to the SVM. The kernel function calculates a correlation-like value that indicates the resemblance between the test vector and the training vectors. The larger the value, the greater the resemblance between the test vector and the support vector in the kernel function. Specifically, any test vector found to closely resemble a support vector is scaled by the non-zero coefficient given to that support vector; otherwise the test vector is scaled by a coefficient with value zero. Figure 11 shows the nonzero λ_i-coefficients corresponding to the support vectors.

III Experiment Results

A. Overview of Face Detection Process

1. Input face and non-face images (i.e. sample data).
2. Scale the images.
3. Extract a 20 x 20 subimage.
4. Convert each subimage to a vector.
5. Add target labels to vectors (i.e. designate as face or non-face).
6. Partition sample data into “training samples” and “test samples”.
7. Input the “training samples” to the SVM program.
8. Train the SVM.
9. Calculate the Lagrangian coefficients.
10. Randomize the test data.
11. Input the test data and Lagrangian coefficients to the SVM algorithm.
12. Run the SVM to detect faces.
13. Record results.

B. Preliminary Setup

The data set for this application is composed of one hundred and sixty face images and forty non-face images. The facial images were obtained from the ORL database. Images were initially read into the Mathcad program and labeled M1 thru M41 (only the first 3 of the face images are shown below).

[Face images M1, M2, M3]
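The training loop of equations (11)-(15) can be sketched as below. This is a minimal illustration on a toy 1-D linear-kernel problem rather than the paper's image data; the iteration count and the tiny epsilon guarding the division are implementation choices of this sketch, not part of the published algorithm.

```python
import numpy as np

def multiplicative_updates(A, n_iter=500):
    """Minimize 0.5*lam'A*lam - sum(lam) over lam >= 0 via eq. (11)."""
    Ap = np.where(A > 0, A, 0.0)     # A+  (eq. 12)
    Am = np.where(A < 0, -A, 0.0)    # A-  (eq. 13)
    lam = np.ones(A.shape[0])        # start strictly positive
    for _ in range(n_iter):
        a = Ap @ lam + 1e-12         # (A+ lam)_i, epsilon avoids 0/0
        c = Am @ lam                 # (A- lam)_i
        lam *= (1.0 + np.sqrt(1.0 + 4.0 * a * c)) / (2.0 * a)
    return lam

def kernel(x, xi):
    return np.dot(x, xi)             # linear kernel for this toy example

# Toy 1-D data, linearly separable through the origin.
X = np.array([[1.0], [2.0], [-1.0], [-2.0]])
y = np.array([1.0, 1.0, -1.0, -1.0])

A = np.outer(y, y) * (X @ X.T)       # A_ij = y_i y_j K(x_i, x_j)  (eq. 14)
lam = multiplicative_updates(A)

def f(x):
    """Decision function through the origin, eq. (15)."""
    return sum(l * yi * kernel(xi, x) for l, yi, xi in zip(lam, y, X))
```

For this toy problem the SVM through the origin reduces to f(x) = w·x with w = Σ λ_i y_i x_i, and the update drives λ to zero on the non-support vectors, the behavior reported in section III-C.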
[Figure 10 Subimages (20 x 20): M1a, M2a, M3a and M1b, M2b, M3b]

C. SVM training

Eighty percent of the dataset is used for training the SVM. Training resulted in a set of coefficients; a graphical representation of the coefficients is shown in Figure 11. Notice that, as expected, the coefficients tend to zero for non-support vectors.

[Figure 11 λ-coefficients (AlphaV04) plotted against the sample index i]

IV Conclusion

For each set of images tested, 90% or better were classified correctly in each trial. We found that it is important to include non-face examples among the training examples.

The work is based on previous work done by Sung and Poggio, which resulted in a well-known, successful implementation of a face detection system. In their work, a bootstrapping method is used in which false positive detections are inserted into the training samples as non-face examples. This step helps to more accurately train the SVM to recognize non-face images. Their work also included preprocessing, in which variation in illumination brightness is compensated. Our work did not include these steps. It is possible that preprocessing similar to that done by Sung and Poggio would allow the misclassified images to be correctly classified. This work showed the multiplicative updates approach to be a viable alternative for determining the λ-coefficients.