
King Fahd University of Petroleum and Minerals

Information and Computer Science Department


Second Semester 2013/2014 (132)
SIA 544: Biometric Systems
Final Exam
Saturday, 17th May 2014 [07:00 PM - 10:00 PM]

Important Instructions:
The first four problems are mandatory. Then, you must select one of the
remaining problems.
You must solve five problems in total.


ID           Name
201206040    FAHIM DJATMIKO



Problem 1: [20 Points]
1. In fingerprint-based biometric systems, image features are very important.
Consider the following fingerprint image:

Provide a detailed outline of all necessary pre-processing tasks that you would apply to
enhance this type of fingerprint image.
Answer:
1. Histogram equalization
Histogram equalization is a technique for adjusting image intensities to enhance contrast.
This method usually increases the global contrast of an image: it remaps the histogram of
the given image's pixel intensities to a wider, more uniform distribution of intensity
values. As a result, the ridges in a fingerprint image become clearer.
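
A minimal sketch of this step, assuming OpenCV is available and the fingerprint is stored as an 8-bit grayscale file named fingerprint.png (the filename is illustrative, not part of the original answer):

```python
import cv2

# Load the fingerprint as an 8-bit grayscale image (filename is hypothetical).
img = cv2.imread("fingerprint.png", cv2.IMREAD_GRAYSCALE)

# Remap the pixel intensities to a flatter, wider histogram.
equalized = cv2.equalizeHist(img)
cv2.imwrite("fingerprint_eq.png", equalized)
```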
2. FFT enhancement


FFT is used to enhance a specific block according to its dominant frequencies (formed by
ridges and valleys). The process starts by dividing the image into small processing blocks
(for example, 32 by 32 pixels) and performing the Fourier transform of each block according to:

F(u, v) = \sum_{x=0}^{31} \sum_{y=0}^{31} f(x, y) \exp\left\{ -j 2\pi \left( \frac{ux}{32} + \frac{vy}{32} \right) \right\}    (1)

for u = 0, 1, 2, ..., 31 and v = 0, 1, 2, ..., 31.
To obtain the enhanced block, the FFT of each block is multiplied by its magnitude raised to a power k:

g(x, y) = F^{-1}\left\{ F(u, v) \cdot |F(u, v)|^{k} \right\}

where F^{-1}(F(u, v)) is the inverse 2D Fourier transform, given by

f(x, y) = \frac{1}{32^2} \sum_{u=0}^{31} \sum_{v=0}^{31} F(u, v) \exp\left\{ j 2\pi \left( \frac{ux}{32} + \frac{vy}{32} \right) \right\}

for x = 0, 1, 2, ..., 31 and y = 0, 1, 2, ..., 31.
The value of k needs to be adjusted experimentally. A higher k improves the appearance
of the ridges and fills small holes in them, but too high a k can result in false joining of
ridges. The FFT-enhanced image therefore reconnects some falsely broken ridge points
and removes some spurious connections between ridges.
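
A sketch of this block-wise enhancement in NumPy, assuming img is a grayscale image stored as a float array whose sides are multiples of 32 (the variable name and the choice k = 0.45 are illustrative; k must be tuned experimentally as noted above):

```python
import numpy as np

def fft_enhance(img, k=0.45, block=32):
    """Enhance each block by multiplying its FFT with |F(u, v)|^k."""
    out = np.zeros_like(img, dtype=float)
    for r in range(0, img.shape[0], block):
        for c in range(0, img.shape[1], block):
            f = np.fft.fft2(img[r:r + block, c:c + block])  # F(u, v) of the block
            g = np.fft.ifft2(f * np.abs(f) ** k)            # F^-1{F(u, v) |F(u, v)|^k}
            out[r:r + block, c:c + block] = np.real(g)
    return out
```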
3. Locally adaptive binarization
Fingerprint image binarization converts the grayscale fingerprint image into a binary image
with 0 for ridges and 1 for furrows. After the operation, ridges in the fingerprint are shown
in black while furrows are white. The process starts by dividing the image into blocks (for
example, 16 x 16 pixels), calculating the mean intensity of each block, and then setting a
pixel to 1 if its value is larger than the mean of its block, and to 0 otherwise.
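
A sketch of this block-wise binarization, assuming img is a grayscale NumPy array whose sides are multiples of 16 (variable names are illustrative):

```python
import numpy as np

def binarize(img, block=16):
    """Threshold each 16x16 block against its own mean intensity."""
    out = np.zeros(img.shape, dtype=np.uint8)
    for r in range(0, img.shape[0], block):
        for c in range(0, img.shape[1], block):
            patch = img[r:r + block, c:c + block]
            # 1 = furrow (brighter than the block mean), 0 = ridge (darker).
            out[r:r + block, c:c + block] = (patch > patch.mean()).astype(np.uint8)
    return out
```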
4. Thinning
Thinning is a morphological operation that removes selected foreground pixels from a
binary image. Ridge thinning eliminates the redundant pixels of ridges until the ridges are
just one pixel wide. In each scan of the full fingerprint image, the algorithm marks
redundant pixels in each small 3x3 window and finally removes all marked pixels after
several scans.
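
A sketch of ridge thinning using scikit-image's skeletonize, which replaces the iterative window-marking procedure above with a single library call; binary is assumed to be the 0/1 image produced by the binarization step (for example, the output of the binarize sketch above):

```python
import numpy as np
from skimage.morphology import skeletonize

ridges = (binary == 0)           # ridge pixels (value 0) become True
thinned = skeletonize(ridges)    # boolean image with one-pixel-wide ridges
```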

2. Show in detail all the invariance properties of the 2D Fourier transform.

Answer:
If f_l and f_l' are the Fourier coefficients of the boundary sequences u_k and u_k',
respectively, we get the following properties:

Translation: u_k' = u_k + \Delta  \Rightarrow  f_l' = f_l + \Delta \, \delta(l)

Rotation: u_k' = u_k e^{j\theta}  \Rightarrow  f_l' = f_l e^{j\theta}

Scaling: u_k' = \alpha u_k  \Rightarrow  f_l' = \alpha f_l

Translation of the sampling origin: u_k' = u_{k - k_0}  \Rightarrow  f_l' = f_l e^{-j 2\pi l k_0 / N}

Translation affects only the f_0 coefficient. Rotation affects the phase of all the
coefficients by the same factor and has no effect on their magnitude. Scaling affects all
coefficients in the same way, and thus it has no effect on the ratios f_i / f_j. The choice of
sampling origin within the boundary affects the phase but leaves the magnitudes |f_l|
invariant. The deterministic manner in which these geometric transformations affect the
Fourier coefficients allows the development of appropriately normalized versions that are
invariant to these actions.
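
A small NumPy check of these properties on a toy boundary (the boundary and the transformation values are illustrative, not taken from the exam):

```python
import numpy as np

# Toy closed boundary sampled as complex points u_k = x_k + j*y_k.
t = np.linspace(0, 2 * np.pi, 64, endpoint=False)
u = np.exp(1j * t) + 0.3 * np.cos(3 * t)
f = np.fft.fft(u)

f_trans = np.fft.fft(u + (2.0 + 1.0j))      # translation by a constant
f_rot   = np.fft.fft(u * np.exp(1j * 0.7))  # rotation by 0.7 rad
f_scale = np.fft.fft(2.5 * u)               # scaling by 2.5

print(np.allclose(f[1:], f_trans[1:]))        # True: only f_0 changes
print(np.allclose(np.abs(f), np.abs(f_rot)))  # True: magnitudes unchanged
print(np.allclose(f_scale, 2.5 * f))          # True: all coefficients scaled alike, ratios preserved
```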

3. Do 2D discrete wavelet transforms enjoy such properties?

Answer:
No. The wavelet transform is localized: its result depends on which part of the signal is
being transformed, so it is not translation invariant. The multiresolution structure of
wavelets does, however, give us a scale-invariance property.

4. Why is pixel-based fingerprint matching not recommended?
Answer:


In general, raw pixels carry highly redundant information that may distract classifiers from
recognizing the pattern. For example, even a small 64 x 64 image has 4096 pixels. For
most classification tasks this number is too large, raising computational as well as
generalization problems. In fingerprint matching, minutiae-based features are more
popular because they can distinguish different fingerprints with a much lower dimension
than pixel-based features.

Problem 2: [20 Points]
1. Provide a detailed comparison of 3 different features used to represent face images
in face-based biometric systems.
Answer:

1. Eigenfaces

Eigenfaces are the eigenvectors of the covariance matrix of the set of face images,
i.e. principal components of the distribution of face images. These eigenvectors
can be thought of as a set of features which together characterize the variation
between face images. Once the eigenfaces are generated, every image can be
represented by its "proportions" of all k eigenfaces (a brief sketch of this
representation is given after this comparison).

2. Fisherfaces

One drawback of the eigenface method is that not only the between-class scatter but
also the within-class scatter is maximized, which should be avoided in classification.
If we apply PCA to images with large illumination changes, points in the projected
space are not easily separated. One way to cope with the variation due to lighting is
to discard some of the most significant principal components; however, information
that is useful for discrimination may be lost.

In the Fisherface approach, the new basis for the images is obtained using LDA
instead of PCA. LDA finds the vectors onto which to project the data that maximize
the between-class scatter and minimize the within-class scatter. The Fisherface
method appears to be the best at simultaneously handling variation in lighting and
expression.



3. Independent Component Analysis

In ICA-based face image features, the new basis for the images is obtained using
ICA. This is done by assuming that the subcomponents are non-Gaussian signals and
that they are statistically independent from each other. ICA focuses on statistically
independent, non-Gaussian components, higher-order statistics, and a non-orthogonal
transformation, while PCA focuses on uncorrelated, Gaussian components,
second-order statistics, and an orthogonal transformation.
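
A minimal sketch of the eigenface representation using scikit-learn's PCA, assuming faces is an (n_images, n_pixels) array of flattened, aligned grayscale face images (the variable name and the choice of 50 components are illustrative):

```python
from sklearn.decomposition import PCA

pca = PCA(n_components=50)               # keep the 50 leading eigenfaces
weights = pca.fit_transform(faces)       # each row: the image's "proportions" of the eigenfaces
eigenfaces = pca.components_             # shape (50, n_pixels); reshape to view as images
approx = pca.inverse_transform(weights)  # approximate reconstruction from the eigenface weights
```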



2. Provide a detailed analysis of the class discrimination provided by the
Fisher/Linear discriminant analysis (FDA/LDA).

The idea of FDA is to find vectors onto which we project the data such that the
between-class scatter is maximized and the within-class scatter is minimized. This is
done as follows:

Calculate the between-class scatter matrix, defined as

S_B = \sum_{i=1}^{c} N_i (\mu_i - \mu)(\mu_i - \mu)^T

where N_i is the number of samples in class i, \mu_i is the mean image of each
class/person, \mu is the global mean, and c is the number of classes.

Calculate the within-class scatter matrix, defined as

S_W = \sum_{i=1}^{c} \sum_{x_k \in X_i} (x_k - \mu_i)(x_k - \mu_i)^T

Calculate the vectors onto which we will project the data, which maximize S_B and
minimize S_W, defined as

W_{opt} = \arg\max_W \frac{|W^T S_B W|}{|W^T S_W W|}

If the scatter matrices are singular, which is likely, we need to apply PCA first so that
the matrices become non-singular, and then we can compute the projection vectors
using the eigenvector approach. The criterion then becomes

W_{fld} = \arg\max_W \frac{|W^T W_{pca}^T S_B W_{pca} W|}{|W^T W_{pca}^T S_W W_{pca} W|}

where the overall projection is W_{opt}^T = W_{fld}^T W_{pca}^T.

FDA considers class labels when performing dimensionality reduction, so it gives better
class separability than the eigenface approach. Because the Fisherface method minimizes
the within-class scatter, it reduces the classification error caused by intra-class variation
due to lighting and expression.
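
A NumPy sketch of these scatter matrices and the resulting projection, using small synthetic data in place of face images (the data and dimensions are illustrative only):

```python
import numpy as np

def scatter_matrices(X, y):
    """Return the between-class (S_B) and within-class (S_W) scatter matrices."""
    mu = X.mean(axis=0)                    # global mean
    d = X.shape[1]
    S_B, S_W = np.zeros((d, d)), np.zeros((d, d))
    for c in np.unique(y):
        Xc = X[y == c]
        mu_c = Xc.mean(axis=0)
        diff = (mu_c - mu).reshape(-1, 1)
        S_B += len(Xc) * diff @ diff.T     # N_i (mu_i - mu)(mu_i - mu)^T
        S_W += (Xc - mu_c).T @ (Xc - mu_c) # sum of (x_k - mu_i)(x_k - mu_i)^T
    return S_B, S_W

rng = np.random.default_rng(0)
X = np.vstack([rng.normal(0, 1, (20, 5)), rng.normal(3, 1, (20, 5))])  # two toy classes
y = np.array([0] * 20 + [1] * 20)

S_B, S_W = scatter_matrices(X, y)
vals, vecs = np.linalg.eig(np.linalg.pinv(S_W) @ S_B)
W = vecs[:, np.argsort(-vals.real)[: len(np.unique(y)) - 1]].real  # c-1 projection directions
```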


3. What are the main problems of principal component analysis when used for
feature extraction in face-based biometric systems?
Answer:

1. Calculating the eigenfaces directly from the covariance matrix C is usually very
computationally expensive, because its dimension is the number of pixels in the
face images, which is usually very high (N^2 for an N x N image). This issue can
be solved by first solving for the eigenvectors of an M x M matrix (where M is
the number of training images) and then taking linear combinations of the face
images.
2. One drawback of the eigenface method is that not only the between-class scatter
but also the within-class scatter is maximized, which should be avoided in
classification. If we apply PCA to images with large illumination changes, points
in the projected space are not easily separated.
3. One way to cope with the variation due to lighting is to discard some of the most
significant principal components. However, information that is useful for
discrimination may be lost.
4. It may not be invariant to occlusion, and it needs accurate locations of key facial
features such as the eyes, nose, and mouth to normalize the detected face.


4. Is there a need for directional features in face-based biometric systems? Provide a
detailed answer:


Answer:
We do not have to use directional features if we assume our data contain only frontal
views of faces, but we may use directional features to deal with non-frontal views of
the face.


Problem 3: [20 Points]
1- In iris-based biometric systems, Gabor features are dominantly used. What is the
main advantage of these features over those extracted using 2D Fourier
transforms?

Answer:

Gabor features are scale invariant and capture orientation information. This is
important for the iris because its features are usually rotated.




2- In iris-based biometric systems, Gabor features are dominantly used. What is the
main advantage of these features over those extracted using 2D discrete wavelet
transforms?

Answer:
Gabor filters can detect orientation. This is important for the iris because its features
are usually rotated. With standard wavelet features, we cannot distinguish between
45-degree and 135-degree orientations.

3- Do you recommend eigen-iris to improve the performance of iris-based
biometric systems? Provide a detailed answer:


Yes. Similar to the eigenface method for face recognition, eigen-iris would capture the
variation between iris textures while reducing redundant information and noise.

4- Provide simple texture representations for iris images. Explain your choice in
detail:

Answer:
From a raw eye image, locate the pupil boundary and normalize the iris texture into a
fixed-size rectangular image using polar coordinates.
Take the 48 rows nearest the pupil to mitigate the effect of eyelid occlusion.
Create aligned, overlapping patches from the normalized iris image, rotated 45 degrees.
In the vertical direction (45 degrees from the iris radial), eight pixels from each
patch form a 1D patch vector, which is then windowed using a Hanning window
prior to application of the DCT in order to reduce spectral leakage during the
transform. The differences between the DCT coefficients of adjacent patch vectors
are then calculated, and a binary code is generated from their zero crossings. The
binary code can capture orientation variation in the iris texture.
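
A simplified sketch of the DCT coding step only, assuming patches is an (n_patches, 8) NumPy array of 1D patch vectors already extracted from the normalized iris image (the patch extraction and normalization steps are omitted here):

```python
import numpy as np
from scipy.fft import dct

def iris_code(patches):
    """Binary code from zero crossings of differences between adjacent patch DCTs."""
    windowed = patches * np.hanning(patches.shape[1])  # reduce spectral leakage
    coeffs = dct(windowed, axis=1, norm="ortho")       # DCT of each 1D patch vector
    diffs = np.diff(coeffs, axis=0)                    # differences between adjacent patch vectors
    return (diffs > 0).astype(np.uint8)                # zero crossings -> bits
```
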
Problem 4: [20 Points]
1. Provide a detailed comparison of 3 different features used to represent speech
signal in speaker-based biometric systems.
Answer:

MFCC
The difference between the cepstrum and the mel-frequency cepstrum is that in the
MFC, the frequency bands are positioned logarithmically (on the mel scale), which
approximates the human auditory system's response more closely than the linearly
spaced frequency bands obtained directly from the FFT or DCT. To extract MFCCs, the
following steps are taken:
Take the Fourier transform of (a windowed excerpt of) the signal.
Map the log amplitudes of the spectrum obtained above onto the mel scale, using
triangular overlapping windows.
Take the discrete cosine transform of the list of mel log-amplitudes, as if it were
a signal.
The MFCCs are the amplitudes of the resulting spectrum.


Under noisy conditions, some works have observed that MFCC-based systems give
relatively robust performance compared to LPCC-based systems.
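
A sketch of MFCC extraction with the librosa library, which internally performs the framing, FFT, mel filterbank, log, and DCT steps listed above (the filename and parameter values are illustrative):

```python
import librosa

signal, sr = librosa.load("speech.wav", sr=16000)        # hypothetical input file
mfcc = librosa.feature.mfcc(y=signal, sr=sr, n_mfcc=13)  # shape: (13, n_frames)
```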

LPCC

LPC starts with the assumption that a speech signal is produced by a buzzer at the end of
a tube (voiced sounds), with occasional added hissing and popping sounds. The vocal
tract (the throat and mouth) forms the tube, which is characterized by its resonances,
called formants. Hisses and pops are generated by the action of the tongue, lips, and
throat during sibilants and plosives.
LPC analyzes the speech signal by estimating the formants, removing their effects from
the speech signal, and estimating the intensity and frequency of the remaining buzz. The
process of removing the formants is called inverse filtering, and the remaining signal
after the subtraction of the filtered modeled signal is called the residue.


Perceptual Linear Predictive Coding
Perceptual linear prediction, similar to LPC analysis, is based on the short-term spectrum
of speech. In contrast to pure linear predictive analysis of speech, perceptual linear
prediction (PLP) modifies the short-term spectrum of the speech by several
psychophysically based transformations.
This technique uses three concepts from the psychophysics of hearing to derive an
estimate of the auditory spectrum:
The critical-band spectral resolution,
The equal-loudness curve, and
The intensity-loudness power law.



The auditory spectrum is then approximated by an autoregressive all-pole model. In
comparison with conventional linear predictive (LP) analysis, PLP analysis is more
consistent with human hearing.



2. Do you recommend features extracted using 1D Fourier transforms? Explain your
choice in detail:
Answer:
It is not recommended to apply the Fourier transform to a whole speech signal, because
speech signals are statistically nonstationary, i.e., their statistical properties vary with
time. A way to circumvent this problem is to divide the time signal into a series of
successive frames and apply the Fourier transform to each frame. Each frame consists of
a finite number N of samples. During the time interval of a frame, the signal is assumed
to be reasonably stationary or quasi-stationary. For speech signals sampled at a frequency
of fs = 10 kHz, reasonable frame sizes range from 100 to 200 samples, corresponding to
a duration of 10 to 20 ms.
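
A sketch of this frame-wise analysis, assuming signal is a 1D NumPy array sampled at fs = 10 kHz (the frame and hop sizes are illustrative):

```python
import numpy as np

fs, frame_len, hop = 10_000, 200, 100   # 200 samples = 20 ms frames, 50% overlap
window = np.hamming(frame_len)          # taper each frame before the FFT
frames = [signal[i:i + frame_len] * window
          for i in range(0, len(signal) - frame_len + 1, hop)]
spectra = [np.abs(np.fft.rfft(f)) for f in frames]  # magnitude spectrum per frame
```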




3. How would you represent speech signals pertaining to a specific speaker that are
captured over a long period of time (say 30-40 years)? That is, how do you
extract features that do not change with ageing?

Extract features with a technique such as MFCC and quantize them so that similar
feature vectors are mapped to a single representative vector. One way to do so is
vector quantization (VQ), which is a process of mapping vectors from a large vector
space to a finite number of regions in that space. Each region is called a cluster and
can be represented by its center, called a centroid (or codeword). The collection of
all codewords is called a codebook.
In the training phase, a speaker-specific VQ codebook is generated for each known
speaker by clustering his/her training acoustic vectors. The distance from a vector to
the closest codeword of a codebook is called the VQ distortion. In the recognition
phase, an input utterance of an unknown voice is vector-quantized using each trained
codebook and the total VQ distortion is computed. The speaker corresponding to the
VQ codebook with the smallest total distortion is identified.
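
A sketch of this VQ matching with SciPy's k-means, assuming train_vectors (one speaker's training MFCC frames) and test_vectors are (n_frames, n_features) arrays (the variable names and the 64-codeword size are illustrative):

```python
import numpy as np
from scipy.cluster.vq import kmeans, vq

# Training: build a 64-codeword codebook from one speaker's acoustic vectors.
codebook, _ = kmeans(train_vectors.astype(float), 64)

# Recognition: quantize the unknown utterance and accumulate the VQ distortion.
_, distortions = vq(test_vectors.astype(float), codebook)
total_distortion = distortions.sum()  # the speaker with the smallest total distortion wins
```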



4. How would you use the dynamic time warping technique with speech features
extracted using the mel-frequency cepstrum (MFCC)? Explain your choice in
detail:
Answer:

We calculate the DTW distance between two time series as follows:
We are given two (time-dependent) feature vector sequences generated using MFCC:
X = (x_1, x_2, ..., x_M) of length M, and
Y = (y_1, y_2, ..., y_N) of length N.
The DTW distance between X and Y is D(M, N), where D(i, j) is given by

D(i, j) = \min\{ D(i-1, j), \; D(i, j-1), \; D(i-1, j-1) \} + d(x_i, y_j)

with D(0, 0) = 0 and D(i, 0) = D(0, j) = \infty for i, j > 0.
We define d(x_i, y_j) as the Euclidean distance between the vectors x_i and y_j, which
have the same dimensionality. Typically:
d(x_i, y_j) is small (low cost) if x_i and y_j are similar to each other, and
d(x_i, y_j) is large (high cost) otherwise.

When comparing against a template series, shorter series would be favored (i.e. they
produce smaller costs). For this reason, the final matching cost is normalized by the
length of the warping path; a common approximation uses the combined sequence length:

D_{norm}(X, Y) = D(M, N) / (M + N)

Finally, having this dissimilarity score, we can use nearest neighbor to classify speech.
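
A minimal NumPy sketch of this DTW computation, assuming X (M x d) and Y (N x d) are MFCC frame sequences; the normalization by M + N follows the approximation mentioned above:

```python
import numpy as np

def dtw_distance(X, Y):
    """Length-normalized DTW cost between two feature-vector sequences."""
    M, N = len(X), len(Y)
    D = np.full((M + 1, N + 1), np.inf)
    D[0, 0] = 0.0
    for i in range(1, M + 1):
        for j in range(1, N + 1):
            cost = np.linalg.norm(X[i - 1] - Y[j - 1])  # Euclidean d(x_i, y_j)
            D[i, j] = cost + min(D[i - 1, j], D[i, j - 1], D[i - 1, j - 1])
    return D[M, N] / (M + N)  # normalize so shorter sequences are not favored
```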







Problem 7: [20 Points]
1. 1D/2D discrete cosine transforms are used to extract features from iris images.
What are the main advantages of 1D/2D discrete cosine transforms over Gabor
wavelets in this case? Provide a detailed answer with pros and cons:

Answer:
DCT computation is much faster than Gabor filtering, which needs a convolution of
filters with the image. Because of its speed, a DCT implementation may be more
commercially applicable than Gabor.
A Gabor implementation also needs more parameters to be tuned, and the best
configuration usually depends strongly on the data set.
