ACADEMY OF TECHNOLOGY
Certificate
This is to certify that the project report entitled Face Recognition using Deep
Learning, submitted to the Department of Electronics and Communication Engineering,
Academy of Technology, in partial fulfillment for 6th semester SEMINAR PRESENTATION
[EC-681] of Bachelor of Technology in Electronics and Communication Engineering, is a
record of bona fide work carried out by Rik Mitra, Roll No-16900316074 and Pritam
Sengupta, Roll No-16900316082, under my supervision and guidance.
All help received by them from various sources has been duly acknowledged.
No part of this report has been submitted elsewhere for award of any other degree.
(Sahadeb Santra)
Assistant Professor
Seminar Guide
Place: Adisaptagram, Hooghly.
Date:
Acknowledgement
We are thankful to our guide Prof. Sahadeb Santra, whose personal involvement
in the technical seminar presentation and report has been a major source of inspiration for us
to be flexible in our approach and thinking while tackling various issues. He played the critical
role of ensuring that we were always on the right track.
Last but not the least, we would like to give a big thanks to all the staff and
assistants of the Department of Electronics and Communication Engineering.
Abstract
Face recognition is the task of identifying an individual from an image of their face and a
database of known faces. Despite being a relatively easy task for most humans,
“unconstrained” face recognition by machines, specifically in settings such as malls, casinos
and transport terminals, remains an open and active area of research. However, in recent
years, a large number of photos have been crawled by search engines, and uploaded to social
networks, which include a variety of unconstrained material, such as objects, faces and
scenes. This large volume of data and the increase in computational resources have enabled
the use of more powerful statistical models for the general challenge of image classification. This
project evaluates the use of deep learning approaches, such as deep convolutional
neural networks for image classification, on the problem of unconstrained face recognition.
Deep learning is an emerging area of machine learning (ML) research. It comprises multiple
hidden layers of artificial neural networks. The deep learning methodology applies nonlinear
transformations and model abstractions of high level in large databases. The recent
advancements in deep learning architectures within numerous fields have already provided
significant contributions to artificial intelligence. This report presents a state-of-the-art survey
of the contributions and novel applications of deep learning. The following review
chronologically presents how, and in which major applications, deep learning algorithms have
been utilized. Furthermore, the advantages of the deep learning methodology, with
its hierarchy of layers and nonlinear operations, are presented and compared with more
conventional algorithms in common applications. The survey further
provides a general overview of the novel concept and the ever-increasing advantages and
popularity of deep learning.
Theory
Introduction
Face Recognition (FR) is one of the areas of Computer Vision (CV) that has drawn the most
interest for a long time. Its practical applications are many, ranging from biometric security
to automatically tagging your friends in pictures. Because of these possibilities,
many companies and research centers have been working on it.
The uses for an automatic face recognition system are many. Typical ones are biometric
identification (usually combined with other verification methods), automatic border
control, and crowd surveillance. One of its main advantages is its non-intrusiveness. Most
identification methods require some action from people, such as placing a fingerprint on a
reader or entering a password. Face recognition, on the contrary, can work by simply
having a camera recording. Some of its best-known uses belong to
the social network field.
As of 2016, there are already systems in use that rely on face recognition, a brief sample
of which is introduced here. This sample is by no means exhaustive, but it tries to show the
variety of applications. It comes as no surprise that one of the uses that draws the most
attention is tracking criminals. As forensic TV series have shown, a system
automatically scanning city cameras to catch an escapee would be of great help. In fact,
the United States is already using this technology. Although far from the quality level depicted in
fiction, it is already being used to identify people from afar, although there is some skepticism
regarding whether it works. Despite the considerable criticism surrounding this kind
of method, there is little doubt that in the future it will become widely used. A less well-known
use of face recognition is authorizing payments. As part of a pilot test, some users
are, under certain circumstances, asked to take a picture of themselves before the payment is
accepted. This kind of application has a double goal: to facilitate the process for users,
being easier than remembering a password, and to discourage credit card theft.
On a more technical note, there have historically been many approaches to the problem.
However, there is one key issue in face recognition that most of them
share: feature extraction. Most approaches start by transforming
the original images into a more expressive set of features, either manually crafted or
automatically selected for statistical relevance. In fact, working with the raw images
is extremely difficult, due to factors such as light, pose, or background, among others.
Therefore, by keeping only the information relevant to the face, most of this “noise” is
discarded. Finding an efficient feature selection strategy is likely to benefit almost any
subsequent classification method. There have traditionally been two main approaches to the
problem: the geometric ones, which use relevant facial features and the relations between them,
and the photometric ones, which extract statistical information from the image to use in
different kinds of comparisons.
Deep Learning
In recent years a new method has appeared that has affected the whole Computer Vision
community. Since its appearance, Deep Learning, and more concretely Deep Neural
Networks and Convolutional Neural Networks, has steadily achieved state-of-the-art results in
many CV problems, even those in which research was stuck. We provide a more technical
description of this method later, so here we will just say that DL is, roughly, a kind of Neural
Network composed of multiple layers. When applied to CV, these networks are capable of
automatically finding a set of highly expressive features. Based on empirical results, these
features have proven to be better than manually crafted ones on many occasions. They have
the additional advantage of not having to be designed by hand, as the network itself is
in charge of doing so. On top of that, the features learned can be considerably abstract.
Interestingly, the way CNNs work is closely related to the way the biological visual system
works [Itti and Koch, 2001; Kim, Kim, and Lee, 2015]. Whether this is the reason for their
success is outside the scope of this document, but it cannot be denied that the results they
obtain make them a choice to consider when faced with CV problems. In fact, a large
number of the most successful applications of CV in recent years have used CNNs, and this
tendency is expected to continue. Because of this, the work in this report makes use of them.
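To make the idea of a convolutional feature concrete, the sketch below (plain NumPy, not any particular network's implementation) applies one hand-written edge-detection kernel to a toy image; the point is that a CNN's first layer learns many such kernels automatically instead of having them designed by hand.

```python
import numpy as np

def conv2d(image, kernel):
    """Valid 2-D convolution (strictly, cross-correlation, as in most CNN libraries)."""
    kh, kw = kernel.shape
    ih, iw = image.shape
    out = np.zeros((ih - kh + 1, iw - kw + 1))
    for y in range(out.shape[0]):
        for x in range(out.shape[1]):
            out[y, x] = np.sum(image[y:y + kh, x:x + kw] * kernel)
    return out

# A Sobel-like vertical-edge kernel: one of the low-level features a
# trained CNN typically discovers on its own in its first layer.
image = np.zeros((6, 6))
image[:, 3:] = 1.0                      # left half dark, right half bright
kernel = np.array([[-1, 0, 1],
                   [-2, 0, 2],
                   [-1, 0, 1]], float)
feature_map = conv2d(image, kernel)
print(feature_map)  # strong responses only along the vertical edge
```

Stacking many such learned filters, interleaved with nonlinearities and pooling, is what lets deeper layers respond to increasingly abstract structures such as eyes or whole faces.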
Two of the most successful applications of CNNs to the FR problem are DeepFace [Taigman
et al., 2014] and FaceNet [Schroff, Kalenichenko, and Philbin, 2015]. These two have
provided state-of-the-art results in recent years, with the best results obtained by the
latter. Although there are other methods providing close results, such as those involving Joint
Bayesian methods [Cao et al., 2013; Chen et al., 2013], we decided to focus on CNNs. The
reasons were not only result-driven, but also interest-driven, as we were personally interested
in working with them.
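As an illustration of how an embedding-based system in the style of FaceNet reaches a verification decision: faces are mapped to vectors, and two faces are declared the same identity when the distance between their embeddings falls below a tuned threshold. The toy 4-D vectors and the threshold value below are invented for the example; a real FaceNet produces 128-dimensional embeddings from images.

```python
import numpy as np

def same_person(emb_a, emb_b, threshold=1.1):
    """FaceNet-style verification: compare L2-normalised embeddings
    by Euclidean distance against a tuned threshold."""
    a = emb_a / np.linalg.norm(emb_a)
    b = emb_b / np.linalg.norm(emb_b)
    return bool(np.linalg.norm(a - b) < threshold)

# Toy embeddings standing in for a real network's output.
anchor   = np.array([0.9, 0.1, 0.0, 0.1])
positive = np.array([0.8, 0.2, 0.1, 0.1])   # same person, different photo
negative = np.array([0.1, 0.1, 0.9, 0.2])   # different person

print(same_person(anchor, positive))  # True
print(same_person(anchor, negative))  # False
```

The appeal of this design is that recognising a new person requires no retraining: only one reference embedding per identity needs to be stored and compared against.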
Problems
Unfortunately, despite its potential, automatic face recognition has many problems. One
of the most important is face variability for a single person. Many factors can make
two pictures of the same person look totally different, such as light, facial expression,
or occlusion. When dealing with faces in controlled environments,
face recognition systems already deliver quality results, but they still have problems
when faced with faces in the “wild”. Even more, factors such as sunglasses, beards, different
hairstyles, emotions, or even age can greatly complicate the task. An example of these problems
can be seen in Figure. Another problem to be taken into account is the environment. Except
in controlled scenarios, face pictures have very different backgrounds, which can make the
problem of face recognition more difficult. To address this issue, many of the most
successful systems focus on treating the face alone, discarding all the surroundings. Taking
all of this into consideration, our goal was to develop a system capable of working with faces in
uncontrolled environments. To do so, we used Convolutional Neural Networks as a
feature extraction method. We also planned to apply some pre-processing in order to
minimize the impact of the environment and make our system more robust. That being
said, we were aware of the difficulties involved in such a project, so we were cautious about
the expected results.
Technological Details
Theoretical Background: CNN
This section aims to provide an introduction to Convolutional Neural Networks. To
do so, it is necessary to understand the concept of an Artificial Neural Network (ANN), so the
first part is devoted to it. After that, Deep Learning and CNNs are
explained.
ANNs have proven their capacity in many problems, such as Computer Vision ones, which
are difficult to address by extracting features in the traditional way. This section briefly
introduces the main technical concepts of the method, in order to make the Deep Learning
explained afterwards easier to understand.
This is, roughly speaking, the basic structure of an ANN. There are many variations on it,
such as Recurrent Neural Networks, in which connections form a directed cycle, but they are
all based on this. An ANN can be understood as a function f that maps an input X to an output
Y. The training task, then, consists of learning the weight associated with each edge.
Given the data, there are various learning algorithms, of which gradient descent combined
with backpropagation can be considered, given its widespread use, the most successful of
all. In fact, to a certain degree, using it is enough for training most ANNs.
This algorithm starts by initializing all weights in the network, which can be done following
various strategies. Some of the most common ones include drawing them from a probability
distribution, or setting them randomly, although low values are advisable. The process
followed afterwards consists of three phases that are repeated many times over. In the first,
an input instance is propagated through the whole network, and the output values are calculated.
Then, this output is compared, using a loss function, with the correct output, and this is used
to calculate how far off the network is. The final phase consists of updating each weight in
order to minimize the obtained error. This is done by obtaining the gradient of each neuron,
which can be understood as a “step” towards the actual value. When these three phases have been
repeated for all input instances, we call this an epoch. The algorithm can run for as many
epochs as specified, or as required to find the solution. Briefly, the gradient is obtained
as follows. Once the outputs have been calculated for an instance, we obtain the error
achieved for each output neuron o, calling it δo. This value allows finding the gradient of each
o. For this, we need the derivative of the output of o with respect to its input Xo, that
is, the partial derivative of its activation function φ. For the logistic (sigmoid) case, this
becomes φ′(Xo) = φ(Xo)(1 − φ(Xo)).
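The three phases can be sketched end to end on a toy task. The network size, learning rate, and the logical-OR dataset below are choices made for the example, not taken from the report; the sigmoid derivative shows up in the code as O * (1 - O).

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

rng = np.random.default_rng(0)
# Initialization: low random weights, as advised above.
W1 = rng.normal(0.0, 0.5, (2, 4))   # input  -> hidden
W2 = rng.normal(0.0, 0.5, (4, 1))   # hidden -> output

# Toy dataset: learn the logical OR of two inputs.
X = np.array([[0, 0], [0, 1], [1, 0], [1, 1]], dtype=float)
Y = np.array([[0], [1], [1], [1]], dtype=float)

lr = 1.0
first_loss = None
for epoch in range(2000):            # one epoch = one pass over all instances
    # Phase 1: propagate the input through the whole network.
    H = sigmoid(X @ W1)
    O = sigmoid(H @ W2)
    # Phase 2: evaluate the output against the correct one with a loss function.
    loss = np.mean((O - Y) ** 2)
    if first_loss is None:
        first_loss = loss
    # Phase 3: step each weight down its gradient.
    # The sigmoid derivative phi'(x) = phi(x)(1 - phi(x)) appears as O*(1-O).
    delta_o = (O - Y) * O * (1 - O)
    delta_h = (delta_o @ W2.T) * H * (1 - H)
    W2 -= lr * H.T @ delta_o
    W1 -= lr * X.T @ delta_h

print(f"loss after training: {loss:.4f} (started at {first_loss:.4f})")
```

Running this, the loss shrinks steadily across epochs, which is exactly the behaviour the three-phase description predicts.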
Deep Learning
One of the key aspects of most machine learning methods is the way data is represented, that
is, which features to use. If the features used are badly chosen, the method will fail regardless
of its quality. Moreover, this selection affects the knowledge the method can
work with: if you have trained your market analysis algorithm with numerical values, it will not be
able to make any sense of a written report, no matter its quality. Therefore, it is no surprise
that there has been a historical interest in finding appropriate features. This becomes
especially relevant in the case of Computer Vision problems.
The reason is that, when faced with an image, there are usually far too many features (a
simple 640 × 480 RGB image has over 300,000 pixels, or almost a million raw values across its
three channels), and most of them are irrelevant. Because of this, it is important to find some
way of condensing this information into a more compact form.
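Spelling out the arithmetic for the 640 × 480 example: the pixel count is about 307,000, and it is the per-channel raw values that approach a million.

```python
# Raw input size of the example image discussed in the text.
width, height, channels = 640, 480, 3
pixels = width * height          # 307,200 pixels
values = pixels * channels       # 921,600 raw values across R, G, B
print(pixels, values)
```

Feeding nearly a million raw numbers per image to a classifier is what makes feature condensation unavoidable.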
There are two main ways of obtaining features: manually choosing them (such as
physiological values in medical applications) or automatically generating them, an approach
known as representation learning. The latter has proven more effective in problems such
as computer vision, as it is very difficult for humans to know what makes an image
distinguishable. Instead, in many cases machines have been able to determine which features
were relevant for them, producing some state-of-the-art results. The most paradigmatic case of
representation learning is the autoencoder. Autoencoders perform a two-step process: first they
encode the information they receive into a compressed representation, and then they try to decode,
or reconstruct, the original input from this reduced representation.
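The two-step encode/decode process can be shown with a minimal linear autoencoder. The dimensions, learning rate, and synthetic data below are all invented for the sketch: the data is generated to secretly lie in a 3-D subspace of an 8-D space, so a 3-value code suffices to reconstruct it.

```python
import numpy as np

rng = np.random.default_rng(0)

# Synthetic 8-D data that actually lives in a 3-D subspace.
Z = rng.normal(size=(200, 3))
A = rng.normal(size=(3, 8))
X = Z @ A

# Minimal linear autoencoder: 8 inputs -> 3-D code -> 8 reconstructed outputs.
W_enc = rng.normal(0.0, 0.1, (8, 3))
W_dec = rng.normal(0.0, 0.1, (3, 8))

lr = 0.01
first_loss = None
for step in range(3000):
    code = X @ W_enc          # step 1: encode into a compressed representation
    X_hat = code @ W_dec      # step 2: decode, i.e. reconstruct the input
    err = X_hat - X
    loss = np.mean(err ** 2)
    if first_loss is None:
        first_loss = loss
    # gradient descent on the reconstruction error
    W_dec -= lr * code.T @ err / len(X)
    W_enc -= lr * X.T @ (err @ W_dec.T) / len(X)

print(f"reconstruction error: {first_loss:.3f} -> {loss:.3f}")
```

After training, the 3-value code carries nearly everything needed to rebuild the 8 inputs; the code itself is the learned representation.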
We are going to focus on Computer Vision problems from now on, as this will make some of the
next sections easier to understand. Regarding the features extracted, people may have
some clear ideas about what makes an object, such as a car, recognizable: having four wheels,
doors on the sides, glass at the front, being made of metal, and so on. However, these are high-level
features that are not easy for a machine to find in an image. To make things worse, each
kind of object in the world has its own particular features, usually with large intra-class
variability. Because of this, developing a general object recognition application this way would be
impossible, as we would need manually selected features for each object. Therefore, it has
not been a successful line of research recently. On the contrary, if machines are capable of
determining on their own what is representative of an object, they have
the potential to learn how to represent any object they are trained with.
However, there is an additional difficulty in this kind of problem: the variability
depending on the conditions of each picture. We do not only have to deal with intra-class
variability, but also with same-object variability. The same car can be pictured in almost
endless ways, depending on its pose, light conditions, image quality, etc. We
humans are capable of getting rid of this variation by extracting what we could consider
abstract features. These features can include the ones we mentioned before, such as the
number of wheels, but also others we are not aware of, such as the fact that cars are usually
on a road, or that their wheels should be in contact with the ground. In order to be
successful, a representation learning method should be able to extract this kind of high-level
feature, regardless of such variation. The problem is that this process can be extremely
difficult to build into a machine, which may lead one to think that it makes no sense to
make the effort of doing so. This is precisely where Deep Learning has proven to be
extremely useful.
Applications
You’re used to unlocking your door with a key, but maybe not with your face. As strange as
it sounds, our physical appearance can now verify payments, grant access and improve
existing security systems. Protecting physical and digital possessions is a universal concern
which benefits everyone, unless you’re a cybercriminal or a kleptomaniac, of course. Facial
biometrics are gradually being applied to more industries, disrupting design, manufacturing,
construction, law enforcement and healthcare. How is facial recognition software affecting
these different sectors, and who are the companies and organisations behind its development?
1. Payments
It doesn’t take a genius to work out why businesses want payments to be easy. Online
shopping and contactless cards are just two examples that demonstrate the seamlessness of
postmodern purchases. With FaceTech, however, customers wouldn’t even need their cards.
In 2016, MasterCard launched a new selfie pay app called MasterCard Identity Check.
Customers open the app to confirm a payment using their camera, and that’s that. Facial
recognition is already used in stores and at ATMs, but the next step is to bring it to
online payments. Chinese ecommerce firm Alibaba and its affiliated payment platform Alipay are
planning to apply the software to purchases made over the Internet.
3. Criminal identification
If FaceTech can be used to keep unauthorised people out of facilities, surely it can be used to
help put them firmly inside them. This is exactly what the US Federal Bureau of Investigation
is attempting to do by using a machine learning algorithm to identify suspects from their
driver’s licences. The FBI currently has a database which includes half of the national
population’s faces. This is as useful as it is creepy, giving law enforcers another way of
tracking criminals across the country. AI-equipped cameras have also been trialled in the UK
to identify those smuggling contraband into prisons.
4. Advertising
The ability to collect and collate masses of personal data has given marketers and advertisers
the chance to get closer than ever to their target markets. FaceTech could do much the same.
5. Healthcare
Instead of recognising an individual via FaceTech, medical professionals could identify
illnesses by looking at a patient’s features. This would alleviate the ongoing strain on medical
centres by slashing waiting lists and streamlining the appointment process. The question is,
would you really want to find out you had a serious illness from a screen? If it’s a choice
between a virtual consultation and a month-long wait for an appointment, then maybe so.
Another application of facial biometrics within healthcare is to secure patient data by using a
unique patient photo instead of passwords and usernames.
Advantages
The Improvement of Security Level
As we said in the first paragraph, a face biometric system greatly improves your security
measures. All of a corporation’s premises would be protected, since you’ll be able to track both
the employees and any visitors that come into the area. Anyone who doesn’t have access or
permission to be there will be captured by the recognition system, which alerts you instantly
about the trespassing.
As an example, let’s take a 24/7 drugstore. Any owner prefers to keep their money and clients
safe, avoiding unpleasant trouble with difficult visitors. When you have FRT in place,
you’d be instantly alerted as soon as a wanted or suspicious character arrives, which leads
to a significant reduction in the expenses one usually spends on security staff. The
software will successfully track every aspect of attendance to provide a better level of
protection for your facilities.
Accuracy ensures that there won’t be any misunderstandings and awkwardness that
come from bad face recognition software. With high levels of accuracy, you can be sure that the
right person will be recognized at the right time.
Full Automation
Instead of manual recognition, done by security guards or official representatives
on a company’s premises, facial recognition tech automates the identification
process and performs it consistently, without interruption. You won’t even need an
employee to monitor the cameras 24/7.
Automation means convenience and reduces expenses too. Therefore, any entrepreneur
would appreciate the fact that image identification systems are fully automated.
Disadvantages
Image Size and Distance
Consider the distance between a target and a CCTV camera… What proportions will the detected face
have? No more than 100×200 pixels.
It is pretty hard to get a clear identification in such a case. What’s more, scanning a photo for varying
face sizes is a processor-intensive task. Most systems therefore allow specifying a face-size range
to eliminate false recognitions and speed up image processing. The initial investment in
such face tracking software is not a cheap one; however, it will pay off in no time.
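The face-size range mentioned above amounts to a simple pre-filter on detector output. The sketch below assumes axis-aligned (x, y, w, h) bounding boxes from some face detector; the size limits are illustrative values, not taken from any particular system.

```python
def plausible_faces(detections, min_size=(80, 80), max_size=(400, 400)):
    """Keep only detections whose bounding box falls inside a configured
    face-size range, discarding implausibly small or large candidates
    before the (expensive) recognition step runs."""
    keep = []
    for (x, y, w, h) in detections:
        if min_size[0] <= w <= max_size[0] and min_size[1] <= h <= max_size[1]:
            keep.append((x, y, w, h))
    return keep

detections = [(10, 10, 30, 30),     # too small: likely noise at long distance
              (50, 40, 120, 150),   # plausible face
              (0, 0, 600, 600)]     # too large: likely a false positive
print(plausible_faces(detections))  # [(50, 40, 120, 150)]
```

Dropping implausible boxes early is what lets such systems both reduce false recognitions and avoid running recognition on every candidate region.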
Surveillance Angle
The identification process is also strongly affected by the surveillance angle at which the
target’s face was captured. To enroll a face in the recognition software,
multiple angles are used: profile, frontal, 45-degree, etc. But to generate a clear
template for the face, you’ll need nothing less than a frontal view. The higher the resolution
of a photo and the more direct its angle (this goes for both enrolled and compared images), the
more accurate the resulting matches will be.
Then, there are also troubles with things such as facial hair or sunglasses. One can still fool
FRT with a suddenly grown or removed beard, and the same goes for obscuring parts of the face
with glasses or masks. To avoid such failures, the databases must be regularly updated with
the most up-to-date images.
Future Scope
The use of spherical canonical images allows us to perform matching in the spherical
harmonic transform domain, which does not require preliminary alignment of the images.
The errors introduced by embedding into an expression space with some predefined
geometry are avoided. In this facial expression recognition setup, end-to-end processing
comprises face surface acquisition and reconstruction, smoothing, and subsampling to
approximately 2500 points. Facial surface cropping and the measurement of a large set of
distances between all the points are carried out using a parallelized parametric version of the
algorithm.
The general experimental evaluation of the facial expression system promises better face
recognition rates. Having examined techniques to cope with expression variation, future work
may investigate the face classification problem and the optimal fusion of
colour and depth information in more depth. Further study can be directed towards matching
alleles of genes to the geometric factors of facial expressions. The genetic property evolution
framework for the facial expression system can be studied to suit the requirements of different
security models, such as criminal detection, governmental confidential security breaches, etc.
Conclusions
The facial expression recognition system presented in this work contributes a
resilient face recognition model based on the mapping of behavioural characteristics onto
physiological biometric characteristics. The physiological characteristics of the human face
relevant to various expressions, such as happiness, sadness, fear, anger, surprise and
disgust, are associated with geometrical structures which are stored as the base matching template
for the recognition system.
The behavioural aspect of this system relates the attitude behind different expressions as a
property base. The property bases are separated into exposed and hidden categories in genetic
algorithmic genes. The gene training set evaluates the expressional uniqueness of individual
faces and provides a resilient expression recognition model in the field of biometric security.
The design of a novel asymmetric cryptosystem based on biometrics, with features like
hierarchical group security, eliminates the use of passwords and smart cards, as opposed to
earlier cryptosystems. It requires special hardware support, like all other biometric systems.
This work suggests a new direction of research in the field of asymmetric biometric
cryptosystems, which is highly desirable in order to get rid of passwords and smart cards
completely. Experimental analysis shows that the hierarchical security structures are
effective in geometric shape identification for physiological traits.
The facial expression based face recognition system is made efficient with genetic algorithm
invariants of the facial surface, resulting in a recognition rate of 95.4%. The illustration of this
model is given in this work to build expressional representations using the concept
of a hierarchy-based embedding approach. The facial representation model is deployed on a
laptop for the biometric authentication process. The impact of the embedding space choice on the
metric distortion indicates that spaces with spherical geometry are more favourable for the
representation of facial surfaces.
Bibliography
Aarts, Emile and Jan Korst (1989). Simulated Annealing and Boltzmann Machines: A
Stochastic Approach to Combinatorial Optimization and Neural Computing. New York, NY,
USA: John Wiley & Sons, Inc. ISBN: 0-471-92146-7.
Abadi, Martín et al. (2016). “TensorFlow: A system for large-scale machine learning”. In:
CoRR abs/1605.08695.
Berg, Thomas and Peter N. Belhumeur (2012). “Tom-vs-Pete Classifiers and Identity-
Preserving Alignment for Face Verification”. In: BMVC.
Cao, Xudong et al. (2013). “A Practical Transfer Learning Algorithm for Face Verification”.
In: Proceedings of the 2013 IEEE International Conference on Computer Vision. ICCV ’13.
Washington, DC, USA: IEEE Computer Society, pp. 3208–3215. ISBN: 978-1-4799-2840-8.