International Journal of Computer Trends and Technology (IJCTT), volume 4, Issue 7, July 2013

ISSN: 2231-2803 http://www.ijcttjournal.org Page 2000

Depth Detection of Facial Feature
M.Tech student, Department of ISE, P.E.S.I.T

Abstract: A central problem in computer vision is calculating
depth from two images. This paper aims to calculate the depth
of a facial feature (i.e., the nose) using left and right images.
The left and right images are captured by left and right cameras,
which are mounted horizontally and separated by a small distance
such that both cameras capture the same scene. The corresponding
feature point (i.e., the nose) is extracted from both the left and
right images, and the triangulation method is used to calculate
the 3D distance of the nose. The triangulation method requires a
disparity map along with the correspondence points.

Keywords: Stereo calibration, Stereo rectification.

Depth detection of a facial feature is about finding its depth
from the camera using two images. Depth detection is applied in
various fields, such as robotics and 3D model generation.

In robotics, it is used to find information about the position
of an object. Robots can use this to identify one or more
similar objects: by calculating the distance to each similar
object, it becomes easier for a robot to distinguish between them.

By calculating the 3D values of multiple points, we can
generate a 3D model.

Depth information for a 2D image can be calculated in several
ways. One way to detect depth is to use a laser range camera;
another is to use a pair of images in combination and then apply
triangulation. The latter, most common method is called stereo
vision, stereo matching or stereo correspondence. Stereo
correspondence is about finding the same corresponding point in
the given left and right input images. These images are captured
by two cameras that are slightly separated, stored in a database,
and then given as input for detecting depth.
One of the vital problems in computer vision is to calculate the
depth to an object using left and right images taken by two
cameras. The method in [1] makes use of the ratio of focal length
to pixel points. This ratio relates the image formed in the lens
to the image formed in the outside world [1].
In [2], a Markov Random Field (MRF) algorithm is used to capture
monocular cues, which are then incorporated into a stereo system.
Monocular cues combined with stereo (triangulation) cues give
more accurate depth estimates than monocular cues or stereo cues
alone [2].

Stereo camera depth is applied in human-centered applications.
The work in [3] studies how stereo camera depth resolution and
human depth resolution vary. Stereo vision provides depth to the
human eye through the disparity between the two images seen by
the left and right eyes [3].

The method in [4] shows the importance of camera calibration for
the performance of depth reconstruction using stereo imaging.
It provides formulae that relate different parameter errors to
the 3D reconstruction measurements [4].
In this paper we use two images to calculate the depth of a
facial feature; the methodology used is stereo imaging.
Stereo imaging is an ability that our eyes give us. How far
this can be achieved in computational systems is the question.
Computers can achieve it by finding the correspondence of the
same point in both the left and right images. Once the
corresponding point is found in both the left and right images,
the triangulation method is applied to calculate the 3D distance
of the corresponding point.

Stereo imaging involves the following steps when using two
images to calculate depth.
Step 1: Removing the errors caused by the lens, namely
manufacturing defects of the lens and the bulging effect. This
process is called undistortion.
Step 2: Aligning the left and right images so that they lie on
the same plane.
Step 3: Finding the corresponding points in the left and right
images. The output of this step is a disparity map.
Step 4: If the geometric arrangement of the cameras is known,
the disparity map is used in triangulation to find the 3D distance.
The first phase is image acquisition, where images of
chessboard patterns and of the user are captured from the left
and right cameras and saved as left and right images,
respectively, in the database. The second phase is stereo
calibration, where the cameras are calibrated and the output
matrices of the calibrated cameras are obtained. These are used
for rectification, and the matrix obtained after rectification
is further required in depth detection of the facial feature.
The third phase is face detection using the built-in library of
OpenCV, followed by feature extraction from the detected face.
In the last phase, the corresponding nose points are extracted
and triangulation is applied to them to detect the depth of the
facial feature.

A. Image Acquisition
This is a prerequisite stage in which images are captured and
stored in the database. To capture the images, the two cameras
are mounted horizontally, separated by a distance such that both
cameras are able to capture the same scene. The OpenCV functions
that allow us to interact with hardware such as cameras are
collected into a library called HighGUI (which stands for
high-level graphical user interface).

Figure 1: The System Architecture
B. Stereo Calibration
Stereo calibration is the process of finding the geometry
between the two cameras [5], that is, the rotation matrix and
translation vector between them. Both the rotation and the
translation are calculated by the function cvStereoCalibrate(),
which produces a single rotation matrix and translation vector
that relate the right camera to the left camera. The output of
stereo calibration is used as the input for stereo rectification.
Stereo rectification is the process of correcting the individual
images so that they appear to be coplanar.
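
To make "a rotation matrix and translation vector that relate the right camera to the left camera" concrete, the following is a minimal sketch in plain Python, not OpenCV code. It assumes each camera's pose relative to the same chessboard is already known, and it uses the standard relation R = R_r R_l^T, T = T_r - R T_l, under which a point seen by the left camera maps to the right camera as P_r = R P_l + T. The helper names and the numeric poses are hypothetical and chosen only for illustration.

```python
import math

def mat_mul(a, b):
    """Multiply two matrices given as lists of rows."""
    return [[sum(a[i][k] * b[k][j] for k in range(len(b)))
             for j in range(len(b[0]))] for i in range(len(a))]

def transpose(a):
    """Transpose a matrix given as a list of rows."""
    return [list(row) for row in zip(*a)]

def mat_vec(a, v):
    """Multiply a matrix by a column vector."""
    return [sum(a[i][k] * v[k] for k in range(len(v))) for i in range(len(a))]

def rot_y(angle_rad):
    """Rotation about the vertical (y) axis."""
    c, s = math.cos(angle_rad), math.sin(angle_rad)
    return [[c, 0.0, s], [0.0, 1.0, 0.0], [-s, 0.0, c]]

def relative_pose(R_l, T_l, R_r, T_r):
    """Combine two board-to-camera poses into one left-to-right pose."""
    R = mat_mul(R_r, transpose(R_l))
    RT_l = mat_vec(R, T_l)
    T = [T_r[i] - RT_l[i] for i in range(3)]
    return R, T

# Hypothetical board-to-camera poses (illustrative values only).
R_l, T_l = rot_y(math.radians(2.0)), [3.0, 0.0, 60.0]
R_r, T_r = rot_y(math.radians(-1.0)), [-3.0, 0.0, 60.0]
R, T = relative_pose(R_l, T_l, R_r, T_r)

# Check: a board point mapped into each camera directly must agree
# with mapping it into the left camera and then applying (R, T).
P = [1.0, 2.0, 0.0]
P_l = [a + b for a, b in zip(mat_vec(R_l, P), T_l)]
P_r = [a + b for a, b in zip(mat_vec(R_r, P), T_r)]
P_r_from_l = [a + b for a, b in zip(mat_vec(R, P_l), T)]
```

In practice the calibration function estimates R and T from many chessboard views at once; the sketch only shows what the resulting pair means geometrically.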

The result of aligning the two image planes is eight terms,
four each for the left and the right cameras. For each camera
we obtain a distortion vector (the distortion coefficients), a
rotation matrix (to apply to the image), and the rectified and
unrectified camera matrices. To perform the rectification,
Bouguet's algorithm is used, which uses the rotation and
translation parameters from the two calibrated cameras.

The captured left and right chessboard patterns are given as
the input to stereo calibration. Stereo calibration is performed
in OpenCV using the function cvStereoCalibrate(). Its output is
the rotation matrix, the translation vector and the distortion
coefficients. Stereo calibration is followed by stereo
rectification, which uses the cvStereoRectify() function.

C. Face Detection
The main purpose of face detection is to help find the
corresponding point in both the left and right images. OpenCV
implements a face-detection technique first developed by Paul
Viola and Michael Jones, known as the Viola-Jones detector [6].
The face detector uses a Haar feature-based cascade classifier.
OpenCV detects faces using a pre-trained classifier file, which
is stored in XML format.
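
The Haar features evaluated by such a cascade are differences of pixel sums over adjacent rectangles, which the Viola-Jones approach computes quickly from an integral image. The following is a minimal plain-Python sketch of that idea only, not the OpenCV implementation; the function names and the tiny test image are hypothetical.

```python
def integral_image(img):
    """Return a (h+1) x (w+1) table where ii[y][x] is the sum of
    all pixels above and to the left of (x, y), exclusive."""
    h, w = len(img), len(img[0])
    ii = [[0] * (w + 1) for _ in range(h + 1)]
    for y in range(h):
        row_sum = 0
        for x in range(w):
            row_sum += img[y][x]
            ii[y + 1][x + 1] = ii[y][x + 1] + row_sum
    return ii

def rect_sum(ii, x, y, w, h):
    """Sum of the w x h rectangle with top-left corner (x, y),
    using four lookups in the integral image."""
    return ii[y + h][x + w] - ii[y][x + w] - ii[y + h][x] + ii[y][x]

def two_rect_feature(ii, x, y, w, h):
    """Two-rectangle Haar-like feature: left half minus right half
    of a w x h window (w is assumed to be even)."""
    half = w // 2
    return rect_sum(ii, x, y, half, h) - rect_sum(ii, x + half, y, half, h)

# Tiny illustrative image: bright left half, dark right half.
image = [
    [9, 9, 1, 1],
    [9, 9, 1, 1],
    [9, 9, 1, 1],
]
ii = integral_image(image)
feature = two_rect_feature(ii, 0, 0, 4, 3)  # 54 - 6 = 48
```

A large feature value indicates a strong vertical edge inside the window; the trained cascade combines many such features with learned thresholds.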

D. Facial Feature Extraction
The facial feature, i.e. the nose, is extracted in OpenCV using
a pre-trained nose classifier file in XML format. Once the nose
is extracted, a midpoint is drawn at the centre of the detected
nose. This nose point is taken as the corresponding feature
point in both the left and right images.
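
As a small illustration of the midpoint step, the sketch below assumes the nose detector returns a bounding box (x, y, width, height) in each image and takes the box centre as the nose point. The box values are made up for illustration, and `box_center` is a hypothetical helper.

```python
def box_center(x, y, w, h):
    """Centre of a bounding box given by its top-left corner and size."""
    return (x + w / 2.0, y + h / 2.0)

# Hypothetical nose boxes detected in the left and right images.
left_nose_box = (310, 220, 40, 30)
right_nose_box = (270, 221, 40, 30)

N_l = box_center(*left_nose_box)    # (330.0, 235.0)
N_r = box_center(*right_nose_box)   # (290.0, 236.0)

# Horizontal offset between the two nose points.
horizontal_offset = N_l[0] - N_r[0]     # 40.0
# In a well-rectified pair the two points should lie on nearly the
# same row, so a large vertical offset hints at a bad match.
row_offset = abs(N_l[1] - N_r[1])       # 1.0
```

Checking the row offset is a cheap sanity test that both detections refer to the same physical point.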

E. Depth Detection of Facial Feature
Once stereo calibration and stereo rectification are performed,
we obtain the disparity map in the form of a vector. This
disparity map is used to calculate the depth by triangulation,
a step called reprojection, and the output is a depth map.
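
Reprojection can be sketched for a single point as follows. This is a simplified illustration, assuming an ideal rectified pair in which both cameras share the focal length f (in pixels) and the principal point (cx, cy), and are separated by a baseline T. The function name and all numeric values are hypothetical; the resulting coordinates are in the units of T.

```python
def reproject_point(x, y, d, f, T, cx, cy):
    """Map a left-image pixel (x, y) with disparity d to 3D
    coordinates (X, Y, Z) in the left camera frame."""
    if d <= 0:
        raise ValueError("disparity must be positive")
    Z = f * T / d            # depth from triangulation
    X = (x - cx) * Z / f     # horizontal offset scaled by depth
    Y = (y - cy) * Z / f     # vertical offset scaled by depth
    return X, Y, Z

# Illustrative values only: a nose point at pixel (330, 235) with a
# disparity of 40 pixels, f = 400 pixels and T = 6 length units.
X, Y, Z = reproject_point(x=330.0, y=235.0, d=40.0,
                          f=400.0, T=6.0, cx=320.0, cy=240.0)
```

Applying the same mapping to every pixel of a disparity map yields the depth map mentioned above.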

The stereo calibration process eliminates radial distortion
(the bulging phenomenon of the image) and tangential distortion,
which is due to manufacturing defects. Stereo rectification
makes the left and right images row-aligned and coplanar. We now
have undistorted, aligned left and right images that are
coplanar, with exactly parallel optical axes that are a known
distance apart, and with equal focal lengths fl = fr.
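
For intuition about what removing distortion involves, the sketch below uses the widely used polynomial model with two radial coefficients (k1, k2) and two tangential coefficients (p1, p2) on normalized image coordinates. It inverts the model by fixed-point iteration, which is one common way to undistort a point. The coefficient values are hypothetical, and the code illustrates the model rather than the exact routine used in the paper.

```python
def distort(x, y, k1, k2, p1, p2):
    """Apply radial and tangential distortion to an ideal point."""
    r2 = x * x + y * y
    radial = 1.0 + k1 * r2 + k2 * r2 * r2
    xd = x * radial + 2.0 * p1 * x * y + p2 * (r2 + 2.0 * x * x)
    yd = y * radial + p1 * (r2 + 2.0 * y * y) + 2.0 * p2 * x * y
    return xd, yd

def undistort(xd, yd, k1, k2, p1, p2, iterations=20):
    """Recover the ideal point from a distorted one by iterating
    the inverse of the model, starting from the distorted point."""
    x, y = xd, yd
    for _ in range(iterations):
        r2 = x * x + y * y
        radial = 1.0 + k1 * r2 + k2 * r2 * r2
        dx = 2.0 * p1 * x * y + p2 * (r2 + 2.0 * x * x)
        dy = p1 * (r2 + 2.0 * y * y) + 2.0 * p2 * x * y
        x = (xd - dx) / radial
        y = (yd - dy) / radial
    return x, y

# Hypothetical coefficients and a point in normalized coordinates.
coeffs = dict(k1=-0.2, k2=0.05, p1=0.001, p2=-0.0005)
xd, yd = distort(0.3, -0.2, **coeffs)
x_rec, y_rec = undistort(xd, yd, **coeffs)
```

For mild distortion the iteration converges quickly, so `x_rec, y_rec` return to the original point (0.3, -0.2) to within floating-point tolerance.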

Let Nl and Nr be the horizontal positions of the nose point in
the left and right images. The depth is inversely proportional
to the disparity between the left nose point and the right nose
point. The disparity is defined as the difference between the
left nose point and the right nose point and is denoted
d = Nl - Nr. This is shown in Figure 2, and the depth Z is
derived using similar triangles.

Figure 2: Triangulation (similarity of triangles), showing the
camera centres Ol and Or, the nose points Nl and Nr, the depth Z
and the disparity d = Nl - Nr.

The equation to calculate the depth Z is given by:

$$\frac{T - (N_l - N_r)}{Z - f} = \frac{T}{Z}$$

$$Z = \frac{f\,T}{N_l - N_r}$$
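
For readers who want the intermediate algebra, the step from the similar-triangles relation to the depth formula can be filled in as follows, writing d = N_l - N_r. Here T is read as the distance between the two camera centres and f as the common focal length; this reading is an assumption, since the text does not define T explicitly.

```latex
\begin{aligned}
\frac{T - d}{Z - f} &= \frac{T}{Z} \\
Z\,(T - d) &= T\,(Z - f) \\
ZT - Zd &= ZT - Tf \\
Zd &= Tf \\
Z &= \frac{f\,T}{d} = \frac{f\,T}{N_l - N_r}
\end{aligned}
```

The last line makes the inverse proportionality explicit: halving the disparity doubles the estimated depth.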

Table 1: Results for image set

Left and right image   Face and nose detected   X-axis   Y-axis   Z-axis
Image 001              Yes                       -1.23    13.33    64.90
Image 002              Yes                       -3.66    12.30    66.56
Image 003              Yes                       -3.48    13.01    63.32
Image 004              Yes                       -2.73    11.49    64.90
Image 005              Yes                       -5.31    12.22    61.81
Image 006              Yes                       -4.95     8.43    63.31
Image 007              Yes                       -3.40    11.90    61.80
Image 008              Yes                       -6.43    11.78    60.37
Image 009              Yes                       -5.31     9.81    61.81
Image 010              Yes                        0.09    10.69    60.37
Image 011              Yes                       -1.68     9.24    45.54
Image 012              Yes                       -3.18    10.49    55.23
Image 013              Nose not detected
Image 014              Nose not detected
Image 015              Nose not detected

The table above lists the x-, y- and z-axis distances for images
captured at varying positions and different distances. Face
detection is performed on both the left and right images. After
face detection, the nose midpoint is used as the corresponding
point in the left and right images, and triangulation is applied
to detect depth.
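
As a quick way to read Table 1, the short sketch below summarizes the reported Z-axis values using only the numbers in the table. The variable names are hypothetical, and no units are assumed because the table does not state them.

```python
from statistics import mean

# Z-axis values for Image 001 to Image 012, copied from Table 1.
z_values = [64.90, 66.56, 63.32, 64.90, 61.81, 63.31,
            61.80, 60.37, 61.81, 60.37, 45.54, 55.23]
total_images = 15  # Images 013 to 015 have no depth value

# Fraction of image pairs for which a depth value was reported.
detection_rate = len(z_values) / total_images
# Average and spread of the reported depths.
z_mean = mean(z_values)
z_min, z_max = min(z_values), max(z_values)

print(f"detected: {len(z_values)}/{total_images}")
print(f"Z mean: {z_mean:.2f}, range: {z_min:.2f} to {z_max:.2f}")
```

A summary of this kind shows how many image pairs yielded a depth value and how widely the estimated depths vary across the set.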

Depth detection of a facial feature (i.e., the nose) is achieved
using the stereo imaging concept, which gives the calculated
values along the Z-axis. Depth can thus be acquired from left
and right images by extracting the corresponding feature point
from both images and then applying the triangulation method to
the acquired corresponding feature points. Depth detection can
be extended to more facial features, such as the eyes, nose,
mouth, eyebrows and jawline points, which can be used in 3D face
model generation. If depth can be estimated for the entire image
rather than a single feature, it can be applied in Free
Viewpoint Television (FTV) and Multi-View Coding (MVC).

[1] Luis Copertari, Unidad Académica de Ingeniería Eléctrica, Universidad
Autónoma de Zacatecas, Stereoscopic vision for depth perception,
Investigation Scientific, August 2007, Volume 3.
[2] Ashutosh Saxena, Jamie Schulte and Andrew Y. Ng, Depth Estimation
using Monocular and Stereo Cues, Stanford University, Stanford, CA 94305.
[3] Mikko Kytö, Mikko Nuutinen, Pirkko Oittinen, Method for measuring
stereo camera depth accuracy based on stereoscopic vision, Aalto University
School of Science and Technology, Department of Media Technology,
Otaniementie 17, Espoo, Finland.
[4] Wenzi Zhao, N. Nandhana Kumar, Effects of Camera Alignment Errors
on Stereoscopic Depth Estimates, University of Virginia, VA 22903.
[5] Gary Bradski and Adrian Kaehler, Learning OpenCV, First Edition,
September 2008.
[6] P. Viola and M. Jones, Rapid object detection using boosted cascade of
simple features, IEEE Conference on Computer Vision and Pattern
Recognition, 2001.
[7] Hua Gu, Guangda Su, Cheng Du, Feature Points Extraction from Faces,
Image and Vision Computing NZ, Palmerston North, November 2003.