1. INTRODUCTION
Human body extraction from an image gives rise to various applications such as pose detection, activity recognition, video surveillance, human tracking and so on. Detection and extraction of human bodies from an image is a challenging task. The different approaches to human body extraction are: bottom-up approaches [1], which use low-level elements, such as pixels or superpixels, and try to group them into semantic entities of higher levels; interactive methods [2], which require user input in order to differentiate the foreground from the background and are therefore not appropriate for real-world applications; and top-down approaches [3], which require high-level knowledge about the foreground, which in the case of humans is their pose.
2. PROJECT DESCRIPTION
a. Face detection: Face detection provides a strong indication of the presence of humans in an image, reduces the search space for the upper body, and gives information about skin colour. The face detection method is based on the detection of facial features, namely the eyes, the mouth and the bridge of the nose.
b. Skin colour detection: The colour of the skin in a person's face can be used to find the rest of his or her visible skin areas, making the skin detection process adaptive to each person; hence the positions of the person's hands and legs can be obtained using the person's skin colour. The skin detection method is based on a skin colour model, which involves extracting the chromatic colour from the image and removing the luminance.
c. Upper body segmentation: The torso is the most visible body part; it is connected to the face region and lies below it.
d. Lower body segmentation: The lower body lies below the upper body, which helps to determine its position.
3. FACE DETECTION
Among the requirements on the detector is real-time operation: for practical applications, at least 2 frames per second must be processed. Repeatedly resizing the input image to detect faces at different scales is time consuming. To overcome this drawback, Viola-Jones rescales the detector instead of the input image, and runs the detector several times through the image, each time with a different size. The detector is constructed by converting the input image to an integral image and using simple rectangular features.
a. Detector
The initial step of the Viola-Jones face detection algorithm is to convert the input image into an integral image. This is carried out by making each pixel equal to the sum of all pixels above and to the left of the concerned pixel. The integral image provides an easy way to calculate the sum of all pixels inside any given rectangle using just four values: the pixels in the integral image that coincide with the corners of the rectangle. The sum of pixels within rectangles of arbitrary size can therefore be calculated in constant time. The Viola-Jones face detector investigates a given input image using features consisting of rectangles. The different types of features are shown in Figure 2.
Figure 2: The different types of features. Each feature results in a single value, obtained by subtracting the sum of the pixels in the white rectangles from the sum of the pixels in the black rectangles. By observation it has been found that features evaluated on sub-windows of 24x24 pixels give satisfactory results.
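As a concrete illustration of the integral-image trick and of a two-rectangle feature, a minimal sketch (the pixel values and feature layout are invented for the example; this is not the paper's implementation):

```python
def integral_image(img):
    """Each entry holds the sum of all pixels above and to the left, inclusive."""
    h, w = len(img), len(img[0])
    ii = [[0] * w for _ in range(h)]
    for y in range(h):
        row_sum = 0
        for x in range(w):
            row_sum += img[y][x]
            ii[y][x] = row_sum + (ii[y - 1][x] if y > 0 else 0)
    return ii

def rect_sum(ii, top, left, height, width):
    """Sum of pixels inside any rectangle from just four integral-image values."""
    bottom, right = top + height - 1, left + width - 1
    total = ii[bottom][right]
    if top > 0:
        total -= ii[top - 1][right]
    if left > 0:
        total -= ii[bottom][left - 1]
    if top > 0 and left > 0:
        total += ii[top - 1][left - 1]
    return total

def two_rect_feature(ii, top, left, height, width):
    """A two-rectangle feature: sum over the white half minus sum over the black half."""
    half = width // 2
    white = rect_sum(ii, top, left, height, half)
    black = rect_sum(ii, top, left + half, height, half)
    return white - black
```

Once the integral image is built, every `rect_sum` call costs the same four lookups regardless of rectangle size, which is what makes evaluating thousands of features per sub-window affordable.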
b. AdaBoost algorithm
Among all these features, only a few are expected to give consistently high values when placed on top of a face. In order to find these features, Viola-Jones uses the AdaBoost algorithm. AdaBoost is a machine-learning boosting algorithm capable of building a strong classifier through a weighted combination of weak classifiers. Each feature is a potential weak classifier. A weak classifier is mathematically described as:

h(x, f, p, θ) = 1 if p·f(x) < p·θ, 0 otherwise

where x is a 24x24-pixel image, f is the feature that is applied, p is the polarity and θ the threshold that decides whether x is classified as a face or a non-face. The main aim of the AdaBoost algorithm is the determination of the best feature, polarity and threshold. The determination of each new weak classifier involves evaluating each feature on all the examples in order to find the best feature. The final setup has around 6000 features.
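The search for the best polarity and threshold of a single feature can be sketched as an exhaustive scan over candidate thresholds, minimizing the weighted classification error. This is an illustrative decision-stump trainer under assumed inputs (precomputed feature values, 0/1 labels and normalized weights); it is not the authors' code, and the function name is hypothetical:

```python
def best_stump(feature_values, labels, weights):
    """Find the polarity p and threshold theta minimizing the weighted error
    sum_i w_i * |h(x_i) - y_i| for one feature, where
    h(x) = 1 if p*f(x) < p*theta else 0."""
    best = (float("inf"), None, None)  # (error, polarity, threshold)
    candidates = sorted(set(feature_values))
    # Try every distinct feature value (plus one beyond) as a threshold,
    # with both polarities.
    for theta in candidates + [candidates[-1] + 1]:
        for p in (+1, -1):
            err = 0.0
            for f, y, w in zip(feature_values, labels, weights):
                h = 1 if p * f < p * theta else 0
                err += w * abs(h - y)
            if err < best[0]:
                best = (err, p, theta)
    return best
```

AdaBoost would run this search over all ~6000 features at every round and keep the feature whose stump achieves the lowest weighted error.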
The AdaBoost learning procedure is as follows:
a. Given example images (x1, y1), ..., (xn, yn), where yi = 0, 1 for negative and positive examples respectively.
b. Initialize the weights.
c. For t = 1, ..., T:
1) Normalize the weights:

wt,i ← wt,i / Σj=1..n wt,j    (4)

2) Select the best weak classifier with respect to the weighted error:

εt = min(f,p,θ) Σi wi |h(xi, f, p, θ) − yi|    (5)

3) Define ht(x) = h(x, ft, pt, θt), where ft, pt and θt are the minimizers of εt.
4) Update the weights:

wt+1,i = wt,i βt^(1−ei)    (6)

where ei = 0 if example xi is classified correctly and ei = 1 otherwise, and βt = εt / (1 − εt).
d. The final strong classifier is:

C(x) = 1 if Σt=1..T αt ht(x) ≥ ½ Σt=1..T αt, and 0 otherwise, where αt = log(1/βt).    (7)

c. Cascade classifiers
The Viola-Jones face detection algorithm scans the detector several times through the same image, each time with a different size. Since it is faster to discard a non-face than to find a face, the algorithm should concentrate on discarding non-faces quickly. With this in mind, the concept of the cascaded classifier arises. The cascade classifier is made up of stages, each of which contains a strong classifier. The job of each stage is to determine whether a given sub-window is definitely not a face or maybe a face. When a given stage classifies a sub-window as a non-face, it is immediately discarded. On the other hand, a sub-window classified as a maybe-face is passed on to the next stage in the cascade. It follows that the more stages a given sub-window passes, the higher the chance that it actually contains a face. The concept is explained in Figure 6. The cascade detector has more than 6000 features distributed over 38 stages, with 1, 10, 25, 25 and 50 features in the first five stages.

a. Skin colour model
The use of YCbCr for skin colour detection was first proposed by Chai and Ngan [5]. Facial colour is very uniform, and therefore the skin colour pixels belonging to the facial region appear as a large cluster of white pixels. The RGB representation of colour images is not appropriate for characterizing skin colour, because the components r, g and b represent not only colour but also luminance. Luminance varies across a person's face due to the ambient lighting and hence is not a reliable measure for separating skin from non-skin regions. Luminance can be eliminated from the colour representation in the chromatic colour space. The chrominance components are:

Cb = B − Y    (8)
Cr = R − Y    (9)

According to the authors, the most suitable ranges of Cb and Cr that can be used to represent skin are given in [5]. Even though the skin colours of different people appear to vary over a wide range, they differ much less in colour than in brightness; that is, the skin colours of different people are very close, but they differ mainly in intensity.
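A per-pixel skin test in the chrominance plane can be sketched as follows. Note the assumptions: the conversion uses the digital BT.601 form of YCbCr (scaled, with a 128 offset), whereas equations (8)-(9) give the unscaled differences, and the Cb/Cr ranges used here are the values commonly attributed to Chai and Ngan [5], not taken from this paper:

```python
def rgb_to_ycbcr(r, g, b):
    """BT.601 digital YCbCr with a 128 chrominance offset (an assumed,
    common convention; the paper's eqs. (8)-(9) omit scale and offset)."""
    y = 0.299 * r + 0.587 * g + 0.114 * b
    cb = 128 + 0.564 * (b - y)
    cr = 128 + 0.713 * (r - y)
    return y, cb, cr

def is_skin(r, g, b, cb_range=(77, 127), cr_range=(133, 173)):
    """Classify one pixel as skin if its Cb and Cr fall in fixed ranges.
    The default ranges are assumed illustrative values, commonly quoted
    for the Chai-Ngan skin-colour map."""
    _, cb, cr = rgb_to_ycbcr(r, g, b)
    return cb_range[0] <= cb <= cb_range[1] and cr_range[0] <= cr <= cr_range[1]
```

Because the test ignores Y entirely, the same thresholds apply to bright and shadowed skin alike, which is exactly the motivation for dropping luminance given above.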
b. Morphological Operations
Facial colour is very uniform, so the skin colour pixels belonging to the facial region appear as a large cluster; the skin colour pixels belonging to the background should therefore be eliminated, and this elimination can be done using morphological operations. Morphological operations can reduce image data while preserving essential shape characteristics and can eliminate irrelevancies. The morphological open operation, erosion followed by dilation, is applied to the binary image, with '1' representing skin pixels and '0' representing non-skin pixels, in order to separate skin areas that are closely connected. Morphological erosion uses a structuring element of a particular disk size. Dilation is then applied to regrow the binary skin areas that are lost in the erosion step.

Erosion:
A ⊖ B = {z | (B)z ⊆ A}    (10)

Dilation:
A ⊕ B = {z | (B̂)z ∩ A ≠ ∅}    (11)

Opening:
A ∘ B = (A ⊖ B) ⊕ B    (12)
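Equations (10)-(12) can be sketched directly in code. This is a minimal binary-morphology illustration on nested lists, with a small cross-shaped structuring element standing in for the disk mentioned above:

```python
CROSS = [(0, 0), (-1, 0), (1, 0), (0, -1), (0, 1)]  # small disk-like element

def erode(img, se_offsets):
    """Erosion (eq. (10)): keep z only if the structuring element,
    translated to z, fits entirely inside the foreground."""
    h, w = len(img), len(img[0])
    out = [[0] * w for _ in range(h)]
    for y in range(h):
        for x in range(w):
            if all(0 <= y + dy < h and 0 <= x + dx < w and img[y + dy][x + dx]
                   for dy, dx in se_offsets):
                out[y][x] = 1
    return out

def dilate(img, se_offsets):
    """Dilation (eq. (11)): set z if the reflected structuring element,
    translated to z, overlaps the foreground in at least one pixel."""
    h, w = len(img), len(img[0])
    out = [[0] * w for _ in range(h)]
    for y in range(h):
        for x in range(w):
            for dy, dx in se_offsets:
                ny, nx = y - dy, x - dx  # reflection of the structuring element
                if 0 <= ny < h and 0 <= nx < w and img[ny][nx]:
                    out[y][x] = 1
                    break
    return out

def opening(img, se_offsets):
    """Opening (eq. (12)): erosion followed by dilation."""
    return dilate(erode(img, se_offsets), se_offsets)
```

Opening removes isolated noise pixels that are smaller than the structuring element while (approximately) restoring the larger skin blobs that erosion shrank.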
c. Region Labeling
The binary image obtained from the morphological operation needs to be labeled so that each clustered group of pixels can be identified as a single region, allowing each region to be analyzed further to determine whether it is a skin region or not. Each region is labeled 1, 2, 3 and so on. Pixels with value 0 are kept the same. At the end, the number of labels gives the number of regions in the segmented image.
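The labeling step described above can be sketched as a simple flood fill; this is an illustrative stand-in for the toolbox labeling routine, not the paper's code:

```python
from collections import deque

def label_regions(img):
    """4-connected component labeling: each cluster of 1-pixels gets a
    label 1, 2, 3, ...; 0-pixels keep the value 0."""
    h, w = len(img), len(img[0])
    labels = [[0] * w for _ in range(h)]
    current = 0
    for y in range(h):
        for x in range(w):
            if img[y][x] and not labels[y][x]:
                current += 1  # start a new region
                queue = deque([(y, x)])
                labels[y][x] = current
                while queue:
                    cy, cx = queue.popleft()
                    for ny, nx in ((cy - 1, cx), (cy + 1, cx), (cy, cx - 1), (cy, cx + 1)):
                        if 0 <= ny < h and 0 <= nx < w and img[ny][nx] and not labels[ny][nx]:
                            labels[ny][nx] = current
                            queue.append((ny, nx))
    return labels, current  # the label count is the number of regions
```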
d. Skin region
A skin region is described as a closed region in the image, which can have 0, 1 or more holes inside it [6]. All holes in a binary image are black and have a pixel value of zero. The colour boundary of the skin is represented by pixels with value 1; for binary images it can also be described as a set of connected components within an image. Usually the face region contains holes. The imfill command is used to fill any holes in the skin region.
To determine the number of holes inside a region, the Euler number of the region is defined as:

E = C − H    (13)

where E is the Euler number, C is the number of connected components and H is the number of holes.
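Equation (13) can be sketched by counting components twice: once over the foreground (giving C) and once over the background, where every background component that does not touch the image border is a hole (giving H). This is a simplified illustration that uses 4-connectivity for both foreground and background:

```python
from collections import deque

def count_components(img, value, h, w):
    """Count 4-connected components of pixels equal to `value`."""
    seen = [[False] * w for _ in range(h)]
    count = 0
    for y in range(h):
        for x in range(w):
            if img[y][x] == value and not seen[y][x]:
                count += 1
                queue = deque([(y, x)])
                seen[y][x] = True
                while queue:
                    cy, cx = queue.popleft()
                    for ny, nx in ((cy - 1, cx), (cy + 1, cx), (cy, cx - 1), (cy, cx + 1)):
                        if 0 <= ny < h and 0 <= nx < w and img[ny][nx] == value and not seen[ny][nx]:
                            seen[ny][nx] = True
                            queue.append((ny, nx))
    return count

def euler_number(img):
    """E = C - H (eq. (13)). The image is padded with a background frame,
    so exactly one background component touches the border; the remaining
    background components are holes."""
    h, w = len(img), len(img[0])
    padded = [[0] * (w + 2)] + [[0] + list(row) + [0] for row in img] + [[0] * (w + 2)]
    c = count_components(padded, 1, h + 2, w + 2)
    holes = count_components(padded, 0, h + 2, w + 2) - 1
    return c - holes
```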
A gray-scale image can be regarded as a topographic surface in which each local minimum has a hole, and the surface is immersed in water. Then, starting from the minimum of lowest intensity value, the water progressively fills up the different catchment basins of the image (surface). Conceptually, the algorithm then builds a dam to prevent the water coming from two or more different local minima from merging. At the end of this immersion process, each local minimum is completely enclosed by dams, which correspond to the watersheds of the image.
Let M1, M2, ..., MR be the sets denoting the coordinates of the regional minima of an image g(x, y), where g(x, y) is the pixel value at coordinate (x, y). Let C(Mi) be the set of coordinates of the catchment basin associated with regional minimum Mi, and let T[n] be the set of coordinates (s, t) for which g(s, t) < n:

T[n] = {(s, t) | g(s, t) < n}    (14)

The process of the watershed algorithm is discussed below:
Step 1: Find the minimum pixel value of g(x, y) and assign it to min. Flooding is initialized with n = min + 1. Let Cn(Mi) be the set of coordinates of the catchment basin associated with minimum Mi that are flooded at stage n:

Cn(Mi) = C(Mi) ∩ T[n]    (15)

Let C[n] denote the union of the flooded catchment basins at stage n:

C[n] = ∪i=1..R Cn(Mi)    (16)
Step 2: Set n = n + 1 and consider each connected component q of T[n]. There are three possibilities:
a. If q ∩ C[n − 1] is empty, a new minimum is encountered, and q is incorporated into C[n − 1] to form C[n].
b. If q ∩ C[n − 1] contains exactly one connected component of C[n − 1], q lies within the catchment basin of some regional minimum, and q is incorporated into C[n − 1] to form C[n].
c. If q ∩ C[n − 1] contains more than one connected component of C[n − 1], all or part of a ridge separating two or more catchment basins has been encountered, so the points of the ridge must be found and set as a "dam".
Step 2 is repeated until flooding is complete.
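A heavily simplified illustration of the flooding process, on a 1-D profile rather than an image: at each level n we form T[n] (eq. (14)) and count its connected components. When the count drops between consecutive levels, a component of T[n] touches two basins at once, which is exactly case (c), where a dam would be built. This sketch only detects merges; it does not construct the dams:

```python
def threshold_components(g, n):
    """Connected runs of positions with g < n: the 1-D analogue of T[n]."""
    comps, run = [], []
    for i, v in enumerate(g):
        if v < n:
            run.append(i)
        elif run:
            comps.append(run)
            run = []
    if run:
        comps.append(run)
    return comps

def flood_levels(g):
    """Number of flooded components at each level n = min+1 .. max+1.
    A decrease between consecutive levels signals case (c): two or more
    catchment basins meeting, i.e. a watershed ridge."""
    return {n: len(threshold_components(g, n))
            for n in range(min(g) + 1, max(g) + 2)}
```

For the profile [3, 1, 2, 0, 4] there are two minima (values 1 and 0); flooding shows two separate components at level 2 that merge into one at level 3, so a dam would be placed at the ridge pixel of value 2 between them.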
A ⊕ B = {z | (B̂)z ∩ A ≠ ∅}    (17)

If set B is reflected about its origin and shifted by z, then the dilation of A by B is the set of all displacements z such that B̂ and A have at least one common element. The dilation operation makes the boundary of the object grow, to an extent decided by the shape and size of the structuring element. This effect is analogous to the smoothing operation performed in spatial low-pass filtering. The operation can be used to fill "holes" in an object of an image. The other consequence of performing the dilation operation is a blurring effect.
A ⊖ B = {z | (B)z ⊆ A}    (18)

The erosion of image A by structuring element B is the set of all points z such that the structuring element B, translated by z, is a subset of the image. The erosion operation is the opposite of the dilation operation: it causes the boundary of the object to shrink, to an extent decided by the shape and size of the structuring element. The thinning effect is equivalent to spatial high-pass filtering. The erosion operation removes structures that are smaller than the structuring element, and can therefore be used to remove noise.
The morphological reconstruction RG(F) of a mask image G from a marker image F, where the marker must be a subset of the mask (F ⊆ G), is computed iteratively:
1. Initialize h1 = F.
2. Compute hk+1 = (hk ⊕ B) ∩ G.
3. Repeat step 2 until hk+1 = hk.
4. RG(F) = hk+1.
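The reconstruction iteration above can be sketched as follows; this is a minimal binary version with an assumed 3x3 cross structuring element, not the toolbox implementation:

```python
def dilate(img, h, w):
    """One binary dilation with a 3x3 cross structuring element."""
    out = [[0] * w for _ in range(h)]
    for y in range(h):
        for x in range(w):
            for dy, dx in ((0, 0), (-1, 0), (1, 0), (0, -1), (0, 1)):
                ny, nx = y + dy, x + dx
                if 0 <= ny < h and 0 <= nx < w and img[ny][nx]:
                    out[y][x] = 1
                    break
    return out

def reconstruct(marker, mask):
    """Morphological reconstruction R_G(F): iterate h_{k+1} = (h_k dilated) ∩ G
    until stable. Assumes the marker F is a subset of the mask G."""
    h, w = len(mask), len(mask[0])
    hk = [row[:] for row in marker]
    while True:
        grown = dilate(hk, h, w)
        nxt = [[grown[y][x] & mask[y][x] for x in range(w)] for y in range(h)]
        if nxt == hk:
            return nxt
        hk = nxt
```

Because growth is clipped by the mask at every step, only the mask components that contain a marker pixel survive; all other components are removed.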
4. RESULT
a. Face detection is done using the Viola-Jones algorithm from the Computer Vision Toolbox in MATLAB 2013a. Figure 6 shows the face detected in an image. A bounding box is returned around the detected face. The bounding box is a four-element vector [x, y, width, height] that specifies, in pixels, the upper-left corner and the size of the box.
b. Skin regions of the person's face and the remaining visible skin areas are exposed, indicated by white pixels.
Figure 10: Markers and object boundaries superimposed on the original image.
c. From this image the upper body is segmented, denoted by the yellow region in the image, and the lower body by the blue region.
5. CONCLUSION
a. Face detection is done using the Viola-Jones algorithm from the Computer Vision Toolbox in MATLAB 2013a. Face detection shows the presence of a human in an image and gives information about skin colour. This information guides the search for the upper body, which in turn leads to the search for the lower body.
c. Upper body and lower body segmentation are carried out using the watershed segmentation algorithm.
6. REFERENCES
[1] A. Tsitsoulis and N. G. Bourbakis, "A Methodology for Extracting Standing Human Bodies from Single Images."
[2] L. Zhao and L. S. Davis, "Iterative figure-ground discrimination," in Proc. 17th Int. Conf. Pattern Recognition, pp. 67-70, 2004.
[3] S. Li, H. Lu, and L. Zhang, "Arbitrary body segmentation in static images," Pattern Recognition, vol. 45, no. 9, pp. 3402-3413, 2012.
[4] O. H. Jensen, "Implementing the Viola-Jones Face Detection Algorithm."
[5] D. Chai and K. N. Ngan, "Face segmentation using skin-color map in videophone applications."
[7] S. Rujikietgumjorn, "Segmentation methods for multiple body parts," July 31, 2008.