
EXTRACTION OF HUMAN BODIES FROM AN IMAGE

Department of Electronics and Communication, St. Joseph Engineering College, Vamanjoor
ABSTRACT: Detection and extraction of human bodies in an image is a challenging task that can facilitate applications such as activity recognition and human pose detection. This project focuses on extracting human bodies from an image and separating them into parts. The extracted parts are classified into body regions such as legs, arms, torso, and face. Face detection provides a strong indication of the presence of humans in an image and gives information about skin color and the upper body. Face detection is performed using the Viola-Jones algorithm. The color of the skin in a person's face can be used to match the rest of his or her visible skin areas, making the skin detection process adaptive to each person; the skin color model involves extraction of chromatic color from the image and removal of luminance. The upper and lower body are extracted using the watershed segmentation method.

Key words: Viola-Jones algorithm, YCbCr color model, watershed segmentation.

1. INTRODUCTION

Human body extraction from an image gives rise to various applications such as pose detection, activity recognition, video surveillance, human tracking, and so on. Detection and extraction of human bodies from an image is challenging due to several factors, including shading, occlusions, background clutter, and the high degree of human body deformability. This project focuses on extracting a human body from an image and separating it into parts. The extracted parts are classified into body regions such as legs, arms, torso, and head. The result can be further applied to many useful applications. One useful application is parts recognition: once the parts are recognized, they can be analyzed for gesture types; for instance, the position of the body parts can be interpreted as sitting, standing, or lying.

There are different approaches to human body extraction. Bottom-up approaches [1] use low-level elements, such as pixels or superpixels, and try to group them into semantic entities of higher levels. Interactive methods [2] require user input in order to differentiate the foreground from the background, so they are not appropriate for real-world applications. Top-down approaches [3] require high-level knowledge about the foreground, which in the case of humans is their pose.

Extraction of the human body can be divided into four sequential steps:

a. Face detection: Face detection indicates the presence of a human in an image and gives information about skin color. This information guides the search for the upper body, which in turn leads to the search for the lower body.

b. Skin detection: The color of the skin in a person's face can be used to match the rest of his or her visible skin areas, making the skin detection process adaptive to each person. Hence the positions of the person's hands and legs can be obtained using the person's skin color.

c. Upper body segmentation: The torso is the most visible body part, connected to the face region and located below it.

d. Lower body segmentation: The lower body lies below the upper body, which helps to determine the lower body's position.

2. PROJECT DESCRIPTION

a. Face detection: Face detection provides a strong indication of the presence of humans in an image, reduces the search space for the upper body, and gives information about skin color. The face detection method is based on detecting facial features, namely the eyes, mouth, and bridge of the nose. Localization of the face region is performed using the Viola-Jones algorithm as given in [4], which achieves both high performance and speed.
The characteristics of the Viola-Jones algorithm are:

1. Robust: very high detection rate.

2. Real time: for practical applications, at least 2 frames per second must be processed.

3. Face detection: the goal is to distinguish faces from non-faces.

b. Skin colour detection: The color of the skin in a person's face can be used to find the rest of his or her visible skin areas, making the skin detection process adaptive to each person. The skin detection method is based on a skin color model which involves extraction of chromatic color from the image and removal of luminance.

c. Upper and lower body segmentation: The torso region is extracted using the watershed segmentation algorithm. Watershed segmentation is used to separate touching objects in an image, and the watershed transform is often applied to this problem. The watershed transform finds "catchment basins" and "watershed ridge lines" in an image by treating it as a surface where light pixels are high and dark pixels are low.

3. FACE DETECTION

Face detection provides a strong indication of the presence of humans in an image, reduces the search space for the upper body, and provides information about skin color. The face detection method is based on detecting facial features, namely the eyes, mouth, and bridge of the nose. Localization of the face region is performed using the Viola-Jones algorithm as given in [4], which achieves both high performance and speed.

The characteristics of the Viola-Jones algorithm which make it a good detection algorithm are:

a. Robust: very high detection rate.

b. Real time: for practical applications, at least 2 frames per second must be processed.

c. Face detection: the goal is to distinguish faces from non-faces.

3.1 Viola-Jones Algorithm:

The basic concept of the Viola-Jones algorithm is to scan a detector capable of detecting faces across a given input image. The basic image processing approach would be to rescale the input image to different sizes and then move a fixed-size detector through these images. This basic approach is time consuming, since it involves resizing the input image multiple times. To overcome this drawback, Viola-Jones rescales the detector instead of the input image and runs the detector several times through the image, each time with a different size. The detector is constructed by converting the input image to an integral image and using simple rectangular features.

a. Detector

The initial step of the Viola-Jones face detection algorithm is to convert the input image into an integral image. This is done by making each pixel equal to the sum of all pixels above and to the left of the concerned pixel. The integral image makes it possible to calculate the sum of all pixels inside any given rectangle using only four values: the values of the integral image that coincide with the corners of the rectangle.

Sum of rectangle = D - (B + C) + A    (1)

Since both rectangle B and rectangle C include rectangle A, the sum of A has to be added back to the calculation.
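As an illustration, a minimal MATLAB sketch of equation (1) is given below; the image filename and the rectangle corners are assumed values, and the integral image is padded with a zero row and column so that rectangles touching the border are handled.

gray = im2double(rgb2gray(imread('person.jpg')));  % assumed input image
ii   = cumsum(cumsum(gray, 1), 2);                 % integral image
iiP  = padarray(ii, [1 1], 0, 'pre');              % zero row/column for border rectangles

r1 = 10; c1 = 20; r2 = 33; c2 = 43;                % example 24x24 rectangle
A = iiP(r1,   c1  );                               % integral value above-left of the rectangle
B = iiP(r1,   c2+1);                               % above-right
C = iiP(r2+1, c1  );                               % below-left
D = iiP(r2+1, c2+1);                               % below-right
rectSum = D - (B + C) + A;                         % equation (1)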

This shows how the sum of pixels within rectangles of arbitrary size can be calculated in constant time. The Viola-Jones face detector investigates a given input image using features consisting of rectangles. The different types of features are shown in Figure 2.

Figure 2: The different types of features.

Each feature results in a single value which is obtained by subtracting the sum of the white rectangles from the sum of the black rectangles. By observation it has been found that these features applied to an image of 24x24 pixels give satisfactory results.

b. AdaBoost algorithm

Among all these features, only a few are expected to give fairly high values when placed on top of a face. In order to find these features, Viola-Jones uses the AdaBoost algorithm. AdaBoost is a machine learning boosting algorithm capable of building a strong classifier through a weighted combination of weak classifiers. Each feature is a potential weak classifier. A weak classifier is mathematically described as:

h(x, f, p, θ) = 1 if p·f(x) > p·θ, and 0 otherwise    (2)

where x is a 24x24 pixel image, f is the feature that is applied, p is the polarity, and θ is the threshold that decides whether x is classified as a face or a non-face. The main aim of the AdaBoost algorithm is the determination of the best feature, polarity, and threshold. The determination of each new weak classifier involves evaluating each feature on all the training examples in order to find the best feature. The final setup has around 6000 features.
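A hedged MATLAB sketch of the weak classifier in equation (2) is shown below; f is assumed to be a function handle that returns the value of one Haar-like feature on a 24x24 sub-window x, and the file name weakClassifier.m is illustrative.

% weakClassifier.m -- decision stump of equation (2)
function label = weakClassifier(x, f, p, theta)
    % returns 1 (face-like) when p*f(x) > p*theta, and 0 otherwise
    if p * f(x) > p * theta
        label = 1;
    else
        label = 0;
    end
end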

The boosting procedure is as follows:

a. Given example images (x1, y1), ..., (xn, yn), where yi = 0, 1 for negative and positive examples respectively.

b. Initialize the weights

w1,i = 1/(2m) for yi = 0 and w1,i = 1/(2l) for yi = 1,    (3)

where m and l are the numbers of negative and positive examples respectively.

c. For t = 1, ..., T:

1) Normalize the weights:

wt,i ← wt,i / Σj=1..n wt,j    (4)

2) Select the best weak classifier with respect to the weighted error:

εt = min f,p,θ Σi wi |h(xi, f, p, θ) − yi|    (5)

3) Define ht(x) = h(x, ft, pt, θt), where ft, pt, θt are the minimizers of εt.

4) Update the weights:

wt+1,i = wt,i βt^(1−ei)    (6)

where ei = 0 if example xi is classified correctly, ei = 1 otherwise, and βt = εt / (1 − εt).

d. The final strong classifier is:

C(x) = Σt=1..T αt ht(x)    (7)

where αt = log(1/βt).
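The listing above can be sketched in MATLAB as follows. The sketch assumes that featVals(j, i) already holds the value of feature j on training example i, that y(i) in {0, 1} are the labels, and that findBestStump is a hypothetical helper returning the best polarity, threshold, weighted error, and predictions for a single feature; T is an assumed number of boosting rounds.

[numFeat, numEx] = size(featVals);
m = sum(y == 0);  l = sum(y == 1);                % numbers of negative / positive examples
w = zeros(1, numEx);
w(y == 0) = 1 / (2*m);                            % step b: initialise the weights
w(y == 1) = 1 / (2*l);

T = 50;                                           % assumed number of boosting rounds
alpha = zeros(1, T);  chosen = cell(1, T);
for t = 1:T
    w = w / sum(w);                               % step c.1: normalise the weights
    bestErr = Inf;
    for j = 1:numFeat                             % step c.2: pick the best weak classifier
        [p, theta, err, h] = findBestStump(featVals(j, :), y, w);  % hypothetical helper
        if err < bestErr
            bestErr = err;
            chosen{t} = struct('feature', j, 'p', p, 'theta', theta);  % step c.3
            ht = h;                               % predictions of the chosen classifier
        end
    end
    beta = bestErr / (1 - bestErr);               % step c.4: update the weights
    e = double(ht ~= y);                          % e_i = 0 when example i is classified correctly
    w = w .* beta .^ (1 - e);
    alpha(t) = log(1 / beta);                     % weight of weak classifier t in equation (7)
end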

c. Cascade classifiers

The Viola-Jones face detection algorithm scans the detector several times through the same image, each time with a different size. Since the vast majority of evaluated sub-windows do not contain faces, it is faster to discard a non-face than to confirm a face. With this in mind, the concept of the cascaded classifier arises. The cascade classifier is made up of stages, each of which contains a strong classifier. The job of each stage is to determine whether a given sub-window is definitely not a face or may be a face. When a given stage classifies the sub-window as a non-face, it is immediately discarded. On the other hand, a sub-window classified as a maybe-face is passed on to the next stage in the cascade. It follows that the more stages a given sub-window passes, the higher the chance that it actually contains a face. The concept is illustrated in Figure 3. The cascade detector has over 6000 features spread over 38 stages, with 1, 10, 25, 25 and 50 features in the first five stages.

Figure 3: The cascaded classifier
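A minimal MATLAB sketch of the cascade's early-rejection logic is given below, assuming evaluateStage is a hypothetical function that applies one stage's strong classifier to a sub-window and returns true for "maybe a face".

function isFace = cascadeClassify(window, stages)
    % reject the sub-window as soon as any stage labels it a non-face
    isFace = true;
    for s = 1:numel(stages)
        if ~evaluateStage(stages{s}, window)   % hypothetical per-stage strong classifier
            isFace = false;                    % definitely not a face: discard immediately
            return;
        end
    end
    % the sub-window survived every stage, so it very likely contains a face
end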

4. SKIN DETECTION

a. Skin color model

The use of YCbCr for skin color detection was first proposed by Chai and Ngan in [5]. Facial color is very uniform, and therefore the skin color pixels belonging to the facial region appear as a large cluster of white pixels. The RGB representation of color images is not appropriate for characterizing skin color, because the R, G, B components represent not only color but also luminance. Luminance varies across a person's face due to the ambient lighting and hence is not a reliable measure for isolating skin from non-skin regions. Luminance can be eliminated from the color representation in the chromatic color space; chromatic colors, also known as "pure" colors, are colors in the absence of luminance. The conversion of RGB to YCbCr is done by equation (8):

Y = 0.299R + 0.587G + 0.114B
Cb = B - Y    (8)
Cr = R - Y

According to the authors, the most suitable ranges of Cb and Cr that can be used to represent skin color pixels are shown in equation (9):

77 <= Cb <= 127;  133 <= Cr <= 173    (9)

Even though the skin colors of different people appear to vary over a wide range, they differ much less in color than in brightness; that is, the skin colors of different people are very close, but they differ mainly in intensity.
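A short MATLAB sketch of the skin mask defined by equation (9) is given below. Note that MATLAB's built-in rgb2ycbcr applies the full ITU-R BT.601 conversion rather than the simplified Cb = B - Y, Cr = R - Y of equation (8); the thresholds from [5] are stated for that standard 8-bit representation, and the image filename is an assumption.

rgb = imread('person.jpg');                       % assumed input image
ycc = rgb2ycbcr(rgb);                             % luminance Y, chrominance Cb and Cr
Cb  = ycc(:, :, 2);
Cr  = ycc(:, :, 3);
skinMask = Cb >= 77 & Cb <= 127 & Cr >= 133 & Cr <= 173;   % equation (9)
imshow(skinMask);                                 % skin pixels appear as a cluster of white pixels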

b. Morphological Operations

Facial color is very uniform, so the skin color pixels belonging to the facial region appear in a large cluster; the skin color pixels belonging to the background should therefore be eliminated, and this elimination can be done using morphological operations. Morphological operations can reduce image data while preserving essential shape characteristics and can eliminate irrelevancies. The morphological open operation consists of erosion followed by dilation applied to the binary image, with '1' representing skin pixels and '0' representing non-skin pixels, in order to separate skin areas which are closely connected. The erosion uses a structuring element of a particular disk size. The dilation is applied to regrow the binary skin areas which are lost due to the erosion step.

Erosion:

A ⊖ B = {z | (B)z ⊆ A}    (10)

Dilation:

A ⊕ B = {z | (B̂)z ∩ A ≠ ∅}    (11)

Morphological open:

A ∘ B = (A ⊖ B) ⊕ B    (12)
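In MATLAB the open of equation (12) is available directly as imopen; a small sketch applied to the binary skin mask follows, where the disk radius is an assumed value that would be tuned to the image size.

se     = strel('disk', 3);                        % structuring element B (assumed radius)
opened = imopen(skinMask, se);                    % erosion followed by dilation, equation (12)
% equivalently: opened = imdilate(imerode(skinMask, se), se);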

c. Region Labeling

The binary image obtained from the morphological operations needs to be labeled so that each clustered group of pixels can be identified as a single region, allowing each region to be analyzed further to determine whether or not it is a skin region. Each region is labeled as 1, 2, 3, and so on, while pixels with value 0 are kept the same. At the end, the number of labels gives the number of regions in the segmented image.

d. Skin region

A skin region is described as a closed region in the image, which can have 0, 1, or more holes inside it [6]. All holes in a binary image are black and have a pixel value of zero. The boundary of a skin region is represented by pixels with value 1; for binary images, a region can also be described as a set of connected components within the image. Usually the face region contains holes. The imfill command is used to fill any holes in the skin region.

To determine the number of holes inside a region, the Euler number of the region is defined as:

E = C - H    (13)

where E is the Euler number, C is the number of connected components, and H is the number of holes in the region.
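The labeling and hole analysis can be sketched in MATLAB as below; bwlabel implements the region labeling of subsection c, bweuler returns the Euler number of equation (13), and treating "at least one hole" as face-like is an illustrative criterion rather than necessarily the exact rule used here.

[labels, numRegions] = bwlabel(opened);           % label connected skin regions 1, 2, 3, ...
for k = 1:numRegions
    region = (labels == k);                       % one connected component, so C = 1
    E = bweuler(region);                          % Euler number E = C - H, equation (13)
    numHoles = 1 - E;                             % hence H = 1 - E for a single component
    if numHoles >= 1                              % face regions usually contain holes (eyes, mouth)
        region = imfill(region, 'holes');         % imfill closes the holes before further analysis
    end
end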

5. UPPER BODY AND LOWER BODY SEGMENTATION

Watershed segmentation is used to separate touching objects in an image [7], and the watershed transform is often applied to this problem. The watershed transform finds "catchment basins" and "watershed ridge lines" in an image by treating it as a surface where light pixels are high and dark pixels are low.

Each local minimum of a gray-scale image, regarded as a surface, is given a hole, and the surface is immersed in water. Then, starting from the minima of lowest intensity value, the water progressively fills up the different catchment basins of the image (surface). Conceptually, the algorithm then builds a dam to prevent the water coming from two or more different local minima from merging. At the end of this immersion process, each local minimum is totally enclosed by dams corresponding to the watersheds of the image.

Figure 4: Block diagram of upper and lower body segmentation.

The gradient of the image is computed before applying the watershed. The characteristic of a pixel is compared with the neighboring pixels and, if found similar, the pixels are added to form a region. The process is carried out until the edge of the region is found or the neighboring regions are about to merge.

The following are the steps used in the algorithm [8].

Let M1, M2, ..., MR be sets denoting the coordinates of the regional minima of an image g(x, y), where g(x, y) is the pixel value at coordinate (x, y). Let C(Mi) be the coordinates of the catchment basin associated with regional minimum Mi, and let T[n] be the set of coordinates (s, t) for which g(s, t) < n:

T[n] = {(s, t) | g(s, t) < n}    (14)

The watershed algorithm then proceeds as follows:

Step 1: Find the boundary values of the pixels of g(x, y) and note the minimum value, min. Flooding is done by initializing n = min + 1. Let Cn(Mi) be the coordinates of the catchment basin associated with minimum Mi that are flooded at stage n.

Step 2: Compute the catchment basins:

Cn(Mi) = C(Mi) ∩ T[n]    (15)

Cn(Mi) is 1 at location (x, y) if (x, y) ∈ C(Mi) AND (x, y) ∈ T[n]; otherwise Cn(Mi) = 0.

Let C[n] denote the union of the flooded catchment basins at stage n:

C[n] = ∪i=1..R Cn(Mi)    (16)

Set n = n + 1.

Step 3: Derive the set Q[n] of connected components in T[n]. For each connected component q ∈ Q[n], there are three conditions:

a. If q ∩ C[n - 1] is empty, connected component q is incorporated into C[n - 1] to form C[n], because it represents a newly encountered minimum.

b. If q ∩ C[n - 1] contains one connected component of C[n - 1], q is incorporated into C[n - 1] to form C[n], because q lies within the catchment basin of some regional minimum.

c. If q ∩ C[n - 1] contains more than one connected component of C[n - 1], q represents all or part of a ridge separating two or more catchment basins, so the points of the ridge are found and set as a "dam".
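In MATLAB the immersion process above is implemented by the watershed function, typically applied to the gradient magnitude of the image; a minimal sketch follows (the filename is an assumption). Applied in this plain form the transform tends to over-segment, which motivates the markers discussed in Section 5.1.

gray = rgb2gray(imread('person.jpg'));            % assumed input image
grad = imgradient(gray);                          % gradient magnitude, used as g(x, y)
L    = watershed(grad);                           % label matrix; zeros mark the watershed ridge lines
imshow(label2rgb(L, 'jet', 'w', 'shuffle'));      % visualise the catchment basins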

5.1 Morphological operations.

Morphological operations are based on the ordering of pixels rather than on their numerical values. Morphological operations use a binary image of very small size, called a structuring element, to process the test image. Dilation and erosion are the two fundamental morphological operations, and any other morphological operation, such as opening, closing, or reconstruction, is a combination of these two.

a. Dilation: The dilation of an image A by structuring element B is defined as

A ⊕ B = {z | (B̂)z ∩ A ≠ ∅}    (17)

If set B is reflected about its origin and shifted by z, then the dilation of A by B is the set of all displacements z such that B and A have at least one common element. The dilation operation makes the boundary of the object grow, to an extent decided by the shape and size of the structuring element. This effect is analogous to the smoothing operation performed in spatial low-pass filtering. This operation is used to fill "holes" in the object of an image. The other consequence of performing the dilation operation is a blurring effect.

b. Erosion: The erosion of an image A by structuring element B is defined as

A ⊖ B = {z | (B)z ⊆ A}    (18)

The erosion of image A by structuring element B is the set of all points z such that B, translated by z, is a subset of A. The erosion operation is the counterpart of the dilation operation: it causes the boundary of the object to shrink, to an extent decided by the shape and size of the structuring element. The thinning effect is equivalent to spatial high-pass filtering. The erosion operation removes structures which are smaller than the structuring element. It can be used to remove noisy "connections" between two objects in the image.

c. Image reconstruction:


Reconstruction is a morphological transformation that involves two images, a marker image and a mask image, and a structuring element. If G is the mask and F is the marker, the reconstruction of G from F, denoted RG(F), is defined by the following iterative procedure:

1. Initialize h1 to be the marker image, F.

2. Create the structuring element B.

3. Repeat hk+1 = (hk ⊕ B) ∩ G until hk+1 = hk.

4. RG(F) = hk+1.

The marker F must be a subset of G: F ⊆ G.
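In MATLAB the iterative procedure above is provided by imreconstruct; the sketch below obtains the marker F by eroding the mask G (one common choice that guarantees F ⊆ G), giving an opening-by-reconstruction of the grayscale image. The filename and disk radius are assumptions.

G = rgb2gray(imread('person.jpg'));               % mask image (assumed input)
F = imerode(G, strel('disk', 5));                 % marker image, F <= G everywhere
R = imreconstruct(F, G);                          % R_G(F): repeated dilation of F constrained by G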

d. Obtain foreground and background markers: The result of the watershed algorithm is greatly influenced by noise and other local irregularities in the image. This leads to over-segmentation of regions, to the extent that the segmented image itself looks like a noisy image. This is the major drawback of the watershed algorithm. To avoid this problem, markers are used for segmentation. A marker is a connected component belonging to an image. The markers, being connected components, possess the same intensity values and are treated as regional minima. Markers can be classified as internal (foreground) or external (background) depending on their location relative to the region of interest.
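The whole marker-controlled segmentation can be sketched in MATLAB as below. The structuring-element sizes are assumptions, and the foreground and background markers are obtained in one common way (opening-closing by reconstruction followed by regional maxima, and a distance-transform watershed of the thresholded image), which is a reasonable stand-in rather than necessarily the exact procedure used for the results in Section 6.

gray = rgb2gray(imread('person.jpg'));            % assumed input image
grad = imgradient(gray);                          % segmentation function for the watershed

se      = strel('disk', 10);                      % assumed structuring element
Ie      = imerode(gray, se);
Iobr    = imreconstruct(Ie, gray);                % opening-by-reconstruction
Iobrd   = imdilate(Iobr, se);
Iobrcbr = imcomplement(imreconstruct(imcomplement(Iobrd), imcomplement(Iobr)));
fgm     = imregionalmax(Iobrcbr);                 % internal (foreground) markers

bw  = im2bw(Iobrcbr, graythresh(Iobrcbr));        % rough foreground / background split
D   = bwdist(bw);                                 % distance transform of the background
DL  = watershed(D);
bgm = (DL == 0);                                  % external (background) markers: ridge lines

grad2 = imimposemin(grad, bgm | fgm);             % force the markers to be the only regional minima
L     = watershed(grad2);                         % final label matrix; zeros are object boundaries
imshow(label2rgb(L, 'jet', 'w', 'shuffle'));      % labelled body regions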

6. RESULTS

6.1 Face detection using the Viola-Jones algorithm

Face detection is done using the Viola-Jones algorithm from the Computer Vision Toolbox in MATLAB R2013a. Figure 6 shows the face detected in an image.

Figure 5: Original image

A bounding box is returned around each detected face. The bounding box is a four-element vector [x, y, width, height] that specifies, in pixels, the upper-left corner and the size of the box.

Figure 6: Face is detected
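A short MATLAB sketch of this step is given below, using the pretrained frontal-face Viola-Jones model shipped with the Computer Vision Toolbox; the image filename is an assumption.

detector = vision.CascadeObjectDetector();        % default pretrained frontal-face model
img      = imread('person.jpg');                  % assumed input image
bbox     = step(detector, img);                   % one [x y width height] row per detected face
imshow(img); hold on;
for k = 1:size(bbox, 1)
    rectangle('Position', bbox(k, :), 'EdgeColor', 'y', 'LineWidth', 2);   % draw the bounding box
end
hold off;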

6.2 Skin colour detection

The skin regions of the person's face and the remaining visible skin areas are exposed, indicated by white pixels.

Figure 7: Skin colour indicated by a large cluster of white pixels.

6.3 Upper and lower body segmentation

a. The input image is converted to a grayscale image for further segmentation.

Figure 8: Grayscale image

b. Foreground markers and object boundaries are computed and superimposed on the original image.

Figure 10: Markers and object boundaries superimposed on the original image.

c. From this image the upper body is segmented, denoted by the yellow region in the image, and the lower body by the blue region.

Figure 11: Upper body segmented, denoted by the green region

7. CONCLUSION

a. Face detection is done using the Viola-Jones algorithm from the Computer Vision Toolbox in MATLAB R2013a. Face detection shows the presence of a human in an image and gives information about skin color. This information guides the search for the upper body, which in turn leads to the search for the lower body.

b. Skin color is detected; the color of the skin in a person's face can be used to match the rest of his or her visible skin areas, making the skin detection process adaptive to each person. The skin detection method is based on a skin color model which involves extraction of chromatic color from the image and removal of luminance.

c. Upper body and lower body segmentation is carried out using the watershed segmentation algorithm.

8. REFERENCES

[1] A. Tsitsoulis and N. G. Bourbakis, "A Methodology for Extracting Standing Human Bodies from Single Images," IEEE Transactions on Human-Machine Systems, vol. 45, no. 3, June 2015.

[2] L. Zhao and L. S. Davis, "Iterative figure-ground discrimination," in Proc. 17th Int. Conf. Pattern Recognition, pp. 67-70, 2004.

[3] S. Li, H. Lu, and L. Zhang, "Arbitrary body segmentation in static images," Pattern Recognition, vol. 45, no. 9, pp. 3402-3413, 2012.

[4] O. H. Jensen, "Implementing the Viola-Jones Face Detection Algorithm," M.Sc. thesis, Technical University of Denmark, Kongens Lyngby.

[5] D. Chai and K. N. Ngan, "Face segmentation using skin-color map in videophone applications," IEEE Transactions on Circuits and Systems for Video Technology.

[6] Pallabi Saikia, Gollo Janam, and Margaret Kathing, "Face Detection using Skin Colour Model and distance between Eyes," International Journal of Computing, Communications and Networking, vol. 1, no. 3, November-December 2012.

[7] Sitapa Rujikietgumjorn, "Segmentation Methods for Multiple Body Parts," July 31, 2008.

[8] Ravi S and A. M. Khan, "Bio-medical image segmentation using marker controlled watershed algorithm," International Journal of Research in Engineering and Technology.
