Prashant Thesis

An Application of Human Action
Recognition in Calories Burning

Monitoring System.
A Dissertation submitted to
THE MAHARAJA SAYAJIRAO UNIVERSITY OF BARODA
In Partial Fulfilment of the Requirements For the
Degree of
MASTER OF ENGINEERING (ELECTRICAL)

in
Microprocessor System and Applications (MSA)

Submitted by
Ramoliya Prashant Chhaganbhai
(Roll No. – 273)

Under the guidance of
Sandhya R. Sharma
DEPARTMENT OF ELECTRICAL ENGINEERING

FACULTY OF TECHNOLOGY AND ENGINEERING
GUJARAT, INDIA
JULY 2018
DEPARTMENT OF ELECTRICAL ENGINEERING

FACULTY OF TECHNOLOGY AND ENGINEERING
GUJARAT, INDIA
JULY 2018
CERTIFICATE
This is to certify that the dissertation report entitled “An Application of
Human Action Recognition in Calories Burning Monitoring System” has
been carried out by Mr. Ramoliya Prashant Chhaganbhai under my guidance
in partial fulfillment of the requirements for the Degree of MASTER OF
ENGINEERING (ELECTRICAL) with specialization in Microprocessor
System and Applications (part III - IV) of The Maharaja Sayajirao University
of Baroda during the academic year 2017-18. The results embodied in this
dissertation work, to the best of my knowledge, have not been submitted to any
other university or institution for award of any degree or research.
Guide Head of the Department
Mrs. Sandhya R. Sharma Dr. S. K. Joshi

Assistant Professor H.O.D
Electrical Engineering Department Electrical Engineering Department
Faculty of Technology and Engineering Faculty of Technology and Engineering
The Maharaja Sayajirao University of The Maharaja Sayajirao University of
Baroda Baroda
DEAN
Faculty of Technology & Engineering
2
The Maharaja Sayajirao University Of Baroda
ACKNOWLEDGEMENT
I would like to express my deep sense of respect and gratitude towards my guide
Mrs. Sandhya R. Sharma (Assistant Professor) at faculty of Technology and Engineering,
The Maharaja Sayajirao University of Baroda for his exemplary guidance, cordial support,
valuable information, monitoring and constant encouragement throughout my dissertation
work. I want to thank him for giving me an opportunity to work under him. Without his
experience and insights, it would have been very difficult for me to do quality work.
I would like to express my gratitude towards members of The Maharaja Sayajirao

University of Baroda for their kind co-operation and encouragement which helped me in
completion of the dissertation. I am especially thankful to respected HOD, Dr. S K Joshi,
who permitted me to do this dissertation.
Finally, I am especially thankful to almighty, my parents, my brother for being

constant motivation, love, inspiration, dedication and giving me moral support in difficult
time.
-Ramoliya Prashant Chhaganbhai
3
Table of Content
Abstract 7
List of Figure 8
List of Table 9
Chapter 1 Introduction 10-14
1.1 Overview 10
1.2 Dissertation Objectives 11
1.3 Applications 12
1.4 Literature Review 13
1.5 Organization of Dissertation 14
Chapter 2 Human Action Recognition 15-21
2.1 Introduction of Human Action Recognition Method 15
2.2 Representation 16
2.3 Support Vector Machines 18
2.4 Methods 19
2.5 Matching of Local Features 20
Chapter 3 Calories Measurement Method 22-22
3.1 Calories Measurement Method 22
4
Chapter 4 System Block Diagram 23-35
4.1 Camera System 24
4.2 Human Action Recognition Method 25
4.2.1 Video Read 25
4.2.2 Segmentation 26
4.2.3 RGB image to Gray scale image Conversion 26
4.2.4 Gaussian Convolution Kernel 28
4.2.5 Spatio-temporal Jet Descriptor 29
4.2.6 Harris Corner Detector 31
4.2.7 Langrage Multiplier 32
4.3 Human Action 34
4.4 Calculation of Average Calorie Burning 35
Chapter 5 Hardware and Software Tools 36-40
5.1 Hardware Tools 36
5.1.1 Logitech Web Camera 36
5.1.2 Twain Driver 37
5.2 Software Tools 39
5.2.1 MATLAB 39
Chapter 6 Result and Result Analysis 41-45
6.1 Result 41
6.2 Result Analysis 45
5
Chapter 7 Conclusion and Future Scope 46-47
7.1 Conclusion 46
7.2 Future Scope 47
References 48
6
ABSTRACT
A new method for calorie burning monitoring during human activity by SVM model for
human action recognition has been developed. First task is action recognition and second
to find the calorie burnt in that action. Action recognition can be defined as the ability to
determine whether a given action occurs in the video. This problem is complicated due to the high
complexity of human actions such as appearance variation, motion pattern variation, occlusions,
etc. Here action recognition is done by focusing s on a local spatio-temporal neighbourhood. In this
method local space time features capture local events in video and can be adopted the
size, frequency and velocity of moving patterns. Firstly the video is constructed in terms of
local space time features and then this representation is integrated with SV [Space Vector
Machine] classification scheme of recognition. The calorie monitoring done by identifying
type of human activity and then calculating average calories during the activity. For the
purpose of I have used video database from KTH video dataset which contains video sequences of
different human actions performed by different people different scenarios. The presented results
of action recognition justify the proposed method and demonstrate its advantage compared to
other relative approaches for action recognition
7
List of Figures
Figure 2.1 Local space-time features detected for a walking pattern. 15
Figure 2.2 Examples of matched features in different sequences. 21
Figure 2.3 Examples of matching local features for pairs of sequences with complex 21
non-stationary backgrounds.
Figure 4.1 System Block Diagram 23
Figure 4.2 Human Action Recognition Method Process Diagram 25
Figure 4.3 A RGB Image 27
Figure 4.4 Grayscale Image 28
Fig. 4.5 Examples of scale and Galilean adapted spatio-temporal interest points. 31
Figure 5.1. Logitech Web Camera (C170) 36
Figure 5.2 Twain Driver 37
Figure 5.3 MATLAB 39
Figure 6.1 Segmentation Result 41
Figure 6.2 RGB to Gray Conversion Result 42
Figure 6.3 Feature Point 43
Figure 6.4 Harris Corner Detector Result 43
Figure 6.5 Human Action Recognition Result 44
Figure 6.6 Calorie Monitoring Result 44
8
List of Table
Table 3.1 Average Calories Data Sheet 22
Table 6.1 Confusion Matrix 45
9
CHAPTER 1
INTRODUCTION
1.1 Overview: -
AT present, physical fitness is prime important. It’s no doubt that inactivity
can lead to a number of health and personal issues, including weight gain, onset of
chronic and acute illness and even low productivity in school, work and daily life.
Conversely, constant activity can prevent and may even reverse many of these issues.
Moving around – by walking, running, even fidgeting in your seat, can help boost a
person’s overall health.
For physical fitness, calorie burning during day the entire day should be
monitored. Many applications like hike messenger, fitness tracker and many more
application are available for calorie burning monitoring. But these applications have
some disadvantages like some of these applications mistake insignificant activities and
mannerisms for an activity while sometimes the application says you have worked out
a lot, but you have spent much of your time just shaking your legs underneath your
office desk.
To overcome such disadvantages, a new calorie burning monitoring system is

introduce. In these system, the calorie monitoring done by identifying type of human
activity and then calculating average calories during the activity.
For this the spatial-temporal information for human action recognitionis explored
To generate the recognition result, there are three important and necessary steps:
• Feature extraction: Raw video sequence consists of massive spatio-temporal pixel intensity
variations that contribute nothing to the action itself, such as pixels related to the color of
clothes and cluttered background. Feature extraction is a process that detects and extracts
most representative information from raw data as features.
• Feature representation: Any video sequence will generate a specific number of features, and
different video sequences will have distinctive number of features. Feature representation is a
process to give a unique representation for every video sequence based on the extracted
features. The final representation should be of the same dimension among different videos.
• Classification: Based on the calculated representation, classification is a process to assign

class labels to any unknown video sequence according to the classifier.
10
1.1 Dissertation Objectives: -
(1) To develop calories burning monitoring system.
(2) To identifying type of human action and find the how many calories burn
during human activity.
(3) To provide fast and accurate calorie burning monitoring system .
11
1.2 Applications: -
(1) In fitness tracker.
(2) In sports, to measure calorie burning during boxing, running, jumping etc.
(3) In gym, its use for monitoring calorie burning during exercise and keeping
record of your exercise statics.
12
1.4 Literature Review: -
Serval researches have carried out work on monitoring calorie burning during Human
action and different method of Human Action Recognition.
In this section a modest attempt is made to some review some of pertinent research
papers related to this topic with emphasis Human Action Recognition method.
[1] In this paper, we studied Local space-time features capture local events in video and
can be adapted to the size, the frequency and the velocity of moving patterns like human
action. In this paper we demonstrate how such features can be used for recognizing
complex motion patterns. We construct video representations in terms of local space-time
features and integrate such representations with SVM classification schemes for
recognition. For the purpose of evaluation, we introduce a new video database containing
2391 sequences of 6 human actions performed by 25 people in four different scenarios.
The presented results of action recognition justify the proposed method and demonstrate
its advantage compared to other relative approaches for action recognition.in this paper
only six types of human action can be recognition possible. We can reorganization more
action by modifying the algorithm.
[2] In this paper, A new method for Human action recognition is proposed by LBP based
dynamic texture operators. It captures the similarity of motion around key points tracked
by a semi dense point tracking method. The use of self-similarity operator allows to
highlight the geometric shape of rigid parts of foreground object in video sequence.
Inheriting from the efficient representation of LBP based methods and the appearance
invariance of patch matching method, the method is well designed for capturing action
primitives unconstrained videos. Action recognition experiments, made on serval action
academic video. We are interested in serval perspectives related to this method such as
multi scale SMPs and extension to moving backgrounds.
[3] In this paper, the recent shift in computer vision from static images to video sequence
has focused research on the understanding of action and behaviour. The lure of wireless
interface and interactive environments has heightened interest in understanding human
actions. Recently a number of approaches have appeared attempting the full 3-D
reconstruction of the human form from image sequence with the presumption that such
information would be useful and perhaps even necessary to understand the action taking
place. We develop a view based approach to the representation and recognition of action
that is designed to the full 3-D reconstruction of the human form from image sequence
with the presumption that such information would be useful and perhaps even necessary
to action taking place.
13
1.5 Organization of Dissertation: -
Chapter 1: This chapter introduce overview of the title, Dissertation Objectives,
Applications, Literature Review and organization of Dissertation etc.
Chapter 2: This chapter introduce brief overview of the Human Action Recognition
method and procedure for identifying which type of human action in video.
Chapter 3: This chapter introduce how to measure calorie burning during different
human action.
Chapter 4: This Chapter introduce system block diagram and its brief explanation.
Chapter 5: This Chapter introduce which hardware tools and software tools are used in
this dissertation.
Chapter 6: This Chapter introduce result of the dissertation result and its analysis.
Chapter 7: This Chapter introduce conclusion of Dissertation and the future scope of the
Dissertation.
14
CHAPTER 2
HUMAN ACTION RECOGNITION
2.1 Introduction of human action recognition method: -
Applications such as video retrieval and human – computer interaction require
methods for recognizing human actions in various scenarios. Typical scenarios include
scenes with cluttered, moving backgrounds, scale variations, induvial variations in
appearance and cloth of people, changes in light and view point and so forth. All of these
conditions introduce challenging problems that have been addressed in computer vision in
the past.
Recently, serval successive methods for learning and recognizing human actions
directly from image methods have been proposed like (a) Probabilistic recognition of
activity using local appearance (b) The representation and recognition of action using
temporal templates jets. (c) Recognizing action at a particular distance.
When using image measurements in terms of optic flow or spatio-temporal gradients,

however, these measurements and therefore the results of recognition may depend on the
recording conditions such as position of the pattern in the frame, spatial resolution and
relative motion with respect to the camera, camera motion etc.
Moreover, image measurement can be influenced by motions of multiple objects and

variations in the background. Whereas these problems can be solved in principle by
external mechanisms for spatial segmentation and /or camera stabilization, such
mechanisms might be unstable in complex situations.
This motivates the need of alternative video representation that are stable with
respect to changes of recording condition.
We demonstrate that action recognition can be achieved using local measurement

in terms of spatiotemporal interest points. Such features capture local motion events in
video and can be adapted to the size, the frequency and the velocity of moving patterns,
hence, resulting in video representation that are stable with respect to corresponding
transformations.
In spatial recognition, local features have recently been combined with SVM in
robust classification approach in a similar manner, here we explore the combination of
local features and SVM and apply the resulting approach to the recognition of human
actions. For the purpose of evaluation we introduce a new video database and present
results of recognizing eight human action. [1]
15
2.2 Representation: -
To represent motion patterns, we use local space time features which can be considered
as primitive events corresponding to moving two-dimensional image structures at
moments of non -constant motion (see Figure 2.1).
To detect local features in image sequence f (x, y, t), we construct its scale -space
representation
L (., 𝜎s2 ,T2) = f*g(. , 𝜎2, T2)
using Gaussian convolution
g = exp (-(x2+y2)/2 𝜎2l – t2/2T2l)/((2𝜋)3𝜎4lT2l)(1/2).
We compute the second-moment matrix using spatio-temporal image gradients 𝛻𝐿 =

(𝐿x, Ly, Lt)T within a Gaussian neighbourhood of each point
µ (. , 𝜎2, T2) = g (. ,𝑠 𝜎2, s T2) * ( 𝛻𝐿(𝛻𝐿)T ) (1)
and define positions of features by local maxima of
H = det (µ) – k trace3 (µ) over (x, y, t).
The spatio-temporal neighbourhood of features in space and time is then defined by spatial
and temporal scale parameters (𝜎, 𝑇) of the associated Gaussian kernel. The size of
features can be adapted to match the spatio-temporal extent of underlying image
structures by automatically selecting scales parameters (𝜎, 𝑇).
Moreover, the shape of the features can be adapted to the velocity of the local pattern,
hence, making the features stable with respect to different amounts of camera motion.
Here we use both of these methods and adapt features with respect to the scale and
velocity to obtain invariance with respect to the size of the moving pattern in the image as
well as the relative velocity of the camera.
Spatio-temporal neighbourhoods of local features contain information about the

motion and the spatial appearance of events in image sequences. To capture this
information, we compute spatio- temporal jets
l = (Lx, Ly, Lt, Lxx,……,, Ltttt) (2)
at the center of each feature using normalized derivatives
Lxmyntk = 𝜎m+nTk (dxmyntk g)*f
16
computed using selected values (𝜎2,T2). To enable invariance with respect to relative
camera motions, we also warp the neighbourhood of features using estimated velocity
values prior to computation of l.
K- means clustering of descriptor l in the training set gives a vocabulary of primitive events
hi. the numbers of features with labels hi in a particular sequence define a features
histogram
H = (h1,…………, hn).
We use such histograms as one alternative representation when recognizing motions in

image sequences.
Figure 2.1 Local space-time features detected for a walk
Features overlaid on selected frames of a sequence. [1]
2.3 Support Vector Machines: -

Support Vector Machines (SVMs) are state of the art large margin classifiers which
have recently gained popularity within visual pattern recognition and many others. In this
section we provide a brief review of theory behind this type of algorithm.
Consider the problem of separating the set of training data (x1, y1), (x2,y2),….(xm,ym)
into two classes, where xi 𝜖 RN is a feature vector and yi 𝜖 {-1,1} its class label. If we assume
that the two classes can be separated by a hyper plane
𝜔 *x +b =0
17
in a some space and that we have no prior knowledge about the data distribution, then the
optimal hyper plane is the one which maximization problem, using Lagrange multipliers
αi(i=1,2,3,….,m)
f(x) = sgn ( ∑𝑚
𝑖=1 αi yi K(xi, x) + b) (3)
Where, αi and b are found by using SVC learning algorithm. Those xi with nonzero αi are the
“support vectors”. For K(x, y) =x*y, this correspond to constructing an optimal separating
hyper plane in the input space RN .
Based on results reported in the literature, we use the kernel
K(x,y) = exp {-𝛾X2 (x,y)
For histogram features H, and for local features we use the kernel
̂ (Lh, Lk) + 𝐾
KL (Lh, LK) = 1/2[𝐾 ̂ (Lk, Lh)]
With
̂ (Lh, Lk) = 1 ∑𝑛ℎ

𝐾 𝑚𝑎𝑥
𝑗ℎ=1 𝑗𝑘=1,…,𝑛𝑘 {Kl (ljh, ljk)} (4)
𝑛ℎ
Where Li = { lji }nij=1 and lji is a jet descriptor of interest point j in sequence i and
<𝑥−𝜇𝑥|𝑦−𝜇𝑦>
Kl(x,y) = exp {−𝜌 (1 − ||𝑥−𝜇𝑥|| ||𝑦−𝜇𝑦||)} (5)
Where 𝜇𝑥 is the mean of 𝑥. [1]
18
2.4 Methods: -
We compare results of combining three different representations and two classifiers.
The representations are
(1) Local features described by spatio –temporal jets l of order four (LF).
(2) 128-bin histograms of local features (HistLF).
(3) Marginalized histograms of normalized spatio-temporal gradients (HistSTG)
computed at 4 temporal scale of a temporal pyramid.
In the latest approach we only used image points with temporal derivative higher than
some threshold which value was optimised on the validation set.
For the classification we use
(1) SVM with either local feature kernel in combination with LF or SVM with X2 kernel
for classifying histogram based representation HistLF and HistSTG .
(2) Nearest neighbour classification in combination with HistLF and HistSTG. [1]
19
2.5 Matching of local features: -
A necessary requirement for action recognition using the local feature kernel in
Equation (5) is the match between corresponding features in different sequence. Figure
2.2 present a few pairs of matched features in different sequence with human actions. The
pairs correspond to features with human actions.
The pairs correspond to features with jet descriptor ljh and ljk selected by maximizing
the feature kernel over jk in Equation (4). As can be seen, matches are found for similar
parts (legs, arms and hands) at moments of similar motion. The locality of descriptors
allows for matching of similar events in spite of variations in clothing, lighting, individual
patterns of motion.
Due to the local nature of features and corresponding jet descriptors, however, some
of the matched features correspond to different parts of action which are difficult to
distinguish based on local information only. Hence, there is an obvious possibility for
improvement of our method by taking the spatial and the temporal consistency of local
features into account.
The locality of our method also allow for matching similar events in sequence with
complex non-stationary backgrounds as illustrated in Figure 2.3. This indicates that local
space time features could be used for motion interpretation in complex scenes.
Successful application of local features for action recognition in unconstrained

scenes with moving heterogeneous backgrounds has recently been presented. [1]
20
Figure 2.2 Examples of matched features in different sequences. [1]
Figure 2.3 Examples of matching local features for pairs of sequences with complex non-
stationary backgrounds. [1]
21
CHAPTER 3
CALORIES MEASUREMENT METHOD
3.1 Calories Measurement Method: -

The calories burn during the human activity like running, walking, boxing, hand
clapping, hand waving, cycling is measure by various method.
In our method the average calories burn is given by
Average Calories Burn = (Type of Action) * (Average Calorie for particular type action/60)
* *(frame rate/60)
The average calories burn table is given below:
Activity Average Calories burn (calories per minute)
Jogging 5
Walking 3
Running 9.5
Boxing 7
Hand clapping 1
Hand waving 0.5
Cycling 9.8
Surfing 11.2
Table 3.1 Average Calories Data Sheet
22
CHAPTER 4
SYSTEM BLOCK DIAGRAM
Camera Calculation
Human
system Type of of average
Action
Human calorie
(Action Recognition
Action burning
video) method
system
Figure 4.1 System Block Diagram
23
4.1 Camera System (Action video): -
It captures human action and give it to the simulator.
It contains:
(1) Camera
(2) Twain Driver
24
4.2 Human Action Recognition Method: -
RGB
image Gaussian
Spatio-
to Gray Convolution Harris
Video Segment temporal Langrage
scale Kernel corner
read ation Jet Multiplier
image (Gaussian detector
descriptor
conversi smoothing)
on
Figure 4.2 Human Action Recognition Method Process Diagram
4.2.1 Video Read: -

It includes how to read video from video storage in MATLAB.
v= VideoReader(‘path name, file format’)
while hasFrame(v)
video =readFrame(v);
end
whose video
25
4.2.2 Segmentation: -
Segmentation as the partition of an image into a set of non- overlapping regions whose
union is the entire image, some rules to be followed for regions resulting from the image
segmentation can be stated as:
(1) They should be uniform and homogenous with respect to some characteristics;
(2) Their interiors should be simple and without many holes;
(3) Adjacent regions should have significantly different values with respect to the
characteristics on which they are uniform
(4) Boundaries of each segment should be simple and must be spatially accurate.
Segmentation algorithms divided into three groups:
(1) Thresholding or clustering

(2) Edge detection
(3) Region detection
If the basic 2-D still gray level image is represented by f(x, y), then the extension of 2-D
images to 3-D can be represented by f(x,y) ⟹ f(x,y,z); the extension of still image to moving
images or sequence of image can be represented by f(x,y) ⟹ f(x,y,t); a combination of the
above extensions can be represented by f(x,y) ⟹ f(x,y,z,t); and the extension of gray level
images to, for example, color images or multi-band images can be represented by by f(x,y)
⟹ f(x,y,z,t).
26
4.2.3 RGB image to Gray scale image conversion: -
Color to gray image conversion has been widely used in real world application such
as printing color images in black and white format and pre-processing in image processing.
The main reason for using gray scale representation instead of operating on color image
directly is that gray-scale simplifies algorithm and reduces computational requirements.
For many applications of image processing, color information does not help us to
identifying important edge or other features and also producing unwanted information
which could increase the number of training data required to achieve good performance.
A gray scale image is constructing of different shades of gray color. A true color
image can be converted to a gray scale image by maintaining the brightness of image.
MATLAB provides the ‘rgb2gray” function which converts RGB to gray-scale by removing
hue and saturation information.
I = rgb2gray (image)
The above function converts the true color image to gray scale image I. The RGB image
is a combination of RED, BLUE, GREEN colors. It is three dimensional image. At a particular
position call (i,j) in a image (i,j,1) produce the value of RED pixel. image (i,j,2), produce the
value of GREEN pixel. image (i,j,3), produce the value of BLUE pixel. The combination of
these primary color are normalized with R+G+B=1. This gives the neutral white color. The
gray scale image is attained from the RGB image by combining 30% of RED, 59% of GREEN,
11% of BLUE. This produces the brightness information of the image. The resulting image
will be two dimensional. The value 0 represents black and the value 255 represents white.
The range will be black and white values.
27
Figure 4.3: A RGB Image
Figure 4.4: Grayscale Image
28
4.2.4 Gaussian Convolution Kernel (Gaussian smoothing: -
Gaussian convolution is used to blur images and remove noise and detail. The
Gaussian function is used in numerous research areas:
(1) It defines a probability distribution for noise or data.

(2) It is a smoothing operator.
(3) It is used in mathematics.
When working with images we need to use the two-dimensional Gaussian function.
The Gaussian filters works by using the 2D distribution as a point spread function. This is
achieving by convolving the 2D Gaussian di8stribution function with the image. We need
to produce a discrete approximation to the Gaussian function. The theoretically requires
an infinitely large convolution kernel, as the Gaussian distribution is non-zero everywhere.
Fortunately, the distribution has approached very close to zero at about three standard
deviations from the mean. 99% of the distribution falls within 3 standard deviations. This
means we can normally limit the kernel size to contain only values within three standard
deviations of the mean.
Gaussian kernel co-efficient s is sampled from the 2D Gaussian function.
Where, 𝜎 is the standard deviation of the distribution. The distribution is assumed to have
a mean of zero. An integer valued 5 by 5 convolution kernel approximating a Gaussian with
a 𝜎 of 1 shown below.
The Gaussian filter is a non- uniform low pass filter. The kernel co-efficient diminish
with increasing distance from the kernel’s Centre. Central pixels have a higher weighting
than those on the periphery. Larger values of 𝜎 produce a wider peak. Kernel size must
29
increase with increasing 𝜎 to maintain the Gaussian nature of filter. Gaussian kernel
coefficient depends on the value of 𝜎. At the edge of the mask, coefficient must be close
to 0. The kernel is rotationally symmetric with no directional bias. Gaussian filters might
not preserve image brightness.
4.2.5 Spatio-temporal Jet descriptor: -

The idea of local features in the spatial domain, we use the notion of space-
time interest points and represent video data in terms of local space-time events. To
describe such events, we define several types of image descriptors over local spatio-
temporal neighbourhoods and evaluate these types of descriptors in the context of
recognizing human activities. we compare motion representations in terms of spatio-
temporal jets, position dependent histograms, position and principal component
analysis computed for either spatio-temporal gradients or optic flow. An
experimental evaluation on a video database with human actions shows that high
classification performance can be achieved, and that there is a clear advantage of
using local position dependent histograms, consistent with previously reported
findings regarding spatial recognition.
A local interest point approach for capturing spatiotemporal events in video data.
Consider an image sequence f and construct a spatio-temporal scale-space
representation L by convolution with a spatiotemporal Gaussian kernel
t2/2τ2) with spatial and temporal scale parameters σ and τ. Then, at any point p = (x,y,t) in
space-time define a spatio-temporal second-moment matrix µ as
(1)
where ∇L = (Lx, Ly, Lt)T denotes the spatio-temporal gradient vector and (σi = γσ,τi = γτ) are
spatial and temporal integration scales with 𝛾 = 2.
Neighbourhoods with µ of rank 3 correspond to points with significant variations of image

values over both space and time. Points that maximize these variations can be detected by
30
maximizing all eigenvalues λ1,..,λ3 of µ or, similarly, by searching the maxima of the interest
point operator H = detµ − k(traceµ)2 = λ1λ2λ3 − k(λ1 + λ2 + λ3)3 over (x,y,t) subject to H ≥ 0
with k ≈ 0.005.
To estimate the spatial and the temporal extents (σ0,τ0) of events, we maximize the
following normalized feature strength measure over spatial and temporal scales at each
detected interest point p0 = (x0,y0,t0)
(σ0,τ0) = argmax(σ2τ1/2(Lxx + Lyy) + στ3/2Ltt)2. (2)
Velocity adaptation. Moreover, to compensate for relative motion between the camera
and the moving pattern, we perform velocity adaptation by locally warping the
neighbourhoods of each interest point with a Galilean transformation using image velocity
u estimated by computing optic flow at the interest point.
Figure 2.7.1 shows a few examples of spatio-temporal interest points computed in this
way from image sequences with human activities. the method allows us to extract scale-
adaptive regions of interest around spatiotemporal events in a manner that is invariant to
spatial and temporal scale changes as well as to local Galilean transformations. [1]
boxing hand waving Walking
Fig. 4.5: Examples of scale and Galilean adapted spatio-temporal interest points.
The illustrations show one image from the image sequence and a level surface of
image brightness over space-time with the space-time interest points illustrated as
dark ellipsoids.
31
4.2.6 Harris Corner Detector: -
The Harris Corner Detector is a mathematical operator that finds features in an

image. It is rotation, scale and illumination variation independent. Harris corner
detector is used to find little patches of image that generate a large variation when
moved around. The Harris corner detector is given by
E(𝑢, 𝑣) = ∑𝑥,𝑦 𝑤(𝑥, 𝑦)[𝐼(𝑥 + 𝑢, 𝑦 + 𝑣) − 𝐼(𝑥, 𝑦)
Where, E is the difference between the original and the moved window.
u is the window’s displacement in the x direction.
v is the window’s displacement in the y direction.
w (x, y) is the window at position (x, y). This acts like a mask. Ensuring that only the desired
window is used.
I is the intensity of the image at a position (x, y).
I(x+u, y+v) is the intensity of moved window.
I(x, y) is the intensity of original window.
The matrix form of Harris corner detector is given by
𝐼𝑥2 𝐼𝑥 𝐼𝑦 𝑢
E(𝑢, 𝑣) ≈ [𝑢 𝑣] (∑ [ 2 ]) [𝑣 ]
𝐼𝑥 𝐼𝑦 𝐼𝑦
The above equation is converted into below form

𝑢
E(𝑢, 𝑣) ≈ [𝑢 𝑣] 𝑀 [ ]
𝑣
𝐼𝑥2 𝐼𝑥 𝐼𝑦
Where, M= ∑ [ ]
𝐼𝑥 𝐼𝑦 𝐼𝑦2
32
It was figured out that eigenvalues of the matrix can help determine the suitability of a
window:
R = det M –k(trace M)2
det M = 𝜆1 𝜆2
trace M = 𝜆1 + 𝜆2
In short, The Harris corner detector is just a mathematical way of determining which
window produce large variations when moved in any direction.
4.2.7 Langrage Multiplier: -

In mathematical optimization, the method of Lagrange multipliers is a strategy for
finding the local maxima and minima of a function subject to equality constraints (i.e.,
subject to the condition that one or more equations have to be satisfied exactly by the
chosen values of the variables). The great advantage of this method is that it allows the
optimization to be solved without explicit parametrized in terms of the constraints. As a
result, the method of Lagrange multipliers is widely used to solve challenging constrained
optimization problems. The method can be summarized as follows:
(1) isolate any possible singular point of the solution set of the constraining equations,
(2) find all the stationary points of the Lagrange function,
(3) establish which of those stationary points and singular points are global maxima of
the objective function.
33
4.3 Human Action: -
It specifies the output of Human action recognition method. It gives type of human
action like Running, Walking, Jogging, Cycling, Surfing etc. The type of human action gives
as a input to the calories calculation system.
34
4.4 Calculation of average calories burning system: -
It contains the average calorie data sheet for particular human action. The calories
burn during the human activity like running, walking, boxing, hand clapping, hand waving,
cycling is measure by various method.
In our method, after human action recognition based on action the average calories for
particular action will be calculated.
35
CHAPTER 5
HARDWARE AND SOFTEWARE TOOLS
5.1 Hardware Tools: -

(1) Logitech Web Camera (C170)
(2) Twain Driver
5.1.1 Logitech Web Camera: -
36
Figure 5.1 Logitech Web Camera (C170)
Technical Specification: -
Megapixel 5
Frame Rate 30
Video Resolution 640×480
Sensor Type CMOS
System Requirements: -
OS Windows 10 onwards
CPU Dual-core CPU with 1 GB RAM
5.1.2 Twain Driver: -

37
Figure 5.2 Twain Driver
Twain is an applications programming interface (API) and communication protocol that

regulates communication between software and digital imaging devices such as image
scanners and digital cameras. Twain is not a hardware level protocol, it requires a driver
called Data source for each level.
Features: -
Operating System Linux, macOS, Microsoft Windows
Plat Form x86, x86-64, PowerPC
Type Application Programming Interface
Supported Technologies: -
Twain provides support for:
(1) Production, high-speed scanning

(2) ICC colour profiles
(3) Digital Cameras
(4) Multiple operating system platforms including Windows, classic Mac OS, macOS
and LINUX.
38
5.2 SOFTWARE TOOLS: -
(1) MATLAB
5.2.1 MATLAB: -
39
Figure 5.3 MATLAB
MATLAB is a multiparadigm numerical computing environment and proprietary

programming language developed by MathWorks. MATLAB allows matrix manipulations,
plotting of functions and data, implementation of algorithms, creation of user interfaces,
and interfacing with programs written in other languages, including C, C++, C#, Java,
Fortran and Phython.
Features of MATLAB: -
(1) It is high level language for numerical computation, visualization and application
development.
(2) It also provides an interactive environment for interactive environment for iterative
exploration, design and problem solving.
(3) It provides vast library of mathematical functions for mathematical functions for
linear algebra, statics, Fourier analysis, Filtering, numerical integration, and solving
ordinary differential equations.
40
(4) It provides built in graphics for visualizing data and tools for creating custom plots.
(5) MATLAB’s programming interface gives development tools for improving code
quality maintainability and maximizing performance.
(6) It provides tools for building applications with custom graphical interfaces.
(7) It provides functions for integrating MATLAB based algorithms with external
applications and languages such as C, Java, .NET, and Microsoft Excel .
CHAPTER 6
RESULT AND RESULT ANALYSIS
6.1 Result: -
41
Figure 6.1 Segmentation Result
(a)
42
(b)
Figure 6.2 RGB to Gray Conversion Result
(a) RGB image
(b) Gray image
Figure 6.3 Feature Points
43
Figure 6.4 Harris Corner Detector Result
Figure 6.5 Human Action Recognition Result
44
Figure 6.6 Calorie Monitoring result
6.2 Result Analysis: -

Confusion between different human action like walking and jogging as well as
between jogging and running can be partly explained by high similarities of these classes
(running of some people may appear very similar to the jogging of others in some cases).
Global motion of subjects in the database is a strong for discriminating between the
leg and arm actions when using histograms of spatio-temporal gradients. This information
is cancel when representing the actions in terms of velocity adapted local features. Hence,
LF and HistLF representation can be expected to give similar recognition performance
disregarding global motion of the person relative to stationary camera.
Walking Jogging Running Boxing Hand Hand Cycling Surfing

clapping waving
Walking 83.3 16..2 0.0 0.0 0.0 0.0 1.5 0.3
Jogging 22.9 60.4 16.7 0.0 0.0 0.0 0.3 0.5
Running 6.3 38.9 54.9 0.0 0.0 0.0 0.6 0.3
45
Boxing 0.7 0.0 0.0 97.9 0.7 0.7 0.0 0.0
Hand 1.4 0.0 0.0 35.4 59.7 3.5 0.1 0.0

clapping
Hand 0.7 0.0 0.0 20.8 4.9 73.6 0.0 0.0

waving
Cycling 0.5 0.1 0.8 0.1 0.0 0.0 99.5 0.1
Surfing 0.2 0.0 0.0 0.0 0.0 0.0 0.9 95.2
Table 6.1 Confusion Matrix
CHAPTER 7
CONCLUSION AND FUTURE SCOPE
7.1 Conclusion: -
The core ideas of this thesis are to develop calories burning monitoring system which
can monitoring calorie burning during Human activity like running, walking, boxing, hand
clapping, hand waving, jogging, surfing, cycling etc. Human action identified by the local
SVM technique which described in the chapter 2 Human Action Recognition. According to
the human action how to calorie burning is measure is described in chapter 3 calorie
measurement. All the experiment shows Human action recognition identifies more
accurately than other method of Human Action Recognition and the calorie burning during
human action monitoring is far better than other method or device.
46
7.2 Future Scope: -
The work presented in this dissertation has numerous opportunities for future
research in the field of calorie burning monitoring during Human action:
 The calorie burning monitoring system have to become live calorie monitoring
system by capturing live human action using camera. This live system is helpful for
calorie monitoring during athletics, boxing etc.
 The human action recognition with non-stationary background can be possible.
47
REFERENCES: -
[1] Christian Schuldt, Ivan Laptev, Barabara Caputo: Recognizing Human Actions: A
Local SVM Approach. In: IEEE International Conference on Pattern
Recognition(ICPR’04).1051-4651/04.
[2] d'Angelo, E., Paratte, J., Puy, G., Vandergheynst, P.: Fast TV-L1 optical flow for
interactivity. In: IEEE International Conference on Image Processing (ICIP'11).
pp.1925{1928. Brussels, Belgium (September 2011)
[3] Proceedings of the Ninth IEEE International Conference on Computer Vision (ICCV’03)
[3] Aggarwal, J., Ryoo, M.: Human activity analysis: A review. ACM Comput. Surv.
16, 16:1–16:43 (2011)
[4] Kellokumpu, V., Zhao, G., Pietikäinen, M.: Human activity recognition using a
dynamic texture-based method. In: BMVC (2008)
[5] Kellokumpu, V., Zhao, G., Pietikäinen, M.: Texture based description of
movements for activity analysis. In: VISAPP (2), pp. 206–213 (2008)
48
[6] Bobick, A.F., Davis, J.W.: The recognition of human movement using temporal
templates. PAMI 23, 257–267 (2001)
[7] Nanni, L., Brahnam, S., Lumini, A.: Local ternary patterns from three orthogonal
planes for human action classification. Expert Syst. Appl. 38, 5125–5128 (2011)
[8] Ojala, T., Pietikäinen, M., Mäenpää, T.: Multiresolution gray-scale and
rotation invariant texture classification with local binary patterns. PAMI 24, 971–
987 (2002)
[9] Liu, J., Luo, J., Shah, M.: Recognizing realistic actions from video “in the wild”.
In: CVPR, pp. 1996–2003 (2009)
[10] J. Aggarwal and Q. Cai. Human motion analysis: A review. CVIU, 73(3):428–
440, 1999.
49

Prashant Thesis

Загружено:

Сведения о документе

Оригинальное название

Авторское право

Доступные форматы

Поделиться этим документом

Поделиться или встроить документ

Параметры публикации

Этот документ был вам полезен?

Это неприемлемый материал?

Авторское право:

Доступные форматы

Prashant Thesis

Загружено:

Авторское право:

Доступные форматы

An Application of Human Action

Recognition in Calories Burning

THE MAHARAJA SAYAJIRAO UNIVERSITY OF BARODA

In Partial Fulfilment of the Requirements For the

MASTER OF ENGINEERING (ELECTRICAL)

Microprocessor System and Applications (MSA)

Ramoliya Prashant Chhaganbhai

(Roll No. – 273)

DEPARTMENT OF ELECTRICAL ENGINEERING

DEPARTMENT OF ELECTRICAL ENGINEERING

Guide Head of the Department

Mrs. Sandhya R. Sharma Dr. S. K. Joshi

I would like to express my gratitude towards members of The Maharaja Sayajirao

Finally, I am especially thankful to almighty, my parents, my brother for being

-Ramoliya Prashant Chhaganbhai

Chapter 1 Introduction 10-14

1.2 Dissertation Objectives 11

1.4 Literature Review 13

1.5 Organization of Dissertation 14

Chapter 2 Human Action Recognition 15-21

2.1 Introduction of Human Action Recognition Method 15

2.3 Support Vector Machines 18

2.5 Matching of Local Features 20

Chapter 3 Calories Measurement Method 22-22

3.1 Calories Measurement Method 22

4.1 Camera System 24

4.2 Human Action Recognition Method 25

4.2.1 Video Read 25

4.2.3 RGB image to Gray scale image Conversion 26

4.2.4 Gaussian Convolution Kernel 28

4.2.5 Spatio-temporal Jet Descriptor 29

4.2.6 Harris Corner Detector 31

4.2.7 Langrage Multiplier 32

4.3 Human Action 34

4.4 Calculation of Average Calorie Burning 35

Chapter 5 Hardware and Software Tools 36-40

5.1 Hardware Tools 36

5.1.1 Logitech Web Camera 36

5.1.2 Twain Driver 37

5.2 Software Tools 39

Chapter 6 Result and Result Analysis 41-45

6.2 Result Analysis 45

7.2 Future Scope 47

Figure 2.1 Local space-time features detected for a walking pattern. 15

Figure 2.2 Examples of matched features in different sequences. 21

Figure 4.1 System Block Diagram 23

Figure 4.2 Human Action Recognition Method Process Diagram 25

Figure 4.3 A RGB Image 27

Figure 4.4 Grayscale Image 28

Figure 5.1. Logitech Web Camera (C170) 36

Figure 5.2 Twain Driver 37

Figure 5.3 MATLAB 39

Figure 6.1 Segmentation Result 41

Figure 6.2 RGB to Gray Conversion Result 42

Figure 6.3 Feature Point 43

Figure 6.4 Harris Corner Detector Result 43

Figure 6.5 Human Action Recognition Result 44

Figure 6.6 Calorie Monitoring Result 44

Table 6.1 Confusion Matrix 45

To overcome such disadvantages, a new calorie burning monitoring system is

• Classification: Based on the calculated representation, classification is a process to assign

(1) To develop calories burning monitoring system.

(3) To provide fast and accurate calorie burning monitoring system .