
HSI 2010

Rzeszow, Poland, May 13-15, 2010

Human Action Recognition using 4W1H and
Particle Swarm Optimization Clustering

Leon Palafox, Hideki Hashimoto
The University of Tokyo, Institute of Industrial Science, Tokyo, Japan
leon@hlab.iis.u-tokyo.ac.jp, hashimoto@iis.u-tokyo.ac.jp

Abstract. Tracking and recording human activities has been a major interest in the iSpace. For this purpose, different recognition and clustering techniques have been used, such as Learning Classifier Systems and data mining techniques. These techniques share a common dependence on databases, and little effort has gone into making the system understand the way humans were behaving at a given time in the space. Using Artificial Intelligence techniques, we present a system that reads and classifies user-object activity.
Keywords: 4W1H, Clustering, Machine Learning, PSO

I. INTRODUCTION
Advances in networking, computing, sensor technology and robotics allow us to create more convenient environments for humans. In that context, the Intelligent Space (iSpace) concept was proposed. The iSpace is a space that has ubiquitous distributed sensory intelligence and actuators for manipulating the space and providing useful services (Fig 1). It can be regarded as a system that is able to support humans, i.e. users of the space, in various ways. Actuators provide physical services as well as information to people in the space, while sensors are used for observing the space and gathering information. [1]
The iSpace consists of three functions: observing, understanding, and acting. The observing function is the most important one, because it delivers the information needed to know what kind of services are required. Conventionally, the focus of observation has been limited to humans and robots, but there are a large number of objects in our living environment, and thus we need to analyze the relations the users have with them. [2] In order to offer appropriate services using the objects, not only the physical information of the object but also the kind of use the human gives to it needs to be known. Such information cannot be written beforehand and is only provided by observing the interaction. In other words, the relations among humans and objects are important; thus we use 4W1H, a paradigm in which the When, Who, What, Where and How variables of the environment are sensed and used to determine the interaction information of the elements within the space. [3]
In the whole scope of human activity sensing, the focus has mostly been on sensing broad movements such as walking, sitting and lying down. Activities performed within confined spaces and sensed with accelerometers have not received enough attention, mainly because they require the user to wear sensors, and using a large number of them may be invasive. [4]
Past works have shown that, in order to interpret the How element of the 4W1H, a Self-Organizing Map (SOM) can be used for movement training and recognition. [5] Yet organizing and categorizing all the information retrieved from the sensors is a necessity if we are to construct a real human action tracking algorithm.
Then, in order to classify and use the data retrieved by the 4W1H sensing paradigm, we resort to a clustering technique that is both flexible and reliable, allowing us to perform dynamic clustering around the different variables of our system.

Fig. 1. iSpace sensor actuator architecture


Particle Swarm Optimization (PSO) Clustering has proven to be robust when clustering untrained elements, and has a good convergence rate when clustering pixels. [6][7]
This paper presents an implementation of the 4W1H sensing technique, clustered by PSO Clustering, to achieve activity recognition within a confined space, such as a desk or a bed.
The paper is organized as follows: first we introduce the basic concepts behind 4W1H and the Particle Swarm Optimization Clustering technique, then we describe the hardware used for sensing, and finally we present the algorithm as well as some experimental results.

II. THEORETICAL FRAMEWORK


A. 4W1H
Among the activities in the iSpace, the one this paper focuses on is observation. Such a system therefore needs an observation system that is both versatile and robust, able to sense every significant variable that reflects a change in the environment. We need an observation system that can in some way relate the users to the objects, so that the information we obtain is both accurate and relevant to the current activities of the human in the space.
On the other hand, there is information that arises only after a person uses the object, such as the use history or the movement history, i.e. how the interaction with the object was performed in a given time. Such information is vast, and considering the cost, it is not realistic to describe the use history of the large number of objects that exist in the space beforehand. Therefore, it is necessary that the object's information is written automatically, without human intervention, when the object is used by a human.
The 4W1H paradigm is a data acquisition technique in which we categorize the data in our space in order to greatly reduce the input data and thus the processing time. There is no point in tracking every possible variable in the environment when the most basic variables are enough to determine the user's usage history and to interpret human actions in a confined space.
Fig. 2. 4W1H space defined by its five elemental variables (e.g. Night, Working, Cup, User, Desk)
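To make the paradigm concrete, a single 4W1H observation can be sketched as a small record holding the five variables. This is an illustrative sketch only: the field types and the example values below are our own assumptions, not part of the original system.

```python
from dataclasses import dataclass
from datetime import datetime

@dataclass
class Observation4W1H:
    """One sensed human-object interaction, reduced to the 4W1H variables."""
    where: str      # position of the object in the space (e.g. a desk)
    who: str        # user of the object (e.g. an RFID tag ID)
    what: str       # ID of the object itself
    when: datetime  # time at which the object was used
    how: str        # label describing the way the object was used

# Example: a user drinking from a cup at a desk at night (hypothetical values).
event = Observation4W1H(where="Desk", who="User 1", what="Cup",
                        when=datetime(2010, 5, 13, 22, 0), how="Drink")
print(event.who, event.what, event.how)
```

A stream of such records, accumulated in a database, is what the clustering stage later operates on.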


Given the last statements, we try to describe human-object relations by following the use history of the object via the 4W1H tracking paradigm, in which we declare a number of significant variables considered to be the most important (Fig 2) when tracking the usage history of the objects. Those variables are:

Where: the position of the object in a given space
Who: the user of the object
What: the ID of the object
When: the time at which the object was used
How: the way in which the object was used

Each parameter provides information that allows us to know the location, object interaction and human activity in the room, and since all the information is fed to a database, we are able to analyze the history of use of every object, as well as the different activities humans were performing at certain moments in the space.

B. Particle Swarm Optimization Clustering
Particle swarm optimization is inspired by social optimization in nature: a group of particles each have a position, a velocity and a set of simple instructions. These individuals are candidate solutions. Each of them records its own best and its neighbors' best solutions to a given fitness function, as well as the global best solution of the whole particle population. [8]
The particles iteratively evaluate the fitness of the candidate solutions and remember the location where they had their best success. The individual's best solution is called the particle best or local best. The basic algorithm consists in updating the current particle velocity and position, based on the community's best results, towards a common optimum of the given function. The update equations are:

V_id(t+1) = w V_id(t) + C1 r1 (P_lid - X_id(t)) + C2 r2 (P_Gid - X_id(t))    (1)

X_id(t+1) = X_id(t) + V_id(t+1)

where C1 and C2 are constants that determine how strongly the particle is directed towards a promising position, w is an inertia constant that determines the freedom of the particle, and r1, r2 are random numbers between 0 and 1.
Using the basic principle of PSO, Omran et al. described [6] a clustering algorithm that proved to be as robust as K-means and, depending on the application, faster. In their algorithm the particle space is simply regarded as a set of candidate cluster centroids: each particle holds a set of K cluster centroids and is updated towards the best distribution of the centroids. The fitness function is defined as:

f(Z_i, M_i) = w1 d_max(M_i, X_i) + w2 (R_max - d_min(Z_i))    (2)

In the above expression, R_max is the maximum feature value in the dataset and M_i is the matrix representing the assignment of the patterns to the clusters of the i-th particle. Each element m_{i,k,p} indicates whether the pattern X_p belongs to cluster C_k of the i-th particle. The user-defined constants w1 and w2 weigh the contributions of the different sub-objectives. In addition:

d_min(Z_i) = min_{p,q: p ≠ q} { d(V_{i,p}, V_{i,q}) }    (3)

is the minimum Euclidean distance between any pair of cluster centroids, and

d_max(Z_i, M_i) = max_{k = 1,...,K} { Σ_{X_p ∈ C_{i,k}} d(X_p, V_{i,k}) / n_{i,k} }    (4)

is the maximum average distance of the patterns to their associated cluster centroid, where n_{i,k} is the number of patterns that belong to cluster C_{i,k} of particle i. The fitness function is thus a multi-objective optimization problem, which minimizes the distance within each cluster and maximizes the separation between clusters.
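The update and fitness equations above can be sketched in a short program. The following is a minimal NumPy illustration of the Omran-style PSO clustering, not the implementation used in this work; the toy data, particle count and iteration budget are arbitrary choices made for the sketch.

```python
import numpy as np

rng = np.random.default_rng(0)

def assign(data, centroids):
    """Assign each pattern to its nearest centroid; returns labels and distances."""
    d = np.linalg.norm(data[:, None, :] - centroids[None, :, :], axis=2)
    return d.argmin(axis=1), d

def fitness(data, centroids, r_max, w1=0.5, w2=0.9):
    """Eq. (2): f = w1*d_max + w2*(R_max - d_min); lower is better."""
    labels, d = assign(data, centroids)
    K = len(centroids)
    # Eq. (4): maximum average intra-cluster distance over non-empty clusters.
    d_max = max(d[labels == k, k].mean() for k in range(K) if np.any(labels == k))
    # Eq. (3): minimum Euclidean distance between any pair of centroids.
    d_min = min(np.linalg.norm(centroids[p] - centroids[q])
                for p in range(K) for q in range(p + 1, K))
    return w1 * d_max + w2 * (r_max - d_min)

def pso_step(x, v, p_best, g_best, w=0.7, c1=1.8, c2=1.8):
    """Eq. (1): velocity update, then position update, for one particle."""
    r1, r2 = rng.random(x.shape), rng.random(x.shape)
    v = w * v + c1 * r1 * (p_best - x) + c2 * r2 * (g_best - x)
    return x + v, v

# Toy run: two 2-D blobs, 5 particles, each particle holds K = 2 centroids.
data = np.vstack([rng.normal(0.0, 0.3, (20, 2)), rng.normal(5.0, 0.3, (20, 2))])
r_max = np.abs(data).max()
xs = data[rng.choice(len(data), (5, 2))]   # centroid positions per particle
vs = np.zeros_like(xs)
p_best, p_fit = xs.copy(), np.full(5, np.inf)
for _ in range(30):
    for i in range(5):
        f = fitness(data, xs[i], r_max)
        if f < p_fit[i]:
            p_fit[i], p_best[i] = f, xs[i].copy()
    g_best = p_best[p_fit.argmin()]
    for i in range(5):
        xs[i], vs[i] = pso_step(xs[i], vs[i], p_best[i], g_best)
```

Since the fitness is minimized, lower values correspond to compact, well-separated clusters; with w2 > w1, separation between centroids is weighted more heavily than intra-cluster compactness.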
III. HARDWARE

A. Hardware Description
To perform the experiments we used an MTx sensor from the company Xsens, a small and accurate 3-DOF inertial orientation tracker. It provides drift-free 3D orientation as well as kinematic data: 3D acceleration, 3D rate of turn (rate gyro) and 3D earth-magnetic field [9]. The system contains nine sensors which can be interlinked with each other in order to obtain a more complex set of data from one specific object, as well as to provide a good architecture for setting up referenced kinematic systems. [9]
The data is retrieved using the Matlab toolbox that comes with the product, which allows us to acquire all the needed data from the sensors in real time.
The computer used had an Intel Core 2 Duo processor running Windows XP Professional Edition and Matlab 7.0.
We also used RFID tags, each equipped with a rough accelerometer, for the implementation of the 4W1H architecture.

B. Hardware Disposition
To perform the experiments we asked the users to wear two sets of sensors. First we attached the Xsens sensing system to each of their wrists (Fig 3). Then each of them wore an RFID tag paired with an ID in a database, which allows us to know which user is performing the current actions.
We fixed tags to different objects in the space as well; as with the users, these are paired in a database in order to keep track of their usage. Whenever an object is moved, the accelerometer in the tag allows us to know when and which object was moved.

Fig. 3. Sensor setting on the body (MTx and ZPS sensors, with X, Y, Z axes)

We used a fixed-IP computer network to be able to track where the user was performing his current work. We had every computer set to send a wakeup alarm to the server whenever a new user logged in.

IV. ALGORITHM DESCRIPTION

The designed algorithm consists of two main blocks (Fig 4): the sensing segment, in which we retrieve the data from the set of sensors, and the clustering segment, in which we apply the PSO Clustering technique to obtain an interpretation of the current or past activity of the user.
In the first part we obtain the data from our set of sensors and, depending on the sensor the data comes from, we allocate the sensed values to each of the variables of the 4W1H architecture. RFID tags feed the Who and What variables, the IP of the machine feeds the Where variable, a running clock feeds the When variable and, finally, the How variable is fed by a previously trained SOM whose results and conditions are discussed in [5].

Fig. 4. Algorithm description (sensors feeding the Where, Who, What, When and How variables into the PSO Clustering block)

Each variable of the 4W1H feeds a PSO Clustering algorithm trained beforehand to recognize a certain set of variables: users, times, objects, etc. The PSO Clustering is tuned beforehand using similar variables with a set of predefined users.
The clustering objective is to categorize new data that enters the sensors without having been previously trained on: it will be detected and properly clustered for recognition by the detection system, so that a small number of users can train a system for a larger number of users, and a small set of objects can define a system for any number of objects, as long as they are used in similar ways.
At the end, the output of the algorithm is a graphical representation of the data clusters with their centroids; each of these centroids, depending on the query, will show clustering around time, user, objects, etc.
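The sensing block described above can be sketched as a simple dispatcher that allocates raw readings to the five variables. This is an illustrative sketch only: the IP-to-place map and the `classify_motion` placeholder, which stands in for the trained SOM of [5], are our own assumptions, not the actual system.

```python
from datetime import datetime

def classify_motion(accel_window):
    """Stand-in for the SOM-based motion profile detector of [5].
    A trained SOM would map the acceleration window to a motion label;
    here we merely threshold the mean magnitude as a placeholder."""
    mean_mag = sum(abs(a) for a in accel_window) / len(accel_window)
    return "Work" if mean_mag > 1.0 else "Read"

def build_4w1h(rfid_user, rfid_object, machine_ip, accel_window, clock=None):
    """Allocate raw sensor readings to the five 4W1H variables."""
    # Fixed-IP map of machines to places (hypothetical addresses).
    ip_to_place = {"192.168.0.10": "Desk", "192.168.0.11": "Bed"}
    return {
        "Who":   rfid_user,                      # RFID tag of the user
        "What":  rfid_object,                    # RFID tag of the object
        "Where": ip_to_place.get(machine_ip, "Unknown"),
        "When":  clock or datetime.now(),        # running clock
        "How":   classify_motion(accel_window),  # SOM output in the real system
    }

record = build_4w1h("User 1", "Keyboard", "192.168.0.10", [1.2, 1.5, 0.9])
print(record["Where"], record["How"])
```

Each record produced this way would then be handed to the PSO Clustering block.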

V. EXPERIMENTS
A. Experiment Design
When designing the sensing experiments, we defined a set of actions that were found to be the most common ones in the settings we chose. As justified before, we use a small set of actions, since the objective is to test the training capabilities of the system.
Table 1 shows the sets of users, actions, places, etc. that we chose for the experiments. This data is used to train the SOM [5] for the How segment of the algorithm, as well as for the final training.
We had each user perform a set of each action with different objects, at different times and in different places, in such a way that no two users have the same set of activities. The objective is to recognize data from a user in a place for which he did not provide samples; for example, if User 1 never performed a reading action in the Bed, the system would still be able to recognize it, since it can detect the reading action from that user regardless of the place where he performs it.
TABLE 1. VARIABLE SELECTION

User:   User 1, User 2, User 3, User 4, User 5, User 6, User 7, User 8, User 9, User 10
Action: Drink, Work, Texting, Read
Place:  Desk, Bed, Table
Time:   Morning, Noon, Afternoon, Night
Object: Cup, Glass, Mouse, Keyboard, Mobile, Book, Magazine
We also selected a set of objects for each user to interact with; since time constraints keep every user from using every object, we had different sets of users interacting with different objects.
B. Parameter tuning and selection
For the particle swarm optimization algorithm we chose the conventionally used C1 = C2 = 1.8, since values near 2 have been shown to give good results in the past [6]. W has a static value of 0.7, since we want to keep the particles from varying too much between iterations, and the random numbers r1 and r2 are drawn between 0 and 1.
For the clustering algorithm we set the weight constants w1 and w2 to 0.5 and 0.9 respectively, giving priority to a well-defined separation between clusters rather than to the intra-cluster distance.
The experiment parameters were mostly selected after an empirical set of tests with the collected data, so that they show the best possible results; different parameters may be needed if a different set of data were presented to the system.
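With these values, a single application of equation (1) can be checked numerically. The particle state below (position, velocity and best positions) is an arbitrary illustration, not data from the experiments.

```python
# Chosen constants from the text: inertia W = 0.7, attraction C1 = C2 = 1.8.
w, c1, c2 = 0.7, 1.8, 1.8

def update(x, v, p_local, p_global, r1, r2):
    """Eq. (1) with the tuned constants; r1, r2 are drawn uniformly in [0, 1]."""
    v_new = w * v + c1 * r1 * (p_local - x) + c2 * r2 * (p_global - x)
    return x + v_new, v_new

# One illustrative step on a single scalar dimension
# (r1 and r2 fixed at 0.5 here only to make the step reproducible).
x_new, v_new = update(x=2.0, v=0.5, p_local=3.0, p_global=4.0, r1=0.5, r2=0.5)
print(x_new, v_new)
```

The particle keeps 70% of its previous velocity and is pulled towards both its local and global best positions, which is the behavior the inertia and attraction constants tune.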

C. Results
Table 2 shows the results of the system: the first column is the variable we clustered around, the second the number of users the system was trained with, then the error rate, and finally the time elapsed for the data to be clustered.
TABLE 2. RESULT TABLE USING DIFFERENT VARIABLES

Cluster   #Users   Error %   Time [s]
Time         5        0         3
Time        10        0         4
Object       5        5         3
Object      10        2         4
User         5       10         3
User        10        0         4
Action       5        7         3
Action      10        2         4
Places       5        0         3
Places      10        0         4
We can clearly see in the table how a larger number of trained users translates directly into fewer errors when clustering, since more data was fed to the system. Fewer trained users generate errors because the objects and actions were randomly assigned during sensing, so some actions or objects might not have been trained at all for some specific test. This is clearly reflected when clustering around objects and actions, these being the two clusters with the highest error rates.
Likewise, when clustering around Time and Places we found no mistakes, since we arranged the sensing so that elements of these variables were always present; that is, at least one sensing was done in each of them, so there was always data related to them. The system also senses these variables very directly, so clustering them correctly presents little challenge.
Finally, it is worth mentioning that the error related to users was, as expected, zero when all of our users used the system, but it became 10%, a rather high number, when only half of them trained the system. This is probably because some users have an entirely different way of performing actions, causing the system to misfire more often.
In Figure 5 we plot the clustering around places, using a total of 10 users and a particle set of 4 centroids. While the centroids are not directly over the cluster centers, they clearly separate the four sensed variables; this is because, in equation (2), we weighted the system towards differentiating the clusters rather than keeping the centroids close to their particles.

Fig 5. Cluster result around place variable

In Figure 6 we present the clustering graph obtained when the attention centre was time and we selected 4 centroids. We did this to show (in green) how the system discriminates an excessive number of centroids; the restricting variables we chose prevent the extra centroids from going farther than the set limits.

Fig 6. Cluster result around time variable

This is an important point, because an excessive number of centroids may translate into an excessive number of misfires. Referring again to Figure 6, the points near the green centroids might be counted as errors, since they belong to an undefined centroid; a mechanism to prevent this from happening is yet to be designed.

VI. CONCLUSION
We have presented a human action recognition system using 4W1H for sensing the variables, followed by clustering with a technique based on the PSO optimization algorithm originally used for pixel clustering.
We have shown that, while having some problems identifying untrained variables, the system proved robust enough to correctly detect most of the trained data without errors. The system lacks the non-invasive quality that remote sensing via video cameras might provide.
Future work will extend the system to the use of cameras for object detection, as well as a face recognition algorithm, to avoid the use of ID tags for every object; we are also going to improve the training time by using compressed or undersampled signals.

REFERENCES
[1] J.H. Lee and H. Hashimoto, "Intelligent Space concept and contents", Advanced Robotics, vol. 16, 2002, pp. 265-280.
[2] T. Sasaki and H. Hashimoto, "Human Observation Based Mobile Robot Navigation in Intelligent Space", Proc. of IEEE/RSJ Int. Conf. on Intelligent Robots and Systems, 2006, pp. 1044-1049.
[3] M. Niitsuma, K. Yokoi, and H. Hashimoto, "Describing Human-Object Interaction in Intelligent Space", Proceedings of the 2nd Conference on Human System Interaction, pp. 392-396, Catania, Italy.
[4] J. Yamato, J. Ohya, and K. Ishii, "Recognizing human action in time-sequential images using hidden Markov model", Proc. Computer Vision and Pattern Recognition, 1992, pp. 379-385.
[5] L. Palafox and H. Hashimoto, "A Movement Profile Detection System Using Self Organized Maps in the Intelligent Space", IEEE Workshop on Advanced Robotics and its Social Impacts, Tokyo, Japan, 2009, p. 114.
[6] M. Omran, A. Salman, and A.P. Engelbrecht, "Image classification using particle swarm optimization", Proceedings of the 4th Asia-Pacific Conference on Simulated Evolution and Learning, 2002, pp. 370-374.
[7] A. Abraham, S. Das, and S. Roy, "Swarm intelligence algorithms for data clustering", in Soft Computing for Knowledge Discovery and Data Mining, O. Maimon and L. Rokach (Eds.), Springer Verlag, Germany, 2007, pp. 279-313.
[8] J. Kennedy, R.C. Eberhart, et al., "Particle Swarm Optimization", Springer, 2007.
[9] Xsens, http://www.xsens.com/