


technologies
Article
Dance Pose Identification from Motion Capture Data:
A Comparison of Classifiers †
Eftychios Protopapadakis 1,*, Athanasios Voulodimos 2, Anastasios Doulamis 1,
Stephanos Camarinopoulos 3, Nikolaos Doulamis 1 and Georgios Miaoulis 2
1 School of Rural and Surveying Engineering, National Technical University of Athens,
15780 Zografou, Greece; adoulam@cs.ntua.gr (A.D.); ndoulam@cs.ntua.gr (N.D.)
2 Department of Informatics, Technological Educational Institute of Athens, 12243 Egaleo, Greece;
avoulod@teiath.gr (A.V.); gmiaoul@teiath.gr (G.M.)
3 RISA Sicherheitsanalysen GmbH, 10707 Berlin, Germany; s.camarinopoulos@risa.de
* Correspondence: eftprot@mail.ntua.gr
† This paper is an extended version of our paper published in Proceedings of the 10th International
Conference on PErvasive Technologies Related to Assistive Environments (PETRA 2017),
Island of Rhodes, Greece, 21–23 June 2017.

Received: 18 December 2017; Accepted: 6 March 2018; Published: 9 March 2018

Abstract: In this paper, we scrutinize the effectiveness of classification techniques in recognizing


dance types based on motion-captured human skeleton data. In particular, the goal is to identify poses
which are characteristic for each dance performed, based on information on body joints, acquired
by a Kinect sensor. The datasets used include sequences from six folk dances and their variations.
Multiple pose identification schemes are applied using temporal constraints, spatial information,
and feature space distributions for the creation of an adequate training dataset. The obtained results
are evaluated and discussed.

Keywords: pose identification; Kinect sensor; motion capture device; classification; dance analysis

1. Introduction
Intangible cultural heritage (ICH) is a major element of peoples’ identities and its preservation
should be pursued along with the safeguarding of tangible cultural heritage. In this context, traditional
folk dances are directly connected to local culture and identity [1]. Recent technological advancements,
including ubiquitous mobile devices and applications [2], pervasive video capturing sensors and
software, increased camera and display resolutions, cloud storage solutions, and motion capture
technologies have completely changed the landscape and unleashed tremendous possibilities in
capturing, documenting and storing ICH content, which can now be generated at a greater volume and
quality than ever before. However, in order to exploit the full potential of the massive, high-quality
multimodal (text, image, video, 3D, mocap) ICH data that are becoming increasingly available, we need
to appropriately adapt state-of-the-art technologies, and also build new ones, in the fields of artificial
intelligence (AI), computer vision, and image processing. Such progress is essential for the ICH—in our
case, dance—content’s efficient and effective organization and management, fast indexing, browsing,
and retrieval, but also semantic analysis, such as automatic recognition [3,4] and classification [5,6].
Furthermore, the advent of motion sensing devices and depth cameras has brought about new
possibilities in applications related to motion analysis and monitoring, including human tracking,
action recognition, and pose estimation. The main advantage of a depth camera is the fact that it
produces dense and reliable depth measurements, albeit over a limited range, while offering a good
balance between usability and cost. The Kinect sensor has been frequently used in such applications and will be

Technologies 2018, 6, 31; doi:10.3390/technologies6010031 www.mdpi.com/journal/technologies



employed in this work to capture sets of dance moves and gestures in 3D space and in real time,
resulting in a recorded sequence of points in 3D space for each joint at certain moments in time.
This paper focuses on the evaluation of classification algorithms on Kinect-captured skeleton
data from folkloric dance sequences for dance pose identification. We explore the applicability of
raw skeleton data from a single low-cost sensor for determining dance genres through well-known
classifiers. This paper extends the work presented in [7], in that multiple pose identification schemes
are applied using temporal constraints, spatial information, and feature space distributions.
The remainder of this paper is structured as follows: Section 2 briefly reviews the state of the art
in the field; Section 3 describes the methodology employed for motion capturing, data preprocessing
and feature extraction, while Section 4 presents the classifiers whose applicability for dance pose
identification are explored; the related experimental evaluation is given in Section 5; and, finally,
Section 6 concludes the paper with a summary of findings.

2. Related Work
Starting with a brief review of approaches proposed for the more general problem of human pose
estimation in computer vision, one could note that many techniques are based on the detection of
body parts, for example, through pictorial structures [8]. The advent of deep learning [9] has brought
forward two main groups of methods: holistic and part-based ones, which differ in the way the input
images are processed. The holistic processing methods do not create a separate model for every part.
DeepPose [10] is a holistic model that handles pose determination as a joint regression problem without
formulating a graphical model. A drawback of holistic-based methods is that they are often inaccurate
in the high-precision region due to the difficulty in learning direct regression of complicated posture
vectors based on images.
Part-based processing methods focus on detecting the human body parts individually, followed
by a graphic model to incorporate the spatial information. In [11], the authors, instead of training
the network using the whole image, use the local part patches and background patches to train
a convolutional neural network (CNN), in order to learn conditional probabilities of the part presence
and spatial relationships. In [12], a multiresolution CNN is designed to carry out body-part-specific
heat-map likelihood regression, which is subsequently followed by an implicit graphic model for
assuring joint consistency.
As regards the more specific field of dance pose and move analysis, there is a relatively limited
number of works. In [13], a gesture classification system for skeletal wireframe motion is described,
recognizing certain gestures, among several dozen, in real time and with high accuracy. In [14], a simple
non-parametric Moving Pose framework is proposed, for low-latency human action and activity
recognition. A method to recognize individual persons from their walking gait using 3D skeletal
data from a MS Kinect device using the k-means algorithm is described in [15], while a key posture
identification method is proposed in [16].
In [17], a methodology is proposed for dance learning and evaluation using multi-sensor and
3D gaming technology. In [18], a 3D game environment for dance learning is presented, which is
based on the fusion of multiple depth sensors data in order to capture the body movements of the
user/learner. In [19], improved robustness of skeletal tracking is achieved by using sensor data fusion
to combine skeletal tracking data from multiple sensors. The fused skeletal data is split into different
body parts, which are then transformed to allow view invariant pose recognition using a Hidden
State Conditional Random Field (HCRF). The proposed framework is tested on traditional “Tsamiko”
folk dance sequences. The attained recognition rates range from 38.4% up to 93.9% depending on
the particularities of the dancer and the experimental setup. In [20], a skeletal representation of the
dancer is again obtained by using data from multiple depth sensors. Using this information, the dance
sequence is partitioned, first, into periods and, subsequently, into patterns.
In [21], human action recognition is treated as a special case of the general problem of
classifying multidimensional time-evolving data in dynamic scenes. To capture the correlations
between channels, a generalized form of a stabilized higher-order linear dynamical system is
employed and the multidimensional signal is represented as a third-order tensor. The work of [22]
focuses on the application of segmentation and classification algorithms to Kinect-captured depth
images and videos of folkloric dances in order to identify key movements and gestures and compare
them against database instances. However, that work considers individual joints for the analysis,
rather than the entire body pose, attaining recognition rates up to 42% in the general case.
The contribution of the paper at hand is twofold. Firstly, it extends the work of [7] by exploiting
the information of multiple joints simultaneously and investigates whether temporal dependencies
can be modeled using consecutive frame subtraction. Secondly, unlike [7], multiple pose
identification schemes are applied using temporal constraints, spatial information, and feature
space distributions.
distributions.
3. Data Capturing and Dance Representation
A three-step approach is adopted for the evaluation of dance patterns over traditional folk dances:
(i) motion capturing, (ii) data preprocessing and feature extraction, followed by (iii) comparative
evaluation among well-known classification techniques. Motion capturing is performed using
markerless, low-cost sensors. The motion sensors provide as output the position and the rotation of
specific body joints at a constant frame rate. The available information is processed to form low-level
features which will be used as inputs to the dance recognition mechanism. The problem at hand,
i.e., dance recognition, constitutes a traditional multi-class classification problem. Given a frame,
or sequence of frames, during the performance of a dancer, our goal is to correctly identify which
dance is performed.

3.1. Capturing Dance Poses
Microsoft Kinect 2 is currently one of the most advanced motion sensing input devices that is
available to the public. It is a physical device with depth sensing technology, a built-in color camera,
an infrared (IR) emitter, and a microphone array, which projects and captures an infrared pattern to
estimate depth information. Based on the depth map data, the human skeleton joints are located and
tracked via the Microsoft Kinect 2 for Windows SDK [23]. Figure 1 shows a snapshot of the
conducted experiment.

Figure 1. The dance capturing process. The image on the left demonstrates the sensor position. On the
right, we can see the dancer while acting.
More specifically, the Microsoft Kinect 2 sensor can achieve real-time 3D skeleton tracking while,
at the same time, being relatively cheap and easy to set up and use. The tracked skeleton consists of
twenty-five joints, each including the 3D position coordinates, its rotation, and a tracking state
property: "Tracked", "Inferred", and "UnTracked" [24]. Moreover, the sensor can work in dark and
bright environments and the capture frame rate is 30 fps. On the other hand, there are some
limitations that should be considered: it is designed to track the front side of the user and, as a result,
the front and back sides of the user cannot be distinguished, and the movement area is limited
(approximately 0.7–6 m). In this work, the entirety of the captured joints have been used in the feature
extraction process (see [2] for a graphical presentation of the exact joints).

3.2. Identifying Key Poses


There are multiple sources of variation when investigating dance patterns. First, there are
temporal variations that affect the movement speed, caused mainly by the music tempo.
Another source of variance is the dancer himself. Body build varies from person to person;
as such, joint positions span different spaces despite the same choreography. Furthermore, dancer
mentality often adds a personalized touch to the performance. In dances with pre-defined steps,
this leads to minor changes in positioning, e.g., different hand movements, legs bending more than
expected, and denser body rotations.
When building analytical predictive models for dance analysis, all possible sources of variation
must be included in the training dataset, so that the model can provide reliable predictions for new
instances; the training data should have a greater variation in feature attributes than the data to be
analyzed. Crucial factors influencing the predictive performance of classification models involve
outliers, low-quality features, as well as differences in the size of the classes. In our case, three sampling
approaches for the creation of an adequate training dataset are employed: temporally-constrained,
cluster-based, and uniform feature space selection.
Temporally-constrained selection divides a dance sequence into consecutive clusters by taking
into account factors related to both the dance itself and the motion capture device parameters. In each
of the initially-created clusters, a density-based approach, i.e., OPTICS algorithm output analysis,
identifies possible outliers and representative samples. Since similar instances are likely to be clustered
together, a few random samples from each cluster are expected to provide adequate information
over the entire dataset. In this context, density-based approaches have often been employed for more
effective data selection [25].
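The idea behind this density-based selection can be sketched as follows. This is a minimal illustration using scikit-learn's OPTICS on synthetic data; the feature dimensionality (21), cluster layout, and number of samples kept per cluster are assumptions for the sketch, not the authors' exact configuration:

```python
import numpy as np
from sklearn.cluster import OPTICS

rng = np.random.default_rng(0)
# Synthetic stand-in for pose feature vectors: three dense groups of frames.
frames = np.vstack([rng.normal(loc=c, scale=0.3, size=(100, 21))
                    for c in (0.0, 3.0, 6.0)])

# Density-based clustering; the label -1 marks possible outliers.
labels = OPTICS(min_samples=10).fit(frames).labels_

representatives = []
for label in set(labels) - {-1}:
    members = np.flatnonzero(labels == label)
    # A few random samples per dense cluster summarize the whole dataset.
    representatives.extend(rng.choice(members, size=3, replace=False))
```

Frames flagged as outliers (label -1) are simply never sampled, which is the practical benefit of the density-based step.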
The classic Kennard Stone algorithm is a uniform mapping algorithm; it yields a flat distribution
of the data. It is a sequential method that uniformly covers the experimental region. The procedure
consists of selecting, as the next sample (candidate object), the one that is most distant from those
already selected (calibration objects). For initialization, one can select either the two observations that
are most distant from each other, or, preferably, the one closest to the mean.
From all the candidate points, the one that is furthest from those already selected is chosen and
added to the set of calibration points. To do this, we measure the distance from each candidate
point x_i0 to each already-selected point x_i and determine the smallest of these distances,
i.e., min_i d(x_i, x_i0). From the candidates we then select the one for which this distance is maximal:

d_selected = max_i0 { min_i d(x_i, x_i0) }.    (1)
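This max-min rule of Equation (1) can be sketched compactly; the following is a generic illustrative implementation (the paper's MATLAB tooling and the mean-based initialization variant may differ), here initialized with the two most distant observations:

```python
import numpy as np

def kennard_stone(X, n_select):
    """Select n_select rows of X with the Kennard-Stone max-min criterion."""
    X = np.asarray(X, dtype=float)
    # Pairwise Euclidean distances between all observations.
    d = np.linalg.norm(X[:, None, :] - X[None, :, :], axis=-1)
    # Initialize with the two most distant observations.
    selected = [int(i) for i in np.unravel_index(np.argmax(d), d.shape)]
    while len(selected) < n_select:
        remaining = [i for i in range(len(X)) if i not in selected]
        # Distance from each candidate to its closest calibration point...
        nearest = d[np.ix_(remaining, selected)].min(axis=1)
        # ...then keep the candidate for which that distance is maximal (Eq. (1)).
        selected.append(remaining[int(np.argmax(nearest))])
    return selected

pts = np.array([[0.0, 0.0], [0.1, 0.0], [5.0, 5.0], [0.0, 5.0], [5.0, 0.0]])
print(kennard_stone(pts, 4))  # → [0, 2, 3, 4]
```

Note how the near-duplicate point at (0.1, 0) is selected last, which is exactly the uniform-coverage behavior the algorithm is chosen for.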

4. Classifiers for Dance Pose Identification


We have scrutinized the effectiveness of a series of well-known classifiers in dance recognition
from skeleton data. In this section, the investigated classification techniques are briefly described.

4.1. k Nearest Neighbors


The k-nearest neighbors (k-NN) algorithm is a non-parametric method used for classification [26].
A majority vote of its neighbors classifies an object, with the object being assigned to the class most
common among its k nearest neighbors; it is, therefore, a type of instance-based learning, where the
function is only approximated locally and all computation is deferred until classification.

4.2. Naïve Bayes


Naive Bayes (NB) classifiers are a family of simple probabilistic classifiers based on applying
Bayes’ theorem with strong independence assumptions between the features [27]. Abstractly, naive
Bayes is a conditional probability model: given a problem instance to be classified, represented by
a vector x = {x1 , . . . , xn } representing some n features (independent variables), it assigns to this
instance probabilities p(Ck |x1 , . . . , xn ) for each of K possible outcomes or classes Ck .
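As a brief illustration of these class posteriors (using scikit-learn's GaussianNB on synthetic two-class data as a stand-in, not the paper's MATLAB setup):

```python
import numpy as np
from sklearn.naive_bayes import GaussianNB

rng = np.random.default_rng(3)
# Two synthetic classes with clearly separated feature distributions.
X = np.vstack([rng.normal(0.0, 1.0, size=(100, 4)),
               rng.normal(3.0, 1.0, size=(100, 4))])
y = np.repeat([0, 1], 100)

nb = GaussianNB().fit(X, y)
# Posterior probabilities p(C_k | x1, ..., xn) for a single instance.
print(nb.predict_proba([[0.0, 0.0, 0.0, 0.0]]))
```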

4.3. Discriminant Analysis


Discriminant analysis is a statistical analysis method useful in determining whether a set of
variables is effective in predicting category membership. Discriminant analysis (Discr) classifiers
assume that different classes generate data based on different Gaussian distributions [28] so that
p(x|y = C_k) ~ N(µ_k, Σ), k = 1, ..., K. In order to train such a classifier, we need to estimate the
parameters of a Gaussian distribution for each class. Then, to predict the classes of new data, the trained
classifier finds the class with the smallest misclassification cost:

ŷ = argmax_k { (x − µ_k/2)^T β_k + log π_k },

where β_k = Σ^{-1} µ_k and π_k is the prior probability of class C_k.
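A hedged sketch of this classifier in action, with scikit-learn's LinearDiscriminantAnalysis on synthetic data that matches the model's assumption of Gaussian classes with a shared covariance (an illustration, not the paper's implementation):

```python
import numpy as np
from sklearn.discriminant_analysis import LinearDiscriminantAnalysis

rng = np.random.default_rng(1)
# Two Gaussian classes sharing a covariance, matching the model assumption.
X = np.vstack([rng.normal(0.0, 1.0, size=(200, 3)),
               rng.normal(2.0, 1.0, size=(200, 3))])
y = np.repeat([0, 1], 200)

lda = LinearDiscriminantAnalysis().fit(X, y)
# Instances at the two class means are assigned to their respective classes.
print(lda.predict([[0.0, 0.0, 0.0], [2.0, 2.0, 2.0]]))  # → [0 1]
```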

4.4. Classification Trees


Decision tree learning uses a decision tree as a predictive model which maps observations about an
item to conclusions about the item’s target value. In classification tree structures, leaves represent class
labels and branches represent conjunctions of features that lead to those class labels [27]. Each internal
(non-leaf) node is labeled with an input feature. The arcs coming from a node labeled with a feature
are labeled with each of the possible values of the feature. Each leaf of the tree is labeled with a class or
a probability distribution over the classes.

4.5. Ensemble Methods


Ensembles of classifiers are, in essence, a combination-of-classifiers approach [29]; such methods use
multiple learning algorithms to obtain better predictive performance than could be obtained from any
of the constituent learning algorithms alone. In the case of classification trees, we have further used
the random forests algorithm (denoted as TreeBagger).

4.6. Support Vector Machines


Support vector machines (SVMs) are supervised learning models with associated learning
algorithms [30]. An SVM model is a representation of the examples as points in space, mapped
so that the examples of the separate categories are divided by a clear margin that is as wide as possible.
New examples are then mapped into that same space and predicted to belong to a category based on
which side of the margin they fall on.
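A comparison of the kind described in this section can be set up as follows. This is an illustrative sketch with scikit-learn stand-ins for the MATLAB implementations used in the paper, and synthetic six-class data in place of the skeleton features (the sample and feature counts are assumptions):

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import train_test_split
from sklearn.naive_bayes import GaussianNB
from sklearn.neighbors import KNeighborsClassifier
from sklearn.svm import SVC
from sklearn.tree import DecisionTreeClassifier

# Synthetic stand-in for the six-dance classification problem.
X, y = make_classification(n_samples=600, n_features=21, n_informative=10,
                           n_classes=6, random_state=0)
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2,
                                                    random_state=0)

models = {
    "k-NN": KNeighborsClassifier(n_neighbors=5),
    "NB": GaussianNB(),
    "CT": DecisionTreeClassifier(random_state=0),
    "SVM": SVC(kernel="linear"),
    "TreeBagger": RandomForestClassifier(n_estimators=100, random_state=0),
}
for name, model in models.items():
    # Fit each classifier and report its holdout accuracy.
    score = model.fit(X_train, y_train).score(X_test, y_test)
    print(f"{name}: {score:.2f}")
```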

5. Experimental Results
In order to capture and record the performers’ body motions, we used a motion capture system
using one Kinect 2 depth sensor and the i-Treasures Game Design module (ITGD), developed in the
context of the i-Treasures project [31]. The ITGD module enables the user to record and annotate
motion capture data received from a Kinect sensor.
The recording process took place at the School of Physical Education and Sport Science of the
Aristotle University of Thessaloniki. Six Greek traditional dances with a different degree of complexity
were recorded. Each dance was performed by three experienced dancers twice: the first time in
a straight line, and the second in a semi-circular curving line. Dancers’ movements were limited in
a predefined rectangular area.

Experimental results are based on a set of 648 observations. A dancer is selected to provide
the training paradigms, while the remaining two provide the test sets. A few representative data samples
are selected (see Section 5.3) to form the training set, using three different sampling approaches.
Then, we have a total of eight test sets, using a 20% holdout approach in each. The problem at hand is
a standard multiclass classification problem. We have six classes (as the number of dances).
The investigation emphasizes dance recognition per recorded frame. The performance
impact of the following factors is investigated:

1. Classifier input type: related to the input features’ values. The possible alternatives for the
creation of input features are four: (i) leg joints per frame (1Fr Legs), (ii) leg joints and frame
difference (FrDiff Legs), (iii) all joints per frame (1Fr All) and, (iv) all joints and frame difference
(FrDiff All).
2. Projection techniques: related to the dimensionality of inputs. There are two alternatives: PCA or
raw data.
3. Sampling approaches: related to training sets creation. There are three approaches: (i) random
sampling over kmeans clusters (K-random), (ii) time constrained OPTICS (TC-OPTICS),
and (iii) Kennard Stone (KenStone).
4. Classifier: the classification technique used, i.e., k-nearest neighbors (k-NN), naïve Bayes
(NB), classification trees (CT), linear kernel support vector machines (SVMs), a random forest
approach (TreeBagger), as well as Ensemble (Ens) versions.
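Factors 1 and 2 above can be sketched together as a small feature pipeline. This is an illustration of the frame-difference and PCA steps on random stand-in data (the frame count is an assumption; the 216 per-frame values and the 99.1% variance threshold follow the paper):

```python
import numpy as np
from sklearn.decomposition import PCA
from sklearn.preprocessing import MinMaxScaler

rng = np.random.default_rng(2)
# Toy pose sequence: 300 frames x 216 per-frame values (24 joints x 9 values).
poses = rng.normal(size=(300, 216))

# "FrDiff"-style input: each frame concatenated with its difference
# from the next frame, to encode short-term temporal information.
frame_diff = np.hstack([poses[:-1], poses[1:] - poses[:-1]])

# Min-max normalization followed by PCA keeping 99.1% of the variance.
scaled = MinMaxScaler().fit_transform(frame_diff)
reduced = PCA(n_components=0.991).fit_transform(scaled)
print(frame_diff.shape, reduced.shape[0])  # (299, 432) 299
```

On the real, strongly correlated skeleton data this projection is far more compressive than on random noise, which is the point of applying it.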

In order to identify the impact of each parameter on the final classification scores, ANOVA
analysis has been performed (Section 5.5).

5.1. Dataset Description


The dances dataset consists of six different dances. Their execution was either in a straight line or
circle (Table 1). A set of consecutive image frames describes every dance. Every frame, I_i, i = 1, ..., n, has
a corresponding extensible mark-up language (XML) file with positions, rotations, and confidence scores
for 25 joints on the body, in addition to timestamps. In the following a brief description of the dances
is provided.
Enteka (eleven): A dance, performed by both women and men, which is popular mainly in the
large urban centers of Western Macedonia (Grevena, Kozani, Florina, Kastoria, etc.). The dance is
performed freely as a street carnival dance, but also around the carnival fires. The dancers’ hands
during the dance move freely or are placed at the waist.
Kalamatianos: It is a popular Greek folk dance throughout Greece, Cyprus and internationally, often
performed at many social gatherings worldwide. It is a circle dance performed in a counterclockwise
rotation with the dancers holding hands. It is a twelve-step dance and the musical beat is 7/8.
Makedonikos: A circle dance, performed by both women and men, with a 7/8 musical beat. The basic
pattern of the dance is performed in twelve movements/steps. Therefore, it resembles the Kalamatianos
dance to a great degree, with the difference that it is a more joyous dance. It is popular in the region of
Western and Central Macedonia.
Syrtos (two-beat): The Syrtos (two-beat) dance is organized in a quick (two-beat) rhythm. It is
a circle dance, performed by both women and men mostly in the region of Pogoni of Epirus. In the
past, the dance was performed separately by men and women, in one, two, or more lines.
Syrtos (three-beat): Syrtos is one of the most popular dances throughout Greece and Cyprus. The Syrtos
(three-beat) dance is organized in a slow (three-beat) rhythm. It is a line dance and a circle dance, performed
by dancers (both women and men) in a curving line holding hands, facing right. It is widespread through
Epirus, Western Macedonia, Thessaly, Central Greece, and Peloponnese.

Table 1. Number of representative frames per dance when using TC-OPTICS sampler. This example treats straight and circular trajectories as different cases.

KalCirc KalStr8 MakCirc MakStr8 Syrt2Circ Syrt2Str8 Syrt3Circ Syrt3Str8 Syrt11Str8 TrehCirc TrehStr8
Single Frame Legs Only
D1 18 10 11 7 21 19 44 24 19 34 10
D2 19 11 19 10 17 19 31 21 22 22 10
D3 18 14 11 15 9 10 30 16 25 16 11
Frame Difference Legs Only
D1 17 8 14 8 19 18 39 21 18 28 10
D2 16 11 18 11 15 17 32 21 18 17 8
D3 16 14 10 10 11 9 32 16 18 12 10
Frame Difference All Joints
D1 18 8 14 8 18 18 44 20 18 27 8
D2 18 10 16 8 15 21 30 21 22 17 8
D3 17 13 9 11 9 9 34 16 21 12 9

Trehatos (Running): A circle dance, performed by both women and men, which is danced in the
village Neochorouda of Thessaloniki. The kinetic theme of the dance is composed of three different
dance patterns. The first one resembles the Syrtos (three-beat) pattern, the second takes place once and
connects the first and the second pattern, and the third one is characterized by intense motor activity.
An illustration of the Syrtos dance is shown in Figure 2. At first, the positions and rotation values for
each frame, I_i, i = 1, ..., n, of a dance with n consecutive frames are extracted. Thus, the dance is
described by a matrix, D_i, of size b × m × n, where b is the number of body joints (i.e., 25), m is the
number of feature vectors (i.e., three coordinates and four rotations, plus two more binary indicators,
explaining if values are measured or estimated), and n is the duration of the dance.

Figure 2. Illustration of the Syrtos dance.

5.2. Feature5.2. Feature Extraction


Extraction
For feature extraction, a simple process is followed: For any dance , we have 24 × 9 ×
For feature
values, extraction, a simple
or, in a 2D form, a 216process
× is followed:
matrix. A technical For any dance
limitation Di , allow
did not we have 24 × 9 × n values,
the successful
or, in a 2Dcapturing
form, a 216of the×right
n matrix. A technical
thumb position in a fewlimitation
dances; it did
was, not allowexcluded
therefore, the successful
from the capturing
pattern of the
analysis. Thus, each captured frame describes the entire body pose via 216
right thumb position in a few dances; it was, therefore, excluded from the pattern analysis. Thus, eachvalues.
One should note that joints’ positions are correlated to each other due to physical restrictions of
captured frame describes the entire body pose via 216 values.
the body skeleton. As such, the application of a dimensionality reduction approach should be
One should note
considered. thata joints’
Ideally, small setpositions are correlated
of feature values containing mostto each other due
of the available to physical
information support restrictions
of the body skeleton.
a smooth As such,
performance for a the application
a dimensionality reduction approach should be considered. Ideally, a small set of feature values containing most of the available information supports a smooth performance for a variety of classifiers [32]. In this case, PCA is used, maintaining 99.1% of the initial feature space variance, as in [7]. The PCA outcome resulted in a projection space almost nine times smaller than the original.

However, dance is not a static act; a comparison of the differences among frames could provide significant insights. Therefore, the time dimension should also be considered. We utilized the information of successive frames at t time intervals, Ii and Ii+t, by subtracting them. In the end, each dance, Di, was of size b × (2m) × (n − 1). Prior to the dimensionality reduction via PCA step, the data were normalized using min-max normalization. In the former case, i.e., single-frame analysis, PCA resulted in 21 dimensions. For the latter case, i.e., two successive frames, we had 41 dimensions in the reduced space.

5.3. Variation, Space, and Noise Handling
Table 1 illustrates the number of representative frames for each of the investigated dances,
using the TC-OPTICS sampler. Results indicate that the applied summarization technique is robust to noise,
which in our case affects the dance duration. Even for extreme cases, the number of representative
frames remains similar for all dancers. The Syrtos 3 line dance is an extreme case. The first dancer
Technologies 2018, 6, 31 9 of 16
performance duration was twice as long as that of the other dancers. Yet, the representative frames are around
20 for all dancers. Table 2 indicates the training sets’ size (i.e., the number of observations) depending
on the sampling approach.
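For concreteness, the preprocessing described above (frame differencing at a fixed interval t, followed by min-max normalization) can be sketched in a few lines. This is an illustrative Python sketch on made-up toy values, not the authors' MATLAB implementation; all names and shapes are assumptions:

```python
# Illustrative sketch (not the authors' MATLAB code): build frame-difference
# features at interval t, then min-max normalize each feature dimension.

def frame_differences(frames, t=1):
    """frames: list of per-frame feature vectors (e.g., flattened joint coords).
    Returns the element-wise differences between frames i and i + t."""
    return [
        [a - b for a, b in zip(frames[i + t], frames[i])]
        for i in range(len(frames) - t)
    ]

def minmax_normalize(rows):
    """Scale every feature dimension to [0, 1] (constant columns map to 0)."""
    cols = list(zip(*rows))
    lo = [min(c) for c in cols]
    hi = [max(c) for c in cols]
    return [
        [(v - l) / (h - l) if h > l else 0.0 for v, l, h in zip(row, lo, hi)]
        for row in rows
    ]

# Toy example: 4 frames, 3 feature values each (hypothetical joint coords).
frames = [[0.0, 1.0, 2.0], [0.5, 1.5, 2.0], [1.0, 1.0, 3.0], [1.5, 0.5, 4.0]]
diffs = frame_differences(frames, t=1)   # 3 difference vectors remain
normed = minmax_normalize(diffs)
```

Note that differencing n frames at interval t yields n − t feature vectors, which is where the reduced row count of the difference-based input types comes from.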
Table 2. Number of training samples, depending on the dancer and the sampling algorithm.

Row Labels   Frame_Diff_Legs   FrameDiff_All_Joints   Single_Frame_Legs   Single_Frame_All
KenStone
 D1          180               180                    189                 189
 D2          171               171                    180                 180
 D3          146               146                    155                 155
K-Random
 D1          186               186                    196                 196
 D2          176               177                    186                 187
 D3          151               152                    160                 161
TC-OPTICS
 D1          56                56                     58                  60
 D2          56                56                     57                  60
 D3          52                52                     51                  54
5.4. Algorithms Setup

All algorithms were implemented in MATLAB. In our case, the kNN parameterization process considers the number of k nearest points, which was set to k = 5, a similar adaptation as in [22]. This value provides a good tradeoff: if k = 3 or less, we are unable to distinguish among different dances that share the same steps. On the other hand, k = 7 or greater results in matching with similar steps of other dances. The ensemble methods used 16 ensemble members. The rest of the parameters were kept at their default values.
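The k = 5 nearest-neighbor rule can be illustrated with a minimal sketch. This is a generic Python illustration on toy 2-D data, not the MATLAB code used in the experiments; the data points and dance labels are invented for the example:

```python
# Hypothetical kNN sketch (the experiments used MATLAB's built-in kNN):
# classify a pose vector by majority vote among its k = 5 nearest samples.
from collections import Counter
import math

def knn_predict(train_X, train_y, query, k=5):
    """Euclidean k-nearest-neighbor majority vote."""
    dists = sorted(
        (math.dist(x, query), label) for x, label in zip(train_X, train_y)
    )
    votes = Counter(label for _, label in dists[:k])
    return votes.most_common(1)[0][0]

# Toy data: two "dances" forming two clusters in a 2-D feature space.
train_X = [(0, 0), (0, 1), (1, 0), (1, 1), (0.5, 0.5),
           (5, 5), (5, 6), (6, 5), (6, 6), (5.5, 5.5)]
train_y = ["syrtos"] * 5 + ["trehatos"] * 5

print(knn_predict(train_X, train_y, (0.4, 0.6)))   # → syrtos
```

With a small k the vote is dominated by the immediate neighborhood, which matches the tradeoff discussed above: too few neighbors cannot separate dances sharing steps, too many pull in matches from similar steps of other dances.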
5.5. Classification Scores
The proposed methodology involved data selection, dimensionality reduction, and sampler-classifier combinatory approaches. As such, all the above fields were investigated in terms of their impact on the dance identification problem. Their performance was quantified using traditional performance measures such as accuracy, precision, recall, and F1 score. Further insight is provided via analysis of variance. Figures 3–6 illustrate the impact of each of the investigated factors, namely, projection technique (Figure 3), sampling approach (Figure 4), classifier (Figure 5), and input type (Figure 6).
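For reference, the per-class precision, recall, and F1 scores used throughout this section follow directly from the prediction counts. The sketch below restates the standard definitions in Python on a toy label set; it is not the authors' evaluation code:

```python
# Generic per-class precision/recall/F1 from predicted vs. true labels
# (a restatement of the standard definitions, on toy labels).

def scores(y_true, y_pred, positive):
    tp = sum(t == positive and p == positive for t, p in zip(y_true, y_pred))
    fp = sum(t != positive and p == positive for t, p in zip(y_true, y_pred))
    fn = sum(t == positive and p != positive for t, p in zip(y_true, y_pred))
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    # F1 is the harmonic mean of precision and recall.
    f1 = (2 * precision * recall / (precision + recall)
          if precision + recall else 0.0)
    return precision, recall, f1

y_true = ["kal", "kal", "syrtos", "syrtos", "kal", "syrtos"]
y_pred = ["kal", "syrtos", "syrtos", "syrtos", "kal", "kal"]
p, r, f1 = scores(y_true, y_pred, positive="kal")
print(round(p, 3), round(r, 3), round(f1, 3))   # → 0.667 0.667 0.667
```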
Figure 3. The impact of data projection techniques on performance scores.
Figure 3 provides comparison results between raw input data and input data using principal component analysis (PCA). The performance scores are average values over all combinations of utilized classifiers, sampling approaches, and input type selection. There are two aspects worth mentioning. First, the performance scores are close for the two approaches. In both cases, a significant decline is observed over the test set.
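For intuition on the variance-retention criterion behind the PCA projection (99.1% in Section 5.2), the sketch below computes, for a toy 2-D dataset, the share of total variance captured by the first principal component, via the closed-form eigenvalues of the 2 × 2 covariance matrix. It illustrates the idea only and is not the PCA setup used in the experiments:

```python
# Sketch of the PCA variance-retention idea on a toy 2-D dataset:
# the covariance eigenvalues give each principal component's variance.
import math

def explained_variance_ratio_2d(points):
    n = len(points)
    mx = sum(x for x, _ in points) / n
    my = sum(y for _, y in points) / n
    sxx = sum((x - mx) ** 2 for x, _ in points) / (n - 1)
    syy = sum((y - my) ** 2 for _, y in points) / (n - 1)
    sxy = sum((x - mx) * (y - my) for x, y in points) / (n - 1)
    # Closed-form eigenvalues of the 2x2 matrix [[sxx, sxy], [sxy, syy]].
    tr, det = sxx + syy, sxx * syy - sxy ** 2
    lam1 = tr / 2 + math.sqrt(tr ** 2 / 4 - det)   # largest eigenvalue
    return lam1 / tr  # share of total variance kept by the first component

# Strongly correlated toy data: one component carries nearly all variance.
pts = [(i, 2 * i + 0.1 * (-1) ** i) for i in range(10)]
ratio = explained_variance_ratio_2d(pts)
```

Keeping only components whose cumulative ratio exceeds the chosen threshold (here 99.1%) is what shrinks the feature space while preserving most of its variance.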
Figure 4. Illustrating the impact over performance scores for the proposed sampling approaches.
Figure 4 describes the samplers' impact on the dance identification task. Centroid-based random clustering, i.e., K-random sampling, provides better results. Overall, similar results are observed. K-random sampling is also faster compared to the alternatives.
Figure 5. Classifiers' performance scores.
Figure 5 demonstrates classifiers' volatility in performance between training and test sets. Regardless of the adopted approach, a significant drop in all performance scores is observed. All classifiers' average scores are below 0.5.
Figure 6. Impact of the feature creation related assumptions on the performance.
Figure 6 provides further insight into whether the joint selection or the frame difference can boost the classifiers' performance. The information of all body joints, based on the current frame, appears to provide (slightly) better results.
5.6. Statistical Analysis
To obtain further insights into the results and the relative performance of the different algorithms, we conducted an analysis of variance (ANOVA) on the F1 score results for the test samples. The F1 score is the harmonic mean of precision and recall; thus, it contains a significant amount of information regarding the overall performance. ANOVA also enables the statistical assessment of the effects of the main design factors of this analysis (i.e., the sampling schemes, feature extraction, and the classifiers).
Table 4 shows the results of the ANOVA analysis. In this table, the Source column corresponds to the source of variation in the data (i.e., the performance impact factors described earlier in this section and their combined impact). Sum Sq. and Mean Sq. correspond to mean measurements between the m groups and the grand mean; practically, they quantify the variability among the groups of interest. The degrees of freedom (d.f.) are defined as d.f. = m − 1. The F column refers to the F statistic, i.e., the "average" variability between the groups divided by the "average" variability within the groups. Finally, we calculate the p-value by comparing the F statistic to an F-distribution with m − 1 numerator degrees of freedom and n − m denominator degrees of freedom, for the total set of n observations.
As shown in Table 4, all main factors (i.e., projection, sampling, classifier, and input type) are strongly significant for explaining variations in F1 score, since the corresponding p-value is approximately zero.
In addition to the above basic ANOVA results, we use the Tukey honest significant difference (HSD) post-hoc test to identify sampling schemes and classifiers that provide the best results, while considering the statistical significance of the differences between the results.
Table 3. List of available dances and their variations as well as their duration, depending on the dancer.

Dance           Variation   Short Name      Duration (Frames)
                                            D1      D2      D3
Enteka          Straight    Syrt_11_Str8    749     807     858
Kalamatianos    Circular    Kal_Circ        655     593     561
                Straight    Kal_Str8        304     378     455
Makedonitikos   Circular    Mak_Circ        424     582     409
                Straight    Mak_Str8        283     367     418
Syrtos 2        Circular    Syrt_2_Circ     608     543     352
                Straight    Syrt_2_Str8     623     639     334
Syrtos 3        Circular    Syrt_3_Circ     608     964     947
                Straight    Syrt_3_Str8     1366    678     511
Trehatos        Circular    Treh_Circ       991     723     443
                Straight    Treh_Str8       315     295     355
Table 4. ANOVA outcomes.

Source                    Sum Sq.   d.f.   Mean Sq.   F          p-Value
Projection                0.0232    1      0.0232     13.3600    0.0003
Sampling                  0.0261    2      0.0130     7.5000     0.0006
Classifier                2.2686    8      0.2836     163.3000   0.0000
InputType                 0.2790    3      0.0930     53.5600    0.0000
Projection × Sampling     0.0064    2      0.0032     1.8400     0.1590
Projection × Classifier   0.0118    8      0.0015     0.8500     0.5621
Projection × InputType    0.0226    3      0.0075     4.3400     0.0049
Sampling × Classifier     0.0818    16     0.0051     2.9400     0.0001
Sampling × InputType      0.0147    6      0.0025     1.4100     0.2073
Classifier × InputType    0.2830    24     0.0118     6.7900     0.0000
Error                     0.9967    574    0.0017
Total                     4.0138    647
Figure 7 illustrates that classification scores are better when using information of all body joints, without employing frame differences, i.e., subtracting joint values over specified time intervals. Mean scores for each approach are shown as 'o'. The average scores from subgroups in the experiment are also provided. Since there is no overlap between the F1 values for the 1FrAll input type compared to the others, 1FrAll scores are clearly statistically better than the others.
Figure 7. F1 scores for different input feature setups.
Figure 8 indicates that PCA should not be used, since the overall scores are statistically worse than using raw feature values.
Figure 8. F1 scores for raw and projected data.
Figure 9 indicates that, statistically, Kennard Stone sampling is no worse than centroid-based random sampling, since there are partly overlapping areas on the F1 scale. Generally, the K-random sampling approach provides the best results.
Figure 9. F1 scores for the employed samplers.
Figure 10 illustrates that the best classifiers for the problem at hand are k-nearest neighbors and random forests (denoted as TreeBagger), with the kNN approach attaining a slightly greater performance.
Figure 10. F1 scores by classification technique.
Regarding the combined approach of all four factors (i.e., feature type, projection space, sampling, and classifier), the single-frame, PCA-projected, k-means-random sampler, kNN classifier provides
the best possible results (0.52) with a marginal mean significantly different from 167 different groups.
The obtained results denote the superiority of the aforementioned best performing framework over
the approach proposed in [19], which is the only one currently in the literature that has been evaluated
on the same dataset of the specific folk dances. It is clear, however, that the current experimental setup
has not attained a fully satisfactory performance on the problem at hand. This could be attributed to the limited capability of a single Kinect sensor to capture the complex spatiotemporal variations residing within folk dance movements, as well as to the small amount of data available for training the utilized classifiers.
6. Conclusions
We have presented a comparative study of classifiers and data sampling schemes for dance pose
identification based on motion capture data acquired from Kinect sensors. Skeleton data served as
inputs to the classifiers. The feature extraction process involved subtraction between successive frames and
principal component analysis for dimensionality reduction. Multiple pose identification schemes were
applied using temporal constraints, spatial information and feature space distributions for the creation
of an adequate training data set. Experimental results show that frame differencing and PCA lead
to lower recognition rates and that k nearest neighbors and random forests are the best-performing
classifiers among the ones explored. Future work directions include experimenting with data from
multiple Kinect sensors, as well as multimodal skeleton and RGB data, which may contribute to greater
precision rates.
Acknowledgments: This work was supported by the EU H2020 TERPSICHORE project “Transforming Intangible
Folkloric Performing Arts into Tangible Choreographic Digital Objects” under the grant agreement 691218.
Author Contributions: Eftychios Protopapadakis and Athanasios Voulodimos conceived and designed the
experiments. Eftychios Protopapadakis performed the experiments. Athanasios Voulodimos, Anastasios Doulamis
and Stephanos Camarinopoulos analyzed the data. Nikolaos Doulamis and Georgios Miaoulis contributed to
the experimental evaluation and results analysis. Eftychios Protopapadakis, Athanasios Voulodimos, Anastasios
Doulamis and Georgios Miaoulis wrote the paper.
Conflicts of Interest: The authors declare no conflict of interest.
References
1. Shay, A.; Sellers-Young, B. The Oxford Handbook of Dance and Ethnicity; Oxford University Press:
Oxford, UK, 2016.
2. Voulodimos, A.S.; Patrikakis, C.Z. Quantifying privacy in terms of entropy for context aware services.
Identity Inf. Soc. 2009, 2, 155–169. [CrossRef]
3. Kosmopoulos, D.I.; Voulodimos, A.S.; Doulamis, A.D. A System for Multicamera Task Recognition and
Summarization for Structured Environments. IEEE Trans. Ind. Inform. 2013, 9, 161–171. [CrossRef]
4. Voulodimos, A.S.; Doulamis, N.D.; Kosmopoulos, D.I.; Varvarigou, T.A. Improving Multi-Camera Activity
Recognition by Employing Neural Network Based Readjustment. Appl. Artif. Intell. 2012, 26, 97–118.
[CrossRef]
5. Doulamis, N.D.; Voulodimos, A.S.; Kosmopoulos, D.I.; Varvarigou, T.A. Enhanced Human Behavior
Recognition Using HMM and Evaluative Rectification. In Proceedings of the First ACM International
Workshop on Analysis and Retrieval of Tracked Events and Motion in Imagery Streams, New York, NY, USA,
21–25 October 2010; pp. 39–44.
6. Voulodimos, A.; Kosmopoulos, D.; Veres, G.; Grabner, H.; van Gool, L.; Varvarigou, T. Online classification
of visual tasks for industrial workflow monitoring. Neural Netw. 2011, 24, 852–860. [CrossRef] [PubMed]
7. Protopapadakis, E.; Voulodimos, A.; Doulamis, A.; Camarinopoulos, S. A Study on the Use of Kinect
Sensor in Traditional Folk Dances Recognition via Posture Analysis. In Proceedings of the 10th
International Conference on PErvasive Technologies Related to Assistive Environments, New York, NY, USA,
21–23 June 2017; pp. 305–310.
8. Felzenszwalb, P.F.; Huttenlocher, D.P. Pictorial Structures for Object Recognition. Int. J. Comput. Vis. 2005,
61, 55–79. [CrossRef]
9. Voulodimos, A.; Doulamis, N.; Doulamis, A.; Protopapadakis, E. Deep Learning for Computer Vision:
A Brief Review. Comput. Intell. Neurosci. 2018, 7068349. [CrossRef] [PubMed]
10. Toshev, A.; Szegedy, C. DeepPose: Human Pose Estimation via Deep Neural Networks. In Proceedings of the
2014 IEEE Conference on Computer Vision and Pattern Recognition, Columbus, OH, USA, 23–28 June 2014;
pp. 1653–1660.
11. Chen, X.; Yuille, A. Articulated Pose Estimation by a Graphical Model with Image Dependent Pairwise
Relations. In Proceedings of the 27th International Conference on Neural Information Processing Systems,
Cambridge, MA, USA, 8–13 December 2014; Volume 1, pp. 1736–1744.
12. Tompson, J.; Jain, A.; LeCun, Y.; Bregler, C. Joint Training of a Convolutional Network and a Graphical
Model for Human Pose Estimation. arXiv, 2014.
13. Raptis, M.; Kirovski, D.; Hoppe, H. Real-time Classification of Dance Gestures from Skeleton Animation.
In Proceedings of the 2011 ACM SIGGRAPH/Eurographics Symposium on Computer Animation, New York,
NY, USA, 26–28 September 2011; pp. 147–156.
14. Zanfir, M.; Leordeanu, M.; Sminchisescu, C. The Moving Pose: An Efficient 3D Kinematics Descriptor for
Low-Latency Action Recognition and Detection. In Proceedings of the IEEE International Conference on
Computer Vision, Los Angeles, CA, USA, 1–8 December 2013; pp. 2752–2759.
15. Ball, A.; Rye, D.; Ramos, F.; Velonaki, M. Unsupervised Clustering of People from ‘Skeleton’ Data.
In Proceedings of the Seventh Annual ACM/IEEE International Conference on Human-Robot Interaction,
New York, NY, USA, 5–8 March 2012; pp. 225–226.
16. Rallis, I.; Georgoulas, I.; Doulamis, N.; Voulodimos, A.; Terzopoulos, P. Extraction of key postures from 3D
human motion data for choreography summarization. In Proceedings of the 9th International Conference
on Virtual Worlds and Games for Serious Applications (VS-Games), Athens, Greece, 6–8 September 2017;
pp. 94–101.
17. Kitsikidis, A.; Dimitropoulos, K.; Yilmaz, E.; Douka, S.; Grammalidis, N. Multi-sensor Technology and
Fuzzy Logic for Dancer’s Motion Analysis and Performance Evaluation within a 3D Virtual Environment.
In Proceedings of the Universal Access in Human-Computer Interaction. Design and Development Methods
for Universal Access, Heraklion, Greece, 22–27 June 2014; pp. 379–390.
18. Kitsikidis, A.; Dimitropoulos, K.; Uğurca, D.; Bayçay, C.; Yilmaz, E.; Tsalakanidou, F.; Douka, S.;
Grammalidis, N. A Game-like Application for Dance Learning Using a Natural Human Computer Interface.
In Proceedings of the Universal Access in Human-Computer Interaction. Access to Learning, Health and
Well-Being, Los Angeles, CA, USA, 2–7 August 2015; pp. 472–482.
19. Kitsikidis, A.; Dimitropoulos, K.; Douka, S.; Grammalidis, N. Dance analysis using multiple Kinect sensors.
In Proceedings of the 2014 International Conference on Computer Vision Theory and Applications (VISAPP),
Lisbon, Portugal, 5–8 January 2014; Volume 2, pp. 789–795.
20. Kitsikidis, A.; Boulgouris, N.V.; Dimitropoulos, K.; Grammalidis, N. Unsupervised Dance Motion Patterns
Classification from Fused Skeletal Data Using Exemplar-Based HMMs. Int. J. Herit. Digit. Era 2015,
4, 209–220. [CrossRef]
21. Dimitropoulos, K.; Barmpoutis, P.; Kitsikidis, A.; Grammalidis, N. Classification of Multidimensional
Time-Evolving Data using Histograms of Grassmannian Points. IEEE Trans. Circuits Syst. Video Technol. 2016.
[CrossRef]
22. Protopapadakis, E.; Grammatikopoulou, A.; Doulamis, A.; Grammalidis, N. Folk Dance Pattern Recognition
over Depth Images Acquired via Kinect Sensor. In Proceedings of the 3D Virtual Reconstruction and
Visualization of Complex Architectures, Nafplio, Greece, 1–3 March 2017.
23. Kinect—Windows App Development, 2017. Available online: https://developer.microsoft.com/en-us/
windows/kinect (accessed on 15 January 2017).
24. Webb, J.; Ashley, J. Beginning Kinect Programming with the Microsoft Kinect SDK; Apress: New York, NY, USA, 2012.
25. Protopapadakis, E.; Doulamis, A. Semi-Supervised Image Meta-Filtering Using Relevance Feedback in
Cultural Heritage Applications. Int. J. Herit. Digit. Era 2014, 3, 613–627. [CrossRef]
26. Vandana, N.B. Survey of Nearest Neighbor Techniques. arXiv, 2010.
27. Farid, D.M.; Zhang, L.; Rahman, C.M.; Hossain, M.A.; Strachan, R. Hybrid decision tree and naïve Bayes
classifiers for multi-class classification tasks. Expert Syst. Appl. 2014, 41, 1937–1946. [CrossRef]
28. Silva, C.S.; Borba, F.d.L.; Pimentel, M.F.; Pontes, M.J.C.; Honorato, R.S.; Pasquini, C. Classification of blue
pen ink using infrared spectroscopy and linear discriminant analysis. Microchem. J. 2013, 109, 122–127.
[CrossRef]
29. Rokach, L.; Schclar, A.; Itach, E. Ensemble methods for multi-label classification. Expert Syst. Appl. 2014,
41, 7507–7523. [CrossRef]
30. Abe, S. Support Vector Machines for Pattern Classification; Springer: Berlin, Germany, 2010.
31. Dimitropoulos, K.; Manitsaris, S.; Tsalakanidou, F.; Nikolopoulos, S.; Denby, B.; Al Kork, S.;
Crevier-Buchman, L.; Pillot-Loiseau, C.; Adda-Decker, M.; Dupont, S. Capturing the intangible:
an introduction to the i-Treasures project. In Proceedings of the 2014 International Conference on Computer
Vision Theory and Applications (VISAPP), Lisbon, Portugal, 5–8 January 2014; Volume 2, pp. 773–781.
32. Protopapadakis, E.; Doulamis, A.; Makantasis, K.; Voulodimos, A. A Semi-Supervised Approach for
Industrial Workflow Recognition. In Proceedings of the Second International Conference on Advanced
Communications and Computation (INFOCOMP 2012), Venice, Italy, 21–26 October 2012; pp. 155–160.
© 2018 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access
article distributed under the terms and conditions of the Creative Commons Attribution
(CC BY) license (http://creativecommons.org/licenses/by/4.0/).