Proceedings of the 3rd IFAC Conference on Embedded Systems, Computational Intelligence and Telematics in Control
June 6-8, 2018. University of Algarve, Faro, Portugal
Available online at www.sciencedirect.com
ScienceDirect
IFAC PapersOnLine 51-10 (2018) 199–204

Fall monitoring and detection for at-risk persons using a UAV

Cristi Iuga, Paul Drăgan, Lucian Bușoniu

Technical University of Cluj-Napoca, Memorandumului 28, 400114 Cluj-Napoca, Romania
(e-mails: iugacristi@gmail.com, paul.andrei.dragan@gmail.com, lucian@busoniu.net)
Abstract: We describe a demonstrator application that uses a UAV to monitor and detect falls of an at-risk person. The position and state (upright or fallen) of the person are determined with deep-learning-based computer vision, where existing network weights are used for position detection, while for fall detection the last layer is fine-tuned in additional training. A simple visual servoing control strategy keeps the person in view of the drone, and maintains the drone at a set distance from the person. In experiments, falls were reliably detected, and the algorithm was able to successfully track the person indoors.

© 2018, IFAC (International Federation of Automatic Control) Hosting by Elsevier Ltd. All rights reserved.

Keywords: unmanned aerial vehicles, deep learning, fall detection.
2405-8963 © 2018, IFAC (International Federation of Automatic Control) Hosting by Elsevier Ltd. All rights reserved.
Peer review under responsibility of International Federation of Automatic Control.
10.1016/j.ifacol.2018.06.262

1. INTRODUCTION

More than a billion people today experience some form of disability, while the world population is rapidly ageing. Between 2000 and 2050, the proportion of people over 60 years will double from about 11% to 22%. We consider here a robotic assistant for elderly and disabled persons who must be constantly monitored due to the risk of falling. Falling frequency is about 28-35% each year for people over 64 years of age and 32%-42% for those over 70 (WHO, 2007).

State-of-the-art efforts in assistive robotics are predominantly placed on fixed manipulators or mobile ground robots, which are limited in the types of environments they can address (Boucher et al., 2013). On the other hand, unmanned aerial vehicles (UAVs) trade off manipulation capability to gain mobility and speed, being unaffected by terrain difficulty (Mathe and Busoniu, 2015). Despite their potential (Baer et al., 2014), UAVs have so far been virtually unexplored in assistive care.

In this paper we present a first application along this direction: a UAV that uses computer vision to monitor a person for falls while autonomously following them in an indoor environment. While UAVs have been widely used in disaster monitoring and search-and-rescue, see e.g. Andriluka et al. (2010), Murphy (2014), to our knowledge they have never been used for person fall detection. Instead, this is usually done with fixed cameras (Cucchiara et al., 2007, Skubic et al., 2016), wearable or other sensors (Bourke et al., 2007, de Lima et al. 2017). Our work is closely related to that of Andriluka et al. (2010), where UAVs are used to find injured persons in a search-and-rescue setting.

We choose the Parrot AR.Drone 2.0 quadrotor, a very popular low-cost UAV that is easy to automate using ROS (Robot Operating System). Our method achieves person detection by running the deep learning object detector YOLOv2 (You Only Look Once version 2) (Redmon and Farhadi, 2017) on the images received from the Parrot AR.Drone 2.0. The output data from the detector is used to find the position and distance of the person in the scene, and thereby to direct the drone to remain at a set distance and orientation from the person, using a simple visual servoing strategy (Thuilot et al., 2002). Fall detection is achieved with the same deep-learning method, YOLOv2, but in this case we fine-tune the last layer using a custom dataset. We chose a vision-based learning solution because it is simple to apply, flexible, and relies on a sensor already available on the UAV (the camera) rather than e.g. wearable sensors. All the detection and control algorithms run off-board, on a computer that wirelessly communicates with the drone. Our experimental results confirm that the method works for tracking the person indoors and detects falls reliably.

Next, Section 2 provides the background required in the vision techniques we use. Section 3 explains the most important component of our method: detection of upright and fallen persons from images. Section 4 outlines the method for tracking the person across multiple images. Section 5 presents the control technique as well as the overall results obtained. Section 6 gives our conclusions and outlines future work.

2. BACKGROUND

2.1 Classification and object detection from images

Image classification and object detection are two of the most important and well-studied computer vision tasks. The aim of classification is to assign a label to an image, where the label is taken from a fixed set of classes, while object detection focuses on localizing in an image all the objects that belong to one or multiple categories. Classification algorithms usually take as input an image and output a single label or class, while object detection algorithms output the enclosing bounding boxes and classes of objects present in a given image. With the advent of large image datasets, e.g. ImageNet (Russakovsky et
al., 2015) and MS-COCO (Lin et al., 2014), Convolutional Neural Networks (CNNs) became the de facto method of approaching both classification (Krizhevsky et al., 2012; Simonyan and Zisserman, 2015; He et al., 2016) and object detection (Ren et al., 2015; Redmon and Farhadi, 2017).

Detection of fallen persons (called simply fall detection in the sequel) can be formulated either as a classification task, where the algorithm should output whether an image contains a fallen person or not, or as an object detection problem, requiring the localization of all the fallen people in an image. Works such as Nunez-Marcos et al. (2017) take the first approach, while in this paper we will treat fall detection as an object detection task and we will make use of the YOLO (You Only Look Once) (Redmon and Farhadi, 2017) architecture to detect both upright and fallen people, the former being necessary to track the person with the drone.

2.2 YOLOv2 object detection network

YOLOv2 (Redmon and Farhadi, 2017) is a CNN architecture for object detection. The method obtained good results on the VOC 2012 detection dataset (Everingham et al., 2012), performing on par with state-of-the-art detectors at that time such as Faster R-CNN (Ren et al., 2015). Due to its lightweight structure, YOLO can run at around 40 FPS on a GeForce GTX Titan X and at between 20 and 25 FPS on a GeForce GTX 970, the graphics card in our hardware setup, rendering it attractive for applications that require soft real-time constraints.

YOLOv2 outputs, for each object $O_j$ present in an image $I$, a probability distribution over the classes $c_i \in C$, where $C$ is the set of all classes; and in addition a bounding box $B_j = [x, y, w, h]$ enclosing object $O_j$, where $(x, y)$ are the coordinates of the top-left corner of the bounding box and $(w, h)$ are its width and height, respectively. The network is trained in a supervised manner, meaning that both true labels and true bounding box coordinates must be fed for each object to the training algorithm.

3. PERSON AND FALL DETECTION

Due to its speed and accuracy, we select the YOLOv2 object detection framework, and in this section we explain how we adapt and use it for our application.

To detect the person in the upright position, we use the CNN with the standard set of weights pre-trained on the MS-COCO dataset. Although this method is sensitive to arm positioning, since it tends to enlarge the bounding box in order to include the possibly raised arms of a subject, it still achieves 75% accuracy. This result is good enough to infer distance information from the bounding boxes, keeping in mind that these bounding boxes are further filtered by using a Kalman filter, as explained in the upcoming Section 4. Thus, detection of a standing person works nearly off-the-shelf, and we dedicate the rest of this section to the more interesting fall detection task. Note that we aim to detect falls a posteriori, which may then be used to alert a caretaker, rather than in real-time, which could conceivably help the monitored person recover from the fall; the latter would be unlikely to work given the limitations of our platform.

3.1 Fall detection methodology

As already hinted at in Section 2.1, we formulate the problem of fall detection as an object detection task and use the YOLOv2 CNN architecture to identify the fallen person, if any. We also considered treating the problem as a classification task, but this would mean that the network would be difficult to train for unconstrained environments, body posture and placement, and for variations of clothing and body appearance. This would impair generalization in practice.

Furthermore, solving the problem with an object detection algorithm comes with certain advantages that will be useful in future extensions of our method. Firstly, the algorithm can detect multiple fallen people in the same scene and enclose them in the image within different bounding boxes. Secondly, having bounding boxes means that we know estimates of the position of the detected fallen persons in the scene, which can be useful, for example, for targeted medication delivery.

To work for fall detection, the YOLOv2 object detector has to be retuned. We start from the standard set of weights pre-trained on MS-COCO, which we used above for upright person detection. However, here we replace the last layer with a simpler one, capable of only single-class detection, and then fine-tune the network on a custom dataset. This custom training dataset consists of 500 manually labeled images, taken from the frames of two videos. These images contain a single person as subject, wearing the same set of clothes across all frames. The videos are recorded indoors, in our laboratory, using the AR.Drone 2.0 camera, from two different perspectives, while the UAV is in flight.

For training we use the stochastic gradient descent algorithm with momentum, with a learning rate of 0.001 and a momentum of 0.99. We use mini-batches that each consist of 4 randomly selected images. We train the network for 2000 iterations, i.e. 2000 mini-batches, meaning that each image is seen by the algorithm around 16 times.

While in principle the same network (or at least the common parts up to the last layer) can be used for both upright and fallen people detection, for ease of implementation and testing we decided to run two separate networks, one for upright person detection and one for fall detection, in parallel on separate threads.

3.2 Fall detection results

To evaluate the effectiveness of detection, we test the retuned network on a different dataset, which was obtained in a similar manner as the training dataset. In this case, however, we take the video from a single perspective, with two subjects, both wearing different clothes than in the training images. In total, we have gathered 619 images with our subjects in upright or fallen-like postures.
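A test of this kind amounts to comparing, image by image, whether a fall is present with whether a fall was detected. The following is a minimal sketch of such a tally, assuming one subject and at most one detection per image; the function name `tally`, the outcome labels and the toy input are hypothetical and are not the paper's data or evaluation code.

```python
from collections import Counter


def tally(results):
    """Count per-image outcomes.

    results: iterable of (fall_present, fall_detected) boolean pairs,
    one pair per test image (hypothetical input format).
    Returns (counts, rates), with rates in percent of all images.
    """
    counts = Counter()
    for present, detected in results:
        if present and detected:
            counts["true_positive"] += 1
        elif detected:
            counts["false_positive"] += 1
        elif present:
            counts["false_negative"] += 1
        else:
            counts["true_negative"] += 1
    total = sum(counts.values())
    rates = {name: 100.0 * n / total for name, n in counts.items()} if total else {}
    return counts, rates


# Toy example with made-up outcomes for ten images.
toy = [(True, True)] * 3 + [(True, False)] + [(False, True)] + [(False, False)] * 5
counts, rates = tally(toy)
print(dict(counts), rates)
```

Reporting both counts and rates makes it easy to spot when a percentage and its underlying count disagree.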

Table 1 shows the test results. We can see that, in spite of the small number of examples in the training dataset, the network obtains high accuracy and generalizes well for different clothing styles and different subjects. An example of the output from the specialized network can be observed in Fig. 1.

Table 1. Fall detection test results

                    Total    Positives    False positives    False negatives
Number of images    619      535          37                 84
Percentage          100%     86.24%       5.97%              13.57%

Fig. 1. Fall detection with the modified YOLOv2 CNN

4. PERSON TRACKING

In order to follow the person, the controller we will discuss in Section 5 takes as input a bounding box enclosing the subject in the image. The bounding box detected by YOLOv2 is sometimes too noisy, so that the vertex coordinates change abruptly, causing performance issues with the control of the drone. We address this problem using a Kalman filter that smooths the bounding box outputs from the detection network. This makes the coordinates of the vertices of the bounding boxes change steadily as the target moves laterally, moves away from, or approaches the drone.

The model used in the Kalman filter equations is detailed next. We use the state vector:

$$x_k = [x_{1,k} \;\; y_{1,k} \;\; x_{2,k} \;\; y_{2,k} \;\; v_{x1,k} \;\; v_{y1,k} \;\; v_{x2,k} \;\; v_{y2,k}]^T$$

where $(x_{1,k}, y_{1,k})$ and $(x_{2,k}, y_{2,k})$ are the coordinates of the top-left and bottom-right corners of the bounding box, and $(v_{x1,k}, v_{y1,k})$ and $(v_{x2,k}, v_{y2,k})$ are the velocities of these corners. The discrete time sample is denoted by $k$. The measurement (output of the vision algorithm) only provides the positions of the corners, denoted together by $z_k = [x_{1,k} \;\; y_{1,k} \;\; x_{2,k} \;\; y_{2,k}]^T$. Denote also the vector of velocities $v_k = [v_{x1,k} \;\; v_{y1,k} \;\; v_{x2,k} \;\; v_{y2,k}]^T$.

The dynamics describing state transitions consist of a noisy first-order Euler integration of the velocities to obtain the positions, and random-walk velocities:

$$z_{k+1} = z_k + v_k \cdot T_S + w_{z,k}, \qquad v_{k+1} = v_k + w_{v,k}$$

where $T_S$ is the sampling period. Such a model is called a constant-velocity model in computer vision, where it is often used to track objects with unknown motion. These dynamics are noisy linear:

$$x_{k+1} = A x_k + w_k$$

with $A = \begin{bmatrix} I_4 & T_S I_4 \\ 0_4 & I_4 \end{bmatrix}$, and the overall noise $w_k = [w_{z,k}^T, w_{v,k}^T]^T$ is zero-mean Gaussian with covariance matrix $Q = 10^{-4} I_8$.

For the measurement equation, the positions are mapped directly from the measurement to the state, while the velocities are not directly measured and thus they are not included in the mapping. This is represented as:

$$z_k = C \cdot x_k + u_k$$

with $C = [I_4 \;\; 0_4]$. The measurement noise $u_k$ is also zero-mean Gaussian, with covariance $R = 10 I_4$. The Kalman filter is run with the initial error covariance matrix $P_0 = 0.1 I_8$. For the equations of the Kalman filter see e.g. Ristic et al. (2004).

Fig. 2 illustrates the tracking results, with the filtered bounding box in cyan smoothly approaching the measurement from the detector, shown in solid green.

Fig. 2. Measured and filtered box in two subsequent frames
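Because $A$, $C$, $Q$, $R$ and $P_0$ in the model above are all built from identity blocks, the eight-state filter separates into four identical two-state (position, velocity) filters, one per corner coordinate. The sketch below uses that observation to run the standard Kalman predict and update steps in plain Python. It is an illustration under stated assumptions, not the authors' implementation: the sample time `TS`, the initialization from the first measurement with zero velocity, and the class names are all hypothetical, since the paper does not give them.

```python
TS = 0.05   # ASSUMED sample time in seconds (roughly 20 frames per second)
Q = 1e-4    # process noise variance per state, from Q = 10^-4 I_8
R = 10.0    # measurement noise variance, from R = 10 I_4
P0 = 0.1    # initial error variance per state, from P_0 = 0.1 I_8


class CoordinateFilter:
    """Constant-velocity Kalman filter for a single corner coordinate."""

    def __init__(self, z0):
        # ASSUMPTION: start at the first measurement with zero velocity.
        self.p, self.v = float(z0), 0.0
        # Symmetric 2x2 error covariance, stored as three numbers.
        self.p00, self.p01, self.p11 = P0, 0.0, P0

    def step(self, z):
        # Predict: x = A x, P = A P A^T + Q, with A = [[1, TS], [0, 1]].
        p = self.p + TS * self.v
        p00 = self.p00 + 2 * TS * self.p01 + TS * TS * self.p11 + Q
        p01 = self.p01 + TS * self.p11
        p11 = self.p11 + Q
        # Update with the position measurement z, using C = [1, 0].
        s = p00 + R                    # innovation variance
        k0, k1 = p00 / s, p01 / s      # Kalman gain
        y = z - p                      # innovation
        self.p = p + k0 * y
        self.v = self.v + k1 * y
        self.p00 = (1 - k0) * p00
        self.p01 = (1 - k0) * p01
        self.p11 = p11 - k1 * p01
        return self.p


class BoxFilter:
    """Filters a box given as [x1, y1, x2, y2], one filter per coordinate."""

    def __init__(self, box):
        self.filters = [CoordinateFilter(c) for c in box]

    def step(self, box):
        return [f.step(c) for f, c in zip(self.filters, box)]


# The detector suddenly reports x1 = 120 instead of 100; the filtered
# value moves toward the new measurement gradually rather than jumping.
box_filter = BoxFilter([100, 50, 200, 300])
for _ in range(5):
    print(box_filter.step([120, 50, 200, 300]))
```

With $R$ much larger than $Q$ and $P_0$, the gain is small, so a single outlying detection shifts the filtered box only slightly, which matches the smoothing behavior described for Fig. 2.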
2018 IFAC CESCIT
June 6-8, 2018. Faro, Portugal
202 Cristi Iuga et al. / IFAC PapersOnLine 51-10 (2018) 199–204

5. DRONE CONTROL and angular control setpoints to be applied are computed as


follows for the rotation and distance correction:
In this section we present the final part of our application, the
control strategy, as well as the overall experimental results.

5.1 Vision-based control

We apply a visual servoing strategy whose goal is to maintain the
drone at a reference distance $d_{\mathrm{ref}}$ of 4 meters from the
target, and to keep the target in the center of the image. The
commands correct the orientation $\theta$ of the drone about the z
axis (the yaw) and its position along the x axis (represented as a
distance $d_x$ from the person), by altering the angular and linear
velocity setpoints $\omega$, $V$ along these two axes. These setpoints
are then sent to the AR.Drone 2.0 firmware, which applies low-level
control in order to track them. Fig. 3 visually illustrates the
control strategy.

Fig. 3. Illustration of the vision-based control strategy

The feedback to the yaw and distance controllers is computed from
measurements derived from the filtered bounding box around the
person. Specifically, the yaw controller uses the deviation of the
box center from the center of the image in order to keep the target
in the center of the frame, while the distance controller uses the
computed distance to the target in order to keep the drone at the
desired reference distance from the person. Experiments have shown
that in our case, the distance between the drone and the target is
most reliably computed from the area of the bounding box:

$$d_x = \alpha S + \beta$$

where $S$ is the area, $\alpha$ is a scaling factor and $\beta$ a
bias, both determined experimentally. The area is simply the width
times the height of the box in pixels, see again Fig. 3.

Preliminary control experiments with the drone have shown that
computing continuous commands and sending them to the drone at each
frame does not result in good flight behavior: the drone either did
not respond promptly or well to the commands, or it over-responded
and lost the target before further corrections could be applied. A
practical solution to this issue was to use, for both the yaw and
distance controllers, tripositional control laws (bipositional above
some magnitude threshold, plus a zero level in-between). Therefore,
the linear and angular velocity setpoints are computed as:

$$\omega = \begin{cases} -0.1, & \text{if } c < -\bar{c} \\ 0, & \text{if } -\bar{c} \le c \le \bar{c} \\ 0.1, & \text{if } c > \bar{c} \end{cases}$$

$$V = \begin{cases} -0.1, & \text{if } d_x < d_{\mathrm{ref}} - \bar{d} \\ 0, & \text{if } d_{\mathrm{ref}} - \bar{d} \le d_x \le d_{\mathrm{ref}} + \bar{d} \\ 0.1, & \text{if } d_x > d_{\mathrm{ref}} + \bar{d} \end{cases}$$

where $c$ is the position of the box center relative to the center of
the image, normalized to $[-0.5, 0.5]$ over the image width, and the
thresholds are $\bar{c} = 0.15$, $\bar{d} = 0.6$ m. The setpoint
values are given in normalized units, as required by the AR.Drone 2.0
firmware.

5.2 High-level strategy

The high-level behavior of the drone is implemented in the form of a
state machine. The states and the algorithm for switching between
them are presented in Fig. 4. Assuming that the initialization and
takeoff activities of the drone have been carried out successfully,
the state machine enters the Follow target loop. In this state, the
UAV mostly hovers; however, the yaw and distance controllers make
position adjustments based on the bounding box estimates received
from the person detector, following the control strategy presented
above. When no bounding box is received, e.g. due to transient
network losses, the position error is considered to be 0 and the
drone remains in the Hover state.

Fig. 4. The state-machine algorithm used to fly the drone
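
As an illustration, one iteration of the Follow target loop could be sketched in Python as below. This is a minimal sketch under stated assumptions, not the authors' implementation: the calibration constants ALPHA and BETA, the image width of 600 pixels, the state labels, and all function and class names are hypothetical. Only the reference distance, the two thresholds, and the setpoint magnitude of 0.1 are taken from Section 5.1.

```python
from dataclasses import dataclass
from typing import Optional, Tuple

D_REF = 4.0   # reference distance d_ref [m] (Section 5.1)
C_BAR = 0.15  # yaw threshold on the normalized center offset c
D_BAR = 0.6   # distance threshold [m]
STEP = 0.1    # setpoint magnitude, in normalized units

# ASSUMPTION: illustrative calibration values only. The paper determines
# alpha and beta experimentally and does not report them.
ALPHA = -1.0e-4
BETA = 8.0


@dataclass
class Box:
    """Filtered bounding box around the person, in pixels."""
    cx: float      # horizontal coordinate of the box center
    width: float
    height: float


def tripositional(error: float, threshold: float, step: float = STEP) -> float:
    """Return +step above the threshold, -step below -threshold, else 0."""
    if error > threshold:
        return step
    if error < -threshold:
        return -step
    return 0.0


def estimate_distance(box: Box, alpha: float = ALPHA, beta: float = BETA) -> float:
    """Affine distance model d_x = alpha * S + beta, with S the box area."""
    return alpha * box.width * box.height + beta


def setpoints(box: Optional[Box], image_width: float) -> Tuple[float, float]:
    """Compute (omega, V); a missing box is treated as zero error."""
    if box is None:
        return 0.0, 0.0
    c = box.cx / image_width - 0.5  # normalized to [-0.5, 0.5]
    omega = tripositional(c, C_BAR)
    v = tripositional(estimate_distance(box) - D_REF, D_BAR)
    return omega, v


def follow_step(box: Optional[Box], fallen: bool,
                image_width: float = 600.0) -> Tuple[str, Tuple[float, float]]:
    """One loop iteration: land on a fall message, otherwise keep following."""
    if fallen:
        return "LAND", (0.0, 0.0)
    return "FOLLOW", setpoints(box, image_width)
```

With these hypothetical calibration values, a box of 50 by 200 pixels centered at 500 pixels gives an estimated distance of 7 m and a normalized offset of about 0.33, so both setpoints saturate at 0.1. Writing the distance law on the error $d_x - d_{\mathrm{ref}}$ lets the same three-level function serve both controllers.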


At each loop iteration, the controller also parses the input stream
coming from the Fall detector. When a fallen message is received, the
drone proceeds with a landing procedure. Our demonstrator implements
no special action here, but this condition could be used to e.g.
alert a caretaker or deliver first-aid medication.

5.3 Results

In our practical experiments, the drone generally behaved
appropriately, responding correctly to the given commands and flying
without losing its target, with some limited resilience to
occlusions (see the demo video below).

Fig. 5 illustrates how the distance controller works to maintain the
reference distance of 4 meters. In the experiment, the target was
first at 6 meters from the drone, before the controller started to
send correction commands. After the drone correctly approached the
target and reached a relatively stable hover, the person approached
the drone. This resulted in the drone moving backwards in order to
maintain the reference distance.

Fig. 5. Evolution of distance between drone and target during a
control experiment

Fig. 6 shows, for the same experiment, the evolution of the target
center coordinate, in pixels along the horizontal dimension of the
image. The center of the image corresponds to 300 pixels. The drone
uses yaw rotation to maintain the target center at the center of the
image. Note that during the experiment the person was actually
slightly moving sideways with respect to the drone.

Fig. 6. Evolution of target center coordinate

A video of a practical demonstration is available at
http://rocon.utcluj.ro/files/aufdemo.mp4. Fig. 7 shows a few
representative video stills, each including on the left the
perspective of the drone, where the detected bounding box around the
person is shown as a rectangle and its center is shown as a disk. To
the right of each still, the perspective of the monitored person and
a third-person view are shown. The three stills illustrate, in order:
the normal situation where the person is at the reference distance;
the controller in action when the person is moving; and a successful
fall detection event (in which case the bounding box becomes red).

Fig. 7. Practical demonstration

6. CONCLUSIONS

In this paper we presented a first application for monitoring and
detecting falls of at-risk (e.g. elderly) persons using a UAV. The
position and state (upright or fallen) of the person are determined
with deep-learning-based vision methods, and a simple control
strategy keeps the person in view of the drone and maintains a set
distance between the two. In experiments, falls were reliably
detected, and the algorithm was able to correct the position of the
drone so as to follow the person.


This application is a proof of concept and many elements can be
improved. On the vision side, open issues include e.g. obstacle
detection, explicit handling of occlusions, and robustness to
multiple persons in the scene. At least as important is the control
strategy, where better controllers should provide increased
performance and a smoother behavior of the drone; here it will be
important to address effects due to the wireless communication
network, using techniques from networked control systems. On the
implementation side, merging into a single network the two networks
currently responsible for detecting respectively the position and the
state of the person would lead to computational savings for the GPU
or CPU. The final application objective is to have the entire sensing
and control pipeline run on board of the drone, for which a different
drone with stronger on-board processors is needed.

ACKNOWLEDGEMENT

This work was supported by a grant of the Romanian National Authority
for Scientific Research, CNCS-UEFISCDI, project number
PN-III-P1-1.1-TE-2016-0670.

REFERENCES

Andriluka, M. et al. (2010). Vision based victim detection from
unmanned aerial vehicles. In IEEE/RSJ International Conference on
Intelligent Robots and Systems.

Baer, M., Tilliette, M-A., Jeleff, A., Ozguler, A., and Loeb, T.
(2014). Assisting older people: From robots to drones.
Gerontechnology, 13(1): 57-58.

Boucher, P. et al. (2013). Design and validation of an intelligent
wheelchair towards a clinically-functional outcome. Journal of
NeuroEngineering and Rehabilitation, 10(58): 1-16.

Bourke, A.L., O’Brien, J.V., and Lyons, G.M. (2007). Evaluation of a
threshold-based tri-axial accelerometer fall detection algorithm.
Gait & Posture, 26(2): 194-199.

Cucchiara, R., Prati, A., and Vezzani, R. (2007). A multi-camera
vision system for fall detection and alarm generation. Expert
Systems, 24(5): 334-345.

Everingham, M., Van Gool, L., Williams, C.K.I., Winn, J., and
Zisserman, A. (2012). The PASCAL Visual Object Classes Challenge
(VOC2012) Results.

He, K., Zhang, X., Ren, S., and Sun, J. (2016). Deep residual
learning for image recognition. In Computer Vision and Pattern
Recognition.

Krizhevsky, A., Sutskever, I., and Hinton, G.E. (2012). ImageNet
classification with deep convolutional neural networks. In Advances
in Neural Information Processing Systems, 1097-1105.

de Lima, A.L.S. et al. (2017). Freezing of gait and fall detection in
Parkinson’s disease using wearable sensors: a systematic review.
Journal of Neurology, 264(8): 1642-1654.

Lin, T.Y., Maire, M., Belongie, S., Hays, J., Perona, P., Ramanan,
D., Dollar, P., and Zitnick, C.L. (2014). Microsoft COCO: Common
objects in context. In European Conference on Computer Vision.

Mathe, K. and Busoniu, L. (2015). Vision and Control for UAVs: A
Survey of General Methods and of Inexpensive Platforms for
Infrastructure Inspection. Sensors, 15(7): 14887-14916.

Murphy, R. (2014). Disaster Robotics. MIT Press.

Nunez-Marcos, A., Azkune, G., and Arganda-Carreras, I. (2017).
Vision-based fall detection with convolutional neural networks. In
Wireless Communications and Mobile Computing.

Redmon, J. and Farhadi, A. (2017). YOLO9000: better, faster,
stronger. In Computer Vision and Pattern Recognition.

Ren, S., He, K., Girshick, R., and Sun, J. (2015). Faster R-CNN:
Towards real-time object detection with region proposal networks. In
Advances in Neural Information Processing Systems, 91-99.

Ristic, B., Arulampalam, S., and Gordon, N. (2004). Beyond the Kalman
Filter: Particle Filters for Tracking Applications. Artech House.

Russakovsky, O., Deng, J., Su, H., Krause, J., Satheesh, S., Ma, S.,
Huang, Z., Karpathy, A., Khosla, A., Bernstein, M., Berg, A.C., and
Fei-Fei, L. (2015). ImageNet Large Scale Visual Recognition
Challenge. International Journal of Computer Vision, 115(3): 211-252.

Simonyan, K. and Zisserman, A. (2015). Very deep convolutional
networks for large-scale image recognition. In International
Conference on Learning Representations.

Skubic, M. et al. (2016). Testing non-wearable fall detection methods
in the homes of older adults. In IEEE Annual International Conference
on Engineering in Medicine and Biology Society.

WHO (2007). Global report on fall prevention in older age. World
Health Organization.
