Abstract: The present paper considers the use of deep learning and transfer learning techniques for fall detection by processing surveillance camera data. An open dataset gathered by the Laboratory of Electronics and Imaging of the National Center for Scientific Research in Chalon-sur-Saone was used. The architecture of the AlexNet CNN, which served as the starting point for the classifier, was adapted to solve the fall detection problem. The proposed method was tested on a dataset of 30 records, each containing a single fall episode. We achieved Cohen's kappa of 0.93 and 0.60 for fall vs. non-fall classification under surrounding conditions known and unknown to the classifier, respectively.

Index Terms: deep learning, machine learning, intelligent video analysis, fall detection

I. INTRODUCTION

At present, surveillance cameras are widely used. They are utilized to identify suspicious persons, as well as persons in a state of alcohol or drug intoxication, who may be dangerous to the life and health of others. The developers of closed-circuit television (CCTV) systems offer complexes consisting of IP cameras for the automated detection of suspicious persons. These systems can be based on biometric identification (for example, the NeoFace system, smart glasses R7) [1], [2], as well as on the recognition of emotions or facial expressions (for example, DeepFace) [3]–[6]. The main disadvantage of this type of system is that information about the possible trespasser may be absent from the database used for biometric identification.

It should be noted that in the majority of cases, data from CCTV cameras are used only for creating archives of video records, and the possibilities of intelligent video analysis of data from remote objects are practically not used. This situation is determined in particular by technical difficulties associated with the use of intelligent video surveillance systems in practice. For example, classification algorithms are usually extremely sensitive to lighting conditions [7].

As a specific problem of intelligent video analysis, we can define the problem of detecting an abnormal situation in which a person is in danger and there are no passers-by around who can help (for example, a person has fallen and cannot get up or call for help). If such a situation takes place in the cold season, it is essential to provide first aid as soon as possible in order to prevent the negative consequences associated with hypothermia. Therefore, a pressing task is to design technical means that will detect an emergency situation in an automated mode and inform the operator if help is needed.

In the present work, the problem of developing such a system, which recognizes human movements in CCTV camera records and detects falls, is considered. The fall of a person can be caused by many factors: loss of balance due to lack of cerebral blood supply, muscle weakness, etc. In any case, a situation in which a person has fallen and cannot get up without assistance is dangerous and requires an immediate response.

The problem of detecting falls using video analysis has been studied in a large number of works [8], [9], which were based on analyzing the shape and position of the person in the frame, gradients in the vertical and horizontal directions, and changes in images in the time domain.

In the majority of papers on the analysis of CCTV camera records, the effectiveness of the fall event classifiers is artificially overestimated because of the limitations of the datasets used for training and testing. These limitations may be described as follows:
• The dataset is usually formed from data recorded under unchanging conditions (most often laboratory conditions rather than realistic ones) and with uniform illumination of the entire analyzed scene.
• One and the same person acts as the test subject.
• Movement artifacts of the "fall" type are similar in the subject's preceding actions and in her/his position relative to the camera at the time of the fall.
• In almost all cases, falls are performed on a cushioning mat, which most often has a significant color contrast with respect to the clothing of the subject.

All these factors result in a significant overestimation of the performance of the classifiers presented in papers based on the analysis of datasets with the above-listed disadvantages.

The purpose of this paper was to evaluate the applicability of deep learning and transfer learning techniques to the automated detection of falls by analysis of surveillance camera data gathered in realistic conditions.

The research was supported by a grant of the Russian Foundation for Basic Research (17-20-03034).
II. METHODS

As is known, the deep learning technique is based on the use of convolutional neural networks (CNN) [10], which are biologically inspired variants of multilayer perceptrons. The main drawback of this technique is that training requires a huge amount of data (more than 100,000 examples) to be successful in practice. As a result, the training process can last for weeks or months, which is not always acceptable.

Transfer learning [11] is a technique that allows overcoming these disadvantages of deep learning. It may be described as follows: to solve a new problem, the researcher selects a CNN already trained for another, similar problem. This allows transferring knowledge obtained by solving one task (for example, recognition of images of various animal species) to another (for example, recognition of interior objects in an image). This approach significantly reduces the time required for CNN training on a new dataset compared to training a neural network from scratch (when the network connections are initialized with random weights).

In the present paper, the transfer learning procedure was carried out in MATLAB R2017b. We used the pre-trained CNN AlexNet [12], which is available by installing the Neural Network Toolbox™ Model for AlexNet Network support package. This CNN was trained by its authors on 1.2 million images to classify 1000 different classes, and in 2012 it significantly outperformed all earlier CNN versions by utilizing more filters per layer than the previously proposed LeNet-5 [13], a pioneering CNN designed by Yann LeCun in 1998.

In order to use this network to recognize only two classes of images (fall and non-fall), we need to make changes to the AlexNet CNN architecture. Namely, we replace the 23rd layer with a fully connected layer of size 2, since there are two classes, and replace the 25th layer in accordance with the class names "fall" and "non-fall". After these changes, the network architecture has the appearance shown in Fig. 1. Layers that differ from the original AlexNet are underlined. After the CNN is designed, a supervised training procedure should be carried out using a dataset with images for both classes (fall and non-fall).

III. EXPERIMENTAL DATA

In the present work, an open database of video recordings provided by the Laboratory of Electronics and Imaging of the National Center for Scientific Research in Chalon-sur-Saone was used [14]. Figs. 2 and 3 present examples of frames extracted from the video records, corresponding to both fall and non-fall episodes. The merits of this database include the following factors:
• Video signals are recorded under various environmental conditions.
• The illumination of the experimental scene is irregular. The dataset contains records in which the contrast of the human against the background objects is rather low due to the limited dynamic range of the camera and the presence of regions with high brightness (for example, the window area in Fig. 3).
• 4 different subjects (3 men and 1 woman) participated in the experiments.
• Falls were performed at different viewing angles, both from the standing position and from the sitting position.
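The replacement of the 23rd and 25th layers described in Section II can be illustrated with a small sketch. It is not the MATLAB code used in this work: the network is modelled only as a list of layer names, the names are assumed to follow the 25-layer listing of the pre-trained AlexNet as commonly shown by the MATLAB support package, and the helper functions `adapt_for_two_classes` and `changed_layers` are hypothetical names introduced for illustration.

```python
# Assumed 25-layer listing of the pre-trained AlexNet (1-based numbering in
# the text corresponds to index + 1 here). The names are an assumption.
ALEXNET_LAYERS = [
    "data", "conv1", "relu1", "norm1", "pool1",
    "conv2", "relu2", "norm2", "pool2",
    "conv3", "relu3", "conv4", "relu4", "conv5", "relu5", "pool5",
    "fc6", "relu6", "drop6", "fc7", "relu7", "drop7",
    "fc8 (1000 outputs)", "prob (softmax)", "output (1000 class names)",
]


def adapt_for_two_classes(layers, class_names=("fall", "non-fall")):
    """Return a copy of the layer list with layers 23 and 25 replaced.

    Hypothetical helper: only the layer descriptions change; all earlier
    layers (and hence their pre-trained weights) are kept as they are.
    """
    if len(layers) != 25:
        raise ValueError("expected the 25-layer AlexNet listing")
    adapted = list(layers)
    # Layer 23: new fully connected layer sized to the number of classes.
    adapted[22] = f"fc_new ({len(class_names)} outputs)"
    # Layer 25: new classification output labelled with the class names.
    adapted[24] = "output_new (" + ", ".join(class_names) + ")"
    return adapted


def changed_layers(original, adapted):
    """Return the 1-based numbers of the layers that differ."""
    return [i + 1 for i, (a, b) in enumerate(zip(original, adapted)) if a != b]


adapted = adapt_for_two_classes(ALEXNET_LAYERS)
print(changed_layers(ALEXNET_LAYERS, adapted))
```

Under these assumptions, only the final fully connected layer and the classification output differ from the original network, while the softmax layer between them and all convolutional layers are reused unchanged.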
As the transfer learning technique was used to train the CNN, the parameters of the input layer remained unchanged. This allows using the connection weights available after CNN pre-training. Moreover, the initial learning rate was set to 0.001 to keep changes to the initial weight values small. As is known, an epoch is a full training cycle over the whole training dataset. The maximum number of training epochs was set to 20, and the batch size

• Accuracy = 0.99
• Sensitivity = 0.93
• Specificity = 0.99
• Positive predictive value = 0.94
• Negative predictive value = 0.99

We have also used the trained classifier under new surrounding conditions (another 30-record subset of the dataset [12]) to understand if the CNN will need re-training in case the