ABSTRACT
Social interactions are a vital aspect of everyone's daily living. Individuals with
visual impairments are at a disadvantage in social interactions, as the majority
(nearly 65%) of these interactions happen through visual non-verbal cues.
Recently, efforts have been made towards the development of an assistive
technology, called the Social Interaction Assistant (SIA) [1], which enables access
to non-verbal cues for individuals who are blind or visually impaired. Along with
self-report feedback about their own social interactions, behavioral psychology
studies indicate that individuals with visual impairment will benefit in their social
learning and social feedback by gaining access to the non-verbal cues of their
interaction partners. As part of this larger SIA project, in this paper we discuss the
importance of person localization in building a human-centric assistive
technology that addresses the essential needs of visually impaired users. We
describe the challenges that arise when a wearable camera platform is used as a
sensor for picking up non-verbal social cues, especially the problem of person
localization in a real-world application. Finally, we present a computer-vision-based
algorithm adapted to handle the various challenges associated with the
problem of person localization in videos and demonstrate its performance on three
exemplar video sequences.
Figure 1. The Social Interaction Assistant

Figure 3. Person of interest at a large distance from the camera
In [1] we introduce a systematic requirements analysis for an effective SIA.
Through an online survey (with inputs from 27 people, of whom 16 were blind, 9
had low vision, and 2 were sighted specialists in the area of visual impairment)
we rank-ordered a list of visual cues related to social interaction that are
considered important by the target population. Most of the needs identified
through this survey underline the importance of extracting the following
characteristics of individuals in the scene: a) Number and location of the
interaction partners, b) Facial expressions, c) Identity, d) Appearance, e) Eye
gaze direction, f) Pose, and g) Gestures. A brief glance through this list reveals
the commonality of these issues with some of the important research questions
being tackled by the face research group of the computer vision and pattern
recognition community. In this regard, many advances have been made in
extracting information related to humans from images and videos. But when the
mobile setup of the SIA is considered, with real-world data captured in
unconstrained settings, a new dimension of complexity is added to these
problems. As most of these cues relate to people in the surroundings of the user,
it is essential to localize the individuals in the input video stream prior to
extracting these cues.

When such a person of interest is in close proximity, his/her presence can be
detected by analyzing the incoming video stream for facial features (Figure 2).
But when such a person is approaching the user from a distance, the facial region
in the video appears extremely small. In this case, relying on facial features
alone would not suffice, and there is a need to analyze the data for full-body
features (Figure 3). In this work, we have concentrated on improving the
effectiveness of the SIA by applying computer vision techniques to robustly
localize people using full-body features. The following section discusses some of
the critical issues that arise when performing person localization from the
wearable camera setup of the SIA.

1.2 Challenges in Person Localization from a Wearable Camera Platform

A number of factors associated with the background, the object, and
camera/object motion determine the complexity of the problem of person
localization from a wearable camera platform. Following is a discussion of the
imminent challenges that we encountered while processing the data using the
SIA.
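The near/far distinction above, switching from facial features to full-body features as the person's apparent size shrinks, can be sketched as a simple scale-based rule. This is an illustrative assumption on our part (the function name and the 5% face-height threshold are hypothetical, not values from the paper):

```python
# Hypothetical sketch, not the SIA's actual pipeline: pick a detection
# strategy based on how large the detected face appears in the frame.
# The 5% threshold below is an illustrative assumption.

def choose_detector(face_height_px: int, frame_height_px: int,
                    min_face_fraction: float = 0.05) -> str:
    """Return 'face' when the face region is large enough to carry
    reliable facial features, otherwise fall back to 'full_body'."""
    if frame_height_px <= 0:
        raise ValueError("frame height must be positive")
    if face_height_px / frame_height_px >= min_face_fraction:
        return "face"       # person is close: facial features are usable
    return "full_body"      # person is distant: rely on full-body features
```

For example, a 60-pixel face in a 480-pixel-tall frame (12.5% of the frame height) would be handled by the face-based path, while a 10-pixel face would trigger the full-body path.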
1.2.1 Background Properties

When the Social Interaction Assistant is used in natural settings, it is highly
possible that there are objects in the background which move, causing the
background to be dynamic. Also, there are bound to be regions in the background
whose image features are highly similar to those of the person, leading to a
cluttered background. Due to these factors, the problem of distinguishing the
person of interest from the background becomes highly challenging in this
context. Figures 4(a) and (b) illustrate the contrast in the data due to the nature
of the background.

… has not been studied much in the literature. Figure 5(a) shows the simplicity
of the data when these problems are not present, while Figure 5(b) highlights the
complexity of the data in a typical interaction scenario.

1.2.3 Object/Camera Motion

(a) Static Camera
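The background difficulties described in Section 1.2.1 are often handled, for a static camera as in the panel above, with a running-average background model; a moving wearable camera violates this model's core assumption. The following is a minimal generic baseline of our own, not the method used in the SIA:

```python
import numpy as np

# A minimal running-average background model (generic baseline, not the
# SIA's method). It assumes a static camera: pixels whose appearance
# matches the accumulated model are treated as background, so a dynamic
# or cluttered background (or a moving camera) quickly breaks it.

def update_background(bg: np.ndarray, frame: np.ndarray,
                      alpha: float = 0.05) -> np.ndarray:
    """Exponential running average: bg <- (1 - alpha) * bg + alpha * frame."""
    return (1.0 - alpha) * bg + alpha * frame

def foreground_mask(bg: np.ndarray, frame: np.ndarray,
                    threshold: float = 25.0) -> np.ndarray:
    """Flag pixels deviating from the model by more than `threshold`
    gray levels as foreground."""
    return np.abs(frame.astype(np.float64) - bg) > threshold
```

With a truly static background, the model converges and the mask isolates the person; any background motion (or ego-motion of the wearable camera) instead floods the mask with false foreground, which is precisely the challenge noted above.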
\[
\mathrm{OTE} = \frac{1}{N} \sum_{i=1}^{N} \left\lVert \mathrm{Centroid}_{\mathrm{gTruth}_i} - \mathrm{Centroid}_{\mathrm{track}_i} \right\rVert \tag{7}
\]
An OTE value closer to 0 implies better tracking, while a value farther from 0
implies a larger distance between the prediction and the ground truth.

(a) SMSPF Results on a sequence from Dataset1
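Equation (7), the mean Euclidean distance between ground-truth and tracked centroids over N frames, can be computed directly as follows (variable and function names are ours; the metric follows the text):

```python
import numpy as np

# Object Tracking Error (OTE) of Eq. (7): the mean Euclidean distance
# between ground-truth and tracked centroids across N frames.

def object_tracking_error(gtruth, track) -> float:
    """gtruth, track: (N, 2) arrays of per-frame centroid coordinates."""
    gtruth = np.asarray(gtruth, dtype=np.float64)
    track = np.asarray(track, dtype=np.float64)
    if gtruth.shape != track.shape:
        raise ValueError("centroid arrays must have the same shape")
    # Per-frame Euclidean distances, then their mean over the N frames.
    return float(np.linalg.norm(gtruth - track, axis=1).mean())
```

A perfect track gives an OTE of 0; for example, a track offset by the vector (3, 4) in one of two frames and exact in the other yields an OTE of 2.5.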