
Real-Time Color-Based Tracking via a Marker Interface

Heikki Pylkkö, Jukka Riekki, Juha Röning


Dept. of Electrical Engineering and Infotech Oulu
FIN-90014 University of Oulu, Finland

Abstract

This paper presents a general architectural solution for concurrent real-time color-based tracking of multiple objects. The presented architecture is suggested for a mobile robot interacting with humans. The main contributions of the architecture are markers and color judges. Markers are a situated representation of the robot's environment. Color judges separate color-based segmentation methods from the process of labeling images. Furthermore, a methodology for coping with objects entering and leaving the robot's field of view is suggested. Tracking experiments performed with a system implemented on a Nomad XR4000 robot are presented as well.

1 Introduction

A mobile robot serving humans in their everyday environment would have a vast number of potential applications. However, the need to interact with people and the nature of our daily environment set challenging requirements for the robot's perception. Because of the huge number of details in our daily environment, the robot should concentrate only on the environmental features relevant to the task at hand. Furthermore, several features often need to be tracked simultaneously, such as the head and hands of a person, an object and the hand grasping it, or a coffee pot and a mug. The tracked features should be switched smoothly as the robot's local environment changes. Finally, all this should take place at the pace of the environment, which significantly constrains the set of possible solutions.

Vision has the potential to provide enough information for a wide variety of tasks. Ultrasonic and other sensors are suitable for studying the coarse structure of the local environment, but vision is needed to produce detailed information. Furthermore, color has proven to be a very powerful cue for image segmentation. It has numerous advantages compared to geometric cues and gray-scale intensity: its capability to endure partial occlusion, rotation in depth, and scale and resolution changes makes it an attractive approach to segmentation. Many interesting segmentation methods feasible in service robots have been studied. Especially methods for detecting people and objects handled by them are useful for a robot interacting with humans. People can be located efficiently from color images on the basis of skin color (see e.g. [1-8]). Objects handled by humans can be located by many different color-based segmentation methods (see e.g. [9, 10, 11]).

This paper suggests a system architecture called Cocoa and an image processing methodology that together meet the above challenges. The main contributions of the architecture are markers, color judges, and dynamic handling of markers.

Markers are a situated representation of an agent's environment, that is, a representation supporting quick reactions to changes in the environment. They were first introduced by Chapman [12]. A marker maintains information about an object (or several similar objects) in the environment. In our architecture, markers are components acting as an interface between the image processing and control systems. In earlier work, we utilized markers to track objects based on distance measurements [13] and to track several objects in a highly dynamic simulated environment [14, 15]. In this study, we utilized markers in color-based tracking. Furthermore, this is the first time in a real system that a control system assigns tasks to the image processing system via markers.

Inspired by the variety of color-based segmentation methods, we propose the use of color judges. Color judges are components of the image processing architecture. They are a novel way of separating the color-based segmentation method from the process of labeling the images, thus facilitating the testing of different segmentation methods and dynamic, context-aware switching between them.

Dynamic handling of markers is a methodology for coping with the dynamics of our everyday environment. With this methodology, markers are updated when objects enter or leave the robot's field of view.

The rest of the paper is organized as follows. In chapter 2, the Cocoa system architecture and image processing methodology are presented. Chapter 3 presents the experiments and results for the suggested system. Discussion of the results together with concluding remarks is presented in chapter 4.
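To make the marker concept above concrete before the architecture is described, the following minimal sketch shows the kind of record a marker amounts to. All names and fields here are hypothetical illustrations, not the Cocoa implementation: a marker simply stores what the control system asked to track and what the tracking system last observed.

```python
from dataclasses import dataclass, field

@dataclass
class Marker:
    """Hypothetical marker record shared by the control and tracking systems."""
    judge_name: str                  # which color judge segments this object type
    dynamic: bool = False            # static: one object; dynamic: all objects of the type
    objects: list = field(default_factory=list)  # most recent tracked-object data

    def configure(self, judge_name, dynamic=False):
        # Control-system side: (re)assign a tracking task to this marker.
        self.judge_name = judge_name
        self.dynamic = dynamic
        self.objects.clear()

    def update(self, tracked_objects):
        # Tracking-system side: publish the latest estimates.
        self.objects = list(tracked_objects)

# A control module reads the marker to get the newest object data:
ball = Marker(judge_name="hsi_threshold")
ball.update([{"pos": (160, 120), "area": 420}])
print(ball.objects[0]["pos"])  # → (160, 120)
```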
2 Cocoa System

We have identified the following properties as essential for a real-time tracking system for a mobile robot. A brief rationale is given for each of them as well.

1. Simultaneous tracking of multiple moving objects. The execution of multiple simultaneous tracking tasks should be supported by the system, because there are usually several interesting objects, such as a human head and hands, in the field of view at the same time.

2. Dynamics of tracking. Tracking should be able to detect new objects that appear in the field of view and start to track them. It should also be able to detect disappearing objects and cease tracking them.

3. Real-time performance. Real-time tracking is required in order to make the robot capable of smooth interaction with its environment and humans.

4. Context awareness. The tracking system should be easily configurable to allow different tasks, and it should be robust against changing environmental conditions, such as lighting and imaging geometry.

Our solutions for fulfilling these requirements are presented next. The solution can be seen as a two-fold scheme – the first part being the system architecture and the second the segmentation methodology used in the system – because none of the requirements mentioned above can be fulfilled exclusively by architectural design or by the methodology used in the different parts of the system.

Figure 1. Cocoa architecture. (The visual tracking system comprises a camera module, pixel labeling using a color judge and an optional color look-up table, optional image processing such as noise removal and binarization, object labeling, and object tracking; object markers connect it to the robot control system.)

2.1 System Architecture

The Cocoa system architecture is presented in Figure 1. The left part shows the color blob-based tracking system and the right part the robot's control system. Markers are used as an interface between these two. The thick black arrow indicates the transfer of a color image and the thick white arrow the transfer of an indexed or binary image. Thin arrows indicate data flow. From the control system's perspective, a marker automatically tracks an object in the robot's environment once it has been initialized with the properties of the object. From the tracking system's perspective, a marker is an interface for receiving a tracking task and for returning information about the tracked object.

The tracking system reads the raw images from the camera module. The first step in image processing is color-based pixel labeling, which is done by the pixel-labeling module. It reads the frames and assigns each pixel a label utilizing a color judge.

Next, some extra image processing, such as noise removal, may be performed. However, we prefer higher-level ignorance of noise components to pixel-wise operations such as morphological filtering or density mapping. There are two reasons for that. The first is implied by requirement 3: pixel-wise filtering is computationally much heavier than filtering at the object level, because it involves at least some arithmetic operations per pixel compared to some arithmetic operations per connected component, which is certainly less. The second reason is implied by requirement 4: we do not want to reduce the resolution of tracking by filtering, because there might be situations in which we also want to track objects that are very small in the image and would otherwise be removed as noise. At the tracker level, the level of ignorance can be easily and dynamically adjusted. Nevertheless, some other image processing might be necessary, as the raw pixel-labeled images are not necessarily suitable for the rest of the system. For example, if a probabilistic segmentation method is used by the color judge, the resulting labeled image is usually a gray-scale color probability image which needs to be binarized. The same applies to images segmented by clustering methods.

In blob-based tracking, the next step is object labeling, which performs a connected component labeling procedure on a binary image. Labeling is based on 4-connectivity, and it is performed efficiently within two runs through the image.

The last operational part in our architecture is the tracking module, which tracks color blobs and updates object markers. It reads in the labeled frame and creates an object candidate from each connected component. For these candidates, it calculates some essential properties, such as position (centroid), area and bounding box. It takes care of noise removal by removing the object candidates that do not fulfil a certain adjustable area criterion. For the prediction-estimation procedure, the tracker utilizes Kalman filters, namely alpha-beta trackers [17].

With respect to the presented requirements, the three essential aspects of the proposed architecture are the use of color judges in the pixel-labeling phase, the use of object markers as an interface between image processing and control, and the dynamic handling of object markers and object trackers. These are discussed in detail next.

2.2 Color Judges

As no general-purpose color-based segmentation methods exist, we want to be able to easily implement different methods for particular tracking purposes, which might need to be running concurrently (requirement 1). Our solution to this problem is an object-oriented use of color judges. Color judges are used by the pixel-labeling module to assign labels to the pixels of incoming frames. All the judges inherit the same interface for communicating with the pixel-labeling module, hiding the internal operation of the judges from the labeling module and enabling the use of almost any color-based segmentation method inside a judge. There are two different ways of performing the pixel-labeling procedure.

1. Indirect use of a color judge. This alternative utilizes the color judge for constructing a look-up table in the initialization phase. Pixel labeling runs through the color space, passing each color value on to the color judge through the standard interface, which returns a corresponding pixel label. The labeling module writes this return value in the look-up table cell indicated by the current color values. When tracking is started, pixel labeling runs through the current frame, reads the actual pixel label from the look-up table cell indicated by the current pixel value of the image and assigns it to the current pixel.

2. Direct use of a color judge. In this approach, no look-up table is constructed; instead, the pixel color values of incoming frames are passed directly to the color judge using the standard interface method, and the returned pixel label is assigned to the current pixel.

The return values from the judges depend on the segmentation method. For thresholding, they might be zeros for the background and ones for the object pixels, while gray-scale values are returned if some probabilistic segmentation method is used. If clustering is used, the color class labels are returned.

The two alternative ways of pixel labeling are justified by requirements 3 and 4. If the segmentation method is time-consuming, or some heavy color space transformation, such as RGB to HSI, is performed by the color judge prior to segmentation, it is reasonable not to do such operations on-line. In such cases, we suggest the indirect use of judges. On the other hand, direct use of color judges is justified when the segmentation needs to be adaptive in order to cope with, for example, varying lighting conditions that cause the color values of objects to change over time. In such cases, the look-up table would need to be reconstructed every time an adaptation to a change in the conditions takes place, which might take too much time. It is thus better to use a color judge directly, allowing adaptation at any time without the excessive processing caused by look-up table reconstruction. One more advantage of using color judges is the fact that no color space transformation needs to be executed prior to pixel labeling. The transformations, if any are needed, are internal operations of color judges, and the actual pixel-labeling module can work in the color space in which the frames come from the capturing hardware.

2.3 Marker Interface

For a robot control system exploiting visual servoing for accomplishing its tasks, it is beneficial to have a standardized interface to the visual tracking system. We suggest object markers to be used as this interface. The marker interface has the following essential functions:

1. Configuring tracking tasks. The control system configures a tracking task by passing the properties of the object to be tracked to a marker. At minimum, the color properties of the object are defined. These properties are defined by specifying the color judge to be used in segmentation. In addition, other properties, such as the expected location of the object, can be passed.

2. Transmitting object information. By reading the markers, the control system modules get the most recent data about the objects being tracked.

To fulfil requirements 1 and 4, the marker interface should allow several concurrent tracking tasks and switching between the tasks based on the situation. A tracking task is switched by configuring an active marker with new properties. The number of concurrent tracking tasks (i.e. the number of markers) depends mostly on the capabilities of the image processing system. Several objects of the same type can easily be tracked concurrently, but tracking different types of objects concurrently is more challenging.

Requirement 2 states that the tracking should be dynamic, which means that the tracking system should automatically start tracking when an object with properties specified by the control system enters the robot's field of view. In a similar manner, the system should automatically cease tracking when an object leaves the field of view. Dynamic tracking is needed when the system should track all objects of a type. There are also situations when only a single object of a type should be tracked. Hence, there are two types of markers: static markers for tracking single objects and dynamic markers for tracking multiple objects of the same type. Requirement 3 can also be fulfilled, as the amount of information (about the environment) to be updated is minimized.

In conclusion, markers provide a two-way interface. First, they provide a single interface for specifying tracking tasks. Second, they provide a standard place (system component) for reading the locations and other essential properties of the objects being tracked. Markers are the key to tracking only the environmental features relevant to the task at hand. They can be seen as a situated model of the environment that does not suffer from the traditional symbol-grounding problem (see e.g. [13]).
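The two labeling modes of section 2.2 can be sketched in a few lines. This is a minimal illustration under assumed names (the judge classes, the toy threshold, and the 4-bit-per-channel table quantization are all hypothetical), not the Cocoa implementation:

```python
# Sketch of the color-judge interface and the indirect (look-up table) vs.
# direct pixel-labeling modes of section 2.2. All names are illustrative.

class ColorJudge:
    """Common interface: map one RGB color value to a pixel label."""
    def label(self, r, g, b):
        raise NotImplementedError

class RedThresholdJudge(ColorJudge):
    """Toy thresholding judge: 1 for 'red enough' pixels, 0 for background."""
    def label(self, r, g, b):
        return 1 if r > 128 and g < 96 and b < 96 else 0

def build_lut(judge):
    # Indirect mode, initialization phase: run through the color space once
    # (quantized to 4 bits per channel here), asking the judge for each label.
    lut = {}
    for r in range(16):
        for g in range(16):
            for b in range(16):
                lut[(r, g, b)] = judge.label(r << 4, g << 4, b << 4)
    return lut

def label_frame_indirect(frame, lut):
    # Indirect mode, tracking phase: one table read per pixel, no judge calls.
    return [[lut[(r >> 4, g >> 4, b >> 4)] for (r, g, b) in row] for row in frame]

def label_frame_direct(frame, judge):
    # Direct mode: call the judge per pixel, so the judge may adapt between frames.
    return [[judge.label(r, g, b) for (r, g, b) in row] for row in frame]

frame = [[(200, 30, 30), (10, 10, 10)]]
judge = RedThresholdJudge()
assert label_frame_direct(frame, judge) == label_frame_indirect(frame, build_lut(judge))
```

The trade-off described in the text is visible here: `build_lut` pays the full judge cost once up front, while `label_frame_direct` pays it per pixel but lets the judge change its decision boundary between frames.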
2.4 Dynamics of Tracking

Requirement 2 states that tracking should be able to detect appearing and disappearing objects and, correspondingly, begin and cease tracking them. In practice, this means that it should be possible to create and delete object trackers and to update the marker correspondingly when objects come into the field of view and disappear. Below, we present our solution to this problem. First, we present the tracking procedure for dynamic markers:

Procedure: Tracking for Dynamic Markers
1. Read in the labeled image.
2. Create an object candidate from each connected component.
3. Calculate properties for the candidates.
4. Delete candidates that do not fulfil a certain area criterion.
5. For each object in the marker's object list:
   5.1. Calculate the distance between the object's predicted position and each object candidate.
   5.2. Store the nearest candidate and the shortest distance.
   5.3. If the shortest distance is shorter than a specified threshold, update the object with FOUND status. Copy the area and bounding box information from the nearest candidate to the object.
   5.4. Update the object's alpha-beta tracker with the nearest candidate's position and velocity to get a new estimate and prediction for the object's position and velocity.
   5.5. Else, if the shortest distance is greater than the specified threshold, update the object with NOT_FOUND status.
6. Update the moving average (MA) of the number of object candidates. Compare that value to the current number of objects to find out whether new objects should be created or some objects deleted.
7. Create or delete objects.
8. Delete all object candidates.
9. Return the updated dynamic marker.

The creation and deletion of objects are presented in detail next. The MA of the number of object candidates is calculated over a history of h frames. This count is then compared to the current number of objects being tracked. We use the MA to make the tracker robust against momentary false object candidates. From Eq. 1, we get the candidate MA, c_MA, over a history of h frames, the current frame being n and the number of candidates in frame i being c_i:

    c_MA = ( Σ_{i=n-h}^{n} c_i ) / h    (1)

After that, we simply subtract the number of objects, o_n, currently being tracked from c_MA to find out how many objects should be created or deleted. This is shown in Eq. 2, where k indicates the change in the number of objects. If it is positive, k objects should be created, and if it is negative, k objects should be deleted:

    k = c_MA - o_n    (2)

Nevertheless, knowledge of the number is not enough; we also need to know which objects to create or delete. As to creation, we simply create new objects from among the free candidates that have not been used for updating some object. We consider an object as disappeared if k is negative and the object's probability is below a certain threshold. The object probability, P_O, is updated from frame to frame. It is calculated as a moving average over the object's history, as presented in Eq. 3:

    P_O = ( Σ_{i=n-h}^{n} status_i ) / h    (3)

The object status, status_i, is FOUND (1) if the distance between the object's predicted position and the position of the nearest candidate in the corresponding frame i is short enough, and NOT_FOUND (0) otherwise. Thus, the object probability is within the range [0,1].

When a new object is detected, it is added to the object list of the marker (a marker can track several objects of the same type) and a new alpha-beta tracker is created for it. Accordingly, when an object is considered as having disappeared, it is deleted from the marker's object list together with its alpha-beta tracker.

3 Experiments

To demonstrate the capabilities of our tracking system, we present some experimental results next. All the tests were run on a Nomadic XR4000 robot platform, with the tracking software running on the robot's Pentium Pro 200 processor. The robot is shown in Figure 2. We used a Matrox Meteor frame grabber in asynchronous capture mode to capture the video taken by a Cohu 1300 single-CCD color camera. Each experiment was run using two image sizes, 176*132 pixels and 320*240 pixels.

The three test tasks we assigned to the tracking system were as follows:
1. Track the red RoboCup soccer ball.
2. Track human skin areas (hands and faces).
3. Track the ball and skin areas.

As a tracking task, the first assignment is trivial. However, it is useful as a reference for the performance of the system in the more challenging tasks 2 and 3.

In task 1, a static ball marker was set and instructed to use an HSI thresholding judge for segmentation. The tracking system was easily able to track the ball and update
the ball marker at a rate of 40-60 Hz, depending on the resolution.

Figure 2. Our Nomadic XR4000 mobile robot.

To prove that requirements 1, 2 and 3 can be fulfilled by the system, task 2 was run. A dynamic skin marker was set and instructed to use a special skin color judge, which was based on the skin segmentation method proposed by Soriano et al. [8]. The skin region traces produced by the system during this experiment are shown in Figure 3. The lines are drawn from the origin to the locations where the skin regions were first detected.

Figure 3. Tracking of skin regions.

It was noted that the tracker was able to track multiple faces and hands in real time (40-60 Hz for the 176*132 image and 22-25 Hz for the 320*240 image). Moreover, it was capable of starting to track faces and hands when they first appeared in the field of view and ceasing to track them when they went out of it, proving that the dynamic handling of object markers worked. It should be noted that, because of the characteristics of our skin judge, the segmented image is extremely noisy. Due to the robustness of the tracker, the reliability of tracking was very good even without any pixel-wise filtering.

Task 3 was the most challenging. We set a static ball marker, as we did in task 1, and also set a dynamic skin marker, as was done in task 2. These two tasks were then run concurrently. The image region traces are shown in Figure 4.

Figure 4. Tracking of skin and the ball.

It turned out that the system was able to reliably track both skin areas and the ball at the same time. However, the frame rate in this case was significantly lower due to the fact that two separate segmentation and object-labeling procedures were run concurrently, approximately doubling the processing time per frame.

4 Discussion

We proposed a system architecture and methodology for color-based real-time visual tracking to be used in a mobile robot that interacts with humans. We presented four essential requirements for a system of this kind, namely simultaneous tracking of multiple moving objects, dynamics of tracking, real-time performance and context awareness. To meet these requirements, we introduced three essential novelties: the use of a marker interface between the tracking system and the robot control, the use of color judges for segmentation, and dynamic handling of the tracked objects. Our experiments showed that the requirements we set were fulfilled by the system. One significant advantage of our system is that a commodity PC can provide enough processing power to do the tracking at real-time rates.

Controlling a mobile robot and a pan-tilt camera utilizing marker data from color-based tracking is one of the future challenges. The concurrent tracking of multiple objects of the same type imposes another challenge: which are the relevant objects? In the case of human skin, for example, it should be possible to discriminate between faces and hands. Pure 2D color tracking does not provide adequate methods for that. Color-based segmentation methods are also a subject for further study. Requirement 4 will not be properly fulfilled until the color judges can adapt to the current illumination conditions. Adaptation methods for skin color segmentation have been proposed by e.g. [7, 8, 10], but no general methods are available. A mean shift [18] (or CAMSHIFT [19]) tracker module is also yet to be implemented in order to allow the tracking of objects directly from color probability images.
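As a sketch of the dynamic-marker bookkeeping of section 2.4, Eqs. 1-3 can be written out directly: the candidate-count moving average c_MA decides how many objects to create or delete (k = c_MA − o_n), and the per-object probability P_O is the moving average of FOUND/NOT_FOUND statuses. The code below is an illustrative reconstruction, not the Cocoa source; the history length, probability threshold, and rounding of k are assumed values.

```python
from collections import deque

H = 5  # history length h over which the moving averages run (assumed value)

class TrackedObject:
    """Per-object FOUND/NOT_FOUND history; its moving average is P_O (Eq. 3)."""
    def __init__(self):
        self.status = deque(maxlen=H)   # 1 = FOUND, 0 = NOT_FOUND per frame

    def record(self, found):
        self.status.append(1 if found else 0)

    @property
    def probability(self):
        return sum(self.status) / H     # P_O, within [0, 1]

class DynamicMarker:
    def __init__(self, prob_threshold=0.3):
        self.objects = []               # currently tracked objects (o_n = len)
        self.counts = deque(maxlen=H)   # candidate counts c_i for the last h frames
        self.prob_threshold = prob_threshold

    def end_of_frame(self, n_candidates):
        # Eq. 1: moving average of the candidate count.
        self.counts.append(n_candidates)
        c_ma = sum(self.counts) / H
        # Eq. 2: k > 0 -> create k objects, k < 0 -> delete low-probability ones.
        k = round(c_ma - len(self.objects))
        if k > 0:
            self.objects += [TrackedObject() for _ in range(k)]
        elif k < 0:
            # Remove up to -k objects with the lowest P_O, if below the threshold.
            for obj in sorted(self.objects, key=lambda o: o.probability)[:-k]:
                if obj.probability < self.prob_threshold:
                    self.objects.remove(obj)
        return k

# An object appears for several frames, then disappears (per-frame FOUND
# updates via record() are omitted here for brevity):
m = DynamicMarker()
for _ in range(5):
    m.end_of_frame(1)   # one candidate per frame -> one object created
for _ in range(3):
    m.end_of_frame(0)   # no candidates -> the object is eventually deleted
```

The moving average is what gives the robustness claimed in the text: a single spurious candidate shifts c_MA by only 1/h, so no object is created or deleted from one noisy frame.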
Acknowledgements

Maricor Soriano, Sami Huovinen and Birgitta Martinkauppi are acknowledged for their valuable work in skin segmentation that we utilized in our study. The Academy of Finland is acknowledged for its financial support.

References

[1] Xu G., Sugimoto T. (1998) Rits Eye: A Software-Based System for Realtime Face Detection and Tracking Using Pan-Tilt-Zoom Controllable Camera. In: Proc. of the Fourteenth International Conference on Pattern Recognition, 1998, pp. 1194-1197, vol. 2
[2] Menser B. & Brünig M. (1999) Segmentation of Human Faces in Color Images Using Connected Operators. In: Proc. of the International Conference on Image Processing 1999, pp. 632-636, vol. 3
[3] Yoo T.-W., Oh I.-S. (1999) A fast algorithm for tracking human faces based on chromatic histograms. In: Pattern Recognition Letters 20, 1999, pp. 967-978
[4] Kjeldsen R., Kender J. (1996) Finding Skin in Color Images. In: Proc. of the Second International Conference on Automatic Face and Gesture Recognition, 1996, pp. 312-317
[5] Chai D., Ngan K. N. (1999) Face Segmentation Using Skin-Color Map in Videophone Applications. In: IEEE Transactions on Circuits and Systems for Video Technology, Vol. 9, Issue 4, June 1999, pp. 551-564
[6] Feyrer S., Zell A. (1999) Detection, Tracking, and Pursuit of Humans with an Autonomous Mobile Robot. In: Proc. of the International Workshop on Recognition, Analysis, and Tracking of Faces and Gestures in Real-Time Systems, 1999, pp. 83-88
[7] Wu Y., Liu Q., Huang T. S. (2000) An Adaptive Self-Organizing Color Segmentation Algorithm with Application to Robust Real-time Human Hand Localization. In: Proc. of ACCV 2000, pp. 1106-1111
[8] Soriano M., Huovinen S., Martinkauppi B. & Laaksonen M. (2000) Skin detection in video under changing illumination conditions. In: Proc. of the 15th International Conference on Pattern Recognition, September 3-8, Barcelona, Spain, pp.
[9] Wei G.-Q., Arbter K., Hirzinger G. (1997) Real-Time Visual Servoing for Laparoscopic Surgery: Controlling Robot Motion with Color Image Segmentation. In: IEEE Engineering in Medicine and Biology, January/February 1997, pp. 40-45
[10] McKenna S., Raja Y., Gong S. (1999) Tracking colour objects using adaptive mixture models. In: Image and Vision Computing 17, 1999, pp. 225-231
[11] Nakamura T., Ogasawara T. (1999) On-Line Visual Learning Method for Color Image Segmentation and Object Tracking
[12] Chapman D. (1991) Vision, instruction, and action. MIT Press, Cambridge, MA
[13] Riekki J. & Kuniyoshi Y. (1995) Architecture for Vision-Based Purposive Behaviors. In: Proc. of the IEEE/RSJ International Conference on Intelligent Robots and Systems (IROS'95), Pittsburgh, USA, August 1995, pp. 82-89
[14] Riekki J. (1998) Reactive task execution of a mobile robot. Acta Univ. Oul. C 129 (PhD Thesis), University of Oulu, Oulu, Finland. http://herkules.oulu.fi/isbn9514251318/
[15] Riekki J., Pajala J., Tikanmäki A. & Röning J. (1999) CAT Finland: Executing primitive tasks in parallel. In: Asada M. & Kitano H. (eds.) RoboCup-98: Robot Soccer World Cup II, Lecture Notes in Artificial Intelligence 1604, Springer-Verlag, pp. 396-401
[16] Malcolm C. & Smithers T. (1990) Symbol grounding via a hybrid architecture in an autonomous assembly system. In: Maes P. (ed.) Designing Autonomous Agents: Theory and Practice from Biology to Engineering and Back. MIT Press, Cambridge, MA, pp. 123-144
[17] Chui C. K., Chen G. (1987) Kalman Filtering with Real-Time Applications. Springer-Verlag
[18] Comaniciu D., Ramesh V. (2000) Real-Time Tracking of Non-Rigid Objects using Mean Shift. In: Proc. of the IEEE Conference on Computer Vision and Pattern Recognition 2000, pp. 142-149, vol. 2
[19] Bradski G. (1998) Computer Vision Face Tracking For Use in a Perceptual User Interface. In: Proc. of the Fourth IEEE Workshop on Applications of Computer Vision 1998, pp. 214-219
