Index Terms—Tracking, sparse representation.
I. INTRODUCTION
Visual tracking, one of the fundamental topics in computer vision, has long played a critical role in numerous applications such as surveillance, military reconnaissance, motion recognition, and traffic monitoring. While many breakthroughs have been made within the last decades (e.g., [7], [16]), tracking still remains challenging in many aspects, including pose variation, illumination change, and partial occlusion.
Manuscript received August 20, 2013; revised December 18, 2013 and
February 17, 2014; accepted February 17, 2014. Date of publication
February 26, 2014; date of current version March 14, 2014. This work was
supported in part by the Natural Science Foundation of China under Grants
61071209 and 61272372, and in part by the Joint Foundation of China
Education Ministry and China Mobile Communication Corporation under
Grant MCM20122071. The associate editor coordinating the review of this
manuscript and approving it for publication was Prof. Richard J. Radke.
The authors are with the School of Information and Communication
Engineering, Faculty of Electronic Information and Electrical Engineering, Dalian University of Technology, Dalian 116024, China (e-mail:
zhuangbohan2013@gmail.com; lhchuan@dlut.edu.cn; 461179822@qq.com;
wangdong.ice@gmail.com).
Color versions of one or more of the figures in this paper are available
online at http://ieeexplore.ieee.org.
Digital Object Identifier 10.1109/TIP.2014.2308414
Fig. 1. Challenges during tracking in real-world environments, including heavy occlusion (Woman), abrupt motion (Face), illumination change (Singer1), pose variation (Girl), and complex background (Cliffbar). We use blue, green, black, yellow, magenta, cyan, and red rectangles to represent the tracking results of OSPT [1], APGL1 [2], LSAT [3], ASLAS [4], MTT [5], SCM [6], and the proposed method, respectively.
1057-7149 © 2014 IEEE. Personal use is permitted, but republication/redistribution requires IEEE permission.
See http://www.ieee.org/publications_standards/publications/rights/index.html for more information.
Through the above analysis, we propose a reversed multi-task sparse tracking framework that projects the template matrix (both positive and negative templates) into the candidate space. By selecting and weighting the discriminative sparse coefficients, the DSS map and the pooling method lead to the best candidate. Our contributions can be summarized in the following three aspects:
First, we propose an innovative optimization formulation named multi-task reverse sparse representation. In our work, a single task means reconstructing a template with a few candidates that bear more similarity to the template than the others, which is the reverse of the traditional sparsity-based formulations (like those in [1]–[6], [19], [20]), and multi-task means that we seek to reconstruct multiple templates simultaneously. A customized APG method is derived to obtain the optimal solution (in matrix form) within a few iterations. A Laplacian term is also included to keep the similarity among coefficients consistent with the similarity among candidates, which, as our experimental observations show, makes the tracker more robust. This formulation provides the tracker with the similarity relationships between all candidates and templates by solving only one optimization problem without loss of accuracy, and is therefore superior in terms of cost-performance ratio.
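As an illustration of what a single reverse sparse task looks like, the following is a minimal ISTA-style sketch that reconstructs one template from the candidate matrix under a nonnegative L1 penalty. The solver choice and all names are our assumptions; the paper instead derives a customized APG method for the full multi-task problem.

```python
import numpy as np

def reverse_sparse_code(t, Y, lam=0.1, n_iter=200):
    """Reconstruct one template t (shape (d,)) from N candidates Y (shape (d, N))
    via nonnegative L1-regularized least squares: min_c ||t - Yc||^2 + lam*||c||_1, c >= 0.
    A plain ISTA sketch, not the paper's APG solver."""
    step = 1.0 / (np.linalg.norm(Y, 2) ** 2 + 1e-12)  # 1/L, L = Lipschitz const of the gradient
    c = np.zeros(Y.shape[1])
    for _ in range(n_iter):
        grad = Y.T @ (Y @ c - t)                       # gradient of 0.5*||t - Yc||^2 (scaled)
        c = np.maximum(0.0, c - step * (grad + lam))   # nonnegative soft-threshold
    return c
```

The coefficient vector concentrates its weight on the candidates that best explain the template, which is exactly the similarity information later collected in the DSS map.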
Second, we construct a discriminative sparse similarity map (DSS map) based on these similarity relationships. The discriminative information contained in this map comes from a large template set composed of multiple positive target templates and hundreds of negative templates. Both the target templates and the background templates are updated online to accommodate appearance changes in and near the target area. With this DSS map, each candidate is evaluated in both directions: not only how similar it is to the target object but also how different it is from the background. This is also one of the key differences from most previous sparse trackers, such as [1]–[5], [19]–[21], [23], [24], and it makes our tracker more robust when similar objects appear near the target or when the target appearance bears some similarity to the background due to partial occlusion.
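As a concrete sketch of how such a positive/negative template set might be organized: the class below keeps the templates as columns of a matrix, with positive columns first. The class name and the FIFO replacement rule are illustrative assumptions, not the paper's exact online-update scheme.

```python
import numpy as np

class TemplateSet:
    """Positive/negative template pool kept as columns of T (d x (p+n)).
    The FIFO replacement in update_pos is a placeholder heuristic only."""

    def __init__(self, pos, neg):
        self.pos = list(pos)   # p positive (target) templates, each shape (d,)
        self.neg = list(neg)   # n negative (background) templates, each shape (d,)

    def update_pos(self, new_template):
        # Replace the oldest positive template with the newest target appearance.
        self.pos.pop(0)
        self.pos.append(new_template)

    def matrix(self):
        # d x (p+n) template matrix: positive columns first, then negative.
        return np.column_stack(self.pos + self.neg)
```

Keeping the positive columns first matters later, when the DSS map is split into a positive and a negative part.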
Third, we propose a simple yet effective additive pooling method to make the best use of the information in the DSS map; before this step, the DSS map is refined with adaptive weights to remove potential instability. Through this pooling scheme, the information for each candidate is integrated into a single score, and the candidate with the highest score is regarded as the tracking result.
II. BAYESIAN INFERENCE FRAMEWORK
We carry out object tracking in a Bayesian inference framework, a technique for estimating the posterior distribution of the state variables that characterize a dynamic system, to form a robust tracking algorithm. We define the observation set of the target as $Z_t = [z_1, z_2, \ldots, z_t]$, and let $x_t$ be the state variable of the object at time $t$. In the tracking framework, we use the maximum a posteriori (MAP) estimate over the sampled states:
$$\hat{x}_t = \arg\max_{x_t^i}\, p(x_t^i \mid Z_t), \qquad (1)$$

where $x_t^i$ indicates the state of the $i$-th sample. The posterior probability can be inferred from the Bayesian framework recursively,

$$p(x_t \mid Z_t) \propto p(z_t \mid x_t) \int p(x_t \mid x_{t-1})\, p(x_{t-1} \mid Z_{t-1})\, dx_{t-1}, \qquad (2)$$

where $p(x_t \mid x_{t-1})$ is the dynamic model and $p(z_t \mid x_t)$ denotes the observation likelihood. The state variable $x_t$ is composed of six independent parameters $\{\alpha_1, \alpha_2, \alpha_3, \alpha_4, t_1, t_2\}$, in which $\{\alpha_1, \alpha_2, \alpha_3, \alpha_4\}$ are the deformation parameters and $\{t_1, t_2\}$ contain the 2D translation information. As the dynamic model can be modeled by a Gaussian distribution, it can be represented by

$$p(x_t \mid x_{t-1}) = N(x_t;\, x_{t-1}, \Sigma), \qquad (3)$$

where $\Sigma$ is a diagonal covariance matrix whose entries are the variances of the affine parameters.
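Under this Gaussian dynamic model, drawing the candidate states can be sketched as follows; the particle count, the per-parameter variances, and the function name are illustrative assumptions.

```python
import numpy as np

def sample_candidates(x_prev, sigma, n=600, rng=None):
    """Draw n candidate states x_t^i ~ N(x_{t-1}, diag(sigma^2)).

    x_prev: previous 6-dim affine state (4 deformation + 2 translation params)
    sigma:  per-parameter standard deviations (sqrt of the diagonal covariance)
    """
    rng = np.random.default_rng(0) if rng is None else rng
    return x_prev + rng.standard_normal((n, x_prev.size)) * sigma
```

Each row of the returned array is one candidate state; the image patches cropped at these states form the candidate matrix used in the sparse-coding step.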
For each template $t^i$, a single reverse sparse task reconstructs the template with the candidate matrix $Y$:

$$\min_{c^i}\ \|t^i - Y c^i\|_2^2 + \lambda \|c^i\|_1, \quad \text{s.t. } c^i \succeq 0, \quad i = 1, 2, \ldots, (p+n), \qquad (4)$$

where $c^i$ denotes the nonnegative sparse coefficient vector of the $i$-th template over the candidates.
Fig. 2. Problem Formulation. This figure illustrates the basic idea of the multi-task reverse sparse representation scheme. (a) The positive and the negative
template sets. (b) The sampled candidates. (c) The discriminative sparse similarity map (DSS map).
To handle all templates at once, the single tasks are combined into one multi-task problem:

$$\min_{C}\ \|T - YC\|_F^2 + \lambda \|C\|_{1,1}, \quad \text{s.t. } c^i \succeq 0,\ i = 1, 2, \ldots, (p+n), \qquad (6)$$

where $T = [t^1, \ldots, t^{p+n}]$ stacks the templates and $C = [c^1, \ldots, c^{p+n}]$ stacks their coefficient vectors. To keep the coefficients of similar candidates close, a Laplacian regularization term is added:

$$\min_{C}\ \|T - YC\|_F^2 + \lambda \|C\|_{1,1} + \gamma \sum_{j,k} w_{jk}\, \|C_{j,:} - C_{k,:}\|_2^2, \quad \text{s.t. } c^i \succeq 0,\ i = 1, 2, \ldots, (p+n), \qquad (7)$$

where $w_{jk}$ denotes the similarity between candidates $j$ and $k$, and $C_{j,:}$ is the coefficient row of the $j$-th candidate. With $L$ the graph Laplacian of the candidate similarity matrix, the regularizer satisfies

$$\sum_{j,k} w_{jk}\, \|C_{j,:} - C_{k,:}\|_2^2 = 2\,\mathrm{tr}(C^{\top} L C), \qquad (8)$$

so the problem can be rewritten as

$$\min_{C}\ \|T - YC\|_F^2 + \lambda \|C\|_{1,1} + 2\gamma\,\mathrm{tr}(C^{\top} L C), \quad \text{s.t. } c^i \succeq 0,\ i = 1, 2, \ldots, (p+n). \qquad (9)$$

For the customized APG solver, the objective is split into a smooth part $F(C) = \|T - YC\|_F^2 + 2\gamma\,\mathrm{tr}(C^{\top} L C)$ and a nonsmooth part $G(C) = \lambda \|C\|_{1,1}$; APG alternates gradient steps on $F$ with the proximal operator of $G$ under the nonnegativity constraint.
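A minimal sketch of one such accelerated proximal gradient loop for the Laplacian-regularized, nonnegative multi-task problem is given below. The matrix shapes, step-size rule, and scaling of the trace term are our assumptions, not the paper's exact derivation.

```python
import numpy as np

def apg_laplacian(T, Y, L, lam=0.01, gamma=0.1, n_iter=200):
    """APG sketch for  min_C ||T - Y C||_F^2 + lam*||C||_1 + gamma*tr(C^T L C)
    subject to C >= 0 (any constant factor on the trace term is absorbed into
    gamma).  Y: d x N candidates, T: d x M templates, L: N x N Laplacian of
    the candidate similarity graph."""
    N, M = Y.shape[1], T.shape[1]
    # Step size: inverse Lipschitz constant of the smooth-part gradient.
    step = 1.0 / (2.0 * np.linalg.norm(Y, 2) ** 2
                  + 2.0 * gamma * np.linalg.norm(L, 2) + 1e-12)
    C = np.zeros((N, M))
    V, t_k = C.copy(), 1.0
    for _ in range(n_iter):
        grad = 2.0 * Y.T @ (Y @ V - T) + 2.0 * gamma * (L @ V)  # smooth gradient
        C_new = np.maximum(0.0, V - step * (grad + lam))        # nonneg soft-threshold
        t_new = (1.0 + np.sqrt(1.0 + 4.0 * t_k ** 2)) / 2.0     # momentum schedule
        V = C_new + ((t_k - 1.0) / t_new) * (C_new - C)
        C, t_k = C_new, t_new
    return C
```

The single `np.maximum` in the update handles both the soft-thresholding of the L1 term and the projection onto the nonnegative orthant, which is what makes the per-iteration cost low.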
Fig. 3. This figure illustrates how the discriminative sparse similarity map indicates whether a candidate is good or not. (a) The original discriminative similarity map. A typical good candidate and a bad one are picked as examples. (b) The process of obtaining the refined discriminative feature for the good candidate. The notation $\odot$ is the Hadamard product (element-wise product). (c) The process of obtaining the refined discriminative feature for the bad candidate. The sub-features related to the positive/negative templates are shown in red/green. Notice that the positive part of the refined discriminative feature for the bad candidate is weakened by the adaptive weights.
Algorithm 1 Algorithm for Optimizing the Laplacian Multi-Task Reverse Sparse Representation.
where $\mathbf{1} \in \mathbb{R}^{(p+n)}$ ($(p+n)$ is the number of templates) denotes the vector whose entries are all ones. Based on the above assumption, Eq. (13) is equivalent to

$$c_{k+1} = \max(0,\, g_{k+1}), \qquad (16)$$

where $g_{k+1}$ is the result of the gradient step at iteration $k+1$ and the element-wise $\max$ projects the coefficients onto the nonnegative orthant.
A candidate with a smaller difference from a foreground template shares higher similarity with that template, indicating that the candidate is more likely to be the target object, and vice versa. For the subsequent steps, we separate the weight map into two submaps:

$$W_{pos} = [w_1, \ldots, w_p]^{\top}, \qquad W_{neg} = [w_{p+1}, \ldots, w_{p+n}]^{\top}, \qquad (20)$$

and apply them to the corresponding parts of the coefficient map:

$$F_{pos} = W_{pos} \odot C_{pos}, \qquad (21)$$

$$F_{neg} = W_{neg} \odot C_{neg}, \qquad (22)$$

where $\odot$ is the Hadamard product (element-wise product).
In the weighted DSS map, an element $F_{ij} = W_{ij} C_{ij}$ is supposed to be large only when the $j$-th candidate differs little from the $i$-th template and plays a significant role in decomposing the $i$-th template together with the other candidates. Otherwise, $F_{ij}$ will have a small or even zero value, indicating that the $j$-th candidate bears little similarity to the $i$-th template. An example is shown in Fig. 3(c) to illustrate the benefit of this refinement process. For the bad candidate, the sub-feature related to the positive templates (in red) is non-zero, since the positive templates might account for some minor parts of the bad candidate in the $\ell_1$ minimization process. Although these values are small, they might cause unexpected tracking results. By applying the adaptive weights, however, the refined sub-feature related to the positive templates (in red) is suppressed close to zero, which means the bad candidate bears similarity only to some negative templates rather than to any positive template. From this perspective, we obtain the most accurate feature for each candidate, which yields a convincing final candidate score.
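The element-wise refinement can be seen on a toy example; the numbers below are made up, with rows standing for templates (two positive, one negative) and columns for candidates.

```python
import numpy as np

# Toy refined-DSS computation: 2 positive + 1 negative template (rows),
# 3 candidates (columns). All values are illustrative only.
C = np.array([[0.8, 0.1, 0.0],    # sparse coefficients from the reverse coding step
              [0.7, 0.2, 0.1],
              [0.0, 0.1, 0.9]])
W = np.array([[0.9, 0.2, 0.1],    # adaptive weights: large = small template/candidate difference
              [0.8, 0.3, 0.1],
              [0.1, 0.2, 0.9]])
F = W * C                         # Hadamard product: F_ij = W_ij * C_ij
```

An entry of `F` stays large only when both the coefficient and the weight agree, so spurious small coefficients are suppressed rather than propagated to the score.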
B. Additive Pooling
For the $i$-th candidate, we view the $i$-th column of the refined similarity map $F$ as a refined discriminative feature:

$$f_i = [F_{1i}, \ldots, F_{pi}, F_{(p+1)i}, \ldots, F_{(p+n)i}]^{\top}. \qquad (23)$$

In the first step of additive pooling, the entries associated with the positive and the negative template sets are summed separately:

$$s_i^{pos} = \sum_{j=1}^{p} f_i(j), \qquad s_i^{neg} = \sum_{j=p+1}^{p+n} f_i(j). \qquad (24), (25)$$

In the second step, the two scores are combined into a single discriminative score, and the candidate with the highest score gives the tracking result:

$$s_i = s_i^{pos} - s_i^{neg}, \qquad \hat{x}_t = x_t^{i^*}, \quad i^* = \arg\max_i s_i. \qquad (26), (27)$$
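The two-step additive pooling can be sketched as follows; combining the two sums by subtraction is our reading of the scheme, and the function name is illustrative.

```python
import numpy as np

def additive_pooling(F, p):
    """Two-step additive pooling on the refined DSS map F ((p+n) x N).

    Step 1: per candidate, sum the rows tied to the positive templates and,
    separately, those tied to the negative templates.
    Step 2: combine both sums into one discriminative score and pick the
    candidate scoring highest.
    """
    s_pos = F[:p].sum(axis=0)     # resemblance to the positive template set
    s_neg = F[p:].sum(axis=0)     # resemblance to the negative template set
    scores = s_pos - s_neg        # assumed combination: positive minus negative
    return scores, int(np.argmax(scores))
```

Because pooling is a pair of column sums, scoring all candidates costs only one pass over the DSS map.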
Fig. 4. This figure intuitively illustrates how to get the discriminative scores for all candidates and choose the best candidate state based on it. (a) The
weighted DSS map. (b) Two score vectors after the first step of additive pooling, and they respectively indicate the degree of resemblance to the positive
(upper one) and the negative (bottom one) template set for all candidates. (c) The final discriminative score vector after the second step of additive pooling.
(d) The optimal state corresponding to the candidate that scores the highest.
TABLE I
Comparison Results in Terms of Average Center Error (in Pixels). The Best Three Results Are Shown in Red, Blue, and Green Fonts.
(The Last Two Columns Are for Self-Comparison and Do Not Participate in Ranking)
TABLE II
Comparison Results in Terms of Average Overlap Rate. The Best Three Results Are Shown in Red, Blue, and Green Fonts. The Last Row Shows Comparison Results on Computational Load in Terms of Fps.
(The Last Two Columns Are for Self-Comparison and Do Not Participate in Ranking)
Fig. 5. Sample tracking results on fifteen challenging sequences. (a) Occlusion1 and Woman with heavy occlusion and in-plane rotation. (b) Caviar1 and
Caviar2 with heavy occlusion and in-plane rotation. (c) Face, Jumping and Deer with abrupt motion. (d) DavidIndoor, Singer1 and Car4 with illumination
change. (e) Sylvester2008b, Girl and Dudek with pose variation. (f) Cliffbar and Car11 with background clutter.
and store them in the DSS map. Meanwhile, the additive pooling method effectively extracts the discriminative information in the DSS map and enables our method to accurately compute the discriminative scores and find the optimal candidate.
VII. CONCLUSION
In this paper, we propose an efficient tracking algorithm
based on a discriminative sparse similarity map which is
obtained via a multi-task reverse sparse coding approach
with a Laplacian constraint. The proposed formulation enjoys a light computational load, through the use of a customized APG method, and improved stability, by incorporating a Laplacian term. The employment of dynamically updated
positive and negative template sets supplies our tracker with
sufficient discriminative information, which is stored in the
DSS map and accurately integrated via an additive pooling
scheme. Both quantitative and qualitative evaluations against
several state-of-the-art algorithms based on challenging image
sequences demonstrate the accuracy and the robustness of the
proposed tracker.
REFERENCES
[1] D. Wang, H. Lu, and M.-H. Yang, "Online object tracking with sparse prototypes," IEEE Trans. Image Process., vol. 22, no. 1, pp. 314–325, Jan. 2013.
[2] C. Bao, Y. Wu, H. Ling, and H. Ji, "Real time robust L1 tracker using accelerated proximal gradient approach," in Proc. CVPR, 2012, pp. 1830–1837.
[3] B. Liu, J. Huang, L. Yang, and C. Kulikowski, "Robust tracking using local sparse appearance model and k-selection," in Proc. CVPR, 2011, pp. 1313–1320.
[4] X. Jia, H. Lu, and M.-H. Yang, "Visual tracking via adaptive structural local sparse appearance model," in Proc. CVPR, 2012, pp. 1822–1829.
[5] T. Zhang, B. Ghanem, S. Liu, and N. Ahuja, "Robust visual tracking via multi-task sparse learning," in Proc. CVPR, 2012, pp. 2042–2049.
[6] W. Zhong, H. Lu, and M.-H. Yang, "Robust object tracking via sparsity-based collaborative model," in Proc. CVPR, 2012, pp. 1838–1845.
[7] D. A. Ross, J. Lim, R.-S. Lin, and M.-H. Yang, "Incremental learning for robust visual tracking," Int. J. Comput. Vis., vol. 77, nos. 1–3, pp. 125–141, 2008.
[8] Z. Kalal, J. Matas, and K. Mikolajczyk, "P-N learning: Bootstrapping binary classifiers by structural constraints," in Proc. CVPR, 2010, pp. 49–56.
[9] J. Kwon and K. M. Lee, "Visual tracking decomposition," in Proc. CVPR, 2010, pp. 1269–1276.
[10] B. Babenko, M.-H. Yang, and S. Belongie, "Visual tracking with online multiple instance learning," in Proc. CVPR, 2009, pp. 983–990.
[11] A. Adam, E. Rivlin, and I. Shimshoni, "Robust fragments-based tracking using the integral histogram," in Proc. CVPR, 2006, pp. 798–805.
[12] S. Hare, A. Saffari, and P. H. S. Torr, "Struck: Structured output tracking with kernels," in Proc. ICCV, 2011, pp. 263–270.
[13] M. Godec, P. M. Roth, and H. Bischof, "Hough-based tracking of non-rigid objects," in Proc. ICCV, 2011, pp. 81–88.
[14] E. G. Learned-Miller and L. Sevilla-Lara, "Distribution fields for tracking," in Proc. CVPR, 2012, pp. 25–33.
[15] F. Yang, H. Lu, and M.-H. Yang, "Robust visual tracking via multiple kernel boosting with affinity constraints," IEEE Trans. Circuits Syst. Video Technol., vol. 24, no. 2, pp. 242–254, Jul. 2013.
[16] F. Yang, H. Lu, and M.-H. Yang, "Learning structured visual dictionary for object tracking," Image Vis. Comput., vol. 31, no. 12, pp. 992–999, 2013.
[17] P. Pérez, C. Hue, J. Vermaak, and M. Gangnet, "Color-based probabilistic tracking," in Proc. ECCV, 2002, pp. 661–675.
[18] D. Comaniciu, V. Ramesh, and P. Meer, "Kernel-based object tracking," IEEE Trans. Pattern Anal. Mach. Intell., vol. 25, no. 5, pp. 564–577, May 2003.
[19] X. Mei and H. Ling, "Robust visual tracking using $\ell_1$ minimization," in Proc. ICCV, 2009, pp. 1–10.
[20] X. Mei, H. Ling, Y. Wu, E. Blasch, and L. Bai, "Minimum error bounded efficient $\ell_1$ tracker with occlusion detection," in Proc. CVPR, 2011, pp. 1257–1264.