
2012 11th International Conference on Machine Learning and Applications

Improving image segmentation using genetic algorithm

HUYNH Thi Thanh Binh
School of Information and Communication Technology
Hanoi University of Science and Technology
Hanoi, Vietnam
binhht@soict.hut.edu.vn

MAI Dinh Loi
School of Information and Communication Technology
Hanoi University of Science and Technology
Hanoi, Vietnam
csmloi89@gmail.com

NGUYEN Thi Thuy
Dept. of Computer Science, Faculty of Information Technology
Hanoi University of Agriculture
Hanoi, Vietnam
myngthuy@gmail.com

Abstract— This paper presents a new approach to the problem of semantic segmentation of digital images. We aim to improve the performance of some state-of-the-art approaches to the task. We exploit a new version of the texton feature [28], which can encode image texture and object layout, for learning a robust classifier. We propose to use a genetic algorithm for learning the parameters of the weak classifiers in a boosting learning setup. We conducted extensive experiments on benchmark image datasets and compared the segmentation results with recently proposed systems. The experimental results show that the performance of our system is comparable to, or even outperforms, those state-of-the-art algorithms. This is a promising approach, as in this empirical study we used only texture-layout filter responses as features and a basic setting of the genetic algorithm. The framework is simple and can be extended and improved for many learning problems.

Keywords— Semantic image segmentation, object recognition, boosting learning, texton feature, genetic algorithm

978-0-7695-4913-2/12 $26.00 © 2012 IEEE    DOI 10.1109/ICMLA.2012.134

I. INTRODUCTION

Semantic image segmentation is a central problem in computer vision. Its aim is to partition an image into multiple regions and automatically label each region as belonging to a specific object class. The problem is also known as multi-class image labeling, i.e., image segmentation is equivalent to labeling all pixels in the image. For example, for a photo taken at a lake, the segmentation algorithm will assign each pixel a label such as water, sky, or boat.

This problem has drawn the attention of researchers in the field for decades [1,3,7,18,22]. Significant progress has been made thanks to advances and new techniques in feature extraction, object modeling, and machine learning, and to the building of large benchmark image data sets. Despite that, it is still a challenging problem. The performance of an image segmentation system depends mainly on two processes: image feature extraction and object class learning. The first process is challenging due to the variety of object appearances. This requires the second process, the machine learning technique, to be robust enough to separate object classes in feature spaces.

In recent years, research has focused on combining contextual information with local visual features to resolve regional ambiguities [2, 3, 4]. Researchers have attempted to use techniques that can exploit contextual information for representing object classes. One of the prominent representations for local image features is the texton [2]. The authors of [18] developed an efficient framework for learning a novel feature based on textons, exploiting the appearance, shape, and context of object classes. Besides extracting good features, building a good learning algorithm is a crucial factor in a robust segmentation system. For learning object classes, state-of-the-art machine learning techniques, such as Bayesian classifiers, SVMs, and boosting, are usually used to learn a classifier for the object classes. To improve the performance of those traditional classifiers, besides adding contextual information to the image features, advanced learning techniques that allow incorporating contextual information into the model have been developed. Typical ones are random field models, such as Markov random fields (MRFs) and conditional random fields (CRFs). However, even with the state-of-the-art classifiers, the segmentation accuracy on natural images is still far from satisfactory. For example, on the Microsoft Research Cambridge (MSRC) data set, the percentage of correctly labeled image pixels is relatively low (approximately 75%), and some object classes (such as boats and birds) have very low segmentation results, below 20 to 30% [18,19,21].

To our knowledge, there has not been any attempt to use an evolutionary technique to improve the performance of state-of-the-art classifiers for the problem of semantic image segmentation. In this work, we propose a new approach for improving the performance of some image segmentation systems. In particular, we employ a Genetic Algorithm (GA) to further improve the performance of TextonBoost [28]. The problem of learning the parameters of the weak classifiers in TextonBoost can be modeled as an optimization problem over the parameters of a texture-layout based classifier. Here, the features and parameters to be optimized are encoded as chromosomes in the GA. We conducted experiments for our new approach on the MSRC 21-class database. This is one of the most comprehensive and complex labeled image datasets for semantic image segmentation, and it has been widely used for evaluating the performance of newly proposed systems [28,20,19]. We found that the proposed system gives very promising results, comparable to very recently proposed systems such as [19,21], even though in this empirical research we used only the texture-layout feature.

The rest of this paper is organized as follows: Section II describes the related work. In Section III, we present the proposed algorithm for semantic object segmentation. Our experiments and evaluation are provided in Section IV. Section V concludes the paper and contains a discussion of future work.

II. RELATED WORKS

The problem of automatic segmentation of images is a major problem in computer vision, and in recent years it has received increasing attention from researchers in the field. The problem can essentially be considered as object detection, where parts of the object are detected and the detections are then used to infer the segmentation. This is the top-down approach, in which pre-learned object fragments or patches, each consisting of a template and its labeling, are used [29, 30, 31, 32, 35]. These methods focus on the segmentation of one object class (e.g., a cow) from the background. The ideas could be used for multi-class image labeling and for methods that are based on unary potentials and CRFs.

Kumar et al. [33] described a method that provides a coherent mechanism for combining top-down and bottom-up cues. He et al. [16] used segmentation techniques with a bottom-up algorithm to produce super-pixels, then merged them together and assigned semantic labels using a combination of scene-specific CRF models. The authors of [14] proposed an approach that combines a local discriminative classifier, which allows the use of arbitrary, overlapping features, with smoothing over the label field. Their system was applied neither to segmentation at the pixel level nor to multi-class labeling problems. Tu et al. [12] presented a system that can perform the two tasks of segmentation and labeling simultaneously. This work obtained quite high performance, but the framework did not allow for the integration of visual modules and the development of a general-purpose vision system. Shotton et al. [28] described a new approach to learning a discriminative model of object classes. It is based on a new version of the texton feature, which jointly models shape, texture, and layout. Using this feature, unary classification and feature selection are achieved using Joint Boost (shared boosting) [36]. TextonBoost, which is used to define the unary potentials, focuses on the estimation of the unary term using a strong boosting classifier. The overall model is then learned using a variant of CRF. The model could be used to achieve automatic detection, recognition, and segmentation of object classes in images. Before this paper, there were very few methods that could combine recognition and segmentation in a single framework, and the work on multiple classes was quite limited. In [28], the average segmentation accuracy was only approximately 73%, and the detection rate was very low for some object classes such as birds and boats. This led to attempts to develop new frameworks that can improve the performance on this data set in particular, and for the problem of semantic object segmentation in general [19, 20, 21]. However, the segmentation results are still modest.

Research on applying evolutionary techniques in computer vision is quite limited. Recently, there have been a few attempts to employ genetic algorithms for some problems in computer vision. Genetic algorithms have been used for the problems of face recognition [24], 3D object recognition [25], and real-time object detection [26]. But there has been no attempt to use a GA for semantic image segmentation. In this work, we develop a new framework using a GA for automatic segmentation of images, aiming to improve the performance of some recently proposed systems. By using only the texture-layout feature to build weak classifiers, with the parameters optimized by the GA, we obtained results comparable to very recently proposed algorithms on a benchmark dataset.

III. PROPOSED ALGORITHM

A. Preliminary
In this section, we present the basic terminology and techniques that will be used in our framework.
Textons allow a compact representation of the different appearances of an object. They have been proven effective in categorizing generic object classes; see [28] for details.
A texture-layout filter is a pair (r, t) of an image region r and a texton t. The region r is defined in coordinates relative to the pixel i being classified. These features have been shown to be sufficiently general to allow learning the layout and context information of the object classes.
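As a concrete illustration, the response of a texture-layout filter (r, t) at a pixel i can be sketched as below. This is a minimal sketch: the rectangle encoding of r and the array layout are our own assumptions, not prescribed by [28].

```python
import numpy as np

def texture_layout_response(texton_map, i, region, t):
    """Response v[r, t](i): the share of pixels inside region r,
    placed relative to pixel i, whose texton index equals t.

    texton_map: 2-D integer array of per-pixel texton indices.
    region: (dy0, dy1, dx0, dx1) rectangle offsets relative to i
            (an illustrative encoding of r, not the paper's).
    """
    (y, x), (dy0, dy1, dx0, dx1) = i, region
    h, w = texton_map.shape
    # Clip the offset rectangle to the image bounds.
    y0, y1 = max(0, y + dy0), min(h, y + dy1)
    x0, x1 = max(0, x + dx0), min(w, x + dx1)
    window = texton_map[y0:y1, x0:x1]
    if window.size == 0:
        return 0.0
    return float(np.mean(window == t))
```

For example, on a 2x2 texton map [[0, 1], [1, 1]], the filter covering the whole map with t = 1 responds with 0.75 at pixel (0, 0).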
Joint Boost: given the texture-layout features, many techniques can be used for learning the object classes. [28] employed an adapted version of the Joint Boost algorithm for automatic feature selection and learning of the texture-layout potentials. The process iteratively selects discriminative texture-layout features as weak classifiers and combines them into a powerful classifier.
This process leads to the problem of minimizing an error function over the parameters, which unfortunately requires an expensive search over the possible weak classifiers (features) to find the optimal combination of the sharing set N, features (r, t), and thresholds θ. Several optimizations are possible to speed up the search for the optimal weak classifiers. Since the set of all possible sharing sets is exponentially large, a greedy approximation has been used [28, 36]. To speed up the minimization over features, the author of [28] employed a random feature selection procedure and performed sub-sampling on pixels for


training. This could significantly affect the performance of the system.
In this work, we employ texture-layout features in a GA to learn the parameters. With a suitable encoding mechanism, we can perform feature selection and parameter learning at the same time. We use a set of parameters similar to [28] to learn the weak classifiers; this is done for efficiency and for easy comparison of results.

Fitness function: we use the error function of [28], i.e., the weighted square error J_wse minimized by Joint Boost, as the fitness function to evaluate the candidate individuals in the GA.

B. Learning Texture-layout with GA (GAST)

Genetic algorithms (GAs) are an optimization approach based on the mechanics of natural selection and genetics. GAs have been proven to be effective tools for optimizing discrete parameters in NP-hard problems. They have been employed for solving several problems in computer vision, such as noise filtering and face recognition [24], and experimental results have shown that the GA could be a promising approach for these problems. But for the problem of semantic image segmentation, to our knowledge, there has not been any research on using a GA.
In this section, we present how to use a GA for finding good image features and learning optimized parameters for the classifiers. The semantic image segmentation problem is basically processed in three steps: (1) extract image features; (2) train a classifier; and (3) perform segmentation using the learnt classifier. We employ the setup of Shotton [28] for learning. In step (1), texture-layout features are extracted as presented in the previous section, resulting in a texton map; the classifier in step (2) is a strong classifier obtained by the Joint Boost algorithm. The crucial step for achieving a good segmentation result is obtaining a good classifier, which is step (2). We use the GA to improve the accuracy of segmentation by improving the classifier, i.e., we intervene in step (2).
The Joint Boosting algorithm [28] builds M weak learners from texture-layout features and then combines them into a powerful classifier. Each weak learner h_i(c) is learnt based on a feature response v[r, t](i); its exact form is given below, under Individual representation. The weak learners are fitted by minimizing the weighted square error

    J_wse = Σ_c Σ_i w_c^i (z_c^i − h_i^m(c))²

where w_c^i is the weight of example i for class c after m−1 rounds of boosting, and z_c^i ∈ {+1, −1} (+1 if example i has ground-truth class c, −1 otherwise). After each round, the weights are re-weighted as

    w_c^i ← w_c^i e^(−z_c^i h_i^m(c))

to reflect the new classification accuracy and to maintain the invariant w_c^i = e^(−z_c^i H_i(c)), where H_i(c) is the strong classifier accumulated so far.
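A minimal sketch of this error and of the weight update follows; arranging the quantities as [class, example] arrays is our own choice of indexing, not the paper's.

```python
import numpy as np

def jwse(w, z, h):
    """Weighted square error J_wse = sum_c sum_i w_c^i (z_c^i - h_i^m(c))^2.

    w, z, h have shape (num_classes, num_examples): w holds the boosting
    weights, z the +/-1 targets, h the weak-learner outputs this round."""
    return float(np.sum(w * (z - h) ** 2))

def reweight(w, z, h):
    """Boosting weight update w_c^i <- w_c^i * exp(-z_c^i h_i^m(c))."""
    return w * np.exp(-z * h)
```

Starting from uniform weights w = 1 = e^0, repeated calls to reweight accumulate exactly the invariant w_c^i = exp(−z_c^i H_i(c)), with H_i the running sum of the weak learners.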
Initialization: The initial population of the GA algorithm is
initialized by two methods: The first one is by randomly
selecting d value, the second one is by using Joint Boost
algorithm.
Evolutionary operators: For the crossover operator, we use
the two point crossover technique; for the mutation
operator, we use swap mutation method.
Selection: To select the parents for crossover, one is
selected randomly from the population, the other one is
selected randomly from 50% population which has the best
fitness function.
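These operators can be sketched as follows. This is a minimal sketch that treats lower fitness as better, consistent with minimizing J_wse; the paper does not spell out these implementation details.

```python
import random

def two_point_crossover(p1, p2, rng=random):
    """Exchange the genes between two random cut points."""
    a, b = sorted(rng.sample(range(1, len(p1)), 2))
    return p1[:a] + p2[a:b] + p1[b:], p2[:a] + p1[a:b] + p2[b:]

def swap_mutation(chrom, rng=random):
    """Swap two randomly chosen gene positions."""
    chrom = chrom[:]
    i, j = rng.sample(range(len(chrom)), 2)
    chrom[i], chrom[j] = chrom[j], chrom[i]
    return chrom

def select_parents(population, fitness, rng=random):
    """One parent uniformly from the whole population, the other
    uniformly from the fitter half (lower fitness = better)."""
    ranked = sorted(population, key=fitness)
    return rng.choice(population), rng.choice(ranked[:max(1, len(ranked) // 2)])
```

Note that swap mutation permutes genes rather than flipping them, so it preserves the number of set bits in a chromosome; new bit patterns enter the population through crossover and the Joint-Boost-seeded individuals.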

IV. EXPERIMENT AND EVALUATION

In this section, we present our experimental results and evaluate the proposed image segmentation approach (GAST) on a benchmark image database. We then compare it with state-of-the-art methods.
A. Data sets
We conducted experiments on the benchmark MSRC image data set, which is widely used to test the performance of recently proposed systems [4, 28, 19, 20, 21]. The data set includes 591 images at a resolution of 320x240, accompanied by ground-truth segmentations for 21 classes: building, grass, tree, cow, sheep, sky, aero plane, water, face, car, bike, flower, sign, bird, book, chair, road, cat, dog, body, and boat. For the experiments, the data is split into roughly 45% for training, 10% for validation, and 45% for testing. The split ensures an approximately proportional contribution of each class.
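Such a proportional split can be sketched as a per-class (stratified) shuffle-and-cut. The function name and the grouping by a single label per item are our own illustration; the paper does not describe its splitting procedure in detail.

```python
import random

def stratified_split(items, label_of, fractions=(0.45, 0.10, 0.45), seed=0):
    """Split items into train/validation/test parts so that each label
    contributes approximately proportionally to every part."""
    rng = random.Random(seed)
    by_label = {}
    for it in items:
        by_label.setdefault(label_of(it), []).append(it)
    parts = [[], [], []]
    for group in by_label.values():
        rng.shuffle(group)
        n = len(group)
        a = round(fractions[0] * n)          # end of the training cut
        b = a + round(fractions[1] * n)      # end of the validation cut
        parts[0] += group[:a]
        parts[1] += group[a:b]
        parts[2] += group[b:]
    return parts
```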
B. System Setting
In the experiments, the system was run 20 times for each test set. All programs were run on a machine with a Core i5-2410M CPU at 2.30GHz (4 CPUs), 4GB DDR III RAM at 1333MHz, and Windows 7 Ultimate, and were implemented in C#. Based on our experience with GAs, and after conducting an extensive set of experiments, we found that the following GA parameter values gave the best performance for our system: a population size of 700, 300 generations, a crossover rate of 30%, and a mutation rate of 1%. In the initial population, 30% of the individuals are generated by Joint Boost and 70% are generated randomly.

Each weak learner operates on the feature response v[r, t](i) and has the form

    h_i(c) = a·[v[r, t](i) > θ] + b   if c ∈ N
    h_i(c) = k^c                      otherwise

with parameters (a, b, {k^c}_{c∉N}, θ, N, r, t), where [·] is the 0-1 indicator function. The region r and texton index t specify the texture-layout filter feature, and v[r, t](i) is the corresponding feature response at position i. The set of parameters of each weak learner can be determined by a feature value d in (0, D), where D = (number of textons) × (number of regions).
Individual representation: we use a binary representation to encode the parameters. Since the maximum value of D can be represented by a 16-bit number, we use 16-bit chromosomes to represent d. For example, if d = 12345, the corresponding chromosome is 0011 0000 0011 1001.
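The 16-bit encoding and its inverse can be sketched as follows. The mapping from d back to a (region, texton) pair assumes a row-major ordering over D = (number of textons) × (number of regions), which the paper does not spell out.

```python
def encode(d, bits=16):
    """Feature value d -> fixed-length binary chromosome (MSB first)."""
    return [(d >> (bits - 1 - k)) & 1 for k in range(bits)]

def decode(chrom):
    """Binary chromosome -> feature value d."""
    value = 0
    for bit in chrom:
        value = (value << 1) | bit
    return value

def to_region_texton(d, num_textons):
    """Interpret d as a (region index, texton index) pair.
    Row-major ordering is our assumption, not the paper's."""
    return divmod(d, num_textons)
```

For d = 12345 this reproduces the chromosome 0011 0000 0011 1001 given above.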


C. Results and Comparison

The performance of our system in terms of segmentation accuracy on the MSRC 21-class data set is given in Table 2. From the table, we see that we obtain over 90% class-labeling accuracy for classes such as bike, flower, book, chair, and cat. The average segmentation accuracy over all object classes is 78.7%. To evaluate the effectiveness of our algorithm, the experimental results are compared with recently proposed systems, including Joint Boost [28], Theme-based CRF [19], Hierarchical CRF [20], and dense CRF [21]. We make two comparisons: the first in terms of segmentation accuracy for each object class, the second in terms of overall segmentation accuracy. Table 1 shows the segmentation accuracy of each class in the 21-class database using these algorithms. As can be seen from the table, we obtained the best results in 8 of the 21 classes in comparison to state-of-the-art techniques.
For some class labels, such as aero plane, bike, bird, book, chair, cat, dog, and boat, GAST gave the best accuracy in comparison to the other models. Especially for hard labels such as cat, dog, and chair, the results are significantly improved (by 10 to 33%). For example, the segmentation accuracy for the class chair with Joint Boosting and Hierarchical CRF is very small (less than 31%), but with GAST this value is 98.8%. Besides that, a significant achievement compared to other approaches is that we obtained relatively even segmentation results across all classes, i.e., for every object class we obtained accuracy higher than 50%, with no class at very low accuracy. Moreover, we obtained the lowest difference between the global and average values (see Figure 1).
We also compare the overall accuracy of segmentation. We use two measurements for evaluation: global = Σ_{i∈L} N_ii / Σ_{i,j∈L} N_ij is the percentage of image pixels correctly assigned to their class label out of the total number of image pixels, and average = (1/|L|) Σ_{i∈L} (N_ii / Σ_{j∈L} N_ij) is the average accuracy over the 21 classes, where N_ij is the number of pixels of label i which are labeled with label j, and L is the set of labels of the MSRC 21-class data set. Our algorithm has the largest average value in comparison with previous models. This shows an improvement in segmentation and recognition.

V. CONCLUSION

We have presented a new approach for improving semantic image segmentation systems using a genetic algorithm. We employed texton-layout filter responses as image features. We then used a GA for finding good features and learning optimized parameters for the classifier. We adopted the settings of Joint Boost for the experiments. Experiments were conducted and evaluated extensively on a benchmark data set for segmentation of 21 object classes. We compared the segmentation accuracy with recently proposed systems and obtained comparable results over all object classes, with outstanding segmentation results in some hard classes. This is a promising approach which can be extended and applied to many learning problems. For future work, we plan to bring more image features, in addition to the texton-layout feature, into the framework to improve performance. We will also conduct more research on the GA to make it more suitable for the task.

ACKNOWLEDGMENT

We would like to thank Philipp Krahenbuhl for providing us the data set as well as sending us the materials related to their work on the object recognition problem.
This work was partially supported by the project "Some Advanced Statistical Learning Techniques for Computer Vision" funded by the National Foundation for Science and Technology Development, Vietnam, under grant number 102.01-2011.17. The Vietnam Institute for Advanced Study in Mathematics provided part of the support funding for this work.

REFERENCES

[1] J. Shi and J. Malik. Normalized cuts and image segmentation. IEEE Trans. PAMI, 22(8):888-905, 2000.
[2] J. Malik, S. Belongie, T. Leung, and J. Shi. Contour and texture analysis for image segmentation. IJCV, 43(1):7-27, June 2001.
[3] E. Borenstein, E. Sharon, and S. Ullman. Combining top-down and bottom-up segmentation. In Proc. CVPRW, 2004.
[4] S. Gould, J. Rodgers, D. Cohen, G. Elidan, and D. Koller. Multi-class segmentation with relative location prior. IJCV, pages 5-10, 2008.
[5] D. Comaniciu and P. Meer. Mean shift: a robust approach toward feature space analysis. IEEE Trans. PAMI, 24(5):603-619, 2002.
[6] J. Winn and N. Jojic. LOCUS: learning object classes with unsupervised segmentation. In Proc. ICCV, vol. 1, 2005.
[7] S. Gould, T. Gao, and D. Koller. Region-based segmentation and object detection. In NIPS, 2009.
[8] R. Fergus, P. Perona, and A. Zisserman. Object class recognition by unsupervised scale-invariant learning. In Proc. CVPR, vol. 2, pages 264-271, June 2003.
[9] M. Marszalek and C. Schmid. Semantic hierarchies for visual object recognition. In Proc. CVPR, June 2007.
[10] A. C. Berg, T. L. Berg, and J. Malik. Shape matching and object recognition using low distortion correspondences. In Proc. IEEE CVPR, vol. 1, pages 26-33, June 2005.
[11] J. Winn, A. Criminisi, and T. Minka. Categorization by learned universal visual dictionary. In Proc. ICCV, vol. 2, 2005.
[12] E. Borenstein, E. Sharon, and S. Ullman. Combining top-down and bottom-up segmentations. In Proc. CVPRW, vol. 4, 2004.
[13] B. Leibe and B. Schiele. Interleaved object categorization and segmentation. In Proc. BMVC, vol. II, pages 264-271, 2003.
[14] S. Kumar and M. Hebert. Discriminative random fields: a discriminative framework for contextual interaction in classification. In Proc. CVPR, vol. 2, pages 1150-1157, 2003.
[15] Z. Tu, X. Chen, A. L. Yuille, and S.-C. Zhu. Image parsing: unifying segmentation, detection, and recognition. In Proc. ICCV, vol. 1, pages 18-25, Nice, France, October 2003.
[16] X. He, R. S. Zemel, and D. Ray. Learning and incorporating top-down cues in image segmentation. In A. Leonardis, H. Bischof, and A. Pinz, editors, Proc. ECCV, 2006.
[17] P. Duygulu, K. Barnard, N. de Freitas, and D. Forsyth. Object recognition as machine translation: learning a lexicon for a fixed image vocabulary. In A. Heyden, G. Sparr, and P. Johansen, editors, Proc. ECCV, vol. 2353, 2002.
[18] J. Shotton, J. Winn, C. Rother, and A. Criminisi. TextonBoost: joint appearance, shape and context modeling for multi-class object recognition and segmentation. In Proc. ECCV, pages 1-15, 2006.
[19] S. Wu, J. Geng, and F. Zhu. Theme-based multi-class object recognition and segmentation. In Proc. ICPR, 2010.
[20] L. Ladicky, C. Russell, P. Kohli, and P. H. S. Torr. Associative hierarchical CRFs for object class image segmentation. In Proc. ICCV, 2009.
[21] P. Krahenbuhl and V. Koltun. Efficient inference in fully connected CRFs with Gaussian edge potentials. In NIPS, 2011.
[22] R. Fergus, P. Perona, and A. Zisserman. Object class recognition by unsupervised scale-invariant learning. In Proc. IEEE CVPR, vol. 2, pages 264-271, June 2003.
[23] M. Johnson, G. Brostow, J. Shotton, O. Arandjelovic, V. Kwatra, and R. Cipolla. Semantic photo synthesis. Computer Graphics Forum, 25(3):407-413, September 2006.
[24] S. Anam, Md. S. Islam, M. A. Kashem, M. N. Islam, M. R. Islam, and M. S. Islam. Face recognition using genetic algorithm and back propagation neural network. In Proc. International MultiConference of Engineers and Computer Scientists, vol. I, Hong Kong, March 18-20, 2009.
[25] G. Bebis, S. Louis, and S. Fadali. Using genetic algorithms for 3D object recognition. In Proc. 11th Int. Conf. Computer Applications in Industry and Engineering, Las Vegas, 1998.
[26] J. M. Gomez, J. A. Gamez, I. G. Varea, and V. Matellan. Using genetic algorithms for real-time object detection. In Proc. 2010 Int'l Conf. on Applications of Evolutionary Computation, Part I, pages 261-271. Springer-Verlag.
[27] M. Boutell, J. Luo, X. Shena, and C. Brown. Learning multi-label scene classification. Pattern Recognition, 37(9):1757-1771, Sep. 2004.
[28] J. Shotton, J. Winn, C. Rother, and A. Criminisi. TextonBoost for image understanding: multi-class object recognition and segmentation by jointly modeling texture, layout, and context. IJCV, 2007.
[29] E. Borenstein and S. Ullman. Class-specific, top-down segmentation. In Proc. ECCV, pages 109-124, 2002.
[30] E. Borenstein and S. Ullman. Learning to segment. In Proc. 8th ECCV, vol. 3, pages 315-328, Prague, Czech Republic, 2004.
[31] B. Leibe, A. Leonardis, and B. Schiele. Combined object categorization and segmentation with an implicit shape model. In Workshop at ECCV, May 2004.
[32] A. Opelt, A. Pinz, and A. Zisserman. A boundary-fragment-model for object detection. In Proc. ECCV, Graz, Austria, 2006.
[33] M. P. Kumar, P. H. S. Torr, and A. Zisserman. OBJ CUT. In Proc. IEEE CVPR, vol. 1, pages 18-25, San Diego, 2005.
[34] G. Csurka and F. Perronnin. A simple high performance approach to semantic segmentation. In Proc. BMVC, 2008.
[35] P. Kohli, L. Ladicky, and P. H. S. Torr. Robust higher order potentials for enforcing label consistency. In Proc. CVPR, 2008.
[36] A. Torralba, K. P. Murphy, and W. T. Freeman. Sharing visual features for multiclass and multiview object detection. IEEE Trans. PAMI, 2007.

Class       Joint Boosting  Mean-shift  Hierarchical CRF  Theme-based CRF  GAST
building    62              63          80                78.4             65.1
grass       98              98          96                95.3             89.5
tree        86              89          86                73.3             65.6
cow         58              66          74                79.8             74.6
sheep       50              54          87                74.3             71.8
sky         83              86          99                76.3             77.9
aero plane  60              63          74                80.7             86.4
water       53              71          87                53.3             60.8
face        74              83          86                66.4             84.8
car         63              71          87                88               75.8
bike        75              79          82                79               92.5
flower      63              71          97                92.4             92.8
sign        35              38          95                77.6             81.2
bird        19              23          30                41.7             72.4
book        92              88          86                93.6             98.8
chair       15              23          31                66.1             98.8
road        86              88          95                83.3             69.4
cat         54              33          51                82.5             94.5
dog         19              34          69                64.2             85.5
body        62              43          66                64.1             65
boat        32              –           –                 24.2             50.4

Table 1: Comparison of segmentation accuracy of Joint Boost [18], Mean-shift [20], Hierarchical CRF [20], Theme-based CRF [19], and GAST.

Table 2: Segmentation accuracy of our system on the MSRC 21-class data set. The full table is the confusion matrix of true class (rows) against inferred class (columns); its diagonal gives the per-class accuracies (%), with off-diagonal confusions omitted here:

building 65.1, grass 89.5, tree 65.6, cow 74.6, sheep 71.8, sky 77.9, aero plane 86.4, water 60.8, face 84.8, car 75.8, bike 92.5, flower 92.8, sign 81.2, bird 72.4, book 98.8, chair 98.8, road 69.4, cat 94.5, dog 85.5, body 65, boat 50.4.

Overall: global 77.7, average 78.7.

Figure 1: Comparison of the global and average results found by Joint Boosting [18], Mean-shift, Hierarchical CRF [20], Theme-based CRF [19], Fully connected CRF [21], and GAST over 20 runs.

