Article info
Article history:
Received 5 April 2013
Received in revised form 18 April 2014
Accepted 30 May 2014
Keywords:
Image classification
K-means
K-nearest neighbor
Naïve Bayesian classifier
Precision agriculture
Yield mapping
Abstract
This study was conducted to identify blueberry fruit of different growth stages in natural outdoor images toward the development of a blueberry yield mapping system. Because a single blueberry branch usually carries fruit of different maturity stages, identifying blueberry fruit and their maturity stages against different backgrounds is very important for yield mapping. In this study, the maturity stages of the fruit were divided into four categories: mature (m), near-mature (nm), near-young (ny) and young (y). A stepwise algorithm, termed the color component analysis based detection (CCAD) method, was developed and validated to identify blueberry fruit in outdoor color images. First, a dataset was built using manually cropped pixels from training images. Three color components, red (R), blue (B) and hue (H), were selected using the forward feature selection algorithm (FFSA) and used to separate the fruit of all four maturity stages from the background through different classifiers. In this work, not only traditional classifiers such as K-nearest neighbor (KNN) and naïve Bayesian classification (NBC) were used; a newly introduced supervised K-means clustering classifier (SK-means) was also developed and applied to the dataset. In the second step, classifiers were built to separate the group of mature & near-mature fruit from the group of near-young & young fruit among all fruit pixels. Finally, classifiers were developed to separate mature fruit from near-mature fruit, and near-young fruit from young fruit. The classifiers obtained from these steps were then applied to validation images, resulting in the final identification. Cross validation was conducted for these classifiers and their results were compared. The KNN classifier yielded the highest classification accuracy (85% to 98%) on the validation set of the prebuilt pixel dataset collected from the training images in all separations. A one-way ANOVA was used to compare the performance of the three classifiers, and it showed that KNN performed significantly better than the other methods. The newly proposed SK-means classifier yielded a fairly high accuracy (90%) for the separation of mature and near-mature fruit. The newly developed CCAD method proved to be efficient for identifying blueberry fruit of different growth stages in natural outdoor color images toward the development of a blueberry yield mapping system.
© 2014 Elsevier B.V. All rights reserved.
1. Introduction
Highbush blueberry is a good source of fiber and contains antioxidants, which makes it an excellent choice for fresh market fruit production all over the world (U.S. Highbush Blueberry Council, 2012). In Florida, USA, southern highbush blueberry acreage and production have increased rapidly: compared with 9 million pounds in 2008, 19 million pounds of blueberries were harvested in 2012 (Brazelton, 2013). Since fresh Florida blueberries are mostly hand-harvested, the primary production cost is harvesting labor, which can exceed 50% of the total picking, grading, and packing costs. The production window of blueberry in Florida, from about April 1 until May 15, is relatively short, and prices usually drop quickly once berries from northern regions enter the market (Yang et al., 2012). Since blueberry fruit on the same branch usually do not ripen at the same time before the harvesting season, it is important for farmers to estimate the quantity of blueberry fruit on the bushes at different growth stages, so that they can properly arrange harvesting labor and its distribution to specific locations in their fields. Early yield estimation can also provide feedback on how crops respond to certain soil and crop management practices and help determine recommended rates for many crop production inputs (Arslan and Colvin, 2002).
Corresponding author. Tel.: +1 352 392 1864; fax: +1 352 392 4092.
E-mail address: wslee@ufl.edu (W.S. Lee).
http://dx.doi.org/10.1016/j.compag.2014.05.015
0168-1699/© 2014 Elsevier B.V. All rights reserved.
Fig. 1. Example image of blueberry fruit with different growth stages and other objects (labeled regions include nm, ny, leaf, bran_b, bran_g, bran_older and bg).
Class        Class description                                                   Color description   Number of pixels
m            Mature fruit                                                        Deep blue           5,712,768
nm           Near-mature fruit                                                   Reddish purple      2,334,720
ny           Near-young fruit                                                    Yellowish green     5,326,080
y            Young fruit                                                         Green               6,555,456
leaf         Leaf                                                                Green               5,647,104
bran_b       Regular branch                                                      Burgundy            846,336
bran_g       Newly developed green branch                                        Green               2,469,696
bran_older   Older branch                                                        Light brown         758,784
bg           Other background objects including soil and leaves on the ground    Mostly brown        5,544,960
sky          Sky                                                                 Blue or white       5,223,936
After the pixel data library was built for the 10 classes in the images, ten thousand pixels were randomly chosen for each class from the training images; together these form the whole dataset in this study. Half of the whole dataset (10,000/2 = 5000 pixels from each class) was randomly chosen as a training dataset, and the other half was used as a validation dataset. The training dataset was used to analyze the distribution of the different color components of the 10 classes. These two datasets were used for color component selection, for training and testing respectively. A fruit training dataset was created by combining the training datasets of the m, nm, ny and y classes, resulting in a total of 20,000 pixels (4 classes × 5000 pixels). Similarly, a background training dataset was created by combining the other six classes, resulting in 30,000 pixels (6 classes × 5000 pixels). The whole dataset was used for cross validation to choose classifiers for the different separations.
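The sampling and splitting described above can be sketched as follows. This is a minimal illustration, not the authors' code: it assumes the cropped pixel library is held in memory as a dictionary from each of the 10 class codes to an array of pixel feature vectors, and the function name `build_datasets` and the fixed random seed are hypothetical.

```python
import numpy as np

# Class codes used in the pixel data library (10 classes in total).
FRUIT = ["m", "nm", "ny", "y"]
BACKGROUND = ["leaf", "bran_b", "bran_g", "bran_older", "bg", "sky"]


def build_datasets(library, per_class=10_000, seed=0):
    """Sample `per_class` pixels per class, then split each class in half.

    library: dict mapping class code -> (n_pixels, n_features) array.
    Returns per-class training and validation halves plus the pooled
    fruit and background training sets.
    """
    rng = np.random.default_rng(seed)
    train, valid = {}, {}
    for name in FRUIT + BACKGROUND:
        pixels = library[name]
        # Sampling without replacement returns pixels in random order,
        # so the first and second halves form a random 50/50 split.
        chosen = pixels[rng.choice(len(pixels), size=per_class, replace=False)]
        half = per_class // 2
        train[name], valid[name] = chosen[:half], chosen[half:]
    fruit_train = np.vstack([train[c] for c in FRUIT])        # 4 x half rows
    bg_train = np.vstack([train[c] for c in BACKGROUND])      # 6 x half rows
    return train, valid, fruit_train, bg_train
```

With `per_class=10_000`, the pooled fruit and background training sets have 20,000 and 30,000 rows, matching the counts given in the text.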
2.3. Color component analysis based detection (CCAD) method for
blueberry
The choice of color spaces and color components for classifying RGB images is widely discussed (Gonzalez and Woods, 2002; Szeliski, 2010). Red, green, blue (RGB); hue, saturation, intensity (HSI); luminance, blue-difference chroma, red-difference chroma (YCbCr); and luminance, in-phase, quadrature-phase (YIQ) are four commonly used color spaces. In addition, the color normalized by its maximum value and 2 × green − red − blue (2g − r − b, or the excess green index (EGI)) are widely used for separating green vegetation from the background (Woebbecke et al., 1995; Søgaard and Olsen, 2003).
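For reference, the color components named above can be computed from RGB pixel values as in the sketch below. The conversion coefficients are common textbook choices (ITU-R BT.601 full-range YCbCr, NTSC YIQ, and the geometric HSI hue and saturation); the exact variants and scalings used in the study are not stated here, so they, and the choice of computing EGI on chromatic coordinates r = R/(R+G+B) and so on, should be read as assumptions. The function name `color_components` is hypothetical.

```python
import numpy as np


def color_components(rgb):
    """Return an (N, 10) array of R, G, B, H, S, I, Q, Cb, Cr, EGI.

    rgb: (N, 3) pixel values in 0..255. Hue is in degrees (0..360),
    saturation in 0..1; I and Q are the YIQ chroma components.
    """
    rgb = np.asarray(rgb, dtype=float)
    R, G, B = rgb[:, 0], rgb[:, 1], rgb[:, 2]

    # HSI hue: angle of the pixel around the gray axis, measured from red.
    num = 0.5 * ((R - G) + (R - B))
    den = np.sqrt((R - G) ** 2 + (R - B) * (G - B))
    cos_t = np.divide(num, den, out=np.zeros_like(num), where=den > 0)
    theta = np.degrees(np.arccos(np.clip(cos_t, -1.0, 1.0)))
    H = np.where(B <= G, theta, 360.0 - theta)

    # HSI saturation (0 for black pixels).
    total = R + G + B
    min_ratio = np.divide(3.0 * rgb.min(axis=1), total,
                          out=np.ones_like(total), where=total > 0)
    S = 1.0 - min_ratio

    # YIQ chroma; the luminance Y is deliberately not computed.
    I = 0.596 * R - 0.274 * G - 0.322 * B
    Q = 0.211 * R - 0.523 * G + 0.312 * B

    # YCbCr chroma, BT.601 full range, again without luminance.
    Cb = 128.0 - 0.168736 * R - 0.331264 * G + 0.5 * B
    Cr = 128.0 + 0.5 * R - 0.418688 * G - 0.081312 * B

    # Excess green index 2g - r - b on chromatic coordinates.
    chrom = np.divide(rgb, total[:, None], out=np.zeros_like(rgb),
                      where=total[:, None] > 0)
    EGI = 2.0 * chrom[:, 1] - chrom[:, 0] - chrom[:, 2]

    return np.column_stack([R, G, B, H, S, I, Q, Cb, Cr, EGI])
```

For example, pure red, green and blue pixels get hues of 0, 120 and 240 degrees, and a white pixel gets Cb = Cr = 128 and I = Q = 0.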
In this study, since the images were acquired outdoors, varying illumination conditions would affect the processing results significantly. Thus, I in HSI, Y in YCbCr, and Y in YIQ, which represent intensity or luminance and directly reflect the varying illumination of the experimental environment, were not used for fruit detection. A 10-dimensional training dataset was therefore built from the remaining color components: R, G, B, H, S, I (in-phase, from YIQ), Q, Cb, Cr and EGI. To explore the possibility of using a smaller number of color components, the forward feature selection algorithm (FFSA) described by Whitney (1971) and Kumar et al. (2001) was chosen to reduce the number of color components and to determine the subset of color components that would yield the best classification results. FFSA is explained in a later section.
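Forward feature selection is a greedy search: starting from an empty set, it repeatedly adds the single feature that most improves a classification score, and stops when no remaining feature improves the score by at least a tolerance. The sketch below shows that idea only. The scoring rule (hold-out accuracy of a nearest-class-mean classifier on one fixed random half split), the tolerance `tol` and all function names are assumptions for illustration; the criterion used by the authors may differ.

```python
import numpy as np


def nearest_mean_accuracy(X, y, cols, tr, te):
    """Hold-out accuracy of a nearest-class-mean classifier on feature subset `cols`."""
    classes = np.unique(y)
    Xtr, Xte = X[tr][:, cols], X[te][:, cols]
    means = np.array([Xtr[y[tr] == c].mean(axis=0) for c in classes])
    d2 = ((Xte[:, None, :] - means[None, :, :]) ** 2).sum(axis=2)
    return np.mean(classes[d2.argmin(axis=1)] == y[te])


def forward_select(X, y, names, tol=1e-3, seed=0):
    """Greedy forward feature selection; returns selected names and final score."""
    rng = np.random.default_rng(seed)
    idx = rng.permutation(len(y))
    tr, te = idx[: len(y) // 2], idx[len(y) // 2:]
    selected, remaining, best = [], list(range(X.shape[1])), 0.0
    while remaining:
        # Score every candidate subset "selected + one more feature".
        score, j = max((nearest_mean_accuracy(X, y, selected + [j], tr, te), j)
                       for j in remaining)
        if score - best < tol:   # no worthwhile improvement: stop
            break
        selected.append(j)
        remaining.remove(j)
        best = score
    return [names[j] for j in selected], best
```

Repeating the search with different random splits (the `seed` argument) and tallying the selected names is one way to obtain selection frequencies across runs.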
This section gives an overview of the proposed CCAD method; more details are given in subsequent sections. The whole algorithm is accomplished in eight steps, as shown in the algorithm flowchart in Fig. 2. In the first step, six color components were selected using FFSA according to their performance among the 10 color components. It was found that the six components chosen by FFSA and the first three components (R, B, and H) performed almost equally well. Thus, only these three color
Fig. 3. The diagram describing how datasets were used in the paper.
and then the posterior value was calculated through Bayes' rule. Based on the generated model, the class label of each sample in the validation dataset was decided from the posterior probability calculated for each class: the label of the class with the maximum posterior probability was assigned to the corresponding sample.
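A minimal naive Bayesian classifier of this kind is sketched below. The class-conditional distribution of each color component is assumed here to be Gaussian, which this section does not state, and the class name `GaussianNaiveBayes` is hypothetical. Each pixel is assigned the class with the largest log posterior, i.e. the log prior plus the sum of the per-component log likelihoods.

```python
import numpy as np


class GaussianNaiveBayes:
    """Naive Bayes with independent Gaussian likelihoods per feature (an assumption)."""

    def fit(self, X, y):
        self.classes_ = np.unique(y)
        self.mean_ = np.array([X[y == c].mean(axis=0) for c in self.classes_])
        # Small variance floor avoids division by zero for constant features.
        self.var_ = np.array([X[y == c].var(axis=0) for c in self.classes_]) + 1e-9
        self.log_prior_ = np.log(np.array([np.mean(y == c) for c in self.classes_]))
        return self

    def log_posterior(self, X):
        """Unnormalized log posterior, shape (n_samples, n_classes)."""
        diff2 = (X[:, None, :] - self.mean_[None, :, :]) ** 2
        log_lik = -0.5 * (np.log(2 * np.pi * self.var_)[None] + diff2 / self.var_[None]).sum(axis=2)
        return log_lik + self.log_prior_

    def predict(self, X):
        # Maximum a posteriori rule: pick the class with the largest posterior.
        return self.classes_[np.argmax(self.log_posterior(X), axis=1)]
```

Working with log probabilities avoids numerical underflow when many small likelihoods are multiplied.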
2.5.3. Supervised K-means clustering classifier using weighted Euclidean distance (SK-means)
Data clustering finds similarities in data and then partitions a dataset into several groups so that the similarity within a group is greater than the similarity between groups. Different data clustering methods, such as K-means clustering, fuzzy C-means clustering, and mountain clustering, have been applied to a variety of areas, including image and speech data compression (Hammouda and Karray, 2000). However, clustering has not traditionally been applied to classification problems, because it usually does not assign a meaning or class label to the groups it forms. In this study, a novel method, supervised K-means clustering using weighted Euclidean distance (SK-means), was proposed and applied to the blueberry fruit classification problem. The algorithm was accomplished in the following four steps:
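(a) Weighting of color components: the weight of each color component is the squared correlation coefficient $r_{iY}^{2}$ between component $X_i$ and the class label $Y$, computed recursively over the $N$ training samples:
$$r_{iY}^{2}(N) = \left( \frac{S_{iY}^{2}(N)}{\sqrt{S_{ii}^{2}(N)}\,\sqrt{S_{YY}^{2}(N)}} \right)^{2}$$
where
$$S_{ii}^{2}(n) = \frac{n-2}{n-1}\,S_{ii}^{2}(n-1) + \frac{1}{n}\left(X_i(n) - \bar{X}_i(n-1)\right)^{2}$$
$$S_{iY}^{2}(n) = \frac{n-2}{n-1}\,S_{iY}^{2}(n-1) + \frac{1}{n}\left(X_i(n) - \bar{X}_i(n-1)\right)\left(X_Y(n) - \bar{X}_Y(n-1)\right)$$
$$S_{YY}^{2}(n) = \frac{n-2}{n-1}\,S_{YY}^{2}(n-1) + \frac{1}{n}\left(X_Y(n) - \bar{X}_Y(n-1)\right)^{2}$$
$$\bar{X}_i(n) = \frac{(n-1)\,\bar{X}_i(n-1) + X_i(n)}{n}, \qquad \bar{X}_Y(n) = \frac{(n-1)\,\bar{X}_Y(n-1) + X_Y(n)}{n}$$
for $n = 1, \ldots, N$. Here $S_{ii}^{2}(n)$ is the variance of $X_i$, $S_{iY}^{2}(n)$ is the covariance of $X_i$ and $Y$, $S_{YY}^{2}(n)$ is the variance of $Y$, and $\bar{X}_i(n)$ and $\bar{X}_Y(n)$ are the averages of $X_i$ and $Y$.
(b) Weighted K-means clustering: Using the weighted Euclidean distance, k clusters, denoted as $L_i$ (i = 1, ..., k), and their k centroid coordinates, denoted as $(C_1, C_2, \ldots, C_k)$, are obtained through K-means clustering. The weighted Euclidean distance between a data point $X$ and the centroid of cluster $L$ is given below, where $p$ is the dimension of $X$:
$$d(X, L) = \sqrt{\sum_{i=1}^{p} \left(X_i - L_i\right)^{2} r_{iY}^{2}}$$
3. Results
3.1. FFSA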
The results of FFSA using the training dataset are shown in Fig. 5. The number of color features chosen by FFSA during the 100 iterations is shown in Fig. 5a, indicating that six components were chosen most frequently. Among the 10 color components (R, G, B, H, S, I, Q, Cb, Cr and EGI), the six that performed best were R, B, H, Q, Cb and EGI, as listed in Fig. 5b, which shows the number of times each component was chosen for the separation of background and fruit.
If the information contained in a smaller subset of variables is similar to that in a larger subset, then the smaller subset can be used instead of the larger one. It was found that the first six color components (R, B, H, Q, Cb and EGI) chosen by
(c) Classification of clusters: By counting the number of members $N_{iy}$ (i = 1, ..., k; y = 1, 2) of each cluster that belong to the different classes, the clusters are assigned to classes. In this case, if $N_{i1} > N_{i2}$, cluster $L_i$ is assigned to class 1.
(d) Testing using an unknown pixel: First, the cluster nearest to the test pixel is found by comparing the weighted Euclidean distances between the centroids $C_i$ and the unknown pixel. Then, the class label of that cluster is assigned to the pixel, which completes the classification of the new, unknown pixel.
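Steps (a) to (d) can be put together as in the sketch below. It is an illustration under assumptions, not the authors' implementation: the weights r_iY^2 are computed in batch with `np.corrcoef`, which gives the same coefficient as the recursive estimate once all N samples have been processed; clustering uses plain Lloyd iterations from randomly chosen initial centroids; the number of clusters k is a free, hypothetical parameter; and a tie in step (c) goes to the first class. Because the weighted distance is an ordinary Euclidean distance after scaling each feature by |r_iY|, the centroid update is still the cluster mean.

```python
import numpy as np


def correlation_weights(X, y):
    """Step (a): squared correlation r_iY^2 between each feature X_i and label Y."""
    return np.array([np.corrcoef(X[:, i], y)[0, 1] ** 2 for i in range(X.shape[1])])


def weighted_sq_dist(X, C, w):
    """Squared weighted Euclidean distance from each row of X to each centroid in C."""
    return (((X[:, None, :] - C[None, :, :]) ** 2) * w).sum(axis=2)


def fit_sk_means(X, y, k, n_iter=100, seed=0):
    X = np.asarray(X, dtype=float)
    w = correlation_weights(X, y)
    rng = np.random.default_rng(seed)
    C = X[rng.choice(len(X), size=k, replace=False)].copy()
    # Step (b): Lloyd iterations using the weighted distance.
    for _ in range(n_iter):
        assign = weighted_sq_dist(X, C, w).argmin(axis=1)
        new_C = np.array([X[assign == j].mean(axis=0) if np.any(assign == j) else C[j]
                          for j in range(k)])
        if np.allclose(new_C, C):
            break
        C = new_C
    assign = weighted_sq_dist(X, C, w).argmin(axis=1)
    # Step (c): label each cluster with the majority class of its members.
    classes = np.unique(y)
    counts = np.array([[np.sum(y[assign == j] == c) for c in classes] for j in range(k)])
    cluster_class = classes[counts.argmax(axis=1)]
    return w, C, cluster_class


def predict_sk_means(X, w, C, cluster_class):
    """Step (d): give each pixel the label of its nearest (weighted) centroid."""
    return cluster_class[weighted_sq_dist(np.asarray(X, dtype=float), C, w).argmin(axis=1)]
```

Different seeds can lead Lloyd iterations to different local minima, which is one way the run-to-run instability of K-means mentioned later in the discussion can arise.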
2.5.4. Classifier construction procedure and post processing
For Separations 1 through 5 as described in Fig. 2, the same procedure was followed to build five classifiers for each of the KNN, NBC and SK-means methods. Take the KNN classifier as an example. First, using the three color components (R, B, and H) of the training dataset, one KNN classifier was built to separate fruit from background; this step was named Separation 1. Then, similarly, another four KNN classifiers were built to carry out Separations 2 through 5. For Separation 3, a leaf_nyy classifier was built using the training datasets of the leaf and near-young & young (nyy) classes. This classifier was used to filter out additional leaves from the near-young fruit, since some leaves remained among the nyy fruit after Separation 2.
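The order in which the five binary classifiers are applied can be sketched as a cascade over pixel feature vectors (R, B, H), here with a brute-force KNN. This is a hedged illustration, not the authors' code: the KNN details (Euclidean distance, majority vote, k = 5) are assumptions, the dictionary keys and function names are hypothetical, and Separations 4 and 5 are assumed to be m vs. nm and ny vs. y, following the order of steps in the abstract. Pixels that Separation 3 labels as leaf are returned as background.

```python
import numpy as np


def knn_predict(train_X, train_y, X, k=5):
    """Brute-force KNN with majority vote; labels must be non-negative ints."""
    d2 = ((X[:, None, :] - train_X[None, :, :]) ** 2).sum(axis=2)
    nearest = np.argsort(d2, axis=1)[:, :k]
    return np.array([np.bincount(v).argmax() for v in train_y[nearest]], dtype=int)


def classify_pixels(P, sets, k=5):
    """Run Separations 1-5 on pixels P (N x 3). sets[name] = (train_X, train_y),
    where label 1 means the first-named group of that separation."""
    out = np.full(len(P), "background", dtype=object)
    idx = np.arange(len(P))
    fruit = idx[knn_predict(*sets["fruit_vs_bg"], P, k) == 1]      # Separation 1
    is_mnm = knn_predict(*sets["mnm_vs_nyy"], P[fruit], k) == 1    # Separation 2
    mnm, nyy = fruit[is_mnm], fruit[~is_mnm]
    is_leaf = knn_predict(*sets["leaf_vs_nyy"], P[nyy], k) == 1    # Separation 3
    nyy = nyy[~is_leaf]                                            # leftover leaves stay background
    is_m = knn_predict(*sets["m_vs_nm"], P[mnm], k) == 1           # Separation 4 (assumed)
    out[mnm] = np.where(is_m, "m", "nm")
    is_ny = knn_predict(*sets["ny_vs_y"], P[nyy], k) == 1          # Separation 5 (assumed)
    out[nyy] = np.where(is_ny, "ny", "y")
    return out
```

Each later classifier only sees the pixels passed on by the earlier ones, so errors made in Separation 1 propagate to every later step.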
All these classifiers were applied to the validation images as described in Fig. 2. To filter out small noise after classification, post processing operations, a median filter followed by the morphological operations opening and closing, were applied sequentially to the images obtained from the classifiers. A window size of five pixels was chosen for these three operations based on the results of a preliminary experiment. Example images are shown in the Results section.
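The post processing chain (median filter, then opening, then closing, each with a 5 × 5 window) can be sketched for a binary classification mask using only NumPy. The sketch assumes a square 5 × 5 window and zero padding at the image border, and its function names are hypothetical; scipy.ndimage provides equivalent, faster filters. For a binary mask, the median of a window is simply the majority value of its 25 pixels.

```python
import numpy as np
from numpy.lib.stride_tricks import sliding_window_view


def _windows(mask, size):
    """All size x size neighborhoods of a zero-padded boolean mask, shape (H, W, size, size)."""
    pad = size // 2
    return sliding_window_view(np.pad(mask, pad, constant_values=False), (size, size))


def median_binary(mask, size=5):
    # Median of 0/1 values in a window = majority vote.
    return _windows(mask, size).sum(axis=(-2, -1)) > (size * size) // 2


def erode(mask, size=5):
    return _windows(mask, size).all(axis=(-2, -1))


def dilate(mask, size=5):
    return _windows(mask, size).any(axis=(-2, -1))


def postprocess(mask, size=5):
    m = median_binary(np.asarray(mask, dtype=bool), size)
    m = dilate(erode(m, size), size)   # opening: drops regions that cannot contain a full window
    m = erode(dilate(m, size), size)   # closing: fills gaps too narrow for a full window
    return m
```

Applied to a mask with an isolated noise pixel and a one-pixel hole inside a fruit region, the noise pixel disappears and the hole is filled, while the region itself is slightly rounded.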
Fig. 4. Cross validation diagram of one iteration for fruit and background.
Fig. 5. Results of FFSA: (a) frequency of the number of color components chosen using FFSA during 100 iterations, and (b) frequency of each feature chosen using FFSA. (Vertical axes: frequency; horizontal axes: (a) number of color components, (b) color components.)
Table 2
Preliminary classification results for separating blueberry fruit from background using different classifiers. 20,000 pixels were used for fruit, and 30,000 pixels were used for background.
Classication methods
NBC
KNN
Percent (%)
Background
Detected pixels
75.1
80.9
23,440
24,959
Percent (%)
Fruit
Detected pixels
Percent (%)
Background
Detected pixels
Percent (%)
78.1
83.2
14,732
15,780
73.7
78.9
22,053
25,660
73.5
85.5
Fig. 7. Histograms of different objects in the manually cropped data library: (a) red, (b) blue, and (c) hue. Each panel plots the number of pixels against the component value (1 to 255) for each object class (m, nm, ny, y, leaf, the three branch classes, and bg).
4. Discussion
Table 3
Cross validation accuracies for the classification of fruit from background using different classifiers. 32,000 pixels were used for fruit, and 48,000 pixels were used for background. Each cell gives detected pixels (%).
Cross validation   NBC                          KNN                          SK-means
number             Fruit         Background     Fruit         Background     Fruit         Background
1                  18,156 (57)   37,336 (78)    26,522 (83)   41,041 (86)    15,226 (48)   38,064 (79)
2                  17,760 (56)   37,570 (78)    26,543 (83)   40,917 (85)    15,278 (48)   38,016 (79)
3                  17,642 (55)   37,562 (78)    26,561 (83)   41,150 (86)    23,130 (72)   33,292 (69)
4                  17,461 (55)   37,581 (78)    26,338 (82)   40,886 (85)    15,180 (47)   37,967 (79)
5                  17,974 (56)   37,523 (78)    26,590 (83)   40,706 (85)    15,156 (47)   38,077 (79)
Average accuracy   17,709 (56)   37,559 (78)    26,508 (83)   40,915 (85)    17,186 (52)   36,838 (77)
Table 4
Cross validation accuracies for the classification of mature & near-mature fruit (mnm) from near-young & young fruit (nyy) using different classifiers. 16,000 pixels were used for the mnm class, and 16,000 pixels were used for the nyy class. Each cell gives detected pixels (%).
Cross validation   NBC                          KNN                          SK-means
number             mnm           nyy            mnm           nyy            mnm           nyy
1                  11,996 (75)   13,493 (84)    14,464 (90)   14,455 (90)    12,462 (78)   10,823 (68)
2                  11,939 (75)   13,586 (85)    14,551 (91)   14,296 (89)    12,399 (77)   10,853 (68)
3                  11,970 (75)   13,592 (85)    14,527 (91)   14,306 (89)    12,373 (77)   10,800 (68)
4                  11,929 (75)   13,457 (84)    14,567 (91)   14,333 (90)    12,414 (78)   10,920 (68)
5                  11,803 (74)   13,450 (84)    14,498 (91)   14,477 (90)    12,393 (77)   10,821 (68)
Average accuracy   11,904 (75)   13,521 (84)    14,536 (91)   14,353 (90)    12,395 (77)   10,849 (68)
Table 5
Cross validation accuracies for the classification of mature fruit (m) from near-mature fruit (nm) using different classifiers. 8000 pixels were used for both m and nm classes. Each cell gives detected pixels (%).
Cross validation   NBC                          KNN                          SK-means
number             Mature        Near-mature    Mature        Near-mature    Mature        Near-mature
1                  7598 (95)     7257 (91)      7672 (96)     7745 (97)      7142 (89)     7177 (90)
2                  7669 (96)     7135 (89)      7709 (96)     7661 (96)      7115 (89)     7144 (89)
3                  7620 (95)     7234 (90)      7732 (97)     7700 (96)      7266 (91)     7065 (88)
4                  7606 (95)     7306 (91)      7684 (96)     7722 (97)      7066 (88)     7189 (90)
5                  7645 (96)     7212 (90)      7705 (96)     7679 (96)      7123 (89)     7167 (90)
Average accuracy   7635 (95)     7222 (90)      7708 (96)     7691 (96)      7143 (89)     7141 (89)
Table 6
Cross validation accuracies for the classification of young (y) and near-young fruit (ny) using different classifiers. 8000 pixels were used for both ny and y classes. Each cell gives detected pixels (%).
Cross validation   NBC                          KNN                          SK-means
number             Young         Near-young     Young         Near-young     Young         Near-young
1                  6478 (81)     6568 (82)      7860 (98)     7718 (96)      5196 (65)     7144 (89)
2                  6469 (81)     6616 (83)      7866 (98)     7246 (91)      5175 (65)     7121 (89)
3                  6337 (79)     6838 (85)      7853 (98)     7026 (88)      5197 (65)     7168 (90)
4                  6463 (81)     6658 (83)      7865 (98)     7712 (96)      5187 (65)     7166 (90)
5                  6518 (81)     6571 (82)      7130 (89)     7643 (96)      5281 (66)     7161 (90)
Average accuracy   6447 (81)     6671 (83)      7679 (96)     7407 (93)      5210 (65)     7154 (90)
Table 7
Means test and one-way ANOVA used to compare the means of the cross validation results of the three classification methods. The superscripts beside the mean values indicate the group to which each method belongs.
Data group             NBC      KNN      SK-means   p-Value
Fruit vs. background   55.8^a   82.8^b   52.4^a     1.08e-05
mnm vs. nyy            74.8^a   90.8^b   77.4^c     2.93e-15
m vs. nm               95.4^a   96.2^a   89.2^b     7.53e-09
ny vs. y               80.6^a   96.2^b   65.2^c     4.66e-10
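The one-way ANOVA in Table 7 compares the five per-fold accuracies of the three methods. A minimal sketch is shown below. It is restricted to three groups because, with two between-group degrees of freedom, the upper tail of the F distribution has the closed form p = (1 + 2F/df_w)^(-df_w/2), so no statistics library is needed; the function name is hypothetical, and the post hoc means test behind the superscript groupings is not included. Fed with the rounded per-fold fruit percentages from Table 3, it gives F ≈ 34.3 and p ≈ 1.08e-05, consistent with the first row of Table 7.

```python
import numpy as np


def one_way_anova_three_groups(groups):
    """Return (F, p) for a one-way ANOVA over exactly three groups."""
    groups = [np.asarray(g, dtype=float) for g in groups]
    if len(groups) != 3:
        raise ValueError("the closed-form p-value assumes exactly three groups")
    values = np.concatenate(groups)
    grand_mean = values.mean()
    ss_between = sum(len(g) * (g.mean() - grand_mean) ** 2 for g in groups)
    ss_within = sum(((g - g.mean()) ** 2).sum() for g in groups)
    df_between, df_within = len(groups) - 1, len(values) - len(groups)
    f_stat = (ss_between / df_between) / (ss_within / df_within)
    # Upper tail of F(2, df_within): P(F > f) = (1 + 2f/df_within)^(-df_within/2).
    p_value = (1.0 + 2.0 * f_stat / df_within) ** (-df_within / 2.0)
    return f_stat, p_value


# Per-fold fruit detection percentages (Table 3, cross validations 1 to 5).
nbc = [57, 56, 55, 55, 56]
knn = [83, 83, 83, 82, 83]
sk_means = [48, 48, 72, 47, 47]
f_stat, p_value = one_way_anova_three_groups([nbc, knn, sk_means])
```

The same function applied to the mature-fruit percentages of Table 5 gives p ≈ 7.53e-09, the third row of Table 7.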
performed well, if not best. Therefore, the RBH components were chosen for the other four separations as well as for the separation of background and all fruit. The good performance of these three components is reflected in the accuracies of the separations in Tables 4 through 6.
Five separation steps were conducted sequentially to achieve the final goal of identifying blueberry fruit. To show the efficiency of the proposed CCAD method, cross validations were conducted on the whole dataset by splitting it into five equal parts. The classifiers used were KNN, NBC and SK-means. As shown in Tables 3 through 6, KNN always yielded the best results, ranging from 85% to 98%. This was mostly because the datasets obtained from the raw images represented the characteristics of the classes in the images well, so finding the nearest neighbors of the samples in the validation dataset was a good way to classify them. This result showed that this traditionally used classifier was efficient for the blueberry detection problem as long as the dataset was large enough. SK-means was a new classifier proposed in this study, intended to take advantage of both K-means clustering and a supervised dataset.
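The cross validation can be sketched as a plain 5-fold loop that reports, for each held-out fold, the percentage of each class's pixels that were correctly detected (the quantity in parentheses in Tables 3 to 6). The sketch assumes the usual arrangement of training on four parts and testing on the remaining one, with a brute-force KNN (k = 5, majority vote); the fold layout of Fig. 4 and the authors' K value are not given in this section, and all names are hypothetical.

```python
import numpy as np


def knn_predict(train_X, train_y, X, k=5):
    """Brute-force KNN with majority vote (labels must be non-negative ints)."""
    d2 = ((X[:, None, :] - train_X[None, :, :]) ** 2).sum(axis=2)
    nearest = np.argsort(d2, axis=1)[:, :k]
    return np.array([np.bincount(v).argmax() for v in train_y[nearest]], dtype=int)


def cross_validate(X, y, n_folds=5, k=5, seed=0):
    """Per-fold, per-class detection rates (%) for a KNN classifier."""
    rng = np.random.default_rng(seed)
    folds = np.array_split(rng.permutation(len(y)), n_folds)
    results = []
    for i, test in enumerate(folds):
        # Train on all other folds, test on fold i.
        train = np.concatenate([f for j, f in enumerate(folds) if j != i])
        pred = knn_predict(X[train], y[train], X[test], k)
        results.append({int(c): 100.0 * np.mean(pred[y[test] == c] == c)
                        for c in np.unique(y[test])})
    return results
```

Averaging each class's rate over the five folds gives numbers comparable to the "Average accuracy" rows, and the per-fold values are what a one-way ANOVA across classifiers would take as input.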
From the analyses of this study, some interesting points arise. First of all, the features of the three components shown in the histograms in Fig. 7 indicate both the feasibility and the difficulty of detecting fruit against the complicated background. Higher R values were observed for the fruit than for the background, which indicated that the R component could help with Separation 1 (building classifiers to separate fruit from background). The separation of young fruit from the background, especially the leaves, was a challenge because the young fruit pixels spread across the whole range of R and B intensity values. This indicated that Separation 3 (building classifiers to filter out additional leaves) was also very challenging. The H color component could help separate the young fruit from the leaves because the overlap between young fruit and background was smaller. However, this separation still remains a problem because there was some overlap between fruit and leaves even for the H component. As a result, mature fruit was well separated from the other objects, and an accuracy of 85% using KNN was obtained. However, the separation of young fruit and leaves was still a problem which needs more study in the
Fig. 8. Results of a KNN classifier applied to a training image: (a) original image, (b) fruit separated from background, (c) mature and near-mature fruit, (d) near-young and young fruit with some leaves, (e) near-young and young fruit after filtering out the leaves, (f) mature fruit only, and (g) near-mature fruit only.
for the CCAD method to detect blueberry fruit in natural outdoor RGB images. However, Figs. 8d and 9d also show that the young leaves were not completely removed by the KNN classifiers, even after morphological operations were used to remove segmentation errors and noise. The reason can be found in the histogram analysis: a small portion of the young fruit overlapped with the background, especially the leaves, even in the H component distribution in Fig. 7c. Further research is needed to solve this problem.
Fig. 9. Results of a KNN classifier applied to a validation image: (a) original image, (b) fruit separated from background, (c) mature and near-mature fruit, (d) near-young and young fruit with some leaves, (e) near-young and young fruit after filtering out the leaves, (f) mature fruit only, and (g) near-mature fruit only.
the different relevance of the components and the class label. However, a performance difference of the SK-means method for detecting ny and y fruit was noticed: the ny pixel detection accuracy was about 65%, while the y detection accuracy was about 90%. This might be due to a drawback of the K-means method, namely that convergence to a local minimum may sometimes produce incorrect results. As with any other clustering method, SK-means works well with some datasets while failing on others. While we explored the feasibility of using SK-means for fruit detection in this study, how to make this method perform more stably is another interesting topic for future study.
Lastly, most of the error in the results came from the first separation, when the fruit were separated from the background, because the accuracy of that separation was approximately 80%. This can be explained by the distribution of colors in the histograms shown in Fig. 7. Even though most of the pixels can be classified based on these three color components, there are still some misclassifications because different classes have similar color components. The errors might also have been caused by the dataset not being large enough, because the smaller the overlapping portion of the dataset, the higher the accuracy should be. The next step will be to expand the data library. In addition, field complexity might be another source of error.
5. Conclusions
Natural outdoor color images were acquired to identify blueberry fruit of different growth stages using machine vision. The major findings of this study can be summarized as follows.
(1) This study examined the potential of machine learning methods for blueberry yield mapping using natural outdoor color images. A multi-class classification algorithm for separating blueberries of different growth stages was developed, and the results showed great potential for blueberry yield mapping.
(2) To optimally use the information of the acquired color
images, three color components of R, B, and H were selected
through the FFSA method, among different color features of
R, G, B, H, S, I, Q, Cb, Cr and EGI.
(3) In the proposed color component analysis based detection (CCAD) method, three classifiers were applied to the corresponding datasets: a supervised technique (KNN), a probabilistic approach (NBC), and a newly proposed method (SK-means). The proposed classifier, SK-means, used K-means for supervised classification, an approach that had not been used in other studies before. SK-means not only took advantage of K-means for finding similarities in data and putting similar data into several groups, but also used the target class information for supervised classification, so that it could assign a class label to each group.
(4) To choose an efficient classifier, cross validations were conducted on the validation dataset using the three different classifiers. The statistical analysis results indicated that the KNN classifier performed best compared with the other two. It yielded an average accuracy ranging from 85% to 86% for fruit and background separation, and an average accuracy ranging from 90% to 98% for the separation of fruit of the mature, near-mature, near-young and young growth stages.
Acknowledgements
The authors would like to thank Ms. Ce Yang, Mr. Anurag R.
Katti, Ms. Xiuhua Li, Dr. John Schueller, and Mr. James Colee at
the University of Florida for their assistance in this study. The