
2016 12th International Conference on Natural Computation, Fuzzy Systems and Knowledge Discovery (ICNC-FSKD)

Fruit Recognition Based On Convolution Neural Network

Lei Hou, QingXiang Wu*
Key Laboratory of OptoElectronic Science and Technology for Medicine of Ministry of Education, College of Photonic and Electronic Engineering, Fujian Normal University, Fujian, Fuzhou 360007

Qiyan Sun, Heng Yang, Pengfei Li
Fujian Provincial Key Laboratory for Photonics Technology, College of Photonic and Electronic Engineering, Fujian Normal University, Fujian, Fuzhou 360007

Abstract: Computer vision is now widely used. However, recognizing fruits stacked on a weighing scale remains difficult because of the complexity and overlap of the fruits. In this paper, a fruit recognition algorithm based on a convolution neural network (CNN) is proposed. First, image regions are extracted using the selective search algorithm. Then the regions are filtered according to the entropy of the fruit images. Finally, the selected regions are used as input to the CNN for training and recognition. The final decision is made by fusing all region classifications with a voting mechanism. To suit practical use in supermarkets, we considered the variety of fruit, the stacking of fruits, and changes in the number and position of fruits, and built a varied training set accordingly. After the network has been trained with an optimal training set, it achieves a remarkable recognition rate for fruits stacked on a weighing scale.

Keywords: fruit recognition; CNN; selective search; vote

I. INTRODUCTION
At present, selling fruit in a supermarket still requires staff to weigh it, which costs labor and is inefficient. A fruit recognition algorithm is therefore needed for supermarket fruit scales. There are many types of fruit, and some of them look very similar. Changes in the position and number of fruits make recognition even more challenging.

In early studies, W. C. Seng and S. H. Mirisaee proposed a fruit recognition system that combines three feature analysis methods: color-based, shape-based and size-based. Fruit images are recognized using nearest neighbour classification [1]. The system performed well for single fruit recognition. However, in practice fruits are usually stacked together, so the system is not suitable for this application. Yundong Zhang proposed a hybrid classification method based on the fitness-scaled chaotic artificial bee colony (FSCABC) algorithm and a feedforward neural network (FNN) [2]. S. Arivazhagan proposed an efficient fusion of color and texture features for fruit type recognition [3]. The algorithms in [2] and [3] can be applied to databases that vary in the number and kind of fruits, but their recognition rates are not high enough. All of the methods mentioned above combine feature extraction with a classifier, and most researchers have concentrated on extracting better features and improving classifier performance.

In this paper, we propose a fruit recognition algorithm based on a convolution neural network (CNN). Images can be fed directly into the network without a separate feature extraction step. The CNN is trained on regions extracted from the original images, and the type of the original image is determined by fusing the classification results of its regions. The experimental results show that the fruit recognition rate is much improved. In future work, the proposed method could also be applied to identifying multiple fruit types in one picture.

The rest of this paper is organized as follows: Section II presents the extraction of regions using the selective search algorithm; Section III introduces the convolution neural network (CNN); Section IV describes how the CNN is combined with the selective search method; Section V shows the experimental results; and Section VI gives the conclusion.

II. EXTRACT REGIONS
In a practical application, images of the same category of fruit show a wide range of possible stacking forms because the number and position of the fruits change. We therefore use the selective search algorithm to extract regions that contain useful information, and these regions are then used to train the CNN. The final result is obtained by fusing the classification results of all regions. This method copes with the various possible stacking forms, so the recognition rate can be improved.

J. R. R. Uijlings put forward the selective search algorithm in 2012 [4]. Before this, the common practice in object recognition was to scan the whole image with a window, change the size of the window, and scan the whole image again. Selective search avoids the time cost of this exhaustive window search and obtains better results. First, initial regions are produced by an image segmentation method. Then a merging strategy combines these regions step by step. The result is a hierarchical structure of regions, which can contain the object. The size and location of each region are defined by a rectangle. The specific steps of the selective search algorithm are as follows [4]:

• Input: (colour image)
• Output: Set of object location hypotheses L

978-1-5090-4093-3/16/$31.00 ©2016 IEEE


• Use the efficient graph-based image segmentation method of [13] to obtain the initial regions R = {r1, r2, ..., rn}
• Initialise similarity set S = ∅
• for each neighbouring region pair (ri, rj) do
    Calculate similarity s(ri, rj)
    S = S ∪ s(ri, rj)
• while S ≠ ∅ do
    Get highest similarity s(ri, rj) = max(S)
    Merge corresponding regions rt = ri ∪ rj
    Remove similarities regarding ri: S = S \ s(ri, r*)
    Remove similarities regarding rj: S = S \ s(r*, rj)
    Calculate similarity set St between rt and its neighbours
    S = S ∪ St
    R = R ∪ rt
• Extract object location boxes L from all regions in R

The selective search algorithm is used to extract regions. More than twenty regions of different sizes can be extracted from one original image. Regions that are too small, too tall or too wide are removed because they contain little discriminant information [5]. The remaining effective regions are used as input to the network.

III. CNN
The convolution neural network (CNN) is a kind of artificial neural network that has become a hot research topic in speech recognition and image recognition. A. Krizhevsky et al. applied a deep convolutional neural network to the ImageNet database in 2012 and achieved good results [6]. Because the image can be fed directly into the network, a CNN avoids the feature extraction and data reconstruction steps of traditional recognition algorithms. A CNN typically contains convolutional layers, pooling layers and fully connected layers.

• Convolutional layer: The convolution operation enhances the features of the original signal and reduces noise.
• Pooling layer: Exploiting local correlation, subsampling the image reduces the amount of data to process while preserving useful information.

A. CNN Training Algorithm
• The first stage is forward propagation:
    A sample (X, Yp) is taken from the sample set, and X is fed to the network as input;
    The corresponding actual output Op is calculated by the network.
  In this stage, information is transferred from the input layer to the output layer. This is also the computation the network performs in normal operation once training is complete. The network computes:
    $O_p = F_n(\dots(F_2(F_1(X W_1) W_2)\dots) W_n)$
• The second stage is backward propagation:
    The difference between the actual output Op and the corresponding ideal output Yp is calculated;
    The weight matrices are adjusted by back-propagating the error so as to minimize it.

B. CNN Architecture
As shown in Figure 1, the network contains three convolutional layers, each followed by a pooling layer, and two fully connected layers. The ReLU non-linearity is applied to the output of every convolutional and fully connected layer. The first convolutional layer filters the 32*32*3 input image with 32 kernels of size 5*5*3. The second convolutional layer has 32 kernels of size 5*5*32. The third convolutional layer has 64 kernels of size 5*5*32. All pooling layers pool over 3*3 regions with a stride of 2. The fully connected layers have 64 neurons each. Finally, a softmax classifier is applied to the last layer.

Figure 1. The structure of CNN

C. CNN Advantage
In the pattern recognition field, the classic framework is still feature extraction combined with a classifier. Image features are designed by hand, the feature information is fed into a classifier, and the classification results are obtained. Multiple features, such as SIFT, HOG and LBP, are usually used to describe an image. However, feature selection is uncertain: it is difficult to know which features best express the image and achieve the best classification results, so many experiments are needed to verify the choice. Because pixels are the most redundant representation of image semantics, a more abstract feature description loses part of the image information [7]. A CNN, in contrast, connects directly to the data. Through layer-upon-layer mapping in a deep network, it obtains an implicit representation of the image information. Through multilayer supervised learning, the machine can learn an efficient representation of the image by itself [8]. After feature extraction, a CNN generally uses a softmax classifier or an RBF distance classifier for the final decision. A CNN therefore has stronger learning ability: the learned features capture the essence of the image better and are more conducive to classification. In addition, feature extraction and pattern classification are learned simultaneously during training.
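As a rough check on the layer sizes given for the architecture in Section III-B, the following Python sketch counts the convolutional parameters and tracks the feature-map size through the three pooling layers. It is only an illustration under stated assumptions: one bias per kernel, "same" padding for the convolutions (so that a convolution keeps the spatial size), and unpadded pooling with floor rounding are all assumptions, since the paper does not state them. The helper names `conv_params` and `pool_out` are hypothetical.

```python
def conv_params(num_kernels, kh, kw, in_channels, bias=True):
    """Learnable parameters of a convolutional layer with shared weights."""
    return num_kernels * (kh * kw * in_channels + (1 if bias else 0))


def pool_out(size, window=3, stride=2):
    """Output side length of an unpadded pooling layer (floor convention, assumed)."""
    return (size - window) // stride + 1


# (num_kernels, kernel_h, kernel_w, in_channels) as stated in the text
layers = [(32, 5, 5, 3), (32, 5, 5, 32), (64, 5, 5, 32)]

side = 32  # regions are resized to 32*32 before entering the network
report = []
for n, kh, kw, c in layers:
    # Assumption: the convolution uses "same" padding, so only pooling shrinks the map.
    side = pool_out(side)
    report.append((n, side, conv_params(n, kh, kw, c)))

for n, s, p in report:
    print(f"{n} feature maps of {s}x{s}, {p} convolutional parameters")
```

Because each kernel is reused at every image position, the parameter count of a convolutional layer depends only on the kernel size and channel counts, not on the 32*32 input size.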
The BP neural network is one of the most widely used neural network models. It is a multilayer feedforward network trained by the error back-propagation algorithm. A BP network can learn and store a large number of input-output mappings. It learns by using gradient descent to adjust the weights and thresholds of the network so that the squared error is minimized. A CNN differs from a general BP neural network not only in its deep structure but also in its use of local receptive fields and weight sharing, which further reduce the number of network parameters. With local receptive fields, each convolution kernel is connected to only one local region of the image rather than to the whole image. The local convolution features are then combined in later layers, which both respects the spatial correlation of the image and reduces the number of parameters. With weight sharing, a convolution kernel uses the same weights at every position, and different image features are extracted by adding more types of convolution kernels. Weight sharing thus makes the network structure simpler and more adaptable.

IV. CNN COMBINED WITH SELECTIVE SEARCH
CNNs have recently shown excellent performance in object recognition, and the selective search algorithm can extract the informative regions of a whole image. Combining the two methods not only yields effective regions but also provides a large number of images for the CNN. Furthermore, a CNN combined with selective search makes it possible to recognize multiple classes of fruit in one image.

Regions are obtained through selective search, but some regions are too narrow or too small, or contain useless information. Regions whose length or width is less than 0.2 times that of the original image are therefore removed. To remove more regions that carry useless information, an analysis of image entropy is conducted.

Image entropy is measured in bits over the set of image gray levels and describes the average amount of information in the image source. It also reflects how concentrated the gray-level distribution of the image is. The entropy is defined as

    $E = -\sum_i p_i \log_2 p_i$

where p_i is the proportion of pixels with gray level i.

The entropy of every region of the images in the training set is calculated, and entropy statistics are collected. The distribution of the entropy is shown in Figure 2.

Figure 2. Entropy statistics diagram

The horizontal axis shows entropy. The vertical axis shows the number of occurrences of each entropy value.

We find that the entropy of background regions is generally between 3 and 4. The entropy of regions that carry little information, or that show a uniform part of a fruit's surface, is between 4.5 and 6.77. No region has an entropy higher than about 7.6. Regions that contain useful information have an entropy larger than 6.77, so regions whose entropy is less than 6.77 are removed. Figure 3 shows some of the regions that remain under the above conditions. Figure 4 shows the result of the optimally selected regions.

Figure 3. Part of regions

Figure 4. Result of optimally selected regions

V. EXPERIMENT
A. Data set
To enable the proposed algorithm to deal with various fruit stacking forms, a set of fruit images with complex stacking forms was collected as the training database. The database covers changes in fruit type, location and quantity, and so matches the actual application. It includes red delicious apple, cherry tomato, orange, kiwi fruit, banana, sugar orange and jujube. The different kinds of fruit images are shown in Figure 5. Figure 6 shows how the number and location of fruits change across images of the same type of fruit.
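The region filter described in Section IV (the 0.2 size rule followed by the 6.77 entropy threshold) can be sketched in Python as follows. This is a minimal illustration, not the authors' code: the function names, the use of a flat list of gray levels as input, and the choice to keep a region whose entropy is exactly 6.77 are assumptions.

```python
import math
from collections import Counter


def gray_entropy(pixels):
    """Shannon entropy, in bits, of a flat sequence of gray levels."""
    total = len(pixels)
    counts = Counter(pixels)
    # E = -sum(p * log2(p)) over the gray levels that actually occur
    return -sum((c / total) * math.log2(c / total) for c in counts.values())


def keep_region(pixels, width, height, image_width, image_height,
                min_ratio=0.2, min_entropy=6.77):
    """Return True if a region passes both the size rule and the entropy rule."""
    # Size rule: drop regions narrower or shorter than 0.2x the original image.
    if width < min_ratio * image_width or height < min_ratio * image_height:
        return False
    # Entropy rule: drop regions whose gray-level entropy is below the threshold.
    return gray_entropy(pixels) >= min_entropy


# Hypothetical 64x64 regions taken from a 100x100 image.
flat = [128] * (64 * 64)                           # uniform patch, one gray level
ramp = [v for v in range(256) for _ in range(16)]  # all 256 levels equally often
print(keep_region(flat, 64, 64, 100, 100))
print(keep_region(ramp, 64, 64, 100, 100))
```

A uniform patch has zero entropy and is rejected, while a patch that uses all 256 gray levels equally often reaches the maximum of 8 bits and is kept.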

Figure 5. Different kinds of fruit images

Figure 6. The changes of number in images with the same type of fruit and the changes of location in images with the same type of fruit.

B. Experimental Process
The database contains 5330 fruit images: 4000 are used for training and 1330 for testing. Each image is passed to the selective search algorithm, which yields about twenty regions per image, so a large number of regions is obtained overall. However, some regions are too small, too tall or too wide and contain little fruit information. We remove regions whose length or width is less than 0.2 times the length or width of the original image, as well as regions the size of the original image. The entropy of each region is then calculated, and regions with entropy below 6.77 are also removed. Figure 4 shows the result of the selected regions. Finally, 54956 regions are obtained for the training set and 18421 regions for the test set.

Extracted regions are resized to 32*32, and the mean activity over the training set is subtracted from each pixel. The network is trained on the raw RGB values of the pixels [6]. Each region is given a label representing its category, and the regions and labels are fed into the CNN as inputs. The CNN is trained using stochastic gradient descent with a learning rate of 0.0001, momentum of 0.9 and weight decay of 0.0005. Training takes 20 epochs over the training set. In the comparison experiment that uses only the CNN, the training images are fed directly into the network. The parameters are the same as above, and the network is trained for 25 epochs over the training set of 4000 images. A single desktop PC with 64GB RAM was used for training the CNN.

In the end, the kind of fruit is determined by a vote over all regions. Every region extracted from a fruit image takes part in the vote, and the image is assigned to the class that receives the most votes (if two classes receive the same number of votes, the first is taken as the final result). Figure 7 shows the whole flow chart of the fruit algorithm.

Figure 7. The flow chart of the fruit algorithm

The algorithm is as follows.

1. The original fruit image is passed to the selective search algorithm, and the useful regions are obtained.
2. Calculate the entropy of each region and remove regions whose entropy falls outside the threshold range. Regions that are too small, too tall or too wide, and regions the size of the original image, are also removed.
3. The remaining regions are fed into the trained network, and each region receives a classification result.
4. Use the voting mechanism: the class with the highest number of votes is taken as the final result for the input image.

C. Experimental results
As shown in Table I, the test error rate is 0.60% when only the CNN is used, and the test error rate over all individual regions is 0.91% for selective search + CNN. The second figure is higher than the first because some extracted regions contain only a small part of a whole fruit and some are deformed when they are resized, so the error rate is high when every region is counted individually.

TABLE I.
Method                          Error rate (%)
CNN                             0.60
Selective search + CNN          0.91

TABLE II.
Method                          Recognition rate (%)
Color and texture feature + MDC 86.00488
FSCABC                          89.1
CNN                             99.4
Selective search + CNN + vote   99.77
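The voting step of Section V-B can be sketched in Python as follows. This is a minimal illustration rather than the authors' implementation: the function name `vote` is hypothetical, and breaking ties in favour of the label that appears first among the region results is only one reading of the rule that "the first is taken".

```python
from collections import Counter


def vote(region_labels):
    """Return the label with the most votes; ties go to the label seen first."""
    if not region_labels:
        raise ValueError("no regions to vote on")
    counts = Counter(region_labels)
    best = max(counts.values())
    # Walk the labels in their original order so the first-seen label wins a tie.
    for label in region_labels:
        if counts[label] == best:
            return label


# Hypothetical per-region predictions for one test image.
predictions = ["orange", "sugar orange", "orange", "orange", "kiwi fruit"]
print(vote(predictions))
```

Voting lets a few misclassified regions be outvoted by the rest, which is consistent with the image-level recognition rate in Table II being higher than the per-region accuracy implied by Table I.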

Compared with the former two methods in Table II, the CNN shows better recognition performance, and both of the latter two methods greatly improve accuracy. Furthermore, selective search combined with the CNN and voting reaches the best recognition rate, 99.77%. Hence the proposed method greatly increases the recognition rate and can meet the application requirements.

VI. CONCLUSION
In this paper, a fruit recognition algorithm based on a convolution neural network (CNN) is proposed, and the recognition rate is greatly improved. Comparing the two methods, the recognition rate of the CNN combined with the selective search algorithm is higher than that of the CNN alone. Although the method achieves a good recognition rate, the fruit database used in the experiment contains few species, and external factors such as changes in lighting were not considered. In further work we will add more fruit species to the database and focus on fruit detection and localization.

ACKNOWLEDGMENT
The authors gratefully acknowledge support from the Fujian Provincial Key Laboratory for Photonics Technology, funding from the Natural Science Foundation of China (Grant No. 61179011) and the Science and Technology Major Projects for Industry-academic Cooperation of Universities in Fujian Province (Grant No. 2013H6008), and support from the Innovation Team of the Ministry of Education (IRT1115).

REFERENCES
[1] Woo Chaw Seng and Seyed Hadi Mirisaee, “A New Method for Fruits Recognition System,” MNCC Transactions on ICT, vol. 1, no. 1, June 2009.
[2] Y. Zhang, L. Wu, “Classification of fruits using computer vision and a multiclass support vector machine,” Sensors, vol. 12, no. 9, pp. 12489-12505, 2012.
[3] R. N. Shebiah, “Fruit Recognition using Color and Texture Features,” Journal of Emerging Trends in Computing & Information Sciences, vol.??, no. 1, pp. 90-94, 2010.
[4] J. R. Uijlings, K. E. van de Sande, T. Gevers, and A. W. Smeulders, “Selective search for object recognition,” International Journal of Computer Vision, vol. 104, no. 2, pp. 154-171, 2013.
[5] S. Park, N. Kwak, “Cultural event recognition by subregion classification with convolutional neural network,” Computer Vision and Pattern Recognition Workshops, pp. 45-50, 2015.
[6] A. Krizhevsky, I. Sutskever, and G. E. Hinton, “Imagenet classification with deep convolutional neural networks,” in Advances in Neural Information Processing Systems, pp. 1097-1105, 2012.
[7] A. Karpathy, “Rich Feature Hierarchies for Accurate Object Detection and Semantic Segmentation,” Conference on Computer Vision & Pattern Recognition, vol. 1, pp. 580-587, 2014.
[8] J. Wright, A. Y. Yang, A. Ganesh, “Robust face recognition via sparse representation,” Transactions on Pattern Analysis and Machine Intelligence, vol. 31, no. 2, pp. 210-227, 2009.
[9] Ji Shuiwang, Xu Wei, Yang Ming, “3D convolutional neural networks for human action recognition,” Transactions on Pattern Analysis and Machine Intelligence, vol. 35, no. 1, pp. 221-231, 2013.
[10] J. Zhao, “On-tree fruit recognition using texture properties and color data,” International Conference on Intelligent Robots & Systems, vol. 1, pp. 263-268, 2005.
[11] Y. Song, “Automatic fruit recognition and counting from multiple images,” Biosystems Engineering, vol. 118, pp. 203-215, 2014.
[12] A. R. Jiménez, “Automatic fruit recognition: a survey and new results using Range/Attenuation images,” Pattern Recognition, vol. 32, pp. 1719-1736, 1999.
[13] P. F. Felzenszwalb and D. P. Huttenlocher, “Efficient Graph-Based Image Segmentation,” IJCV, vol. 59, pp. 167-181, 2004.
[14] G. Hinton, R. Salakhutdinov, “Reducing the dimensionality of data with neural networks,” pp. 504-507, 2006.
[15] M. L. Raymer, W. F. Punch, E. D. Goodman, “Dimensionality reduction using genetic algorithms,” Transactions on Evolutionary Computation, vol. 4, no. 2, pp. 164-171, 2000.
[16] L. Zhang, P. N. Suganthan, “A Survey of Randomized Algorithms for Training Neural Networks,” Information Sciences, 2016.
[17] L. Wang, N. Zhou, F. Chu, “A General Wrapper Approach to Selection of Class-Dependent Features,” Transactions on Neural Networks, vol. 19, no. 7, pp. 1267-1278, 2008.
[18] X. Fu, L. Wang, “Data dimensionality reduction with application to simplifying RBF network structure and improving classification performance,” Transactions on Systems Man & Cybernetics Part B Cybernetics A Publication of the IEEE Systems Man & Cybernetics Society, vol. 33, no. 3, pp. 399-409, 2003.
