JOURNAL OF SOFTWARE ENGINEERING & INTELLIGENT SYSTEMS

ISSN 2518-8739
31st December 2017, Volume 2, Issue 3, JSEIS, CAOMEI Copyright © 2016-2017
www.jseis.org

Multi-classifier method based on voting technique for mammogram image classification
1Mohamed Alhaj Alobeed, 2Ali Ahmed, 3Ashraf Osman Ibrahim
1Information Technology, Shendi University, Shendi, Sudan
2Faculty of Computer Science and Information Technology, Karary University, Khartoum North, 12305, Sudan
1,3Faculty of Computer Science and Information Technology, Alzaiem Alazhari University, Khartoum North 13311, Sudan
3Faculty of Computer Science, Future University, Khartoum, Sudan
Email: 1mohamedelhaj123@hotmail.com, 2alikarary@gmail.com, 3ashrafosman2@gmail.com

ABSTRACT
Breast cancer is the most common malignancy affecting the female population and the second leading cause of cancer
deaths among all cancer types in the developing countries. At present there is no sure way to prevent breast cancer,
because its cause is not yet fully known. However, early detection of breast cancer can play an important role in
reducing the associated morbidity and mortality rates. The basic idea of this paper is to propose a classification
method based on multi-classifier voting that can aid the physician in mammogram image classification. The study
consists of five phases: image collection, pre-processing (cropping the ROI), feature extraction, classification, and
testing and evaluation. The experimental results show that voting achieves an accuracy of 87.50 %, which is a good
classification result compared to the individual classifiers.
Keywords: mammograms; breast cancer; multi classifier voting; early detection; image classification;
1. INTRODUCTION
Breast cancer affects women of all ages and ethnic groups. In spite of decades of breast cancer research on
diagnosis and treatment, prevention continues to be the sole way to lower this disease's human toll, which currently
affects 1 in 8 women in their lifetime [1]. In the United States in 2012, an estimated 227,000 women and 2,200 men
were expected to be diagnosed with this cancer, and around 40,000 women were expected to succumb to it [2]. The term
"breast cancer" covers more than one disease; it is an umbrella term for various cancer subtypes of the human
breast. Breast cancer subtypes differ in clinical presentation and show clear-cut gene expression patterns, in addition
to having different genetic and molecular characteristics [3, 4]. The subtypes have some shared and some unique
causes and contributing factors, which influence prevention approaches. Mammography cannot stop or decrease breast
cancer; it only supports detection of breast cancer at early stages, which increases the survival rate [5]. Regular
screening can be a successful strategy to identify the early symptoms of breast cancer in mammographic images [6].
Medical image classification is a form of data analysis that extracts models describing important data classes.
Numerous methods have been created to classify masses into benign and malignant categories using the
multi-classifier method [7]. In [8], the researcher proposed a computer-aided diagnosis system to detect cancer
automatically in mammograms without the help of a radiologist or medical specialist. Enhancement is then performed so
that the cancer is clearly visible and identifiable. The results show that the proposed method achieved 96.74%
accuracy and 98.34% sensitivity.
In [9], researchers compared the performance of an Artificial Neural Network (ANN), a Bayesian Network and a Hybrid
Network used to predict breast cancer prognosis. The Hybrid Network combined the ANN and the Bayesian Network.
Nine clinically accepted variables from the SEER data were used as inputs for the networks. They achieved an
accuracy of 88.8% using the ANN and 87.2% using the Hybrid Network; both results outperformed the Bayesian
Network result.
The number of classification methods is large and constantly increasing [10]. The aim of this study is to evaluate
classification methods for medical images and to develop a multi-classifier for mammograms based on the method of
voting (fusion). Voting is an ensemble method used to combine the decisions of multiple classifiers.

In [11], researchers used a voting technique to choose among the answers produced by functionally equivalent
versions. More recent research, presented in [12], concerned the identification of breast cancer patients for
whom chemotherapy could prolong survival time, treated as a data mining problem.
In this paper, we use voting, an aggregation technique that combines the decisions of multiple classifiers, to
classify mammograms as normal or abnormal (either benign or malignant). In its simplest form, based on plurality or
majority voting, each individual classifier contributes a single vote. The aggregated prediction is decided by the
majority of the votes, i.e. the class with the most votes is the final classification.
The remainder of this paper is organized as follows: Section 2 introduces the materials and methods, voting
algorithm and technique. The experiment is given in Section 3. Results and discussions are provided in Section 4.
Finally, Section 5 concludes the study.
2. MATERIALS AND METHODS
This study consists of five phases: image collection, pre-processing, feature extraction, individual
classification, and testing and evaluation. Each phase is described in detail below. Figure 1 shows the five-phase
research method.

Figure. 1 Research phases


2.1 Mammogram images collection
The dataset used in this study was downloaded from the MIAS (Mammographic Image Analysis Society) database website
[13]. MIAS is a standard, publicly available dataset that has recently been used by many researchers, and it is used
for experimentation in this study. Each mammogram is 1024 × 1024 pixels at 200 micron resolution. MIAS contains a
total of 322 mammograms of both breasts (left and right) of 161 patients.
2.1.1 Image cropping based on ROI
The next step is to extract Regions of Interest (ROIs). ROIs are defined as regions containing user-defined objects
of interest. A cropping operation was applied to the images in order to cut out the parts of interest. Cropping
removes the unwanted parts of the image, usually peripheral to the regions of interest, as shown in Figure 2.

Figure. 2 Full Mammogram with detected region of interest
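
The cropping step can be illustrated with a minimal sketch. It assumes that each annotated region is described by a
centre (x, y) and a radius in pixels, and that the annotation origin is the bottom-left corner of the image, so the
vertical axis is flipped before indexing rows. The function name `crop_roi` and the nested-list image layout are
illustrative choices, not the implementation used in this study.

```python
def crop_roi(image, x, y, radius):
    """Return the square patch of side about 2*radius + 1 centred on (x, y).

    image  : list of rows (row 0 is the TOP row), each a list of pixel values
    x, y   : centre of the annotated region, with the origin assumed to be
             the BOTTOM-left corner of the image
    radius : approximate radius, in pixels, of the circle enclosing the region
    """
    height = len(image)
    width = len(image[0])

    # Convert the bottom-left-origin y coordinate into a top-down row index.
    row_centre = height - 1 - y

    # Clamp the square to the image borders.
    top = max(0, row_centre - radius)
    bottom = min(height, row_centre + radius + 1)
    left = max(0, x - radius)
    right = min(width, x + radius + 1)

    return [row[left:right] for row in image[top:bottom]]


# Tiny synthetic 6 x 6 "image" whose pixel value encodes (row, column).
demo = [[r * 10 + c for c in range(6)] for r in range(6)]
patch = crop_roi(demo, x=2, y=3, radius=1)
```

Regions that touch the image border are clipped rather than padded, so patches near the edge may be smaller than
the nominal size.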


2.1.2 Feature extraction
Accurate classification and a good diagnostic rate depend mainly on robust features, particularly when dealing
with mammograms. The Region of Interest (ROI) is first cropped using the [x] position, [y] position and [radius]
given in the MIAS dataset. This stage then applies six functions (Mean, Standard Deviation, Skewness, Kurtosis,
Contrast, Smoothness) to extract the feature values from each mammogram image. The following paragraphs give more
details about the six functions used to extract feature values.
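
As a hedged illustration of the six features, the sketch below computes them from a flat list of ROI pixel
intensities. The exact formulas used in this study are not stated, so the definitions here are assumptions:
population moments for mean, standard deviation, skewness and kurtosis; smoothness as R = 1 - 1/(1 + variance), with
the variance normalised by (levels - 1) squared; and contrast as the Michelson ratio (max - min)/(max + min). The
function name `first_order_features` is hypothetical.

```python
import math


def first_order_features(pixels, levels=256):
    """Compute six first-order statistics of a flat list of pixel intensities.

    All definitions below are illustrative assumptions (see lead-in text).
    """
    n = len(pixels)
    mean = sum(pixels) / n
    variance = sum((p - mean) ** 2 for p in pixels) / n  # population variance
    std = math.sqrt(variance)

    if std == 0:
        # A constant patch has no asymmetry or tail weight to measure.
        skewness = 0.0
        kurtosis = 0.0
    else:
        skewness = sum((p - mean) ** 3 for p in pixels) / n / std ** 3
        kurtosis = sum((p - mean) ** 4 for p in pixels) / n / std ** 4

    # Smoothness: 0 for a constant patch, approaching 1 for high variance.
    normalised_variance = variance / (levels - 1) ** 2
    smoothness = 1 - 1 / (1 + normalised_variance)

    # Contrast (assumed Michelson definition).
    lowest, highest = min(pixels), max(pixels)
    contrast = (highest - lowest) / (highest + lowest) if highest + lowest > 0 else 0.0

    return {
        "mean": mean,
        "std": std,
        "skewness": skewness,
        "kurtosis": kurtosis,
        "contrast": contrast,
        "smoothness": smoothness,
    }


features = first_order_features([0, 0, 255, 255])
```

Each ROI then becomes one row of six numeric values, which is the form the individual classifiers take as input.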
2.1.3 Individual Classification
The previous three phases convert the image data to numeric values. In this stage we apply five individual
classifiers, namely SVM, Naïve Bayes, K-nearest Neighbours, Decision Tree and Artificial Neural Network.
Classification is the process of assigning features to their respective classes, such as normal and abnormal or
benign and malignant. In this paper we apply the voting method to five classifiers (Decision Tree, ANN, BNC, KNN,
SVM) on medical images taken from the MIAS data set. In the next paragraphs, we present a brief overview of the five
classifiers that are used in the classification stage of the mammogram images.
a) Decision tree
Decision tree induction is the learning of decision trees from class-labeled training tuples. A decision tree is a
flowchart-like tree structure, where each internal node (non-leaf node) denotes a test on an attribute, each branch
represents an outcome of the test, and each leaf node (or terminal node) holds a class label. The topmost node in a tree
is the root node [14].
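
The structure described above can be sketched as a nested dictionary in which internal nodes hold a test and leaf
nodes hold a class label. The tree below is hand-written for illustration only: the feature names and thresholds are
hypothetical and were not learned from the MIAS data.

```python
def classify(node, sample):
    """Walk from the root to a leaf and return the leaf's class label."""
    while "label" not in node:
        # Internal node: test one attribute and follow the matching branch.
        if sample[node["feature"]] <= node["threshold"]:
            node = node["left"]
        else:
            node = node["right"]
    return node["label"]


# Hypothetical tree: the root tests "mean", its right child tests "contrast".
tree = {
    "feature": "mean",
    "threshold": 100.0,
    "left": {"label": "normal"},
    "right": {
        "feature": "contrast",
        "threshold": 0.5,
        "left": {"label": "benign"},
        "right": {"label": "malignant"},
    },
}

prediction = classify(tree, {"mean": 150.0, "contrast": 0.7})
```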
b) Support vector machine classifier
Support vector machine (SVM) is a supervised learning method, based on statistical learning theory, that is used to
analyse data and recognize patterns. Its benefits are that it can handle continuous and binary attributes and that
its classification speed and accuracy are good. Its drawbacks are that it takes longer to train on a dataset and
does not handle discrete attributes [15].
c) K-nearest neighbours classifier
In pattern classification, the k-nearest neighbour (K-NN) method is a non-parametric algorithm. It was first
described in the early 1950s. The method is labour intensive when given large training sets, and did not gain
popularity until the 1960s, when increased computing power became available. It has since been widely used in the
area of pattern recognition. Nearest-neighbour classifiers are based on learning by analogy, that is, by comparing
a given test tuple with training tuples that are similar to it [16].
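
Learning by analogy can be shown with a minimal sketch: the query tuple is compared with every training tuple and
takes the most frequent label among its k closest neighbours. The Euclidean distance, the value k = 3 and the toy
feature vectors are assumptions made for illustration, not the settings used in this study.

```python
import math
from collections import Counter


def knn_predict(train_x, train_y, query, k=3):
    """Return the most common label among the k training tuples nearest to query."""
    # Pair every training tuple's distance to the query with its label.
    distances = sorted(
        (math.dist(x, query), label) for x, label in zip(train_x, train_y)
    )
    # Count the labels of the k nearest neighbours.
    votes = Counter(label for _, label in distances[:k])
    return votes.most_common(1)[0][0]


# Toy two-feature training set (hypothetical values).
train_x = [(0, 0), (0, 1), (1, 0), (5, 5), (5, 6), (6, 5)]
train_y = ["normal", "normal", "normal", "abnormal", "abnormal", "abnormal"]

label = knn_predict(train_x, train_y, (0.2, 0.2))
```

Because every prediction scans the whole training set, the cost grows with the number of training tuples, which is
the labour-intensive behaviour noted above.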
d) Artificial neural network classification
Artificial Neural Network (ANN) has emerged as an important tool for classification. Neural networks were
introduced by McCulloch and Pitts in 1943. The artificial neuron is a computer-simulated model inspired by natural
neurons. A natural neuron fires and sends a signal along its axon once the incoming signal reaches a certain
threshold. This signal then travels to other neurons and may reach the control unit (the brain) for a proper
action [17].
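
The threshold behaviour can be sketched with a single artificial neuron that fires when the weighted sum of its
inputs reaches a threshold. The weights and threshold below are hypothetical values chosen so that the neuron
behaves like a logical AND; the networks used for classification combine many such units and learn their weights
from data.

```python
def neuron_fires(inputs, weights, threshold):
    """Return True when the weighted sum of the inputs reaches the threshold."""
    activation = sum(w * x for w, x in zip(weights, inputs))
    return activation >= threshold


# With unit weights and a threshold of 2, both inputs must be active.
both_active = neuron_fires([1, 1], weights=[1, 1], threshold=2)
one_active = neuron_fires([1, 0], weights=[1, 1], threshold=2)
```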
e) Naïve Bayes classifier
Bayesian classifiers are statistical classifiers. They can predict class membership probabilities such as the
probability that a given tuple belongs to a particular class. Bayesian classifiers have also exhibited high accuracy and
speed when applied to large databases. Naive Bayesian classifiers assume that the effect of an attribute value on a
given class is independent of the values of the other attributes [18].
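
The independence assumption can be illustrated with a small Gaussian Naïve Bayes sketch, in which the log-likelihood
of each attribute is simply added to the log prior of the class. Modelling each attribute with a normal distribution,
the small variance floor `eps` and the toy data are assumptions made for illustration.

```python
import math
from collections import defaultdict


def fit_gaussian_nb(rows, labels, eps=1e-9):
    """Estimate a prior plus per-attribute mean and variance for every class."""
    by_class = defaultdict(list)
    for row, label in zip(rows, labels):
        by_class[label].append(row)

    model = {}
    for label, members in by_class.items():
        n = len(members)
        columns = list(zip(*members))
        means = [sum(col) / n for col in columns]
        # eps keeps the variance strictly positive for constant attributes.
        variances = [
            sum((v - m) ** 2 for v in col) / n + eps
            for col, m in zip(columns, means)
        ]
        model[label] = (n / len(rows), means, variances)
    return model


def predict_gaussian_nb(model, row):
    """Return the class with the highest log posterior for one tuple."""
    def log_posterior(label):
        prior, means, variances = model[label]
        total = math.log(prior)
        # Independence assumption: attribute log-likelihoods are summed.
        for value, mean, var in zip(row, means, variances):
            total += -0.5 * math.log(2 * math.pi * var) - (value - mean) ** 2 / (2 * var)
        return total

    return max(model, key=log_posterior)


rows = [(1.0, 2.0), (1.2, 1.8), (0.8, 2.2), (5.0, 6.0), (5.2, 5.8), (4.8, 6.2)]
labels = ["normal", "normal", "normal", "abnormal", "abnormal", "abnormal"]
nb_model = fit_gaussian_nb(rows, labels)
nb_label = predict_gaussian_nb(nb_model, (1.1, 2.0))
```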
f) Development of multi-classifier based on voting method
In this phase, we propose a multi-classifier based on the individual results obtained by each single classifier
discussed above. The concept of our proposed approach depends on the voting method. Majority voting is used to
produce the final output for the given data: the voting technique selects the majority output from the experimental
results of the five algorithms. For each mammogram image, the classification therefore has five outputs, one per
classifier. Voting becomes difficult when the outputs of the five algorithms are tied in the majority vote. Figure 3
describes the voting algorithm.

Figure. 3 Voting algorithm
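
A minimal sketch of majority voting over the five classifier outputs is given below. How ties are resolved is not
specified in the text, so the optional `tie_break` priority list is an assumption: when it is omitted, a tie
returns `None` so the caller can decide what to do.

```python
from collections import Counter


def majority_vote(predictions, tie_break=None):
    """Return the label with the most votes, or resolve/report a tie.

    predictions : one predicted label per classifier
    tie_break   : optional list of labels in priority order (an assumption);
                  the first tied label found in it wins
    """
    ranked = Counter(predictions).most_common()
    top_count = ranked[0][1]
    winners = [label for label, count in ranked if count == top_count]

    if len(winners) == 1:
        return winners[0]

    # Tie: fall back on the caller's priority order, if one was given.
    if tie_break is not None:
        for label in tie_break:
            if label in winners:
                return label
    return None


# Outputs of the five classifiers for one image (hypothetical values).
votes = ["malignant", "benign", "malignant", "normal", "malignant"]
final = majority_vote(votes)
```

With five voters and three possible classes, a 2-2-1 split is the tie case that the text describes as difficult.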


3. EXPERIMENT
The study contains two main processes. In the first, each classifier is built using 60, 70 and 85 percent of the data
set (72, 84 and 95 of the 119 mammogram images, respectively) as the training set. After the classifier is built, the
remaining 40, 30 and 15 percent of the data (47, 35 and 24 images, respectively) is used in the test stage. The
results are presented in the next section. To test the performance of the proposed method, accuracy is used as the
quantitative measure. It is calculated using Equation 1:
$$\text{Accuracy} = \frac{TP + TN}{TP + TN + FP + FN} \tag{1}$$

where TP is true positive, FP is false positive, FN is false negative and TN is true negative.
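
Accuracy can be computed directly from the four confusion-matrix counts, as in the short sketch below. The counts
in the example are hypothetical and are not taken from the experiments; they are chosen so that 21 of 24 test images
are classified correctly.

```python
def accuracy(tp, tn, fp, fn):
    """Fraction of correctly classified cases: (TP + TN) / (TP + TN + FP + FN)."""
    total = tp + tn + fp + fn
    if total == 0:
        raise ValueError("at least one test case is required")
    return (tp + tn) / total


# Hypothetical counts for a 24-image test set.
score = accuracy(tp=12, tn=9, fp=2, fn=1)
```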

4. RESULTS AND DISCUSSION


In this study, the MIAS data set was used with five individual classifiers, and multi-classifier voting was applied
to the continuous data. The highest accuracy, 87.50 %, was obtained with the 85 % training split, while the accuracy
was 84.28 % with the 70 % split and 76.59 % with the 60 % split. In general, the accuracy increased after applying
voting compared with the five individual classifiers, as shown in Table 1.
Table. 1 Results of the five classifiers

Data set   Tree      BNC       ANN       KNN       SVM       Voting
60 – 40    72.34 %   57.50 %   57.44 %   68.75 %   51.06 %   76.59 %
70 – 30    80.00 %   57.11 %   62.44 %   73.33 %   42.86 %   84.28 %
85 – 15    75.00 %   58.33 %   66.67 %   70.00 %   50.00 %   87.50 %

After applying three different sizes of training and testing sets, we calculated the overall accuracy. The final
results are shown in Table 1 and Figure 4. Our multi-classifier method outperformed the single classifiers: for every
split, voting produced higher accuracy than any of the individual methods. This result shows the benefit of a method
that combines several classifiers.

Figure. 4 Result of classification and voting accuracy

We compared the five individual classifiers (Decision Tree, ANN, BNC, KNN, and SVM) with the proposed method based
on voting. Figure 5 shows the experimental results of the individual classifiers and the voting method.

Figure. 5 The compared results multi-classifier and voting method


The main measure of comparison is accuracy. In a previous study [19], researchers proposed a method to classify
movie documents into positive or negative opinions, consisting of three classifiers based on Decision Tree, ME and
score calculation. Using two voting methods (naïve and weighted) and integration with SVMs, the classification
accuracy achieved was 85.8% for naïve voting, 86.4% for weighted voting and 87.1% for SVM integration. Our result of
87.50% accuracy is comparable to that work. Future work can explore optimizing the classifiers to improve the
accuracy.
5. CONCLUSION
This study aimed to build and implement the voting method on five classifiers (Decision Tree, ANN, BNC, KNN,
SVM). The classifiers were applied to medical images taken from the MIAS data set. The study contained two main
processes: first, each classifier was built using 60, 70 and 85 percent of the data set as the training set; then
the remaining 40, 30 and 15 percent of the data was used in the test stage. Voting reached an accuracy of 87.50 %
with the 85 – 15 split.
REFERENCES

1. Pareek, A. and S.M. Arora, Breast cancer detection techniques using medical image processing. Breast
cancer, 2017. 2(3).
2. Horner, M., et al., SEER cancer statistics review. National Cancer Institute: p. 1975-2006.
3. Curtis, C., et al., The genomic and transcriptomic architecture of 2,000 breast tumours reveals novel
subgroups. Nature, 2012. 486(7403): p. 346-352.
4. Perou, C.M., et al., Molecular portraits of human breast tumours. Nature, 2000. 406(6797): p. 747-752.
5. Mencattini, A., et al., Mammographic images enhancement and denoising for breast cancer detection using
dyadic wavelet processing. IEEE transactions on instrumentation and measurement, 2008. 57(7): p. 1422-1430.
6. Zhang, G., et al. A computer aided diagnosis system in mammography using artificial neural networks. in
BioMedical Engineering and Informatics, 2008. BMEI 2008. International Conference on. 2008: IEEE.
7. Smith, R.A., V. Cokkinides, and H.J. Eyre, American Cancer Society guidelines for the early detection of
cancer, 2006. CA: a cancer journal for clinicians, 2006. 56(1): p. 11-25.
8. Jaffar, M.A., Hybrid Texture based Classification of Breast Mammograms using Adaboost Classifier.
International Journal of Advanced Computer Science and Applications, 2017. 8(5).
9. Choi, J.P., T.H. Han, and R.W. Park, A hybrid bayesian network model for predicting breast cancer
prognosis. Journal of Korean Society of Medical Informatics, 2009. 15(1): p. 49-57.
10. Anunciaçao, O., et al. A Data Mining Approach for the Detection of High-Risk Breast Cancer Groups. in
IWPACBB. 2010: Springer.
11. Vouk, M.A., et al., An empirical evaluation of consensus voting and consensus recovery block reliability in
the presence of failure correlation. Journal of Computer and Software Engineering, 1993. 1(4): p. 367-388.
12. Lee, Y.J., O.L. Mangasarian, and W.H. Wolberg. Survival-Time Classification of Breast Cancer Patients. 2008
[cited 2017]; Available from: http://www.cs.wisc.edu/dmi/annrev/rev0601/uj.ppt.
13. Clark, A.F. The mini-MIAS database of mammograms. 2012 [cited 2017]; Available from:
http://peipa.essex.ac.uk/info/mias.html.
14. Usha, S. and S. Arumugam, Calcification Classification in Mammograms Using Decision Trees. World
Academy of Science, Engineering and Technology, International Journal of Computer, Electrical,
Automation, Control and Information Engineering, 2016. 9(9): p. 2127-2131.
15. Arning, A., R. Agrawal, and P. Raghavan. A Linear Method for Deviation Detection in Large Databases. in
KDD. 1996.
16. Min Dong, Z.W., Chenghui Dong, Xiaomin Mu, Yide Ma, Classification of Region of Interest in
Mammograms Using Dual Contourlet Transform and Improved KNN. Journal of Sensors, 2017. 2017: p. 15.
17. Gershenson, C., Artificial neural networks for beginners. arXiv preprint cs/0308031, 2003.
18. Han, J., J. Pei, and M. Kamber, Data mining: concepts and techniques. 2011: Elsevier.
19. Tsutsumi, K., K. Shimada, and T. Endo. Movie Review Classification Based on a Multiple Classifier. in
PACLIC. 2007.

AUTHORS PROFILE
