Abstract. In traditional single-label classification, the association between a query instance and the class labels is mutually exclusive. In real-life applications such as music categorization, functional genomics, and text and document categorization, however, one query instance may belong to a subset of class labels, i.e. the labels are mutually inclusive. Because of the highly correlated label structure, traditional single-label classification algorithms are not sufficient, and effective algorithms that work with multiple labels are needed. Multi-label classification algorithms fall into two groups: (i) those that transform the multi-label problem into single-label binary problems, and (ii) those that adapt existing single-label algorithms to cope with multi-label problems. In this paper we present the theoretical concepts behind multi-label classification and report a comparative analysis of transformation methods using the two tools MEKA and MULAN over different application domains. Six example-based, six label-based, and four ranking-based measures are used to evaluate the efficacy of the different transformation methods.
1 Introduction
Traditional single-label supervised classification maps an example to exactly one output label. Let Q be the set of examples, Q = {q1, q2, q3, …, qn}, and ℒ be the set of labels, ℒ = {ℓ1, ℓ2, ℓ3, …, ℓm}. Single-label classification (SLC) is then defined as

SLC (ħ) : qi ↦ ℓj, where exactly one label ℓj ∈ ℒ is assigned (1)
But in real life one example may be associated with many labels. For example, in scene classification an instance may belong to beach, tree, people, or city; in music categorization, an instance may carry different emotions such as happy, sad, or pleased; likewise, in medical diagnosis, a patient may suffer from both diabetes and cancer. Since every instance may belong to more than one label, multi-label classification satisfies this need by assigning a single query example to many labels, i.e. one idea maps to multiple concepts.
Multi-Label Classification (MLC) [8] is a generalization of the supervised single-label classification task in which each data instance may be associated with a set of class labels rather than a single label, each label taking only binary values (present or absent). The task of MLC is to map an example instance qi ∈ Q to a label set Li ⊆ ℒ:

ħ : χ → 2^ℒ (2)

i.e.

MLC (ħ) : qi ↦ Li ⊆ ℒ, where |Li| ≥ 2
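The mapping in Eq. (2) can be pictured as a binary indicator matrix: one row per instance, one column per label, with a 1 wherever the instance carries that label. A minimal Python sketch (the instances and label names are made up for illustration):

```python
# Multi-label data as a binary indicator matrix: each instance maps to a
# SUBSET of the label set L (an element of the power set 2^L).
labels = ["beach", "tree", "people", "city"]   # the label set L (hypothetical)

instances = {
    "photo1": {"beach", "people"},
    "photo2": {"city"},
    "photo3": {"tree", "people", "city"},
}

def to_indicator(label_set, labels):
    """Encode a label subset as a 0/1 vector over the full label set."""
    return [1 if l in label_set else 0 for l in labels]

matrix = {q: to_indicator(s, labels) for q, s in instances.items()}
for q, row in matrix.items():
    print(q, row)                              # e.g. photo1 [1, 0, 1, 0]
```

Single-label classification is the special case where every row of this matrix contains exactly one 1.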
Nowadays the number of applications involving data with multiple target labels is increasing, so learning from such data through multi-label classification has received increased attention in recent years. Multi-label classification is approached in two ways: (i) the Problem Transformation Method and (ii) the Algorithm Adaptation Method. The problem transformation method transforms the multi-labeled data into single-labeled data, and traditional single-label classification methods are then applied over the transformed data. Since the transformed data define binary classification problems, traditional single-label classifiers are enough to build the classifier model. The drawback is that a lot of information may be lost during the multi-to-single label conversion; although problem transformation methods are fast, this information loss makes them less efficient. Algorithm adaptation methods, on the other hand, take single-label classifiers and change them to cope with multi-labeled data directly, so no information in the data is lost. In this paper, however, we make an experimental evaluation of five problem transformation methods over several data sets using machine learning tools for multi-labeled data.
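To make the transformation idea concrete, the sketch below implements the simplest problem transformation strategy, Binary Relevance (BR, one of the five methods evaluated later): the multi-label problem is split into one binary problem per label, each solved by an ordinary single-label classifier. The majority-vote base learner and the toy data are placeholders, not the base classifiers used in the experiments:

```python
# Binary Relevance: transform a multi-label problem into |L| independent
# binary problems, one per label column, and train one classifier each.

class MajorityClassifier:
    """Stand-in single-label base learner: predicts the most frequent class."""
    def fit(self, X, y):
        self.label = max(set(y), key=y.count)
        return self

    def predict(self, x):
        return self.label

def br_fit(X, Y, base=MajorityClassifier):
    """Y is a list of 0/1 rows; train one binary classifier per label column."""
    n_labels = len(Y[0])
    return [base().fit(X, [row[j] for row in Y]) for j in range(n_labels)]

def br_predict(models, x):
    """Recombine the per-label binary predictions into a 0/1 label vector."""
    return [m.predict(x) for m in models]

X = [[0.1], [0.4], [0.9]]        # toy feature vectors
Y = [[1, 0], [1, 1], [0, 1]]     # toy label matrix with 2 labels
models = br_fit(X, Y)
print(br_predict(models, [0.5])) # one 0/1 prediction per label
```

Because each label is modeled independently, this sketch also shows where BR loses information: any correlation between the two label columns is discarded.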
1.2 Challenges
(i) Loss of label correlation, i.e. discovering and modeling label dependencies
(ii) Output label sparsity, i.e. the output space is 2^ℒ instead of ℒ
(iii) Insufficient measures that account for both label and feature dimensions
[Figure: the problem transformation workflow: training data and test data are fed to a single-label base classifier, which produces the prediction.]
5 Evaluation Measures
The evaluation measures for single-label classification problems express the performance of a classifier in terms of the correctness of each example–label pair. In a multi-label classification problem each example instance is associated with a label set, and the classification of an example may be fully correct, partially correct, or incorrect, so the evaluation measures used for single-label classification problems are inadequate. Multi-label data can be characterized by the number of examples, the number of attributes in the input space, and the number of labels.

There are three types of evaluation measures for multi-label learning: (i) example-based, (ii) label-based, and (iii) ranking-based measures. Example-based measures calculate the average difference between the actual and the predicted label sets over the examples in the test data set; those discussed here are Hamming Loss, Accuracy, Precision, Recall, F1-Score, and Subset Accuracy. The label-based measures used here are Macro-Precision, Macro-Recall, Macro-F1, Micro-Precision, Micro-Recall, and Micro-F1. Ranking-based measures work on the basis of the ranking of the labels, comparing the predicted ranking against the actual labels in the data set; those used in this experimentation are one-error, coverage, ranking loss, and average precision.
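As an illustration of the example-based family, the following sketch computes Hamming Loss, Subset Accuracy, and example-based Accuracy (mean Jaccard similarity) from binary label matrices; the toy true/predicted matrices are invented for the example:

```python
# Example-based multi-label measures over binary label matrices.
# Y_true and Y_pred: one 0/1 row per test example, one column per label.

def hamming_loss(Y_true, Y_pred):
    """Fraction of individual label positions predicted wrongly (lower is better)."""
    errors = sum(t != p for yt, yp in zip(Y_true, Y_pred)
                 for t, p in zip(yt, yp))
    return errors / (len(Y_true) * len(Y_true[0]))

def subset_accuracy(Y_true, Y_pred):
    """Fraction of examples whose ENTIRE label set is predicted exactly."""
    return sum(yt == yp for yt, yp in zip(Y_true, Y_pred)) / len(Y_true)

def accuracy(Y_true, Y_pred):
    """Mean Jaccard similarity between true and predicted label sets."""
    total = 0.0
    for yt, yp in zip(Y_true, Y_pred):
        inter = sum(t and p for t, p in zip(yt, yp))
        union = sum(t or p for t, p in zip(yt, yp))
        total += inter / union if union else 1.0  # both sets empty: correct
    return total / len(Y_true)

Y_true = [[1, 0, 1], [0, 1, 0]]
Y_pred = [[1, 0, 0], [0, 1, 0]]
print(hamming_loss(Y_true, Y_pred))     # 1 wrong position out of 6
print(subset_accuracy(Y_true, Y_pred))  # only the second row matches exactly
print(accuracy(Y_true, Y_pred))
```

Note how the three measures grade the same partially correct prediction differently: the first row costs one of six positions under Hamming Loss, counts as a full miss under Subset Accuracy, and scores 0.5 under Jaccard-style Accuracy.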
[Figure: performance of the transformation methods BR, LP, CC, PS, and RAkEL on the Solar_Flare data with the base classifiers J48, SVM, NB, KNN, MLP, and RF (two bar-chart panels).]
7 Conclusion
In this paper, an experimental study of five different multi-label problem transformation methods under different evaluation metrics was presented using different application domains. The study gives useful insights into the working principles of the different methods, and a comparative performance analysis was done to assess the efficacy of the different problem transformation methods. On all three data sets, the BR method performs better than the other methods because of its independence from label correlations and its speed.
8 References
[1] M. R. Boutell, J. Luo, X. Shen, C. M. Brown, "Learning multi-label scene classification", Pattern Recognition 37(9), pp. 1757–1771, 2004.
[2] A. Elisseeff, J. Weston, "A kernel method for multi-labelled classification", in: Proceedings of the Annual ACM Conference on Research and Development in Information Retrieval, pp. 274–281, 2005.
[3] J. Read, "A pruned problem transformation method for multi-label classification", in: Proceedings of the 2008 New Zealand Computer Science Research Student Conference (NZCSRS 2008), pp. 143–150, 2008.
[4] J. Read, B. Pfahringer, G. Holmes, "Multi-label classification using ensembles of pruned sets", in: ICDM '08: Proceedings of the 2008 Eighth IEEE International Conference on Data Mining, pp. 995–1000, IEEE Computer Society, Washington, DC, USA, 2008.
[5] MULAN: http://mulan.sourceforge.net/
[6] MEKA: http://meka.sourceforge.net/
[7] K. Trohidis, G. Tsoumakas, G. Kalliris, I. Vlahavas, "Multilabel classification of music into emotions", in: Proceedings of the 9th International Conference on Music Information Retrieval, pp. 320–330, 2008.
[8] G. Tsoumakas, I. Katakis, I. Vlahavas, "Mining multi-label data", in: Data Mining and Knowledge Discovery Handbook, O. Maimon, L. Rokach (Eds.), 2nd edition, Springer, 2010.
[9] G. Tsoumakas, I. Vlahavas, "Random k-labelsets: an ensemble method for multilabel classification", in: Proceedings of the 18th European Conference on Machine Learning (ECML 2007), pp. 406–417, Warsaw, Poland, September 2007.
[10] G. Tsoumakas, R. Friberg, E. Spyromitros-Xioufis, I. Katakis, J. Vilcek, "Mulan software: Java classes for multi-label classification", available at: http://mlkd.csd.auth.gr/multilabel.html#Software
[11] A. Wieczorkowska, P. Synak, Z. Ras, "Multi-label classification of emotions in music", in: Proceedings of the International Conference on Intelligent Information Processing and Web Mining, pp. 307–315, 2006.