
Weka: Practical Machine Learning Tools and Techniques

with Java Implementations


Ian H. Witten, Eibe Frank, Len Trigg, Mark Hall, Geoffrey Holmes, and Sally Jo Cunningham,
Department of Computer Science, University of Waikato, New Zealand.

Introduction

The Waikato Environment for Knowledge Analysis (Weka) is a comprehensive suite of Java class libraries that implement many state-of-the-art machine learning and data mining algorithms. Weka is freely available on the World-Wide Web and accompanies a new text on data mining [1] which documents and fully explains all the algorithms it contains. Applications written using the Weka class libraries can be run on any computer with a Web browsing capability; this allows users to apply machine learning techniques to their own data regardless of computer platform.

Tools are provided for pre-processing data, feeding it into a variety of learning schemes, and analyzing the resulting classifiers and their performance. An important resource for navigating through Weka is its on-line documentation, which is automatically generated from the source.

The primary learning methods in Weka are "classifiers", which induce a rule set or decision tree that models the data. Weka also includes algorithms for learning association rules and clustering data. All implementations have a uniform command-line interface. A common evaluation module measures the relative performance of several learning algorithms over a given data set.

Tools for pre-processing the data, or "filters," are another important resource. Like the learning schemes, filters have a standardized command-line interface with a set of common command-line options.

The Weka software is written entirely in Java to facilitate the availability of data mining tools regardless of computer platform. The system is, in sum, a suite of Java packages, each documented to provide developers with state-of-the-art facilities.

Javadoc and the class library

One advantage of developing a system in Java is its automatic support for documentation. Descriptions of each of the class libraries are automatically compiled into HTML, providing an invaluable resource for programmers and application developers alike.

The Java class libraries are organized into logical packages—directories containing a collection of related classes. The set of packages is illustrated in Figure 1. They provide interfaces to pre-processing routines including feature selection, classifiers for both categorical and numeric learning tasks, meta-classifiers for enhancing the performance of classifiers (for example, boosting and bagging), evaluation according to different criteria (for example, accuracy, entropy, root mean-squared error, cost-sensitive classification, etc.), and experimental support for verifying the robustness of models (cross-validation, bias-variance decomposition, and calculation of the margin).

Figure 1: Package hierarchy in Weka

Weka's core

The core package contains classes that are accessed from almost every other class in Weka. The most important classes in it are Attribute, Instance, and Instances. An object of class Attribute represents an attribute—it contains the attribute's name, its type, and, in the case of a nominal attribute, its possible values. An object of class Instance contains the attribute values of a particular instance; and an object of class Instances contains an ordered set of instances—in other words, a dataset.
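
As a concrete illustration of these core classes, the short program below assembles a two-attribute dataset in memory. It is a minimal sketch assuming a current Weka release (3.7+ API), in which the core classes live in weka.core and DenseInstance is the standard Instance implementation; the dataset and attribute names are invented for the example.

import java.util.ArrayList;

import weka.core.Attribute;
import weka.core.DenseInstance;
import weka.core.Instance;
import weka.core.Instances;

public class CoreClassesDemo {
    public static void main(String[] args) {
        // A numeric attribute and a nominal attribute with two possible values.
        Attribute length = new Attribute("length");
        ArrayList<String> grades = new ArrayList<>();
        grades.add("good");
        grades.add("bad");
        Attribute grade = new Attribute("grade", grades);

        // An Instances object is an ordered set of instances, i.e. a dataset.
        ArrayList<Attribute> attrs = new ArrayList<>();
        attrs.add(length);
        attrs.add(grade);
        Instances data = new Instances("demo", attrs, 0);
        data.setClassIndex(data.numAttributes() - 1);

        // An Instance holds the attribute values of a single example.
        Instance row = new DenseInstance(data.numAttributes());
        row.setDataset(data);
        row.setValue(length, 4.2);
        row.setValue(grade, "good");
        data.add(row);

        System.out.println(data);
    }
}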
Data Pre-Processing

Weka's pre-processing capability is encapsulated in an extensive set of routines, called filters, that enable data to be processed at the instance and attribute value levels. Table 1 lists the most important filter algorithms that are included.

weka.filter.AddFilter
weka.filter.DeleteFilter
weka.filter.MakeIndicatorFilter
weka.filter.MergeAttributeValuesFilter
weka.filter.NominalToBinaryFilter
weka.filter.SelectFilter
weka.filter.ReplaceMissingValuesFilter
weka.filter.SwapAttributeValuesFilter
weka.filter.DiscretiseFilter
weka.filter.NumericTransformFilter

Table 1: The filter algorithms in Weka

General manipulation of attributes

Many of the filter algorithms provide facilities for general manipulation of attributes. For example, the first two items in Table 1, AddFilter and DeleteFilter, insert and delete attributes. MakeIndicatorFilter transforms a nominal attribute into a binary indicator attribute. This is useful when a multi-class attribute should be represented as a two-class attribute.
In some cases it is desirable to merge two values of a nominal attribute into a single value. This can be done in a straightforward way using MergeAttributeValuesFilter. The name of the new value is a concatenation of the two original ones.

Some learning schemes—for example, support vector machines—can only handle binary attributes. The advantage of binary attributes is that they can be treated as either nominal or numeric. NominalToBinaryFilter transforms multi-valued nominal attributes into binary attributes.

SelectFilter is used to delete all instances from a dataset that exhibit one of a particular set of nominal attribute values, or a numeric value below or above a certain threshold.

One way of dealing with missing values is to replace them globally before the learning scheme is applied. ReplaceMissingValuesFilter substitutes the mean (for numeric attributes) or the mode (for nominal attributes) for each missing value.
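
The following sketch shows how such a filter can be applied from Java code. It assumes a current Weka release, where the filter classes have moved to the weka.filters packages (for example, weka.filters.unsupervised.attribute.ReplaceMissingValues rather than weka.filter.ReplaceMissingValuesFilter); the file name mydata.arff is a placeholder.

import weka.core.Instances;
import weka.core.converters.ConverterUtils.DataSource;
import weka.filters.Filter;
import weka.filters.unsupervised.attribute.ReplaceMissingValues;

public class ReplaceMissingDemo {
    public static void main(String[] args) throws Exception {
        // Load a dataset; "mydata.arff" is a placeholder file name.
        Instances raw = DataSource.read("mydata.arff");

        // Configure the filter on the input format, then push the data through it.
        ReplaceMissingValues filter = new ReplaceMissingValues();
        filter.setInputFormat(raw);
        Instances filled = Filter.useFilter(raw, filter);

        System.out.println("Instances after filtering: " + filled.numInstances());
    }
}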
Transforming numeric attributes

Some filters pertain specifically to numeric attributes. For example, an important filter for practical applications is the DiscretiseFilter. It implements an unsupervised and a supervised discretization method. The unsupervised method implements equal-width binning. If the index of a class attribute is set, the method will perform supervised discretization using MDL [2].

In some applications it is appropriate to transform a numeric attribute before a learning scheme is applied, for example, to replace each value by its square root. NumericTransformFilter transforms all numeric attributes among the selected attributes using a user-specified transformation function.
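
A similar pattern applies to discretization. The sketch below uses the unsupervised equal-width variant as found in current Weka releases (weka.filters.unsupervised.attribute.Discretize); a supervised, MDL-based variant lives in weka.filters.supervised.attribute. Class names and the file name are assumptions based on recent versions rather than the exact classes listed in Table 1.

import weka.core.Instances;
import weka.core.converters.ConverterUtils.DataSource;
import weka.filters.Filter;
import weka.filters.unsupervised.attribute.Discretize;

public class DiscretizeDemo {
    public static void main(String[] args) throws Exception {
        Instances data = DataSource.read("mydata.arff");   // placeholder file name

        // Unsupervised equal-width binning of all numeric attributes into 10 bins.
        Discretize disc = new Discretize();
        disc.setBins(10);
        disc.setInputFormat(data);
        Instances binned = Filter.useFilter(data, disc);

        // For supervised (MDL-based) discretization, set the class index first and
        // use weka.filters.supervised.attribute.Discretize instead.
        System.out.println(binned.attribute(0));
    }
}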
Feature Selection

Another essential data engineering component of any applied machine learning system is the ability to select potentially relevant features for inclusion in model induction. The Weka system provides three feature selection systems: a locally produced correlation based technique [3], the wrapper method and Relief [4].
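
As an illustration, the correlation-based technique is exposed in current releases through the weka.attributeSelection package, as sketched below; the evaluator and search class names (CfsSubsetEval, BestFirst) and the input file are assumptions based on recent Weka versions rather than the release described here.

import weka.attributeSelection.AttributeSelection;
import weka.attributeSelection.BestFirst;
import weka.attributeSelection.CfsSubsetEval;
import weka.core.Instances;
import weka.core.converters.ConverterUtils.DataSource;

public class FeatureSelectionDemo {
    public static void main(String[] args) throws Exception {
        Instances data = DataSource.read("mydata.arff");   // placeholder file name
        data.setClassIndex(data.numAttributes() - 1);

        // Correlation-based feature subset selection with best-first search.
        AttributeSelection selector = new AttributeSelection();
        selector.setEvaluator(new CfsSubsetEval());
        selector.setSearch(new BestFirst());
        selector.SelectAttributes(data);

        // Indices of the retained attributes (the class attribute is included).
        int[] kept = selector.selectedAttributes();
        System.out.println(java.util.Arrays.toString(kept));
    }
}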

Learning schemes

Weka contains implementations of many algorithms for classification and numeric prediction, the most important of which are listed in Table 2. Numeric prediction is interpreted as prediction of a continuous class. The Classifier class defines the general structure of any scheme for classification or numeric prediction.

weka.classifiers.ZeroR
weka.classifiers.OneR
weka.classifiers.NaiveBayes
weka.classifiers.DecisionTable
weka.classifiers.IBk
weka.classifiers.j48.J48
weka.classifiers.j48.PART
weka.classifiers.SMO
weka.classifiers.LinearRegression
weka.classifiers.m5.M5Prime
weka.classifiers.LWR
weka.classifiers.DecisionStump

Table 2: The basic learning schemes in Weka

The most primitive learning scheme in Weka, ZeroR, predicts the majority class in the training data for problems with a categorical class value, and the average class value for numeric prediction problems. It is useful for generating a baseline performance that other learning schemes are compared to. In some cases it is possible that other learning schemes perform worse than ZeroR, an indicator of substantial overfitting.

The next scheme, OneR, produces very simple rules based on a single attribute [5]. NaiveBayes implements the probabilistic Naïve Bayesian classifier. DecisionTable employs the wrapper method to find a good subset of attributes for inclusion in the table. This is done using a best-first search. IBk is an implementation of the k-nearest-neighbours classifier [6]. The number of nearest neighbours (k) can be set manually, or determined automatically using cross-validation.
J48 is an implementation of C4.5 release 8 [7] that produces decision trees. This is a standard algorithm that is widely used for practical machine learning. PART is a more recent scheme for producing sets of rules called "decision lists"; it works by forming partial decision trees and immediately converting them into the corresponding rule. SMO implements the "sequential minimal optimization" algorithm for support vector machines, which are an important new paradigm in machine learning [8].
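
To show how one of these classifiers is invoked programmatically, the sketch below trains the C4.5-style tree learner. It assumes a current Weka release, where the class is weka.classifiers.trees.J48 rather than weka.classifiers.j48.J48 as listed in Table 2; the ARFF file name is a placeholder.

import weka.classifiers.trees.J48;
import weka.core.Instances;
import weka.core.converters.ConverterUtils.DataSource;

public class TrainJ48Demo {
    public static void main(String[] args) throws Exception {
        Instances data = DataSource.read("mydata.arff");   // placeholder file name
        data.setClassIndex(data.numAttributes() - 1);      // last attribute is the class

        // Build a C4.5-style decision tree and print its textual representation.
        J48 tree = new J48();
        tree.buildClassifier(data);
        System.out.println(tree);

        // Predict the class of the first training instance.
        double label = tree.classifyInstance(data.instance(0));
        System.out.println("Predicted: " + data.classAttribute().value((int) label));
    }
}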
The next three learning schemes in Table 2 represent methods for numeric prediction. The simplest is linear regression. M5Prime is a rational reconstruction of Quinlan's M5 model tree inducer [9]. LWR is an implementation of a more sophisticated learning scheme for numeric prediction, using locally weighted regression [10].

DecisionStump builds simple binary decision "stumps" (1-level decision trees) for both numeric and nominal classification problems. It copes with missing values by extending a third branch from the stump—in other words, by treating "missing" as a separate attribute value. DecisionStump is mainly used in conjunction with the LogitBoost boosting method, discussed in the next section.
Meta-Classifiers

Recent developments in computational learning theory have led to methods that enhance the performance or extend the capabilities of these basic learning schemes. We call these performance enhancers "meta-learning schemes" or "meta-classifiers" because they operate on the output of other learners. Table 3 summarizes the most important meta-classifiers in Weka.

The first of these schemes is an implementation of the bagging procedure [11]. This implementation allows a user to set the number of bagging iterations to be performed.

AdaBoost.M1 [12] similarly gives the user control over the boosting iterations performed. Another boosting procedure is implemented by LogitBoost [13], which is suited to problems involving two-class situations—for example, the SMO class from above. In order to apply these schemes to multi-class datasets it is necessary to transform the multi-class problem into several two-class ones, and combine the results. MultiClassClassifier does exactly that.

weka.classifiers.Bagging
weka.classifiers.AdaBoostM1
weka.classifiers.LogitBoost
weka.classifiers.MultiClassClassifier
weka.classifiers.CVParameterSelection

Table 3: The meta-classifier schemes in Weka
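
As an illustration of the meta-classifier idea, the sketch below boosts decision stumps with AdaBoost.M1. It assumes a current Weka release, where the meta-learners live in weka.classifiers.meta and DecisionStump in weka.classifiers.trees; the number of iterations and the input file are arbitrary choices for the example.

import weka.classifiers.meta.AdaBoostM1;
import weka.classifiers.trees.DecisionStump;
import weka.core.Instances;
import weka.core.converters.ConverterUtils.DataSource;

public class BoostingDemo {
    public static void main(String[] args) throws Exception {
        Instances data = DataSource.read("mydata.arff");   // placeholder file name
        data.setClassIndex(data.numAttributes() - 1);

        // Boost decision stumps for 10 iterations; the base learner is pluggable.
        AdaBoostM1 booster = new AdaBoostM1();
        booster.setClassifier(new DecisionStump());
        booster.setNumIterations(10);
        booster.buildClassifier(data);

        System.out.println(booster);
    }
}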
Additional learning schemes

Weka is not limited to supporting classification schemes; the class library includes representative implementations from other learning paradigms.

Association rules

Weka contains an implementation of the Apriori learner for generating association rules, a commonly used technique in market basket analysis [14]. This algorithm does not seek rules that predict a particular class attribute, but rather looks for any rules that capture strong associations between different attributes.
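
A minimal way to run the Apriori learner from Java is sketched below, assuming a current Weka release (weka.associations.Apriori) and a purely nominal dataset in a placeholder file basket.arff.

import weka.associations.Apriori;
import weka.core.Instances;
import weka.core.converters.ConverterUtils.DataSource;

public class AprioriDemo {
    public static void main(String[] args) throws Exception {
        // Apriori expects nominal attributes; "basket.arff" is a placeholder file name.
        Instances data = DataSource.read("basket.arff");

        Apriori apriori = new Apriori();
        apriori.setNumRules(10);            // report the ten strongest rules
        apriori.buildAssociations(data);

        System.out.println(apriori);        // prints the discovered rules
    }
}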
Clustering

Methods of clustering also do not seek rules that predict a particular class, but rather try to divide the data into natural groups or "clusters." Weka includes an implementation of the EM algorithm, which can be used for unsupervised learning. Like Naïve Bayes, it makes the assumption that all attributes are independent random variables.
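
The EM clusterer can be driven in the same way as the classifiers, as the sketch below shows. It assumes a current Weka release (weka.clusterers.EM) and, since clustering is unsupervised, removes a class attribute that is assumed to be last; the file name and attribute positions are placeholders.

import weka.clusterers.EM;
import weka.core.Instances;
import weka.core.converters.ConverterUtils.DataSource;
import weka.filters.Filter;
import weka.filters.unsupervised.attribute.Remove;

public class ClusteringDemo {
    public static void main(String[] args) throws Exception {
        Instances data = DataSource.read("mydata.arff");    // placeholder file name

        // Clustering is unsupervised, so drop the class attribute if the data has one;
        // here it is assumed to be the last attribute.
        Remove remove = new Remove();
        remove.setAttributeIndices("last");
        remove.setInputFormat(data);
        Instances unlabeled = Filter.useFilter(data, remove);

        EM em = new EM();
        em.setNumClusters(3);                // or leave unset to choose automatically
        em.buildClusterer(unlabeled);
        System.out.println(em);
    }
}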
Evaluation and Benchmarking

One of the key aspects of the Weka suite is the ability it provides to evaluate learning schemes consistently. Table 4 contains a condensed summary of the current "league table" in terms of applying the machine learning schemes to all of the datasets we have collected (37 from the UCI repository [14]). All schemes are tested by ten runs of ten-fold stratified cross-validation.

W-L    Wins   Loss   Scheme
 208   254    46     LogitBoost -I 100 Decision Stump
 155   230    75     LogitBoost -I 10 Decision Stump
 132   214    82     AdaBoostM1 Decision Trees
 118   209    91     Naïve Bayes
  62   183    121    Decision Trees
  14   168    154    IBk Instance-based learner
 -65   120    185    AdaBoostM1 Decision Stump
-140   90     230    OneR Simple Rule learner
-166   77     243    Decision Stump
-195   9      204    ZeroR

Table 4: Ranking schemes

Column 2, Wins, is the number of datasets for which the scheme performed significantly better (at the 95% confidence level) than another scheme. Loss is the number of datasets for which a scheme performed significantly worse than another scheme. W-L is the difference between wins and losses, giving an overall score. It would appear, for these 37 test sets, that boosting simple stumps with LogitBoost for 10 or 100 iterations is the best overall method among the schemes available in Weka.
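
A single run of this evaluation protocol can be reproduced with the Evaluation class, as sketched below for J48; repeating it with ten different random seeds gives the ten-by-ten design used for Table 4. Class locations follow current Weka releases and the data file is a placeholder.

import java.util.Random;

import weka.classifiers.Evaluation;
import weka.classifiers.trees.J48;
import weka.core.Instances;
import weka.core.converters.ConverterUtils.DataSource;

public class CrossValidationDemo {
    public static void main(String[] args) throws Exception {
        Instances data = DataSource.read("mydata.arff");   // placeholder file name
        data.setClassIndex(data.numAttributes() - 1);

        // One run of stratified 10-fold cross-validation; repeating this with
        // different random seeds gives the "ten by ten" protocol used in Table 4.
        Evaluation eval = new Evaluation(data);
        eval.crossValidateModel(new J48(), data, 10, new Random(1));

        System.out.println(eval.toSummaryString());
    }
}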
Building Applications with Weka

In most data mining applications the machine learning component is just a small part of a far larger software system. To accommodate this, it is possible to access the programs in Weka from inside one's own code. This allows the machine learning subproblem to be solved with a minimum of additional programming.
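
A minimal embedding looks like the sketch below: load training data, build a classifier, and obtain class probability estimates for new instances from inside the host application. It assumes a current Weka release and placeholder ARFF file names.

import weka.classifiers.Classifier;
import weka.classifiers.bayes.NaiveBayes;
import weka.core.Instance;
import weka.core.Instances;
import weka.core.converters.ConverterUtils.DataSource;

public class EmbeddedWekaDemo {
    public static void main(String[] args) throws Exception {
        Instances train = DataSource.read("train.arff");    // placeholder file names
        Instances fresh = DataSource.read("new.arff");
        train.setClassIndex(train.numAttributes() - 1);
        fresh.setClassIndex(fresh.numAttributes() - 1);

        Classifier model = new NaiveBayes();
        model.buildClassifier(train);

        // Score previously unseen instances inside a larger application.
        for (int i = 0; i < fresh.numInstances(); i++) {
            Instance inst = fresh.instance(i);
            double[] dist = model.distributionForInstance(inst);
            System.out.println("Instance " + i + ": class distribution = "
                    + java.util.Arrays.toString(dist));
        }
    }
}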
For example, Figure 2 shows a Weka applet written to test the usability of machine learning techniques in the objective measurement of mushroom quality. Image processing of a picture of a mushroom cap (at left in Figure 2) provides data for the machine learning scheme to differentiate between A, B and C grade mushrooms [15].

Figure 2: Mushroom grading applet
Conclusions

As the technology of machine learning continues to develop and mature, learning algorithms need to be brought to the desktops of people who work with data and understand the application domain from which it arises. It is necessary to get the algorithms out of the laboratory and into the work environment of those who can use them. Weka is a significant step in the transfer of machine learning technology into the workplace.

References

[1] Witten, I.H. and Frank, E. (1999) Data Mining: Practical Machine Learning Tools and Techniques with Java Implementations. Morgan Kaufmann, San Francisco.
[2] Fayyad, U.M. and Irani, K.B. (1993) "Multi-interval discretization of continuous-valued attributes for classification learning." Proc IJCAI, 1022–1027. Chambery, France.
[3] Hall, M.A. and Smith, L.A. (1998) "Practical feature subset selection for machine learning." Proc Australian Computer Science Conference, 181–191. Perth, Australia.
[4] Kira, K. and Rendell, L.A. (1992) "A practical approach to feature selection." Proc 9th Int Conf on Machine Learning, 249–256.
[5] Holte, R.C. (1993) "Very simple classification rules perform well on most commonly used datasets." Machine Learning, Vol. 11, 63–91.
[6] Aha, D. (1992) "Tolerating noisy, irrelevant, and novel attributes in instance-based learning algorithms." Int J Man-Machine Studies, Vol. 36, 267–287.
[7] Quinlan, J.R. (1993) C4.5: Programs for Machine Learning. Morgan Kaufmann, San Mateo, CA.
[8] Burges, C.J.C. (1998) "A tutorial on support vector machines for pattern recognition." Data Mining and Knowledge Discovery, Vol. 2(1), 121–167.
[9] Wang, Y. and Witten, I.H. (1997) "Induction of model trees for predicting continuous classes." Proc Poster Papers of the European Conference on Machine Learning, 128–137. Prague.
[10] Atkeson, C.G., Schaal, S.A. and Moore, A.W. (1997) "Locally weighted learning." AI Review, Vol. 11, 11–71.
[11] Breiman, L. (1996) "Bagging predictors." Machine Learning, Vol. 24, 123–140.
[12] Freund, Y. and Schapire, R.E. (1996) "Experiments with a new boosting algorithm." Proc COLT, 209–217. ACM Press, New York.
[13] Friedman, J.H., Hastie, T. and Tibshirani, R. (1998) "Additive logistic regression: a statistical view of boosting." Technical Report, Department of Statistics, Stanford University.
[14] Agrawal, R., Imielinski, T. and Swami, A.N. (1993) "Database mining: a performance perspective." IEEE Trans Knowledge and Data Engineering, Vol. 5, 914–925.
[15] Kusabs, N., Bollen, F., Trigg, L., Holmes, G. and Inglis, S. (1998) "Objective measurement of mushroom quality." Proc New Zealand Institute of Agricultural Science and the New Zealand Society for Horticultural Science Annual Convention, Hawke's Bay, New Zealand, 51.
