One advantage of developing a system in Java is its automatic support for documentation. Descriptions of each of the class libraries are automatically compiled into HTML, providing an invaluable resource for programmers and application developers alike.

The Java class libraries are organized into logical packages—directories containing a collection of related classes. The set of packages is illustrated in

General manipulation of attributes

Many of the filter algorithms provide facilities for general manipulation of attributes. For example, the first two items in Table 1, AddFilter and DeleteFilter, insert and delete attributes. MakeIndicatorFilter transforms a nominal attribute into a binary indicator attribute. This is useful when a multi-class attribute should be represented as a two-class attribute.

Table 1: The filter algorithms in Weka
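The idea behind an indicator filter can be illustrated with a short sketch. This is plain Java written for this article, not Weka's actual MakeIndicatorFilter implementation; the class and method names are invented.

```java
// Sketch only: the core idea of an indicator filter, not Weka's code.
// Each value of a nominal attribute is replaced by 1 if it equals a
// chosen target value and by 0 otherwise, so a multi-valued attribute
// becomes a binary (two-class) one.
public class IndicatorSketch {
    public static int[] toIndicator(String[] values, String target) {
        int[] indicator = new int[values.length];
        for (int i = 0; i < values.length; i++) {
            indicator[i] = values[i].equals(target) ? 1 : 0;
        }
        return indicator;
    }
}
```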
In some cases it is desirable to merge two values of a nominal attribute into a single value. This can be done in a straightforward way using MergeAttributeValuesFilter. The name of the new value is a concatenation of the two original ones.

Some learning schemes—for example, support vector machines—can only handle binary attributes. The advantage of binary attributes is that they can be treated as either being nominal or numeric. NominalToBinaryFilter transforms multi-valued nominal attributes into binary attributes.

SelectFilter is used to delete all instances from a dataset that exhibit one of a particular set of nominal attribute values, or a numeric value below or above a certain threshold.

One possibility of dealing with missing values is to globally replace them before the learning scheme is applied. ReplaceMissingValuesFilter substitutes the mean (for numeric attributes) or the mode (for nominal attributes) for each missing value.

Transforming numeric attributes

Some filters pertain specifically to numeric attributes. For example, an important filter for practical applications is the DiscretiseFilter. It implements an unsupervised and a supervised discretization method. The unsupervised method implements equal width binning. If the index of a class attribute is set, the method will perform supervised discretization using MDL [2].

In some applications, it is appropriate to transform a numeric attribute before a learning scheme is applied, for example, to replace each value by its square root. NumericTransformFilter transforms all numeric attributes among the selected attributes using a user-specified transformation function.

Feature Selection

Another essential data engineering component of any applied machine learning system is the ability to select potentially relevant features for inclusion in model induction. The Weka system provides three feature selection systems: a locally produced correlation based technique [3], the wrapper method and Relief [4].

Learning schemes

Weka contains implementations of many algorithms for classification and numeric prediction, the most important of which are listed in Table 2. Numeric prediction is interpreted as prediction of a continuous class. The Classifier class defines the general structure of any scheme for classification or numeric prediction.

weka.classifiers.ZeroR
weka.classifiers.OneR
weka.classifiers.NaiveBayes
weka.classifiers.DecisionTable
weka.classifiers.IBk
weka.classifiers.j48.J48
weka.classifiers.j48.PART
weka.classifiers.SMO
weka.classifiers.LinearRegression
weka.classifiers.m5.M5Prime
weka.classifiers.LWR
weka.classifiers.DecisionStump

Table 2: The basic learning schemes in Weka

The most primitive learning scheme in Weka, ZeroR, predicts the majority class in the training data for problems with a categorical class value, and the average class value for numeric prediction problems. It is useful for generating a baseline
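The ZeroR behaviour described above is simple enough to sketch directly. The following is an illustrative plain-Java version written for this article, not Weka's ZeroR class; the class and method names are invented.

```java
import java.util.HashMap;
import java.util.Map;

// Illustrative sketch of the ZeroR baseline (not Weka's implementation).
// For a categorical class it predicts the majority class seen in the
// training data; for a numeric class it predicts the mean class value.
public class ZeroRSketch {
    public static String majorityClass(String[] classValues) {
        Map<String, Integer> counts = new HashMap<>();
        String best = null;
        for (String c : classValues) {
            int n = counts.merge(c, 1, Integer::sum);
            if (best == null || n > counts.get(best)) best = c;
        }
        return best;
    }

    public static double meanClass(double[] classValues) {
        double sum = 0;
        for (double v : classValues) sum += v;
        return sum / classValues.length;
    }
}
```

A baseline like this is worth computing for any dataset: a learning scheme that cannot beat the majority-class or mean predictor has learned nothing useful.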
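The unsupervised equal-width binning performed by the DiscretiseFilter, described in the section on transforming numeric attributes, can also be sketched in a few lines. Again, this is an illustration with invented names, not Weka's implementation.

```java
// Illustrative sketch of unsupervised equal-width binning (not Weka's
// DiscretiseFilter). Each numeric value is mapped to one of numBins
// intervals of equal width spanning [min, max] of the attribute.
public class EqualWidthBinning {
    public static int[] discretize(double[] values, int numBins) {
        double min = Double.POSITIVE_INFINITY, max = Double.NEGATIVE_INFINITY;
        for (double v : values) {
            if (v < min) min = v;
            if (v > max) max = v;
        }
        double width = (max - min) / numBins;
        int[] bins = new int[values.length];
        for (int i = 0; i < values.length; i++) {
            // Clamp so the maximum value lands in the last bin, not past it.
            int b = width == 0 ? 0 : (int) ((values[i] - min) / width);
            bins[i] = Math.min(b, numBins - 1);
        }
        return bins;
    }
}
```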