
Journal of Machine Learning Research 1 (2000) 1-48 Submitted 4/00; Published 10/00

Stream Mining Plugin for RapidMiner

Zaigham Faraz Siddiqui Zaigham.Siddiqui@student.uni-magdeburg.de


Department of Computer Science
University of Magdeburg
Magdeburg, 39106, Germany
Nico Schlitter Nico.Schlitter@cs.uni-magdeburg.de
Department of Computer Science
University of Magdeburg
Magdeburg, 39106, Germany

Editor: Leslie Pack Kaelbling

Abstract
This framework is implemented as a plugin for the open-source data mining tool RapidMiner. It provides the functionality to simulate data streams from static data. It can handle different classes of learners, i.e., clusterers, classifiers and associators, and supports both static and incremental learning. It also provides the functionality to compare the models produced by these learning tasks: not only the quality of the models can be compared, but also their structures.

Keywords: RapidMiner, stream mining, stream simulation

1. Introduction

TODO

2. Stream Implementation Methodologies

To handle data streams, we extend the existing implementation of a subclass of ExampleSet, i.e., ConditionedExampleSet. RapidMiner distinguishes between regular and special attributes. Regular attributes are used for learning, while special attributes do not directly participate in learning but provide important information about a tuple; e.g., the target attribute provides the class membership for supervised learning, and the weight attribute states the relevance of a tuple. Our framework utilises two of the predefined special attributes, namely the batch and the weight attribute.
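For illustration, these attribute roles could be captured by simplified stand-in types such as the following; this is only a sketch, not the actual RapidMiner data model, and the way our framework uses the batch and weight attributes is described next.

```java
import java.util.List;

/** Simplified stand-in for an example carrying regular and special attributes
 *  (illustration only, not the actual RapidMiner classes). */
public class BatchedExample {
    final double[] regular; // regular attributes, used for learning
    final int batch;        // special attribute: index of the simulated time step
    double weight;          // special attribute: relevance of the tuple

    BatchedExample(double[] regular, int batch, double weight) {
        this.regular = regular;
        this.batch = batch;
        this.weight = weight;
    }

    /** All examples that belong to the batch arriving at time step t. */
    static List<BatchedExample> batchAt(List<BatchedExample> all, int t) {
        return all.stream().filter(e -> e.batch == t).toList();
    }
}
```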
The batch attribute can be used to divide the examples into parts, with each batch simulating an instant in time. Some learners use weights for time-weighted forgetting instead of maintaining a time window; these learners rely on the weight attribute. Before going any further, it is useful to outline the strategies that can be applied in an incremental learning scenario.


2.1 Windowed & Batch-mode


We divide an ExampleSet into a certain number of batches, each containing 1 to n examples. Our window spans these batches and, just like a batch, can contain 1 to n batches. At each time step, the window's span parameters (batchStart and batchEnd) are incremented by one, which effectively introduces a new batch into the window and filters out the batch with the smallest batch id.
If we set the window size to 1, we obtain a batch-mode simulation, which is appropriate for learners that update only on new instances.
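A minimal sketch of this sliding-window simulation is given below. It uses simplified stand-in types rather than the plugin's ExampleSet machinery; only the parameter names batchStart and batchEnd follow the description above.

```java
import java.util.ArrayList;
import java.util.List;

/** Sliding-window simulation over batches (simplified stand-in, not the plugin API). */
public class WindowedStream {
    private final List<List<double[]>> batches; // batches.get(i) = examples of batch i
    private int batchStart, batchEnd;           // window spans batches [batchStart, batchEnd]

    WindowedStream(List<List<double[]>> batches, int windowSize) {
        this.batches = batches;
        this.batchStart = 0;
        this.batchEnd = windowSize - 1; // windowSize = 1 gives a batch-mode simulation
    }

    /** Advance one time step: add the next batch, drop the one with the smallest batch id. */
    boolean advance() {
        if (batchEnd + 1 >= batches.size()) return false; // stream exhausted
        batchStart++;
        batchEnd++;
        return true;
    }

    /** Examples currently visible to the learner. */
    List<double[]> window() {
        List<double[]> visible = new ArrayList<>();
        for (int b = batchStart; b <= batchEnd; b++) visible.addAll(batches.get(b));
        return visible;
    }
}
```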

2.2 Time-weighted
The ExampleSet is divided into batches as before, with one difference: the window is absent. Weights are assigned on the basis of the time stamp of a batch; the more recent the batch, the higher the weight. If weighting of each individual example is required, the batch size can be set to 1.
At each time step, the parameter batchEnd is incremented to accommodate the new set of examples, and the weight attribute of each example is adjusted to reflect the forgetting. A further parameter, weightThreshold, controls forgetting: if the weight of an example falls below this value, the example is filtered out by the ConditionedExampleSet.
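The weighting function itself is not prescribed here; the sketch below assumes an exponential decay with the age of the batch, which is one common choice, and applies the weightThreshold filter described above. The types are simplified stand-ins, not the plugin's classes.

```java
import java.util.List;

/** Time-weighted forgetting (simplified stand-in; parameter names follow the text). */
public class TimeWeightedStream {
    static class WeightedExample {
        final double[] regular;
        final int batch;      // time stamp of the batch the example belongs to
        double weight = 1.0;  // special weight attribute

        WeightedExample(double[] regular, int batch) {
            this.regular = regular;
            this.batch = batch;
        }
    }

    /**
     * After batchEnd has been incremented, decay the weight of every example according
     * to its age and filter out examples whose weight falls below weightThreshold.
     * decay is assumed to lie in (0, 1).
     */
    static List<WeightedExample> reweight(List<WeightedExample> examples, int batchEnd,
                                          double decay, double weightThreshold) {
        for (WeightedExample e : examples) {
            int age = batchEnd - e.batch;        // the most recent batch has age 0
            e.weight = Math.pow(decay, age);     // more recent batch => higher weight
        }
        return examples.stream()
                       .filter(e -> e.weight >= weightThreshold) // forgetting
                       .toList();
    }
}
```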

3. Incremental Learning Task


An incremental learning task is divided into two ordered steps, i.e., the initialisation and the update of the model. We build on the existing Learner interface, which provides the method learn(), and extend it with the additional method updateLearner().
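A sketch of this interface extension is shown below, with simplified stand-in types in place of RapidMiner's ExampleSet and Model classes; the exact signatures used by the plugin may differ.

```java
import java.util.List;

/** Stand-in for a learned model (e.g. cluster centres or a decision tree). */
interface Model { }

/** Conventional learner interface: a single learn() call produces a model. */
interface Learner {
    Model learn(List<double[]> exampleSet);
}

/**
 * Extension for incremental learning: learn() initialises the model on the first
 * batch, updateLearner() revises it on every subsequent batch.
 */
interface IncrementalLearner extends Learner {
    Model updateLearner(Model model, List<double[]> newBatch);
}
```

A learner implementing such an interface can thus be used both in the static setting (a single call to learn()) and in the incremental setting (learn() followed by repeated calls to updateLearner()).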

3.1 Initialisation
In the first step of an incremental task, the method learn() is called. For conventional learners this method contains the actual learning step, but in the incremental scenario its role is slightly different: here it is used to initialise the model, i.e., to set up the required variables. Some learning may also be performed in this step, but that is entirely at the discretion of the learner. The method returns the created model as output.

3.2 Update
At each subsequent time step, after the model has been initialised, updateLearner() is called. This method updates the given model on the new batch of examples, so that all changes introduced by the new examples are reflected in the model. The result of this step is an updated model.

4. Model Comparator
Model comparison can be of two types: we can either compare the performance of models over an identical dataset, or we can compare their structures. When comparing performance, the models are evaluated against each other on the basis of their target prediction ability. As a result, this type of comparison is restricted to supervised learning tasks. In the latter case, we look for similarities in the model structures; PANDA and MONIC are examples of such frameworks. Structural comparison can be performed for both supervised and unsupervised learning tasks. However, it requires the models to be the result of similar learning tasks, i.e., cluster models can only be compared with other cluster-based models, and the same holds for tree-based models, association rules, etc.
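As an illustration of the performance-based case, the following sketch compares two classification models on an identical labelled dataset. The types are simplified stand-ins rather than RapidMiner models, and the returned comparison vector here contains only the two accuracies and their difference, whereas the plugin may report other measures.

```java
import java.util.List;
import java.util.function.Function;

/** Performance-based comparison of two supervised models on an identical dataset. */
public class PerformanceComparison {
    record LabelledExample(double[] regular, int label) {}

    /** Fraction of examples whose predicted label matches the target attribute. */
    static double accuracy(Function<double[], Integer> model, List<LabelledExample> data) {
        long correct = data.stream()
                           .filter(e -> model.apply(e.regular()) == e.label())
                           .count();
        return (double) correct / data.size();
    }

    /** Comparison vector: accuracy of each model plus their difference. */
    static double[] compare(Function<double[], Integer> modelA,
                            Function<double[], Integer> modelB,
                            List<LabelledExample> identicalData) {
        double accA = accuracy(modelA, identicalData);
        double accB = accuracy(modelB, identicalData);
        return new double[] { accA, accB, accA - accB };
    }
}
```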

5. Experimental Setup
TODO

5.1 Incremental Experimenter


An incremental experiment is divided into two steps: in the first step, data is simulated as though it were coming from a stream, and in the second, an incremental learner is used to build the model. This is accomplished by a composite operator called GenericStreamExperimenter (Fig. 5.1). It is an iterative operator that stops once the simulated data stream has been exhausted.
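The following self-contained sketch mirrors this two-step scheme with a trivial incremental learner and randomly generated batches; it illustrates the control flow only, not the GenericStreamExperimenter operator itself.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Random;

/** Simulate a stream from static data, then initialise and repeatedly update a learner. */
public class StreamExperimentSketch {
    /** Trivial incremental "model": running mean of the first attribute. */
    static class MeanModel { double sum = 0; long count = 0; double mean() { return sum / count; } }

    static MeanModel learn(List<double[]> batch) {                     // initialisation step
        return updateLearner(new MeanModel(), batch);
    }

    static MeanModel updateLearner(MeanModel m, List<double[]> batch) { // update step
        for (double[] e : batch) { m.sum += e[0]; m.count++; }
        return m;
    }

    public static void main(String[] args) {
        // Step 1: simulate a stream from static data, here 10 batches of 100 examples each.
        Random rnd = new Random(0);
        List<List<double[]>> stream = new ArrayList<>();
        for (int b = 0; b < 10; b++) {
            List<double[]> batch = new ArrayList<>();
            for (int i = 0; i < 100; i++) batch.add(new double[] { rnd.nextGaussian() + b });
            stream.add(batch);
        }

        // Step 2: build the initial model, then iterate until the stream is exhausted.
        MeanModel model = learn(stream.get(0));
        for (int t = 1; t < stream.size(); t++) {
            model = updateLearner(model, stream.get(t));
            System.out.printf("t=%d mean=%.3f%n", t, model.mean());
        }
    }
}
```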

5.2 Incremental Evaluator


In an incremental experiment, it is desirable to evaluate the model on the newly arrived batch before learning is performed on it. The result of this evaluation can be passed on to the learner, which can adapt the model accordingly if some change is detected (provided that the learner supports this capability), or it can simply be used to observe the quality of the model at each time step. The evaluator is used in the second step of a GenericIncrementalExperiment in place of an incremental learner (Fig. 5.2).
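A sketch of this evaluate-then-update (test-then-train) loop is given below, using a trivial majority-class learner; it illustrates the control flow only, not the actual evaluator operator, which works on RapidMiner models and performance vectors.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Random;

/** Evaluate the model on each newly arrived batch, then update the model on it. */
public class IncrementalEvaluatorSketch {
    static int[] counts = new int[2];                        // model state: class counts

    static int predict() { return counts[1] > counts[0] ? 1 : 0; }

    static double evaluate(List<Integer> batchLabels) {      // accuracy on the new batch
        long correct = batchLabels.stream().filter(y -> y == predict()).count();
        return (double) correct / batchLabels.size();
    }

    static void update(List<Integer> batchLabels) {          // learn from the batch afterwards
        for (int y : batchLabels) counts[y]++;
    }

    public static void main(String[] args) {
        Random rnd = new Random(1);
        for (int t = 0; t < 5; t++) {
            List<Integer> batch = new ArrayList<>();
            for (int i = 0; i < 50; i++) batch.add(rnd.nextDouble() < 0.7 ? 1 : 0);
            double quality = evaluate(batch);                // test on the batch first ...
            update(batch);                                   // ... then train on it
            System.out.printf("t=%d accuracy-before-update=%.2f%n", t, quality);
        }
    }
}
```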

5.3 Model Comparator


The model comparator is a composite operator consisting of two applier chains. In each chain, a model is applied to an identical ExampleSet. The results of these applications are compared by the comparator, and a comparison vector is returned. Since this is a performance-based comparison, only models that are the result of supervised learning can be compared, i.e., a tree-based model can be compared with a Bayesian model, a neural network or a model from some lazy learner, but not with a cluster model.

6. Conclusion
TODO

Acknowledgments

TODO: reference to the KO-RFID project
