Abstract
This framework is implemented as a plugin for the open-source data mining tool RapidMiner. It provides the functionality to simulate data streams from static data. It can handle different classes of learners, i.e. clusterers, classifiers and associators, and supports both static and incremental learning. It also provides the functionality to compare the models produced by these learning tasks: not only the quality of the models can be compared, but also their structures.
1. Introduction
TODO
For handling data streams we extended the existing implementation of a subclass of
ExampleSet, i.e. ConditionalExampleSet. There is a notion of regular and special
attributes in RapidMiner. Regular attributes are used for learning, while special attributes
do not directly participate in learning but provide important information about a tuple:
e.g. the target attribute provides class membership for supervised learning, and the weight
attribute states the relevance of a tuple. There are some predefined special attributes
that we utilised in our framework, namely the Batch and Weight attributes.
Batch can be used to divide the examples into parts, with each batch simulating
an instance in time. Some learners use weights for time-weighted forgetting instead of
creating a time window; the Weight attribute is used by them. Before we go any further, it is
useful to review the strategies that can be applied in an incremental learning scenario.
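The role of the Batch attribute can be illustrated with a small sketch (the class and method names below are ours for illustration, not part of the RapidMiner API): the static data is cut into consecutive index ranges, each range simulating the examples that arrive at one point in time.

```java
import java.util.ArrayList;
import java.util.List;

public class BatchSimulator {
    /** Splits numExamples examples into consecutive batches of batchSize,
     *  each batch standing in for one point in time; the last batch may
     *  be smaller. Each entry is a [start, end) index range. */
    public static List<int[]> makeBatches(int numExamples, int batchSize) {
        List<int[]> batches = new ArrayList<>();
        for (int start = 0; start < numExamples; start += batchSize) {
            int end = Math.min(start + batchSize, numExamples);
            batches.add(new int[] { start, end });
        }
        return batches;
    }
}
```

Iterating over these ranges, and exposing only the examples up to the current range, yields the stream simulation described above.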
© 2000 Zaigham F. Siddiqui and Nico Schlitter.
2.2 Time-weighted
The ExampleSet is divided into batches as before, with a slight change: the window is
absent. Weights are assigned on the basis of the time stamp of a batch; the more recent
the batch, the higher the weight. If weighting is required for each individual example,
the batch size can be set to 1.
At each time step, the parameter batchEnd is incremented to accommodate the new set of
examples, and the weight attribute of each example is adjusted to reflect the forgetting.
We have another variable called weightThreshold; if the weight of an example falls below
this value, the example is filtered out by the ConditionedExampleSet.
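One common choice for such forgetting, sketched below under our own names (the geometric decay scheme is an illustrative assumption, not the only one a learner might use), is to decay a batch's weight by its age and drop examples once they cross the threshold:

```java
public class TimeWeighting {
    /** Weight of a batch that is `age` steps old (0 = most recent),
     *  decayed geometrically by a factor lambda in (0, 1). */
    public static double weightFor(int age, double lambda) {
        return Math.pow(lambda, age);
    }

    /** Mirrors the weightThreshold check described above: examples whose
     *  weight has decayed below the threshold are filtered out. */
    public static boolean keep(double weight, double weightThreshold) {
        return weight >= weightThreshold;
    }
}
```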
3.1 Initialisation
In the first step of an incremental task, the learner calls the method learn(). Although
for conventional learners this method contains the learning step, in an incremental
scenario it is slightly different: here it is used to initialise the model, i.e. to set up
the required variables. Some learning may also be performed in this step, but that is
completely at the discretion of the learner. It returns the created model as output.
3.2 Update
At each subsequent time step, after the initialisation of the model, the learner calls
updateLearner(). This method updates the given model on the new batch of examples;
all the changes introduced by the new examples are reflected in the model. The result
of this step is an updated model.
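The two-step protocol can be rendered as the following sketch. The interface and the toy learner are hypothetical stand-ins: the real framework operates on RapidMiner ExampleSets rather than raw arrays, and real learners build far richer models than a running mean.

```java
import java.util.List;

/** Hypothetical rendering of the learn()/updateLearner() protocol. */
interface IncrementalLearner<M> {
    M learn(List<double[]> firstBatch);             // initialise, may already train
    M updateLearner(M model, List<double[]> batch); // fold in one new batch
}

/** Toy learner: its "model" is the running {sum, count} of the first feature. */
class MeanLearner implements IncrementalLearner<double[]> {
    public double[] learn(List<double[]> firstBatch) {
        // Initialisation doubles as the first update here; whether any
        // learning happens in learn() is up to the learner.
        return updateLearner(new double[] { 0.0, 0.0 }, firstBatch);
    }
    public double[] updateLearner(double[] model, List<double[]> batch) {
        for (double[] example : batch) {
            model[0] += example[0];
            model[1] += 1.0;
        }
        return model;
    }
}
```

A driver would call learn() once on the first batch and updateLearner() on every batch thereafter, so the model at any time step reflects all examples seen so far.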
4. Model Comparator
Model comparison can be of two types: we can either compare models' performance over an
identical dataset, or compare their structures. When comparing performance, the
models are evaluated against each other on the basis of their target prediction ability. As a
result, this type of comparison is restricted to supervised learning tasks. In the latter
case, we look for similarities in their structures; PANDA and MONIC are examples of
such frameworks. Such comparison can be performed on both supervised and unsupervised
learning tasks. However, it requires the models to be the result of similar learning tasks,
i.e. cluster models can only be compared with other cluster-based models, and the same
holds for tree-based models, association rules, and so on.
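For the performance-based variant, the comparison boils down to scoring both models' predictions on the same labelled data. A minimal sketch (the class and method names are ours, not the plugin's API):

```java
public class PerformanceComparator {
    /** Fraction of positions where predicted and actual labels agree. */
    public static double accuracy(int[] predicted, int[] actual) {
        if (predicted.length != actual.length) {
            throw new IllegalArgumentException("label arrays must align");
        }
        int correct = 0;
        for (int i = 0; i < predicted.length; i++) {
            if (predicted[i] == actual[i]) correct++;
        }
        return (double) correct / predicted.length;
    }

    /** Negative if model A is worse than model B on this data,
     *  zero if tied, positive if better. */
    public static int compare(int[] predsA, int[] predsB, int[] actual) {
        return Double.compare(accuracy(predsA, actual), accuracy(predsB, actual));
    }
}
```

Note that this requires ground-truth labels, which is exactly why the performance-based comparison is restricted to supervised tasks.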
5. Experimental setup
TODO
6. Conclusion
TODO
Acknowledgments