
Open Academic Analytics Initiative Marist College

Academic Analytics Predictive model Weka Flow
Platform
Weka 3.7.5 (refer to the installation documents)
Steps involved in developing the Weka model
There are two steps involved in developing the Academic Analytics model:
1. Preprocessing the datasets before training.
2. Training and saving the Weka model.
Preprocessing the datasets
1. Start Weka. In the Weka GUI Chooser, select the KnowledgeFlow application and create a new flow.
2. Drag a Database Loader icon from Weka > Design > DataSources.
3. Retrieve the U10FPCSM, U11SPCSM and U11UPCSM datasets from the OAAIDBEitel database on the
MS SQL Server using the Database Loader.

Parameters to be set
Set the name of the dataset.
Set the configuration:
Database URL as ://10.128.247.167;databaseName= OAAIDBEitel
Use appropriate authentication credentials.
Enter the query to retrieve data from the table.
Example - SELECT * FROM U11UPCSM
Click OK.
Repeat this procedure for every dataset required to train the model.
4. Drag an Appender icon from Weka > Design > Tools and set a name for it.
Right-click each dataset, select the dataSet option and connect the hop to the Appender.
Note: Make sure the database tables being appended have the same structure,
with identical field names.
5. Attach a TextViewer to the Appender to check the results. The TextViewer icon can be found in
Weka > Design > Visualization.
6. The dataSet, trainingSet or testSet hops are used to connect adjacent icons in the Knowledge
Flow.
7. The dataset was exported from the SQL Server, and certain variables were recoded to numeric
or nominal values, because the Weka model works with these two data types.

Numeric attributes can be real or integer numbers.
Nominal values are defined by providing a <nominal-specification> listing the possible values:
E.g. {<nominal-name1>, <nominal-name2>, <nominal-name3>, ...}
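
As an illustration of the two attribute types, the following minimal Python sketch builds ARFF-style attribute declarations. The helper name arff_attribute, the GPA column and the class values 0 and 1 are assumptions made for this example; they are not taken from the OAAI tables.

```python
def arff_attribute(name, nominal_values=None):
    """Return one ARFF @attribute line: numeric by default, nominal if values are given."""
    if nominal_values is None:
        return f"@attribute {name} numeric"
    # A nominal specification lists every allowed value inside braces.
    spec = ",".join(str(v) for v in nominal_values)
    return f"@attribute {name} {{{spec}}}"


# Hypothetical columns and class values, for illustration only.
header = [
    "@relation example",
    arff_attribute("GPA"),
    arff_attribute("Academic_Risk", [0, 1]),
]
print("\n".join(header))
```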

8. Use the Numeric To Nominal filter, found in Weka > Design > Filters > unsupervised > attribute,
to convert numeric values to nominal values. Multiple attributes can be converted by specifying
attribute indices and ranges as comma-separated values, e.g. 1,4-6,9.

In the model we convert the Academic Risk column (specified as last in the attribute indices)
from numeric to nominal, because the class attribute has to be nominal.
9. Use the Remove Type filter to remove unnecessary columns before training the model.
The attributes to remove can be specified as comma-separated indices and ranges.
Example: 1,4,5-9,18 (removes all these columns)
10. Use a TextViewer to verify the results.
11. Include a Class Assigner evaluation icon to assign the class attribute for the model. In the
analytics model we assign the last column, Academic_Risk, as the class attribute.
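
The index handling in steps 8, 9 and 11 can be sketched outside Weka as follows. This is a plain-Python illustration, not Weka's own code: the function names and the sample rows are hypothetical, and the parser only assumes the 1-based "1,4-6,9" syntax with the keywords first and last described above.

```python
def parse_indices(spec, num_attributes):
    """Expand a 1-based range string such as '1,4-6,9' or 'last' into sorted 0-based indices."""
    def one(token):
        token = token.strip()
        if token == "first":
            return 1
        if token == "last":
            return num_attributes
        return int(token)

    indices = set()
    for part in spec.split(","):
        if "-" in part:
            start, end = part.split("-")
            indices.update(range(one(start), one(end) + 1))
        else:
            indices.add(one(part))
    if any(i < 1 or i > num_attributes for i in indices):
        raise ValueError(f"index out of range in {spec!r}")
    return sorted(i - 1 for i in indices)


def numeric_to_nominal(rows, spec):
    """Turn the values of the selected columns into string labels."""
    cols = set(parse_indices(spec, len(rows[0])))
    return [[str(v) if i in cols else v for i, v in enumerate(row)] for row in rows]


def remove_columns(rows, spec):
    """Drop the selected columns from every row."""
    cols = set(parse_indices(spec, len(rows[0])))
    return [[v for i, v in enumerate(row) if i not in cols] for row in rows]


# Hypothetical rows: an id column, two numeric predictors, and the risk flag last.
rows = [[101, 3.2, 7, 1], [102, 2.1, 3, 0]]
rows = numeric_to_nominal(rows, "last")   # the class column becomes nominal
rows = remove_columns(rows, "1")          # drop the id column
class_index = len(rows[0]) - 1            # the last column is the class attribute
print(rows, class_index)
```

Converting the class column before removing columns mirrors the order of the steps above; because the class is addressed as last, the removal of earlier columns does not change which column is selected.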

Training and Saving the Weka model

1. The dataset from the preprocessing flow is connected to a Train Test Split Maker. This evaluation
icon splits the dataset into a trainingSet and a testSet according to the selected seed value and
percentage parameters.
Parameters
Set the seed value to 1
Set the percentage to 70 %
Connect TextViewers to the trainingSet and testSet to view the results of the split.
2. The training set has to be balanced by adding instances of the minority class. The SMOTE filter
is used for oversampling: it generates synthetic minority-class instances, thereby balancing the data.
3. Connect the trainingSet from the Train Test Split Maker to the SMOTE filter found in
Weka > Design > Filters > supervised > instance.
Set the parameter values as follows.
Parameters:
classValue: the default is 0 (the minority class is detected automatically). A specific class
value can also be given for oversampling.
Nearest neighbors: 7
Percentage: 1850.0
Random seed: 12
4. Drag and drop the classifier algorithms. In the analytics model we use LogitBoost, Naive
Bayes, J48, Logistic, SMO and LibSVM. These can be found in Weka > Design > Classifiers.
5. Connect the balanced training set from SMOTE to all the models.
6. Connect the testSet from the Train Test Split Maker to all the models.
7. Connect the batchClassifier from each of the models to a ClassifierPerformanceEvaluator, found in
Weka > Design > Evaluation. Connect the text output of the ClassifierPerformanceEvaluator to a
TextViewer to see the performance results of each trained model.
8. Run the flow using the Play button in the top-left corner of the application.
9. Right-click each trained and tested model and save it under an appropriate name and
location.
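
To make the split and balancing in steps 1 to 3 more concrete, here is a simplified Python sketch using the parameter values given above (seed 1, 70 %, 7 nearest neighbors, 1850 %, random seed 12). It is not Weka's implementation: it assumes purely numeric features, picks base rows at random instead of following Weka's per-instance schedule, and will not reproduce the exact rows that Weka selects. The function names and the toy data are hypothetical.

```python
import math
import random


def train_test_split(rows, train_percent=70.0, seed=1):
    """Shuffle with a fixed seed, then cut the rows into training and test sets."""
    shuffled = list(rows)
    random.Random(seed).shuffle(shuffled)
    cut = round(len(shuffled) * train_percent / 100.0)
    return shuffled[:cut], shuffled[cut:]


def smote(rows, minority_label, k=7, percentage=1850.0, seed=12):
    """Return synthetic minority rows; each row is (features, label) with numeric features."""
    rng = random.Random(seed)
    minority = [features for features, label in rows if label == minority_label]
    if len(minority) < 2:
        return []  # no neighbour to interpolate towards
    synthetic = []
    for _ in range(round(len(minority) * percentage / 100.0)):
        base = rng.choice(minority)
        # The (up to) k nearest minority neighbours of the base row, excluding itself.
        others = [m for m in minority if m is not base]
        neighbours = sorted(others, key=lambda m: math.dist(base, m))[:k]
        other = rng.choice(neighbours)
        # Interpolate at a random point on the segment between the two rows.
        gap = rng.random()
        point = [b + gap * (o - b) for b, o in zip(base, other)]
        synthetic.append((point, minority_label))
    return synthetic


# Toy data: 40 majority rows (label 0) and 4 minority rows (label 1).
data = [([float(i), float(i % 5)], 0) for i in range(40)]
data += [([100.0 + i, 50.0 + i], 1) for i in range(4)]

train, test = train_test_split(data)
balanced = train + smote(train, minority_label=1)
print(len(train), len(test), len(balanced))
```

Only the training set is oversampled; the test set keeps its original class distribution, which matches the flow above where SMOTE sits between the Train Test Split Maker and the classifiers.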
Note: To save the models without this manual step, use the SerializedModelSaver option
in Weka > Design > Evaluation. Connect the batchClassifier from the model and configure the
filename prefix and directory to save the model automatically.
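
The idea of saving each model under a configured prefix and directory can be sketched in Python with the standard pickle module. This is only an analogue of the behaviour described in the note, not the SerializedModelSaver itself; the function names, the .model extension and the example prefix are assumptions.

```python
import pickle
from pathlib import Path


def save_model(model, name, directory, prefix=""):
    """Serialize a trained model to <directory>/<prefix><name>.model and return the path."""
    target_dir = Path(directory)
    target_dir.mkdir(parents=True, exist_ok=True)
    path = target_dir / f"{prefix}{name}.model"
    with path.open("wb") as handle:
        pickle.dump(model, handle)
    return path


def load_model(path):
    """Read a model back from a file written by save_model."""
    with Path(path).open("rb") as handle:
        return pickle.load(handle)
```

For example, save_model(trained, "J48", "models", prefix="oaai_") would write models/oaai_J48.model, so every classifier in the flow gets a predictable file name.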