
PRACTICAL No 4(b)

Aim: Data preprocessing and visualization

Preprocessing:
To experiment with the application, the data set must be presented to WEKA in a format that the
program understands. WEKA has rules for the type of data it will accept. There are three options
for loading data into the program.

♦ Open File - allows the user to select files residing on the local machine or on recorded
media.

♦ Open URL - provides a mechanism to locate a file or data source at a different
location specified by the user.

♦ Open Database - allows the user to retrieve files or data from a database source
provided by the user.

There are restrictions on the type of data that can be accepted into the program. The software was
originally designed to import only ARFF files; newer versions also accept other file types such as CSV,
C4.5, and serialized instance formats. The extensions for these files include .csv, .arff, .names, .bsi, and .data.
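
The extension list above can be summarized as a small lookup table. This is only an illustrative sketch in Python, not part of WEKA: the function name guess_format is hypothetical, and the mapping restates the extensions named in the text.

```python
from pathlib import Path

# Extension-to-format table restating the list in the text above.
# This is an illustration only; it is not how WEKA itself selects a loader.
WEKA_EXTENSIONS = {
    ".arff": "ARFF",
    ".csv": "CSV",
    ".names": "C4.5",
    ".data": "C4.5",
    ".bsi": "serialized instances",
}


def guess_format(filename):
    """Return the format name for a file, or None if the extension is not listed."""
    return WEKA_EXTENSIONS.get(Path(filename).suffix.lower())


print(guess_format("weather.csv"))
print(guess_format("weather.ARFF"))
print(guess_format("weather.xls"))
```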

Once the initial data has been selected and loaded, the user can refine the experimental data. The
Preprocess window lets the user apply optional filters and select or remove attributes of the data set
as needed to isolate specific information. The ability to pick from the available attributes allows
users to separate different parts of the data set for clarity in the experimentation.

The user can modify the attribute selection, and so change which relationships among the attributes are
examined, by deselecting attributes of the original data set. Many filtering options are available in the
Preprocess window, and the user can choose among them based on need and the type of data present.
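
The effect of removing attributes can be sketched outside WEKA as dropping columns from a table. The Python sketch below is an assumption-laden illustration, not WEKA's own filter: the function name remove_attributes is hypothetical, and the sample attribute names and rows are made up for the example.

```python
def remove_attributes(header, rows, to_remove):
    """Return a copy of the table without the named attributes (columns)."""
    keep = [i for i, name in enumerate(header) if name not in to_remove]
    new_header = [header[i] for i in keep]
    new_rows = [[row[i] for i in keep] for row in rows]
    return new_header, new_rows


# Illustrative sample table (hypothetical values, not taken from the figures).
header = ["outlook", "temperature", "humidity", "windy", "play"]
rows = [
    ["sunny", "85", "85", "FALSE", "no"],
    ["overcast", "83", "86", "FALSE", "yes"],
    ["rainy", "65", "70", "TRUE", "no"],
]

new_header, new_rows = remove_attributes(header, rows, {"humidity", "windy"})
print(new_header)
for row in new_rows:
    print(row)
```

The original table is left unchanged, which mirrors the idea of keeping a base relation alongside a working relation.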

Example: Processing and Visualization of a Weather Data File in CSV Format

Step I: (Creating a CSV file)


Create a CSV file (or an ARFF file) using Notepad. In a CSV file, the first row contains the attribute
names (separated by commas), followed by the data rows, each listing its attribute values in the same
order (also separated by commas). Once loaded into WEKA, the data set can be saved in ARFF format.
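
As an alternative to typing the file by hand, the same layout (a header row of attribute names, then one comma-separated row per instance) can be produced with a short script. This is a minimal sketch using Python's standard csv module. The attribute names other than temperature and all of the row values are illustrative assumptions, not the contents of the figure, and the file is written to a temporary directory so that nothing existing is overwritten.

```python
import csv
import tempfile
from pathlib import Path

# Hypothetical header and rows for illustration only.
HEADER = ["outlook", "temperature", "humidity", "windy", "play"]
ROWS = [
    ["sunny", 85, 85, "FALSE", "no"],
    ["overcast", 83, 86, "FALSE", "yes"],
    ["rainy", 65, 70, "TRUE", "no"],
]


def write_csv(path, header, rows):
    """Write the attribute names first, then one line per data row."""
    with open(path, "w", newline="") as f:
        writer = csv.writer(f)
        writer.writerow(header)
        writer.writerows(rows)


out_path = Path(tempfile.mkdtemp()) / "weather.csv"
write_csv(out_path, HEADER, ROWS)
print(out_path.read_text())
```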

Figure below: shows the comma-separated values in Notepad for the weather data file.

Step II: (Loading the data)


Open the WEKA tool. In the Preprocess tab, click "Open file" and navigate to the directory containing
the data file (.csv or .arff). In this case we will open the weather data file.

Figure below: shows the opening of a CSV file (the weather data file) through WEKA's Open File option.

Step III: (Computing statistics and visualizing)

Once the data is loaded, WEKA recognizes all the attributes and, while scanning the data, computes
some basic statistics on each attribute. The left panel in the figure below shows the list of recognized
attributes, while the top panels show the names of the base relation (or table) and the current working
relation (which are the same initially).

Clicking on any attribute in the left panel will show the basic statistics on that attribute. For categorical
attributes, the frequency for each attribute value is shown, while for continuous attributes we can obtain
min, max, mean, standard deviation, etc.
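
The per-attribute statistics described above can be approximated outside WEKA for comparison. The Python sketch below is an illustration under stated assumptions: the sample rows are hypothetical, an attribute is treated as continuous only if every value parses as a number, and the sample standard deviation is used (which variant WEKA reports is not stated in this text).

```python
import csv
import io
import statistics
from collections import Counter

# Hypothetical sample data for illustration only.
CSV_TEXT = """outlook,temperature,humidity,windy,play
sunny,85,85,FALSE,no
overcast,83,86,FALSE,yes
rainy,70,96,FALSE,yes
rainy,65,70,TRUE,no
"""


def is_number(text):
    """True if the string can be parsed as a float."""
    try:
        float(text)
        return True
    except ValueError:
        return False


def summarize(csv_text):
    """Frequency counts for categorical columns; min/max/mean/stdev for numeric ones."""
    rows = list(csv.DictReader(io.StringIO(csv_text)))
    summary = {}
    for name in rows[0].keys():
        values = [row[name] for row in rows]
        if all(is_number(v) for v in values):
            nums = [float(v) for v in values]
            summary[name] = {
                "min": min(nums),
                "max": max(nums),
                "mean": statistics.mean(nums),
                "stdev": statistics.stdev(nums),  # sample standard deviation
            }
        else:
            summary[name] = dict(Counter(values))
    return summary


summary = summarize(CSV_TEXT)
for name, stats in summary.items():
    print(name, stats)
```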

Figure below: shows the results of selecting the "temperature" attribute.

Figure below: shows the recognized attributes of the relation and their corresponding statistics.

Example: Weather data file in ARFF format


In the above example, we will save our results as separate data files, so that each step (after each
filtering step) can be treated as a separate WEKA session. To save the new working relation as an ARFF
file, click the "Save" button in the top panel. Here, as shown in the "Save" dialog box, we will save the
relation in a file named "weather.arff".
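
To make the structure of the saved file concrete, the sketch below builds ARFF text from CSV text in Python. It is not WEKA's own converter and makes simplifying assumptions: the sample rows are hypothetical, a column is declared numeric only if every value parses as a number, every other column becomes a nominal attribute whose labels are listed in order of first appearance, and names or values containing spaces or commas (which would need quoting) are not handled.

```python
import csv
import io

# Hypothetical sample data for illustration only.
CSV_TEXT = """outlook,temperature,humidity,windy,play
sunny,85,85,FALSE,no
overcast,83,86,FALSE,yes
rainy,70,96,FALSE,yes
rainy,65,70,TRUE,no
"""


def is_number(text):
    """True if the string can be parsed as a float."""
    try:
        float(text)
        return True
    except ValueError:
        return False


def csv_to_arff(csv_text, relation):
    """Build ARFF text: @relation line, one @attribute line per column, then @data rows."""
    reader = csv.reader(io.StringIO(csv_text))
    header = next(reader)
    rows = [row for row in reader if row]  # skip blank lines
    lines = ["@relation " + relation, ""]
    for i, name in enumerate(header):
        column = [row[i] for row in rows]
        if all(is_number(v) for v in column):
            lines.append("@attribute " + name + " numeric")
        else:
            labels = list(dict.fromkeys(column))  # unique labels, first-seen order
            lines.append("@attribute " + name + " {" + ",".join(labels) + "}")
    lines += ["", "@data"]
    lines += [",".join(row) for row in rows]
    return "\n".join(lines) + "\n"


arff_text = csv_to_arff(CSV_TEXT, "weather")
print(arff_text)
```

The header section declares the relation name and the type of each attribute, and the rows after @data keep the same comma-separated layout as the CSV file.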

Figure below: shows an example of creating an ARFF file using the Save option in the WEKA tool.

Figure below: shows a portion of the newly generated ARFF file (in a text editor).