
D.M.J. Tax, C. Veenman

Information and Communication Theory Group, Delft University of Technology

Informatics Institute, Faculty of Science, University of Amsterdam

[Figure: scatter plot of a three-class dataset with classes apple, banana and pear; both axes run from 0 to 10]

Introduction

The aim of this set of exercises is to assist the reader in getting acquainted with PRTools,

a Matlab toolbox for pattern recognition. It is a prerequisite to have a general knowledge of

pattern recognition, to have read the introductory part of the PRTools manual and to have

access to this manual during the study of the exercises. Moreover, the reader needs to have

some experience with Matlab and should regularly study the help texts provided with the

PRTools commands (e.g. help gendatc).

The exercises should give some insight into the toolbox. They are not meant to explain in

detail how the tools are constructed and, thereby, they do not reach the level that enables the

student to add new tools to PRTools, using its specific classes dataset and mapping.

It is left to the responsibility of the reader to study the exercises using various datasets. They

can be either generated by one of the routines in the toolbox or they should be loaded from a

special dataset directory. In section 13 this is explained further with examples of both, artificial

data as well as real world data. First the Matlab commands are given, next scatter plots of

some of the sets are shown. Note that not all the arguments in the commands presented are

compulsory. It is necessary to refer to these pages regularly in order to find suitable problems

for the exercises.

In order to build pattern recognition systems for real world (raw) datasets, e.g. images as

they are grabbed by a camera, preprocessing and the measurement of features is necessary.

The growing measurement toolbox MeasTools is designed for that. Here it is unavoidable

that students write their own low level routines as at this moment the collection of feature

measuring tools is insufficient. As no MeasTools manual is available yet, students should

read the online documentation and the additional material which may be supplied during a

course.

Don't forget to study the exercises presented in the manual and examples available under

PRTools (e.g. prex_cleval)!

**************************

The exercises assume that the data collections prdatasets, prdatafiles and Coursedata

are available. The last directory also contains some experimental commands not available in

the standard PRTools distribution.

Version 4.1 of PRTools contains some new facilities that may confuse the user. The

prprogress command controls the reporting of long running commands. It may irritate

the user and may even crash occasionally (especially in the Java interface). It may be switched

off by prprogress off.

Some commands may generate many warnings, especially when no class priors are set in a

dataset. One solution is to switch off the PRTools warning mechanism by prwarning(0).

A better way is to set class priors. E.g. a = setprior(a,getprior(a)) sets the class priors

according to class frequencies if they are not yet defined.

Contents

1 Introduction

2 Classifiers

9 One-Class Classifiers

10 Classifier combining

11 Boosting

13 Summary of the methods for data generation and available data sets

Introduction

Example 1. Datasets

PRTools entirely deals with sets of objects represented by vectors in a feature space. The

central data structure is a so-called dataset. It consists of a matrix of size m × k; m row

vectors representing the objects given by k features each. Attached to this matrix is a set of

m labels (strings or numbers), one for each object and a set of k feature names (also strings

or numbers), one for each feature. Moreover, a set of prior probabilities, one for each class, is

stored. Objects with the same label belong to the same class. In most help files in PRTools,

a dataset is denoted by A. Almost all routines can handle multi-class objects. Some useful

routines to handle datasets are:

dataset     Define a dataset from a data matrix and labels
gendat      Generate a random subset of a dataset
genlab      Generate dataset labels
seldat      Select a specific subset of a dataset
setdat      Define a new dataset from an old one by replacing its data
getdata     Retrieve data from dataset
getlab      Retrieve object labels
getfeat     Retrieve feature labels
renumlab    Convert labels to numbers

Sets of objects may be given externally or may be generated by one of the data generation

routines in PRTools (see section 13). Their labels may be given externally or may be the

results of a classification or a cluster analysis. A dataset containing 10 objects with 5 random

measurements can be generated by:

>> data = rand(10,5);

>> a = dataset(data)

10 by 5 dataset with 0 classes: [ ]

In this example no labels are supplied, therefore no classes are detected. Labels can be added

to the dataset by:

>> labs = [1 1 1 1 1 2 2 2 2 2]'; % labs should be a column vector

>> a = dataset(a,labs)

10 by 5 dataset with 2 classes: [5 5]

Note that the labels have to be supplied as a column vector. A simple way to assign labels to

a dataset is offered by the routine genlab in combination with the Matlab char command:

>> labs = genlab([4 2 4],char('apple','pear','banana'))

>> a = dataset(a,labs)

10 by 5 dataset with 3 classes: [4 4 2]

Note that the order of the classes has changed. Use the routines getlab and getfeat to

retrieve the object labels and the feature labels of a. The fields of a dataset can be made visible by converting it to a structure:

>> struct(a)
        data: [10x5 double]
     lablist: [3x6 char]
        nlab: [10x1 double]
     labtype: 'crisp'
     targets: []
     featlab: [5x1 double]
     featdom: {[] [] [] [] []}
       prior: []
        cost: []
     objsize: 10
    featsize: 5
       ident: {10x1 cell}
     version: {[1x1 struct] '05-Apr-2005 18:57:19'}
        name: []
        user: []

In the on-line information on datasets (help datasets, also printed in the PRTools manual)

the meaning of these fields is explained. Each field may be changed by a set-command, e.g.

>> b = setdata(a,rand(10,5));

Field values can be retrieved by a similar get-command, e.g.

>> classnames = getlablist(a)

In nlab an index is stored for each object to the list of class names lablist. Note that this

list is alphabetically ordered. The size of a dataset can be found by both, size and getsize:

>> [m,k] = size(a);

>> [m,k,c] = getsize(a);

The number of objects is returned in m, the number of features in k and the number of classes

in c. The class prior probabilities are stored in prior. It is by default set to the class

frequencies if the field is empty. Data in a dataset can also be retrieved by double(a) or,

more simply, by +a.
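The relation between lablist and nlab, and the alphabetical reordering of the classes noted above, can be mimicked in plain Matlab. The following is only a sketch under assumptions: it uses no PRTools calls, the variable classsizes is a name chosen here for illustration, and the built-in unique stands in for whatever PRTools does internally.

```matlab
% Plain Matlab sketch (no PRTools): how a sorted class list and an index
% vector relate to the labels of the apple/pear/banana example.
labs = [repmat({'apple'},4,1); repmat({'pear'},2,1); repmat({'banana'},4,1)];

% unique returns the class names in alphabetical order (the role of lablist)
% and, for every object, an index into that list (the role of nlab).
[lablist, ~, nlab] = unique(labs);
nlab = nlab(:);                         % force a column vector

% Count the objects per class, in the order of lablist.
classsizes = accumarray(nlab, 1)';

disp(lablist')                          % class names, alphabetically sorted
disp(classsizes)                        % class sizes in that order
```

The class sizes come out in the order apple, banana, pear, which is why the dataset display changes from the assignment order [4 2 4] to [4 4 2].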

1.1 Have a look at the help information of seldat. Notice that it has many input parameters.

In most cases you can ignore input parameters of functions that are of no interest to you. The

default values are often good enough. Use the routine to extract the banana class from a and

check this by inspecting the result of +a.

Datasets can be manipulated in many ways comparable with Matlab matrices. So [a1; a2]

combines two datasets, provided that they have the same number of features. The feature set

may be extended by [a1 a2] if a1 and a2 have the same number of objects.

1.2 Generate 3 new objects of the classes apple and pear and add them to the dataset

a. Check if the class sizes change accordingly.

1.3 Add a new, 6th feature to the whole dataset a.

Another way to inspect a dataset is to make a scatterplot of the objects in the dataset. For

this the function scatterd is supplied. This plots each object in a dataset in a 2D graph,

using a coloured marker when class labels are supplied. When more than two features are

present in the dataset, only the first two are used. For obtaining a scatterplot of two other

features they have to be explicitly extracted first, e.g. a1 = a(:,[2 5]);. With an extra

option 'legend' one can add a legend to the figure, showing which markers indicate which

classes.

1.4 Use scatterd to make a scatterplot of the features 2 and 5 of dataset a. Try it also

using the 'legend' option.

1.5 Next, use scatterdui to make a scatterplot of a and use its buttons to select features.

(Note that 'legend' is not a valid option here.)

1.6 It is also possible to create 3D scatterplots. Make a 3-dimensional scatterplot by

scatterd(a,3) and try to rotate it by the mouse after pressing the right toolbar button.

1.7 Use one of the procedures described on page 42 and following to create an artificial

dataset of 100 objects. Make a scatterplot. Repeat this a few times.

Exercise 1. Scatterplot

Load the 4-dimensional Iris dataset by a = iris and make scatterplots of all feature combinations using the 'gridded' option of scatterd. Try also all feature combinations using

scatterdui.

Plot in a separate figure the one-dimensional feature densities by plotf. Identify visually

the best combination of two features. Create a new dataset b that contains just these two

features. Create a new figure by the figure command and plot a scatterplot of b.

Exercise 2. Mahalanobis distance (optional)

Use the distmaha command to compute the Mahalanobis distances between all pairs of classes

in the iris dataset. Repeat this for the best two features just selected. Can you find a way

to test whether this is really the best feature pair according to the Mahalanobis distance?
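As a reminder of what a Mahalanobis distance between two classes measures, here is a minimal plain Matlab sketch. It is an assumption-laden illustration, not the distmaha implementation: it uses randomly generated stand-in data (x1, x2), averages the two class covariance matrices, and returns the squared distance between the class means. Check help distmaha for the definition PRTools actually uses.

```matlab
% Plain Matlab sketch (no PRTools): squared Mahalanobis distance between
% the means of two classes, using the average of the class covariances.
x1 = randn(50,2);                          % stand-in data for class 1
x2 = randn(50,2) + repmat([3 0], 50, 1);   % stand-in data for class 2, shifted

m  = mean(x1) - mean(x2);                  % difference of the class means (1 x 2)
S  = (cov(x1) + cov(x2)) / 2;              % averaged covariance matrix (2 x 2)
dm = m * (S \ m');                         % m * inv(S) * m', without forming inv(S)

disp(dm)
```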

Exercise 3. Generate your own dataset (optional)

Generate a dataset that consists of two 2-D uniformly distributed classes of objects using the

rand command. Transform the sets such that for the [xmin xmax; ymin ymax] intervals

the following holds: [0 2; -1 1] for class 1 and [1 3; 1.5 3.5] for class 2. Generate

50 objects for each class. An easy way is to do this for x and y coordinates separately and

combine them afterwards. Label the features by 'area' and 'perimeter'.

Check the result by scatterd and by retrieving object labels and feature labels.
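The interval transformation asked for in this exercise can be sketched in plain Matlab as follows. This is one possible approach, not the intended solution: the names lim1, lim2, width1 and width2 are chosen here for illustration, and the conversion to a PRTools dataset with feature labels is left to the reader.

```matlab
% Plain Matlab sketch (no PRTools): scale rand output into given intervals.
n    = 50;                       % objects per class
lim1 = [0 2; -1 1];              % class 1: [xmin xmax; ymin ymax]
lim2 = [1 3; 1.5 3.5];           % class 2: [xmin xmax; ymin ymax]

% rand returns values in (0,1): multiply by the interval width and add
% the interval minimum, separately for the x and y columns.
width1 = lim1(:,2) - lim1(:,1);
width2 = lim2(:,2) - lim2(:,1);
x1 = rand(n,2) .* repmat(width1', n, 1) + repmat(lim1(:,1)', n, 1);
x2 = rand(n,2) .* repmat(width2', n, 1) + repmat(lim2(:,1)', n, 1);

data = [x1; x2];                        % 100 x 2 data matrix
labs = [ones(n,1); 2*ones(n,1)];        % labels as a column vector
```

The matrix data and the column vector labs have the shapes that the dataset command in Example 1 expects.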

Exercise 4. Enlarge an existing dataset (optional)

Generate a dataset using gendatb containing 10 objects per class. Enlarge this dataset to

100 objects per class by generating more data using the gendatk and gendatp commands.

Compare the scatterplots with a scatterplot of 100 objects per class directly generated by

gendatb. Explain the difference.

Example 2. Density estimation

The following routines are available for density estimation:

gaussm      Normal distribution
parzenm     Parzen density estimation
knnm        K-nearest neighbour density estimation

They are programmed as a mapping. Details of mappings are discussed later. The following

two steps are always essential for a mapping: the estimation is built, or trained, using a

training set, e.g. by:

>> a = gauss(100)

Gaussian Data, 100 by 1 dataset with 1 classes: [100]

This is a 1-dimensional normally distributed dataset of 100 points with mean 0.

>> w = gaussm(a)

Mixture of Gaussians, 1 to 1 trained mapping --> normal map

The trained mapping w now contains all information needed for computing densities of given

points, e.g.

>> b = [-2:0.1:2]';

Now we will measure for the points defined by b the density according to w (which is a density

estimator based on the dataset a):

>> d = map(b,w)

41 by 1 dataset with 0 classes: [41]

The result may be listed on the screen by [+b +d] (coordinates and densities) or plotted by:

>> plot(+b,+d)
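To see what the train-then-apply steps amount to for a single normal distribution, the same computation can be sketched in plain Matlab. This is an illustration under assumptions, not the gaussm code: it estimates the mean and standard deviation from stand-in data generated by randn and evaluates the normal density formula on the grid b.

```matlab
% Plain Matlab sketch (no PRTools): normal density estimate in one dimension.
a = randn(100,1);                 % stand-in for the 1-D training set

% "Training": estimate the parameters of the normal distribution.
mu    = mean(a);
sigma = std(a);

% "Mapping": evaluate the estimated density on a grid of points.
b = (-2:0.1:2)';
d = exp(-(b - mu).^2 / (2*sigma^2)) / (sigma*sqrt(2*pi));

disp([b d])                       % coordinates and densities side by side
```

Calling plot(b,d) afterwards draws the estimated density curve, analogous to the PRTools plot above.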

2.1 Plot the densities estimated by parzenm and knnm in separate figures. These routines

need sensible parameters. Try a few values for the smoothing parameter and the number of

nearest neighbours.

Example 3. Create a dataset from a set of images

Load an image dataset, e.g. kimia. Use the struct command to inspect its featsize field.

As this dataset consists of object images (each object in the dataset is an image) the image

sizes have to be known and are stored in this field. Use the show command to visualize this

image dataset.

The immoments command generates out of a dataset with object images a set of moments as

features. Compute the Hu moments and study the scatterplot by scatterdui.

Exercise 5. Compute image features

Some PRTools commands operate on images stored in datasets, see help prtools. Commands like datfilt and dataim may be used to transform object images. Think of a way

to compute the area and the contour length of the blobs in the kimia dataset. Display the

scatterplot.

Exercise 6. Density plots (optional)

Generate a 2-dimensional 2-class dataset by gendatb of 50 points per class. Estimate the

densities by each of the methods from Example 2.

Make in three figures a 2D scatterplot by scatterd. Different from the above 1-dimensional

example, a ready made density plotting routine plotm can be used for drawing iso-density

lines in the scatterplot. Plot them on three figures by using command plotm(w). Try also

3-d plots by plotm(w,3). Note that plotm always needs first a scatterplot to find the domain

where the density has to be computed.

Exercise 7. Nearest Neighbor Classification (optional)

Write your own function for the nearest neighbour error estimation: e = nne(d) in which

the incoming parameter d is a labeled distance matrix obtained by d = distm(b,a), where

a and b are labeled datasets. The objects of datasets a and b should be represented in the

same feature space. The resulting d is again a dataset. The objects of d are represented

by distances between b and a. Labels of d can be retrieved by object_lab = getlab(d),

features by feat_lab = getfeat(d).

By the definition of the nearest neighbour rule, the label of each object in the test set has to

be compared with the label of its nearest neighbour in the training set. In this exercise a (b)

is playing a role of a training (test) set. The number of differences between two label sets can

be counted by n = nlabcmp(object_lab,feat_lab).

The nne routine thereby has the following steps:

1. Create a vector L with as many elements as d has objects. L(i) = j, where j is the

index of the nearest neighbour of row object i. This index of the closest object can

be found by [dd,j] = min(d(i,:));

2. Use nlabcmp to count the differences between the true labels of the objects corresponding to the rows, given by object_lab, and the labels of the nearest neighbours,

feat_lab(L,:).

3. Normalise and return the error.

4. If the training set a and the test set b are identical (e.g. d = distm(a,a)), nne should

return 0 because each object is its own nearest neighbour. Modify your routine in such

a way that it returns the leave-one-out error if it is called by e = nne(d,'loo').

The leave-one-out error is the error made on a set of objects if for each object under

consideration the object itself is excluded from the set at the moment it is evaluated.

In this case not the smallest d(i,j) on row i has to be found (which should be on the

diagonal), but the next one.
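The four steps above can be sketched on plain matrices before writing the PRTools version. The following is a minimal illustration under assumptions: the toy data and the names train, test, sqdist, Dtt and eloo are made up here, numeric label vectors replace getlab and getfeat, and squared Euclidean distances are used because they give the same nearest neighbours as the distances themselves.

```matlab
% Plain Matlab sketch (no PRTools datasets) of the nne steps.
train = [0 0; 0 1; 5 5; 5 6];   trainlab = [1; 1; 2; 2];   % plays the role of a
test  = [0 0.2; 5 5.2; 2 2];    testlab  = [1; 2; 2];      % plays the role of b

% Squared Euclidean distances: rows are objects of X, columns objects of Y.
sqdist = @(X,Y) bsxfun(@plus, sum(X.^2,2), sum(Y.^2,2)') - 2*X*Y';

% Step 1: index of the nearest training object for every test object.
D = sqdist(test, train);
[~, L] = min(D, [], 2);

% Steps 2 and 3: count the label differences and normalise.
e = mean(trainlab(L) ~= testlab);

% Step 4: leave-one-out on the training set. The diagonal (distance of an
% object to itself) is excluded by setting it to inf.
Dtt = sqdist(train, train);
Dtt(logical(eye(size(Dtt,1)))) = inf;
[~, Lloo] = min(Dtt, [], 2);
eloo = mean(trainlab(Lloo) ~= trainlab);

disp([e eloo])
```

In this toy example the third test object lies closest to a training object of class 1 although its own label is 2, so one of the three test objects is misclassified.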

Inspect some 2D datasets by scatterd and estimate the nearest neighbour error by nne.

Running Exercise 1. NIST Digits

Several datasets of handwritten digits are available. The command nist32 loads binary

images of size 32x32 as a dataset of 1024 features. In several ways features can be extracted,

e.g. immoments computes by default the coordinates of the mean.

Load a dataset A of four digits, e.g. 0,3,5 and 8. Create a subset B with 25 objects per class.

Use the show command to visualize this dataset.

Compute out of B a new dataset C with just two features, e.g. two moments. Make a

scatterplot of C.

Classifiers

In PRTools datasets are transformed by mappings. These are procedures that map a set

of objects from one space into another. Examples are feature selection, feature rescaling,

rotations of the space and classification, e.g.

>> w = cmapm(10,[2 4 7])

FeatureSelection, 10 to 3 fixed mapping --> cmapm

This defines a fixed mapping w that selects the features 2, 4 and 7 from a set of 10 features. Its name is FeatureSelection and its executing routine,

when it is applied to data, is cmapm (the name shown after the arrow). It may be applied as follows:

>> a = gauss(100,zeros(1,10))

Gaussian Data, 100 by 10 dataset with 1 class: [100]

>> b = map(a,w)

Gaussian Data, 100 by 3 dataset with 1 class: [100]

In a mapping (we use almost everywhere the variable w for mappings) various information

is stored, like the dimensionalities of input and output space, parameters that define the

transformation and the routine that is used for executing the transformation. Use struct(w)

to see all fields.

Often a mapping has to be trained, i.e. it has to be adapted to a training set by some

estimation or training procedures to minimise some error for the training set. An example

is the principal component analysis that performs an orthogonal rotation according to the

directions with main variance in a given dataset:

>> w = pca(a,2)

Principal Component Analysis, 10 to 2 trained mapping --> affine

This just defines the mapping (trains it by a) for finding the first 2 principal components.

The fields of a mapping can be shown by struct(w). In the PRTools-manual or by

help mappings more information on mappings can be found. The mapping w may be

applied to a or to any other 10-dimensional dataset by:

>> b = map(a,w)

Gaussian Data, 100 by 2 dataset with 1 class: [100]

Instead of the routine map also the * operator may be used for applying mappings to datasets:

>> b = a*w

Gaussian Data, 100 by 2 dataset with 1 class: [100]

Note that the sizes of the variables a (100 × 10) and w (10 × 2) are such that the inner

dimensionalities cancel in the computation of b, like in all Matlab matrix operations.
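The analogy with matrix multiplication can be made concrete for the feature selection mapping shown earlier. The sketch below is plain Matlab and only an illustration: the selection matrix W is built by hand here and is not how cmapm stores its parameters.

```matlab
% Plain Matlab sketch (no PRTools): feature selection as a matrix product.
a   = randn(100,10);                 % 100 objects, 10 features
sel = [2 4 7];                       % features to keep

% Build a 10 x 3 matrix with a single 1 in each column, in the row of the
% selected feature.
W = zeros(10, numel(sel));
W(sub2ind(size(W), sel, 1:numel(sel))) = 1;

% (100 x 10) * (10 x 3) gives 100 x 3: the inner dimensionalities cancel.
b = a * W;

disp(size(b))
```

The product a*W contains the same values as a(:,[2 4 7]), which is the sense in which a mapping "multiplies" a dataset.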

The * operator may also be used for training. a*pca is equivalent to pca(a) and

a*pca([],2) is equivalent to pca(a,2). As a result, an untrained mapping can be stored

in a variable: w = pca([],2). It may thereby also be passed as an argument in a function

call. The advantages of this possibility will be shown later.

function or on class posterior probability estimates. They can be used in two modes: untrained and trained. When applied to a dataset, in the untrained mode the dataset is used

for training and a classifier is generated, while in the trained mode the dataset is classified.

Unlike mappings, fixed classifiers don't exist. Some important classifiers are:

fisherc     Fisher classifier
qdc         Quadratic classifier assuming normal densities
udc         Quadratic classifier assuming normal uncorrelated densities
ldc         Linear classifier assuming normal densities with equal covariance matrices
nmc         Nearest mean classifier
parzenc     Parzen density based classifier
knnc        k-nearest neighbour classifier
treec       Decision tree
svc         Support vector classifier
lmnc        Neural network classifier trained by the Levenberg-Marquardt rule

4.1 Generate a dataset a by gendath and compute the Fisher classifier by w = fisherc(a).

Make a scatter plot of a and plot the classifier by plotc(w). Classify the training set by

d = map(a,w) or d = a*w. Show the result on the screen by +d.

4.2 What is displayed is the value of the sigmoid function of the distances to the classifier. This function maps the distances to the classifier from the (-inf, +inf) interval onto the (0,1) interval. The latter can be interpreted as posterior probabilities. The original distances can be retrieved by +invsigm(d). This may be visualised by plot(+invsigm(d(:,1)),+d(:,1),'*'), which shows the shape of the sigmoid function (distances along the horizontal axis, sigmoid values along the vertical axis).
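The mapping between distances and values in (0,1) can be tried out in plain Matlab. The sketch assumes the standard logistic function and its inverse; any additional scaling that PRTools applies to the distances during training is not modelled here.

```matlab
% Plain Matlab sketch (no PRTools): logistic sigmoid and its inverse.
dist = (-5:0.5:5)';                    % signed distances to a classifier

sigm = 1 ./ (1 + exp(-dist));          % maps (-inf,+inf) onto (0,1)
back = log(sigm ./ (1 - sigm));        % inverse sigmoid recovers the distances

disp([dist sigm back])
```

A distance of 0 (an object on the decision boundary) maps to 0.5, and the sigmoid is strictly increasing, so the ordering of the objects by distance is preserved.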

4.3 During training distance based classifiers are appropriately scaled such that the posterior

probabilities are optimal for the training set in the maximum likelihood sense. In multi-class

problems a normalisation is needed to take care that the posterior probabilities sum to one.

This is enabled by classc. So classc(map(a,w)), or a*w*classc maps the dataset a on

the trained classifier w and normalises the resulting posterior probabilities. If we include

training as well then this can be written in a one-liner as p = a*(a*fisherc)*classc. (Try

to understand this expression: between the brackets the classifier is trained. The result

is applied on the same dataset). Note that because the sigmoid-based normalisation is a

monotonic transformation, it does not alter the class membership of data samples in the

maximum a posteriori probability (MAP) sense.

This may be visualized by computing classifier distances, sigmoids and normalized posterior

probability estimates for a multi-class problem as follows. Load the 80x dataset by a = x80.

Compute the Fisher classifier by w = a*fisherc, classify the training set by d = a*w, and

compute p = d*classc. Display the various output values by +[d p]. Note that the object

confidences over the first 3 columns don't sum to one and that they are normalised in the last

3 columns to proper posterior probability estimates.
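The effect of the normalisation can be illustrated with a small made-up matrix of classifier outputs. The sketch assumes that the normalisation simply divides every row by its sum; whether classc does exactly this in all cases should be checked in help classc.

```matlab
% Plain Matlab sketch (no PRTools): row-wise normalisation of confidences.
d = [0.9 0.6 0.2;                      % made-up confidences, one row per object,
     0.1 0.3 0.2];                     % one column per class

p = d ./ repmat(sum(d,2), 1, size(d,2));   % every row of p now sums to one

% The class with the highest value is the same before and after normalising.
[~, class_d] = max(d, [], 2);
[~, class_p] = max(p, [], 2);

disp([d p])
```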

4.4 Density based classifiers like qdc find, after training (w = qdc(a), or w = a*qdc),

density estimators for all classes in the training set. Estimates for objects in some dataset b

can be found by d = b*w. Again, posterior probability estimates are found after normalisation

by classc: p = d*classc. Have a look at +[d p] to see the estimates for the class density

and the related posterior probabilities.

Example 5. Classifiers and discriminant plots.

This example illustrates how to plot decision boundaries in 2D scatter plots by plotc.

5.1 Generate a dataset, make a scatter plot, train and plot some classifiers by

>> a = gendath([20 20]);

>> scatterd(a)

>> w1 = ldc(a);

>> w2 = nmc(a);

>> w3 = qdc(a);

>> plotc({w1,w2,w3})

Plot in a new scatter plot of a a series of classifiers computed by the k-NN rule (knnc) for

various values of k between 1 and 10. Look at the influence of the neighbourhood size on the

classification boundary. Check the boundary for k=1.

5.2

>> a = gendatm

>> w = a*qdc

>> scatterd(a)

>> plotc(w,'col')   % colours the class regions

>> hold on          % necessary to preserve the plot

>> scatterd(a)      % plots the data again in the plot

Plots like these are influenced by the grid size used for computing the classifier outputs in the

scatter plot. By default it is 30 × 30 (grid size is 30). The grid size value can be retrieved

and set by gridsize. Study its influence by setting the gridsize to 100 (or even larger) and

repeating the above commands. Use each time a new figure, so results can be compared. Note

the influence on the computation time.

Exercise 8. Normal densities based classifiers.

Take the features 2 and 3 of the Iris dataset. Make a scatter plot and plot in it the normal

densities, see also example 2 and/or exercise 6. Compute the quadratic classifier based on

normal densities (qdc) and plot it on top of this. Repeat this for the uncorrelated (udc)

and the linear classifiers (ldc) based on normal distributions, but plot them on top of the

corresponding density estimation plots.

Exercise 9. Linear classifiers (optional)

Use the same dataset for comparing some linear classifiers: the linear normal distribution

based classifier (ldc) , nearest mean (nmc), Fisher (fisherc) and the support vector classifier

(svc). Plot them on top of each other, in different colours, in the same scatter plot. Don't

plot density estimates now.

Exercise 10. Non-linear classifiers (optional)

Generate a dataset by gendath and compare in the scatter plots the quadratic normal densities

based classifier (qdc) with the Parzen classifier (parzenc) and the 1-nearest neighbour rule

(knnc([],1)). Try also a decision tree (treec).

The performance of a classifier w can be tested by an independent test set, say b. If such

a set is available the routine testc may be used to count the number of errors. Note that

the routine classc just converts classifier outcomes to posterior probabilities, but does not

change the class assignments. So b*w*classc*testc produces the same result as b*w*testc.

6.1 Generate a training set a of 20 objects per class by gendath and a test set b of 1000

objects per class. Compute the performance of the Fisher classifier by b*(a*fisherc)*testc.

Repeat this for some other classifiers. For which classifiers do the errors on the training set

and test set differ most? Which classifier performs best?

Example 7. Classifier evaluation

In PRTools a dataset a can be split into a training set b and a test set c by the gendat

command, e.g. [b,c] = gendat(a,0.5). In this case, for each class 50% of the objects

are randomly chosen for dataset b and the remaining objects are stored in dataset c. After

computing a classifier by the training set, e.g. w = b*fisherc, the test set c can be classified

by d = c*w. For each object, the label of the class with the highest confidence, or posterior

probability, can be found by d*labeld. E.g.:

>> a = gendath;

>> [b,c] = gendat(a,0.9)

Higleyman Dataset, 90 by 2 dataset with 2 classes: [45 45]

Higleyman Dataset, 10 by 2 dataset with 2 classes: [5 5]

>> w = fisherc(b);    % the class names (labels) of b are stored in w

>> getlabels(w)       % this routine shows labels (class labels are 1 and 2)

>> d = c*w;           % classify test set

>> lab = d*labeld;    % get the labels of the test objects

>> disp([+d lab])     % show the posterior probabilities and labels

Note that in the last displayed column (lab) the labels of the classes with the highest classifier

outputs are stored. The average error in a test set can be directly computed by testc:

>> d*testc

which may also be written as testc(d) or testc(c,w) (or c*w*testc).
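What the labelling and error counting steps compute can be sketched on a made-up matrix of classifier outputs. This is plain Matlab under assumptions: the numbers are invented, and the error is the plain fraction of misclassified objects, whereas testc may weight the classes by their priors (see help testc).

```matlab
% Plain Matlab sketch (no PRTools): labels from outputs, then the error.
post    = [0.8 0.2;                    % made-up classifier outputs,
           0.4 0.6;                    % one row per test object,
           0.3 0.7;                    % one column per class
           0.9 0.1];
lablist = [1; 2];                      % class labels, in column order
truelab = [1; 2; 1; 1];                % true labels of the test objects

% Assign every object the label of the class with the highest output.
[~, idx] = max(post, [], 2);
lab = lablist(idx);

% Fraction of objects whose assigned label differs from the true label.
err = mean(lab ~= truelab);

disp([post lab truelab])
disp(err)
```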

Exercise 11. Error limits of K-NN rule and Parzen classifier

Take a simple dataset like the Highleyman classes (gendath) and generate a small training set

(e.g. 25 objects per class) and a large test set (e.g. 200 objects per class). Recall what the

theory predicts for the limits of the classification error of the k-NN rule and the Parzen classifier

as a function of the number of neighbours k and the smoothing parameter h. Estimate and

plot the corresponding error curves and verify the theory. How can you estimate the Bayes

error of the Highleyman dataset if it is known that the classes are normally distributed? Try

to explain the differences between the theory and your results.

Exercise 12. Simple classification experiment

Perform now the following experiment.

Load the IMOX data by a = imox. This is a feature based character recognition dataset.

Split the dataset in two parts, 80% for training and 20% for testing.

Store the true labels of the test set using getlabels into lab_true.

Compute the Fisher classifier

Classify the test set

Store the labels found by the classifier for the test set into lab_test.

Display the true and estimated labels by disp([lab_true lab_test])

Predict the classification error of the test set by observing the output.

Verify this number using testc.

Example 8. Cell arrays with datasets and mappings

A set of datasets can be stored in a cell array:

>> A = {gendath gendatb}

The same holds for mappings and classifiers:

>> W = {nmc fisherc qdc}

As the multiplication of cell arrays cannot be overloaded, A*W cannot be used to train classifiers

stored in cell arrays. However, V = map(A,W) works. Try it. Try also the gendat and testc

commands for cell arrays.

Exercise 13. Classification of large datasets

Try to find out what the best classifier is for the six mfeat datasets (mfeat_fac, mfeat_fou,

mfeat_kar, mfeat_mor, mfeat_pix, mfeat_zer). These are different feature sets for the same

objects. Take a fixed training set of 30 objects per class and use the others for testing. Make

sure that all the six training sets refer to the same objects. This can be done by resetting the

random seed by rand('seed',1) or by using the indexes returned by gendat.

Try the following classifiers: nmc, ldc([],1e-2,1e-2), qdc([],1e-2,1e-2), fisherc,

knnc, parzenc. Write a macro script that produces a 6 × 6 table of errors (using cell arrays

as discussed in Example 8 this is a 5-liner). Which classifiers perform well overall? Which

dataset(s) are presumably normally distributed? Which are not?
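The idea of reusing one set of object indices for all six feature sets can be sketched in plain Matlab. The layout is an assumption made for this illustration (10 classes of 200 objects each, stored class by class), and the names classlab, trainidx and testidx are chosen here; with PRTools the indices would come from gendat instead.

```matlab
% Plain Matlab sketch (no PRTools): draw the training indices once.
nclass = 10;  nper = 200;  ntrain = 30;        % assumed dataset layout
classlab = kron((1:nclass)', ones(nper,1));    % class index of every object

trainidx = [];
for c = 1:nclass
    members = find(classlab == c);             % objects of class c
    pick = members(randperm(nper));            % random order within the class
    pick = pick(:);
    trainidx = [trainidx; pick(1:ntrain)];     % keep the first 30 of them
end
testidx = setdiff((1:nclass*nper)', trainidx); % all remaining objects

% The same index vectors can now select rows from every feature matrix X:
% X(trainidx,:) for training and X(testidx,:) for testing.
disp([numel(trainidx) numel(testidx)])
```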

Example 9. Datafiles

Datafiles are a PRTools extension of datasets, read help datafiles. They refer to raw data

directories in which every file (e.g. an image) is interpreted as an object. Objects in the

same sub-directory are interpreted as belonging to the same class. There are some predefined

datafiles in prdatafiles, read its help file. As an example load the Flower database, define some preprocessing and inspect a subset:

>> prprogress on

>> a = flowers

>> b = a*im_resize([],[64,64,3])

>> x = gendat(b,0.05);

>> show(x)

Note that only the administration is stored until real work has to be done by the show command.

After feature extraction and conversion to a dataset classifiers can be trained and tested:

>> c = b*im_gray*im_moments([],'hu')

>> [x,y] = gendat(c,0.05)

>> y = gendat(y,0.1)

>> w = dataset(x)*nmc

>> e = testc(dataset(y),w)

Also here the work starts with the dataset conversion. A number of classifiers and mappings

may operate directly (without conversion) on datafiles, but this appears not to be foolproof

yet. The classification result in this example is bad, as the features are bad. Look in the help

file of PRTools for other mappings and feature extractors for images. You may define your

own image processing operations on datafiles by filtim.

Running Exercise 2. NIST Digit classification

Load a dataset of 50 NIST digits for each of the classes 3 and 5.

Compute 2 features.

Make a scatterplot.

Compute and plot some classifiers, e.g. nmc and ldc.

Classify the dataset.

Use the routine labcmp to find the erroneously classified objects.

Display these digits using the show command. Try to understand why they are incorrectly

classified given the features.

In PRTools three neural network classifiers are implemented based on an old version of

Matlab's Neural Network Toolbox:

bpxnc a feed-forward network (multi-layer perceptron), trained by a modified backpropagation algorithm with variable learning parameter.

lmnc a feed-forward network, trained by the Levenberg-Marquardt rule.

rbnc a radial basis network. This network has always one hidden layer which is extended

with more neurons as long as necessary.

These classifiers have built-in choices for target values, step sizes, momentum terms, etcetera.

No weight decay facilities are available. Stopping is done for no-improvement on the training

set, no improvement on a validation set error (if supplied) or at a given maximum number of

epochs.

In addition the following neural network classifiers are available:

rnnc feed-forward network (multi-layer perceptron) with a random input layer and a

trained output layer. This has a similar architecture as bpxnc and rbnc, but is much

faster.

perlc single layer perceptron with linear output and adjustable step sizes and target

values.

Example 10. The neural network as a classifier

The following lines demonstrate the use of the neural network as a classifier:

>> a = gendats; scatterd(a)

>> w = lmnc(a,3,1); h = plotc(w);

>> for i=1:50,

w = lmnc(a,3,1,w);delete(h);h=plotc(w);disp(a*w*testc); drawnow;

end

Repeat these lines if you expect a further improvement. Repeat the experiment for 5 and 10

hidden units. Try also the use of the back-propagation rule (bpxnc).

Exercise 14. A neural network classification experiment

Compare the performance of networks trained by the Levenberg-Marquardt rule (lmnc) with

different numbers of hidden units: 3, 5 and 10 for a three class digit problem (2, 3 and 5). Use

the NIST16 dataset (a = nist16). Reduce the dimensionality of the feature space by pca to

a space that contains 90% of the original variance. Use training sets of 5, 10, 20, 50 and 100

objects per class and a large test set. Plot the errors on the training set and the test set as a

function of the training size. Which networks are overtrained? What can be changed in this

network to avoid overtraining?

Exercise 15. Overtraining (optional)

Study the errors on training and test set as a function of training time (number of epochs) for

a network with one hidden layer of 10 neurons. Use as classification problem gendatc with

25 training objects per class. Do this for lmnc as well as for bpxnc.

Exercise 16. Number of hidden units (optional)

Study the influence of the number of hidden units on the test error for the same problem and

the same classifiers as in the overtraining exercise (Exercise 15).

Exercise 17. Network outputs and posterior probabilities (optional)

Network output values are normalised, like for all classifiers, by a*w*classc. Compare these

outcomes for test sets with the posterior probabilities found for the normal density based

classifier qdc and with the true posterior probabilities found for a qdc classifier based on a

very large training set. This comparison might be based on scatter plots. Use data based on

normal distributions. Train the network with various numbers of steps and try a small and a

large number of hidden units.

The following routines are available for the evaluation of classifiers:

testc compute the classification error of a trained classifier on a dataset

crossval train and test classifiers by cross validation

cleval classifier evaluation by computing a learning curve

reject computation of an error-reject curve

roc computation of a receiver-operator curve

gendat split a given dataset at random into a training set and a test set.

A simple example of the generation and use of a test set is the following:

11.1 Load the mfeat_kar dataset, consisting of 64 Karhunen-Loeve coefficients measured

for 10*200 written digits (0 to 9). A training set of 50 objects per class (i.e. a fraction of

0.25 of 200) can be generated by:

>> a = mfeat_kar

MFEAT KL Features, 2000 by 64 dataset with 10 classes: [200 ... 200]

>> [trainset,testset] = gendat(a,0.25)

MFEAT KL Features, 500 by 64 dataset with 10 classes: [50 ... 50]

MFEAT KL Features, 1500 by 64 dataset with 10 classes: [150 ... 150]

50 × 10 objects are stored in trainset, the remaining 1500 objects are stored in testset.

Train the linear normal densities based classifier and test it:

>> w = ldc(trainset);

>> testset*w*testc

Compare the result with training and testing by all data:

>> a*ldc(a)*testc

which is probably better for two reasons. Firstly, it uses more objects for training, so a better

classifier is obtained. Secondly, it uses the same objects for testing as well as for training, so
the test result is optimistically biased. For that reason, separate sets for training
and testing are to be preferred.

Example 12. Classifier performance

In this exercise we will investigate the difference in behaviour of the error on the training and

the test set. Generate a large test set and study the variations in the classification error based

on repeatedly generated training sets:

>> t= gendath([500 500]);

>> a = gendath([20 20]); t*ldc(a)*testc

Repeat this last line e.g. 30 times. What causes the variations in error?

>> a= gendath([20 20]);

>> w = ldc(a);

>> t = gendath([500 500]); t*w*testc

Repeat the last line e.g. 30 times and try to understand the size of the variance in the results.

Example 13. Use of cell arrays for classifiers and datasets

In finding the best classifiers over a set of datasets the Matlab cell arrays can be very useful.

A cell array is a collector of arbitrary items. For instance a set of untrained classifiers can be

stored as follows:

>> classifiers = {nmc,parzenc([],1),knnc([],3)}

and a set of datasets is similarly stored as:

>> data = {iris,gendath(50),gendatd(30,30,10),gendatb(100)}

Training and test sets can be generated for all datasets simultaneously by

>> [trainset,testset] = gendat(data,0.5)

In a similar way classifiers and error estimation can be done:

>> w = map(trainset,classifiers)

>> testc(testset,w)

Note that the construction w = trainset*classifiers doesn't work for cell arrays. Cross

validation can be done by:

>> crossval(data,classifiers,5)

The parameter 5 indicates 5-fold cross-validation, i.e. a rotation over training sets of 80% and

test sets of 20% of the data. If this parameter is omitted the leave-one-out error is computed.

For the nearest neighbour rule this is also done by testk. Take a small dataset a and verify

that testk(a) and crossval(a,knnc([],1)) yield the same result. Note how much more

efficient the specialised routine testk is.
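The rotation behind k-fold cross-validation can be sketched in base Matlab as follows. This only illustrates the index bookkeeping and is not the crossval implementation; the sizes n and k and all variable names are hypothetical, and the sketch ignores class stratification, which a real routine may handle differently.

```matlab
% Index bookkeeping for k-fold cross-validation (illustration only).
n = 20;                          % number of objects (hypothetical)
k = 5;                           % number of folds
order = randperm(n);             % random order of the objects
fold  = mod(0:n-1, k) + 1;       % fold number for each position in 'order'
ntest = zeros(1, k);             % test set size per fold
seen  = zeros(1, n);             % how often each object is used for testing
for f = 1:k
    test_idx  = order(fold == f);    % 20% of the objects
    train_idx = order(fold ~= f);    % the remaining 80%
    ntest(f)  = numel(test_idx);
    seen(test_idx) = seen(test_idx) + 1;
    % a classifier would be trained on train_idx and tested on test_idx here
end
```

Every object is used for testing exactly once, so the k test errors can be averaged into a single estimate.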

Example 14. Learning curves introduction

An easy to use routine for studying the behaviour of a classifier on a given dataset is cleval:

>> a = gendatb([30 30])

>> e = cleval(a,ldc,[2 3 5 10 20],3)

This generates at random training sets of sizes [2 3 5 10 20] per class out of the dataset a

and trains the classifier ldc. The remaining objects are used for testing (so in this example

the set a has to contain more than 20 objects per class). This is repeated 3 times and the

resulting errors are averaged and returned in the structure e. This is ready made for plotting

the so called learning curve by:

>> plotr(e)

Exercise 18. Learning curve experiment

Plot the learning curves of qdc, udc, fisherc and nmc for gendath using training set sizes

ranging from 3 to 100. Do the same for a 20-dimensional problem generated by gendatd.

Study and try to understand the results.

Example 15. Confusion matrices

A confusion matrix C has in element C(i,j) the confusion between the classes i and j.

Confusion matrices are especially useful in multi-class problems for analysing the similarities

between classes. For instance, let us take the IMOX dataset a = imox, split it for training and

testing by [train_set,test_set] = gendat(a,0.5). We can now compare the true labels

of the test set with the estimated ones found by a classifier:

>> true_lab = getlab(test_set);

>> w = fisherc(train_set);

>> est_lab = test_set*w*labeld;

>> confmat(true_lab,est_lab)
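To make the counting concrete, here is a minimal base Matlab sketch of how such a matrix can be built from two label vectors. It is not the confmat source: it assumes numeric labels 1..c (for instance as returned by getnlab), and it assumes that rows index the true class and columns the estimated class. The label vectors are made up.

```matlab
% Confusion matrix from numeric label vectors (illustration only).
true_lab = [1; 1; 2; 2; 3; 3];   % hypothetical true labels
est_lab  = [1; 2; 2; 2; 3; 1];   % hypothetical estimated labels
c = 3;                           % number of classes
% C(i,j) counts the objects of true class i that were assigned to class j
C = accumarray([true_lab est_lab], 1, [c c]);
% off-diagonal counts are errors, so the error rate follows from the trace
err = 1 - trace(C) / sum(C(:));
```

Large off-diagonal entries C(i,j) point at pairs of classes that the classifier finds hard to separate.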

Compute the confusion matrix for fisherc applied to the two digit feature sets mfeat_kar

and mfeat_zer. One of these feature sets is rotation invariant. Which one?

Exercise 20. Bootstrap error estimates (optional)

Note that gendat can be used for bootstrapping datasets. Write two error estimation routines

based on bootstrap based bias corrections for the apparent error:

e1 = ea - (eba - ebc)

e2 = .368 ea + .632 ebo

in which ea is the apparent error of the classifier to be tested, eba is the bootstrap apparent

error, ebc is the apparent error (based on the whole training set) of the bootstrap based

classifier and ebo is the out-of-bootstrap error estimate of the bootstrap based classifier.

These estimates have to be based on a series of bootstraps, e.g. 25.

Exercise 21. Cross-validation (optional)

Compare the error estimates of 2-fold cross validation, 10-fold cross validation, the
leave-one-out error estimate (all obtained by crossval) and the true error (based on a very large test

set) for a simple problem, e.g. gendath with 10 objects per class, classified by fisherc. In

order to obtain significant results the entire experiment should be repeated a large number of

times, e.g. 50. Verify whether this is sufficient by computing the variances in the obtained

error estimates.

Example 16. Reject curves

The classification error for a classification result d = a*w is found by e = testc(d), which
assigns each object to the class with the largest value in its row of d. With a reject option, a
threshold determines when this largest value is not sufficiently large, and such objects are
rejected. The routine e = reject(d) determines the classification error and the reject rate for
a set of such threshold values. The errors and reject frequencies are stored in e. We will
illustrate this by a simple example.

>> a = gendath([100 100]); w = fisherc(a);

Take a small test set:

>> b = gendath([20 20])

Classify it and compute its classification error:

>> d = b*w; testc(d)

Compute the reject/error trade off:

>> e = reject(d)

Errors are stored in e.error and rejects are stored in e.xvalues. Inspect them by

>> [e.error; e.xvalues]'

The left column shows the error for the reject frequencies shown in the right column. It starts

with the classification error found above by testc(d) for no reject (0) and runs to an error

of 0 and a reject of 1 at the end. e.xvalues is the reject rate, starting at no reject. Plot the

reject curve by:

>> plotr(e)
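The trade-off can also be traced by hand. The base Matlab sketch below is an illustration, not the reject routine: it assumes that rows of p are objects and columns are classes, that the least confident objects are rejected first, and that errors are counted as a fraction of all objects, including the rejected ones. The posterior values and labels are made up.

```matlab
% Error-reject trade-off from a matrix of posteriors (illustration only).
p = [0.90 0.10; 0.60 0.40; 0.55 0.45; 0.20 0.80];   % 4 objects x 2 classes
true_lab = [1; 2; 1; 2];
[conf, est] = max(p, [], 2);     % winning posterior and class per object
wrong = (est ~= true_lab);       % misclassified objects
[dummy, order] = sort(conf);     % least confident objects first
n   = numel(conf);
err = zeros(1, n+1);
rej = zeros(1, n+1);
for r = 0:n
    keep = true(n, 1);
    keep(order(1:r)) = false;            % reject the r least confident objects
    err(r+1) = sum(wrong & keep) / n;    % errors among the accepted objects
    rej(r+1) = r / n;                    % reject rate
end
```

The error can only decrease as more objects are rejected, and it reaches 0 when everything is rejected.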

16.2 Repeat this for a test set b of 500 objects per class. How many objects have to be

rejected to have an error of less than 0.06?

Exercise 22. Reject experiment

Study the behavior of the reject curves for nmc, qdc and parzenc for the sonar dataset

(a = sonar). Take training sets and test sets of equal size ([b,c] = gendat(a,0.5)). Study

help reject to see how a set of reject curves can be computed simultaneously. Plot the result

by plotr. Try to understand the reject curve for qdc.

Example 17. ROC curves

The roc command computes separately the classification errors for each of the classes for various thresholds. Results for a two-class problem can again be plotted by the plotr command,

e.g.

>> [a,b] = gendat(sonar,0.5)

>> w1 = ldc(a);

>> w2 = nmc(a);

>> w3 = parzenc(a);

>> w4 = svc(a);

>> e = roc(b,{w1 w2 w3 w4});

>> plotr(e)

This plot shows how the error shifts from one class to the other class for a changing threshold.

Try to understand what these plots indicate for the selection of a classifier.

Create your own function myroc constructing an ROC curve on a given dataset with classifier

outputs. Hint: use the classifier outputs for each of the test examples as thresholds. The

necessary error measures for each threshold may be obtained using confmat.

Exercise 24. Derivation of additional costs (optional)

Adapt the myroc function to return the precision or the positive fraction measure. Compare

the behaviour of ROCs and the precision-recall (or positive fraction vs TPr) curves for test

sets with different skew ratios.

Running Exercise 3. NIST Digit confusion matrix

Load a dataset of 100 NIST digits for all classes 0 - 9.

Compute the Hu moments using immoments.

Split it in a training and a test set of equal sizes.

Compute and display the confusion matrix of the test result for the nmc classifier.

Repeat this after reversing the roles of training and test sets.

Study the stability.

We will show the principle of the k-means algorithm graphically on a 2-dimensional dataset.

This is done in several steps.

1. Take a 2-dimensional dataset, e.g. a = gendatb;. Set k=4.

2. Initialise the procedure by randomly taking k objects from the dataset:

>> L=randperm(size(a,1)); L=L(1:k);

3. Now, use these objects as the prototypes (or centres) of k clusters. By defining the labels 1 to
k, the nearest mean classifier considers each object as a single cluster:

>> w=nmc(dataset(a(L,:),[1:k]'));

4. Repeat the following line until the plot does not change. Try to understand what happens:

>> lab=a*w*labeld; a=dataset(a,lab); w=nmc(a); scatterd(a);plotc(w)

Repeat the algorithm with another initialisation, on another dataset and some values for k.

What happens when the nmc classifier in step 3 is replaced by ldc or qdc?
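The loop formed by steps 2 to 4 can be written out in base Matlab, without PRTools, to make the two alternating steps explicit. This is a minimal sketch under stated assumptions: the data x and the fixed initial centres are made up so that the result is repeatable, whereas the procedure above starts from k randomly chosen objects.

```matlab
% Plain k-means on a tiny 2-D example (illustration only).
x = [0 0; 0 1; 1 0; 10 10; 10 11; 11 10];   % six objects, two obvious groups
k = 2;
centres = x([1 4], :);           % fixed initialisation for repeatability
m = size(x, 1);
lab = zeros(m, 1);
for iter = 1:100
    % assignment step: squared Euclidean distance to every centre
    d = zeros(m, k);
    for j = 1:k
        delta  = x - repmat(centres(j,:), m, 1);
        d(:,j) = sum(delta.^2, 2);
    end
    [dummy, newlab] = min(d, [], 2);
    if isequal(newlab, lab), break; end     % labels stable: converged
    lab = newlab;
    % update step: each centre becomes the mean of its cluster
    % (an empty cluster would give NaN here; not handled in this sketch)
    for j = 1:k
        centres(j,:) = mean(x(lab == j, :), 1);
    end
end
```

The assignment step plays the role of a*w*labeld and the update step the role of retraining nmc in the line of step 4.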

A direct way to perform the above clustering is facilitated by kmeans. Run kmeans on one

of the digit databases (for instance mfeat_kar) with k>=10 and compare the resulting labels

with the original ones (getlab(a)) using confmat.

Try to understand what a confusion matrix should show when the k-means clustering has
resulted in a random labelling. What does this confusion matrix show about the data distribution?

Example 19. Hierarchical clustering

Hierarchical clustering derives a full dendrogram (a hierarchy) of solutions. Let us investigate

the dendrogram construction on the artificial dataset r15. Because the hierarchical clustering

operates directly on dissimilarities between data examples, we will first compute the full

distance matrix (here using the squared Euclidean dissimilarity):

>> load r15

>> d=distm(a);

>> den=hclust(d,'s'); % using single-linkage algorithm

The dendrogram may be visualised by figure; plotdg(den);. It is also possible to use an

interactive dengui command simultaneously rendering both the dendrogram and the scatterplot of the original data:

>> dengui(a,den)

The user may interactively change the dendrogram threshold and thereby study the related

grouping of examples.

Exercise 25. Differences in single- and complete-linkage clusterings

Compare the single- and complete-linkage dendrograms, constructed on the r15 dataset using

the squared Euclidean distance measure. Which method is suited better for this problem and

why? Compare the absolute values of thresholds in both situations - why can we observe an

order of magnitude difference?

Exercise 26. Maximum lifetime criterion (optional)

Each clustering solution in a dendrogram survives over a set of thresholds. The dendrogram

may be cut by selecting the most stable solution i.e. the clustering with the maximum lifetime.

For a given dendrogram, find the threshold corresponding to the maximum lifetime. Use

den_getthrs function to retrieve the list of all thresholds. Show the scatter plot of the

respective clustering (the labeling specific to the particular threshold may be obtained by the

den_getlab function).

Example 20. Clustering by the EM-Algorithm

A more general version of k-means clustering is supplied by emclust which can be used for

several classification algorithms instead of nmc and which returns a classifier that may be used

to label future datasets in the same way as the obtained clustering.

The following experiment investigates the clustering stability as a function of the sample size.

Take a dataset a and compute for a given choice of the number of clusters k the clustering of

the entire dataset (e.g. using ldc as a classifier) by:

>> [lab,v] = emclust(a,ldc([],1e-6,1e-6),k);

Here v is a mapping that by d = a*v classifies the dataset according to the final clustering

(lab = d*labeld). Note that for small datasets or large values of k some clusters might
become too small (use classsizes(d)) for the use of ldc. In that case nmc may be used instead. The dataset

a can now be given the cluster labels lab by:

>> a = dataset(a,lab)

This dataset will be used for studying the clustering stability in the following experiments.

The clustering of a subset a1 of n samples per cluster of a:

>> a1 = gendat(a,repmat(n,1,k))

can now be found from

>> [lab1,v1] = emclust(a1,ldc([],1e-6,1e-6));

As the clustering is initialized by the labels of a1, the difference e in labeling between a and

the one defined by v1 can be measured by a*v1*testc, or in a single line:

>> [lab1,v1]=emclust(gendat(a,n),ldc([],1e-6,1e-6)); e=a*v1*testc

Average this over 10 experiments and repeat for various values of n. Plot e as a function of

n.

Exercise 27. Semi-supervised learning

We will study the usefulness of unlabelled data in a wrapper approach.

Various self-learning methods are implemented through emc. Investigate how the usefulness of

unlabelled data depends on the training sample size and the ratio of labelled vs. unlabelled data. Are

there significant performance differences between different choices of cluster model mappings

(e.g. nmc or parzenc)? Are there clear performance differences depending on whether the

data is indeed clustered or not (e.g. gendats vs. gendatb)?

Running Exercise 4. NIST Digit clustering

Load a dataset A of 25 NIST digits for all classes 0-9.

Compute the 7 Hu moments:

Perform a cluster analysis by kmeans with k = 10 neglecting the original labels.

Compare the cluster labels with the original labels using confmat.

Example 21.

21.1 Dissimilarity based (relational) representations

Any feature based representation a (e.g.

a = gendath(100)) can be converted into a (dis)similarity representation d using the proxm

mapping:

>> w = proxm(b,par1,par2); % define some dissimilarity measure

>> d = a*w; % apply to the data

in which the representation set b is a small set of objects. In d all (dis)similarities between the

objects in a and b are stored (depending on the parameters par1 and par2, see help proxm).

b can be a subset of a. The dataset d can be used similarly as a feature based set. A

dissimilarity based classifier using a representation set of 5 objects per class can be trained

for a training set as:

>> b = gendat(a,5); % select a representation set of 5 objects per class

>> w = proxm(b); % define a Euclidean distance mapping to the objects in b

>> v = a*w*fisherc; % map all data on the representation set and train

>> u = w*v; % combine the mapping and the classifier

This dissimilarity based classifier for the dataset a can also be computed in one line:

>> u = a*(proxm(gendat(a,5))*fisherc);

It behaves like an ordinary classifier in the feature space of a and can be tested by a*u*testc.

21.2 Embedding of dissimilarity based representations

A symmetric n × n dissimilarity representation d (e.g. d = a*proxm(a,'c')) can be embedded into a pseudo-Euclidean space as

>> [v,sig,l] = goldfarbm(d);

v is the mapping, sig = [p q] is the signature of the pseudo-Euclidean space and l are the

corresponding eigenvalues (first p positive ones, then q negative ones). To check whether d is

Euclidean, you can investigate whether all eigenvalues l are nonnegative. They can be plotted

by:

>> plot(l,'*')

The embedded configuration is found as:

>> x = d*v;

The 3D approximate (Euclidean) embedding can then be plotted by

>> scatterd(x,3);

To project to the m most significant dimensions, use

>> [v,sig,l] = goldfarbm(d,m);
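The eigenvalue check can be illustrated without PRTools. The sketch below uses the classical construction (double centring of a matrix of squared distances). That this matches what goldfarbm computes internally is an assumption, as is the convention that the matrix holds squared distances. The three points are made up.

```matlab
% Euclidean check of a distance matrix via its centred Gram matrix
% (illustration only).
x = [0 0; 3 0; 0 4];             % three points in the plane
n = size(x, 1);
D2 = zeros(n);                   % squared Euclidean distances
for i = 1:n
    for j = 1:n
        D2(i,j) = sum((x(i,:) - x(j,:)).^2);
    end
end
J = eye(n) - ones(n)/n;          % centring matrix
B = -0.5 * J * D2 * J;           % Gram matrix of the centred configuration
l = sort(eig((B + B')/2), 'descend');    % eigenvalues, largest first
is_euclidean = all(l > -1e-9);   % no clearly negative eigenvalue
```

For distances that are truly Euclidean, all eigenvalues are nonnegative up to rounding. Clearly negative eigenvalues correspond to the q negative directions in the signature.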

Exercise 28. Scatter plot with dissimilarity based classifiers

Generate a training set of 50 objects per class for the banana-set (gendatb). Make a scatter

plot of the training set and make the representation set visible as well. Compare the dissimilarity based classifier using Euclidean distances and a representation set of 5 objects per class

with the svc for a polynomial of degree 3 (svc([],'p',3)). Repeat this for a dissimilarity

based classifier using 10 objects per class.

Example 22. Different dissimilarities

Sometimes objects are not given by features but directly by dissimilarities. Examples are the

distance matrices between 400 images of hand-written digits 3 and 8. They are based on

four different dissimilarity measures: Hausdorff, modified Hausdorff, blurred Euclidean and

Hamming. Load a dataset d by load hamming38. It can be split in sets for training and

testing by

>> [dtr,dte,i] = gendat(d,10); dtr = dtr(:,i); dte = dte(:,i);

The dataset dtr is now a 20 × 20 dissimilarity matrix and dte is a 380 × 20 matrix based on

the same representation set. A simple trick to find the 1-NN error of dte based on the given

distances is

>> (1-dte)*testc
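The trick works because subtracting from 1 turns the smallest distance in a row into the largest value, so the maximum rule picks the nearest object of the representation set. The same decision can be sketched in base Matlab. The distances and labels below are made up, and the sketch assumes that each column belongs to one labelled prototype.

```matlab
% 1-NN decision from a matrix of given distances (illustration only).
D = [0.2 0.9; 0.8 0.1; 0.4 0.6];   % 3 test objects x 2 prototypes
proto_lab = [3; 8];                % label of each prototype (column of D)
true_lab  = [3; 8; 8];             % true labels of the test objects
[dummy, nearest] = min(D, [], 2);  % index of the closest prototype per object
est_lab = proto_lab(nearest);      % each object takes that prototype's label
err = mean(est_lab ~= true_lab);   % 1-NN error on the test objects
```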

A classifier in the representation space can be trained on dtr and tested by dte as:

>> dte*fisherc(dtr)*testc

Exercise 29. Learning curves for dissimilarity representations

Consider four dissimilarity representations for 400 images of handwritten digits of 3 and

8: hamming38, blur38, haus38 and modhaus38. Which of the dissimilarity measures are

Euclidean and which not (goldfarbm)? Try to find out the most discriminative measure for

learning in dissimilarity spaces. For each distance dataset d, split it randomly into the train

and test dissimilarity data (see Example 22), select randomly a number of prototypes and

train a linear classifier (e.g. fisherc, ldc, loglc). Find the test error. Repeat it e.g. 20

times and average the classification error. Which dissimilarity data allows for reaching the

best classifier performance? Do the results depend much on the number of prototypes chosen?

Exercise 30. Dissimilarity application on spectra

Two datasets with spectral measurements from a plastic sorting application are provided:

spectra_big and spectra_small. spectra_big contains 16 classes and spectra_small two

classes. The spectra are sampled to 120 wavelengths (features). You may visualize spectral

measurements, stored in a dataset by using the plots command.

Three different dissimilarity measures are provided, specific to the spectra data:

dasam: Spectral Angle Mapper measures the angle between two spectra interpreted as points

in a vector space (robust to scaling)

dkolmogorov: Kolmogorov dissimilarity measures the maximum difference between the cumulative distributions (the spectra should be appropriately scaled to be interpreted as such)

dshape: Shape dissimilarity measures the sum of absolute differences (city block distance)

between the smoothed derivatives of the spectra (uses the Savitzky-Golay algorithm)

Compute a dissimilarity matrix d for the measures described. The nearest-neighbour error may

be estimated by using the leave-one-out procedure by the nne routine. In order to evaluate

other types of classifiers, a cross-validation procedure must be carried out. Note that cleval

cannot be used for dissimilarity matrices! Use the crossvald routine instead.

Using the cross-validation approach (crossvald), estimate the performance of the nearest

neighbour classifier with one randomly selected prototype per class. To do that use the

minimum distance classifier mindistc. nne will not work here. Repeat the same for a larger

number of prototypes. Test also the full nearest neighbour (with as many prototypes as

possible) and a Fisher linear discriminant (fisherc), trained in a dissimilarity space. Find

out if fisherc outperforms the nearest neighbour rule and if so, how many prototypes suffice

to reach this point?

Running Exercise 5. NIST Digit dissimilarities

Load a dataset A of 200 NIST digits for the classes 1 and 8.

Select by gendat at random a dataset B of one sample per class.

Use hausdm to compute the standard and modified Hausdorff distances between A and B.

Study the scatterplots.

There are several ways to perform feature extraction. Some common approaches are:

PCA on the complete dataset. This is unsupervised, so it does not use class information.

It only tries to describe the variance in data. In PRTools, this mapping can be trained

by using pca on a (labeled or unlabeled) dataset: e.g. w = pca(a,2) finds a mapping

to 2 dimensions. scatterd(a*w) plots these data.

PCA on the classes. This is supervised as it makes use of class labels. The PCA is computed on the average of the class covariance matrices. In PRTools, this mapping can be

trained by using klm (Karhunen Loeve mapping) on a labeled dataset a: w = klm(a,2)

Fisher mapping. This tries to maximise the between scatter over the within scatter of

the different classes. It is, therefore, supervised: w = fisherm(a,2)

23.1 Apply the three methods on mfeat_pix and investigate if, and how, the mapped results

differ.

23.2 Perform plot(pca(a,0)) to see a plot of the relative cumulative ordered eigenvalues

(normalised sum of variances). In what range lies the intrinsic dimensionality?

23.3 After mapping the data, use some simple classifiers to investigate how the choice of the

mappings influences the classification performance in the 2-dimensional feature spaces.

Exercise 31. Eigenfaces and Fisherfaces

The linear mappings used in the example above may also be applied to image datasets in

which each pixel is a feature, e.g. the Face-database containing images of 92 × 112 pixels. An

image is now a point in a 10304 dimensional feature space.

31.1 Load a subset of 10 classes by a = faces([1:10],[1:10]). The images can be displayed by show(a).

31.2 Plot the explained variance for the PCA as a function of the number of components.

When and why does this curve reach the value 1?

31.3 For each of the three mappings, make a 2D scatter plot of all data mapped on the first

two vectors. Try to understand what you see.

31.4 The PCA eigenvector mapping w points to positions in the original feature space called

eigenfaces. These can be displayed by show(w). Display the first 20 eigenfaces computed by

pca as well as by klm and the Fisherfaces of the dataset.

Exercise 32. Supervised linear feature extraction

In this exercise, you will experiment with pre-programmed versions of canonical correlation

analysis, partial least squares and linear discriminant analysis.

32.5 Load the iris dataset. This dataset has labels, but we will convert these to real-valued targets:

>> load iris

>> b = setlabtype(a,'targets');

Dataset b now contains real-valued target vectors.

Make a scatterplot of a and the targets in b; you can extract the targets using gettargets(b).

What do you notice about the targets in the scatterplot?

32.6 Calculate a canonical correlation analysis (CCA) between the data and targets in b by
[wd,wt] = cca(b,2). Make a scatterplot of the data projected using wd and the targets using wt.
Can you link what you see to what you know about CCA?

32.7 Calculate a 2D linear discriminant analysis (LDA) on a using fisherm. Plot the mapped data
and compare to the data mapped by CCA. What do you notice?

32.8 Calculate a partial least squares (PLS) mapping, using pls. Plot the mapped data and the
mapped target values, like you did for CCA. Do you see any differences between this mapping
and the one by CCA? What do you think causes this?

Exercise 33. Embeddings

Load the swiss-roll data set, swissroll. It contains 1000 samples on a 3D Swiss-roll-like

manifold. Visualise it using scatterd(a,3) and rotate the view to inspect the structure. The

labels are there just so that you can inspect the manifold structure later; they are not used.

33.9 Apply locally linear embedding (LLE) using the lle function. This function is not a

PRTools command: it outputs the mapped objects, not a mapping. Plot the resulting 2D

embedded data. What do you notice?

The default value for the number of neighbours to use, k, is 10. What value gives better

results?

You can also play with the regularisation parameter (the fourth one). Try some small values,

e.g. 0.001 or 0.01.

33.10 (*) Some routines are given to:

perform a kernel PCA (kernelm) and plot it (plotm);

train a self-organising map (som) and display it (plotsom);

perform multi-dimensional scaling (mds);

perform Isomap (isomap).

Read their help and try to apply them to the swissroll data or your favourite dataset. If

the functions take too much time, you can try to first select a subset of the data.

Exercise 34. Feature Evaluation

The routine feateval can be used to evaluate feature sets according to a criterion. For a

given dataset, it returns either a distance between the classes in the dataset or a classification

accuracy. In both cases, large values mean good separation.

34.11 Load the dataset biomed. How many features does this dataset have? How many

possible subsets of two features can be made from this dataset? Make a script which loops

through all possible subsets of two features and that creates for each combination a new dataset

b. Use feateval to evaluate b using the Euclidean distance, the Mahalanobis distance and

the leave-one-out error for the one-nearest neighbour rule.

34.12 Find, for each of the three criteria, the two features that are selected by individual

ranking (use featseli), by forward selection (use featself) and by the above procedure

that finds the best combination of two features. Compute for each set of two features the

leave-one-out error for the one-nearest neighbour rule by testk.

Exercise 35. Feature Selection

Load the glass dataset. Rank the features by the sum of the Mahalanobis distances, using individual selection (featseli), forward selection (featself) and backward selection

(featselb). The selected features can be retrieved from the mapping w by:

>> w = featseli(a,'maha-s');

>> getdata(w)

Compute for each feature ranking an error curve for the Fisher classifier by clevalf.

>> rand('seed',1); e = clevalf(a*w,fisherc,[],[],5)

The random seed is reset to make the results for different feature sequences w comparable.

The command a*w reorders the features in dataset a according to w. In clevalf, the classifier

is trained by a bootstrapped version of the given dataset. The remaining objects are used for

testing. This is repeated 5 times. All results are stored in a structure e that can be visualised

by plotr(e).

Plot the result for the three feature sequences obtained by the three selection methods in a

single figure by plotr. Compare this error plot with a plot of the maha-s criterion value

as a function of the feature size (use feateval).

Exercise 36. Feature scaling

Besides classifiers that are hampered by a large number of features, some classifiers are sensitive
to the scaling of the individual features. This can be studied by an experiment in which the
data is well scaled and one in which the data is badly scaled.

In relation with sensitivity to badly scaled data, three types of classifiers can be distinguished:

1. classifiers that are scaling independent

2. classifiers that are scaling dependent, but that can compensate badly scaled data by

large training sets.

3. classifiers that are scaling dependent, that cannot compensate badly scaled data by large

training sets.

First, generate a training set of 400 points for two normally distributed classes with common

covariance matrix, as follows:

>> a = gauss(400,[0 0; 2 2],eye(2))

Prepare another dataset b by scaling down the second dimension of dataset a as follows:

>> x = +a; x(:,2) = x(:,2).*0.01; b = setdata(a,x);

Study the scatter plot of a and b (e.g. scatterd(a)) and note the difference when the scatter

plot of b is scaled properly (axis equal).

Which of the following classifiers belong to which type (1, 2 or 3)?

nearest mean (nmc),

1-nearest neighbour (knnc([],1)),

LESS (lessc([],1e6)), and

the Bayes classifier assuming normal distributions (qdc)?

(Note that for LESS, we set the C parameter high to stress satisfaction of the constraints for

correct training object classification). It may help if you plot the decision boundaries in the

scatter plots of a and b and play with the training set size.

Verify your answer by the following experiment:

Generate an independent test set c and compute the learning curves (i.e. an error curve

as function of the size of the training set) for each of the classifiers. Use training sizes of

5,10,20,50,100 and 200 objects per class. Plot the error curves.

Use scalem for scaling the features on their variance. For a fair result, this should be computed

on the training set b and applied to b as well as to the test set c:

>> w = scalem(b,'variance'); b = b*w; c = c*w;
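The point of estimating the scaling on the training set only can be shown in base Matlab. The sketch below assumes that scaling on the variance means shifting each feature to zero mean and dividing by its standard deviation; whether scalem does exactly this should be checked in help scalem. The data values are made up.

```matlab
% Scaling parameters estimated on the training set, applied to both sets
% (illustration only).
btrain = [1 100; 2 300; 3 200];    % training objects, badly scaled features
ctest  = [2 400];                  % one test object
mu = mean(btrain, 1);              % per-feature mean of the training set
s  = std(btrain, 0, 1);            % per-feature standard deviation
mtr = size(btrain, 1);
mte = size(ctest, 1);
btrain_s = (btrain - repmat(mu, mtr, 1)) ./ repmat(s, mtr, 1);
ctest_s  = (ctest  - repmat(mu, mte, 1)) ./ repmat(s, mte, 1);
```

The test set is transformed with the training mean and standard deviation, never with its own, so no information from the test objects leaks into the preprocessing.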

Compute and plot the learning curves for the scaled data as well. Which classifier(s) are

independent of scaling? Which classifier(s) can compensate bad scaling by a large training

set?

Exercise 37. High dimensional data

In this exercise, you will experiment with datasets for which the number of features is substantially higher than the number of training objects. For this type of dataset, most traditional

classifiers are not suitable.

37.13 First, load the colon dataset and estimate the performance of the nearest mean

classifier, by cross-validation. Set the number of repetitions for the cross-validation function

higher (e.g. to 3) to get a more stable performance estimate.

The LESS classifier is a nearest mean classifier with feature scaling. It has an additional

parameter to balance data fit and model complexity.

37.14 Estimate the best C parameter setting for the LESS classifier using cross-validation on

the entire training set. The number of effectively used features can be inspected as follows:

>> w = lessc(a,C);

>> d = getdata(w);

>> d.nr


37.15 Now, estimate the generalisation performance of the LESS classifier with optimised

C parameter. Note that for an unbiased performance estimate, the C parameter should be

optimized in each sample of the crossvalidation separately. Use the functions nfoldsubsets,

nfoldselect, and testc to do the performance estimation through cross-validation. See how

cross-validation can be implemented with these functions in nfoldexample.m.
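The requirement that the parameter is optimised inside each cross-validation sample can be sketched as two nested loops. The code below is plain Matlab and is not the content of nfoldexample.m. The data, the candidate values cand and the threshold "classifier" errfun are hypothetical stand-ins: the threshold t plays the role of the C parameter, and it needs no further training, which keeps the sketch short.

```matlab
% Hypothetical 1-D data: the "classifier" labels x > t as class 2.
x = [0.1 0.4 0.5 0.9 1.1 1.6 1.8 2.2 2.4 2.9]';
y = [1   1   1   1   2   1   2   2   2   2  ]';
cand  = [0.5 1.0 1.5 2.0];                 % candidate parameter values
nfold = 5;
fold  = mod((0:numel(x)-1)', nfold) + 1;   % fixed fold assignment

% Error of threshold t on objects xs with true labels ys.
errfun = @(t, xs, ys) mean((1 + (xs > t)) ~= ys);

outer_err = zeros(nfold, 1);
for f = 1:nfold
    tr = fold ~= f;  te = ~tr;             % outer split
    xtr = x(tr);  ytr = y(tr);  ftr = fold(tr);
    % Inner loop: select t using the outer training part only.
    inner = zeros(size(cand));
    for j = 1:numel(cand)
        for g = setdiff(1:nfold, f)
            va = ftr == g;                 % inner validation fold
            inner(j) = inner(j) + errfun(cand(j), xtr(va), ytr(va));
        end
    end
    [~, best] = min(inner);
    % The outer test fold is used once, after the parameter is fixed.
    outer_err(f) = errfun(cand(best), x(te), y(te));
end
cv_error = mean(outer_err);
```

The outer test fold never takes part in choosing the parameter, which is what keeps the estimate unbiased.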

37.16 In this exercise, you will work again with the colon dataset. First reduce the number

of features to 50 as follows:

>> labs = getnlab(a);
>> m1 = mean(+a(labs==1,:),1);
>> m2 = mean(+a(labs==2,:),1);
>> [dummy,ind] = sort(-abs(m1-m2));
>> a = a(:,ind(1:50));

Choose a suitable feature selection method and estimate the generalisation performance of the

nearest mean classifier with an optimized number of features. For an unbiased performance

estimate, the feature subset should be optimized in each cross-validation sample, as in the

previous exercise.

Is there a large difference when the performance is estimated on the features that are optimised

on the whole dataset?

37.17 Some routines are given to:

train a LASSO classifier (lassoc);

train a LIKNON classifier (liknonc).

Read their help and try to apply them to a high-dimensional dataset.


Generate a 10-dimensional dataset by gendatd. Use cleval with repetition factor

10 to study the learning curves of fisherc and qdc for sample sizes between 2 and 20 objects

per class as plotted by plotr. Note that on the horizontal axis the sample size per class is

listed. Explain the maxima.

Study also the learning curve of fisherc for the dataset mfeat_kar. Where is the maximum

in this curve and why?

Exercise 39. Regularization

Use again a 10-dimensional dataset generated by gendatd. Define three classifiers: w1 = qdc,

w2 = qdc([],1e-3,1e-3) and w3 = qdc([],1e-1, 1e-1). Name them differently using

setname. Combine them in a cell array and compute and plot the learning curves between 2

and 20 objects. Study the effect of regularization. What is gained and what is lost?

Example 24. Support Vectors - an illustration

The routine svc can be used for building linear and nonlinear support vector classifiers.

Generate a 2-dimensional dataset of 10 objects per class

>> a = gendatd([10 10])

Compute a linear support vector classifier by

>> [w,J] = svc(a)

In J the indices to the support objects are stored. Plot data, classifier and support objects

by:

>> scatterd(a)

>> plotc(w)

>> hold on; V = axis; scatterd(a(J,:),'o'); axis(V);

Repeat all this for 50 objects per class generated for the Banana set by gendatb, using a 3rd

order polynomial classifier. A 3rd order polynomial support vector classifier can be obtained

by setting the kernel to a polynomial kernel, with degree 3: [w,J] = svc(a,'p',3).

Replace the polynomial kernel by other kernels (use help svc and help proxm to see what

possibilities you have).

Exercise 40. Support Vectors

Add the support vector classifier to exercise 39 and repeat it. Tricky question: How does the

complexity of the support vector classifier depend on the trade-off parameter C (which weighs
the errors against ||w||^2)?

Exercise 41. Classification Error

Generate a training set of 50 objects per class and a test set of 100 objects per class, using

gendatb. Train several support vector classifiers with an RBF kernel using different width

values sigma. Compute for each of the classifiers the error (on the test set) and the number


of support vectors. Make a plot of the error and the number of support vectors as a function

of sigma. How well can the optimal sigma be predicted by the number of support vectors?

Exercise 42. Support Objects

Load a two class digit recognition problem by a = seldat(nist16,[1 2],[],[1:50]). Inspect it by the show command. Project it on a 2D feature space by PCA and study the

scatter plot. Find a support vector classifier using a quadratic polynomial kernel. Visualise

the classifier and the support objects in the scatter plot. Look also at the support objects

themselves by the show command. What happens with the number of support objects for

higher numbers of principal components?

Running Exercise 6. NIST Digit classifier complexity

Load a dataset A of 200 NIST digits for the classes 3 and 5.

Compute the Zernike moments:

Split the data in a training set of 25 objects per class and a test set.

Order the features on their individual performance.

Compute feature curves for the classifiers nmc, ldc and qdc.


One-Class Classifiers

The following classifiers are a subset of the available classifiers that can be used to solve

one-class classification problems:

gauss_dd     Gaussian data description
mog_dd       Mixture-of-Gaussians data description
parzen_dd    Parzen data description
nndd         Nearest neighbour data description
kmeans_dd    k-means data description
pca_dd       Principal Component Analysis data description
incsvdd      (Incremental) Support vector data description
sdroc        ROC estimation using the PRSD toolbox
sddrawroc    Interactive ROC plot and selection of an operating point

Use help to get an idea what these routines do. Notice that all the classifiers have the same

structure: the first parameter is the dataset and the second parameter is the error on the

target class. The next parameters set the complexity of the classifier (if it can be influenced

by the user; for instance the k in the k-means data description) or influence the optimization

of the method (for instance, the maximum number of iterations in the Mixture of Gaussians).

Before these routines can be used on a data set, the class labels in the datasets should

be changed to target and (possibly) outlier. This can be done using the routines

target_class and oc_set. Outliers can, of course, only be specified if they are available.

Exercise 43. Fraction target reject

Take a two-class dataset (e.g. gendatb, gendath) and convert it to a one-class dataset using

target_class. Use the one-class classifiers given above to find a description of the data.

Make a scatterplot of the data and plot the classifiers. Firstly, experiment with different

values for the fraction of target data to be rejected. What is the influence of this parameter

on the shape of the decision boundary?

Secondly, vary the other parameters of the incsvdd, kmeans_dd, parzen_dd and mog_dd.

These parameters characterise the complexity of the classifiers. How does that influence the

decision boundary?

Exercise 44. ROC curve

Generate a new one-class dataset a using oc_set (so that the dataset contains both target

and outlier objects), and split it in a train and test set. Train a classifier w on the training

set, and plot the decision boundary in the scatterplot.

Make a new figure, and plot the ROC curve there using:

>> h = plotroc(w,a);

There should be a fat dot somewhere on the ROC curve. This is the current operating point.

By moving the mouse and clicking on another spot, the operating point of the classifier can

be changed. The updated classifier can be retrieved by w2=getrocw(h).


Change the operating point of the classifier, and plot the resulting classifier again in the

scatterplot. Do you expect to see this new position of the decision boundary?

Exercise 45. Handwritten digits dataset

Load the NIST16 dataset (a = nist16). Choose one of the digits as the target class and all

others as the outlier class using oc_set. Build a training set containing a fraction of the target

class and a test set containing both the remainder of the target class and the entire outlier

class. Compute the error of the first and second kind (dd_error) for some of the one-class

classifiers. Why do some classifiers crash, and why do other classifiers work?

Plot receiver-operator characteristic curves (dd_roc) for those classifiers in one plot. Which

of the classifiers performs best?

Compute for the classifiers the area under the ROC curve (dd_auc). Does this measure confirm

your own preference?
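To see what the area under the ROC curve summarises, it can be computed by hand for a few scores. This is a minimal sketch in plain Matlab, not the dd_roc or dd_auc implementation. The score vectors are hypothetical, and it is an assumption of the sketch that a higher score means "more target-like".

```matlab
% Hypothetical classifier outputs: higher score = more target-like.
s_target  = [0.9 0.8 0.6 0.4];
s_outlier = [0.7 0.3 0.2];

% Sweep the acceptance threshold over all observed scores.
thr = sort([s_target s_outlier], 'descend');
tpr = arrayfun(@(t) mean(s_target  >= t), thr);   % fraction of targets accepted
fpr = arrayfun(@(t) mean(s_outlier >= t), thr);   % fraction of outliers accepted

% Area under the curve by the trapezoidal rule, starting from (0,0).
auc = trapz([0 fpr], [0 tpr]);

fprintf('AUC = %.4f\n', auc);
```

For these scores the area equals the fraction of (target, outlier) pairs in which the target receives the higher score, 10 out of 12, which is why the AUC is independent of any single operating point.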

Example 26. Outlier robustness

In this example and the next exercise we will investigate the influence of the presence of an

outlier class on the decision boundary. In this example data is classified using support vector

data description (incsvdd).

Run the routine: sin_out(4,3) This routine creates target data from a sinusoid distribution,

places an outlier at (x,y) (here (x,y) = (4,3)) and calculates a data description.

Investigate the influence of the outlier on the shape of the decision boundary by changing its

position.

Exercise 46. Outlier robustness

Investigate the influence of an outlier class on a decision boundary for other one-class classifiers.

Convert a two-class dataset (e.g. gendath) to a one-class dataset by changing all labels to

target (e.g. using target_class(+a) or oc_set(+a)). Find a decision boundary for just

the target class.

Manually add outliers to your dataset. Compare the decision boundaries.

Exercise 47. Outliers in handwritten digits dataset

Load the Concordia dataset using the routine concor_data. Convert the entire data set to

a target class (this time the target class consists of all digits) and split it into a train and test

set.

Train a one-class classifier w on the train set. Check the performance of the classifier on the

test set z and visualise those digits classified as outliers:

>> zt = target_class(z);
>> labzt = zt*w*labeld;           % classify the target objects
>> [It,Io] = find_target(labzt);  % find which are labeled outlier
>> show(zt(Io,:))                 % show the outlier objects

Repeat this, but before training the classifier, apply a PCA mapping, retaining 95% of the


Exercise 48. AUC for imbalanced classes

Load the heart dataset and convert it to a one-class dataset. Now extract a training set using

70% of the data, and put the rest in a test set. Train a standard quadratic classifier (qdc)

and the AUC optimizer auclpm. (You can use the default settings: w=auclpm(trainset).)

Compute the ROC curve of both classifiers and plot them. Is there a large difference in

performance?

Now reduce the training set size for one of the classes to 10% of the original size, by

trainsetnew = gendat(trainset,[0.999 0.1]). Train both classifiers again and plot their

ROC curves. What has changed? Can you explain that?

Do the same experiments, but now replace the quadratic classifier by a linear classifier. What

are the differences with the qdc? Explain!

Exercise 49. Kernelizing mappings or classifiers

Generate the Highleyman dataset and train a (linear) AUClpm. Plot the data and the mapping

in a scatterplot (use plotm) and see that the linear classifier does not really fit well.

Now kernelize the mapping by preprocessing the dataset through proxm:

>> w_u = proxm([],'r',2)*auclpm;

>> w = a * w_u;

Also plot the new kernelized mapping.

The kernelized auclpm selects prototypes instead of features. Extract the indices of the prototypes by selecting the indices of the non-zero weights in the mapping by:

>> I = find( abs(w.data2.data.u)>1e-6 );

These support objects for the auclpm can now be plotted by

>> hold on; scatterd(a(I,:),'o')

Try to kernelize other classifiers: which ones work well, and which ones don't? Explain why.

10 Classifier combining

If w is a classifier then the output of a*w*classc can be interpreted as estimates for the

posterior probabilities of the objects in a. Different classifiers produce different posterior

probabilities. This is illustrated by the following example. Generate a dataset of 50 points per

class by gendatb. Train two linear classifiers w1, e.g. by nmc, and w2, e.g. by fisherc. The

posterior probabilities can be found by p1 = a*w1*classc and p2 = a*w2*classc. They

can be combined in one dataset p = [p1 p2] which has four features (why?). Make a scatter

plot of the features 1 and 3. Study this plot. The original classifiers correspond to horizontal

and vertical lines at 0.5. There may be other straight lines, combining the two classifiers, that

perform better.

Example 28. Classifier combining strategies

PRTools offers three ways of combining classifiers, called sequential, parallel and stacked.

In sequential combining classifiers operate directly on the outputs of other classifiers, e.g.

w = w1*w2. So the features of w2 are the outputs of w1.

In stacked combining typically classifiers computed for the same feature space are combined. They are constructed by w = [w1, w2, w3]. If applied by a*w the result is

p = [a*w1 a*w2 a*w3].

In parallel combining typically classifiers computed for different feature spaces are combined.

They are constructed by w = [w1; w2; w3]. If applied by a*w then a should be the combined dataset a = [a1 a2 a3], in which a1, a2 and a3 are datasets defined for the feature

spaces in which w1, w2, respectively w3 are found. As a result, p = a*w is equivalent with

p = [a1*w1 a2*w2 a3*w3].

Parallel and stacked combining are usually followed by combining. The above constructed

datasets of posterior probabilities p contain multiple columns (features) for each of the classes.

Combining reduces this to a single set of posterior probabilities, one for each class, by combining all columns referring to the same class. PRTools offers the following fixed rules:

maxc       maximum selection
minc       minimum selection
medianc    median selection
meanc      mean combiner
prodc      product combiner
votec      voting combiner
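What the fixed rules do to the columns of p can be worked out on a tiny example. The sketch below is plain Matlab and does not call the PRTools combiners. The posterior matrices p1, p2 and p3 are hypothetical outputs of three base classifiers for two objects and two classes.

```matlab
% Hypothetical posteriors of three base classifiers for two objects;
% columns are [class1 class2].
p1 = [0.6 0.4; 0.6 0.4];
p2 = [0.7 0.3; 0.6 0.4];
p3 = [0.4 0.6; 0.01 0.99];
P  = cat(3, p1, p2, p3);           % objects x classes x classifiers

p_mean = mean(P, 3);               % mean rule
p_prod = prod(P, 3);               % product rule (unnormalised)
p_max  = max(P, [], 3);            % maximum rule

% Final label: class with the largest combined output.
[~, lab_mean] = max(p_mean, [], 2);
[~, lab_prod] = max(p_prod, [], 2);
[~, lab_max]  = max(p_max,  [], 2);

% Voting: each classifier votes for its own most probable class.
[~, votes] = max(P, [], 2);        % objects x 1 x classifiers
lab_vote = mode(squeeze(votes), 2);
```

For the second object the mean, product and maximum rules follow the single very confident classifier and choose class 2, while voting follows the two moderately confident classifiers and chooses class 1.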

If the so-called base classifiers (w1, w2, . . .) do not produce posterior probabilities, but for

instance distances, then these combining rules operate similarly. Some examples:

28.1

Generate a small dataset, e.g. a = gendatb; and train three classifiers, e.g.

w1 = nmc(a)*classc, w2 = fisherc(a)*classc, w3 = qdc(a)*classc. Create a combining classifier v = [w1, w2, w3]*meanc. Generate a testset b and compare the performances

of w1, w2, w3 individually with that of v. Inspect the architecture of the combined classifier

by parsc(v).


28.2

Load three of the mfeat datasets and generate training and test sets: e.g.

>> a = mfeat_zer; [b2,c2] = gendat(a,0.25)

>> a = mfeat_mor; [b3,c3] = gendat(a,0.25)

Note the differences in feature sizes of these sets. Train three nearest mean classifiers

>> w1 = nmc(b1)*classc; w2 = nmc(b2)*classc; w3 = nmc(b3)*classc;

and compute the combined classifier

>> v = [w1; w2; w3]*meanc

Compare the performance of the combining classifier with the three individual classifiers:

>> [c1 c2 c3]*v*testc

>> c1*w1*testc, c2*w2*testc, c3*w3*testc

28.3 Instead of using fixed combining rules like maxc, it is also possible to use a trained

combiner. In this case the outputs of the base classifier are used to train a combining classifier

like nmc or fisherc. This demands the following operations:

>> a = gendatb(50)
>> w1 = nmc(a)*classc, w2 = fisherc(a)*classc, w3 = qdc(a)*classc
>> a_out = [a*w1 a*w2 a*w3]
>> v1 = [w1 w2 w3]*fisherc(a_out)

>> v = [nmc*classc fisherc*classc qdc*classc]*fisherc

Such a classifier can simply be trained by v2 = a*v

Exercise 50. Stacked combining

Load the mfeat_zer dataset and split it into a training and a test set of equal size. Use the

following classifiers: nmc, ldc, qdc, knnc([],3), treec. Determine the performance of each

of them. Try to find a combining classifier that performs better than the best one.

Exercise 51. Parallel combining (optional)

Load all mfeat datasets. Split the data into a training and a test set of equal size. Make
sure that these sets relate to the same objects, e.g. by resetting the random seed each time by
rand('seed',1) before calling gendat. Compute for each dataset the nearest mean classifier
and estimate their performances. Try to find a combining classifier that performs better
than the best one.

Exercise 52. Bootstrapping and averaging (optional)

The routine baggingc computes a set of classifiers on a single training set by bootstrapping

and averaging all coefficients. Compare the performance of a simple classifier like nmc with

its bagged version for a 2-dimensional dataset of 20 objects generated by gendatd. Use a

test set of 200 objects. Study the performance for bagging sets of sizes between 10 and 200.

Exercise 53. Bootstrapping and aggregating (optional)

The routine baggingc can also be used to combine a set of classifiers based on bootstrapping,
using the posterior probability estimates. Combining rules like voting, min, max, mean, and
product can be used. Compare the performance of a simple classifier like nmc with its bagged
version for a dataset generated by gendatd. Study the scatter and classifier plots.

Running Exercise 7. NIST Digit classifier combining

Load a dataset A of 500 NIST digits for the classes 3 and 5.

Compute the Hu moments:

Split the data in a training set of 100 objects per class and a test set.

Generate at random 10 subdatasets of 25 objects per class from the training set and compute

the nmc for each of them.

Combine the 10 classifiers by various combining rules.

Compare the final classifiers with a nmc computed for the total training set by their performances on the test set.

11 Boosting

A decision stump is a simplified decision tree, trained to a small depth, usually just for a

single split. The command stumpc constructs a decision tree classifier until a specified depth.

Generate objects according to the banana dataset (gendatb), make a scatterplot and plot in

it the decision stump classifiers for the depth levels 1, 2 and 3. Estimate the classification

errors using an independent test set and compare the plots and the resulting error with a full

size decision tree (treec).

Example 30. Weak classifiers

A family of weak classifiers is available by the command W = weakc(A,ALF,ITER,R) in

which ALF (0 < ALF < 1) determines the size of a randomly selected subset of the training

set A to train a classifier determined by R (R = 0: nmc, R = 1: fisherc, R = 2: udc,
R = 3: qdc). In total ITER classifiers are trained and the best one according to the total set A

is selected and returned in W. Define a set of linear classifiers (R = 0,1) for increasing ITER,

and include the strong version of the classifier:

v1 = weakc([],0.5,1,0); v1 = setname(v1,'weak0-1');
v2 = weakc([],0.5,3,0); v2 = setname(v2,'weak0-3');
v3 = weakc([],0.5,20,0); v3 = setname(v3,'weak0-20');
w={nmc,v1,v2,v3};

Generate some datasets, e.g. by a=gendath and a=gendatb. Train and plot these classifiers

by W = a*w and plotc(W) in the scatterplot (scatterd(a)).

Exercise 54. Weak classifiers learning curves

Compute and plot learning curves for the Highleyman data averaged over 5 iterations of

crossvalidation for the above defined set of classifiers. Compute and plot learning curves for

the circular classes (gendatc) averaged over 5 iterations of crossvalidation for a set of quadratic

weak classifiers.

Example 31. Adaboost

The Adaboost classifier [W,V] = adaboostc(A,BASE-CLASSF,N,COMB-RULE) uses the untrained (weak) classifier BASE-CLASSF for generating N base classifiers by the training set
A, iteratively updating the weights for the objects in A. These weights are used as object
prior probabilities for generating subsets of A for training. The entire set of base classifiers is
returned in V. They are combined by COMB-RULE into a single classifier W. The default is the
standard weighted voting combiner.
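The iterative weight update can be illustrated for a single boosting round. The sketch below follows the textbook discrete AdaBoost formulas, which is an assumption: the details inside adaboostc may differ. The labels y and the weak classifier outputs h are hypothetical.

```matlab
% One round of textbook discrete AdaBoost (assumed form, not adaboostc itself).
y = [ 1  1  1 -1 -1 -1];       % true labels (hypothetical)
h = [ 1  1 -1 -1 -1  1];       % labels predicted by a weak classifier
w = ones(1, 6) / 6;            % uniform initial object weights

err   = sum(w(h ~= y));                 % weighted error of the weak classifier
alpha = 0.5 * log((1 - err) / err);     % its weight in the final vote
w     = w .* exp(-alpha * y .* h);      % raise weights of misclassified objects
w     = w / sum(w);                     % renormalise to sum to one

fprintf('alpha = %.4f, weight on errors = %.2f\n', alpha, sum(w(h ~= y)));
```

After the update the misclassified objects together carry half of the total weight, so the next base classifier is pushed towards the objects the current one got wrong.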

Study the Adaboost classifier for two datasets: gendatb and gendatc. Use as base classifiers

stumpc (decision stump), weakc([],[],1,1) and weakc([],[],1,2).

Plot the final classifier in the scatterplot by plotc(W,'r',3).
Plot also the unweighted voting combiner by plotc(V*votec,'g',3) and the trained Fisher combiner by
plotc(A*(V*fisherc),'b',3). It might be needed to improve the quality of the plotted

classifiers by giving gridsize(300), before plotc is executed.


Compute the Adaboost error curve for the sonar dataset for some numbers of boosting steps,

e.g. 5 and 100. (Advanced users may try to write a script that plots an entire error curve).

Use stumpc as a base-classifier and weighted voting for combining. Try to improve the result

by using other base classifiers and other combiners.


12

In the file userpaint an example is shown how a user can label interesting parts in an image,

train a classifier and visualize the resulting classification or detections in a new image.

In this example we will use a tiny database, which is a subset of a much larger image database collected by the University of Surrey. Three versions have been created. The first,
surrey_col_64, contains color 64 × 64 images, the second, surrey_grey_64, contains grey-level 64 × 64 images and the third, surrey_col_128, contains grey-level 128 × 128 images.

Load one of the sets by load surrey_col_64 and show the images by show a.

Look into the file userpaint and try to understand what steps are performed. Notice that the

pixels are just represented by their color or grey-level values (as defined in file userpreproc).

Run and play with it (you can change the training and test image, the dataset, the classifier

and the region that you paint).

Exercise 56. Supervised and unsupervised pixel classification

Edit the file userpaint and change the given one-class classifier into a two-class classifier.

What differences do you observe between a supervised and unsupervised classifier?

Exercise 57. Improving the pixel features (optional)

Invent more interesting features to represent individual pixels. Implement it in (your own

copy of) userpreproc. Does it improve the pixel classification?

Exercise 58. Color image segmentation by clustering

A full-colour image may be segmented by clustering the colour feature space. For example,

read the famous Lena image in a 256 × 256 version

>> a=lena;

>> show(a)

The image may be reconstructed as a full colour image by:

>> figure; imagesc(reshape(+a,256,256,3));

The 3 colours may be used to segment the image on its pixel values only. We use a small

subset for finding 4 clusters in the 3d colour space:

>> testset=gendat(a,500)             % create small test set
>> [d,w]=emclust(testset,nmc([]),4)  % cluster the data

The retrieved classifier w may be used to classify all image pixels in the colour space:

>> lab = classim(a,w);

>> figure

>> imagesc(lab)

>> aa=dataset(a,lab(:))
>> map=+meancov(aa)       % compute class means
>> colormap(map)          % set colour map accordingly

Note that the mean colours are very similar. Try to improve the result by using more clusters.

Exercise 59. Texture segmentation

A dataset a in the MAT file texturet contains a 256x256 image with 7 features

(bands): 6 were computed by some texture detector; the last one represents the original gray-level values. The data can be visualised by show(a,7). Segment the image by

[lab,w] = emclust(a,nmc,5). The resulting label vector lab may be reshaped into a label

image and visualised by imagesc(reshape(lab,a.objsize)). Alternatively, we may use the

trained mapping w, re-apply it to the original dataset a and obtain the labels by classim:

imagesc(classim(a*w)).

Investigate the use of alternative models (classifiers) in emclust such as the mixture of Gaussians (using qdc) or non-parametric approach by the nearest neighbour rule knnc([],1). How

do the segmentation results differ and why? The segmentation speed may be significantly increased if the clustering is performed only on a small subset of pixels.

Exercise 60. Improving spatial connectivity

The routine spatm concatenates for image feature datasets the feature space with the spatial

domain by performing a Parzen classifier in the spatial domain. The two results, feature space

classifier and spatial Parzen classifier may now be combined. Let us demonstrate the use of

spatm on a segmentation of a multi-band image emim31:

>> a = emim31;

>> trainset = gendat(a,500); % get a small subset

>> [lab,w] = emclust(trainset,nmc,3);

By applying the trained mapping w to the complete dataset a, we obtain a dataset with cluster

memberships:

>> b=a*w

16384 by 3 dataset with 1 class: [16384]

Let us now for each pixel decide on a cluster label and visualise the label image:

>> imagesc(classim(b));

This clustering was entirely based on per-pixel features and, therefore, neglects spatial connectivity. By using the spatm mapping, three additional features will be added to the dataset

b, each corresponding to one of three clusters:

>> c=spatm(b,2) % spatial mapping using smoothing sigma=2.0

16384 by 6 dataset with 1 class: [16384]

Let us visualise the resulting dataset c by show(c,3). The upper row renders three cluster

membership confidences estimated by the classifier w. The features in the lower row were

added by spatm mapping. Notice, that each of them is a spatially smoothed binary image

corresponding to one of the clusters. By applying a product combiner prodc, we obtain


an output dataset with three cluster memberships based on spectral-spatial relations. This

dataset defines a new set of labels:

>> out=c*prodc

16384 by 3 dataset with 1 class: [16384]

>> figure; imagesc(classim(out))

Investigate the use of other classifiers than nmc and the influence of different smoothing on

the segmentation result.

Exercise 61. Iterative spatial-spectral classifier (optional)

The previous exercise describes a single correction of spectral clustering by means of the spatial

mapping spatm. The process of combining the spatial-spectral may be iterated. The labels

obtained by combining the spatial and spectral domains may be used to train separate spectral and spatial classifiers again. Let us now implement a simple iterative segmentation and

visualise image labelings derived in each step:

>> trainset = gendat(a,500);

>> [lab,w]=emclust(trainset,nmc,3); % initial set of labels

>> for i=1:10, out=spatm(a*w,2)*prodc; imagesc(classim(out)); pause; ...

a=setlabels(a,out*labeld); w=nmc(a); end

Plot the number of label differences between iterations. How many iterations are needed to

stabilise the algorithm using different spectral models and spatial smoothing parameters?


13

available data sets

k     number of features (dimensionality)
m     number of samples (ma, mb for classes A and B), e.g. m = 20
c     number of classes, e.g. c = 2
u     class mean: (1,k) vector (ua, ub for classes A and B), e.g. u = [0,0]
v     variance value, e.g. v = 0.5
s     class feature deviations: (1,k) vector, e.g. s = [1,4]
G     covariance matrix, size (k,k), e.g. G = [1 1; 1 4]
a     dataset, size (m,k)
lab   label vector, size (m,1)

a = rand(m,k).*(ones(m,1)*s) + ones(m,1)*u
      uniform distribution
a = randn(m,k).*(ones(m,1)*s) + ones(m,1)*u
      normal distribution with diagonal covariance matrix (s.*s)
lab = genlab(n,lablist)
      ... for all values of i.
a = dataset(a,lab)
      ... a and a set of labels lab, one for each datavector.
      Feature labels can be stored in featlab.
a = gauss(m,u,G)
a = gencirc(m,s)
a = gendatc([ma,mb],k,ua)
a = gendatd([ma,mb],k,d1,d2)
a = gendath(ma,mb)
a = gendatm(m)
      ... distributed classes (the means are newly generated
      at random for each call)
a = gendats([ma,mb],k,d)
      ... d.
a = gendatl([ma,mb],v)
a = gendatk(a,m,n,v)
      ... dataset b using the n-nearest neighbor method.
      The standard deviation is v* the nearest neighbour distance
a = gendatp(a,m,v,G)
      random generation from a Parzen density distribution based on the dataset b
      and smoothing parameter v. In case G is given it is used as covariance
      matrix of the kernel
[a,b] = gendat(a,m)
      ... set a will have m objects per class, the remaining
      ones are stored in b.


In the table below, a list of datasets is given that can be stored in the variable a provided

prdatasets is added to the path, e.g.:

>> a = iris;
>> a
Iris plants, 150 by 4 dataset with 3 classes: [50  50  50]

gauss        Generation ...
gendatb      Generation of banana shaped classes in 2D
gendatc      Generation of circular classes
gendatd      Generation of two difficult classes
gendath      Generation of Highleyman classes in 2D
gendatl      Generation of Lithuanian classes in 2D
gendatm      Generation of 8 classes in 2D
gendats      Generation of two Gaussian distributed classes
gencirc      Generation of circle with radial noise in 2D
lines5d      Generation of three lines in 5D
boomerang    Generation of two boomerang-shaped classes in 3D

gendatk      ...
gendatp      Parzen density data generation
gendat       Generation of subsets of a given dataset

x80            45 by   8 with  3 classes: [15 15 15]
auto_mpg      398 by   6 with  2 classes: [229 169]
malaysia      291 by   8 with 20 classes
biomed        194 by   5 with  2 classes: [127 67]
breast        683 by   9 with  2 classes: [444 239]
cbands      12000 by  30 with 24 classes: [500 each]
chromo       1143 by   8 with 24 classes
circles3d     100 by   3 with  2 classes: [50 50]
diabetes      768 by   8 with  2 classes: [500 268]
ecoli         272 by   7 with  3 classes: [143 77 52]
glass         214 by   9 with  4 classes: [163 51]
heart         297 by  13 with  2 classes: [160 137]
imox          192 by   8 with  4 classes: [48 48 48 48]
iris          150 by   4 with  3 classes: [50 50 50]
ionosphere    351 by  34 with  2 classes: [225 126]
liver         345 by   6 with  2 classes: [145 200]
mfeat_fac    2000 by 216 with 10 classes: [200 each]
mfeat_fou    2000 by  76 with 10 classes: [200 each]
mfeat_kar    2000 by  64 with 10 classes: [200 each]
mfeat_mor    2000 by   6 with 10 classes: [200 each]
mfeat_pix    2000 by 240 with 10 classes: [200 each]
mfeat_zer    2000 by  47 with 10 classes: [200 each]
mfeat        2000 by 649 with 10 classes: [200 each]
nederland      12 by  12
ringnorm     7400 by  20 with  2 classes: [3664 3736]
sonar         208 by  60 with  2 classes: [97 111]
soybean1      266 by  35 with 15 classes
soybean2      136 by  35 with  4 classes: [16 40 40 40]
spirals       194 by   2 with  2 classes: [97 97]
twonorm      7400 by  20 with  2 classes: [3703 3697]
wine          178 by  13 with  3 classes: [59 71 48]

Routines for loading multi-band image based datasets (objects are pixels, features are image

bands, e.g. colours)

emim31      128 x 128 by 8
lena        480 x 512 by 3
lena256     256 x 256 by 3
texturel    128 x 640 by 7 with 5 classes: [128 x 128 each]
texturet    256 x 256 by 7 with 5 classes:

Routines for loading pixel based datasets (objects are images, features are pixels)

kimia
nist16      2000 by 16 x 16 with 10 classes: [200 each]
faces        400 by 92 x 112 with 40 classes: [ 10 each]

prdataset

prdata

Read data from file

Some datafiles:

delft_idb

256

delft_images 619

mnist

2000

nist

28000

orl

400

roadsigns

332

highway

100

flowers

1360

9

10

20

40

2

17

Delft Images

MNIST train set and testset of handwritten digits

Raw NIST handwritten digit database

Standard face database

Scenes with roadsigns

Pixel labeled highway scenes

Flower images


Spherical Set

Highleyman Dataset

4

3

0

1

3

4

6

6

a = gendath([50,50]); scatterd(a);

a = gendatc([50,50]); scatterd(a);

Simple Problem

Difficult Dataset

8

2

0

2

1

6

2

a = gendatd([50,50],2);

scatterd(a); axis(equal);

10

a = gendats([50,50],2,4);

scatterd(a); axis(equal);

Banana Set

Spirals

4

4

2

2

4

6

6

6

10

a = spirals; scatterd(a);

10

a = gendatb([50,50]); scatterd(a);

50

a = faces([1:10:40],[1:5]);

show(a);

a = nist16(1:20:2000);

show(a);

10000

9000

8000

7000

6000

5000

4000

3000

5000

6000

7000

8000

9000

10000 11000

sepal length

a = faces(1:40,1:10);

w = pca(a,2);

scatterd(a*w);


a = faces([1:40],[1:10]);

w = pca(a);

show(w(:,1:8));

Scatter-plot matrix of the iris features (sepal length, sepal width, petal length, petal width):

a = iris;
scatterd(a,'gridded');

a = texturet;
show([a getlab(a)],4);
