
Principles of Support Vector Machine (SVM) classification

SVM is a pattern recognition method that is widely used in data mining applications, and
provides a means of supervised classification, as do SIMCA and LDA. SVM was originally
developed for the linear classification of separable data, but is applicable to nonlinear data
with the use of kernel functions. SVMs are used in machine learning, optimization,
statistics, bioinformatics, and other fields that use pattern recognition. The algorithm
used within The Unscrambler is based on code developed and released under a
modified BSD license by Chih-Chung Chang and Chih-Jen Lin of the National Taiwan
University (Hsu et al., 2009).

What is SVM classification?


SVM is a classification method based on statistical learning in which a function that
describes a hyperplane for optimal separation of the classes is determined. Because a linear
function is not always able to model such a separation, the data are mapped into a new
feature space and a dual representation is used, with the data objects represented by their
dot products. A kernel function is used to map from the original space to the feature space,
and can take many forms, thus providing the ability to handle nonlinear classification
cases. The kernels can be viewed as a mapping of nonlinear data to a higher dimensional
feature space, while providing a computational shortcut by allowing linear algorithms to
work with the higher dimensional feature space. The support vectors are the reduced set of
training samples that remain after this mapping and that define the separation. The figure
below illustrates the principle of applying a kernel function to achieve separability.

In this new space SVM will search for the samples that lie on the borderline between the
classes, i.e. the samples that are ideal for separating the classes; these samples are
named support vectors. The figure below illustrates this: only the samples marked
with + for the two classes are used to generate the rule for classifying new samples.

A situation where SVM will perform well is when some classes are inhomogeneous and
partly overlapping, so that building local PCA models with all samples will not be
successful because one class may encompass other classes if all samples are used.
SVM will in this case find the set of samples that is most relevant for discriminating
between the classes, and is insensitive to samples far from the discrimination line.
SVM has advantages over classification methods such as neural networks, as it has a
unique solution and has less tendency to overfit compared to other nonlinear
classification methodologies. Of course, model validation is the critical aspect in
avoiding overfitting for any method. SVMs are effective for modeling nonlinear data,
and are relatively insensitive to variation in parameters. SVM uses an iterative training
algorithm to achieve separation of the different classes.
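
As a hedged illustration of these ideas, the following sketch uses Python with scikit-learn, whose SVC class wraps the same LIBSVM code base cited above (the library and example data are assumptions for illustration, not part of The Unscrambler). It fits an RBF-kernel classifier on nonlinearly separable data and inspects the support vectors.

```python
# Minimal sketch: kernel SVM classification and the resulting support vectors.
from sklearn.datasets import make_moons
from sklearn.svm import SVC

# Two nonlinearly separable classes (illustrative synthetic data).
X, y = make_moons(n_samples=200, noise=0.2, random_state=0)

# RBF kernel: implicitly maps the data to a higher dimensional feature space.
clf = SVC(kernel="rbf", C=1.0, gamma="scale")
clf.fit(X, y)

# Only the borderline samples (the support vectors) define the decision rule.
print("Support vectors per class:", clf.n_support_)
print("Indices of support vectors:", clf.support_)
```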

Two SVM classification types are available in The Unscrambler, based on different means
of minimizing the error function of the classification:

c-SVC: also known as Classification SVM Type 1.

nu-SVC: also known as Classification SVM Type 2.

In C-SVC classification, a capacity factor, C, can be defined. The value of C should be
chosen based on knowledge of the noise in the data being modeled. Its value can be
optimized through cross-validation procedures. When using nu-SVC classification, the nu
value must be defined (default value = 0.5). Nu serves as the upper bound of the fraction
of errors and is the lower bound for the fraction of support vectors.

Increasing nu will allow more errors, while increasing the margin of class separation.
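
A small sketch of the two classification types, again assuming scikit-learn purely for illustration: SVC corresponds to C-SVC with the capacity factor C, and NuSVC to nu-SVC with the nu parameter, whose lower bound on the fraction of support vectors can be checked directly.

```python
# Sketch: C-SVC vs nu-SVC and the meaning of nu.
from sklearn.datasets import make_classification
from sklearn.svm import SVC, NuSVC

X, y = make_classification(n_samples=300, n_features=10, n_informative=5, random_state=0)

c_svc = SVC(kernel="rbf", C=10.0).fit(X, y)      # capacity factor C must be > 0
nu_svc = NuSVC(kernel="rbf", nu=0.5).fit(X, y)   # default nu = 0.5

# nu is a lower bound for the fraction of support vectors.
frac_sv = len(nu_svc.support_) / len(y)
print(f"Fraction of support vectors: {frac_sv:.2f} (expected >= nu = 0.5)")
```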
The kernel type used to achieve the separation of classes can be chosen from the following
four options:

Linear

Polynomial

Radial basis function

Sigmoid

The linear kernel is set as the default option. If the number of variables is very large, the
data do not need to be mapped to a higher dimensional space and the linear kernel function is
preferred. The radial basis function is also a simple function and can model systems of
varying complexity; it can be viewed as an extension of the linear kernel.
If a polynomial kernel is chosen, the order of the polynomial must also be given. In SVM
classification, the best value for C is often not known a priori. Through a grid search,
applying cross validation to reduce the chance of overfitting, one can identify an optimal value
of C so that unknowns can be properly classified using the SVM model.
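
The sketch below (scikit-learn, illustrative only) shows the four kernel options, the polynomial degree, and a simple cross-validated scan over candidate values of C.

```python
# Sketch: the four kernel types and a cross-validated scan over C.
from sklearn.datasets import load_iris
from sklearn.model_selection import cross_val_score
from sklearn.svm import SVC

X, y = load_iris(return_X_y=True)

kernels = [
    SVC(kernel="linear"),          # preferred when the number of variables is very large
    SVC(kernel="poly", degree=3),  # the order of the polynomial must be given
    SVC(kernel="rbf"),             # simple, handles systems of varying complexity
    SVC(kernel="sigmoid"),
]
for clf in kernels:
    scores = cross_val_score(clf, X, y, cv=5)
    print(clf.kernel, round(scores.mean(), 3))

# C is usually not known a priori; scan candidate values with cross validation.
for C in [0.1, 1, 10, 100]:
    score = cross_val_score(SVC(kernel="rbf", C=C), X, y, cv=5).mean()
    print(f"C={C}: cross-validated accuracy {score:.3f}")
```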

Data suitable for SVM classification


SVM classification is a supervised classification method. The data used for SVM must
include a single category variable defining which classes are to be discriminated by the
model. The X and Y matrices must have the same number of rows (samples) and must not
contain any missing data. The Y matrix must consist of a single column of category
variables, and the X data must be numerical.
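
A minimal sketch of these input requirements, assuming the data are held in pandas objects (names and values are purely illustrative):

```python
# Sketch: checks that match the SVM classification input requirements.
import pandas as pd

X = pd.DataFrame({"var1": [1.2, 0.8, 3.1, 2.9], "var2": [0.5, 0.4, 1.8, 2.0]})
y = pd.Series(["A", "A", "B", "B"], name="class")  # single category column

assert len(X) == len(y)                            # same number of rows (samples)
assert not X.isna().any().any()                    # no missing values in X
assert X.select_dtypes(exclude="number").empty     # X must be numerical
assert y.nunique() >= 2                            # at least two classes are required
```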
SVMs have been used in drug discovery to identify compounds that may have efficacy, and
also to identify toxicity issues with drugs. They have been used in classification problems
such as classifying plastics from their FTIR spectra, meat and bone meal in feed
from NIR imaging spectroscopy, and teas from HPLC chromatograms, as well as in many
other areas of pattern recognition and data mining.

Main results of SVM classification


When an SVM model is created a new node is added in the project navigator with a folder
for the data used in the model, and the results folder. The results folder has the following
matrices:

Support vectors

Confusion matrix

Parameters

Probabilities

Prediction

Accuracy

The main results of the SVM are the confusion matrix, which indicates how many
samples were classified in each class, and the prediction matrix, which indicates the
classification determined for each sample in the training set.
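
For illustration only, the same two result matrices can be reproduced outside The Unscrambler with scikit-learn (an assumption used here solely to demonstrate the concepts):

```python
# Sketch: the confusion matrix and the prediction for each training sample.
from sklearn.datasets import load_iris
from sklearn.metrics import confusion_matrix
from sklearn.svm import SVC

X, y = load_iris(return_X_y=True)
clf = SVC(kernel="rbf").fit(X, y)

prediction = clf.predict(X)             # class assigned to each training sample
cm = confusion_matrix(y, prediction)    # actual classes vs predicted classes
print(cm)
```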

More details about SVM Classification


It is advised to start with the RBF kernel with various settings of C for C-SVC and to select
10-segment cross validation. If all samples are correctly classified, which means the
confusion matrix has no values outside the diagonal, one may select this model as
suitable for classifying future samples. Of course, for some data not all samples will be
classified in the correct class during training.
If the data are expected to be nonlinear, e.g. from looking at the classes in a scores plot
from PCA or PLS-DA, one may try other kernels and change the settings for C or nu.
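
A sketch of this advice, again assuming scikit-learn for illustration: an RBF kernel, several values of C, 10-segment cross validation, and a check for off-diagonal entries in the cross-validated confusion matrix.

```python
# Sketch: RBF kernel, several C values, 10-segment cross validation.
from sklearn.datasets import load_iris
from sklearn.metrics import confusion_matrix
from sklearn.model_selection import cross_val_predict
from sklearn.svm import SVC

X, y = load_iris(return_X_y=True)

for C in [1, 10, 100]:
    y_cv = cross_val_predict(SVC(kernel="rbf", C=C), X, y, cv=10)
    cm = confusion_matrix(y, y_cv)
    off_diag = cm.sum() - cm.trace()    # zero means all samples correctly classified
    print(f"C={C}: misclassified in cross validation = {off_diag}")
```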

SVM classification application examples


SVMs were used as a multivariate classification tool for the identification of meat and bone
meal in animal feed, in response to legislation banning such substances following the
outbreak of mad cow disease (Fernandez Pierna et al., 2004). NIR imaging spectroscopy is
able to detect differences in feeds based on their chemical composition. SVM can be used to
classify feed samples, reducing the need for constant expert analysis of data and thus
providing a rapid analysis tool that can be utilized for certification of animal feed.
SVMs were applied for the classification of plastics in a recycling system (Belousov et al.,
2002). A remote FTIR spectrometer was mounted on a conveyor where plastics were being
sorted for recycling. A two-tiered classification model was developed in which, at the first
level, samples were divided into the classes of important plastics (ABS, PC, PC/ABS, SB
and PVC) and reject plastics (PA, PP and PE). The important plastics were then further
categorized into each individual type of plastic.
More details regarding Support Vector Machine classification are given in the method
reference.

Tasks Analyze Support Vector Machine classification

The sections that follow describe the menu options, dialogs and results encountered when
using Support Vector Machine classification in practice, accessible from the menu
Tasks-Analyze-Support Vector Machine Classification.

Model input
First, the input data for the classification are defined in the Support Vector Machine dialog.
Choose as the first matrix the data matrix containing the data to be used for the
classification. This matrix of predictors should contain only numerical values, with no
missing values. The second matrix to define is the one containing the category, and it must
have a single column only. The SVM training requires at least two classes. This
classification information may come from the same matrix or another, but it must have the
same number of rows as the first matrix and contain only a single column of category data.

Support Vector Machine Model Inputs

If the appropriate selection is not made for the classifier, the following warning will be
displayed. To build the SVM model, go to the column drop-down list and select a single
column containing category variables.

Support Vector Machine Model Inputs Warnings

Options
Here one can choose the SVM type of classification to use, either C-SVC or nu-SVC, from
the drop-down list next to SVM type. The kernel type used to determine the
hyperplane that best separates the classes can be selected from the following types in
the drop-down list. The default setting, Radial basis function, is the simplest and can
model complex data.

Support Vector Machine Options

The kernel types are:

Linear

Polynomial

Radial basis function

Sigmoid

For a polynomial kernel type, the degree of the polynomial should be defined. C-SVC
has an input parameter named C, which is a capacity factor (also called a penalty factor), a
measure of the robustness of the model. C must be greater than 0.
When using nu-SVC classification, the nu value must be defined (default value = 0.5). Nu
serves as the upper bound of the fraction of errors and is the lower bound for the fraction
of support vectors.

Support Vector Machine Options for nu-SVM

Support Vector Machine Options for C-SVM

Grid Search
In the Options tab the Grid Search button is available. Clicking on the Grid Search button
will open a dialog for grid search. The figure below shows the grid search dialog after a
grid search has been performed.

The dialog asks for input for the parameters Gamma and C in the case of C-SVC, and
Gamma and Nu in the case of nu-SVC. It has been reported in the literature that an
exponentially growing sequence of the parameters is good as a first coarse grid search.
This is why the inputs Gamma and C are given on the log scale, but not nu, since it lies
between 0 and 1. However, in the grid table above the actual values are given. It is
recommended to use cross-validation in the grid search to avoid overfitting when many
combinations of the parameters are tried. After an initial grid search, the search may be
refined with smaller ranges for the parameters once the best range has been found. Click
on the Start button for the calculations to commence. Note that it is possible to click on
Stop during the computations, so that if the results become worse for higher values of the
parameters one may stop to save time. The default is to start with five levels of each
parameter. Click on one (the best) value for the Validation accuracy in the grid after
completion to see detailed results. The SVs field lists how many support vectors were
selected; this number should be viewed in relation to the number of samples in the data.
Click on Use setting to return to the previous dialog and run the SVM classification again
with these parameter settings. Notice that since the cross validation segments are selected
at random, the validation results may differ slightly in the second run. This again is a
function of the distribution of the samples.
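
The following sketch mimics the coarse-to-fine strategy with exponentially growing parameter sequences, using scikit-learn's GridSearchCV purely for illustration (the dialog in The Unscrambler performs the equivalent search interactively):

```python
# Sketch: coarse grid search on a log scale, then a refined search.
import numpy as np
from sklearn.datasets import load_iris
from sklearn.model_selection import GridSearchCV
from sklearn.svm import SVC

X, y = load_iris(return_X_y=True)

# Coarse grid: five levels of each parameter, exponentially spaced.
coarse = {"C": np.logspace(-1, 3, 5), "gamma": np.logspace(-4, 0, 5)}
search = GridSearchCV(SVC(kernel="rbf"), coarse, cv=10)
search.fit(X, y)
print("Coarse best:", search.best_params_, round(search.best_score_, 3))

# Refined grid: smaller ranges around the best coarse values.
best_C, best_g = search.best_params_["C"], search.best_params_["gamma"]
fine = {"C": best_C * np.logspace(-0.5, 0.5, 5),
        "gamma": best_g * np.logspace(-0.5, 0.5, 5)}
search = GridSearchCV(SVC(kernel="rbf"), fine, cv=10).fit(X, y)
print("Refined best:", search.best_params_, round(search.best_score_, 3))
```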
To understand in more detail how SVM classification selects the support vectors (the
samples lying on the boundary between the classes), one may run a PCA on the same data
and use the Sample Grouping option in the score plot to visualize the support vectors.

Weights
If the analysis calls for variables to be weighted to make realistic comparisons with each
other (particularly useful for process and sensory data), click on the Weights tab and the
following dialog box will appear.

Support Vector Machine Weights

Individual variables can be selected from the variable list table provided in this dialog by
holding down the control (Ctrl) key and selecting variables. Alternatively, the variable
numbers can be manually entered into the text dialog box. The Select button can be used
(which will bring up the Define Range dialog), or every variable in the table can be selected
by simply clicking on All.

Once the variables have been selected, to weight them, use the options in the Change
Selected Variable(s) dialog box, under the Select tab. The options include:
A/(SDev + B)
This is a standard deviation weighting process where the parameters A and B can
be defined. The default is A = 1 and B = 0.
Constant
This allows the weighting of selected variables by predefined constant values.
Downweight
This allows the multiplication of selected variables by a very small number, such
that the variables do not participate in the model calculation, but their correlation
structure can still be observed in the scores and loadings plots and in particular,
the correlation loadings plot.
Block weighting
This option is useful for weighting various blocks of variables prior to analysis so
that they have the same weight in the model. Check the Divide by SDev box
to weight the variables with standard deviation in addition to the block weighting.
Use the Advanced tab in the Weights dialog to apply predetermined weights to each
variable. To use this option, set up a row in the data set containing the weights (or create
a separate row matrix in the project navigator). Select the Advanced tab in the Weights
dialog and select the matrix containing the weights from the drop-down list. Use the Rows
option to define the row containing the weights and click on Update to apply the new
weights.
Another feature of the Advanced tab is the ability to use the results matrix of another
analysis as weights, using the Select Results Matrix button. This option provides an
internal project navigator for selecting the appropriate results matrix to use as a
weight.
The dialog box for the Advanced option is provided below.

SVM Advanced Weights Option

Once the weighting and variables have been selected, click Update to apply them.
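
As a rough illustration of how the A/(SDev + B) and downweighting options act on the data, the sketch below applies them directly to a numeric matrix (all names and values here are assumptions; in The Unscrambler the weights are set through this dialog):

```python
# Sketch: standard deviation weighting A/(SDev + B) and downweighting a variable.
import numpy as np

X = np.random.default_rng(0).normal(size=(20, 4))   # illustrative data matrix

A, B = 1.0, 0.0                                      # defaults for SDev weighting
weights = A / (X.std(axis=0, ddof=1) + B)            # one weight per variable
X_weighted = X * weights

downweight = 1e-6                                    # multiply by a very small number
X_weighted[:, 3] = X[:, 3] * downweight              # variable 4 no longer drives the model
```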

Validation
Validation is an important part of any method applied in modeling data. The validation
settings for the SVM are found under the Validation tab, as shown below. First, enable
cross validation of the model by checking the check box. The number of segments to use
can be chosen in the Segments entry. Cross validation is helpful in model development but
should not be a replacement for full model validation using a test set.

Support Vector Machine Validation


Autopretreatment may be used with SVM. This allows a user to automatically apply the
transforms that were used on the data when developing the SVM model to the data used in
the classification of new samples with this model.

Support Vector Machine Autopretreatment

When all of the parameters have been defined, the SVM is run by clicking OK. A new node,
SVM, is added to the project navigator with a folder for Data, and another for Results.
More details regarding Support Vector Machine classification are given in the section SVM
Classify or in the link given under License.

Tasks Predict Classification SVM


After an SVM classification model has been developed, it can be used to classify new
samples by going to Tasks-Predict-Classification-SVM. In the dialog box, one first
chooses which SVM model to apply from the drop-down list. This requires a valid SVM
model in the current project. One then defines which samples to classify by selecting
samples from the appropriate data matrix, along with the X variables that are to be used
for the classification. The X data must contain only numerical values and must have the
same number of variables as were used to develop the SVM model.
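
A minimal sketch of the prediction step, assuming scikit-learn for illustration; the check mirrors the requirement that the new data contain the same variables as the training data.

```python
# Sketch: classifying new samples with an already trained SVM model.
from sklearn.datasets import load_iris
from sklearn.svm import SVC

X, y = load_iris(return_X_y=True)
clf = SVC(kernel="rbf").fit(X, y)                 # trained classification model

X_new = X[:5]                                     # new samples to classify
assert X_new.shape[1] == X.shape[1]               # same number of variables as in training
classified_range = clf.predict(X_new)             # predicted category for each new sample
print(classified_range)
```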

Classify Using SVM Model

The SVM classification results are given in a new matrix in the project navigator named
Classified_Range. The matrix has the predicted class for each sample.

Interpreting SVM Classification results


There are six result matrices generated after creating a SVM model:

Support vectors

Confusion matrix

Parameters

Probabilities

Prediction

Accuracy

There is only one matrix generated when predicting with a SVM model: Classified range

SVM node

Support vectors
The support vector matrix comprises the support vectors, which are a subset of the
original samples that lie closest to the boundary between classes and define the optimal
separation between the classes.

Confusion matrix
The confusion matrix is used to visualize classification results from
supervised methods such as support vector machine classification or linear discriminant
analysis classification. It carries information about the predicted and actual classifications
of samples, with each row showing the instances in a predicted class, and each column
representing the instances in an actual class.
In the confusion matrix below, all the Setosa samples are correctly attributed to the Setosa
group.
Two samples with actual value Virginica are predicted as Versicolor.
In the same way, two samples with actual value Versicolor are predicted as Virginica.

Confusion matrix

Parameters
The parameters matrix carries information on the following parameters for all the
identified classes:

SVM type

Kernel type - as defined in the options for the SVM learning step

Degree - as defined in the options for the SVM learning step

Gamma - the kernel gamma parameter, as set in the options

Coef0 - the kernel coefficient, as set in the options

Classes - the number of classes identified by the SVM model

SV Count - the number of support vectors needed for the classification of the data

Labels - the labels of the corresponding classes, given as numerical values starting
with 0

Numbers - the number of samples classified in a given class

Parameters matrix

Probabilities
The probabilities matrix has three rows, containing Rho and the probabilities A and B for
each of the identified classes.

Probabilities matrix

Prediction
The prediction matrix exhibits the predicted class for each sample in the training set.

Prediction

Accuracy
Accuracy holds the percentage of correctly classified samples from calibration and
validation. If cross validation was not chosen, the validation field is left blank. However,
cross validation is highly recommended to avoid overfitting. See the confusion matrix for
details of false positives and false negatives.

Plot of classification results

This plot shows the various classes as they were classified, in a 2D scatter plot of the
original variables. Use the arrows or drop-down list to choose which of the original
variables to show. This is useful for seeing for which combinations of pairs of variables
there is good separation between the classes. Alternatively, perform PCA on the same data,
visualize the support vectors with the sample grouping option in the score plot, and
interpret the loading plot to find the most important variables. The Act and Pre buttons can
be used to toggle whether one or both of them are shown; the predicted classes are shown
with a smaller marker size. If the predicted class differs from the actual class, this is shown
with a small symbol in the color of the wrongly assigned class inside the larger marker for
the actual class. In the illustration below two samples (Batch19 and Batch21) are predicted
to belong to class Asia although the actual class is Europe.
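
The sketch below gives a rough matplotlib equivalent of this plot, scattering two of the original variables and flagging misclassified samples with a smaller marker (the data set, library and variable choice are assumptions for illustration only):

```python
# Sketch: 2D scatter of two original variables, actual vs predicted class.
import matplotlib.pyplot as plt
from sklearn.datasets import load_iris
from sklearn.model_selection import cross_val_predict
from sklearn.svm import SVC

X, y = load_iris(return_X_y=True)
y_pred = cross_val_predict(SVC(kernel="rbf"), X, y, cv=10)

i, j = 0, 1                                        # which pair of original variables to show
plt.scatter(X[:, i], X[:, j], c=y, s=60, label="actual class")
wrong = y_pred != y                                # samples assigned to the wrong class
plt.scatter(X[wrong, i], X[wrong, j], c=y_pred[wrong], s=15, label="predicted (misclassified)")
plt.xlabel(f"variable {i + 1}")
plt.ylabel(f"variable {j + 1}")
plt.legend()
plt.show()
```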

Classified range
After an SVM model has been applied to new data to classify them, a new matrix with the
results is added to the project navigator. The Classified_Range matrix contains a category
variable giving the category predicted by the model for each sample.

Classified range