
Data Mining

Support Vector Machines

Introduction to Data Mining, 2nd Edition

by
Tan, Steinbach, Karpatne, Kumar


Support Vector Machines

● Find a linear hyperplane (decision boundary) that will separate the data
Support Vector Machines

[Figure: a hyperplane B1 separating the two classes]

● One possible solution



Support Vector Machines

[Figure: a different separating hyperplane B2]

● Another possible solution


Support Vector Machines

[Figure: several other separating hyperplanes, including B2]

● Other possible solutions



Support Vector Machines

[Figure: two candidate hyperplanes, B1 and B2]

● Which one is better? B1 or B2?

● How do you define better?
Support Vector Machines

[Figure: hyperplanes B1 and B2 with their margins, bounded by b11/b12 and b21/b22]

● Find the hyperplane that maximizes the margin => B1 is better than B2



Support Vector Machines

[Figure: hyperplane B1 with margin boundaries b11 and b12]

Decision boundary: $\vec{w} \cdot \vec{x} + b = 0$

Margin boundaries: $\vec{w} \cdot \vec{x} + b = -1$ and $\vec{w} \cdot \vec{x} + b = +1$

$$f(\vec{x}) = \begin{cases} 1 & \text{if } \vec{w} \cdot \vec{x} + b \ge 1 \\ -1 & \text{if } \vec{w} \cdot \vec{x} + b \le -1 \end{cases} \qquad \text{Margin} = \frac{2}{\|\vec{w}\|}$$
Linear SVM

● Linear model:

$$f(\vec{x}) = \begin{cases} 1 & \text{if } \vec{w} \cdot \vec{x} + b \ge 1 \\ -1 & \text{if } \vec{w} \cdot \vec{x} + b \le -1 \end{cases}$$

● Learning the model is equivalent to determining the values of $\vec{w}$ and $b$
– How to find $\vec{w}$ and $b$ from training data?
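Before turning to how $\vec{w}$ and $b$ are learned, here is a minimal Python sketch of the decision rule itself (names are illustrative, not from the textbook; a test point that falls between the two margin hyperplanes is conventionally labeled by the sign of $\vec{w} \cdot \vec{x} + b$):

```python
import numpy as np

def linear_svm_predict(w, b, x):
    """Apply the linear SVM decision rule to a single point x."""
    score = np.dot(w, x) + b
    # Points beyond the margins satisfy |score| >= 1; points inside the
    # margin are assigned the class given by the sign of the score.
    return 1 if score >= 0 else -1
```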


Learning Linear SVM

● Objective is to maximize:

$$\text{Margin} = \frac{2}{\|\vec{w}\|}$$

– Which is equivalent to minimizing:

$$L(\vec{w}) = \frac{\|\vec{w}\|^2}{2}$$

– Subject to the following constraints:

$$y_i = \begin{cases} 1 & \text{if } \vec{w} \cdot \vec{x}_i + b \ge 1 \\ -1 & \text{if } \vec{w} \cdot \vec{x}_i + b \le -1 \end{cases}$$

or

$$y_i (\vec{w} \cdot \vec{x}_i + b) \ge 1, \quad i = 1, 2, \ldots, N$$

u This is a constrained optimization problem
– Solve it using the Lagrange multiplier method
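As an illustrative sketch only (SVM libraries solve the Lagrangian dual with specialized QP solvers), the primal problem above can also be handed to a generic constrained optimizer; the function and variable names here are invented for the example:

```python
import numpy as np
from scipy.optimize import minimize

def fit_linear_svm(X, y):
    """Sketch: minimize ||w||^2 / 2 subject to y_i (w . x_i + b) >= 1 for all i.
    X is an (N, d) array of training points, y an (N,) array of +/-1 labels."""
    n, d = X.shape

    def objective(params):
        w = params[:d]                      # params = [w_1, ..., w_d, b]
        return 0.5 * np.dot(w, w)

    # One inequality constraint per training record: y_i (w . x_i + b) - 1 >= 0
    constraints = [{"type": "ineq",
                    "fun": lambda p, i=i: y[i] * (np.dot(p[:d], X[i]) + p[d]) - 1}
                   for i in range(n)]

    result = minimize(objective, np.zeros(d + 1), constraints=constraints)
    return result.x[:d], result.x[d]        # w, b
```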



Example of Linear SVM

[Figure: training points with the two support vectors highlighted]

  x1       x2       y    λ
  0.3858   0.4687    1   65.5261   ← support vector
  0.4871   0.6110   -1   65.5261   ← support vector
  0.9218   0.4103   -1   0
  0.7382   0.8936   -1   0
  0.1763   0.0579    1   0
  0.4057   0.3529    1   0
  0.9355   0.8132   -1   0
  0.2146   0.0099    1   0
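The fitted parameters follow directly from the multipliers: $\vec{w} = \sum_i \lambda_i y_i \vec{x}_i$ (only support vectors contribute, since all other $\lambda_i = 0$) and $b = y_k - \vec{w} \cdot \vec{x}_k$ for any support vector $\vec{x}_k$. A quick check in Python (illustrative code, values taken from the table above):

```python
import numpy as np

# The two support vectors (lambda != 0) from the table above
X_sv = np.array([[0.3858, 0.4687],
                 [0.4871, 0.6110]])
y_sv = np.array([1.0, -1.0])
lam = np.array([65.5261, 65.5261])

w = (lam * y_sv) @ X_sv          # w = sum_i lambda_i y_i x_i
b = y_sv[0] - w @ X_sv[0]        # margin condition y_k (w . x_k + b) = 1

print(w, b)                      # roughly w = [-6.64, -9.32], b = 7.93
print(2 / np.linalg.norm(w))     # the resulting margin, 2 / ||w||
```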


Learning Linear SVM

● Decision boundary depends only on the support vectors
– If you have a data set with the same support vectors, the decision boundary will not change

– How to classify using SVM once $\vec{w}$ and $b$ are found? Given a test record $\vec{x}_i$:

$$f(\vec{x}_i) = \begin{cases} 1 & \text{if } \vec{w} \cdot \vec{x}_i + b \ge 1 \\ -1 & \text{if } \vec{w} \cdot \vec{x}_i + b \le -1 \end{cases}$$



Support Vector Machines

● What if the problem is not linearly separable?


Support Vector Machines

● What if the problem is not linearly separable?
– Introduce slack variables ξi
u Need to minimize:

$$L(\vec{w}) = \frac{\|\vec{w}\|^2}{2} + C \left( \sum_{i=1}^{N} \xi_i^k \right)$$

u Subject to:

$$y_i = \begin{cases} 1 & \text{if } \vec{w} \cdot \vec{x}_i + b \ge 1 - \xi_i \\ -1 & \text{if } \vec{w} \cdot \vec{x}_i + b \le -1 + \xi_i \end{cases}$$

u If k is 1 or 2, this leads to the same objective function as linear SVM but with different constraints (see textbook)
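For intuition, a minimal soft-margin sketch using scikit-learn, whose SVC implements the k = 1 (hinge-loss) form of this objective; the data here is invented for illustration:

```python
import numpy as np
from sklearn.svm import SVC

# Two overlapping classes, so some slack is unavoidable
rng = np.random.default_rng(0)
X = np.vstack([rng.normal(0.0, 1.0, (20, 2)),
               rng.normal(2.5, 1.0, (20, 2))])
y = np.array([1] * 20 + [-1] * 20)

# C controls the trade-off: small C tolerates more margin violations,
# large C approaches the hard-margin SVM.
clf = SVC(kernel="linear", C=1.0).fit(X, y)
print(clf.coef_, clf.intercept_)
```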



Support Vector Machines

[Figure: hyperplanes B1 and B2 with their margins, bounded by b11/b12 and b21/b22]

● Find the hyperplane that optimizes both factors (margin width and training errors)



Nonlinear Support Vector Machines

● What if the decision boundary is not linear?



Nonlinear Support Vector Machines

● Trick: Transform the data into a higher-dimensional space

Decision boundary:

$$\vec{w} \cdot \Phi(\vec{x}) + b = 0$$
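As an illustration of such a transformation (this particular map Φ is a common choice, not prescribed by the slide), a boundary that is quadratic in the original 2-D inputs becomes a linear hyperplane in the transformed space:

```python
import numpy as np

def phi(x):
    """Map a 2-D point to degree-2 polynomial feature space (illustrative choice)."""
    x1, x2 = x
    return np.array([x1**2, x2**2, np.sqrt(2) * x1 * x2,
                     np.sqrt(2) * x1, np.sqrt(2) * x2, 1.0])

# The circle x1^2 + x2^2 = 1 is quadratic in the original space, but it is
# the linear hyperplane w . Phi(x) + b = 0 in the transformed space:
w = np.array([1.0, 1.0, 0.0, 0.0, 0.0, 0.0])
b = -1.0

x = np.array([0.5, 0.5])
print(w @ phi(x) + b)   # -0.5 < 0: the point lies inside the circle
```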

Learning Nonlinear SVM

● Optimization problem (dual form, in terms of the Lagrange multipliers λi):

$$\max_{\lambda} \sum_{i=1}^{N} \lambda_i - \frac{1}{2} \sum_{i,j} \lambda_i \lambda_j y_i y_j \, \Phi(\vec{x}_i) \cdot \Phi(\vec{x}_j)$$

● Which leads to the same set of equations as the linear case (but involving $\Phi(\vec{x})$ instead of $\vec{x}$)



Learning Nonlinear SVM

● Issues:
– What type of mapping function Φ should be used?
– How to do the computation in high-dimensional space?
u Most computations involve the dot product $\Phi(\vec{x}_i) \cdot \Phi(\vec{x}_j)$
u Curse of dimensionality?


Learning Nonlinear SVM

● Kernel Trick:
– $\Phi(\vec{x}_i) \cdot \Phi(\vec{x}_j) = K(\vec{x}_i, \vec{x}_j)$
– $K(\vec{x}_i, \vec{x}_j)$ is a kernel function (expressed in terms of the coordinates in the original space)
u Examples:

$$K(\vec{x}, \vec{y}) = (\vec{x} \cdot \vec{y} + 1)^p$$
$$K(\vec{x}, \vec{y}) = e^{-\|\vec{x} - \vec{y}\|^2 / (2\sigma^2)}$$
$$K(\vec{x}, \vec{y}) = \tanh(\kappa\, \vec{x} \cdot \vec{y} - \delta)$$



Example of Nonlinear SVM

[Figure: decision boundary of an SVM with a polynomial degree-2 kernel]
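A figure like this can be reproduced with a short scikit-learn sketch (the data-generating pattern here is invented for illustration):

```python
import numpy as np
from sklearn.svm import SVC

# Illustrative data: the class depends quadratically on the inputs
rng = np.random.default_rng(1)
X = rng.uniform(-1, 1, (200, 2))
y = np.where(X[:, 0]**2 + X[:, 1]**2 < 0.5, 1, -1)

# scikit-learn's polynomial kernel is (gamma * x . y + coef0)^degree
clf = SVC(kernel="poly", degree=2, gamma=1.0, coef0=1.0).fit(X, y)
print(clf.score(X, y))   # close to 1.0, since the true pattern is quadratic
```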


Learning Nonlinear SVM

● Advantages of using a kernel:
– Don't have to know the mapping function Φ
– Computing the dot product $\Phi(\vec{x}_i) \cdot \Phi(\vec{x}_j)$ in the original space avoids the curse of dimensionality

● Not all functions can be kernels
– Must make sure there is a corresponding Φ in some high-dimensional space
– Mercer's theorem (see textbook)
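Mercer's condition can also be probed numerically: a valid kernel must yield a positive semidefinite Gram matrix on any finite point set. A small sketch of that check (illustrative only; a check on sampled points is evidence, not a proof):

```python
import numpy as np

def gram_matrix(kernel, X):
    """Gram matrix G_ij = K(x_i, x_j) for a candidate kernel function."""
    return np.array([[kernel(xi, xj) for xj in X] for xi in X])

rng = np.random.default_rng(2)
X = rng.normal(size=(10, 2))

poly = lambda x, y: (np.dot(x, y) + 1) ** 2    # a valid Mercer kernel
bad = lambda x, y: -np.linalg.norm(x - y)      # not a valid kernel in general

for k in (poly, bad):
    eigvals = np.linalg.eigvalsh(gram_matrix(k, X))
    # All eigenvalues >= 0 (up to rounding) only if Mercer's condition holds
    print(eigvals.min())
```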



Characteristics of SVM

● Since the learning problem is formulated as a convex optimization problem, efficient algorithms are available to find the global minimum of the objective function (many of the other methods use greedy approaches and find locally optimal solutions)
● Overfitting is addressed by maximizing the margin of the decision boundary, but the user still needs to provide the type of kernel function and the cost function
● Difficult to handle missing values
● Robust to noise
● High computational complexity for building the model

