
Data Mining

Support Vector Machines

Introduction to Data Mining, 2nd Edition

by
Tan, Steinbach, Karpatne, Kumar


Support Vector Machines

● Find a linear hyperplane (decision boundary) that will separate the data
Support Vector Machines

[Figure: a hyperplane B1 separating the two classes]

● One possible solution



Support Vector Machines

[Figure: a different separating hyperplane B2]

● Another possible solution


Support Vector Machines

[Figure: several other separating hyperplanes, including B2]

● Other possible solutions



Support Vector Machines

[Figure: two candidate hyperplanes, B1 and B2]

● Which one is better? B1 or B2?

● How do you define better?
Support Vector Machines

[Figure: hyperplanes B1 and B2 with their margins, bounded by b11/b12 and b21/b22]

● Find the hyperplane that maximizes the margin => B1 is better than B2



Support Vector Machines

[Figure: hyperplane B1 with margin boundaries b11 and b12]

Decision boundary: $\vec{w} \cdot \vec{x} + b = 0$

Margin boundaries: $\vec{w} \cdot \vec{x} + b = -1$ and $\vec{w} \cdot \vec{x} + b = +1$

$$f(\vec{x}) = \begin{cases} 1 & \text{if } \vec{w} \cdot \vec{x} + b \ge 1 \\ -1 & \text{if } \vec{w} \cdot \vec{x} + b \le -1 \end{cases} \qquad \text{Margin} = \frac{2}{\|\vec{w}\|}$$
Linear SVM

● Linear model:

$$f(\vec{x}) = \begin{cases} 1 & \text{if } \vec{w} \cdot \vec{x} + b \ge 1 \\ -1 & \text{if } \vec{w} \cdot \vec{x} + b \le -1 \end{cases}$$

● Learning the model is equivalent to determining the values of $\vec{w}$ and $b$
– How to find $\vec{w}$ and $b$ from training data?
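Before turning to how $\vec{w}$ and $b$ are learned, here is a minimal Python sketch of the decision rule itself (names are illustrative, not from the textbook; a test point that falls between the two margin hyperplanes is conventionally labeled by the sign of $\vec{w} \cdot \vec{x} + b$):

```python
import numpy as np

def linear_svm_predict(w, b, x):
    """Apply the linear SVM decision rule to a single point x."""
    score = np.dot(w, x) + b
    # Points beyond the margins satisfy |score| >= 1; points inside the
    # margin are assigned the class given by the sign of the score.
    return 1 if score >= 0 else -1
```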


Learning Linear SVM

● Objective is to maximize:

$$\text{Margin} = \frac{2}{\|\vec{w}\|}$$

– Which is equivalent to minimizing:

$$L(\vec{w}) = \frac{\|\vec{w}\|^2}{2}$$

– Subject to the following constraints:

$$y_i = \begin{cases} 1 & \text{if } \vec{w} \cdot \vec{x}_i + b \ge 1 \\ -1 & \text{if } \vec{w} \cdot \vec{x}_i + b \le -1 \end{cases}$$

or

$$y_i (\vec{w} \cdot \vec{x}_i + b) \ge 1, \quad i = 1, 2, \ldots, N$$

u This is a constrained optimization problem
– Solve it using the Lagrange multiplier method
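As an illustrative sketch only (SVM libraries solve the Lagrangian dual with specialized QP solvers), the primal problem above can also be handed to a generic constrained optimizer; the function and variable names here are invented for the example:

```python
import numpy as np
from scipy.optimize import minimize

def fit_linear_svm(X, y):
    """Sketch: minimize ||w||^2 / 2 subject to y_i (w . x_i + b) >= 1 for all i.
    X is an (N, d) array of training points, y an (N,) array of +/-1 labels."""
    n, d = X.shape

    def objective(params):
        w = params[:d]                      # params = [w_1, ..., w_d, b]
        return 0.5 * np.dot(w, w)

    # One inequality constraint per training record: y_i (w . x_i + b) - 1 >= 0
    constraints = [{"type": "ineq",
                    "fun": lambda p, i=i: y[i] * (np.dot(p[:d], X[i]) + p[d]) - 1}
                   for i in range(n)]

    result = minimize(objective, np.zeros(d + 1), constraints=constraints)
    return result.x[:d], result.x[d]        # w, b
```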



Example of Linear SVM

[Figure: training points with the two support vectors highlighted]

  x1       x2       y    λ
  0.3858   0.4687    1   65.5261   ← support vector
  0.4871   0.6110   -1   65.5261   ← support vector
  0.9218   0.4103   -1   0
  0.7382   0.8936   -1   0
  0.1763   0.0579    1   0
  0.4057   0.3529    1   0
  0.9355   0.8132   -1   0
  0.2146   0.0099    1   0
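The fitted parameters follow directly from the multipliers: $\vec{w} = \sum_i \lambda_i y_i \vec{x}_i$ (only support vectors contribute, since all other $\lambda_i = 0$) and $b = y_k - \vec{w} \cdot \vec{x}_k$ for any support vector $\vec{x}_k$. A quick check in Python (illustrative code, values taken from the table above):

```python
import numpy as np

# The two support vectors (lambda != 0) from the table above
X_sv = np.array([[0.3858, 0.4687],
                 [0.4871, 0.6110]])
y_sv = np.array([1.0, -1.0])
lam = np.array([65.5261, 65.5261])

w = (lam * y_sv) @ X_sv          # w = sum_i lambda_i y_i x_i
b = y_sv[0] - w @ X_sv[0]        # margin condition y_k (w . x_k + b) = 1

print(w, b)                      # roughly w = [-6.64, -9.32], b = 7.93
print(2 / np.linalg.norm(w))     # the resulting margin, 2 / ||w||
```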


Learning Linear SVM

● Decision boundary depends only on the support vectors
– If you have a data set with the same support vectors, the decision boundary will not change

– How to classify using SVM once $\vec{w}$ and $b$ are found? Given a test record $\vec{x}_i$:

$$f(\vec{x}_i) = \begin{cases} 1 & \text{if } \vec{w} \cdot \vec{x}_i + b \ge 1 \\ -1 & \text{if } \vec{w} \cdot \vec{x}_i + b \le -1 \end{cases}$$



Support Vector Machines

● What if the problem is not linearly separable?


Support Vector Machines

● What if the problem is not linearly separable?
– Introduce slack variables ξi
u Need to minimize:

$$L(\vec{w}) = \frac{\|\vec{w}\|^2}{2} + C \left( \sum_{i=1}^{N} \xi_i^k \right)$$

u Subject to:

$$y_i = \begin{cases} 1 & \text{if } \vec{w} \cdot \vec{x}_i + b \ge 1 - \xi_i \\ -1 & \text{if } \vec{w} \cdot \vec{x}_i + b \le -1 + \xi_i \end{cases}$$

u If k is 1 or 2, this leads to the same objective function as linear SVM but with different constraints (see textbook)
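For intuition, a minimal soft-margin sketch using scikit-learn, whose SVC implements the k = 1 (hinge-loss) form of this objective; the data here is invented for illustration:

```python
import numpy as np
from sklearn.svm import SVC

# Two overlapping classes, so some slack is unavoidable
rng = np.random.default_rng(0)
X = np.vstack([rng.normal(0.0, 1.0, (20, 2)),
               rng.normal(2.5, 1.0, (20, 2))])
y = np.array([1] * 20 + [-1] * 20)

# C controls the trade-off: small C tolerates more margin violations,
# large C approaches the hard-margin SVM.
clf = SVC(kernel="linear", C=1.0).fit(X, y)
print(clf.coef_, clf.intercept_)
```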



Support Vector Machines

[Figure: hyperplanes B1 and B2 with their margins, bounded by b11/b12 and b21/b22]

● Find the hyperplane that optimizes both factors (margin width and training errors)



Nonlinear Support Vector Machines

● What if the decision boundary is not linear?



Nonlinear Support Vector Machines

● Trick: Transform the data into a higher-dimensional space

Decision boundary:

$$\vec{w} \cdot \Phi(\vec{x}) + b = 0$$
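As an illustration of such a transformation (this particular map Φ is a common choice, not prescribed by the slide), a boundary that is quadratic in the original 2-D inputs becomes a linear hyperplane in the transformed space:

```python
import numpy as np

def phi(x):
    """Map a 2-D point to degree-2 polynomial feature space (illustrative choice)."""
    x1, x2 = x
    return np.array([x1**2, x2**2, np.sqrt(2) * x1 * x2,
                     np.sqrt(2) * x1, np.sqrt(2) * x2, 1.0])

# The circle x1^2 + x2^2 = 1 is quadratic in the original space, but it is
# the linear hyperplane w . Phi(x) + b = 0 in the transformed space:
w = np.array([1.0, 1.0, 0.0, 0.0, 0.0, 0.0])
b = -1.0

x = np.array([0.5, 0.5])
print(w @ phi(x) + b)   # -0.5 < 0: the point lies inside the circle
```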

Learning Nonlinear SVM

● Optimization problem (dual form, in terms of the Lagrange multipliers λi):

$$\max_{\lambda} \sum_{i=1}^{N} \lambda_i - \frac{1}{2} \sum_{i,j} \lambda_i \lambda_j y_i y_j \, \Phi(\vec{x}_i) \cdot \Phi(\vec{x}_j)$$

● Which leads to the same set of equations as the linear case (but involving $\Phi(\vec{x})$ instead of $\vec{x}$)



Learning Nonlinear SVM

● Issues:
– What type of mapping function Φ should be used?
– How to do the computation in high-dimensional space?
u Most computations involve the dot product $\Phi(\vec{x}_i) \cdot \Phi(\vec{x}_j)$
u Curse of dimensionality?


Learning Nonlinear SVM

● Kernel Trick:
– $\Phi(\vec{x}_i) \cdot \Phi(\vec{x}_j) = K(\vec{x}_i, \vec{x}_j)$
– $K(\vec{x}_i, \vec{x}_j)$ is a kernel function (expressed in terms of the coordinates in the original space)
u Examples:

$$K(\vec{x}, \vec{y}) = (\vec{x} \cdot \vec{y} + 1)^p$$
$$K(\vec{x}, \vec{y}) = e^{-\|\vec{x} - \vec{y}\|^2 / (2\sigma^2)}$$
$$K(\vec{x}, \vec{y}) = \tanh(\kappa\, \vec{x} \cdot \vec{y} - \delta)$$



Example of Nonlinear SVM

[Figure: decision boundary of an SVM with a polynomial degree-2 kernel]
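A figure like this can be reproduced with a short scikit-learn sketch (the data-generating pattern here is invented for illustration):

```python
import numpy as np
from sklearn.svm import SVC

# Illustrative data: the class depends quadratically on the inputs
rng = np.random.default_rng(1)
X = rng.uniform(-1, 1, (200, 2))
y = np.where(X[:, 0]**2 + X[:, 1]**2 < 0.5, 1, -1)

# scikit-learn's polynomial kernel is (gamma * x . y + coef0)^degree
clf = SVC(kernel="poly", degree=2, gamma=1.0, coef0=1.0).fit(X, y)
print(clf.score(X, y))   # close to 1.0, since the true pattern is quadratic
```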


Learning Nonlinear SVM

● Advantages of using a kernel:
– Don't have to know the mapping function Φ
– Computing the dot product $\Phi(\vec{x}_i) \cdot \Phi(\vec{x}_j)$ in the original space avoids the curse of dimensionality

● Not all functions can be kernels
– Must make sure there is a corresponding Φ in some high-dimensional space
– Mercer's theorem (see textbook)
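Mercer's condition can also be probed numerically: a valid kernel must yield a positive semidefinite Gram matrix on any finite point set. A small sketch of that check (illustrative only; a check on sampled points is evidence, not a proof):

```python
import numpy as np

def gram_matrix(kernel, X):
    """Gram matrix G_ij = K(x_i, x_j) for a candidate kernel function."""
    return np.array([[kernel(xi, xj) for xj in X] for xi in X])

rng = np.random.default_rng(2)
X = rng.normal(size=(10, 2))

poly = lambda x, y: (np.dot(x, y) + 1) ** 2    # a valid Mercer kernel
bad = lambda x, y: -np.linalg.norm(x - y)      # not a valid kernel in general

for k in (poly, bad):
    eigvals = np.linalg.eigvalsh(gram_matrix(k, X))
    # All eigenvalues >= 0 (up to rounding) only if Mercer's condition holds
    print(eigvals.min())
```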



Characteristics of SVM

● Since the learning problem is formulated as a convex optimization problem, efficient algorithms are available to find the global minimum of the objective function (many of the other methods use greedy approaches and find locally optimal solutions)
● Overfitting is addressed by maximizing the margin of the decision boundary, but the user still needs to provide the type of kernel function and the cost function
● Difficult to handle missing values
● Robust to noise
● High computational complexity for building the model

