Вы находитесь на странице: 1из 260

..

: 31 2014

, , . , ,
, , .
, ,
, , , , , ,
, , , ,
. ,
,
.
:
, , ,
.
, , ,
.
, . .
[84]. ,
[84], [85].

(2008) (20022005) . , .
!


1
1.1 . . .
1.1.1 . . . . . . . . . . . . . . . . . . . . . . . . .
1.1.2 . . . . . . . . . . . . . . . . . .
1.1.3 : . . .
1.1.4 :
. . . . . . . . . . . . . .
1.1.5 : . . . . . . . . . . . . . . . .
1.1.6
1.1.7 . . . . . . . . . . . . . .
1.2 . . . . . . . .
1.2.1 . . .
1.2.2 : .
1.2.3 . . . . . . . .
1.2.4 . .
1.2.5 : . . . . . . . . . . . . .
1.2.6 () . . .
1.2.7 : . . . .
1.2.8 : . . . . . . . . . . . . . . . . . . .
1.3 . . . . . . . . . . . . . .
1.3.1 . . . . . . . . . . . . . . . . . . .
1.3.2 . . . . . . . . .
1.3.3 . . . . . . . . . . . . . . . . . . . . . . . . . .
1.3.4 . . . . .

4
6
6
8
9
10
12
14
16
17
17
18
19
20
24
29
31
32
33
34
34
34
37

2 :
2.1 . . . . . . . . . . . . . . . . . . . . . . . . . . .
2.1.1 . . . . . . . . . . . . .
2.1.2 . .
2.1.3 . . . . . . . . . . . . . . . . .
2.2 . . . . . . . . . . . . . . . . . . . . . . .
2.2.1 ( ) . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .
2.2.2 (logistic regression) . . . . . . . .
2.2.3 . . . . . . . . . . . . . . . . . . . .
2.2.4 ( , margin
classifiers) . . . . . . . . . . . . . . . . . . . . . . . . . . . .
2.3 . . . . .
2.3.1 . . . . . . . . . . . . . . . . . . . . . . .
2.3.2 . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .
2.3.3 . . . . . . . . . . . . . . . . . . . . .

38
40
40
42
45
46

3
3.1 . . . . . . . . . . .
3.2 (MLP) . . . . . . . . . . . . . . . . . .
3.2.1 . . . . . . . . . . . . . . . . . . . . . . . . . . . . .
3.2.2 () . . . . . . . . . .
3.2.3 : (error back-propagation) . . . . . . . .

65
65
68

47
50
52
56
60
60
61
64

69
71
71

3.2.4

3.3

: . . . . . . . . . . . . . . . . . . 75
RBF- . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 84
3.3.1 RBF-: (expectation
maximization) . . . . . . . . . . . . . . . . . . . . . . . . . . 86

4
4.1 . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .
4.1.1 . . . . . .
4.1.2 . . . . . . . . . . . . . . . . . . .
4.1.3 . . . . . . . . . . . . . . . .
4.1.4 , . . . . . . . . . . . . . . . .
4.1.5 . . . . . . . . . . . . . . . . . . .
4.1.6 - . . . . . . . .
4.2 (SVM, SVC, SVR) . . . . . . . . . . . . .
4.2.1 . . . . . . . . . . . . . . . .
4.2.2 . . . . . . . . . . . . . . . . . .
4.2.3 SVC . . . . . . . . . . . . . . . . . . . .
4.2.4 . . . . . . . . . . . . . . . . . . . . . . . . . . . . .
4.2.5 . . . . . . . . . . . . . . . . . .
4.2.6 . . . . . . . . . . . . . . .
4.2.7 . . . . . . . . . . . . . . . . . . . . . .

91
91
93
94
96
97
97
100
104
104
107
108
112
116
118

5
5.1 : . . . . . . . . . . .
5.1.1 . . .
5.1.2 . . . . . . . . . . . . . . .
5.1.3 . . . . . . . . . . . . . . . . . . . . . . . . .
5.2 . . . . . . . .
5.2.1 . . . . . . . . . . . . . . . . . . . . . . . . . . . . .
5.2.2 . . . . . . . . . . . . . . . . . . . .
5.2.3 ; AdaBoost . . . . . . . . . . . . . . . .
5.2.4 . . . . . . . . . . . . . . .
5.3 . . . . . . . . . . . . .
5.4 . . . . . . . . . . . . . . . . . .

122
122
123
124
126
126
126
128

118

132
136
138
142

6
142
6.1 . . 142
6.2 . . . . . 144
A
A.1 . . . . . . . . . . . . . . . . . . . . . . . . . .
A.2 . . . . . . .
A.2.1 :
. . . . . . . . . . . . . . . . . . . . . . . . . . . . .
A.2.2 . . . . . . .
A.3 . . . . . . . . . . . . . . . . . . . .
A.3.1 . . . . . . . . . . . . . . . . . . . . . . . . . . .
A.3.2 EM . . . .
A.3.3 EM . . . . . . .
A.3.4 EM . . . . . . . . . . . . . . . . . .
A.3.5 EM . . . . . . . . . . . . . . . . . . .

146
146
147
149
151
151
151
154
159
164
165

B
B.1 . . . . . .
B.2 . . . . . . . . . . . . . . . .
B.3 . . . . . . . .
B.4 - . . . . . . . . . . . . . . . . .
B.5 . . . . . . . . . . . . . . . . . . . . . . .
B.5.1 . . . . . . . . . . . . .
B.5.2 . . . . . . . . . . . .
B.5.3 . . . . . .
B.5.4 . . . . . . .
B.5.5 . . . . . . . . . . . . . . . . .

.
.
.
.
.
.
.
.
.
.

.
.
.
.
.
.
.
.
.
.

.
.
.
.
.
.
.
.
.
.

.
.
.
.
.
.
.
.
.
.

.
.
.
.
.
.
.
.
.
.

.
.
.
.
.
.
.
.
.
.

.
.
.
.
.
.
.
.
.
.

C ;
C.1 . . . . . . .
C.2 . . . . . . . . . . . .
C.3 (HMM, Hidden Markov Model) . . .
C.3.1 , . . . . . . . . . . . . . . . . . . . . . . .
C.3.2
. .
C.4 . . . . . . . . . . . . . . . . .
C.4.1 . . . . . . . . .
C.4.2 . . . . . . . . . . . . . . .
C.4.3 . . . . . . . . . . . . . . . . .

167
167
170
170
172
174
175
176
177
178
179
179
180
182
185

186
199
205
205
207
208

D ;
213
D.1 . . . . 213
D.2 . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 215
D.2.1 (MRF, Markov Random Fields) 215
D.2.2 . . . . . . . . . . . . . . 222
D.2.3 (CRF, Conditional Random Fields)225
D.3 . . . . . 228
D.3.1 . . . . . . . . . . 229
D.3.2 . . . . . . . . . . . . . 231
D.3.3 CRF . . . . . . . . . . . . . . . . . . . . . . . . . . 232
D.3.4 . . . . . . . . . . . . . . . . . . . 234
D.3.5 CRF . . . . . . . 243
D.4 - . . . . . . . . . . . . . . . . . . . . 244

245

255

.
..

- , , . ,
( ) . ,
4

- . , ,
.
(statistical learning, machine
learning, pattern recognition, ), .
, -
- , . , ,
, .
, .
, (-, , , . . . ), , . (recognition), (recognizer ,
learner ), (learning, training, fitting)
.
.
:
. ,
( , , , ) , , , () . ,
( ..) .
. ,
, , . , .
. . , .
. , (utterance) (,
) , .
, .
. , - , . , .
. .
,
, / . , , .

. (, ) .
, , , , .
. . ,
.
, , (: , cluster analysis). - . , ,
(cluster ), , , ,
, ,
(.. )
, , . , .
(. . . ) , ,
. , , , - , [14, 15].
, , , [114].
[117]. ,
pattern recognition, machine learning (, [15]) statistical learning, (, [50] [113])
, ,
,
-.

1.1
1.1.1

() (feature),
, . X . , d , X d- Rd . , , .
-, ( ) , , ()
, ,
. -,
- (


()

- ) ( )
. -, - ,
, : -, , , ... -,
d .
.
(,
, , -
, wavelet ..)
.
:

(, ) : , 0 1, .
, k > 2 ( , , . . . ) k
,
(derived features): j- xj 1, j- , 0 P
.
k
j

j=1 x = 1. ,
, - .
, ,
( , ,
(, , ) , )
( ), , ,
.

(, 56- ),
(, 0 10), .
d , ..
Rd , , - ,
, .
(response) , (
) , , .. () Y, , q-
Rq . ,
f : Rd Rq , . , , ,
. , , ,
.
7

1.1.2

( )
.

, - .
() , q , , j- - j-
. , , ,
, q- (confidence) .
, 1 .
, ,
( 0 1,
1).
, , ( ),
.
- . , .
, , 0
, , 31
.

- ,
, , ,
.
( )
.
: , .
, ,
: . ( )
.
( ),
: , .
,
, , . :
,
, ,
, .
, , 1 ,
. 2.2.2.

. , , .
, ., , 4.2.7.
, ( ) ( ): ,
(, ) - . , , - ,
,
- .

. , ,
- , , , ()
.
1.1.3

, - .
X , Y ,
N , .. T = ((x1 , y1 ), . . . , (xN , yN )) (X Y)N .
x yi X xi T .
: ,
.
- . :
;
Y , , R ( ), ;
Y ,
, {0, 1} ( ), .
, ,
, (NN , nearest neighbor ) k (k-NN ) k > 1.
k = 1 T ,
. k > 1 , k, , . ,
. N
, .
,
. k
. , ,
9

. (, ), .
[75], .
, (, [49]).
1.2.2 ,

, . -
. , , .
, X (, 40)
- (, 20-) ,

.
(curse of dimensionality).
, , , k :
k. - , -
. , ?
1.1.4

-.
(., ,
[115]). 1.2.4 1.2.6
: 2 , .
:
() X ,
, , d- Rd ;
Y, , , q- Rq ;
F f : X Y, , X Y, , ,
, ..;
P ( ) X Y, , X Y,
, /
, .., - ;
2 , , .., .
,
.

10

E: Y Y X R, , , , loss function, error function, . . . , , 0 ( ) ( )


( ); , Y E(r, y, x)
= kr yk2 ,
0 r = y
0-1- E(r, y, x) =
; 1 r 6= y
E : Y Y R,
X 3 ;
T = ((x1 , y1 ), . . . , (xN , yN )), ( , ) (xi , yi ) X Y, , ,
P.
, , x
y . , , x
(y|x) ,

p;x (y) = p (y|x) =

p (x, y)
p (x, y)
=R
.
p (x)
p (x, y 0 )dy 0
y 0 Y

(1)

X , Y, F, P, E T f F, ( , ,
,. . . )4
Z
E (f ) =
E(f (x), y, x)d(x, y) min,
(2)

f F

(x,y)X Y

- , ,
, .. , > 0
({(x, y) X Y|E(f (x), y, x) > }) < .

(3)

, , , , T . ,
T , - (2)
Z
E(f (x), y, x)d(x, y) E(f, T ) =
(x,y)X Y

N
1 X
E(f (xi ), yi , xi ) ,
N i=1

(4)

(training
error ), ... E (f )

N
1 X

E(f, T ) =
(5)
E(f (xi ), yi , xi ) min
f F
N i=1
3


, , .
4 E (,
E rror) ,
.

11

E
F.
, , (5) T , -
(4) f , T . (2) ,
0
T 0 = ((x01 , y10 ), . . . , (x0N 0 , yN
0 )), ,
( ,
test error ) E(f, T 0 ). , E(f, T 0 ) E (f )
. : E(f, T 0 ) > E(f, T ),
. ?. . .
, , , , .
1.1.5

(recognition tree , decision tree)


.
,
, : - - - . ,

if ( x[j] ?? t[k] ) then
else

...
...


return ( r[l] )
x , t , r , ?? . , ,
, .
f : X Y ,
, - , , ,
.
, ,
. , , XVIIIXIX ,
, .. ,
.

1960- 1980- (CART [20], C4.5
[92]). ,


, (..
) .
, . 5.
12


()

,
, (,
) (5). ,
( , ), <.
T , .. f1 (x) = r,
E(f1 , T ) =

n
X

E(f (xi ), yi ) .

(6)

i=1

. ,
() r

r=

n
1 X
yi ,
N i=1

0-1- () .
X
xj < xji xj xji , j = 1, . . . , d, i = 1, . . . , N ,
,
(
, stump), .. - r< r . (6),
, .
d(N 1) , .
, . , , , .
:
,
,
;

;
.

, .
, 1.1.6. , .
- k > 2
k .

13

1.1.6

, F,
.
(5) . , , N
X , , f (xi ) = yi
(xi , yi ). , , , . ,
: f , f (xi ) = yi f (x) x .
,
, ( ) .
(overfitting).
, ,
. ,
. , : ,
.
, Y F ( , ), , dim(F) < N dim(Y),
N dim(Y).
, .
, 5 (VC-dimension), .
(structural risk minimization) ( , ).
[114, 113].
5 - ( ) dim
VC (F) F X Y. Y. Y = {0, 1},
, f F .
. - F X n ( , ),
n x1 , . . . , xn X , F 2n .
- . ,
- .
.

F X =
ft (x) = {(x,t)|xt0} (x, t) t
dimVC (F) = 1 (; S
S).

F X = fw,b (x) = {(x,w,b)|wx+b0} (x, w, b) w, b


( , ), dimVC (F) = 2 ().

14

,
, , k
. F, W, .. f (x)=F (w, x)
F : W X Y w W,
,
(5)
N
X

E(F (w, xi ), yi ) min ,

(7)

kwkC

i=1

C > 0. (
(5)) ,
N1 .
, ,
. , W
( , Rn ,
) , (
, -
{w|(w) C}), (4)

(w) + E(F (w, ), T ) = (w) +

N
X

E(F (w, xi ), yi ) min ,


wW

i=1

(8)

(8) w
F (w, xi ) = yi (. [108]). (w) = kwk
(w) = kwk2 , k k
.
(7)
, ,
kwk2 +

N
X

E(F (w, xi ), yi ) min ,


wW

i=1

F X = d fw (x) = {(x,w)|(w,x)0} (x, w) w


(d 1)-, dimVC (F) = d (!).

(9)

Rd

. F X = ft (x) = (x + t) t [0, 1]
: {0, 1}. ,
[n, n + 1] n-
;
j
k

( ) = 2b c ( b c) mod 2

dimVC (F) = . , ft 1, 2, . . .
t , ,
.
- .
- , ,
. , - , .

15

. . .

. . .


kwk +

N
X

E(F (w, xi ), yi ) min ,


wW

i=1

(10)

C > 0 0, ,
.
1
(9) (10) 0 (7)
C 0.
E(F (w, x), y) w x y w
, (7) C > 0
(9) (10) 0.
(7) (
).
.
. w ,
(7) , (9)
.
1.1.7

!!!!

!!!!

, ,
C, , (8) , , (7) (2),
. ( , ..) , .
T ,
(validation
set, ) T 00 , .
, , T ,
T 00 :
.

. , k
k , 1
:
, kwk
k,
.

, -. k - ,
k , k , k-
,
k- , k .
(-),
16

,
.
N - - -
N (LOO, Leave-OneOut).
, .. ,
, .

, .

1 N
(2) ,
, N 1 .
( )
(
). .

1.2

1.1.4
, ( X Y), ,
( P
). , - ( P , , -), ( ),
(). ,
, ,
,
.
1.2.1

, .
, (2)
, , F .
:
x X , y Y
.
P {y|x} .
() 0-1-:
E(r, y, x) = 1 yr ,

(11)

ab , .. 0 1
.
X Y p (y|x), f E (f ). ,
x , .. y, p (y|x),

17

!!!!

, , y
x:
Z
f (x) = arg min
E(r, y, x)dp (y|x)
(12)
rY

yY

= arg min(1 p (y|x)) = 1 arg max p (y|x) .


rY

rY

( Y yY E(r, y, x)p (y|x)dy


P
yY E(r, y, x)p (y|x).) (11)
Z
E (f ) =
E(f (x), y, x)d(x, y)
(13)
Z

(x,y)X Y

E(f (x), y, x)p (y|x)dy p (x)dx

=
xX

yY

.
, , ,
p (y) p (x|y), p (y|x)
. , ( ), ,
, ,
.
, , .

x p (y|x)
Z
f (x) = arg min
E(r, y, x)p (y|x)dy
(14)
rY

yY

, (2).
E(r, y, x) = (r y)2
:

Z
Z
Z
d
0=
(ry)2 p (y|x)dy = 2
(ry)p (y|x)dy = 2 r
yp (y|x)dy ,
dr yY
yY
yY

Z
f (x) =

yp (y|x)dy,

(15)

yY

.. ( ) . , ,
. ,
,
1.2.6.
. - E(r, y) = |r y|.
1.2.2

EN
T = ((x1 , y1 ), . . . , (xN , yN )) = ((x1 , . . . , xN ), (y1 , . . . , yN )) = (X, Y)
18

!!!!

N (.. Y = {0, 1}), . EN N


EB ( 1.2.1). P {0|x} = p (0|x) P {1|x} =
1 p (0|x) x. x
0, P {0|x} > P {1|x}, 1, P {0|x} < P {1|x} ( P {0|x} = P {1|x}, ).
x EB (x) = min(P {0|x}, P {1|x}).
(xn , yn ) T = (X, Y)), (x, y)
( ) X Y.
Y y yn
P {y 6= yn } = P {y = 0, yn = 1}+P {y = 1, yn = 0} = P {0|x}P {1|xn }+P {1|x}P {0|xn }.
xn x,
P {y 6= yn } 2P {0|x}P {1|x},
Y
EN (x) = P {y 6= yn } 2P {0|x}P {1|x} = 2(1 EB (x))EB (x) .

(16)

,
EB (x) EN (x) .

(17)

(16) (17) x
X, N ( . [27]) f (t) = 2(1 t)t,

EB lim EN 2(1 EB )EB 2EB .
N

k
, , . [27]. ,
.
1.2.3

.

, .. f F, , F.
, F = {F (w, )|w W} (, )
W F : W X Y, ,
F , T . ,
P (,
Rd , ,
),
.. d + d(d+1)
2
.
19


.
, (X = Rd , Y = R,
Pd
W = R Rd , F (w, x) = w0 + j=1 wj xj ) .
2.
(. 1.1.3). F,
, , .
: ( 1.1.5)
? ?

,
E(f, T ), E(f, T 0 )
. ( discriminative methods,
, , ,
f ).
, , - .

(generative methods, , ,
).

F
E.
: (x, y), (y|x) (
(1)). , , y x
(y|x), . ,
( 2.2.1).
1.2.4


:
. , P ( ), .. .
P ( ) T
p{T |} =

N
Y

p (xi , yi ) .

(18)

i=1

P
R
p{T |}d()
0
() = P {|T } = R
p{T |}d()
P
20

(19)

!!!!

0 P. p ,
p0 () = p{|T } = R

p{T |}p ()
.
p{T |}p ()d
P

(20)

. -
, ()!
: 0 () p (x, y)
Z
P,T (Z) =
p (Z)d0 () , Z X Y
(21)
P

Z
p,T (x, y) =

p (x, y)p0 ()d ,

(22)

p0 , x
P,T (Y ) =

P,T ({x} Y )
, Y Y
P,T ({x} Y)

(23)

, ,
p,T (x, y)
,
p (x, y)dy
yY ,T

p,T (y|x) = R

(24)

( (1)) .
, ,
(), y p,T (y|x),
- E(r, y, x):
Z
(25)
E(r, y, x)p,T (y|x)dy min .
rY

yY

()
x y 0 y 00

p,T (y 0 |x)
.
p,T (y 00 |x)
.

,
R
QN
p (x, y 0 ) i=1 p (xi , yi )d()
p,T (y 0 |x)
P
=
.
R
QN
p,T (y 00 |x)
p (x, y 00 ) i=1 p (xi , yi )d()

(26)

. (26) ,
, ,
.
, :
. , . , , , :
21

!!!!

, P , X
P P .
, , 6 ,
- .
, . . . , , . . . : ,
, , ,
,
. , .
, P (21)
(22)
0 p (y|x).
. , P , , , p (), P
p (x, y). , ,
,
P () . (22) p (x, y)
() ,
,
.
(MAP , maximum of aposteriori probability)
N
Y
p(T |; ) = p ()
(27)
p (xi , yi ) max .
i=1

X Y
x (y|x) /
, p (y|x), ,
.
. , . ,
? , , ,
, ( noninformative prior ). P , ,
, .

T
6

, , ,
.

22

p(T |) =

N
Y

p (xi , yi ) max ,

(28)

i=1

(likelihood ) . (ML, maximum


likelihood ).

P, P
. ,
p , - , .. , ,
, , , , . , , ,
. , ,

(. [72]).
.
E(r, y, x) (y|x)
:
, x. ,
, . 21,
(25):
. ,
( Y) 0-1-;
(y|x) ;
y p (y|x)
E(r, y, x),
p (r|x).
, , , . ,
( , , ) -.
(1 < r y < 2 ) , (r y 1 ) 1 ..,
(r y 2 ) 10 ..
F : , . F - ,
X Y
, (, , ,. . . )
Z
E(f, ) =
E(f (x), y, x)d(x, y) min .
(29)
f F

(x,y)X Y

23

!!!!

.
.

,
(ML, MAP ) . 1.2.6.
.
1.2.5

q- d- X = {(x1 , . . . , xd )}, j-
Qd
M i . D = j=1 M j .
lk l- k- .

P
(Dq 1)- lk 0, l,k lk = 1
Dq- . MAP ML
1.2.4 , , ,
.
, D
N . N Dq 1 , - ,
.
D . ,
d = 60 , D = 2d = 260 1018
(q = 2).
, P. , , , .. y
p((x1 , . . . , xd )|y)
p((x1 , . . . , xd )|y) =

d
Y

p(xj |y) .

(30)

j=1

.
q Pd
k = p(y = k) qD0 , D0 = j=1 M j , j
mk
= p(xj = m|y = k). ((q 1) + q(D0 d))-
(q 1)-
(
)
q
X
q
q
s = (1 , . . . , q ) R+ |
k = 1
k=1

qd
sM
jk

Mj

X
j
j
j
j
M
= (1k
, . . . , M
|
mk
=1 .
j k ) R+

m=1

24

60
121.

1.2.4, (19) (23) .
.
.

S d (r) sd (r) d- (d 1)-
d- ,
, :

X
S d (r) =
x Rd+ |
xj r

j=1

X
xj = r .
sd (r) =
x Rd+ |

j=1

ES (f )
f S Rd dim(S) (
f ).
1 xn1 1 xnd d sd (1),
Qd
(d 1)! j=1 nj !
nd
n1
.
Esd (1) (x1 xd ) = P
d
n
+
(d

1)
!
j
j=1
. : :
1.

Z
d

v(S (r)) =

dx =
xS d (r)

rd
;
d!

2.
Esd (r) f (x1 , . . . , xd1 )g(xd )
=

ES d1 (r) f (x1 , . . . , xd1 )g(r

xj )

j=1

R
=

d1
X

xS d1 (r)

f (x1 , . . . , xd1 )g(r

Pd1
j=1

xj )dx

v(S d1 (r))

3. :
Z
xS 1 (r)

xn1 1 dx =

rn1 +1 n1 !
rn1 +1
=
;
n1 + 1
(n1 + 1)!

4. :
Z
rn1 +n2 +1 n1 !n2 !
xn1 1 (r x1 )n2 dx =
;
(n1 + n2 + 1)!
xS 1 (r)
25

5. :
Pd
Qd
Z
d1
d1
Y n
X
r j=1 nj +(d1) j=1 nj !
j
nd
.
xj (r
xj ) dx = P
d
xS d1 (r) j=1
n
+
(d

1)
!
j=1
j
j=1
1, 2 5 sd (1), .
.

T = ((x1 , y1 ), . . . , (xN , yN )).
Nk =#{i|yi = k}, k = 1, . . . , q
j
Nmk
=#{i|xji = m, yi = k}, j = 1, . . . , d, m = 1, . . . , M j , k = 1, . . . , q .

(31)
(32)


p(T |) =

N
Y
i=1

N
Y

p((xi , yi )|) =

N
Y

p(yi |)p(xi |yi , )

i=1

p(yi |)

d
Y

p(xji |yi , ) =

j=1

i=1

q
Y

(k )Nk

d Y
M
Y

j
(mk
)

j
Nmk

(33)
.

j=1 m=1

k=1
j

1 sq sM
jk , ,

q
X

Nk = N

k=1
j

M
X

j
Nmk
= Nk , , (20)

m=1

Qq

p(|T )

(k )Nk
Q
(q1)! qk=1 Nk ! Qq

Qd

j
j
Nmk
m=1 (mk )

Qd

j
(M j 1)! M
m=1 Nmk !
j=1
(Nk +M j 1)!

k=1

(N +q1)!

QM j

j=1

k=1

(34)

,
.

x y
Z
PT (x, y)=
P (x, y)p(|T )d
P

Qq
Qd QM j
j
j
Nmk
Nk
Z
d
(
)
(
)
Y
k
j=1
m=1 mk
k=1
=

xj j y d
QM j
Qq
y
j
j 1)!
Q
Q
N
!
(M
N
!
(q1)!
d
q
k
m=1
mk
k=1
P
R
=

Qq
P

(N +q1)!

(k )(Nk +y )
Q
(q1)! qk=1 Nk ! Qq
k=1

(N +q1)!

k=1

d
Y

j=1

Qd

QM j

j=1

j=1

j
m
j
(Nmk
+yk x
j)
m=1 (mk )

Qd

k=1

(Nk +M j 1)!

j=1

QM j

j
m=1 Nmk !
+M j 1)!

(M j 1)!
(Nk

Nxjj y

+1
Ny + 1
,
N + q j=1 Ny + M j

(35)

ab ;
1 .
:
PT (y|x) =

Qd N jj y +1
(Ny + 1) j=1 Nyx+M
j
PT (x, y)
Pq
=
.
j
P
Q
N
+1
P
(x,
k)
q
d
xj k
k=1 T
(N
+
1)
j
k
k=1
j=1 Nk +M
26

(36)

!!!!

PT (x, y), .
. (35) , y 0 y 00
T (y 0 , y 00 |x)= ln

PT (x, y 0 )
PT (x, y 00 )

= ln(Ny0 + 1) ln(Ny00 + 1) +

d
X

ln

j=1

= ln(Ny0 + 1)

d
X

Nxjj y0 + 1
Ny0 + M j

ln

ln(Ny0 + M j ) ln(Ny00 + 1)

j=1
d
X

Nxjj y00 + 1

Ny00 + M j
d
X

ln(Ny00 + M j )

j=1

ln(Nxjj y0 + 1) ln(Nxjj y00 + 1)

(37)

j=1

(37) y 0 y 00 x. , x
y 0 , y 00 , - .
T (y 0 , y 00 |x)
y 00 y 0 (36)
(!), , , : y 0 , T (y 0 , y 00 |x)
( !).
. (37) , x, d
, . .
O

,
, .
2 p(x) = xn1 1 xnd d sd (1),

!
n1
nd

x = Pd
, . . . , Pd
.
j=1 nj
j=1 nj
.
, ,
nj > 0, xj > 0. , nj = 0, xj = 0 (
x

j
1

j
j
), 1x
x 1x e , e j-
j
j
, , x . , ,
, , nj > 0
.

Pd
L(x, ) = p(x) + 1 j=1 xj
, x
(
n p(x)
L
0 = x
= jxj
j = 1, . . . , d
j
Pd
L
0 = = 1 j=1 xj

27

!!!!

!!!!

j- d xj , j ,
d
X
= p(x)
nj .
j=1

d , .
,
(27) ,
p(T |). 2 (33),
j
k = NNk , mk
=

PT (x, y)

j
Nmk
Nk .

, (x, y)

Qd
j
j
d
Ny Y Nxj y
j=1 Nxj y
=
,
N j=1 Ny
(Ny )d1 N

(38)


PT (y|x) =

Qd

j
j=1 Nxj y
Q
d
j
1d
k=1 (Nk )
j=1 Nxj k

(Ny )1d

Pq

(39)


T (y 0 , y 00 |x)= ln

PT (x, y 0 )
PT (x, y 00 )

=(1 d) (ln(Ny0 ) ln(Ny00 )) +

X
ln(Nxjj y0 ) ln(Nxjj y00 )

(40)
.

j=1


(35) (38) ,
: , , .
j
, Nmk
,
, - : , , ,
. , ,
( ) ,

.
, , ,
,

q
d Y
Mj
Y
Y
k
m
j
(k )n
p()
(41)
(mk
)njk ,
k=1

j=1 m=1

, 1
. , 2 , , 1
28

-7 ,
, , .
.
, (41).
. :
j
T Nk Nmk
(3132);

x q y PT (x, y) (35) (38);


y, PT (x, y).

Y (.. , ),
X .
, xj p(xj |y) , - . ,
, ..
,
, 8 .
.
, , .

, , ,
. - , ,
- ( )
( ), ,
, ( , ).
, ,
.
1.2.6

()

p(y|x) Y, , , ,
, .
R
(x) = 0 tx1 et dt x > 0.
(x) = (x 1)! x (x + 1) = x(x) .
8 ,
.
7 :

29

!!!!

(., , [52],
): F Y (, R)
, PY
( ) ( ), . 1.1.2.
F .
, ,
(, belief ) ,
-
. , f (x) Y, f F x X , ( )
y Y (.. ) f (x). -
.
(., , [14]) , F, , , pf (y|x),
f F , x X y Y, p(x, y) = p(y|x)p(x) P. ,
- . , ( )
X ,
. p(x) X
; ,
x1 , . . . , xN .
[ ],
().
,
F f (x) Y p pf (x) . f , X=(x1 , . . . , xN ) T = ((x1 , y1 ), . . . , (xN , yN ))
Y=(y1 , . . . , yN ) [.. Y X f ],

N
Y
p(Y|X, f ) =
pf (xi ) (yi ),
(42)
i=1


f [ ]

p(Y, f |X) = p(Y|X, f )p(f ),


[ ]
Z
Z
p(Y|X) =
p(Y, f |X)df =
p(Y|X, f )p(f )df
f F

f F

f [ f ]
T
p(f |T ) = p(f |X, Y) =

p(Y, f |X)
p(Y|X, f )p(f )
=R
.
p(Y|X)
p(Y|X, f )p(f )df
f F
30

(43)

(43)
: [
], . 1.2.4. []
Z
fT =
f p(f |T )df

f F

( fT F, ), , , , , f (x) , . [] .
(43) f
N
Y
p(Y, f |X) = p(f )
pf (xi ) (yi ) max
(44)
f F

i=1


ln(p(Y, f |X)) = ln(p(f ))

N
X

ln(pf (xi ) (yi )) min .


f F

i=1

(45)

E(r, y) = ln(pr (y)) (f ) ln(p(f )). (45)


(f ) +

N
X

E(f (xi ), yi ) min,

(46)

f F

i=1

.. (8)
. , E
Y Y, - PY Y.
(42),
[], (4) .
1.2.7

, (Y = R), X , : X Rd .
d+1- F (
)
2

12 (yw(x))
2s

pw,s (y|x) = (2s)

(47)


s Pd
(x) w(x) = j=1 wj j (x). , ,
,

.
T (42),

N

ln(p(Y, w, s|X)) =

1 X
N ln(2) N ln(s)
+
+
(yi w(xi ))2 min ,
w,s
2
2
2s i=1
31

. . .

. . .

, w
N
X

(yi w(xi ))2 min


w

i=1

(48)

( 9
, xi , j (x)) s
N

N ln(s)
1 X
+
(yi w(xi ))2 min ,
s
2
2s i=1

s=

N
1 X
(yi w(xi ))2 .
N i=1

x y (47) , .. w(x), s.
, s (, s = 2008), ,
s = 2008 .
1.2.8

PY Y = {1, . . . , q} (q 1)-
{(y 1 , . . . , y q ) Rq+ |

q
X

y j = 1} ,

j=1

F

X Rq , Pq
j
j
(f 0 j=1 f = 1), f j (x) = p(j|x, f ).
.
Y,
PY , , k 7 (k1 , . . . , kq ). p(y|x, f )

q
Y
j
p(y|x, f ) =
(49)
(f j (x))y .
j=1

(49) y PY ,
, , ( ,
). (49)
E(f (x), y) = ln(p(y|x, f )) =

q
X

y j ln(f j (x))

(50)

j=1
9
, 2.1;
.

32

(cross entropy) f (x) y, q- {1, . . . , q}. , ( (44),


(45), (46)),

q
N X
X
(f )
yij ln(f j (xi )) min ,
(51) ,
f F
i=1 j=1

- ,

q
N
XX j

(52)
yi ln(f j (xi )) min .
f F
i=1 j=1

f () = f 1 () yi = yi1
(51) (52)
(f )

N
X

yi ln(f (xi )) + (1 yi ) ln(1 f (xi )) min ,


f F

i=1

N
X

yi ln(f (xi )) + (1 yi ) ln(1 f (xi )) min ,


f F

i=1

(53)

(54)

.
. ,
, () , Y = {1, . . . , q}, . : (50)

E(f (x), y) = ln f y (x) ,


(55)
, (51)
N
X
(f )
(56)
ln(f yi (xi )) min .
f F

i=1

, , , (50) (55).
() F
, , 2.2.2.

1.3

, .
. ,
.

33

1.3.1

P ( 1.2.4),
.
p (x, y)
Z
p (x, y) = p (x)p (y|x), p (x) =
p (x, y)dy
yY

(. (1)). : p (x), , p (y|x), . ,


( ),
, . , ,
, , .
,
, (unsupervised
learning) (supervised learning). ,
.
1.3.2

X X X N
. 1.3.1
, .
()
1.2.5 p(y) : . ,
, {0} Y
T = ((0, y1 ), . . . , (0, yN )).
(outlier detection), :
, ( ) .
, , .
:
F {f : X {0, 1}} T -
.
4.2.7.
: , .
1.3.3

.
N , .. .
34

(), X q
, , , , . ,
- , , , . ,
, , f : X Y = {1, . . . , q},
, , , Y.
. ,
, , , ,
,
- .
, . , ( : N q ), N q
, ( q!) . , ,
,
.
. ,
, ,
f : X Rq , x
arg maxj fj (x) ( ), ,
maxj fj (x).
( 1.3.2).
: , P
fj (x) 0

j fj (x) = 1 X

N
N
Y
Y
max
fji (xi ) =
max fj (xi ) max .
(57)
(j1 ,...,jN )Y N

i=1

i=1

jY

f F

- , , -, , .
.
.
,
, .
,
..
(57) . ,
.
, - , ,
.
.
, , , .
, : , , 35


, .
, . ,
(: , , , ,. . . )? ,
- (: ,
, , ,. . . )?
q-
( 1.2.1) d- X , : p(y) 1q , p(x|y)

2
1
d
p(x|j) = (2) 2 e 2 (xj ) , 1 j q.
q 1 , . . . , q X .
( (28)) X = (x1 , . . . , xN ) - y1 , . . . , yN
,

N
Y
i=1

p(xi , yi ) =

N
Y

(p(xi |yi )p(yi )) =

i=1

N
1 Y
p(xi |yi ) max ,

q N i=1

..
N
X

ln(p(xi |yi )) min ,

i=1


N
X

(xi yi )2 min .

i=1

(58)

, :
. ,
P
i|yi =j xi
j =
#{i|yi = j}
(58).
, j
( ) j- .
xi j, p(xi |j) , .. kxi j k . , ,
.., , .. .

1.3.4 . k-means (k ), k
36

!!!!

k-means

(, q), a means . 25
, (Stuart Lloyd)
: [79]. , k-means ,
([5]). ,
.
k-means ,
,
.
k , , (8), - k
(57). ,
k, .
1.3.4

. f : X Y
c : Y X , y c(y), . ,
.

(vector quantization). . , , ,
X . ()
, . ()
.
,
( 1.1.4). :

() X , -
, d- Rd ;

Y, , ;
F f : X Y, ,
X Y;
C c : Y X , , Y X ;
P ( ) X , ,
X ,
, ..;
E : X X R, , 0
, ,
X E(x, s) = kx sk2 ;

37

X= (x1 , . . . , xN ), xi X ,
P.
X , Y, F, C, P, E X f F
c C, E (f )
Z
E (f, c) =
E(c(f (x)), x)d(x) min ,
(59)
f F ,cC

xX

P. ,

N
1 X
E (f, c) E(f, c, X) =
E(c(f (xi )), xi ) min ,
f F ,cC
N i=1

(60)

, (59) , . 1.1.6. , f c (59) (60) c f .


, ,
, (60),
, E(f, c, X0 ) X0 .
1.2.4 .

Y, . , X Y , dim(Y) < dim(X ), C ,
dim(Y)-
( , ), .
, (dimension reduction).

: Y
X ,
c f , (), ( ).
(LVQ,
learning vector quantization) (SOM , self-organizing maps), [68]. , (factor analysis)
(principal component analysis) ,
p(x|y), ., , [15].

, ,
X Y ( 1.1.4). ,

X X 0 Y0 Y ,

38

X Y , X 0 Y 0 (, ,
, , ,
, ), , f ()
F, (-,
, ) X 0 Y 0 . ,
f (x) = wx f (x) = wx + b, .
, , Y 0 = Y = R . Y, ,
, - , ,
Y = {0, 1}
- (t) = 21 (1 + sign(t)), (0) = 12 .
, Y , , . 2.2.2.
2.1 2.2 , , X = X 0 : X X 0
.
2.3.
20- ,

( ).

. 21-
:
P
, ,
, ;
X
, (4),
;
X ,
,
( ) X 0 , ,
.

, 1, .
[14, 50] [98].
.
f (x) = wx f (x) = wx + b. , ,
- X 0 1 x = (x1 , . . . , xd ) x0 = 1, w
w0 = b. X .
39

x
x

1
x
=
R X0 .
x

(61)

w b X 0 Y 0 ,
w

w
= ( b w ) : R X 0 Y0 .

(62)

Y 0 (..
- )
f (x) = w
x f (x) = (w,
x
)
f (x) = b + (w, x) = w0 + (w, x).

2.1
2.1.1

(48)
E(w,
T) =

N
X

(yi w
xi )2 min .
w

i=1

(63)

E(w,
T ) w
( ), , .
w.
(63)
E(w,
T)
= 0.
w

, . X (R X )N ' R(d+1)N y Y N ' RN

1 1
x11 x1N

X = (
x1 , . . . , x
N ) = . .
..
..
..
.
xd1

xdN

y = (y1 , . . . , yN )
10

. Y
w
(-)
w
= (w0 , w1 , . . . , wd ).
(63)
>

E(w,
T ) = (y wX)(y

wX)

min,
w

(64)

10 , , . ,
, , .
.

40

!!!!

11
0=

E(w,
T)
>
= 2X(y wX)

,
w

(65)

>
wXX

= yX> .
>

(66)
1
>
N XX

>

XX yX , ,

Z
!

1
>
N yX ,

xk xl d(x, y)
(x,y)X Y

1k,ld

!
k

yx d(x, y)
(x,y)X Y

1kd

, . XX>
, (63)
w
= yX> (XX> )

(67)

, , . (66) w

, (67).
q- Y E(r, y) = A(r y, r y)
A (
).

:

>
>
E(r, y) = (r y) A(r y) = tr A(r y)(r y) .

(64)

>
E(w,
T ) = tr A(wX
y)(wX
y) ,
w

>

AX(wX
y) = 0,
y q-:

y11
..
y = (y1 , . . . , yN ) = .
y1q

..
.

1
yN
.. .
.
q
yN

A
(66).
PN
PN
. i=1 xi = 0 i=1 yi = 0,
(63) w0 = 0.
11 , (= ),
- .

41

!!!!

2.1.2

( (8), (46))

(kwk)
+

N
X

(yi w
xi )2 min,
w

i=1

(68)

kk , , lp -,
.. p- p- p > 0
p = 0. . ,
( ), ( ), ,
.
, , , , , 12 .
(k k), (68) , , - .
.
l2 - kwk
22 kwk22 ( , ridge regression
[56])

12 (yw(x))
2

pw (y|x) = (2)

(69)

(. (47)) 0
1 .
kwk
22 +

N
X

(yi w
xi )2 min ,
w

i=1

(70)

(63), ( (63) ) .
(63),
1
w
= yX> (Id+1 + XX> )
(71)
(. (67)), In Rn , .. n n. ,
, w0
(62), .
.
In0 =01 In1 (n 1)- Rn ,
, .

12 ,
, .

42

. . .

kwk22 +

N
X

0
(yi w
xi )2 = kwI
d+1
k22 +

i=1

N
X

(yi w
xi )2 min
w

i=1

(72)

(70) (71) :
1
0
w
= yX> (Id+1
+ XX> ) .
.

0
, Id+1
+ XX> .

. . .

!!!!

l1 - kwk1 (, LASSO, lasso regression [110])



(69)
0
a qd
p(w) =
eakwk1
2
(
).
kwk1 +

N
X

(yi w
xi )2 min
w

i=1

(73)

, ,
, , . , ( LARS , . [35])
(73) .
(73) , .. w
0. ,
, ,
. , d N .

(73), 0,
N
X
(yi w
xi )2 min ,
kwk1 C

i=1

(74)

C > 0, ( 1).
kwk1 C (, 2qd , , , ).
( , 3qd 1) ,
C.
w,

, , . (, )
, .. (,
). (74) , .

,
. (C , ),
43

LASSO

LARS,

...
...
..
. ...... .
...
......
...
..
..
.
...
..... ...... . ....
. .
. ..
...
. .....
...
.
.
.
.
.
.
. .
.. .
.
.
.
.
.
.
.
.
.
.. .
.... ... .. . .. ..
.... .. .
. .
...
....
.
........ ..
... .
.... . .
.
.. . ... ..
.
.
...
.
.
.
.
.. ...
..... ........ . ...... ......
...
.
.......................................................................................................................................................................
.
.
.
.. . . . . ..
.
....
.... ... . .
.........
.
. .. ...
....
.
.......
...
.
.
.
.... .. ...
. .. .
... ... ...
....
....
...
....

r r

...
...
..
...
...
...
...
...
...
...
...
...
...
..
..................................................................................................................................................
...
...
...
...
...
...
...
...
...
...

..
....
.
.
.
...........
.

. 1: (74). . l1 -, -
, ,
. ( )
C.

. (74) .
( C) , , (..
, . . 1) ,
.
, lp - , p = 1 . (68)
( , ) p 1,
p 1.
l1 l2 - (elastic net [127])

- lp - . , (68) (w)
= 1 kwk1 + 2 kwk22 , 1 2 ,
. (68) ,
2 1 ,
, LARS. , 2 kwk22 (73)


0

(x0 , y0 ) = 2 , 0 .
..
. 2
, , ,
. , ,
13 . ,
, , , ,
.
13
.

44

[127] , elastic net14


, , , , (
, ) .
l0 - kwk0 (subset selection)
,
.
N
X
(yi w
xi )2 min .
kwk0 C

i=1

(75)

C-
w, .. dq- , , ,
dq . : ,
"leaps and bounds" ([43]) ( 50) . - ,
, .
2.1.3


. , (. . 2):
E(r, y) = |y r| ;
E(r, y) = |y r|p , p > 0 ;

|y r|2
|y r|
([60]) E(r, y) =
;
(2|y r| ) |y r| >

15

- E(r, y) = max(0, |y r| ) ;
- E(r, y) = (max(0, |y r| ))2 .
, , ,
.
:
( );
,
( );
14

elastic net ( ) , , .
15
, 0-1-, .

45

E ......
...6
..

E1

..E2
.
....
....
...
..
...
... ... ....
...
.
.
....
...
.
...
.. .... ....
.... ..
.
...
.
...........
.
.
.........
.
.
.
.
.
.
E 21 . ....... ... ......... ...... ............ ...... . ..
.
......
.......
.
...... ....
.
............ .... ................
. .. .
..........
..... .. ..
...................................................................................
....................
...................................................................................
..
...
y f (x)
....
.
....

....

....
.

E .....
.E
.. H .E
...6
...
...
.... ..
.. 2,
. ..
....... ..
...
.
E1, ..... ... ...
...
.
...
.
..
...
... .
.
. .
...
... ....
. .. ....
... .. .
...
. . ..
.
.
.
.
.
.
.
.... . ..
.. . ...
.
....
.... ...
.... ...
...
.
...
.......... ..
.........
.
. ..
...
.
.... .
...
.. ......
...... ..
...
. .
............. .
...
.
..... . ..
.
.
.
.
.
.
.
.
.
.
.
.
.
......................................................................
......................
...........
........................................................................................ ....... 
y f (x)
...
.
..

. 2: , . Ep , EH - Ep, .
.

,
( );
,
( );
, 0, , ( ), ,
( );
(
, , ,
).
-
( , )
. (SVR, support vector regression),
4.2.4. d N , , (lasso (.
43)).

2.2

,
P {j|x} = f j (x) x
q , , , ,
. ,
. ,
f j (x), .
f j (x), , .
,
. : (, ) ,
,
().

46

2.2.1

( )

q- d- X (, , ) ( 1.2.4). , ,
(j-) q
j X ,
A, .
j- j , j > 0 (
j- ). 16
d

pj (x) = p(x|y = j) = (2) 2 |A|

12 12 (xj )> A1 (xj )

12 21 (xy )> A1 (xy )

p(x, y) = y py (x) = y (2) 2 |A|

p(x) =

q
X

p(x, j) =

j=1

q
X

j (2) 2 |A|

12 12 (xj )> A1 (xj )

(76)
.

j=1

P {y|x}, ,
>

P {y|x}

=
=

y e 2 (xy ) A (xy )
p(x, y)
= Pq
12 (xj )> A1 (xj )
p(x)
j=1 j e
ey

Pq

j=1

>

A1 x 12 y > A1 y +ln(y )

e j

> A1 x 1 > A1 +ln( )


j
j
2 j

ew

P {y|x} = Pq

j=1

(77)

(78)

ew j x

w
: X Rq .
j , j 1 j q A
T .
, ..
ln(p(T )) =

N
X

ln(p(xi , yi )) min ,

i=1

,,A

p(xi , yi ) (76).
j 1,

q
X
L(T, , , A, )= ln(p(T )) +
j 1
(79)
j=1

N
X
1
1
d
> 1
=
ln(yi ) ln(2) ln |A| (xi yi ) A (xi yi )
2
2
2
i=1

q
X
+
j 1
j=1
16 :
A1 x1 x2 x2 > A1 x1 ,
A1 A .
(A1 x1 , x2 ), A1 (x1 , x2 ) .

47

, j , j A.
A, A1 (
).
. , M

( ln |M |) =
(ln |M 1 |) = Mlk ,
(M 1 )kl
(M 1 )kl
A1 , -
,

(ln |A1 |) = (#{k, l}) Alk ,


kl,
(A1 )kl
.. ,
, 2.

0=

L
j
L
0=
j
0=

L
0=
A1
kl

q
X

!!!!

j 1

j=1

=
=

#{i|yi = j}
+
j
X

A1 (xi j )

i|yi =j

N
1X k
(#{k, l}) Alk +
(x kyi )(xli lyi )
2
2 i=1 i

N
#{i|yi = j}
P N
i|yi =j xi

(80)

(81)
#{i|yi = j}
PN
k
k
l
l
i=1 (xi yi )(xi yi )
Akl =
,
(82)
N
,
. ,
:
NNq (!). A
( N d).
. A. ! .
(76) , (77),
( 1.2.1). , ?
, , ij (x), pi (x) > pj (x) ( x i- , j-) ij (x) > 0. ,
,
48

!!!!

!!!!

...
.

...
.
...
.

.
...
.

....

.
...
.

...
.
...
.

.
...
.

....

.
...
.

..

...
..
.
......
......
... ....... .. .... .... . ... .......
..
...
.
...
... ..
.
.
. .
...
..
...
...
... .... . . ........
...
...
.
...
. ..
.
.
...........................
...............................................................................
... ..
... . .... . .... 2 .....
1
.
...
... .
..
..
. ..
...
...
.
..
..
. ..
...
... ..
....
..
..
.
.
.
.
...
.
.
..... ..
.
.
.........
...
.
....
...
..

...
.

...
.
...
.

1 = 2

.
...
.

...
.

.
...
.

1 > 2

...
.
...
.

.
...
.

...
.
.

. 3: (83)
j

pj (x) p(x, j), pi (x) pj (x) p(x, i) p(x, j).


, ,
ln

pi (x)
pj (x)

i
1
>
>

(x i ) A1 (x i ) (x j ) A1 (x j )
j
2

1 > 1
i
>

i A i j > A1 j + (i j ) A1 (x)
= ln
j
2

i
i + j
> 1
= ln
+ (i j ) A
x
.
(83)
j
2
=

ln

A = Id i = j 0 (83) , [i , j ] .
i 6= j , A 6= Id
, [i , j ]
A1 (. . 3). : d- ( A = Id ,
A1 )
(q 1)- j .

, , ,
. (80)
(81) , (82) q
P
k
l
i|yi =j (xi yi ) (xi yi )
kl
Aj =
.
#{i|yi = j}
(83) , ,

() .

49


( ), ,
, .
. (q 1) + qd + d(d+1)

2 ,

(q 1) + q d + d(d+1)
, .. q 17 . 2
,
,
, , .
,
.
,
. , ( , ) ,
. ,
.
,
, 1.2.4,
.
(83), (80), (81), (82)
. 1936
[36]. ( )
- . , ,
2.2.2 ,

.
2.2.2

(logistic regression)

2.2.1
(78) . (q(d+1) 18 (q 1) + qd + d(d+1)
)
2
( )
, .
1.2.6 (43)
(46),
.
1.2.8 , f y (x), P (y|x),
(. (51),
(52)). , f (x) . . f j (x) ( 17 1.1.6, , -.
18 , , (q 1)(d + 1)

50

) , .. eh (x) .
, , ,
-, e100 = 0.
:
eh

f (x) = Pq

(x)

ehk (x)

k=1

(84)

(84)

hj , , , hq = 0,
P
q
j
j=1 h = 0.
(84), (51),
(logistic regression).
hj (x) = w
j x
,
ew
f j (x) = Pq

k=1

ew k x

(85)

, w
q = 0.
(85),
j
(x)
ff i (x)
(w
j w
i )
x +b = 0. ,
j
i
{x|i {1, . . . , q} f (x) f (x)} j .
, ,
w ( w)
(w)
= kwk22 ,

q
q
N
X
X
X
j

L(w,
T, ) = kwk22
yij w
j x
i ln
ew xi min
(86)
q
i=1

j=1

j=1

w
=0

( (85) (51)
P
q
j
j=1 yi = 1). , (86) w,
(71), .
w
, , () (!). (86)
. (-)
19 - .
(86)
, , ([110], . 2.1.2),
19 . . ,

1, .. ,
. n
n .
(
). ,
2,
.
. , .
, 40 .

51

!!!!

. , , ., , [76].

q = 2,

w
1 = w,

f 1 (x) = f (x) =

w
2 = 0,
w
x

e
x
1 + ew

y 1 = y, y 2 = 1 y,
1
1
=
, f 2 (x) =
x
x
1 + ew
1 + ew

(87)

(86), :
L(w,
T, )

0
= kwI
d+1
k22

N
X

xi
yi w
xi ln(1 + ew
)
i=1

L(w,
T, )
0=
w

2 L(w,
T, )
w
w
>

0
2Id+1
w
>

N
X

yi x
i
i=1

0
= 2Id+1
+

N
X
i=1

xi
ew
x

xi i
1 + ew

xi
ew
x
x
> .
xi )2 i i
(1 + ew

(88)

(89)
(90)

, ,
. , 0
, , Id+1

(.. (t, 0, . . . , 0)) N .


. (84) Y = {1, 1} Y = {1, 2} h1 = h, h1 = 0
(50) :
f y (x) =
E(f (x), y) =

1
1 + eyh(x)
ln(1 + eyh(x) ) .
!!!!

.
logistic regression, (84)
hj (x) logistic functions, log-odds logit, , . logistic function
(t) = 1+e1 t ,
sigmoid function. - f (x) = (w
x) ( (87)).
. 1950- , 1960- ([28]),
, , [76].
2.2.3

, , .. 1950- , , . [97].

52

- , ( , ),
.
: X = Rd ,
Y = {1, +1}, F = {sign(w, x)|w Rd }. 2 ,
,
, . , sign Y, 0,
, 20 .
T . ,
{xi |yi = 1} {xi |yi = +1} {x X |(w, x) = 0}, , , w,
(xi , yi ) yi (w, xi ) > 0.
, .
(w, x) + b = 0
2 xi

yi xi
.
.
(w, x) = 0,
{x X | |(w, x)| < },
= min yi (w, xi ) > 0.
1iN


2 = 2

.
kwk

(. 1),
, , .
20

. . .

53

!!!!

1:
1. : T = ((x1 , y1 ), . . . , (xN , yN )) N .
2. w = 0 x 7 sign(w, x),
k = 0 n = N .
3. n > 0
(a) : n = 0;
(b) (xi , yi ) yi (w, xi ) 0
( )
i. n = n + 1 ( );
ii. w = w + yi xi ( );
iii. k = k + 1 ( ).
4. : w k.
2 (A.Novikoff ([89]) ) -
, . ,
2 R = max1iN kxi k,
2
(w, x) , R
w.
w w0 , ,
w
2.
. w(k)
w k . w(0) = 0, w(k) w(k1) - yi xi ,
w0 , w(k1) , R.
(w(k) , w0 ) = (w(k1) , w0 ) + (yi xi , w0 ) (w(k1) , w0 ) + kw0 k,

(w(k) , w0 ) kkw0 k

(w(k) , w0 ) kw(k) kkw0 k


kw(k) k k .
(91)

kw(k) k2 = kw(k) + yi xi k2 kw(k1) k2 + kyi xi k2 kw(k1) k2 + R2 ,

kw(k) k2 kR2 .

54

(92)
(93)

(91) (93)

2
,

, . .
, ,
. ,
, , ,
, ,
, w (,
PN
w = i=1 i yi xi i ),
2
R
,
.
, ,
2
, R
< N ,
w.
2
, -, w , R
, -,
2
N R

2
, , 1 NR2 -
. , ,
2
NR2 . , , ,
[116].
, .. ( . 83),

ry ry < 0
E(r, y) = (ry)+ =
0
ry 0
(. 5) T :
N
X
i=1

EP ((w, xi ), yi )

N
X
i=1

(yi (w, xi ))+ min .


wX

w
( , , ),
EP ((w, xi ), yi ).
.
55

, , . 6 . 83. 19501960- ,
,
. XXI
, .
,
2,
. :
yi (w, xi ) 0, yi (w, xi ) kwk. ,
2.

3 2, R = max1iN kxi k, < , 2 ,


2

R
.
2
. 2 (91)
, (92)
kw(k) k2 = kw(k) + yi xi k2 kw(k1) k2 + 2kw(k1) k + R2 .

(94)

kw(k) k k ,
k, (93), - ,
, (91). , ,
R2
< < , (94) , kw(k1) k 2()
(
), kw(k) k kw(k1) k+.
,
R2
kw(k) k
(95)
+ k.
2( )
(95) (91)
k

R2
,
2( )( )

.. . , = +
2
k

2R2
.
( )2

. ?
2.2.4

( , margin
classifiers)

2.2.3 , , . (61) .
56

!!!!

......
2 ........
...
|w|
...
.
.
.
...
..... ....
...
...
...
...
.
...
...
...
...
...
...
.
...
...
...
...
...
.
...
...
...
...
...
...
.
...
...
...
.
...
...
...
...
...
...
...
...
.
...
...
...
...
...
...
.
...
...
...
...
...
...
.
...
...
...
...
...
.
...
...
...
...
...
...
.
...
...
...
...
...
...
.
...
...
...
...
...
.
...
...
...
...
...
...
.
..
...
.
..
...
...
.

+ w
*



+
+

+

+
+

. 4: (97).
.

40, ..
sign((w,
x
)) = sign((w, x) + b) .

(96)

w
( ) ,
yi (w,
x
i ) > 0, yi (w,
x
i ) 1. 2
kwk

w
, , :
kwk
2 min

(97)

yi (w,
x
i ) 1

1 i N.

, , , , . ,
. : , ( , . . 4),
, .

, , ,

(. (70) (72)). (97)
, ,
kwk2 min

(98)

w,b

yi ((w, xi ) + b) 1

1 i N.

,
, , , (yi ((w, xi ) + b) < 0) (yi ((w, xi ) + b) < 1),
, ,

57

. :
N

X
1
kwk2 + C
E(i ) min
w,b,
2
i=1
yi ((w, xi ) + b) 1 i
i 0

(99)
1 i N
1 i N.

E()
, C > 0
,
.
(soft margin classifier 21 ).
xi
i 1, ..

0 < 1
E() =
1 1
, - , . -
E() = max(, 0), ., ,
2.1.3.
.. [117] [18, 26]
(SVM , support vector machine). [26] [31], [22] . , ,
( (8)),
- .
( (97))
, f (x) = (w, x) + b ,

, , |f (x)| 1. ,
, .
, , -, , ( [113]
35% , 3050%),
-,
.
2.2.2 ,
21 margin ,
( ) (, ) (
(w, x) + b). margin
.
((w, x) + b) ,
.. . margin
1
([114],[15]),
([50]), , .. kwk
([52]), , .. y((w, x) + b) ([100],[52]),
([52])

58

...
E ......
.E
...6
E....L... .....S
...
.......... .....
...
.....
...
...
EP ................ ..... ...
...
............. .. ..
.
.............. .. .. .......
.............. .. .. .....
............ .. . ...
...... . ...... . ...... . ...... . ...............................
... .............................
EM
. ............................................................................................................................................................................................................................................
...........
...
0....... 1
yf (x)
.
.

. 5: , sign(f (x)) {1, 1}: 0-1-


EM (t) = (t), EP (t) = (t)+ , , EL (t) = ln(1 + et ) ,
ES (t) = (1 t)+ . EL (t) 0 t t t . , EM (t) ES (t)
L (t)
EM (t) Eln
2 .

. , ,
x 1
((w, x) + b) =

1
1 + e((w,x)+b)

(. 2.2.2). ,

(8). , , .
1
C 2C
, SVM
(99)
kwk2 +

N
X

ES (yi (w, xi ) + b)) min ,


w,b

i=1

ES (t) = (1 t)+ =

1t
0

(100)

t 1
t > 1

, , 1.2.8 2.2.2 y {0, 1}, {1, 1},


(88), ,
kwk2 +

N
X

EL (yi (w, xi ) + b)) min


w,b

i=1

(101)

EL (t) = ln(1 + et ).

( . 52). (100) (101)


, , . . 5.
, . , ( kwk2 kwk1 .., . 2.1.2)
X ,
2.3.2.
59

2.3

2. , - , ( 1.1.1) . - .
R {|x| < 1} {2 < |x| < 3}
. .


. ( ) . ,
.
.
2.3.1

j : X R, 1 j d0 ,
0
, , , : X X 0 = Rd X X 0 .
f : X 0 Y , .. f ,
. , f f (x0 ) = (w, x0 ) + b. f : X Y
X
0

f (x) = (w, (x)) + b =

d
X

wj j (x) + b,

(102)

j=1

.. (,
). ,
, .

. ,

n. d-

d0 = d+n

,
d
. (
)
n. (
), n
.
,
.
, , X
. X - ()
{c1 , . . . , cd0 } - : R+ R+ j (x) = ((x, cj )). , , (RBF , radial
basis functions). ,
x, - cj ,

60

RBF

(102) , cj . () (,
, ),
( 1.3.3 1.3.4)
RBF- ( 3.3).
.
(, ), ().
,
(
- ). .
N . , , ( .. [114, 113, 115]) .
N ,
, , , , , .
N , , , , (,
) , N
.
. ,
(
, X , , ),
(- . . . ) .
2.3.2
.
2.3.2

22 (kernels)

X K : X X R. T = ((x1 , y1 ), . . . , (xN , yN ))
T : X RN T (x) = (K(x1 , x), . . . , K(xN , x))
- RN
X 0 (T ) ( T ) . jT () = K(xj , ) .
x
T (x) = (K(x1 , x), . . . , K(xN , x)).

(103)

(w, T (x)) + b =

d
X

wj K(xj , x) + b

(104)

j=1

(. (102)), wj b
K(xi , xj ) yi .
22 . , ,
.

61

K (kernel ),

N (. 2.3.1) kernel trick . X K . , X
, K(x, y) ,
, 3.14.
K(x, y) (. (. 61))
, x y .
.
kernel trick , ()
(T (x), T (x0 )) =

N
X

K(xj , x)K(xj , x0 )

j=1


((x), (x0 )) = K(x, x0 ),

(105)

( 2N ). , K , ,
(K(x, x0 ) = K(x0 , x)) ( (x1 , . . . , xn ) (K(xi , xj )) ). ( 9, . 93).
(positive definite kernel ) .
X , , , .
T d- N -.

, ,
, . , N > d
, N < d .
, . ,
, ,

0 2
K(x, x0 ) = ekxx k .
,
- H X
: X H. .
K(x, x0 ) N x1 , . . . , xN N -
( (105)), ,

, ,
K(x, x0 ) = ((x), (x0 )). ,
, 4.1.
62

Kernel trick

, .
.
[1]23 .
.
, (x, y)
K(x, y), w PN
(. 55), w = i=1 i yi xi .
4.2.
(70)

1
w
= yX> Id+1 + XX>

( (71))

1
f (
x) = w
x = yX> Id+1 + XX>
x
,

(106)

, , . ,

X> Id+1 + XX> = X> + X> XX> = IN + X> X X>


(),

1 >

1
IN + X> X
X = X> Id+1 + XX>
( , ),

1 >
w
= y IN + X> X
X
()

1 >
f (
x) = w
x = y IN + X> X
X x
.

(107)

XX> , X> X ( ), X> x



x
.
(107)

N 1
N
f (
x) = y IN + K(
xi , x
j ) i,j=1
K(
xi , x
) i=1 .
(108)
, (106) (107) (103) , , (108),
, ,
.
. k k
.
23

[3] . ,
, [2].

63

!!!!

2.3.3

(.. , , , ) , .
, (d0 ), (..
), -
( , 21 ).
, ,
(.. ),
, .
(,
, ),
.
,
, .. , .
, (boosting ), . 5.
, , ,
, wj w (
- ),
b ( ), , , , l1 -
Pd0
w, j=1 wj . (99) :
0

d
X
j=1

wj + C

N
X

i min

i=1
yi (w, x0i )

w,

1 i
wj 0
i 0

1 i N
1 j d0
1 i N,

x0i i- , d0 - , yi {1, 1} , i-
. 5.3. , , ,
,
wj w (
- ), b ( ),
, , , l1 - w, Pd0
j=1 wj . (99) :
0

d
X
j=1

wj + C

N
X

i min

i=1
yi (w, x0i )

w,

1 i
j

w 0
64

1 i N
1 j d0

i 0

1 i N,

x0i i- , d0 -
, yi {1, 1} ,
i- . 5.3.
:
,
.
, ,
, .

( , ) ( 2), ,
(, ) ( ). ,
, .
,
, ,
.
. . . .
, , . -
!

, . , , ,
,

.

3.1

19401960- (
computer science) . .

( 1011 ) (),
() ( 104 )
(), /
. () ( , ).
: ( , )
(), . , ( ) , .

65

, () (, )
. [82]
1943 24 . -: ( ,
)
. ,
( , 1011 ) ( , 1015 ), ,
. .
,
.
(ANN, Artificial Neural Networks), , , ,
.
.

V = {v1 , . . . , vS } ( ) () E = {e1 , . . . , eT }, :
- () Vin - (
) Vout (, ) ;

e we R he : R2 R, ();
v V \ Vin wv R
gv : R2 R, ().
q
1
d
1
Vin = (vin
, . . . , vin
) Vout = (vout
, . . . , vout
)
d q , . f : Rd Rq ,
. y = f (x) u (),
- x w, u(e) e u(v) v,
(e vout ). :
j
j
j- vin
j- xj , .. u(vin
) = xj ;

v, ,
X
he (we , u(e))
s(v) =
(109)
evin


u(v) = gv (wv , s(v)) .
24

(110)

, 2.2.3.

66

. -
q
1
)) -.
y = (y 1 , . . . , y q ) = (u(vout
), . . . , u(vout
w, we wv , .
w,
y, y = f (x) = F (w, x)
x, , . ,
he gv .

() , . V
V0 = Vin , V1 , . . . , VL = Vout ,

, i- , (i 1)- ;

gv ; he .
L-. .
.
:
he (we , u) = we u, gv (wv , s) = th(s wv ) ( ), gv (wv , s) = (s wv ) ( ),
- ; . 3.2;
he (we , u) = (u we )2 , gv (wv , s) = es/2wv - -; . 3.3.
( )
, .
( ) gv (wv , s) = (s wv ),
( ) gv (wv , s) = (s wv ) (.
(87) ); -. : gv (wv , s) =
s wv .
, .

he gv ,
. he gv .
L : , . ,
, - , . ,
.
67

d q . d ( ), q
1 () ( ). ,
, ,
, , ,
. , ,
, , .

, , ,
. ()
.
:
,
, ,
.
( 1.1.4),
( 1.1.6), , , ( 1.2.4).
-
: , .. , w.

, .
.

, , , (, ), , .. - .

(MLP RBF)
, : , .
, .

3.2

(MLP)

() 1950-
[97] ,

. (MLP (Multi-Layer
Perceptron)) gv , t = s(v) wv ,
, t
t
th(t/2)+1
= 1+e1 t .
th(t) = eet e
+et (t) =
2

68

...


, : gv (t) = t.
v

!
!
X
u(v) = gv
we u(e) wv .
(111)

. . . , ,

evin

we , wv .
th :
th, Fth (w, ), ,
,
w0 , F (w0 , ) = Fth (w, ),

u (v) = uth (v)+1
.
2
. w 7 w0 , .
, , ,

, Fth (0, ) = 0 w x f (x) = Fth (w, x) .
1 v
wv . wv , v (111)

!
X
u(v) = gv
we u(e) .
(112)

!!!!

evin

3.2.1

,
, ,
.
3 ( )

( )
g(t) = (t).

( , L2loc , , ..)
g(t) = (t).
.
4 ( )

69

!!!!

.
.

2d .
5 ( )
.


.

, , .
,
.
, ,
.

!?

4 (.., 1957, [69]). d


.
,
d- (. (111)). :
, , (d + 1)(2d + 1) ,
.
: ,
.
5 :
6 .
,
. : ,
. :

Pd
X
j
T (x) =
ck ei j=1 ckj x .
k

: , ,
,
() .
70

!!!!

- .
, 5
. ,
.
.

19501960- . , ( , . 1.1.6) ()
,
.
3.2.2

()

( 5). :
, - , .
.
,
, .. , ( ) .
(
), . - ,

. ()
.
,
1990- , [123, . 4].
3.2.3

: (error back-propagation)

( 1.1.6, (8)) , F (w, x) w,


ET (w) = (w) +

N
X

E(F (w, xi ), yi )

(113)

i=1

T = ((x1 , y1 ), . . . , (xN , yN )).


.
E . - w
F w.
,
71

, ()
, .

, .
, , ,
.

,
{u(v)} . ,

v...
v... ;
q
1
, . . . , vL
vL
;

e... e... ;
e w
e = we , e e;
v gv , gv0 (gv1 (u(v))) = gv0 (s(v) wv ),
.. gv v v , u(v) gv ,
.
, ,
.
gv0 (gv1 (u(v)))
.
. gv (t)
, gv0 (gv1 (u(v))) = 1 (u(v))2 ;
, gv0 (gv1 (u(v))) = u(v)(1 (u(v)));
, gv0 (gv1 (u(v))) = 1.
, (
) s(v) = gv1 (u(v)) 0 .
j
vL

E(r, y)
,
rj r=F (w,x)
j
gv0 (gv1 (u(vL
))). ,

E(F (w, x), y) = kF (w, x) yk2 ,


() ,
1

j
j
u
(
vL
) = 2(F j (w, x) y j ) = 2(u(vL
) yj ) ,
72

!!!!

-
E(F (w, x), y) = y ln F (w, x) (1 y) ln(1 F (w, x)) ,
vL , - (. (54)),
gvL (t) = 1+e1 t ,
y
1F1y
(w,x) F (w,x) , F (w, x)(1 F (w, x))
u
(
vL ) = (1 y)F (w, x) y(1 F (w, x)) = F (w, x) y = u(vL ) y ,
.. 2 .
. .

. , , , .


: (111). , s(v) v, (109),
.
7 v

E(F (w, x), y)


E(F (w, x), y)
=
=u
(
v ),
wv
s(v)

!!!!

!!!!

v , v. e
E(F (w, x), y)
= u(e)
u(
e)
we
e , e.
,
(x, y) x ,
, , (F (w, x), y), , ,


( wv ).
(113)

(w). P
(w) = e we2 , ,

, (w) =
P
P
e e we2 + v v wv2 ,
, .. ,
, (113)
. ().
, (113) ,
, .
3.2.4.
73

!!!!

.
: .
.
he gv (109) (110). 7
:
, . ,
.
:
e

he (we , t)
;

t
t=u(e)
v
gv ,

gv (wv , t)
;

t
t=s(v)
j
vL
,

gvj (wvj , t)
L
L
;

t
j
t=s(vL )

gv (r, s(v))
E(F (w, x), y)
=
u
(
v) ;

wv
r
r=wv
e

E(F (w, x), y)


he (r, u(e))
=
u
(
e) .

we
r
r=we
. , (w,x),y)
25 w E(F (w, x), y) = E(Fw
.

1970- [99].
f (x,y)

25 f (x, y)
f x
x
x ,
, .

74

!!!!

3.2.4

3.2.3
,
. ( ) , . :
W Rn (
) n;

( );
;
(
);
( ).
-
.

, :
,
, .. .
,
(, g(t) = th(t)) , , , [1, 1],
, , ,
(, 10) (10, ). ,
( 7).
,
, ,
.
. , , , ,
. , ,
(. (112) ).

( , (w) = kwk2 ,
ET (w) - ), , , ,
[1, 1].
75


w(1) ,
. ,
, . , , -
1
1
[ Ne (v)+1
, Ne (v)+1
], Ne (v) v,
, .

i- w(i) ET ()
(113) W ET (w(i) ) :
w(i+1) = w(i) i ET (w(i) ) .

(114)

,
i > 0, , . , i = E(w) = w2
W = R E(w) = 2w w(i+1) = w(i) 2w(i) = (1 2)w(i) < 1
1. E(w) = 10w2
< 0.1.
, i 0 i, .
i , .
.
, ,
(. , . 77).

(114)
w(i+1) = w(i) i ET (w(i) ) + (w(i) w(i1) )

(115)

. , , ,
ET ,

.
i - .
(115), i = const,
.
(w(i+1) 2w(i) + w(i1) ) = (1 )(w(i+1) w(i) ) ET (w(i) ) ,
, -
w00 = (1 )w0 ET (w)
w W
ET () , w0 (1 ).
76


i i- (114)
, ,
f (i ) = ET (w(i) i ET (w(i) ))

min

i [0,2i1 ]

(116)

( 2
i 2i1 ). , f , . f (0) = ET (w(i) ) ,
f 0 (0) = kET (w(i) )k2 ET (w(i) )
, , , f (2i1 ).
. f 0 (0) i f .
. f , , , , f (0). f .
.. .

,
, .


, [54].

,
.

E(w) = H(w, w) + bw

(117)

H
,
. w(1) w(i) E(w(i) ),
gi = E(w(i) ) , w(i+1) w(i) 26 H. H
.

H. -, ,

, , ,
. 2
.
26 : r s H,
H(r, s) = 0 , , r> Hs = 0.

77

!!!!

2: (Polak-Ribi`ere)
1. : E() E(),
w(1) , TimeToStop().
2. g1 = E(w(1) )
d1 = g1 .
3. i = 1, . . . , (gi = 0 TimeToStop())
(a) w(i+1) E(w)
w = w(i) + di , 0;
(b) gi+1 = E(w(i+1) );
(c) di+1 = gi+1 +

(gi+1 ,gi+1 gi )
di .
(gi ,gi )

4. : w(i) .

, (117)
gi
di ,
(:
, (gi+1 , di ) = 0, E , gi gj = 2H(w(i) w(j) )).
[14]
[58].
- .

!!!!


(8), - , , .
- , ( ) ( ), .
, ,
,
, . , E E.


pB (w) e

E(w)
kT

(118)

T , k . T ,
, , .
.
(simulated annealing) [66],
78

Simulated
annealing

, , , .

.
(118) , [86] . W Rn (118)
(, E ) W. i dim(W) -
, ( ). , ,

dim(W) , , .
w(1) W
( ):

E(w(i) +i )E(w(i) )
(i)

Ti
w + i P = min 1, e
w(i+1) =
(119)
(i)
w
1 P
, , , , E T , 1.

.
(i)

5 w (119) i (118),
w(1) i .
, .
, 5 W Rn
, , , , , i ,
- , . [86] .
( , ) , W = Rn , , , w(1) ,
, .
5 (). q i , M : w(i) 7 w(i+1)
(119), M 27 :
Z
0
M p(w )
=
p(M (w) = w0 |w)p(w)dw

Z
pB (w0 )
=
q(w0 w) min 1,
p(w)dw
pB (w)
Z

!
pB (w0 )
0
+
q(w w) 1
dw p(w0 )
pB (w)
{w:pB (w0 )<pB (w)}
27 - , ,
, .

79

( 118 , ). w w0

pB (w0 )
0
0
(w, w ) = q(w w) min 1,
(120)
pB (w)
w0

Z
pB (w0 )
0
0
(w ) =
q(w w) 1
dw ;
pB (w)
{w:pB (w0 )<pB (w)}

(121)

Z
M p(w0 ) =

(w, w0 )p(w)dw + (w0 )p(w0 )


Z
(w0 , w)dw + (w0 ) = 1 .

(122)

(123)

q (120)
(w, w0 )pB (w) = (w0 , w)pB (w0 ) .
(124)
, ,
(124)
, , .
:
1. (118) (122);
2. ;
3. (122) (
).
(124) (123):
Z
M pB (w0 ) =
(w, w0 )pB (w)dw + (w0 )pB (w0 )
Z
=
(w0 , w)pB (w0 )dw + (w0 )pB (w0 ) = pB (w0 ) .
, ,
. , ,
p
Z
M p(w0 )
ln
pB (w0 )dw0
pB (w0 )

Z
Z
0
0 p(w )
0 p(w)
dw + (w )
pB (w0 )dw0
=
ln
(w, w )
pB (w0 )
pB (w0 )
Z

Z
0
p(w)
0
0 p(w )
=
ln
(w , w)
dw + (w )
pB (w0 )dw0
pB (w)
pB (w0 )

Z Z
p(w)
p(w0 )
0
0

(w , w) ln
dw + (w ) ln
pB (w0 )dw0
pB (w)
pB (w0 )
80

Z Z
Z
=
Z
=
Z
=
Z
=

Z
p(w)
p(w0 )
pB (w0 )dwdw0 + (w0 ) ln
pB (w0 )dw0
pB (w)
pB (w0 )
Z
Z
p(w0 )
p(w0 )
0
(w, w0 ) ln
p
(w)dwdw
+
(w0 ) ln
pB (w0 )dw0
B
0
pB (w )
pB (w0 )
Z
Z
p(w0 )
p(w0 )
0
0
0
p
(w
)dwdw
+
(w
)
ln
pB (w0 )dw0
(w0 , w) ln
B
pB (w0 )
pB (w0 )
Z

p(w0 )
(w0 , w)dw + (w0 ) ln
pB (w0 )dw0
pB (w0 )
p(w0 )
ln
pB (w0 )dw0 .
pB (w0 )
(w0 , w) ln

, pp(w)
6= const, .. p 6= pB ,
B (w)
, p
M .
!!!!
p(w)
: pB (w)
.

,
.
!!!!

E(w) ( ) - w . 5, -
T , Ti , . Ti , , Ti = i
,

, , .
, - , .

, , , , (), .
[57].
, .
, , :
. , w.
. - .
. -

.

81

. ET (:
,
, , , ).
.
(:
,
, , ).
. - (: ,
, ) .
[57] w ,
() ,
, .
, -
.
. , .
- . ,
, , .
, ,
(113) .
(w) ,
- (. 1.1.7).
- , , , .
,
, w
,
( ) -
. ,
- .
,
, ,
() ,
( ). , ,
3.2.2,
, ,
( ) , (,
).
82

. wi1 = wi2 = . . . = wik = z;



k

ET (w) X ET (w)
=
,
z
wij
j1
T (w)
E
wij .
, , (wij wik )2 .


(113) T . , ,
(2). , .
(
). i- w(i)
(xi , yi ) F (w(i) , xi ), E(F (w(i) , xi ), yi )
w E(F (w, xi ), yi )|w=w(i) ,

w(i+1) = w(i) i w E(F (w, xi ), yi )|w=w(i) .

(125)


(114), .
- ([96]), :
6
(xi , yi ) X Y (x, y),
w w E(F (w, x), y),
(x, y) , ,
i > 0 i 0 i,
P

i ,
P 2

i ,
1 (125)
(2).
,
P i . ,

i , . ,
83

i2 (, i 0) , , ,
(125)
w(i+1) = w(i) i w E (w)|w=w(i)

Z
E (w) =

E(F (w, x), y)d(x, y)

(x,y)X Y

( (2)), .
, ,

i
P 2
i , , i , i1
12 < 1. , ,
, .

i .
(125) (114),
. ,
, . ,
,
, .
, .
:
, , ,
,
, ..
[19].

[74].
, , [14]
[58]. - .

3.3

RBF-

, RBF- , , , (basis functions) , - (.. radial)


- (.. , ).

84

, ( 2.3.1) (
2.3.2), .
RBF- .
.
v ()

!
X
u(v) = g sv ,
(u(e) we )2 ,
evin

g g(s, t) = et/2s ,
sv wv , RBF . ,
: v

!
X
u(v) =
we u(e) wv ,
evin

, , ,

!
!
X
u(v) =
we u(e) wv ,
evin

.
.
x1 , . . . , xd . , .. d- x.
v d- v
, ,
u(v) = e

kxv k2
2sv

(2sv ) 2 x v sv .
, , ,
, .
,
, . , , .
RBF v

1
u(v) = (2)d |Av | 2 e

A1 (xv ,xv )
v
2

(126)

Av
( ), |Av | , A1
v (, ) A1
.


v
d(d+1)
d v , 2 Av .
3.1, (109). ,
.
, .
85

.

, RBF-
k (126)
3.1, , ,
k , kxk2

u(v) = e 2 , .
, Av , , .
. RBF-
, , , . RBF-

.
.
.
RBF- ,
,
. .
RBF- , ,
, ,
RBF- .
RBF-
.
8 M , RBF-.
RBF-
. , s 2 ,
dim(M ) , d , M .
(
u(x1 , . . . , xd ) = e

2
(x1 1 )
2s


). , d
. ,
.
3.3.1

RBF-: (expectation
maximization)

RBF- , .
, , , ,

86

!!!!

. : , , , . , 2.1 2.2.2, .
, v v
sv .
EM- (Expectation-Maximization), [33], , , [8]. (
[14]); .
, , ..
p(x) =

k
X
j=1

Pj pj (x) =

k
X
j=1

Pj

1
(2sj )

d
2

kxj k2
2sj

(127)

k pj (x) Pj ,
Pk
j=1 Pj = 1. p(x) (127) RBF- k .
W = (P1 , 1 , s1 , . . . , Pk , k , sk ) (. 23), ..
X = (x1 , . . . , xN ) T = ((x1 , y1 ), . . . , (xN , yN )).
, ,
. (..
), .
(127) , , .
, , ,
, , . k
q, ,
. ,
E,
, 4! 24 = 384 , .
j sj , Pj . ,
, RBF , .
, (127), . x Rd
y j {1, . . . , k} ( ).
Rd {1, . . . , k}
p(x, j) = Pj pj (x)
(xi , ji ) ,

87

EM (
):

EM

p(x, j).
Z
P {j} =
p(x, j)dx = Pj
xRd

j- ,
p(x|j) =

p(x, j)
= pj (x)
P {j}


x Pk
j, p(x) = j=1 P {j}p(x|j) (127),
j- x

p({j}, x)
P {j}p(x|j)
Pj pj (x)
P {j|x} =
= Pk
= Pk
.
(128)
p(x)
j=1 P {j}p(x|j)
j=1 Pj pj (x)
pj (x) , Pj ( j-
), P {j|x} .

p(X|W) =

N X
k
Y

Pj pj (xi ) max

(129)

i=1 j=1


E(X, W) = ln p(X|W) =

N
X

ln p(xi ) =

i=1

N
X
i=1

ln

k
X

Pj pj (xi ) min .
W

j=1

, , :
k , .. k!, .
E(X, W).
,
.
Pk
(.. Pj > 0, j=1 Pj = 1, sj > 0) W(0)
(0) .
E(X, W) E(X, W

(0)

N
X

p(xi )
(130)
(0) (x )
p
i
i=1

N
k
X
X
P
p
(x
)
j
j
i

=
ln
(0) (x )
p
i
i=1
j=1

N
k
X
X
P
p
(x
)
j
j
i

=
ln
P (0) {j|xi } (0) (0)
P
p
(x
)
i
i=1
j=1
j
j

!
N
k
XX
Pj pj (xi )
(0)

P {j|xi } ln (0) (0)
Pj pj (xi )
i=1 j=1

ln

N X
k
X

P (0) {j|xi } ln(Pj pj (xi ))

i=1 j=1

N X
k
X
i=1 j=1

88

(0) (0)

P (0) {j|xi } ln(Pj pj (xi )) .

Pk
ln() ( ): cj j=1 cj = 1,

k
k
X
X
ln
cj xj
cj ln(xj ) .
j=1

j=1


Q(X, W

(0)

, W) =

N X
k
X

P (0) {j|xi } ln(Pj pj (xi )) .

(131)

i=1 j=1

(130) ,
E(X, W) Q(X, W(0) , W) Q(X, W(0) , W(0) ) + E(X, W(0) ) .
E(X, W) W(0) .
E(X, W) W,
Q(X, W(0) , W) < Q(X, W(0) , W(0) ). Q(X, W(0) , )

.
,

k
X
L(W, ) = Q(X, W(0) , W) +
Pj 1
j=1

0 =

L X
=
Pj 1

j=1

(132)

0 =

N
L
1 X (0)
=
P {j|xi }
Pj
Pj i=1

(133)

0 =
0 =

X
j xi
L
=
P (0) {j|xi }
j
sj
i=1
!

N
X
L
kxi j k2
d
(0)
=
P {j|xi }

.
sj
2sj
2s2j
i=1

(134)
(135)

, .. k + 1 , pj () ,
=

k
X
j=1

Pj =

k X
N
X

P (0) {j|xi } =

j=1 i=1

N X
k
X

P (0) {j|xi } =

i=1 j=1

N
X

1=N

i=1

.
N

1 X (0)
P {j|xi }
N i=1
PN
P (0) {j|xi }xi
j = Pi=1
N
(0) {j|x }
i
i=1 P
PN
(0)
1 i=1 P {j|xi }kxi j k2
sj =
,
PN
(0) {j|x }
d
i
i=1 P

Pj =

j .
89

(136)
(137)
(138)

. , W
Q(X, W(0) , ) ( , ).
. , W = W(0) , W(0)
E(X, ).
RBF- . X
E(X, ), , Q(X, W(0) , ),
. W
: Pj = k1 , j = xij k
PN
( ), sj = N1d i=1 kxi j k2 .
,
pj .
( , Aj ) ,
.

!!!!
!!!!

9

(xj ,xj )

1 A1
j
2
pj (x) = (2)d |Aj | 2 e
.

(139)

(136) (137) ,
(138) d(d+1)

2
PN
Alm
j

i=1

(0)

m
P (0) {j|xi }(xli lj )(xm
i j )
,
PN
(0) {j|x }
i
i=1 P

1 l m d,

.
, RBF-
.
, (127),
,
,
,
- (
3.2.4). , RBF-
,
k.
RBF- , ,
. ,

,
, -
.
. ,
, q, RBF-:

90

!!!!

, .. . - ,
. , (l-) ,
p(x|y = l). RBF- P {y|x},
p(x|y), .

, . : ,
, [116, 117, 114]
[115],
. .

4.1

2.3.2.
.
( , , ) X ( Rd ) H. K : X X R,
H :
K(x1 , x2 ) = ((x1 ), (x2 ))

(140)

. , X , X = Rd ,

Z

f 7 K(f ) = K(, x)f (x)d(x) ,


(141)
X

- , K .
H K,
, .
,
, -
,
, , . ,
,
.. ( !) .
.
, -
, ,
91

. , , (!)
( ,
!) . ,
ridge regression (72), logistic regression (86) soft margin classifier (100).
.
7 : X H ( ) (, ) k k. f (x) = (w, (x)) + b, w H b R,
T = ((x1 , y1 ), . . . , (xN , yN ))
kwk2 +

N
X

E(f (xi ), yi ) min

(142)

w,b

i=1

E. (w, b)
w (x1 ), . . . , (xN ).
. b .
. -
, ,
/ ,
kwk,
(kwk) + E(f (x1 ), y1 , . . . , f (xN ), yN ) min
w,b

;
. (w, b) (142), wX
w (x1 ), . . . , (xN ).
wX 6= w, kwX k < kwk, i (w, (xi ))+b = (wX , (xi ))+b.
(wX , b) (142) .
.
, (N + 1)PN
. w = i=1 i (xi ) (142),
PN
kwk2 + i=1 E(f (xi ), yi )

N
N
N
N
X
X
X
X
=
i (xi ),
j (xj ) +
E
j (xj ), (xi ) + b, yi
i=1

N
X

j=1

i=1

i j ((xi ), (xj )) +

i,j=1

N
X
i,j=1

N
X

i=1

i j K(xi , xj ) +

N
X
i=1

N
X
j=1

N
X

j=1

j ((xi ), (xj )) + b, yi

j K(xi , xj ) + b, yi min

j=1

1 ,...,N ,b

(143)

H ,
. E, (
) .
, , 2.3.2.
92

4.1.1

(140) () .
:

!!!!

10
(140) (K(x1 , x2 ) = K(x2 , x1 ) x1 x2 )
( Kij = K(xi , xj ) p
x1 , . . . , xN ). , K(x1 , x1 )
0 |K(x1 , x2 )| K(x1 , x1 )K(x2 , x2 ).

. ,
.
, . , ,
.

!!!!

11 X K : X X R .

12 , K : I d I d R d- I d
.
,
:
8 (Mercer, 1909 [83]). K L2 (Rn Rn ) (141) L2 (Rn )
, .. f L2 (Rn )
Z
K(x, z)f (x)f (z)dxdz 0 ,

x,zRn

K L2 (Rn Rn )
K(x, z) =

i i (x)i (z),

i 0,

i L2 (Rn )

i=1

{i } l2 .
, L2 (Rn )
: .
10.
9 X
K : X X R H : X H,
(140).
93

. H
.
. X R H0 , (x) : z 7 K(x, z).
(, ), ((x), (z)) 7 ((x))(z) = K(x, z) = ((z))(x).
, (h, (z)) = h(z) = ((z), h)
h H0 . (x) ,
H0 (x) ,

. K ,
. , , .. (h, h) > 0 h 6= 0. ,
-
(h, g)2 (h, h)(g, g). (h, h) = 0, (h, g) = 0
g H0 , , (h, (x)) = h(x) = 0 x X , h = 0.
(, ) , H0
. H,
28 .
9 , , , H
X R. ,
x X h H0
p
p
|h(x)| = |(h, (x))| (h, h)((x), (x)) = khk K(x, x)
( -), H0 x X
.
4.1.2

,
13
Rd , : Rd Rd ,
K(x, z) = (x)(z), H = R,
, ( )
28
XX [4] (reproducing kernels).
(, ) F X
K X X , f F x X
f (x) = (K(x, ), f ()). , ,
reproducing kernel . , H
RKHS (Reproducing Kernel Hilbert Space). - X
, X .

94

!!!!

(
{0, 1}).

.
14 :
;
;
(.. K((x), (z))) ;
: X 0 2X X 0
X (, ) K
X ,
X
X
K 0 (x0 , z 0 ) =
K(x, z)
x(x0 ) z(z 0 )

;
, , .
. ( ).
. ( )
,
, P
x0 7 x(x0 ) (x), .

1 , :
;
;
;
.

Rd :
15
K(x, z) = cos(x z) x, z R ( !);
K(x, z) = exz x, z Rd ( 1);
2

K(x, z) = e(xz) x, z R ( 13 14
);
2

K(x, z) = ekxzk x, z Rd ( 14
);
2

kxzk
K(x, z) = e( ) x, z Rd , > 0 ( 14 );

K(x, z) = eakxzk x, z Rd , a > 0 ( ,

Z
kxzk 2
a 2
a

e( ) e( 2 ) d = eakxzk
0
=

a
2

kxzk

95

1).

!!!!


,

.
d

K(x, z) = (2) 2 |A| 2 e 2 A

(xz,xz)

(144)

kxzk
d
2
2

K(x, z) = (2)

(145)

, .
,
, .. .
16
K(x, z) = (xz) =

1
1 + exz

. x1 = ln 2 x2 = 2 ln 2. K(xi , xj )

2
3

4
5

4
5

16
17

, , 10 .
4.1.3

, 4.1.2, ,
Rd
. ,
K(x, z) = ((x, z) + c)m ,

c>0

( ) , x m
x (: !),
m + 1 .
:
.

!!!!

17
2

K(x, z) = ekxzk
: Rd L2 (Rd ), x
d

(x) : t 7 4 e

kt2xk2
2

x1 , . . . , xN (x1 ), . . . , (xN ) .
.

( 2, . 53).
96

!!!!

4.1.4

10, 13 14
2 K
K(x, z) +
p
K(x, x) + K(z, z) +

K1 (x, z) = lim p
&0

K(x,z)

K(x,x)K(z,z)
=
0

K(x, x)K(z, z) 6= 0
K(x, x)K(z, z) = 0 K(x, x) + K(z, z) 6= 0
K(x, x) = K(z, z) = 0

, x z |K1 (x, z)| 1


K1 (x, x) = 1.
K1 . 1 H1
K1 . x z H1
k1 (x) ( z)k2 = K1 (x, x) 2K1 (x, z) + K1 (z, z) = 2 2K1 (x, z)
, K1 (x, z).
:
, ( ,
) .
. K
K(x, x) , . , . ,

(1, 11) (4, 1)

( 15 ? 15?),
.
4.1.5

, Rd , ,
, , , ,
15, . RBF- , K(x, z) = (x z)
K ( (141))
:
Z
f 7 f = ( z)f (z)dz ,
(146)
X

, ( , ) ,
, ,
L2 .
. ,
, (0) 0 x |(x)| (0).
, ,
. , , , ( , ), ,
, .
97

!!!!
!!!!

18 , ,
.
,
.
. ,
, : f
( f, f ) = ( f,
f ) = ( f, f ) 0 ,
g(x) = g(x) .
19.
, K(x, z) = (x z) = max(1 |x z|, 0) R1 ,

[ 21 , 12 ] [ 12 , 12 ].
3 .
. 18 :
( ) ( ) = ( ) ( ) .

19 ,
, 29 F[] .
. , F[]() 0 ,
F[]() , (. 18).
F[] 0 - , (,
), (F[], ) < 0.
, ,
( F1 [], F1 []) < 0, .. , (x z),
. , ,
L2 , , F1 [] ,
.
, 18
19.
29


. ,
Z
F[f ]() =
f (x)eix dx ,

F1 [](x) =

1
(2)n

()eix d ,

.. .

98

Rn C, -

[a,a] [a, a]
Z a
eia eia
2 sin a
F [[a,a] ]() =
eix dx =
=
i

a
,
[a,a]
a(xz)
, . sin xz
a > 0.
R1 l(x) = a2 ea|x| , a > 0
Z 0

Z
a
(ai)x
(ai)x
F[l]() =
e
dx +
e
dx
2

a
1
1
a2
=

= 2
.
2 a i
a i
a + 2
1
l, F[l] . ea|xz| , a2 +(xz)
2 ( ) .


, . Rd d > 1
1
a2 +kxzk2 d. ( ).
!30 .
eakxzk1 = ea

Pd
j=1

|xj z j |

14. eakxzk
, , 15.

14. , 1
K(x, z) = a2 +kxzk
2
KMOD (Kernel with MOderate Decreasing) [6]
b2

KM ODa2 ,b2 (x, z) = e a2 +kxzk2 1 ,



1
a2 +kxzk
2.
1
d > 1 fad (x) = a2 +kxk
2 , - . . fad = (1 , 0, . . . , 0)
. x
(d 1)- (x2 , . . . , xd ). , ,
!
1
1
Z
Z
Z
ei1 x
ei1 x
1
dx
=
dx
F[fad ]() =
d
x
a2 + kxk2
(a2 + k
xk2 ) + kx1 k2
30


1
](1 ) . .
F[f
a2 +k
xk2


: 6, . 104.

99

!!!!

4.1.6

, , , , .
,
( ), , .. . - -
? ,
, ? -
?
(
), ,
. ,
( , -)
, . , ,
, : , .
CPD-:
K : X X R -
(CPD-kernel , conditionally positive definite kernel ),
x1 , . . . , xn X c Rn

n
X
ci = 0
(147)
i=1

n
X

K(xi , xj )ci cj 0 .

(148)

i,j=1

-
CPD-.
, (.. ) CPD-. CPD-, ,
K(x, z) = x + z K(x, z) = (x z)2 R R
(!). ( 22,25).
( 13,14 1)
CPD- , (
() 31 ), (148) .
20 - (CPD-) :
( );
CPD-;
CPD- ;
31

CPD- , CPD-, . 25.

100

!!!!

( , ) CPD-;
CPD-;
CPD- .

!!!!

21 K(x, z) = f (x) + f (z) ( ,


) CPD-, (148)
(147)
n
X

K(xi , xj )ci cj = 0 .

(149)

i,j=1

4 CPD- K
1
K0 (x, z) = K(x, z) (K(x, x) + K(z, z))
2

(150)

CPD-. K0 x z K0 (x, x) = 0 K0 (x, z)


0.

22 x0 X . K : X X R CPD- ,
K+ (x, z) = K(x, z) K(x, x0 ) K(x0 , z) + K(x0 , x0 )

(151)

.
. K+ .
Pn
x1 , . . . , xn X , c Rn c0 = i=1 ci .
n
X
i,j=1

n
X

K+ (xi , xj )ci cj =

K(xi , xj )ci cj ,

i,j=0

K CPD-, K+ . , K+
, CPD-,
k0 (x, z) = K(x, x0 ) + K(x0 , z) K(x0 , x0 )
21 CPD- K,
K(x, z) = K+ (x, z) + k0 (x, z) CPD-.
. ,
K(x, z) = (xz) =

1
1 + exz

CPD- (. 16).

101

!!!!

CPD-
, CPD- (!)32
X X . K+ K, ( , , K+ (X ) K(X )). (!)
CPD- K(x, z) = f (x) + f (z) ( 21) K= .
(!) ( 4) K0 , (!) K 7 K0 (150) 0 . (!) K 7 K+
(151) x0 . , x K+ (x, x0 ) = 0. x0
K+
(!) , .
x
K, K+ , K+
, K0 , x
K= 0 ,
, .
23 x, z X
K0 K= = {0}, K = K0 + K= ;
K+ K0 = {0},
x
x
K+
K+ K, K+
K= = {0};
x ;
0 : K K0 K= , 0 |K+

x
K= , x |K0 , x : K K+
x ;
0 |K+
z
x
z x
x
z
z K
x |K+
: K+
K+
.
+ K+ , |K+

CPD- x1 , . . . , xn X CPD K (148)


(n 1)- (147) n- .
5 0 x .
.

. 21 23.

CPD-
K CPD-, K0 = 0 (K), K+ = x0 (K0 )
x0 X : X H K+ .
H , ,
CPD- K K0 .
23 K0 = 0 (K+ ),
1
K0 (x, z)=K+ (x, z) (K+ (x, x) + K+ (z, z))
2

1
1
=
k(x)k2 2((x), (z)) + k(z)k2 = k(x) (z)k2 ,
2
2
.. CPD- K0
(!) 12 .
. CPD- K .
32 , , , .

102

!!!!
!!!!
!!!!
!!!!
!!!!
!!!!

H x0 .
0
0
0
x00 X , K+
= x0 (K0 ) = x0 (K+ ) ( 23)
.
0
K+
(x, z)

= K+ (x, z) K+ (x, x00 ) K+ (x00 , z) + K+ (x00 , x00 )


= ((x), (z)) ((x), (x00 )) ((x00 ), (z)) + ((x00 ), (x00 ))
= ((x) (x00 ), (z) (x00 )) ,

0
,
.. K+
K+ , H, (x00 )
0
. K+ K+
.
CPD- [103].
CPD-.

24 X , c > 0
K(x, z) = ckx zk2 CPD- X .

CPD-:
25 K : X X R CPD-
, t > 0
Kt (x, z) = etK(x,z)

(152)

.
. Kt CPD-, 1
, Ktt1 limt&0 Ktt1 = K CPD-.
: CPD- K x0 X
K+ ( (151)). t > 0 tK+ etK+
,
Kt (x, z) =
=

etK(x,z) = et(K+ (x,z)+K(x,x0 )+K(x0 ,z)K(x0 ,x0 ))

etK+ (x,z) etK(x,x0 ) etK(z,x0 ) etK(x0 ,x0 )

13,14 .
20 25 3
CPD- .
3 x 0
Z

(etx 1)t(+1) dt 0 < < 1 ;


(1 ) 0
Z t(1+x)
e
et
ln(1 + x) =
dt .
t
0
x

. x = 0 . x > 0 0 < < 1 .


x
Z
etx t dt = x1 (1 )
0
Z
1
,
et(1+x) dt =
1
+
x
0
103

. ( x > 1),
-.
26 CPD- K, ( , CPD- 4) CPD- (K) 0 < < 1
ln(1 K).

6 0 < 2 c > 0
K(x, z) = ckx zk K(x, z) = ln(1 + ckx zk ) CPD-,

1
K(x, z) = eckxzk K(x, z) = 1+ckxzk
.
. . 24, 25, 26.
2
1
, eckxzk , eckxzk 1+kxzk
2
4.1.2 4.1.5, . 6 (, ) . [25] , ,
, .
CPD- [12] negative definite kernels,
CPD- .

4.2

(SVM, SVC, SVR)

SVM (Support Vector Machine),


.. [117] [18, 26], , , ,
(143) - .
( 2.2.4)
,
( ) ( ) .

. ,
N
, .
SVM SVC (Support Vector Classification) SVR
(Support Vector Regression), .

. SVM ,
, , , , .
, , [31, 22, 106].
4.2.1

X , K : X X R,

104

Y = {1, +1}. H : X H. (99)


T (X Y)N
f (x) = (w, (x)) + b ,

(153)

, E() = max(0, ):
N

X
1
kwk2 + C
max(0, 1 yi f (xi )) min
w,b
2
i=1

(154)

, ,
1
kwk2 + C
2

N
X

i min

w,b,

i=1

(155)

SVC:
. . .

yi ((w, (xi )) + b) 1 i 1 i N
i 0 1 i N .
, (yi = 1) (f (xi ) 1), (yi = 1)
(f (xi ) 1).
(155) , w , , . 7,

. 7 SVC , w
.
(155)

L(w, b, , ) =

1
kwk2 + C
2

N
X

i=1

N
X

i (yi ((w, (xi )) + b) 1 + i ) ,

(156)

i=1

w, b ,
i 0, i 0 1 i N .
(157)
, ,
:
L(w, b, , )
0=
wi

L(w, b, , )
b

0=

L(w, b, , )
i
L(w, b, , )
0
i
L(w, b, , )
0=
i
L(w, b, , )
0
i
0=

wi

N
X

j yj ((xj ))i

j=1

N
X

j yj

j=1

C i i 6= 0

C i i = 0

1 i yi ((w, (xi )) + b) i 6= 0

1 i yi ((w, (xi )) + b) i = 0 ,
105

. . . SVC:

...

(157) ,
w=

N
X

j yj (xj )

(158)

j=1
N
X

j yj = 0

(159)

j=1

(C i )i = 0
0 i C .

. . . SVC:



. . .

(160)
(161)

(161) (158)
w , 7. (158160) (156),
( ), w, b ,

N
N
X
X
1
L ()= kwk2 + C
i
i (yi ((w, (xi )) + b) 1 + i )
2
i=1
i=1

. . . SVC:
. . .

N
N
N
N
X
X
X
X
1
= kwk2
i yi (w, (xi )) b
i yi +
i +
(C i )i
2
i=1
i=1
i=1
i=1

N
N
N
N
N
X
X
X
X
1 X
=
j yj (xj ),
j yj (xj )
i yi
j yj (xj ), (xi ) +
i
2 j=1
j=1
i=1
j=1
i=1

N
N
X
1 X
i j yi yj ((xi ), (xj )) +
i
2 i,j=1
i=1

N
N
X
1 X
i j yi yj K(xi , xj ) +
i .
2 i,j=1
i=1

L(w, b, , ) (157)
L () (159) (161).
K ,
L . K(xi , xj ) , .
. , L
, w, b , (w, b, , )
L ( (159,161) (157), ). w (158), b
. - i 0 < i < C,
i = 0 yi ((w, (xi )) + b) = 1, b :
b = yi (w, (xi )) = yi

N
X

j yj K(xj , xi )) .

(162)

j=1

, (xi )
. w i b -
i = max(0, 1 yi ((w, (xi )) + b))
106

. . . SVC: . . .

(155) , N ( (162))
b. ,
. (153)
! b
|b|. : kw2 k (155)
, b2 , .
SVC (153).
, , K, N -

N -
. . . SVC: N
N
X

1 X
i j yi yj K(xi , xj )
i min
(163)

2 i,j=1
i=1
N
X

j yj = 0

(164)

j=1

0 i C 1 i N ,

f (x) =

N
X

j yj K(xj , x) + b ,

(165)

j=1

b . f
, , j ,
, . (164) (163) - (CPD-) K, SVC CPD-,
. 5
CPD- K ,
K K+ 22.
4.2.2

SVM, - . 2.2.4 (. 58)


.
, ,
, , . ,
(156), .. (157161), , (165)
xi :
,
, .. 1 (yi f (xi ) > 1). i = 0, ..

. ,
.

107

( ),
, , .. 1
(yi f (xi ) = 1). {z : |(w, z) + b| 1}.
0 i C.
i = 0 . .
,
, .. 1 ,
, .. 0 ( yi f (xi ) < 1).
i = C
.
:
. , , , .
4.2.3

SVC


, N . , SMO (Sequential Minimal Optimization), . [91]. i , , ,
, (163). (159) , , . (163)
, (159)
(161), .. ,
. ,
, [91], [64] 33 .

i b,
4.2.1, C K
- ( 1.1.6).
SVM (LOO).
( !)
, SMO

, . , , ( ) ,
.
33 , - (153) , , b, (159)
(163)
. K , -
.

108

, , , N
.
27 f (165), (155) T N , fm ,
C K
T \ {(xm , ym )} N 1 . ym fm (xm ) 1
( (xm , ym ) fm ), ym f (xm ) 1 ( (xm , ym )
f ).

SVC

. , ym f (xm ) > 1, .. f (xm , ym ) ,


. fm f
(153, 158) ( ) fm f (1 , . . . , N , b), fm , f N 1- , (159, 161)
m = 0. fm f

Lm ((, b))

X
1
max(0, 1 yi f (xi ))
kwk2 + C
2

N
X
1 X
i j yi yj K(xi , xj ) + C
max(0, 1 yi f (xi ))
2 i,j=1

i6=m

i6=m

L((, b)) =

N
N
X
1 X
i j yi yj K(xi , xj ) + C
max(0, 1 yi f (xi )) ,
2 i,j=1
i=1

( (158) (154)). [f, fm ] = {ft = tfm + (1 t)f}, 0 t 1 1 ym f t (xm ) .


= min{t [0, 1]|ym f t (xm ) = 1}. > 0, [0, ]
L(ft ) Lm (ft ) , L(ft ) ,
t = 0, Lm (ft ) ,
t = 1. Lm (ft ) [0, ],
[0, 1]. , Lm kwk,
|b| , f fm
. ym fm (xm ) 1 < ym f (xm ). .
27
10 , N , n
, ,
n
.
N

(-SVC )
10 , C, , . , ,
109

SVC

C,
.
, , 7 (. ),
, . ,
.
28 [0, 1]
(w0 , b0 , 0 , 0 ) H R RN R
N
X
1
2
kwk +
(i ) min
w,b,,
2
i=1

(166)

-SVC:
...

yi ((w, (xi )) + b) i i = 1, . . . , N
i 0 i = 1, . . . , N
0.
0

0 > 0, ( w0 , b 0 , 0 ) (155) C = 10 ,
,
. 0 > 0 , C 0
(155) .
. (166) (155).
L(w, b, , , ) =

!!!!

N
N
X
X
1
kwk2 +
(i )
i (yi ((w, (xi )) + b) + i ) (167)
2
i=1
i=1

(. (156)), ,
,

N
1 X
i j yi yj K(xi , xj ) min

2 i,j=1
N
X

(168)

. . . -SVC:

j yj = 0

j=1
N
X

i N

i=1

0 i 1 1 i N
(. (163)), (158)
w, (166) b, . .
C,
-, , (166) .

110

!!!!


C , C (
K). (. [48]).
11 (C) b(C) (165), -
C.
. ,
K(xi , xj ) , C,
, . , (165)
.
. - N
, .
. N - 0 i C 3N
. S
{1, . . . , N } S=0 S=C S6= i,
i = 0, i = C 0 < i < C, .
C > 0 (163) -
S, ,
N
N
X
1 X
i j yi yj K(xi , xj )
i min

2 i,j=1
i=1
N
X

j yj = 0

(169)

(170)

j=1

i = 0 i S=0
i = C i S=C
0 < i < C i S6= ,

(171)
(172)
(173)

S. ( (169) , (170,171,172), (173))


C, () - (N + 1)- .
C IS ,
. {C > 0} IS , ..
C1 < . . . < Cm .
I1 , . . . , Im+1 (, Im+1 ).
(163) C , . C
S(C) , ,
. C
Ii . , C 0 Ii
(163) S(C 0 ), C 00 Ii S(C 00 ), - C
S(C 0 ), S(C 00 ), S,
, S(C 0 )
111

S(C 00 ). S - C Ii ,
Ii C, ,
C 0 C 00 . S(C 0 ) S(C 00 ), ..
. o S(C 0 ) S(C 00 ). Ii
S(Ii ).
, Ci Ii S(Ci ) S(Ii ), ( , );
. , C = Ci S(Ii )
(i ) (163).
- 0 S(Ii ), (163)
t0 + (1 t)i , 0 i -,
S(Ii ), -, (163).
C Ii S(Ii )
+ t(0 i ) ,
S(Ii ) .
- (-) (C). (0) = 0,
(Ci ) = i , Ii
i m , S(Ii ),
Im+1 .
Si b
- ( (162)), C.
0, .. , - (.
(162)).
.
, (C), .. K(xi , xj ).
, . , , ,
.
11 (C) b(C), N - . [48] ,
K(xi , xj ) ( C = 0). ,
- ,

C .
4.2.4

2.1.3 -
. Y = R, X , K : X X R,
H : X H.
T (X Y)N
f (x) = (w, (x)) + b

(174)

( , (153)),
.
1
2
2 kwk - E(r, y) = max(0, |y r| )
112

!!!!

:
N

X
X
1
1
kwk2 +C
E(f (xi ), yi ) = kwk2 +C
max(0, |(w, (xi ))+byi |) min ,
w,b
2
2
i=1
i=1
(175)
C > 0 > 0 . i+ i , max(0, yi (w, (xi )) b )
max(0, (w, (xi )) + b yi ), , (175)
, (155):
N

X
1
kwk2 + C
(i+ + i ) min
w,b,
2
i=1

(176)

SVR:
. . .

i yi ((w, (xi )) + b) + i+ 1 i N
i+ 0, i 0 1 i N ,
, i+ i
0. 7 ,
w (xi ), ,
,
,
.
4.2.1 .
(176)

N
N
X
X
1
L(w, b, , )= kwk2 + C
(i+ + i )
i+ (i+ (yi (w, (xi )) b ))
2
i=1
i=1

N
X

i (i ((w, (xi )) + b yi )) ,

(177)

i=1

w, b ,
i+ 0, i 0, i+ 0, i 0 1 i N .
(178)
, ,
:
0=

0=

L(w, b, , )
wi
L(w, b, , )
b

L(w, b, , )
i
L(w, b, , )
0
i
L(w, b, , )
0=
i+
0=

wi

N
X

(j+ j )((xj ))i

j=1

N
X

(j+ j )

j=1

C i i 6= 0

C i i = 0

(w, (xi )) + b yi i+ i+ 6= 0

113

. . . SVR:

...

L(w, b, , )
i+
L(w, b, , )
0=
i
L(w, b, , )
0
i
0

(w, (xi )) + b yi i+ i+ = 0

yi (w, (xi )) b i i 6= 0

yi (w, (xi )) b i i = 0 ,

+ . (178) ,
w=

N
X

(j+ j )(xj )

(179)

j=1
N
X
(j+ j ) = 0

(180)

j=1

(C i )i = 0
i

(181)

C .

(182)

(179181) (177),
( ), w, b ,

N
N
X
X
1
L ()= kwk2 + C
(i+ + i )
i+ (i+ (yi (w, (xi )) b ))
2
i=1
i=1

N
X

. . . SVR:



. . .

. . . SVR:
. . .

i (i ((w, (xi )) + b yi ))

i=1
N
N
N
X
X
X
1
(i+ i )(w, (xi )) b
(i+ i ) +
(i+ i )yi
= kwk2
2
i=1
i=1
i=1

N
X
i=1

N
X

(i+ + i ) +

N
X

((C i+ )i+ + (C i )i )

i=1

N
X

1
(j+ j )(xj )
= (j+ j )(xj ),
2 j=1
j=1

N
N
X
X

(i+ i ) (j+ j )(xj ), (xi )


i=1

N
X
i=1

j=1

(i+ i )yi

N
X

(i+ + i )

i=1

N
N
N
X
X
1 X +

( i )(j j )((xi ), (xj )) +


(i i )yi
(i+ + i )
=
2 i,j=1 i
i=1
i=1
N
N
N
X
X
1 X +

=
( i )(j j )K(xi , xj ) +
(i i )yi
(i+ + i ) .
2 i,j=1 i
i=1
i=1

L(w, b, , ) (178)
L () (180) (182).
(180), - K
114

(182) , L . K(xi , xj )
, , , (
, . (186)).
, , , w
L (179), b .
- i 0 < i+ < C, i+ = 0
yi = (w, (xi )) + b , b :
b = yi (w, (xi )) + = yi

N
X

(j+ j )K(xj , xi )) + .

(183)

. . . SVR: . . .

j=1

, ((xi ), yi ) H R
( R) |y f (x)| . 0 < i < C. w
i b - (176) - , 2N b. , .
, |b|.
, , K, 2N - N
-
N
N
N
X
X
1 X + +
(i i )(j j )K(xi , xj ) (i+ i )yi +
(i+ +i ) min (184)

2 i,j=1
i=1
i=1

N
X
(j+ j ) = 0
j=1

0 i+ C,

0 i C 1 i N ,


f (x) =

N
X

(j+ j )K(xj , x) + b ,

(185)

j=1

b .
(184) i+ i
0 (, min(i+ , i ),
(184)). i+ + i = |i+ i | , i+ i i , , (184)
N
N
N
X
X
1 X
i j K(xi , xj )
i yi +
|i | min

2 i,j=1
i=1
i=1
N
X

j = 0

j=1

C i C 1 i N ,
115

(186)

. . . SVR:

. -
(186),
K(xi , xj ). ,
- N 2C.
4.2.5


(184),
, 4.2.3 . .

, (176), H R.
, 4.2.2, |(w, (x)) + b| 1 H
|y ((w, (x)) + b)| HR, . ((x), y), , , (x, y),
, , ( ),
( ).
. , ,
(178) (182)
, .
yi f (xi ) < , i+ = 0, i = C.
() yi f (xi ) = , 0 i C, i+ = 0.
< yi f (xi ) < , i+ = 0, i = 0.
() yi f (xi ) = , 0 i+ C, i = 0.
yi f (xi ) > , i+ = C, i = 0.
(-SVR)

, ()
. 27,
,
(176)
N

X
1
kwk2 + C
(i+ + i + ) min
w,b,,
2
i=1

(187)

i yi ((w, (xi )) + b) + i+ 1 i N
i+ 0,
0

i 0 1 i N

[0, 1].
. ,
C ( 28), C .

116

29 (w0 , b0 , 0 , 0 ) HRRN R (187), (w0 , b0 , 0 )


(176) = 0 , ,
.
(187) (166) .

!!!!


( 4.2.1) ( 4.2.4) ,
, .

.
T yi = 1
, , . 1
() [1 , 1],
< 1 ,
, - (yi f (xi ) > 1 + ) .
, , . > 0 y + = y2 y = y + 2.
T = ((x1 , y1 ), . . . , (xN , yN )) +

T = (((x1 , y1+ ), +1), . . . , ((xN , yN


), +1), ((x1 , y1 ), 1), . . . , ((xN , yN
), 1)),
: X H K
I1 : X Y H R K 0 ((x, y), (x0 , y 0 )) = K(x, x0 ) + yy 0 .
T
f ((x, y)) = (w, (x)) uy + b
X Y,
N

X
1
(kwk2 + u2 ) + C
(i+ + i ) min
w,u,b,
2
i=1

(188)

((w, (xi )) uy + + b) 1 i+ 1 i N
((w, (xi )) uy + b) 1 i 1 i N
i+ 0, i 0 1 i N ,
(155). , (174) T , (176), . u = 1 |(w, (x)) + b y| y + y
|(w, (x)) uy + b| .
, ,
u = 1, .
, (!), .
30 (w0 , u0 , b0 , 0 ) (188) 0
0
0 0
T , ( uw0 , u0b , u0 ) (176) T C = 0 C 0 = 0 (2 u10 ).
.

117

!!!!

4.2.6

q-
[121]. ( 2.2.1). ,
.
(153)

f 1 (x) = (w1 , (x)) + b1

f 1 (x) = (w1 , (x)) + b1

f 1 f 1 , .
f 1 f 1 ,
, f 1 = f 1 . (155)
:
N

X
1
(kw1 k2 + kw1 k2 ) + C
(i1 + i1 ) min
w,b,
2
i=1

(189)

((wyi , (xi )) + byi ) ((wj , (xi )) + bj ) 2 ij 1 i N, j {1, 1}, j 6= yi


ij 0
w1 + w1 = 0,

1 i N, j {1, 1}
b1 + b1 = 0 .

(155) w = w1 = w1 , b = b1 = b1 , i = 21 iyi
iyi = 0.
, . yi
1, , xi .

SVC:
q X
q
N
X
1X j 2

j
i min
(190)
kw k + C

w,b,
2 j=1
j=1 i=1

yi
yi
((w , (xi )) + b ) ((wj , (xi )) + bj ) 2 ij 1 j q, 1 i N, j 6= yi
ij 0
q
X
wj = 0,

q
X

j=1

j=1

1 j q, 1 i N
bj = 0.

(190), (155) , (163)


C . ,
bj , 1, q 1. , SMO
q . ,

, , , Nq , q > 2
.
4.2.7


118

.
, . ,
q ,
.
. k-
(k = dlog2 qe) . k ,
k .
k
, .
,
,
,
(..
). ,
.

. ,

.
. f j , j- ,
. x
q f 1 (x), . . . , f q (x) . , ,
.
, f j (x) = (w, x) + b
ij (155) SVC,
(190)
q

1X k
fj (x) = f j (x)
f (x)
q
k=1

ij = ij + iyi 1 j q, 1 i N, j 6= yi
iyi = 0 1 i N .
fj , f j , (190) ij ,
, (155) f j , .
. q , , q(q1)
2
, . , , ,
,
- ,
. x q(q1)
2
f ij (x) = f ji (x) (),
119

i- ()
j- (), P
q f i (, x) = j6=i (f ij (x)), , , ,
( ) - (), . ,
, .
. q(q1)
2
.
x q . x
. , . q 1 , ..
q 1 , ,
, x.
. q , 34 . . .. , q 1
, q . x
, , , ..
q1 ( dlog2 qe
) .
(),
( , SVC ), ,
, ,
, . 1.
, ,
-, ,
, . ,
, .

.
SVM [59].

, A
A, , ( - ?
34 , , . ,
.
.
.

120

1
dlog2 qe
q
q(q1)
2
q(q1)
2

N 2+
dlog2 qeN 2+
qN 2+
1+
(q1) 2+
2 q1+
N

N
dlog2 qeN
qN
(q 1)N

(q1) 2+
2 q1+
N
< 2N 2+
1
q1+
N 2+

2(q1)
N
q
< 2N
N

1+

q1
q

1: , , q N .
, , ( ) ,
N
N 2+ , 0 < < 1 , , N
(.. ). , .
,
, .

). ,
.
.
: Rd H = L2 (Rd ) ( 17) - ,
0 H (.. , )
. H
(), 0
, , .. .
,
( ) ,
, .. ,
. 17 ,

L2 (Rd ), .
( ,

, , ),
.

X
1
kwk2 + C
i min
w,
2
i=1
(w, (xi )) 1 i 1 i N
121

(191)

SVC:

i 0 1 i N ,
(155),
.
28:
N
X
1
kwk2 +
(i ) min
w,b,,
2
i=1

(192)

-SVC:

(w, (xi )) i i = 1, . . . , N
i 0 i = 1, . . . , N ,
.
[102]
.
.
(191) (192), ,
K(x, z) = ((x), (z)).

(. 119) ,
, :
. f j . x q
f 1 (x), . . . , f q (x) 35 . ,
, .
, .


, 2.3.3: . , ,
[42]. , .
Y = {1, 1},
f - f (x) = sign(g(x))
f (x) = 0 , . 20 53.

5.1

, - (,
) , ,
.. . :
35

. .

122

!!!!

- ,
(,
);
- ,

(, );
,
,
(, );
,
(,
(ridge) (logistic) );
.
,
, , ,
. , ,
, . , .
, ,
, ..
.
, , ,
.
5.1.1

.
31 2n+1 ,
< 12 . , ,
n n .
. 0-1- e : X Y {0, 1}

, 0 1.
< 12 , (1 ) (!).
2n+1 .
, 2n + 1, (1)
2n+1 ,
.
, .. , , , 12 ,
123

!!!!

(1)
36 (2n+1)(
1
2
2 )
n.
n 31 . n

. , , -,
,
(13) , -,
, ( )
, :
, , . ,
.
, ,
, ( ) ( ) .
(random forests) [21].
,
,
( ) /
. :

. , .

5.1.2

1990- [39, 101] ,


.
boosting (, , ,
) , ..
.
[101]
,
, 31 . .
1990- [38, 37] AdaBoost (adaptive boosting),
, .
AdaBoost , , ,
, ,
.
36 , m
D :
P (| m| > ) D2 .

124

AdaBoost : (,
) 37 . , ,
(f ) +

N
X

E(f (xi ), yi ) min

(193)

f F

i=1

(. (4) (8)), E(f (xi ), yi ) i-


i :
(f ) +

N
X

i E(f (xi ), yi ) min .

(194)

f F

i=1

i . ,

. ,
Pn
i=1 i = 1 .
(194) ,
.
3 AdaBoost ,
, 5.2.3 .

3: AdaBoost
1. : T = ((x1 , y1 ), . . . , (xN , yN )) N ,
M.
2. i =

1
N,i

= 1, . . . , N .

3. j = 1, . . . , M
(a) hj T ;
X
(b) 0-1- E =
i ;
i|hj (xi )6=yi
j

(c) E = 0, : f () = h () ;
(d) cj = ln 1E
E ;
(e) (hj (xi ) 6= yi )
i ecj = 1E
E ;
(f) ,

N
X

i=1

1.
4. : f () = sign

M
X

cj hj ().

j=1

37 , , c .

125

AdaBoost

AdaBoost
, (,
) , . AdaBoost
[100] [41]. 1990-
, , AdaBoost ,
, .
[42, 24]. ,
, .
5.1.3

, , ,
. ,
,
; ,
, (stump), .. ,
;
( (194) = 0);
,
( ) / ;
- .
.
[101].
.

5.2
5.2.1

, 0. 4
( [41]) , .

126

4:
1. : ((x1 , y1 ), . . . , (xN , yN )) N ,
E(, ), M , c (0, 1].
2. yi zi , i = 1, . . . , N .
3. j = 1, . . . , M
(a) hj ((x1 , z1 ), . . . , (xN , zN ))
n
X

E(hj (xi ), zi );
i=1

(b) zi c hj (xi );
4. : f () =

M
X

chj () = c

j=1

M
X

hj ().

j=1

c ,
, . 5.3.
(193)
E(, ) , , ,
0 ( ), ,
c = 1
N
X

E (f m (xi ), yi )

i=1

Pm

f m () = j=1 hj (). ,
c (0, 1]. c,
(), ( ). , c , C (99)
.
,
:
,
, .. ,
c. ,
, . , , , ,
.
: 5.2.2
.
. E(r, y) = (r y)2

c = 1.
, E(, ) , ,
127

!!!!

. , , , -, , -
. 5 .

5:
1. : ((x1 , y1 ), . . . , (xN , yN )) N ,
E(, ), M .
2. :
f 0 ((x1 , y1 ), . . . , (xN , yN ))
n
X
E(f 0 (xi ), yi ), , , i=1

, f 0 = 0.
3. j = 1, . . . , M

(a) zi =

E(r,yi )

r=f j1 (xi )

, i = 1, . . . , N ;

(b) hj ((x1 , z1 ), . . . , (xN , zN ))


n
X

(hj (xi ) zi )2 ;
i=1

(c) -, , n
X

E((f j1 + cj hj )(xi ), yi ) min
cj R+

i=1

(d) f j () = f j1 () + cj hj ().
4. : f M () = f 0 () +

M
X

cj hj ().

j=1

5 , , ,
. E(, ).
,
.
, c < 1
f j () = f j1 () + c cj hj (xi )():
, .
5.2.2



( 2.2.2). ,
, .
, (, ) f : X R, -

128


ln

P (y = 1|x)
,
P (y = 1|x)

log-odds , , , . 1.2.8 2.2.2.



pf (x) =
1 pf (x) =

1
P (y = 1|x) ,
1 + ef (x)
1
P (y = 1|x) ,
1 + ef (x)

.. y = 1 1
1+eyf
(x) , , ,
E(f (x), y) = ln(1 + eyf (x) ) .

(195)

y
E(r, y)
ln(1 + eyr )
yeyf (x)
=
(196)
=
=

yf
(x)
r
r
1+e
1 + eyf (x)
r=f (x)
r=f (x)

2 E(r, y)
2 ln(1 + eyr )
y 2 eyf (x)
=
=

(r)2 r=f (x)


(r)2
(1 + eyf (x) )2
r=f (x)
1

eyf (x) )(1

(1 +
+
=pf (x)(1 pf (x))

eyf (x) )

1
(1 +

ef (x) )(1

+ ef (x) )
(197)

( y 2 = 1). (197) , ,
f T
E(f, T ) =

N
X

E(f (xi ), yi )

i=1
)
= 0 . (196) , E(f,T
f
,
.
5.2.1, . , ..
.
f (f ! , ) (196) (197)
N
X

yi
1X
h(xi ) +
pf (xi )(1 pf (xi ))(h(xi ))2
y
f
(x
)
i
i
2
1
+
e
i=1
i=1
(198)
hi = h(xi ) .
N N 38
E(f + h, T ) E(f, T )

38 N hi ,
(!) . .

129

hi :
hi =

yi
1+eyi f (xi )

pf (xi )(1 pf (xi ))

= yi (1 + eyi f (xi ) ) =

yi
,
P (yi |f (xi ))

(199)

2
N
1X
yi
E(f + h, T ) const +
pf (xi )(1 pf (xi )) h(xi )
. (200)
2 i=1
P (yi |f (xi ))
, h f xi
yi , f
, h(xi ) ()
1
2 pf (xi )(1pf (xi )). 6
.

6: LogitBoost
1. : ((x1 , y1 ), . . . , (xN , yN )) N , M .
2. :
f 0 ((x1 , y1 ), . . . , (xN , yN ))
n

X
0
ln 1 + eyi f (xi ) , , ,
i=1

, f 0 = 0.
3. j = 1, . . . , M
1
(a) pi =
j1 (x ) ,
i
1+ef

yi f j1 (xi )
i = pi (1 pi ) zi = yi 1 + e
, i =
1, . . . , N ;

(b) hj ((x1 , z1 ), . . . , (xN , zN ))


, ,
n
X
:
i (hj (xi ) zi )2 ;
i1

(c) f j () = f j1 () + hj (xi )().


4. : f M () = f 0 () +

M
X

hj () -

j=1

pf M () =
sign(f M ()).

1+ef M ()

, ,

6 ( ) LogitBoost.
, , ,
. hj , , zi
, i .
[42] , , . yi
130

LogitBoost

hj ,
, , (. 2.1.3),
zi
(!). , ,
hj , hj
cj hj cj , , ,

N
X

cj =

i=1
N
X
i=1

(1 +

!!!!

yi hj (xi )
1 + eyi f j1 (xi )

(hj (xi ))2


j1 (x)
f
e
)(1 + ef j1 (x) )

(!), .
7.

131

!!!!

7: LogitBoost
1. : ((x1 , y1 ), . . . , (xN , yN )) N , M ,
> 0, 1.
2. :
f 0 ((x1 , y1 ), . . . , (xN , yN ))
n

X
0
ln 1 + eyi f (xi ) , , ,
i=1

, f 0 = 0.
3.
N

X
0
E0 =
ln 1 + eyi f (xi ) .
i=1

4. j = 1, . . . , M
(a)

1
pi =
,

=
max(,
p
(1

p
))


j1
i
i
i
1+ef (xi )

j1
zi = yi min , 1 + eyi f (xi ) , i = 1, . . . , N ;
(b) hj ((x1 , z1 ), . . . , (xN , zN ))
,
n
X
, :
i (hj (xi ) zi )2 ;
i1
N
X

(c) cj =

i zi hj (xi )

i=1
N
X

i (hj (xi ))2

i=1

(d) f j () = f j1 () + cj hj (xi )();


(e) cj 6= 0
, ..
!
N

X
j
E j =
ln 1 + eyi f (xi )
E j1 , cj = cj /2

i=1

(f) cj = 0, ; f M () =
f j1 ().
5. : f M () = f 0 () +

M
X

cj hj ()

j=1

pf M () =

5.2.3

.
1+ef M ()

; AdaBoost

, , ,
1, -

132

, .
.
,
(198)
E(f + ch, T )E(f, T )

N
X
i=1

=E(f, T )

N
X
i=1

yi
1X
ch(xi ) +
pf (xi )(1 pf (xi ))c2 (h(xi ))2
y
f
(x
)
i
i
2 i=1
1+e
N

yi
1X
ch(x
)
+
pf (xi )(1 pf (xi ))c2 ,
i
2 i=1
1 + eyi f (xi )

h.
ch() h() c c . ,
, h,
.
h(xi ) = yi , .. 39 .
xi 1+ey2i f (xi ) , .. , 1+ey1i f (xi ) =
P (yi |f (xi )) .
c :
N
X

c=

N
X

yi h(xi )P (yi |f (xi ))

i=1
N
X

=
pf (xi )(1 pf (xi ))

i=1

yi h(xi )P (yi |f (xi ))

i=1
N
X

P (yi |f (xi ))(1 P (yi |f (xi )))

i=1

8 LogitBoost, (, , ) .
39 .

38 . 129

133

8: LogitBoost
1. : T = ((x1 , y1 ), . . . , (xN , yN )) N ,
M.
2. : f 0 () = 0.
3. j = 1, . . . , M
(a) i =

1
j1 (x ) ,
i
1+eyi f

i = 1, . . . , N ;

(b) hj T ( ) ;
N
X

(c) cj =

i=1
N
X

i yi hj (xi )
;
i (1 i )

i=1

(d) f j () = f j1 () + cj hj (xi )().


4. : f M () =

M
X

cj hj () -

j=1

pf M () =
M

1+ef M ()

, , -

sign(f ()).

, 8 , . , , ,
(195), - .
yr
E(r, y) = e 2 . ,
(,
).
, f
N

E(f + ch, T )

E(f, T ) +

1 X yi f (xi )
2
yi ch(xi )
e
2 i=1

, c > 0 h xi yi
yi f (xi )
e 2 . h
c :
N
N
X
yi (f (xi )+ch(xi ))
X yi (f (xi )+ch(xi ))

2
2
E(f + ch, T ) =
=
yi h(xi )e
e
c
c i=1
i=1
X
X
yi f (xi )
yi f (xi )
c
c

2
2
=e 2
e
e 2
e
,

0=

i|h(xi )6=yi

i|h(xi )=yi

134


. . .

X
c = ln

yi f (xi ))
2

yi f (xi ))
2

i|h(xi )=yi

i|h(xi )6=yi

.
P
yi f (xi ))
. , i|h(xi )6=yi e 2
=0?

9.

!!!!

9:
1. : T = ((x1 , y1 ), . . . , (xN , yN )) N ,
M.
2. f 0 () = 0.
3. j = 1, . . . , M
(a) i = e

yi f j1 (xi )
2

, i = 1, . . . , N ;

(b) ,

N
X

i 1;

i=1

(c) hj T ;
X
(d) 0-1- E =
i ;
i|hj (xi )6=yi
j

(e) E = 0, : f () = h () ;
(f) cj = ln 1E
E ;
(g) f j () = f j1 () + cj hj (xi )().
4. : f () = sign

M
X

cj hj ().

j=1

. , 9
3 (AdaBoost), - .
. , yf (x) > 0, yf (x)
E(f (x), y) = e 2 , E(f (x), y) =
ln(1 + eyf (x) ), |yf (x)|,
, |yf (x)|, , . -
yi . ,
, hj ,
, , hj (xi ) = yi , yi f j (xi )


yi f j (xi ) , e 2
j, hj

135

. . .
AdaBoost
!!!!

, i- .
AdaBoost, .

,
,
.
.
5.2.4

, , q- q
q1 . , q q .
: , ,
q ,
q
q q. ., , [42], (
, , ).
[24]. LogitBoost, AdaBoost ( 9), ,
, .
, q- q-
- f : X RY ,

(u1 , . . . , uq ) 7

Pqe

u1

j=1

j
eu

, . . . , Pqe

uq

j=1

eu

T = ((x1 , y1 ), . . . , (xN , yN )) yi {1, . . . , q}

yi
q
N
N
N
X
X
X
X
j
yi
ef (xi )

ln p(yi |xi ) =
ln q
=
ln
e(f (xi )f (xi )) .
X j
i=1
j=1
i=1
i=1
ef (xi )
j=1

136

10:
1. : T = ((x1 , y1 ), . . . , (xN , yN )) N ,
M.
2. f0 () = 0.
3. m = 1, . . . , M
(a) ij =

j
(x )
f
m1 i

e
q
X

, (i, j) ST ;

j
fm1
(xi )

j 0 =1

(b)

T

:
ij hj (xi )
max ;
h||hj (xi )|1
(i,j)ST

X
j
(c) R =
ij hm
(xi )
Z=

(i,j)ST

ij ;

(i,jST )

(d) R = 0, : ;
(e) cm = 12 ln ZR
Z+R (: hm
(xi , j 6= yi ) , .. hjm (xi ) > 0, R > 0, cm < 0,
, );
(f) fm j() = fm1 () + cm hm (xi )().
4. :
!
M
X
y
exp
cm hm (x)
p(y|x) =

q
X
j=1

exp

m=1
M
X

! , y = 1, . . . , q .
cm hjm (x)

m=1

10 N (q 1)
ij j 6= yi , .. . .
{(i, j)|j 6= yi } ST .
m- hm = (h1m , . . . , h1m ) ( )
[1, 1]q . [24] 40 , .
. , 1, 40 [24] , 10.
10 .
, , .

137

9: (!) , +1 1 ,
cm . cm
, 9,
.

5.3

- ,
1990-
, . 2000-
[11] , column generation ( ),

, ,
( [32]).
, ,
lasso (. 43), . .
LPBoost (linear programming boosting).
LPBoost
LPBoost 1;
. [32]. (x1 , y1 ), . . . , (xN , yN ) N
M 2N h1 , . . . , hM . 2.3.3, - h(x) = (h1 (x), . . . , hM (x))
(h(x1 ), y1 ), . . . , (h(xN ), yN ).
M , ,
.
hji = hj (xi )
, hi = h(xi ) M - ,
c = (c1 , . . . , cM ) M - (-) , PM
h 7 j=1 cj hj M -
. .

(99)
, , (73), .
, , . (99)
, , 1,
, , . LPBoost: :
N
X
D
i max
(201)
c,,
...
i=1
yi

M
X

cj hji i 1 i N

(202)

j=1
M
X

cj = 1

(203)

j=1

138

cj 0 1 j M
i 0 1 i N.

(204)
(205)

(201) D > 0, ,
.
C (99), D N .
1

D, D = N
(201)
N
X

(i ) min
c,,

i=1

(. -SVC (166)).
1
(201)
. , D = N
1
1
1
N 1, .. N D 1. D
= N
hi (i > 0), N
.
,
( [32]) ,
, . , .
(201) M : N i (202) (203)41
N . (201)

N
N
M
M
X
X
X
X
j
L(c, , ; , ) = D
i +
i yi
cj hi + i + 1
cj ,
i=1

i=1

j=1

!!!!

. . . LPBoost: . . .

j=1

(206)
c, , , , (202205)
i 0 1 i N .
L c,
, , ,
:
min

(207)

N
X

i yi hji 1 j M

. . . LPBoost:

(208)

i=1
N
X

i = 1

i=1

0 i D 1 i N.
.

(207).

41

(204)
(205),
, .

139

!!!!

. ( , ) (207). , (201) cj (208) ( , ),


, i (, PN
, (158)). , cj = 0 i=1 i yi hji < .
i ?
column generation(

j
, hi ) ,
(201), (207), (208). m-
m 1 (208), .. m 1 hj1 , . . . , hjm1 ,
. m- (.. )
hjm , , (208) . (208) hjm i
, (207) ,
, : , (201) . ,
D , ,
.
yi ,
hj (x) 1, (208) ,
, ,
N
X
i=1

i yi h(xi ) =

X
i|h(xi )=yi

X
i|h(xi )6=yi

i = 1 2

X
i|h(xi )6=yi

i max .
h

i N1 , .. .
11 (LPBoost).

140

!!!!

11: LPBoost
1. : T = ((x1 , y1 ), . . . , (xN , yN )) N .
2. i =

1
N,i

= 1, . . . , N .

3. = 0.
4. m, m = 1
(a) hm T ;
(b) E hm
T , ,
1 2E =

N
X

i yi hm (xi ) ,

i=1

;
m
(c) hm
i = h (xi );

(d)
min
,

N
X

i yi hji 1 j m

(209)
(210)

i=1
N
X

i = 1

i=1

0 i D 1 i N ,
PN
= i=1 i yi hm
i
.
Pm1
5. : f () = sign j=1 cj hj (), cj 0
j- (210) (209).

11 ,
- .
, (,
-) ,
, ,
.
, , ,
, .
LPBoost , (201).
, , ( , ),
, ., , [24].

141

5.4

,
.
,
5.1.1:
/ . (.,
, [40]), (),
.


, , !
..

,
( ), , , . , ,
. 100- 200-
1000-.
, , .

6.1

, , , , . , ,
. , , .
( 1.1.3).
. : ,
,
, , , , .
.
( 1.1.5). ,
, , .
,
, . , CART [20] C4.5
[92], , .
, , . .
142

( 1.2.5). , , .
. (missing data).

.
( 2.1.2).
.
. , , , (
) ,
(+1), . (
)
. ,
, ,
( ).
(
2.1.2). , ,
, ,
. ( ),
.
( 2.2.1). ,
, .
, . ,
. , (. )
.
( 2.2.2). , , . ,
,
.
( 2.2.3). , .
.
( 3.2). ( ) .
, , ,
: , , , , , ,
.
RBF- ( 3.3). . , , 143

. , , .
( 2.1.3, 2.2.4 4.2). . .
. ( ) ( , ) . ( ), ()
.
. , .
, ( 5.1.1).
, ,
.
,
(
) / . , . ,
.
( 2.3.3 5 5.1.1). ( , ),
, ,
. .
.

6.2

, , , ,
:
[33]
RBF- ( 3.3.1).
, (missing data), 42 (censored data),
.
:
42 , , , , - .
, , .

144

( 1.2.5)
( 2.2.1).
,

, , ,
( ). ,
- , , , -, - -
, , .
,

, .
, , [15].
, . , (HMM,
Hidden Markov Model ),
.
, ,
[94, 10].
. , , (CRF, Conditional Random Field )
[71], ( )
( ),
.

( 1.2.5). , , - , ,
, - .
, (
, ), 43 (, ) . -
- , (. (44) (46)).
80-
[80]. RVM.
43

evidence.

145

RVM () (RVM , relevance vector machine)


[112, 16, 111] . 2000
. , ,
(SVM)
, SVM, , SVM, RVM , .
RVM .
,
SVM.
.

A.1


- . ,
1. , - ,
;
2. - ;
3. -
( , , ,

);
4.
( ,
, ), - ( ,
).
13
(missing data), ( , 4 -) .
,

.
, (missing at random
(MAR)), , (missing completely at random (MCAR))44 . ,
44

. rj ,
+

1, j- 0, ; z

(observed) z (missing). MAR

146

1 , MAR,
3 , MCAR, , , MAR.
(MCAR),
.


B. [77] ( ) [29]
( ).

A.2

,
:
1. ,
, ;
2. ,
;
3. - (impute)
;
4. 0 1;
5. .
1
. , , , , ,
, - ,
, . , MCAR ,
.
2 1 , , .
, ,
.
( 5.1) , , , , . ,
- ( )
( ) .
, .
+

P (rj | z , z ) = P (rj | z ), MCAR


+

: P (rj | z , z ) = P (rj ).

147

, , A.2.1.
3 , (imputation)
.
. -
-
.
.
MCAR ( ) , (),
,
( ),
( ).
MAR, MCAR, ,
.

.
MAR , , , , , , , .
.
4 , , , .
.
- , , , 45 . , ,
.
: , , , , , . ,
, (
) .
5 MCAR;
, , .
.
45 , .

148

A.2.1

p (x, y)
d
d
Y
Y
xj j y .
p (x, y) = p (y)
p (xj |y) = y
j=1

j=1

(. (30) k
j
mk
). x +

, Dx {1, . . . , d} , Dx =
+

{1, . . . , d}\ Dx , x =x x
+

X =Xx Xx x
x . ,
, ,
,

d
Y j
X
Y
+
y
xj y .
p (x, y) = p (x, y) =
xj j y = y
(211)

j=1

xXx

jDx

, y, ,
+
, , .. x y,

X
Y j
+
p (x) = p (x) =
xj y .
(212)
y
yY

jDx

( Y = {1, 2}) -
( X = {1, 2}), . 1 , 2 , 11 ,
21 , 12 22 ( , 1, ),
1 + 2 = 1, 11 + 21 = 1 12 + 22 = 1 (1 , 11 , 12 ) [0, 1]3 . ,
, .
( 1.2.5),
(
1.2.1), (26).
. T1 = ((1, 1), (2, 2))
x = 1
R
1 11 1 11 (1 1 )(1 12 )d
pT1 (y = 1|x = 1)
=R
pT1 (y = 2|x = 1)
(1 1 )12 1 11 (1 1 )(1 12 )d

R1
R1
R1
(1 )2 (1 1 )d1 0 (11 )2 d11 0 (1 12 )d12
0
=R1
R1
R1
(1 1 )2 d1 0 11 d11 0 12 (1 12 )d12
0 1
=

1
12
1
12

1
3
1
2

1
2
1
6

=2

pT1 (y = 1|x = 2)
1
= .
pT1 (y = 2|x = 2)
2
149

,
, :
T2 = ((1, 1), (2, 2), (, 2), (, 2))
.
x = 1
R
1 11 1 11 (1 1 )(1 12 ) (1 1 )2 d
pT2 (y = 1|x = 1)
=R
pT2 (y = 2|x = 1)
(1 1 )12 1 11 (1 1 )(1 12 ) (1 1 )2 d

R1
R1
R1
(1 )2 (1 1 )3 d1 0 (11 )2 d11 0 (1 12 )d12
0
= R1
R1
R1
(1 1 )4 d1 0 11 d11 0 12 (1 12 )d12
0 1
=

1
60
1
30

1
3
1
2

1
2
1
6

=1

pT2 (y = 1|x = 2)
1
= .
pT2 (y = 2|x = 2)
4

(!
) .
. T3 = ((1, 1), (2, 2), (, 2), (, 2), (1, ))
Z
pT3 (y = 1|x = 1) =
1 11 1 11 (1 1 )(1 12 ) (1 1 )2 1 11 d

Z
+ 1 11 1 11 (1 1 )(1 12 ) (1 1 )2 (1 1 )12 d

3!3! 3!0! 0!1! 2!4! 2!0! 1!1!


1 9 8
=
+
=
+
,
7! 4! 2!
7! 3! 3!
7! 2 3

Z
(1 1 )12 1 11 (1 1 )(1 12 ) (1 1 )2 1 11 d
Z
+ (1 1 )12 1 11 (1 1 )(1 12 ) (1 1 )2 (1 1 )12 d

2!4! 2!0! 1!1! 1!5! 1!0! 2!1!


1 8
=
+
=
+5 ,
7! 3! 3!
7! 2! 4!
7! 3

pT3 (y = 2|x = 1) =


x = 1
pT3 (y = 1|x = 1)
=
pT3 (y = 2|x = 1)

9
2
8
3

+ 83
43
<1
=
46
+5

(. T2 ).
p (y=1|x=2)
. pTT3 (y=2|x=2) .
3

, (
: ) . T = ((x1 , y1 ), . . . , (xN , yN )),
, Nk
Mj
X
j
j
j
Nmk
Nk ( (31)), Nmk ( (32)) Nk =
m=1

k- j- .
150

!!!!

. x +

Dx y
PT (x, y) =

j
d
Ny + 1 Y Nxj y + 1
N +q
Nyj + M j
+

(213)

jDx

(. (35)).
. (213)
. (37) , .
A.2.2

!!!!
!!!!


, :

, , , ;
( );

.
, .
,
, , ,
, .
. ( 2.2.1)
, .
?

, , .
, , , ,
.

A.3
A.3.1

1.1.4
:
(i-):
xi (d-, );
yi (q- , );
151

!!!!

zi = (xi , yi ) -();
vi ( ) (latent variables), .
,
, :
+

xi ;
+

yi ;
+

+ +

zi = (xi , yi ) ( );

xi , yi zi , ;

ui = (vi , zi ) , .. ,
.
:
T = ((x1 , y1 ), . . . , (xN , yN )) , -
;
v= (v1 , . . . , vN ) ,
+

z= (z1 , . . . , zN ) (
) ;
z= (z1 , . . . , zN ) ;

u= (u1 , . . . , uN ) = (v, z) ( )
+

; , ( z, u) (, , ),
(, );
.
:
f p(xi , yi , vi ) , ,
F.
f p0 (f ) F,
p(xi , yi , vi ) f , , p(xi , yi , vi |f ); R
+
p(xi , yi |f ) = vi p(xi , yi , vi |f )dvi p(zi , ui |f ) (
p(xi , yi , vi |f ), ).
:
+

p(T, v|f ) p( z, u|f ) ( , ); , N


Y
+
, p(T, v|f ) =
p(xi , yi , vi |f ) p( z, u|f ) =
i=1
N
Y

+
p(zi , ui |f ).

i=1

152

d
N(,A)

d- A
d

d
N(,A)
(x) = (2) 2 |A| 2 e 2 A

(x,x)

(214)

d
N(,)
I d .
d

d
N(,)
(x) = (2) 2 e

kxk2
2

(215)

, ,
N(,A) .
, , , , .
. , E;u g(, u)
u ( -) g u
:
Z
E;u g(, u) = g(, u)(u)du ,

X
E;u g(, u) =
g(, uj )(uj ) .
j

F, , , , -. p(x, y, v|f )
, f .
, , ..
, , , y
, , . ,
,
(., , 3.3.1).
+
+
p(f | z) f z
,
Z
+
+
+
p( z, f ) = p0 (f )p( z |f ) = p0 (f ) p( z, u|f )du max ,
(216)
f

, ,
+

L(f ; z) = ln p0 (f ) + ln p( z |f ) = ln p0 (f ) + ln

p( z, u|f )du max .


f

(217)

,
,
+
. z (
)
.
. ( 3.3) ( 2.1.2).
153

xi X = Rd , i = 1, . . . , N , .
yi Y = R, i = 1, . . . , N ,
.
N
(d + 1)N , ,
+

z N ,

z N = (d + 1)N N .
(xi , yi ) vi , {1, . . . , k}.
f l : X Y, l(x) = wx + b, , k j X k Aj X , p(x, y, v|f )
v Y, v X v
:
p(x, y, v|f )

= p(y|x, f ) p(x|v, f ) Pv
1
d
= N(0,)
(y l(x)) N(
(x) Pv .
v ,Av )

(218)

,
p((x1 , y1 , v1 ), . . . , (xN , yN , vN )|f ) =

N
Y

p(xi , yi , vi |f ) .

(219)

i=1

f w
d- 0:
d
p0 (f ) = N(0,)
(w) ,

(220)

> 0 .
p(y|x, f ) p0 (f ) ,
( 2.1.2), ..
. () v
p(x|f ) =

k
X

p(x, v|f ) =

v=1

k
X

p(x|v, f )p(v)

v=1

. (x, y) (-) +

p(y| x), ..
.
A.3.2

EM
+

p( z, u|f )
+

L(f ; z) , L
154

f (217)
, , . L
L , u, :
Z

L(f ; z)= ln p0 (f ) + ln

p( z, u|f )du = ln

p0 (f )p( z, u|f )
(u)du
(u)
+

= ln E;u

p0 (f )p( z, u|f )
p0 (f )p( z, u|f )
+
E;u ln
= L (f ; z) (221)
(u)
(u)

( ). L

+

L (f ; z)=E;u ln p0 (f ) + E;u ln p( z, u|f ) E;u ln (u)


+

= ln p0 (f ) + E;u ln p( z, u|f ) + H() ,

(222)

f - ,
H()= E;u ln (u).
+

L (f ; z) max ,
f,

, L(f ; z) (217), ,
+
ln p( z, u|f )
(217).
+
, , , L (f ; z) 46
. +
L (f ; z) , expectation
maximization EM :
+

E(xpectation) () L (f ; z)
f ;
+

M(aximization) () L (f ; z) f .
46
, . 47 . 159

155

E M
EM

12: (EM);
1. : T = ((x1 , y1 ), . . . , (xN , yN )),
+

/ : z .
2. v,
(x, y, v), p0
f = f (0) .
3.
+

z,u|f

)
E L (f ; z) = E;u ln p0 (f )p(
(u)
( )
f ;
+

M L (f ; z) f ;
, f .
4. : f .

E p0 (f )
:
32 u
+
z f
+

f (u) = p(u| z, f ) =

p( z, u|f )

(223)

p( z |f )
+

L (f ; z) ,
+

Lf (f ; z) = L(f ; z).
+

. L (f ; z)
Z

ln

p( z, u|f )
(u)du R max
.
(u)
| (u)du=1

:
Z
Z
!
+
p( z, u|f )

ln
(u)du
(u)du 1
0=

(u)
!
Z
+
p( z, u|f )
=
ln
1 (u)du .
(u)
, u
+

p( z, u|f )
ln
1=0 ,
(u)
156

..

p( z, u|f )
.
e+1
,
Z
Z
+
+
p( z, u|f )
p( z |f )
1 = (u)du =
du
=
,
e+1
e+1

+
p( z, u|f )
+
(u) =
= p(u| z, f )
+
p( z |f )
(u) =


+
L (f ; z). , ,
:
+

Lf (f ; z)= ln p0 (f ) + Ef ;u ln

p( z, u|f )
+

p(u| z, f )

= ln p0 (f ) + Ef ;u ln p( z |f )

= ln p0 (f ) + ln p( z |f ) = L(f ; z)
(221).
7 12
+
+
p(f | z) f , L(f ; z) (217),
.
M , :
+

33 f , L (f ; z),
+

ln p0 (f ) + E;u ln p( z, u|f ) = L (f ; z) H()

(224)

E, -, M
+
+
, -, L (f ; z) = L(f ; z) .
M . ,
Rn
x X c (
) ( )
p(x|) =

h(x) ((x),)
e
,
g()

: X Rn Rn ,
h : X R g : R
. g
R
p(x|)dx = 1:
Z
g() = h(x)e((x),) dx .

157

X Y, X .
(224)
N
!

X +

p0
+
0=
ln p0 (f ) + E;u ln p( z, u|f ) =
ln N (f ) + E;u
(zi , ui ) ,
f
f
g
i=1
, p0 g,
f , .
. , (214), (, A), 2
= ( 12 A1 , A1 ) Rd +d (x) = ((xj xk , 1 j, k
d), x) ( - A d2 + d, d(d+1)
+ d). h g.
2
32 33 12, E (223)
M. 13, ,
, M
E.

13: (EM);
1. : T = ((x1 , y1 ), . . . , (xN , yN )),
+

/ : z .
2. v,
(x, y, v), p0
f = f (0) .
3.
E f f (223) u
+

f (u) = p(u| z, f ) ;
( )
f 0
+

Q(f 0 , f ; z) = ln p0 (f 0 ) + Ef ;u ln p( z, u|f 0 ) ;

(225)

M Q(f 0 , f ; z) f 0
f
f 0 6= f , f = f 0 , .
4. : f .

. +
p(f | z), -

158

!!!!
EM:

E M

p( z, f ) (216), p( z |f ). +

p( z |f ) p( z, f ) =
+

p0 (f )p( z |f ) ,
.
.
f , . , - . .
,
.
- .
,
, 195060- . , - [8]
13. [33]
47 , E M ,
. 12 1980-
, [88].
A.3.3

EM

13
. 3.3.1
. x Rd , ,
, v 1 k. f ( 3.3.1 W)
d
p(x, v|f ) = Pv N(
(x),
v ,sv )

v = 1, . . . , k .

(226)

(. (127)) 2k Pv sv
Pk
v=1 Pv = 1 k d- v .
p0 (f ) f 1, .. .
, 3.3.1, xi , E 13 f
f , vi .
,
+

ln p( z, u|f ) =

N
X

ln p(xi , vi |f )

i=1

N - (v)
(marginal) i (vi ):
+

E;u ln p( z, u|f ) =

X
v

(v)

N
X

ln p(xi , vi |f ) =

N X
k
X

i (vi ) ln p(xi , vi |f ) .

i=1 vi =1

i=1

47 ( (225)), .

159

!!!!



QN
i (!) (v) = i=1 i (vi ),
E P {j|xi , f } (. (128))
(225) p0 = 1

Q(f 0 , f ; z) = Ef ;u ln p( z, u|f 0 ) =

N X
k
X

P {j|xi , f } ln p(xi , j|f 0 )

i=1 j=1

(. (131), ,
, Q). M
+

Q(f 0 , f ; z) f 0 ( 3.3.1 ).

. , . x +

X =Xx Xx ,
xi . +

X =x x ,
+

x Xx x Xx . dx dx , +

, , x, x[]=x x
x,
.
.
34 13 (226)
E
+

Pvi N

P {xi , vi |f }

P {vi | xi , f } = P
k

+
j=1 P {xi , j|f }

dxi

(xi )

((vi )xi ,svi )


+

Pk
j=1

dxi

Pj N

dxi
+

,
+
(xi )

((j )xi ,sj )

p(xi , vi | xi , f )=P {vi | xi , f }p(xi | xi , vi , f )

=P {vi | xi , f }N

(xi )

((vi )xi ,svi )

Q(f 0 , f ; z)=

N X
k Z
X
i=1 j=1

p(xi , j| xi , f ) ln p(xi , j|f 0 )d xi

0 2
kx
[
]

k
+
d
s
d
i j
xi j
+
j

=
P {j| xi , f } ln Pj0 ln(2s0j )
0
2
2s
j
i=1 j=1
N X
k
X

M Q f 0 = (P10 , . . . , Pk0 , 01 , . . . , 0k , s01 , . . . , s0k )

160

Pj0

0j

s0j

1 X
+
P {j| xi , f }
N i=1
PN
+
i=1 P {j| xi , f }xi [j ]
N Pj0

PN
+
0 2
k
+
d
s
P
{j|
x
,
f
}
kx
[
]

i
i j
xi j
j
i=1
dN Pj0

(. (136)(138)).
3.3.1
.
13 . 153
(218)(220), : , ,
, +
+
. P {vi | xi , yi , f } ( P {vi | xi
+

, f }, yi ) Q(f 0 , f ; z)

Xxi , (
Q) xi , .
+

Q(f 0 , f ; z) f 0 , f 0 (217) (132)(135) (65).


+

Q(f 0 , f ; z) ,
.
- , . Z Z = X Y ( = |X |Y , Z, |X X |Y Y ),
(w = w|X w|Y , Z w = w|X |X + w|Y |Y )
(A = A|XX A|XY A|Y X A|Y Y , , Z
A(, ) = A|XX (|X , |X ) + A|XY (|X , |Y ) + A|Y X (|Y , |X ) + A|Y Y (|Y , |Y )).
+

X =Xxi Xxi
:
+i

+i

++i

+i

+i

= , w = w w A = A A A A . M 0

M (, A ) (, Xxi )
, .. M ,
M .
i X , x.
zi = (xi , yi )
Z = X Y, , , Y.

, .
+

4 N(,A) X =X X

Z
+

+
p(x) = N(,A) (x)d x= N + + (x)
( , A)

161

!!!!

p(x | x) =

p(x)
+

=N

( (,A, x),A)

p(x)
A=

(x) ,

1
++
+ + 1
+ +

+
, (, A, x) = A A1 (x ) A= A1
A1 A1 A A1
+

( A A !).

5 N(,A) (x)
l(x) l(), R(x , x )
Z
R(x , x )N(,A) (x)dx = tr(RA1 ) + R( , ) .

(218)(220):
1
p(x, y, v|f )=Pv N(v ,Av ) (x) N(0,)
(wx + b y))
d
p0 (f )=N(0,)
(w)

d+1
d
1
N(,A)
(x) N(0,)
(wx + b y) = N(,B)
(z) ,

1
>
B = A1 0 + 1 (w (1)) (w (1))
, = (w+b) (
) z = x y. |B| = |A|.

f j Bj ((d+1)) j Aj (d-) . :
i

ij

Pij

= P {j| zi , f } = P {j| xi , yi , f } ,

xij
yij

= xi [j ] =xi j ,
= yi [wxi [j ] + b] ,

zij
w1

(j , Bj , z+i ) ( 4),
+

+ +

= xij yij = (xij , yij ) ,


= w (1) = (w, 1) .

35 13
E
Piv =P {v|

+
zi , f }

Pv N
=P
k
j=1

162

+i +i
(v ,Bv )

Pj N

(zi )
+

+i +i

(j ,Bj )

(zi )

p(zi , v| zi , f )=Piv p(zi | zi , v, f ) = Piv N

( iv ,Bv )

Q(f 0 , f ; z)= ln p0 (f 0 ) +

N X
k Z
X

(zi )

p(zi , j| zi , f ) ln p(zi , j|f 0 )d zi

i=1 j=1
N X
k
X
1
2
= (ln(2) + kwk ) +
Pij Iij ,
2
i=1 j=1

Z
Iij =

p(zi | zi , j, f ) ln p(zi , j|f 0 )d zi


ln |A0j | + ln 0
1 i i
d
ln(2)
tr(Bj (Bj0 )1 )
2
2
2
1
1 i
(A0j )1 (xij 0j , xij 0j ) 0 (w0 xij + b0 yij )2
2
2

= ln Pj0

M Q f 0 = (P10 , . . . , Pk0 , 01 , . . . , 0k , A01 , . . . , A0k , w0 , b0 , 0 )


: w0 b0

!
N X
k
N X
k

X
X
i

>
>
w0 I d +
Pij Bj
+ xij (xij ) + b0
Pij (xij )

i=1 j=1
i=1 j=1
XX

!
N X
k
X
i
>
=
Pij Bj
+ yij (xij )

i=1 j=1

w0

N X
k
X

Pij xij + b0

i=1 j=1

XY

N X
k
X

Pij =

i=1 j=1

N X
k
X

Pij yij ,

i=1 j=1

f 0
N

Pj0

0j

1 X
Pij
N i=1

PN
0>
y
P
x
+
w
ij
ij
ij
i=1
PN
i=1

A0j
0

N Pj0

Pij

(xij 0j )(xij

>
0j )

dN Pj0

i
Bj

XX

N X
k
X
i
1
>
>
tr (w1 ) w1
Pij (zij j0 )(zij j0 ) + Bj
N
i=1 j=1

( , ), 34,
46 .
, k .
. , -.
x, ,
f , w b:
163

!!!!

y = wx + b, .
:
y =E

p( x| x,f )

=b + w

(wx + b) = b + wE

k
X

Z
Pj

xN

=b+ w x + w

k
X

Pj

x=b+w

( (j ,Aj , x),Aj )

j=1
++

p( x| x,f )

k
X

Pj

xp(x| x, j, f )d x

j=1

(x)d x= b + w

+
Aj A1
j

(x

Pj

x (j , Aj , x)

j=1

!
+

k
X

+
j )

j=1

( 4 5). ;
.
A.3.4

EM

12 13, , , . ?
f (m) , m- ,
()
+

L(f (m) ; z)?


+

7 L(f (m) ; z) .

+

{f |L(f ; z) const}

(227)

+
p0 (f ) , L(; z) ,
+

L(f (m) ; z) (!). ,


+
- L(; z).
+

( L(f ; z), z, f (0) ),


+

L(f (m) ; z) () ,
( ),
.
f (m) ,
(227) . [122] , +
f (m) . , , L(; z)
, , (225) [122, Corollary 1].

+
L(; z),
. , [33] , (225)
. ,
, , .
164

!!!!

, EM ,
,
+
L(; z) , .
EM , - , ,
, ,
.
EM ,
:
.
A.3.5

EM

( 12) :
+

E / M L (f ; z) f , , .
(generalized expectation maximization GEM )
+

L(f ; z),
EM
([88]).
+

L (f ; z) f M 32 7
A.3.4 ( ,
+

). L (f ; z) E
+

32 , L(f ; z) +

. L (f ; z) ,
+

L (f ; z),
+

( 0 , f 0 ) 32 , L0 (f 0 ; z) = L(f 0 ; z).
36 ( 0 , f 0 ) ()
+

L (f ; z), f 0 (, )
+

L(f ; z)
. p(x, y, v|f ) f 7 f ( (223)). f 0 f 7
+

Lf (f ; z), 32 Lf (f ; z) = L(f ; z). 32 ,


.
, ,
, , , , .
[88] E, , , () M E,
. .
, -

165

E (223)
+

f (u)=p(u| z, f ) =

p( z, u|f )
+

p( z |f )

QN
=R

QN
u1 ...uN

N
Y

p( z, u|f )
+

p( z, u|f )du

i=1

i=1

=R

p(zi , ui |f )

p(zi , ui |f )du1 . . . duN

N
Y

R
i=1

ui

p(zi , ui |f )
+

p(zi , ui |f )dui

p(ui | zi , f )

i=1
+

fi (ui ) = p(ui | zi , f ), +

L (f ; z) (u) =
N
Y
i (ui ).
i=1
+

p0 (f )p( z, u|f )
L (f ; z)=E;u ln
= E;u
(u)
+

= ln p0 (f ) +

N
X

N
X

p(zi , ui |f )
ln p0 (f ) +
ln
i (ui )
i=1

Ei ;ui ln

i=1

p(zi , ui |f )
i (ui )

, i ( , ), +

fi (ui ) = p(ui | zi , f ) (. 32).


i ,
i . , ..
i- ,
E,
14.

166

14: (GEM);
1. : T = ((x1 , y1 ), . . . , (xN , yN )),
+

/ : z .
2. v,
(x, y, v), p0
f = f (0) .
3.
(a) i 1 N
+

E ( ) L (f ; z) i ui i-
f ;
+

M ( ) L (f ; z) f
;
(b) , f .
4. : f .

14 12
(. 83).

, , ( ),
. ,
, , [67].

B.1

,
Y , - . , y (x, y)
- x - (, ). ,

, p(y|x). :
, .
(Y = R+ , )
(Y = Z+ , );
167

, ,
, .
( -) Y

F (t) = P { < t} .

(228)

,
p(t) =

d
F (t) .
dt


p(t) = F (t + 1) F (t) .
, F () 1.
F p
(survival function)
S(t) = 1 F (t) = P { t} ,

(229)

t, (hazard )
p(t)
p( = t)
h(t) =
=
= p( = t| t) ,
S(t)
P { t}

(hazard)
(230)

() t ,
.
48 . ,

d S(t)
d
h(t) = dt
= ln S(t)
S(t)
dt

Z
t

S(t) = e

h( )d
0

(231)


h(t) =

S(t) S(t + 1)
S(t + 1)
=1
,
S(t)
S(t)

S(0) = 1
S(t) =

t1
Y

(1 h(i)) ,

(232)

i=0

p(), F (), S() h()


. , h()
, 1,
S() , 1
R , S() = 0

h(t)dt
P

h(n) .
48

, ,
h(t) = lim

&0

S(t) S(t + )
P {t < t + }
= lim
&0
P { t}
S(t)

168

. , , (228) (229)
.
F (t) = P { t} S(t) = P { > t}, (230).
. , S > (t)= P { > t} ,
.
n {S1 (t), . . . , Sn (t)}
, S(t) , t:
n

S(t) =

1X
Si (t) .
n i=1

(233)

p(t) F (t)
, h(t) .
p(), F (), S() h():
h(t) = h. S(t) = eht , F (t) = 1 eht p(t) = heht .
, t 21 = lnh2 .
,
R
0 tp(t)dt = h1 .
S(t) = p1 eh1 t + p2 eh2 t (
p1 : p2 , p1 + p2 = 1).
h1 t
h2 t
2 h2 e
h(t) = p1 hp11 eeh1 t +p
p1 h1 +p2 h2
+p2 eh2 t
t = 0 min(h1 , h2 ) t .
h(t) =
1
(t+1)h

h
t+1 .

S(t) =

1
,
(t+1)h

F (t) = 1

h
.
(t+1)h+1

p(t) =
,
.
S(t) = peht + (1 p) ( pheht
p : (1p). h(t) = peht
=
+(1p)
p
h(ln 1p
ht) t . S() = 1 p > 0.
h(t) = ht. S(t) = e
2

ht2

hte

ht2
2

, F (t) = 1 e

ht2
2

p(t) =

. .

.
, k, > 0 hk (t) =
t k
t k
t k
k1
ktk1
, S (t) = e( ) , F (t) = 1 e( ) , p (t) = kt e( ) .
k

.
, (, ),
. . . .

169

B.2

, , :
, , . ,
. (censored data). , ,
-. ,
, .
49 .
.
T = ((x1 , t1 , 1 ), . . . , (xN , tN , N )) : xi X , ti Y
i {0, 1}, , . i = 0 ( ) ti
yi i- , i = 1 ( ) ti
i- , yi ti . . i = 0 , ti
, , i- , i = 1 i-
ti .
, , y x. , x, y z (, ),
(x, y, z) X Y Y
() , ,
p(y|x) p(z|x) x.
y z t = min(y, z) , ,
z y. ,
, ,
( ) , - , , . :
.
.

. -
,
, . , (missing data).
A, .

B.3

, P () , , . , , 1.2.4
49 (truncated data)

.

170

: , . ,
, .. T =
((x1 , t1 , 1 ), . . . , (xN , tN , N )) , (18),
,
Y
Y
p(T |) =
p (ti |xi )
S> (ti |xi ) .
(234)
i|i =0

i|i =1

, S> (t|x) = S (t|x)


(229)
(230)
p(T |) =

p (ti |xi )

i|i =0

S (ti |xi ) =

i|i =1

h (ti |xi )

i|i =0

N
Y

S (ti |xi ) .

(235)

i=1

, - , . ,
S (231) (232) h , ,
(235) .
, . h (t|x) =
r (x), , . S(t|x) =
er (x)t

N
Y
Y
Y
PN
h (ti |xi )
p(T |) =
S (ti |xi ) =
r (xi ) e i=1 r (xi )ti .
i|i =0

i=1

i|i =0

, , rw,b (x) = ewx+b , c p0 (w) = e(w) , ,


. p(T, w, b)
:
ln p(T, w, b) = (w) +

N
X

(i 1)(wxi + b) + ti ewxi +b min .


w,b

i=1

(236)

( 2.2.2, (86) ), (w, b),


, . ,
(w, b) , t
x
S(t|x) = ete

wx+b

( ) .
, . .
171

B.4


, (233). , T xi .
, .
, T n t1 , . . . , tn , 1930- . : F (t) - Fn (t) = n1 #{i|ti <
t}. ,
( ) < t .
. ( [45, 23], ) , Fn
n F 50 .
, S(t)
Sn (t) = n1 #{i|ti t}. ti . , :
=1
S(0)

S(t
+ )
S(t)
+ ) = S(t)

S(t
1

S(t)

(237)

#{i|t ti < t + }

= S(t)
1
. (238)
#{i|ti t}

(238) ,
t, - , .. ti
i.
t(1) < . . . <t(j) < . . . < t(m)
{ti , i = 1, . . . , n}, j = #{i|ti = t(j) } j = #{i|ti t(j) }.
S

Y
j

S(t) =
1
.
(239)
j
j|t(j) <t


T = ((t1 , 1 ), . . . , (tn , n ))
, . ,
[63] 51 , (237, 238).
, 1 0 ,
50

,
> 0

lim P {sup |Fn (t) F (t)| > } = 0 .

51

[63] . - , product limit ( ), [17] 1912


.

172

(!) ( ):

S(0)

+ ) =
S(t

#{i|i = 0 & t ti < t + }

S(t) 1
.
#{i|ti t}

(240)
(241)

, (241)
, (t, t + )
, (
S(t + ) = S(t)).
, ( = 1).
T t(1) < . . . <t(j) < . . . <
t(m) {ti |i = 0}, j =
#{i|i = 0, ti = t(j) } , ,
, j = #{i|ti t(j) } (
) t(j) .

Y
j

S(t) =
1
,
(242)
j
j|t(j) <t

(239) .
[63] (242) ( -), . (242)
.
12 T - (242) .
. .
. , . , ,
, (.. ),
, (234), (235) .
, ,
,
-, ,
.
S (ti , 1) S > (ti ), (ti , 0) S(ti ) S > (ti ). - ,
T ,
Y
Y
LT, (S) = P {T |S} =
(S(ti ) S > (ti ))
S > (ti ) .
(243)
i|i =0

i|i =1

S (
, )
173

, 1 . sj = S > (t(j) ) = S(t(j) + )


, j-
, s0 = 1. sj
S(t) LT, (S).
,

s0 = 1 t t(1)
sj
t(j) < t t(j+1) , j < m .
S(t) =
(244)

sm
t(m) < t
- (243)

j
j j
m
m
Y
Y
sj
sj

, (245)
LT, (S) =
(sj1 sj )j sj j j j+1 =
1
sj1
sj1
j=1
j=1
m+1 = 0.
sj
sj1
.
s
k m,

sj
sj1

j j
j

(),

k
Y
j
sk =
1
.
j
j=1

(246)

( ) (244, 246), -
(242). , ,
, sk .
. sm > 0, ..
. t > t(m) S(t)
sm - (243) .
.
,
-, ,
: , . ,
, P {0 t < t(1) }
0.

B.5

, y x . ,
p(y|x) - (, ,
F (y|x), S(y|x)
h(y|x)) (. . . )
, . 52 ,
52 y, B.2, (x, y).
x.

174

!!!!

,
. , , .. T
X, , (. (235))
Y
Y
p(T |X) =
S(ti |xi )
p(ti |xi )
i|i =1

n
Y
i=1

i|i =0

S(ti |xi )

exp

h(ti |xi )

i|i =0
n Z
X
i=1

ti

h( |xi )d
0

h(ti |xi )

i|i =0

( ). , .
[30] ( ) ( ),
, . ,
, .
B.5.1

( , proportional hazard model ) , h(t|x)


h(t|x) = q(t)r(x) .

(247)

(247) : q , r .

S(t|x) = (Q(t))r(x) ,

(248)

Q(t) ,
q(t):
Rt
Q(t) = e 0 q( )d .

.
. [30] - . p(=t)
0
h(t) = Pp(=t)
{t} ( (230)) h (t) = P {>t}
, (247):
h0 (t|x) = q 0 (t)r(x) .

(249)

(247) (249)
, (249)
h(t|x)
q(t)
=
r(x) .
1 h(t|x)
1 q(t)

175

(250)

B.5.2

,
r, q, -.
( ,
, )
(partial likelihood ): () ,
() , . ,
, , ,
. ,
12, ,
.
. 173 t(j) , j j
Nj ={i|i = 0, ti = t(j) } Rj ={i|ti t(j) } , t(j) , , R0 = {1, . . . , n}.
, j = #Nj j = #Rj . X = (x1 , . . . , xn )
, XA
A {1, . . . , n}, Y = ((t(1) , 1 ), . . . , (t(m) , m ))
T . X Y T
.

p0 f ( p(t|x),
S(t|x), h(t|x) )
p0 (T |X, Y, f ) =
=

m
Y
j=1
m
Y
j=1

m
Y

p(Nj |Rj , X, Y, f )
p(Nj |Rj , XRj , t(j) , j , f )
Y
iNj

p(ti |xi )
Y

j=1

m
Y

iNj

p(ti |xi )

M Rj | #M =j iM

S(ti |xi )

iRj \Nj

S(ti |xi )

iRj \M

h(ti |xi )
Y

j=1

h(ti |xi )

M Rj | #M =j iM

.. .
(247) q(yi ) ,
Y
r(xi )
m
Y
iNj
0
X
Y
p (T |X, Y, r) =
.
(251)
r(xi )
j=1
M Rj | #M =j iM

, 176

(251)
Y
r(xj )
X
p0 (T |X, Y, r) =
.
r(xi )
j|j =0

(252)

i|ti tj

,
(252), , (251).
B.5.3


( ) r , , . 53
r(x) = ewx

(253)

, , .
Y
ewxj
X
p0 (T |X, Y, w) =
(254)
ewxi
j|j =0
i|ti tj

( ) , w. , , . , ,

X
X
ln
ewxi wxj min
L(w|T ) = (w) +
(255)
j|j =0

i|ti tj

(w) = log p(w) w, L(w|T ) .


, (255)
() .
. (255) w, .
. ,

X
X
wx
ln
e i wxj
j|j =0

i|ti tj

w .
, , ..
(w) = kwk1 , , . [109].
53

x,
, , .
- (247)
r q .

177

!!!!

B.5.4

, (247) r(x)
- , q(t) , ,
, ( , ).
- (
12), .
12 Q
t(1) < . . . < t(m) ,
, . Q0 = Q(0) = 1, Qj = Q(t(j) + ). ,
( ),

m
Y
Y
Y

P (T |r, Q) =
S(t(j) |xi ) S(t(j) + |xi )
S(t(j) + |xi )
j=1

m
Y
j=1

m
Y
j=1

iNj

iRj \(Rj+1 Nj )

(Qj1 )r(xi ) (Qj )r(xi )

iNj

iNj

Qj
Qj1

r(xi )

(Qj )r(xi )

iRj \(Rj+1 Nj )

Y
iRj \Nj

Qj
Qj1

r(xi )

qj =
Qj
Qj1 [0, 1] (j-) :
Y
Y
r(x )
r(x )
1 qj i
qj i max .
(256)
iNj

iRj \Nj

qj [0,1]

qj qj = 0,
(256),
X r(xi )
X
=
r(xi ) .
(257)
r(xi )
iNj 1 qj
iRj
. , (257)
(256).
(257), , ,
qj ,
. , m qj , Qj
Qj = k=1 qk Q.
, , .. Nj , Nj = {ij }, (257) :

! 1
r (xi )
r xij
j
qj = 1 P
,
(258)
r(x
)
i
iRj
Q :

r(x1 )
i

Q(t) =

Y
i|i =0 &

r(xi )

1 X

r(xk )
ti <t
k|tk ti

178

(259)

!!!!

(258) (259) :

!
r xij
1
1
ln 1 P
P
r(x
)
r xij
i
iRj
iRj r(xi ) ,
e
qj = e
(260)

P
r xij iRj r(xi ), .. ,

X
1
,
P
Q(t) exp
(261)
k|tk ti r(xk )
i|i =0 & ti <t

.
B.5.5

, . (247, 253) ( B.5.3), ..


w, S(t|x)
x (248)
(253)
( B.5.4). ,
(261)

X
1
,
P
S(t|x) exp ewx
wxk
k|tk ti e
i|i =0 & ti <t

(259)
ew(xxi )

S(t|x) =

Y
i|i =0 &

1 X w(x x )
i
k

e
ti <t

k|tk ti

. (
w) .
, ,
Q , , .
. ,
(, ).
,
.
(, k),
.

( ,
, , .) , X , , ..
179

.
, (. 5) ,
,
, .

,
, . ,
, , ,
-;

, - ( , ..)
(, , , .. );
,
;

, ;

, ,
, ;
,
(, , ..) .
,
- : , - , . (
) D.
X [m,n] (xm , . . . , xn ),
Y [m,n] (ym , . . . , yn ) .. , 1.

X = (X1 , . . . , XN ) = (X1[1,n1 ] , . . . , XN [1,nN ] ), ,

Y. j- i-

xij = Xi[1,ni ] j .

C.1

, ( 1.1.4)
T = ((x1 , y1 ), . . . , (xN , yN )) , xi yi
x, ,
y, p(y|x). -
180

. 1.3 ,
, ,
.
,
, . ,
. X [1,k]
xk+1 p(xk+1 |x1 , . . . , xk ). xi
(
) () , . , xi = 1 , i- ( , )
, xi = 0 .
B .
X = (X1[1,k1 ] , . . . , XN [1,kN ] )
Y = (Y1[1,k1 ] , . . . , YN [1,kN ] ).
X [1,k] Y [1,k] . , (xi , yi )
,
. , ,
, xi - i- , yi .
B
. , ,
. , , ,
, , ,
ffi .
. X = (X1[1,k1 ] , . . . , XN [1,kN ] ) Y = (y1 , . . . , yN ), , .
X [1,k] (..
). , ( ),
, , .

, .
.
, , .. , .

181

X [1,k] X [1,k1 ] , X [k1 +1,k2 ] , . . . , X [kl1 +1,k] Y [1,l] . ,


.
.
- .

,
, ,
. ,
( , , ) ,
, . ,
.
() ( ), ,
, .
C.3 , ,
, .
C.4, , ,
. ,
.

C.2

X
pk (x1 , . . . , xk ), xi X , k = 1, 2, . . .,

pk x1 , . . . , xl , X , . . . , X = pl (x1 , . . . , xl ) l < k .
| {z }
kl

X ,
.
- .
, ,
pk (x1 , . . . , xk )
pk (x1 , . . . , xk ) = pk|k1 (xk |x1 , . . . , xk1 )pk1 (x1 , . . . , xk1 )
...
= pk|k1 (xk |x1 , . . . , xk1 ) . . . p2|1 (x2 |x1 )p1 (x1 ) .

(262)

, k! k . ,
182
















. 6: () .

. 7: .

, (262)
( ).
, . , ,
.
,
pk|k1 (xk |x1 , . . . , xk1 ) (262)
, m , , k, ..
pk|k1 (xk |x1 , . . . , xk1 ) = pk,m (xk |xkm , . . . , xk1 ). ( , ) m.
pk,m k, .
, () m
X (, )
1 X k m ( ),
.
: p1 (x1 ) p,1 (xi+1 |xi ). (262)
:
pk (x1 , . . . , xk ) = p,1 (xk |xk1 ) . . . p,1 (x2 |x1 )p1 (x1 ) .
(263)
X , (: ), , - .

a, b, c, d, . . . (, x1 , . . . , xk , . . .) p(| ) ( - , )
54 . ,
, p(a|b, c, . . .) b, c, . . . a. , . 6,
. 7. , .
54 , , , -
, .

183

!!!!

!!!!

  

   






  

   








. 8: : xi () yi . . .

. 9: . . . yi xi .

, . , (xi , yi ), .. ( ,
), (. . 182)
( , ).
,
, .

p(xi , yi ) = p(yi )p(xi |yi ) p(xi , yi ) = p(xi )p(yi |xi ), , . 8 9. ,
. 8 , , ( ,
).
(. 1.2.3); , ( 1.2.5) ( 2.2.1).
. 9 () ; , ,
( 2.2.2).
,
, , , .
,
(. 182), . 55 p(x1 , y1 , . . . , xk , yk )
p(x1 , y1 , . . . , xk , yk ) = p(xk |x1 , y1 , . . . , xk1 , yk1 , yk )p(yk |x1 , y1 , . . . , xk1 , yk1 )
p(x1 , y1 . . . , xk1 , yk1 )
...
55

p ,
. !

184

   


   




. 10: (
, ) xi (, ) yi .

= p(xk |x1 , y1 , . . . , xk1 , yk1 , yk )p(yk |x1 , y1 , . . . , xk1 , yk1 )


p(x2 |x1 , y1 , y2 )p(y2 |x1 , y1 )p(x1 |y1 )p(y1 ) .
(264)
, p(xi | . . .) i- ( , ,
) yi ( ),
..
p(xi |x1 , y1 , . . . , xi1 , yi1 , yi ) = p(xi |yi ) ,
p(yi | . . .) i- (, ), ..
p(yi |x1 , y1 , . . . , xi1 , yi1 ) = p(yi |y1 , . . . , yi1 ) .
, (263) ,
p(yi | . . .) yi1 :
p(yi |y1 , . . . , yi1 ) = p(yi |yi1 ) .
(HMM , hidden
Markov model ) . 10. xi
, yi
. , , .. p(xi |yi ) p(yi |yi1 )
() i, (263), (
, : p(xi |yi ), p(yi |yi1 ) p(y1 )), :
p(x1 , y1 , . . . , xk , yk ) = p(xk |yk )p(yk |yk1 ) p(x2 |y2 )p(y2 |y1 )p(x1 |y1 )p(y1 ) . (265)
. ,
, .

C.3

(HMM, Hidden Markov


Model)

,
. ,
185

. 181181 , , .
-.
C.3.1

,
p, (HMM) (265),
. 10, () ,
, Y = {s1 , . . . , sn }
X = {o1 , . . . , om }. n m
:
n- - : i = P {y1 = si },
n n- (transition matrix ) A: aij = P {yl+1 = sj |yl = si }

n m- (emission matrix ) B: bij = P {xl = oj |yl = si },


( ):
1. ((n 1) + n(n 1) + n(m 1))-
. W= (, A, B)
. , , .
( ) . : n ( ,
, . 181, ) , ,
m (
). ,
. , A B, ..
. , ..
.
HMM
A, , , HMM.
, si , i- j- ,
aij . . 11 HMM ,
9 .
. 12 HMM , : ,

186


























. 11: () : .

. 12: () :
.









































. 13: LR- .
1.
/ . -
P
a
=
1
LR- ij
j
.

(. 6) (. 7) . ,
.

.
:
. , ,
(. 181).
, ,
, , ,
,
( ) , .. . LR- (left-right model ). -
. 13. , LR- : = (1, 0, . . . , 0).
LR-
. , ( !)
, .. ,
187

, .
LR- ( !)
, LR-,
. . LR- , .
(. . 181)
,
.
. 192
.
HMM
W = (, A, B) , , .
, Y [1,L] X [1,L]
P {X [1,L] |Y [1,L] } =

L
Y

byl xl ,

(266)

l=1

Y [1,L]
P {Y [1,L] } = y1

L1
Y

ayl yl+1

(267)

l=1

( (265)). X [1,L] , Y [1,L] ,


P {X [1,L] , Y [1,L] } = P {Y [1,L] }P {X [1,L] |Y [1,L] } = y1

L1
Y

ayl yl+1

l=1

L
Y

byl xl

(268)

l=1

2L 1 , X [1,L] , ,

!
L1
L
X
X
Y
Y
P {X [1,L] } =
P {X [1,L] , Y [1,L] } =
y1
ayl yl+1
byl xl . (269)
y1 ,...,yL

Y [1,L]

l=1

l=1

:
2LnL . 10

100.
- (forward-backward
algorithm). X = X [1,L] l (i, X)
X
l (i, X) = P {X [1,l] , yl = i} =
P {X [1,l] , Y [1,l] } .
(270)
Y [1,l] |yl =i

l:
1 (i, X)

= P {x1 , y1 = i}
188

Forwardbackward
:

l+1 (i, X)

= i bix1
n
X
=
P {X [1,l] , yl = j}P {xl+1 , yl+1 = i|X [1,l] , yl = j}
j=1

n
X

l (j, X)aji bixl+1

1 l < L ,

j=1

P {X} =

n
X

L (i, X) .

(271)

i=1

P {X}, l (i, X) 1 l L 1 i n
3Ln2 . , n2 .
n0 , P {X} 3Ln0 .
n , ,
m0 , (.. )
, .. n0 m0 n. m0 , m0 m ( ).
,
.
. X = X [1,L]
l (i, X)
l (i, X) = P {X [l+1,L] |yl = i}
(272)

Forwardbackward
:

1 l L 1 i n 3Ln0 :
L (i, X) =
l (i, X) =

1
n
X

aij bjxl+1 l+1 (j, X) l = L 1, . . . , 1 .

j=1

!!!!

1 l L
n
X

l (i, X)l (i, X) = P {X} .

i=1

. l (i, X) l (i, j, X),


l (i, X)l (i, X)
P {X}
l (i, X)aij bjxl+1 l+1 (j, X)
= j|X} =
?
P {X}

l (i, X) = P {yl = i|X} =


l (i, j, X) = P {yl = i, yl+1

(273)
(274)

: X
L HMM n n0 (n n0 n2 ) O(Ln0 ) Ln
l (i, X) l (i, X),
X ( ) (271)
.
,
100 ,
189

!!!!

!!!!

P {X} ,
. l (i, X) l (i, X) - , .. , , . l (i, X), l (i, j, X)
.. . , ( )
.
l (i, X) = P {y1 = i, X [1,l] } l l0 (i, X)= P {y1 = i|X [1,l] }, l (i, X)=
P {y1 = i, xl |X [1,l1] } l (X)= P {X [1,l] |X [1,l1] }:

Forwardbackward

1 (i, X) ( = P {y1 = i, x1 }) = i bix1


n
X
1 (X) ( = P {x1 }) =
1 (i, X)
i=1

1 (i, X)
1 (X)
n
X
l+1 (i, X) =
l0 (i, X)aij bjxl+1
10 (i, X) =

j=1

l+1 (X) =
0
l+1
(i, X) =

n
X

l+1 (i, X)

1 l < L .

i=1
l+1 (i, X)

l+1 (X)

l (i, X) = P {X [l+1,L] |yl = i}


P {X [l+1,L] |yl = i}
l0 (i, X)=
:
P {X [l+1,L] |X [1,l] }
L0 (i, X)

l0 (i, X)

1
Pn
j=1

0
aij bjxl+1 l+1
(j, X)

l (X)

l = L 1, . . . , 1 .


. l0 (i, X),
l0 (i, X) l (X):
= l0 (i, X)l0 (i, X)
0
l0 (i, X)aij bjxl+1 l+1
(j, X)
l (i, j, X) =
l+1 (X)
Pn Pn
l0 (i, X)aij bjk
i=1
Pj=1
l (k, X) =
.
n
0
i=1 l (i, X)
l (i, X)

P {X} :
ln P {X} =

L
X

ln l (X) .

(275)

l=1

. .
HMM
P {X} , ( ) W.

190

!!!!

HMM

37
ln P {X}
i

ln P {X}
aij

ln P {X}
bik

1 (i, X)
i
PL1
l=1 l (i, j, X)
aij
P
l:xl =k l (i, X)
bik

(276)
(277)
(278)

, , .

!!!!

HMM
, ( )
. ,
X = X [1,L]

HMM

P {xL+1 = k|X} =
=
=

n X
n
X
i=1 j=1
n X
n
X
i=1 j=1
n X
n
X

P {yL = i, yL+1 = j, xL+1 = k|X}


P {yL = i|X}P {yL+1 = j|yL = i}P {xL+1 = k|yL+1 = j}
L (i, X)aij bjk .

i=1 j=1

,
(271) (275)
. ,
, ,
. , ,
( , ),
(., , 2.3.1).
X = X [1,L]
Y = Y [1,L] , (.
. 181). X Y ;
, , HMM . (268)
(nL (268)), ,
[118]. , (271)
(269).
, (270), l (i, X)
l (i, X) =

max

Y [1,l] |yl =i

P {X [1,l] , Y [1,l] }

HMM

(279)

l (i, X) l i ,

(l, i) l > 1 yl1


=l (i, X)
Y , (279).
( ):
191

HMM

1 (i, X) ( =

l+1 (i, X)
=
=

P {x1 , y1 = i}

= i bix1

max P {X [1,l] , yl = j}P {xl+1 , yl+1 = i|X [1,l] , yl = j}

j|1jn

bixl max l (j, X)aji


j|1jn

l+1 (i, X) = arg max l (j, X)aji


j|1jn

1 l < L
1 l < L .

P (X) X = X [1,L] Y = Y [1,L] Y :


P (X) =

yL
(X) =

yl (X [1,L] ) =

max L (j, X)

(280)

j|1jn

arg max L (j, X)


j|1jn

l+1 (yl+1
(X [1,L] ), X [1,L] ) l = L 1, . . . , 1 .

P {X [1,L] }
(271), 3Ln0 .
, - ,
l l (i, X) , l (i, X) P (X) ln l (i, X)
ln P (X), :
ln 1 (i, X) = ln i + ln bix1
ln l+1 (i, X) = ln bixl + max (ln l (j, X) + ln aji )
j|1jn

ln P (X) =

1 l < L

max ln L (j, X) ,

j|1jn

.
HMM
W,

. ,
, HMM .

, .
(. 181).
X, ,
X = (X1 , . . . , XN ) .
56 , .. .

, .
56

, .

192

,
P {W|X}
P {X|W} max .

(281)

(. 181). X = (X1 , . . . , XN ) (, ) Y = (y1 , . . . , yN ) (, ), 1 q (,


). q
W1 , . . . , Wq , . , P 1 , . . . , P q ,
, . X q-

!
1
1
q
q
1

P
P
{X|W
}
P
P
{X|W
}
P (X, W), . . . , P q (X, W) = Pq
, . . . , Pq
j
j
j
j
j=1 P P {X|W }
j=1 P P {X|W }
(P {y = 1|X}, . . . , P {y = q|X}). , , (X, Y) - E (. 1.2.6, (46))
q
N X
X
P j (Xi , W)E(j, yi ) min ,
(282)
W

i=1 j=1

,
N
X

(1 P yi (Xi , W)) min ,


W

i=1


P {Y|X, W} =

N
Y

P yi (Xi , W) max ,

(283)

i=1

(. 1.2.8)
ln P {Y|X, W} =

N
X

ln P yi (Xi , W) min .
W

i=1

(284)

.
(. 181).
X = (X1 , . . . , XN ) (,
) Y = (Y1 , . . . , YN ) (, ). ,
, 57
P {X, Y|W} =

N
Y

P {Xi , Yi |W} max ,


W

i=1
57

(285)

, (
) . : .

193

P {Xi , Yi |W} ( (268)),


P {Xi |W} .
(. 181). ,
X = (X1 , . . . , XN ) (, ) Y = (Y1 , . . . , YN ) (, - ) q W1 , . . . , Wq ,
().
q ,

.
- , , , ,
, ,
( q ). ( q ) . q (j-)
Xj = {Xi |Yi = j} (..
j- ), P (280)
:
Y
P {Xj |Wj } =
P {Xi |Wj } max
.
(286)
j
W

i:Yi =j


(281)(286).
HMM (281)
, ( 3.2.4), (276)(278) . , ,
, , RBF- 3.3.1. , ,
1960- , , [7] [8], (. [120]). -,
RBF- ,
( A.3.2). , , ,
.
,
. X N
Xi = Xi[1,Li ] Li W(0)
(, ) P {X|W}
, , E{X|W} = ln P {X|W}. (269)
,
P {X|W}

N
Y
i=1

P {Xi |W} =

N X
Y
i=1 Y [1,Li ]

194

P {Xi[1,Li ] , Y [1,Li ] |W}

HMM

N
Y

y1

i=1 y1 ,...,yLi

LY
i 1

ayl yl+1

l=1

Li
Y

!
byl xil

l=1

PN
.. 2 i=1 Li n + n2 + mn (
- )
n
X

j = 1,

j=1

n
X

ajk = 1

k=1

m
X

(287)

bjk = 1.

k=1

, . RBF- (. 88)
.
N
X

P {Xi |W}
P {Xi |W(0) }
i=1

N
X
X P {Xi , Y [1,Li ] |W}

=
ln
P {Xi |W(0) }
i=1
Y [1,Li ]

N
X
X
P
{X
,
Y
|W}
i
[1,Li ]

=
ln
P {Y [1,Li ] |Xi , W(0) }
(0) }P {X |W(0) }
P
{Y
|X
,
W
i
i
[1,L
]
i
i=1
Y [1,Li ]

N
X
X
P
{X
,
Y
|W}
i
[1,Li ]

=
ln
P {Y [1,Li ] |Xi , W(0) }
P
{X
,
Y
|W(0) }
i
[1,L
i]
i=1
Y

E(X, W) E(X, W(0) ) =

ln

[1,Li ]

N
X
X

P {Y [1,Li ] |Xi , W

(0)

} ln

i=1 Y [1,Li ]

N
X
X

P {Xi , Y [1,Li ] |W}


P {Xi , Y [1,Li ] |W(0) }

P {Y [1,Li ] |Xi , W(0) } ln P {Xi , Y [1,Li ] |W}

i=1 Y [1,Li ]

N
X
X

P {Y [1,Li ] |Xi , W(0) } ln P {Xi , Y [1,Li ] |W(0) }

i=1 Y [1,Li ]


ln() , P {Y [1,Li ] |Xi , W(0) } , .. Y Li 1.

Q(X, W(0) , W) =

N
X
X

P {Y [1,Li ] |Xi , W(0) } ln P {Xi , Y [1,Li ] |W} .

i=1 Y [1,Li ]

,
E(X, W) Q(X, W(0) , W) Q(X, W(0) , W(0) ) + E(X, W(0) ) .
E(X, W) W(0) .
E(X, W) W,
195

Q(X, W(0) , W) < Q(X, W(0) , W(0) ). Q(X, W(0) , ) .


, RBF- (. 89) .
( )
, .

!!!!

38
Q(X, W(0) , ) W
N

ajk

bjk

1 X
1 (j, Xi )
N i=1
PN PLi 1
i=1
l=1 l (j, k, Xi )
PN PLi 1
i=1
l=1 l (j, Xi )
PN P
i=1
l|xil =k l (j, Xi )
.
PN PLi
i=1
l=1 l (j, Xi )

(288)
(289)
(290)

l (, ) l (, , ) (270), (272), (273) (274)


W(0) . - j (289) (290) 0,
0 Q(X, W(0) , (, A, B)) j- A (,
, B), .. W . (0)
(0)
ajk = ajk (, bjk = bjk )
k.
W 6= W(0) , Q(X, W(0) , W ) < Q(X, W(0) , W(0) ).
- (288)(290),
. .
. :
58 :
j
ajk
bjk

= j-
j- k-
=
j-
(j- , k- )
=
j-

W(0) X. 0, .. , ,

.
58 [120] ,
HMM - , , , , .

196


(287).
A B : -
, .
PN
C(n0 + mn) i=1 Li
, C , . . 189. ..

.
A
, . B
, ,
.
-
P {X|W},
. , :
.
HMM
, (, (282) (284)) P j (Xi , W) P {X|Wj }, 37,

j
ln P {X|Wk }
P j (Xi , W)
j ln P {X|W }
j
k
=
P
(X
,
W)

P
(X
,
W)
i
i
k
Wk
Wj
Wk

!!!!

HMM

(287) q .
- [46], , .
- (288)(290) (276)(278),
39 - 59
j
ajk
bjk

PN

P {Xi }
j
Pn
PN P {Xi }
j=1 j
i=1
j

P {Xi }
ajk
Pn
PN P {Xi }
k=1 ajk
i=1 ajk

P {Xi }
bjk
Pn
PN P {Xi }
k=1 bjk
i=1 bjk

i=1

ajk

bjk

PN

i=1

PN

i=1

59

, , [7]

197

-, [46] W R(X, W) X
HMM W, :

PN
i ,W)
j i=1 R(X
+C

j = P
PN R(Xi ,W)
n
+C
j=1 j
i=1
j

PN
i ,W)
ajk i=1 R(X
+
C
ajk

ajk = P
PN R(Xi ,W)
n
+
C
a
k=1 jk
i=1
ajk

PN R(Xi ,W)
bjk i=1
+
C
b
jk
,
bjk = P
PN R(Xi ,W)
n
b
+
C
jk
k=1
i=1
bjk
C > 0. [46] , W R ( , (283) (282) ) , R,
C (!) .
C, ,
, ,
, , , .
[95], ,

-,
.
HMM . , .

HMM

40 (285)

PN yi1
i=1 j
j =
N
PN PLi 1 yil yi(l+1)
i=1
l=1 j k

ajk
=
PN PLi 1 yil
i=1
l=1 j
PN P
yil xil
i=1
l|xil =k j k

bjk
=
.
PN PLi yil
i=1
l=1 j
. (285) ,
2n + 1 , . . 2.
HMM (286)
. -, , ,
X N
Y N , 40. . ,
198

HMM

( ) , , , .
(286) .
, -
, k-means (. 36)
(. 86), segmental k-means, .
[93] [62], k, - .
C.3.2

.
. .
.
- , ,
, -.
, , .

.
.
. , HMM
, . 182 .
,
. .
,
, HMM
-.
X x1 , . . . , xL ( ,
Rd )
p(xl |x1 , . . . , xl1 , xl+1 , . . . , sj1 , . . . , sjl . . .) = p(xl |sjl ) = bjl (xl ),
l, j1 , . . . , jl1 , jl+1 , . . .. , bj (x), - m- , .. bj bj1 , . . . , bjm . B = (bij ) , .
. bj (127)
bj (x) =

m
X
k=1

Pjk pjk (x) =

m
X

Pjk

k=1

d e

(2sjk ) 2

kxjk k2
2sjk


bixl bi (xl ) l (i, X), X =
199

{X1 , . . . , XN } EM (-) .
l (i, X) l (i, X) . 189 bixl bi (xl ), l (i, X), l (i, j, X) ..
.
A (288) (289), ,
bj (. (136)(138))

Pjk

PN PLi
i=1

i=1

jk

i=1

l=1

PN PLi
1
d

l (j, k, Xi )

l=1

PN PLi
i=1

sjk

l=1

PN PLi

l (j, k, Xi )xil

l=1

PN PLi
i=1

l (j, Xi )
l (j, k, Xi )

2
l=1 l (j, k, Xi )kxil jk k
,
PN PLi
i=1
l=1 l (j, k, Xi )

Pjk pjk (xl )


l (j, k, X) = l (j, X) Pm
.
k=1 Pjk pjk (xl )

, RBF-, (. 86) A.3.3 38


.
(139).
. HMM, . ( )

.
. (, ), ,
( ) . , (, )
.
,
. , i-
aii . k akii (1 aii ), .. k,
. , i-
- , , .
. ,
, (. 182) , , ,
. (.. ) .

() (). .
200

!!!!







 






 



  


 



. 14: :
xi , yi si .









 


  

 





. 15: Input Output HMM:


xj , yj
ij .

.
- .
. ,
, , , - . ,
HMM, , . 14. si , , i-
, , yi ,
si .
.
, ,
, , .
, ,
, , .
, . 15, 201

[9] IOHMM (Input Output HMM ).


:
(, ) , - -
.
. , , . P {X [1,L] }
(269) , L ,
X, . , (. 181),
.
HMM
. y Y
x X , y c 1 x y , x 0. HMM
x ,
x ; .
y 1, 1 . ,

X [1,L] P {(Xx )[1,L+1] }.
HMM , X Y
P,
A k ajk 1.
, , ,
y ,
(aj = 0 j), (bk = 0 k).
HMM ,
. 13.
. X [1,L] .
. :
60 . ,
(. 181) , ,
.
, . 10.

,
( ) HMM.
, P {X [1,L] |Y [1,M ] } M > L (266) ,
(269).
60

202

!!!!

, ,
B b0 , bj0 j- .
B0 = b0 B.
,
bj0 < 1, .. . J diag(b0 ).
( , ) (I J)1 B , JA . , j, l
k l ,
, k, ,
l
X Y

hljk = P {yl+1 = k, X = |y1 = j} =

l
byi 0 ayi yi+1 = (JA)

y2 ,...,yl i=1

jk

(l , ).
h0jk = kj . , j, - k
k
hjk =

hljk =

(JA)

l=0

l=0

jk

1
= (I JA)

jk

( I JA (!) A 1
bj0 < 1). H = (I JA) .

Y = (y1 , . . . , yL ) = (yi1 , . . . , yiL )


- Y = Y [1,M ] , X = X [1,L] (.. Y Y
), iL = M , ( ). P {X, Y , Y }

!
n
L1

X
Y
i
i
+1
l
0
y10 hyi101
ayil yl+1
hyl+1
byil+1 xl+1
.
0
yi b yi 1 x 1
yi
1

0 =1
y10 ,...,yL

P {X, Y } =

l+1

l=1

l+1

P {X, Y , Y } -

Y ,
, Y X,
!

L1
n

Y
X

0
0

y10 hy10 y1 by1 x1


ayl yl+1
hyl+1
byl+1
P {X, Y } =
xl+1
yl+1
0 =1
y10 ,...,yL

l=1

= (H)y1 by1 x1

L1
Y

(AH)yl yl+1
byl+1
xl+1

l=1

, X
!

L
L1
X
X
Y
Y

P {X} =
P {X, Y } =
(H)y1
(AH)yl yl+1
byl xl
Y

y1 ,...,yL

y1 ,...,yL

(H(I J))y1

l=1
L1
Y

(AH(I J))yl yl+1

l=1

l=1
L
Y
l=1

203

(I J)1 B y x
l

!
l

!!!!






























































































































. 16: .
, , .

(. (269)). , , l (i, X) ( (270)),


l (i, X) ( (272))
:

j = (H(I J))j = (I JA)1 (I J) j j ,

ajk = (AH(I J))jk = A(I JA)1 (I J) jk ajk

bjk = (I J)1 B jk bjk .


. - , H(I J) , (j ) , (ajk ),
(bjk ) , . , ,
A .
, , , .
HMM , ,
1,
. , , (
),
.

204

!!!!


HMM.
. , . 16 ( ).
- (
), (), () , ,
, (), .
HMM .
.

, ,
(, ).
, . [94] [10],
.

C.4

,
, , , , ( 2.3.1), (
2.3.2) . ,
, , , (, ,
, .. , ).
.
, ,
. ,
. ,
, , V I @ G R a ,
, , , A @, I 1 ..,
, ,
.
, .
C.4.1

X [1,L]
X L. ,
.
,
( .).
205

k , X k , X [1,L] L k L k + 1 X [l,l+k] X k ,
1 l L k + 1. f X k , , , .. L k + 1 .
, , .. X ,
, ,
AUG ..
Rn .
k , n ,
.
, . ,
.., .
, . , .
.
, ,
. , (.. ) W,
W Rn (, ) W .
,
. X UX Rn , Fisher score ( ):

ln P {X|W}
UX = UX (W , W) =
.

W
W=W
X 7 UX - , .. n , .
UX UX (W , W) ,
W
W .
,
W W, , P {X|W}
X.
[61] X 7 UX
. ,
HMM (. 193), HMM, yi .
UXi Xi , (UXi , yi )
Rn , , SVM.
[61] ,
, HMM. , , ,
,
.

206

C.4.2

Rn , , ( ) . , .
,
(edit distance).
.. [73] ,
.
:
- ;
- ;
- .
, , ,
. . ,
.
. ,
.

.
d(, ) g. :


!!!!

x x0 d(x, x0 ),
x g(x),
x g(x)
(!)
, .
D
D(, ) = 0
D(Xu, ) = D(, Xu) = D(X, ) + g(u)
D(Xu, Y v) = min(D(X, Y ) + d(u, v), D(Xu, Y ) + g(v), D(X, Y v) + g(u))

(291)
(292)
(293)
,

X Y , u v ,
, , Xu,
. D(X, Y ) .
. , D(X [1,m] , Y [1,n] )
mn m + n.
.
-
. ,
, , , , ,
207

!!!!

!!!!

. d0 > 0 (!) min(D(X, Y ), d0 ), !!!!


, , !!!!
d0 (m + n) (!).
C.4.3

Rn , ,
, ( , ) . , X 7 UX
61 K(X, X 0 ) = (UX , UX 0 ).
, k Rn K(X, X 0 ) = k(UX , UX 0 ), . (. [61]) :
,
.

( ),
.
41
p X
X X X p(x, x0 ) =
p(x)p(x0 ) .
p X Y
X X -
y Y X
p(x, x0 ) = Ey p(x, x0 |y) = Ey p(x|y)p(x0 |y)
.
. 13 1, ,
.
41 -
, HMM.
xi , yi si ,
. 14, si - , xi
yi , . xi yi
X ,
, .. p(xi , yi |si ) = pi (xi |si )pi (yi |si ). HMM ,
61 I 1 (U , U 0 ) ( ,
X
X
- UX 0 > I 1 UX ), I .
, , W W, . : I = EX (UX UX > ),
W.
W, .
,
, .

208

xi yi B.
. C.3.2 , s , c
1 .
s N
N
! N
!
N
Y
Y
Y
(bsl xl bsl yl ) =
bsl xl
bsl yl = P {X|s}P {Y |s} ,
P {X, Y |s} =
l=1

l=1

l=1

41 ,
,
X
Prepl {X [1,L] , Y [1,M ] } =
P {X [1,L] , Y [1,M ] |S [1,N ] }P {S [1,N ] }
(294)
S [1,N ]

s1 ,...,sL

s1

L1
Y

asl sl+1

l=1

L
Y

!
(bsl xl bsl yl ) asL s

L 6= M
L = M

l=1

(. (269)) .
( ) , ,
(294) 62 ( 4.1.4) :
, (
1), . ,
, (294), , 1000000 999999
, .. :
,
.
, HMM 1 (. 204), ,
X, Y . , P {}
HMM, Pins {, } HMM,
, Pins {X, Y } = P {X}P {Y }, .. X Y
41 Pins .
, HMM
,
, 1
m
,
1 ,
P 1 {X [1,L] } =

L
m

(1 )

62 (294) .

209

L+M

(1 )2 .
m
,
.
, HMM, (294), HMM (
, ).
, HMM , S2 ,
, , S1 . X Y
s = (s1 , . . . , sN ) S2 , - S1 . HMM :
P {X, Y |s} = P {X|s}P {Y |s} , 41 P {X, Y } .
, n (X) X
Pins {X [1,L] , Y [1,M ] } =

X [1,L] = x1 X [1 +1,2 1] . . . xn X [n +1,n+1 1]

(295)

n n , ,
( 1 = 1 n+1 = L + 1).
P {X, Y |s} =

N
Y

bsl xl bsl yl P {X [l +1,l+1 1] }P {Y [l +1,l+1 1] }

N (X) N (Y ) l=1

N
Y

bsl xl P {X [l +1,l+1 1] }

N (X) l=1

N
Y

bsl yl P {Y [l +1,l+1 1] }

N (Y ) l=1

= P {X|s}P {Y |s} .
, , [119].
, , , ,
, .

,
. ,
, ,
, . .
. [104] ,
, , . ,
, , .

210

X k g. X X

L 6= M

0
L
Y
Krepl (X [1,L] , Y [1,M ] ) =
(296)
k(xl , yl ) L = M

l=1


L
! M
!
Y
Y
1
Kins (X [1,L] , Y [1,M ] ) =
g(xl )
g(yl )
l=1

L+M
1
Kins
(X [1,L] , Y [1,M ] ) .
M

2
Kins
(X [1,L] , Y [1,M ] ) =

(297)

l=1

(298)

1
, 13 14 , Krepl Kins
2
. , Kins
,

m+n (m+n)!
n = m!n! , .

7 m+n
.
n
m,n=0

. :
. , (
q) 1. Aql , 1 < l q

m+(l1)
(l 1)- l-. m+l

=
l
l1
(m1)+l
q
,

l-

l
l
. Aql (Aql1 Aql ) . . . (Aq2 . . . Aql )
(,
)
m+n
n
1
1
1
1
..
.

1 1
3 4
6 10
10 20
..
..
.
.
m
n
1
1
1
1
..
.

1
2
3
4
..
.

0
1
2
3
..
.

0
0
1
3
..
.

0
0
0
1
..
.

..
.

..
.

, .
63 .
63 7 , , 
K(m, n) = m+n
+ .
n
xm x
2
: + L ( + ), m m (x) =
e 2 . m!
Z

xn ex dx = n!.

211

,
k g. ,
.

1, . K
X X , K(, ) = 1,

(X ) ( !) . X = (X1 , . . . , XL )
X = (X1 , . . . , XL , , , . . .)

Y
K (X, Y ) =
K(Xl , Yl ) .
(299)
l=1

14 K .
H X X [1,L] X , , , .. (X [1,L] , ), {1, . . . , L}.

X (X ) , ,
, . . . , , . 14
i
i
((X, ), (Y, )) = Krepl ((X, ), (Y, )) Kins
((X, ), (Y, )) (300)
Kmark
i = 1, 2
.
L 2L
{1, . . . , L}.
L .
X
.
14
X X
i
K i (X [1,L] , Y [1,M ] ) =
Kmark
((X, ), (Y, )) i = 1, 2
(301)
L M

. .
, X k g , . , k
, g , , , ,
.
K 1 K 2 :
K i (, ) = 1 i = 1, 2
K (Xu, ) = K i (, Xu) = K i (X, )g(u) i = 1, 2
i

(302)
(303)

K 1 (Xu, Y v) = K 1 (X, Y )(k(u, v) g(u)g(v))


+K 1 (Xu, Y )g(u) + K 1 (X, Y v)g(u)

(304)

K 2 (Xu, Y v) = K 2 (X, Y )k(u, v) + K 2 (Xu, Y )g(v) + K 2 (X, Y v)g(u)

(305)

212

(. (291)(293)). (296) (297) (298) (299), (300) (301).


(302)(305) K 1 (X, Y ) K 2 (X, Y ), , ,
.
( , ..) . [51],
- [105]. 19972007 , ,
. , , ,
.., ()
(, ) . ( 108 ),
.

;

: !
..


, . ,
. , , , , - ,
, .

, , C . ,
, , ,
, , .

, , , .

D.1

, C.
X , xj j- ( ,
), Y yj - , .
. R(Y ) Y , YA Y A R(Y )
( , Y{j} = yj ).
213

, ,
.
. X = (X1 , . . . , XN )
Y = (Y1 , . . . , YN ).
-, , , .
i R(Yi ) R(Xi ) , i
. X
Y .
.
X = (X1 , . . . , XN ) ( -, ),
. ,
- , .
X
R(X). Xi
- Yi R(Yi ) = R(Xi ), Y .
(image labeling). X = (X1 , . . . , XN ) ( -, ), . , - , , ,
, , , .. X. , X Y , R(Y ) = R(X),
, , .
, R(Y ) ( -
), () , .
, , Xi
.
,
, (. 1.1.6)
X Y , R(Y ),
() . p (X, Y ) ( ),
p (Y |X) ( , . 1.2.3), .
, , . , ,
,
. ,
: , - ,

214

, , .
,
, .

D.2
D.2.1


(MRF, Markov Random Fields)

G S
Y = (y1 , . . . , yS ), Y.
(. . 18), ,
(. . 1721). , . v (
) N 0 (v), v
N (v)= {v} N 0 (v). ( )
p: p (yv ) v, p (Y ) = p (y1 , . . . , yS )
, p (YA )
A.
, , (MRF , Markov random field ),
v

p yv |YV \{v} = p yv |YN 0 (v) ,


(306)

, ..

. (306)

p YN 0 (v) p (Y ) = p YN (v) p YV \{v} .


(307)
42
: v v 0

p yv , yv0 |YV \{v,v0 } = p yv |YV \{v,v0 } p yv0 |YV \{v,v0 } .


(308)
. (307) S y1 , . . . , yS . yv0 .
.
MRF.
MRF
, (.. ). MRF. ,
, , MRF . :

p yv |YV \{v} p (Y );

,
MRF;
215

!!!!









 
 
  
 






. 17: ,




    


  
  

  
 
 

 
 
 

  
 

 
 
 
 

  
 
 

. 18: ,

. 19: ,

216

 






  


  
  
 
 


  
  
  
  


    
    
    
 





  


  
  


 

 


  



  
 
 
 





  



. 20: ,

. 21: .

217

MRF .
(306).
:
y Y

p yv , YV \{v}
p yv |YV \{v}

=
,
p y , YV \{v}
p y |YV \{v}

p (Y ) = p (y1 , . . . , yS ) = p y , . . . , y
| {z }
S

S
Y
j=1

p yj |y1 , . . . , yj1 , y , . . . , y
| {z }

Sj

p y |y1 , . . . , yj1 , y , . . . , y
| {z }
Sj

p y , . . . , y | {z }
S

, , .

, , , ,
.
. G , ,
p (y1 |y2 ) = p (y2 |y1 ) = yy12 .
p (y1 , y2 )
y1 = y2 .
, ,
42: , ,
(308), .
, ,
, .
43 p (Y ) ,
MRF.
. , (308) , , (306).
p (Y ) (308) (!)

p yv |YV \{v} = p yv |YV \{v,v0 } ,


(309)
(306) (309) v 0 N 0 (v) (!)

p yv |YV \{v} = p yv |YN 0 (v)\{v0 } .


v v 0 ,
(v, v 0 ) MRF .

218

!!!!

!!!!

. MRF , 0 1
: p (0, 0, 0, 0) = p (1, 1, 1, 1) = 21 .
(308),
, ,
(306). , (306)
.
, .
MRF .
, , ..
, . , , ,
. C(G) G. Y G
Y ,

Y
p (Y ) =
(310)
C (YC )

CF

F C(G) G.
, C #C Y-
: C (YC ) = p (YC ), (310)
. p (Y ) , C
. .
44
.
.

(310)

p (Y )
p yv |YV \{v} = R
p (Y ) dyv

, yv ,
Q

CF,C3v C (YC )

p yv |YV \{v} = R Q
.
CF,C3v C (YC ) dyv
, yv ( Y , ),
R

p (Y ) dYV \N (v)
p yv |YN 0 (v) = R
,
p (Y ) dYV \N (v) dyv
:

C (YC )

.
CF,C3v C (YC ) dyv

p yv |YN 0 (v) = R Q

CF,C3v

(311)

, - 0, .
:

219

!!!!

13 (J.M.Hammersley, P.Clifford, 1971 [47]). p (Y )


, (310).
MRF , .
. (J.Moussouris, 1974 [87]) G 1, 2, 3 4
[1, 2], [2, 3], [3, 4] [4, 1],
0 1, 24 = 16
8:
p (0, 0, 0, 0) = p (0, 0, 0, 1) = p (0, 0, 1, 1) = p (0, 1, 1, 1)
= p (1, 1, 1, 1) = p (1, 1, 1, 0) = p (1, 1, 0, 0) = p (1, 0, 0, 0) =

1
.
8

(306) ; ,
, . , .. ,
(310) . .
- , -. R
, f Y R ,
(,
R), y Y .
A R
yj j A
R
A
, Y A = (y1A , . . . , ySA )
Y = (y1 , . . . , yS ) Y yj =
y j 6 A
( YA , A, Y A
A), f A (Y ) = f (Y A )
X
g A (Y ) =
(312)
(1)#(A\B) f B (Y ) .
BA

8 C R
X
f C (Y ) =
g A (Y ) ,

(313)

AC

,
f (Y ) = f R (Y ) =

g A (Y ) .

AR

.
X

g A (Y )

AC

X X
X
X

(1)#(A\B) f B (Y ) =
(1)#D f B (Y )
AC BA

BC

DC\B

X
#(C \ B)
(1)k f B (Y ) =
0#(C\B) f B (Y )
k

#(C\B)

BC

k=0

BC

= f (Y ) .

220

13. MRF p 8 f (Y ) = ln p (Y ), ,
X
ln p (Y ) =
g A (Y )
AR

g A (Y ) YA
Y A, , ,
, .
v, v 0 A . (308),
,
:

p yv , YV \{v,v0 } p yv0 , YV \{v,v0 }

p yv , yv0 , YV \{v,v0 } =
.
(314)
p YV \{v,v0 }
v v 0 g A (Y ) (312) ,
B A \ {v, v 0 }:
g A (Y ) =


(1)#B ln p Y B ln p Y B{v}

X
BA\{v,v 0 }

X
BA\{v,v 0 }

0
0
ln p Y B{v } + ln p Y B{v,v }


0
p Y B p Y B{v,v }
(1)#B ln B{v} B{v0 } .
p Y
p Y

, (314) -, (1)#B ln 1 = 0.
. 13
, 44
43.
, MRF

X
gC (Y )
p (Y ) = eCC(G(Y ))

(315)

, -
(310) (315), ,
:
X

EC (YC )
1
p (Y ) =
e CF
,
(316)
Z
F , EC () = ln C () (. (310))
C, Z = ln
.

221

D.2.2

(316) E(Y )
. , p(Y ) e kT
Y , G, E ,
, - , T
, (316).
, , 1920-
. . G .
vj sj , 1 ( ,
).
i- j- Jij si sj , Jij = J > 0 (.. ), -
. ,
, .

hj j- ,
hj sj (.. ).
:

X
X
1
J
si sj +
hj sj
kT
kT
1
j
i<j|{vi ,vj }F
p (Y |h) =
(317)
e
,
Z(T, h)
F G(Y ),

X
X
1
J

si sj +
hj sj
kT
kT j
X
i<j|{vi ,vj }F
Z(T, h) =
e
.

(318)

s1 ,...,sS =1

(317),
Z(T, h), , (318),
2S , .
: (
); 64 .
,
, - , : ,
, .
.
. . .
64

, , ;
, , .

222


 
 


   
 
  

  
 










   


  

  
 








. 22: .

G , , X = (x1 , . . . , xS ) Y = (y1 , . . . , yS ), X, Y {1, 1}S , . p (X|Y ): , xj (1 +1, ) j- yj , p (xj = yj ) > p (xj 6= yj ). ,


xj yj
p (xj |yj ) = ee +e , > 0. p (Y )

yi yj
1
i<j|{vi ,vj }F
p (Y ) =
(319)
e
,
Z()
. X Y MRF, , . 22
(. . 10 HMM),

X
X

yi yj +
xj yj
1
j
i<j|{vi ,vj }F
p (X, Y ) = p (Y ) p (X|Y ) =
e
, (320)
Z(, )
, > 0. x2j = yj2 = 1 (320) -

223


1
i<j|{vi ,vj }F
p (X, Y ) =
e
Z(, )

X
(yi yj )2 +
(xj yj )2
j

(321)

Z(, ). -,
(xj , yj R) (xj , yj R3 ) . , ,
, ,

X
X

(yi yj )2 +
|xj yj |
1
j
i<j|{vi ,vj }F
p (X, Y ) =
e
.
Z(, )
. 22 MRF (320) (321)
. (, ) MRF .
p (X, Y ) p (Y |X), ,
, X . , (321)

X
X

(yi yj )2 +
(xj yj )2
1
j
i<j|{vi ,vj }F
p (Y |X) =
e
, (322)
Z(, , X)
(320) ( )
(317).
, Y (320)(322)
Y X X 65 . ,
. , ,
,
.
, MRF ,
[44]. MRF . 23. X,
(. 22), xj (-
) ,
yj , yi,j {0, 1}
65 , Y (322)
1
0
1
0
X
X
X
@
2
2A
0=
(yi yj ) +
(xj yj )
(yi yj ) + (xj yj )A ,

= 2 @
yj
j
i<j|{vi ,vj }F

i|{vi ,vj }F

..
2 y y = x .

224


     


















 
 



 
 
 

 
 
   



 

 













. 23: MRF ( [44]). , , .

. , yi,j .
. 23
; . yi,j = 1 , vi vj
, / yi yj , ..
.
, ( 2 ) ,
, ( 1 , 3 4 ) : , ,
. Q
, n(Y, s) yi,j s 0 = 0. (322)

p (Y |X) =

1
eE(X,Y ) ,
Z(, , X)

(323)

E(X, Y ) =

(1 yi,j )(yi yj )2 +

i<j|{vi ,vj }F

D.2.3

(xj yj )2 +

n(Y,s) . (324)

sQ

(CRF, Conditional Random Fields)

, G, Y = (y1 , . . . , yS ) ,
X (
G,
), (CRF , conditional random
field ), v

p yv |YV \{v} , X = p yv |YN 0 (v) , X ,


(325)

225




















 


  

. 24:
.

, (306), N 0 (v) v G.
p (Y |X) (CRF) X Y , 13,
. CRF
(316), X
X

EC (X, YC )
1
p (Y |X) =
(326)
e CF
.
Z(X)
, (321)
p (X, Y )
(322) p (Y |X), .. MRF CRF. CRF . ,
.
, , l l, (, ), p (y|X)
.
MRF p (Y ) :
l l ,
, , (. 24),

, .. : ,
, , ,

226

CRF,

MRF

YC C t(C)
Et(C) (YC ), C ; q q 4 .
, CRF (!)

10
X
X

ln p (yv |Xv )
p (Y |X) e

Et (YC )

t=1 CC(G)|t(C)=t

vV

Xv X ,
v , f yv (Xv )
p (yv |Xv ), ,
,
Et (YC ) .
CRF CRF (326)

p (Y |T, X) =

1
e
Z(T, X)

X EC (X, YC )
T

CF

(327)


EC (X, YC ) C;
T ln Z(T, X) ;
T > 0 ;
Y = (y1 , . . . , yS ) Y,
, .
CRF , . C1 , . . . , CM
, Cm
(, -
). Cm .
m, , C Cm
Em (Xm,C , YC ), YC , , C, Xm,C ,
m C X,
. CRF
Y G(Y ), , ,

M
X
X

wm
Em (Xm,C , YC )
1
m=1
CC(G(Y ))Cm
e
,
p (Y |w, X) =
(328)
Z(w, X)
w = (w1 , . . . , wM ).
227

Em,X,Y =

Em (Xm,C , YC )

CC(G(Y ))Cm

m G(Y ), Y PM
X, EX,Y Em,X,Y . m=1 wm Em,X,Y
(w, EX,Y ), (328)
p (Y |w, X) =

1
e(w,EX,Y ) .
Z(w, X)

(329)

(327),
, M - (328).
CRF (328) (329) , CRF (326), (327), (328)
(329).

D.3

, Y -
CRF (329), :
1. X Y p (Y |X);
2. X Y (
), .. p (Y |X)
Y;
3. X p (yv |X) ( );
4. (X, Y) CRF
( CRF), , p (Y|X) .
1 , . 1

Z
Z(w, X) =
e(w,EX,Y ) dy1 , . . . dyS ,
y1 ,...,yS

, , Y
X
Z(w, X) =
e(w,EX,Y ) .
(330)
y1 ,...,yS Y

( ) .
. CRF ,
- HMM (. 188).
, , , , .
, 2 ( p (Y |X) Y ) Z(w, X).
228


24 1 ( p (Y |X)) Z(w, X) p (YR |X)
, , .
34.
D.3.1

(326) (327) X Y ,

X
E(X, Y ) =
EC (X, YC ) ,
(331)
CF

y1 , . . . , yS .
, , Y = R,
, , (
3.2.4). , Y , , , CRF (322),
. , , , . Y .
yv (331), , v G(Y ).
Y,
.

(. 78),
.
(Gibbs sampler)
Y q-.

(327) p yv |YV \{v} , X


v :
X

p yv |YV \{v} , X =

1
e
Zv (T, X, Y )

CF,C3v

EC (X, YC )
T

(332)

Z(T, X) ,
X
Zv (T, X, Y ) =

CF,C3v

V \{v}

EC (X, YC
T

)
(333)

y Y

, .
G(Y ) Y (0) Y (i)
:
229

1. v (1) , v (2) , . . . v1 , . . . , vS ,
- K S K
(,
);
(i)

(i1)

2. i- v, v (i) , yv = yv
,
(i)
v (i) yv(i)

(i1)
p yv(i) |YV \{v(i) } , X , .. (332)
Y (i1) .
Y (i) (Gibbs
sampler Gibbs sampling). ,
(Gibbs sample). . . .
Y (i)
Y S . 1
, ( - ),
2 (327) .
14 (S.Geman,
([44])) Y (0)
(i) (0) D.Geman

p Y |Y , X i
(327).
, ,
. -, ,
( , , ..),
. -,
, . -, ,
=

maxY E(X, Y ) minY E(X, Y ) T = minY,v p yv |Y V \{v} , X ,

0<

e T
1
1

T
q
q
1 + (q 1)e T

i- j-
, Y Y 0 p Y (j) = Y |Y (i1) = Y 0 , X (T )S . ,
, ,
:
9 Y 0 Y 00 Y

p Y (i) = Y |Y (0) = Y 0 , X p Y (i) = Y |Y (0) = Y 00 , X


b Ki c

S
S bKc
1 (qT )
1 e T
.

(334)


(327), : i-
(i)
yv(i) (332) i- (327).
,

p (Y |Ti , X) =

1
e
Z(Ti , X)
230

X EC (X, YC )
Ti

CF

(335)

Ti 0 i . 14 (.. Ti ),
(331) Y (0) .
, . 0 (X)
Y , (331) (, ). , 14.
15 (S.Geman, D.Geman ([44])) Ti 0, i0 , Ti lnSi
b K c

i > i0 , Y (0) p Y (i) |Y (0) , X


i 0 (X).
- .
, , (334)
. , [44],
.
15, , (331), .. X Y .
,
. ,
.
, ,
, . , ,

, . , ,
.
D.3.2

(. 214),
CRF,

p (yv |X).

p yv |YV \{v} , X (332), ,


66 p (yv |X) , Z(X) (330):
p (yv |X) =

X
1
Z(w, X)

e(w,EX,Y ) .

YV \{v}

66 ( yv ) marginal
distribution (
Y ). ,
( )
(margin!) . ;
,
.

231

14 ,


(i)
(0)
Y
p yv |X p (yv |X). p (yv |X) yv
, :
16 1 Y (i)
(i)

#{i n : yv = yv }
.
n
n
16 14, 9 ( > 0
> 0 n > N (, )
1

#{in:yv(i) =yv }

p (yv |X) < , N (, ) )


n
.

!!!!

- (MCMC, Markov Chain


Monte Carlo)
16 (MCMC , Markov chain Monte Carlo),
. Y 0 . f
y (1) , . . . , y (t) , . . .
, ..
Z
n
1X
E;y f (y) = f (y)d(y) = lim
(336)
f (y (i) ) .
n n
i=1

p (yv |X) = lim

. 16,
, . MCMC , 1416 , , CRF
Y. .
. MCMC p (YR |X) Y R V , , , , . r- q r
, ,
r q r ,
MCMC R
.
D.3.3

CRF

CRF (329), , w - p0 (w) , X = (X1 , . . . , XN )


Y = (Y1 , . . . , YN ), (, ) , ..
p(w|X, Y) p(w, Y|X) = p0 (w)p (Y|w, X) = p0 (w)

N
Y
i=1

232

p (Yi |w, Xi ) max .


w

, , p0 (w) = 1,

p (Y|w, X) max .
w

p (Y|w, X) =

N
Y

p (Yi |w, Xi ) -

i=1

(329) G(Y1 ) t . . . t
N
Y
G(YN ) Z(w, X) =
Z(w, Xi ).
i=1

1.2.6 ,
(w)= ln p0 (w) L(w|X, Y)= ln p (Y|w, X)
CRF , ,
, :
E(w, X, Y) = ln p(w, Y|X) = (w) L(w|X, Y)
= (w) + ln Z(w, X) + (w, EX,Y ) min

(337)

(. . 31).
(w) ( ,
p0 (w) , ),
(337) . (86), CRF
, , .. , ,
CRF , .
45 L(w|X, Y) = ln Z(w, X) + (w, EX,Y ) ( CRF (329)) w.
. ln Z(w, X),
L(w|X, Y) w . ln Z(w, X)
w() = w0 + w1 , ,
d2
d2 ln Z(w(), X) . . [Y ]
, G(Y ).
X
d
1
ln Z(w(), X) =
(w1 , EX,Y )e(w(),EX,Y )
d
Z(w(), X)
Y [Y]

2
X
1
d2

ln Z(w(), X) =
(w1 , EX,Y )e(w(),EX,Y )
2
d2
(Z(w(), X))
Y [Y]
X
1
+
(w1 , EX,Y )2 e(w(),EX,Y ) 0 .
Z(w(), X)
Y [Y]

233

, , , (. 3.2.4) .
(337) ( ,
(w) = ln p0 (w) )
E(w, X, Y) (w) ln Z(w, X)
=
+
+ EX,Y
w
w
w
X
1
(w)

=
EX,Y e(w,EX,Y ) + EX,Y
w
Z(w, X)
Y [Y]

X
(w)

EX,Y p (Y |w, X) + EX,Y


=
w
Y [Y]

N
X
(w) X
=
+
EXi ,Yi
EXi ,Y p (Y |w, Xi )
w
i=1
Y [Yi ]

N X
M
X

X
(w)
+
wm
(338)
w
i=1 m=1
CC(G(Yi ))Cm

X
Em ((Yi )C , (Xi )m,C )
Em (Y, (Xi )m,C )p (Y |w, Xi ) .
Y Y C

C (338) -
Em (YC , (Xi )m,C ) p (Y |w, Xi ). MCMC (336),

, - MCMC
CRF: - CRF
, .
, ,
. , ,
(. 83).
, MCMC, CRF Z(w, X) , , ln Z(w, X), p (YR |w, X)
, R .
D.3.4

CRF X, .
. 227 . S ,
- Y. Y = (y1 , . . . , yS ).
234

(.. ) Y E(Y ).
p(Y ).
U (p)
X
U (p) = Ep;Y E(Y ) =
p(Y )E(Y ) ;
(339)
Y

H(p) p
X
p(Y ) ln p(Y ) ;
H(p) = Ep;Y ( ln p(Y )) =

(340)

F (p)
X
p(Y )(E(Y ) + ln p(Y )) .
F (p) = U (p) H(p) =

(341)

:
46 p(Y ) = Z1 eE(Y )
F (p) , ln Z.
.

(!)

!
X
X
L(p, ) = F (p) +
p(Y ) 1 =
p(Y ) (E(Y ) + ln p(Y ) + ) ,
Y

.
CRF G, yi ,
(328) ln Z(w, X).
, ,
, , .

CRF.

,
Z(w, X) , , q S 1, .
, .

.
, ,
, .

235

!!!!

(mean field approximation) , , , , , .

QS , ..
p(Y ) = j=1 p(yj ), (q 1)S.
(331)

S
S
XY
X
X
F (p1 , . . . , pS ) =
pj (yj )
EC (YC ) +
ln pj (yj )
j=1

j=1

CF

EC (Y )

CF Y :CY

pj (yj ) +

j:vj C

S X
X

pj (y) ln pj (y) .

j=1 yY

, j-,
pj
X
X
Y
X
EC (Y )
pk (yk ) +
pj (yj ) ln pj (yj ) min ,
CF:C3vj Y :CY

yY

k:vk C

pj

(!):

Y
X
X
pk (yk )

EC (yj , Y )
1
CF:C3vj Y :C\{vj }Y
k:vk C\{vj }
pj (yj ) =
e
,
Zj
Zj , , q yj Y. j-
(mean field) j- . -, ,
pj , , , .
, ,
. , pj (yj ),
p(yj ) ,
p(Y ), ,
, ,
, (326), .
: . , p
(331)
X
X X
U (p) =
p(Y )E(Y ) =
EC (Y )p(Y )
(342)
Y

CF Y :CY

.
,
PF = {pC : C F} . , ,
236

!!!!
Mean
field
approximation

(328) M , K
- (, M = 3
, K = 2), S
(q K 1)M S, .
PF pC
- p
. :
C 0 , C 00 F C 0 C 00 C
( , , ,
F) C
:
X
X
pC (YC ) =
(343)
pC 0 (YC , YC 0 \C ) =
pC 00 (YC , YC 00 \C ) .
YC 0 \C

YC 00 \C

.
. ([81]) G 1, 2 3, , F , Y
, pi,j
{i, j} 2 2 . ( X) :

0.4 0.1
0.4 0.1
0.1 0.4
p1,2 =
p2,3 =
p1,3 =
0.1 0.4
0.1 0.4
0.4 0.1
p1 , p2 p3 . (343) . , pi,j p. (23 = 8)- Y {1,2,3} 4- : Y 16=2 = {Y : y1 6= y2 },
Y 26=3 = {Y : y2 6= y3 } Y 1=3 = {Y : y1 = y3 },
p(Y 16=2 ) + p(Y 26=3 ) + p(Y 1=3 ) = p1,2 (Y 16=2 ) + p2,3 (Y 26=3 ) + p1,3 (Y 1=3 )
= (0.1 + 0.1) + (0.1 + 0.1) + (0.1 + 0.1) = 0.6 < 1 .
.
, ,
, . [13]
, :
(343) (
C),
PF , , ,
p
.
, , .
PR R , (343),
- , (belief ).
237

.
belief, . 30.
, , , . , . -, ,
(338)
CRF ( 3, . 228).
, , .
, .
,
, . 239
.
: . [65] (Ryoichi
Kikuchi) , , . ,
, (343),
PR R,
, , , . R R (.. ) pR ,
R
X
X
U (pR ) =
pR (Y )
EC (YC ) ,
(344)
Y :RY


H(pR ) =

CR

pR (Y ) ln pR (Y ) ,

Y :RY

F (pR ) = U (pR ) H(pR ).


FR (PR ) PR
X
FR (PR ) =
cR F (pR ) ,
(345)
RR

cR (..
) R 1,
-,
(342) ( ) 1, ..
.
. G (. 18),
, ..
, , (322).
R ( ) (.. , ,
). PR ,
(343) , . (344)
, ,
, , . , cR (345), 1 1
, .
238

67 , R F,
(342), , .. R = F V .
(343) 68 .
{v} v, p{v} pv ,
Ev () = E{v} () ( , 0), F0 = F \ V (342). PF
PF0 nv = #{C F0 : C 3 v}. c{v} (345) 1 nv

!
X
X
X
U (pC ) +
U (pv ) H(pC ) +
(1 nv ) (U (pv ) H(pv ))
FF (PF ) =
CF0

vC

vV

pC (Y ) (EC (YC ) + ln pC (YC ))

CF0 YC :CY

XX

vV yY

CF

pv (y) (Ev (y) + (1 nv ) ln pv (y))


!

U (pC )

H(pC ) +

CF0

(346)
!

(1 nv )H(pv )

vV

= U (PF ) HF (PF ) ,

(347)

U (PF ) (339), HF (PF ) .



[125], , ,
, . , ,
.
.
47
Y
Y
Y
Y
p (Y )
1nv
Q C C
p(Y ) =
pv (yv ) =
pC (YC )
(pv (yv ))
.
vC pv (yv )
0
0
CF

vV

CF

(348)

vV

.
, G , .. .
G1 . . . G#F = G : G1 C (1) F0 , Gi1 C (i) , v (i) Gi1 (
G ). (348)
Y Gi nv , , :
,
p (i) (YC (i) )
p(YGi )
= p(YC (i) |YGi1 ) = p(YC (i) |yv(i) ) = C
.
p(YGi1 )
pv(i) (yv(i) )
67 ,

, :
.
68 , , , .. F
.

239

(.. )
.

!!!!

8 , (346) (341).
. , (340) (348) (. (347)).
. PF
(340). , (339) (340), (346)
(341).
, , , , ,
( 1 . 228). . . . .

!!!!

X X
FF (PF ) =
pC (Y ) (EC (Y ) + ln pC (Y ))
CF0 Y Y C

XX

pv (y) (Ev (y) + (1 nv ) ln pv (y))

vV yY

min

{pC (Y ),pv (y)}

(349)

pC (Y ) (q k k- C F0 ) pv (y)
(q v V G),

pC (Y ) 0
pv (y) 0

C F0 , Y Y C
v V, y Y

(350)
(351)

pC (Y ) = 1

C F0

(352)

vV

(353)

v C F0 , y Y ,

(354)

Y Y C

X
Y

Y C :y

pv (y) = 1

yY

pC (Y ) = pv (y)
v =y

.. .
(!) ,
, . , [126] ,
, . ,
, .
48 f = f + f , f + f , lx
69 f x (f + lx )(x0 ) < (f + lx )(x),
f (x0 ) < f (x).
69 .. ,
. - .

240

!!!!

. f (x0 ) (f + lx )(x0 ) < (f + lx )(x) = f (x).


,
f lx
lx (x0 ) = f (x) + (f (x), x0 x) ,
:
x(1)
x(i+1) = arg min f + (x) (f (x(i) ) + (f (x(i) ), x x(i) )) .
x

f , (!) f ,
, , . ,
f , x(i) .
(349), (!) :
X X
XX
F + (PF ) =
pC (Y ) (EC (Y ) + ln pC (Y )) +
pv (y) (Ev (y) + ln pv (y))
CF0 Y Y C

F (PF ) =

vV

nv

vV yY

pv (y) ln pv (y) .

yY

F + F , - (350) (351) F +
, F +
(350) (351) F + .
F +

(i+1)
(i)
(i)
(i+1)
(i)
F + (PF ) F (PF ) + F (PF ), PF ) PF
min
(i+1)

PF

, ,

X
X
(i)
F + (PF ) F (PF ), PF = F + (PF )
nv
pv (y) 1 + ln p(i)
(y)
min
v
vV

PF

yY

(355)
(352), (353) (354).
X X
L(PF , ) =
pC (Y ) (EC (Y ) + ln pC (Y ))
CF0 Y Y C

XX

vV yY

X
CF0

pv (y) Ev (y) + ln pv (y) nv 1 + ln p(i)


(y)
v

X
Y

Y C

X XX
CF0

vC yY

pC (Y ) 1 +

vV

C,v,y
Y

X
yY

pv (y) 1

pC (Y ) pv (y) ,

Y C :Yv =y

C , v C,v,y
(352), (353) (354), .
241

!!!!

!!!!

PF
X
L(PF , )
C,v,y
= EC (Y ) + ln pC (Y ) + 1 + C +
pC (Y )
vC

X
L(PF , )
0=
= Ev (y) + ln pv (y) + 1 nv 1 + ln p(i)
(y)
+ v
C,v,y ,
v
pv (y)
0=

C3v

pC (Y ) pv (y)

!
X
EC (Y ) + 1 + C +
C,v,y
vC
pC (Y ) = e

X
(i)
Ev (y) + 1 nv 1 + ln pv (y) + v
C,v,y
C3v
pv (y) = e

(356)

(357)

pC (Y ) pv (y) ,

X
C,v,y
EC (Y ) + 1 + C +
X
X

vC
C +

L () =
e

C
CF0
Y Y

X
X

v +
e

vV
yY

Ev (y) + 1 nv 1 +

ln p(i)
v (y)

+ v

X
C3v

!
C,v,y

, L () ,
. L () = 0 :

!
X
EC (Y ) + 1 + C +
C,v,y
X
L ()
vC
0=
= 1 +
e
C
Y Y C

X
(i)
Ev (y) + 1 nv 1 + ln pv (y) + v
C,v,y
X
L ()
C3v
0=
= 1 +
e
v
yY

!
X
EC (Y ) + 1 + C +
C,v0 ,y
X
L ()
0 C
v
0=
=
e
C,v,y
Y Y C :yv =y

X
(i)
Ev (y) + 1 nv 1 + ln pv (y) + v
C 0 ,v,y
0 3v
C
e
,
. :

!
X
EC (Y ) + 1 +
C,v,y
X
vC
C = ln
e
Y Y C

242


v = ln

Ev (y) + 1 nv 1 +

!
C,v,y

C3v

yY

1
C,v,y = ln
2

ln p(i)
v (y)

EC (Y

) + C +

C,v0 ,y

v 0 C,v 0 6=v

Y Y C :yv =y

Ev (y)

nv

1 + ln p(i)
(y)
+ v
v

C 0 ,v,y

C 0 3v,C 0 6=C

e
.. . ,
. (356) (357), (i+1)
, PF
(355)
.
,
, ,
,

, ., ,
[53].
: .
()
(338) CRF. (,
X ,
.) ,
MCMC, : ,
-
; , MCMC, ,
, . ,
( ) , MCMC.
D.3.5

CRF

, ,
CRF, ,
- . , MRF (323)(324)
() . .
, . ,
A (EM). CRF [107].
.
243

D.4

1960- ([34]),
, 1920-. - [47] 1971 , - . [87] 1974
, .
.
[44] . ,
. . ,
, [78], .
[71]
. , ,
- ( , [90]), -.
, , , , . , . [124]
, ([126], [53] .) (
[124][125] .).

[107], [44], ,
[70] [55], .

, , . ,
[15] , ,
([15, 8]),
([15, 11])
([15, 10]).

244


[1] .., .., ... . ,
25(6):917936, 1964.
[2] .., .., ...
. , , 1970.
[3] M.A. Aizerman, E.M. Braverman, and L.I. Rozonoer. Theoretical foundations
of the potential function method in pattern recognition learning. Automation
and Remote Control, 25:821837, 1964.
[4] N. Aronszajn. Theory of reproducing kernels
(http://www.recognition.mccme.ru/pub/papers/kernels/ReproducingKernels.pdf).
Trans. Am. Math. Soc., 68(3):337404, 1950.
[5] David Arthur and Sergei Vassilvitskii. How slow is the k-means method?
(http://www.recognition.mccme.ru/pub/papers/clustering/kMeans-socg.pdf)
In Symposium on Computational Geometry, pages 144153, 2006.
[6] N.E. Ayat, M. Cheriet, L. Remaki, and C.Y. Suen. KMOD - A New Support
Vector Machine Kernel with Moderate Decreasing for Pattern Recognition.
Application to Digit Image Recognition
(http://www.recognition.mccme.ru/pub/papers/SVM/ayat01kmod.pdf).
In Sixth International Conference on Document Analysis and Recognition,
pages 1215 1219, 2001.
[7] Leonard E. Baum and J. A. Eagon. An inequality with applications to
statistical estimation for probabilistic functions of Markov processes and to a
model for ecology
(http://www.recognition.mccme.ru/pub/papers/EM/baum67inequality.pdf).
Bull. Amer. Math. Soc., 73(3):360363, 1967.
[8] Leonard E. Baum, Ted Petrie, George Soules, and Norman Weiss. A
Maximization Technique Occurring in the Statistical Analysis of Probabilistic
Functions of Markov Chains.
The Annals of Mathematical Statistics,
41(1):164171, 1970.
[9] Yoshua Bengio and Paolo Frasconi. An Input Output HMM Architecture
(http://www.recognition.mccme.ru/pub/papers/HMM/bengio95input.ps.gz).
In Advances in Neural Information Processing Systems, pages 427434. MIT
Press, 1995.
[10] Yoshua Bengio. Markovian Models for Sequential Data
(http://www.recognition.mccme.ru/pub/papers/HMM/Bengio99Markovian.ps.gz).
Neural Computing Surveys, 2:129162, 1999.
[11] K.P. Bennett, A. Demiriz, and J. ShaweTaylor. A Column Generation
Algorithm for Boosting
(http://www.recognition.mccme.ru/pub/papers/boosting/bennett00column.pdf).
In Pat Langley, editor, Proceedings of Seventeenth International Conference
on Machine Learning, pages 6572. Morgan Kaufmann, 2000.
[12] C. Berg, J.P.R. Christensen, and P. Ressel. Harmonic Analysis on Semigroups.
Springer-Verlag, New York, 1984.

245

[13] H. A. Bethe. Statistical theory of superlattices. Proc. Roy. Soc. London ser.
A, 150(871):552575, 1935.
[14] Christopher M. Bishop. Neural Networks for Pattern Recognition. Oxford
University Press, 1995.
[15] Christopher M. Bishop. Pattern Recognition and Machine Learning. Springer,
2006.
[16] C.M. Bishop and M.E. Tipping. Variational Relevance Vector Machine
(http://www.recognition.mccme.ru/pub/papers/Bayes/bishop00variational.ps.gz).
In Processings of the 16-th Conference on Uncertainty in Artificial
Intelligence, pages 4653. Morgan Kauffman, 2000.
[17] P.E. Bohmer. Theorie der unabhangigen Warscheinlichkeiten. In Rapports,
Memories et Proc`es-verbaux de Septi`eme Congr`es international dActuaries,
volume 2, pages 327343, Amsterdam, 1912.
[18] B.E. Boser, I.M. Guyon, and V.N.Vapnik. A training algorithm for optimal
margin classifiers
(http://www.recognition.mccme.ru/pub/papers/SVM/boser92training.ps.gz.ps).
In Proceedings of the 5th Annual ACM Workshop on Computational Learning
Theory, pages 144152. ACM Press, 1992.
[19] Leon Bottou. Stochastic Learning
(http://www.recognition.mccme.ru/pub/papers/stochastic/mlss-2003.pdf).
In Olivier Bousquet and Ulrike von Luxburg, editors, Advanced Lectures on
Machine Learning, Lecture Notes in Artificial Intelligence, LNAI 3176, pages
146168. Springer Verlag, Berlin, 2004.
[20] Leo Breiman, Jerome Friedman, Charles J. Stone, and R. A. Olshen.
Classification and Regression Trees. Chapman & Hall/CRC, January 1984.
[21] Leo Breiman. Random Forests
(http://www.recognition.mccme.ru/pub/papers/boosting/breiman01random.pdf).
Machine Learning, 45(1):532, 2001.
[22] Christopher J.C. Burges. A Tutorial on Support Vector Machines for Pattern
Recognition
(http://www.recognition.mccme.ru/pub/papers/SVM/burges98tutorial.pdf).
Data Mining and Knowledge Discovery, 2(2):121167, 1998.
[23] F.P. Cantelli. Sulla determinazione empirica della leggi di probabilita. Giorn.
Ist. Ital. Attuari, 4:421424, 1933.
[24] Michael Collins, Robert E. Schapire, and Yoram Singer. Logistic Regression,
AdaBoost and Bregman Distances
(http://www.recognition.mccme.ru/pub/papers/boosting/collins00logistic.pdf).
Machine Learning, 48(1-3):253285, 2002.
[25] Corinna Cortes, Patrick Haffner, and Mehryar Mohri. Positive Definite
Rational Kernels
(http://www.recognition.mccme.ru/pub/papers/kernels/colt.pdf).
In In Proceedings of The 16th Annual Conference on Computational Learning
Theory (COLT 2003), pages 4156. Springer, 2003.
[26] C. Cortes and V. Vapnik. Support Vector Networks
(http://www.recognition.mccme.ru/pub/papers/SVM/cortes95support.ps.gz.ps).
Machine Learning, 20(3):273297, 1995.
246

[27] T.M. Cover and P.M. Hart. Nearest Neighbor pattern classification
(http://www.recognition.mccme.ru/pub/papers/NN/cover67nearest.pdf).
IEEE Transactions on Information Theory, 13:2127, 1967.
[28] D.R. Cox. Some procedures associated with the logistic qualitative response
curve. In F.N. David, editor, Research Papers in Statistics: Festschrift for J.
Neyman, pages 5571, New York, 1966. Wiley.
[29] D. R. Cox and D. Oakes. Analysis of Survival Data. Chapman and Hall,
London, 1984.
[30] D. R. Cox. Regression Models and Life-Tables
(http://www.recognition.mccme.ru/pub/papers/survival/cox72regression.pdf).
Journal of the Royal Statistical Society. Series B (Methodological), 34(2):187
220, 1972.
[31] N. Cristianini and J. Shawe-Taylor.
An Introduction To Support
Vector
Machines
(and
other
kernel-based
learning
methods)
(http://www.support-vector.net/references.html).
Cambridge University
Press, 2000.
[32] Ayhan Demiriz, Kristin P. Bennett, and J. Shawe-Taylor. Linear programming
boosting via column generation
(http://www.recognition.mccme.ru/pub/papers/boosting/demiriz02linear.pdf).
Machine Learning, 46(1):225254, 2002.
[33] A. P. Dempster, N. M. Laird, and D. B. Rubin. Maximum likelihood from
incomplete data via the EM algorithm. Journal of the Royal Statistical
Society, series B, 39(1):138, 1977.
[34] ...
(http://www.recognition.mccme.ru/pub/papers/CRF/Dobrushin68description.pdf).
, 13(2):201229, 1968.
[35] B. Efron, I. Johnstone, T. Hastie, and R. Tibshirani. Least angle regression
(http://www.recognition.mccme.ru/pub/papers/L1/LeastAngle2002.pdf).
Annals of Statistics, 32(2):407451, 2004.
[36] R.A. Fisher. The use of multiple measurements in taxonomic problems
(http://www.recognition.mccme.ru/pub/papers/SLT/fisherLDA.pdf).
Eugen., 7:179188, 1936.
[37] Yoav Freund and Robert E. Schapire. Experiments with a New Boosting
Algorithm
(http://www.recognition.mccme.ru/pub/papers/boosting/freund96experiments.pdf).
In In Proceedings of the Thirteenth International Conference on Machine
Learning, pages 148156, 1996.
[38] Yoav Freund and Robert E. Schapire. A decision-theoretic generalization of
on-line learning and an application to boosting
(http://www.recognition.mccme.ru/pub/papers/boosting/adaboost.pdf).
J. Comput. Syst. Sci., 55(1):119139, 1997.
[39] Yoav Freund. Boosting a weak learning algorithm by majority
(http://www.recognition.mccme.ru/pub/papers/boosting/freund90boosting.pdf).
In COLT 90: Proceedings of the third annual workshop on Computational
learning theory, pages 202216, 1990.
247

[40] Jerome H. Friedman. Stochastic gradient boosting


(http://www.recognition.mccme.ru/pub/papers/boosting/stobst.ps.gz).
Computational Statistics and Data Analysis, 38:367378, 1999.
[41] Jerome H. Friedman. Greedy function approximation: A gradient boosting
machine
(http://www.recognition.mccme.ru/pub/papers/boosting/trebst.ps.gz).
Annals of Statistics, 29:11891232, 2001.
[42] Jerome Friedman, Trevor Hastie, and Robert Tibshirani. Additive logistic
regression: a statistical view of boosting
(http://www.recognition.mccme.ru/pub/papers/boosting/friedman98additive.pdf).
Annals of Statistics, 28:337407, 2000.
[43] G. M. Furnival and R. W. Wilson.
Technometrics, 16(4):499511, 1974.

Regression by leaps and bounds.

[44] S. Geman and D. Geman. Stochastic relaxation, Gibbs distributions, and the
Bayesian restoration of images
(http://www.recognition.mccme.ru/pub/papers/CRF/GemanPAMI84.pdf).
IEEE Trans. Pattern Anal. Mach. Intell., 6:721741, 1984.
[45] V.I. Glivenko. Sulla determinazione empirica di probabilita. Giorn. Ist. Ital.
Attuari, 4:9299, 1933.
[46] P.S. Gopalakrishnan, D. Kanevsky, D. Nahamoo, and A. Nadas. An inequality
for rational functions with applications to some statistical estimation
problems. IEEE Trans. Information Theory, 37(1):107113, 1991.
[47] J. M. Hammersley and P. Clifford. Markov fields on finite graphs and lattices
(http://www.recognition.mccme.ru/pub/papers/CRF/hammersley71markov.pdf),
1971.
[48] Trevor Hastie, Saharon Rosset, Robert Tibshirani, and Ji Zhu. The Entire
Regularization Path for the Support Vector Machine
(http://www.recognition.mccme.ru/pub/papers/SVM/hastie04entire.pdf).
Journal of Machine Learning Research, 5:13911415, 2004.
[49] Trevor Hastie and Patrice Y. Simard. Metrics and Models for Handwritten
Character Recognition
(http://www.recognition.mccme.ru/pub/papers/tangent/hastie97metrics.pdf).
Statistical Science, 13:5465, 1998.
[50] T. Hastie, R. Tibshirani, and J. Friedman. The elements of statistical learning
(http://www-stat-class.stanford.edu/~tibs/ElemStatLearn/). Springer, 2001.
[51] D. Haussler. Convolution Kernels on Discrete Structures
(http://www.recognition.mccme.ru/pub/papers/kernels/haussler99convolution.ps.gz).
Technical Report UCSC-CRL-99-10, UC Santa Cruz, 1999.
[52] R. Herbrich.
Learning Kernel Classifiers: Theory and Algorithms
(http://www.learning-kernel-classifiers.org/). The MIT Press, 2002.
[53] Tom Heskes. Stable Fixed Points of Loopy Belief Propagation Are Local
Minima of the Bethe Free Energy
(http://www.recognition.mccme.ru/pub/papers/CRF/heskes02stable.pdf).
In NIPS 15, pages 343350, 2002.

248

[54] Magnus R. Hestenes and Eduard Stiefel. Methods of Conjugate Gradients for
Solving Linear Systems
(http://www.recognition.mccme.ru/pub/papers/gradient/hestenes-stiefel-52.pdf).
J. Res. Natl. Bur. Stand., 49:409436, 1952.
Carreira-Perpi
[55] Xuming He, Richard S. Zemel, and Miguel A.
nan. Multiscale
Conditional Random Fields for Image Labeling
(http://www.recognition.mccme.ru/pub/papers/CRF/he04multiscale.pdf).
In In CVPR, pages 695702, 2004.
[56] A.E. Hoerl and R.W. Kennard. Ridge regression: Biased estimation for nonorthogonal problems. Technometrics, 12:5567, 1970.
[57] John H. Holland. Adaptation in natural and artificial systems. Univ. of
Michigan Press (Reprinted by MIT Press, 1992), 1975.
[58] R. M. Hristev. The ANN Book
(http://www.recognition.mccme.ru/pub/papers/ANN/HristevTheANNbook.djvu).
GNU Public Licence, 1998.
[59] C.-W. Hsu and C.-J. Lin. A comparison of methods for multi-class support
vector machines
(http://www.recognition.mccme.ru/pub/papers/SVM/hsu01comparison.ps.gz).
IEEE Transactions on Neural Networks, 13(2):415425, March 2002.
[60] P. Huber. Robust estimation of location parameter. Annals of Mathematical
Statistics, 53:73101, l964.
[61] Tommi Jaakkola and David Haussler. Exploiting Generative Models in
Discriminative Classifiers
(http://www.recognition.mccme.ru/pub/papers/kernels/jaakkola98exploiting.pdf).
In In Advances in Neural Information Processing Systems 11, pages 487493.
MIT Press, 1998.
[62] B. H. Juang and L. R. Rabiner. The Segmental K-Means Algorithm for
Estimating Parameters of Hidden Markov Models
(http://www.recognition.mccme.ru/pub/papers/HMM/juang90segmental.pdf).
IEEE Trans. on Acoustics, Speech, and Signal Processing, 38(9):16391641,
1990.
[63] E.L. Kaplan and P. Meier. Nonparametric estimation from incomplete
observations
(http://www.recognition.mccme.ru/pub/papers/survival/kaplan58nonparametric.pdf).
Journal of the American Statistical Association, 53:457481, 1958.
[64] S. S. Keerthi and E. G. Gilbert. Convergence of a Generalized SMO Algorithm
for SVM Classifier Design
(http://www.recognition.mccme.ru/pub/papers/SVM/keerthi00convergence.pdf).
Machine Learning, 46(1-3):351360, 2002.
[65] R. Kikuchi. A theory of cooperative phenomena. Phys. Rev., 81(6):9881003,
1951.
[66] S. Kirkpatrick, C. D. Gelatt, and M. P. Vecchi. Optimization by simulated
annealing. Science, 220:671680, 1983.
[67] J.P. Klein and M.L. Moeschberger. Survival Analysis. Techniques for Censored
and Truncated Data. Springer, 1997.
249

[68] Teuvo Kohonen. Self-Organizing Maps. Springer, Berlin, Heidelberg, 1995.


[69] ... . , 114(5):953956, 1957.
[70] S. Kumar and M. Hebert. Discriminative Fields for Modeling Spatial
Dependencies in Natural Images
(http://www.recognition.mccme.ru/pub/papers/CRF/kumar03discriminative.pdf).
In NIPS 16, pages 15311538, 2004.
[71] J. Lafferty, A. Mccallum, and F. Pereira. Conditional Random Fields:
Probabilistic Models for Segmenting and Labeling Data
(http://www.recognition.mccme.ru/pub/papers/CRF/lafferty01CRF.pdf).
In International Confernence on Machine Learning, 2001.
[72] L. LeCam. On some asymptotic properties of maximum likelihood estimates
and related Bayes estimate. Univ. Calif. Public. Stat, 11, 1953.
[73] ... ,
. , 163(4):845848, 1965.
[74] Yann Le Cun, Leon Bottou, Genevieve B. Orr, and Klaus-Robert M
uller.
Efficient Backprop
(http://www.recognition.mccme.ru/pub/papers/ANN/lecun-98b.pdf).
In Neural Networks, Tricks of the Trade, Lecture Notes in Computer Science
LNCS 1524. Springer Verlag, 1998.
[75] Yury Lifshits. The Homepage of Nearest Neighbors and Similarity Search
(http://simsearch.yury.name/references.html), 2004-2007.
[76] Chih-Jen Lin, Ruby C. Weng, and S. Sathiya Keerthi. Trust region Newton
methods for large-scale logistic regression
(http://www.recognition.mccme.ru/pub/papers/logistic/lin07trust.pdf).
In ICML 07: Proceedings of the 24th international conference on Machine
learning, pages 561568, New York, NY, USA, 2007. ACM.
[77] R. J. A. Little and D. B. Rubin. Statistical Analysis with Missing Data. Wiley
Series in Probability and Statistics. Wiley, New York, 1st edition, 1987. 2-nd
edition (2002).
[78] S. Z. Li.
Markov Random Field Modeling in Computer Vision
(http://www.cbsr.ia.ac.cn/users/szli/mrf_book/book.html).
Springer-Verlag,
London, UK, 1995.
[79] S. Lloyd. Least squares quantization in PCM. Technical report, Bell
Laboratories (1957); IEEE Trans. Inf. Theory, 28:128137, 1982.
[80] David J.C. MacKay. Bayesian Interpolation
(http://www.recognition.mccme.ru/pub/papers/Bayes/mackay91bayesian.pdf).
Neural Computation, 4:415447, 1991.
[81] D. J. C. Mackay, J. S. Yedidia, W. T. Freeman, and Y. Weiss. A conversation
about the Bethe free energy and sum-product
(http://www.recognition.mccme.ru/pub/papers/CRF/mackay01conversation.pdf).
Technical Report TR2001-018, MERL, 2001.
[82] Warren S. McCulloch and Walter Pitts. A logical calculus of the ideas
immanent in nervous activity. Mathematical Biophysics, 5:115133, 1943.
250

[83] J. Mercer. Functions of positive and negative type and their connection with
the theory of integral equations. Philos. Trans. Roy. Soc. London, A, 209:415
446, 1909.
[84] ... : . , , 2011.
[85] ... : . , , 2014.
[86] Nicholas Metropolis, Arianna W. Rosenbluth, Marshall N. Rosenbluth,
Augusta H. Teller, and Edward Teller. Equation of State Calculations by
Fast Computing Machines. The Journal of Chemical Physics, 21(6):1087
1092, 1953.
[87] J. Moussouris. Gibbs and Markov systems with constraints. Journal of
statistical physics, 10:1133, 1974.
[88] Radford Neal and Geoffrey E. Hinton. A view of the EM algorithm that
justifies incremental, sparse, and other variants
(http://www.recognition.mccme.ru/pub/papers/EM/neal98view.pdf).
In Learning in Graphical Models, pages 355368. Kluwer Academic Publishers,
1998.
[89] A. B. J. Novikoff. On convergence proofs on perceptrons. Proceedings of the
Symposium on the Mathematical Theory of Automata, XII:615622, 1962.
[90] Judea Pearl. Probabilistic Reasoning in Intelligent Systems: Networks of
Plausible Inference. Morgan Kaufmann, San Francisco, CA, USA, 1988.
[91] J. Platt. Fast Training of Support Vector Machines using Sequential Minimal
Optimization
(http://www.recognition.mccme.ru/pub/papers/SVM/smoTR.pdf).
In B. Schlkopf, C. Burges, and A. Smola, editors, Advances in Kernel Methods
- Support Vector Learning. MIT Press, 1998.
[92] J. Ross Quinlan. C4.5: programs for machine learning. Morgan Kaufmann
Publishers Inc., San Francisco, CA, USA, 1993.
[93] L. R. Rabiner, J. G. Wilpon, and B. H. Juang. A Segmental K-Means Training
Procedure for Connected Word Recognition. AT&T Tech. J., 65(3):2131,
1986.
[94] L. R. Rabiner. A tutorial on hidden Markov models and selected applications
in speech recognition
(http://www.recognition.mccme.ru/pub/papers/HMM/rabiner.pdf).
Proceedings of the IEEE, 77(2):257286, 1989.
[95] W. Reichl and G. Ruske. Discriminative Training for Continuous Speech
Recognition
(http://www.recognition.mccme.ru/pub/papers/HMM/reichl96discriminative.ps.gz).
In Proc. 1995 Europ. Conf. on Speech Communication and Technology, pages
537540, 1995.
[96] H. Robbins and S. Monro. A Stochastic Approximation Method
(http://www.recognition.mccme.ru/pub/papers/stochastic/robbinsMonro.pdf).
Annals of Mathematical Statistics, 22:400407, 1951.

251

[97] F. Rosenblatt. The perceptron: a probabilistic model for information storage


and organization of the brain. Psychological Review, 65:386408, 1958.
[98] Sam Roweis.
Machine Learning ( CSC2515)
(http://www.cs.toronto.edu/~roweis/csc2515-2006/lectures.html).
Toronto
University, 2006. 2003
(http://www.recognition.mccme.ru/pub/papers/general/CSC2515/).
[99] D. Rumelhart, G. Hinton, and R. Williams. Learning internal representations
by error propagation in Parallel Distributed Processing. In D.E. Rumelhart, ,
and J.L. McClelland, editors, Explorations in the Microstructure of Cognition,
pages 318362. The MIT Press, Cambridge, MA., 1986.
[100] Robert E. Schapire, Yoav Freund, Peter Bartlett, and Wee Sun Lee. Boosting
the Margin: A New Explanation for the Effectiveness of Voting Methods
(http://www.recognition.mccme.ru/pub/papers/boosting/schapire98boosting.pdf).
The Annals of Statistics, 26(5):16511686, 1998.
[101] Robert E. Schapire. The Strength of Weak Learnability
(http://www.recognition.mccme.ru/pub/papers/boosting/schapire90strength.pdf).
Machine Learning, 5(2):197227, 1990.
[102] Bernhard Scholkopf, John C. Platt, John C. Shawe-Taylor, Alex J. Smola,
and Robert C. Williamson. Estimating the Support of a High-Dimensional
Distribution
(http://www.recognition.mccme.ru/pub/papers/SVM/sch99estimating.pdf).
Neural Comput., 13(7):14431471, 2001.
[103] Bernhard Sch
olkopf. The kernel trick for distances
(http://www.recognition.mccme.ru/pub/papers/kernels/tr-2000-51.pdf).
In Advances in Neural Information Processing Systems, volume 13, pages 301
307. MIT Press, 2001.
[104] Arseni Seregin. , 2004.

(http://www.recognition.mccme.ru/pub/papers/kernels/Seregin04StringKernel.pdf).
[105] Martin Sewell. Kernel Methods (http://www.svms.org/kernels/kernel-methods.pdf).
http://www.svms.org, 2007-2009.
[106] Alex J. Smola and Bernhard Scholkopf. A Tutorial on Support Vector
Regression
(http://www.recognition.mccme.ru/pub/papers/SVM/smola98tutorial.ps.gz).
Technical Report NC2-TR-1998-030, NeuroCOLT2, 1998.
[107] Charles Sutton and Andrew McCallum. An Introduction to Conditional
Random Fields
(http://www.recognition.mccme.ru/pub/papers/CRF/sutton10introduction.pdf).
arXiv:1011.4088v1, 2010.
[108] .., ... . , , 1974.
[109] Robert Tibshirani. The lasso method for variable selection in the Cox model
(http://www.recognition.mccme.ru/pub/papers/survival/tibshirani97lasso.pdf).
In Statistics in Medicine, pages 385395, 1997.
252

[110] R. Tibshirani. Regression shrinkage and selection via the lasso


(http://www.recognition.mccme.ru/pub/papers/L1/lasso.pdf).
J. Royal. Statist. Soc B, 58(1):267288, 1996.
[111] M.E. Tipping and A.C. Faul. Fast marginal likelihood maximization for sparce
Bayesian models
(http://www.recognition.mccme.ru/pub/papers/Bayes/tipping03fast.ps.gz).
In Proceedings of the Ninth International Workshop on Artificial Intelligence
and Statistics, 2003.
[112] M.E. Tipping. The Relevance Vector Machine
(http://www.recognition.mccme.ru/pub/papers/Bayes/tipping00relevance.ps.gz).
Advances in Neural Information Processing Systems, 12:652658, 2000.
[113] Vladimir N. Vapnik. The Nature of Statistical Learning Theory. Springer,
1995.
[114] Vladimir N. Vapnik. Statistical Learning Theory. Wiley, 1998.
[115] Vladimir N. Vapnik. An Overview of Statistical Learning Theory
(http://www.recognition.mccme.ru/pub/papers/SLT/Vapnik99overview.pdf).
IEEE Transactions on Neural Networks, 10(5):988999, September 1999.
[116] .., ... . , 16(2):264280, 1971.
[117] .., ... . ,
, 1974.
[118] A. J. Viterbi. Error bounds for convolutional codes and an asymptotically
optimum decoding algorithm. IEEE Transactions on Information Theory,
13(2):260-269, 1967.
[119] Chris Watkins. Dynamic Alignment Kernels
(http://www.recognition.mccme.ru/pub/papers/kernels/dynk.ps.gz).
In Advances in Large Margin Classifiers, pages 3950. MIT Press, 1999.
[120] L.R. Welch. Hidden Markov Models and the Baum-Welch Algorithm
(http://www.recognition.mccme.ru/pub/papers/EM/Baum-Welch.pdf).
IEEE Information Theory Society Newsletter, 53(4), 2003.
[121] J. Weston and C. Watkins. Multi-class Support Vector Machines
(http://www.recognition.mccme.ru/pub/papers/SVM/multi-class-support-vector.ps.gz).
Technical Report CSD-TR-98-04, Royal Holloway Univ. of London, 1998.
[122] C. F. Jeff Wu. On the Convergence Properties of the EM Algorithm
(http://www.recognition.mccme.ru/pub/papers/EM/wu83convergence.pdf).
The Annals of Statistics, 11(1):95103, 1983.
[123] Larry S. Yaeger, Brandyn J. Webb, and Richard F. Lyon.
Combining
Neural Networks and Context-Driven Search for Online, Printed Handwriting
Recognition in the NEWTON
(http://www.recognition.mccme.ru/pub/papers/ANN-HMM/Yaegeretal.AIMag.pdf).
AI Magazine, 19(1):7390, 1998.
[124] Jonathan S. Yedidia, William T. Freeman, and Yair Weiss. Generalized Belief
Propagation
(http://www.recognition.mccme.ru/pub/papers/CRF/yedidia00generalized.pdf).
In NIPS 13, pages 689695. MIT Press, 2000.
253

[125] Jonathan S. Yedidia, William T. Freeman, and Yair Weiss. Constructing Free
Energy Approximations and Generalized Belief Propagation Algorithms
(http://www.recognition.mccme.ru/pub/papers/CRF/yedidia05constructing.pdf).
IEEE Transactions on Information Theory, 51(7):22822312, 2005.
[126] A. L. Yuille. CCCP algorithms to minimize the Bethe and Kikuchi free
energies: Convergent alternatives to belief propagation
(http://www.recognition.mccme.ru/pub/papers/CRF/yuille02CCCP.pdf).
Neural Computation, 14:2002, 2002.
[127] Hui Zou and Trevor Hastie. Regularization and variable selection via the
Elastic Net
(http://www.recognition.mccme.ru/pub/papers/L1/elasticnet.pdf).
Journal of the Royal Statistical Society B, 67:301320, 2005.

254


L2 , 93
L , 155
N 0 (v), 215
N (v), 215
d
, 153
N(,)
d
, 153
N(,A)
N(,A) , 153
Nj , 176
P, 10
PY , 30
R(Y ), 213
Rj , 176
S(t), 168
S > (t), 169
ST , 137
T , 9, 11
T 0 , 12
UX , 206
Vin , 66
Vout , 66
W, 15, 19
X, 30, 38, 40
X0 , 38
X , 6, 10
X 0 , 39
X , 211

k k, 42
0-1-, 11
01 , 42
, 207
l0 (i, X), 190
i , 125
l (i, X), 188
l0 (i, X), 190
l (i, X), 189
l (i, X), 189
ab , 17, 26
l (i, X), 191
l (X), 190
l (i, X), 190
, 39
i , 170
-SVC, 109
-SVR, 116
j , 172, 173
l (i, j, X), 189
j
mk
, 24
x0 , 102
0 , 102
k , 24
j , 172, 173
, 52
l (i, X), 191
S , 14
A, 186
B, 186
C, 37
C(G), 219

Xx , 149
+

Xx , 149
X [m,n] , 180
Y, 30
Y, 7, 10
Y A , 220
YA , 213
Y [m,n] , 180
[Y ], 233
, 186
(w), 233
h(t), 168
k-NN (k Nearest Neighbors), 9
k-means, 36
lp , 42
nv , 239
t(j) , 172, 173
u, 66
u, 152
v, 152
w,
40
x
, 40
x[], 160

Dx , 149
+

Dx , 149
E, 11
E, 25
E;u , 153
F , 15, 19
F (t), 168
F, 10
F[], 98
H(), 155
In , 42
In0 , 42
K, 102
K+ , 102
K0 , 102
K= , 102
L(w|X, Y), 233

255

xij , 180
y, 40
F, 219
F0 , 239
W, 186

z, 152
+
z, 152
PF , 236
, 66

-, 196
, 198
, 79
, 53
, 191
, 87, 155
, 165
-, 188
AdaBoost, 125
EM, 87, 155
GEM, 165
LogitBoost, 130
LPBoost, 140
, 21
, 22
, 20
, 18
, 18
, 20
, 60
, 60
, 64, 124
, 170
, 170
, 183

, 146
, 146
, 146

, 12
, 27, 46, 48
, 47
, 49
, 49
, 30, 237
, 72

, 235
, 227, 235
, 227
, 155, 235
, 239

, 33
, 79

, 39
, 66
(, , ), 11
, 66
, 168
, 168
, 87

, 83
, 55, 83
, 42
, 78
,
208
, 8
, 39

, 18
, 58
, 6, 35
, 6, 34
, 35
, 219
-, 16
-, 16
, 49
, 43
, 49
, 51
, 215

, 22
, 22
, 183
, 215

, 186
, 186

k , 9
k , 36

, 25, 29
, 22
-
, 232
-, 51
, 9
, 42
256

, 78
, 51
, 22
, 23
, 32
, 72
, 58, 104
, 146
, 17
, 77
, 146

, 20
, 19
, 19
, 20

, 10, 11

, 14
, 11

, 20
, 19
, 19
, 20

, 175
, 183
, 183
, 24
, 175
, 185
-, 187
, 185
LR, 187

, 9, 11
, 16
, 12, 38
, 16
, 66
, 68
, 66, 67

, 72
, 66
, 9
, 5
, 20
, 34
, 193
257

, 34
, 206
, 57
, 11
0-1-, 11
-, 45
, 45
, 134
, 11, 42
, 45

, 11
, 15
, 45

, 12

, 91
, 13
, 53
, 68
, 14
, 227

, 53
, 38
, 215
, 66
, 11
, 23
, 6
, 87
, 7
, 7
, 10

, 37
, 7, 10
, 6, 10
, 10
, 10
, 39, 91, 206
, 39
, 5
, 5
, 5
, 27
, 38

, 219
, 169

, 207
, 207


-, 14
, 43
, 66
, 8
, 175
, 18
, 42
, 39
, 51
, 51
, 46
, 15
, 11, 168
, 11
, 11
, 38
, 17, 26
, 66

, 67

, 215
, 225

, 185
, 185
, 55
,
55
, 14
, 15
, 227
, 239
, 11
0-1-, 11
, 11

, 11

, 12
, 227

-, 172
-, 220
, 70
, 93
, 54
-, 83
, 170
(), 64
, 225

, 107

, 37
, 125
, 230
, 34, 35
, 62, 91
, 208
, 62, 93
, 62
, 97
- , 100
CPD, 100
RBF, 97
AdaBoost, 124, 125
ANN (Artificial Neural Networks), 66
back-propagation, 71
belief, 30, 237
boosting, 64
censored data, 170
cluster, 6
cluster analysis, 6
conditional random field, 225
conditionally positive definite kernel,
100
confidence, 8
CPD (Conditionally Positive Definite),
100
CPD-kernel, 100
CRF (Conditional Random Field), 225
cross entropy, 33
curse of dimensionality, 10
derived feature, 7
dimension reduction, 38
discriminative methods, 20
edit distance, 207
elastic net, 44
emission matrix, 186
error function, 11
evidence, 145
expectation maximization, 86, 155
generalized, 165
feature, 6
Fisher score, 206
fitting, 5
forward-backward algorithm, 188
generative methods, 20
Gibbs sample, 230
Gibbs sampler, 230
258

Gibbs sampling, 230

missing at random, 146


missing completely at random, 146
missing data, 146, 170
ML (Maximum Likelihood), 23
MLP (Multi-Layer Perceptron), 68
MRF (Markov Random Field), 215

hazard, 168
hidden Markov model, 185
HMM (Hidden Markov Model), 185
imputation, 148
impute, 147
Input Output HMM, 202
IOHMM (Input Output Hidden Markov
Model), 202

nearest neighbor, 9
NN (Nearest Neighbor), 9
non-informative prior, 22

outlier, 34
kernel, 62
overfitting, 14
kernel trick, 62
KMOD (Kernel with MOderate Decreasing), partial likelihood, 176
99
pattern recognition, 5
positive definite kernel, 62
LARS (Least Angle Regression), 43
product limit, 172
LASSO (Least Absolute Shrinkage and
proportional hazard model, 175
Selection Operator), 43
lasso regression, 43
radial basis functions, 60
latent variables, 152
random forests, 124
learner, 5
RBF (Radial Basis Functions), 60
learning, 5
recognition, 5
learning vector quantization, 38
recognition tree, 12
Leave-One-Out, 17
recognizer, 5
left-right model, 187
relevance vector machine, 146
likelihood, 23
response, 7
linear programming boosting, 138
ridge regression, 42
log-odds, 52, 129
RKHS (Reproducing Kernel Hilbert
logistic function, 52
Space), 94
logistic regression, 50, 51
RVM (Relevance Vector Machine), 146
logit, 52
segmental k-means, 199
LogitBoost, 130
self-organizing maps, 38
LOO (Leave One Out), 17
sigmoid function, 52
loss function, 11
simulated annealing, 78
LPBoost, 138
SMO (Sequential Minimal Optimization),
LVQ (Learning Vector Quantization),
108
38
soft margin classifier, 58
machine learning, 5
SOM (Self-Organizing Maps), 38
MAP (Maximum of Aposteriori Probability), statistical learning, 5
22
structural risk minimization, 14
MAR (Missing At Random), 146
stump, 13
margin classifiers, 56
subset selection, 45
marginal distribution, 231
supervised learning, 34
Markov chain Monte Carlo, 232
support vector machine, 58
Markov random field, 215
support vector regression, 46
maximum likelihood, 23
survival function, 168
maximum of aposteriori probability,
SVC (Support Vector Classification),
22
104
MCAR (Missing Completely At Random),
SVM (Support Vector Machine), 58,
146
104
MCMC (Markov Chain Monte Carlo),
SVR (Support Vector Regression), 46,
232
104
259

test error, 12
training, 5
training error, 11
transition matrix, 186
truncated data, 170
unsupervised learning, 34
validation set, 16
VC-dimension (Vapnik-Chervonenkis
dimension), 14
vector quantization, 37

260