: 31 2014
, , . , ,
, , .
, ,
, , , , , ,
, , , ,
. ,
,
.
:
, , ,
.
, , ,
.
, . .
[84]. ,
[84], [85].
(2008) (2002–2005) . , .
!
1
1.1
1.1.1
1.1.2
1.1.3
1.1.4
1.1.5
1.1.6
1.1.7
1.2
1.2.1
1.2.2
1.2.3
1.2.4
1.2.5
1.2.6
1.2.7
1.2.8
1.3
1.3.1
1.3.2
1.3.3
1.3.4
2
2.1
2.1.1
2.1.2
2.1.3
2.2
2.2.1
2.2.2 (logistic regression)
2.2.3
2.2.4 (margin classifiers)
2.3
2.3.1
2.3.2
2.3.3
3
3.1
3.2 (MLP)
3.2.1
3.2.2
3.2.3 (error back-propagation)
3.2.4
3.3 RBF
3.3.1 RBF (expectation maximization)
4
4.1
4.1.1
4.1.2
4.1.3
4.1.4
4.1.5
4.1.6
4.2 (SVM, SVC, SVR)
4.2.1
4.2.2
4.2.3 SVC
4.2.4
4.2.5
4.2.6
4.2.7
5
5.1
5.1.1
5.1.2
5.1.3
5.2
5.2.1
5.2.2
5.2.3 AdaBoost
5.2.4
5.3
5.4
6
6.1
6.2
A
A.1
A.2
A.2.1
A.2.2
A.3
A.3.1
A.3.2 EM
A.3.3 EM
A.3.4 EM
A.3.5 EM
B
B.1
B.2
B.3
B.4
B.5
B.5.1
B.5.2
B.5.3
B.5.4
B.5.5
C
C.1
C.2
C.3 (HMM, Hidden Markov Model)
C.3.1
C.3.2
C.4
C.4.1
C.4.2
C.4.3
D
D.1
D.2
D.2.1 (MRF, Markov Random Fields)
D.2.2
D.2.3 (CRF, Conditional Random Fields)
D.3
D.3.1
D.3.2
D.3.3 CRF
D.3.4
D.3.5 CRF
D.4
.
..
- , , . ,
( ) . ,
- . , ,
.
(statistical learning, machine
learning, pattern recognition, ), .
, -
- , . , ,
, .
, .
, (-, , , . . . ), , . (recognition), (recognizer ,
learner ), (learning, training, fitting)
.
.
:
. ,
( , , , ) , , , () . ,
( ..) .
. ,
, , . , .
. . , .
. , (utterance) (,
) , .
, .
. , - , . , .
. .
,
, / . , , .
. (, ) .
, , , , .
. . ,
.
, , (: , cluster analysis). - . , ,
(cluster ), , , ,
, ,
(.. )
, , . , .
(. . . ) , ,
. , , , - , [14, 15].
, , , [114].
[117]. ,
pattern recognition, machine learning (, [15]) statistical learning, (, [50] [113])
, ,
,
-.
1.1
1.1.1
() (feature),
, . X . , d , X d- Rd . , , .
-, ( ) , , ()
, ,
. -,
- (
()
- ) ( )
. -, - ,
, : -, , , ... -,
d .
.
(,
, , -
, wavelet ..)
.
:
(, ) : , 0 1, .
, k > 2 ( , , . . . ) k
,
(derived features): $x^j = 1$ if the value is the $j$-th one and $x^j = 0$ otherwise,
so that $\sum_{j=1}^{k} x^j = 1$.
, - .
, ,
( , ,
(, , ) , )
( ), , ,
.
(, 56- ),
(, 0 10), .
d , ..
Rd , , - ,
, .
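The derived binary features described above can be sketched in Python as follows. This is a minimal illustration that assumes the nominal values are given as a list of categories. The function name `one_hot` and the example colour feature are hypothetical and do not come from the text.

```python
def one_hot(value, categories):
    """Derived binary features for a nominal feature with k = len(categories) values.

    Returns [x^1, ..., x^k] with x^j = 1 if value is the j-th category and 0
    otherwise, so the k derived features always sum to 1.
    """
    if value not in categories:
        raise ValueError(f"unknown category: {value!r}")
    return [1 if value == c else 0 for c in categories]


colors = ["red", "green", "blue"]  # hypothetical nominal feature, k = 3
print(one_hot("green", colors))    # [0, 1, 0]
```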
(response) , (
) , , .. () Y, , q-
Rq . ,
f : Rd Rq , . , , ,
. , , ,
.
1.1.2
( )
.
, - .
() , q , , j- - j-
. , , ,
, q- (confidence) .
, 1 .
, ,
( 0 1,
1).
, , ( ),
.
- . , .
, , 0
, , 31
.
- ,
, , ,
.
( )
.
: , .
, ,
: . ( )
.
( ),
: , .
,
, , . :
,
, ,
, .
, , 1 ,
. 2.2.2.
. , , .
, ., , 4.2.7.
, ( ) ( ): ,
(, ) - . , , - ,
,
- .
. , ,
- , , , ()
.
1.1.3
, - .
X , Y ,
N , .. $T = ((x_1, y_1), \dots, (x_N, y_N)) \in (X \times Y)^N$ .
x yi X xi T .
: ,
.
- . :
;
Y , , R ( ), ;
Y ,
, {0, 1} ( ), .
, ,
, (NN , nearest neighbor ) k (k-NN ) k > 1.
k = 1 T ,
. k > 1 , k, , . ,
. N
, .
,
. k
. , ,
. (, ), .
[75], .
, (, [49]).
1.2.2 ,
, . -
. , , .
, X (, 40)
- (, 20-) ,
.
(curse of dimensionality).
, , , k :
k. - , -
. , ?
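Below is a minimal sketch of the nearest-neighbour rule and its k-NN generalisation discussed above. It assumes Euclidean distance and a majority vote. The function name `knn_predict` and the toy sample are illustrative assumptions.

```python
import math
from collections import Counter


def knn_predict(train, x, k=1):
    """k-nearest-neighbour rule: majority vote among the k training points closest to x.

    train: list of (feature_vector, label) pairs; the distance is Euclidean.
    k = 1 gives the nearest-neighbour (NN) rule. Vote ties are broken by the
    order in which labels first appear among the distance-sorted neighbours.
    """
    neighbours = sorted(train, key=lambda pair: math.dist(pair[0], x))[:k]
    votes = Counter(label for _, label in neighbours)
    return votes.most_common(1)[0][0]


T = [((0, 0), 0), ((0, 1), 0), ((1, 0), 0), ((5, 5), 1), ((5, 6), 1), ((6, 5), 1)]
print(knn_predict(T, (0.2, 0.3), k=3))  # 0
```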
1.1.4
-.
(., ,
[115]). 1.2.4 1.2.6
: 2 , .
:
() X ,
, , d- Rd ;
Y, , , q- Rq ;
F f : X Y, , X Y, , ,
, ..;
P ( ) X Y, , X Y,
, /
, .., - ;
2 , , .., .
,
.
p (x, y)
p (x, y)
=R
.
p (x)
p (x, y 0 )dy 0
y 0 Y
(1)
X , Y, F, P, E T f F, ( , ,
,. . . )4
Z
E (f ) =
E(f (x), y, x)d(x, y) min,
(2)
f F
(x,y)X Y
- , ,
, .. , > 0
({(x, y) X Y|E(f (x), y, x) > }) < .
(3)
, , , , T . ,
T , - (2)
Z
E(f (x), y, x)d(x, y) E(f, T ) =
(x,y)X Y
N
1 X
E(f (xi ), yi , xi ) ,
N i=1
(4)
(training
error ), ... E (f )
$$E(f, T) = \frac{1}{N}\sum_{i=1}^{N} E(f(x_i), y_i, x_i) \to \min_{f\in F} \qquad (5)$$
3
, , .
4 E (,
E rror) ,
.
E
F.
, , (5) T , -
(4) f , T . (2) ,
$T' = ((x'_1, y'_1), \dots, (x'_{N'}, y'_{N'}))$, ,
( ,
test error ) $E(f, T')$. , $E(f, T')$ E (f )
. : $E(f, T') > E(f, T)$,
. ?. . .
, , , , .
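The training error (5) and the test error on an independent sample T' are the same average, computed on different data. The sketch below is minimal and assumes a loss that depends only on r and y (the x argument of E(r, y, x) is dropped). The classifier and both samples are hypothetical.

```python
def empirical_risk(f, sample, loss):
    """Average loss (1/N) * sum_i E(f(x_i), y_i) of the decision rule f on a sample."""
    return sum(loss(f(x), y) for x, y in sample) / len(sample)


def zero_one(r, y):
    """0-1 loss: 1 for a wrong answer, 0 for a correct one."""
    return 0.0 if r == y else 1.0


f = lambda x: 1 if x > 0 else 0               # a fixed, hypothetical classifier
T = [(-1, 0), (2, 1), (3, 0), (-2, 0)]        # training sample
T_test = [(1, 1), (-3, 0), (4, 0), (0.5, 0)]  # independent test sample T'
print(empirical_risk(f, T, zero_one))       # 0.25 (training error)
print(empirical_risk(f, T_test, zero_one))  # 0.5  (test error)
```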
1.1.5
...
...
return ( r[l] )
x , t , r , ?? . , ,
, .
f : X Y ,
, - , , ,
.
, ,
, , . , , XVIII–XIX ,
, .. ,
.
1960- 1980- (CART [20], C4.5
[92]). ,
, (..
) .
, . 5.
()
,
, (,
) (5). ,
( , ), <.
T , .. f1 (x) = r,
$$E(f_1, T) = \sum_{i=1}^{N} E(f_1(x_i), y_i). \qquad (6)$$
. ,
() r
$$r = \frac{1}{N}\sum_{i=1}^{N} y_i,$$
0-1- () .
X
$x^j < x^j_i$ $x^j \ge x^j_i$, $j = 1, \dots, d$, $i = 1, \dots, N$ ,
,
(
, stump), .. - r< r . (6),
, .
d(N 1) , .
, . , , , .
:
,
,
;
;
.
, .
, 1.1.6. , .
- k > 2
k .
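An exhaustive search over the d(N − 1) candidate splits described above might look like the sketch below. It assumes squared loss, so each leaf predicts the mean response of its points. The name `fit_stump` and the return format are illustrative, not the author's.

```python
def fit_stump(X, y):
    """Exhaustive search for a decision stump (depth-1 tree) under squared loss.

    X: list of N feature vectors of length d; y: list of N real responses.
    Candidate splits are x^j < t versus x^j >= t, with t running over the observed
    values of x^j (at most d(N - 1) distinct splits). Each leaf predicts the mean
    of its responses, which minimises the squared loss inside that leaf.
    Returns (j, t, r_left, r_right).
    """
    def sse_and_mean(values):
        mean = sum(values) / len(values)
        return sum((v - mean) ** 2 for v in values), mean

    best = None
    for j in range(len(X[0])):
        # the smallest value would leave the left leaf empty, so it is skipped
        for t in sorted(set(x[j] for x in X))[1:]:
            left = [yi for xi, yi in zip(X, y) if xi[j] < t]
            right = [yi for xi, yi in zip(X, y) if xi[j] >= t]
            err_l, r_l = sse_and_mean(left)
            err_r, r_r = sse_and_mean(right)
            if best is None or err_l + err_r < best[0]:
                best = (err_l + err_r, j, t, r_l, r_r)
    if best is None:
        raise ValueError("all points coincide: no split is possible")
    return best[1:]


print(fit_stump([[1], [2], [3], [4]], [0, 0, 10, 10]))  # (0, 3, 0.0, 10.0)
```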
1.1.6
, F,
.
(5) . , , N
X , , f (xi ) = yi
(xi , yi ). , , , . ,
: f , f (xi ) = yi f (x) x .
,
, ( ) .
(overfitting).
, ,
. ,
. , : ,
.
, Y F ( , ), , dim(F) < N dim(Y),
N dim(Y).
, .
, 5 (VC-dimension), .
(structural risk minimization) ( , ).
[114, 113].
5 - ( ) dim
VC (F) F X Y. Y. Y = {0, 1},
, f F .
. - F X n ( , ),
n $x_1, \dots, x_n \in X$ , F $2^n$ .
- . ,
- .
.
F X =
ft (x) = {(x,t)|xt0} (x, t) t
dimVC (F) = 1 (; S
S).
,
, , k
. F, W, .. f (x)=F (w, x)
F : W X Y w W,
,
(5)
$$\sum_{i=1}^{N} E(F(w, x_i), y_i) \to \min_{\|w\| \le C} \qquad (7)$$
C > 0. (
(5)) ,
N1 .
, ,
. , W
( , Rn ,
) , (
, -
{w|(w) C}), (4)
N
X
i=1
(8)
(8) w
F (w, xi ) = yi (. [108]). (w) = kwk
(w) = kwk2 , k k
.
(7)
, ,
kwk2 +
N
X
i=1
(9)
Rd
. F X = ft (x) = (x + t) t [0, 1]
: {0, 1}. ,
[n, n + 1] n-
;
j
k
( ) = 2b c ( b c) mod 2
$\dim_{VC}(F) = \infty$. , $f_t$ 1, 2, . . .
t , ,
.
- .
- , ,
. , - , .
. . .
. . .
kwk +
N
X
i=1
(10)
C > 0 0, ,
.
1
(9) (10) 0 (7)
C 0.
E(F (w, x), y) w x y w
, (7) C > 0
(9) (10) 0.
(7) (
).
.
. w ,
(7) , (9)
.
1.1.7
!!!!
!!!!
, ,
C, , (8) , , (7) (2),
. ( , ..) , .
T ,
(validation
set, ) T 00 , .
, , T ,
T 00 :
.
. , k
k , 1
:
, kwk
k,
.
, -. k - ,
k , k , k-
,
k- , k .
(-),
,
.
N - - -
N (LOO, Leave-OneOut).
, .. ,
, .
, .
1 N
(2) ,
, N 1 .
( )
(
). .
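Below is a sketch of k-fold cross-validation, with leave-one-out as the case k = N. It assumes a generic learner `fit` that returns a predictor. The round-robin fold assignment and the trivial mean-predictor learner are assumptions made to keep the example short.

```python
def k_fold_cv_error(X, y, fit, loss, k=5):
    """k-fold cross-validation estimate of the test error.

    fit(X_train, y_train) must return a predictor g with g(x) -> prediction.
    Points are assigned to folds round-robin. Every point is held out exactly once,
    and the held-out losses are averaged over all N points. k = N gives LOO.
    """
    n = len(X)
    total = 0.0
    for fold in range(k):
        held_out = list(range(fold, n, k))
        held = set(held_out)
        X_tr = [X[i] for i in range(n) if i not in held]
        y_tr = [y[i] for i in range(n) if i not in held]
        g = fit(X_tr, y_tr)
        total += sum(loss(g(X[i]), y[i]) for i in held_out)
    return total / n


def fit_mean(X_tr, y_tr):
    """Trivial learner: predict the mean training response everywhere."""
    m = sum(y_tr) / len(y_tr)
    return lambda x: m


squared = lambda r, yv: (r - yv) ** 2
print(k_fold_cv_error([0, 1, 2, 3], [0.0, 0.0, 0.0, 4.0], fit_mean, squared, k=4))  # LOO estimate, 16/3
```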
1.2
1.1.4
, ( X Y), ,
( P
). , - ( P , , -), ( ),
(). ,
, ,
,
.
1.2.1
, .
, (2)
, , F .
:
x X , y Y
.
P {y|x} .
() 0-1-:
$E(r, y, x) = 1 - \delta_{yr}$ ,
(11)
ab , .. 0 1
.
X Y p (y|x), f E (f ). ,
x , .. y, p (y|x),
, , y
x:
Z
f (x) = arg min
E(r, y, x)dp (y|x)
(12)
rY
yY
rY
(x,y)X Y
=
xX
yY
.
, , ,
p (y) p (x|y), p (y|x)
. , ( ), ,
, ,
.
, , .
x p (y|x)
$$f(x) = \arg\min_{r\in Y}\int_{y\in Y} E(r, y, x)\,p(y|x)\,dy \qquad (14)$$
, (2).
$E(r, y, x) = (r - y)^2$
:
$$0 = \frac{d}{dr}\int_{y\in Y}(r - y)^2\,p(y|x)\,dy = 2\int_{y\in Y}(r - y)\,p(y|x)\,dy = 2\Big(r - \int_{y\in Y} y\,p(y|x)\,dy\Big),$$
$$f(x) = \int_{y\in Y} y\,p(y|x)\,dy, \qquad (15)$$
.. ( ) . , ,
. ,
,
1.2.6.
. - $E(r, y) = |r - y|$.
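For a discrete conditional distribution p(y|x), the pointwise Bayes rule (14) can be checked numerically. Squared loss selects the mean (15), absolute loss selects the median, and the 0-1 loss (11) selects the most probable value. The distribution and the search grid below are hypothetical.

```python
def bayes_decision(ys, probs, loss, candidates):
    """Pointwise Bayes rule: the candidate r minimising sum_y E(r, y) p(y|x).

    ys, probs: a discrete conditional distribution p(y|x) at a fixed x.
    candidates: the finite set of answers r that is searched (a grid here).
    """
    return min(candidates, key=lambda r: sum(loss(r, y) * p for y, p in zip(ys, probs)))


ys, probs = [0.0, 1.0, 10.0], [0.4, 0.3, 0.3]   # hypothetical p(y|x)
grid = [i / 10 for i in range(101)]              # r in {0.0, 0.1, ..., 10.0}
print(bayes_decision(ys, probs, lambda r, y: (r - y) ** 2, grid))      # 3.3, the mean
print(bayes_decision(ys, probs, lambda r, y: abs(r - y), grid))        # 1.0, the median
print(bayes_decision(ys, probs, lambda r, y: 0 if r == y else 1, ys))  # 0.0, the mode
```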
1.2.2
EN
T = ((x1 , y1 ), . . . , (xN , yN )) = ((x1 , . . . , xN ), (y1 , . . . , yN )) = (X, Y)
(16)
,
EB (x) EN (x) .
(17)
(16) (17) x
X, N ( . [27]) $f(t) = 2(1 - t)t$,
$$E_B \le \lim_{N\to\infty} E_N \le 2(1 - E_B)E_B \le 2E_B.$$
k
, , . [27]. ,
.
1.2.3
.
, .. f F, , F.
, F = {F (w, )|w W} (, )
W F : W X Y, ,
F , T . ,
P (,
Rd , ,
),
.. $d + \frac{d(d+1)}{2}$
.
.
, ($X = \mathbb{R}^d$, $Y = \mathbb{R}$, $W = \mathbb{R}\times\mathbb{R}^d$, $F(w, x) = w_0 + \sum_{j=1}^{d} w_j x^j$) .
2.
(. 1.1.3). F,
, , .
: ( 1.1.5)
? ?
,
E(f, T ), E(f, T 0 )
. ( discriminative methods,
, , ,
f ).
, , - .
(generative methods, , ,
).
F
E.
: (x, y), (y|x) (
(1)). , , y x
(y|x), . ,
( 2.2.1).
1.2.4
:
. , P ( ), .. .
P ( ) T
$$p\{T|\theta\} = \prod_{i=1}^{N} p_\theta(x_i, y_i). \qquad (18)$$
P
R
p{T |}d()
0
() = P {|T } = R
p{T |}d()
P
(19)
0 P. p ,
$$p'(\theta) = p\{\theta|T\} = \frac{p\{T|\theta\}\,p_\pi(\theta)}{\int_{P} p\{T|\theta\}\,p_\pi(\theta)\,d\theta}. \qquad (20)$$
. -
, ()!
: 0 () p (x, y)
Z
P,T (Z) =
p (Z)d0 () , Z X Y
(21)
P
Z
p,T (x, y) =
(22)
p0 , x
P,T (Y ) =
P,T ({x} Y )
, Y Y
P,T ({x} Y)
(23)
, ,
p,T (x, y)
,
p (x, y)dy
yY ,T
p,T (y|x) = R
(24)
( (1)) .
, ,
(), y p,T (y|x),
- E(r, y, x):
Z
(25)
E(r, y, x)p,T (y|x)dy min .
rY
yY
()
x y 0 y 00
p,T (y 0 |x)
.
p,T (y 00 |x)
.
,
R
QN
p (x, y 0 ) i=1 p (xi , yi )d()
p,T (y 0 |x)
P
=
.
R
QN
p,T (y 00 |x)
p (x, y 00 ) i=1 p (xi , yi )d()
(26)
. (26) ,
, ,
.
, :
. , . , , , :
, P , X
P P .
, , 6 ,
- .
, . . . , , . . . : ,
, , ,
,
. , .
, P (21)
(22)
0 p (y|x).
. , P , , , p (), P
p (x, y). , ,
,
P () . (22) p (x, y)
() ,
,
.
(MAP , maximum of aposteriori probability)
N
Y
p(T |; ) = p ()
(27)
p (xi , yi ) max .
i=1
X Y
x (y|x) /
, p (y|x), ,
.
. , . ,
? , , ,
, ( noninformative prior ). P , ,
, .
T
6
, , ,
.
$$p(T|\theta) = \prod_{i=1}^{N} p_\theta(x_i, y_i) \to \max_{\theta}, \qquad (28)$$
(x,y)X Y
.
.
,
(ML, MAP ) . 1.2.6.
.
1.2.5
q- d- X = {(x1 , . . . , xd )}, j-
$M^j$ . $D = \prod_{j=1}^{d} M^j$ .
$\theta_{lk}$ l- k- .
$(Dq - 1)$- $\theta_{lk} \ge 0$, $\sum_{l,k}\theta_{lk} = 1$
Dq- . MAP ML
1.2.4 , , ,
.
, D
N . N Dq 1 , - ,
.
D . ,
$d = 60$ , $D = 2^d = 2^{60} \approx 10^{18}$
(q = 2).
, P. , , , .. y
p((x1 , . . . , xd )|y)
$$p((x^1, \dots, x^d)|y) = \prod_{j=1}^{d} p(x^j|y). \qquad (30)$$
.
q $\theta_k = p(y = k)$ $qD'$ , $D' = \sum_{j=1}^{d} M^j$ , $\theta^j_{mk} = p(x^j = m \mid y = k)$. $((q - 1) + q(D' - d))$-
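$(q - 1)$- ( )
$$s^q = \Big\{(\theta_1, \dots, \theta_q) \in \mathbb{R}^q_+ \;\Big|\; \sum_{k=1}^{q} \theta_k = 1\Big\},$$
$qd$ $s^{M^j}_{jk}$
$$s^{M^j}_{jk} = \Big\{(\theta^j_{1k}, \dots, \theta^j_{M^j k}) \in \mathbb{R}^{M^j}_+ \;\Big|\; \sum_{m=1}^{M^j} \theta^j_{mk} = 1\Big\}.$$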
60
121.
1.2.4, (19) (23) .
.
.
S d (r) sd (r) d- (d 1)-
d- ,
, :
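$$S^d(r) = \Big\{x \in \mathbb{R}^d_+ \;\Big|\; \sum_{j=1}^{d} x_j \le r\Big\}, \qquad s^d(r) = \Big\{x \in \mathbb{R}^d_+ \;\Big|\; \sum_{j=1}^{d} x_j = r\Big\}.$$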
ES (f )
f S Rd dim(S) (
f ).
1 $x_1^{n_1}\cdots x_d^{n_d}$ $s^d(1)$,
$$E_{s^d(1)}\big(x_1^{n_1}\cdots x_d^{n_d}\big) = \frac{(d-1)!\,\prod_{j=1}^{d} n_j!}{\big(\sum_{j=1}^{d} n_j + (d-1)\big)!}.$$
. : :
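1.
$$v(S^d(r)) = \int_{x\in S^d(r)} dx = \frac{r^d}{d!};$$
2.
$$E_{s^d(r)}\big(f(x_1, \dots, x_{d-1})\,g(x_d)\big) = \frac{\int_{x\in S^{d-1}(r)} f(x_1, \dots, x_{d-1})\,g\big(r - \sum_{j=1}^{d-1} x_j\big)\,dx}{v(S^{d-1}(r))};$$
3. :
$$\int_{x\in S^1(r)} x_1^{n_1}\,dx = \frac{r^{n_1+1}}{n_1+1} = \frac{r^{n_1+1}\,n_1!}{(n_1+1)!};$$
4. :
$$\int_{x\in S^1(r)} x_1^{n_1}(r - x_1)^{n_2}\,dx = \frac{r^{n_1+n_2+1}\,n_1!\,n_2!}{(n_1+n_2+1)!};$$
5. :
$$\int_{x\in S^{d-1}(r)} \prod_{j=1}^{d-1} x_j^{n_j}\Big(r - \sum_{j=1}^{d-1} x_j\Big)^{n_d} dx = \frac{r^{\sum_{j=1}^{d} n_j + (d-1)}\,\prod_{j=1}^{d} n_j!}{\big(\sum_{j=1}^{d} n_j + (d-1)\big)!}.$$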
1, 2 5 sd (1), .
.
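The formula of the lemma can be cross-checked numerically. The sketch below compares the closed form with a Monte Carlo average. It relies on a standard fact that the text does not state: normalised i.i.d. exponential variables are uniformly distributed on the simplex.

```python
import math
import random


def simplex_moment_exact(n):
    """Closed form of the lemma: the average of x_1^n_1 ... x_d^n_d over s^d(1)
    equals (d-1)! * prod(n_j!) / (sum(n_j) + d - 1)!."""
    d = len(n)
    numerator = math.factorial(d - 1) * math.prod(math.factorial(k) for k in n)
    return numerator / math.factorial(sum(n) + d - 1)


def simplex_moment_mc(n, samples=200_000, seed=0):
    """Monte Carlo estimate of the same average for x uniform on s^d(1).

    Normalising d i.i.d. Exp(1) variables gives a point uniformly distributed
    on the simplex (the flat Dirichlet distribution).
    """
    rng = random.Random(seed)
    total = 0.0
    for _ in range(samples):
        e = [rng.expovariate(1.0) for _ in n]
        s = sum(e)
        total += math.prod((ej / s) ** k for ej, k in zip(e, n))
    return total / samples


print(simplex_moment_exact((1, 1, 0)))  # 1/12
print(simplex_moment_mc((1, 1, 0)))     # close to 1/12
```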
T = ((x1 , y1 ), . . . , (xN , yN )).
$$N_k = \#\{i \mid y_i = k\}, \quad k = 1, \dots, q, \qquad (31)$$
$$N^j_{mk} = \#\{i \mid x^j_i = m,\ y_i = k\}, \quad j = 1, \dots, d,\ m = 1, \dots, M^j,\ k = 1, \dots, q. \qquad (32)$$
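$$p(T|\theta) = \prod_{i=1}^{N} p((x_i, y_i)|\theta) = \prod_{i=1}^{N} p(y_i|\theta)\prod_{j=1}^{d} p(x^j_i|y_i, \theta) = \prod_{k=1}^{q}\Big[(\theta_k)^{N_k}\prod_{j=1}^{d}\prod_{m=1}^{M^j}(\theta^j_{mk})^{N^j_{mk}}\Big]. \qquad (33)$$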
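1 $s^q$ $s^{M^j}_{jk}$ , ,
$$\sum_{k=1}^{q} N_k = N, \qquad \sum_{m=1}^{M^j} N^j_{mk} = N_k,$$
, , (20)
$$p(\theta|T) = \frac{(N+q-1)!}{(q-1)!\,\prod_{k=1}^{q} N_k!}\prod_{k=1}^{q}(\theta_k)^{N_k}\;\prod_{k=1}^{q}\prod_{j=1}^{d}\frac{(N_k+M^j-1)!}{(M^j-1)!\,\prod_{m=1}^{M^j} N^j_{mk}!}\prod_{m=1}^{M^j}(\theta^j_{mk})^{N^j_{mk}} \qquad (34)$$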
,
.
x y
$$P_T(x, y) = \int_{P} P_\theta(x, y)\,p(\theta|T)\,d\theta = \int_{P}\prod_{k=1}^{q}(\theta_k)^{\delta_{yk}}\prod_{j=1}^{d}\prod_{m=1}^{M^j}\prod_{k=1}^{q}(\theta^j_{mk})^{\delta_{yk}\delta_{x^j m}}\;p(\theta|T)\,d\theta$$
$$= \frac{N_y + 1}{N + q}\prod_{j=1}^{d}\frac{N^j_{x^j y} + 1}{N_y + M^j}, \qquad (35)$$
ab ;
1 .
:
$$P_T(y|x) = \frac{P_T(x, y)}{\sum_{k=1}^{q} P_T(x, k)} = \frac{(N_y + 1)\prod_{j=1}^{d}\frac{N^j_{x^j y} + 1}{N_y + M^j}}{\sum_{k=1}^{q}(N_k + 1)\prod_{j=1}^{d}\frac{N^j_{x^j k} + 1}{N_k + M^j}} \qquad (36)$$
PT (x, y), .
. (35) , y 0 y 00
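$$\Delta_T(y', y''|x) = \ln\frac{P_T(x, y')}{P_T(x, y'')} = \ln(N_{y'} + 1) - \ln(N_{y''} + 1) + \sum_{j=1}^{d}\ln\frac{N^j_{x^j y'} + 1}{N_{y'} + M^j} - \sum_{j=1}^{d}\ln\frac{N^j_{x^j y''} + 1}{N_{y''} + M^j}$$
$$= \ln(N_{y'} + 1) - \sum_{j=1}^{d}\ln(N_{y'} + M^j) - \ln(N_{y''} + 1) + \sum_{j=1}^{d}\ln(N_{y''} + M^j) + \sum_{j=1}^{d}\Big(\ln(N^j_{x^j y'} + 1) - \ln(N^j_{x^j y''} + 1)\Big) \qquad (37)$$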
(37) y 0 y 00 x. , x
y 0 , y 00 , - .
T (y 0 , y 00 |x)
y 00 y 0 (36)
(!), , , : y 0 , T (y 0 , y 00 |x)
( !).
. (37) , x, d
, . .
O
,
, .
2 $p(x) = x_1^{n_1}\cdots x_d^{n_d}$ $s^d(1)$,
!
$$x = \Big(\frac{n_1}{\sum_{j=1}^{d} n_j}, \dots, \frac{n_d}{\sum_{j=1}^{d} n_j}\Big)$$
.
.
, ,
nj > 0, xj > 0. , nj = 0, xj = 0 (
x
j
1
j
j
), 1x
x 1x e , e j-
j
j
, , x . , ,
, , nj > 0
.
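$L(x, \lambda) = p(x) + \lambda\big(1 - \sum_{j=1}^{d} x_j\big)$
, x
$$0 = \frac{\partial L}{\partial x_j} = \frac{n_j\,p(x)}{x_j} - \lambda, \quad j = 1, \dots, d; \qquad 0 = \frac{\partial L}{\partial\lambda} = 1 - \sum_{j=1}^{d} x_j.$$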
j- d xj , j ,
$$\lambda = p(x)\sum_{j=1}^{d} n_j.$$
d , .
,
(27) ,
p(T |). 2 (33),
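$$\theta_k = \frac{N_k}{N}, \qquad \theta^j_{mk} = \frac{N^j_{mk}}{N_k}.$$
, (x, y)
$$P_T(x, y) = \frac{N_y}{N}\prod_{j=1}^{d}\frac{N^j_{x^j y}}{N_y} = \frac{\prod_{j=1}^{d} N^j_{x^j y}}{(N_y)^{d-1}\,N}, \qquad (38)$$
$$P_T(y|x) = \frac{(N_y)^{1-d}\prod_{j=1}^{d} N^j_{x^j y}}{\sum_{k=1}^{q}(N_k)^{1-d}\prod_{j=1}^{d} N^j_{x^j k}} \qquad (39)$$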
$$\Delta_T(y', y''|x) = \ln\frac{P_T(x, y')}{P_T(x, y'')} = (1 - d)\big(\ln N_{y'} - \ln N_{y''}\big) + \sum_{j=1}^{d}\big(\ln N^j_{x^j y'} - \ln N^j_{x^j y''}\big) \qquad (40)$$
(35) (38) ,
: , , .
j
, Nmk
,
, - : , , ,
. , ,
( ) ,
.
, , ,
,
q
d Y
Mj
Y
Y
k
m
j
(k )n
p()
(41)
(mk
)njk ,
k=1
j=1 m=1
, 1
. , 2 , , 1
-7 ,
, , .
.
, (41).
. :
j
T Nk Nmk
(3132);
()
p(y|x) Y, , , ,
, .
$\Gamma(x) = \int_0^\infty t^{x-1}e^{-t}\,dt$ x > 0.
$\Gamma(x) = (x - 1)!$ x $\Gamma(x + 1) = x\Gamma(x)$ .
8 ,
.
7 :
(., , [52],
): F Y (, R)
, PY
( ) ( ), . 1.1.2.
F .
, ,
(, belief ) ,
-
. , f (x) Y, f F x X , ( )
y Y (.. ) f (x). -
.
(., , [14]) , F, , , pf (y|x),
f F , x X y Y, p(x, y) = p(y|x)p(x) P. ,
- . , ( )
X ,
. p(x) X
; ,
x1 , . . . , xN .
[ ],
().
,
F f (x) Y p pf (x) . f , X=(x1 , . . . , xN ) T = ((x1 , y1 ), . . . , (xN , yN ))
Y=(y1 , . . . , yN ) [.. Y X f ],
$$p(\mathbf Y|\mathbf X, f) = \prod_{i=1}^{N} p_{f(x_i)}(y_i), \qquad (42)$$
f [ ]
f F
f [ f ]
T
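$$p(f|T) = p(f|\mathbf X, \mathbf Y) = \frac{p(\mathbf Y, f|\mathbf X)}{p(\mathbf Y|\mathbf X)} = \frac{p(\mathbf Y|\mathbf X, f)\,p(f)}{\int_{f\in F} p(\mathbf Y|\mathbf X, f)\,p(f)\,df}. \qquad (43)$$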
(43)
: [
], . 1.2.4. []
$$f_T = \int_{f\in F} f\,p(f|T)\,df$$
( fT F, ), , , , , f (x) , . [] .
(43) f
$$p(\mathbf Y, f|\mathbf X) = p(f)\prod_{i=1}^{N} p_{f(x_i)}(y_i) \to \max_{f\in F} \qquad (44)$$
ln(p(Y, f |X)) = ln(p(f ))
N
X
i=1
(45)
N
X
(46)
f F
i=1
.. (8)
. , E
Y Y, - PY Y.
(42),
[], (4) .
1.2.7
, (Y = R), X , : X Rd .
d+1- F (
)
2
12 (yw(x))
2s
(47)
s Pd
(x) w(x) = j=1 wj j (x). , ,
,
.
T (42),
$$-\ln p(\mathbf Y, w, s|\mathbf X) = \frac{N\ln(2\pi)}{2} + \frac{N\ln s}{2} + \frac{1}{2s}\sum_{i=1}^{N}(y_i - w(x_i))^2 \to \min_{w, s},$$
, w
N
X
i=1
(48)
( 9
, xi , j (x)) s
$$\frac{N\ln s}{2} + \frac{1}{2s}\sum_{i=1}^{N}(y_i - w(x_i))^2 \to \min_{s},$$
$$s = \frac{1}{N}\sum_{i=1}^{N}(y_i - w(x_i))^2.$$
x y (47) , .. w(x), s.
, s (, s = 2008), ,
s = 2008 .
1.2.8
PY Y = {1, . . . , q} (q 1)-
$$\Big\{(y^1, \dots, y^q) \in \mathbb{R}^q_+ \;\Big|\; \sum_{j=1}^{q} y^j = 1\Big\},$$
F
$X \to \mathbb{R}^q$, ($f^j \ge 0$, $\sum_{j=1}^{q} f^j = 1$), $f^j(x) = p(j|x, f)$.
.
Y,
PY , , k 7 (k1 , . . . , kq ). p(y|x, f )
$$p(y|x, f) = \prod_{j=1}^{q}(f^j(x))^{y^j} \qquad (49)$$
(49) y PY ,
, , ( ,
). (49)
$$E(f(x), y) = -\ln p(y|x, f) = -\sum_{j=1}^{q} y^j\ln f^j(x) \qquad (50)$$
9
, 2.1;
.
q
N X
X
(f )
yij ln(f j (xi )) min ,
(51) ,
f F
i=1 j=1
- ,
$$-\sum_{i=1}^{N}\sum_{j=1}^{q} y^j_i\ln f^j(x_i) \to \min_{f\in F}. \qquad (52)$$
f () = f 1 () yi = yi1
(51) (52)
(f )
N
X
i=1
N
X
i=1
(53)
(54)
.
. ,
, () , Y = {1, . . . , q}, . : (50)
i=1
, , , (50) (55).
() F
, , 2.2.2.
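The loss (50) and the criterion (52) amount to the familiar cross-entropy. Below is a short sketch that assumes predictions are given as lists of class probabilities and targets as one-hot vectors.

```python
import math


def cross_entropy(f_x, y):
    """E(f(x), y) = -sum_j y^j ln f^j(x), as in (50).

    f_x: predicted probabilities (f^1(x), ..., f^q(x)); y: one-hot target vector.
    Terms with y^j = 0 are skipped, so f^j(x) = 0 is allowed for those classes.
    """
    return -sum(yj * math.log(pj) for pj, yj in zip(f_x, y) if yj > 0)


def sample_cross_entropy(predictions, targets):
    """The criterion (52): cross-entropy summed over the sample."""
    return sum(cross_entropy(p, t) for p, t in zip(predictions, targets))


print(cross_entropy([0.5, 0.25, 0.25], [0, 1, 0]))  # ln 4
```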
1.3
, .
. ,
.
1.3.1
P ( 1.2.4),
.
p (x, y)
Z
p (x, y) = p (x)p (y|x), p (x) =
p (x, y)dy
yY
X X X N
. 1.3.1
, .
()
1.2.5 p(y) : . ,
, {0} Y
T = ((0, y1 ), . . . , (0, yN )).
(outlier detection), :
, ( ) .
, , .
:
F {f : X {0, 1}} T -
.
4.2.7.
: , .
1.3.3
.
N , .. .
(), X q
, , , , . ,
- , , , . ,
, , f : X Y = {1, . . . , q},
, , , Y.
. ,
, , , ,
,
- .
, . , ( : N q ), N q
, ( q!) . , ,
,
.
. ,
, ,
f : X Rq , x
arg maxj fj (x) ( ), ,
maxj fj (x).
( 1.3.2).
: , $f_j(x) \ge 0$ $\sum_j f_j(x) = 1$ X
$$\max_{(j_1, \dots, j_N)\in Y^N}\prod_{i=1}^{N} f_{j_i}(x_i) = \prod_{i=1}^{N}\max_{j\in Y} f_j(x_i) \to \max_{f\in F}. \qquad (57)$$
- , , -, , .
.
.
,
, .
,
..
(57) . ,
.
, - , ,
.
.
, , , .
, : , , 35
, .
, . ,
(: , , , ,. . . )? ,
- (: ,
, , ,. . . )?
q-
( 1.2.1) d- X , : $p(y) \equiv \frac{1}{q}$ , $p(x|y)$
$$p(x|j) = (2\pi)^{-\frac{d}{2}}\,e^{-\frac{1}{2}\|x - \mu_j\|^2}, \quad 1 \le j \le q.$$
q 1 , . . . , q X .
( (28)) X = (x1 , . . . , xN ) - y1 , . . . , yN
,
$$\prod_{i=1}^{N} p(x_i, y_i) = \frac{1}{q^N}\prod_{i=1}^{N} p(x_i|y_i) \to \max,$$
..
$$\sum_{i=1}^{N}\|x_i - \mu_{y_i}\|^2 \to \min. \qquad (58)$$
, :
. ,
$$\mu_j = \frac{\sum_{i|y_i=j} x_i}{\#\{i \mid y_i = j\}}$$
(58).
, j
( ) j- .
xi j, p(xi |j) , .. kxi j k . , ,
.., , .. .
1.3.4 . k-means (k ), k
k-means
(, q), a means . 25
, (Stuart Lloyd)
: [79]. , k-means ,
([5]). ,
.
k-means ,
,
.
k , , (8), - k
(57). ,
k, .
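Lloyd's iteration for the k-means criterion (58) alternates the two steps described above. This sketch assumes that k distinct data points chosen at random serve as initial centres, since the text does not fix an initialisation. A centre is left unchanged if its cluster becomes empty.

```python
import math
import random


def k_means(points, k, iters=100, seed=0):
    """Lloyd's iteration for k-means: alternate assignment and mean update.

    points: list of tuples in R^d. Returns (centres, labels).
    """
    rng = random.Random(seed)
    centres = rng.sample(points, k)           # initialisation: an assumption
    labels = None
    for _ in range(iters):
        # assignment step: each point goes to its nearest centre
        new_labels = [min(range(k), key=lambda j: math.dist(p, centres[j])) for p in points]
        if new_labels == labels:
            break                             # assignments stopped changing
        labels = new_labels
        # update step: each centre becomes the mean of its points
        for j in range(k):
            members = [p for p, l in zip(points, labels) if l == j]
            if members:
                centres[j] = tuple(sum(c) / len(members) for c in zip(*members))
    return centres, labels


pts = [(0, 0), (0, 1), (1, 0), (10, 10), (10, 11), (11, 10)]
print(k_means(pts, 2))
```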
1.3.4
. f : X Y
c : Y X , y c(y), . ,
.
(vector quantization). . , , ,
X . ()
, . ()
.
,
( 1.1.4). :
() X , -
, d- Rd ;
Y, , ;
F f : X Y, ,
X Y;
C c : Y X , , Y X ;
P ( ) X , ,
X ,
, ..;
E : X X R, , 0
, ,
X E(x, s) = kx sk2 ;
X= (x1 , . . . , xN ), xi X ,
P.
X , Y, F, C, P, E X f F
c C, E (f )
Z
E (f, c) =
E(c(f (x)), x)d(x) min ,
(59)
f F ,cC
xX
P. ,
N
1 X
E (f, c) E(f, c, X) =
E(c(f (xi )), xi ) min ,
f F ,cC
N i=1
(60)
, ,
X Y ( 1.1.4). ,
X X 0 Y0 Y ,
X Y , X 0 Y 0 (, ,
, , ,
, ), , f ()
F, (-,
, ) X 0 Y 0 . ,
f (x) = wx f (x) = wx + b, .
, , Y 0 = Y = R . Y, ,
, - , ,
Y = {0, 1}
- (t) = 21 (1 + sign(t)), (0) = 12 .
, Y , , . 2.2.2.
2.1 2.2 , , X = X 0 : X X 0
.
2.3.
20- ,
( ).
. 21-
:
P
, ,
, ;
X
, (4),
;
X ,
,
( ) X 0 , ,
.
, 1, .
[14, 50] [98].
.
f (x) = wx f (x) = wx + b. , ,
- X 0 1 x = (x1 , . . . , xd ) x0 = 1, w
w0 = b. X .
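$$\tilde x = \begin{pmatrix} 1\\ x \end{pmatrix} \in \mathbb{R}\times X = X'. \qquad (61)$$
$w$ $b$ $X'$ $Y'$ ,
$$\tilde w = (b\ \ w) : \mathbb{R}\times X = X' \to Y'. \qquad (62)$$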
Y 0 (..
- )
$f(x) = \tilde w\tilde x$ $f(x) = (\tilde w, \tilde x)$
$f(x) = b + (w, x) = w_0 + (w, x)$.
2.1
2.1.1
(48)
$$E(\tilde w, T) = \sum_{i=1}^{N}(y_i - \tilde w\tilde x_i)^2 \to \min_{\tilde w}. \qquad (63)$$
E(w,
T ) w
( ), , .
w.
(63)
$$\frac{\partial E(\tilde w, T)}{\partial\tilde w} = 0.$$
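$$\tilde{\mathbf X} = (\tilde x_1, \dots, \tilde x_N) = \begin{pmatrix} 1 & \cdots & 1\\ x^1_1 & \cdots & x^1_N\\ \vdots & \ddots & \vdots\\ x^d_1 & \cdots & x^d_N \end{pmatrix}, \qquad \mathbf y = (y_1, \dots, y_N)$$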
10
. Y
$\tilde w$
(-)
$\tilde w = (w_0, w_1, \dots, w_d)$.
(63)
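$$E(\tilde w, T) = (\mathbf y - \tilde w\tilde{\mathbf X})(\mathbf y - \tilde w\tilde{\mathbf X})^\top \to \min_{\tilde w}, \qquad (64)$$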
10 , , . ,
, , .
.
11
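$$0 = \frac{\partial E(\tilde w, T)}{\partial\tilde w} = -2\tilde{\mathbf X}(\mathbf y - \tilde w\tilde{\mathbf X})^\top, \qquad (65)$$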
$$\tilde w\tilde{\mathbf X}\tilde{\mathbf X}^\top = \mathbf y\tilde{\mathbf X}^\top. \qquad (66)$$
1
>
N XX
>
XX yX , ,
Z
!
1
>
N yX ,
xk xl d(x, y)
(x,y)X Y
1k,ld
!
k
yx d(x, y)
(x,y)X Y
1kd
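, . $\tilde{\mathbf X}\tilde{\mathbf X}^\top$
, (63)
$$\tilde w = \mathbf y\tilde{\mathbf X}^\top(\tilde{\mathbf X}\tilde{\mathbf X}^\top)^{-1} \qquad (67)$$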
, , . (66) w
, (67).
q- Y $E(r, y) = A(r - y, r - y)$
A (
).
:
$$E(r, y) = (r - y)^\top A(r - y) = \operatorname{tr} A(r - y)(r - y)^\top.$$
(64)
$$E(\tilde w, T) = \operatorname{tr} A(\tilde w\tilde{\mathbf X} - \mathbf y)(\tilde w\tilde{\mathbf X} - \mathbf y)^\top,$$
w
$$A(\tilde w\tilde{\mathbf X} - \mathbf y)\tilde{\mathbf X}^\top = 0,$$
y q-:
$$\mathbf y = (y_1, \dots, y_N) = \begin{pmatrix} y^1_1 & \cdots & y^1_N\\ \vdots & \ddots & \vdots\\ y^q_1 & \cdots & y^q_N \end{pmatrix}.$$
A
(66).
. $\sum_{i=1}^{N} x_i = 0$ $\sum_{i=1}^{N} y_i = 0$,
(63) $w_0 = 0$.
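The normal equations (66) can be solved without external libraries, as in the sketch below. It assumes that X̃X̃ᵀ is non-degenerate, since otherwise (67) does not apply, and it uses plain Gaussian elimination. The helper names `solve` and `least_squares` are illustrative.

```python
def solve(A, b):
    """Solve A z = b (A square) by Gaussian elimination with partial pivoting."""
    n = len(A)
    M = [list(row) + [bi] for row, bi in zip(A, b)]
    for col in range(n):
        piv = max(range(col, n), key=lambda r: abs(M[r][col]))
        if abs(M[piv][col]) < 1e-12:
            raise ValueError("matrix is (numerically) singular")
        M[col], M[piv] = M[piv], M[col]
        for r in range(col + 1, n):
            factor = M[r][col] / M[col][col]
            for c in range(col, n + 1):
                M[r][c] -= factor * M[col][c]
    z = [0.0] * n
    for r in range(n - 1, -1, -1):
        z[r] = (M[r][n] - sum(M[r][c] * z[c] for c in range(r + 1, n))) / M[r][r]
    return z


def least_squares(X, y):
    """w~ solving the normal equations (66): w~ X~ X~^T = y X~^T.

    Each x is extended to x~ = (1, x), so w~ = (w_0, w_1, ..., w_d) with w_0 = b.
    Since X~ X~^T is symmetric, (66) is the linear system (X~ X~^T) w~^T = X~ y^T.
    """
    Xt = [[1.0] + list(x) for x in X]
    n = len(Xt[0])
    G = [[sum(xi[k] * xi[l] for xi in Xt) for l in range(n)] for k in range(n)]
    rhs = [sum(xi[k] * yi for xi, yi in zip(Xt, y)) for k in range(n)]
    return solve(G, rhs)


print(least_squares([[0], [1], [2], [3]], [1, 3, 5, 7]))  # approximately [1.0, 2.0]
```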
11 , (= ),
- .
2.1.2
( (8), (46))
(kwk)
+
N
X
(yi w
xi )2 min,
w
i=1
(68)
kk , , lp -,
.. p- p- p > 0
p = 0. . ,
( ), ( ), ,
.
, , , , , 12 .
(k k), (68) , , - .
.
l2 - kwk
22 kwk22 ( , ridge regression
[56])
$$p_w(y|x) = (2\pi)^{-\frac{1}{2}}\,e^{-\frac{1}{2}(y - w(x))^2} \qquad (69)$$
(. (47)) 0
1 .
$$\lambda\|w\|_2^2 + \sum_{i=1}^{N}(y_i - \tilde w\tilde x_i)^2 \to \min_{\tilde w}, \qquad (70)$$
(63), ( (63) ) .
(63),
$$\tilde w = \mathbf y\tilde{\mathbf X}^\top(\lambda I_{d+1} + \tilde{\mathbf X}\tilde{\mathbf X}^\top)^{-1} \qquad (71)$$
(. (67)), In Rn , .. n n. ,
, w0
(62), .
.
$I'_n = \begin{pmatrix} 0 & 0\\ 0 & I_{n-1} \end{pmatrix}$ $(n-1)$- $\mathbb{R}^n$ ,
, .
12 ,
, .
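$$\lambda\|w\|_2^2 + \sum_{i=1}^{N}(y_i - \tilde w\tilde x_i)^2 = \lambda\|\tilde w I'_{d+1}\|_2^2 + \sum_{i=1}^{N}(y_i - \tilde w\tilde x_i)^2 \to \min_{\tilde w} \qquad (72)$$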
(70) (71) :
$$\tilde w = \mathbf y\tilde{\mathbf X}^\top(\lambda I'_{d+1} + \tilde{\mathbf X}\tilde{\mathbf X}^\top)^{-1}.$$
.
, $\lambda I'_{d+1} + \tilde{\mathbf X}\tilde{\mathbf X}^\top$ .
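Compared with (67), the regularised solution only adds λ to the diagonal of X̃X̃ᵀ, skipping the bias position as prescribed by I'_{d+1}. The sketch below makes the same assumptions as a plain least-squares solver. The parameter name `lam` stands for λ.

```python
def solve(A, b):
    """Solve A z = b (A square) by Gaussian elimination with partial pivoting."""
    n = len(A)
    M = [list(row) + [bi] for row, bi in zip(A, b)]
    for col in range(n):
        piv = max(range(col, n), key=lambda r: abs(M[r][col]))
        if abs(M[piv][col]) < 1e-12:
            raise ValueError("matrix is (numerically) singular")
        M[col], M[piv] = M[piv], M[col]
        for r in range(col + 1, n):
            factor = M[r][col] / M[col][col]
            for c in range(col, n + 1):
                M[r][c] -= factor * M[col][c]
    z = [0.0] * n
    for r in range(n - 1, -1, -1):
        z[r] = (M[r][n] - sum(M[r][c] * z[c] for c in range(r + 1, n))) / M[r][r]
    return z


def ridge(X, y, lam):
    """w~ = y X~^T (lam I'_{d+1} + X~ X~^T)^{-1}: ridge regression with an unpenalised bias.

    I'_{d+1} is the identity with its (0, 0) entry set to zero, so w_0 is not shrunk.
    """
    Xt = [[1.0] + list(x) for x in X]
    n = len(Xt[0])
    G = [[sum(xi[k] * xi[l] for xi in Xt) + (lam if k == l and k > 0 else 0.0)
          for l in range(n)] for k in range(n)]
    rhs = [sum(xi[k] * yi for xi, yi in zip(Xt, y)) for k in range(n)]
    return solve(G, rhs)


X, y = [[0], [1], [2], [3]], [1, 3, 5, 7]
print(ridge(X, y, lam=5.0))  # approximately [2.5, 1.0]: the slope is shrunk, the bias is not penalised
```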
$$\lambda\|w\|_1 + \sum_{i=1}^{N}(y_i - \tilde w\tilde x_i)^2 \to \min_{\tilde w} \qquad (73)$$
, ,
, , . , ( LARS , . [35])
(73) .
(73) , .. w
0. ,
, ,
. , d N .
(73), 0,
$$\sum_{i=1}^{N}(y_i - \tilde w\tilde x_i)^2 \to \min_{\|w\|_1 \le C} \qquad (74)$$
C > 0, ( 1).
kwk1 C (, 2qd , , , ).
( , 3qd 1) ,
C.
w,
, , . (, )
, .. (,
). (74) , .
,
. (C , ),
[Figure 1: drawing not recoverable from the source; surviving labels: LASSO, LARS]
. 1: (74). . l1 -, -
, ,
. ( )
C.
. (74) .
( C) , , (..
, . . 1) ,
.
, lp - , p = 1 . (68)
( , ) p 1,
p 1.
l1 l2 - (elastic net [127])
- lp - . , (68) (w)
$= \lambda_1\|w\|_1 + \lambda_2\|w\|_2^2$, $\lambda_1$ $\lambda_2$ ,
. (68) ,
2 1 ,
, LARS. , 2 kwk22 (73)
0
(x0 , y0 ) = 2 , 0 .
..
. 2
, , ,
. , ,
13 . ,
, , , ,
.
13
.
i=1
(75)
C-
w, .. dq- , , ,
dq . : ,
"leaps and bounds" ([43]) ( 50) . - ,
, .
2.1.3
. , (. . 2):
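$E(r, y) = |y - r|$ ;
$E(r, y) = |y - r|^p$, $p > 0$ ;
([60]) $E(r, y) = \begin{cases} |y - r|^2, & |y - r| \le \delta,\\ \delta(2|y - r| - \delta), & |y - r| > \delta \end{cases}$ ;
15
$\varepsilon$- $E(r, y) = \max(0, |y - r| - \varepsilon)$ ;
$\varepsilon$- $E(r, y) = (\max(0, |y - r| - \varepsilon))^2$ .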
, , ,
.
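The loss functions listed above are easy to compare numerically. The sketch assumes that the threshold parameters δ (Huber) and ε (ε-insensitive losses) are passed as `delta` and `eps`. The default values are arbitrary.

```python
def huber(r, y, delta=1.0):
    """Quadratic |y - r|^2 for |y - r| <= delta, linear delta * (2|y - r| - delta) beyond it."""
    a = abs(y - r)
    return a * a if a <= delta else delta * (2 * a - delta)


def eps_insensitive(r, y, eps=0.1):
    """max(0, |y - r| - eps): deviations smaller than eps cost nothing."""
    return max(0.0, abs(y - r) - eps)


def eps_insensitive_squared(r, y, eps=0.1):
    """(max(0, |y - r| - eps))^2."""
    return max(0.0, abs(y - r) - eps) ** 2


for e in (0.05, 0.5, 3.0):  # residuals y - f(x)
    print(e, e * e, huber(0.0, e), eps_insensitive(0.0, e), eps_insensitive_squared(0.0, e))
```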
:
( );
,
( );
14
elastic net ( ) , , .
15
, 0-1-, .
[Figure 2: plots not recoverable from the source; surviving labels: E, E1, E2, E_H; horizontal axis y − f(x)]
. 2: , . Ep , EH - Ep, .
.
,
( );
,
( );
, 0, , ( ), ,
( );
(
, , ,
).
-
( , )
. (SVR, support vector regression),
4.2.4. d N , , (lasso (.
43)).
2.2
,
P {j|x} = f j (x) x
q , , , ,
. ,
. ,
f j (x), .
f j (x), , .
,
. : (, ) ,
,
().
2.2.1
( )
q- d- X (, , ) ( 1.2.4). , ,
(j-) q
j X ,
A, .
j- j , j > 0 (
j- ). 16
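$$p(x) = \sum_{j=1}^{q} p(x, j) = \sum_{j=1}^{q}\pi_j\,(2\pi)^{-\frac{d}{2}}|A|^{-\frac{1}{2}}\,e^{-\frac{1}{2}(x - \mu_j)^\top A^{-1}(x - \mu_j)} \qquad (76)$$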
P {y|x}, ,
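$$P\{y|x\} = \frac{p(x, y)}{p(x)} = \frac{\pi_y\,e^{-\frac{1}{2}(x - \mu_y)^\top A^{-1}(x - \mu_y)}}{\sum_{j=1}^{q}\pi_j\,e^{-\frac{1}{2}(x - \mu_j)^\top A^{-1}(x - \mu_j)}} = \frac{e^{\mu_y^\top A^{-1}x - \frac{1}{2}\mu_y^\top A^{-1}\mu_y + \ln\pi_y}}{\sum_{j=1}^{q} e^{\mu_j^\top A^{-1}x - \frac{1}{2}\mu_j^\top A^{-1}\mu_j + \ln\pi_j}} \qquad (77)$$
$$P\{y|x\} = \frac{e^{\tilde w_y\tilde x}}{\sum_{j=1}^{q} e^{\tilde w_j\tilde x}} \qquad (78)$$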
w
: X Rq .
j , j 1 j q A
T .
, ..
ln(p(T )) =
N
X
ln(p(xi , yi )) min ,
i=1
,,A
p(xi , yi ) (76).
j 1,
q
X
L(T, , , A, )= ln(p(T )) +
j 1
(79)
j=1
N
X
1
1
d
> 1
=
ln(yi ) ln(2) ln |A| (xi yi ) A (xi yi )
2
2
2
i=1
q
X
+
j 1
j=1
16 :
A1 x1 x2 x2 > A1 x1 ,
A1 A .
(A1 x1 , x2 ), A1 (x1 , x2 ) .
, j , j A.
A, A1 (
).
. , M
( ln |M |) =
(ln |M 1 |) = Mlk ,
(M 1 )kl
(M 1 )kl
A1 , -
,
L
j
L
0=
j
0=
L
0=
A1
kl
q
X
!!!!
j 1
j=1
=
=
#{i|yi = j}
+
j
X
A1 (xi j )
i|yi =j
N
1X k
(#{k, l}) Alk +
(x kyi )(xli lyi )
2
2 i=1 i
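$$\pi_j = \frac{\#\{i \mid y_i = j\}}{N}, \qquad (80)$$
$$\mu_j = \frac{\sum_{i|y_i=j} x_i}{\#\{i \mid y_i = j\}}, \qquad (81)$$
$$A_{kl} = \frac{1}{N}\sum_{i=1}^{N}(x^k_i - \mu^k_{y_i})(x^l_i - \mu^l_{y_i}), \qquad (82)$$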
N
,
. ,
:
NNq (!). A
( N d).
. A. ! .
(76) , (77),
( 1.2.1). , ?
, , ij (x), pi (x) > pj (x) ( x i- , j-) ij (x) > 0. ,
,
[Figure 3: drawing not recoverable from the source]
. 3: (83)
j
pi (x)
pj (x)
i
1
>
>
(x i ) A1 (x i ) (x j ) A1 (x j )
j
2
1 > 1
i
>
i A i j > A1 j + (i j ) A1 (x)
= ln
j
2
i
i + j
> 1
= ln
+ (i j ) A
x
.
(83)
j
2
=
ln
A = Id i = j 0 (83) , [i , j ] .
i 6= j , A 6= Id
, [i , j ]
A1 (. . 3). : d- ( A = Id ,
A1 )
(q 1)- j .
, , ,
. (80)
(81) , (82) q
P
k
l
i|yi =j (xi yi ) (xi yi )
kl
Aj =
.
#{i|yi = j}
(83) , ,
() .
( ), ,
, .
. (q 1) + qd + d(d+1)
2 ,
(q 1) + q d + d(d+1)
, .. q 17 . 2
,
,
, , .
,
.
,
. , ( , ) ,
. ,
.
,
, 1.2.4,
.
(83), (80), (81), (82)
. 1936
[36]. ( )
- . , ,
2.2.2 ,
.
2.2.2
(logistic regression)
2.2.1
(78) . ($q(d+1)$ 18 $(q-1) + qd + \frac{d(d+1)}{2}$)
( )
, .
1.2.6 (43)
(46),
.
1.2.8 , f y (x), P (y|x),
(. (51),
(52)). , f (x) . . f j (x) ( 17 1.1.6, , -.
18 , , (q 1)(d + 1)
) , .. eh (x) .
, , ,
-, e100 = 0.
:
$$f^j(x) = \frac{e^{h^j(x)}}{\sum_{k=1}^{q} e^{h^k(x)}} \qquad (84)$$
(84)
$h^j$ , , , $h^q \equiv 0$, $\sum_{j=1}^{q} h^j \equiv 0$.
(84), (51),
(logistic regression).
$h^j(x) = \tilde w_j\tilde x$ ,
$$f^j(x) = \frac{e^{\tilde w_j\tilde x}}{\sum_{k=1}^{q} e^{\tilde w_k\tilde x}} \qquad (85)$$
, $\tilde w_q = 0$.
(85),
j
(x)
ff i (x)
(w
j w
i )
x +b = 0. ,
j
i
{x|i {1, . . . , q} f (x) f (x)} j .
, ,
w ( w)
(w)
= kwk22 ,
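$$L(\tilde w, T, \lambda) = \lambda\|w\|_2^2 - \sum_{i=1}^{N}\Big(\sum_{j=1}^{q} y^j_i\,\tilde w_j\tilde x_i - \ln\sum_{j=1}^{q} e^{\tilde w_j\tilde x_i}\Big) \to \min_{\tilde w:\ \tilde w_q = 0} \qquad (86)$$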
( (85) (51) $\sum_{j=1}^{q} y^j_i = 1$). , (86) w,
(71), .
w
, , () (!). (86)
. (-)
19 - .
(86)
, , ([110], . 2.1.2),
19 . . ,
1, .. ,
. n
n .
(
). ,
2,
.
. , .
, 40 .
. , , ., , [76].
q = 2,
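$$\tilde w_1 = \tilde w, \quad \tilde w_2 = 0, \quad y^1 = y, \quad y^2 = 1 - y,$$
$$f^1(x) = f(x) = \frac{e^{\tilde w\tilde x}}{1 + e^{\tilde w\tilde x}} = \frac{1}{1 + e^{-\tilde w\tilde x}}, \qquad f^2(x) = \frac{1}{1 + e^{\tilde w\tilde x}} \qquad (87)$$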
(86), :
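$$L(\tilde w, T, \lambda) = \lambda\|\tilde w I'_{d+1}\|_2^2 - \sum_{i=1}^{N}\Big(y_i\,\tilde w\tilde x_i - \ln(1 + e^{\tilde w\tilde x_i})\Big), \qquad (88)$$
$$0 = \frac{\partial L(\tilde w, T, \lambda)}{\partial\tilde w} = 2\lambda\tilde w I'_{d+1} - \sum_{i=1}^{N} y_i\,\tilde x_i^\top + \sum_{i=1}^{N}\frac{e^{\tilde w\tilde x_i}}{1 + e^{\tilde w\tilde x_i}}\,\tilde x_i^\top, \qquad (89)$$
$$\frac{\partial^2 L(\tilde w, T, \lambda)}{\partial\tilde w^\top\partial\tilde w} = 2\lambda I'_{d+1} + \sum_{i=1}^{N}\frac{e^{\tilde w\tilde x_i}}{(1 + e^{\tilde w\tilde x_i})^2}\,\tilde x_i\tilde x_i^\top. \qquad (90)$$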
, ,
. , 0
, , Id+1
$\frac{1}{1 + e^{-yh(x)}}$
$\ln(1 + e^{-yh(x)})$ .
.
logistic regression, (84)
hj (x) logistic functions, log-odds logit, , . logistic function
$\sigma(t) = \frac{1}{1 + e^{-t}}$,
sigmoid function. - $f(x) = \sigma(\tilde w\tilde x)$ ( (87)).
. 1950- , 1960- ([28]),
, , [76].
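With the gradient (89) and the Hessian (90), binary logistic regression can be fitted by Newton's method. This is a minimal sketch that assumes labels y_i ∈ {0, 1} and either λ > 0 or non-separable data, since otherwise the minimum need not exist. Full Newton steps without step-size control are used. That is an assumption that tends to work on small, well-behaved problems like the one below.

```python
import math


def solve(A, b):
    """Solve A z = b (A square) by Gaussian elimination with partial pivoting."""
    n = len(A)
    M = [list(row) + [bi] for row, bi in zip(A, b)]
    for col in range(n):
        piv = max(range(col, n), key=lambda r: abs(M[r][col]))
        if abs(M[piv][col]) < 1e-12:
            raise ValueError("matrix is (numerically) singular")
        M[col], M[piv] = M[piv], M[col]
        for r in range(col + 1, n):
            factor = M[r][col] / M[col][col]
            for c in range(col, n + 1):
                M[r][c] -= factor * M[col][c]
    z = [0.0] * n
    for r in range(n - 1, -1, -1):
        z[r] = (M[r][n] - sum(M[r][c] * z[c] for c in range(r + 1, n))) / M[r][r]
    return z


def sigmoid(t):
    """Numerically stable 1 / (1 + e^{-t})."""
    if t >= 0:
        return 1.0 / (1.0 + math.exp(-t))
    e = math.exp(t)
    return e / (1.0 + e)


def logistic_newton(X, y, lam=0.1, iters=50, tol=1e-10):
    """Newton's method for (88); returns w~ = (w_0, ..., w_d). The bias w_0 is not penalised."""
    Xt = [[1.0] + list(x) for x in X]
    n = len(Xt[0])
    w = [0.0] * n
    for _ in range(iters):
        s = [sigmoid(sum(a * b for a, b in zip(w, xi))) for xi in Xt]
        # gradient (89)
        grad = [2 * lam * w[k] * (k > 0) - sum((yi - si) * xi[k] for xi, yi, si in zip(Xt, y, s))
                for k in range(n)]
        # Hessian (90)
        H = [[2 * lam * (k == l and k > 0)
              + sum(si * (1 - si) * xi[k] * xi[l] for xi, si in zip(Xt, s))
              for l in range(n)] for k in range(n)]
        step = solve(H, grad)
        w = [a - b for a, b in zip(w, step)]
        if max(abs(b) for b in step) < tol:
            break
    return w


w = logistic_newton([[-2], [-1], [0], [1], [2]], [0, 0, 1, 0, 1], lam=0.1)
print(w)
```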
2.2.3
, , .. 1950- , , . [97].
- , ( , ),
.
: X = Rd ,
$Y = \{-1, +1\}$, $F = \{\operatorname{sign}(w, x) \mid w \in \mathbb{R}^d\}$. 2 ,
,
, . , sign Y, 0,
, 20 .
T . ,
$\{x_i \mid y_i = -1\}$ $\{x_i \mid y_i = +1\}$ $\{x \in X \mid (w, x) = 0\}$, , , w,
$(x_i, y_i)$ $y_i(w, x_i) > 0$.
, .
(w, x) + b = 0
2 xi
yi xi
.
.
(w, x) = 0,
{x X | |(w, x)| < },
= min yi (w, xi ) > 0.
1iN
2 = 2
.
kwk
(. 1),
, , .
20
. . .
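Algorithm 1:
1. Input: $T = ((x_1, y_1), \dots, (x_N, y_N))$, a sample of $N$ points.
2. Initialisation: $w = 0$ (the classifier $x \mapsto \operatorname{sign}(w, x)$), $k = 0$, $n = N$.
3. While $n > 0$:
(a) set $n = 0$;
(b) for every $(x_i, y_i)$ with $y_i(w, x_i) \le 0$ (a misclassified point):
i. $n = n + 1$ (count the error);
ii. $w = w + y_i x_i$ (correct the weights);
iii. $k = k + 1$ (count the correction).
4. Output: $w$ and $k$.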
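Algorithm 1 translates almost line by line into code. The sketch below is for labels y_i ∈ {−1, +1}. The `max_epochs` cap is an addition of this example: without it, the loop never terminates on non-separable data.

```python
def perceptron(sample, max_epochs=1000):
    """Algorithm 1 for labels y_i in {-1, +1}; returns (w, k).

    w is corrected by y_i x_i whenever y_i (w, x_i) <= 0; k counts the corrections.
    A bias can be included by adding a constant coordinate 1 to every x_i.
    """
    d = len(sample[0][0])
    w, k = [0.0] * d, 0
    for _ in range(max_epochs):      # safeguard added in this sketch
        n = 0                        # errors in the current pass, step 3(a)
        for x, y in sample:
            if y * sum(wj * xj for wj, xj in zip(w, x)) <= 0:
                n += 1                                     # step 3(b)i
                w = [wj + y * xj for wj, xj in zip(w, x)]  # step 3(b)ii
                k += 1                                     # step 3(b)iii
        if n == 0:
            break
    return w, k


T = [((1, 2, 1), 1), ((1, 1, 2), 1), ((1, -1, -2), -1), ((1, -2, -1), -1)]
print(perceptron(T))  # ([1.0, 2.0, 1.0], 1)
```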
2 (A.Novikoff ([89]) ) -
, . ,
2 R = max1iN kxi k,
2
(w, x) , R
w.
w w0 , ,
w
2.
. w(k)
w k . w(0) = 0, w(k) w(k1) - yi xi ,
w0 , w(k1) , R.
(w(k) , w0 ) = (w(k1) , w0 ) + (yi xi , w0 ) (w(k1) , w0 ) + kw0 k,
(w(k) , w0 ) kkw0 k
kw(k) k2 kR2 .
(92)
(93)
(91) (93)
2
,
, . .
, ,
. ,
, , ,
, ,
, w (,
PN
w = i=1 i yi xi i ),
2
R
,
.
, ,
2
, R
< N ,
w.
2
, -, w , R
, -,
2
N R
2
, , 1 NR2 -
. , ,
2
NR2 . , , ,
[116].
, .. ( . 83),
$$E(r, y) = (-ry)_+ = \begin{cases} -ry, & ry < 0,\\ 0, & ry \ge 0 \end{cases}$$
(. 5) T :
N
X
i=1
EP ((w, xi ), yi )
N
X
i=1
w
( , , ),
EP ((w, xi ), yi ).
.
, , . 6 . 83. 1950–1960- ,
,
. XXI
, .
,
2,
. :
yi (w, xi ) 0, yi (w, xi ) kwk. ,
2.
R
.
2
. 2 (91)
, (92)
kw(k) k2 = kw(k) + yi xi k2 kw(k1) k2 + 2kw(k1) k + R2 .
(94)
kw(k) k k ,
k, (93), - ,
, (91). , ,
R2
< < , (94) , kw(k1) k 2()
(
), kw(k) k kw(k1) k+.
,
R2
kw(k) k
(95)
+ k.
2( )
(95) (91)
k
R2
,
2( )( )
.. . , = +
2
k
2R2
.
( )2
. ?
2.2.4
( , margin
classifiers)
2.2.3 , , . (61) .
[Figure 4: drawing not recoverable from the source; surviving labels: 2/|w|, w*, +]
. 4: (97).
.
40, ..
$$\operatorname{sign}((\tilde w, \tilde x)) = \operatorname{sign}((w, x) + b). \qquad (96)$$
$\tilde w$
( ) ,
$y_i(\tilde w, \tilde x_i) > 0$, $y_i(\tilde w, \tilde x_i) \ge 1$. $\frac{2}{\|w\|}$
$\tilde w$
, , :
$$\|\tilde w\|^2 \to \min, \qquad y_i(\tilde w, \tilde x_i) \ge 1, \quad 1 \le i \le N. \qquad (97)$$
, , , , . ,
. : , ( , . . 4),
, .
, , ,
(. (70) (72)). (97)
, ,
$$\|w\|^2 \to \min_{w, b}, \qquad y_i((w, x_i) + b) \ge 1, \quad 1 \le i \le N. \qquad (98)$$
,
, , , (yi ((w, xi ) + b) < 0) (yi ((w, xi ) + b) < 1),
, ,
. :
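$$\frac{1}{2}\|w\|^2 + C\sum_{i=1}^{N} E(\xi_i) \to \min_{w, b, \xi}, \qquad \begin{cases} y_i((w, x_i) + b) \ge 1 - \xi_i, & 1 \le i \le N,\\ \xi_i \ge 0, & 1 \le i \le N. \end{cases} \qquad (99)$$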
E()
, C > 0
,
.
(soft margin classifier 21 ).
xi
i 1, ..
0 < 1
E() =
1 1
, - , . -
E() = max(, 0), ., ,
2.1.3.
.. [117] [18, 26]
(SVM , support vector machine). [26] [31], [22] . , ,
( (8)),
- .
( (97))
, f (x) = (w, x) + b ,
, , |f (x)| 1. ,
, .
, , -, , ( [113]
35% , 30–50%),
-,
.
2.2.2 ,
21 margin ,
( ) (, ) (
(w, x) + b). margin
.
((w, x) + b) ,
.. . margin
1
([114],[15]),
([50]), , .. kwk
([52]), , .. y((w, x) + b) ([100],[52]),
([52])
[Figure: plot not recoverable from the source; surviving labels: E, E_L, E_S, E_P, E_M; horizontal axis yf(x), ticks 0 and 1]
.
.
. , ,
x 1
$\sigma((w, x) + b) = \frac{1}{1 + e^{-((w, x) + b)}}$
(. 2.2.2). ,
(8). , , .
1
C 2C
, SVM
(99)
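$$\frac{1}{2C}\|w\|^2 + \sum_{i=1}^{N} E_S\big(y_i((w, x_i) + b)\big) \to \min_{w, b}, \qquad E_S(t) = (1 - t)_+ = \begin{cases} 1 - t, & t \le 1,\\ 0, & t > 1, \end{cases} \qquad (100)$$
$$\frac{1}{2C}\|w\|^2 + \sum_{i=1}^{N} E_L\big(y_i((w, x_i) + b)\big) \to \min_{w, b}, \qquad E_L(t) = \ln(1 + e^{-t}). \qquad (101)$$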
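The soft-margin problem (99) with E(ξ) = ξ can be minimised directly as an unconstrained hinge-loss objective. For illustration, the sketch uses plain subgradient descent with step 1/t, which is not the method used in the text. The best iterate is kept because subgradient steps do not decrease the objective monotonically.

```python
def svm_subgradient(sample, C=1.0, iters=2000):
    """Subgradient descent on 1/2 ||w||^2 + C * sum_i max(0, 1 - y_i((w, x_i) + b)).

    This is (99) with E(xi) = xi after eliminating xi_i = max(0, 1 - y_i((w, x_i) + b)).
    Labels y_i in {-1, +1}. Returns (objective, w, b) of the best iterate.
    """
    def dot(u, v):
        return sum(ui * vi for ui, vi in zip(u, v))

    def objective(w, b):
        hinge = sum(max(0.0, 1 - y * (dot(w, x) + b)) for x, y in sample)
        return 0.5 * dot(w, w) + C * hinge

    w, b = [0.0] * len(sample[0][0]), 0.0
    best = (objective(w, b), w, b)
    for t in range(1, iters + 1):
        gw, gb = list(w), 0.0                  # gradient of 1/2 ||w||^2
        for x, y in sample:
            if y * (dot(w, x) + b) < 1:        # hinge term is active
                gw = [g - C * y * xj for g, xj in zip(gw, x)]
                gb -= C * y
        eta = 1.0 / t
        w = [wj - eta * g for wj, g in zip(w, gw)]
        b -= eta * gb
        obj = objective(w, b)
        if obj < best[0]:
            best = (obj, w, b)
    return best


T = [((2, 2), 1), ((3, 3), 1), ((2, 3), 1), ((-2, -2), -1), ((-3, -3), -1), ((-3, -2), -1)]
print(svm_subgradient(T))
```

An objective value below 1 already guarantees that every training point is classified correctly, because a misclassified point alone contributes a hinge term of at least 1.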
2.3
2. , - , ( 1.1.1) . - .
R {|x| < 1} {2 < |x| < 3}
. .
. ( ) . ,
.
.
2.3.1
j : X R, 1 j d0 ,
0
, , , : X X 0 = Rd X X 0 .
f : X 0 Y , .. f ,
. , f f (x0 ) = (w, x0 ) + b. f : X Y
X
0
d
X
wj j (x) + b,
(102)
j=1
.. (,
). ,
, .
. ,
n. d-
d0 = d+n
,
d
. (
)
n. (
), n
.
,
.
, , X
. X - ()
{c1 , . . . , cd0 } - : R+ R+ j (x) = ((x, cj )). , , (RBF , radial
basis functions). ,
x, - cj ,
RBF
(102) , cj . () (,
, ),
( 1.3.3 1.3.4)
RBF- ( 3.3).
.
(, ), ().
,
(
- ). .
N . , , ( .. [114, 113, 115]) .
N ,
, , , , , .
N , , , , (,
) , N
.
. ,
(
, X , , ),
(- . . . ) .
2.3.2
.
2.3.2
22 (kernels)
X K : X X R. T = ((x1 , y1 ), . . . , (xN , yN ))
T : X RN T (x) = (K(x1 , x), . . . , K(xN , x))
- RN
X 0 (T ) ( T ) . jT () = K(xj , ) .
x
T (x) = (K(x1 , x), . . . , K(xN , x)).
(103)
(w, T (x)) + b =
d
X
wj K(xj , x) + b
(104)
j=1
(. (102)), wj b
K(xi , xj ) yi .
22 . , ,
.
K (kernel ),
N (. 2.3.1) kernel trick . X K . , X
, K(x, y) ,
, 3.14.
K(x, y) (. (. 61))
, x y .
.
kernel trick , ()
(T (x), T (x0 )) =
N
X
K(xj , x)K(xj , x0 )
j=1
((x), (x0 )) = K(x, x0 ),
(105)
( 2N ). , K , ,
(K(x, x0 ) = K(x0 , x)) ( (x1 , . . . , xn ) (K(xi , xj )) ). ( 9, . 93).
(positive definite kernel ) .
X , , , .
T d- N -.
, ,
, . , N > d
, N < d .
, . ,
, ,
0 2
K(x, x0 ) = ekxx k .
,
- H X
: X H. .
K(x, x0 ) N x1 , . . . , xN N -
( (105)), ,
, ,
K(x, x0 ) = ((x), (x0 )). ,
, 4.1.
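For a concrete kernel, the identity (φ(x), φ(x')) = K(x, x') can be verified directly. The polynomial kernel (1 + (x, x'))² on R² and its explicit six-dimensional feature map below are an illustrative choice, not one fixed by the text.

```python
import math


def poly_kernel(x, z):
    """K(x, z) = (1 + (x, z))^2 for x, z in R^2 (illustrative choice)."""
    return (1.0 + x[0] * z[0] + x[1] * z[1]) ** 2


def phi(x):
    """Explicit feature map R^2 -> R^6 with (phi(x), phi(z)) = K(x, z)."""
    r2 = math.sqrt(2.0)
    return (1.0, r2 * x[0], r2 * x[1], x[0] ** 2, x[1] ** 2, r2 * x[0] * x[1])


def dot(u, v):
    return sum(a * b for a, b in zip(u, v))


x, z = (1.0, 2.0), (-0.5, 3.0)
print(poly_kernel(x, z), dot(phi(x), phi(z)))  # both 42.25 up to rounding
```

The kernel costs one inner product in R², while the explicit map works in R⁶. This gap grows quickly with the degree and the dimension, which is what the kernel trick exploits.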
Kernel trick
, .
.
[1]23 .
.
, (x, y)
K(x, y), w PN
(. 55), w = i=1 i yi xi .
4.2.
(70)
$$\tilde w = \mathbf y\tilde{\mathbf X}^\top\big(\lambda I_{d+1} + \tilde{\mathbf X}\tilde{\mathbf X}^\top\big)^{-1}$$
( (71))
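$$f(\tilde x) = \tilde w\tilde x = \mathbf y\tilde{\mathbf X}^\top\big(\lambda I_{d+1} + \tilde{\mathbf X}\tilde{\mathbf X}^\top\big)^{-1}\tilde x, \qquad (106)$$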
, , . ,
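$$\big(\lambda I_N + \tilde{\mathbf X}^\top\tilde{\mathbf X}\big)^{-1}\tilde{\mathbf X}^\top = \tilde{\mathbf X}^\top\big(\lambda I_{d+1} + \tilde{\mathbf X}\tilde{\mathbf X}^\top\big)^{-1}$$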
( , ),
$$\tilde w = \mathbf y\big(\lambda I_N + \tilde{\mathbf X}^\top\tilde{\mathbf X}\big)^{-1}\tilde{\mathbf X}^\top$$
()
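$$f(\tilde x) = \tilde w\tilde x = \mathbf y\big(\lambda I_N + \tilde{\mathbf X}^\top\tilde{\mathbf X}\big)^{-1}\tilde{\mathbf X}^\top\tilde x. \qquad (107)$$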
$$f(\tilde x) = \mathbf y\Big(\lambda I_N + \big(K(\tilde x_i, \tilde x_j)\big)_{i,j=1}^{N}\Big)^{-1}\big(K(\tilde x_i, \tilde x)\big)_{i=1}^{N}. \qquad (108)$$
, (106) (107) (103) , , (108),
, ,
.
. k k
.
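Formula (108) needs only kernel values, never the features themselves. The sketch below makes three assumptions of its own: a Gaussian kernel e^{−γ‖x − x'‖²}, plain inputs x instead of the extended x̃, and the symmetry of λI_N + K. That symmetry lets α = (λI_N + K)⁻¹yᵀ be computed once, so that f(x) = Σ_i α_i K(x_i, x).

```python
import math


def solve(A, b):
    """Solve A z = b (A square) by Gaussian elimination with partial pivoting."""
    n = len(A)
    M = [list(row) + [bi] for row, bi in zip(A, b)]
    for col in range(n):
        piv = max(range(col, n), key=lambda r: abs(M[r][col]))
        if abs(M[piv][col]) < 1e-12:
            raise ValueError("matrix is (numerically) singular")
        M[col], M[piv] = M[piv], M[col]
        for r in range(col + 1, n):
            factor = M[r][col] / M[col][col]
            for c in range(col, n + 1):
                M[r][c] -= factor * M[col][c]
    z = [0.0] * n
    for r in range(n - 1, -1, -1):
        z[r] = (M[r][n] - sum(M[r][c] * z[c] for c in range(r + 1, n))) / M[r][r]
    return z


def gaussian_kernel(x, z, gamma=1.0):
    """K(x, z) = exp(-gamma * ||x - z||^2); the kernel and gamma are assumptions."""
    return math.exp(-gamma * sum((a - b) ** 2 for a, b in zip(x, z)))


def kernel_ridge_fit(X, y, lam=1e-3, kernel=gaussian_kernel):
    """Return f(x) = y (lam I_N + (K(x_i, x_j)))^{-1} (K(x_i, x))_i, i.e. (108)."""
    N = len(X)
    G = [[kernel(X[i], X[j]) + (lam if i == j else 0.0) for j in range(N)] for i in range(N)]
    alpha = solve(G, list(y))
    return lambda x: sum(a * kernel(xi, x) for a, xi in zip(alpha, X))


X = [(0.0,), (1.0,), (2.0,), (3.0,)]
y = [0.0, 1.0, 0.0, -1.0]
f = kernel_ridge_fit(X, y, lam=1e-8)
print([round(f(x), 6) for x in X])  # close to y for a tiny lam
```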
23
[3] . ,
, [2].
2.3.3
(.. , , , ) , .
, (d0 ), (..
), -
( , 21 ).
, ,
(.. ),
, .
(,
, ),
.
,
, .. , .
, (boosting ), . 5.
, , ,
, wj w (
- ),
b ( ), , , , l1 -
Pd0
w, j=1 wj . (99) :
0
d
X
j=1
wj + C
N
X
i min
i=1
yi (w, x0i )
w,
1 i
wj 0
i 0
1 i N
1 j d0
1 i N,
x0i i- , d0 - , yi {1, 1} , i-
. 5.3. , , ,
,
wj w (
- ), b ( ),
, , , l1 - w, Pd0
j=1 wj . (99) :
0
d
X
j=1
wj + C
N
X
i min
i=1
yi (w, x0i )
w,
1 i
j
w 0
64
1 i N
1 j d0
i 0
1 i N,
x0i i- , d0 -
, yi {1, 1} ,
i- . 5.3.
:
,
.
, ,
, .
( , ) ( 2), ,
(, ) ( ). ,
, .
,
, ,
.
. . . .
, , . -
!
, . , , ,
,
.
3.1
19401960- (
computer science) . .
( 1011 ) (),
() ( 104 )
(), /
. () ( , ).
: ( , )
(), . , ( ) , .
65
, () (, )
. [82]
1943 24 . -: ( ,
)
. ,
( , 1011 ) ( , 1015 ), ,
. .
,
.
(ANN, Artificial Neural Networks), , , ,
.
.
V = {v1 , . . . , vS } ( ) () E = {e1 , . . . , eT }, :
- () Vin - (
) Vout (, ) ;
e we R he : R2 R, ();
v V \ Vin wv R
gv : R2 R, ().
q
1
d
1
Vin = (vin
, . . . , vin
) Vout = (vout
, . . . , vout
)
d q , . f : Rd Rq ,
. y = f (x) u (),
- x w, u(e) e u(v) v,
(e vout ). :
j
j
j- vin
j- xj , .. u(vin
) = xj ;
v, ,
X
he (we , u(e))
s(v) =
(109)
evin
u(v) = gv (wv , s(v)) .
24
(110)
, 2.2.3.
66
. -
q
1
)) -.
y = (y 1 , . . . , y q ) = (u(vout
), . . . , u(vout
w, we wv , .
w,
y, y = f (x) = F (w, x)
x, , . ,
he gv .
() , . V
V0 = Vin , V1 , . . . , VL = Vout ,
, i- , (i 1)- ;
gv ; he .
L-. .
.
:
he (we , u) = we u, gv (wv , s) = th(s wv ) ( ), gv (wv , s) = (s wv ) ( ),
- ; . 3.2;
he (we , u) = (u we )2 , gv (wv , s) = es/2wv - -; . 3.3.
( )
, .
( ) gv (wv , s) = (s wv ),
( ) gv (wv , s) = (s wv ) (.
(87) ); -. : gv (wv , s) =
s wv .
, .
he gv ,
. he gv .
L : , . ,
, - , . ,
.
67
d q . d ( ), q
1 () ( ). ,
, ,
, , ,
. , ,
, , .
, , ,
. ()
.
:
,
, ,
.
( 1.1.4),
( 1.1.6), , , ( 1.2.4).
-
: , .. , w.
, .
.
, , , (, ), , .. - .
(MLP RBF)
, : , .
, .
3.2
(MLP)
() 1950-
[97] ,
. (MLP (Multi-Layer
Perceptron)) gv , t = s(v) wv ,
, t
t
th(t/2)+1
= 1+e1 t .
th(t) = eet e
+et (t) =
2
68
...
, : gv (t) = t.
v
!
!
X
u(v) = gv
we u(e) wv .
(111)
. . . , ,
evin
we , wv .
th :
th, Fth (w, ), ,
,
w0 , F (w0 , ) = Fth (w, ),
u (v) = uth (v)+1
.
2
. w 7 w0 , .
, , ,
, Fth (0, ) = 0 w x f (x) = Fth (w, x) .
1 v
wv . wv , v (111)
!
X
u(v) = gv
we u(e) .
(112)
!!!!
evin
3.2.1
,
, ,
.
3 ( )
( )
g(t) = (t).
( , L2loc , , ..)
g(t) = (t).
.
4 ( )
69
!!!!
.
.
2d .
5 ( )
.
.
, , .
,
.
, ,
.
!?
: , ,
,
() .
70
!!!!
- .
, 5
. ,
.
.
19501960- . , ( , . 1.1.6) ()
,
.
3.2.2
()
( 5). :
, - , .
.
,
, .. , ( ) .
(
), . - ,
. ()
.
,
1990- , [123, . 4].
3.2.3
: (error back-propagation)
N
X
E(F (w, xi ), yi )
(113)
i=1
, ()
, .
, .
, , ,
.
,
{u(v)} . ,
v...
v... ;
q
1
, . . . , vL
vL
;
e... e... ;
e w
e = we , e e;
v gv , gv0 (gv1 (u(v))) = gv0 (s(v) wv ),
.. gv v v , u(v) gv ,
.
, ,
.
gv0 (gv1 (u(v)))
.
. gv (t)
, gv0 (gv1 (u(v))) = 1 (u(v))2 ;
, gv0 (gv1 (u(v))) = u(v)(1 (u(v)));
, gv0 (gv1 (u(v))) = 1.
, (
) s(v) = gv1 (u(v)) 0 .
j
vL
E(r, y)
,
rj r=F (w,x)
j
gv0 (gv1 (u(vL
))). ,
j
j
u
(
vL
) = 2(F j (w, x) y j ) = 2(u(vL
) yj ) ,
72
!!!!
-
E(F (w, x), y) = y ln F (w, x) (1 y) ln(1 F (w, x)) ,
vL , - (. (54)),
gvL (t) = 1+e1 t ,
y
1F1y
(w,x) F (w,x) , F (w, x)(1 F (w, x))
u
(
vL ) = (1 y)F (w, x) y(1 F (w, x)) = F (w, x) y = u(vL ) y ,
.. 2 .
. .
. , , , .
: (111). , s(v) v, (109),
.
7 v
!!!!
!!!!
v , v. e
E(F (w, x), y)
= u(e)
u(
e)
we
e , e.
,
(x, y) x ,
, , (F (w, x), y), , ,
( wv ).
(113)
(w). P
(w) = e we2 , ,
, (w) =
P
P
e e we2 + v v wv2 ,
, .. ,
, (113)
. ().
, (113) ,
, .
3.2.4.
73
!!!!
.
: .
.
he gv (109) (110). 7
:
, . ,
.
:
e
he (we , t)
;
t
t=u(e)
v
gv ,
gv (wv , t)
;
t
t=s(v)
j
vL
,
gvj (wvj , t)
L
L
;
t
j
t=s(vL )
gv (r, s(v))
E(F (w, x), y)
=
u
(
v) ;
wv
r
r=wv
e
we
r
r=we
. , (w,x),y)
25 w E(F (w, x), y) = E(Fw
.
1970- [99].
f (x,y)
25 f (x, y)
f x
x
x ,
, .
74
!!!!
3.2.4
3.2.3
,
. ( ) , . :
W Rn (
) n;
( );
;
(
);
( ).
-
.
, :
,
, .. .
,
(, g(t) = th(t)) , , , [1, 1],
, , ,
(, 10) (10, ). ,
( 7).
,
, ,
.
. , , , ,
. , ,
(. (112) ).
( , (w) = kwk2 ,
ET (w) - ), , , ,
[1, 1].
75
w(1) ,
. ,
, . , , -
1
1
[ Ne (v)+1
, Ne (v)+1
], Ne (v) v,
, .
i- w(i) ET ()
(113) W ET (w(i) ) :
w(i+1) = w(i) i ET (w(i) ) .
(114)
,
i > 0, , . , i = E(w) = w2
W = R E(w) = 2w w(i+1) = w(i) 2w(i) = (1 2)w(i) < 1
1. E(w) = 10w2
< 0.1.
, i 0 i, .
i , .
.
, ,
(. , . 77).
(114)
w(i+1) = w(i) i ET (w(i) ) + (w(i) w(i1) )
(115)
. , , ,
ET ,
.
i - .
(115), i = const,
.
(w(i+1) 2w(i) + w(i1) ) = (1 )(w(i+1) w(i) ) ET (w(i) ) ,
, -
w00 = (1 )w0 ET (w)
w W
ET () , w0 (1 ).
76
i i- (114)
, ,
f (i ) = ET (w(i) i ET (w(i) ))
min
i [0,2i1 ]
(116)
( 2
i 2i1 ). , f , . f (0) = ET (w(i) ) ,
f 0 (0) = kET (w(i) )k2 ET (w(i) )
, , , f (2i1 ).
. f 0 (0) i f .
. f , , , , f (0). f .
.. .
,
, .
, [54].
,
.
E(w) = H(w, w) + bw
(117)
H
,
. w(1) w(i) E(w(i) ),
gi = E(w(i) ) , w(i+1) w(i) 26 H. H
.
H. -, ,
, , ,
. 2
.
26 : r s H,
H(r, s) = 0 , , r> Hs = 0.
77
!!!!
2: (Polak-Ribi`ere)
1. : E() E(),
w(1) , TimeToStop().
2. g1 = E(w(1) )
d1 = g1 .
3. i = 1, . . . , (gi = 0 TimeToStop())
(a) w(i+1) E(w)
w = w(i) + di , 0;
(b) gi+1 = E(w(i+1) );
(c) di+1 = gi+1 +
(gi+1 ,gi+1 gi )
di .
(gi ,gi )
4. : w(i) .
, (117)
gi
di ,
(:
, (gi+1 , di ) = 0, E , gi gj = 2H(w(i) w(j) )).
[14]
[58].
- .
!!!!
(8), - , , .
- , ( ) ( ), .
, ,
,
, . , E E.
pB (w) e
E(w)
kT
(118)
T , k . T ,
, , .
.
(simulated annealing) [66],
78
Simulated
annealing
, , , .
.
(118) , [86] . W Rn (118)
(, E ) W. i dim(W) -
, ( ). , ,
dim(W) , , .
w(1) W
( ):
E(w(i) +i )E(w(i) )
(i)
Ti
w + i P = min 1, e
w(i+1) =
(119)
(i)
w
1 P
, , , , E T , 1.
.
(i)
5 w (119) i (118),
w(1) i .
, .
, 5 W Rn
, , , , , i ,
- , . [86] .
( , ) , W = Rn , , , w(1) ,
, .
5 (). q i , M : w(i) 7 w(i+1)
(119), M 27 :
Z
0
M p(w )
=
p(M (w) = w0 |w)p(w)dw
Z
pB (w0 )
=
q(w0 w) min 1,
p(w)dw
pB (w)
Z
!
pB (w0 )
0
+
q(w w) 1
dw p(w0 )
pB (w)
{w:pB (w0 )<pB (w)}
27 - , ,
, .
79
( 118 , ). w w0
pB (w0 )
0
0
(w, w ) = q(w w) min 1,
(120)
pB (w)
w0
Z
pB (w0 )
0
0
(w ) =
q(w w) 1
dw ;
pB (w)
{w:pB (w0 )<pB (w)}
(121)
Z
M p(w0 ) =
Z
(w0 , w)dw + (w0 ) = 1 .
(122)
(123)
q (120)
(w, w0 )pB (w) = (w0 , w)pB (w0 ) .
(124)
, ,
(124)
, , .
:
1. (118) (122);
2. ;
3. (122) (
).
(124) (123):
Z
M pB (w0 ) =
(w, w0 )pB (w)dw + (w0 )pB (w0 )
Z
=
(w0 , w)pB (w0 )dw + (w0 )pB (w0 ) = pB (w0 ) .
, ,
. , ,
p
Z
M p(w0 )
ln
pB (w0 )dw0
pB (w0 )
Z
Z
0
0 p(w )
0 p(w)
dw + (w )
pB (w0 )dw0
=
ln
(w, w )
pB (w0 )
pB (w0 )
Z
Z
0
p(w)
0
0 p(w )
=
ln
(w , w)
dw + (w )
pB (w0 )dw0
pB (w)
pB (w0 )
Z Z
p(w)
p(w0 )
0
0
(w , w) ln
dw + (w ) ln
pB (w0 )dw0
pB (w)
pB (w0 )
80
Z Z
Z
=
Z
=
Z
=
Z
=
Z
p(w)
p(w0 )
pB (w0 )dwdw0 + (w0 ) ln
pB (w0 )dw0
pB (w)
pB (w0 )
Z
Z
p(w0 )
p(w0 )
0
(w, w0 ) ln
p
(w)dwdw
+
(w0 ) ln
pB (w0 )dw0
B
0
pB (w )
pB (w0 )
Z
Z
p(w0 )
p(w0 )
0
0
0
p
(w
)dwdw
+
(w
)
ln
pB (w0 )dw0
(w0 , w) ln
B
pB (w0 )
pB (w0 )
Z
p(w0 )
(w0 , w)dw + (w0 ) ln
pB (w0 )dw0
pB (w0 )
p(w0 )
ln
pB (w0 )dw0 .
pB (w0 )
(w0 , w) ln
, pp(w)
6= const, .. p 6= pB ,
B (w)
, p
M .
!!!!
p(w)
: pB (w)
.
,
.
!!!!
E(w) ( ) - w . 5, -
T , Ti , . Ti , , Ti = i
,
, , .
, - , .
, , , , (), .
[57].
, .
, , :
. , w.
. - .
. -
.
81
. ET (:
,
, , , ).
.
(:
,
, , ).
. - (: ,
, ) .
[57] w ,
() ,
, .
, -
.
. , .
- . ,
, , .
, ,
(113) .
(w) ,
- (. 1.1.7).
- , , , .
,
, w
,
( ) -
. ,
- .
,
, ,
() ,
( ). , ,
3.2.2,
, ,
( ) , (,
).
82
ET (w) X ET (w)
=
,
z
wij
j1
T (w)
E
wij .
, , (wij wik )2 .
(113) T . , ,
(2). , .
(
). i- w(i)
(xi , yi ) F (w(i) , xi ), E(F (w(i) , xi ), yi )
w E(F (w, xi ), yi )|w=w(i) ,
w(i+1) = w(i) i w E(F (w, xi ), yi )|w=w(i) .
(125)
(114), .
- ([96]), :
6
(xi , yi ) X Y (x, y),
w w E(F (w, x), y),
(x, y) , ,
i > 0 i 0 i,
P
i ,
P 2
i ,
1 (125)
(2).
,
P i . ,
i , . ,
83
i2 (, i 0) , , ,
(125)
w(i+1) = w(i) i w E (w)|w=w(i)
Z
E (w) =
(x,y)X Y
( (2)), .
, ,
i
P 2
i , , i , i1
12 < 1. , ,
, .
i .
(125) (114),
. ,
, . ,
,
, .
, .
:
, , ,
,
, ..
[19].
[74].
, , [14]
[58]. - .
3.3
RBF-
84
, ( 2.3.1) (
2.3.2), .
RBF- .
.
v ()
!
X
u(v) = g sv ,
(u(e) we )2 ,
evin
g g(s, t) = et/2s ,
sv wv , RBF . ,
: v
!
X
u(v) =
we u(e) wv ,
evin
, , ,
!
!
X
u(v) =
we u(e) wv ,
evin
.
.
x1 , . . . , xd . , .. d- x.
v d- v
, ,
u(v) = e
kxv k2
2sv
(2sv ) 2 x v sv .
, , ,
, .
,
, . , , .
RBF v
1
u(v) = (2)d |Av | 2 e
A1 (xv ,xv )
v
2
(126)
Av
( ), |Av | , A1
v (, ) A1
.
v
d(d+1)
d v , 2 Av .
3.1, (109). ,
.
, .
85
.
, RBF-
k (126)
3.1, , ,
k , kxk2
u(v) = e 2 , .
, Av , , .
. RBF-
, , , . RBF-
.
.
.
RBF- ,
,
. .
RBF- , ,
, ,
RBF- .
RBF-
.
8 M , RBF-.
RBF-
. , s 2 ,
dim(M ) , d , M .
(
u(x1 , . . . , xd ) = e
2
(x1 1 )
2s
). , d
. ,
.
3.3.1
RBF-: (expectation
maximization)
RBF- , .
, , , ,
86
!!!!
. : , , , . , 2.1 2.2.2, .
, v v
sv .
EM- (Expectation-Maximization), [33], , , [8]. (
[14]); .
, , ..
p(x) =
k
X
j=1
Pj pj (x) =
k
X
j=1
Pj
1
(2sj )
d
2
kxj k2
2sj
(127)
k pj (x) Pj ,
Pk
j=1 Pj = 1. p(x) (127) RBF- k .
W = (P1 , 1 , s1 , . . . , Pk , k , sk ) (. 23), ..
X = (x1 , . . . , xN ) T = ((x1 , y1 ), . . . , (xN , yN )).
, ,
. (..
), .
(127) , , .
, , ,
, , . k
q, ,
. ,
E,
, 4! 24 = 384 , .
j sj , Pj . ,
, RBF , .
, (127), . x Rd
y j {1, . . . , k} ( ).
Rd {1, . . . , k}
p(x, j) = Pj pj (x)
(xi , ji ) ,
87
EM (
):
EM
p(x, j).
Z
P {j} =
p(x, j)dx = Pj
xRd
j- ,
p(x|j) =
p(x, j)
= pj (x)
P {j}
x Pk
j, p(x) = j=1 P {j}p(x|j) (127),
j- x
p({j}, x)
P {j}p(x|j)
Pj pj (x)
P {j|x} =
= Pk
= Pk
.
(128)
p(x)
j=1 P {j}p(x|j)
j=1 Pj pj (x)
pj (x) , Pj ( j-
), P {j|x} .
p(X|W) =
N X
k
Y
Pj pj (xi ) max
(129)
i=1 j=1
E(X, W) = ln p(X|W) =
N
X
ln p(xi ) =
i=1
N
X
i=1
ln
k
X
Pj pj (xi ) min .
W
j=1
, , :
k , .. k!, .
E(X, W).
,
.
Pk
(.. Pj > 0, j=1 Pj = 1, sj > 0) W(0)
(0) .
E(X, W) E(X, W
(0)
N
X
p(xi )
(130)
(0) (x )
p
i
i=1
N
k
X
X
P
p
(x
)
j
j
i
=
ln
(0) (x )
p
i
i=1
j=1
N
k
X
X
P
p
(x
)
j
j
i
=
ln
P (0) {j|xi } (0) (0)
P
p
(x
)
i
i=1
j=1
j
j
!
N
k
XX
Pj pj (xi )
(0)
P {j|xi } ln (0) (0)
Pj pj (xi )
i=1 j=1
ln
N X
k
X
i=1 j=1
N X
k
X
i=1 j=1
88
(0) (0)
Pk
ln() ( ): cj j=1 cj = 1,
k
k
X
X
ln
cj xj
cj ln(xj ) .
j=1
j=1
Q(X, W
(0)
, W) =
N X
k
X
(131)
i=1 j=1
(130) ,
E(X, W) Q(X, W(0) , W) Q(X, W(0) , W(0) ) + E(X, W(0) ) .
E(X, W) W(0) .
E(X, W) W,
Q(X, W(0) , W) < Q(X, W(0) , W(0) ). Q(X, W(0) , )
.
,
k
X
L(W, ) = Q(X, W(0) , W) +
Pj 1
j=1
0 =
L X
=
Pj 1
j=1
(132)
0 =
N
L
1 X (0)
=
P {j|xi }
Pj
Pj i=1
(133)
0 =
0 =
X
j xi
L
=
P (0) {j|xi }
j
sj
i=1
!
N
X
L
kxi j k2
d
(0)
=
P {j|xi }
.
sj
2sj
2s2j
i=1
(134)
(135)
, .. k + 1 , pj () ,
=
k
X
j=1
Pj =
k X
N
X
P (0) {j|xi } =
j=1 i=1
N X
k
X
P (0) {j|xi } =
i=1 j=1
N
X
1=N
i=1
.
N
1 X (0)
P {j|xi }
N i=1
PN
P (0) {j|xi }xi
j = Pi=1
N
(0) {j|x }
i
i=1 P
PN
(0)
1 i=1 P {j|xi }kxi j k2
sj =
,
PN
(0) {j|x }
d
i
i=1 P
Pj =
j .
89
(136)
(137)
(138)
. , W
Q(X, W(0) , ) ( , ).
. , W = W(0) , W(0)
E(X, ).
RBF- . X
E(X, ), , Q(X, W(0) , ),
. W
: Pj = k1 , j = xij k
PN
( ), sj = N1d i=1 kxi j k2 .
,
pj .
( , Aj ) ,
.
!!!!
!!!!
9
(xj ,xj )
1 A1
j
2
pj (x) = (2)d |Aj | 2 e
.
(139)
(136) (137) ,
(138) d(d+1)
2
PN
Alm
j
i=1
(0)
m
P (0) {j|xi }(xli lj )(xm
i j )
,
PN
(0) {j|x }
i
i=1 P
1 l m d,
.
, RBF-
.
, (127),
,
,
,
- (
3.2.4). , RBF-
,
k.
RBF- , ,
. ,
,
, -
.
. ,
, q, RBF-:
90
!!!!
, .. . - ,
. , (l-) ,
p(x|y = l). RBF- P {y|x},
p(x|y), .
, . : ,
, [116, 117, 114]
[115],
. .
4.1
2.3.2.
.
( , , ) X ( Rd ) H. K : X X R,
H :
K(x1 , x2 ) = ((x1 ), (x2 ))
(140)
. , X , X = Rd ,
Z
- , K .
H K,
, .
,
, -
,
, , . ,
,
.. ( !) .
.
, -
, ,
91
. , , (!)
( ,
!) . ,
ridge regression (72), logistic regression (86) soft margin classifier (100).
.
7 : X H ( ) (, ) k k. f (x) = (w, (x)) + b, w H b R,
T = ((x1 , y1 ), . . . , (xN , yN ))
kwk2 +
N
X
(142)
w,b
i=1
E. (w, b)
w (x1 ), . . . , (xN ).
. b .
. -
, ,
/ ,
kwk,
(kwk) + E(f (x1 ), y1 , . . . , f (xN ), yN ) min
w,b
;
. (w, b) (142), wX
w (x1 ), . . . , (xN ).
wX 6= w, kwX k < kwk, i (w, (xi ))+b = (wX , (xi ))+b.
(wX , b) (142) .
.
, (N + 1)PN
. w = i=1 i (xi ) (142),
PN
kwk2 + i=1 E(f (xi ), yi )
N
N
N
N
X
X
X
X
=
i (xi ),
j (xj ) +
E
j (xj ), (xi ) + b, yi
i=1
N
X
j=1
i=1
i j ((xi ), (xj )) +
i,j=1
N
X
i,j=1
N
X
i=1
i j K(xi , xj ) +
N
X
i=1
N
X
j=1
N
X
j=1
j ((xi ), (xj )) + b, yi
j K(xi , xj ) + b, yi min
j=1
1 ,...,N ,b
(143)
H ,
. E, (
) .
, , 2.3.2.
92
4.1.1
(140) () .
:
!!!!
10
(140) (K(x1 , x2 ) = K(x2 , x1 ) x1 x2 )
( Kij = K(xi , xj ) p
x1 , . . . , xN ). , K(x1 , x1 )
0 |K(x1 , x2 )| K(x1 , x1 )K(x2 , x2 ).
. ,
.
, . , ,
.
!!!!
11 X K : X X R .
12 , K : I d I d R d- I d
.
,
:
8 (Mercer, 1909 [83]). K L2 (Rn Rn ) (141) L2 (Rn )
, .. f L2 (Rn )
Z
K(x, z)f (x)f (z)dxdz 0 ,
x,zRn
K L2 (Rn Rn )
K(x, z) =
i i (x)i (z),
i 0,
i L2 (Rn )
i=1
{i } l2 .
, L2 (Rn )
: .
10.
9 X
K : X X R H : X H,
(140).
93
. H
.
. X R H0 , (x) : z 7 K(x, z).
(, ), ((x), (z)) 7 ((x))(z) = K(x, z) = ((z))(x).
, (h, (z)) = h(z) = ((z), h)
h H0 . (x) ,
H0 (x) ,
. K ,
. , , .. (h, h) > 0 h 6= 0. ,
-
(h, g)2 (h, h)(g, g). (h, h) = 0, (h, g) = 0
g H0 , , (h, (x)) = h(x) = 0 x X , h = 0.
(, ) , H0
. H,
28 .
9 , , , H
X R. ,
x X h H0
p
p
|h(x)| = |(h, (x))| (h, h)((x), (x)) = khk K(x, x)
( -), H0 x X
.
4.1.2
,
13
Rd , : Rd Rd ,
K(x, z) = (x)(z), H = R,
, ( )
28
XX [4] (reproducing kernels).
(, ) F X
K X X , f F x X
f (x) = (K(x, ), f ()). , ,
reproducing kernel . , H
RKHS (Reproducing Kernel Hilbert Space). - X
, X .
94
!!!!
(
{0, 1}).
.
14 :
;
;
(.. K((x), (z))) ;
: X 0 2X X 0
X (, ) K
X ,
X
X
K 0 (x0 , z 0 ) =
K(x, z)
x(x0 ) z(z 0 )
;
, , .
. ( ).
. ( )
,
, P
x0 7 x(x0 ) (x), .
1 , :
;
;
;
.
Rd :
15
K(x, z) = cos(x z) x, z R ( !);
K(x, z) = exz x, z Rd ( 1);
2
K(x, z) = e(xz) x, z R ( 13 14
);
2
K(x, z) = ekxzk x, z Rd ( 14
);
2
kxzk
K(x, z) = e( ) x, z Rd , > 0 ( 14 );
Z
kxzk 2
a 2
a
e( ) e( 2 ) d = eakxzk
0
=
a
2
kxzk
95
1).
!!!!
,
.
d
(xz,xz)
(144)
kxzk
d
2
2
K(x, z) = (2)
(145)
, .
,
, .. .
16
K(x, z) = (xz) =
1
1 + exz
. x1 = ln 2 x2 = 2 ln 2. K(xi , xj )
2
3
4
5
4
5
16
17
, , 10 .
4.1.3
, 4.1.2, ,
Rd
. ,
K(x, z) = ((x, z) + c)m ,
c>0
( ) , x m
x (: !),
m + 1 .
:
.
!!!!
17
2
K(x, z) = ekxzk
: Rd L2 (Rd ), x
d
(x) : t 7 4 e
kt2xk2
2
x1 , . . . , xN (x1 ), . . . , (xN ) .
.
( 2, . 53).
96
!!!!
4.1.4
10, 13 14
2 K
K(x, z) +
p
K(x, x) + K(z, z) +
K1 (x, z) = lim p
&0
K(x,z)
K(x,x)K(z,z)
=
0
K(x, x)K(z, z) 6= 0
K(x, x)K(z, z) = 0 K(x, x) + K(z, z) 6= 0
K(x, x) = K(z, z) = 0
( 15 ? 15?),
.
4.1.5
, Rd , ,
, , , ,
15, . RBF- , K(x, z) = (x z)
K ( (141))
:
Z
f 7 f = ( z)f (z)dz ,
(146)
X
, ( , ) ,
, ,
L2 .
. ,
, (0) 0 x |(x)| (0).
, ,
. , , , ( , ), ,
, .
97
!!!!
!!!!
18 , ,
.
,
.
. ,
, : f
( f, f ) = ( f,
f ) = ( f, f ) 0 ,
g(x) = g(x) .
19.
, K(x, z) = (x z) = max(1 |x z|, 0) R1 ,
[ 21 , 12 ] [ 12 , 12 ].
3 .
. 18 :
( ) ( ) = ( ) ( ) .
19 ,
, 29 F[] .
. , F[]() 0 ,
F[]() , (. 18).
F[] 0 - , (,
), (F[], ) < 0.
, ,
( F1 [], F1 []) < 0, .. , (x z),
. , ,
L2 , , F1 [] ,
.
, 18
19.
29
. ,
Z
F[f ]() =
f (x)eix dx ,
F1 [](x) =
1
(2)n
()eix d ,
.. .
98
Rn C, -
[a,a] [a, a]
Z a
eia eia
2 sin a
F [[a,a] ]() =
eix dx =
=
i
a
,
[a,a]
a(xz)
, . sin xz
a > 0.
R1 l(x) = a2 ea|x| , a > 0
Z 0
Z
a
(ai)x
(ai)x
F[l]() =
e
dx +
e
dx
2
a
1
1
a2
=
= 2
.
2 a i
a i
a + 2
1
l, F[l] . ea|xz| , a2 +(xz)
2 ( ) .
, . Rd d > 1
1
a2 +kxzk2 d. ( ).
!30 .
eakxzk1 = ea
Pd
j=1
|xj z j |
14. eakxzk
, , 15.
14. , 1
K(x, z) = a2 +kxzk
2
KMOD (Kernel with MOderate Decreasing) [6]
b2
1
](1 ) . .
F[f
a2 +k
xk2
: 6, . 104.
99
!!!!
4.1.6
, , , , .
,
( ), , .. . - -
? ,
, ? -
?
(
), ,
. ,
( , -)
, . , ,
, : , .
CPD-:
K : X X R -
(CPD-kernel , conditionally positive definite kernel ),
x1 , . . . , xn X c Rn
n
X
ci = 0
(147)
i=1
n
X
K(xi , xj )ci cj 0 .
(148)
i,j=1
-
CPD-.
, (.. ) CPD-. CPD-, ,
K(x, z) = x + z K(x, z) = (x z)2 R R
(!). ( 22,25).
( 13,14 1)
CPD- , (
() 31 ), (148) .
20 - (CPD-) :
( );
CPD-;
CPD- ;
31
100
!!!!
( , ) CPD-;
CPD-;
CPD- .
!!!!
K(xi , xj )ci cj = 0 .
(149)
i,j=1
4 CPD- K
1
K0 (x, z) = K(x, z) (K(x, x) + K(z, z))
2
(150)
22 x0 X . K : X X R CPD- ,
K+ (x, z) = K(x, z) K(x, x0 ) K(x0 , z) + K(x0 , x0 )
(151)
.
. K+ .
Pn
x1 , . . . , xn X , c Rn c0 = i=1 ci .
n
X
i,j=1
n
X
K+ (xi , xj )ci cj =
K(xi , xj )ci cj ,
i,j=0
K CPD-, K+ . , K+
, CPD-,
k0 (x, z) = K(x, x0 ) + K(x0 , z) K(x0 , x0 )
21 CPD- K,
K(x, z) = K+ (x, z) + k0 (x, z) CPD-.
. ,
K(x, z) = (xz) =
1
1 + exz
CPD- (. 16).
101
!!!!
CPD-
, CPD- (!)32
X X . K+ K, ( , , K+ (X ) K(X )). (!)
CPD- K(x, z) = f (x) + f (z) ( 21) K= .
(!) ( 4) K0 , (!) K 7 K0 (150) 0 . (!) K 7 K+
(151) x0 . , x K+ (x, x0 ) = 0. x0
K+
(!) , .
x
K, K+ , K+
, K0 , x
K= 0 ,
, .
23 x, z X
K0 K= = {0}, K = K0 + K= ;
K+ K0 = {0},
x
x
K+
K+ K, K+
K= = {0};
x ;
0 : K K0 K= , 0 |K+
x
K= , x |K0 , x : K K+
x ;
0 |K+
z
x
z x
x
z
z K
x |K+
: K+
K+
.
+ K+ , |K+
. 21 23.
CPD-
K CPD-, K0 = 0 (K), K+ = x0 (K0 )
x0 X : X H K+ .
H , ,
CPD- K K0 .
23 K0 = 0 (K+ ),
1
K0 (x, z)=K+ (x, z) (K+ (x, x) + K+ (z, z))
2
1
1
=
k(x)k2 2((x), (z)) + k(z)k2 = k(x) (z)k2 ,
2
2
.. CPD- K0
(!) 12 .
. CPD- K .
32 , , , .
102
!!!!
!!!!
!!!!
!!!!
!!!!
!!!!
H x0 .
0
0
0
x00 X , K+
= x0 (K0 ) = x0 (K+ ) ( 23)
.
0
K+
(x, z)
0
,
.. K+
K+ , H, (x00 )
0
. K+ K+
.
CPD- [103].
CPD-.
24 X , c > 0
K(x, z) = ckx zk2 CPD- X .
CPD-:
25 K : X X R CPD-
, t > 0
Kt (x, z) = etK(x,z)
(152)
.
. Kt CPD-, 1
, Ktt1 limt&0 Ktt1 = K CPD-.
: CPD- K x0 X
K+ ( (151)). t > 0 tK+ etK+
,
Kt (x, z) =
=
13,14 .
20 25 3
CPD- .
3 x 0
Z
. ( x > 1),
-.
26 CPD- K, ( , CPD- 4) CPD- (K) 0 < < 1
ln(1 K).
6 0 < 2 c > 0
K(x, z) = ckx zk K(x, z) = ln(1 + ckx zk ) CPD-,
1
K(x, z) = eckxzk K(x, z) = 1+ckxzk
.
. . 24, 25, 26.
2
1
, eckxzk , eckxzk 1+kxzk
2
4.1.2 4.1.5, . 6 (, ) . [25] , ,
, .
CPD- [12] negative definite kernels,
CPD- .
4.2
X , K : X X R,
104
(153)
, E() = max(0, ):
N
X
1
kwk2 + C
max(0, 1 yi f (xi )) min
w,b
2
i=1
(154)
, ,
1
kwk2 + C
2
N
X
i min
w,b,
i=1
(155)
SVC:
. . .
yi ((w, (xi )) + b) 1 i 1 i N
i 0 1 i N .
, (yi = 1) (f (xi ) 1), (yi = 1)
(f (xi ) 1).
(155) , w , , . 7,
. 7 SVC , w
.
(155)
L(w, b, , ) =
1
kwk2 + C
2
N
X
i=1
N
X
(156)
i=1
w, b ,
i 0, i 0 1 i N .
(157)
, ,
:
L(w, b, , )
0=
wi
L(w, b, , )
b
0=
L(w, b, , )
i
L(w, b, , )
0
i
L(w, b, , )
0=
i
L(w, b, , )
0
i
0=
wi
N
X
j yj ((xj ))i
j=1
N
X
j yj
j=1
C i i 6= 0
C i i = 0
1 i yi ((w, (xi )) + b) i 6= 0
1 i yi ((w, (xi )) + b) i = 0 ,
105
. . . SVC:
...
(157) ,
w=
N
X
j yj (xj )
(158)
j=1
N
X
j yj = 0
(159)
j=1
(C i )i = 0
0 i C .
. . . SVC:
. . .
(160)
(161)
(161) (158)
w , 7. (158160) (156),
( ), w, b ,
N
N
X
X
1
L ()= kwk2 + C
i
i (yi ((w, (xi )) + b) 1 + i )
2
i=1
i=1
. . . SVC:
. . .
N
N
N
N
X
X
X
X
1
= kwk2
i yi (w, (xi )) b
i yi +
i +
(C i )i
2
i=1
i=1
i=1
i=1
N
N
N
N
N
X
X
X
X
1 X
=
j yj (xj ),
j yj (xj )
i yi
j yj (xj ), (xi ) +
i
2 j=1
j=1
i=1
j=1
i=1
N
N
X
1 X
i j yi yj ((xi ), (xj )) +
i
2 i,j=1
i=1
N
N
X
1 X
i j yi yj K(xi , xj ) +
i .
2 i,j=1
i=1
L(w, b, , ) (157)
L () (159) (161).
K ,
L . K(xi , xj ) , .
. , L
, w, b , (w, b, , )
L ( (159,161) (157), ). w (158), b
. - i 0 < i < C,
i = 0 yi ((w, (xi )) + b) = 1, b :
b = yi (w, (xi )) = yi
N
X
j yj K(xj , xi )) .
(162)
j=1
, (xi )
. w i b -
i = max(0, 1 yi ((w, (xi )) + b))
106
. . . SVC: . . .
(155) , N ( (162))
b. ,
. (153)
! b
|b|. : kw2 k (155)
, b2 , .
SVC (153).
, , K, N -
N -
. . . SVC: N
N
X
1 X
i j yi yj K(xi , xj )
i min
(163)
2 i,j=1
i=1
N
X
j yj = 0
(164)
j=1
0 i C 1 i N ,
f (x) =
N
X
j yj K(xj , x) + b ,
(165)
j=1
b . f
, , j ,
, . (164) (163) - (CPD-) K, SVC CPD-,
. 5
CPD- K ,
K K+ 22.
4.2.2
107
( ),
, , .. 1
(yi f (xi ) = 1). {z : |(w, z) + b| 1}.
0 i C.
i = 0 . .
,
, .. 1 ,
, .. 0 ( yi f (xi ) < 1).
i = C
.
:
. , , , .
4.2.3
SVC
, N . , SMO (Sequential Minimal Optimization), . [91]. i , , ,
, (163). (159) , , . (163)
, (159)
(161), .. ,
. ,
, [91], [64] 33 .
i b,
4.2.1, C K
- ( 1.1.6).
SVM (LOO).
( !)
, SMO
, . , , ( ) ,
.
33 , - (153) , , b, (159)
(163)
. K , -
.
108
, , , N
.
27 f (165), (155) T N , fm ,
C K
T \ {(xm , ym )} N 1 . ym fm (xm ) 1
( (xm , ym ) fm ), ym f (xm ) 1 ( (xm , ym )
f ).
SVC
Lm ((, b))
X
1
max(0, 1 yi f (xi ))
kwk2 + C
2
N
X
1 X
i j yi yj K(xi , xj ) + C
max(0, 1 yi f (xi ))
2 i,j=1
i6=m
i6=m
L((, b)) =
N
N
X
1 X
i j yi yj K(xi , xj ) + C
max(0, 1 yi f (xi )) ,
2 i,j=1
i=1
(-SVC )
10 , C, , . , ,
109
SVC
C,
.
, , 7 (. ),
, . ,
.
28 [0, 1]
(w0 , b0 , 0 , 0 ) H R RN R
N
X
1
2
kwk +
(i ) min
w,b,,
2
i=1
(166)
-SVC:
...
yi ((w, (xi )) + b) i i = 1, . . . , N
i 0 i = 1, . . . , N
0.
0
0 > 0, ( w0 , b 0 , 0 ) (155) C = 10 ,
,
. 0 > 0 , C 0
(155) .
. (166) (155).
L(w, b, , , ) =
!!!!
N
N
X
X
1
kwk2 +
(i )
i (yi ((w, (xi )) + b) + i ) (167)
2
i=1
i=1
(. (156)), ,
,
N
1 X
i j yi yj K(xi , xj ) min
2 i,j=1
N
X
(168)
. . . -SVC:
j yj = 0
j=1
N
X
i N
i=1
0 i 1 1 i N
(. (163)), (158)
w, (166) b, . .
C,
-, , (166) .
110
!!!!
C , C (
K). (. [48]).
11 (C) b(C) (165), -
C.
. ,
K(xi , xj ) , C,
, . , (165)
.
. - N
, .
. N - 0 i C 3N
. S
{1, . . . , N } S=0 S=C S6= i,
i = 0, i = C 0 < i < C, .
C > 0 (163) -
S, ,
N
N
X
1 X
i j yi yj K(xi , xj )
i min
2 i,j=1
i=1
N
X
j yj = 0
(169)
(170)
j=1
i = 0 i S=0
i = C i S=C
0 < i < C i S6= ,
(171)
(172)
(173)
S(C 00 ). S - C Ii ,
Ii C, ,
C 0 C 00 . S(C 0 ) S(C 00 ), ..
. o S(C 0 ) S(C 00 ). Ii
S(Ii ).
, Ci Ii S(Ci ) S(Ii ), ( , );
. , C = Ci S(Ii )
(i ) (163).
- 0 S(Ii ), (163)
t0 + (1 t)i , 0 i -,
S(Ii ), -, (163).
C Ii S(Ii )
+ t(0 i ) ,
S(Ii ) .
- (-) (C). (0) = 0,
(Ci ) = i , Ii
i m , S(Ii ),
Im+1 .
Si b
- ( (162)), C.
0, .. , - (.
(162)).
.
, (C), .. K(xi , xj ).
, . , , ,
.
11 (C) b(C), N - . [48] ,
K(xi , xj ) ( C = 0). ,
- ,
C .
4.2.4
2.1.3 -
. Y = R, X , K : X X R,
H : X H.
T (X Y)N
f (x) = (w, (x)) + b
(174)
( , (153)),
.
1
2
2 kwk - E(r, y) = max(0, |y r| )
112
!!!!
:
N
X
X
1
1
kwk2 +C
E(f (xi ), yi ) = kwk2 +C
max(0, |(w, (xi ))+byi |) min ,
w,b
2
2
i=1
i=1
(175)
C > 0 > 0 . i+ i , max(0, yi (w, (xi )) b )
max(0, (w, (xi )) + b yi ), , (175)
, (155):
N
X
1
kwk2 + C
(i+ + i ) min
w,b,
2
i=1
(176)
SVR:
. . .
i yi ((w, (xi )) + b) + i+ 1 i N
i+ 0, i 0 1 i N ,
, i+ i
0. 7 ,
w (xi ), ,
,
,
.
4.2.1 .
(176)
N
N
X
X
1
L(w, b, , )= kwk2 + C
(i+ + i )
i+ (i+ (yi (w, (xi )) b ))
2
i=1
i=1
N
X
i (i ((w, (xi )) + b yi )) ,
(177)
i=1
w, b ,
i+ 0, i 0, i+ 0, i 0 1 i N .
(178)
, ,
:
0=
0=
L(w, b, , )
wi
L(w, b, , )
b
L(w, b, , )
i
L(w, b, , )
0
i
L(w, b, , )
0=
i+
0=
wi
N
X
j=1
N
X
(j+ j )
j=1
C i i 6= 0
C i i = 0
(w, (xi )) + b yi i+ i+ 6= 0
113
. . . SVR:
...
L(w, b, , )
i+
L(w, b, , )
0=
i
L(w, b, , )
0
i
0
(w, (xi )) + b yi i+ i+ = 0
yi (w, (xi )) b i i 6= 0
yi (w, (xi )) b i i = 0 ,
+ . (178) ,
w=
N
X
(j+ j )(xj )
(179)
j=1
N
X
(j+ j ) = 0
(180)
j=1
(C i )i = 0
i
(181)
C .
(182)
(179181) (177),
( ), w, b ,
N
N
X
X
1
L ()= kwk2 + C
(i+ + i )
i+ (i+ (yi (w, (xi )) b ))
2
i=1
i=1
N
X
. . . SVR:
. . .
. . . SVR:
. . .
i (i ((w, (xi )) + b yi ))
i=1
N
N
N
X
X
X
1
(i+ i )(w, (xi )) b
(i+ i ) +
(i+ i )yi
= kwk2
2
i=1
i=1
i=1
N
X
i=1
N
X
(i+ + i ) +
N
X
((C i+ )i+ + (C i )i )
i=1
N
X
1
(j+ j )(xj )
= (j+ j )(xj ),
2 j=1
j=1
N
N
X
X
N
X
i=1
j=1
(i+ i )yi
N
X
(i+ + i )
i=1
N
N
N
X
X
1 X +
=
( i )(j j )K(xi , xj ) +
(i i )yi
(i+ + i ) .
2 i,j=1 i
i=1
i=1
L(w, b, , ) (178)
L () (180) (182).
(180), - K
114
(182) , L . K(xi , xj )
, , , (
, . (186)).
, , , w
L (179), b .
- i 0 < i+ < C, i+ = 0
yi = (w, (xi )) + b , b :
b = yi (w, (xi )) + = yi
N
X
(j+ j )K(xj , xi )) + .
(183)
. . . SVR: . . .
j=1
, ((xi ), yi ) H R
( R) |y f (x)| . 0 < i < C. w
i b - (176) - , 2N b. , .
, |b|.
, , K, 2N - N
-
N
N
N
X
X
1 X + +
(i i )(j j )K(xi , xj ) (i+ i )yi +
(i+ +i ) min (184)
2 i,j=1
i=1
i=1
N
X
(j+ j ) = 0
j=1
0 i+ C,
0 i C 1 i N ,
f (x) =
N
X
(j+ j )K(xj , x) + b ,
(185)
j=1
b .
(184) i+ i
0 (, min(i+ , i ),
(184)). i+ + i = |i+ i | , i+ i i , , (184)
N
N
N
X
X
1 X
i j K(xi , xj )
i yi +
|i | min
2 i,j=1
i=1
i=1
N
X
j = 0
j=1
C i C 1 i N ,
115
(186)
. . . SVR:
. -
(186),
K(xi , xj ). ,
- N 2C.
4.2.5
(184),
, 4.2.3 . .
, (176), H R.
, 4.2.2, |(w, (x)) + b| 1 H
|y ((w, (x)) + b)| HR, . ((x), y), , , (x, y),
, , ( ),
( ).
. , ,
(178) (182)
, .
yi f (xi ) < , i+ = 0, i = C.
() yi f (xi ) = , 0 i C, i+ = 0.
< yi f (xi ) < , i+ = 0, i = 0.
() yi f (xi ) = , 0 i+ C, i = 0.
yi f (xi ) > , i+ = C, i = 0.
(-SVR)
, ()
. 27,
,
(176)
N
X
1
kwk2 + C
(i+ + i + ) min
w,b,,
2
i=1
(187)
i yi ((w, (xi )) + b) + i+ 1 i N
i+ 0,
0
i 0 1 i N
[0, 1].
. ,
C ( 28), C .
116
!!!!
( 4.2.1) ( 4.2.4) ,
, .
.
T yi = 1
, , . 1
() [1 , 1],
< 1 ,
, - (yi f (xi ) > 1 + ) .
, , . > 0 y + = y2 y = y + 2.
T = ((x1 , y1 ), . . . , (xN , yN )) +
X
1
(kwk2 + u2 ) + C
(i+ + i ) min
w,u,b,
2
i=1
(188)
((w, (xi )) uy + + b) 1 i+ 1 i N
((w, (xi )) uy + b) 1 i 1 i N
i+ 0, i 0 1 i N ,
(155). , (174) T , (176), . u = 1 |(w, (x)) + b y| y + y
|(w, (x)) uy + b| .
, ,
u = 1, .
, (!), .
30 (w0 , u0 , b0 , 0 ) (188) 0
0
0 0
T , ( uw0 , u0b , u0 ) (176) T C = 0 C 0 = 0 (2 u10 ).
.
117
!!!!
4.2.6
q-
[121]. ( 2.2.1). ,
.
(153)
f 1 (x) = (w1 , (x)) + b1
f 1 f 1 , .
f 1 f 1 ,
, f 1 = f 1 . (155)
:
N
X
1
(kw1 k2 + kw1 k2 ) + C
(i1 + i1 ) min
w,b,
2
i=1
(189)
1 i N, j {1, 1}
b1 + b1 = 0 .
(155) w = w1 = w1 , b = b1 = b1 , i = 21 iyi
iyi = 0.
, . yi
1, , xi .
SVC:
q X
q
N
X
1X j 2
j
i min
(190)
kw k + C
w,b,
2 j=1
j=1 i=1
yi
yi
((w , (xi )) + b ) ((wj , (xi )) + bj ) 2 ij 1 j q, 1 i N, j 6= yi
ij 0
q
X
wj = 0,
q
X
j=1
j=1
1 j q, 1 i N
bj = 0.
118
.
, . ,
q ,
.
. k-
(k = dlog2 qe) . k ,
k .
k
, .
,
,
,
(..
). ,
.
. ,
.
. f j , j- ,
. x
q f 1 (x), . . . , f q (x) . , ,
.
, f j (x) = (w, x) + b
ij (155) SVC,
(190)
q
1X k
fj (x) = f j (x)
f (x)
q
k=1
ij = ij + iyi 1 j q, 1 i N, j 6= yi
iyi = 0 1 i N .
fj , f j , (190) ij ,
, (155) f j , .
. q , , q(q1)
2
, . , , ,
,
- ,
. x q(q1)
2
f ij (x) = f ji (x) (),
119
i- ()
j- (), P
q f i (, x) = j6=i (f ij (x)), , , ,
( ) - (), . ,
, .
. q(q1)
2
.
x q . x
. , . q 1 , ..
q 1 , ,
, x.
. q , 34 . . .. , q 1
, q . x
, , , ..
q1 ( dlog2 qe
) .
(),
( , SVC ), ,
, ,
, . 1.
, ,
-, ,
, . ,
, .
.
SVM [59].
, A
A, , ( - ?
34 , , . ,
.
.
.
120
1
dlog2 qe
q
q(q1)
2
q(q1)
2
N 2+
dlog2 qeN 2+
qN 2+
1+
(q1) 2+
2 q1+
N
N
dlog2 qeN
qN
(q 1)N
(q1) 2+
2 q1+
N
< 2N 2+
1
q1+
N 2+
2(q1)
N
q
< 2N
N
1+
q1
q
1: , , q N .
, , ( ) ,
N
N 2+ , 0 < < 1 , , N
(.. ). , .
,
, .
). ,
.
.
: Rd H = L2 (Rd ) ( 17) - ,
0 H (.. , )
. H
(), 0
, , .. .
,
( ) ,
, .. ,
. 17 ,
L2 (Rd ), .
( ,
, , ),
.
X
1
kwk2 + C
i min
w,
2
i=1
(w, (xi )) 1 i 1 i N
121
(191)
SVC:
i 0 1 i N ,
(155),
.
28:
N
X
1
kwk2 +
(i ) min
w,b,,
2
i=1
(192)
-SVC:
(w, (xi )) i i = 1, . . . , N
i 0 i = 1, . . . , N ,
.
[102]
.
.
(191) (192), ,
K(x, z) = ((x), (z)).
(. 119) ,
, :
. f j . x q
f 1 (x), . . . , f q (x) 35 . ,
, .
, .
, 2.3.3: . , ,
[42]. , .
Y = {1, 1},
f - f (x) = sign(g(x))
f (x) = 0 , . 20 53.
5.1
, - (,
) , ,
.. . :
35
. .
122
!!!!
- ,
(,
);
- ,
(, );
,
,
(, );
,
(,
(ridge) (logistic) );
.
,
, , ,
. , ,
, . , .
, ,
, ..
.
, , ,
.
5.1.1
.
31 2n+1 ,
< 12 . , ,
n n .
. 0-1- e : X Y {0, 1}
, 0 1.
< 12 , (1 ) (!).
2n+1 .
, 2n + 1, (1)
2n+1 ,
.
, .. , , , 12 ,
123
!!!!
(1)
36 (2n+1)(
1
2
2 )
n.
n 31 . n
. , , -,
,
(13) , -,
, ( )
, :
, , . ,
.
, ,
, ( ) ( ) .
(random forests) [21].
,
,
( ) /
. :
. , .
5.1.2
124
AdaBoost : (,
) 37 . , ,
(f ) +
N
X
(193)
f F
i=1
N
X
(194)
f F
i=1
i . ,
. ,
Pn
i=1 i = 1 .
(194) ,
.
3 AdaBoost ,
, 5.2.3 .
3: AdaBoost
1. : T = ((x1 , y1 ), . . . , (xN , yN )) N ,
M.
2. i =
1
N,i
= 1, . . . , N .
3. j = 1, . . . , M
(a) hj T ;
X
(b) 0-1- E =
i ;
i|hj (xi )6=yi
j
(c) E = 0, : f () = h () ;
(d) cj = ln 1E
E ;
(e) (hj (xi ) 6= yi )
i ecj = 1E
E ;
(f) ,
N
X
i=1
1.
4. : f () = sign
M
X
cj hj ().
j=1
37 , , c .
125
AdaBoost
AdaBoost
, (,
) , . AdaBoost
[100] [41]. 1990-
, , AdaBoost ,
, .
[42, 24]. ,
, .
5.1.3
, , ,
. ,
,
; ,
, (stump), .. ,
;
( (194) = 0);
,
( ) / ;
- .
.
[101].
.
5.2
5.2.1
, 0. 4
( [41]) , .
126
4:
1. : ((x1 , y1 ), . . . , (xN , yN )) N ,
E(, ), M , c (0, 1].
2. yi zi , i = 1, . . . , N .
3. j = 1, . . . , M
(a) hj ((x1 , z1 ), . . . , (xN , zN ))
n
X
E(hj (xi ), zi );
i=1
(b) zi c hj (xi );
4. : f () =
M
X
chj () = c
j=1
M
X
hj ().
j=1
c ,
, . 5.3.
(193)
E(, ) , , ,
0 ( ), ,
c = 1
N
X
E (f m (xi ), yi )
i=1
Pm
f m () = j=1 hj (). ,
c (0, 1]. c,
(), ( ). , c , C (99)
.
,
:
,
, .. ,
c. ,
, . , , , ,
.
: 5.2.2
.
. E(r, y) = (r y)2
c = 1.
, E(, ) , ,
127
!!!!
. , , , -, , -
. 5 .
5:
1. : ((x1 , y1 ), . . . , (xN , yN )) N ,
E(, ), M .
2. :
f 0 ((x1 , y1 ), . . . , (xN , yN ))
n
X
E(f 0 (xi ), yi ), , , i=1
, f 0 = 0.
3. j = 1, . . . , M
(a) zi =
E(r,yi )
r=f j1 (xi )
, i = 1, . . . , N ;
(c) -, , n
X
E((f j1 + cj hj )(xi ), yi ) min
cj R+
i=1
(d) f j () = f j1 () + cj hj ().
4. : f M () = f 0 () +
M
X
cj hj ().
j=1
5 , , ,
. E(, ).
,
.
, c < 1
f j () = f j1 () + c cj hj (xi )():
, .
5.2.2
( 2.2.2). ,
, .
, (, ) f : X R, -
128
ln
P (y = 1|x)
,
P (y = 1|x)
1
P (y = 1|x) ,
1 + ef (x)
1
P (y = 1|x) ,
1 + ef (x)
.. y = 1 1
1+eyf
(x) , , ,
E(f (x), y) = ln(1 + eyf (x) ) .
(195)
y
E(r, y)
ln(1 + eyr )
yeyf (x)
=
(196)
=
=
yf
(x)
r
r
1+e
1 + eyf (x)
r=f (x)
r=f (x)
2 E(r, y)
2 ln(1 + eyr )
y 2 eyf (x)
=
=
(1 +
+
=pf (x)(1 pf (x))
eyf (x) )
1
(1 +
ef (x) )(1
+ ef (x) )
(197)
( y 2 = 1). (197) , ,
f T
E(f, T ) =
N
X
E(f (xi ), yi )
i=1
)
= 0 . (196) , E(f,T
f
,
.
5.2.1, . , ..
.
f (f ! , ) (196) (197)
N
X
yi
1X
h(xi ) +
pf (xi )(1 pf (xi ))(h(xi ))2
y
f
(x
)
i
i
2
1
+
e
i=1
i=1
(198)
hi = h(xi ) .
N N 38
E(f + h, T ) E(f, T )
38 N hi ,
(!) . .
129
hi :
hi =
yi
1+eyi f (xi )
= yi (1 + eyi f (xi ) ) =
yi
,
P (yi |f (xi ))
(199)
2
N
1X
yi
E(f + h, T ) const +
pf (xi )(1 pf (xi )) h(xi )
. (200)
2 i=1
P (yi |f (xi ))
, h f xi
yi , f
, h(xi ) ()
1
2 pf (xi )(1pf (xi )). 6
.
6: LogitBoost
1. : ((x1 , y1 ), . . . , (xN , yN )) N , M .
2. :
f 0 ((x1 , y1 ), . . . , (xN , yN ))
n
X
0
ln 1 + eyi f (xi ) , , ,
i=1
, f 0 = 0.
3. j = 1, . . . , M
1
(a) pi =
j1 (x ) ,
i
1+ef
yi f j1 (xi )
i = pi (1 pi ) zi = yi 1 + e
, i =
1, . . . , N ;
M
X
hj () -
j=1
pf M () =
sign(f M ()).
1+ef M ()
, ,
6 ( ) LogitBoost.
, , ,
. hj , , zi
, i .
[42] , , . yi
130
LogitBoost
hj ,
, , (. 2.1.3),
zi
(!). , ,
hj , hj
cj hj cj , , ,
N
X
cj =
i=1
N
X
i=1
(1 +
!!!!
yi hj (xi )
1 + eyi f j1 (xi )
(!), .
7.
131
!!!!
7: LogitBoost
1. : ((x1 , y1 ), . . . , (xN , yN )) N , M ,
> 0, 1.
2. :
f 0 ((x1 , y1 ), . . . , (xN , yN ))
n
X
0
ln 1 + eyi f (xi ) , , ,
i=1
, f 0 = 0.
3.
N
X
0
E0 =
ln 1 + eyi f (xi ) .
i=1
4. j = 1, . . . , M
(a)
1
pi =
,
=
max(,
p
(1
p
))
j1
i
i
i
1+ef (xi )
j1
zi = yi min , 1 + eyi f (xi ) , i = 1, . . . , N ;
(b) hj ((x1 , z1 ), . . . , (xN , zN ))
,
n
X
, :
i (hj (xi ) zi )2 ;
i1
N
X
(c) cj =
i zi hj (xi )
i=1
N
X
i=1
X
j
E j =
ln 1 + eyi f (xi )
E j1 , cj = cj /2
i=1
(f) cj = 0, ; f M () =
f j1 ().
5. : f M () = f 0 () +
M
X
cj hj ()
j=1
pf M () =
5.2.3
.
1+ef M ()
; AdaBoost
, , ,
1, -
132
, .
.
,
(198)
E(f + ch, T )E(f, T )
N
X
i=1
=E(f, T )
N
X
i=1
yi
1X
ch(xi ) +
pf (xi )(1 pf (xi ))c2 (h(xi ))2
y
f
(x
)
i
i
2 i=1
1+e
N
yi
1X
ch(x
)
+
pf (xi )(1 pf (xi ))c2 ,
i
2 i=1
1 + eyi f (xi )
h.
ch() h() c c . ,
, h,
.
h(xi ) = yi , .. 39 .
xi 1+ey2i f (xi ) , .. , 1+ey1i f (xi ) =
P (yi |f (xi )) .
c :
N
X
c=
N
X
i=1
N
X
=
pf (xi )(1 pf (xi ))
i=1
i=1
N
X
i=1
8 LogitBoost, (, , ) .
39 .
38 . 129
133
8: LogitBoost
1. : T = ((x1 , y1 ), . . . , (xN , yN )) N ,
M.
2. : f 0 () = 0.
3. j = 1, . . . , M
(a) i =
1
j1 (x ) ,
i
1+eyi f
i = 1, . . . , N ;
(b) hj T ( ) ;
N
X
(c) cj =
i=1
N
X
i yi hj (xi )
;
i (1 i )
i=1
M
X
cj hj () -
j=1
pf M () =
M
1+ef M ()
, , -
sign(f ()).
, 8 , . , , ,
(195), - .
yr
E(r, y) = e 2 . ,
(,
).
, f
N
E(f + ch, T )
E(f, T ) +
1 X yi f (xi )
2
yi ch(xi )
e
2 i=1
, c > 0 h xi yi
yi f (xi )
e 2 . h
c :
N
N
X
yi (f (xi )+ch(xi ))
X yi (f (xi )+ch(xi ))
2
2
E(f + ch, T ) =
=
yi h(xi )e
e
c
c i=1
i=1
X
X
yi f (xi )
yi f (xi )
c
c
2
2
=e 2
e
e 2
e
,
0=
i|h(xi )6=yi
i|h(xi )=yi
134
. . .
X
c = ln
yi f (xi ))
2
yi f (xi ))
2
i|h(xi )=yi
i|h(xi )6=yi
.
P
yi f (xi ))
. , i|h(xi )6=yi e 2
=0?
9.
!!!!
9:
1. : T = ((x1 , y1 ), . . . , (xN , yN )) N ,
M.
2. f 0 () = 0.
3. j = 1, . . . , M
(a) i = e
yi f j1 (xi )
2
, i = 1, . . . , N ;
(b) ,
N
X
i 1;
i=1
(c) hj T ;
X
(d) 0-1- E =
i ;
i|hj (xi )6=yi
j
(e) E = 0, : f () = h () ;
(f) cj = ln 1E
E ;
(g) f j () = f j1 () + cj hj (xi )().
4. : f () = sign
M
X
cj hj ().
j=1
. , 9
3 (AdaBoost), - .
. , yf (x) > 0, yf (x)
E(f (x), y) = e 2 , E(f (x), y) =
ln(1 + eyf (x) ), |yf (x)|,
, |yf (x)|, , . -
yi . ,
, hj ,
, , hj (xi ) = yi , yi f j (xi )
yi f j (xi ) , e 2
j, hj
135
. . .
AdaBoost
!!!!
, i- .
AdaBoost, .
,
,
.
.
5.2.4
, , q- q
q1 . , q q .
: , ,
q ,
q
q q. ., , [42], (
, , ).
[24]. LogitBoost, AdaBoost ( 9), ,
, .
, q- q-
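- $f\colon X \to \mathbb{R}^{Y}$,
$$(u_1, \ldots, u_q) \mapsto \left(\frac{e^{u_1}}{\sum_{j=1}^{q} e^{u_j}}, \ldots, \frac{e^{u_q}}{\sum_{j=1}^{q} e^{u_j}}\right)$$
$$\sum_{i=1}^{N}\ln p(y_i \mid x_i) = \sum_{i=1}^{N}\ln\frac{e^{f^{y_i}(x_i)}}{\sum_{j=1}^{q} e^{f^j(x_i)}} = -\sum_{i=1}^{N}\ln\sum_{j=1}^{q} e^{f^j(x_i) - f^{y_i}(x_i)}.$$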
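Algorithm 10:
1. Input: $T = ((x_1, y_1), \ldots, (x_N, y_N))$, a training sample of $N$ examples; the number of iterations $M$.
2. Initialization: $f_0(\cdot) = 0$.
3. For $m = 1, \ldots, M$:
(a) $\omega_{ij} = \dfrac{e^{f^j_{m-1}(x_i)}}{\sum_{j'=1}^{q} e^{f^{j'}_{m-1}(x_i)}}$, $(i, j) \in S_T$;
(b) train on $T$ a weak learner $h_m$ such that $\sum_{(i,j) \in S_T}\omega_{ij} h^j(x_i) \to \max_{h \mid |h^j(x_i)| \le 1}$;
(c) $R = \sum_{(i,j) \in S_T}\omega_{ij} h^j_m(x_i)$, $Z = \sum_{(i,j) \in S_T}\omega_{ij}$;
(d) if $R = 0$, stop;
(e) $c_m = \dfrac{1}{2}\ln\dfrac{Z - R}{Z + R}$ (… $h^j_m(x_i)$, $j \neq y_i$ …, i.e. $h^j_m(x_i) > 0$, $R > 0$, $c_m < 0$ …);
(f) $f^j_m(\cdot) = f^j_{m-1}(\cdot) + c_m h^j_m(\cdot)$.
4. Output:
$$p(y \mid x) = \frac{\exp\left(\sum_{m=1}^{M} c_m h^y_m(x)\right)}{\sum_{j=1}^{q}\exp\left(\sum_{m=1}^{M} c_m h^j_m(x)\right)}, \quad y = 1, \ldots, q.$$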
10 N (q 1)
ij j 6= yi , .. . .
{(i, j)|j 6= yi } ST .
m- $h_m = (h_m^1, \ldots, h_m^q)$ ( )
$[-1, 1]^q$ . [24] 40 , .
. , 1, 40 [24] , 10.
10 .
, , .
9: (!) , +1 1 ,
cm . cm
, 9,
.
5.3
- ,
1990-
, . 2000-
[11] , column generation ( ),
, ,
( [32]).
, ,
lasso (. 43), . .
LPBoost (linear programming boosting).
LPBoost
LPBoost 1;
. [32]. (x1 , y1 ), . . . , (xN , yN ) N
M 2N h1 , . . . , hM . 2.3.3, - h(x) = (h1 (x), . . . , hM (x))
(h(x1 ), y1 ), . . . , (h(xN ), yN ).
M , ,
.
hji = hj (xi )
, hi = h(xi ) M - ,
c = (c1 , . . . , cM ) M - (-) , PM
h 7 j=1 cj hj M -
. .
(99)
, , (73), .
, , . (99)
, , 1,
, , . LPBoost: :
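$$\rho - D\sum_{i=1}^{N}\xi_i \to \max_{c,\,\xi,\,\rho} \qquad (201)$$
...
$$y_i\sum_{j=1}^{M} c_j h_i^j \ge \rho - \xi_i, \quad 1 \le i \le N, \qquad (202)$$
$$\sum_{j=1}^{M} c_j = 1, \qquad (203)$$
$$c_j \ge 0, \quad 1 \le j \le M, \qquad (204)$$
$$\xi_i \ge 0, \quad 1 \le i \le N. \qquad (205)$$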
(201) D > 0, ,
.
C (99), D N .
1
D, D = N
(201)
N
X
(i ) min
c,,
i=1
(. -SVC (166)).
1
(201)
. , D = N
1
1
1
N 1, .. N D 1. D
= N
hi (i > 0), N
.
,
( [32]) ,
, . , .
(201) M : N i (202) (203)41
N . (201)
$$L(c, \xi, \rho; \lambda, \beta) = \rho - D\sum_{i=1}^{N}\xi_i + \sum_{i=1}^{N}\lambda_i\Bigl(y_i\sum_{j=1}^{M} c_j h_i^j + \xi_i - \rho\Bigr) + \beta\Bigl(1 - \sum_{j=1}^{M} c_j\Bigr) \qquad (206)$$
c, , , , (202205)
i 0 1 i N .
L c,
, , ,
:
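$$\beta \to \min_{\lambda,\,\beta} \qquad (207)$$
$$\sum_{i=1}^{N}\lambda_i y_i h_i^j \le \beta, \quad 1 \le j \le M, \qquad (208)$$
$$\sum_{i=1}^{N}\lambda_i = 1,$$
$$0 \le \lambda_i \le D, \quad 1 \le i \le N.$$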
.
(207).
41
(204)
(205),
, .
!!!!
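$$\sum_{i=1}^{N}\lambda_i y_i h(x_i) = \sum_{i \mid h(x_i) = y_i}\lambda_i - \sum_{i \mid h(x_i) \neq y_i}\lambda_i = 1 - 2\sum_{i \mid h(x_i) \neq y_i}\lambda_i \to \max_h.$$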
i N1 , .. .
11 (LPBoost).
!!!!
11: LPBoost
1. : T = ((x1 , y1 ), . . . , (xN , yN )) N .
2. $\lambda_i = \frac{1}{N}$, $i = 1, \ldots, N$.
3. $\beta = 0$.
4. m, m = 1
(a) hm T ;
(b) E hm
T , ,
$1 - 2E = \sum_{i=1}^{N}\lambda_i y_i h_m(x_i)$,
;
m
(c) $h^m_i = h_m(x_i)$;
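(d)
$$\beta \to \min_{\lambda,\,\beta} \qquad (209)$$
$$\sum_{i=1}^{N}\lambda_i y_i h_i^j \le \beta, \quad 1 \le j \le m, \qquad (210)$$
$$\sum_{i=1}^{N}\lambda_i = 1, \qquad 0 \le \lambda_i \le D, \quad 1 \le i \le N,$$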
PN
= i=1 i yi hm
i
.
5. : $f(\cdot) = \operatorname{sign}\sum_{j=1}^{m-1} c_j h_j(\cdot)$, $c_j \ge 0$
j- (210) (209).
11 ,
- .
, (,
-) ,
, ,
.
, , ,
, .
LPBoost , (201).
, , ( , ),
, ., , [24].
5.4
,
.
,
5.1.1:
/ . (.,
, [40]), (),
.
, , !
..
,
( ), , , . , ,
. 100- 200-
1000-.
, , .
6.1
, , , , . , ,
. , , .
( 1.1.3).
. : ,
,
, , , , .
.
( 1.1.5). ,
, , .
,
, . , CART [20] C4.5
[92], , .
, , . .
( 1.2.5). , , .
. (missing data).
.
( 2.1.2).
.
. , , , (
) ,
(+1), . (
)
. ,
, ,
( ).
(
2.1.2). , ,
, ,
. ( ),
.
( 2.2.1). ,
, .
, . ,
. , (. )
.
( 2.2.2). , , . ,
,
.
( 2.2.3). , .
.
( 3.2). ( ) .
, , ,
: , , , , , ,
.
RBF- ( 3.3). . , , 143
. , , .
( 2.1.3, 2.2.4 4.2). . .
. ( ) ( , ) . ( ), ()
.
. , .
, ( 5.1.1).
, ,
.
,
(
) / . , . ,
.
( 2.3.3 5 5.1.1). ( , ),
, ,
. .
.
6.2
, , , ,
:
[33]
RBF- ( 3.3.1).
, (missing data), 42 (censored data),
.
:
42 , , , , - .
, , .
( 1.2.5)
( 2.2.1).
,
, , ,
( ). ,
- , , , -, - -
, , .
,
, .
, , [15].
, . , (HMM,
Hidden Markov Model ),
.
, ,
[94, 10].
. , , (CRF, Conditional Random Field )
[71], ( )
( ),
.
( 1.2.5). , , - , ,
, - .
, (
, ), 43 (, ) . -
- , (. (44) (46)).
80-
[80]. RVM.
43
evidence.
A.1
- . ,
1. , - ,
;
2. - ;
3. -
( , , ,
);
4.
( ,
, ), - ( ,
).
13
(missing data), ( , 4 -) .
,
.
, (missing at random
(MAR)), , (missing completely at random (MCAR))44 . ,
44
. rj ,
+
1, j- 0, ; z
1 , MAR,
3 , MCAR, , , MAR.
(MCAR),
.
B. [77] ( ) [29]
( ).
A.2
,
:
1. ,
, ;
2. ,
;
3. - (impute)
;
4. 0 1;
5. .
1
. , , , , ,
, - ,
, . , MCAR ,
.
2 1 , , .
, ,
.
( 5.1) , , , , . ,
- ( )
( ) .
, .
: $P(r_j \mid \overset{+}{z}, \overset{-}{z}) = P(r_j)$.
, , A.2.1.
3 , (imputation)
.
. -
-
.
.
MCAR ( ) , (),
,
( ),
( ).
MAR, MCAR, ,
.
.
MAR , , , , , , , .
.
4 , , , .
.
- , , , 45 . , ,
.
: , , , , , . ,
, (
) .
5 MCAR;
, , .
.
45 , .
A.2.1
$p_\theta(x, y)$
$$p_\theta(x, y) = p_\theta(y)\prod_{j=1}^{d} p_\theta(x_j \mid y) = \theta_y\prod_{j=1}^{d}\theta^j_{x_j y}$$
(. (30) k
j
mk
). x +
, Dx {1, . . . , d} , Dx =
+
{1, . . . , d}\ Dx , x =x x
+
X =Xx Xx x
x . ,
, ,
,
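$$p_\theta(\overset{+}{x}, y) = \sum_{\overset{-}{x} \in \overset{-}{X}_x} p_\theta(x, y) = \sum_{\overset{-}{x} \in \overset{-}{X}_x}\theta_y\prod_{j=1}^{d}\theta^j_{x_j y} = \theta_y\prod_{j \in \overset{+}{D}_x}\theta^j_{x_j y}. \qquad (211)$$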
, y, ,
+
, , .. x y,
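$$p_\theta(\overset{+}{x}) = \sum_{y \in Y} p_\theta(\overset{+}{x}, y) = \sum_{y \in Y}\theta_y\prod_{j \in \overset{+}{D}_x}\theta^j_{x_j y}. \qquad (212)$$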
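($Y = \{1, 2\}$) - ($X = \{1, 2\}$), . $\theta_1$, $\theta_2$, $\theta_{11}$,
$\theta_{21}$, $\theta_{12}$ $\theta_{22}$ ( , 1, ),
$\theta_1 + \theta_2 = 1$, $\theta_{11} + \theta_{21} = 1$ $\theta_{12} + \theta_{22} = 1$ $(\theta_1, \theta_{11}, \theta_{12}) \in [0, 1]^3$ . ,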
, .
( 1.2.5),
(
1.2.1), (26).
. T1 = ((1, 1), (2, 2))
x = 1
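$$\frac{p_{T_1}(y = 1 \mid x = 1)}{p_{T_1}(y = 2 \mid x = 1)} = \frac{\int \theta_1\theta_{11}\cdot\theta_1\theta_{11}(1 - \theta_1)(1 - \theta_{12})\,d\theta}{\int (1 - \theta_1)\theta_{12}\cdot\theta_1\theta_{11}(1 - \theta_1)(1 - \theta_{12})\,d\theta} = \frac{\int_0^1 \theta_1^2(1 - \theta_1)\,d\theta_1\int_0^1 \theta_{11}^2\,d\theta_{11}\int_0^1 (1 - \theta_{12})\,d\theta_{12}}{\int_0^1 \theta_1(1 - \theta_1)^2\,d\theta_1\int_0^1 \theta_{11}\,d\theta_{11}\int_0^1 \theta_{12}(1 - \theta_{12})\,d\theta_{12}} = \frac{\frac{1}{12}\cdot\frac{1}{3}\cdot\frac{1}{2}}{\frac{1}{12}\cdot\frac{1}{2}\cdot\frac{1}{6}} = 2,$$
$$\frac{p_{T_1}(y = 1 \mid x = 2)}{p_{T_1}(y = 2 \mid x = 2)} = \frac{1}{2}.$$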
,
, :
T2 = ((1, 1), (2, 2), (, 2), (, 2))
.
x = 1
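$$\frac{p_{T_2}(y = 1 \mid x = 1)}{p_{T_2}(y = 2 \mid x = 1)} = \frac{\int \theta_1\theta_{11}\cdot\theta_1\theta_{11}(1 - \theta_1)(1 - \theta_{12})(1 - \theta_1)^2\,d\theta}{\int (1 - \theta_1)\theta_{12}\cdot\theta_1\theta_{11}(1 - \theta_1)(1 - \theta_{12})(1 - \theta_1)^2\,d\theta} = \frac{\int_0^1 \theta_1^2(1 - \theta_1)^3\,d\theta_1\int_0^1 \theta_{11}^2\,d\theta_{11}\int_0^1 (1 - \theta_{12})\,d\theta_{12}}{\int_0^1 \theta_1(1 - \theta_1)^4\,d\theta_1\int_0^1 \theta_{11}\,d\theta_{11}\int_0^1 \theta_{12}(1 - \theta_{12})\,d\theta_{12}} = \frac{\frac{1}{60}\cdot\frac{1}{3}\cdot\frac{1}{2}}{\frac{1}{30}\cdot\frac{1}{2}\cdot\frac{1}{6}} = 1,$$
$$\frac{p_{T_2}(y = 1 \mid x = 2)}{p_{T_2}(y = 2 \mid x = 2)} = \frac{1}{4}.$$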
(!
) .
. T3 = ((1, 1), (2, 2), (, 2), (, 2), (1, ))
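$$p_{T_3}(y = 1 \mid x = 1) \propto \int \theta_1\theta_{11}\cdot\theta_1\theta_{11}(1 - \theta_1)(1 - \theta_{12})(1 - \theta_1)^2\cdot\theta_1\theta_{11}\,d\theta + \int \theta_1\theta_{11}\cdot\theta_1\theta_{11}(1 - \theta_1)(1 - \theta_{12})(1 - \theta_1)^2\cdot(1 - \theta_1)\theta_{12}\,d\theta,$$
$$p_{T_3}(y = 2 \mid x = 1) \propto \int (1 - \theta_1)\theta_{12}\cdot\theta_1\theta_{11}(1 - \theta_1)(1 - \theta_{12})(1 - \theta_1)^2\cdot\theta_1\theta_{11}\,d\theta + \int (1 - \theta_1)\theta_{12}\cdot\theta_1\theta_{11}(1 - \theta_1)(1 - \theta_{12})(1 - \theta_1)^2\cdot(1 - \theta_1)\theta_{12}\,d\theta$$
(both with the same normalizing constant),
$x = 1$
$$\frac{p_{T_3}(y = 1 \mid x = 1)}{p_{T_3}(y = 2 \mid x = 1)} = \frac{\frac{9}{2} + \frac{8}{3}}{\frac{8}{3} + 5} = \frac{43}{46} < 1$$
(. $T_2$ ).
. $\dfrac{p_{T_3}(y = 1 \mid x = 2)}{p_{T_3}(y = 2 \mid x = 2)}$ .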
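The three ratios above can be double-checked with exact rational arithmetic. The sketch works as follows:

- it expands the likelihood of a sample as a polynomial in $\theta_1$, $\theta_{11}$, $\theta_{12}$;
- it integrates against the uniform prior using $\int_0^1 t^a(1-t)^b\,dt = \frac{a!\,b!}{(a+b+1)!}$;
- a missing $x$ contributes only the factor for $y$, and a missing $y$ is summed out.

The data layout (pairs with `None` for a missing value) and all function names are illustrative.

```python
from fractions import Fraction
from math import factorial

# Parameters: theta_1 = P(y=1), theta_11 = P(x=1|y=1), theta_12 = P(x=1|y=2).
# A polynomial is a dict {(a1, b1, a2, b2, a3, b3): coefficient} that stands for
# sum of coefficient * prod_k theta_k^{a_k} (1 - theta_k)^{b_k}.

def beta_integral(a, b):
    """Integral over [0, 1] of t^a (1 - t)^b dt = a! b! / (a + b + 1)!."""
    return Fraction(factorial(a) * factorial(b), factorial(a + b + 1))

def factor(k, positive):
    """theta_k if positive else (1 - theta_k), for k in {0, 1, 2}."""
    e = [0] * 6
    e[2 * k + (0 if positive else 1)] = 1
    return {tuple(e): Fraction(1)}

def mul(p, q):
    r = {}
    for ep, cp in p.items():
        for eq, cq in q.items():
            e = tuple(u + v for u, v in zip(ep, eq))
            r[e] = r.get(e, 0) + cp * cq
    return r

def add(p, q):
    r = dict(p)
    for e, c in q.items():
        r[e] = r.get(e, 0) + c
    return r

def integrate(p):
    """Integral against the uniform prior on [0, 1]^3."""
    return sum(c * beta_integral(e[0], e[1]) * beta_integral(e[2], e[3])
               * beta_integral(e[4], e[5]) for e, c in p.items())

def likelihood(x, y):
    """p(x, y | theta); None marks a missing value, which is summed out."""
    if y is None:
        return add(likelihood(x, 1), likelihood(x, 2))
    p = factor(0, y == 1)
    if x is not None:
        p = mul(p, factor(1 if y == 1 else 2, x == 1))
    return p

def odds(sample, x):
    """p_T(y=1 | x) / p_T(y=2 | x); the normalizing constant p_T(x) cancels."""
    lik = {(0,) * 6: Fraction(1)}
    for xi, yi in sample:
        lik = mul(lik, likelihood(xi, yi))
    return integrate(mul(lik, likelihood(x, 1))) / integrate(mul(lik, likelihood(x, 2)))
```

For example, `odds([(1, 1), (2, 2), (None, 2), (None, 2), (1, None)], 1)` gives `Fraction(43, 46)`.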
, (
: ) . T = ((x1 , y1 ), . . . , (xN , yN )),
, Nk
Mj
X
j
j
j
Nmk
Nk ( (31)), Nmk ( (32)) Nk =
m=1
k- j- .
!!!!
. x +
Dx y
$$P_T(\overset{+}{x}, y) = \frac{N_y + 1}{N + q}\prod_{j \in \overset{+}{D}_x}\frac{N^j_{x_j y} + 1}{N^j_y + M^j} \qquad (213)$$
(. (35)).
. (213)
. (37) , .
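Formula (213) can be sketched as a naive Bayes classifier that simply skips missing features, both when counting and when multiplying. The sketch makes these assumptions:

- training examples are pairs `(x, y)` with a known label;
- `x` is a tuple in which a missing value is `None`;
- feature $j$ takes $M^j$ values.

The function names are hypothetical.

```python
from collections import Counter

def fit_counts(data):
    """Counts N_y, N^j_{m y}, N^j_y (class-y examples with feature j observed) and N."""
    n_y, n_xy, n_jy = Counter(), Counter(), Counter()
    for x, y in data:
        n_y[y] += 1
        for j, v in enumerate(x):
            if v is not None:            # a missing value is not counted at all
                n_xy[(j, v, y)] += 1
                n_jy[(j, y)] += 1
    return n_y, n_xy, n_jy, len(data)

def joint(x, y, counts, q, M):
    """P_T(x^+, y) by (213); q is the number of classes, M[j] the number of values of feature j."""
    n_y, n_xy, n_jy, n = counts
    p = (n_y[y] + 1) / (n + q)
    for j, v in enumerate(x):
        if v is not None:                # the product runs over observed features only
            p *= (n_xy[(j, v, y)] + 1) / (n_jy[(j, y)] + M[j])
    return p

def posterior(x, counts, classes, M):
    """P(y | x^+) obtained by normalizing (213) over the classes."""
    scores = [joint(x, y, counts, len(classes), M) for y in classes]
    z = sum(scores)
    return [s / z for s in scores]
```

If every feature of `x` is missing, the posterior reduces to the smoothed class frequencies $(N_y + 1)/(N + q)$, renormalized.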
A.2.2
!!!!
!!!!
, :
, , , ;
( );
.
, .
,
, , ,
, .
. ( 2.2.1)
, .
?
, , .
, , , ,
.
A.3
A.3.1
1.1.4
:
(i-):
xi (d-, );
yi (q- , );
!!!!
zi = (xi , yi ) -();
vi ( ) (latent variables), .
,
, :
+
xi ;
+
yi ;
+
+ +
zi = (xi , yi ) ( );
xi , yi zi , ;
ui = (vi , zi ) , .. ,
.
:
T = ((x1 , y1 ), . . . , (xN , yN )) , -
;
v= (v1 , . . . , vN ) ,
+
z= (z1 , . . . , zN ) (
) ;
z= (z1 , . . . , zN ) ;
u= (u1 , . . . , uN ) = (v, z) ( )
+
; , ( z, u) (, , ),
(, );
.
:
f p(xi , yi , vi ) , ,
F.
f p0 (f ) F,
p(xi , yi , vi ) f , , p(xi , yi , vi |f ); R
+
p(xi , yi |f ) = vi p(xi , yi , vi |f )dvi p(zi , ui |f ) (
p(xi , yi , vi |f ), ).
:
+
+
p(zi , ui |f ).
i=1
d
N(,A)
d- A
$$\mathcal{N}^d_{(\mu,A)}(x) = (2\pi)^{-\frac{d}{2}}\,|A|^{-\frac{1}{2}}\,e^{-\frac{1}{2}A^{-1}(x - \mu,\,x - \mu)} \qquad (214)$$
d
N(,)
I d .
$$\mathcal{N}^d_{(\mu,\sigma)}(x) = (2\pi\sigma)^{-\frac{d}{2}}\,e^{-\frac{\|x - \mu\|^2}{2\sigma}} \qquad (215)$$
, ,
N(,A) .
, , , , .
. , E;u g(, u)
u ( -) g u
:
$$E_{\varphi;u}\,g(\cdot, u) = \int g(\cdot, u)\,\varphi(u)\,du,$$
$$E_{\varphi;u}\,g(\cdot, u) = \sum_j g(\cdot, u_j)\,\varphi(u_j).$$
j
F, , , , -. p(x, y, v|f )
, f .
, , ..
, , , y
, , . ,
,
(., , 3.3.1).
+
+
p(f | z) f z
,
$$p(\overset{+}{z}, f) = p_0(f)\,p(\overset{+}{z} \mid f) = p_0(f)\int p(\overset{+}{z}, u \mid f)\,du \to \max_f, \qquad (216)$$
, ,
$$L(f; \overset{+}{z}) = \ln p_0(f) + \ln p(\overset{+}{z} \mid f) = \ln p_0(f) + \ln\int p(\overset{+}{z}, u \mid f)\,du \qquad (217)$$
,
,
+
. z (
)
.
. ( 3.3) ( 2.1.2).
xi X = Rd , i = 1, . . . , N , .
yi Y = R, i = 1, . . . , N ,
.
N
(d + 1)N , ,
+
z N ,
z N = (d + 1)N N .
(xi , yi ) vi , {1, . . . , k}.
f l : X Y, l(x) = wx + b, , k j X k Aj X , p(x, y, v|f )
v Y, v X v
:
p(x, y, v|f )
= p(y|x, f ) p(x|v, f ) Pv
1
d
= N(0,)
(y l(x)) N(
(x) Pv .
v ,Av )
(218)
,
p((x1 , y1 , v1 ), . . . , (xN , yN , vN )|f ) =
N
Y
p(xi , yi , vi |f ) .
(219)
i=1
f w
d- 0:
d
p0 (f ) = N(0,)
(w) ,
(220)
> 0 .
p(y|x, f ) p0 (f ) ,
( 2.1.2), ..
. () v
p(x|f ) =
k
X
p(x, v|f ) =
v=1
k
X
p(x|v, f )p(v)
v=1
. (x, y) (-) +
p(y| x), ..
.
A.3.2
EM
+
p( z, u|f )
+
L(f ; z) , L
f (217)
, , . L
L , u, :
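$$L(f; \overset{+}{z}) = \ln p_0(f) + \ln\int p(\overset{+}{z}, u \mid f)\,du = \ln\int\frac{p_0(f)\,p(\overset{+}{z}, u \mid f)}{\varphi(u)}\,\varphi(u)\,du = \ln E_{\varphi;u}\frac{p_0(f)\,p(\overset{+}{z}, u \mid f)}{\varphi(u)} \ge E_{\varphi;u}\ln\frac{p_0(f)\,p(\overset{+}{z}, u \mid f)}{\varphi(u)} = L_\varphi(f; \overset{+}{z}) \qquad (221)$$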
( ). L
+
(222)
f - ,
$H(\varphi) = -E_{\varphi;u}\ln\varphi(u)$.
+
L (f ; z) max ,
f,
, L(f ; z) (217), ,
+
ln p( z, u|f )
(217).
+
, , , L (f ; z) 46
. +
L (f ; z) , expectation
maximization EM :
+
E(xpectation) () L (f ; z)
f ;
+
M(aximization) () L (f ; z) f .
46
, . 47 . 159
E M
EM
12: (EM);
1. : T = ((x1 , y1 ), . . . , (xN , yN )),
+
/ : z .
2. v,
(x, y, v), p0
f = f (0) .
3.
+
z,u|f
)
E L (f ; z) = E;u ln p0 (f )p(
(u)
( )
f ;
+
M L (f ; z) f ;
, f .
4. : f .
E p0 (f )
:
32 u
+
z f
$$\varphi_f(u) = p(u \mid \overset{+}{z}, f) = \frac{p(\overset{+}{z}, u \mid f)}{p(\overset{+}{z} \mid f)} \qquad (223)$$
+
L (f ; z) ,
+
Lf (f ; z) = L(f ; z).
+
. L (f ; z)
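$$\int\ln\frac{p(\overset{+}{z}, u \mid f)}{\varphi(u)}\,\varphi(u)\,du \to \max_{\varphi \mid \int\varphi(u)\,du = 1}.$$
:
$$0 = \frac{\partial}{\partial\varphi(u)}\left(\int\ln\frac{p(\overset{+}{z}, u \mid f)}{\varphi(u)}\,\varphi(u)\,du - \lambda\left(\int\varphi(u)\,du - 1\right)\right) = \frac{\partial}{\partial\varphi(u)}\int\left(\ln\frac{p(\overset{+}{z}, u \mid f)}{\varphi(u)} - \lambda\right)\varphi(u)\,du.$$
, $u$
$$\ln\frac{p(\overset{+}{z}, u \mid f)}{\varphi(u)} - \lambda - 1 = 0,$$
.., $\varphi(u) = \dfrac{p(\overset{+}{z}, u \mid f)}{e^{\lambda + 1}}$.
,
$$1 = \int\varphi(u)\,du = \int\frac{p(\overset{+}{z}, u \mid f)}{e^{\lambda + 1}}\,du = \frac{p(\overset{+}{z} \mid f)}{e^{\lambda + 1}},$$
$$\varphi(u) = \frac{p(\overset{+}{z}, u \mid f)}{p(\overset{+}{z} \mid f)} = p(u \mid \overset{+}{z}, f)$$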
(u) =
+
L (f ; z). , ,
:
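$$L_{\varphi_f}(f; \overset{+}{z}) = \ln p_0(f) + E_{\varphi_f;u}\ln\frac{p(\overset{+}{z}, u \mid f)}{p(u \mid \overset{+}{z}, f)} = \ln p_0(f) + E_{\varphi_f;u}\ln p(\overset{+}{z} \mid f) = \ln p_0(f) + \ln p(\overset{+}{z} \mid f) = L(f; \overset{+}{z})$$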
(221).
7 12
+
+
p(f | z) f , L(f ; z) (217),
.
M , :
+
33 f , L (f ; z),
+
(224)
E, -, M
+
+
, -, L (f ; z) = L(f ; z) .
M . ,
Rn
x X c (
) ( )
p(x|) =
h(x) ((x),)
e
,
g()
: X Rn Rn ,
h : X R g : R
. g
R
p(x|)dx = 1:
Z
g() = h(x)e((x),) dx .
X Y, X .
(224)
N
!
X +
p0
+
0=
ln p0 (f ) + E;u ln p( z, u|f ) =
ln N (f ) + E;u
(zi , ui ) ,
f
f
g
i=1
, p0 g,
f , .
. , (214), (, A), 2
= ( 12 A1 , A1 ) Rd +d (x) = ((xj xk , 1 j, k
d), x) ( - A d2 + d, d(d+1)
+ d). h g.
2
32 33 12, E (223)
M. 13, ,
, M
E.
13: (EM);
1. : T = ((x1 , y1 ), . . . , (xN , yN )),
+
/ : z .
2. v,
(x, y, v), p0
f = f (0) .
3.
E f f (223) u
+
f (u) = p(u| z, f ) ;
( )
f 0
$$Q(f', f; \overset{+}{z}) = \ln p_0(f') + E_{\varphi_f;u}\ln p(\overset{+}{z}, u \mid f'); \qquad (225)$$
M Q(f 0 , f ; z) f 0
f
f 0 6= f , f = f 0 , .
4. : f .
. +
p(f | z), -
!!!!
EM:
E M
p( z, f ) (216), p( z |f ). +
p( z |f ) p( z, f ) =
+
p0 (f )p( z |f ) ,
.
.
f , . , - . .
,
.
- .
,
, 195060- . , - [8]
13. [33]
47 , E M ,
. 12 1980-
, [88].
A.3.3
EM
13
. 3.3.1
. x Rd , ,
, v 1 k. f ( 3.3.1 W)
$$p(x, v \mid f) = P_v\,\mathcal{N}^d_{(\mu_v, s_v)}(x), \quad v = 1, \ldots, k. \qquad (226)$$
(. (127)) 2k Pv sv
Pk
v=1 Pv = 1 k d- v .
p0 (f ) f 1, .. .
, 3.3.1, xi , E 13 f
f , vi .
,
+
ln p( z, u|f ) =
N
X
ln p(xi , vi |f )
i=1
N - (v)
(marginal) i (vi ):
+
E;u ln p( z, u|f ) =
X
v
(v)
N
X
ln p(xi , vi |f ) =
N X
k
X
i (vi ) ln p(xi , vi |f ) .
i=1 vi =1
i=1
47 ( (225)), .
!!!!
QN
i (!) (v) = i=1 i (vi ),
E P {j|xi , f } (. (128))
(225) p0 = 1
Q(f 0 , f ; z) = Ef ;u ln p( z, u|f 0 ) =
N X
k
X
i=1 j=1
(. (131), ,
, Q). M
+
Q(f 0 , f ; z) f 0 ( 3.3.1 ).
. , . x +
X =Xx Xx ,
xi . +
X =x x ,
+
x Xx x Xx . dx dx , +
, , x, x[]=x x
x,
.
.
34 13 (226)
E
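$$P\{v_i \mid \overset{+}{x}_i, f\} = \frac{P\{\overset{+}{x}_i, v_i \mid f\}}{\sum_{j=1}^{k} P\{\overset{+}{x}_i, j \mid f\}} = \frac{P_{v_i}\,\mathcal{N}^{\overset{+}{d}_{x_i}}_{(\overset{+}{\mu}_{v_i}, s_{v_i})}(\overset{+}{x}_i)}{\sum_{j=1}^{k} P_j\,\mathcal{N}^{\overset{+}{d}_{x_i}}_{(\overset{+}{\mu}_j, s_j)}(\overset{+}{x}_i)},$$
$$Q(f', f; \overset{+}{z}) = \sum_{i=1}^{N}\sum_{j=1}^{k} P\{j \mid \overset{+}{x}_i, f\}\int\ln\Bigl(P'_j\,\mathcal{N}^d_{(\mu'_j, s'_j)}(\overset{+}{x}_i\overset{-}{x})\Bigr)\,\mathcal{N}^{\overset{-}{d}_{x_i}}_{(\overset{-}{\mu}_j, s_j)}(\overset{-}{x})\,d\overset{-}{x} = \sum_{i=1}^{N}\sum_{j=1}^{k} P\{j \mid \overset{+}{x}_i, f\}\left(\ln P'_j - \frac{d}{2}\ln(2\pi s'_j) - \frac{\|x_i[\mu_j] - \mu'_j\|^2 + \overset{-}{d}_{x_i}s_j}{2s'_j}\right)$$
$$P'_j = \frac{1}{N}\sum_{i=1}^{N} P\{j \mid \overset{+}{x}_i, f\}, \qquad \mu'_j = \frac{\sum_{i=1}^{N} P\{j \mid \overset{+}{x}_i, f\}\,x_i[\mu_j]}{N P'_j}, \qquad s'_j = \frac{\sum_{i=1}^{N} P\{j \mid \overset{+}{x}_i, f\}\bigl(\|x_i[\mu_j] - \mu'_j\|^2 + \overset{-}{d}_{x_i}s_j\bigr)}{dNP'_j}$$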
(. (136)(138)).
3.3.1
.
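A minimal sketch of this EM iteration for spherical Gaussian components follows. It assumes points are given as tuples in which a missing coordinate is `None`. The E-step uses only the observed coordinates. The M-step works with the completed vector $x_i[\mu_j]$ and adds $\overset{-}{d}_{x_i}s_j$ to the squared deviation, as in the update formulas above. The caller supplies the initial parameters; the function name is hypothetical. Because these updates are exact EM steps, the observed-data log-likelihood should never decrease, and the test checks this.

```python
import math

def em_missing(data, means, variances, priors, iters=50):
    """EM for a mixture of k spherical Gaussians N(mu_j, s_j I) in R^d when some coordinates
    are None. Returns the parameters and the observed-data log-likelihood before each M-step."""
    d, k, N = len(data[0]), len(means), len(data)
    history = []
    for _ in range(iters):
        # E-step: P{j | x_i^+, f} uses the marginal density of the observed coordinates only
        resp, loglik = [], 0.0
        for x in data:
            obs = [t for t in range(d) if x[t] is not None]
            dens = []
            for j in range(k):
                sq = sum((x[t] - means[j][t]) ** 2 for t in obs)
                dens.append(priors[j] * (2 * math.pi * variances[j]) ** (-len(obs) / 2)
                            * math.exp(-sq / (2 * variances[j])))
            total = sum(dens)
            loglik += math.log(total)
            resp.append([p / total for p in dens])
        history.append(loglik)
        # M-step: the missing coordinates of x_i are replaced by those of mu_j (vector x_i[mu_j])
        new_means, new_vars, new_priors = [], [], []
        for j in range(k):
            r = [resp[i][j] for i in range(N)]
            nj = sum(r)
            filled = [[v if v is not None else means[j][t] for t, v in enumerate(x)] for x in data]
            missing = [sum(v is None for v in x) for x in data]
            mu = [sum(ri * xf[t] for ri, xf in zip(r, filled)) / nj for t in range(d)]
            s = sum(ri * (sum((xf[t] - mu[t]) ** 2 for t in range(d)) + m * variances[j])
                    for ri, xf, m in zip(r, filled, missing)) / (d * nj)
            new_means.append(mu)
            new_vars.append(s)
            new_priors.append(nj / N)
        means, variances, priors = new_means, new_vars, new_priors
    return means, variances, priors, history
```

In the test sample, each cluster contains four corners of a unit square and two points with one coordinate missing. The fixed point is then $\mu = (0.5, 0.5)$ and $s$ satisfies $s = (2 + 2s)/12$, i.e. $s = 0.2$.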
13 . 153
(218)(220), : , ,
, +
+
. P {vi | xi , yi , f } ( P {vi | xi
+
, f }, yi ) Q(f 0 , f ; z)
Xxi , (
Q) xi , .
+
Q(f 0 , f ; z) ,
.
- , . Z Z = X Y ( = |X |Y , Z, |X X |Y Y ),
(w = w|X w|Y , Z w = w|X |X + w|Y |Y )
(A = A|XX A|XY A|Y X A|Y Y , , Z
A(, ) = A|XX (|X , |X ) + A|XY (|X , |Y ) + A|Y X (|Y , |X ) + A|Y Y (|Y , |Y )).
+
X =Xxi Xxi
:
+i
+i
++i
+i
+i
= , w = w w A = A A A A . M 0
M (, A ) (, Xxi )
, .. M ,
M .
i X , x.
zi = (xi , yi )
Z = X Y, , , Y.
, .
+
4 N(,A) X =X X
Z
+
+
p(x) = N(,A) (x)d x= N + + (x)
( , A)
!!!!
p(x | x) =
p(x)
+
=N
( (,A, x),A)
p(x)
A=
(x) ,
1
++
+ + 1
+ +
+
, (, A, x) = A A1 (x ) A= A1
A1 A1 A A1
+
( A A !).
5 N(,A) (x)
l(x) l(), R(x , x )
Z
R(x , x )N(,A) (x)dx = tr(RA1 ) + R( , ) .
(218)(220):
1
p(x, y, v|f )=Pv N(v ,Av ) (x) N(0,)
(wx + b y))
d
p0 (f )=N(0,)
(w)
d+1
d
1
N(,A)
(x) N(0,)
(wx + b y) = N(,B)
(z) ,
1
>
B = A1 0 + 1 (w (1)) (w (1))
, = (w+b) (
) z = x y. |B| = |A|.
f j Bj ((d+1)) j Aj (d-) . :
i
ij
Pij
= P {j| zi , f } = P {j| xi , yi , f } ,
xij
yij
= xi [j ] =xi j ,
= yi [wxi [j ] + b] ,
zij
w1
(j , Bj , z+i ) ( 4),
+
+ +
35 13
E
Piv =P {v|
+
zi , f }
Pv N
=P
k
j=1
+i +i
(v ,Bv )
Pj N
(zi )
+
+i +i
(j ,Bj )
(zi )
( iv ,Bv )
Q(f 0 , f ; z)= ln p0 (f 0 ) +
N X
k Z
X
(zi )
i=1 j=1
N X
k
X
1
2
= (ln(2) + kwk ) +
Pij Iij ,
2
i=1 j=1
Z
Iij =
= ln Pj0
!
N X
k
N X
k
X
X
i
>
>
w0 I d +
Pij Bj
+ xij (xij ) + b0
Pij (xij )
i=1 j=1
i=1 j=1
XX
!
N X
k
X
i
>
=
Pij Bj
+ yij (xij )
i=1 j=1
w0
N X
k
X
Pij xij + b0
i=1 j=1
XY
N X
k
X
Pij =
i=1 j=1
N X
k
X
Pij yij ,
i=1 j=1
f 0
N
Pj0
0j
1 X
Pij
N i=1
PN
0>
y
P
x
+
w
ij
ij
ij
i=1
PN
i=1
A0j
0
N Pj0
Pij
(xij 0j )(xij
>
0j )
dN Pj0
i
Bj
XX
N X
k
X
i
1
>
>
tr (w1 ) w1
Pij (zij j0 )(zij j0 ) + Bj
N
i=1 j=1
( , ), 34,
46 .
, k .
. , -.
x, ,
f , w b:
!!!!
y = wx + b, .
:
y =E
p( x| x,f )
=b + w
(wx + b) = b + wE
k
X
Z
Pj
xN
=b+ w x + w
k
X
Pj
x=b+w
( (j ,Aj , x),Aj )
j=1
++
p( x| x,f )
k
X
Pj
xp(x| x, j, f )d x
j=1
(x)d x= b + w
+
Aj A1
j
(x
Pj
x (j , Aj , x)
j=1
!
+
k
X
+
j )
j=1
( 4 5). ;
.
A.3.4
EM
12 13, , , . ?
f (m) , m- ,
()
+
7 L(f (m) ; z) .
+
{f |L(f ; z) const}
(227)
+
p0 (f ) , L(; z) ,
+
L(f (m) ; z) () ,
( ),
.
f (m) ,
(227) . [122] , +
f (m) . , , L(; z)
, , (225) [122, Corollary 1].
+
L(; z),
. , [33] , (225)
. ,
, , .
!!!!
, EM ,
,
+
L(; z) , .
EM , - , ,
, ,
.
EM ,
:
.
A.3.5
EM
( 12) :
+
E / M L (f ; z) f , , .
(generalized expectation maximization GEM )
+
L(f ; z),
EM
([88]).
+
L (f ; z) f M 32 7
A.3.4 ( ,
+
). L (f ; z) E
+
32 , L(f ; z) +
. L (f ; z) ,
+
L (f ; z),
+
( 0 , f 0 ) 32 , L0 (f 0 ; z) = L(f 0 ; z).
36 ( 0 , f 0 ) ()
+
L (f ; z), f 0 (, )
+
L(f ; z)
. p(x, y, v|f ) f 7 f ( (223)). f 0 f 7
+
E (223)
+
f (u)=p(u| z, f ) =
p( z, u|f )
+
p( z |f )
QN
=R
QN
u1 ...uN
N
Y
p( z, u|f )
+
p( z, u|f )du
i=1
i=1
=R
p(zi , ui |f )
N
Y
R
i=1
ui
p(zi , ui |f )
+
p(zi , ui |f )dui
p(ui | zi , f )
i=1
+
fi (ui ) = p(ui | zi , f ), +
L (f ; z) (u) =
N
Y
i (ui ).
i=1
+
p0 (f )p( z, u|f )
L (f ; z)=E;u ln
= E;u
(u)
+
= ln p0 (f ) +
N
X
N
X
p(zi , ui |f )
ln p0 (f ) +
ln
i (ui )
i=1
Ei ;ui ln
i=1
p(zi , ui |f )
i (ui )
, i ( , ), +
14: (GEM);
1. : T = ((x1 , y1 ), . . . , (xN , yN )),
+
/ : z .
2. v,
(x, y, v), p0
f = f (0) .
3.
(a) i 1 N
+
E ( ) L (f ; z) i ui i-
f ;
+
M ( ) L (f ; z) f
;
(b) , f .
4. : f .
14 12
(. 83).
, , ( ),
. ,
, , [67].
B.1
,
Y , - . , y (x, y)
- x - (, ). ,
, p(y|x). :
, .
(Y = R+ , )
(Y = Z+ , );
, ,
, .
( -) Y
$$F(t) = P\{\xi < t\}. \qquad (228)$$
,
$p(t) = \dfrac{d}{dt}F(t)$.
$p(t) = F(t + 1) - F(t)$.
, F () 1.
F p
(survival function)
$$S(t) = 1 - F(t) = P\{\xi \ge t\}, \qquad (229)$$
t, (hazard )
$$h(t) = \frac{p(t)}{S(t)} = \frac{p(\xi = t)}{P\{\xi \ge t\}} = p(\xi = t \mid \xi \ge t), \qquad (230)$$
(hazard)
() t ,
.
48 . ,
$$h(t) = \frac{-\frac{d}{dt}S(t)}{S(t)} = -\frac{d}{dt}\ln S(t), \qquad S(t) = e^{-\int_0^t h(\tau)\,d\tau} \qquad (231)$$
$$h(t) = \frac{S(t) - S(t + 1)}{S(t)} = 1 - \frac{S(t + 1)}{S(t)},$$
$S(0) = 1$
$$S(t) = \prod_{i=0}^{t-1}\bigl(1 - h(i)\bigr), \qquad (232)$$
h(t)dt
P
h(n) .
48
, ,
$$h(t) = \lim_{\Delta \searrow 0}\frac{S(t) - S(t + \Delta)}{\Delta\,S(t)} = \lim_{\Delta \searrow 0}\frac{P\{t \le \xi < t + \Delta\}}{\Delta\,P\{\xi \ge t\}}.$$
. , , (228) (229)
.
F (t) = P { t} S(t) = P { > t}, (230).
. , S > (t)= P { > t} ,
.
n {S1 (t), . . . , Sn (t)}
, S(t) , t:
n
S(t) =
1X
Si (t) .
n i=1
(233)
p(t) F (t)
, h(t) .
p(), F (), S() h():
$h(t) = h$. $S(t) = e^{-ht}$, $F(t) = 1 - e^{-ht}$, $p(t) = he^{-ht}$.
, $t_{1/2} = \dfrac{\ln 2}{h}$.
,
$\int_0^{\infty} t\,p(t)\,dt = \dfrac{1}{h}$.
$S(t) = p_1 e^{-h_1 t} + p_2 e^{-h_2 t}$ ($p_1 : p_2$, $p_1 + p_2 = 1$).
$$h(t) = \frac{p_1 h_1 e^{-h_1 t} + p_2 h_2 e^{-h_2 t}}{p_1 e^{-h_1 t} + p_2 e^{-h_2 t}},$$
$h(0) = p_1 h_1 + p_2 h_2$, $h(t) \to \min(h_1, h_2)$ as $t \to \infty$.
$h(t) = \dfrac{h}{t + 1}$. $S(t) = \dfrac{1}{(t + 1)^h}$, $F(t) = 1 - \dfrac{1}{(t + 1)^h}$, $p(t) = \dfrac{h}{(t + 1)^{h + 1}}$.
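$S(t) = pe^{-ht} + (1 - p)$ ($p : (1 - p)$). $h(t) = \dfrac{phe^{-ht}}{pe^{-ht} + (1 - p)} = h\,\sigma\!\left(\ln\dfrac{p}{1 - p} - ht\right)$, where $\sigma(z) = \frac{1}{1 + e^{-z}}$; $t$ . $S(\infty) = 1 - p > 0$.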
$h(t) = ht$. $S(t) = e^{-\frac{ht^2}{2}}$, $F(t) = 1 - e^{-\frac{ht^2}{2}}$, $p(t) = hte^{-\frac{ht^2}{2}}$
. .
.
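, $k, \lambda > 0$ $h_{k,\lambda}(t) = \dfrac{kt^{k-1}}{\lambda^k}$, $S_{k,\lambda}(t) = e^{-(t/\lambda)^k}$, $F_{k,\lambda}(t) = 1 - e^{-(t/\lambda)^k}$, $p_{k,\lambda}(t) = \dfrac{kt^{k-1}}{\lambda^k}e^{-(t/\lambda)^k}$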
.
, (, ),
. . . .
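The relations (231) and (232) and the closed forms in the examples above can be checked numerically. The sketch integrates the hazard with the midpoint rule and compares $e^{-\int_0^t h}$ with the stated $S(t)$. It does this for the Weibull hazard and for the mixture with a fraction $1 - p$ that never experiences the event. The parameter values are arbitrary, and the function names are hypothetical.

```python
import math

def survival_from_hazard(hazard, t, steps=20000):
    """S(t) = exp(-integral_0^t h(tau) d tau), formula (231), by the midpoint rule."""
    dt = t / steps
    integral = dt * sum(hazard((k + 0.5) * dt) for k in range(steps))
    return math.exp(-integral)

def discrete_survival(hazard, t):
    """S(t) = prod_{i=0}^{t-1} (1 - h(i)) for integer t, formula (232)."""
    s = 1.0
    for i in range(t):
        s *= 1.0 - hazard(i)
    return s

def weibull(k, lam):
    """(hazard, survival) pair: h(t) = k t^{k-1} / lam^k, S(t) = exp(-(t / lam)^k)."""
    return (lambda t: (k / lam) * (t / lam) ** (k - 1),
            lambda t: math.exp(-(t / lam) ** k))

def cure(p, h):
    """(hazard, survival) pair for S(t) = p e^{-ht} + (1 - p)."""
    sigma = lambda z: 1.0 / (1.0 + math.exp(-z))
    return (lambda t: h * sigma(math.log(p / (1 - p)) - h * t),
            lambda t: p * math.exp(-h * t) + (1 - p))
```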
B.2
, , :
, , . ,
. (censored data). , ,
-. ,
, .
49 .
.
T = ((x1 , t1 , 1 ), . . . , (xN , tN , N )) : xi X , ti Y
i {0, 1}, , . i = 0 ( ) ti
yi i- , i = 1 ( ) ti
i- , yi ti . . i = 0 , ti
, , i- , i = 1 i-
ti .
, , y x. , x, y z (, ),
(x, y, z) X Y Y
() , ,
p(y|x) p(z|x) x.
y z t = min(y, z) , ,
z y. ,
, ,
( ) , - , , . :
.
.
. -
,
, . , (missing data).
A, .
B.3
, P () , , . , , 1.2.4
49 (truncated data)
.
: , . ,
, .. T =
((x1 , t1 , 1 ), . . . , (xN , tN , N )) , (18),
,
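$$p(T \mid \theta) = \prod_{i \mid \delta_i = 0} p_\theta(t_i \mid x_i)\prod_{i \mid \delta_i = 1} S^{>}_\theta(t_i \mid x_i). \qquad (234)$$
$$\prod_{i \mid \delta_i = 0} p_\theta(t_i \mid x_i)\prod_{i \mid \delta_i = 1} S_\theta(t_i \mid x_i) = \prod_{i \mid \delta_i = 0} h_\theta(t_i \mid x_i)\prod_{i=1}^{N} S_\theta(t_i \mid x_i). \qquad (235)$$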
, - , . ,
S (231) (232) h , ,
(235) .
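, . $h_\theta(t \mid x) = r_\theta(x)$, , . $S(t \mid x) = e^{-r_\theta(x)t}$
$$p(T \mid \theta) = \prod_{i \mid \delta_i = 0} h_\theta(t_i \mid x_i)\prod_{i=1}^{N} S_\theta(t_i \mid x_i) = \prod_{i \mid \delta_i = 0} r_\theta(x_i)\,e^{-\sum_{i=1}^{N} r_\theta(x_i)t_i}. \qquad (236)$$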
wx+b
( ) .
, . .
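Consider the simplest case of (236): a constant hazard $r_\theta(x) \equiv r$ that does not depend on $x$ (an assumption made only for this sketch). The log-likelihood is then $n_0\ln r - r\sum_i t_i$, where $n_0 = \#\{i \mid \delta_i = 0\}$, so its maximum is at $r = n_0/\sum_i t_i$. Censored objects add time at risk but no events. The function names are hypothetical.

```python
import math

def exp_loglik(rate, times, censored):
    """ln p(T | rate) from (236) with r(x) = rate; censored[i] is True when delta_i = 1."""
    events = sum(1 for c in censored if not c)
    return events * math.log(rate) - rate * sum(times)

def exp_mle(times, censored):
    """Zero of the derivative of exp_loglik: observed events / total observed time."""
    events = sum(1 for c in censored if not c)
    return events / sum(times)
```

In the test sample, 3 events are observed over a total time of 15, so the estimate is $0.2$. Treating the two censored times as events would give $5/15$ instead.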
B.4
, (233). , T xi .
, .
, T n t1 , . . . , tn , 1930- . : F (t) - Fn (t) = n1 #{i|ti <
t}. ,
( ) < t .
. ( [45, 23], ) , Fn
n F 50 .
, S(t)
Sn (t) = n1 #{i|ti t}. ti . , :
=1
S(0)
S(t
+ )
S(t)
+ ) = S(t)
S(t
1
S(t)
(237)
#{i|t ti < t + }
= S(t)
1
. (238)
#{i|ti t}
(238) ,
t, - , .. ti
i.
t(1) < . . . <t(j) < . . . < t(m)
{ti , i = 1, . . . , n}, j = #{i|ti = t(j) } j = #{i|ti t(j) }.
S
Y
j
S(t) =
1
.
(239)
j
j|t(j) <t
T = ((t1 , 1 ), . . . , (tn , n ))
, . ,
[63] 51 , (237, 238).
, 1 0 ,
50
,
> 0
51
(!) ( ):
S(0)
+ ) =
S(t
S(t) 1
.
#{i|ti t}
(240)
(241)
, (241)
, (t, t + )
, (
S(t + ) = S(t)).
, ( = 1).
T t(1) < . . . <t(j) < . . . <
t(m) {ti |i = 0}, j =
#{i|i = 0, ti = t(j) } , ,
, j = #{i|ti t(j) } (
) t(j) .
Y
j
S(t) =
1
,
(242)
j
j|t(j) <t
(239) .
[63] (242) ( -), . (242)
.
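The product estimate above, with events counted separately from censored observations, can be sketched as follows. The function returns $S(t) = P\{\xi \ge t\}$, which is a product over event times strictly before $t$. The names `d` (number of observed events at $t_{(j)}$) and `n` (number at risk, $\#\{i \mid t_i \ge t_{(j)}\}$) are illustrative, as is the function name. Without censoring, the estimate coincides with the empirical survival function $\#\{i \mid t_i \ge t\}/n$.

```python
def kaplan_meier(times, censored):
    """Product estimate of S(t) = P{xi >= t}: the product over observed event times
    t_(j) < t of (1 - d_j / n_j); censored[i] is True when delta_i = 1."""
    event_times = sorted({t for t, c in zip(times, censored) if not c})
    factors = []
    for tj in event_times:
        d = sum(1 for t, c in zip(times, censored) if t == tj and not c)  # events at t_(j)
        n = sum(1 for t in times if t >= tj)                              # at risk at t_(j)
        factors.append((tj, 1.0 - d / n))

    def S(t):
        s = 1.0
        for tj, factor in factors:
            if tj < t:
                s *= factor
        return s

    return S
```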
12 T - (242) .
. .
. , . , ,
, (.. ),
, (234), (235) .
, ,
,
-, ,
.
S (ti , 1) S > (ti ), (ti , 0) S(ti ) S > (ti ). - ,
T ,
Y
Y
LT, (S) = P {T |S} =
(S(ti ) S > (ti ))
S > (ti ) .
(243)
i|i =0
i|i =1
S (
, )
s0 = 1 t t(1)
sj
t(j) < t t(j+1) , j < m .
S(t) =
(244)
sm
t(m) < t
- (243)
j
j j
m
m
Y
Y
sj
sj
, (245)
LT, (S) =
(sj1 sj )j sj j j j+1 =
1
sj1
sj1
j=1
j=1
m+1 = 0.
sj
sj1
.
s
k m,
sj
sj1
j j
j
(),
k
Y
j
sk =
1
.
j
j=1
(246)
( ) (244, 246), -
(242). , ,
, sk .
. sm > 0, ..
. t > t(m) S(t)
sm - (243) .
.
,
-, ,
: , . ,
, P {0 t < t(1) }
0.
B.5
, y x . ,
p(y|x) - (, ,
F (y|x), S(y|x)
h(y|x)) (. . . )
, . 52 ,
52 y, B.2, (x, y).
x.
!!!!
,
. , , .. T
X, , (. (235))
Y
Y
p(T |X) =
S(ti |xi )
p(ti |xi )
i|i =1
n
Y
i=1
i|i =0
S(ti |xi )
exp
h(ti |xi )
i|i =0
n Z
X
i=1
ti
h( |xi )d
0
h(ti |xi )
i|i =0
( ). , .
[30] ( ) ( ),
, . ,
, .
B.5.1
(247)
(247) : q , r .
$S(t \mid x) = (Q(t))^{r(x)}$,
(248)
Q(t) ,
q(t):
$Q(t) = e^{-\int_0^t q(\tau)\,d\tau}$.
.
. [30] - . $h(t) = \dfrac{p(\xi = t)}{P\{\xi \ge t\}}$ ( (230)) $h'(t) = \dfrac{p(\xi = t)}{P\{\xi > t\}}$, (247):
$$h'(t \mid x) = q'(t)\,r(x). \qquad (249)$$
(247) (249)
, (249)
$$\frac{h(t \mid x)}{1 - h(t \mid x)} = \frac{q(t)}{1 - q(t)}\,r(x). \qquad (250)$$
B.5.2
,
r, q, -.
( ,
, )
(partial likelihood ): () ,
() , . ,
, , ,
. ,
12, ,
.
. 173 t(j) , j j
Nj ={i|i = 0, ti = t(j) } Rj ={i|ti t(j) } , t(j) , , R0 = {1, . . . , n}.
, j = #Nj j = #Rj . X = (x1 , . . . , xn )
, XA
A {1, . . . , n}, Y = ((t(1) , 1 ), . . . , (t(m) , m ))
T . X Y T
.
p0 f ( p(t|x),
S(t|x), h(t|x) )
p0 (T |X, Y, f ) =
=
m
Y
j=1
m
Y
j=1
m
Y
p(Nj |Rj , X, Y, f )
p(Nj |Rj , XRj , t(j) , j , f )
Y
iNj
p(ti |xi )
Y
j=1
m
Y
iNj
p(ti |xi )
M Rj | #M =j iM
S(ti |xi )
iRj \Nj
S(ti |xi )
iRj \M
h(ti |xi )
Y
j=1
h(ti |xi )
M Rj | #M =j iM
.. .
(247) q(yi ) ,
Y
r(xi )
m
Y
iNj
0
X
Y
p (T |X, Y, r) =
.
(251)
r(xi )
j=1
M Rj | #M =j iM
, 176
(251)
$$p'(T \mid X, Y, r) = \prod_{j \mid \delta_j = 0}\frac{r(x_j)}{\sum_{i \mid t_i \ge t_j} r(x_i)}. \qquad (252)$$
,
(252), , (251).
B.5.3
( ) r , , . 53
$$r(x) = e^{wx} \qquad (253)$$
, , .
$$p'(T \mid X, Y, w) = \prod_{j \mid \delta_j = 0}\frac{e^{wx_j}}{\sum_{i \mid t_i \ge t_j} e^{wx_i}} \qquad (254)$$
( ) , w. , , . , ,
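$$L(w \mid T) = \Omega(w) + \sum_{j \mid \delta_j = 0}\Bigl(\ln\sum_{i \mid t_i \ge t_j} e^{wx_i} - wx_j\Bigr) \to \min_w \qquad (255)$$
$\sum_{j \mid \delta_j = 0}\Bigl(\ln\sum_{i \mid t_i \ge t_j} e^{wx_i} - wx_j\Bigr)$
$w$ .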
, , ..
(w) = kwk1 , , . [109].
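Here is a sketch of minimizing (255) with the regularizer set to zero, using plain gradient descent on data held in Python lists. Ties are handled exactly as written in (254): each event uses the full risk set $\{i \mid t_i \ge t_j\}$. For each event, the gradient term is the $e^{wx_i}$-weighted mean of $x_i$ over the risk set minus $x_j$. The learning rate, the iteration count and the function names are illustrative assumptions.

```python
import math

def _dot(w, x):
    return sum(a * b for a, b in zip(w, x))

def cox_objective(w, xs, times, censored):
    """(255) without the regularizer: sum over events j of ln(sum_{t_i >= t_j} e^{w x_i}) - w x_j."""
    total = 0.0
    for xj, tj, cj in zip(xs, times, censored):
        if cj:                              # censored objects only appear in risk sets
            continue
        risk = sum(math.exp(_dot(w, xi)) for xi, ti in zip(xs, times) if ti >= tj)
        total += math.log(risk) - _dot(w, xj)
    return total

def cox_gradient(w, xs, times, censored):
    grad = [0.0] * len(w)
    for xj, tj, cj in zip(xs, times, censored):
        if cj:
            continue
        risk = [(xi, math.exp(_dot(w, xi))) for xi, ti in zip(xs, times) if ti >= tj]
        z = sum(e for _, e in risk)
        for k in range(len(w)):
            # weighted mean of the k-th feature over the risk set, minus x_jk
            grad[k] += sum(e * xi[k] for xi, e in risk) / z - xj[k]
    return grad

def fit_cox(xs, times, censored, lr=0.1, iters=3000):
    """Gradient descent on the convex objective, starting from w = 0."""
    w = [0.0] * len(xs[0])
    for _ in range(iters):
        g = cox_gradient(w, xs, times, censored)
        w = [wk - lr * gk for wk, gk in zip(w, g)]
    return w
```

At $w = 0$ every term equals the logarithm of the risk-set size. For the test data, with events at $t = 1, 2, 4$, this gives $\ln 4 + \ln 3 + \ln 1 = \ln 12$.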
53
x,
, , .
- (247)
r q .
!!!!
B.5.4
, (247) r(x)
- , q(t) , ,
, ( , ).
- (
12), .
12 Q
t(1) < . . . < t(m) ,
, . Q0 = Q(0) = 1, Qj = Q(t(j) + ). ,
( ),
m
Y
Y
Y
P (T |r, Q) =
S(t(j) |xi ) S(t(j) + |xi )
S(t(j) + |xi )
j=1
m
Y
j=1
m
Y
j=1
iNj
iRj \(Rj+1 Nj )
iNj
iNj
Qj
Qj1
r(xi )
(Qj )r(xi )
iRj \(Rj+1 Nj )
Y
iRj \Nj
Qj
Qj1
r(xi )
qj =
Qj
Qj1 [0, 1] (j-) :
Y
Y
r(x )
r(x )
1 qj i
qj i max .
(256)
iNj
iRj \Nj
qj [0,1]
qj qj = 0,
(256),
X r(xi )
X
=
r(xi ) .
(257)
r(xi )
iNj 1 qj
iRj
. , (257)
(256).
(257), , ,
qj ,
. , m qj , Qj
Qj = k=1 qk Q.
, , .. Nj , Nj = {ij }, (257) :
$$q_j = \left(1 - \frac{r(x_{i_j})}{\sum_{i \in R_j} r(x_i)}\right)^{\frac{1}{r(x_{i_j})}}, \qquad (258)$$
Q :
$$Q(t) = \prod_{i \mid \delta_i = 0\ \&\ t_i < t}\left(1 - \frac{r(x_i)}{\sum_{k \mid t_k \ge t_i} r(x_k)}\right)^{\frac{1}{r(x_i)}} \qquad (259)$$
!!!!
(258) (259) :
$$q_j = e^{\frac{1}{r(x_{i_j})}\ln\left(1 - \frac{r(x_{i_j})}{\sum_{i \in R_j} r(x_i)}\right)} \approx e^{-\frac{1}{\sum_{i \in R_j} r(x_i)}}, \qquad (260)$$
P
r xij iRj r(xi ), .. ,
$$Q(t) \approx \exp\left(-\sum_{i \mid \delta_i = 0\ \&\ t_i < t}\frac{1}{\sum_{k \mid t_k \ge t_i} r(x_k)}\right), \qquad (261)$$
.
B.5.5
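$$S(t \mid x) \approx \exp\left(-e^{wx}\sum_{i \mid \delta_i = 0\ \&\ t_i < t}\frac{1}{\sum_{k \mid t_k \ge t_i} e^{wx_k}}\right)$$
(259)
$$S(t \mid x) = \prod_{i \mid \delta_i = 0\ \&\ t_i < t}\left(1 - \frac{1}{\sum_{k \mid t_k \ge t_i} e^{w(x_k - x_i)}}\right)^{e^{w(x - x_i)}}$$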
. (
w) .
, ,
Q , , .
. ,
(, ).
,
.
(, k),
.
( ,
, , .) , X , , ..
.
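The baseline estimate $Q(t)$ can be sketched in two ways for $r(x) = e^{wx}$: through the product (259), or through its exponential approximation (261). The survival function is then $S(t \mid x) = Q(t)^{r(x)}$, as in (248). The sketch assumes that $w$ has already been fitted, for example by minimizing (255). The function names are hypothetical.

```python
import math

def baseline_Q(t, w, xs, times, censored, exact=True):
    """Q(t) for r(x) = exp(w x): the product (259) if exact, else the approximation (261)."""
    r = [math.exp(sum(a * b for a, b in zip(w, x))) for x in xs]
    log_q = 0.0
    for ri, ti, ci in zip(r, times, censored):
        if ci or ti >= t:                  # only observed events with t_i < t contribute
            continue
        risk = sum(rk for rk, tk in zip(r, times) if tk >= ti)
        if exact:
            if ri >= risk:                 # i is alone in its risk set: the factor is 0
                return 0.0
            log_q += math.log(1.0 - ri / risk) / ri
        else:
            log_q -= 1.0 / risk
    return math.exp(log_q)

def survival(t, x, w, xs, times, censored, exact=True):
    """S(t | x) = Q(t)^{r(x)} with r(x) = exp(w x)."""
    rx = math.exp(sum(a * b for a, b in zip(w, x)))
    return baseline_Q(t, w, xs, times, censored, exact) ** rx
```

In the test example, $w = 0$ and the event times are distinct. There the product form gives $(1 - \frac14)(1 - \frac13) = 0.5$ at $t = 3.5$, the same value as the product estimate (242). The approximation gives $e^{-(1/4 + 1/3)}$.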
, (. 5) ,
,
, .
,
, . ,
, , ,
-;
, - ( , ..)
(, , , .. );
,
;
, ;
, ,
, ;
,
(, , ..) .
,
- : , - , . (
) D.
X [m,n] (xm , . . . , xn ),
Y [m,n] (ym , . . . , yn ) .. , 1.
X = (X1 , . . . , XN ) = (X1[1,n1 ] , . . . , XN [1,nN ] ), ,
Y. j- i-
xij = Xi[1,ni ] j .
C.1
, ( 1.1.4)
T = ((x1 , y1 ), . . . , (xN , yN )) , xi yi
x, ,
y, p(y|x). -
. 1.3 ,
, ,
.
,
, . ,
. X [1,k]
xk+1 p(xk+1 |x1 , . . . , xk ). xi
(
) () , . , xi = 1 , i- ( , )
, xi = 0 .
B .
X = (X1[1,k1 ] , . . . , XN [1,kN ] )
Y = (Y1[1,k1 ] , . . . , YN [1,kN ] ).
X [1,k] Y [1,k] . , (xi , yi )
,
. , ,
, xi - i- , yi .
B
. , ,
. , , ,
, , ,
ffi .
. X = (X1[1,k1 ] , . . . , XN [1,kN ] ) Y = (y1 , . . . , yN ), , .
X [1,k] (..
). , ( ),
, , .
, .
.
, , .. , .
C.2
X
pk (x1 , . . . , xk ), xi X , k = 1, 2, . . .,
$p_k\bigl(x_1, \ldots, x_l, \underbrace{X, \ldots, X}_{k-l}\bigr) = p_l(x_1, \ldots, x_l)$, $l < k$.
X ,
.
- .
, ,
pk (x1 , . . . , xk )
$$p_k(x_1, \ldots, x_k) = p_{k\mid k-1}(x_k \mid x_1, \ldots, x_{k-1})\,p_{k-1}(x_1, \ldots, x_{k-1}) = \ldots = p_{k\mid k-1}(x_k \mid x_1, \ldots, x_{k-1})\cdots p_{2\mid 1}(x_2 \mid x_1)\,p_1(x_1). \qquad (262)$$
, k! k . ,
. 6: () .
. 7: .
, (262)
( ).
, . , ,
.
,
pk|k1 (xk |x1 , . . . , xk1 ) (262)
, m , , k, ..
pk|k1 (xk |x1 , . . . , xk1 ) = pk,m (xk |xkm , . . . , xk1 ). ( , ) m.
pk,m k, .
, () m
X (, )
1 X k m ( ),
.
: p1 (x1 ) p,1 (xi+1 |xi ). (262)
:
pk (x1 , . . . , xk ) = p,1 (xk |xk1 ) . . . p,1 (x2 |x1 )p1 (x1 ) .
(263)
X , (: ), , - .
a, b, c, d, . . . (, x1 , . . . , xk , . . .) p(| ) ( - , )
54 . ,
, p(a|b, c, . . .) b, c, . . . a. , . 6,
. 7. , .
54 , , , -
, .
!!!!
!!!!
. 8: : xi () yi . . .
. 9: . . . yi xi .
, . , (xi , yi ), .. ( ,
), (. . 182)
( , ).
,
, .
p(xi , yi ) = p(yi )p(xi |yi ) p(xi , yi ) = p(xi )p(yi |xi ), , . 8 9. ,
. 8 , , ( ,
).
(. 1.2.3); , ( 1.2.5) ( 2.2.1).
. 9 () ; , ,
( 2.2.2).
,
, , , .
,
(. 182), . 55 p(x1 , y1 , . . . , xk , yk )
p(x1 , y1 , . . . , xk , yk ) = p(xk |x1 , y1 , . . . , xk1 , yk1 , yk )p(yk |x1 , y1 , . . . , xk1 , yk1 )
p(x1 , y1 . . . , xk1 , yk1 )
...
55
p ,
. !
. 10: (
, ) xi (, ) yi .
C.3
,
. ,
. 181181 , , .
-.
C.3.1
,
p, (HMM) (265),
. 10, () ,
, Y = {s1 , . . . , sn }
X = {o1 , . . . , om }. n m
:
n- - : i = P {y1 = si },
n n- (transition matrix ) A: aij = P {yl+1 = sj |yl = si }
. 11: () : .
. 12: () :
.
. 13: LR- .
1.
/ . -
P
a
=
1
LR- ij
j
.
(. 6) (. 7) . ,
.
.
:
. , ,
(. 181).
, ,
, , ,
,
( ) , .. . LR- (left-right model ). -
. 13. , LR- : = (1, 0, . . . , 0).
LR-
. , ( !)
, .. ,
, .
LR- ( !)
, LR-,
. . LR- , .
(. . 181)
,
.
. 192
.
HMM
W = (, A, B) , , .
, Y [1,L] X [1,L]
$$P\{X_{[1,L]} \mid Y_{[1,L]}\} = \prod_{l=1}^{L} b_{y_l x_l}, \qquad (266)$$
Y [1,L]
$$P\{Y_{[1,L]}\} = \pi_{y_1}\prod_{l=1}^{L-1} a_{y_l y_{l+1}} \qquad (267)$$
l=1
$$P\{X_{[1,L]}, Y_{[1,L]}\} = \pi_{y_1}\prod_{l=1}^{L-1} a_{y_l y_{l+1}}\prod_{l=1}^{L} b_{y_l x_l} \qquad (268)$$
2L 1 , X [1,L] , ,
$$P\{X_{[1,L]}\} = \sum_{Y_{[1,L]}} P\{X_{[1,L]}, Y_{[1,L]}\} = \sum_{y_1, \ldots, y_L}\pi_{y_1}\left(\prod_{l=1}^{L-1} a_{y_l y_{l+1}}\right)\prod_{l=1}^{L} b_{y_l x_l}. \qquad (269)$$
:
2LnL . 10
100.
- (forward-backward
algorithm). X = X [1,L] l (i, X)
$$\alpha_l(i, X) = P\{X_{[1,l]}, y_l = i\} = \sum_{Y_{[1,l]} \mid y_l = i} P\{X_{[1,l]}, Y_{[1,l]}\}. \qquad (270)$$
l:
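Forward-backward:
$$\alpha_1(i, X) = P\{x_1, y_1 = i\} = \pi_i b_{ix_1},$$
$$\alpha_{l+1}(i, X) = \sum_{j=1}^{n} P\{X_{[1,l]}, y_l = j\}\,P\{x_{l+1}, y_{l+1} = i \mid X_{[1,l]}, y_l = j\} = \sum_{j=1}^{n}\alpha_l(j, X)\,a_{ji}\,b_{ix_{l+1}}, \quad 1 \le l < L,$$
$$P\{X\} = \sum_{i=1}^{n}\alpha_L(i, X). \qquad (271)$$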
P {X}, l (i, X) 1 l L 1 i n
3Ln2 . , n2 .
n0 , P {X} 3Ln0 .
n , ,
m0 , (.. )
, .. n0 m0 n. m0 , m0 m ( ).
,
.
. X = X [1,L]
l (i, X)
$$\beta_l(i, X) = P\{X_{[l+1,L]} \mid y_l = i\} \qquad (272)$$
Forwardbackward
:
1 l L 1 i n 3Ln0 :
L (i, X) =
l (i, X) =
1
n
X
j=1
!!!!
1 l L
n
X
i=1
(273)
(274)
: X
L HMM n n0 (n n0 n2 ) O(Ln0 ) Ln
l (i, X) l (i, X),
X ( ) (271)
.
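The forward recursion (270)-(271) and the backward quantities (272) can be sketched directly. The sketch also includes the exponential-time sum (269) as a check. Two identities are tested: the forward pass reproduces (269), and $\sum_i \alpha_l(i, X)\beta_l(i, X) = P\{X\}$ for every $l$. The list-of-lists layout for $\pi$, $A = (a_{ij})$, $B = (b_{ik})$ is an assumption, and the code uses 0-based positions. No rescaling is done, so for long sequences the underflow problem discussed here applies.

```python
import itertools

def forward(pi, A, B, X):
    """alpha[l][i] = P{x_1..x_{l+1}, y_{l+1} = i} (0-based l), as in (270)."""
    n = len(pi)
    alpha = [[pi[i] * B[i][X[0]] for i in range(n)]]
    for x in X[1:]:
        prev = alpha[-1]
        alpha.append([sum(prev[j] * A[j][i] for j in range(n)) * B[i][x] for i in range(n)])
    return alpha

def backward(A, B, X):
    """beta[l][i] = P{x_{l+2}..x_L | y_{l+1} = i} (0-based l), definition (272)."""
    n = len(A)
    beta = [[1.0] * n]                       # beta_L(i, X) = 1
    for x in reversed(X[1:]):
        nxt = beta[0]
        beta.insert(0, [sum(A[i][j] * B[j][x] * nxt[j] for j in range(n)) for i in range(n)])
    return beta

def brute_force(pi, A, B, X):
    """P{X} by summing (268) over all n^L hidden sequences, formula (269)."""
    n, total = len(pi), 0.0
    for Y in itertools.product(range(n), repeat=len(X)):
        p = pi[Y[0]] * B[Y[0]][X[0]]
        for l in range(1, len(X)):
            p *= A[Y[l - 1]][Y[l]] * B[Y[l]][X[l]]
        total += p
    return total
```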
,
100 ,
!!!!
!!!!
P {X} ,
. l (i, X) l (i, X) - , .. , , . l (i, X), l (i, j, X)
.. . , ( )
.
l (i, X) = P {y1 = i, X [1,l] } l l0 (i, X)= P {y1 = i|X [1,l] }, l (i, X)=
P {y1 = i, xl |X [1,l1] } l (X)= P {X [1,l] |X [1,l1] }:
Forwardbackward
1 (i, X)
1 (X)
n
X
l+1 (i, X) =
l0 (i, X)aij bjxl+1
10 (i, X) =
j=1
l+1 (X) =
0
l+1
(i, X) =
n
X
l+1 (i, X)
1 l < L .
i=1
l+1 (i, X)
l+1 (X)
l0 (i, X)
1
Pn
j=1
0
aij bjxl+1 l+1
(j, X)
l (X)
l = L 1, . . . , 1 .
. l0 (i, X),
l0 (i, X) l (X):
= l0 (i, X)l0 (i, X)
0
l0 (i, X)aij bjxl+1 l+1
(j, X)
l (i, j, X) =
l+1 (X)
Pn Pn
l0 (i, X)aij bjk
i=1
Pj=1
l (k, X) =
.
n
0
i=1 l (i, X)
l (i, X)
P {X} :
ln P {X} =
L
X
ln l (X) .
(275)
l=1
. .
HMM
P {X} , ( ) W.
!!!!
HMM
37
ln P {X}
i
ln P {X}
aij
ln P {X}
bik
1 (i, X)
i
PL1
l=1 l (i, j, X)
aij
P
l:xl =k l (i, X)
bik
(276)
(277)
(278)
, , .
!!!!
HMM
, ( )
. ,
X = X [1,L]
HMM
P {xL+1 = k|X} =
=
=
n X
n
X
i=1 j=1
n X
n
X
i=1 j=1
n X
n
X
i=1 j=1
,
(271) (275)
. ,
, ,
. , ,
( , ),
(., , 2.3.1).
X = X [1,L]
Y = Y [1,L] , (.
. 181). X Y ;
, , HMM . (268)
(nL (268)), ,
[118]. , (271)
(269).
, (270), l (i, X)
$$\delta_l(i, X) = \max_{Y_{[1,l]} \mid y_l = i} P\{X_{[1,l]}, Y_{[1,l]}\} \qquad (279)$$
HMM
l (i, X) l i ,
HMM
1 (i, X) ( =
l+1 (i, X)
=
=
P {x1 , y1 = i}
= i bix1
j|1jn
1 l < L
1 l < L .
yL
(X) =
yl (X [1,L] ) =
max L (j, X)
(280)
j|1jn
l+1 (yl+1
(X [1,L] ), X [1,L] ) l = L 1, . . . , 1 .
P {X [1,L] }
(271), 3Ln0 .
, - ,
l l (i, X) , l (i, X) P (X) ln l (i, X)
ln P (X), :
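$\ln\delta_1(i, X) = \ln\pi_i + \ln b_{ix_1}$,
$\ln\delta_{l+1}(i, X) = \ln b_{ix_{l+1}} + \max_{j \mid 1 \le j \le n}\bigl(\ln\delta_l(j, X) + \ln a_{ji}\bigr)$, $1 \le l < L$,
$\ln P(X) = \max_{j \mid 1 \le j \le n}\ln\delta_L(j, X)$,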
.
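The log-space Viterbi recursion can be sketched with back-pointers for recovering the most probable hidden sequence. The sketch uses the same list-of-lists parameter layout as for the forward pass (an assumption) and returns 0-based state indices. A brute-force search over all paths is included for testing on short sequences. The function names are hypothetical.

```python
import itertools
import math

def viterbi(pi, A, B, X):
    """Most probable hidden sequence and its log joint probability (log-space recursion)."""
    n = len(pi)
    log = lambda p: math.log(p) if p > 0 else float("-inf")
    delta = [log(pi[i]) + log(B[i][X[0]]) for i in range(n)]   # ln delta_1(i, X)
    back = []
    for x in X[1:]:
        ptr, new = [], []
        for i in range(n):
            j_best = max(range(n), key=lambda j: delta[j] + log(A[j][i]))
            ptr.append(j_best)
            new.append(delta[j_best] + log(A[j_best][i]) + log(B[i][x]))
        back.append(ptr)
        delta = new
    last = max(range(n), key=lambda i: delta[i])
    path = [last]
    for ptr in reversed(back):                                  # follow the back-pointers
        path.append(ptr[path[-1]])
    path.reverse()
    return path, delta[last]

def best_path_brute_force(pi, A, B, X):
    """Arg max over all n^L sequences of the joint probability (268)."""
    def joint(Y):
        p = pi[Y[0]] * B[Y[0]][X[0]]
        for l in range(1, len(X)):
            p *= A[Y[l - 1]][Y[l]] * B[Y[l]][X[l]]
        return p
    return list(max(itertools.product(range(len(pi)), repeat=len(X)), key=joint))
```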
HMM
W,
. ,
, HMM .
, .
(. 181).
X, ,
X = (X1 , . . . , XN ) .
56 , .. .
, .
56
, .
,
P {W|X}
P {X|W} max .
(281)
!
1
1
q
q
1
P
P
{X|W
}
P
P
{X|W
}
P (X, W), . . . , P q (X, W) = Pq
, . . . , Pq
j
j
j
j
j=1 P P {X|W }
j=1 P P {X|W }
(P {y = 1|X}, . . . , P {y = q|X}). , , (X, Y) - E (. 1.2.6, (46))
q
N X
X
P j (Xi , W)E(j, yi ) min ,
(282)
W
i=1 j=1
,
N
X
i=1
P {Y|X, W} =
N
Y
P yi (Xi , W) max ,
(283)
i=1
(. 1.2.8)
ln P {Y|X, W} =
N
X
ln P yi (Xi , W) min .
W
i=1
(284)
.
(. 181).
X = (X1 , . . . , XN ) (,
) Y = (Y1 , . . . , YN ) (, ). ,
, 57
P {X, Y|W} =
N
Y
i=1
57
(285)
, (
) . : .
i:Yi =j
(281)(286).
HMM (281)
, ( 3.2.4), (276)(278) . , ,
, , RBF- 3.3.1. , ,
1960- , , [7] [8], (. [120]). -,
RBF- ,
( A.3.2). , , ,
.
,
. X N
Xi = Xi[1,Li ] Li W(0)
(, ) P {X|W}
, , E{X|W} = ln P {X|W}. (269)
,
P {X|W}
N
Y
i=1
P {Xi |W} =
N X
Y
i=1 Y [1,Li ]
HMM
N
Y
y1
i=1 y1 ,...,yLi
LY
i 1
ayl yl+1
l=1
Li
Y
!
byl xil
l=1
PN
.. 2 i=1 Li n + n2 + mn (
- )
n
X
j = 1,
j=1
n
X
ajk = 1
k=1
m
X
(287)
bjk = 1.
k=1
, . RBF- (. 88)
.
N
X
P {Xi |W}
P {Xi |W(0) }
i=1
N
X
X P {Xi , Y [1,Li ] |W}
=
ln
P {Xi |W(0) }
i=1
Y [1,Li ]
N
X
X
P
{X
,
Y
|W}
i
[1,Li ]
=
ln
P {Y [1,Li ] |Xi , W(0) }
(0) }P {X |W(0) }
P
{Y
|X
,
W
i
i
[1,L
]
i
i=1
Y [1,Li ]
N
X
X
P
{X
,
Y
|W}
i
[1,Li ]
=
ln
P {Y [1,Li ] |Xi , W(0) }
P
{X
,
Y
|W(0) }
i
[1,L
i]
i=1
Y
ln
[1,Li ]
N
X
X
P {Y [1,Li ] |Xi , W
(0)
} ln
i=1 Y [1,Li ]
N
X
X
i=1 Y [1,Li ]
N
X
X
i=1 Y [1,Li ]
ln() , P {Y [1,Li ] |Xi , W(0) } , .. Y Li 1.
Q(X, W(0) , W) =
N
X
X
i=1 Y [1,Li ]
,
E(X, W) Q(X, W(0) , W) Q(X, W(0) , W(0) ) + E(X, W(0) ) .
E(X, W) W(0) .
E(X, W) W,
!!!!
38
Q(X, W(0) , ) W
N
ajk
bjk
1 X
1 (j, Xi )
N i=1
PN PLi 1
i=1
l=1 l (j, k, Xi )
PN PLi 1
i=1
l=1 l (j, Xi )
PN P
i=1
l|xil =k l (j, Xi )
.
PN PLi
i=1
l=1 l (j, Xi )
(288)
(289)
(290)
= j-
j- k-
=
j-
(j- , k- )
=
j-
W(0) X. 0, .. , ,
.
58 [120] ,
HMM - , , , , .
(287).
A B : -
, .
PN
C(n0 + mn) i=1 Li
, C , . . 189. ..
.
A
, . B
, ,
.
-
P {X|W},
. , :
.
HMM
, (, (282) (284)) P j (Xi , W) P {X|Wj }, 37,
j
ln P {X|Wk }
P j (Xi , W)
j ln P {X|W }
j
k
=
P
(X
,
W)
P
(X
,
W)
i
i
k
Wk
Wj
Wk
!!!!
HMM
(287) q .
- [46], , .
- (288)(290) (276)(278),
39 - 59
j
ajk
bjk
PN
P {Xi }
j
Pn
PN P {Xi }
j=1 j
i=1
j
P {Xi }
ajk
Pn
PN P {Xi }
k=1 ajk
i=1 ajk
P {Xi }
bjk
Pn
PN P {Xi }
k=1 bjk
i=1 bjk
i=1
ajk
bjk
PN
i=1
PN
i=1
59
, , [7]
-, [46] W R(X, W) X
HMM W, :
PN
i ,W)
j i=1 R(X
+C
j = P
PN R(Xi ,W)
n
+C
j=1 j
i=1
j
PN
i ,W)
ajk i=1 R(X
+
C
ajk
ajk = P
PN R(Xi ,W)
n
+
C
a
k=1 jk
i=1
ajk
PN R(Xi ,W)
bjk i=1
+
C
b
jk
,
bjk = P
PN R(Xi ,W)
n
b
+
C
jk
k=1
i=1
bjk
C > 0. [46] , W R ( , (283) (282) ) , R,
C (!) .
C, ,
, ,
, , , .
[95], ,
-,
.
HMM . , .
HMM
40 (285)
PN yi1
i=1 j
j =
N
PN PLi 1 yil yi(l+1)
i=1
l=1 j k
ajk
=
PN PLi 1 yil
i=1
l=1 j
PN P
yil xil
i=1
l|xil =k j k
bjk
=
.
PN PLi yil
i=1
l=1 j
. (285) ,
2n + 1 , . . 2.
HMM (286)
. -, , ,
X N
Y N , 40. . ,
HMM
( ) , , , .
(286) .
, -
, k-means (. 36)
(. 86), segmental k-means, .
[93] [62], k, - .
C.3.2
.
. .
.
- , ,
, -.
, , .
.
.
. , HMM
, . 182 .
,
. .
,
, HMM
-.
X x1 , . . . , xL ( ,
Rd )
p(xl |x1 , . . . , xl1 , xl+1 , . . . , sj1 , . . . , sjl . . .) = p(xl |sjl ) = bjl (xl ),
l, j1 , . . . , jl1 , jl+1 , . . .. , bj (x), - m- , .. bj bj1 , . . . , bjm . B = (bij ) , .
. bj (127)
bj (x) =
m
X
k=1
m
X
Pjk
k=1
d e
(2sjk ) 2
kxjk k2
2sjk
bixl bi (xl ) l (i, X), X =
{X1 , . . . , XN } EM (-) .
l (i, X) l (i, X) . 189 bixl bi (xl ), l (i, X), l (i, j, X) ..
.
A (288) (289), ,
bj (. (136)(138))
Pjk
PN PLi
i=1
i=1
jk
i=1
l=1
PN PLi
1
d
l (j, k, Xi )
l=1
PN PLi
i=1
sjk
l=1
PN PLi
l (j, k, Xi )xil
l=1
PN PLi
i=1
l (j, Xi )
l (j, k, Xi )
2
l=1 l (j, k, Xi )kxil jk k
,
PN PLi
i=1
l=1 l (j, k, Xi )
!!!!
. 14: :
xi , yi si .
.
- .
. ,
, , , - . ,
HMM, , . 14. si , , i-
, , yi ,
si .
.
, ,
, , .
, ,
, , .
, . 15, 201
!!!!
, ,
B b0 , bj0 j- .
B0 = b0 B.
,
bj0 < 1, .. . J diag(b0 ).
( , ) (I J)1 B , JA . , j, l
k l ,
, k, ,
l
X Y
l
byi 0 ayi yi+1 = (JA)
y2 ,...,yl i=1
jk
(l , ).
h0jk = kj . , j, - k
k
hjk =
hljk =
(JA)
l=0
l=0
jk
1
= (I JA)
jk
( I JA (!) A 1
bj0 < 1). H = (I JA) .
!
n
L1
X
Y
i
i
+1
l
0
y10 hyi101
ayil yl+1
hyl+1
byil+1 xl+1
.
0
yi b yi 1 x 1
yi
1
0 =1
y10 ,...,yL
P {X, Y } =
l+1
l=1
l+1
P {X, Y , Y } -
Y ,
, Y X,
!
L1
n
Y
X
0
0
l=1
= (H)y1 by1 x1
L1
Y
(AH)yl yl+1
byl+1
xl+1
l=1
, X
!
L
L1
X
X
Y
Y
P {X} =
P {X, Y } =
(H)y1
(AH)yl yl+1
byl xl
Y
y1 ,...,yL
y1 ,...,yL
(H(I J))y1
l=1
L1
Y
l=1
l=1
L
Y
l=1
(I J)1 B y x
l
!
l
!!!!
. 16: .
, , .
!!!!
HMM.
. , . 16 ( ).
- (
), (), () , ,
, (), .
HMM .
.
, ,
(, ).
, . [94] [10],
.
C.4
,
, , , , ( 2.3.1), (
2.3.2) . ,
, , , (, ,
, .. , ).
.
, ,
. ,
. ,
, , V I @ G R a ,
, , , A @, I 1 ..,
, ,
.
, .
C.4.1
X [1,L]
X L. ,
.
,
( .).
k , X k , X [1,L] L k L k + 1 X [l,l+k] X k ,
1 l L k + 1. f X k , , , .. L k + 1 .
, , .. X ,
, ,
AUG ..
Rn .
k , n ,
.
, . ,
.., .
, . , .
.
, ,
. , (.. ) W,
W Rn (, ) W .
,
. X UX Rn , Fisher score ( ):
ln P {X|W}
UX = UX (W , W) =
.
W
W=W
X 7 UX - , .. n , .
UX UX (W , W) ,
W
W .
,
W W, , P {X|W}
X.
[61] X 7 UX
. ,
HMM (. 193), HMM, yi .
UXi Xi , (UXi , yi )
Rn , , SVM.
[61] ,
, HMM. , , ,
,
.
C.4.2
Rn , , ( ) . , .
,
(edit distance).
.. [73] ,
.
:
- ;
- ;
- .
, , ,
. . ,
.
. ,
.
.
d(, ) g. :
!!!!
x x0 d(x, x0 ),
x g(x),
x g(x)
(!)
, .
D
$$D(\varepsilon, \varepsilon) = 0 \qquad (291)$$
$$D(Xu, \varepsilon) = D(\varepsilon, Xu) = D(X, \varepsilon) + g(u) \qquad (292)$$
$$D(Xu, Yv) = \min\bigl(D(X, Y) + d(u, v),\ D(Xu, Y) + g(v),\ D(X, Yv) + g(u)\bigr) \qquad (293)$$
,
X Y , u v ,
, , Xu,
. D(X, Y ) .
. , D(X [1,m] , Y [1,n] )
mn m + n.
.
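The recursion (291)-(293) fills an $(m+1) \times (n+1)$ table of prefix distances, which gives the $O(mn)$ cost mentioned above. In this sketch, `d` and `g` are caller-supplied functions that play the roles of $d(\cdot,\cdot)$ and $g(\cdot)$. The test uses them with unit costs, which yields the ordinary Levenshtein distance. The function name is hypothetical.

```python
def edit_distance(X, Y, d, g):
    """D(X, Y) by the recursion (291)-(293); D[a][b] is the distance between X[:a] and Y[:b]."""
    m, n = len(X), len(Y)
    D = [[0.0] * (n + 1) for _ in range(m + 1)]       # D[0][0] = 0, formula (291)
    for a in range(1, m + 1):
        D[a][0] = D[a - 1][0] + g(X[a - 1])           # (292)
    for b in range(1, n + 1):
        D[0][b] = D[0][b - 1] + g(Y[b - 1])           # (292)
    for a in range(1, m + 1):
        for b in range(1, n + 1):
            u, v = X[a - 1], Y[b - 1]
            D[a][b] = min(D[a - 1][b - 1] + d(u, v),   # replace u by v
                          D[a][b - 1] + g(v),          # D(Xu, Y) + g(v)
                          D[a - 1][b] + g(u))          # D(X, Yv) + g(u)
    return D[m][n]
```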
-
. ,
, , , , ,
!!!!
!!!!
Rn , ,
, ( , ) . , X 7 UX
61 K(X, X 0 ) = (UX , UX 0 ).
, k Rn K(X, X 0 ) = k(UX , UX 0 ), . (. [61]) :
,
.
( ),
.
41
p X
X X X p(x, x0 ) =
p(x)p(x0 ) .
p X Y
X X -
y Y X
p(x, x0 ) = Ey p(x, x0 |y) = Ey p(x|y)p(x0 |y)
.
. 13 1, ,
.
41 -
, HMM.
xi , yi si ,
. 14, si - , xi
yi , . xi yi
X ,
, .. p(xi , yi |si ) = pi (xi |si )pi (yi |si ). HMM ,
61 I 1 (U , U 0 ) ( ,
X
X
- UX 0 > I 1 UX ), I .
, , W W, . : I = EX (UX UX > ),
W.
W, .
,
, .
xi yi B.
. C.3.2 , s , c
1 .
s N
N
! N
!
N
Y
Y
Y
(bsl xl bsl yl ) =
bsl xl
bsl yl = P {X|s}P {Y |s} ,
P {X, Y |s} =
l=1
l=1
l=1
41 ,
,
X
Prepl {X [1,L] , Y [1,M ] } =
P {X [1,L] , Y [1,M ] |S [1,N ] }P {S [1,N ] }
(294)
S [1,N ]
s1 ,...,sL
s1
L1
Y
asl sl+1
l=1
L
Y
!
(bsl xl bsl yl ) asL s
L 6= M
L = M
l=1
(. (269)) .
( ) , ,
(294) 62 ( 4.1.4) :
, (
1), . ,
, (294), , 1000000 999999
, .. :
,
.
, HMM 1 (. 204), ,
X, Y . , P {}
HMM, Pins {, } HMM,
, Pins {X, Y } = P {X}P {Y }, .. X Y
41 Pins .
, HMM
,
, 1
m
,
1 ,
P 1 {X [1,L] } =
L
m
(1 )
62 (294) .
L+M
(1 )2 .
m
,
.
, HMM, (294), HMM (
, ).
, HMM , S2 ,
, , S1 . X Y
s = (s1 , . . . , sN ) S2 , - S1 . HMM :
P {X, Y |s} = P {X|s}P {Y |s} , 41 P {X, Y } .
, n (X) X
Pins {X [1,L] , Y [1,M ] } =
(295)
n n , ,
( 1 = 1 n+1 = L + 1).
P {X, Y |s} =
N
Y
N (X) N (Y ) l=1
N
Y
bsl xl P {X [l +1,l+1 1] }
N (X) l=1
N
Y
bsl yl P {Y [l +1,l+1 1] }
N (Y ) l=1
= P {X|s}P {Y |s} .
, , [119].
, , , ,
, .
,
. ,
, ,
, . .
. [104] ,
, , . ,
, , .
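Replacing the probabilities in (294) and (295) by arbitrary functions gives more general kernels. Let k be a kernel on the alphabet X and g a function on X. Define

$$K_{\rm repl}(X^{[1,L]}, Y^{[1,M]}) = \begin{cases} 0, & L \ne M,\\ \prod_{l=1}^L k(x_l, y_l), & L = M, \end{cases} \tag{296}$$
$$K^1_{\rm ins}(X^{[1,L]}, Y^{[1,M]}) = \Big(\prod_{l=1}^L g(x_l)\Big)\Big(\prod_{l=1}^M g(y_l)\Big), \tag{297}$$
$$K^2_{\rm ins}(X^{[1,L]}, Y^{[1,M]}) = \binom{L+M}{M} K^1_{\rm ins}(X^{[1,L]}, Y^{[1,M]}). \tag{298}$$

By Lemmas 13 and 14, K_repl and K^1_ins are kernels. For K^2_ins it remains to check that the function (m, n) ↦ C(m+n, n) = (m+n)!/(m!n!) is a kernel on Z_+; then K^2_ins is a product of kernels.

The matrix A = [C(m+n, n)]_{m,n=0}^∞ factors into a lower-triangular matrix and its transpose:

$$\begin{pmatrix}1&1&1&1&\cdots\\1&2&3&4&\cdots\\1&3&6&10&\cdots\\1&4&10&20&\cdots\\\vdots&&&&\ddots\end{pmatrix} = \begin{pmatrix}1&0&0&0&\cdots\\1&1&0&0&\cdots\\1&2&1&0&\cdots\\1&3&3&1&\cdots\\\vdots&&&&\ddots\end{pmatrix}\begin{pmatrix}1&1&1&1&\cdots\\0&1&2&3&\cdots\\0&0&1&3&\cdots\\0&0&0&1&\cdots\\\vdots&&&&\ddots\end{pmatrix},$$

i.e. C(m+n, n) = Σ_k C(m, k) C(n, k) (Vandermonde's identity). The same factorization is obtained by elementary row operations. In the leading q × q submatrix A_q (rows indexed by l, columns by m), subtract row l − 1 from row l for l = q, q − 1, …, 2; by the Pascal rule

$$\binom{m+l}{l} - \binom{m+l-1}{l-1} = \binom{(m-1)+l}{l},$$

so row l becomes row l of the matrix shifted by one column. Repeating this, A_q is reduced to the upper-triangular factor with unit diagonal. Hence every A_q = L_q L_q^⊤ with L_q lower triangular and nonsingular, so A_q is positive definite, and (m, n) ↦ C(m+n, n) is a kernel^63.

^63 Another proof: for m ∈ Z_+ let φ_m(x) = x^m e^{−x/2}/m! on R_+. Since ∫_0^∞ x^n e^{−x} dx = n!, we have (φ_m, φ_n)_{L_2(R_+)} = ∫_0^∞ x^{m+n} e^{−x} dx/(m! n!) = C(m+n, n), so K(m, n) = C(m+n, n) is a Gram matrix of functions in L_2(R_+).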
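The factorization of the binomial matrix can be checked with exact integer arithmetic. A minimal sketch (the helper names are illustrative): it builds the leading q × q block of [C(m+n, n)], compares it with L L^⊤ for L = [C(m, k)], and checks that the quadratic form equals |L^⊤c|² ≥ 0.

```python
from math import comb
import random

def binomial_gram(q):
    """Leading q x q block of the matrix [C(m+n, n)], m, n = 0, 1, 2, ..."""
    return [[comb(m + n, n) for n in range(q)] for m in range(q)]

def lower_pascal(q):
    """Lower-triangular factor L[m][k] = C(m, k) (math.comb gives 0 for k > m)."""
    return [[comb(m, k) for k in range(q)] for m in range(q)]

def gram_of_rows(L):
    """L L^T in exact integer arithmetic."""
    return [[sum(a * b for a, b in zip(row_m, row_n)) for row_n in L] for row_m in L]

q = 7
A, L = binomial_gram(q), lower_pascal(q)
assert gram_of_rows(L) == A            # Vandermonde: C(m+n, n) = sum_k C(m,k) C(n,k)

# Hence c^T A c = |L^T c|^2 >= 0 for every vector c.
rng = random.Random(0)
c = [rng.randint(-5, 5) for _ in range(q)]
quad = sum(c[m] * A[m][n] * c[n] for m in range(q) for n in range(q))
lt_c = [sum(L[m][k] * c[m] for m in range(q)) for k in range(q)]
assert quad == sum(v * v for v in lt_c)
print(A[:4])
```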
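Sequences can also be compared by summing over all their alignments, with k measuring the similarity of aligned symbols and g the weight of inserted ones.

Let K be a kernel on X ∪ {∅}, where ∅ is an additional "empty" symbol, with K(∅, ∅) = 1. Identify a finite sequence X = (X_1, …, X_L) with the infinite sequence (X_1, …, X_L, ∅, ∅, …). Then

$$K_*(X, Y) = \prod_{l=1}^{\infty} K(X_l, Y_l) \tag{299}$$

(only finitely many factors differ from 1) is a kernel on the set X* of finite sequences, by Lemma 14.

A marked sequence is a pair (X^{[1,L]}, τ), where τ ⊂ {1, …, L} is the set of marked positions. Let X_τ be the subsequence of the marked symbols of X and X_{τ̄} the subsequence of the unmarked ones. By Lemma 14,

$$K^i_{\rm mark}((X,\tau),(Y,\sigma)) = K_{\rm repl}(X_\tau, Y_\sigma)\,K^i_{\rm ins}(X_{\bar\tau}, Y_{\bar\sigma}),\quad i = 1, 2, \tag{300}$$

are kernels on marked sequences. A sequence of length L has 2^L markings (the subsets of {1, …, L}). Summing over all of them,

$$K^i(X^{[1,L]}, Y^{[1,M]}) = \sum_{\tau\subset\{1,\dots,L\}}\ \sum_{\sigma\subset\{1,\dots,M\}} K^i_{\rm mark}((X,\tau),(Y,\sigma)),\quad i = 1, 2, \tag{301}$$

which is again a kernel on sequences (the sum runs over 2^L · 2^M pairs of markings). Here k plays the role of the similarity of matched (replaced) symbols and g the weight of inserted symbols, as in the edit distance.

The kernels K^1 and K^2 can be computed by dynamic programming, like the edit distance (291)–(293):

$$K^i(\varnothing,\varnothing) = 1,\quad i = 1, 2, \tag{302}$$
$$K^i(Xu,\varnothing) = K^i(\varnothing, Xu) = K^i(X,\varnothing)\,g(u),\quad i = 1, 2. \tag{303}$$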
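A sketch that evaluates K^1 in two ways: by brute force over all markings, exactly as in (301), and by a dynamic program. The recurrence used in the dynamic program is our own derivation for K^1 (not taken from the text): splitting (301) by whether the last symbols u and v are marked gives K^1(Xu, Yv) = k(u,v)K^1(X,Y) + g(u)K^1(X,Yv) + g(v)K^1(Xu,Y) − g(u)g(v)K^1(X,Y). The symbol kernel `k` and weight `g` are hypothetical choices.

```python
import math
from itertools import combinations

def k(u, v):             # hypothetical symbol kernel on the alphabet
    return 1.0 if u == v else 0.3

def g(u):                # hypothetical weight of an unmarked (inserted) symbol
    return 0.5

def subsets(n):
    for r in range(n + 1):
        yield from combinations(range(n), r)

def k_repl(a, b):        # (296)
    if len(a) != len(b):
        return 0.0
    return math.prod(k(u, v) for u, v in zip(a, b))

def k1_brute(x, y):      # (301) with i = 1: sum over all pairs of markings
    total = 0.0
    for tau in subsets(len(x)):
        for sigma in subsets(len(y)):
            ins = math.prod(g(x[l]) for l in range(len(x)) if l not in tau)
            ins *= math.prod(g(y[l]) for l in range(len(y)) if l not in sigma)
            total += k_repl([x[l] for l in tau], [y[l] for l in sigma]) * ins
    return total

def k1_dp(x, y):         # O(len(x) * len(y)) dynamic program
    m, n = len(x), len(y)
    K = [[0.0] * (n + 1) for _ in range(m + 1)]
    K[0][0] = 1.0                                  # (302)
    for i in range(1, m + 1):                      # (303)
        K[i][0] = K[i - 1][0] * g(x[i - 1])
    for j in range(1, n + 1):
        K[0][j] = K[0][j - 1] * g(y[j - 1])
    for i in range(1, m + 1):
        for j in range(1, n + 1):
            u, v = x[i - 1], y[j - 1]
            K[i][j] = (k(u, v) * K[i - 1][j - 1]   # u and v both marked and matched
                       + g(u) * K[i - 1][j]        # u unmarked
                       + g(v) * K[i][j - 1]        # v unmarked ...
                       - g(u) * g(v) * K[i - 1][j - 1])  # ... minus the double count
    return K[m][n]

print(k1_brute("abca", "bac"), k1_dp("abca", "bac"))
```

The brute force costs 2^(L+M) terms, while the dynamic program needs only (L + 1)(M + 1) cells.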
;
: !
..
, . ,
. , , , , - ,
, .
, , C . ,
, , ,
, , .
, , , .
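D.1
Let X be an object with components x_j (for example, the pixels of an image or the elements of a sequence), and let Y be the answer for X, with components y_j (for example, the labels of the pixels). Let R(Y) denote the set of indices of the components of Y; for A ⊂ R(Y), Y_A = (y_j)_{j∈A} is the corresponding subcollection (in particular, Y_{{j}} = y_j).

Examples.
Sequence labeling. X = (X_1, …, X_N) is a sequence of observations and Y = (Y_1, …, Y_N) is the sequence of their labels, as in HMMs.
Image labeling. X = (X_1, …, X_N) is an image (the X_i are pixels), and Y assigns a label (for example, the class of an object, or the restored intensity) to every pixel, so that R(Y) = R(X).

As in the general setting (Sec. 1.1.6), one can model either the joint distribution p(X, Y) (the generative approach) or the conditional distribution p(Y|X) (the discriminative approach, see Sec. 1.2.3). In both cases the dependencies between the components of Y are essential, and they are described by graphical models.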
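D.2
D.2.1 Markov random fields (MRF, Markov Random Fields)
Let G be an undirected graph with S vertices, and let the components of Y = (y_1, …, y_S) ∈ Y^S correspond to its vertices (see Fig. 18; other examples of graphs are shown in Figs. 17–21). For a vertex v, let N'(v) denote the set of its neighbors and N(v) = {v} ∪ N'(v). Let p be the joint distribution (density) of Y: p(y_v) is the distribution of the component at the vertex v, p(Y) = p(y_1, …, y_S), and p(Y_A) is the marginal distribution of the components in A.

The field Y (or the distribution p) is called a Markov random field (MRF) on G if for every vertex v

$$p\big(y_v | Y_{V\setminus\{v\}}\big) = p\big(y_v | Y_{N'(v)}\big), \tag{306}$$

i.e. given its neighbors, y_v is independent of all the other components. Below we also assume that

$$p(Y) > 0 \quad \text{for all } Y. \tag{307}$$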
Fig. 17.
Fig. 19.
Fig. 20.
Fig. 21.
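The conditional distributions (306) of an MRF determine its joint distribution. Fix any configuration Y* = (y*_1, …, y*_S). By (307), for any y_v

$$\frac{p(y_v, Y_{V\setminus\{v\}})}{p(y^*_v, Y_{V\setminus\{v\}})} = \frac{p(y_v|Y_{V\setminus\{v\}})}{p(y^*_v|Y_{V\setminus\{v\}})},$$

and changing the components one by one we obtain

$$p(Y) = p(y_1,\dots,y_S) = p(y^*_1,\dots,y^*_S)\prod_{j=1}^S \frac{p(y_j|y_1,\dots,y_{j-1},y^*_{j+1},\dots,y^*_S)}{p(y^*_j|y_1,\dots,y_{j-1},y^*_{j+1},\dots,y^*_S)}. \tag{308}$$

Thus, up to a constant factor (which is fixed by normalization), the joint distribution is determined by the conditional distributions.

Without the positivity condition (307) this is false. Example. Let G consist of two vertices and p(y_1|y_2) = p(y_2|y_1) = δ_{y_1 y_2}. These conditional distributions are compatible with any distribution p(y_1, y_2) concentrated on the diagonal y_1 = y_2.

Remark 42: for distributions violating (307), formula (308) is not applicable, and the joint distribution is not determined by the conditional ones.

Proposition 43. The distribution p(Y) of an MRF satisfying (307) is uniquely determined by its conditional distributions (306).
Proof. By (308), p(Y) is expressed through the conditional distributions up to the factor p(Y*), and by (306) these depend only on the neighbors; the factor p(Y*) is fixed by normalization (!).

Example. For an MRF with four binary variables (values 0 and 1) and p(0,0,0,0) = p(1,1,1,1) = 1/2, the condition (307) fails, and (308) cannot be applied.

The structure of MRFs is described by cliques. Let C(G) be the set of cliques (complete subgraphs) of G. A field Y on G is called a Gibbs field if

$$p(Y) = \prod_{C\in F}\varphi_C(Y_C), \tag{310}$$

where F ⊂ C(G) is a set of cliques of G and each φ_C is a nonnegative function of the #C components Y_C. Note that in general φ_C(Y_C) ≠ p(Y_C): the factors in (310) are not marginal distributions.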
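Proposition 44. A Gibbs field (310) is an MRF, i.e. it satisfies (306).
Proof. By (310),

$$p(y_v|Y_{V\setminus\{v\}}) = \frac{p(Y)}{\int p(Y)\,dy_v},$$

and all the factors φ_C with C ∌ v do not depend on y_v and cancel:

$$p(y_v|Y_{V\setminus\{v\}}) = \frac{\prod_{C\in F,\,C\ni v}\varphi_C(Y_C)}{\int\prod_{C\in F,\,C\ni v}\varphi_C(Y_C)\,dy_v}.$$

This depends only on y_v and its neighbors, since all vertices of a clique containing v are neighbors of v. On the other hand,

$$p(y_v|Y_{N'(v)}) = \frac{\int p(Y)\,dY_{V\setminus N(v)}}{\iint p(Y)\,dY_{V\setminus N(v)}\,dy_v},$$

and integration over Y_{V∖N(v)} gives the same factor (independent of y_v) in the numerator and the denominator, because the cliques containing v lie inside N(v). Hence

$$p(y_v|Y_{N'(v)}) = \frac{\prod_{C\in F,\,C\ni v}\varphi_C(Y_C)}{\int\prod_{C\in F,\,C\ni v}\varphi_C(Y_C)\,dy_v}, \tag{311}$$

and (306) holds. For a discrete set Y the integrals are replaced by sums.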
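The converse, Theorem 13 (Hammersley and Clifford), states that every MRF satisfying (307) is a Gibbs field. It is proved with the following lemma.

Lemma 8 (Möbius inversion). Let R be a finite index set, f a real function on Y^R, and Y* = (y*_1, …, y*_S) ∈ Y^R a fixed configuration. For A ⊂ R and Y = (y_1, …, y_S) define Y^A = (y^A_1, …, y^A_S) by y^A_j = y_j for j ∈ A and y^A_j = y*_j for j ∉ A (Y^A coincides with Y on A and with Y* outside A). Let f^A(Y) = f(Y^A) and

$$g^A(Y) = \sum_{B\subset A}(-1)^{\#(A\setminus B)} f^B(Y). \tag{312}$$

Then for every C ⊂ R

$$f^C(Y) = \sum_{A\subset C} g^A(Y), \tag{313}$$

in particular f(Y) = f^R(Y) = Σ_{A⊂R} g^A(Y).
Proof.

$$\sum_{A\subset C}g^A(Y) = \sum_{A\subset C}\sum_{B\subset A}(-1)^{\#(A\setminus B)}f^B(Y) = \sum_{B\subset C}\ \sum_{D\subset C\setminus B}(-1)^{\#D}f^B(Y)$$
$$= \sum_{B\subset C}\ \sum_{k=0}^{\#(C\setminus B)}\binom{\#(C\setminus B)}{k}(-1)^k f^B(Y) = \sum_{B\subset C}0^{\#(C\setminus B)}f^B(Y) = f^C(Y),$$

since 0^0 = 1 and 0^k = 0 for k > 0.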
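Lemma 8 is easy to check numerically. A minimal sketch, assuming a random function f on {0, 1, 2}^4 and the reference configuration Y* = (0, 0, 0, 0); it verifies (313) for every C ⊂ R and the property used in the proof of Theorem 13, that g^A(Y) depends only on Y_A.

```python
import itertools
import random

R = frozenset(range(4))                 # index set
VALUES = [0, 1, 2]
Y_STAR = (0, 0, 0, 0)                   # reference configuration Y*
rng = random.Random(0)
table = {Y: rng.uniform(-1, 1) for Y in itertools.product(VALUES, repeat=4)}
f = table.__getitem__                   # an arbitrary function f on Y^R

def subsets(s):
    s = sorted(s)
    for r in range(len(s) + 1):
        for c in itertools.combinations(s, r):
            yield frozenset(c)

def restrict(Y, A):
    """Y^A: keep y_j for j in A, use y*_j elsewhere."""
    return tuple(Y[j] if j in A else Y_STAR[j] for j in range(len(Y)))

def g(A, Y):
    """g^A(Y) from (312)."""
    return sum((-1) ** len(A - B) * f(restrict(Y, B)) for B in subsets(A))

Y = (2, 1, 0, 2)
for C in subsets(R):                    # (313): f^C(Y) = sum over A in C of g^A(Y)
    assert abs(f(restrict(Y, C)) - sum(g(A, Y) for A in subsets(C))) < 1e-9
print(sorted((tuple(sorted(A)), round(g(A, Y), 3)) for A in subsets(R))[:5])
```

If some y_j with j ∈ A equals y*_j, the terms of (312) cancel in pairs and g^A(Y) = 0.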
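Proof of Theorem 13. Let p be an MRF satisfying (307). Apply Lemma 8 to f(Y) = ln p(Y):

$$\ln p(Y) = \sum_{A\subset R} g^A(Y).$$

Each g^A(Y) depends only on Y_A, because all the configurations Y^B with B ⊂ A coincide with Y* outside A. Let us show that g^A ≡ 0 unless A is a clique. Let v, v' ∈ A be two vertices that are not adjacent. By (306), y_v and y_{v'} are conditionally independent given all the other components:

$$p(y_v, y_{v'}, Y_{V\setminus\{v,v'\}}) = \frac{p(y_v, Y_{V\setminus\{v,v'\}})\,p(y_{v'}, Y_{V\setminus\{v,v'\}})}{p(Y_{V\setminus\{v,v'\}})}. \tag{314}$$

Group the terms of (312) by B ⊂ A∖{v, v'}:

$$g^A(Y) = \sum_{B\subset A\setminus\{v,v'\}}(-1)^{\#(A\setminus B)}\Big(\ln p(Y^B) - \ln p(Y^{B\cup\{v\}}) - \ln p(Y^{B\cup\{v'\}}) + \ln p(Y^{B\cup\{v,v'\}})\Big)$$
$$= \sum_{B\subset A\setminus\{v,v'\}}(-1)^{\#(A\setminus B)}\ln\frac{p(Y^B)\,p(Y^{B\cup\{v,v'\}})}{p(Y^{B\cup\{v\}})\,p(Y^{B\cup\{v'\}})}.$$

The four configurations under the logarithm differ only in the components y_v, y_{v'}, so by (314) each fraction equals 1, and each term equals (−1)^{#(A∖B)} ln 1 = 0. Hence g^A ≡ 0 whenever A is not a clique, and Theorem 13 is proved.

Together with Proposition 44 this shows that a distribution satisfying (307) is an MRF on G if and only if it is a Gibbs field on G:

$$p(Y) = e^{\sum_{C\in C(G(Y))} g^C(Y)}. \tag{315}$$

Combining (310) and (315), a positive MRF can be written as

$$p(Y) = \frac{1}{Z}\, e^{-\sum_{C\in F}E_C(Y_C)}, \tag{316}$$

where F is a set of cliques, E_C(·) = −ln φ_C(·) (cf. (310)) is the energy of the clique C, and Z is the normalizing constant.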
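D.2.2
The sum in the exponent of (316), E(Y) = Σ_{C∈F} E_C(Y_C), is called the energy of the configuration Y. In statistical physics, a system in thermal equilibrium at temperature T is found in a state Y with probability p(Y) ∝ e^{−E(Y)/kT} (the Gibbs distribution), where E is the energy of the state and k is the Boltzmann constant; this explains the terminology of (316).

A classical example is the Ising model of ferromagnetism, introduced in the 1920s. G is a lattice; at each vertex v_j there is a spin s_j = ±1 (the direction of an elementary magnetic moment). Two neighboring spins s_i and s_j interact with energy −J s_i s_j, where J > 0 (a ferromagnet: parallel spins have lower energy). In an external field h_j at the j-th site, the spin has the additional energy −h_j s_j. Thus

$$p(Y|h) = \frac{1}{Z(T,h)}\exp\Big(\frac{J}{kT}\sum_{i<j\,|\,\{v_i,v_j\}\in F} s_i s_j + \frac{1}{kT}\sum_j h_j s_j\Big), \tag{317}$$

where F is the set of edges of G(Y), and

$$Z(T,h) = \sum_{s_1,\dots,s_S=\pm1}\exp\Big(\frac{J}{kT}\sum_{i<j\,|\,\{v_i,v_j\}\in F} s_i s_j + \frac{1}{kT}\sum_j h_j s_j\Big). \tag{318}$$

To compute the probabilities (317) one needs Z(T, h), and by (318) this is a sum of 2^S terms, which cannot be computed directly for a lattice of realistic size.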
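A brute-force sketch of (317)–(318) for a tiny lattice shows where the 2^S terms come from. The parameter values are hypothetical; `ising_weight` is the unnormalized probability and `ising_Z` sums it over all spin configurations, which is feasible only for a handful of spins.

```python
import itertools
import math

def ising_weight(s, edges, h, J, kT):
    """Unnormalized probability exp(J/kT * sum s_i s_j + 1/kT * sum h_j s_j), as in (317)."""
    pair = sum(s[i] * s[j] for i, j in edges)
    field = sum(hj * sj for hj, sj in zip(h, s))
    return math.exp((J * pair + field) / kT)

def ising_Z(edges, h, J, kT):
    """Z(T, h) of (318): a sum over all 2^S spin configurations."""
    return sum(ising_weight(s, edges, h, J, kT)
               for s in itertools.product((-1, 1), repeat=len(h)))

def grid_edges(rows, cols):
    """Edges of a rows x cols lattice, vertices numbered row by row."""
    edges = []
    for r in range(rows):
        for c in range(cols):
            v = r * cols + c
            if c + 1 < cols:
                edges.append((v, v + 1))
            if r + 1 < rows:
                edges.append((v, v + cols))
    return edges

edges = grid_edges(3, 3)                 # 9 spins: 2^9 = 512 terms
h = [0.1] * 9
Z = ising_Z(edges, h, J=1.0, kT=2.0)
p_all_up = ising_weight((1,) * 9, edges, h, 1.0, 2.0) / Z
print(len(edges), Z, p_all_up)
```

For a single edge without field Z = 4 cosh(J/kT), and without edges Z factorizes into ∏ 2 cosh(h_j/kT); already a 10 × 10 lattice would need 2^100 terms.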
Fig. 22.
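Without an external field, (317) becomes

$$p(Y) = \frac{1}{Z(\beta)}\exp\Big(\beta\sum_{i<j\,|\,\{v_i,v_j\}\in F} y_i y_j\Big), \tag{319}$$

with β = J/kT. Let now X and Y form an MRF as in Fig. 22 (cf. Fig. 10 for HMMs): Y is a hidden binary (±1) image, and X is its noisy observation, each x_j depending only on y_j. Then

$$p(X,Y) = p(Y)\,p(X|Y) = \frac{1}{Z(\beta,\eta)}\exp\Big(\beta\sum_{i<j\,|\,\{v_i,v_j\}\in F} y_i y_j + \eta\sum_j x_j y_j\Big), \tag{320}$$

where β, η > 0. Since x_j^2 = y_j^2 = 1, we have y_i y_j = 1 − (y_i − y_j)^2/2 and x_j y_j = 1 − (x_j − y_j)^2/2; so, after renaming β/2 → β, η/2 → η and changing the normalizing constant, (320) can be written as

$$p(X,Y) = \frac{1}{Z(\beta,\eta)}\exp\Big(-\beta\sum_{i<j\,|\,\{v_i,v_j\}\in F}(y_i-y_j)^2 - \eta\sum_j(x_j-y_j)^2\Big) \tag{321}$$

with a different Z(β, η). Unlike (320), the form (321) also makes sense for real values (x_j, y_j ∈ R, grayscale images) and for vectors (x_j, y_j ∈ R^3, color images). Other penalties can be used as well, for example

$$p(X,Y) = \frac{1}{Z(\beta,\eta)}\exp\Big(-\beta\sum_{i<j\,|\,\{v_i,v_j\}\in F}(y_i-y_j)^2 - \eta\sum_j|x_j-y_j|\Big).$$

The graph of the MRF (320), (321) is shown in Fig. 22; the pair (X, Y) is an MRF. In applications the observed image X is given and the hidden image Y must be restored, so we need p(Y|X), which is obtained from p(X, Y) by normalization for the fixed X. For example, from (321)

$$p(Y|X) = \frac{1}{Z(\beta,\eta,X)}\exp\Big(-\beta\sum_{i<j\,|\,\{v_i,v_j\}\in F}(y_i-y_j)^2 - \eta\sum_j(x_j-y_j)^2\Big). \tag{322}$$

For a fixed X, the distribution (320) is the Ising model (317) with the external field h_j proportional to x_j. A natural estimate of the restored image is the most probable Y given X under (320)–(322)^65.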
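For real-valued images the most probable Y under (322) is the minimum of the quadratic energy β Σ(y_i − y_j)² + η Σ(x_j − y_j)², and setting its derivative with respect to y_j to zero gives β Σ_{i~j}(y_j − y_i) + η(y_j − x_j) = 0. A minimal sketch solving these equations by Gauss–Seidel sweeps (our choice of solver; any linear solver would do) on a hypothetical one-dimensional "image":

```python
def chain_edges(n):
    return [(j, j + 1) for j in range(n - 1)]

def neighbors_from_edges(n, edges):
    nb = [[] for _ in range(n)]
    for i, j in edges:
        nb[i].append(j)
        nb[j].append(i)
    return nb

def energy(y, x, edges, beta, eta):
    """beta * sum (y_i - y_j)^2 + eta * sum (x_j - y_j)^2: minus the exponent of (322)."""
    return (beta * sum((y[i] - y[j]) ** 2 for i, j in edges)
            + eta * sum((xj - yj) ** 2 for xj, yj in zip(x, y)))

def restore(x, edges, beta, eta, sweeps=500):
    """Gauss-Seidel for beta * sum_{i ~ j} (y_j - y_i) + eta * (y_j - x_j) = 0."""
    nb = neighbors_from_edges(len(x), edges)
    y = list(x)
    for _ in range(sweeps):
        for j in range(len(y)):
            y[j] = (eta * x[j] + beta * sum(y[i] for i in nb[j])) / (eta + beta * len(nb[j]))
    return y

x = [0.1, 0.9, 0.2, 1.1, 0.95, 1.0, -0.1, 0.05]   # noisy observed intensities
edges = chain_edges(len(x))
y = restore(x, edges, beta=1.0, eta=0.5)
print([round(v, 3) for v in y])
```

Each update sets y_j to a weighted average of the observation x_j and the current neighbors, so larger β gives a smoother restoration.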
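A more elaborate MRF for image restoration was proposed in [44]; it is shown in Fig. 23. Besides the pixel variables (the observed image X and the restored image Y, as in Fig. 22), it contains line (edge) variables y_{i,j} ∈ {0, 1} attached to pairs of neighboring pixels v_i, v_j. The value y_{i,j} = 1 means that there is a boundary between v_i and v_j, so the intensities y_i and y_j need not be close, and the smoothness term (y_i − y_j)^2 is switched off. Configurations of the line variables are penalized according to their shape. Let Q be the set of considered local configurations s of the line variables, n(Y, s) the number of occurrences of the configuration s in Y, and α_s its penalty, with α_{s_0} = 0 for the empty configuration s_0. Then, instead of (322),

$$p(Y|X) = \frac{1}{Z(\beta,\eta,X)}e^{-E(X,Y)}, \tag{323}$$
$$E(X,Y) = \beta\sum_{i<j\,|\,\{v_i,v_j\}\in F}(1-y_{i,j})(y_i-y_j)^2 + \eta\sum_j(x_j-y_j)^2 + \sum_{s\in Q}\alpha_s\,n(Y,s). \tag{324}$$

Fig. 23: MRF with line variables (after [44]).

^65 For real-valued Y the maximum of (322) is found from

$$0 = \frac{\partial}{\partial y_j}\Big(\beta\sum_{i<j\,|\,\{v_i,v_j\}\in F}(y_i-y_j)^2 + \eta\sum_j(x_j-y_j)^2\Big) = 2\Big(\beta\sum_{i\,|\,\{v_i,v_j\}\in F}(y_j-y_i) + \eta(y_j-x_j)\Big),$$

i.e. Y is the solution of the linear system (ηI + βΔ)Y = ηX, where Δ is the Laplace matrix of the graph G: (ΔY)_j = Σ_{i|{v_i,v_j}∈F}(y_j − y_i).

D.2.3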
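Let G be a graph, Y = (y_1, …, y_S) a field on its vertices, and X an observed object (X need not be structured by G). The field Y is called a conditional random field (CRF, conditional random field) given X if for every vertex v

$$p(y_v|X, Y_{V\setminus\{v\}}) = p(y_v|X, Y_{N'(v)}), \tag{325}$$

which is the analog of (306); N'(v) is the set of neighbors of v in G. For a fixed X, the distribution p(Y|X) of a CRF is an MRF in Y, so by Theorem 13 (if it is positive) it can be written in the form (316):

$$p(Y|X) = \frac{1}{Z(X)}e^{-\sum_{C\in F}E_C(X,Y_C)}. \tag{326}$$

For example, p(X, Y) in (321) is an MRF, and the corresponding p(Y|X) in (322) is a CRF: an MRF for (X, Y) gives a CRF for Y given X. The clique energies of a CRF, however, may depend on the whole X, not only on the components attached to the clique. For example, in image labeling the energy of a single-vertex clique can be taken from a classifier that estimates p(y|X) from an l × l window around the pixel. Modeling the same dependence with an MRF p(X, Y) would require a model of the distribution of the l × l windows themselves (see Fig. 24).

Fig. 24.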
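A CRF for image labeling can be built in the same way as an MRF: the energy of a clique C of type t(C) is E_{t(C)}(Y_C), where the types are, for example, single vertices, horizontal edges, vertical edges, and so on; let there be q types. Combining these energies with the outputs of a local classifier, one obtains the CRF

$$p(Y|X) \propto \exp\Big(\sum_{v\in V}\ln p(y_v|X_v) - \sum_{t=1}^{q}\ \sum_{C\in C(G)\,|\,t(C)=t}E_t(Y_C)\Big),$$

where X_v is the part of X used for the vertex v, p(y_v|X_v) is the output f_{y_v}(X_v) of a classifier estimating the probability of the label y_v, and the energies E_t(Y_C) describe the interactions of the labels.

With a temperature parameter, a general CRF (326) is written as

$$p(Y|T,X) = \frac{1}{Z(T,X)}e^{-\frac{1}{T}\sum_{C\in F}E_C(X,Y_C)}, \tag{327}$$

where E_C(X, Y_C) is the energy of the clique C; −T ln Z(T, X) is the free energy; T > 0 is the temperature; Y = (y_1, …, y_S) ∈ Y^S.

In practice CRFs are parameterized as follows. Let C_1, …, C_M be classes of cliques such that the cliques of one class C_m are of the same kind (for example, all horizontal edges of a lattice) and share the same energy function. For a clique C ∈ C_m the energy is E_m(X_{m,C}, Y_C), where Y_C is the set of components of Y on C and X_{m,C} is the set of features of X relevant to the clique C in the class m. The CRF on the graph G(Y) then has the form

$$p(Y|w,X) = \frac{1}{Z(w,X)}\exp\Big(-\sum_{m=1}^M w_m\sum_{C\in C(G(Y))\cap C_m}E_m(X_{m,C},Y_C)\Big), \tag{328}$$

where w = (w_1, …, w_M) are the weights of the classes. Let

$$E_{m,X,Y} = \sum_{C\in C(G(Y))\cap C_m}E_m(X_{m,C},Y_C)$$

be the total energy of the class m on G(Y), and let E_{X,Y} be the vector with the components E_{m,X,Y}. Then Σ_{m=1}^M w_m E_{m,X,Y} = (w, E_{X,Y}), and (328) becomes

$$p(Y|w,X) = \frac{1}{Z(w,X)}e^{-(w,E_{X,Y})}. \tag{329}$$

The CRF (327) is a special case of (328), with each clique forming its own class and w_m = 1/T. Thus all the CRFs (326), (327), (328) can be written in the form (329).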
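D.3
For a CRF written in the form (329), the following problems arise:
1. Given X and Y, compute p(Y|X).
2. Given X, find the most probable Y (restoration, segmentation), i.e. maximize p(Y|X) over Y ∈ Y^S.
3. Given X, compute the marginal distributions p(y_v|X) of the individual components.
4. Given a training set (X, Y), estimate the parameters w of the CRF (learning), for example by maximizing p(Y|X).

Problem 1 requires the normalizing constant

$$Z(w,X) = \int_{y_1,\dots,y_S} e^{-(w,E_{X,Y})}\,dy_1\dots dy_S,$$

or, for a finite set Y,

$$Z(w,X) = \sum_{y_1,\dots,y_S\in Y}e^{-(w,E_{X,Y})}, \tag{330}$$

a sum of q^S terms (q = #Y), which in general cannot be computed directly. For a CRF whose graph is a chain, Z(w, X) can be computed by dynamic programming, like the forward-backward algorithm for HMMs (p. 188).

Problem 2 (maximization of p(Y|X) over Y) does not require Z(w, X). Problems 1, 3 and 4 require either Z(w, X) or the marginal distributions p(Y_R|X), which are equally hard to compute exactly.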
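D.3.1
Maximizing (326) or (327) over Y for a fixed X is equivalent to minimizing the energy

$$E(X,Y) = \sum_{C\in F}E_C(X,Y_C) \tag{331}$$

over y_1, …, y_S. If Y = R and the energies are differentiable, standard minimization methods apply (for example, gradient methods, Sec. 3.2.4); for the CRF (322) the energy is quadratic, and the minimum is found by solving a linear system. If Y is finite, this is a hard combinatorial problem. Note that a change of one component y_v changes only the terms of (331) corresponding to the cliques that contain v, i.e. it depends only on the neighbors of v in G(Y). This locality is exploited by simulated annealing (p. 78), which for CRFs is built on the Gibbs sampler.

Gibbs sampler. Let Y be a finite set of q elements. The conditional distribution of one component is

$$p(y_v|Y_{V\setminus\{v\}},X) = \frac{1}{Z_v(T,X,Y)}e^{-\frac1T\sum_{C\in F,\,C\ni v}E_C(X,Y_C)}, \tag{332}$$

where, unlike Z(T, X), the normalizing constant

$$Z_v(T,X,Y) = \sum_{y\in Y}e^{-\frac1T\sum_{C\in F,\,C\ni v}E_C\left(X,(Y_{V\setminus\{v\}},\,y)_C\right)} \tag{333}$$

is a sum of only q terms and is easily computed. Starting from an arbitrary configuration Y^(0) on G(Y), the configurations Y^(i) are generated as follows:
1. Choose a sequence of vertices v^(1), v^(2), … from v_1, …, v_S such that every K ≥ S consecutive elements contain all the vertices (for example, the vertices visited cyclically, K = S);
2. At the i-th step, set y_v^(i) = y_v^(i−1) for all vertices v ≠ v^(i), and draw y^(i)_{v^(i)} at random from the distribution p(y_{v^(i)}|Y^(i−1)_{V∖{v^(i)}}, X), i.e. from (332) computed for Y^(i−1).

This procedure is called the Gibbs sampler (Gibbs sampling), and each Y^(i) is a Gibbs sample. The configurations Y^(i) form a Markov chain on Y^S. Step 1 guarantees that every component is updated regularly, and every step of type 2 leaves the distribution (327) invariant.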
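Theorem 14 (S. Geman, D. Geman [44]). For any initial configuration Y^(0), the distribution p(Y^(i)|Y^(0), X) converges as i → ∞ to the distribution (327).

The proof is based on three observations. First, (327) is invariant under each step of the Gibbs sampler. Second, every conditional probability (332) is bounded away from zero:

$$p(y_v|Y_{V\setminus\{v\}},X) \ge \delta(T) = \frac{1}{1+(q-1)e^{\Delta/T}} > 0,$$

where Δ is the maximal variation of the local energy Σ_{C∋v} E_C(X, Y_C) when y_v changes (note that δ(T) ≤ 1/q). Third, since every K consecutive steps update all the vertices, for any Y, Y' ∈ Y^S we have p(Y^(i+K) = Y|Y^(i) = Y', X) ≥ δ(T)^S. Hence the distribution of Y^(i) approaches (327) geometrically: for any Y', Y'' ∈ Y^S

$$\big|p(Y^{(i)}=Y''|Y^{(0)}=Y',X) - p(Y''|T,X)\big| \le \Big(1-\big(q\,\delta(T)\big)^S\Big)^{\lfloor i/K\rfloor}. \tag{334}$$

Simulated annealing. If at the i-th step the component y^(i)_{v^(i)} is drawn from (332) with a temperature T_i instead of T, the i-th step corresponds to the distribution

$$p(Y|T_i,X) = \frac{1}{Z(T_i,X)}e^{-\frac{1}{T_i}\sum_{C\in F}E_C(X,Y_C)}, \tag{335}$$

with T_i → 0 as i → ∞. As T → 0 the distribution (335) concentrates on the configurations that minimize the energy (331). Let Y_0(X) be the set of these minima. If the temperature decreases slowly enough, the sampler follows the distributions (335):

Theorem 15 (S. Geman, D. Geman [44]). Let T_i → 0 and T_i ≥ SΔ/ln⌊i/K⌋ for all i ≥ i_0. Then for any Y^(0) the distribution of Y^(i) converges to the uniform distribution on Y_0(X), the set of global minima of the energy (331).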
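The Gibbs sampler also solves problem 3: the computation of the marginal distributions of a CRF,

$$p(y_v|X) = \frac{1}{Z(w,X)}\sum_{Y_{V\setminus\{v\}}}e^{-(w,E_{X,Y})}. \tag{336}$$

(The distribution of a single component y_v is called its marginal distribution^66.)

^66 That is, the marginal distribution of y_v with respect to the joint distribution of Y. This term has nothing to do with the margin(!) of a classifier.

By Theorem 14, p(y_v^(i)|X, Y^(0)) → p(y_v|X) for any Y^(0). Moreover, p(y_v|X) can be estimated by the frequency of the value y_v along a single run of the sampler:

Theorem 16. With probability 1,

$$\frac{\#\{i\le n:\ y^{(i)}_v = y_v\}}{n} \to p(y_v|X) \quad (n\to\infty).$$

Theorem 16 follows from Theorem 14 and the geometric bound (334); in particular, for any ε > 0 and γ > 0 there exists N(ε, γ) such that for n > N(ε, γ)

$$P\Big\{\Big|\frac{\#\{i\le n:\ y^{(i)}_v=y_v\}}{n} - p(y_v|X)\Big| > \varepsilon\Big\} < \gamma.$$

Methods that compute probabilities and expectations by simulating Markov chains, like Theorems 14–16, are called MCMC (Markov Chain Monte Carlo) methods. They apply to any CRF, but they require many steps. In the same way MCMC estimates the joint marginal distributions p(Y_R|X) for subsets R ⊂ V; but for #R = r the variable Y_R takes q^r values, and the number of samples needed to estimate all of their probabilities grows quickly with r, so in practice MCMC is used only for small R.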
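A minimal sketch of the Gibbs sampler (332) with the frequency estimate of Theorem 16. The model is a hypothetical toy CRF for a fixed X: spins ±1 on a 4-cycle with energy E(Y) = −J Σ y_u y_v − Σ H_v y_v and T = 1, vertices visited cyclically (K = S). The exact marginal is computed by enumeration for comparison.

```python
import itertools
import math
import random

# Toy CRF for a fixed X (an assumption): spins y_v in {-1, +1} on a 4-cycle with
# energy E(Y) = -J * sum_{edges} y_u y_v - sum_v H[v] * y_v, temperature T.
EDGES = [(0, 1), (1, 2), (2, 3), (3, 0)]
H = [0.3, -0.2, 0.0, 0.5]
J, T, S = 0.8, 1.0, 4
NB = {v: [] for v in range(S)}
for a, b in EDGES:
    NB[a].append(b)
    NB[b].append(a)

def local_energy(v, value, y):
    """Sum of the energies of the cliques containing v (its edges and {v}) if y_v = value."""
    return -J * value * sum(y[u] for u in NB[v]) - H[v] * value

def gibbs_marginal(v0, n_steps, seed=0):
    """Frequency estimate of p(y_v0 = +1 | X) from one Gibbs-sampler run (Theorem 16)."""
    rng = random.Random(seed)
    y = [1] * S                                   # arbitrary Y^(0)
    hits = 0
    for i in range(n_steps):
        v = i % S                                 # cyclic order, K = S
        w = {val: math.exp(-local_energy(v, val, y) / T) for val in (-1, 1)}
        y[v] = 1 if rng.random() < w[1] / (w[1] + w[-1]) else -1   # draw from (332)
        hits += y[v0] == 1
    return hits / n_steps

def exact_marginal(v0):
    """p(y_v0 = +1 | X) by enumerating all 2^S configurations."""
    num = den = 0.0
    for y in itertools.product((-1, 1), repeat=S):
        e = -J * sum(y[a] * y[b] for a, b in EDGES) - sum(h * t for h, t in zip(H, y))
        weight = math.exp(-e / T)
        den += weight
        if y[v0] == 1:
            num += weight
    return num / den

print(gibbs_marginal(0, 200_000), exact_marginal(0))
```

Only Z_v, a sum of two terms here, is ever computed by the sampler; the enumeration in `exact_marginal` is there only to check the estimate.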
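D.3.3
Let the training set consist of N pairs (X_i, Y_i), X = (X_1, …, X_N), Y = (Y_1, …, Y_N). Without a prior (p_0(w) ≡ 1), w is estimated by maximum likelihood:

$$p(Y|w,X) = \prod_{i=1}^N p(Y_i|w,X_i) \to \max_w.$$

Note that ∏_{i=1}^N p(Y_i|w, X_i) is itself of the form (329) for the disjoint union of graphs G(Y_1) ⊔ … ⊔ G(Y_N), with Z(w, X) = ∏_{i=1}^N Z(w, X_i).

As in Sec. 1.2.6, denote Ω(w) = −ln p_0(w) and L(w|X, Y) = ln p(Y|w, X). The maximum a posteriori estimate of the CRF parameters minimizes

$$E(w,X,Y) = -\ln p(w,Y|X) = \Omega(w) - L(w|X,Y) = \Omega(w) + \ln Z(w,X) + (w,E_{X,Y}) \to \min_w \tag{337}$$

(see p. 31). If Ω(w) is convex (i.e. p_0(w) is log-concave, as for a Gaussian prior), the problem (337) is convex by the following proposition, so every local minimum is global.

Proposition 45. L(w|X, Y) = −ln Z(w, X) − (w, E_{X,Y}) (for the CRF (329)) is a concave function of w.
Proof. The term (w, E_{X,Y}) is linear in w, so it suffices to show that ln Z(w, X) is convex, i.e. convex along every line w(λ) = w_0 + λw_1: d²/dλ² ln Z(w(λ), X) ≥ 0. Let [Y] denote the set of all configurations on G(Y). Then

$$\frac{d}{d\lambda}\ln Z(w(\lambda),X) = -\frac{1}{Z(w(\lambda),X)}\sum_{Y\in[Y]}(w_1,E_{X,Y})\,e^{-(w(\lambda),E_{X,Y})},$$
$$\frac{d^2}{d\lambda^2}\ln Z(w(\lambda),X) = -\frac{1}{Z(w(\lambda),X)^2}\Big(\sum_{Y\in[Y]}(w_1,E_{X,Y})\,e^{-(w(\lambda),E_{X,Y})}\Big)^2 + \frac{1}{Z(w(\lambda),X)}\sum_{Y\in[Y]}(w_1,E_{X,Y})^2\,e^{-(w(\lambda),E_{X,Y})} \ge 0,$$

since the last expression is the variance of (w_1, E_{X,Y}) under the distribution p(Y|w(λ), X).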
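Therefore (337) can be minimized by gradient methods (Sec. 3.2.4). The gradient of (337) (where Ω(w) = −ln p_0(w)) is

$$\frac{\partial E(w,X,Y)}{\partial w} = \frac{\partial\Omega(w)}{\partial w} + \frac{\partial\ln Z(w,X)}{\partial w} + E_{X,Y} = \frac{\partial\Omega(w)}{\partial w} - \frac{1}{Z(w,X)}\sum_{Y\in[Y]}E_{X,Y}\,e^{-(w,E_{X,Y})} + E_{X,Y}$$
$$= \frac{\partial\Omega(w)}{\partial w} + \sum_{i=1}^N\Big(E_{X_i,Y_i} - \sum_{Y\in[Y_i]}E_{X_i,Y}\,p(Y|w,X_i)\Big),$$

or, componentwise,

$$\frac{\partial E(w,X,Y)}{\partial w_m} = \frac{\partial\Omega(w)}{\partial w_m} + \sum_{i=1}^N\ \sum_{C\in C(G(Y_i))\cap C_m}\Big(E_m\big((X_i)_{m,C},(Y_i)_C\big) - \sum_{Y\in Y^C}E_m\big((X_i)_{m,C},Y\big)\,p(Y|w,X_i)\Big), \tag{338}$$

where p(Y|w, X_i) in the inner sum is the marginal probability that the components on C take the values Y. Thus the gradient requires the expectations of the clique energies with respect to the marginal distributions of the cliques under p(Y|w, X_i), which can be estimated by MCMC (336). Each gradient step therefore requires MCMC runs on all the training examples, which makes the training of CRFs expensive.

Besides MCMC, there are deterministic methods that compute approximations of ln Z(w, X) and of the marginal distributions p(Y_R|w, X) for small sets R; they are considered in the next section.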
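A minimal sketch of (337)–(338) with Ω ≡ 0 on a tiny chain CRF, where the model expectations are computed by enumeration instead of MCMC. The two energy classes are hypothetical choices: E_1 = Σ_v (x_v − y_v)² over single vertices and E_2 = the number of neighboring labels that differ. The gradient is checked against finite differences, and convexity (Proposition 45) along a line.

```python
import itertools
import math

LABELS = (0, 1)

def dot(a, b):
    return sum(u * v for u, v in zip(a, b))

def features(x, y):
    """E_{X,Y} = (E_1, E_2) for a chain (M = 2 hypothetical clique classes)."""
    e1 = sum((xv - yv) ** 2 for xv, yv in zip(x, y))   # single-vertex cliques
    e2 = sum(1 for a, b in zip(y, y[1:]) if a != b)     # edge cliques
    return (e1, e2)

def configurations(n):
    return list(itertools.product(LABELS, repeat=n))

def log_Z(w, x):
    """ln Z(w, X) of (330) by enumeration (only for tiny chains)."""
    return math.log(sum(math.exp(-dot(w, features(x, y))) for y in configurations(len(x))))

def objective(w, data):
    """(337) with Omega = 0: sum_i [ln Z(w, X_i) + (w, E_{X_i,Y_i})] = -L(w|X, Y)."""
    return sum(log_Z(w, x) + dot(w, features(x, y)) for x, y in data)

def gradient(w, data):
    """(338) with Omega = 0: observed energies minus their expectations under p(Y|w, X_i)."""
    g = [0.0] * len(w)
    for x, y in data:
        ys = configurations(len(x))
        weights = [math.exp(-dot(w, features(x, yy))) for yy in ys]
        Z = sum(weights)
        observed = features(x, y)
        for m in range(len(w)):
            expected = sum(wt * features(x, yy)[m] for wt, yy in zip(weights, ys)) / Z
            g[m] += observed[m] - expected
    return g

data = [((0.1, 0.8, 0.9, 0.2), (0, 1, 1, 0)),
        ((0.9, 1.0, 0.7), (1, 1, 1))]
w = [1.5, 0.7]
print(objective(w, data), gradient(w, data))
```

For real lattices the enumeration is replaced by MCMC or by the approximations of the next section; the structure "observed minus expected" stays the same.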
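D.3.4
For a fixed X a CRF is a distribution on Y^S, so below X is omitted. Let G be a graph with S vertices (see p. 227), Y = (y_1, …, y_S) ∈ Y^S the configurations, and E(Y) their energy (for example, (331)). For an arbitrary distribution p(Y) define the mean energy

$$U(p) = \mathbb{E}_{p;Y}E(Y) = \sum_Y p(Y)E(Y); \tag{339}$$

the entropy of p

$$H(p) = \mathbb{E}_{p;Y}(-\ln p(Y)) = -\sum_Y p(Y)\ln p(Y); \tag{340}$$

and the free energy

$$F(p) = U(p) - H(p) = \sum_Y p(Y)\big(E(Y) + \ln p(Y)\big). \tag{341}$$

Proposition 46. The distribution p(Y) = (1/Z)e^{−E(Y)} is the minimum point of F(p), and the minimum equals −ln Z.
Proof. Minimize F(p) subject to Σ_Y p(Y) = 1 by the method of Lagrange multipliers (the constraints p ≥ 0 turn out to be inactive (!)):

$$L(p,\lambda) = F(p) + \lambda\Big(\sum_Y p(Y) - 1\Big) = \sum_Y p(Y)\big(E(Y) + \ln p(Y) + \lambda\big) - \lambda,$$
$$0 = \frac{\partial L}{\partial p(Y)} = E(Y) + \ln p(Y) + 1 + \lambda,$$

so p(Y) = e^{−1−λ}e^{−E(Y)}, i.e. p(Y) = (1/Z)e^{−E(Y)}. Substituting into (341) gives F = Σ_Y p(Y)(E(Y) − E(Y) − ln Z) = −ln Z. Since F is strictly convex in p, this is the unique minimum. (Equivalently, F(p) = KL(p ‖ (1/Z)e^{−E}) − ln Z.)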
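Proposition 46 can be checked numerically on a toy set of 8 configurations with random energies (hypothetical values): the Gibbs distribution attains F = −ln Z, and every other distribution has larger free energy, in agreement with F(p) = KL(p ‖ Gibbs) − ln Z.

```python
import math
import random

rng = random.Random(1)
E = [rng.uniform(0.0, 3.0) for _ in range(8)]   # energies E(Y) of 8 configurations (toy values)

def free_energy(p):
    """F(p) = sum_Y p(Y) (E(Y) + ln p(Y)), formula (341); terms with p(Y) = 0 contribute 0."""
    return sum(pi * (e + math.log(pi)) for pi, e in zip(p, E) if pi > 0)

def random_distribution():
    v = [rng.random() for _ in E]
    s = sum(v)
    return [t / s for t in v]

def kl(p, q):
    return sum(pi * math.log(pi / qi) for pi, qi in zip(p, q) if pi > 0)

Z = sum(math.exp(-e) for e in E)
gibbs = [math.exp(-e) / Z for e in E]
print(free_energy(gibbs), -math.log(Z))
```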
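For a CRF on the graph G with the energy (328), Proposition 46 gives ln Z(w, X) = −min_p F(p). So computing ln Z(w, X) (and, similarly, the marginal distributions) reduces to minimizing the free energy. But the minimization runs over all distributions on Y^S, i.e. over q^S − 1 free parameters, which is as hard as computing Z(w, X) directly. Approximate methods either restrict the minimization to a smaller family of distributions, or replace F by an approximation that depends on fewer parameters.

Mean field approximation. Restrict F to the factorized distributions p(Y) = ∏_{j=1}^S p_j(y_j), i.e. to (q − 1)S parameters. For the energy (331)

$$F(p_1,\dots,p_S) = \sum_Y\prod_{j=1}^S p_j(y_j)\Big(\sum_{C\in F}E_C(Y_C) + \sum_{j=1}^S\ln p_j(y_j)\Big) = \sum_{C\in F}\ \sum_{Y\in Y^C}E_C(Y)\prod_{j:\,v_j\in C}p_j(y_j) + \sum_{j=1}^S\sum_{y\in Y}p_j(y)\ln p_j(y).$$

When F is minimized over p_j with the other factors fixed, only the terms containing p_j matter:

$$\sum_{C\in F:\,C\ni v_j}\ \sum_{Y\in Y^C}E_C(Y)\prod_{k:\,v_k\in C}p_k(y_k) + \sum_{y\in Y}p_j(y)\ln p_j(y) \to \min_{p_j},$$

and, as in Proposition 46 (!), the minimum is attained at

$$p_j(y_j) = \frac{1}{Z_j}\exp\Big(-\sum_{C\in F:\,C\ni v_j}\ \sum_{Y\in Y^{C\setminus\{v_j\}}}E_C(y_j,Y)\prod_{k:\,v_k\in C\setminus\{v_j\}}p_k(y_k)\Big),$$

where Z_j is the normalizing constant, a sum of q terms over y_j ∈ Y. The exponent is the mean energy of the j-th component in the field created by the other components (the mean field), which explains the name. The factors p_j are updated in turn; each update does not increase F, so the process converges (in general, to a local minimum of F). The resulting p_j(y_j) approximate the marginal distributions p(y_j) of the true distribution p(Y), and −F approximates ln Z from below, since F(p) ≥ −ln Z for every p by Proposition 46. The mean field approximation ignores the correlations between the components, so it can be inaccurate when they are strongly correlated.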
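A minimal sketch of the mean field iteration for a hypothetical Ising-type energy on a 4-cycle (spins ±1, edge energies −J y_i y_k, vertex energies −H_j y_j). Each update is the closed-form minimizer above, so the recorded free energies never increase, and the final value stays above −ln Z (computed by enumeration).

```python
import itertools
import math

# Toy model (an assumption): E(Y) = -J * sum_{edges} y_i y_k - sum_j H[j] y_j, y in {-1, +1}.
EDGES = [(0, 1), (1, 2), (2, 3), (3, 0)]
H = [0.4, -0.3, 0.2, 0.1]
J = 0.5
S, VALUES = 4, (-1, 1)

def mf_free_energy(p):
    """F(p_1, ..., p_S) for the factorized distribution, p[j][y] = p_j(y)."""
    energy = sum(-J * a * b * p[i][a] * p[k][b] for i, k in EDGES for a in VALUES for b in VALUES)
    energy += sum(-H[j] * a * p[j][a] for j in range(S) for a in VALUES)
    neg_entropy = sum(p[j][a] * math.log(p[j][a]) for j in range(S) for a in VALUES)
    return energy + neg_entropy

def mf_update(p, j):
    """Exact minimization of F over p_j with the other factors fixed (the mean field formula)."""
    nbrs = [k for i, k in EDGES if i == j] + [i for i, k in EDGES if k == j]
    logits = {}
    for a in VALUES:
        mean_energy = -H[j] * a + sum(-J * a * b * p[k][b] for k in nbrs for b in VALUES)
        logits[a] = -mean_energy
    m = max(logits.values())
    z = sum(math.exp(v - m) for v in logits.values())
    p[j] = {a: math.exp(logits[a] - m) / z for a in VALUES}

def exact_log_Z():
    total = 0.0
    for y in itertools.product(VALUES, repeat=S):
        e = -J * sum(y[i] * y[k] for i, k in EDGES) - sum(h * v for h, v in zip(H, y))
        total += math.exp(-e)
    return math.log(total)

p = [{-1: 0.5, 1: 0.5} for _ in range(S)]
history = [mf_free_energy(p)]
for sweep in range(50):
    for j in range(S):
        mf_update(p, j)
        history.append(mf_free_energy(p))
print(history[-1], -exact_log_Z())
```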
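Another approach approximates the free energy itself rather than the distribution. For the energy (331) the mean energy depends only on the marginal distributions p_C of the cliques:

$$U(p) = \sum_Y p(Y)E(Y) = \sum_{C\in F}\ \sum_{Y\in Y^C}E_C(Y)\,p_C(Y). \tag{342}$$

So it is natural to minimize over collections of clique marginals P_F = {p_C : C ∈ F}. They have a moderate number of parameters: for example, if each of the M clique classes in (328) contains cliques of at most K vertices (say, M = 3 and K = 2), then P_F has at most (q^K − 1)MS parameters instead of q^S − 1.

However, not every collection P_F is the collection of marginals of some distribution p. A necessary condition is consistency: for any C', C'' ∈ F and C ⊂ C' ∩ C'' (C need not belong to F), the marginals on C computed from p_{C'} and from p_{C''} coincide:

$$p_C(Y_C) = \sum_{Y_{C'\setminus C}}p_{C'}(Y_C,Y_{C'\setminus C}) = \sum_{Y_{C''\setminus C}}p_{C''}(Y_C,Y_{C''\setminus C}). \tag{343}$$

This condition is not sufficient.

Example ([81]). Let G be the triangle with vertices 1, 2, 3, F the set of its edges, Y = {0, 1}, and let the marginals p_{i,j} of the edges {i, j} be the following 2 × 2 matrices (they do not depend on X):

$$p_{1,2} = \begin{pmatrix}0.4&0.1\\0.1&0.4\end{pmatrix},\qquad p_{2,3} = \begin{pmatrix}0.4&0.1\\0.1&0.4\end{pmatrix},\qquad p_{1,3} = \begin{pmatrix}0.1&0.4\\0.4&0.1\end{pmatrix}.$$

All the vertex marginals p_1, p_2, p_3 are uniform, so (343) holds. Yet there is no distribution p with these marginals. Indeed, the sets Y^{1≠2} = {Y : y_1 ≠ y_2}, Y^{2≠3} = {Y : y_2 ≠ y_3} and Y^{1=3} = {Y : y_1 = y_3} cover all 2^3 = 8 configurations Y ∈ Y^{{1,2,3}} (if y_1 = y_2 and y_2 = y_3, then y_1 = y_3), so p(Y^{1≠2}) + p(Y^{2≠3}) + p(Y^{1=3}) ≥ 1 for any distribution p, while

$$p_{1,2}(Y^{1\ne2}) + p_{2,3}(Y^{2\ne3}) + p_{1,3}(Y^{1=3}) = (0.1+0.1)+(0.1+0.1)+(0.1+0.1) = 0.6 < 1.$$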
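The triangle example can be verified mechanically. A short sketch: it checks the consistency condition (343) for the three edge beliefs, checks that the three events cover all 8 configurations, and evaluates the sum of their belief probabilities.

```python
import itertools

# Pairwise beliefs of the triangle example; rows index the first vertex, columns the second.
beliefs = {
    (1, 2): [[0.4, 0.1], [0.1, 0.4]],
    (2, 3): [[0.4, 0.1], [0.1, 0.4]],
    (1, 3): [[0.1, 0.4], [0.4, 0.1]],
}

def vertex_marginal(pair, vertex):
    """Marginal of `vertex` obtained from the 2 x 2 table of the pair (i, j)."""
    i, j = pair
    table = beliefs[pair]
    if vertex == i:
        return [sum(row) for row in table]
    return [sum(col) for col in zip(*table)]

# (343): every vertex gets the same marginal from every edge that contains it.
consistent = all(
    all(abs(a - b) < 1e-12 for a, b in zip(vertex_marginal(e1, v), vertex_marginal(e2, v)))
    for v in (1, 2, 3)
    for e1 in beliefs for e2 in beliefs if v in e1 and v in e2)

# The events {y1 != y2}, {y2 != y3}, {y1 == y3} cover all 8 configurations ...
covers = all(y1 != y2 or y2 != y3 or y1 == y3
             for y1, y2, y3 in itertools.product((0, 1), repeat=3))

# ... so under any joint p their probabilities sum to at least 1, but the beliefs give 0.6.
b12, b23, b13 = beliefs[(1, 2)], beliefs[(2, 3)], beliefs[(1, 3)]
total = (b12[0][1] + b12[1][0]) + (b23[0][1] + b23[1][0]) + (b13[0][0] + b13[1][1])
print(consistent, covers, total)
```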
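Thus a collection of locally consistent marginals (343) may correspond to no distribution at all. Nevertheless, following H. Bethe [13], one can minimize an approximation of the free energy over such collections. The free energy F is replaced by a function of P_F that coincides with F when P_F consists of the marginals of some p (under additional assumptions, see below), and the minimization runs over all collections P_F satisfying (343). The resulting p_C are only approximations of the true marginals. Collections P_R of functions on regions R that satisfy (343) and are treated as approximate marginal distributions are called beliefs (see p. 30).

Kikuchi approximation. R. Kikuchi (Ryoichi Kikuchi) [65] proposed a more general approximation of the free energy. Let R be a family of regions R ⊂ V, and let P_R = {p_R : R ∈ R} be a collection of beliefs on the regions satisfying (343). For each region R define its mean energy and entropy

$$U(p_R) = \sum_{Y:\,R\to Y}p_R(Y)\sum_{C\in F,\,C\subset R}E_C(Y_C), \tag{344}$$
$$H(p_R) = -\sum_{Y:\,R\to Y}p_R(Y)\ln p_R(Y),$$

and approximate the free energy by

$$F_R(P_R) = \sum_{R\in R}c_R\big(U(p_R) - H(p_R)\big), \tag{345}$$

where the counting numbers c_R are chosen so that Σ_{R'∈R, R'⊇R} c_{R'} = 1 for every R ∈ R; then (for a family of regions closed under intersections) each clique energy of (342) is counted exactly once.

The Bethe approximation is the special case in which the regions are the cliques and the single vertices, R = F ∪ V. In this case the conditions (343) reduce to the agreement of each clique belief with the beliefs of its vertices.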
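For the regions {v} write p_v = p_{{v}} and E_v(·) = E_{{v}}(·) (E_v ≡ 0 if {v} ∉ F), and let F_0 = F ∖ V be the set of cliques with more than one vertex in (342). The beliefs are P_F = P_{F_0} ∪ {p_v : v ∈ V}. Let n_v = #{C ∈ F_0 : C ∋ v} be the number of cliques containing v. Then the counting numbers in (345) are c_C = 1 for C ∈ F_0 and c_{{v}} = 1 − n_v, and, using (343), the Bethe free energy is

$$F_F(P_F) = \sum_{C\in F_0}\big(U(p_C) - H(p_C)\big) + \sum_{v\in V}(1-n_v)\big(U(p_v) - H(p_v)\big) \tag{346}$$
$$= \sum_{C\in F_0}\ \sum_{Y\in Y^C}p_C(Y)E_C(Y) + \sum_{v\in V}\sum_{y\in Y}p_v(y)E_v(y) - H_F(P_F) = U(P_F) - H_F(P_F), \tag{347}$$

where U(P_F) is the mean energy (342) and

$$H_F(P_F) = \sum_{C\in F_0}H(p_C) + \sum_{v\in V}(1-n_v)H(p_v). \tag{348}$$

The Bethe approximation is exact if G is a tree, more precisely, if the cliques of F_0 can be ordered as C^(1), C^(2), … so that every C^(i), i > 1, intersects G_{i−1} = C^(1) ∪ … ∪ C^(i−1) in exactly one vertex v^(i), and G_{#F_0} = V. Then, by the Markov property,

$$\frac{p(Y_{G_i})}{p(Y_{G_{i-1}})} = p(Y_{C^{(i)}}|Y_{G_{i-1}}) = p(Y_{C^{(i)}}|y_{v^{(i)}}) = \frac{p_{C^{(i)}}(Y_{C^{(i)}})}{p_{v^{(i)}}(y_{v^{(i)}})},$$

and, since each vertex v occurs n_v − 1 times among the v^(i),

$$p(Y) = \frac{\prod_{C\in F_0}p_C(Y_C)}{\prod_{v\in V}p_v(y_v)^{\,n_v-1}}.$$

Proposition. If G is a tree in the above sense and P_F are the marginals of p, the Bethe free energy (346) coincides with the free energy (341).
Proof. Substituting the above expression for p(Y) into (340) gives exactly (348), i.e. H(p) = H_F(P_F), and the mean energies coincide by (342) (cf. (347)). Conversely, for a tree every collection P_F satisfying (343) defines, by the same formula, a distribution p with these marginals, so minimizing (346) over P_F is equivalent to minimizing (341).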
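A numerical check of the tree case, assuming a hypothetical pairwise model with random positive edge potentials on the tree with edges (0,1), (1,2), (1,3). It computes the exact entropy (340), the Bethe entropy (348) from the exact marginals, and the factorization p(Y) = ∏ p_C / ∏ p_v^(n_v − 1).

```python
import itertools
import math
import random

rng = random.Random(3)
S = 4
EDGES = [(0, 1), (1, 2), (1, 3)]            # a tree; vertex 1 lies in n_1 = 3 cliques
phi = {e: {(a, b): rng.uniform(0.2, 2.0) for a in (0, 1) for b in (0, 1)} for e in EDGES}

configs = list(itertools.product((0, 1), repeat=S))
weights = [math.prod(phi[(i, k)][(y[i], y[k])] for i, k in EDGES) for y in configs]
Z = sum(weights)
p = {y: w / Z for y, w in zip(configs, weights)}

def entropy(dist):
    return -sum(q * math.log(q) for q in dist.values() if q > 0)

def marginal(vars_):
    """Exact marginal distribution of the components listed in vars_."""
    m = {}
    for y, q in p.items():
        key = tuple(y[v] for v in vars_)
        m[key] = m.get(key, 0.0) + q
    return m

n = [sum(v in e for e in EDGES) for v in range(S)]      # n_v
H_bethe = (sum(entropy(marginal(e)) for e in EDGES)
           + sum((1 - n[v]) * entropy(marginal((v,))) for v in range(S)))   # (348)
print(entropy(p), H_bethe)
```

On a graph with cycles the two entropies generally differ, which is why the Bethe free energy is only an approximation there.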
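Minimizing the Bethe free energy (347) over the beliefs gives the problem

$$F_F(P_F) = \sum_{C\in F_0}\ \sum_{Y\in Y^C}p_C(Y)\big(E_C(Y) + \ln p_C(Y)\big) + \sum_{v\in V}\sum_{y\in Y}p_v(y)\big(E_v(y) + (1-n_v)\ln p_v(y)\big) \to \min \tag{349}$$

over the variables p_C(Y) (q^k variables for every k-vertex clique C ∈ F_0) and p_v(y) (q variables for every vertex v of G), subject to

$$p_C(Y) \ge 0,\quad C\in F_0,\ Y\in Y^C, \tag{350}$$
$$p_v(y) \ge 0,\quad v\in V,\ y\in Y, \tag{351}$$
$$\sum_{Y\in Y^C}p_C(Y) = 1,\quad C\in F_0, \tag{352}$$
$$\sum_{y\in Y}p_v(y) = 1,\quad v\in V, \tag{353}$$
$$\sum_{Y\in Y^C:\,y_v=y}p_C(Y) = p_v(y),\quad v\in C\in F_0,\ y\in Y. \tag{354}$$

The function (349) is not convex (!): the coefficients 1 − n_v of the vertex terms can be negative. In [126] a convergent algorithm for this problem is proposed (CCCP, the concave-convex procedure). It is based on the following simple observation.

Proposition 48. Let f = f^+ + f^−, where f^+ is convex and f^− is concave, and let l_x be the linear function tangent to f^− at the point x^69. If (f^+ + l_x)(x') < (f^+ + l_x)(x), then f(x') < f(x).
Proof. Since f^− is concave, f^−(x') ≤ l_x(x'), while l_x(x) = f^−(x). Hence f(x') ≤ (f^+ + l_x)(x') < (f^+ + l_x)(x) = f(x).

^69 I.e. l_x(x') = f^−(x) + (∇f^−(x), x' − x).

Thus f can be minimized iteratively: x^(i+1) is the minimum point of the convex function f^+ + l_{x^(i)}, and the values f(x^(i)) decrease monotonically.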
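A one-dimensional illustration of Proposition 48 (a toy example, not the Bethe problem): f(x) = x⁴ − 2x² with f⁺ = x⁴ and f⁻ = −2x². The tangent of f⁻ at x is l_x(t) = −2x² − 4x(t − x), and the convex problem t⁴ + l_x(t) → min has the closed-form solution t = ∛x, so each CCCP step is a cube root.

```python
def f(x):
    """f = f_plus + f_minus with f_plus(x) = x**4 (convex), f_minus(x) = -2 x**2 (concave)."""
    return x ** 4 - 2 * x ** 2

def cccp_step(x):
    """Minimize t**4 + l_x(t), where l_x(t) = -2 x**2 - 4 x (t - x) is the tangent of f_minus at x.
    Setting 4 t**3 - 4 x = 0 gives t = cube root of x."""
    return abs(x) ** (1.0 / 3.0) * (1 if x >= 0 else -1)

x = 0.1
values = [f(x)]
for _ in range(40):
    x = cccp_step(x)
    values.append(f(x))
print(x, values[:5])
```

The iterates move monotonically downhill to the local minimum x = 1 (or x = −1 from a negative start), as guaranteed by Proposition 48.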
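Apply this to (349) by splitting F_F into a convex and a concave (!) part:

$$F^+(P_F) = \sum_{C\in F_0}\ \sum_{Y\in Y^C}p_C(Y)\big(E_C(Y) + \ln p_C(Y)\big) + \sum_{v\in V}\sum_{y\in Y}p_v(y)\big(E_v(y) + \ln p_v(y)\big),$$
$$F^-(P_F) = -\sum_{v\in V}n_v\sum_{y\in Y}p_v(y)\ln p_v(y).$$

Then F^+ + F^− = F_F, F^+ is convex and F^− is concave on the set defined by (350), (351). The iteration step is

$$P_F^{(i+1)} = \arg\min_{P_F}\Big(F^+(P_F) + l_{P_F^{(i)}}(P_F)\Big),$$

where l_{P_F^(i)} is the linearization of F^− at P_F^(i). Since ∂F^−/∂p_v(y) = −n_v(1 + ln p_v(y)), up to an additive constant this is the problem

$$F^+(P_F) - \sum_{v\in V}n_v\sum_{y\in Y}p_v(y)\big(1 + \ln p^{(i)}_v(y)\big) \to \min_{P_F} \tag{355}$$

subject to (352), (353) and (354).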
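The Lagrange function of (355), with multipliers λ_C, λ_v, λ_{C,v,y} for the constraints (352), (353), (354), is

$$L(P_F,\lambda) = \sum_{C\in F_0}\ \sum_{Y\in Y^C}p_C(Y)\big(E_C(Y) + \ln p_C(Y)\big) + \sum_{v\in V}\sum_{y\in Y}p_v(y)\Big(E_v(y) + \ln p_v(y) - n_v\big(1 + \ln p^{(i)}_v(y)\big)\Big)$$
$$+ \sum_{C\in F_0}\lambda_C\Big(\sum_{Y\in Y^C}p_C(Y) - 1\Big) + \sum_{v\in V}\lambda_v\Big(\sum_{y\in Y}p_v(y) - 1\Big) + \sum_{C\in F_0}\sum_{v\in C}\sum_{y\in Y}\lambda_{C,v,y}\Big(\sum_{Y\in Y^C:\,y_v=y}p_C(Y) - p_v(y)\Big).$$

Setting the derivatives with respect to P_F to zero,

$$0 = \frac{\partial L}{\partial p_C(Y)} = E_C(Y) + \ln p_C(Y) + 1 + \lambda_C + \sum_{v\in C}\lambda_{C,v,y_v},$$
$$0 = \frac{\partial L}{\partial p_v(y)} = E_v(y) + \ln p_v(y) + 1 - n_v\big(1 + \ln p^{(i)}_v(y)\big) + \lambda_v - \sum_{C\ni v}\lambda_{C,v,y},$$

we obtain

$$p_C(Y) = e^{-\left(E_C(Y) + 1 + \lambda_C + \sum_{v\in C}\lambda_{C,v,y_v}\right)}, \tag{356}$$
$$p_v(y) = e^{-\left(E_v(y) + 1 - n_v\left(1 + \ln p^{(i)}_v(y)\right) + \lambda_v - \sum_{C\ni v}\lambda_{C,v,y}\right)}. \tag{357}$$

In particular, all p_C(Y) and p_v(y) are positive, so (350), (351) hold automatically. Substituting (356), (357) into L gives the dual function

$$L^*(\lambda) = -\sum_{C\in F_0}\lambda_C - \sum_{C\in F_0}\ \sum_{Y\in Y^C}e^{-\left(E_C(Y) + 1 + \lambda_C + \sum_{v\in C}\lambda_{C,v,y_v}\right)} - \sum_{v\in V}\lambda_v - \sum_{v\in V}\sum_{y\in Y}e^{-\left(E_v(y) + 1 - n_v\left(1 + \ln p^{(i)}_v(y)\right) + \lambda_v - \sum_{C\ni v}\lambda_{C,v,y}\right)},$$

which is concave and must be maximized. The conditions ∂L*/∂λ = 0 are exactly the constraints, with p_C and p_v given by (356), (357):

$$0 = \frac{\partial L^*}{\partial\lambda_C} = -1 + \sum_{Y\in Y^C}p_C(Y),\qquad 0 = \frac{\partial L^*}{\partial\lambda_v} = -1 + \sum_{y\in Y}p_v(y),\qquad 0 = \frac{\partial L^*}{\partial\lambda_{C,v,y}} = \sum_{Y\in Y^C:\,y_v=y}p_C(Y) - p_v(y).$$

Each of these equations can be solved explicitly for its own multiplier, with the other multipliers fixed:

$$\lambda_C = \ln\sum_{Y\in Y^C}e^{-\left(E_C(Y) + 1 + \sum_{v\in C}\lambda_{C,v,y_v}\right)},$$
$$\lambda_v = \ln\sum_{y\in Y}e^{-\left(E_v(y) + 1 - n_v\left(1 + \ln p^{(i)}_v(y)\right) - \sum_{C\ni v}\lambda_{C,v,y}\right)},$$
$$\lambda_{C,v,y} = \frac12\ln\frac{\sum_{Y\in Y^C:\,y_v=y}e^{-\left(E_C(Y) + \lambda_C + \sum_{v'\in C,\,v'\ne v}\lambda_{C,v',y_{v'}}\right)}}{e^{-\left(E_v(y) - n_v\left(1 + \ln p^{(i)}_v(y)\right) + \lambda_v - \sum_{C'\ni v,\,C'\ne C}\lambda_{C',v,y}\right)}}.$$

These coordinate updates (each maximizes L* over one multiplier) are repeated until convergence; then (356), (357) give the solution P_F^(i+1) of (355). Together with the outer CCCP iteration this yields a convergent double-loop algorithm for minimizing the Bethe free energy [126]. Belief propagation is a faster single-loop algorithm that may fail to converge; its stable fixed points are local minima of the Bethe free energy [53].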
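Application to learning. The beliefs p_C obtained by minimizing the Bethe free energy approximate the clique marginals needed in the gradient (338) of the CRF likelihood, and −min F_F approximates ln Z(w, X). These deterministic approximations are an alternative to the MCMC estimates (336).

D.3.5
In some models not all the variables are observed in the training data. For example, the MRF (323)–(324) contains the line variables y_{i,j}, which are hidden (unobserved). Models with hidden variables can be trained by the EM algorithm; training of CRFs with hidden variables is considered in [107].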
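D.4
Markov random fields appeared in the 1960s (R.L. Dobrushin [34]), although the Ising model dates back to the 1920s. The equivalence of MRFs and Gibbs fields (Theorem 13) was proved by J.M. Hammersley and P. Clifford [47] in 1971 in an unpublished manuscript; J. Moussouris [87] showed in 1974 that the positivity condition cannot be dropped.

MRFs became widely used in image processing after [44], where the Gibbs sampler and simulated annealing were applied to image restoration; see also [78] and [71]. Belief propagation (see, for example, [90]), free energy approximations and generalized belief propagation ([124], [125]), and convergent algorithms for minimizing the Bethe and Kikuchi free energies ([126], [53]) developed from the methods of statistical physics. Applications of MRFs and CRFs to image labeling are described in [107], [44], [70] and [55].

A modern introduction to this area is given in [15]: graphical models ([15, 8]), sampling methods ([15, 11]), and approximate inference ([15, 10]).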
[1] M.A. Aizerman, E.M. Braverman, L.I. Rozonoer. Theoretical foundations of the potential function method in pattern recognition learning (in Russian). Avtomatika i Telemekhanika, 25(6):917–936, 1964.
[2] M.A. Aizerman, E.M. Braverman, L.I. Rozonoer. The Method of Potential Functions in the Theory of Machine Learning (in Russian). Nauka, Moscow, 1970.
[3] M.A. Aizerman, E.M. Braverman, and L.I. Rozonoer. Theoretical foundations of the potential function method in pattern recognition learning. Automation and Remote Control, 25:821–837, 1964.
[4] N. Aronszajn. Theory of reproducing kernels (http://www.recognition.mccme.ru/pub/papers/kernels/ReproducingKernels.pdf). Trans. Am. Math. Soc., 68(3):337–404, 1950.
[5] David Arthur and Sergei Vassilvitskii. How slow is the k-means method? (http://www.recognition.mccme.ru/pub/papers/clustering/kMeans-socg.pdf). In Symposium on Computational Geometry, pages 144–153, 2006.
[6] N.E. Ayat, M. Cheriet, L. Remaki, and C.Y. Suen. KMOD - A New Support Vector Machine Kernel with Moderate Decreasing for Pattern Recognition. Application to Digit Image Recognition (http://www.recognition.mccme.ru/pub/papers/SVM/ayat01kmod.pdf). In Sixth International Conference on Document Analysis and Recognition, pages 1215–1219, 2001.
[7] Leonard E. Baum and J. A. Eagon. An inequality with applications to statistical estimation for probabilistic functions of Markov processes and to a model for ecology (http://www.recognition.mccme.ru/pub/papers/EM/baum67inequality.pdf). Bull. Amer. Math. Soc., 73(3):360–363, 1967.
[8] Leonard E. Baum, Ted Petrie, George Soules, and Norman Weiss. A Maximization Technique Occurring in the Statistical Analysis of Probabilistic Functions of Markov Chains. The Annals of Mathematical Statistics, 41(1):164–171, 1970.
[9] Yoshua Bengio and Paolo Frasconi. An Input Output HMM Architecture (http://www.recognition.mccme.ru/pub/papers/HMM/bengio95input.ps.gz). In Advances in Neural Information Processing Systems, pages 427–434. MIT Press, 1995.
[10] Yoshua Bengio. Markovian Models for Sequential Data (http://www.recognition.mccme.ru/pub/papers/HMM/Bengio99Markovian.ps.gz). Neural Computing Surveys, 2:129–162, 1999.
[11] K.P. Bennett, A. Demiriz, and J. Shawe-Taylor. A Column Generation Algorithm for Boosting (http://www.recognition.mccme.ru/pub/papers/boosting/bennett00column.pdf). In Pat Langley, editor, Proceedings of the Seventeenth International Conference on Machine Learning, pages 65–72. Morgan Kaufmann, 2000.
[12] C. Berg, J.P.R. Christensen, and P. Ressel. Harmonic Analysis on Semigroups. Springer-Verlag, New York, 1984.
[13] H. A. Bethe. Statistical theory of superlattices. Proc. Roy. Soc. London ser. A, 150(871):552–575, 1935.
[14] Christopher M. Bishop. Neural Networks for Pattern Recognition. Oxford University Press, 1995.
[15] Christopher M. Bishop. Pattern Recognition and Machine Learning. Springer, 2006.
[16] C.M. Bishop and M.E. Tipping. Variational Relevance Vector Machine (http://www.recognition.mccme.ru/pub/papers/Bayes/bishop00variational.ps.gz). In Proceedings of the 16th Conference on Uncertainty in Artificial Intelligence, pages 46–53. Morgan Kaufmann, 2000.
[17] P.E. Böhmer. Theorie der unabhängigen Wahrscheinlichkeiten. In Rapports, Mémoires et Procès-verbaux de Septième Congrès international d'Actuaires, volume 2, pages 327–343, Amsterdam, 1912.
[18] B.E. Boser, I.M. Guyon, and V.N. Vapnik. A training algorithm for optimal margin classifiers (http://www.recognition.mccme.ru/pub/papers/SVM/boser92training.ps.gz.ps). In Proceedings of the 5th Annual ACM Workshop on Computational Learning Theory, pages 144–152. ACM Press, 1992.
[19] Léon Bottou. Stochastic Learning (http://www.recognition.mccme.ru/pub/papers/stochastic/mlss-2003.pdf). In Olivier Bousquet and Ulrike von Luxburg, editors, Advanced Lectures on Machine Learning, Lecture Notes in Artificial Intelligence, LNAI 3176, pages 146–168. Springer Verlag, Berlin, 2004.
[20] Leo Breiman, Jerome Friedman, Charles J. Stone, and R. A. Olshen. Classification and Regression Trees. Chapman & Hall/CRC, January 1984.
[21] Leo Breiman. Random Forests (http://www.recognition.mccme.ru/pub/papers/boosting/breiman01random.pdf). Machine Learning, 45(1):5–32, 2001.
[22] Christopher J.C. Burges. A Tutorial on Support Vector Machines for Pattern Recognition (http://www.recognition.mccme.ru/pub/papers/SVM/burges98tutorial.pdf). Data Mining and Knowledge Discovery, 2(2):121–167, 1998.
[23] F.P. Cantelli. Sulla determinazione empirica della leggi di probabilità. Giorn. Ist. Ital. Attuari, 4:421–424, 1933.
[24] Michael Collins, Robert E. Schapire, and Yoram Singer. Logistic Regression, AdaBoost and Bregman Distances (http://www.recognition.mccme.ru/pub/papers/boosting/collins00logistic.pdf). Machine Learning, 48(1-3):253–285, 2002.
[25] Corinna Cortes, Patrick Haffner, and Mehryar Mohri. Positive Definite Rational Kernels (http://www.recognition.mccme.ru/pub/papers/kernels/colt.pdf). In Proceedings of The 16th Annual Conference on Computational Learning Theory (COLT 2003), pages 41–56. Springer, 2003.
[26] C. Cortes and V. Vapnik. Support Vector Networks (http://www.recognition.mccme.ru/pub/papers/SVM/cortes95support.ps.gz.ps). Machine Learning, 20(3):273–297, 1995.
[27] T.M. Cover and P.M. Hart. Nearest Neighbor pattern classification (http://www.recognition.mccme.ru/pub/papers/NN/cover67nearest.pdf). IEEE Transactions on Information Theory, 13:21–27, 1967.
[28] D.R. Cox. Some procedures associated with the logistic qualitative response curve. In F.N. David, editor, Research Papers in Statistics: Festschrift for J. Neyman, pages 55–71, New York, 1966. Wiley.
[29] D. R. Cox and D. Oakes. Analysis of Survival Data. Chapman and Hall, London, 1984.
[30] D. R. Cox. Regression Models and Life-Tables (http://www.recognition.mccme.ru/pub/papers/survival/cox72regression.pdf). Journal of the Royal Statistical Society. Series B (Methodological), 34(2):187–220, 1972.
[31] N. Cristianini and J. Shawe-Taylor. An Introduction To Support Vector Machines (and other kernel-based learning methods) (http://www.support-vector.net/references.html). Cambridge University Press, 2000.
[32] Ayhan Demiriz, Kristin P. Bennett, and J. Shawe-Taylor. Linear programming boosting via column generation (http://www.recognition.mccme.ru/pub/papers/boosting/demiriz02linear.pdf). Machine Learning, 46(1):225–254, 2002.
[33] A. P. Dempster, N. M. Laird, and D. B. Rubin. Maximum likelihood from incomplete data via the EM algorithm. Journal of the Royal Statistical Society, series B, 39(1):1–38, 1977.
[34] R.L. Dobrushin. The description of a random field by means of conditional probabilities and conditions of its regularity (in Russian) (http://www.recognition.mccme.ru/pub/papers/CRF/Dobrushin68description.pdf). Teoriya Veroyatnostei i ee Primeneniya, 13(2):201–229, 1968.
[35] B. Efron, I. Johnstone, T. Hastie, and R. Tibshirani. Least angle regression (http://www.recognition.mccme.ru/pub/papers/L1/LeastAngle2002.pdf). Annals of Statistics, 32(2):407–451, 2004.
[36] R.A. Fisher. The use of multiple measurements in taxonomic problems (http://www.recognition.mccme.ru/pub/papers/SLT/fisherLDA.pdf). Ann. Eugen., 7:179–188, 1936.
[37] Yoav Freund and Robert E. Schapire. Experiments with a New Boosting Algorithm (http://www.recognition.mccme.ru/pub/papers/boosting/freund96experiments.pdf). In Proceedings of the Thirteenth International Conference on Machine Learning, pages 148–156, 1996.
[38] Yoav Freund and Robert E. Schapire. A decision-theoretic generalization of on-line learning and an application to boosting (http://www.recognition.mccme.ru/pub/papers/boosting/adaboost.pdf). J. Comput. Syst. Sci., 55(1):119–139, 1997.
[39] Yoav Freund. Boosting a weak learning algorithm by majority (http://www.recognition.mccme.ru/pub/papers/boosting/freund90boosting.pdf). In COLT '90: Proceedings of the third annual workshop on Computational learning theory, pages 202–216, 1990.
[44] S. Geman and D. Geman. Stochastic relaxation, Gibbs distributions, and the Bayesian restoration of images (http://www.recognition.mccme.ru/pub/papers/CRF/GemanPAMI84.pdf). IEEE Trans. Pattern Anal. Mach. Intell., 6:721–741, 1984.
[45] V.I. Glivenko. Sulla determinazione empirica di probabilità. Giorn. Ist. Ital. Attuari, 4:92–99, 1933.
[46] P.S. Gopalakrishnan, D. Kanevsky, D. Nahamoo, and A. Nadas. An inequality for rational functions with applications to some statistical estimation problems. IEEE Trans. Information Theory, 37(1):107–113, 1991.
[47] J. M. Hammersley and P. Clifford. Markov fields on finite graphs and lattices (http://www.recognition.mccme.ru/pub/papers/CRF/hammersley71markov.pdf), 1971.
[48] Trevor Hastie, Saharon Rosset, Robert Tibshirani, and Ji Zhu. The Entire Regularization Path for the Support Vector Machine (http://www.recognition.mccme.ru/pub/papers/SVM/hastie04entire.pdf). Journal of Machine Learning Research, 5:1391–1415, 2004.
[49] Trevor Hastie and Patrice Y. Simard. Metrics and Models for Handwritten Character Recognition (http://www.recognition.mccme.ru/pub/papers/tangent/hastie97metrics.pdf). Statistical Science, 13:54–65, 1998.
[50] T. Hastie, R. Tibshirani, and J. Friedman. The elements of statistical learning (http://www-stat-class.stanford.edu/~tibs/ElemStatLearn/). Springer, 2001.
[51] D. Haussler. Convolution Kernels on Discrete Structures (http://www.recognition.mccme.ru/pub/papers/kernels/haussler99convolution.ps.gz). Technical Report UCSC-CRL-99-10, UC Santa Cruz, 1999.
[52] R. Herbrich. Learning Kernel Classifiers: Theory and Algorithms (http://www.learning-kernel-classifiers.org/). The MIT Press, 2002.
[53] Tom Heskes. Stable Fixed Points of Loopy Belief Propagation Are Local Minima of the Bethe Free Energy (http://www.recognition.mccme.ru/pub/papers/CRF/heskes02stable.pdf). In NIPS 15, pages 343–350, 2002.
[54] Magnus R. Hestenes and Eduard Stiefel. Methods of Conjugate Gradients for Solving Linear Systems (http://www.recognition.mccme.ru/pub/papers/gradient/hestenes-stiefel-52.pdf). J. Res. Natl. Bur. Stand., 49:409–436, 1952.
[55] Xuming He, Richard S. Zemel, and Miguel A. Carreira-Perpiñán. Multiscale Conditional Random Fields for Image Labeling (http://www.recognition.mccme.ru/pub/papers/CRF/he04multiscale.pdf). In CVPR, pages 695–702, 2004.
[56] A.E. Hoerl and R.W. Kennard. Ridge regression: Biased estimation for nonorthogonal problems. Technometrics, 12:55–67, 1970.
[57] John H. Holland. Adaptation in natural and artificial systems. Univ. of Michigan Press (Reprinted by MIT Press, 1992), 1975.
[58] R. M. Hristev. The ANN Book (http://www.recognition.mccme.ru/pub/papers/ANN/HristevTheANNbook.djvu). GNU Public Licence, 1998.
[59] C.-W. Hsu and C.-J. Lin. A comparison of methods for multi-class support vector machines (http://www.recognition.mccme.ru/pub/papers/SVM/hsu01comparison.ps.gz). IEEE Transactions on Neural Networks, 13(2):415–425, March 2002.
[60] P. Huber. Robust estimation of a location parameter. Annals of Mathematical Statistics, 35(1):73–101, 1964.
[61] Tommi Jaakkola and David Haussler. Exploiting Generative Models in Discriminative Classifiers (http://www.recognition.mccme.ru/pub/papers/kernels/jaakkola98exploiting.pdf). In Advances in Neural Information Processing Systems 11, pages 487–493. MIT Press, 1998.
[62] B. H. Juang and L. R. Rabiner. The Segmental K-Means Algorithm for Estimating Parameters of Hidden Markov Models (http://www.recognition.mccme.ru/pub/papers/HMM/juang90segmental.pdf). IEEE Trans. on Acoustics, Speech, and Signal Processing, 38(9):1639–1641, 1990.
[63] E.L. Kaplan and P. Meier. Nonparametric estimation from incomplete observations (http://www.recognition.mccme.ru/pub/papers/survival/kaplan58nonparametric.pdf). Journal of the American Statistical Association, 53:457–481, 1958.
[64] S. S. Keerthi and E. G. Gilbert. Convergence of a Generalized SMO Algorithm for SVM Classifier Design (http://www.recognition.mccme.ru/pub/papers/SVM/keerthi00convergence.pdf). Machine Learning, 46(1-3):351–360, 2002.
[65] R. Kikuchi. A theory of cooperative phenomena. Phys. Rev., 81(6):988–1003, 1951.
[66] S. Kirkpatrick, C. D. Gelatt, and M. P. Vecchi. Optimization by simulated annealing. Science, 220:671–680, 1983.
[67] J.P. Klein and M.L. Moeschberger. Survival Analysis. Techniques for Censored and Truncated Data. Springer, 1997.
[83] J. Mercer. Functions of positive and negative type and their connection with the theory of integral equations. Philos. Trans. Roy. Soc. London, A, 209:415–446, 1909.
[84] ... : . , , 2011.
[85] ... : . , , 2014.
[86] Nicholas Metropolis, Arianna W. Rosenbluth, Marshall N. Rosenbluth, Augusta H. Teller, and Edward Teller. Equation of State Calculations by Fast Computing Machines. The Journal of Chemical Physics, 21(6):1087–1092, 1953.
[87] J. Moussouris. Gibbs and Markov systems with constraints. Journal of Statistical Physics, 10:11–33, 1974.
[88] Radford Neal and Geoffrey E. Hinton. A view of the EM algorithm that justifies incremental, sparse, and other variants (http://www.recognition.mccme.ru/pub/papers/EM/neal98view.pdf). In Learning in Graphical Models, pages 355–368. Kluwer Academic Publishers, 1998.
[89] A. B. J. Novikoff. On convergence proofs on perceptrons. Proceedings of the Symposium on the Mathematical Theory of Automata, XII:615–622, 1962.
[90] Judea Pearl. Probabilistic Reasoning in Intelligent Systems: Networks of Plausible Inference. Morgan Kaufmann, San Francisco, CA, USA, 1988.
[91] J. Platt. Fast Training of Support Vector Machines using Sequential Minimal Optimization (http://www.recognition.mccme.ru/pub/papers/SVM/smoTR.pdf). In B. Schölkopf, C. Burges, and A. Smola, editors, Advances in Kernel Methods - Support Vector Learning. MIT Press, 1998.
[92] J. Ross Quinlan. C4.5: programs for machine learning. Morgan Kaufmann Publishers Inc., San Francisco, CA, USA, 1993.
[93] L. R. Rabiner, J. G. Wilpon, and B. H. Juang. A Segmental K-Means Training Procedure for Connected Word Recognition. AT&T Tech. J., 65(3):21–31, 1986.
[94] L. R. Rabiner. A tutorial on hidden Markov models and selected applications in speech recognition (http://www.recognition.mccme.ru/pub/papers/HMM/rabiner.pdf). Proceedings of the IEEE, 77(2):257–286, 1989.
[95] W. Reichl and G. Ruske. Discriminative Training for Continuous Speech Recognition (http://www.recognition.mccme.ru/pub/papers/HMM/reichl96discriminative.ps.gz). In Proc. 1995 Europ. Conf. on Speech Communication and Technology, pages 537–540, 1995.
[96] H. Robbins and S. Monro. A Stochastic Approximation Method (http://www.recognition.mccme.ru/pub/papers/stochastic/robbinsMonro.pdf). Annals of Mathematical Statistics, 22:400–407, 1951.
[125] Jonathan S. Yedidia, William T. Freeman, and Yair Weiss. Constructing Free Energy Approximations and Generalized Belief Propagation Algorithms (http://www.recognition.mccme.ru/pub/papers/CRF/yedidia05constructing.pdf). IEEE Transactions on Information Theory, 51(7):2282–2312, 2005.
[126] A. L. Yuille. CCCP algorithms to minimize the Bethe and Kikuchi free energies: Convergent alternatives to belief propagation (http://www.recognition.mccme.ru/pub/papers/CRF/yuille02CCCP.pdf). Neural Computation, 14(7):1691–1722, 2002.
[127] Hui Zou and Trevor Hastie. Regularization and variable selection via the Elastic Net (http://www.recognition.mccme.ru/pub/papers/L1/elasticnet.pdf). Journal of the Royal Statistical Society B, 67:301–320, 2005.
254
L2 , 93
L , 155
N 0 (v), 215
N (v), 215
d
, 153
N(,)
d
, 153
N(,A)
N(,A) , 153
Nj , 176
P, 10
PY , 30
R(Y ), 213
Rj , 176
S(t), 168
S > (t), 169
ST , 137
T , 9, 11
T 0 , 12
UX , 206
Vin , 66
Vout , 66
W, 15, 19
X, 30, 38, 40
X0 , 38
X , 6, 10
X 0 , 39
X , 211
k k, 42
0-1-, 11
01 , 42
, 207
l0 (i, X), 190
i , 125
l (i, X), 188
l0 (i, X), 190
l (i, X), 189
l (i, X), 189
ab , 17, 26
l (i, X), 191
l (X), 190
l (i, X), 190
, 39
i , 170
-SVC, 109
-SVR, 116
j , 172, 173
l (i, j, X), 189
j
mk
, 24
x0 , 102
0 , 102
k , 24
j , 172, 173
, 52
l (i, X), 191
S , 14
A, 186
B, 186
C, 37
C(G), 219
Xx , 149
+
Xx , 149
X [m,n] , 180
Y, 30
Y, 7, 10
Y A , 220
YA , 213
Y [m,n] , 180
[Y ], 233
, 186
(w), 233
h(t), 168
k-NN (k Nearest Neighbors), 9
k-means, 36
lp , 42
nv , 239
t(j) , 172, 173
u, 66
u, 152
v, 152
w,
40
x
, 40
x[], 160
Dx , 149
+
Dx , 149
E, 11
E, 25
E;u , 153
F , 15, 19
F (t), 168
F, 10
F[], 98
H(), 155
In , 42
In0 , 42
K, 102
K+ , 102
K0 , 102
K= , 102
L(w|X, Y), 233
255
xij , 180
y, 40
F, 219
F0 , 239
W, 186
z, 152
+
z, 152
PF , 236
, 66
-, 196
, 198
, 79
, 53
, 191
, 87, 155
, 165
-, 188
AdaBoost, 125
EM, 87, 155
GEM, 165
LogitBoost, 130
LPBoost, 140
, 21
, 22
, 20
, 18
, 18
, 20
, 60
, 60
, 64, 124
, 170
, 170
, 183
, 146
, 146
, 146
, 12
, 27, 46, 48
, 47
, 49
, 49
, 30, 237
, 72
, 235
, 227, 235
, 227
, 155, 235
, 239
, 33
, 79
, 39
, 66
(, , ), 11
, 66
, 168
, 168
, 87
, 83
, 55, 83
, 42
, 78
,
208
, 8
, 39
, 18
, 58
, 6, 35
, 6, 34
, 35
, 219
-, 16
-, 16
, 49
, 43
, 49
, 51
, 215
, 22
, 22
, 183
, 215
, 186
, 186
k , 9
k , 36
, 25, 29
, 22
-
, 232
-, 51
, 9
, 42
256
, 78
, 51
, 22
, 23
, 32
, 72
, 58, 104
, 146
, 17
, 77
, 146
, 20
, 19
, 19
, 20
, 10, 11
, 14
, 11
, 20
, 19
, 19
, 20
, 175
, 183
, 183
, 24
, 175
, 185
-, 187
, 185
LR, 187
, 9, 11
, 16
, 12, 38
, 16
, 66
, 68
, 66, 67
, 72
, 66
, 9
, 5
, 20
, 34
, 193
257
, 34
, 206
, 57
, 11
0-1-, 11
-, 45
, 45
, 134
, 11, 42
, 45
, 11
, 15
, 45
, 12
, 91
, 13
, 53
, 68
, 14
, 227
, 53
, 38
, 215
, 66
, 11
, 23
, 6
, 87
, 7
, 7
, 10
, 37
, 7, 10
, 6, 10
, 10
, 10
, 39, 91, 206
, 39
, 5
, 5
, 5
, 27
, 38
, 219
, 169
, 207
, 207
-, 14
, 43
, 66
, 8
, 175
, 18
, 42
, 39
, 51
, 51
, 46
, 15
, 11, 168
, 11
, 11
, 38
, 17, 26
, 66
, 67
, 215
, 225
, 185
, 185
, 55
,
55
, 14
, 15
, 227
, 239
, 11
0-1-, 11
, 11
, 11
, 12
, 227
-, 172
-, 220
, 70
, 93
, 54
-, 83
, 170
(), 64
, 225
, 107
, 37
, 125
, 230
, 34, 35
, 62, 91
, 208
, 62, 93
, 62
, 97
- , 100
CPD, 100
RBF, 97
AdaBoost, 124, 125
ANN (Artificial Neural Networks), 66
back-propagation, 71
belief, 30, 237
boosting, 64
censored data, 170
cluster, 6
cluster analysis, 6
conditional random field, 225
conditionally positive definite kernel, 100
confidence, 8
CPD (Conditionally Positive Definite), 100
CPD-kernel, 100
CRF (Conditional Random Field), 225
cross entropy, 33
curse of dimensionality, 10
derived feature, 7
dimension reduction, 38
discriminative methods, 20
edit distance, 207
elastic net, 44
emission matrix, 186
error function, 11
evidence, 145
expectation maximization, 86, 155
generalized, 165
feature, 6
Fisher score, 206
fitting, 5
forward-backward algorithm, 188
generative methods, 20
Gibbs sample, 230
Gibbs sampler, 230
hazard, 168
hidden Markov model, 185
HMM (Hidden Markov Model), 185
imputation, 148
impute, 147
Input Output HMM, 202
IOHMM (Input Output Hidden Markov Model), 202
kernel, 62
kernel trick, 62
KMOD (Kernel with MOderate Decreasing), 99
LARS (Least Angle Regression), 43
LASSO (Least Absolute Shrinkage and Selection Operator), 43
lasso regression, 43
latent variables, 152
learner, 5
learning, 5
learning vector quantization, 38
Leave-One-Out, 17
left-right model, 187
likelihood, 23
linear programming boosting, 138
log-odds, 52, 129
logistic function, 52
logistic regression, 50, 51
logit, 52
LogitBoost, 130
LOO (Leave-One-Out), 17
loss function, 11
LPBoost, 138
LVQ (Learning Vector Quantization), 38
machine learning, 5
MAP (Maximum A Posteriori Probability), 22
MAR (Missing At Random), 146
margin classifiers, 56
marginal distribution, 231
Markov chain Monte Carlo, 232
Markov random field, 215
maximum a posteriori probability, 22
maximum likelihood, 23
MCAR (Missing Completely At Random), 146
MCMC (Markov Chain Monte Carlo), 232
nearest neighbor, 9
NN (Nearest Neighbor), 9
non-informative prior, 22
outlier, 34
overfitting, 14
partial likelihood, 176
pattern recognition, 5
positive definite kernel, 62
product limit, 172
proportional hazard model, 175
radial basis functions, 60
random forests, 124
RBF (Radial Basis Functions), 60
recognition, 5
recognition tree, 12
recognizer, 5
relevance vector machine, 146
response, 7
ridge regression, 42
RKHS (Reproducing Kernel Hilbert Space), 94
RVM (Relevance Vector Machine), 146
segmental k-means, 199
self-organizing maps, 38
sigmoid function, 52
simulated annealing, 78
SMO (Sequential Minimal Optimization), 108
soft margin classifier, 58
SOM (Self-Organizing Maps), 38
statistical learning, 5
structural risk minimization, 14
stump, 13
subset selection, 45
supervised learning, 34
support vector machine, 58
support vector regression, 46
survival function, 168
SVC (Support Vector Classification), 104
SVM (Support Vector Machine), 58, 104
SVR (Support Vector Regression), 46, 104
test error, 12
training, 5
training error, 11
transition matrix, 186
truncated data, 170
unsupervised learning, 34
validation set, 16
VC-dimension (Vapnik-Chervonenkis
dimension), 14
vector quantization, 37