
()

. .
9 2006 .

1
1.1 . .
1.2 . . . . . . . . . . . . . . . . . . . . .
1.2.1 . . . . . . . . . . . . .
1.2.2 . . . . . . . . . . . . .
1.3 . . . . . . . . . . . . . . . . . .
1.3.1 . . . . . . .
1.3.2 .
1.4 ( ) . . .
1.4.1 . . . . . . . . . . .
1.4.2 .
1.4.3 . . .
1.5 . . . . . . . . .
1.5.1 . . . . . . .
1.5.2
1.6 . . . . . . . . . . . . . . .
1.7 . . . . . . . . . . . . . . . . . . . . . . . . . .


, . , ,
.
, ,
.
.
, ?
, , ? ?
? .
. hX, Y, y , X i,
X ;
Y ;
y : X Y ;
X = (x1 , . . . , x ) ;
Y = (y1 , . . . , y ) , yi = y (xi ).
a : X Y , y X.
,
. . [6].
R, .
, a(x) = C(b(x)), b : X R , C : R Y . : , . , ,
- .
. 1.1. , bt : X R, t = 1, . . . , T , F : RT R
C : R Y a : X Y

$$a(x) = C\bigl(F(b_1(x), \dots, b_T(x))\bigr), \quad x \in X. \tag{1.1}$$

b1 , . . . , bT .

R , .
F : Y T Y . ,
Y ,
, F [6]. ,

. F
, , .
1.1. ,
$R = \mathbb{R}$. (real-valued classifiers):
$Y = \{0, 1\}$, $C(b) = [b > 0]$, $a(x) = [b(x) > 0]$;
$Y = \{-1, +1\}$, $C(b) = \operatorname{sign}(b)$, $a(x) = \operatorname{sign} b(x)$.

1.2. M , $Y = \{1, \dots, M\}$, $R = \mathbb{R}^M$.
x , $b(x) = \bigl(b_1(x), \dots, b_M(x)\bigr)$.
$$C(b) = C(b_1, \dots, b_M) = \arg\max_{y \in Y} b_y.$$


. .
1.3. Y ,
Y = R, . (1.1) , R = Y C(b) b.
1.4.
:
$$b(x) = F\bigl(b_1(x), \dots, b_T(x)\bigr) = \frac{1}{T} \sum_{t=1}^{T} b_t(x), \quad x \in X. \tag{1.2}$$

X , bt
. , b(x)
O(T 1/2 ) T . , , , . , ,
T .
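The three-stage structure of (1.1) with the averaging correction (1.2) can be sketched in a few lines. This is only an illustration: the helper names `compose`, `mean` and `sign`, the three base algorithms, and the convention that `sign(0)` returns -1 are assumptions made here, not definitions from the text.

```python
def compose(base, correct, decide):
    """Build a(x) = decide(correct([b_1(x), ..., b_T(x)])), as in (1.1)."""
    def a(x):
        return decide(correct([b(x) for b in base]))
    return a

def mean(values):
    """Simple averaging F from (1.2)."""
    return sum(values) / len(values)

def sign(b):
    """Decision rule C for Y = {-1, +1}; the value at b = 0 is a convention."""
    return 1 if b > 0 else -1

# Three hypothetical real-valued base algorithms on the real line.
base = [lambda x: x - 1.0, lambda x: 0.5 * x, lambda x: x + 0.4]
a = compose(base, mean, sign)
```

Swapping `mean` for another correcting operation, or `sign` for `lambda b: b` in a regression setting, changes the composition without touching the base algorithms.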

1.1

. a : X Y X Y W = (w1 , . . . , w ):
$$Q(a; X^\ell, Y^\ell, W^\ell) = \sum_{i=1}^{\ell} w_i L\bigl(a(x_i), y_i\bigr), \tag{1.3}$$

L : Y Y R+ .
Q . ,
$L(y, y') = [y \neq y']$ ;
$L(y, y') = (y - y')^2$ .
C .
$\tilde L(b, y) = L(C(b), y)$
, :
$$Q(b; X^\ell, Y^\ell, W^\ell) = \sum_{i=1}^{\ell} w_i \tilde L\bigl(b(x_i), y_i\bigr). \tag{1.4}$$

,
, Q(b) B:
$$\mu(X^\ell, Y^\ell, W^\ell) = \arg\min_{b \in B} Q(b; X^\ell, Y^\ell, W^\ell).$$

.
b1 . ,
, .
b1 b2 , F . t- bt F b1 , . . . , bt1 :
$$b_1 = \arg\min_{b} Q(b; X^\ell, Y^\ell); \tag{1.5}$$
$$b_2 = \arg\min_{b, F} Q\bigl(F(b_1, b); X^\ell, Y^\ell\bigr);$$
$$\dots$$
$$b_t = \arg\min_{b, F} Q\bigl(F(b_1, \dots, b_{t-1}, b); X^\ell, Y^\ell\bigr). \tag{1.6}$$

(1.6) , (1.5).
(X , Y , W ) W / Y .
1.1, , . .
, , , , ,
, yi a(xi )
yi . (1.6) (1.5) .
, , Q.

1.1.

:
X , Y ;
;
T ;
:
F (b1 , . . . , bT );
1: i = 1, . . . ,
2:
wi := 1;
3: t = 1, . . . , T ,
4:
bt := (X , Y , W );
5:
W / Y , bt ;

(1.51.6) . bt , t- ,
. ,
:
$$b_k = \arg\min_{b, F} Q\bigl(F(b_1, \dots, b_{k-1}, b, b_{k+1}, \dots, b_t); X^\ell, Y^\ell\bigr), \quad 1 \le k < t.$$

(1.6).
1.4.2.
,
; :
T .
:
Q(F (b1 , . . . , bt ); X , Y ) 6 .
X k d : t t > d, d ,
t = arg min Q(F (b1 , . . . , bs ); X k , Y k ).
s=1,...,t

.
, t - .
, (1.6) (1.5).

1.2

,
. .
1.2.1

Y = {1, 1}
R = R. ,
1 1.
$$C(b) = \operatorname{sign}(b) = \begin{cases} 1, & b > 0; \\ -1, & b < 0; \end{cases}$$
. 1.2. (simple voting) (1.2) sign:
$$a(x) = \operatorname{sign}\Bigl(\sum_{t=1}^{T} b_t(x)\Bigr).$$

a(x) x , . a(x) , .
TODO:
ToDo1
. ,
- .

. a , :
$$Q(a) = \sum_{i=1}^{\ell} \bigl[y_i a(x_i) < 0\bigr] = \sum_{i=1}^{\ell} \bigl[S_{iT} < 0\bigr],$$

Sit = yi b1 (xi ) + + yi bt (xi ). , Sit xi .


t t. Sit < 0, a(xi )
xi , (Sit ) , bt+1 , bt+2 , . . .
xi , .

bt , Si,t1 6 0. ,
,

Q(bt ; W ) wi = Si,t1 6 0 .
. . ,
, .
, , xi
Si,t1 . .

1.2.
:
0 ;
1: i = 1, . . . ,
2:
wi := 1; Si := 0;
3: t = 1, . . . , T ,
4:
bt := (X , Y , W );
5:
i = 1, . . . ,
6:
Si := Si + yi bt (xi );
7:
X Si + ri ,
8:
9:

ri (0, 1);
i = 1, . . . ,
wi := [i 6 0 ];

. 0 ,
. X Si,t1 . Si,t1
. 0 , wi = 1,
wi = 0. 1.2.
0
. 0 , . , 0 = ,
. 0
.
, Sit Si ,
Sit , Si,t1 .
. 0
X (h) = {xi X | Si,t1 6 h},

h = 0, 1, . . . , t 1,
(h)

bt = (X (h) , Y (h) )
. .
, h. ,
h = ht1 , ht1 1, . . . , ht1 d,
ht1 h, , d
, .
. , , ,

. , (weighted majority) 9 [26]:
(
wi / yi bt (xi ) > 0;
wi =
wi
yi bt (xi ) < 0;
, , wi = Si,t1 . > 1 ,
.


Q(bt ; W ) .
, .
TODO: .

(weighted voting)
:
$$a(x) = \operatorname{sign}\Bigl(\sum_{t=1}^{T} \alpha_t b_t(x)\Bigr).$$

, , . , 1.3. , , ( t = 1),

, b1 (x), . . . , bT (x)
x t
. : ( ??),
( ??), ( ??), SVM ( ??). , ,
.
, t > 0. t
, bt ,
.
, t .
1.5. ( ??). , bt (x), t = 1, . . . , T
.

t = ln

1 t
,
t

t bt .
bt , t . , t + 1/. bt

ToDo2

1.3. x X
1: t = 1, . . . , T
2:
bt (x) = 1
3:
ct ;
4: c0 .
, t < 0. t = 0, .
1.2.2

, |Y | = M .
, 2 , bt : X {0, 1}. R = {0, 1}. .
. 1.3.

a(x) = F b1 (x), . . . , bT (x) ,

F c1 , . . . , cT Y , c0
/ Y
1.3.

a(x) .

bt x X : bt (x) = 1 ct , ,
bt+1 .
. 1977.
. , , [30] [27].
, , . ??, . ??.
bt (x) = 1, , bt x.
c0 . , . ,
, .
, ,
.


x X : a(x) = c0 .
1, .
, :

X
X

Q(a) =
a(xi ) 6= c0 a(xi ) 6= yi +
a(xi ) = c0 .
i=1

i=1

, t- , bt , ct .

-, bt ,
b1 , . . . , bt1 .
It1 :
$$I_{t-1} = \{i : b_1(x_i) = \dots = b_{t-1}(x_i) = 0\}.$$
-, bt bt (xi ) = 1
ct 6= yi , bt (xi ) = 0 ct = yi , i It1.
, Q(a) = Q F (b1 , . . . , bt )
bt
X
X

Q(bt ) =
bt (xi ) = 1 ct 6= yi +
bt (xi ) = 0 ct = yi ,
iIt1

iIt1

:
Q(bt ) =

X
i=1

[i It1 ] [yi =
6 ct ] + [yi = ct ] bt (xi ) 6= [yi = ct ] .
| {z }
{z
}
|
zi

wi

-
: bt = (X , Z , W ), W = (w1 , . . . , w ) Z = (z1 , . . . , z ).
wi zi .
bt , ct .
, , , . 1,
, -
.
= 0
, .
1.4 . ct 4.
.
c1 , . . . , cT .
, wi .
, ( M ).
Y = {v1 , . . . , vM }, Tm
vm , m = 1, . . . , M :
bv1 1 (x), . . . , bv1 T1 (x) , bv2 1(x) , . . . , bv2 T2 (x) , . . . , bvM 1 (x), . . . , bvM TM (x) ,
{z
} |
|
{z
}
{z
}
|
v1

v2

vM

1.4.
:
;
1: i = 1, . . . ,
2:
wi := 1;
3: t = 1, . . . , T ,
4:
ct ;
5:
i = 1, . . . ,
6:
zi := [yi = ct ];
7:
wi 6= 0 wi := 1 zi (1 );
8:
bt := (X , Z , W );
9:
i = 1, . . . ,
10:
bt (xi ) = 1 wi := 0;

Y = {v1 , . . . , vM },
. ,

.
. , 7:
wi 6= 0 ct 6= ct1 wi := 1 zi (1 );
. , ,
, .
. .
Q(bt )
, - 1 .
0.05 0.5.
,
.
. ,
, .
. ,
.

.
- . , .

1.6. : X = R2 , Y = {, }, bt (x) .
ToDo3

TODO:

1. . :
.
2. , .
:
.
3. . b2
b1 , . :
; ,
.

1.3

. 1.4. F : RT R
$$F(b_1, \dots, b_T) = \frac{1}{T} \sum_{t=1}^{T} \alpha_t b_t, \tag{1.7}$$

t .

. (1.7)
(weighted voting).
1.3.1

, Y = {1, 1}. ,
1 1, : C(b) = sign(b).
:
$$a(x) = C\bigl(F(b_1(x), \dots, b_T(x))\bigr) = \operatorname{sign}\Bigl(\sum_{t=1}^{T} \alpha_t b_t(x)\Bigr), \quad x \in X.$$

,
:

$$Q_T = \sum_{i=1}^{\ell} \Bigl[ y_i \sum_{t=1}^{T} \alpha_t b_t(x_i) < 0 \Bigr].$$

QT .
1. t bt (x) bt t ,
j bj (x), j = 1, . . . , t 1.

Fig. 1. Upper bounds on the threshold loss function $[b < 0]$: curves E, S, L, Q.

2. t , - .
(boosting), .
, , . , ,
b R, . . 1:

$$[b < 0] \le \begin{cases} e^{-b}; \\ 2(1 + e^{b})^{-1}; \\ \log_2(1 + e^{-b}). \end{cases}$$
, , (,
). ,
. AdaBoost [32].
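The three inequalities can be checked numerically. The snippet below is a small sanity check on a grid of values of $b$; the function names are chosen here for illustration only.

```python
import math

def threshold(b):
    """Threshold loss [b < 0]."""
    return 1.0 if b < 0 else 0.0

def exponential(b):
    return math.exp(-b)

def sigmoid_bound(b):
    return 2.0 / (1.0 + math.exp(b))

def logarithmic(b):
    return math.log2(1.0 + math.exp(-b))

# Grid of values of b from -5 to 5 with step 0.1.
grid = [k / 10.0 for k in range(-50, 51)]
bounds = [exponential, sigmoid_bound, logarithmic]
all_hold = all(threshold(b) <= f(b) for b in grid for f in bounds)
```

All three functions equal 1 at $b = 0$ and exceed 1 for $b < 0$, which is what makes them upper bounds on the threshold loss.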
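$Q_T$:
$$Q_T \le \tilde Q_T = \sum_{i=1}^{\ell} \exp\Bigl(-y_i \sum_{t=1}^{T} \alpha_t b_t(x_i)\Bigr) = \sum_{i=1}^{\ell} \underbrace{\exp\Bigl(-y_i \sum_{t=1}^{T-1} \alpha_t b_t(x_i)\Bigr)}_{w_i} \exp\bigl(-y_i \alpha_T b_T(x_i)\bigr).$$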

, wi T bT , ,

bT .

$\widetilde W^\ell = (\tilde w_1, \dots, \tilde w_\ell)$, $\tilde w_i = w_i \big/ \sum_{j=1}^{\ell} w_j$.

1.1. Q b X , Y U :
$$Q(b; U^\ell) = \sum_{i=1}^{\ell} u_i \bigl[y_i b(x_i) < 0\bigr], \qquad \sum_{i=1}^{\ell} u_i = 1.$$

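Suppose $\min_b Q(b; W^\ell) < 1/2$ for any normalized weight vector $W^\ell$. Then the minimum of $\tilde Q_T$ is attained at
$$b_T = \arg\min_{b} Q(b; \widetilde W^\ell); \qquad \alpha_T = \frac12 \ln \frac{1 - Q(b_T; \widetilde W^\ell)}{Q(b_T; \widetilde W^\ell)}.$$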
.
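Rewrite $\tilde Q_T$ using the identity $e^{-\alpha\beta} = e^{-\alpha}[\beta = 1] + e^{\alpha}[\beta = -1]$,
valid for any $\alpha \in \mathbb{R}$ and $\beta \in \{-1, 1\}$:
$$\tilde Q_T = \Bigl( e^{-\alpha_T} \underbrace{\sum_{i=1}^{\ell} \tilde w_i [b_T(x_i) = y_i]}_{1 - Q} + e^{\alpha_T} \underbrace{\sum_{i=1}^{\ell} \tilde w_i [b_T(x_i) \ne y_i]}_{Q} \Bigr) \underbrace{\sum_{i=1}^{\ell} w_i}_{\tilde Q_{T-1}} = \bigl( e^{-\alpha_T}(1 - Q) + e^{\alpha_T} Q \bigr) \tilde Q_{T-1},$$
where $Q = Q(b_T; \widetilde W^\ell)$. Minimizing $\tilde Q_T$ over $\alpha_T$ gives the optimal $\alpha_T$:
$$\alpha_T = \frac12 \ln \frac{1 - Q}{Q}.$$
Substituting this value back into $\tilde Q_T$ yields
$$\tilde Q_T = \tilde Q_{T-1} \sqrt{4Q(1 - Q)}. \tag{1.8}$$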

eT 1 , Q
eT Q
Q
Q < 1/2 Q Q > 1/2, .

1.1.
$b_t$
$$w_i := w_i \begin{cases} \exp(\alpha_t), & b_t(x_i) \ne y_i; \\ \exp(-\alpha_t), & b_t(x_i) = y_i; \end{cases}$$
, bt . , , .
1.2.
(, ) . wi
. , ,
.

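1.5. AdaBoost
1: for all $i = 1, \dots, \ell$
2:     $w_i := 1/\ell$;
3: for all $t = 1, \dots, T$
4:     $b_t := \mu(X^\ell, Y^\ell, W^\ell)$, so that $Q(b; W^\ell) \to \min$;
5:     $\alpha_t := \frac12 \ln \frac{1 - Q(b_t; W^\ell)}{Q(b_t; W^\ell)}$;
6:     for all $i = 1, \dots, \ell$
7:         $w_i := w_i \exp\bigl(-\alpha_t y_i b_t(x_i)\bigr)$;
8:     normalize the weights: $w_i := w_i \big/ \sum_{j=1}^{\ell} w_j$, $i = 1, \dots, \ell$;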
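The steps of the algorithm can be sketched in plain Python. The choice of threshold rules ("stumps") as base algorithms, the helper `train_stump`, the early stop when the weighted error reaches 1/2, and the toy data are all assumptions made for this illustration; the text itself leaves the learning method for the base algorithms unspecified.

```python
import math

def train_stump(X, y, w):
    """Hypothetical base learner: threshold rule minimising the weighted error."""
    best = None
    for f in range(len(X[0])):
        values = sorted({x[f] for x in X})
        # Candidate thresholds: below the minimum and midpoints between neighbours.
        thresholds = [values[0] - 1.0] + [(u + v) / 2 for u, v in zip(values, values[1:])]
        for thr in thresholds:
            for s in (1, -1):
                err = sum(wi for x, yi, wi in zip(X, y, w)
                          if (s if x[f] > thr else -s) != yi)
                if best is None or err < best[0]:
                    best = (err, f, thr, s)
    err, f, thr, s = best
    return (lambda x: s if x[f] > thr else -s), err

def adaboost(X, y, T):
    n = len(X)
    w = [1.0 / n] * n                      # steps 1-2: uniform weights
    ensemble = []
    for _ in range(T):                     # step 3
        b, q = train_stump(X, y, w)        # step 4: minimise Q(b; W)
        if q >= 0.5:                       # assumption: stop if no better than chance
            break
        q = max(q, 1e-12)                  # guard against division by zero
        alpha = 0.5 * math.log((1 - q) / q)            # step 5
        ensemble.append((alpha, b))
        w = [wi * math.exp(-alpha * yi * b(xi))        # steps 6-7
             for wi, xi, yi in zip(w, X, y)]
        total = sum(w)
        w = [wi / total for wi in w]                   # step 8: normalise
    def a(x):
        return 1 if sum(alpha * b(x) for alpha, b in ensemble) > 0 else -1
    return a, ensemble

# Toy one-dimensional sample that no single threshold rule classifies correctly.
X = [[0.0], [1.0], [2.0], [3.0], [4.0], [5.0]]
y = [1, 1, -1, -1, 1, 1]
a, ensemble = adaboost(X, y, T=3)
```

On this sample no single stump does better than a weighted error of 1/3, while three boosting rounds already classify every training object correctly.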

f ) < 1/2 ,
1.3. Q(bt ; W
bt (x), 1/2. ,
, , .
, , , .
, ,
AdaBoost .
1.2. f ) < 1/2 > 0, AdaBoost
bt , Q(bt ; W
a(x) .

.
(1.8) QT eT +1 6 Q
e1 (1 4 2 ) T2 . ,
: QT +1 6 Q
eT 1. QT ,
Q
.

. 10
, . , , ( ), , ,
.
( ) . ,
[21]. ,
. ,
.
. , -

, . [17].
(margin) xi Mi = yi a(xi ).
. , . , ,
.
, .
.
, [17]
.
- [36], ,
104 . . . 106 .
, [29, 28]. [23]
.
AdaBoost.
. ( , )
, .
.
. .
, .
xi , wi .
AdaBoost.
AdaBoost
. , . . AdaBoost ,
. . , LogitBoost
[22].
AdaBoost .
, , , .


.
. t
-
.
SVM [37].
,
. ,
.
TODO: AdaBoost .
ToDo4
AdaBoost .
:
.
.
, ( ).
.

1.3.2

, , , .
( ) .
. ,
. . ,

, .
, .
. t .
. (bagging, bootstrap aggregation) . 1996 [18].
.
, . ,
1/e 37% . ,
, .
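The 1/e figure comes from sampling with replacement: an object is missed by all $\ell$ draws with probability $(1 - 1/\ell)^\ell$, which tends to $e^{-1} \approx 0.37$. A quick simulation illustrates this; the helper name `bootstrap_indices`, the sample size and the random seed are arbitrary choices made here.

```python
import math
import random

def bootstrap_indices(n, rng):
    """Draw n object indices uniformly with replacement."""
    return [rng.randrange(n) for _ in range(n)]

rng = random.Random(0)
n = 10000
sample = bootstrap_indices(n, rng)

# Fraction of objects that never appear in the bootstrap sample.
left_out = 1 - len(set(sample)) / n
# Exact probability that a fixed object is left out, and its limit 1/e.
expected = (1 - 1 / n) ** n
limit = math.exp(-1)
```

The objects left out of a given bootstrap sample can serve as a held-out set for the base algorithm trained on that sample.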

1.6. RSM
:
;
n ;
1 ;
2 ;
1: t = 1, . . . , T ,

2:
X := X ;

3:
F n := F n n ;

4:
bt := (F n , X , Y );

5:
Q(bt , X ) > 1 ( > 0 Q(bt , X \ X ) > 2 )
6:
bt ;

. -,
,
. -, - . , ,
, .
,
.
0 ,
0 .
RSM. (random subspace method, RSM) ,
[33].
,
. , ,
, .
RSM 1.6.

F n = {f1 , . . . , fn } (F n , X , Y )

X X ,

F n F n . 6 n 6 n . 1.6 n = n,
= .
56 . bt ,

1 , X \ X 2 . ,
,
.
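The sampling part of the random subspace method can be sketched as follows. This assumes that each base algorithm is trained on a random subset of objects and a random subset of features, both drawn without replacement; the helper names and the sizes used are hypothetical.

```python
import random

def draw_subsample(n_objects, n_features, sub_objects, sub_features, rng):
    """Pick random object indices and random feature indices without replacement."""
    objects = sorted(rng.sample(range(n_objects), sub_objects))
    features = sorted(rng.sample(range(n_features), sub_features))
    return objects, features

def project(X, objects, features):
    """Restrict the data matrix to the chosen objects and features."""
    return [[X[i][j] for j in features] for i in objects]

rng = random.Random(1)
X = [[i + 0.1 * j for j in range(5)] for i in range(8)]
objects, features = draw_subsample(len(X), len(X[0]), 6, 3, rng)
X_sub = project(X, objects, features)
# Objects outside the subsample can be used to check the trained base algorithm.
held_out = [i for i in range(len(X)) if i not in objects]
```

A base algorithm trained on `X_sub` would then be kept or discarded depending on its error on the subsample and on `held_out`.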


. , , . , , RSM [33].
.
.
RSM .
RSM , ,
/ .
RSM ,
.

1.4

( )


, x X.
bt (x) gt (x), [0, 1].
, bt (x) x gt (x).
. 1.5. F : R2T R,
$$F\bigl(b_1, \dots, b_T, g_1, \dots, g_T\bigr) = \sum_{t=1}^{T} g_t b_t. \tag{1.9}$$

,
$$a(x) = C\Bigl(\sum_{t=1}^{T} g_t(x) b_t(x)\Bigr),$$

C . gt (x), bt (x) x. gt (x) {0, 1}, t = {x X : gt (x) = 1}


bt (x). gt (x)
.
[10]. (1.9)
(mixture of experts, ME), , (gates) [16]. , x. - ,
. gt (x)bt (x) .

:
$$\sum_{t=1}^{T} g_t(x) = 1, \quad x \in X. \tag{1.10}$$

, x

,

$\bigl[\min_t b_t(x), \max_t b_t(x)\bigr]$.

C(b) = [b > 0]
C(b) = sign(b) ,
(1.9).
gt (x) t = 1, . . . , T , , x ,
. , ,
. ,
(novelty detection).
. , ,
f (x)
.
$g(x; \alpha, \beta) = \sigma(\alpha f(x) + \beta)$,
$\alpha, \beta \in \mathbb{R}$ , $\sigma(z) = (1 + e^{-z})^{-1}$ ,
.
, . , X = Rn

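$g(x; \alpha, \beta) = \sigma(x^{\mathsf T} \alpha + \beta)$

$g(x; \alpha, \beta) = \exp(-\beta \|x - \alpha\|^2)$,
$\alpha \in \mathbb{R}^n$, $\beta \in \mathbb{R}$ ,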
, .
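The two gate families above and the mixture (1.9) can be sketched as follows. Dividing the gate values by their sum is one simple way to enforce the normalization (1.10); that choice, the helper names, and the two constant experts are assumptions made for this illustration.

```python
import math

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

def linear_gate(alpha, beta):
    """g(x) = sigma(x^T alpha + beta)."""
    return lambda x: sigmoid(sum(a * xj for a, xj in zip(alpha, x)) + beta)

def radial_gate(alpha, beta):
    """g(x) = exp(-beta * ||x - alpha||^2)."""
    return lambda x: math.exp(-beta * sum((xj - a) ** 2 for xj, a in zip(x, alpha)))

def mixture(experts, gates):
    """b(x) = sum_t g_t(x) b_t(x), with gate values rescaled to sum to one."""
    def b(x):
        g = [gate(x) for gate in gates]
        total = sum(g)
        return sum(gt / total * bt(x) for gt, bt in zip(g, experts))
    return b

# Two hypothetical constant experts with radial gates centred at different points.
experts = [lambda x: -1.0, lambda x: 2.0]
gates = [radial_gate([0.0, 0.0], 1.0), radial_gate([3.0, 0.0], 1.0)]
b = mixture(experts, gates)
half = linear_gate([1.0, -1.0], 0.0)([2.0, 2.0])
```

Near the centre of a gate the corresponding expert dominates, and because the rescaled gate values form a convex combination, the output always stays between the smallest and the largest expert answer.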
1.4.1

, X
, ,
. , -,
, .
(1.3),
W = (w1 , . . . , w ).

e y) b,
1.1. L(b,
b1 , b2 R, y Y g1 , g2 , , g1 + g2 = 1,
$$\tilde L(g_1 b_1 + g_2 b_2, y) \le g_1 \tilde L(b_1, y) + g_2 \tilde L(b_2, y).$$

: ,
y.
. ,
e y) = (b y)2 , . ,
L(b,
e y) = [by < 0],
L(b,
Y = {1, 1}, .
, . . 1 . 13:
$$\tilde L(b, y) = [by < 0] \le \begin{cases} e^{-by} & \text{(E)}; \\ \log_2(1 + e^{-by}) & \text{(L)}. \end{cases}$$

. ,
(. ??), AdaBoost (. 1.3.1).
,
. , .
e , (1.10) L
. Q(a) :
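$$Q(a) = \sum_{i=1}^{\ell} \tilde L\Bigl(\sum_{t=1}^{T} g_t(x_i) b_t(x_i),\, y_i\Bigr) \le \sum_{t=1}^{T} \underbrace{\sum_{i=1}^{\ell} g_t(x_i)\, \tilde L\bigl(b_t(x_i), y_i\bigr)}_{Q(b_t; W_t)}. \tag{1.11}$$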

Q(a) T ,
bt gt . gt ,
bt -, .
. , . ,
, , .
, , .
1.4.2

1.7 .
b1 (x) . LearnInitG, -

1.7.
:
0 ;
;
1: :

2:
3:
4:
5:
6:

7:
8:
9:
10:

11:

b1 := (X , Y );
g1 := LearnInitG (b1 , X , Y );
t = 1, . . . , T 1, :

at (x) :
(t)

X := xi X : L(at (xi ), yi ) > ;


|X (t) | 6 0
(at );
t- :
bt+1 := (X (t) , Y (t) );
gt+1 := LearnInitG (bt+1 , X , Y );
k = 1, 2, . . . , Q(at+1 ; X , Y )
j = 1, . . . , t + 1
wi := gj (xi ) i = 1, . . . , ;
j- k- :
bj := (X , Y , W );
gj := LearnGj (X , Y );
(aT );

g1 (x), , b1 (x) :
LearnInitG (b, X , Y ) = arg min
g

(g(xi ) zi )2 ,

i=1

zi = L(b(xi ), yi ) , [0, 1]
. ,
zi = [b(xi ) = yi ] ;

zi = |b(xi ) yi | < > 0 .

t .
3 , t :

at (x) = C g1 (x)b1 (x) + + gt (x)bt (x) , x X.

(t + 1)- , 1, bt+1 ,
X (t) , , at (x) . 0 , .
( 710). (1.11)

bj Q(bj ; W ) .
, , .
, gj (x), :
$$A_{ji} = \sum_{s=1,\, s \ne j}^{t+1} g_s(x_i) b_s(x_i); \qquad G_{ji} = \sum_{s=1,\, s \ne j}^{t+1} g_s(x_i); \qquad b_{ji} = b_j(x_i); \qquad i = 1, \dots, \ell.$$

Q gj ,
gj :

$$\mathrm{LearnGj}(X^\ell, Y^\ell) = \arg\min_{g_j} \sum_{i=1}^{\ell} \tilde L\bigl(A_{ji} + b_{ji}\, g_j(x_i),\, y_i\bigr).$$

LearnGj ,
.
.
, Y = {1, 1},
1 1. gj (x) at+1 (x)
Q :
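$$Q(a_{t+1}) = \sum_{i=1}^{\ell} \exp\bigl(-A_{ji} y_i - g_j(x_i) b_{ji} y_i\bigr) = \sum_{i=1}^{\ell} \underbrace{\exp(-A_{ji} y_i)}_{\tilde w_i} \exp\bigl(-g_j(x_i) \underbrace{b_{ji} y_i}_{\tilde y_i}\bigr) = \sum_{i=1}^{\ell} \tilde w_i \exp\bigl(-g_j(x_i) \tilde y_i\bigr).$$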

,
,
Q. w
ei = exp(Aji yi )
yei = bji yi :
, bj (x)
. ,
.
.
Y = R. (j)
gj (x). , , , Aji /Gji = at (xi ),
(j)
at , at+1 (x) j- .
(j)
at .

gj (x) at+1 (x) :
Q(at+1 ) =

(at+1 (xi ) yi ) =


X
Aji + gj (xi )bji

yi =
G
+
g
(x
)
ji
j
i
i=1
i=1

X
Aji
Aji
gj (xi )
bji
yi
=
=
Gji + gj (xi )
Gji
Gji
i=1
2
2

X
yi Aji /Gji
Aji
gj (xi )

=
bji
.
Gji
Gji + gj (xi ) bji Aji /Gji
i=1 |
|
{z
}
{z
}
w
ei

yei

gj (x) . w
ei yei . . w
ei xi , bj (xi )
. , Q gj (xi ) . yei yi (j)
bj (xi ) , at (xi ).
, bj ,
. gj (xi ) . , bj ,
. .
.
.
, . .
1.4.3

. ,
, , :

a(x) = C g1 (x)b1 (x) + g2 (x)b2 (x) , g1 (x) + g2 (x) = 1, x X.

1.1 Q(a; W )
:
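$$Q(a; W^\ell) \le \tilde Q(a; W^\ell) = \underbrace{\sum_{i=1}^{\ell} g_1(x_i) w_i \tilde L\bigl(b_1(x_i), y_i\bigr)}_{Q(b_1; W_1)} + \underbrace{\sum_{i=1}^{\ell} g_2(x_i) w_i \tilde L\bigl(b_2(x_i), y_i\bigr)}_{Q(b_2; W_2)}.$$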

1.8. M2E
:
X , Y ;
W = (w1 , . . . , w ) ;
b0 (x) ;
:
: a(x) = b1 (x)g1 (x) + b2 (x)g2 (x);
P
e 0 (xi ), yi );
1: g1 := arg min i=1 g(xi )wi L(b
g
2:
3:
:
b1 := (X , Y , W1 ) w1i = wi g1 (xi );
b2 := (X , Y , W2 ) w2i = wi (1 g1 (xi ));
4:
:
e 1 (xi ), yi ) L(b
e 2 (xi ), yi ) , i = 1, . . . , ;
di := wi L(b
P
g1 := arg min i=1 di g(xi );
g
5:
P
:

Q := i=1 wi L b1 (xi )g1 (xi ) + b2 (xi )g2 (xi ), yi ;


6: Q;
g1 (x) g2 (x) , Q(b1 ; W1 ) Q(b2 ; W2 ) .
:
$$b_1 = \mu(X^\ell, Y^\ell, W_1), \qquad W_1 = \bigl(w_i g_1(x_i)\bigr)_{i=1}^{\ell};$$
$$b_2 = \mu(X^\ell, Y^\ell, W_2), \qquad W_2 = \bigl(w_i g_2(x_i)\bigr)_{i=1}^{\ell}.$$

: b1 (x) b2 (x),
g1 (x) g2 (x). g2 (x) = 1 g1 (x) g1 (x),
$$\tilde Q(g_1) = \sum_{i=1}^{\ell} g_1(x_i) \underbrace{w_i \bigl( \tilde L(b_1(x_i), y_i) - \tilde L(b_2(x_i), y_i) \bigr)}_{d_i = \mathrm{const}(g_1)}. \tag{1.12}$$

g1 (x), , .
, , . ,
.
b0 (x) . 1.8 M2E (mixture of 2 experts) .
. M2E , .

1.9.
:
X , Y ;
1: (W , b0 ):

2:
3:
4:
5:
6:
7:
8:

W = (w1 , . . . , w ) ;
b0 , a(x) = b1 (x)g1 (x) + b2 (x)g2 (x);
b1 , g1 , b2 , g2 := M 2E(W , b0 );

W1 = wi g1 (xi ) i=1 ;
(W1 , b1 )
(W1 , b1 );

W2 = wi g2 (xi ) i=1 ;
(W2 , b2 )
(W2 , b2 );


9: W := (1/, . . . , 1/);
10: b0 := (X , Y , W );
11: (W , b0 );
.
. ,
, g(x) g(x)g1 (x) g(x)g2 (x), g1 (x) + g2 (x) = 1, -
. , . , (hierarchical mixture of experts, HME). 1.9
.
HME
(. ??). , .
.
. ,
. , .
X :
h
i
P
(W , b) := Q(b; W ) > i=1 wi .
.

X k , :

(W , b) := Q(b; X k , Y k , W k ) > Q(b0 ; X k , Y k , W k ) .

.
.

1.5

. F (b1 , . . . , bT ) = 1 b1 + + T bT t RT Y .
( ) ,
. , , . F RT Y , b1 (x), . . . , bT (x) F (b1 (x), . . . , bT (x)).
, bt (x) , x .
,
F (b1 (x), . . . , bT (x)),
. ,
x , , .
. 1.6. R Y . F : RT Y .
, , Y . , a a(x) = F (b1 (x), . . . , bT (x)).
.
. 1.7. U , V . (ui , vi )i=1 U V , uj 6 uk
vj 6 vk j, k = 1, . . . , .
1.3. U , V . f : U V , f (ui ) = vi i = 1, . . . , ,
, (ui , vi )i=1 .
.
: f , (ui , f (ui ))i=1 .
. , (ui , vi )i=1 ,
f - . -

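$u \in U$ $I(u) = \{k : u_k \le u\}$
$$f(u) = \begin{cases} \min\limits_{i = 1, \dots, \ell} v_i, & I(u) = \varnothing; \\ \max\limits_{i \in I(u)} v_i, & I(u) \ne \varnothing. \end{cases}$$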

, f . u u u 6 u
I(u) I(u ), f (u) 6 f (u ).
, f (ui ) = vi i = 1, . . . , . I(ui ) ,
i I(ui ). k I(ui ) (ui , vi )i=1 uk 6 ui
vk 6 vi . max vk k = i, f (ui ) = vi .

kI(ui )

, f .
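The construction used in the proof translates directly into code. The sketch below assumes that $U$ is a space of real vectors with the componentwise partial order (as in the later application to $\mathbb{R}^t$); the helper names and the sample pairs are illustrative.

```python
def leq(u, v):
    """Componentwise partial order on vectors: u <= v."""
    return all(a <= b for a, b in zip(u, v))

def is_monotone(pairs):
    """Check that u_j <= u_k implies v_j <= v_k for all pairs (u_i, v_i)."""
    return all(vj <= vk
               for uj, vj in pairs for uk, vk in pairs if leq(uj, uk))

def monotone_extension(pairs):
    """f(u) = max of v_i over I(u) = {i : u_i <= u}, or min of all v_i if I(u) is empty."""
    lowest = min(v for _, v in pairs)
    def f(u):
        below = [v for ui, v in pairs if leq(ui, u)]
        return max(below) if below else lowest
    return f

pairs = [((0, 0), 0), ((1, 0), 1), ((0, 1), 1), ((2, 2), 3)]
f = monotone_extension(pairs)
```

For a monotone sequence of pairs, `f` passes through every point $(u_i, v_i)$ and is monotone on the whole space, which is exactly what the lemma asserts.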
( Y = R) , f , f . ( Y = {0, 1})
, , 0.
.
1.5.1

ui xi , fi , a(x) xi :
ui = (b1 (xi ), . . . , bt (xi ));
fi = a(xi ) = F (b1 (xi ), . . . , bt (xi )) = F (ui );

i = 1, . . . , .

a(xi ) = yi
F (ui ) = yi ,

i = 1, . . . , .

(1.13)

. 1.8. V . (j, k)
b : X V , yj < yk b(xj ) > b(xk ). b(x) :
$$D(b) = \{(j, k) : y_j < y_k \wedge b(x_j) \ge b(x_k)\}.$$
, F F (b(xj )) > F (b(xk )), , a(x) = F (b(x))
xj , xk , F .
. 1.9. b1 , . . . , bt
$$D_t(b_1, \dots, b_t) = D(b_1) \cap \dots \cap D(b_t) = \{(j, k) : y_j < y_k \wedge u_j \ge u_k\}.$$
Dt = Dt (b1 , . . . , bt ).
(j, k) F
F (uj ) > F (uk ), , a(x) = F (b1 (x), . . . , bt (x)) xj , xk .
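The defect of a set of base algorithms can be computed by brute force over all pairs. The sketch below assumes real-valued answers, reads the comparison of vectors $u_j \ge u_k$ componentwise, and uses made-up data; it also checks that the joint defect equals the intersection of the individual defects.

```python
def defect(u, y):
    """Pairs (j, k) with y[j] < y[k] although u[j] >= u[k] in every component."""
    n = len(y)
    return {(j, k) for j in range(n) for k in range(n)
            if y[j] < y[k] and all(a >= b for a, b in zip(u[j], u[k]))}

y = [0, 0, 1, 1]
b1 = [0.2, 0.9, 0.5, 0.8]     # answers of a hypothetical first base algorithm
b2 = [0.1, 0.3, 0.6, 0.2]     # answers of a hypothetical second base algorithm

d1 = defect([(v,) for v in b1], y)
d2 = defect([(v,) for v in b2], y)
joint = defect(list(zip(b1, b2)), y)
```

Here the second algorithm orders the pair (1, 2) correctly, so that pair drops out of the joint defect, while the pair (1, 3) remains defective for both algorithms.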

1.4. F : Rt Y , 1.13 , b1 , . . . , bt . a = F (b1 , . . . , bt ) .
.
.
) : Dt (b1 , . . . , bt ) = ;
) j, k (uk 6 uj ) (yj < yk ) ( . 1.9);
) j, k (uk 6 uj ) (yk 6 yj );
) (ui , yi )i=1 ( . 1.7);
) F : Rt Y , F (ui ) = yi
i = 1, . . . , ( 1.3).
) , yj < yk F (uj ) > F (uk )
, , D(F (b1 , . . . , bt )) .

, a = F (b1 , . . . , bt ), b1 , . . . , bt , .
1.9
: Dt Dt1 . (j, k)
$$b_t(x_j) < b_t(x_k), \quad (j, k) \in D_{t-1}, \tag{1.14}$$

Dt ,
uk uj , . , (1.14)
(j, k).
1.5 ( ). (1.5) b1 , B , X 2m
2m, m > 1, b B F ,
F (b(xi )) = yi ,

xi X 2m .

(1.51.6)

a = F (b1 , . . . , bT ) T 6 D(b1 )/m + 1.

.
t- , t > 2, (1.6). |Dt1 | > m,
m- t1 Dt1 .
|Dt1 | 6 m, t1 = Dt1 .
, t1 :

U = xi X k : (k, i) t1 (i, k) t1 .
, U 2m. bt B F ,
F (bt (xi )) = yi xi U . bt

bt (xj ) < bt (xk ),

(j, k) t1 .

: bt (xk ) 6 bt (xj ), F (bt (xk )) 6 F (bt (xj )), , yk 6 yj , yj < yk ,
(j, k).
bt , t = 2, 3, . . . ,
, , m
:
$$|D_t| \le |D_{t-1}| - m.$$
D(b1 )/m , b1 .

.
,
.
, (1.14) ,
, . ,
: ,
(1.3),
.
, i-
Dt1 , .

e y) = [b > 0] 6= y ,
1.6. Y = {0, 1} L(b,

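$$|D_t(b_1, \dots, b_t)| \le \sum_{i=1}^{\ell} w_i \tilde L\bigl(b_t(x_i), y_i\bigr); \tag{1.15}$$
$$w_i = |D(i)|; \qquad D(i) = \bigl\{x_k \in X^\ell \bigm| (k, i) \in D_{t-1} \vee (i, k) \in D_{t-1}\bigr\}; \qquad i = 1, \dots, \ell. \tag{1.16}$$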

.
i = bt (xi )
t- i- ,

e
Li = L(i , yi ) = [i > 0] 6= yi , 1,
bt i- , 0 .
j , k
[j > k ] 6 [j > 0] + [k 6 0].
yj < yk , Y :


yj = 0;
[j > 0] = [j > 0] 6= 0 = [j > 0] 6= yj = Lj ;

yk = 1;
[k 6 0] = [k > 0] 6= 1 = [k > 0] 6= yk = Lk .

Dt = Dt1 D(bt ),
[j > k ] 6 Lj + Lk :
X
X
X

|Dt | =
(j, k) D(bt ) =
j > k 6
(Lj + Lk ) =
(j,k)Dt1

j=1
yj =0

X
i1

Lj

(j,k)Dt1

X
k=1

(j, k) Dt1 +

k=1
yk =1

(j,k)Dt1

Lk

X
j=1

(j, k) Dt1 =

X
Li
(k, i) Dt1 (i, k) Dt1 =
Li |D(i)|.
k=1

i1

[2].
e y) = (b y)2 , 1.7. Y = R L(b,
(1.15), -: wi = h2
i |D(i)|,
hi = 21 min |yi yk |, D(i) (1.16).
kD(i)

1.5.2

b1 , . . . , bt ,
F , F (ui ) = yi i = 1, . . . , .
.
, (ui , yi )i=1 , F (ui ) = yi .
F , F (ui ) = yi , . .
fi Y , i = 1, . . . , , yi , (ui , fi )i=1 :

$$\begin{cases} \sum\limits_{i=1}^{\ell} (f_i - y_i)^2 \to \min; \\ f_j \le f_k, \quad (j, k) : u_j \le u_k. \end{cases}$$

-. , , .
, F (ui ) = fi i = 1, . . . , .
, . .

. (ui , fi )i=1 , ui Rt , fi R. F : Rt R : F (ui ) = fi , i = 1, . . . , .


, fi ( ).


. 2. (Mi ) (Mi )
ui .

. 3. u
.

Mi Mi ui ,
. 2:
Mi = {u Rt | ui 6 u};
Mi = {u Rt | u 6 ui }.
: Rt+ R+ ,
(0, . . . , 0). ,

max(z1 , . . . , zt );
(z1 , . . . , zt ) = z1 + + zt ;

p 2
z1 + + zt2 ;

, min(z1 , . . . , zt ) ,
.
u = (u1 , . . . , ut )
ui = (u1i , . . . , uti ):

ri(u) = (u1i u1 )+ , . . . , (uti ut )+ ;

ri(u) = (u1 u1i )+ , . . . , (ut uti )+ ;

(z)+ = [z > 0]z. ri(u) u


Mi. ri(u) u Mi. , ri
ri .
[mini fi , maxi fi ) u = (u1 , . . . , ut ) :
h(u, ) = min ri(u);
i:fi >

h (u, ) = min ri(u).


i:fi 6

.
Y = {0, 1} h, h F (u). u 1,


. 4. F (u) . 5.
.
(u, ) .

1 , 0.
1.8. , Y = {0, 1},
F (u) = [h(u, ) 6 h(u, )]
[0, 1) Rt , ,
Y F (ui ) = fi , i = 1, . . . , .
[2].
. 4 F (u):
, 1,
0. . . 4
t = 2, (z 1 , . . . , zt ) = (z12 + + zt2 )1/2 .
. , a(x),
F (u), .
.

(u, ) =

h(u, )
,
h(u, ) + h(u, )

u Rt ,

R.

, , , ui , i = 1, . . . , 0, 1:
(ui , ) = [fi > ].
(u, )
, , . 4 . 5.
,
, , 1
(u, ).


. 6.
F (u) .

. 7. ()
, .

1.9. Y = R : f1 6 f2 . . . 6 f .
1
X
F (u) = f1 +
(fi+1 fi )(u, fi )
i=1

Rt ,
F (ui ) = fi i = 1, . . . , . (u) , F (u)
.
[2].
, F (u) ,
.
. 6 F (u), . . 7 , . , .
TODO: .
ToDo5
TODO: . , .
ToDo6
.

1.6


[4, 5].
, [8]. ,
, , . ,
.
.

, ,
[10] . [16]. 1990
(weak) [31]. AdaBoost [20]. ,
, [38] [18]. [35].
[1, 2, 15]

, , , .
. .
,
[13, 11, 12, 14].
[24] [34] , ,
, .

1.7

, .
, .
.
, :
. : .
,
.
.
. , ,
.
. [9]
[25].
.
, , , .

, . ,
, .
, [21], [18], [2].
.
. ,
.
. ,
. , [32], [1].
.
. , ,
, . [19]
(random subspace method) [33].
, .
. , ,
. ,
. ,
, EM- [25].
, [3]. ,
.
,
[6, 7], , . (1.1).
,
. , ,
[1, 2].



[1] . . - // . 1998. . 38, 5. . 870880.
http://www.ccas.ru/frc/papers/voron98jvm.pdf. .
[2] . .
// . 2000.
. 40, 1. . 166176.
http://www.ccas.ru/frc/papers/voron00jvm.pdf. .
[3] . ., . . // .
2005. 2. . 5166.
http://www.ccas.ru/frc/papers/voron05twim.pdf. .
[4] . . // . 1976. 6.
[5] . . // . .
1976. . 231, 3.
[6] . .
// . 1978. . 33. . 568. .
[7] . ., . . () //
. 1987. . 187198.
http://www.ccas.ru/frc/papers/zhurrud87correct.pdf.
[8] . . // . 1971. 3.
[9] . . .
.: , 1990.
[10] . ., . . .
.: , 1981. P. 244.
[11] . .
// . 1987. 3.
. 106109.
[12] . . // . 1987.
4. . 7377.
http://www.ccas.ru/frc/papers/rudakov87symmetr.pdf.
[13] . .
// . 1987. 2. . 3035.
http://www.ccas.ru/frc/papers/rudakov87universal.pdf.

[14] . . // . 1988. 1. . 15.
http://www.ccas.ru/frc/papers/rudakov88universal.pdf.
[15] . ., . .
// . . 1999.
. 367, 3. . 314317.
http://www.ccas.ru/frc/papers/rudvoron99dan.pdf. .
[16] Adaptive mixtures of local experts / R. A. Jacobs, M. I. Jordan, S. J. Nowlan,
G. E. Hinton // Neural Computation. 1991. no. 3. Pp. 79–87.
[17] Boosting the margin: a new explanation for the effectiveness of voting methods /
R. E. Schapire, Y. Freund, W. S. Lee, P. Bartlett // Annals of Statistics. 1998.
Vol. 26, no. 5. Pp. 1651–1686.
http://citeseer.ist.psu.edu/article/schapire98boosting.html.
[18] Breiman L. Bagging predictors // Machine Learning. 1996. Vol. 24, no. 2.
Pp. 123–140.
http://citeseer.ist.psu.edu/breiman96bagging.html.
[19] Breiman L. Arcing classifiers // The Annals of Statistics. 1998. Vol. 26, no. 3.
Pp. 801–849.
http://citeseer.ist.psu.edu/breiman98arcing.html.
[20] Freund Y., Schapire R. E. A decision-theoretic generalization of on-line learning
and an application to boosting // European Conference on Computational Learning
Theory. 1995. Pp. 23–37.
http://citeseer.ist.psu.edu/article/freund95decisiontheoretic.html.
[21] Freund Y., Schapire R. E. Experiments with a new boosting algorithm //
International Conference on Machine Learning. 1996. Pp. 148–156.
http://citeseer.ist.psu.edu/freund96experiments.html.
[22] Friedman J., Hastie T., Tibshirani R. Additive logistic regression: a statistical view
of boosting: Tech. rep.: Dept. of Statistics, Stanford University Technical Report,
1998.
http://citeseer.ist.psu.edu/friedman98additive.html.
[23] Grove A. J., Schuurmans D. Boosting in the limit: Maximizing the margin of learned
ensembles // AAAI/IAAI. 1998. Pp. 692–699.
http://citeseer.ist.psu.edu/grove98boosting.html.
[24] Jain A. K., Duin R. P. W., Mao J. Statistical pattern recognition: A review //
IEEE Transactions on Pattern Analysis and Machine Intelligence. 2000. Vol. 22,
no. 1. Pp. 4–37.
http://citeseer.ist.psu.edu/article/jain00statistical.html.
[25] Jordan M. I., Jacobs R. A. Hierarchical mixtures of experts and the EM algorithm //
Neural Computation. 1994. no. 6. Pp. 181–214.
http://citeseer.ist.psu.edu/article/jordan94hierarchical.html.

[26] Littlestone N., Warmuth M. K. The weighted majority algorithm // IEEE
Symposium on Foundations of Computer Science. 1989. Pp. 256–261.
http://citeseer.ist.psu.edu/littlestone92weighted.html.
[27] Marchand M., Shawe-Taylor J. Learning with the set covering machine // Proc. 18th
International Conf. on Machine Learning. Morgan Kaufmann, San Francisco, CA,
2001. Pp. 345–352.
http://citeseer.ist.psu.edu/452556.html.
[28] Rätsch G., Onoda T., Müller K.-R. An improvement of AdaBoost to avoid
overfitting // Advances in Neural Information Processing Systems, Kitakyushu,
Japan. 1998. Pp. 506–509.
http://citeseer.ist.psu.edu/6344.html.
[29] Rätsch G., Onoda T., Müller K.-R. Soft margins for AdaBoost // Machine
Learning. 2001. Vol. 42, no. 3. Pp. 287–320.
http://citeseer.ist.psu.edu/ratsch00soft.html.
[30] Rivest R. L. Learning decision lists // Machine Learning. 1987. Vol. 2, no. 3.
Pp. 229–246.
http://citeseer.ist.psu.edu/rivest87learning.html.
[31] Schapire R. E. The strength of weak learnability // Machine Learning. 1990.
Vol. 5. Pp. 197–227.
http://citeseer.ist.psu.edu/schapire90strength.html.
[32] Schapire R. The boosting approach to machine learning: An overview // MSRI
Workshop on Nonlinear Estimation and Classification, Berkeley, CA. 2001.
http://citeseer.ist.psu.edu/schapire02boosting.html.
[33] Skurichina M., Duin R. P. W. Limited bagging, boosting and the random subspace
method for linear classifiers // Pattern Analysis & Applications. 2002. no. 5.
Pp. 121–135.
http://citeseer.ist.psu.edu/skurichina02limited.html.
[34] Tresp V. Committee machines // Handbook for Neural Network Signal Processing /
Ed. by Y. H. Hu, J.-N. Hwang. CRC Press, 2001.
http://citeseer.ist.psu.edu/tresp01committee.html.
[35] Tresp V., Taniguchi M. Combining estimators using non-constant weighting
functions // Advances in Neural Information Processing Systems / Ed. by G. Tesauro,
D. Touretzky, T. Leen. Vol. 7. The MIT Press, 1995. Pp. 419–426.
http://citeseer.ist.psu.edu/tresp95combining.html.
[36] Vapnik V. Statistical Learning Theory. Wiley, New York, 1998.
[37] Vapnik V. The nature of statistical learning theory. 2nd edition. Springer-Verlag,
New York, 2000.
[38] Wolpert D. H. Stacked generalization // Neural Networks. 1992. no. 5.
Pp. 241–259.
http://citeseer.ist.psu.edu/wolpert92stacked.html.