
9 Performance Optimization

9 Basic Optimization Algorithm
$$x_{k+1} = x_k + \alpha_k p_k$$

or

$$\Delta x_k = (x_{k+1} - x_k) = \alpha_k p_k$$

[Figure: vector diagram of the step $\alpha_k p_k$ taking $x_k$ to $x_{k+1}$]

$p_k$ - Search Direction
$\alpha_k$ - Learning Rate

9 Steepest Descent

Choose the next step so that the function decreases:

$$F(x_{k+1}) < F(x_k)$$

For small changes in x we can approximate F(x) by a first-order expansion:

$$F(x_{k+1}) = F(x_k + \Delta x_k) \approx F(x_k) + g_k^T \Delta x_k$$

where

$$g_k \equiv \nabla F(x)\big|_{x = x_k}$$

If we want the function to decrease:

$$g_k^T \Delta x_k = \alpha_k g_k^T p_k < 0$$

We can maximize the decrease by choosing:

$$p_k = -g_k$$

$$x_{k+1} = x_k - \alpha g_k$$
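
As a minimal sketch of this update rule (function and variable names here are my own, not from the slides), fixed-step steepest descent is only a few lines of NumPy:

```python
import numpy as np

def steepest_descent(grad, x0, alpha, n_iter):
    """Fixed-step steepest descent: x_{k+1} = x_k - alpha * grad(x_k)."""
    x = np.asarray(x0, dtype=float)
    for _ in range(n_iter):
        x = x - alpha * grad(x)  # step opposite the gradient
    return x
```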

9 Example

$$F(x) = x_1^2 + 2 x_1 x_2 + 2 x_2^2 + x_1$$

$$x_0 = \begin{bmatrix} 0.5 \\ 0.5 \end{bmatrix}, \qquad \alpha = 0.1$$

$$\nabla F(x) = \begin{bmatrix} \partial F / \partial x_1 \\ \partial F / \partial x_2 \end{bmatrix} = \begin{bmatrix} 2x_1 + 2x_2 + 1 \\ 2x_1 + 4x_2 \end{bmatrix}, \qquad g_0 = \nabla F(x)\big|_{x = x_0} = \begin{bmatrix} 3 \\ 3 \end{bmatrix}$$

$$x_1 = x_0 - \alpha g_0 = \begin{bmatrix} 0.5 \\ 0.5 \end{bmatrix} - 0.1 \begin{bmatrix} 3 \\ 3 \end{bmatrix} = \begin{bmatrix} 0.2 \\ 0.2 \end{bmatrix}$$

$$x_2 = x_1 - \alpha g_1 = \begin{bmatrix} 0.2 \\ 0.2 \end{bmatrix} - 0.1 \begin{bmatrix} 1.8 \\ 1.2 \end{bmatrix} = \begin{bmatrix} 0.02 \\ 0.08 \end{bmatrix}$$
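
A quick numeric check of these two iterations, reusing the steepest_descent sketch above:

```python
grad = lambda x: np.array([2*x[0] + 2*x[1] + 1, 2*x[0] + 4*x[1]])

print(steepest_descent(grad, [0.5, 0.5], alpha=0.1, n_iter=1))  # -> [0.2 0.2]
print(steepest_descent(grad, [0.5, 0.5], alpha=0.1, n_iter=2))  # -> [0.02 0.08]
```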

9 Plot

[Figure: contour plot of F(x) with the steepest-descent trajectory; both axes run from -2 to 2]

9 Stable Learning Rates (Quadratic)

$$F(x) = \frac{1}{2} x^T A x + d^T x + c$$

$$\nabla F(x) = Ax + d$$

$$x_{k+1} = x_k - \alpha g_k = x_k - \alpha (A x_k + d) \quad\Rightarrow\quad x_{k+1} = [I - \alpha A] x_k - \alpha d$$

Stability is determined by the eigenvalues of the matrix $[I - \alpha A]$:

$$[I - \alpha A] z_i = z_i - \alpha A z_i = z_i - \alpha \lambda_i z_i = (1 - \alpha \lambda_i) z_i$$

($\lambda_i$ - eigenvalue of $A$), so the eigenvalues of $[I - \alpha A]$ are $(1 - \alpha \lambda_i)$.

Stability requirement:

$$|1 - \alpha \lambda_i| < 1 \quad\Rightarrow\quad \alpha < \frac{2}{\lambda_i} \quad\Rightarrow\quad \alpha < \frac{2}{\lambda_{max}}$$
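
A sketch of this bound in code (the helper name is mine):

```python
import numpy as np

def max_stable_alpha(A):
    """Largest stable steepest-descent learning rate for a quadratic: 2 / lambda_max."""
    return 2.0 / np.linalg.eigvalsh(A).max()  # eigvalsh: eigenvalues of a symmetric matrix
```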

9 Example

$$A = \begin{bmatrix} 2 & 2 \\ 2 & 4 \end{bmatrix}, \qquad \lambda_1 = 0.764,\ z_1 = \begin{bmatrix} 0.851 \\ -0.526 \end{bmatrix}, \qquad \lambda_2 = 5.24,\ z_2 = \begin{bmatrix} 0.526 \\ 0.851 \end{bmatrix}$$

$$\alpha < \frac{2}{\lambda_{max}} = \frac{2}{5.24} = 0.38$$

[Figure: two contour plots of the trajectory, stable for α = 0.37 (left) and unstable for α = 0.39 (right)]
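
Checking the eigenvalues and the bound with the helper above:

```python
A = np.array([[2.0, 2.0], [2.0, 4.0]])
print(np.linalg.eigvalsh(A))  # -> approx [0.764 5.236]
print(max_stable_alpha(A))    # -> approx 0.382, so alpha = 0.37 is stable, 0.39 is not
```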

9 Minimizing Along a Line

Choose $\alpha_k$ to minimize $F(x_k + \alpha_k p_k)$:

$$\frac{d}{d\alpha_k} F(x_k + \alpha_k p_k) = \nabla F(x)^T\big|_{x = x_k}\, p_k + \alpha_k\, p_k^T\, \nabla^2 F(x)\big|_{x = x_k}\, p_k$$

Setting this derivative to zero gives

$$\alpha_k = -\frac{\nabla F(x)^T\big|_{x = x_k}\, p_k}{p_k^T\, \nabla^2 F(x)\big|_{x = x_k}\, p_k} = -\frac{g_k^T p_k}{p_k^T A_k p_k}$$

where

$$A_k \equiv \nabla^2 F(x)\big|_{x = x_k}$$
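
For a quadratic, this exact line-search step is one line of NumPy (a sketch; the function name is mine):

```python
import numpy as np

def exact_alpha(g, p, A):
    """Exact minimizing step along p: alpha = -g^T p / (p^T A p)."""
    return -(g @ p) / (p @ A @ p)
```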

9 Example

$$F(x) = \frac{1}{2} x^T \begin{bmatrix} 2 & 2 \\ 2 & 4 \end{bmatrix} x + \begin{bmatrix} 1 & 0 \end{bmatrix} x, \qquad x_0 = \begin{bmatrix} 0.5 \\ 0.5 \end{bmatrix}$$

$$\nabla F(x) = \begin{bmatrix} \partial F / \partial x_1 \\ \partial F / \partial x_2 \end{bmatrix} = \begin{bmatrix} 2x_1 + 2x_2 + 1 \\ 2x_1 + 4x_2 \end{bmatrix}, \qquad p_0 = -g_0 = -\nabla F(x)\big|_{x = x_0} = -\begin{bmatrix} 3 \\ 3 \end{bmatrix}$$

$$\alpha_0 = \frac{\begin{bmatrix} 3 & 3 \end{bmatrix} \begin{bmatrix} 3 \\ 3 \end{bmatrix}}{\begin{bmatrix} 3 & 3 \end{bmatrix} \begin{bmatrix} 2 & 2 \\ 2 & 4 \end{bmatrix} \begin{bmatrix} 3 \\ 3 \end{bmatrix}} = \frac{18}{90} = 0.2, \qquad x_1 = x_0 - \alpha_0 g_0 = \begin{bmatrix} 0.5 \\ 0.5 \end{bmatrix} - 0.2 \begin{bmatrix} 3 \\ 3 \end{bmatrix} = \begin{bmatrix} -0.1 \\ -0.1 \end{bmatrix}$$
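
The same $\alpha_0$ from the exact_alpha helper above:

```python
A = np.array([[2.0, 2.0], [2.0, 4.0]])
g0 = np.array([3.0, 3.0])
print(exact_alpha(g0, -g0, A))  # -> 0.2, matching the hand calculation
```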

9 Plot

[Figure: contour plot of F(x) with the line-search trajectory; successive steps are orthogonal]

The steps are orthogonal because, at the minimizing $\alpha_k$, the derivative along the line is zero:

$$\frac{d}{d\alpha_k} F(x_k + \alpha_k p_k) = \frac{d}{d\alpha_k} F(x_{k+1}) = \nabla F(x)^T\big|_{x = x_{k+1}} \frac{d}{d\alpha_k}[x_k + \alpha_k p_k] = \nabla F(x)^T\big|_{x = x_{k+1}}\, p_k = g_{k+1}^T p_k = 0$$

9 Newton's Method

$$F(x_{k+1}) = F(x_k + \Delta x_k) \approx F(x_k) + g_k^T \Delta x_k + \frac{1}{2} \Delta x_k^T A_k \Delta x_k$$

Take the gradient of this second-order approximation and set it equal to zero to find the stationary point:

$$g_k + A_k \Delta x_k = 0$$

$$\Delta x_k = -A_k^{-1} g_k$$

$$x_{k+1} = x_k - A_k^{-1} g_k$$
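
A minimal Newton-step sketch (names are mine; np.linalg.solve avoids forming the explicit inverse):

```python
import numpy as np

def newton_step(x, grad, hess):
    """One Newton iteration: x_{k+1} = x_k - A_k^{-1} g_k."""
    return x - np.linalg.solve(hess(x), grad(x))
```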


9 Example

$$F(x) = x_1^2 + 2 x_1 x_2 + 2 x_2^2 + x_1, \qquad x_0 = \begin{bmatrix} 0.5 \\ 0.5 \end{bmatrix}$$

$$\nabla F(x) = \begin{bmatrix} 2x_1 + 2x_2 + 1 \\ 2x_1 + 4x_2 \end{bmatrix}, \qquad g_0 = \nabla F(x)\big|_{x = x_0} = \begin{bmatrix} 3 \\ 3 \end{bmatrix}, \qquad A = \begin{bmatrix} 2 & 2 \\ 2 & 4 \end{bmatrix}$$

$$x_1 = \begin{bmatrix} 0.5 \\ 0.5 \end{bmatrix} - \begin{bmatrix} 2 & 2 \\ 2 & 4 \end{bmatrix}^{-1} \begin{bmatrix} 3 \\ 3 \end{bmatrix} = \begin{bmatrix} 0.5 \\ 0.5 \end{bmatrix} - \begin{bmatrix} 1 & -0.5 \\ -0.5 & 0.5 \end{bmatrix} \begin{bmatrix} 3 \\ 3 \end{bmatrix} = \begin{bmatrix} 0.5 \\ 0.5 \end{bmatrix} - \begin{bmatrix} 1.5 \\ 0 \end{bmatrix} = \begin{bmatrix} -1 \\ 0.5 \end{bmatrix}$$

Because F is quadratic, this single Newton step lands exactly on the minimum.
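
Verifying with the newton_step sketch above:

```python
gradF = lambda x: np.array([2*x[0] + 2*x[1] + 1, 2*x[0] + 4*x[1]])
hessF = lambda x: np.array([[2.0, 2.0], [2.0, 4.0]])

x1 = newton_step(np.array([0.5, 0.5]), gradF, hessF)
print(x1)         # -> [-1.   0.5]
print(gradF(x1))  # -> [0. 0.], confirming x1 is the stationary point
```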

9 Plot

[Figure: contour plot of F(x) showing the single Newton step from x0 to the minimum]

9 Non-Quadratic Example

$$F(x) = (x_2 - x_1)^4 + 8 x_1 x_2 - x_1 + x_2 + 3$$

Stationary points:

$$x^1 = \begin{bmatrix} -0.42 \\ 0.42 \end{bmatrix}, \qquad x^2 = \begin{bmatrix} -0.13 \\ 0.13 \end{bmatrix}, \qquad x^3 = \begin{bmatrix} 0.55 \\ -0.55 \end{bmatrix}$$

[Figure: contour plots of F(x) (left) and its second-order approximation F2(x) (right)]
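
A quick check that the rounded stationary points satisfy $\nabla F(x) \approx 0$ (a sketch; names are mine):

```python
import numpy as np

grad_nq = lambda x: np.array([-4*(x[1] - x[0])**3 + 8*x[1] - 1,
                               4*(x[1] - x[0])**3 + 8*x[0] + 1])

for p in ([-0.42, 0.42], [-0.13, 0.13], [0.55, -0.55]):
    print(p, grad_nq(np.array(p)))  # each gradient is ~0 (points rounded to 2 decimals)
```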

9 Different Initial Conditions

[Figure: three contour plots for F(x) (top row) and three for its second-order approximation F2(x) (bottom row), showing trajectories from different initial conditions]

9 Conjugate Vectors

$$F(x) = \frac{1}{2} x^T A x + d^T x + c$$

A set of vectors $\{p_k\}$ is mutually conjugate with respect to a positive definite Hessian matrix $A$ if

$$p_k^T A p_j = 0, \qquad k \neq j$$

One set of conjugate vectors consists of the eigenvectors of $A$:

$$z_k^T A z_j = \lambda_j z_k^T z_j = 0, \qquad k \neq j$$

(The eigenvectors of symmetric matrices are orthogonal.)
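
Checking both facts numerically (a sketch using the example Hessian from earlier slides):

```python
import numpy as np

A = np.array([[2.0, 2.0], [2.0, 4.0]])
lam, Z = np.linalg.eigh(A)    # columns of Z are the eigenvectors of A
print(Z[:, 0] @ A @ Z[:, 1])  # ~0: the eigenvectors are conjugate with respect to A
print(Z[:, 0] @ Z[:, 1])      # ~0: and orthogonal, since A is symmetric
```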


9 For Quadratic Functions

$$\nabla F(x) = Ax + d, \qquad \nabla^2 F(x) = A$$

The change in the gradient at iteration k is

$$\Delta g_k = g_{k+1} - g_k = (A x_{k+1} + d) - (A x_k + d) = A\, \Delta x_k$$

where

$$\Delta x_k = (x_{k+1} - x_k) = \alpha_k p_k$$

The conjugacy conditions can be rewritten

$$\alpha_k\, p_k^T A p_j = \Delta x_k^T A p_j = \Delta g_k^T p_j = 0, \qquad k \neq j$$

This does not require knowledge of the Hessian matrix.


9 Forming Conjugate Directions

Choose the initial search direction as the negative of the gradient:

$$p_0 = -g_0$$

Choose subsequent search directions to be conjugate:

$$p_k = -g_k + \beta_k p_{k-1}$$

where

$$\beta_k = \frac{\Delta g_{k-1}^T g_k}{\Delta g_{k-1}^T p_{k-1}} \qquad\text{or}\qquad \beta_k = \frac{g_k^T g_k}{g_{k-1}^T g_{k-1}} \qquad\text{or}\qquad \beta_k = \frac{\Delta g_{k-1}^T g_k}{g_{k-1}^T g_{k-1}}$$

(These are the Hestenes-Stiefel, Fletcher-Reeves, and Polak-Ribière choices, respectively.)
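
The three β choices as small helpers (a sketch; names are mine):

```python
import numpy as np

def beta_hestenes_stiefel(g_new, g_old, p_old):
    dg = g_new - g_old
    return (dg @ g_new) / (dg @ p_old)

def beta_fletcher_reeves(g_new, g_old):
    return (g_new @ g_new) / (g_old @ g_old)

def beta_polak_ribiere(g_new, g_old):
    return ((g_new - g_old) @ g_new) / (g_old @ g_old)
```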

9 Conjugate Gradient Algorithm

1. The first search direction is the negative of the gradient:

$$p_0 = -g_0$$

2. Select the learning rate to minimize along the line (exact for quadratic functions):

$$\alpha_k = -\frac{\nabla F(x)^T\big|_{x = x_k}\, p_k}{p_k^T\, \nabla^2 F(x)\big|_{x = x_k}\, p_k} = -\frac{g_k^T p_k}{p_k^T A_k p_k}$$

3. Select the next search direction using

$$p_k = -g_k + \beta_k p_{k-1}$$

4. If the algorithm has not converged, return to step 2.

A quadratic function will be minimized in at most n steps, where n is the number of parameters.
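
Putting the whole loop together for a quadratic $F(x) = \frac{1}{2} x^T A x + d^T x + c$, with the Fletcher-Reeves β (a sketch; function and variable names are mine):

```python
import numpy as np

def conjugate_gradient_quadratic(A, d, x0):
    """Minimize 0.5 x^T A x + d^T x + c; converges in at most n steps."""
    x = np.asarray(x0, dtype=float)
    g = A @ x + d                        # gradient of the quadratic
    p = -g                               # step 1: p0 = -g0
    for _ in range(len(x)):
        alpha = -(g @ p) / (p @ A @ p)   # step 2: exact line minimization
        x = x + alpha * p
        g_new = A @ x + d
        beta = (g_new @ g_new) / (g @ g) # Fletcher-Reeves beta
        p = -g_new + beta * p            # step 3: next conjugate direction
        g = g_new
    return x

A = np.array([[2.0, 2.0], [2.0, 4.0]])
d = np.array([1.0, 0.0])
print(conjugate_gradient_quadratic(A, d, [0.5, 0.5]))  # -> [-1.   0.5]
```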


9 Example

$$F(x) = \frac{1}{2} x^T \begin{bmatrix} 2 & 2 \\ 2 & 4 \end{bmatrix} x + \begin{bmatrix} 1 & 0 \end{bmatrix} x, \qquad x_0 = \begin{bmatrix} 0.5 \\ 0.5 \end{bmatrix}$$

$$\nabla F(x) = \begin{bmatrix} \partial F / \partial x_1 \\ \partial F / \partial x_2 \end{bmatrix} = \begin{bmatrix} 2x_1 + 2x_2 + 1 \\ 2x_1 + 4x_2 \end{bmatrix}, \qquad p_0 = -g_0 = -\nabla F(x)\big|_{x = x_0} = -\begin{bmatrix} 3 \\ 3 \end{bmatrix}$$

$$\alpha_0 = \frac{\begin{bmatrix} 3 & 3 \end{bmatrix} \begin{bmatrix} 3 \\ 3 \end{bmatrix}}{\begin{bmatrix} 3 & 3 \end{bmatrix} \begin{bmatrix} 2 & 2 \\ 2 & 4 \end{bmatrix} \begin{bmatrix} 3 \\ 3 \end{bmatrix}} = \frac{18}{90} = 0.2, \qquad x_1 = x_0 - \alpha_0 g_0 = \begin{bmatrix} 0.5 \\ 0.5 \end{bmatrix} - 0.2 \begin{bmatrix} 3 \\ 3 \end{bmatrix} = \begin{bmatrix} -0.1 \\ -0.1 \end{bmatrix}$$

(The first step is identical to steepest descent with exact line search, since $p_0 = -g_0$.)

9 Example

$$g_1 = \nabla F(x)\big|_{x = x_1} = \begin{bmatrix} 2 & 2 \\ 2 & 4 \end{bmatrix} \begin{bmatrix} -0.1 \\ -0.1 \end{bmatrix} + \begin{bmatrix} 1 \\ 0 \end{bmatrix} = \begin{bmatrix} 0.6 \\ -0.6 \end{bmatrix}$$

$$\beta_1 = \frac{g_1^T g_1}{g_0^T g_0} = \frac{\begin{bmatrix} 0.6 & -0.6 \end{bmatrix} \begin{bmatrix} 0.6 \\ -0.6 \end{bmatrix}}{\begin{bmatrix} 3 & 3 \end{bmatrix} \begin{bmatrix} 3 \\ 3 \end{bmatrix}} = \frac{0.72}{18} = 0.04$$

$$p_1 = -g_1 + \beta_1 p_0 = -\begin{bmatrix} 0.6 \\ -0.6 \end{bmatrix} + 0.04 \begin{bmatrix} -3 \\ -3 \end{bmatrix} = \begin{bmatrix} -0.72 \\ 0.48 \end{bmatrix}$$

$$\alpha_1 = -\frac{\begin{bmatrix} 0.6 & -0.6 \end{bmatrix} \begin{bmatrix} -0.72 \\ 0.48 \end{bmatrix}}{\begin{bmatrix} -0.72 & 0.48 \end{bmatrix} \begin{bmatrix} 2 & 2 \\ 2 & 4 \end{bmatrix} \begin{bmatrix} -0.72 \\ 0.48 \end{bmatrix}} = \frac{0.72}{0.576} = 1.25$$
9 Plots

$$x_2 = x_1 + \alpha_1 p_1 = \begin{bmatrix} -0.1 \\ -0.1 \end{bmatrix} + 1.25 \begin{bmatrix} -0.72 \\ 0.48 \end{bmatrix} = \begin{bmatrix} -1 \\ 0.5 \end{bmatrix}$$

[Figure: contour plots comparing the conjugate gradient trajectory (two steps to the minimum) with the steepest-descent trajectory]
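
These hand calculations match a direct NumPy evaluation (a sketch reusing A and d from the conjugate-gradient example above):

```python
g0, p0 = np.array([3.0, 3.0]), np.array([-3.0, -3.0])
x1 = np.array([-0.1, -0.1])
g1 = A @ x1 + d                      # -> [ 0.6 -0.6]
beta1 = (g1 @ g1) / (g0 @ g0)        # -> 0.04
p1 = -g1 + beta1 * p0                # -> [-0.72  0.48]
alpha1 = -(g1 @ p1) / (p1 @ A @ p1)  # -> 1.25
print(x1 + alpha1 * p1)              # -> [-1.   0.5], the minimum in two steps
```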
