
9 Performance Optimization

9 Basic Optimization Algorithm
$$x_{k+1} = x_k + \alpha_k p_k$$

or

$$\Delta x_k = (x_{k+1} - x_k) = \alpha_k p_k$$

[Figure: vector diagram of the step $\alpha_k p_k$ taking $x_k$ to $x_{k+1}$]

$p_k$ - Search Direction
$\alpha_k$ - Learning Rate

9 Steepest Descent

Choose the next step so that the function decreases:

$$F(x_{k+1}) < F(x_k)$$

For small changes in x we can approximate F(x) by a first-order expansion:

$$F(x_{k+1}) = F(x_k + \Delta x_k) \approx F(x_k) + g_k^T \Delta x_k$$

where

$$g_k \equiv \nabla F(x)\big|_{x = x_k}$$

If we want the function to decrease:

$$g_k^T \Delta x_k = \alpha_k g_k^T p_k < 0$$

We can maximize the decrease by choosing:

$$p_k = -g_k$$

$$x_{k+1} = x_k - \alpha g_k$$
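
As a minimal sketch of this update rule (function and variable names here are my own, not from the slides), fixed-step steepest descent is only a few lines of NumPy:

```python
import numpy as np

def steepest_descent(grad, x0, alpha, n_iter):
    """Fixed-step steepest descent: x_{k+1} = x_k - alpha * grad(x_k)."""
    x = np.asarray(x0, dtype=float)
    for _ in range(n_iter):
        x = x - alpha * grad(x)  # step opposite the gradient
    return x
```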

9 Example

$$F(x) = x_1^2 + 2 x_1 x_2 + 2 x_2^2 + x_1$$

$$x_0 = \begin{bmatrix} 0.5 \\ 0.5 \end{bmatrix}, \qquad \alpha = 0.1$$

$$\nabla F(x) = \begin{bmatrix} \partial F / \partial x_1 \\ \partial F / \partial x_2 \end{bmatrix} = \begin{bmatrix} 2x_1 + 2x_2 + 1 \\ 2x_1 + 4x_2 \end{bmatrix}, \qquad g_0 = \nabla F(x)\big|_{x = x_0} = \begin{bmatrix} 3 \\ 3 \end{bmatrix}$$

$$x_1 = x_0 - \alpha g_0 = \begin{bmatrix} 0.5 \\ 0.5 \end{bmatrix} - 0.1 \begin{bmatrix} 3 \\ 3 \end{bmatrix} = \begin{bmatrix} 0.2 \\ 0.2 \end{bmatrix}$$

$$x_2 = x_1 - \alpha g_1 = \begin{bmatrix} 0.2 \\ 0.2 \end{bmatrix} - 0.1 \begin{bmatrix} 1.8 \\ 1.2 \end{bmatrix} = \begin{bmatrix} 0.02 \\ 0.08 \end{bmatrix}$$
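
A quick numeric check of these two iterations, reusing the steepest_descent sketch above:

```python
grad = lambda x: np.array([2*x[0] + 2*x[1] + 1, 2*x[0] + 4*x[1]])

print(steepest_descent(grad, [0.5, 0.5], alpha=0.1, n_iter=1))  # -> [0.2 0.2]
print(steepest_descent(grad, [0.5, 0.5], alpha=0.1, n_iter=2))  # -> [0.02 0.08]
```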

9 Plot

[Figure: contour plot of F(x) with the steepest-descent trajectory; both axes run from -2 to 2]

9 Stable Learning Rates (Quadratic)

$$F(x) = \frac{1}{2} x^T A x + d^T x + c$$

$$\nabla F(x) = Ax + d$$

$$x_{k+1} = x_k - \alpha g_k = x_k - \alpha (A x_k + d) \quad\Rightarrow\quad x_{k+1} = [I - \alpha A] x_k - \alpha d$$

Stability is determined by the eigenvalues of the matrix $[I - \alpha A]$:

$$[I - \alpha A] z_i = z_i - \alpha A z_i = z_i - \alpha \lambda_i z_i = (1 - \alpha \lambda_i) z_i$$

($\lambda_i$ - eigenvalue of $A$), so the eigenvalues of $[I - \alpha A]$ are $(1 - \alpha \lambda_i)$.

Stability requirement:

$$|1 - \alpha \lambda_i| < 1 \quad\Rightarrow\quad \alpha < \frac{2}{\lambda_i} \quad\Rightarrow\quad \alpha < \frac{2}{\lambda_{max}}$$
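
A sketch of this bound in code (the helper name is mine):

```python
import numpy as np

def max_stable_alpha(A):
    """Largest stable steepest-descent learning rate for a quadratic: 2 / lambda_max."""
    return 2.0 / np.linalg.eigvalsh(A).max()  # eigvalsh: eigenvalues of a symmetric matrix
```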

9 Example

$$A = \begin{bmatrix} 2 & 2 \\ 2 & 4 \end{bmatrix}, \qquad \lambda_1 = 0.764,\ z_1 = \begin{bmatrix} 0.851 \\ -0.526 \end{bmatrix}, \qquad \lambda_2 = 5.24,\ z_2 = \begin{bmatrix} 0.526 \\ 0.851 \end{bmatrix}$$

$$\alpha < \frac{2}{\lambda_{max}} = \frac{2}{5.24} = 0.38$$

[Figure: two contour plots of the trajectory, stable for α = 0.37 (left) and unstable for α = 0.39 (right)]
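
Checking the eigenvalues and the bound with the helper above:

```python
A = np.array([[2.0, 2.0], [2.0, 4.0]])
print(np.linalg.eigvalsh(A))  # -> approx [0.764 5.236]
print(max_stable_alpha(A))    # -> approx 0.382, so alpha = 0.37 is stable, 0.39 is not
```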

9 Minimizing Along a Line

Choose $\alpha_k$ to minimize $F(x_k + \alpha_k p_k)$:

$$\frac{d}{d\alpha_k} F(x_k + \alpha_k p_k) = \nabla F(x)^T\big|_{x = x_k}\, p_k + \alpha_k\, p_k^T\, \nabla^2 F(x)\big|_{x = x_k}\, p_k$$

Setting this derivative to zero gives

$$\alpha_k = -\frac{\nabla F(x)^T\big|_{x = x_k}\, p_k}{p_k^T\, \nabla^2 F(x)\big|_{x = x_k}\, p_k} = -\frac{g_k^T p_k}{p_k^T A_k p_k}$$

where

$$A_k \equiv \nabla^2 F(x)\big|_{x = x_k}$$
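
For a quadratic, this exact line-search step is one line of NumPy (a sketch; the function name is mine):

```python
import numpy as np

def exact_alpha(g, p, A):
    """Exact minimizing step along p: alpha = -g^T p / (p^T A p)."""
    return -(g @ p) / (p @ A @ p)
```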

9 Example

$$F(x) = \frac{1}{2} x^T \begin{bmatrix} 2 & 2 \\ 2 & 4 \end{bmatrix} x + \begin{bmatrix} 1 & 0 \end{bmatrix} x, \qquad x_0 = \begin{bmatrix} 0.5 \\ 0.5 \end{bmatrix}$$

$$\nabla F(x) = \begin{bmatrix} \partial F / \partial x_1 \\ \partial F / \partial x_2 \end{bmatrix} = \begin{bmatrix} 2x_1 + 2x_2 + 1 \\ 2x_1 + 4x_2 \end{bmatrix}, \qquad p_0 = -g_0 = -\nabla F(x)\big|_{x = x_0} = -\begin{bmatrix} 3 \\ 3 \end{bmatrix}$$

$$\alpha_0 = \frac{\begin{bmatrix} 3 & 3 \end{bmatrix} \begin{bmatrix} 3 \\ 3 \end{bmatrix}}{\begin{bmatrix} 3 & 3 \end{bmatrix} \begin{bmatrix} 2 & 2 \\ 2 & 4 \end{bmatrix} \begin{bmatrix} 3 \\ 3 \end{bmatrix}} = \frac{18}{90} = 0.2, \qquad x_1 = x_0 - \alpha_0 g_0 = \begin{bmatrix} 0.5 \\ 0.5 \end{bmatrix} - 0.2 \begin{bmatrix} 3 \\ 3 \end{bmatrix} = \begin{bmatrix} -0.1 \\ -0.1 \end{bmatrix}$$
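
The same $\alpha_0$ from the exact_alpha helper above:

```python
A = np.array([[2.0, 2.0], [2.0, 4.0]])
g0 = np.array([3.0, 3.0])
print(exact_alpha(g0, -g0, A))  # -> 0.2, matching the hand calculation
```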

9 Plot

[Figure: contour plot of F(x) with the line-search trajectory; successive steps are orthogonal]

The steps are orthogonal because, at the minimizing $\alpha_k$, the derivative along the line is zero:

$$\frac{d}{d\alpha_k} F(x_k + \alpha_k p_k) = \frac{d}{d\alpha_k} F(x_{k+1}) = \nabla F(x)^T\big|_{x = x_{k+1}} \frac{d}{d\alpha_k}[x_k + \alpha_k p_k] = \nabla F(x)^T\big|_{x = x_{k+1}}\, p_k = g_{k+1}^T p_k = 0$$

9 Newton's Method

$$F(x_{k+1}) = F(x_k + \Delta x_k) \approx F(x_k) + g_k^T \Delta x_k + \frac{1}{2} \Delta x_k^T A_k \Delta x_k$$

Take the gradient of this second-order approximation and set it equal to zero to find the stationary point:

$$g_k + A_k \Delta x_k = 0$$

$$\Delta x_k = -A_k^{-1} g_k$$

$$x_{k+1} = x_k - A_k^{-1} g_k$$
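
A minimal Newton-step sketch (names are mine; np.linalg.solve avoids forming the explicit inverse):

```python
import numpy as np

def newton_step(x, grad, hess):
    """One Newton iteration: x_{k+1} = x_k - A_k^{-1} g_k."""
    return x - np.linalg.solve(hess(x), grad(x))
```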


9 Example

$$F(x) = x_1^2 + 2 x_1 x_2 + 2 x_2^2 + x_1, \qquad x_0 = \begin{bmatrix} 0.5 \\ 0.5 \end{bmatrix}$$

$$\nabla F(x) = \begin{bmatrix} 2x_1 + 2x_2 + 1 \\ 2x_1 + 4x_2 \end{bmatrix}, \qquad g_0 = \nabla F(x)\big|_{x = x_0} = \begin{bmatrix} 3 \\ 3 \end{bmatrix}, \qquad A = \begin{bmatrix} 2 & 2 \\ 2 & 4 \end{bmatrix}$$

$$x_1 = \begin{bmatrix} 0.5 \\ 0.5 \end{bmatrix} - \begin{bmatrix} 2 & 2 \\ 2 & 4 \end{bmatrix}^{-1} \begin{bmatrix} 3 \\ 3 \end{bmatrix} = \begin{bmatrix} 0.5 \\ 0.5 \end{bmatrix} - \begin{bmatrix} 1 & -0.5 \\ -0.5 & 0.5 \end{bmatrix} \begin{bmatrix} 3 \\ 3 \end{bmatrix} = \begin{bmatrix} 0.5 \\ 0.5 \end{bmatrix} - \begin{bmatrix} 1.5 \\ 0 \end{bmatrix} = \begin{bmatrix} -1 \\ 0.5 \end{bmatrix}$$

Because F is quadratic, this single Newton step lands exactly on the minimum.
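
Verifying with the newton_step sketch above:

```python
gradF = lambda x: np.array([2*x[0] + 2*x[1] + 1, 2*x[0] + 4*x[1]])
hessF = lambda x: np.array([[2.0, 2.0], [2.0, 4.0]])

x1 = newton_step(np.array([0.5, 0.5]), gradF, hessF)
print(x1)         # -> [-1.   0.5]
print(gradF(x1))  # -> [0. 0.], confirming x1 is the stationary point
```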

9 Plot

[Figure: contour plot of F(x) showing the single Newton step from x0 to the minimum]

9 Non-Quadratic Example

$$F(x) = (x_2 - x_1)^4 + 8 x_1 x_2 - x_1 + x_2 + 3$$

Stationary points:

$$x^1 = \begin{bmatrix} -0.42 \\ 0.42 \end{bmatrix}, \qquad x^2 = \begin{bmatrix} -0.13 \\ 0.13 \end{bmatrix}, \qquad x^3 = \begin{bmatrix} 0.55 \\ -0.55 \end{bmatrix}$$

[Figure: contour plots of F(x) (left) and its second-order approximation F2(x) (right)]
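
A quick check that the rounded stationary points satisfy $\nabla F(x) \approx 0$ (a sketch; names are mine):

```python
import numpy as np

grad_nq = lambda x: np.array([-4*(x[1] - x[0])**3 + 8*x[1] - 1,
                               4*(x[1] - x[0])**3 + 8*x[0] + 1])

for p in ([-0.42, 0.42], [-0.13, 0.13], [0.55, -0.55]):
    print(p, grad_nq(np.array(p)))  # each gradient is ~0 (points rounded to 2 decimals)
```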

9 Different Initial Conditions

[Figure: three contour plots for F(x) (top row) and three for its second-order approximation F2(x) (bottom row), showing trajectories from different initial conditions]

9 Conjugate Vectors

$$F(x) = \frac{1}{2} x^T A x + d^T x + c$$

A set of vectors $\{p_k\}$ is mutually conjugate with respect to a positive definite Hessian matrix $A$ if

$$p_k^T A p_j = 0, \qquad k \neq j$$

One set of conjugate vectors consists of the eigenvectors of $A$:

$$z_k^T A z_j = \lambda_j z_k^T z_j = 0, \qquad k \neq j$$

(The eigenvectors of symmetric matrices are orthogonal.)
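
Checking both facts numerically (a sketch using the example Hessian from earlier slides):

```python
import numpy as np

A = np.array([[2.0, 2.0], [2.0, 4.0]])
lam, Z = np.linalg.eigh(A)    # columns of Z are the eigenvectors of A
print(Z[:, 0] @ A @ Z[:, 1])  # ~0: the eigenvectors are conjugate with respect to A
print(Z[:, 0] @ Z[:, 1])      # ~0: and orthogonal, since A is symmetric
```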


9 For Quadratic Functions

$$\nabla F(x) = Ax + d, \qquad \nabla^2 F(x) = A$$

The change in the gradient at iteration k is

$$\Delta g_k = g_{k+1} - g_k = (A x_{k+1} + d) - (A x_k + d) = A\, \Delta x_k$$

where

$$\Delta x_k = (x_{k+1} - x_k) = \alpha_k p_k$$

The conjugacy conditions can be rewritten

$$\alpha_k\, p_k^T A p_j = \Delta x_k^T A p_j = \Delta g_k^T p_j = 0, \qquad k \neq j$$

This does not require knowledge of the Hessian matrix.


9 Forming Conjugate Directions

Choose the initial search direction as the negative of the gradient:

$$p_0 = -g_0$$

Choose subsequent search directions to be conjugate:

$$p_k = -g_k + \beta_k p_{k-1}$$

where

$$\beta_k = \frac{\Delta g_{k-1}^T g_k}{\Delta g_{k-1}^T p_{k-1}} \qquad\text{or}\qquad \beta_k = \frac{g_k^T g_k}{g_{k-1}^T g_{k-1}} \qquad\text{or}\qquad \beta_k = \frac{\Delta g_{k-1}^T g_k}{g_{k-1}^T g_{k-1}}$$

(These are the Hestenes-Stiefel, Fletcher-Reeves, and Polak-Ribière choices, respectively.)
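
The three β choices as small helpers (a sketch; names are mine):

```python
import numpy as np

def beta_hestenes_stiefel(g_new, g_old, p_old):
    dg = g_new - g_old
    return (dg @ g_new) / (dg @ p_old)

def beta_fletcher_reeves(g_new, g_old):
    return (g_new @ g_new) / (g_old @ g_old)

def beta_polak_ribiere(g_new, g_old):
    return ((g_new - g_old) @ g_new) / (g_old @ g_old)
```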

9 Conjugate Gradient Algorithm

1. The first search direction is the negative of the gradient:

$$p_0 = -g_0$$

2. Select the learning rate to minimize along the line (exact for quadratic functions):

$$\alpha_k = -\frac{\nabla F(x)^T\big|_{x = x_k}\, p_k}{p_k^T\, \nabla^2 F(x)\big|_{x = x_k}\, p_k} = -\frac{g_k^T p_k}{p_k^T A_k p_k}$$

3. Select the next search direction using

$$p_k = -g_k + \beta_k p_{k-1}$$

4. If the algorithm has not converged, return to step 2.

A quadratic function will be minimized in at most n steps, where n is the number of parameters.
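
Putting the whole loop together for a quadratic $F(x) = \frac{1}{2} x^T A x + d^T x + c$, with the Fletcher-Reeves β (a sketch; function and variable names are mine):

```python
import numpy as np

def conjugate_gradient_quadratic(A, d, x0):
    """Minimize 0.5 x^T A x + d^T x + c; converges in at most n steps."""
    x = np.asarray(x0, dtype=float)
    g = A @ x + d                        # gradient of the quadratic
    p = -g                               # step 1: p0 = -g0
    for _ in range(len(x)):
        alpha = -(g @ p) / (p @ A @ p)   # step 2: exact line minimization
        x = x + alpha * p
        g_new = A @ x + d
        beta = (g_new @ g_new) / (g @ g) # Fletcher-Reeves beta
        p = -g_new + beta * p            # step 3: next conjugate direction
        g = g_new
    return x

A = np.array([[2.0, 2.0], [2.0, 4.0]])
d = np.array([1.0, 0.0])
print(conjugate_gradient_quadratic(A, d, [0.5, 0.5]))  # -> [-1.   0.5]
```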


9 Example

$$F(x) = \frac{1}{2} x^T \begin{bmatrix} 2 & 2 \\ 2 & 4 \end{bmatrix} x + \begin{bmatrix} 1 & 0 \end{bmatrix} x, \qquad x_0 = \begin{bmatrix} 0.5 \\ 0.5 \end{bmatrix}$$

$$\nabla F(x) = \begin{bmatrix} \partial F / \partial x_1 \\ \partial F / \partial x_2 \end{bmatrix} = \begin{bmatrix} 2x_1 + 2x_2 + 1 \\ 2x_1 + 4x_2 \end{bmatrix}, \qquad p_0 = -g_0 = -\nabla F(x)\big|_{x = x_0} = -\begin{bmatrix} 3 \\ 3 \end{bmatrix}$$

$$\alpha_0 = \frac{\begin{bmatrix} 3 & 3 \end{bmatrix} \begin{bmatrix} 3 \\ 3 \end{bmatrix}}{\begin{bmatrix} 3 & 3 \end{bmatrix} \begin{bmatrix} 2 & 2 \\ 2 & 4 \end{bmatrix} \begin{bmatrix} 3 \\ 3 \end{bmatrix}} = \frac{18}{90} = 0.2, \qquad x_1 = x_0 - \alpha_0 g_0 = \begin{bmatrix} 0.5 \\ 0.5 \end{bmatrix} - 0.2 \begin{bmatrix} 3 \\ 3 \end{bmatrix} = \begin{bmatrix} -0.1 \\ -0.1 \end{bmatrix}$$

(The first step is identical to steepest descent with exact line search, since $p_0 = -g_0$.)

9 Example

$$g_1 = \nabla F(x)\big|_{x = x_1} = \begin{bmatrix} 2 & 2 \\ 2 & 4 \end{bmatrix} \begin{bmatrix} -0.1 \\ -0.1 \end{bmatrix} + \begin{bmatrix} 1 \\ 0 \end{bmatrix} = \begin{bmatrix} 0.6 \\ -0.6 \end{bmatrix}$$

$$\beta_1 = \frac{g_1^T g_1}{g_0^T g_0} = \frac{\begin{bmatrix} 0.6 & -0.6 \end{bmatrix} \begin{bmatrix} 0.6 \\ -0.6 \end{bmatrix}}{\begin{bmatrix} 3 & 3 \end{bmatrix} \begin{bmatrix} 3 \\ 3 \end{bmatrix}} = \frac{0.72}{18} = 0.04$$

$$p_1 = -g_1 + \beta_1 p_0 = -\begin{bmatrix} 0.6 \\ -0.6 \end{bmatrix} + 0.04 \begin{bmatrix} -3 \\ -3 \end{bmatrix} = \begin{bmatrix} -0.72 \\ 0.48 \end{bmatrix}$$

$$\alpha_1 = -\frac{\begin{bmatrix} 0.6 & -0.6 \end{bmatrix} \begin{bmatrix} -0.72 \\ 0.48 \end{bmatrix}}{\begin{bmatrix} -0.72 & 0.48 \end{bmatrix} \begin{bmatrix} 2 & 2 \\ 2 & 4 \end{bmatrix} \begin{bmatrix} -0.72 \\ 0.48 \end{bmatrix}} = \frac{0.72}{0.576} = 1.25$$
9 Plots

$$x_2 = x_1 + \alpha_1 p_1 = \begin{bmatrix} -0.1 \\ -0.1 \end{bmatrix} + 1.25 \begin{bmatrix} -0.72 \\ 0.48 \end{bmatrix} = \begin{bmatrix} -1 \\ 0.5 \end{bmatrix}$$

[Figure: contour plots comparing the conjugate gradient trajectory (two steps to the minimum) with the steepest-descent trajectory]
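
These hand calculations match a direct NumPy evaluation (a sketch reusing A and d from the conjugate-gradient example above):

```python
g0, p0 = np.array([3.0, 3.0]), np.array([-3.0, -3.0])
x1 = np.array([-0.1, -0.1])
g1 = A @ x1 + d                      # -> [ 0.6 -0.6]
beta1 = (g1 @ g1) / (g0 @ g0)        # -> 0.04
p1 = -g1 + beta1 * p0                # -> [-0.72  0.48]
alpha1 = -(g1 @ p1) / (p1 @ A @ p1)  # -> 1.25
print(x1 + alpha1 * p1)              # -> [-1.   0.5], the minimum in two steps
```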
