6. Quasi-Newton methods
• variable metric methods
• quasi-Newton methods
• BFGS update
• limited-memory quasi-Newton methods
6-1
Newton method

x+ = x − t ∇²f(x)^{-1} ∇f(x)

advantages: fast convergence, affine invariance

disadvantages: requires second derivatives; the solution of a linear equation can be too expensive for large-scale applications
Variable metric methods

x+ = x − t H^{-1} ∇f(x)

the positive definite matrix H is an approximation of the Hessian, chosen to avoid the computation of second derivatives

'variable metric' interpretation: Δx = −H^{-1}∇f(x) is the steepest descent direction at x for the quadratic norm

‖z‖_H = (z^T H z)^{1/2}
Quasi-Newton methods
given starting point x^(0) ∈ dom f, H_0 ≻ 0

for k = 1, 2, . . ., until a stopping criterion is satisfied

1. compute quasi-Newton direction Δx = −H_{k-1}^{-1} ∇f(x^(k-1))
2. determine step size t (e.g., by backtracking line search)
3. compute x^(k) = x^(k-1) + t Δx
4. compute H_k
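Steps 1–4 can be sketched as a generic Python loop. The update rule for H_k (step 4) is passed in as a callback, and the initial H_0 = I, the tolerance, and the backtracking parameters alpha, beta are illustrative assumptions, not values prescribed by the slides:

```python
import numpy as np

def quasi_newton(f, grad, x0, update_H_inv, tol=1e-8, max_iter=100,
                 alpha=0.25, beta=0.5):
    """Generic quasi-Newton iteration (steps 1-4 above).

    update_H_inv(H_inv, s, y) returns the next inverse Hessian
    approximation; alpha, beta are assumed backtracking parameters.
    """
    x = np.asarray(x0, dtype=float)
    H_inv = np.eye(x.size)               # H_0 = I (a common, assumed choice)
    for _ in range(max_iter):
        g = grad(x)
        if np.linalg.norm(g) < tol:      # stopping criterion
            break
        dx = -H_inv @ g                  # 1. quasi-Newton direction
        t = 1.0                          # 2. backtracking line search
        while f(x + t * dx) > f(x) + alpha * t * (g @ dx):
            t *= beta
        x_new = x + t * dx               # 3. take the step
        s, y = x_new - x, grad(x_new) - g
        H_inv = update_H_inv(H_inv, s, y)  # 4. update the approximation
        x = x_new
    return x
```

With the trivial update that keeps H_inv = I, the loop reduces to gradient descent with backtracking, which is a convenient way to test the scaffolding before plugging in a BFGS-style update.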
BFGS update

H_k = H_{k-1} + (y y^T)/(y^T s) − (H_{k-1} s s^T H_{k-1})/(s^T H_{k-1} s)

where s = x^(k) − x^(k-1), y = ∇f(x^(k)) − ∇f(x^(k-1))

inverse update:

H_k^{-1} = (I − (s y^T)/(y^T s)) H_{k-1}^{-1} (I − (y s^T)/(y^T s)) + (s s^T)/(y^T s)

• note that y^T s > 0 for strictly convex f; see page 1-10
• cost of the update or inverse update is O(n²) operations
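A minimal transcription of the inverse update, together with a check of the inverse secant condition H_k^{-1} y = s (the particular vectors s, y are illustrative):

```python
import numpy as np

def bfgs_inverse_update(H_inv, s, y):
    """Inverse BFGS update:
    H_k^{-1} = (I - s y^T/(y^T s)) H_{k-1}^{-1} (I - y s^T/(y^T s))
               + s s^T/(y^T s).
    Requires the curvature condition y^T s > 0.
    """
    rho = 1.0 / (y @ s)
    V = np.eye(s.size) - rho * np.outer(s, y)   # I - s y^T / (y^T s)
    return V @ H_inv @ V.T + rho * np.outer(s, s)

# inverse secant condition: H_k^{-1} y = s
s = np.array([1.0, 2.0])
y = np.array([3.0, 1.0])                        # y^T s = 5 > 0
assert np.allclose(bfgs_inverse_update(np.eye(2), s, y) @ y, s)
```

Since the update is a few outer products and matrix multiplies, its O(n²) cost is visible directly in the code.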
Positive definiteness

if y^T s > 0, the BFGS update preserves positive definiteness of H_k

proof: from the inverse update, for any v,

v^T H_k^{-1} v = (v − ((s^T v)/(y^T s)) y)^T H_{k-1}^{-1} (v − ((s^T v)/(y^T s)) y) + (s^T v)²/(y^T s)

• if H_{k-1} ≻ 0, both terms are nonnegative for all v
• the second term is zero only if s^T v = 0; then the first term is zero only if v = 0
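The preservation of positive definiteness can be illustrated numerically: apply a sequence of inverse BFGS updates with random pairs satisfying y^T s > 0 and check that the smallest eigenvalue stays positive. The helper, the dimension, and the curvature threshold are assumptions of this sketch:

```python
import numpy as np

# Numerical illustration of the argument above: repeated inverse BFGS
# updates with y^T s > 0 keep H_k^{-1} (hence H_k) positive definite.
def bfgs_inverse_update(H_inv, s, y):
    rho = 1.0 / (y @ s)
    V = np.eye(s.size) - rho * np.outer(s, y)
    return V @ H_inv @ V.T + rho * np.outer(s, s)

rng = np.random.default_rng(0)
H_inv = np.eye(4)
for _ in range(20):
    s = rng.standard_normal(4)
    y = rng.standard_normal(4)
    if y @ s < 0.1:          # skip pairs that violate (or barely satisfy)
        continue             # the curvature condition y^T s > 0
    H_inv = bfgs_inverse_update(H_inv, s, y)

min_eig = np.linalg.eigvalsh((H_inv + H_inv.T) / 2).min()
assert min_eig > 0           # positive definiteness is preserved
```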
Secant condition
the BFGS update satisfies the secant condition H_k s = y, i.e.,

H_k (x^(k) − x^(k-1)) = ∇f(x^(k)) − ∇f(x^(k-1))

interpretation: define a quadratic approximation of f at x^(k)

f_quad(z) = f(x^(k)) + ∇f(x^(k))^T (z − x^(k)) + (1/2)(z − x^(k))^T H_k (z − x^(k))

the secant condition implies that the gradient of f_quad agrees with ∇f at x^(k-1):

∇f_quad(x^(k-1)) = ∇f(x^(k)) + H_k (x^(k-1) − x^(k)) = ∇f(x^(k-1))
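The secant condition can be checked directly on the direct-form BFGS update; the particular H_{k-1}, s, y below are illustrative:

```python
import numpy as np

# Check the secant condition H_k s = y for the direct BFGS update
# H_k = H + y y^T/(y^T s) - H s s^T H / (s^T H s).
def bfgs_update(H, s, y):
    Hs = H @ s
    return H + np.outer(y, y) / (y @ s) - np.outer(Hs, Hs) / (s @ Hs)

H = np.diag([1.0, 2.0, 3.0])          # any H_{k-1} > 0 (illustrative)
s = np.array([1.0, -1.0, 2.0])
y = np.array([2.0, 0.5, 1.0])         # y^T s = 3.5 > 0
H_new = bfgs_update(H, s, y)
assert np.allclose(H_new @ s, y)      # secant condition holds
```

The cancellation is exact by construction: H_k s = Hs + y − Hs = y, independent of H_{k-1}.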
Secant method

for f : R → R, BFGS with unit step size gives the secant method

x^(k+1) = x^(k) − f'(x^(k))/H_k,   H_k = (f'(x^(k)) − f'(x^(k-1)))/(x^(k) − x^(k-1))

[figure: one step of the secant method, showing the iterates x^(k-1), x^(k), x^(k+1) and the curves f_quad(z), f(z)]
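The one-dimensional iteration above can be sketched in a few lines; the test function f(x) = x⁴/4 − x (so f'(x) = x³ − 1, with minimizer x* = 1) and the two starting points are assumptions of this sketch:

```python
# Secant method: BFGS with unit steps in one dimension, as derived above.
# fp is the derivative f'; x0, x1 are two starting points (assumptions).
def secant(fp, x0, x1, tol=1e-12, max_iter=50):
    for _ in range(max_iter):
        Hk = (fp(x1) - fp(x0)) / (x1 - x0)   # difference quotient ~ f''
        x0, x1 = x1, x1 - fp(x1) / Hk        # x^(k+1) = x^(k) - f'(x^(k))/H_k
        if abs(x1 - x0) < tol:
            break
    return x1

# minimize f(x) = x**4/4 - x, i.e. solve f'(x) = x**3 - 1 = 0
x_star = secant(lambda x: x**3 - 1, 0.5, 1.5)
assert abs(x_star - 1.0) < 1e-8
```

Note that H_k here is exactly the scalar version of the secant condition: H_k (x^(k) − x^(k-1)) = f'(x^(k)) − f'(x^(k-1)).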
Convergence
global result: if f is strongly convex, BFGS with backtracking line search (EE236B, lecture 10-6) converges from any x^(0), H_0 ≻ 0

local convergence: if f is strongly convex and ∇²f(x) is Lipschitz continuous, local convergence is superlinear: for sufficiently large k,

‖x^(k+1) − x*‖₂ ≤ c_k ‖x^(k) − x*‖₂

where c_k → 0
Example
minimize   c^T x − Σ_{i=1}^m log(b_i − a_i^T x)

with n = 100, m = 500
[figure: convergence of f(x^(k)) − f* versus iteration k; Newton reaches ~10^{-12} in about 9 iterations, BFGS in about 140]
cost per Newton iteration: O(n³) plus computing ∇²f(x)
cost per BFGS iteration: O(n²)
Square root BFGS update

to improve numerical stability, H_k can be propagated in factored form H_k = L_k L_k^T: if H_{k-1} = L L^T, then H_k = L̃ L̃^T with

L̃ = L (I + ((α ỹ − s̃) s̃^T)/(s̃^T s̃)),   s̃ = L^T s,   ỹ = L^{-1} y,   α = ((s̃^T s̃)/(s̃^T ỹ))^{1/2}
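As a sanity check on the factored (square-root) form of the BFGS update, the snippet below verifies numerically that L̃ = L(I + (αỹ − s̃)s̃^T/(s̃^T s̃)), with s̃ = L^T s, ỹ = L^{-1} y, α = (s̃^T s̃ / s̃^T ỹ)^{1/2}, reproduces the direct update H_k = H + y y^T/(y^T s) − H s s^T H/(s^T H s); the test matrices and vectors are illustrative:

```python
import numpy as np

rng = np.random.default_rng(1)
A = rng.standard_normal((5, 5))
H = A @ A.T + 5 * np.eye(5)            # an arbitrary H_{k-1} > 0
L = np.linalg.cholesky(H)
s = rng.standard_normal(5)
y = s + 0.3 * rng.standard_normal(5)   # keeps the curvature y^T s > 0
assert y @ s > 0

s_t = L.T @ s                          # s~ = L^T s
y_t = np.linalg.solve(L, y)            # y~ = L^{-1} y
alpha = np.sqrt((s_t @ s_t) / (s_t @ y_t))
L_new = L @ (np.eye(5) + np.outer(alpha * y_t - s_t, s_t) / (s_t @ s_t))

Hs = H @ s                             # direct BFGS update for comparison
H_direct = H + np.outer(y, y) / (y @ s) - np.outer(Hs, Hs) / (s @ Hs)
assert np.allclose(L_new @ L_new.T, H_direct)
```

In this sketch L̃ is not triangular; in practice it would be re-triangularized (e.g., by Givens rotations) at O(n²) cost.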
Optimality of BFGS update

X = H_k solves the convex optimization problem

minimize   tr(H_{k-1}^{-1} X) − log det(H_{k-1}^{-1} X) − n
subject to  X s = y

• the cost function is nonnegative, and equal to zero only if X = H_{k-1}
• it is also known as the relative entropy between the densities N(0, X), N(0, H_{k-1})
• the optimality result follows from the KKT conditions: X = H_k satisfies

X^{-1} = H_{k-1}^{-1} − (1/2)(s ν^T + ν s^T),   X s = y,   X ≻ 0

with

ν = (1/(s^T y)) (2 H_{k-1}^{-1} y − (1 + (y^T H_{k-1}^{-1} y)/(y^T s)) s)
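The KKT characterization can be verified numerically: with ν = (1/(s^T y))(2 H^{-1} y − (1 + y^T H^{-1} y/(y^T s)) s), the matrix H^{-1} − (1/2)(sν^T + νs^T) should equal the inverse of the BFGS update of H. The expression as written here is a reconstruction from the slide, and the test data are illustrative:

```python
import numpy as np

rng = np.random.default_rng(2)
A = rng.standard_normal((4, 4))
H = A @ A.T + 4 * np.eye(4)            # H_{k-1} > 0 (illustrative)
s = rng.standard_normal(4)
y = s + 0.3 * rng.standard_normal(4)   # keeps y^T s > 0
assert y @ s > 0

Hs = H @ s                             # direct BFGS update H_k
H_new = H + np.outer(y, y) / (y @ s) - np.outer(Hs, Hs) / (s @ Hs)

Hi = np.linalg.inv(H)                  # KKT expression for X^{-1} = H_k^{-1}
nu = (2 * Hi @ y - (1 + (y @ Hi @ y) / (y @ s)) * s) / (s @ y)
X_inv = Hi - 0.5 * (np.outer(s, nu) + np.outer(nu, s))
assert np.allclose(X_inv, np.linalg.inv(H_new))
```

Expanding (1/2)(sν^T + νs^T) with this ν gives exactly the rank-two correction in the inverse BFGS formula, which is why the check holds to machine precision.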
a similar construction, with the roles of the two matrices interchanged, gives the update

H_k = (I − (y s^T)/(y^T s)) H_{k-1} (I − (s y^T)/(y^T s)) + (y y^T)/(y^T s)

(known as the DFP update); it pre-dates the BFGS update, but is less often used
Limited-memory quasi-Newton methods

limited-memory BFGS (L-BFGS): do not store H_k^{-1} explicitly; instead we store the m (e.g., m = 30) most recent values of

s_j = x^(j) − x^(j-1),   y_j = ∇f(x^(j)) − ∇f(x^(j-1))

and evaluate the quasi-Newton direction recursively, using the inverse update

H_j^{-1} = (I − (s_j y_j^T)/(y_j^T s_j)) H_{j-1}^{-1} (I − (y_j s_j^T)/(y_j^T s_j)) + (s_j s_j^T)/(y_j^T s_j)

starting from H_{k-m}^{-1} = I, so that the cost and storage per iteration are O(mn)
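In practice the recursion is evaluated matrix-free with the standard two-loop recursion; a sketch, where s_list and y_list hold the stored pairs from oldest to newest and H_{k-m}^{-1} = I as above:

```python
import numpy as np

def lbfgs_direction(grad_k, s_list, y_list):
    """Compute the L-BFGS direction -H_k^{-1} grad_k via the standard
    two-loop recursion, equivalent to expanding the recursion above
    with H_{k-m}^{-1} = I. Cost is O(mn): only dot products and axpys.
    """
    q = grad_k.copy()
    alphas = []
    for s, y in reversed(list(zip(s_list, y_list))):   # newest to oldest
        a = (s @ q) / (y @ s)
        q -= a * y
        alphas.append(a)
    r = q                                   # apply H_{k-m}^{-1} = I
    for (s, y), a in zip(zip(s_list, y_list), reversed(alphas)):
        b = (y @ r) / (y @ s)
        r += (a - b) * s                    # oldest to newest
    return -r
```

With a single stored pair the result matches the explicit inverse update applied to the gradient, which is a convenient correctness check.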
References

• J. Nocedal and S. J. Wright, Numerical Optimization (2006), chapters 6 and 7
• J. E. Dennis and R. B. Schnabel, Numerical Methods for Unconstrained Optimization and Nonlinear Equations (1983)