
EE236C (Spring 2010-11)

6. Quasi-Newton methods

• variable metric methods
• quasi-Newton methods
• BFGS update
• limited-memory quasi-Newton methods


Newton method for unconstrained minimization


minimize   $f(x)$

$f$ convex, twice continuously differentiable

Newton method

$x^+ = x - t\,\nabla^2 f(x)^{-1}\nabla f(x)$

• advantages: fast convergence, affine invariance
• disadvantages: requires second derivatives and the solution of a linear equation; can be too expensive for large-scale applications
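As a point of reference for the quasi-Newton variants below, a minimal NumPy sketch of one damped Newton step; `f_grad` and `f_hess` stand for user-supplied gradient and Hessian oracles and are not part of the slides:

```python
import numpy as np

def newton_step(x, f_grad, f_hess, t=1.0):
    """One damped Newton step: x+ = x - t * inv(H) @ g, with H the Hessian at x."""
    g = f_grad(x)                    # gradient at x
    H = f_hess(x)                    # Hessian at x (n x n, positive definite)
    dx = -np.linalg.solve(H, g)      # Newton direction: solve H dx = -g
    return x + t * dx
```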


Variable metric methods


$x^+ = x - tH^{-1}\nabla f(x)$

$H \succ 0$ is an approximation of the Hessian at $x$, chosen to:
• avoid calculation of second derivatives
• simplify computation of the search direction

variable metric interpretation (EE236B, lecture 10, page 11)

$\Delta x = -H^{-1}\nabla f(x)$ is the steepest descent direction at $x$ for the quadratic norm
$\|z\|_H = (z^T H z)^{1/2}$


Quasi-Newton methods

given a starting point $x^{(0)} \in \operatorname{dom} f$, $H_0 \succ 0$

for $k = 1, 2, \ldots$, until a stopping criterion is satisfied
1. compute quasi-Newton direction $\Delta x = -H_{k-1}^{-1}\nabla f(x^{(k-1)})$
2. determine step size $t$ (e.g., by backtracking line search)
3. compute $x^{(k)} = x^{(k-1)} + t\Delta x$
4. compute $H_k$

• different methods use different rules for updating $H$ in step 4
• can also propagate $H_k^{-1}$ to simplify the calculation of $\Delta x$
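A minimal sketch of this loop in NumPy, propagating $H_k^{-1}$ and using a backtracking line search; the update rule is passed in as a function, and all names (`qn_minimize`, `backtrack`) are illustrative rather than taken from the slides:

```python
import numpy as np

def backtrack(f, x, dx, g, alpha=0.1, beta=0.7):
    """Backtracking line search along dx, starting from t = 1."""
    t = 1.0
    while f(x + t * dx) > f(x) + alpha * t * (g @ dx):
        t *= beta
    return t

def qn_minimize(f, grad, x0, update_inv, tol=1e-8, max_iter=500):
    """Generic variable metric loop: x+ = x - t * Hinv @ grad(x)."""
    x = np.asarray(x0, dtype=float)
    Hinv = np.eye(len(x))                    # H_0^{-1} = I
    for _ in range(max_iter):
        g = grad(x)
        if np.linalg.norm(g) < tol:          # stopping criterion
            break
        dx = -Hinv @ g                       # 1. quasi-Newton direction
        t = backtrack(f, x, dx, g)           # 2. step size
        x_new = x + t * dx                   # 3. new iterate
        s, y = x_new - x, grad(x_new) - g
        Hinv = update_inv(Hinv, s, y)        # 4. update H_k^{-1} (e.g. BFGS, next slide)
        x = x_new
    return x
```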


Broyden-Fletcher-Goldfarb-Shanno (BFGS) update


BFGS update

$H_k = H_{k-1} + \frac{yy^T}{y^Ts} - \frac{H_{k-1}ss^TH_{k-1}}{s^TH_{k-1}s}$

where $s = x^{(k)} - x^{(k-1)}$,  $y = \nabla f(x^{(k)}) - \nabla f(x^{(k-1)})$

inverse update

$H_k^{-1} = \left(I - \frac{sy^T}{y^Ts}\right)H_{k-1}^{-1}\left(I - \frac{ys^T}{y^Ts}\right) + \frac{ss^T}{y^Ts}$

• note that $y^Ts > 0$ for strictly convex $f$; see page 1-10
• cost of the update or inverse update is $O(n^2)$ operations
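A direct transcription of the inverse update as a NumPy function; it could serve as the `update_inv` argument of the loop sketched earlier. The skip guard on $y^Ts$ is a common practical safeguard and is an addition, not part of the formula above:

```python
import numpy as np

def bfgs_inverse_update(Hinv, s, y):
    """Inverse BFGS update: returns H_k^{-1} from H_{k-1}^{-1}, s and y."""
    ys = y @ s
    if ys <= 1e-12:                  # curvature condition y^T s > 0 violated: skip update
        return Hinv
    V = np.eye(len(s)) - np.outer(y, s) / ys     # note (I - s y^T / y^T s) = V^T
    return V.T @ Hinv @ V + np.outer(s, s) / ys
```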

Positive definiteness

if $y^Ts > 0$, the BFGS update preserves positive definiteness of $H_k$

proof: from the inverse update formula,

$v^TH_k^{-1}v = \left(v - \frac{s^Tv}{y^Ts}\,y\right)^T H_{k-1}^{-1}\left(v - \frac{s^Tv}{y^Ts}\,y\right) + \frac{(s^Tv)^2}{y^Ts}$

• if $H_{k-1} \succ 0$, both terms are nonnegative for all $v$
• the second term is zero only if $s^Tv = 0$; then the first term is zero only if $v = 0$

this ensures that $\Delta x = -H_k^{-1}\nabla f(x^{(k)})$ is a descent direction


Secant condition
the BFGS update satisfies the secant condition $H_ks = y$, i.e.,

$H_k(x^{(k)} - x^{(k-1)}) = \nabla f(x^{(k)}) - \nabla f(x^{(k-1)})$

interpretation: define a second-order approximation at $x^{(k)}$

$f_{\mathrm{quad}}(z) = f(x^{(k)}) + \nabla f(x^{(k)})^T(z - x^{(k)}) + \frac{1}{2}(z - x^{(k)})^TH_k(z - x^{(k)})$

the secant condition implies that the gradient of $f_{\mathrm{quad}}$ agrees with $\nabla f$ at $x^{(k-1)}$:

$\nabla f_{\mathrm{quad}}(x^{(k-1)}) = \nabla f(x^{(k)}) + H_k(x^{(k-1)} - x^{(k)}) = \nabla f(x^{(k-1)})$
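A quick numerical check that the BFGS update indeed satisfies $H_ks = y$; the `bfgs_update` helper and the random test data are illustrative:

```python
import numpy as np

def bfgs_update(H, s, y):
    """Direct BFGS update of the Hessian approximation."""
    return H + np.outer(y, y) / (y @ s) - (H @ np.outer(s, s) @ H) / (s @ H @ s)

# sanity check on random data: the updated H_k satisfies H_k s = y
rng = np.random.default_rng(0)
n = 5
A = rng.standard_normal((n, n))
H_prev = A @ A.T + n * np.eye(n)         # some positive definite H_{k-1}
s = rng.standard_normal(n)
y = rng.standard_normal(n)
y += (1.0 - y @ s) / (s @ s) * s         # adjust y so that y^T s = 1 > 0
H_new = bfgs_update(H_prev, s, y)
assert np.allclose(H_new @ s, y)         # secant condition H_k s = y
```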


secant method: for $f : \mathbf{R} \to \mathbf{R}$, BFGS with unit step size gives the secant method

$x^{(k+1)} = x^{(k)} - \frac{f'(x^{(k)})}{H_k}$,   $H_k = \frac{f'(x^{(k)}) - f'(x^{(k-1)})}{x^{(k)} - x^{(k-1)}}$

[figure: one iteration in one dimension, showing $f(z)$, the quadratic model $f_{\mathrm{quad}}(z)$, and the iterates $x^{(k-1)}$, $x^{(k)}$, $x^{(k+1)}$]

Convergence
global result: if $f$ is strongly convex, BFGS with backtracking line search (EE236B, lecture 10-6) converges from any $x^{(0)}$ and any $H_0 \succ 0$

local convergence: if $f$ is strongly convex and $\nabla^2 f(x)$ is Lipschitz continuous, local convergence is superlinear: for sufficiently large $k$,

$\|x^{(k+1)} - x^\star\|_2 \le c_k\,\|x^{(k)} - x^\star\|_2$

where $c_k \to 0$ (cf. the quadratic local convergence of Newton's method)


Example
minimize   $c^Tx - \sum_{i=1}^{m}\log(b_i - a_i^Tx)$

$n = 100$, $m = 500$
[figure: $f(x^{(k)}) - f^\star$ (log scale) versus iteration number $k$, for Newton's method and for BFGS]

• cost per Newton iteration: $O(n^3)$ plus the cost of computing $\nabla^2 f(x)$
• cost per BFGS iteration: $O(n^2)$
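A sketch of how an instance of this problem might be generated and evaluated; the objective and gradient follow directly from the formula above, but the random data are illustrative and not the instance used for the figure:

```python
import numpy as np

rng = np.random.default_rng(0)
n, m = 100, 500
A = rng.standard_normal((m, n))          # rows are a_i^T
b = rng.uniform(1.0, 2.0, m)             # b_i > 0, so x = 0 is strictly feasible
c = rng.standard_normal(n)

def f(x):
    """Objective c^T x - sum_i log(b_i - a_i^T x); +inf outside the domain."""
    r = b - A @ x
    return np.inf if np.any(r <= 0) else c @ x - np.sum(np.log(r))

def grad(x):
    """Gradient: c + A^T (1 / (b - A x))."""
    return c + A.T @ (1.0 / (b - A @ x))
```

Since $x = 0$ is strictly feasible for this data, it can serve as the starting point for the quasi-Newton loop sketched earlier.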

Square root BFGS update


to improve numerical stability, one can propagate $H_k$ in factored form:

if $H_{k-1} = L_{k-1}L_{k-1}^T$, then $H_k = L_kL_k^T$ with

$L_k = L_{k-1}\left(I + \frac{(\alpha\tilde y - \tilde s)\tilde s^T}{\tilde s^T\tilde s}\right)$

where $\tilde y = L_{k-1}^{-1}y$,  $\tilde s = L_{k-1}^Ts$,  $\alpha = \left(\frac{\tilde s^T\tilde s}{y^Ts}\right)^{1/2}$

if $L_{k-1}$ is triangular, the cost of reducing $L_k$ to triangular form is $O(n^2)$
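A sketch of the factored update, together with a check that $L_kL_k^T$ reproduces the ordinary BFGS update of $H_{k-1}$; the function name and the random test data are illustrative:

```python
import numpy as np

def sqrt_bfgs_update(L, s, y):
    """Factored BFGS update: given H_{k-1} = L L^T, return L_k with H_k = L_k L_k^T."""
    y_t = np.linalg.solve(L, y)                  # y~ = L^{-1} y
    s_t = L.T @ s                                # s~ = L^T s
    alpha = np.sqrt((s_t @ s_t) / (y @ s))       # requires y^T s > 0
    return L @ (np.eye(len(s)) + np.outer(alpha * y_t - s_t, s_t) / (s_t @ s_t))

# check against the direct BFGS update on random data
rng = np.random.default_rng(1)
n = 6
L = np.tril(rng.standard_normal((n, n))) + n * np.eye(n)
H = L @ L.T
s = rng.standard_normal(n)
y = H @ s + 0.1 * rng.standard_normal(n)
assert y @ s > 0                                 # curvature condition
L_new = sqrt_bfgs_update(L, s, y)
H_bfgs = H + np.outer(y, y) / (y @ s) - H @ np.outer(s, s) @ H / (s @ H @ s)
assert np.allclose(L_new @ L_new.T, H_bfgs)
```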


Optimality of BFGS update


$X = H_k$ solves the convex optimization problem

minimize    $\operatorname{tr}(H_{k-1}^{-1}X) - \log\det(H_{k-1}^{-1}X) - n$
subject to  $Xs = y$

• the cost function is nonnegative, and equal to zero only if $X = H_{k-1}$
• it is twice the relative entropy between the densities $N(0, X)$ and $N(0, H_{k-1})$
• the optimality result follows from the KKT conditions: $X = H_k$ satisfies

$X^{-1} = H_{k-1}^{-1} + \frac{1}{2}(\nu s^T + s\nu^T)$,   $Xs = y$,   $X \succ 0$

with $\nu = \frac{1}{s^Ty}\left(\Big(1 + \frac{y^TH_{k-1}^{-1}y}{y^Ts}\Big)s - 2H_{k-1}^{-1}y\right)$


Davidon-Fletcher-Powell (DFP) update


switch $H_{k-1}$ and $X$ in the objective on the previous page:

minimize    $\operatorname{tr}(H_{k-1}X^{-1}) - \log\det(H_{k-1}X^{-1}) - n$
subject to  $Xs = y$

• minimizes the relative entropy between $N(0, H_{k-1})$ and $N(0, X)$
• the problem is convex in $X^{-1}$ (with the constraint written as $s = X^{-1}y$)
• the solution is the "dual" of the BFGS formula:

$H_k = \left(I - \frac{sy^T}{s^Ty}\right)H_{k-1}\left(I - \frac{ys^T}{s^Ty}\right) + \frac{yy^T}{s^Ty}$

(known as the DFP update)

• pre-dates the BFGS update, but is less often used
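For comparison with the BFGS sketches earlier, a transcription of the DFP formula; the skip guard is again a practical addition, not part of the slide:

```python
import numpy as np

def dfp_update(H, s, y):
    """DFP update of the Hessian approximation H_{k-1} -> H_k."""
    sy = s @ y
    if sy <= 1e-12:                  # curvature condition violated: skip update
        return H
    V = np.eye(len(s)) - np.outer(y, s) / sy     # (I - s y^T / s^T y) = V^T
    return V.T @ H @ V + np.outer(y, y) / sy
```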

Limited memory quasi-Newton methods


• main disadvantage of quasi-Newton methods is the need to store $H_k$ or $H_k^{-1}$
• limited-memory BFGS (L-BFGS): do not store $H_k^{-1}$ explicitly
• instead, store the $m$ (e.g., $m = 30$) most recent values of

$s_j = x^{(j)} - x^{(j-1)}$,   $y_j = \nabla f(x^{(j)}) - \nabla f(x^{(j-1)})$

• evaluate $\Delta x = -H_k^{-1}\nabla f(x^{(k)})$ recursively, using

$H_j^{-1} = \left(I - \frac{s_jy_j^T}{y_j^Ts_j}\right)H_{j-1}^{-1}\left(I - \frac{y_js_j^T}{y_j^Ts_j}\right) + \frac{s_js_j^T}{y_j^Ts_j}$

for $j = k, k-1, \ldots, k-m+1$, assuming, for example, $H_{k-m}^{-1} = I$

• cost per iteration is $O(nm)$; storage is $O(nm)$
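A sketch of the recursive evaluation of $\Delta x = -H_k^{-1}\nabla f(x^{(k)})$ from the stored pairs, written exactly as the recursion above; in practice this is usually unrolled into the standard L-BFGS "two-loop recursion". The function name is illustrative, and the pairs are assumed to be stored oldest first:

```python
import numpy as np

def lbfgs_direction(grad_k, s_list, y_list):
    """Compute -H_k^{-1} grad_k from the m most recent (s_j, y_j) pairs,
    applying the inverse-update recursion with H_{k-m}^{-1} = I."""
    def apply_Hinv(v, j):
        if j < 0:                              # base case: H_{k-m}^{-1} = I
            return v
        s, y = s_list[j], y_list[j]
        rho = y @ s                            # y_j^T s_j (assumed > 0)
        w = v - y * (s @ v) / rho              # (I - y_j s_j^T / rho) v
        u = apply_Hinv(w, j - 1)               # H_{j-1}^{-1} w
        return u - s * (y @ u) / rho + s * (s @ v) / rho
    return -apply_Hinv(grad_k, len(s_list) - 1)
```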



References
• J. Nocedal and S. J. Wright, Numerical Optimization (2006), chapters 6 and 7
• J. E. Dennis and R. B. Schnabel, Numerical Methods for Unconstrained Optimization and Nonlinear Equations (1983)

