Sangkyun Lee
The content is from Nocedal and Wright (2006). Topics marked with ** are
optional.
Input: $x_0$;
for $k = 0, 1, 2, \dots$ do
    Choose $B_k = \nabla^2 f(x_k) + E_k$, where $E_k = 0$ if $\nabla^2 f(x_k)$ is sufficiently positive definite; otherwise $E_k$ is chosen to ensure that $B_k$ is sufficiently positive definite;
    Solve $B_k p_k = -\nabla f(x_k)$;
    Choose $\alpha_k$ from a Wolfe or Armijo backtracking line search;
    $x_{k+1} \leftarrow x_k + \alpha_k p_k$;
end
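The iteration above can be sketched in Python. The particular modification $E_k = \tau I$, with $\tau$ doubled until a Cholesky factorization succeeds, and the test function are illustrative assumptions, not prescribed by the notes:

```python
import numpy as np

def modified_newton(f, grad, hess, x0, tol=1e-8, max_iter=100):
    """Sketch of the iteration above: B_k = hess + tau*I, Armijo backtracking."""
    x = np.asarray(x0, dtype=float)
    n = len(x)
    for _ in range(max_iter):
        g = grad(x)
        if np.linalg.norm(g) < tol:
            break
        B = hess(x)
        tau = 0.0
        while True:  # E_k = tau*I: increase tau until B_k is positive definite
            try:
                np.linalg.cholesky(B + tau * np.eye(n))
                break
            except np.linalg.LinAlgError:
                tau = max(2.0 * tau, 1e-3)
        p = np.linalg.solve(B + tau * np.eye(n), -g)  # solve B_k p_k = -grad f(x_k)
        alpha, c1 = 1.0, 1e-4                         # Armijo backtracking
        while f(x + alpha * p) > f(x) + c1 * alpha * (g @ p):
            alpha *= 0.5
        x = x + alpha * p
    return x

# Nonconvex test function: the Hessian is indefinite near the starting point,
# so the shift E_k actually activates on early iterations
f = lambda x: x[0]**4 - 2 * x[0]**2 + x[0] + x[1]**2
grad = lambda x: np.array([4 * x[0]**3 - 4 * x[0] + 1, 2 * x[1]])
hess = lambda x: np.array([[12 * x[0]**2 - 4, 0.0], [0.0, 2.0]])
x_min = modified_newton(f, grad, hess, np.array([0.0, 1.0]))
```

The returned point is a local minimizer: the gradient vanishes and the (unshifted) Hessian there is positive definite.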
1.1 Global Convergence
2 Quasi-Newton Methods
With $s_k = x_{k+1} - x_k$ and $y_k = \nabla f(x_{k+1}) - \nabla f(x_k)$, quasi-Newton methods maintain a Hessian approximation satisfying
\[ B_{k+1} s_k = y_k \quad \text{(Secant Equation)}, \]
which admits a symmetric positive definite solution only if
\[ s_k^T y_k > 0 \quad \text{(Curvature Condition)}. \]
This condition is guaranteed to hold if $\alpha_k$ is chosen by the Wolfe (or strong Wolfe) line search. From the second Wolfe condition (curvature condition), we have (for $c_2 \in (0, 1)$),
\[ \nabla f(x_{k+1})^T s_k \ge c_2 \nabla f(x_k)^T s_k, \]
and therefore
\[ y_k^T s_k = [\nabla f(x_{k+1}) - \nabla f(x_k)]^T s_k \ge (c_2 - 1)\,\alpha_k \nabla f(x_k)^T p_k. \]
Since $c_2 < 1$ and $p_k$ is a descent direction, $y_k^T s_k > 0$.
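This inequality chain can be checked numerically on a strictly convex quadratic; the matrix, the starting point, and the crude step-size sampling below are all illustrative choices:

```python
import numpy as np

# f(x) = 0.5 x^T A x with A positive definite; p is a descent direction
A = np.array([[3.0, 1.0], [1.0, 2.0]])
grad = lambda x: A @ x
x = np.array([1.0, -1.0])
p = -grad(x)
c2 = 0.9

# Find a step size satisfying the Wolfe curvature condition by sampling
alpha = next(a for a in np.linspace(0.01, 1.0, 100)
             if grad(x + a * p) @ p >= c2 * (grad(x) @ p))

s = alpha * p
y = grad(x + s) - grad(x)
lhs = y @ s                                # y_k^T s_k
rhs = (c2 - 1.0) * alpha * (grad(x) @ p)   # (c2 - 1) alpha_k grad f^T p_k
print(lhs >= rhs, lhs > 0)
```

Both comparisons hold, matching the derivation: the curvature condition forces $y_k^T s_k$ above the (positive) right-hand side.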
2.1 The DFP Update
The secant equation may have many solutions. We obtain a unique $B_{k+1}$ by considering the following optimization problem,
\[ \min_{B} \|B - B_k\|, \quad \text{s.t. } B = B^T,\ B s_k = y_k. \]
Choosing a weighted Frobenius norm in the objective, we obtain the following expression,
\[ B_{k+1} = (I - \rho_k y_k s_k^T)\, B_k\, (I - \rho_k s_k y_k^T) + \rho_k y_k y_k^T, \qquad \text{(DFP)} \tag{10.1} \]
Algorithm 2: BFGS
Input: $x_0$, tolerance $\epsilon \ge 0$, initial inverse Hessian approximation $H_0$;
$k \leftarrow 0$;
while $\|\nabla f(x_k)\| > \epsilon$ do
    Compute the search direction $p_k = -H_k \nabla f(x_k)$;
    Choose $\alpha_k$ from a Wolfe line search and set $x_{k+1} = x_k + \alpha_k p_k$;
    Set $s_k = x_{k+1} - x_k$ and $y_k = \nabla f(x_{k+1}) - \nabla f(x_k)$;
    Compute $H_{k+1}$ by the BFGS update (10.3);
    $k \leftarrow k + 1$;
end
where $\rho_k = \frac{1}{y_k^T s_k}$. This was first proposed by Davidon in 1959 and further studied by Fletcher and Powell.
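A quick numerical sanity check (with arbitrary data) confirms that the DFP formula (10.1) satisfies the secant equation $B_{k+1} s_k = y_k$:

```python
import numpy as np

def dfp_update(B, s, y):
    """DFP update (10.1): B_{k+1} = (I - rho y s^T) B_k (I - rho s y^T) + rho y y^T."""
    rho = 1.0 / (y @ s)
    I = np.eye(len(s))
    return (I - rho * np.outer(y, s)) @ B @ (I - rho * np.outer(s, y)) \
           + rho * np.outer(y, y)

# Arbitrary data; y is a small perturbation of s so that y^T s > 0
rng = np.random.default_rng(0)
B = np.eye(3)
s = rng.standard_normal(3)
y = s + 0.1 * rng.standard_normal(3)

B_next = dfp_update(B, s, y)
print(np.allclose(B_next @ s, y))   # the secant equation holds
```

The check works for any $B_k$ and any data with $y_k^T s_k \ne 0$: multiplying (10.1) by $s_k$ annihilates the first term and leaves $\rho_k y_k (y_k^T s_k) = y_k$.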
The inverse of $B_k$, denoted by $H_k = B_k^{-1}$, is indeed more useful for implementation; it can be derived using the Sherman-Morrison-Woodbury formula,
\[ H_{k+1} = H_k - \frac{H_k y_k y_k^T H_k}{y_k^T H_k y_k} + \frac{s_k s_k^T}{y_k^T s_k}. \qquad \text{(DFP)} \tag{10.2} \]

2.2 The BFGS Update

Alternatively, we can determine a unique inverse approximation $H_{k+1}$ directly by solving
\[ \min_{H} \|H - H_k\|, \quad \text{s.t. } H = H^T,\ H y_k = s_k. \]
The unique solution is given by
\[ H_{k+1} = (I - \rho_k s_k y_k^T)\, H_k\, (I - \rho_k y_k s_k^T) + \rho_k s_k s_k^T, \qquad \text{(BFGS)} \tag{10.3} \]
for the same definition of $\rho_k = \frac{1}{y_k^T s_k}$. This is the most popular update rule for quasi-Newton methods.
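A minimal sketch of the update (10.3); the data are arbitrary, chosen only so that $y_k^T s_k > 0$:

```python
import numpy as np

def bfgs_inverse_update(H, s, y):
    """BFGS update (10.3) on the inverse Hessian approximation H_k."""
    rho = 1.0 / (y @ s)
    I = np.eye(len(s))
    return (I - rho * np.outer(s, y)) @ H @ (I - rho * np.outer(y, s)) \
           + rho * np.outer(s, s)

# The update enforces the inverse secant equation H_{k+1} y_k = s_k
H = np.eye(2)
s = np.array([1.0, 0.5])
y = np.array([2.0, 1.0])        # y @ s = 2.5 > 0
H_next = bfgs_inverse_update(H, s, y)
print(np.allclose(H_next @ y, s))   # True
```

Note the symmetry with (10.1): the roles of $s_k$ and $y_k$ are exactly exchanged.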
The formulation in terms of $B_k$ can be derived using the Sherman-Morrison-Woodbury formula,
\[ B_{k+1} = B_k - \frac{B_k s_k s_k^T B_k}{s_k^T B_k s_k} + \frac{y_k y_k^T}{y_k^T s_k}. \qquad \text{(BFGS)} \tag{10.4} \]
Using this form is less favorable, since computing the search direction then requires a matrix inversion (although there are work-arounds).
A nice property of BFGS is that if $H_k$ is positive definite and $y_k^T s_k > 0$ (so that $\rho_k > 0$), then $H_{k+1}$ is also positive definite. For any nonzero vector $z$,
\[ z^T H_{k+1} z = w^T H_k w + \rho_k (s_k^T z)^2 \ge 0, \]
where $w := z - \rho_k y_k (s_k^T z)$. The RHS can be zero only if $s_k^T z = 0$, but then $w = z \ne 0$, implying that the first term is positive.
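The same fact can be observed numerically: starting from a positive definite $H_k$ and random data flipped to satisfy the curvature condition (all data here are arbitrary), the updated matrix has strictly positive eigenvalues:

```python
import numpy as np

def bfgs_inverse_update(H, s, y):
    """BFGS update (10.3) on the inverse Hessian approximation."""
    rho = 1.0 / (y @ s)
    I = np.eye(len(s))
    return (I - rho * np.outer(s, y)) @ H @ (I - rho * np.outer(y, s)) \
           + rho * np.outer(s, s)

rng = np.random.default_rng(1)
A = rng.standard_normal((4, 4))
H = A @ A.T + np.eye(4)          # positive definite H_k
s = rng.standard_normal(4)
y = rng.standard_normal(4)
if y @ s < 0:
    y = -y                        # enforce the curvature condition y^T s > 0
H_next = bfgs_inverse_update(H, s, y)
print(np.all(np.linalg.eigvalsh(H_next) > 0))   # H_{k+1} stays positive definite
```

The update is of the form $M H_k M^T + \rho_k s_k s_k^T$ with $M = I - \rho_k s_k y_k^T$, so symmetry is also preserved automatically.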
Compared to DFP (and other update rules), BFGS tends to correct inaccurate approximations automatically (given that the step sizes satisfy the Wolfe conditions).
Suppose $\{x_k\}$ converges to a point $x^*$ at which $\nabla f(x^*) = 0$ and $\nabla^2 f(x^*)$ is positive definite, and that the search directions satisfy
\[ \lim_{k \to \infty} \frac{\|\nabla f(x_k) + \nabla^2 f(x_k) p_k\|_2}{\|p_k\|_2} = 0; \tag{10.5} \]
then
the step size $\alpha_k = 1$ is admissible for all $k > k_0$, for a certain index $k_0$, and
if $\alpha_k = 1$ for all $k > k_0$, then $\{x_k\}$ converges to $x^*$ superlinearly.
We make a few observations.

Newton's direction $p_k = -\nabla^2 f(x_k)^{-1} \nabla f(x_k)$ satisfies the condition (10.5).

It is easy to check that for $p_k = -B_k^{-1} \nabla f(x_k)$, the search direction condition becomes
\[ \lim_{k \to \infty} \frac{\|(B_k - \nabla^2 f(x_k)) p_k\|_2}{\|p_k\|_2} = 0, \tag{10.6} \]
and it suffices that $B_k$ becomes an increasingly accurate approximation to $\nabla^2 f(x_k)$ along the search directions $p_k$; it is not necessary that $B_k$ converge to $\nabla^2 f(x^*)$.
3.1 Convergence of BFGS

Suppose $f$ is twice continuously differentiable, the level set $L = \{x : f(x) \le f(x_0)\}$ is convex, and there exist positive constants $m$ and $M$ such that
\[ m \|z\|^2 \le z^T \nabla^2 f(x) z \le M \|z\|^2 \]
for all $z \in \mathbb{R}^n$ and all $x \in L$. Then the sequence $\{x_k\}$ generated by Algorithm 2 (with $\epsilon = 0$) converges to the minimizer $x^*$ of $f$.
This theorem generalizes to other quasi-Newton methods (e.g., the restricted Broyden class), but not to the DFP method.
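As an illustration of the theorem, a minimal BFGS loop converges on a strictly convex quadratic, whose Hessian eigenvalues supply the constants $m$ and $M$. The Armijo backtracking used here in place of a full Wolfe search is a simplifying assumption (on a quadratic, $y_k^T s_k = s_k^T A s_k > 0$ holds regardless):

```python
import numpy as np

def bfgs(f, grad, x0, tol=1e-8, max_iter=200):
    """Minimal BFGS sketch: inverse update (10.3) with Armijo backtracking."""
    x = np.asarray(x0, dtype=float)
    n = len(x)
    H = np.eye(n)
    g = grad(x)
    for _ in range(max_iter):
        if np.linalg.norm(g) < tol:
            break
        p = -H @ g                              # search direction p_k = -H_k grad
        alpha, c1 = 1.0, 1e-4
        while f(x + alpha * p) > f(x) + c1 * alpha * (g @ p):
            alpha *= 0.5
        x_new = x + alpha * p
        g_new = grad(x_new)
        s, y = x_new - x, g_new - g
        if y @ s > 1e-12:                       # skip update if curvature fails
            rho = 1.0 / (y @ s)
            I = np.eye(n)
            H = (I - rho * np.outer(s, y)) @ H @ (I - rho * np.outer(y, s)) \
                + rho * np.outer(s, s)
        x, g = x_new, g_new
    return x

# Strictly convex quadratic: f(x) = 0.5 x^T A x - b^T x, minimizer A^{-1} b
A = np.array([[10.0, 2.0], [2.0, 1.0]])
b = np.array([1.0, -1.0])
f = lambda x: 0.5 * x @ A @ x - b @ x
grad = lambda x: A @ x - b
x_star = bfgs(f, grad, np.array([5.0, 5.0]))
print(np.allclose(x_star, np.linalg.solve(A, b), atol=1e-6))
```

Starting from $H_0 = I$, the inverse approximation improves along the search directions, and the iterates reach the unique minimizer.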
3.1.1 Superlinear Convergence
If
\[ \sum_{k=1}^{\infty} \|x_k - x^*\|_2 < \infty, \]
and the Hessian of $f$ is Lipschitz continuous near $x^*$, then we can show that the BFGS algorithm satisfies the Dennis and Moré characterization (10.5) and therefore achieves superlinear convergence.
References
Nocedal, J. and Wright, S. J. (2006). Numerical Optimization. Springer, second
edition.