http://ocw.mit.edu
For information about citing these materials or our Terms of Use, visit: http://ocw.mit.edu/terms.
16.323 Lecture 1
Nonlinear Optimization
Unconstrained nonlinear optimization
Line search methods
Spr 2008
Basics: Unconstrained Optimization
(Figure: a scalar function F(x) plotted versus x, with multiple stationary points.)
Notation: for x in R^n, define the gradient g and Hessian G of F(x):

$$
x = \begin{bmatrix} x_1 \\ \vdots \\ x_n \end{bmatrix}, \quad
g \equiv \nabla F = \begin{bmatrix} \dfrac{\partial F}{\partial x_1} \\ \vdots \\ \dfrac{\partial F}{\partial x_n} \end{bmatrix}, \quad
G \equiv \nabla^2 F = \begin{bmatrix}
\dfrac{\partial^2 F}{\partial x_1^2} & \cdots & \dfrac{\partial^2 F}{\partial x_1\,\partial x_n} \\
\vdots & \ddots & \vdots \\
\dfrac{\partial^2 F}{\partial x_n\,\partial x_1} & \cdots & \dfrac{\partial^2 F}{\partial x_n^2}
\end{bmatrix}
$$
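These definitions are easy to check numerically. A Python sketch (the central-difference helpers and the example F are illustrative choices, not from the notes):

```python
import numpy as np

def grad(F, x, h=1e-6):
    """Central-difference approximation of the gradient g = dF/dx."""
    n = len(x)
    g = np.zeros(n)
    for i in range(n):
        e = np.zeros(n); e[i] = h
        g[i] = (F(x + e) - F(x - e)) / (2 * h)
    return g

def hess(F, x, h=1e-4):
    """Central-difference approximation of the Hessian G = d^2F/dx^2."""
    n = len(x)
    G = np.zeros((n, n))
    for i in range(n):
        for j in range(n):
            ei = np.zeros(n); ei[i] = h
            ej = np.zeros(n); ej[j] = h
            G[i, j] = (F(x + ei + ej) - F(x + ei - ej)
                       - F(x - ei + ej) + F(x - ei - ej)) / (4 * h * h)
    return G

# Example: F(x) = x1^2 + x1*x2 + x2^2 (the quadratic used later in the notes)
F = lambda x: x[0]**2 + x[0]*x[1] + x[1]**2
x = np.array([1.0, 1.0])
print(grad(F, x))   # analytic gradient: [2*x1 + x2, x1 + 2*x2] = [3, 3]
print(hess(F, x))   # analytic Hessian: [[2, 1], [1, 2]]
```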
Taylor expansion about a stationary point x (where g(x) = 0):

$$F(x + \delta x) \approx F(x) + \frac{1}{2}\,\delta x^T G(x)\,\delta x + \ldots$$

For a strong minimum, need δxᵀG(x)δx > 0 for all δx, which is sufficient to ensure that F(x + δx) > F(x).

To be true for arbitrary δx ≠ 0, a sufficient condition is that G(x) > 0 (PD).¹

The second-order necessary condition for a strong minimum is that G(x) ≥ 0 (PSD), because in this case the higher-order terms in the expansion can play an important role, i.e. δxᵀG(x)δx = 0 for some δx, but the third term in the Taylor series expansion is positive.

¹ Positive Definite Matrix
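One standard way to test G > 0 is via its eigenvalues; a short Python sketch (the example Hessian is that of the quadratic F(x) = x1² + x1x2 + x2² used later in the notes; a Cholesky factorization attempt works equally well):

```python
import numpy as np

# For F(x) = x1^2 + x1*x2 + x2^2 the Hessian is constant:
G = np.array([[2.0, 1.0],
              [1.0, 2.0]])

# G > 0 (positive definite) iff all eigenvalues are positive,
# so x = 0 is a strong minimum of this F.
eigs = np.linalg.eigvalsh(G)
print(eigs)                # eigenvalues 1 and 3, both positive
print(bool(np.all(eigs > 0)))
```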
Solution Methods

Iterate: x_{k+1} = x_k + α p_k, where p_k is the search direction and α the step length. To first order,

$$F_{k+1} \equiv F(x_k + \alpha p_k) \approx F(x_k) + \alpha\, g_k^T p_k$$

so to obtain F_{k+1} < F_k for small α > 0, need a descent direction:

$$g_k^T p_k < 0$$
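A quick Python illustration of the descent condition (the Rosenbrock gradient matches the example function used later in these notes; the sample point x_k is an assumption for illustration). Steepest descent, p_k = -g_k, always satisfies g_kᵀp_k = -||g_k||² < 0 whenever g_k ≠ 0:

```python
import numpy as np

# Gradient of the Rosenbrock function F = 100*(x2 - x1^2)^2 + (1 - x1)^2
def rosen_grad(x):
    return np.array([-400*x[0]*(x[1] - x[0]**2) - 2*(1 - x[0]),
                     200*(x[1] - x[0]**2)])

xk = np.array([-1.9, 2.0])   # sample (non-stationary) point
gk = rosen_grad(xk)

# Steepest-descent direction: gk.T @ pk = -||gk||^2 < 0
pk = -gk
print(gk @ pk < 0)   # True: pk is a descent direction
```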
Line Search

Line search: given a search direction, must decide how far to step.

The expression x_{k+1} = x_k + α_k p_k gives a new solution for all possible values of α; what is the right value to pick?

Note that p_k defines a slice through the solution space; it is a very specific combination of how the elements of x will change together.

Would like to pick α_k to minimize F(x_k + α_k p_k).

Can do this line search in gory detail, but that would be very time consuming. Often want this process to be fast, accurate, and easy, especially if you are not that confident in the choice of p_k.
Example: F(x) = x₁² + x₁x₂ + x₂², with

$$x_0 = \begin{bmatrix} 1 \\ 1 \end{bmatrix}, \quad p_0 = \begin{bmatrix} 0 \\ 2 \end{bmatrix}, \quad x_1 = x_0 + \alpha\, p_0 = \begin{bmatrix} 1 \\ 1 + 2\alpha \end{bmatrix}$$

which gives F = 1 + (1 + 2α) + (1 + 2α)², so that

$$\frac{\partial F}{\partial \alpha} = 2 + 2(1 + 2\alpha)(2) = 0 \;\Rightarrow\; \alpha = -\tfrac{3}{4}$$
Figure 1.2: F(x) = x₁² + x₁x₂ + x₂², doing a line search in an arbitrary direction.
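The line-search example above can be checked numerically; a Python sketch using scipy (minimize_scalar stands in for solving ∂F/∂α = 0 analytically):

```python
import numpy as np
from scipy.optimize import minimize_scalar

# F(x) = x1^2 + x1*x2 + x2^2, the function from Figure 1.2
F = lambda x: x[0]**2 + x[0]*x[1] + x[1]**2

x0 = np.array([1.0, 1.0])
p0 = np.array([0.0, 2.0])

# Minimize F along the line x0 + alpha*p0
res = minimize_scalar(lambda a: F(x0 + a * p0))
print(res.x)             # alpha* ~ -0.75, solving 2 + 2*(1 + 2*alpha)*2 = 0
print(x0 + res.x * p0)   # new iterate, ~ [1, -0.5]
```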
Line Search II
First step: search along the line until you think you have bracketed a
local minimum
(Figure: F(x) versus x, with a sequence of shrinking brackets [a₁, b₁] ⊃ [a₂, b₂] ⊃ … ⊃ [a₅, b₅] closing in on the local minimum.)
Once you think you have a bracket of the local min, what is the smallest number of function evaluations that can be made to reduce the size of the bracket?

Many ways to do this:
Golden section search
Bisection
Polynomial approximations

The first two have linear convergence; the last has superlinear convergence.

Polynomial approximation approach: approximate the function as a quadratic/cubic in the interval and use the minimum of that polynomial as the estimate of the local min. Use with care since it can go very wrong, but it is a good termination approach.
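Of these, the golden-section search is the easiest to sketch. A minimal Python version (not the course's code; the test function is the 1-D slice from the earlier line-search example, and the bracket endpoints are illustrative). One new function evaluation per iteration shrinks the bracket by a constant factor of about 0.618, hence the linear convergence:

```python
import math

def golden_section(f, a, b, tol=1e-8):
    """Golden-section search on a bracket [a, b]: each iteration reuses
    one interior point, so only ONE new evaluation of f is needed."""
    invphi = (math.sqrt(5) - 1) / 2          # 1/phi ~ 0.618
    c = b - invphi * (b - a)
    d = a + invphi * (b - a)
    fc, fd = f(c), f(d)
    while b - a > tol:
        if fc < fd:                          # minimum lies in [a, d]
            b, d, fd = d, c, fc
            c = b - invphi * (b - a)
            fc = f(c)
        else:                                # minimum lies in [c, b]
            a, c, fc = c, d, fd
            d = a + invphi * (b - a)
            fd = f(d)
    return 0.5 * (a + b)

# 1-D slice from the line-search example: F(alpha) = 1 + (1+2a) + (1+2a)^2
f = lambda a: 1 + (1 + 2*a) + (1 + 2*a)**2
print(golden_section(f, -2.0, 1.0))          # ~ -0.75
```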
Cubic-fit estimate of the minimizer on the bracket [a, b], given function values F_a, F_b and gradients g_a, g_b at the endpoints:

$$x^\star = a + (b - a)\left[1 - \frac{g_b + v - w}{g_b - g_a + 2v}\right]$$

where

$$v = \sqrt{w^2 - g_a g_b} \quad\text{and}\quad w = \frac{3}{b - a}(F_a - F_b) + g_a + g_b$$

Content from: Scales, L. E. Introduction to Non-Linear Optimization. New York, NY: Springer, 1985, p. 40.
Removed due to copyright restrictions.
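A direct Python transcription of the cubic-fit formula (a sketch; variable names follow the equation above). A useful sanity check: the estimate is exact whenever F really is a cubic on the interval:

```python
import math

def cubic_fit_min(a, b, Fa, Fb, ga, gb):
    """Cubic-interpolation estimate of the minimizer on [a, b], given
    endpoint values Fa, Fb and endpoint slopes ga, gb."""
    w = 3 * (Fa - Fb) / (b - a) + ga + gb
    v = math.sqrt(w * w - ga * gb)
    return a + (b - a) * (1 - (gb + v - w) / (gb - ga + 2 * v))

# f(x) = x^3 - 3x is a cubic with a local minimum at exactly x = 1,
# so the fit recovers it exactly from the bracket [0, 2].
f  = lambda x: x**3 - 3*x
fp = lambda x: 3*x**2 - 3
print(cubic_fit_min(0.0, 2.0, f(0), f(2), fp(0), fp(2)))   # -> 1.0
```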
Observations:

Tends to work well near a function local minimum (good convergence behavior).

But can be very poor far away, so use a hybrid approach: bisection followed by the cubic fit.

Rule of thumb: do not bother making the line search too accurate, especially at the beginning; that is a waste of time and effort. Check the min tolerance and reduce it as you think you are approaching the overall solution.
Notation for the iterates: the gradient and Hessian evaluated at x_k:

$$
g_k = \nabla F\Big|_{x_k} = \begin{bmatrix} \dfrac{\partial F}{\partial x_1} \\ \vdots \\ \dfrac{\partial F}{\partial x_n} \end{bmatrix}_{x_k}, \quad
G_k = \nabla^2 F\Big|_{x_k} = \begin{bmatrix}
\dfrac{\partial^2 F}{\partial x_1^2} & \cdots & \dfrac{\partial^2 F}{\partial x_1\,\partial x_n} \\
\vdots & \ddots & \vdots \\
\dfrac{\partial^2 F}{\partial x_n\,\partial x_1} & \cdots & \dfrac{\partial^2 F}{\partial x_n^2}
\end{bmatrix}_{x_k}
$$
FMINUNC Example
(Figures: contour and 3-D surface plots of the Rosenbrock function over x₁, x₂ ∈ [−3, 3], with the quasi-Newton (BFGS) iterates and the gradient-search iterates ("Rosenbrock with GS") overlaid.)
Observations:
1. Typically not a good idea to start the optimization with QN; I often find that it is better to do GS for 100 iterations, and then switch over to QN for the termination phase.
2. x₀ tends to be very important; the standard process is to try many different cases to see if you can find consistency in the answers.
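The same Rosenbrock experiment can be reproduced in Python with scipy.optimize (a sketch, not the notes' MATLAB; scipy's minimize has no built-in steepest-descent option, so only the quasi-Newton/BFGS leg is shown here):

```python
import numpy as np
from scipy.optimize import minimize

# Rosenbrock function and its analytic gradient, as in the notes
def rosen(x):
    return 100*(x[1] - x[0]**2)**2 + (1 - x[0])**2

def rosen_grad(x):
    return np.array([-400*x[0]*(x[1] - x[0]**2) - 2*(1 - x[0]),
                     200*(x[1] - x[0]**2)])

x0 = np.array([-1.9, 2.0])   # same initial condition as the MATLAB runs

# BFGS quasi-Newton, the analogue of fminunc with 'Hessupdate','bfgs'
res = minimize(rosen, x0, jac=rosen_grad, method='BFGS')
print(res.x)   # converges to the minimum at [1, 1]
```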
Figure 1.7: Shows how the point of convergence changes as a function of the initial
condition.
function [F,G]=rosen(x)
%global xpath

%F=100*(x(1)^2-x(2))^2+(1-x(1))^2;
F=100*(x(:,2)-x(:,1).^2).^2+(1-x(:,1)).^2;
G=[100*(4*x(1)^3-4*x(1)*x(2))+2*x(1)-2; 100*(2*x(2)-2*x(1)^2)];

return
global xpath

% N, x1, x2 define the plotting grid (their setup is omitted here)
clear FF
for ii=1:N,
    for jj=1:N,
        FF(ii,jj)=rosen([x1(ii) x2(jj)]);
    end,
end
% quasi-newton
xpath=[];t0=clock;
opt=optimset('fminunc');
opt=optimset(opt,'Hessupdate','bfgs','gradobj','on','Display','Iter',...
    'LargeScale','off','InitialHessType','identity',...
    'MaxFunEvals',150,'OutputFcn',@outftn);

x0=[-1.9 2];

xout1=fminunc('rosen',x0,opt) % quasi-newton
xbfgs=xpath;
% gradient search
xpath=[];
opt=optimset('fminunc');
opt=optimset(opt,'Hessupdate','steepdesc','gradobj','on','Display','Iter',...
    'LargeScale','off','InitialHessType','identity','MaxFunEvals',2000,'MaxIter',1000,'OutputFcn',@outftn);
xout=fminunc('rosen',x0,opt)
xgs=xpath;
% hybrid: a few gradient-search steps, then switch to BFGS
xpath=[];
opt=optimset('fminunc');
opt=optimset(opt,'Hessupdate','steepdesc','gradobj','on','Display','Iter',...
    'LargeScale','off','InitialHessType','identity','MaxFunEvals',5,'OutputFcn',@outftn);
xout=fminunc('rosen',x0,opt)
opt=optimset('fminunc');
opt=optimset(opt,'Hessupdate','bfgs','gradobj','on','Display','Iter',...
    'LargeScale','off','InitialHessType','identity','MaxFunEvals',150,'OutputFcn',@outftn);
xout=fminunc('rosen',xout,opt)

xhyb=xpath;
figure(1);clf
contour(x1,x2,FF,[0:2:10 15:50:1000])
hold on
plot(x0(1),x0(2),'ro','Markersize',12)
plot(1,1,'rs','Markersize',12)
plot(xbfgs(:,1),xbfgs(:,2),'bd','Markersize',12)
hold off
xlabel('x_1')
ylabel('x_2')
figure(2);clf
contour(x1,x2,FF,[0:2:10 15:50:1000])
hold on
xlabel('x_1')
ylabel('x_2')
plot(x0(1),x0(2),'ro','Markersize',12)
plot(1,1,'rs','Markersize',12)
plot(xgs(:,1),xgs(:,2),'m+','Markersize',12)
hold off
figure(3);clf
contour(x1,x2,FF,[0:2:10 15:50:1000])
hold on
xlabel('x_1')
ylabel('x_2')
plot(x0(1),x0(2),'ro','Markersize',12)
plot(1,1,'rs','Markersize',12)
plot(xhyb(:,1),xhyb(:,2),'m+','Markersize',12)
hold off
figure(4);clf
mesh(x1,x2,FF)
hold on
plot3(x0(1),x0(2),rosen(x0)+5,'ro','Markersize',12,'MarkerFaceColor','r')
plot3(1,1,rosen([1 1]),'ms','Markersize',12,'MarkerFaceColor','m')
plot3(xbfgs(:,1),xbfgs(:,2),rosen(xbfgs)+5,'gd','MarkerFaceColor','g')
%plot3(xgs(:,1),xgs(:,2),rosen(xgs)+5,'m+')
hold off
axis([-3 3 -3 3 0 1000])
hh=get(gcf,'children');
xlabel('x_1')
ylabel('x_2')
set(hh,'View',[-177 89.861],'CameraPosition',[-0.585976 11.1811 5116.63]);%
print -depsc rosen2.eps;jpdf('rosen2')
function stop=outftn(x,optimValues,state)
% OutputFcn: record the iterate path in the global xpath
global xpath
stop=0;
xpath=[xpath;x];

return