Lecture 3
A. dAspremont. Convex Optimization M2. 1/49
Duality
Homework
Homework (DM) by email: dm.daspremont@gmail.com
Outline
Lagrange dual problem
weak and strong duality
geometric interpretation
optimality conditions
perturbation and sensitivity analysis
examples
theorems of alternatives
generalized inequalities
Lagrangian
standard form problem (not necessarily convex)

    minimize    f_0(x)
    subject to  f_i(x) ≤ 0,  i = 1, . . . , m
                h_i(x) = 0,  i = 1, . . . , p

variable x ∈ Rⁿ, domain D, optimal value p*

Lagrangian: L : Rⁿ × Rᵐ × Rᵖ → R, with dom L = D × Rᵐ × Rᵖ,

    L(x, λ, ν) = f_0(x) + Σ_{i=1}^m λ_i f_i(x) + Σ_{i=1}^p ν_i h_i(x)

weighted sum of objective and constraint functions

λ_i is Lagrange multiplier associated with f_i(x) ≤ 0

ν_i is Lagrange multiplier associated with h_i(x) = 0
Lagrange dual function
Lagrange dual function: g : Rᵐ × Rᵖ → R,

    g(λ, ν) = inf_{x ∈ D} L(x, λ, ν)
            = inf_{x ∈ D} ( f_0(x) + Σ_{i=1}^m λ_i f_i(x) + Σ_{i=1}^p ν_i h_i(x) )

g is concave, can be −∞ for some λ, ν

lower bound property: if λ ⪰ 0, then g(λ, ν) ≤ p*

proof: if x̃ is feasible and λ ⪰ 0, then f_0(x̃) ≥ L(x̃, λ, ν) ≥ inf_{x ∈ D} L(x, λ, ν) = g(λ, ν); minimizing over all feasible x̃ gives p* ≥ g(λ, ν)
Least-norm solution of linear equations
    minimize    xᵀx
    subject to  Ax = b

dual function

Lagrangian is L(x, ν) = xᵀx + νᵀ(Ax − b)

to minimize L over x, set gradient equal to zero:

    ∇_x L(x, ν) = 2x + Aᵀν = 0   ⟹   x = −(1/2)Aᵀν

plug in in L to obtain g:

    g(ν) = L(−(1/2)Aᵀν, ν) = −(1/4)νᵀAAᵀν − bᵀν

a concave function of ν

lower bound property: p* ≥ −(1/4)νᵀAAᵀν − bᵀν for all ν
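This bound is easy to verify numerically. A minimal NumPy sketch (the random A and b are illustrative, not data from the slides): any ν gives a lower bound on p*, and the maximizing ν = −2(AAᵀ)⁻¹b makes it tight.

```python
import numpy as np

# Least-norm problem: minimize x^T x subject to Ax = b.
# Closed-form primal optimum: x* = A^T (A A^T)^{-1} b.
rng = np.random.default_rng(0)
A, b = rng.standard_normal((3, 6)), rng.standard_normal(3)

x_star = A.T @ np.linalg.solve(A @ A.T, b)
p_star = x_star @ x_star

def g(nu):   # dual function g(nu) = -(1/4) nu^T A A^T nu - b^T nu
    return -0.25 * nu @ (A @ A.T) @ nu - b @ nu

nu_rand = rng.standard_normal(3)               # any nu gives a lower bound
nu_opt = -2.0 * np.linalg.solve(A @ A.T, b)    # maximizer of g
print(g(nu_rand) <= p_star)           # True: weak duality
print(np.isclose(g(nu_opt), p_star))  # True: the bound is tight here
```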
Standard form LP
    minimize    cᵀx
    subject to  Ax = b,  x ⪰ 0

dual function

Lagrangian is

    L(x, λ, ν) = cᵀx + νᵀ(Ax − b) − λᵀx
               = −bᵀν + (c + Aᵀν − λ)ᵀx

L is linear in x, hence

    g(λ, ν) = inf_x L(x, λ, ν) = { −bᵀν   if Aᵀν − λ + c = 0
                                 { −∞     otherwise

g is linear on affine domain {(λ, ν) | Aᵀν − λ + c = 0}, hence concave

lower bound property: p* ≥ −bᵀν if Aᵀν + c ⪰ 0
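As a numerical sanity check of the lower bound property (an illustrative sketch with constructed data, assuming SciPy is available): build a feasible, bounded standard form LP, solve it, and compare against the bound −bᵀν from a dual feasible ν.

```python
import numpy as np
from scipy.optimize import linprog

# Standard form LP: minimize c^T x  s.t.  Ax = b, x >= 0, built to be
# feasible (b = A x_feas with x_feas >= 0) and bounded (c = A^T y + s, s > 0).
rng = np.random.default_rng(1)
A = rng.standard_normal((2, 5))
b = A @ rng.random(5)
y = rng.standard_normal(2)
c = A.T @ y + rng.random(5) + 0.1

res = linprog(c, A_eq=A, b_eq=b)   # default bounds are x >= 0
nu = -y                            # dual feasible: A^T nu + c = s >= 0
assert np.all(A.T @ nu + c >= 0)
print(-b @ nu <= res.fun + 1e-7)   # True: weak duality, -b^T nu <= p*
```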
Equality constrained norm minimization
    minimize    ‖x‖
    subject to  Ax = b

dual function

    g(ν) = inf_x ( ‖x‖ − νᵀAx + bᵀν ) = { bᵀν   if ‖Aᵀν‖_* ≤ 1
                                        { −∞    otherwise

where ‖v‖_* = sup_{‖u‖≤1} uᵀv is dual norm of ‖·‖

proof: follows from inf_x ( ‖x‖ − yᵀx ) = 0 if ‖y‖_* ≤ 1, −∞ otherwise

if ‖y‖_* ≤ 1, then ‖x‖ − yᵀx ≥ 0 for all x, with equality if x = 0

if ‖y‖_* > 1, choose x = tu where ‖u‖ ≤ 1 and uᵀy = ‖y‖_* > 1; then

    ‖x‖ − yᵀx = t ( ‖u‖ − ‖y‖_* ) → −∞   as t → ∞

lower bound property: p* ≥ bᵀν if ‖Aᵀν‖_* ≤ 1
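For the Euclidean norm (which is self-dual), the bound can be checked in closed form: the primal optimum is the least-norm solution, and a suitably scaled ν with ‖Aᵀν‖₂ = 1 attains it. A small illustrative sketch with random data:

```python
import numpy as np

# minimize ||x||_2  s.t.  Ax = b; primal optimum is the least-norm solution.
rng = np.random.default_rng(2)
A, b = rng.standard_normal((3, 6)), rng.standard_normal(3)

x_star = A.T @ np.linalg.solve(A @ A.T, b)
p_star = np.linalg.norm(x_star)

# The 2-norm is self-dual, so any nu with ||A^T nu||_2 <= 1 gives p* >= b^T nu.
w = np.linalg.solve(A @ A.T, b)
nu = w / np.linalg.norm(A.T @ w)    # scaled so that ||A^T nu||_2 = 1
print(np.isclose(b @ nu, p_star))   # True: this choice attains the bound
```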
Two-way partitioning
    minimize    xᵀWx
    subject to  x_i² = 1,  i = 1, . . . , n

a nonconvex problem; feasible set contains 2ⁿ discrete points

interpretation: partition {1, . . . , n} in two sets; W_ij is cost of assigning i, j to the same set; −W_ij is cost of assigning them to different sets

dual function

    g(ν) = inf_x ( xᵀWx + Σ_i ν_i (x_i² − 1) ) = inf_x xᵀ(W + diag(ν))x − 1ᵀν
         = { −1ᵀν   if W + diag(ν) ⪰ 0
           { −∞     otherwise

lower bound property: p* ≥ −1ᵀν if W + diag(ν) ⪰ 0

example: ν = −λ_min(W)·1 gives bound p* ≥ n λ_min(W)
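For small n, the eigenvalue bound can be compared against brute-force enumeration of all 2ⁿ sign vectors. An illustrative sketch with a random symmetric W:

```python
import numpy as np
from itertools import product

# Two-way partitioning: minimize x^T W x over x in {-1, +1}^n.
rng = np.random.default_rng(3)
n = 8
W = rng.standard_normal((n, n))
W = (W + W.T) / 2                   # symmetric cost matrix

# Brute force over all 2^n sign vectors (only feasible for small n).
p_star = min(np.array(x) @ W @ np.array(x)
             for x in product([-1.0, 1.0], repeat=n))

# Dual bound: nu = -lambda_min(W) * 1 makes W + diag(nu) PSD, giving
# p* >= -1^T nu = n * lambda_min(W).
bound = n * np.linalg.eigvalsh(W)[0]
print(bound <= p_star + 1e-9)       # True: lower bound holds
```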
The dual problem
Lagrange dual problem

    maximize    g(λ, ν)
    subject to  λ ⪰ 0

finds best lower bound on p*, obtained from the Lagrange dual function

a convex optimization problem; optimal value denoted d*

λ, ν are dual feasible if λ ⪰ 0 and (λ, ν) ∈ dom g

example: the dual of the standard form LP on page 8 is

    maximize    −bᵀν
    subject to  Aᵀν + c ⪰ 0

Weak and strong duality

weak duality: d* ≤ p*

always holds (for convex and nonconvex problems); can be used to find nontrivial lower bounds for difficult problems, e.g., solving the SDP

    maximize    −1ᵀν
    subject to  W + diag(ν) ⪰ 0

gives a lower bound for the two-way partitioning problem on page 10

strong duality: d* = p*

does not hold in general; (usually) holds for convex problems; conditions that guarantee strong duality in convex problems are called constraint qualifications

Slater's constraint qualification

strong duality holds for a convex problem if it is strictly feasible, i.e.,

    ∃ x ∈ int D :  f_i(x) < 0, i = 1, . . . , m,   Ax = b

also guarantees that the dual optimum is attained (if p* > −∞)

can be sharpened: e.g., can replace int D with relint D (interior relative to affine hull); linear inequalities do not need to hold with strict inequality, . . .

there exist many other types of constraint qualifications
Feasibility problems
feasibility problem A, in x ∈ Rⁿ:

    f_i(x) < 0, i = 1, . . . , m,     h_i(x) = 0, i = 1, . . . , p

feasibility problem B, in λ ∈ Rᵐ, ν ∈ Rᵖ:

    λ ⪰ 0,   λ ≠ 0,   g(λ, ν) ≥ 0

where g(λ, ν) = inf_x ( Σ_{i=1}^m λ_i f_i(x) + Σ_{i=1}^p ν_i h_i(x) )

feasibility problem B is convex (g is concave), even if problem A is not

A and B are always weak alternatives: at most one is feasible

proof: assume x̃ satisfies A and λ, ν satisfy B; then

    0 ≤ g(λ, ν) ≤ Σ_{i=1}^m λ_i f_i(x̃) + Σ_{i=1}^p ν_i h_i(x̃) < 0

a contradiction

A and B are strong alternatives if exactly one of the two is feasible (can prove infeasibility of A by producing a solution of B, and vice-versa)
Inequality form LP
primal problem

    minimize    cᵀx
    subject to  Ax ⪯ b

dual function

    g(λ) = inf_x ( (c + Aᵀλ)ᵀx − bᵀλ ) = { −bᵀλ   if Aᵀλ + c = 0
                                         { −∞     otherwise

dual problem

    maximize    −bᵀλ
    subject to  Aᵀλ + c = 0,  λ ⪰ 0

from Slater's condition: p* = d* if Ax̃ ≺ b for some x̃

in fact, p* = d* except when primal and dual are infeasible

Quadratic program

primal problem (assume P ≻ 0)

    minimize    xᵀPx
    subject to  Ax ⪯ b

dual function

    g(λ) = inf_x ( xᵀPx + λᵀ(Ax − b) ) = −(1/4)λᵀAP⁻¹Aᵀλ − bᵀλ

dual problem

    maximize    −(1/4)λᵀAP⁻¹Aᵀλ − bᵀλ
    subject to  λ ⪰ 0

from Slater's condition: p* = d* if Ax̃ ≺ b for some x̃

in fact, p* = d* always
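The QP dual function can be checked numerically. In the sketch below (illustrative constructed data), b > 0 makes x = 0 strictly feasible, so p* = 0 (the objective is nonnegative and attains 0), and every λ ⪰ 0 must give g(λ) ≤ 0:

```python
import numpy as np

# QP: minimize x^T P x  s.t.  Ax <= b, with P positive definite.
# Dual function: g(lam) = -(1/4) lam^T A P^{-1} A^T lam - b^T lam.
rng = np.random.default_rng(4)
n, m = 4, 6
M = rng.standard_normal((n, n))
P = M @ M.T + n * np.eye(n)          # positive definite
A = rng.standard_normal((m, n))
b = rng.random(m) + 0.1              # x = 0 strictly feasible, so p* = 0

def g(lam):
    v = A.T @ lam
    return -0.25 * v @ np.linalg.solve(P, v) - b @ lam

lam = rng.random(m)                  # any lam >= 0 lower-bounds p* = 0
print(g(lam) <= 1e-9)                # True: weak duality
```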
A nonconvex problem with strong duality
    minimize    xᵀAx + 2bᵀx
    subject to  xᵀx ≤ 1

nonconvex if A ⋡ 0

dual function: g(λ) = inf_x ( xᵀ(A + λI)x + 2bᵀx − λ )

unbounded below if A + λI ⋡ 0, or if A + λI ⪰ 0 and b ∉ R(A + λI)

minimized by x = −(A + λI)†b otherwise: g(λ) = −bᵀ(A + λI)†b − λ

dual problem and equivalent SDP:

    maximize    −bᵀ(A + λI)†b − λ          maximize    −t − λ
    subject to  A + λI ⪰ 0                 subject to  [ A + λI   b ] ⪰ 0
                b ∈ R(A + λI)                          [ bᵀ       t ]

strong duality holds although the primal problem is not convex (not easy to show)
Geometric interpretation
For simplicity, consider a problem with one constraint f_1(x) ≤ 0

interpretation of dual function:

    g(λ) = inf_{(u,t) ∈ G} ( t + λu ),   where G = { (f_1(x), f_0(x)) | x ∈ D }

(figure: the set G in the (u, t) plane, with p* and g(λ) marked on the t-axis)

λu + t = g(λ) is a (non-vertical) supporting hyperplane to G

hyperplane intersects t-axis at t = g(λ)

epigraph variation: same interpretation if G is replaced with

    A = { (u, t) | f_1(x) ≤ u, f_0(x) ≤ t for some x ∈ D }

(figure: the set A with supporting line λu + t = g(λ))

strong duality

holds if there is a non-vertical supporting hyperplane to A at (0, p*)

for a convex problem, A is convex, hence has a supporting hyperplane at (0, p*)

Slater's condition: if there exist (ũ, t̃) ∈ A with ũ < 0, then supporting hyperplanes at (0, p*) must be non-vertical
Complementary slackness
Assume strong duality holds, x* is primal optimal, (λ*, ν*) is dual optimal:

    f_0(x*) = g(λ*, ν*) = inf_x ( f_0(x) + Σ_{i=1}^m λ*_i f_i(x) + Σ_{i=1}^p ν*_i h_i(x) )
            ≤ f_0(x*) + Σ_{i=1}^m λ*_i f_i(x*) + Σ_{i=1}^p ν*_i h_i(x*)
            ≤ f_0(x*)

hence, the two inequalities hold with equality

x* minimizes L(x, λ*, ν*)

λ*_i f_i(x*) = 0 for i = 1, . . . , m (known as complementary slackness):

    λ*_i > 0 ⟹ f_i(x*) = 0,      f_i(x*) < 0 ⟹ λ*_i = 0
Karush-Kuhn-Tucker (KKT) conditions
the following four conditions are called KKT conditions (for a problem with differentiable f_i, h_i):

1. Primal feasibility: f_i(x) ≤ 0, i = 1, . . . , m, h_i(x) = 0, i = 1, . . . , p
2. Dual feasibility: λ ⪰ 0
3. Complementary slackness: λ_i f_i(x) = 0, i = 1, . . . , m
4. Gradient of Lagrangian with respect to x vanishes (first order condition):

    ∇f_0(x) + Σ_{i=1}^m λ_i ∇f_i(x) + Σ_{i=1}^p ν_i ∇h_i(x) = 0

If strong duality holds and x, λ, ν are optimal, then they must satisfy the KKT conditions
KKT conditions for convex problem
If x̃, λ̃, ν̃ satisfy KKT for a convex problem, then they are optimal:

from complementary slackness: f_0(x̃) = L(x̃, λ̃, ν̃)

from the 4th condition (and convexity): g(λ̃, ν̃) = L(x̃, λ̃, ν̃)

hence, f_0(x̃) = g(λ̃, ν̃)

If Slater's condition is satisfied, x is optimal if and only if there exist λ, ν that satisfy the KKT conditions

recall that Slater implies strong duality, and dual optimum is attained

generalizes optimality condition ∇f_0(x) = 0 for unconstrained problem

Summary:

When strong duality holds, the KKT conditions are necessary conditions for optimality

If the problem is convex, they are also sufficient
example: water-filling (assume α_i > 0)

    minimize    −Σ_{i=1}^n log(x_i + α_i)
    subject to  x ⪰ 0,  1ᵀx = 1

x is optimal iff x ⪰ 0, 1ᵀx = 1, and there exist λ ∈ Rⁿ, ν ∈ R such that

    λ ⪰ 0,      λ_i x_i = 0,      1/(x_i + α_i) + λ_i = ν

if ν < 1/α_i: λ_i = 0 and x_i = 1/ν − α_i

if ν ≥ 1/α_i: λ_i = ν − 1/α_i and x_i = 0

determine ν from 1ᵀx = Σ_{i=1}^n max{0, 1/ν − α_i} = 1

interpretation

n patches; level of patch i is at height α_i

flood area with unit amount of water

resulting level is 1/ν*

(figure: water level 1/ν* above patch heights α_i; x_i is the depth of water over patch i)
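The condition Σ max{0, 1/ν − α_i} = 1 is monotone in the level 1/ν, so ν can be found by bisection. A minimal sketch (the α values are illustrative):

```python
import numpy as np

# Water-filling solution: x_i = max(0, 1/nu - alpha_i), with the level
# 1/nu chosen by bisection so that 1^T x = 1.
def water_filling(alpha, total=1.0):
    lo, hi = 0.0, alpha.min() + total       # bracket for the water level
    for _ in range(100):
        level = 0.5 * (lo + hi)
        if np.maximum(0.0, level - alpha).sum() < total:
            lo = level
        else:
            hi = level
    return np.maximum(0.0, 0.5 * (lo + hi) - alpha)

alpha = np.array([0.2, 0.5, 1.0, 2.0])
x = water_filling(alpha)
print(np.isclose(x.sum(), 1.0))             # True: 1^T x = 1
levels = (x + alpha)[x > 1e-9]              # flooded patches share one level
print(np.allclose(levels, levels[0]))       # True: common level 1/nu*
```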
Perturbation and sensitivity analysis
(unperturbed) optimization problem and its dual

    minimize    f_0(x)                            maximize    g(λ, ν)
    subject to  f_i(x) ≤ 0, i = 1, . . . , m      subject to  λ ⪰ 0
                h_i(x) = 0, i = 1, . . . , p

perturbed problem and its dual

    min.  f_0(x)                                  max.  g(λ, ν) − uᵀλ − vᵀν
    s.t.  f_i(x) ≤ u_i, i = 1, . . . , m          s.t.  λ ⪰ 0
          h_i(x) = v_i, i = 1, . . . , p

x is primal variable; u, v are parameters; p*(u, v) is the optimal value as a function of u, v

global sensitivity result: assume strong duality holds for the unperturbed problem, and that λ*, ν* are dual optimal for the unperturbed problem. Apply weak duality to the perturbed problem:

    p*(u, v) ≥ g(λ*, ν*) − uᵀλ* − vᵀν* = p*(0, 0) − uᵀλ* − vᵀν*

local sensitivity: if (in addition) p*(u, v) is differentiable at (0, 0), then

    λ*_i = −∂p*(0, 0)/∂u_i,      ν*_i = −∂p*(0, 0)/∂v_i
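A scalar toy problem (my own illustrative example, not from the slides) makes the global sensitivity inequality concrete: minimize x² subject to 1 − x ≤ u has p*(u) = max(0, 1 − u)² and multiplier λ* = 2 at u = 0.

```python
import numpy as np

# Toy perturbed problem: minimize x^2 subject to 1 - x <= u (scalar x).
# p*(u) = max(0, 1 - u)^2; the optimal multiplier at u = 0 is
# lambda* = 2, which equals -dp*/du at 0.
def p_star(u):
    return max(0.0, 1.0 - u) ** 2

lam_star = 2.0
for u in np.linspace(-1.0, 3.0, 41):
    # global sensitivity inequality: p*(u) >= p*(0) - lambda* * u
    assert p_star(u) >= p_star(0.0) - lam_star * u - 1e-12
print(True)
```

Tightening the constraint (u < 0) raises the optimal value by at least λ*|u|; loosening it (u > 0) lowers it by at most λ*u.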
Duality and problem reformulations
equivalent formulations of a problem can lead to very different duals

reformulating the primal problem can be useful when the dual is difficult to derive, or uninteresting

common reformulations

introduce new variables and equality constraints

make explicit constraints implicit or vice-versa

transform objective or constraint functions

e.g., replace f_0(x) by φ(f_0(x)) with φ convex, increasing
Introducing new variables and equality constraints
    minimize    f_0(Ax + b)

dual function is constant: g = inf_x L(x) = inf_x f_0(Ax + b) = p*; we have strong duality, but the dual is useless

reformulated problem and its dual

    minimize    f_0(y)                    maximize    bᵀν − f_0*(ν)
    subject to  Ax + b − y = 0            subject to  Aᵀν = 0

dual function follows from

    g(ν) = inf_{x,y} ( f_0(y) − νᵀy + νᵀAx + bᵀν )
         = { −f_0*(ν) + bᵀν   if Aᵀν = 0
           { −∞               otherwise

where f_0* is the conjugate of f_0
norm approximation problem: minimize ‖Ax − b‖

    minimize    ‖y‖
    subject to  y = Ax − b

can look up conjugate of ‖·‖, or derive dual directly

    g(ν) = inf_{x,y} ( ‖y‖ + νᵀy − νᵀAx + bᵀν )
         = { bᵀν + inf_y ( ‖y‖ + νᵀy )   if Aᵀν = 0
           { −∞                          otherwise
         = { bᵀν   if Aᵀν = 0, ‖ν‖_* ≤ 1
           { −∞    otherwise

(see page 9)

dual of norm approximation problem

    maximize    bᵀν
    subject to  Aᵀν = 0,  ‖ν‖_* ≤ 1
Implicit constraints
LP with box constraints: primal and dual problem

    minimize    cᵀx                maximize    −bᵀν − 1ᵀλ_1 − 1ᵀλ_2
    subject to  Ax = b             subject to  c + Aᵀν + λ_1 − λ_2 = 0
                −1 ⪯ x ⪯ 1                     λ_1 ⪰ 0,  λ_2 ⪰ 0

reformulation with box constraints made implicit

    minimize    f_0(x) = { cᵀx   if −1 ⪯ x ⪯ 1
                         { ∞     otherwise
    subject to  Ax = b

dual function

    g(ν) = inf_{−1⪯x⪯1} ( cᵀx + νᵀ(Ax − b) ) = −bᵀν − ‖Aᵀν + c‖₁

dual problem: maximize −bᵀν − ‖Aᵀν + c‖₁
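The closed-form infimum over the box follows because the minimizer picks x_i = −sign((Aᵀν + c)_i) coordinate-wise. A quick numerical check with illustrative random data:

```python
import numpy as np

# Box LP dual function: inf_{-1 <= x <= 1} (c^T x + nu^T (Ax - b))
#                      = -b^T nu - ||A^T nu + c||_1,
# since the infimum over the box picks x_i = -sign((A^T nu + c)_i).
rng = np.random.default_rng(5)
A = rng.standard_normal((3, 5))
b = rng.standard_normal(3)
c = rng.standard_normal(5)
nu = rng.standard_normal(3)

w = A.T @ nu + c
x_min = -np.sign(w)                 # coordinate-wise minimizer over the box
lhs = c @ x_min + nu @ (A @ x_min - b)
rhs = -b @ nu - np.linalg.norm(w, 1)
print(np.isclose(lhs, rhs))         # True
```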
Problems with generalized inequalities
    minimize    f_0(x)
    subject to  f_i(x) ⪯_{K_i} 0,  i = 1, . . . , m
                h_i(x) = 0,  i = 1, . . . , p

⪯_{K_i} is generalized inequality on R^{k_i}

definitions are parallel to scalar case:

Lagrange multiplier for f_i(x) ⪯_{K_i} 0 is vector λ_i ∈ R^{k_i}

Lagrangian L : Rⁿ × R^{k_1} × · · · × R^{k_m} × Rᵖ → R is defined as

    L(x, λ_1, . . . , λ_m, ν) = f_0(x) + Σ_{i=1}^m λ_iᵀ f_i(x) + Σ_{i=1}^p ν_i h_i(x)

dual function g : R^{k_1} × · · · × R^{k_m} × Rᵖ → R is defined as

    g(λ_1, . . . , λ_m, ν) = inf_{x ∈ D} L(x, λ_1, . . . , λ_m, ν)
lower bound property: if λ_i ⪰_{K_i*} 0, then g(λ_1, . . . , λ_m, ν) ≤ p*

proof: if x̃ is feasible and λ_i ⪰_{K_i*} 0, then

    f_0(x̃) ≥ f_0(x̃) + Σ_{i=1}^m λ_iᵀ f_i(x̃) + Σ_{i=1}^p ν_i h_i(x̃)
           ≥ inf_{x ∈ D} L(x, λ_1, . . . , λ_m, ν)
           = g(λ_1, . . . , λ_m, ν)

minimizing over all feasible x̃ gives p* ≥ g(λ_1, . . . , λ_m, ν)

dual problem

    maximize    g(λ_1, . . . , λ_m, ν)
    subject to  λ_i ⪰_{K_i*} 0,  i = 1, . . . , m

weak duality: p* ≥ d* always

strong duality: p* = d* for a convex problem with constraint qualification (e.g., Slater's condition: primal problem is strictly feasible)

example: the SOCP

    minimize    fᵀx
    subject to  ‖A_i x + b_i‖₂ ≤ c_iᵀx + d_i,  i = 1, . . . , m

has dual

    maximize    −Σ_{i=1}^m ( b_iᵀu_i + d_i v_i )
    subject to  Σ_{i=1}^m ( A_iᵀu_i + c_i v_i ) + f = 0
                ‖u_i‖₂ ≤ v_i,  i = 1, . . . , m,

with variables u_i ∈ R^{n_i}, v_i ∈ R, i = 1, . . . , m and problem data given by f ∈ Rⁿ, A_i ∈ R^{n_i × n}, b_i ∈ R^{n_i}, c_i ∈ Rⁿ and d_i ∈ R.
Duality: SOCP
We can derive the dual in the following two ways:

1. Introduce new variables y_i ∈ R^{n_i} and t_i ∈ R and equalities y_i = A_i x + b_i, t_i = c_iᵀx + d_i, and derive the Lagrange dual.

2. Start from the conic formulation of the SOCP and use the conic dual. Use the fact that the second-order cone is self-dual:

    t ≥ ‖x‖₂   ⟺   tv + xᵀy ≥ 0 for all v, y such that v ≥ ‖y‖₂

The condition xᵀy ≥ −tv is a simple Cauchy-Schwarz inequality.
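The self-duality claim is easy to probe numerically: sample arbitrary points of the two cones and check the inner-product inequality. An illustrative sketch:

```python
import numpy as np

# Self-duality of the second-order cone: if t >= ||x||_2 and v >= ||y||_2,
# then t*v + x^T y >= 0, since x^T y >= -||x||_2 ||y||_2 >= -t*v
# by Cauchy-Schwarz.
rng = np.random.default_rng(6)
for _ in range(1000):
    x, y = rng.standard_normal(4), rng.standard_normal(4)
    t = np.linalg.norm(x) + rng.random()   # any t with t >= ||x||_2
    v = np.linalg.norm(y) + rng.random()   # any v with v >= ||y||_2
    assert t * v + x @ y >= -1e-12
print(True)
```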
Duality: SOCP
We introduce new variables, and write the problem as

    minimize    cᵀx
    subject to  ‖y_i‖₂ ≤ t_i,  i = 1, . . . , m
                y_i = A_i x + b_i,   t_i = c_iᵀx + d_i,   i = 1, . . . , m

The Lagrangian is

    L(x, y, t, λ, ν, μ)
    = cᵀx + Σ_{i=1}^m λ_i ( ‖y_i‖₂ − t_i ) + Σ_{i=1}^m ν_iᵀ ( y_i − A_i x − b_i )
      + Σ_{i=1}^m μ_i ( t_i − c_iᵀx − d_i )
    = ( c − Σ_{i=1}^m A_iᵀν_i − Σ_{i=1}^m μ_i c_i )ᵀ x + Σ_{i=1}^m ( λ_i ‖y_i‖₂ + ν_iᵀ y_i )
      + Σ_{i=1}^m ( μ_i − λ_i ) t_i − Σ_{i=1}^m ( b_iᵀν_i + d_i μ_i ).
Duality: SOCP
The minimum over x is bounded below if and only if

    Σ_{i=1}^m ( A_iᵀν_i + μ_i c_i ) = c.

To minimize over y_i, we note that

    inf_{y_i} ( λ_i ‖y_i‖₂ + ν_iᵀ y_i ) = { 0    if ‖ν_i‖₂ ≤ λ_i
                                          { −∞   otherwise.

The minimum over t_i is bounded below if and only if λ_i = μ_i.
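The infimum over y_i is the same dual-norm fact used on page 9, and can be probed numerically (illustrative fixed ν chosen so the values are exact):

```python
import numpy as np

# inf_y (lam * ||y||_2 + nu^T y): equals 0 when ||nu||_2 <= lam (the function
# is then nonnegative, with equality at y = 0) and -inf when ||nu||_2 > lam.
nu = np.array([3.0, 0.0, 4.0])             # ||nu||_2 = 5
f = lambda y, lam: lam * np.linalg.norm(y) + nu @ y

rng = np.random.default_rng(7)
lam_big = 5.5                              # ||nu||_2 <= lam: f >= 0 everywhere
assert all(f(10 * rng.standard_normal(3), lam_big) >= -1e-12
           for _ in range(1000))

lam_small = 2.5                            # ||nu||_2 > lam: unbounded below
print(f(-100.0 * nu, lam_small))           # prints -1250.0
```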
Duality: SOCP
The Lagrange dual function is

    g(λ, ν, μ) = { −Σ_{i=1}^m ( b_iᵀν_i + d_i λ_i )   if Σ_{i=1}^m ( A_iᵀν_i + λ_i c_i ) = c,
                 {                                       ‖ν_i‖₂ ≤ λ_i, μ = λ
                 { −∞                                  otherwise

which leads to the dual problem

    maximize    −Σ_{i=1}^m ( b_iᵀν_i + d_i λ_i )
    subject to  Σ_{i=1}^m ( A_iᵀν_i + λ_i c_i ) = c
                ‖ν_i‖₂ ≤ λ_i,  i = 1, . . . , m,

which is again an SOCP.
Duality: SOCP
We can also express the SOCP as a conic form problem

    minimize    cᵀx
    subject to  ( c_iᵀx + d_i, A_i x + b_i ) ⪰_{K_i} 0,  i = 1, . . . , m.

The Lagrangian is given by:

    L(x, u_i, v_i) = cᵀx − Σ_{i=1}^m ( A_i x + b_i )ᵀ u_i − Σ_{i=1}^m ( c_iᵀx + d_i ) v_i
                   = ( c − Σ_{i=1}^m ( A_iᵀu_i + c_i v_i ) )ᵀ x − Σ_{i=1}^m ( b_iᵀu_i + d_i v_i )

for ( v_i, u_i ) ⪰_{K_i*} 0 (which is also v_i ≥ ‖u_i‖₂)
Duality: SOCP
With

    L(x, u_i, v_i) = ( c − Σ_{i=1}^m ( A_iᵀu_i + c_i v_i ) )ᵀ x − Σ_{i=1}^m ( b_iᵀu_i + d_i v_i )

the dual function is given by:

    g(u, v) = { −Σ_{i=1}^m ( b_iᵀu_i + d_i v_i )   if Σ_{i=1}^m ( A_iᵀu_i + v_i c_i ) = c
              { −∞                                 otherwise

The conic dual is then:

    maximize    −Σ_{i=1}^m ( b_iᵀu_i + d_i v_i )
    subject to  Σ_{i=1}^m ( A_iᵀu_i + v_i c_i ) = c
                ( v_i, u_i ) ⪰_{K_i*} 0,  i = 1, . . . , m.
Proof
Convex problem & constraint qualification   ⟹   strong duality
Slater's constraint qualification

Convex problem

    minimize    f_0(x)
    subject to  f_i(x) ≤ 0,  i = 1, . . . , m
                Ax = b

The problem satisfies Slater's condition if it is strictly feasible, i.e.,

    ∃ x ∈ int D :  f_i(x) < 0, i = 1, . . . , m,   Ax = b

also guarantees that the dual optimum is attained (if p* > −∞)

there exist many other types of constraint qualifications
KKT conditions for convex problem
If x̃, λ̃, ν̃ satisfy KKT for a convex problem, then they are optimal:

from complementary slackness: f_0(x̃) = L(x̃, λ̃, ν̃)

from the 4th condition (and convexity): g(λ̃, ν̃) = L(x̃, λ̃, ν̃)

hence, f_0(x̃) = g(λ̃, ν̃) with (x̃, λ̃, ν̃) feasible.

If Slater's condition is satisfied, x is optimal if and only if there exist λ, ν that satisfy the KKT conditions

Slater implies strong duality (more on this now), and dual optimum is attained

generalizes optimality condition ∇f_0(x) = 0 for unconstrained problem

Summary

For a convex problem satisfying constraint qualification, the KKT conditions are necessary & sufficient conditions for optimality.
Proof
To simplify the analysis, we make two additional technical assumptions:

the domain D has nonempty interior (hence, relint D = int D)

we also assume that A has full rank, i.e. rank A = p.
Proof
We define the set A as

    A = { (u, v, t) | ∃ x ∈ D,  f_i(x) ≤ u_i, i = 1, . . . , m,
          h_i(x) = v_i, i = 1, . . . , p,  f_0(x) ≤ t },

which is the set of values taken by the constraint and objective functions.

If the problem is convex, A is defined by a list of convex constraints, hence is convex.

We define a second convex set B as

    B = { (0, 0, s) ∈ Rᵐ × Rᵖ × R | s < p* }.

The sets A and B do not intersect (otherwise p* would not be the optimal value).
Geometric proof
(figure: the sets A and B in the (u, t) plane, separated by a nonvertical hyperplane through (ũ, t̃))

Illustration of strong duality proof, for a convex problem that satisfies Slater's constraint qualification. The two sets A and B are convex and do not intersect, so they can be separated by a hyperplane. Slater's constraint qualification guarantees that any separating hyperplane must be nonvertical.
Proof
By the separating hyperplane theorem there exist (λ̃, ν̃, μ) ≠ 0 and α such that

    (u, v, t) ∈ A   ⟹   λ̃ᵀu + ν̃ᵀv + μt ≥ α,     (1)

and

    (u, v, t) ∈ B   ⟹   λ̃ᵀu + ν̃ᵀv + μt ≤ α.     (2)

From (1) we conclude that λ̃ ⪰ 0 and μ ≥ 0. (Otherwise λ̃ᵀu + μt is unbounded below over A, contradicting (1).)

The condition (2) simply means that μt ≤ α for all t < p*, and hence, μp* ≤ α.

Together with (1) we conclude that for any x ∈ D,

    μ f_0(x) + Σ_{i=1}^m λ̃_i f_i(x) + ν̃ᵀ(Ax − b) ≥ μ p*     (3)
Proof
Let us assume that μ > 0 (the separating hyperplane is nonvertical).

We can divide (3) by μ to get

    L(x, λ̃/μ, ν̃/μ) ≥ p*   for all x ∈ D

Minimizing this inequality over x produces p* ≤ g(λ, ν), where λ = λ̃/μ, ν = ν̃/μ.

By weak duality we have g(λ, ν) ≤ p*, so in fact g(λ, ν) = p*.

This shows that strong duality holds, and that the dual optimum is attained, whenever μ > 0.

Suppose now that μ = 0. The normal vector then has the form (λ̃, ν̃, 0), and (3) reduces to: for all x ∈ D,

    Σ_{i=1}^m λ̃_i f_i(x) + ν̃ᵀ(Ax − b) ≥ 0.     (4)

Applying this to the point x̃ that satisfies the Slater condition, we have

    Σ_{i=1}^m λ̃_i f_i(x̃) ≥ 0.

Since f_i(x̃) < 0 and λ̃_i ≥ 0, we conclude that λ̃ = 0.
Proof
This is where we use the two technical assumptions.

Since λ̃ = 0 and (λ̃, ν̃, μ) ≠ 0, we must have ν̃ ≠ 0. Then (4) implies that for all x ∈ D,

    ν̃ᵀ(Ax − b) ≥ 0.

But x̃ satisfies ν̃ᵀ(Ax̃ − b) = 0, and since x̃ ∈ int D, there are points in D with ν̃ᵀ(Ax − b) < 0 unless Aᵀν̃ = 0.

This contradicts our assumption that rank A = p.

This means that we cannot have μ = 0, which ends the proof.