ELEG 867 - Compressive Sensing and Sparse Signal Representations
Introduction to Matrix Completion and Robust PCA

Gonzalo Garateguy
Dept. of Electrical and Computer Engineering
University of Delaware
Fall 2011
Matrix Completion Problems - Motivation

Recommender Systems

             Items
    User 1   x  x  ?  ?  x  x
    User 2   ?  ?  x  x  ?  ?
      .      ?  x  ?  x  x  ?
      .      x  ?  ?  x  ?  x
      .      x  ?  x  ?  ?  x
      .      ?  x  ?  ?  x  ?
      .      ?  ?  x  x  x  ?
    User n   x  x  ?  ?  ?  x

(x = observed rating, ? = missing entry)

Collaborative filtering (Amazon, last.fm)
Content based (Pandora, www.nanocrowd.com)
The Netflix prize competition boosted interest in the area

http://www.ima.umn.edu/videos/index.php?id=1598
http://sahd.pratt.duke.edu/Videos/keynote.html
Matrix Completion Problems - Motivation

Sensor location estimation in Wireless Sensor Networks

[Figure: a network of 7 nodes; some pairwise distances d_ij are measured,
others (e.g. d_23, d_46, d_47) are unknown]

Distance matrix:

          1      2      3      4      5      6      7
    1     0    d_12   d_13    ?      ?      ?      ?
    2   d_21    0      ?     d_24    ?      ?      ?
    3   d_31    ?      0     d_34    ?      ?      ?
    4     ?    d_42   d_43    0     d_45    ?      ?
    5     ?      ?      ?    d_54    0     d_56   d_57
    6     ?      ?      ?      ?    d_65    0     d_67
    7     ?      ?      ?      ?    d_75   d_76    0

The problem is to find the positions of the sensors in R², given the partial
information about relative distances.
A distance matrix like this has rank 2 in R².
For certain types of graphs the problem can be solved if we know the whole
distance matrix.
Matrix Completion Problems - Motivation

Image reconstruction from incomplete data

[Figure: reconstructed image next to the incomplete image with 50% of the pixels observed]
Robust PCA - Motivation

Foreground identification for surveillance applications

E.J. Candes, X. Li, Y. Ma, and J. Wright, Robust principal component analysis? http://arxiv.org/abs/0912.3599
Robust PCA - Motivation

Image alignment and texture recognition

Z. Zhang, X. Liang, A. Ganesh, and Y. Ma, TILT: transform invariant low-rank textures, Computer Vision - ACCV 2010
Robust PCA - Motivation

Camera calibration with radial distortion

J. Wright, Z. Lin, and Y. Ma, Low-Rank Matrix Recovery: From Theory to Imaging Applications, tutorial presented at the International Conference on Image and Graphics (ICIG), August 2011
Motivation

Many other applications:
- System identification in control theory
- Covariance matrix estimation
- Machine learning
- Computer vision

Videos to watch:
- Matrix Completion via Convex Optimization: Theory and Algorithms by Emmanuel Candes
  http://videolectures.net/mlss09us_candes_mccota/
- Low Dimensional Structures in Images or Data by Yi Ma, Workshop in Signal Processing with Adaptive Sparse Structured Representations (June 2011)
  http://ecos.maths.ed.ac.uk/SPARS11/YiMa.wmv
Problem Formulation

Matrix completion

    minimize   rank(A)                                    (1)
    subject to A_ij = D_ij,  (i,j) ∈ Ω

Robust PCA

    minimize   rank(A) + λ‖E‖₀                            (2)
    subject to A_ij + E_ij = D_ij,  (i,j) ∈ Ω

Very hard to solve in general without any assumptions, sometimes NP hard.
Even if we can solve them, are the solutions always what we expect?
Under which conditions can we have exact recovery of the real matrices?
Outline

- Convex optimization concepts
- Matrix completion
  - Exact recovery from incomplete data by convex relaxation
  - ALM method for nuclear norm minimization
- Robust PCA
  - Exact recovery from incomplete and corrupted data by convex relaxation
  - ALM method for low rank and sparse separation
Convex sets and Convex functions

Convex set

A set C is convex if the line segment between any two points in C lies in C:
for any x₁, x₂ ∈ C and any θ with 0 ≤ θ ≤ 1 we have

    θx₁ + (1−θ)x₂ ∈ C.

[Figure: three example sets, one convex and two non-convex]
Convex sets and Convex functions

Convex combination

A convex combination of k points x₁, .., x_k is defined as

    θ₁x₁ + ... + θ_k x_k,  where θ_i ≥ 0 and θ₁ + ... + θ_k = 1

Convex hull

The convex hull of C is the set of all convex combinations of points in C:

    conv C = {θ₁x₁ + ... + θ_k x_k | x_i ∈ C, θ_i ≥ 0, i = 1, ..., k, θ₁ + ... + θ_k = 1}
Convex sets and Convex functions

Operations that preserve convexity

Intersection

If S₁ and S₂ are convex, then S₁ ∩ S₂ is convex. In general, if S_α is convex
for every α ∈ A, then ∩_{α∈A} S_α is convex. Subspaces, affine sets and convex
cones are therefore closed under arbitrary intersections.

Affine functions

Let f : Rⁿ → Rᵐ be affine, f(x) = Ax + b, where A ∈ R^{m×n} and b ∈ Rᵐ. If
S ⊆ Rⁿ is convex, then the image of S under f,

    f(S) = {f(x) | x ∈ S},

is convex.
Convex sets and Convex functions

Convex functions

A function f : Rⁿ → R is convex if dom f is a convex set and if for all
x, y ∈ dom f, and θ with 0 ≤ θ ≤ 1, we have

    f(θx + (1−θ)y) ≤ θf(x) + (1−θ)f(y)

We say that f is strictly convex if the strict inequality holds whenever x ≠ y
and 0 < θ < 1.
Operations that preserve convexity

Composition with an affine mapping

Suppose f : Rⁿ → R, A ∈ R^{n×m} and b ∈ Rⁿ. Define g : Rᵐ → R by

    g(x) = f(Ax + b)

with dom g = {x | Ax + b ∈ dom f}. Then if f is convex, so is g.

Pointwise maximum

If f₁ and f₂ are convex functions then their pointwise maximum f defined by

    f(x) = max{f₁(x), f₂(x)}

with dom f = dom f₁ ∩ dom f₂ is also convex. This also extends to the case
where f₁, ..., f_m are convex; then f(x) = max{f₁(x), ..., f_m(x)} is also convex.
Pointwise maximum of convex functions

[Figure: two convex functions f₁(x), f₂(x) and their pointwise maximum
f(x) = max{f₁(x), f₂(x)}]
Convex sets and Convex functions

Convex differentiable functions

If f is differentiable (i.e. its gradient ∇f exists at each point in dom f), then f
is convex if and only if dom f is convex and

    f(y) ≥ f(x) + ∇f(x)ᵀ(y − x)

holds for all x, y ∈ dom f.
Second order conditions

If f is twice differentiable, i.e. its Hessian ∇²f exists at each point in dom f,
then f is convex if and only if dom f is convex and its Hessian is positive
semidefinite for all x ∈ dom f:

    ∇²f(x) ⪰ 0
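The second-order condition is easy to test numerically. Below is a minimal sketch (our own illustration, not from the slides) that checks positive semidefiniteness of the Hessian for a quadratic f(x) = ½xᵀQx, whose Hessian is the constant matrix Q; the example matrix is an assumption.

```python
import numpy as np

# For a quadratic f(x) = 0.5 * x^T Q x the Hessian is Q everywhere,
# so convexity reduces to Q being positive semidefinite.
Q = np.array([[2.0, 0.5],
              [0.5, 1.0]])            # symmetric example matrix (an assumption)

eigvals = np.linalg.eigvalsh(Q)       # eigenvalues of the Hessian
print("Hessian eigenvalues:", eigvals)
print("f is convex:", bool(np.all(eigvals >= 0)))
```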
Convex non-differentiable functions

The concept of gradient can be extended to non-differentiable functions by
introducing the subgradient.

Subgradient of a function

A vector g ∈ Rⁿ is a subgradient of f : Rⁿ → R at x ∈ dom f if for all
z ∈ dom f

    f(z) ≥ f(x) + gᵀ(z − x)
Subgradients

Observations

If f is convex and differentiable, then its gradient at x, ∇f(x), is its only
subgradient.

Subdifferentiable functions

A function f is called subdifferentiable at x if there exists at least one
subgradient at x.

Subdifferential at a point

The set of subgradients of f at the point x is called the subdifferential of f at x,
and is denoted ∂f(x).

Subdifferentiability of a function

A function f is called subdifferentiable if it is subdifferentiable at all
x ∈ dom f.
Basic properties

Existence of the subgradient of a convex function

If f is convex and x ∈ int dom f, then ∂f(x) is nonempty and bounded.

The subdifferential ∂f(x) is always a closed convex set, even if f is not
convex. This follows from the fact that it is the intersection of an infinite set
of halfspaces:

    ∂f(x) = ∩_{z ∈ dom f} {g | f(z) ≥ f(x) + gᵀ(z − x)}.
Basic properties

Nonnegative scaling

For α ≥ 0, ∂(αf)(x) = α∂f(x)

Subgradient of the sum

Given f = f₁ + ... + f_m, where f₁, ..., f_m are convex functions, the subgradient
of f at x is given by ∂f(x) = ∂f₁(x) + ... + ∂f_m(x)

Affine transformations of domain

Suppose f is convex, and let h(x) = f(Ax + b). Then ∂h(x) = Aᵀ∂f(Ax + b).

Pointwise maximum

Suppose f is the pointwise maximum of convex functions f₁, ..., f_m,

    f(x) = max_{i=1,...,m} f_i(x),

then ∂f(x) = Co ∪ {∂f_i(x) | f_i(x) = f(x)}, the convex hull of the union of
the subdifferentials of the active functions at x.
Subgradient of the pointwise maximum of two convex functions

[Figure, shown at three different points x: where only one of f₁, f₂ is active
the subgradient of f(x) = max{f₁(x), f₂(x)} is unique; at the crossing point the
subdifferential is the convex hull of the gradients of both functions]
Examples

Consider the function f(x) = |x|. At x₀ = 0, the subdifferential is defined by the
inequality

    f(z) ≥ f(x₀) + g(z − x₀),  ∀z ∈ dom f
    |z| ≥ gz,  ∀z ∈ R
    ∂f(0) = {g | g ∈ [−1, 1]}

Then for all x

    ∂f(x) = { {−1}               for x < 0
              {1}                for x > 0
              {g | g ∈ [−1, 1]}  for x = 0
Example: ℓ₁ norm

Consider f(x) = ‖x‖₁ = |x₁| + ... + |x_n|, and note that f can be expressed as
the maximum of 2ⁿ linear functions:

    ‖x‖₁ = max{f₁(x), .., f_{2ⁿ}(x)}
    ‖x‖₁ = max{s₁ᵀx, .., s_{2ⁿ}ᵀx | s_i ∈ {−1, 1}ⁿ}

The active functions f_i(x) at x are the ones for which s_iᵀx = ‖x‖₁. Then,
denoting

    s_i = [s_{i,1}, ..., s_{i,n}]ᵀ,  s_{i,j} ∈ {−1, 1},

the set of indices of the active functions at x is

    A_x = { i | s_{i,j} = −1 for x_j < 0
                s_{i,j} = 1 for x_j > 0
                s_{i,j} = −1 or 1 for x_j = 0,  for j = 1, .., n }
Subgradient of the ℓ₁ norm

The subdifferential of ‖x‖₁ at a generic point x is given by

    ∂‖x‖₁ = co{∇f_i(x) | i ∈ A_x}
    ∂‖x‖₁ = co{s_i | i ∈ A_x}
    ∂‖x‖₁ = {g | g = Σ_{i∈A_x} θ_i s_i,  θ_i ≥ 0,  Σ_i θ_i = 1}

or equivalently

    ∂‖x‖₁ = { g | g_j = −1 for x_j < 0
                  g_j = 1 for x_j > 0
                  g_j ∈ [−1, 1] for x_j = 0 }
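A quick numerical sketch of this characterization (our own illustration; the zero-entry choice is arbitrary within [−1, 1]):

```python
import numpy as np

def l1_subgradient(x):
    """Return one element of the subdifferential of ||x||_1 at x.

    For nonzero entries the component is sign(x_j); for zero entries any
    value in [-1, 1] is valid -- np.sign picks 0, a common choice.
    """
    return np.sign(x)

x = np.array([-2.0, 0.0, 3.0])
g = l1_subgradient(x)                      # [-1., 0., 1.]
# Verify the subgradient inequality ||z||_1 >= ||x||_1 + g^T (z - x)
z = np.random.randn(3)
assert np.abs(z).sum() >= np.abs(x).sum() + g @ (z - x) - 1e-12
```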
ℓ₁ norm on R²

In R², the subgradients of ‖x‖₁ at x = 0 are the convex combinations of the
four sign vectors

    s₁ = [1, 1]ᵀ,  s₂ = [1, −1]ᵀ,  s₃ = [−1, 1]ᵀ,  s₄ = [−1, −1]ᵀ
Convex optimization problems

An optimization problem is convex if its objective is a convex function, the
inequality constraints f_i are convex and the equality constraints h_j are affine:

    minimize_x  f₀(x)       (convex function)
    s.t.  f_i(x) ≤ 0        (convex sets)
          h_j(x) = 0        (affine)

or equivalently

    minimize_x  f₀(x)       (convex function)
    s.t.  x ∈ C             (C is a convex set)
          h_j(x) = 0        (affine)
Theorem

If x* is a local minimizer of a convex optimization problem, it is a global
minimizer.

Optimality conditions

A point x* is a minimizer of a convex function f if and only if f is
subdifferentiable at x* and

    0 ∈ ∂f(x*)
Convex optimization problems

Given the convex problem

    minimize_x  f₀(x)
    s.t.  f_i(x) ≤ 0,  i = 1, ..., k
          h_j(x) = 0,  j = 1, ..., l

its Lagrangian function is defined as

    L(x, λ, ν) = f₀(x) + Σ_{j=1}^{l} ν_j h_j(x) + Σ_{i=1}^{k} λ_i f_i(x)

where λ_i ≥ 0 and ν_j ∈ R.
Augmented Lagrangian Method

Considering the problem

    minimize_x  f(x)
    s.t.  x ∈ C
          h(x) = 0                                        (3)

the augmented Lagrangian is defined as

    L(x, λ, μ) = f(x) + λᵀh(x) + (μ/2)‖h(x)‖₂²

where μ is a penalty parameter and λ is the multiplier vector.
Augmented Lagrangian Method

The augmented Lagrangian method consists of solving a sequence of problems
of the form

    minimize_x  L(x, λ_k, μ_k) = f(x) + λ_kᵀ h(x) + (μ_k/2)‖h(x)‖₂²
    s.t.  x ∈ C

where {λ_k} is a bounded sequence in R^l and {μ_k} is a penalty parameter
sequence satisfying

    0 < μ_k < μ_{k+1}  ∀k,  μ_k → ∞
Augmented Lagrangian Method

The exact solution to problem (3) can be found using the following iterative
algorithm:

    set ρ > 1
    while not converged do
        solve x_{k+1} = argmin_{x∈C} L(x, λ_k, μ_k)
        λ_{k+1} = λ_k + μ_k h(x_{k+1})
        μ_{k+1} = ρ μ_k
    end while
    Output x_k
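To make the iteration concrete, here is a minimal runnable sketch on a toy problem of our own choosing (the problem, the parameter values, and the closed-form inner solve are assumptions for illustration, not part of the slides):

```python
import numpy as np

# Toy problem: minimize ||x||^2 subject to h(x) = a^T x - 1 = 0.
# The exact minimizer is x* = a / ||a||^2, with multiplier lambda* = -2/||a||^2.
a = np.array([1.0, 2.0])
aa = a @ a

def argmin_L(lam, mu):
    # The inner problem is an unconstrained quadratic, so we solve
    # grad L = 2x + (lam + mu*(a^T x - 1)) a = 0 in closed form.
    t = (mu - lam) * aa / (2.0 + mu * aa)      # t = a^T x at the minimizer
    return -(lam + mu * (t - 1.0)) * a / 2.0

lam, mu, rho = 0.0, 1.0, 2.0                   # rho > 1 grows the penalty
for k in range(15):
    x = argmin_L(lam, mu)                      # x_{k+1} = argmin_x L(x, lam_k, mu_k)
    lam = lam + mu * (a @ x - 1.0)             # multiplier update
    mu = rho * mu                              # penalty update

print(x, a / aa)                               # both approach [0.2, 0.4]
```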
Matrix completion

Optimization problem

    minimize   rank(A)                                    (4)
    subject to A_ij = D_ij,  (i,j) ∈ Ω

We look for the simplest explanation for the observed data.
Given a large enough number of samples, the likelihood of the solution being
unique should be high.
Matrix completion

    minimize   rank(A)
    subject to A_ij = D_ij,  (i,j) ∈ Ω

The minimization of the rank(·) function is a combinatorial problem,
with exponential complexity in the size of the matrix! We need a convex
relaxation:

    rank(A) = ‖diag(Σ)‖₀,  A = UΣVᵀ
        ↓
    ‖A‖* = ‖diag(Σ)‖₁

Convex relaxation

    minimize   ‖A‖*                                       (5)
    subject to A_ij = D_ij,  (i,j) ∈ Ω
Matrix Completion

Nuclear Norm

The nuclear norm of a matrix A ∈ R^{m×n} is defined as ‖A‖* = Σ_{i=1}^{r} σ_i(A),
where {σ_i(A)}_{i=1}^{r} are the elements of the diagonal matrix Σ from the SVD
decomposition A = UΣVᵀ.

Observations

- r = rank(A) can satisfy r < m, n. If this is the case we say that the matrix is
  low rank.
- The singular values σ_i(A) = sqrt(λ_i(AᵀA)) are obtained as the square roots of
  the eigenvalues of AᵀA and always satisfy σ_i ≥ 0.
- The left singular vectors U are the eigenvectors of AAᵀ.
- The right singular vectors V are the eigenvectors of AᵀA.
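A small numerical check of these definitions (our own sketch; the matrix sizes and seed are arbitrary):

```python
import numpy as np

m, n, r = 6, 5, 2
rng = np.random.default_rng(0)
A = rng.standard_normal((m, r)) @ rng.standard_normal((r, n))  # rank-2 matrix

U, s, Vt = np.linalg.svd(A, full_matrices=False)
print("rank:", np.linalg.matrix_rank(A))          # 2
print("nuclear norm ||A||_*:", s.sum())
print("spectral norm ||A||_2:", s[0])

# Singular values are the square roots of the eigenvalues of A^T A
eig = np.linalg.eigvalsh(A.T @ A)[::-1]           # descending order
print(np.allclose(np.sqrt(np.clip(eig, 0, None)), s, atol=1e-6))   # True
```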
Matrix Completion

Spectral Norm

The spectral norm of a matrix A ∈ R^{m×n} is defined as ‖A‖₂ = σ_max(A), where
σ_max = max({σ_i(A)}_{i=1}^{r}).

Dual Norm

Given an arbitrary norm ‖·‖ in Rⁿ, its dual norm ‖·‖* is defined as

    ‖z‖* = sup{zᵀx | ‖x‖ ≤ 1}

Observations

The nuclear norm is the dual norm of the spectral norm:

    ‖A‖* = sup{tr(AᵀX) | ‖X‖₂ ≤ 1}
Matrix Completion

Convex relaxation of the rank

Convex envelope of a function

Let f : C → R where C ⊆ Rⁿ. The convex envelope of f (on C) is defined as
the largest convex function g such that g(x) ≤ f(x) for all x ∈ C.

Theorem

The convex envelope of the function φ(X) = rank(X) on
C = {X ∈ R^{m×n} | ‖X‖₂ ≤ 1} is φ_env(X) = ‖X‖*.

Observations

- The convex envelope of rank(X) on the set {X | ‖X‖₂ ≤ M} is given by (1/M)‖X‖*.
- By solving the heuristic problem we obtain a lower bound on the optimal value
  of the original problem (provided we can identify a bound M on the feasible set).

M. Fazel, H. Hindi and S. Boyd, A Rank Minimization Heuristic with Application to Minimum Order System Approximation, American Control Conference, 2001.
Matrix completion

Convex relaxation

    minimize   ‖A‖*                                       (6)
    subject to A_ij = D_ij,  (i,j) ∈ Ω

The original problem is now a problem with a non-smooth but convex objective.
The remaining question is: how many measurements, and in which positions,
have to be taken in order to guarantee that the solution is equal to the matrix D?
Matrix completion

Which types of matrices can be completed exactly?

Consider the matrix

    M = e₁eₙᵀ = [ 0 0 ... 0 1 ]
                [ 0 0 ... 0 0 ]
                [ :         : ]
                [ 0 0 ... 0 0 ]
                [ 0 0 ... 0 0 ]

Can it be recovered from 90% of its samples?
Is the sampling set important?
Which sampling sets work and which ones don't?
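A quick simulation of this point (our own illustration): M = e₁eₙᵀ has a single informative entry, so a uniform 90% sample misses it about 10% of the time, and in that event no method can recover M.

```python
import numpy as np

rng = np.random.default_rng(0)
n, trials, missed = 20, 10000, 0
for _ in range(trials):
    mask = rng.random((n, n)) < 0.9        # each entry observed w.p. 0.9
    if not mask[0, n - 1]:                 # the only nonzero entry of M
        missed += 1
print(missed / trials)                     # close to 0.1
```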
Matrix completion

Sampling set

The sampling set is defined as Ω = {(i,j) | D_ij is observed}

Consider

    D = xyᵀ,  x ∈ Rᵐ, y ∈ Rⁿ,  D_ij = x_i y_j

If the sampling set avoids row i, then x_i cannot be recovered by any
method whatsoever.

Observations

- No columns or rows from D can be avoided in the sampling set.
- There is a need for a characterization of the sampling operator with
  respect to the set of matrices that we want to recover.
Matrix completion

Intuition

The singular vectors need to be sufficiently spread, i.e. uncorrelated with
the standard basis, in order to minimize the number of observations needed
to recover a low rank matrix.

Coherence of a subspace

Let U be a subspace of Rⁿ of dimension r and P_U be the orthogonal projection
onto U. Then the coherence of U is defined to be

    μ(U) = (n/r) max_{1≤i≤n} ‖P_U e_i‖²

Observations

- The minimum value that μ(U) can achieve is 1, for example if U is
  spanned by vectors whose entries all have magnitude 1/√n.
- The largest possible value for μ(U) is n/r, corresponding to a subspace
  that contains a standard basis element.
Matrix completion

μ₀-coherence

A matrix D = Σ_{1≤k≤r} σ_k u_k v_kᵀ is μ₀-coherent if for some positive μ₀

    max(μ(U), μ(V)) ≤ μ₀

μ₁-coherence

A matrix D = Σ_{1≤k≤r} σ_k u_k v_kᵀ is μ₁-coherent if

    ‖UVᵀ‖_∞ ≤ μ₁ sqrt(r/(mn))

for some μ₁ > 0.

Observation

If D is μ₀-coherent then it is μ₁-coherent with μ₁ = μ₀√r.
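A sketch computing these coherence quantities for a random low-rank matrix (sizes, seed, and helper names are our own choices; random subspaces are known to have small coherence, up to logarithmic factors):

```python
import numpy as np

m, n, r = 200, 150, 5
rng = np.random.default_rng(0)
D = rng.standard_normal((m, r)) @ rng.standard_normal((r, n))
U, s, Vt = np.linalg.svd(D, full_matrices=False)
U, V = U[:, :r], Vt[:r, :].T

def coherence(B):
    # B has orthonormal columns; ||P_U e_i||^2 is the squared norm of row i of B
    nn, rr = B.shape
    return (nn / rr) * np.max(np.sum(B**2, axis=1))

mu0 = max(coherence(U), coherence(V))
mu1 = np.sqrt(m * n / r) * np.max(np.abs(U @ V.T))
print(mu0, mu1)
```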
Coherence of a rank 300 approximation of kowalski

[Figure: ‖P_U e_i‖ and ‖P_V e_i‖ plotted versus the index i]

    μ(U) = 1.9588,  μ(V) = 2.2290
    μ₀ = 2.2290
    μ₁ = sqrt(mn/r) ‖UVᵀ‖_∞ = 13.412
Matrix completion

Theorem

Let D ∈ R^{m×n} of rank r be (μ₀, μ₁)-coherent and let N = max(m, n). Suppose we
observe M entries of D with locations sampled uniformly at random. Then
there exist constants C and c such that if

    M ≥ C max(μ₁², μ₀^{1/2} μ₁, μ₀ N^{1/4}) N r (β log N)

for some β > 2, then the minimizer of (6) is unique and equal to D with
probability at least 1 − cN^{−β}. If in addition r ≤ μ₀^{−1} N^{1/5} then the number of
observations can be improved to

    M ≥ C μ₀ N^{6/5} r (β log N)

Candès, E.J. and Recht, B., Exact matrix completion via convex optimization, Foundations of Computational Mathematics 2009
Matrix completion

Plugging in the measured coherence values:

    μ₁² = 179.99,  μ₀^{1/2} μ₁ = 12.5139,  μ₀ N^{1/4} = 4.7682
    max(μ₁², μ₀^{1/2} μ₁, μ₀ N^{1/4}) N r (2.1 log N) = 6.6076 × 10⁸

What is the value of C? It must be C > 0.
In the limit case M = mn, C = mn / (6.607 × 10⁸) = 9.194 × 10⁻⁴.
For the bound to be useful, 0 < C < 9.194 × 10⁻⁴.
[Figures: image completion results at increasing sampling rates --
SNR = 23.74 dB with 10% of the samples; 22.52 dB with 25%; 25.89 dB with 35%;
30.55 dB with 50%; 39.51 dB with 70%; 42.75 dB with 75%; 47.10 dB with 80%;
60.93 dB with 90%]
Completion Performance

[Figure: SNR in dB versus percentage of samples]
Matrix completion

Recovery performance for random matrices

[Figure: phase transition plot. The x axis corresponds to rank(A)/min{m, n} and
the y axis to ρ_s = 1 − M/mn, the probability that an entry is omitted from the
observations]

Emmanuel J. Candes, Xiaodong Li, Yi Ma, John Wright, Robust Principal Component Analysis? http://arxiv.org/abs/0912.3599
Matrix completion

Other bounds on the number of measurements and sampling operators:

Emmanuel J. Candes, Xiaodong Li, Yi Ma, John Wright, Robust Principal Component Analysis? http://arxiv.org/abs/0912.3599
Venkat Chandrasekaran, Sujay Sanghavi, Pablo A. Parrilo, Alan S. Willsky, Rank-Sparsity Incoherence for Matrix Decomposition, http://arxiv.org/abs/0906.2220
Zihan Zhou, Xiaodong Li, John Wright, Emmanuel Candes, Yi Ma, Stable Principal Component Pursuit, http://arxiv.org/abs/1001.2363
Raghunandan H. Keshavan, Andrea Montanari, Sewoong Oh, Matrix Completion from a Few Entries, http://arxiv.org/abs/0901.3150
Sahand Negahban, Martin J. Wainwright, Restricted strong convexity and weighted matrix completion: Optimal bounds with noise, http://arxiv.org/abs/1009.2118v2
Yonina C. Eldar, Deanna Needell, Yaniv Plan, Unicity conditions for low-rank matrix recovery, http://arxiv.org/abs/1103.5479
Solving the problem

Rewriting the problem

    minimize   ‖A‖*
    subject to A_ij = D_ij,  (i,j) ∈ Ω

        ⇓

    minimize   ‖A‖*
    subject to A + E = D_Ω,  π_Ω(E) = 0

where

    [π_Ω(E)]_ij = { E_ij if (i,j) ∈ Ω ; 0 if (i,j) ∉ Ω }
    [D_Ω]_ij    = { D_ij if (i,j) ∈ Ω ; 0 if (i,j) ∉ Ω }

The new problem can be solved by the Augmented Lagrangian Method in an
efficient way.

Z. Lin, M. Chen, L. Wu and Y. Ma, The augmented lagrange multiplier method for exact recovery of corrupted low-rank matrices, http://arxiv.org/abs/1009.5055
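A minimal sketch of the sampling projector π_Ω (our own convention: Ω is represented by a boolean mask of observed positions):

```python
import numpy as np

def proj_omega(X, mask):
    """Keep the entries in Omega, zero out the rest."""
    return np.where(mask, X, 0.0)

rng = np.random.default_rng(0)
D = rng.standard_normal((4, 4))
mask = rng.random((4, 4)) < 0.5            # observed positions (Omega)
D_omega = proj_omega(D, mask)              # the data actually available
# The constraint pi_Omega(E) = 0 says E is supported off Omega only:
E = proj_omega(rng.standard_normal((4, 4)), ~mask)
assert np.all(proj_omega(E, mask) == 0)    # pi_Omega(E) = 0
```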
Solving the problem

The augmented Lagrangian for the problem

    minimize   ‖A‖*
    subject to A + E = D_Ω,  π_Ω(E) = 0

is

    L(A, E, Y, μ) = ‖A‖* + ⟨Y, D_Ω − A − E⟩ + (μ/2)‖D_Ω − A − E‖_F²     (7)

The traditional iterative method to minimize the augmented Lagrangian can be
used here, but at each iteration the constraint π_Ω(E) = 0 has to be fulfilled.
Solving the problem

Algorithm

    input: observation samples D_ij, (i,j) ∈ Ω
    Y₀ = 0; E₀ = 0; μ₀ > 0; ρ > 1; k = 0
    while not converged
        A_{k+1} = argmin_A L(A, E_k, Y_k, μ_k)
        E_{k+1} = argmin_{E, π_Ω(E)=0} L(A_{k+1}, E, Y_k, μ_k)
        Y_{k+1} = Y_k + μ_k(D_Ω − A_{k+1} − E_{k+1})
        μ_{k+1} = ρμ_k;  k ← k + 1
    end while
    Output: (A_k, E_k)
Solving the subproblems

Solving for A_{k+1}

    A_{k+1} = argmin_A L(A, E_k, Y_k, μ_k)
    A_{k+1} = argmin_A ‖A‖* + ⟨Y_k, D_Ω − A − E_k⟩ + (μ/2)‖D_Ω − A − E_k‖_F²
    A_{k+1} = argmin_A μ⁻¹‖A‖* + (1/2)‖D_Ω − A − E_k + μ⁻¹Y_k‖_F²

which has the general form

    argmin_A τ‖A‖* + (1/2)‖X − A‖_F²
Solving the subproblems

Singular value shrinkage operator

Given a matrix X = UΣVᵀ, the operator T_τ(·) : R^{m×n} → R^{m×n} is defined as

    T_τ(X) = U S_τ(Σ) Vᵀ,  S_τ(Σ) = sign(Σ)(|Σ| − τ)₊

Theorem

For each τ ≥ 0 and Y ∈ R^{m×n}, the singular value shrinkage operator obeys

    T_τ(Y) = argmin_X τ‖X‖* + (1/2)‖Y − X‖_F²
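A runnable sketch of T_τ, with a numerical spot check of the prox property (our own illustration; the test sizes and perturbation scale are arbitrary):

```python
import numpy as np

def svt(X, tau):
    """Singular value shrinkage: soft-threshold the singular values by tau."""
    U, s, Vt = np.linalg.svd(X, full_matrices=False)
    return U @ np.diag(np.maximum(s - tau, 0.0)) @ Vt

rng = np.random.default_rng(0)
Y, tau = rng.standard_normal((5, 4)), 0.7
Xhat = svt(Y, tau)

def h(X):   # the objective tau*||X||_* + 0.5*||X - Y||_F^2
    return tau * np.linalg.svd(X, compute_uv=False).sum() \
           + 0.5 * np.linalg.norm(X - Y, 'fro')**2

# Xhat should beat random nearby points (it is the global minimizer)
for _ in range(100):
    assert h(Xhat) <= h(Xhat + 0.1 * rng.standard_normal((5, 4))) + 1e-9
```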
Solving the subproblems

Proof:

Consider the function h₀(X) = τ‖X‖* + (1/2)‖X − Y‖_F².

A sufficient condition for optimality of X̂ is that

    0 ∈ X̂ − Y + τ∂‖X̂‖*

where ∂‖X̂‖* is the set of subgradients of the nuclear norm at X̂.

We know that for an arbitrary X = UΣVᵀ ∈ R^{m×n}

    ∂‖X‖* = {UVᵀ + W : W ∈ R^{m×n}, UᵀW = 0, WV = 0, ‖W‖₂ ≤ 1}

If we set X̂ = T_τ(Y) and prove that Y − X̂ ∈ τ∂‖X̂‖*, then the theorem is
concluded.
Decompose Y = U₀Σ₀V₀ᵀ + U₁Σ₁V₁ᵀ, where U₀, V₀ are the singular vectors
associated with singular values > τ and U₁, V₁ are the ones associated with
values ≤ τ. Since X̂ = T_τ(Y) we can write X̂ = U₀(Σ₀ − τI)V₀ᵀ. Then

    Y − X̂ = U₁Σ₁V₁ᵀ + τU₀V₀ᵀ = τ(U₀V₀ᵀ + W),  W = τ⁻¹U₁Σ₁V₁ᵀ

By definition U₀ᵀW = 0, WV₀ = 0, and since the diagonal elements of Σ₁
have magnitudes bounded by τ, we also have ‖W‖₂ ≤ 1. Hence

    Y − X̂ ∈ τ∂‖X̂‖*

which concludes the proof. ∎
Solving the subproblems

Solving for E_{k+1}

    E_{k+1} = argmin_{E, π_Ω(E)=0} L(A_{k+1}, E, Y_k, μ_k)
    E_{k+1} = argmin_{E, π_Ω(E)=0} ⟨Y, D_Ω − A − E⟩ + (μ/2)‖D_Ω − A − E‖_F²
    E_{k+1} = argmin_{E, π_Ω(E)=0} (1/2)‖D_Ω − A − E + μ⁻¹Y‖_F²
    E_{k+1} = π_Ω̄(D_Ω − A_{k+1} + μ_k⁻¹ Y_k)

Here Ω̄ is the complementary set of Ω, Ω̄ = {(i,j) | (i,j) ∉ Ω}.
Solving the problem

The algorithm is reduced to

    Input: observation samples D_ij, (i,j) ∈ Ω
    Y₀ = 0; E₀ = 0; μ₀ > 0; ρ > 1; k = 0
    while not converged
        A_{k+1} = T_{μ_k⁻¹}(D_Ω − E_k + μ_k⁻¹ Y_k)
        E_{k+1} = π_Ω̄(D_Ω − A_{k+1} + μ_k⁻¹ Y_k)
        Y_{k+1} = Y_k + μ_k(D_Ω − A_{k+1} − E_{k+1})
        μ_{k+1} = ρμ_k;  k ← k + 1
    end while
    Output: (A_k, E_k)
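A runnable sketch of this loop (following the inexact-ALM scheme of Lin, Chen, Wu and Ma; the parameter values, iteration count, and demo sizes are our own assumptions):

```python
import numpy as np

def svt(X, tau):
    U, s, Vt = np.linalg.svd(X, full_matrices=False)
    return U @ np.diag(np.maximum(s - tau, 0.0)) @ Vt

def mc_alm(D_omega, mask, mu=1e-3, rho=1.5, iters=100):
    Y = np.zeros_like(D_omega)
    E = np.zeros_like(D_omega)
    for _ in range(iters):
        A = svt(D_omega - E + Y / mu, 1.0 / mu)        # A-step
        E = np.where(mask, 0.0, D_omega - A + Y / mu)  # E-step: support off Omega
        Y = Y + mu * (D_omega - A - E)                 # dual update
        mu = rho * mu
    return A

# Demo: recover a rank-2 matrix from half of its entries.
rng = np.random.default_rng(0)
m, n, r = 60, 50, 2
D = rng.standard_normal((m, r)) @ rng.standard_normal((r, n))
mask = rng.random((m, n)) < 0.5
A = mc_alm(np.where(mask, D, 0.0), mask)
print(np.linalg.norm(A - D) / np.linalg.norm(D))       # small relative error
```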
Robust PCA

Optimization problem

    minimize   rank(A) + λ‖E‖₀                            (8)
    subject to A_ij + E_ij = D_ij,  (i,j) ∈ Ω

We look for the best rank-k approximation of the matrix D, which is
corrupted by sparse noise.
Similar problems and conditions arise as in the case of matrix completion.
Robust PCA

The original problem is very hard to solve, so we look again for a convex
relaxation of the problem:

    rank(A) = ‖diag(Σ)‖₀,  A = UΣVᵀ        ‖E‖₀
        ↓                                    ↓
    ‖A‖* = ‖diag(Σ)‖₁                      ‖E‖₁

Convex relaxation

    minimize   ‖A‖* + λ‖E‖₁                               (9)
    subject to A_ij + E_ij = D_ij,  (i,j) ∈ Ω
Robust PCA

Conditions for exact recovery of the convex relaxation

In order to have exact recovery we need to impose that the low rank part
is not sparse and also that the sparse part is not low rank.

Incoherence condition of the low rank part

The incoherence condition of a matrix A = UΣVᵀ ∈ R^{m×n} with parameter μ
states that

    max_i ‖Uᵀe_i‖² ≤ μr/m,   max_i ‖Vᵀe_i‖² ≤ μr/n
    ‖UVᵀ‖_∞ ≤ sqrt(μr/(mn))
Robust PCA

Theorem

Suppose A₀ obeys the incoherence condition with parameter μ, the sampling set Ω
is uniformly distributed among all sets of cardinality M obeying M = 0.1mn, and
each observed entry is corrupted with probability τ independently of the others.
Then for N = max(m, n) there exists a constant c such that with probability at
least 1 − cN⁻¹⁰, problem (9) with λ = 1/sqrt(0.1N) recovers the exact solution
(A₀, E₀), provided that

    rank(A₀) ≤ ρ_r N μ⁻¹ (log N)⁻²  and  τ ≤ τ_s

where ρ_r and τ_s are positive numerical constants.

E.J. Candes, X. Li, Y. Ma, and J. Wright, Robust principal component analysis? http://arxiv.org/abs/0912.3599
Solving the problem

Rewriting the problem

    minimize   ‖A‖* + λ‖E‖₁
    subject to A_ij + E_ij = D_ij,  (i,j) ∈ Ω

        ⇓

    minimize   ‖A‖* + λ‖E‖₁
    subject to A + E + Z = D_Ω,  π_Ω(Z) = 0

where

    [π_Ω(Z)]_ij = { Z_ij if (i,j) ∈ Ω ; 0 if (i,j) ∉ Ω }
    [D_Ω]_ij    = { D_ij if (i,j) ∈ Ω ; 0 if (i,j) ∉ Ω }

Z. Lin, M. Chen, L. Wu and Y. Ma, The augmented lagrange multiplier method for exact recovery of corrupted low-rank matrices, http://arxiv.org/abs/1009.5055
Solving the problem

The augmented Lagrangian for the problem

    minimize   ‖A‖* + λ‖E‖₁
    subject to A + E + Z = D_Ω,  π_Ω(Z) = 0

is

    L(A, E, Z, Y, μ) = ‖A‖* + λ‖E‖₁ + ⟨Y, D_Ω − A − E − Z⟩
                       + (μ/2)‖D_Ω − A − E − Z‖_F²        (10)

with the additional constraint π_Ω(Z) = 0.
Solving the problem

Algorithm

    input: observation samples D_ij, (i,j) ∈ Ω
    Y₀ = 0; E₀ = 0; A₀ = D_Ω; Z₀ = 0; μ₀ > 0; ρ > 1; k = 0
    while not converged
        A_{k+1} = argmin_A L(A, E_k, Y_k, Z_k, μ_k)
        E_{k+1} = argmin_E L(A_{k+1}, E, Y_k, Z_k, μ_k)
        Z_{k+1} = argmin_{Z, π_Ω(Z)=0} L(A_{k+1}, E_{k+1}, Y_k, Z, μ_k)
        Y_{k+1} = Y_k + μ_k(D_Ω − A_{k+1} − E_{k+1} − Z_{k+1})
        μ_{k+1} = ρμ_k;  k ← k + 1
    end while
    Output: (A_k, E_k, Z_k)
Solving the subproblems

Solving for A_{k+1}

    A_{k+1} = argmin_A L(A, E_k, Y_k, Z_k, μ_k)
    A_{k+1} = argmin_A ‖A‖* + ⟨Y_k, D_Ω − A − E_k − Z_k⟩ + (μ_k/2)‖D_Ω − A − E_k − Z_k‖_F²
    A_{k+1} = argmin_A μ_k⁻¹‖A‖* + (1/2)‖D_Ω − A − E_k − Z_k + μ⁻¹Y_k‖_F²

which has the closed form solution

    A_{k+1} = T_{μ_k⁻¹}(D_Ω − E_k − Z_k + μ_k⁻¹ Y_k)
Solving the subproblems

Solving for E_{k+1}

    E_{k+1} = argmin_E L(A_{k+1}, E, Y_k, Z_k, μ_k)
    E_{k+1} = argmin_E λ‖E‖₁ + ⟨Y, D_Ω − A_{k+1} − E − Z_k⟩ + (μ_k/2)‖D_Ω − A_{k+1} − E − Z_k‖_F²
    E_{k+1} = argmin_E λμ_k⁻¹‖E‖₁ + (1/2)‖D_Ω − A_{k+1} − E − Z_k + μ_k⁻¹Y‖_F²

which has the form

    argmin_E ε‖E‖₁ + (1/2)‖X − E‖_F²
Solving the subproblems

Shrinkage operator

Given a matrix Y ∈ R^{m×n}, the operator S_ε(·) : R^{m×n} → R^{m×n} is defined as

    S_ε(Y) = sign(Y)(|Y| − ε)₊

where sign(Y)(|Y| − ε)₊ is applied componentwise to Y.

Theorem

For each ε ≥ 0 and Y ∈ R^{m×n}, the shrinkage operator obeys

    S_ε(Y) = argmin_X ε‖X‖₁ + (1/2)‖Y − X‖_F²
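The operator is one line of NumPy (a sketch of our own; the example matrix is arbitrary):

```python
import numpy as np

def shrink(Y, eps):
    """Entrywise soft-thresholding: sign(Y) * max(|Y| - eps, 0)."""
    return np.sign(Y) * np.maximum(np.abs(Y) - eps, 0.0)

Y = np.array([[-3.0, 0.4],
              [ 1.2, 0.1]])
print(shrink(Y, 0.5))
# [[-2.5  0. ]
#  [ 0.7  0. ]]
```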
Proof:

Consider the function h(X) = ε‖X‖₁ + (1/2)‖X − Y‖_F².

A sufficient condition for optimality of X̂ is that

    0 ∈ X̂ − Y + ε∂‖X̂‖₁

where ∂‖X̂‖₁ is the set of subgradients of the ℓ₁ norm at X̂. All the
subgradients of ‖X̂‖₁ at X̂ are given by

    ∂‖X̂‖₁ = { G ∈ R^{m×n} | G_ij = −1 for X̂_ij < 0
                             G_ij = 1 for X̂_ij > 0
                             G_ij ∈ [−1, 1] for X̂_ij = 0 }

If we prove that Y − X̂ ∈ ε∂‖X̂‖₁ then X̂ is the unique minimizer of the
problem.
Consider the candidate X̂ = S_ε(Y). Then

    [Y − S_ε(Y)]_ij = Y_ij − sign(Y_ij) max(|Y_ij| − ε, 0)

    [Y − S_ε(Y)]_ij = { ε sign(Y_ij) if |Y_ij| > ε
                        Y_ij         if |Y_ij| ≤ ε

    ε∂‖S_ε(Y)‖₁ = { G ∈ R^{m×n} | G_ij = −ε for Y_ij < −ε
                                  G_ij = ε for Y_ij > ε
                                  G_ij ∈ [−ε, ε] for |Y_ij| ≤ ε }

    ε∂‖S_ε(Y)‖₁ = { G ∈ R^{m×n} | G_ij = ε sign(Y_ij) for |Y_ij| > ε
                                  G_ij ∈ [−ε, ε] for |Y_ij| ≤ ε }

    Y − S_ε(Y) ∈ ε∂‖S_ε(Y)‖₁  ⇒  S_ε(Y) is the optimal solution. ∎
Solving the subproblems

Solving for Z_{k+1}

    Z_{k+1} = argmin_{Z, π_Ω(Z)=0} L(A_{k+1}, E_{k+1}, Y_k, Z, μ_k)
    Z_{k+1} = argmin_{Z, π_Ω(Z)=0} ⟨Y_k, D_Ω − A_{k+1} − E_{k+1} − Z⟩ + (μ/2)‖D_Ω − A_{k+1} − E_{k+1} − Z‖_F²
    Z_{k+1} = argmin_{Z, π_Ω(Z)=0} (1/2)‖D_Ω − A_{k+1} − E_{k+1} − Z + μ_k⁻¹Y_k‖_F²
    Z_{k+1} = π_Ω̄(D_Ω − A_{k+1} − E_{k+1} + μ_k⁻¹ Y_k)

Here Ω̄ is the complementary set of Ω, Ω̄ = {(i,j) | (i,j) ∉ Ω}.
Solving the problem

Algorithm

    input: observation samples D_ij, (i,j) ∈ Ω
    Y₀ = 0; E₀ = 0; Z₀ = 0; μ₀ > 0; ρ > 1; k = 0
    while not converged
        A_{k+1} = T_{μ_k⁻¹}(D_Ω − E_k − Z_k + μ_k⁻¹ Y_k)
        E_{k+1} = S_{λμ_k⁻¹}(D_Ω − A_{k+1} − Z_k + μ_k⁻¹ Y_k)
        Z_{k+1} = π_Ω̄(D_Ω − A_{k+1} − E_{k+1} + μ_k⁻¹ Y_k)
        Y_{k+1} = Y_k + μ_k(D_Ω − A_{k+1} − E_{k+1} − Z_{k+1})
        μ_{k+1} = ρμ_k;  k ← k + 1
    end while
    Output: (A_k, E_k, Z_k)
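A runnable sketch of this loop (our own implementation under the stated updates; the λ default follows the RPCA theorem's 1/√N scaling, and the demo sizes, corruption level, and parameter values are assumptions):

```python
import numpy as np

def svt(X, tau):
    U, s, Vt = np.linalg.svd(X, full_matrices=False)
    return U @ np.diag(np.maximum(s - tau, 0.0)) @ Vt

def shrink(Y, eps):
    return np.sign(Y) * np.maximum(np.abs(Y) - eps, 0.0)

def rpca_alm(D_omega, mask, lam=None, mu=1e-2, rho=1.5, iters=100):
    m, n = D_omega.shape
    if lam is None:
        lam = 1.0 / np.sqrt(max(m, n))          # common lambda scaling
    Y = np.zeros((m, n)); E = np.zeros((m, n)); Z = np.zeros((m, n))
    for _ in range(iters):
        A = svt(D_omega - E - Z + Y / mu, 1.0 / mu)          # low-rank step
        E = shrink(D_omega - A - Z + Y / mu, lam / mu)       # sparse step
        Z = np.where(mask, 0.0, D_omega - A - E + Y / mu)    # unobserved entries
        Y = Y + mu * (D_omega - A - E - Z)                   # dual update
        mu = rho * mu
    return A, E

# Demo: low-rank plus sparse separation, fully observed (mask all True).
rng = np.random.default_rng(0)
m, n, r = 60, 50, 2
L = rng.standard_normal((m, r)) @ rng.standard_normal((r, n))
S = 5.0 * (rng.random((m, n)) < 0.05) * rng.standard_normal((m, n))
A, E = rpca_alm(L + S, np.ones((m, n), dtype=bool))
print(np.linalg.norm(A - L) / np.linalg.norm(L))             # small relative error
```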