PCA & Matrix Factorizations for Learning
ICML 2005 Tutorial

Chris Ding
Lawrence Berkeley National Laboratory
Supported by Office of Science, U.S. Dept. of Energy
K-means clustering
Spectral Clustering
Semi-supervised classification
Part 1.A. Principal Component Analysis (PCA) and Singular Value Decomposition (SVD)
Widely used in a large number of different fields
Most widely known as PCA (multivariate statistics)
SVD is the theoretical basis for PCA
Brief history
PCA
Draw a plane closest to data points (Pearson, 1901)
Retain most variance (Hotelling, 1933)
SVD
Low-rank approximation (Eckart-Young, 1936)
Practical application / efficient computation (Golub-Kahan, 1965)
Many generalizations
Data matrix:
$$X = (x_1, x_2, \ldots, x_n)$$

Covariance matrix:
$$C = XX^T = \sum_{k=1}^p \lambda_k u_k u_k^T$$

Gram matrix:
$$X^T X = \sum_{k=1}^p \lambda_k v_k v_k^T$$

SVD:
$$X = \sum_{k=1}^p \sigma_k u_k v_k^T = U \Sigma V^T$$
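As a quick check of these identities, here is a minimal numpy sketch (my own illustration, not from the tutorial): the eigenvectors of C = XX^T are the left singular vectors of X, and the eigenvalues satisfy λ_k = σ_k².

```python
# Verify that the eigendecomposition of C = X X^T matches the SVD of X.
import numpy as np

rng = np.random.default_rng(0)
X = rng.standard_normal((5, 100))          # p=5 variables, n=100 samples
X -= X.mean(axis=1, keepdims=True)         # center each variable

U, s, Vt = np.linalg.svd(X, full_matrices=False)
evals, evecs = np.linalg.eigh(X @ X.T)     # eigendecomposition of C

# eigh returns ascending eigenvalues; reverse to match descending sigma^2
assert np.allclose(evals[::-1], s**2)
```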
Further Developments

SVD/PCA:
Principal Curves
Independent Component Analysis
Sparse SVD/PCA (many approaches)
Mixture of Probabilistic PCA
Generalization to the exponential family, max-margin
Connection to K-means clustering

Kernel (inner-product):
Kernel PCA
Data matrix:
$$X = (x_1, x_2, \ldots, x_n)$$

Each principal direction is a combination of the original variables:
$$u_k = u_k(1) X_1 + \cdots + u_k(d) X_d$$

Dimension reduction: projection to a low-dimensional subspace,
$$X = \sum_k \sigma_k u_k v_k^T = U \Sigma V^T, \qquad \tilde X = U^T X, \qquad U = (u_1, \ldots, u_k)$$

Sphering the data: transform the data to N(0,1),
$$\tilde X = C^{-1/2} X = U \Lambda^{-1/2} U^T X$$
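Both operations are a few lines of numpy; the following sketch (data and variable names are my own) projects onto the top-k directions and then spheres the data.

```python
# Dimension reduction via the top-k eigenvectors, and whitening via C^{-1/2}.
import numpy as np

rng = np.random.default_rng(1)
X = rng.standard_normal((5, 200))
X -= X.mean(axis=1, keepdims=True)

C = X @ X.T
lam, U = np.linalg.eigh(C)                  # C = U diag(lam) U^T

k = 2
Uk = U[:, -k:]                              # top-k eigenvectors
X_proj = Uk.T @ X                           # dimension reduction: k x n

X_white = U @ np.diag(lam**-0.5) @ U.T @ X  # sphering: C^{-1/2} X
assert np.allclose(X_white @ X_white.T, np.eye(5), atol=1e-8)
```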
Applications of PCA/SVD
Most popular in multivariate statistics
Image processing, signal processing
Physics: principal axis, diagonalization of 2nd-order tensors (mass)
Climate: Empirical Orthogonal Functions (EOF)
Kalman filter, reduced-order analysis:
$$s^{(t+1)} = A s^{(t)} + E, \qquad P^{(t+1)} = A P^{(t)} A^T$$
Applications of PCA/SVD
PCA/SVD is as widely used as the Fast Fourier Transform:
Both are spectral expansions
FFT is used more in partial differential equations
PCA/SVD is used more in discrete (data) analysis
PCA/SVD will surpass FFT as the computational sciences advance further

PCA/SVD:
Select combinations of variables
Dimension reduction: an image has 10^4 pixels, but its true dimension is 20!
Covariance matrix:
$$C = XX^T = \sum_{k=1}^p \lambda_k u_k u_k^T = U \Lambda U^T$$

Kernel (Gram) matrix:
$$X^T X = \sum_{k=1}^p \lambda_k v_k v_k^T = V \Lambda V^T$$

SVD:
$$X = \sum_{k=1}^p \sigma_k u_k v_k^T = U \Sigma V^T$$
Kernel matrix of inner products:
$$W_{ij} = \langle \phi(x_i), \phi(x_j) \rangle$$

Eigenvector:
$$Wv = \lambda v, \qquad W = \sum_k v_k \lambda_k v_k^T$$

Generalized eigenvector:
$$Wq = \lambda Dq, \qquad d_i = \sum_j w_{ij}, \qquad D = \mathrm{diag}(d_1, \ldots, d_n)$$

Scaled PCA:
$$\tilde W = D^{-1/2} W D^{-1/2}, \qquad W = D \left( \sum_k q_k \lambda_k q_k^T \right) D$$
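A small numeric sketch of this identity (the construction of W is my own): the generalized eigenvectors of (W, D) come from the ordinary eigenvectors of W̃ via q = D^{-1/2} z.

```python
# Scaled PCA: solve W q = lambda D q through the symmetric matrix
# W~ = D^{-1/2} W D^{-1/2}.
import numpy as np

rng = np.random.default_rng(2)
S = rng.random((6, 6))
W = S + S.T                                # symmetric affinity matrix
d = W.sum(axis=1)
D_inv_sqrt = np.diag(d**-0.5)

W_tilde = D_inv_sqrt @ W @ D_inv_sqrt
lam, Z = np.linalg.eigh(W_tilde)
Q = D_inv_sqrt @ Z                         # generalized eigenvectors q_k

# check W q = lambda D q for each pair (D q = d * q elementwise)
for k in range(6):
    assert np.allclose(W @ Q[:, k], lam[k] * d * Q[:, k])
```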
Scaled PCA on a rectangular matrix: Correspondence Analysis

Re-scaling:
$$\tilde P = D_r^{-1/2} P D_c^{-1/2}, \qquad \tilde p_{ij} = p_{ij} / (p_{i\cdot}\, p_{\cdot j})^{1/2}$$

Apply SVD on $\tilde P$:
$$P - rc^T / p_{\cdot\cdot} = D_r \left( \sum_k f_k \lambda_k g_k^T \right) D_c$$
$$r = (p_{1\cdot}, \ldots, p_{n\cdot})^T, \qquad c = (p_{\cdot 1}, \ldots, p_{\cdot n})^T$$
$$f_k = D_r^{-1/2} u_k, \qquad g_k = D_c^{-1/2} v_k$$
are the scaled row and column principal components (standard coordinates in CA).

(Zha et al., CIKM 2001; Ding et al., PKDD 2002)
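A compact sketch of the CA recipe above (normalization details follow my reading of the formulas; the data is synthetic):

```python
# Correspondence analysis: rescale the contingency table, SVD it,
# then rescale singular vectors into standard coordinates.
import numpy as np

rng = np.random.default_rng(3)
N = rng.integers(1, 20, size=(4, 5)).astype(float)   # contingency table
P = N / N.sum()                                      # joint frequencies
r, c = P.sum(axis=1), P.sum(axis=0)                  # row/column marginals

P_tilde = P / np.sqrt(np.outer(r, c))
U, s, Vt = np.linalg.svd(P_tilde, full_matrices=False)

# the leading singular triple (sigma = 1) carries the marginals; drop it
F = U[:, 1:] / np.sqrt(r)[:, None]     # row standard coordinates f_k
G = Vt[1:].T / np.sqrt(c)[:, None]     # column standard coordinates g_k
```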
Nonnegative Matrix Factorization (NMF)

Data matrix: $X = (x_1, x_2, \ldots, x_n)$, $x_i \ge 0$

Decomposition (low-rank approximation) into nonnegative matrices:
$$X \approx FG^T, \qquad X_{ij} \ge 0, \quad F_{ij} \ge 0, \quad G_{ij} \ge 0$$
$$F = (f_1, f_2, \ldots, f_k), \qquad G = (g_1, g_2, \ldots, g_k)$$
Alternating optimization: fix F, solve for G; fix G, solve for F. Lee & Seung (2000) propose the multiplicative update
$$G_{jk} \leftarrow G_{jk}\, \frac{(X^T F)_{jk}}{(G F^T F)_{jk}}$$
(with the analogous update for F).
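A short runnable sketch of these multiplicative updates for min ||X − FG^T||²_F (the epsilon guard against division by zero is my addition):

```python
# Lee-Seung multiplicative updates for NMF.
import numpy as np

rng = np.random.default_rng(4)
X = rng.random((20, 30))                   # nonnegative data
k, eps = 3, 1e-9
F = rng.random((20, k))
G = rng.random((30, k))

for _ in range(200):
    G *= (X.T @ F) / (G @ F.T @ F + eps)   # update G with F fixed
    F *= (X @ G) / (F @ G.T @ G + eps)     # update F with G fixed

print(np.linalg.norm(X - F @ G.T))         # reconstruction error decreases
```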
Rectangular matrix (contingency table, bipartite graph)

PCA:
$$W = V \Lambda V^T, \qquad X = U \Sigma V^T$$

Scaled PCA:
$$\tilde W = D^{-1/2} W D^{-1/2}, \qquad W = D \left( Q \Lambda Q^T \right) D$$

NMF:
$$\tilde X = D_r^{-1/2} X D_c^{-1/2}, \qquad X = D_r \left( F G^T \right) D_c$$

In each case the structure is $W \approx QQ^T$, $X \approx FG^T$.
K-means: $W = X^T X$;
$$\max_H \ \mathrm{Tr}\left( H^T W H + H^T A H - H^T B H \right)$$
Tutorial Outline

PCA
  Recent developments on PCA/SVD
  Equivalence to K-means clustering
Scaled PCA
  Laplacian matrix
  Spectral clustering
  Spectral ordering
Nonnegative Matrix Factorization
  Equivalence to K-means clustering
  Holistic vs. parts-based
Indicator Matrix Quadratic Clustering
  Uses Nonnegative Lagrangian Relaxation
  Includes K-means and spectral clustering
  Semi-supervised classification
  Semi-supervised clustering
  Outlier detection
Part 1.B. Recent Developments on PCA/SVD
$$X = (x_1, x_2, \ldots, x_n)$$
$$C = XX^T = \sum_{k=1}^p \lambda_k u_k u_k^T, \qquad X^T X = \sum_{k=1}^p \lambda_k v_k v_k^T$$
$$X = \sum_{k=1}^p \sigma_k u_k v_k^T$$
Kernel PCA

Map to feature space: $x_i \to \phi(x_i)$

Kernel:
$$K_{ij} = \langle \phi(x_i), \phi(x_j) \rangle$$

PCA component $v$; feature extraction:
$$\langle v, \phi(x) \rangle = \sum_i v_i \langle \phi(x_i), \phi(x) \rangle$$
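The following sketch (an RBF kernel and standard double-centering, both my choices since the slides leave the kernel unspecified) computes kernel principal components from K alone:

```python
# Kernel PCA: eigendecompose the centered kernel matrix, then extract
# features as <v, phi(x)> = sum_i alpha_i K(x_i, x).
import numpy as np

rng = np.random.default_rng(5)
X = rng.standard_normal((50, 2))           # 50 samples, 2 features

def rbf(A, B, gamma=0.5):
    d2 = ((A[:, None, :] - B[None, :, :])**2).sum(-1)
    return np.exp(-gamma * d2)

K = rbf(X, X)
n = len(X)
J = np.eye(n) - np.ones((n, n)) / n
Kc = J @ K @ J                             # center in feature space

lam, alpha = np.linalg.eigh(Kc)
# reorder descending and normalize so the feature-space v has unit norm
alpha = alpha[:, ::-1] / np.sqrt(np.maximum(lam[::-1], 1e-12))

Z = Kc @ alpha[:, :2]                      # top-2 kernel principal components
```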
Mixture of PCA
Data often has local structure; a single global PCA on all of the data is not useful.
Probabilistic PCA:
$$x_i = W s_i + \mu + \varepsilon, \qquad \varepsilon \sim N(0, \sigma^2 I)$$

Gaussian prior:
$$P(s) \sim N(s_0, \sigma_s^2 I), \qquad x \sim N(W s_0 + \mu, \ \sigma^2 I + \sigma_s^2 W W^T)$$

Linear Gaussian Model:
$$s_{i+1} = A s_i + \eta, \qquad x_i = W s_i + \varepsilon$$
Sparse PCA

Compute a sparse factorization $X \approx UV^T$ via L1 and L2 constraints.

Why sparse?
Variable selection (sparse U)
When n >> d
Storage savings
Other new reasons?
Sparsified SVD

$$X \approx U \Sigma V^T, \qquad U = (u_1 \ldots u_k), \quad V = (v_1 \ldots v_k)$$

Compute {u_k, v_k} one pair at a time, truncating entries below a threshold; recursively compute all pairs using deflation:
$$X \leftarrow X - \sigma\, u v^T$$
(Zhang, Zha & Simon, 2002)

Related: semi-discrete decomposition (Kolda & O'Leary, 1999).
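A rough sketch of the truncate-and-deflate loop (the thresholding rule is my assumption; the cited papers differ in details):

```python
# Sparsified SVD: leading singular pair, threshold, deflate, repeat.
import numpy as np

def sparsified_svd(X, k, thresh=0.1):
    X = X.copy()
    us, vs, sigmas = [], [], []
    for _ in range(k):
        U, s, Vt = np.linalg.svd(X, full_matrices=False)
        u, v, sig = U[:, 0].copy(), Vt[0].copy(), s[0]
        u[np.abs(u) < thresh] = 0.0        # truncate small entries
        v[np.abs(v) < thresh] = 0.0
        us.append(u); vs.append(v); sigmas.append(sig)
        X -= sig * np.outer(u, v)          # deflation
    return np.array(us).T, np.array(sigmas), np.array(vs).T

rng = np.random.default_rng(6)
U, s, V = sparsified_svd(rng.standard_normal((8, 6)), k=2)
```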
Sparse regression with an L1 constraint:
$$\min_\beta \ \| y - X^T \beta \|^2, \qquad \|\beta\|_1 \le t$$

SCoTLASS (Jolliffe & Uddin, 2003):
$$\max_u \ u^T (XX^T)\, u, \qquad \|u\|_1 \le t, \quad u^T u_h = 0$$

Least Angle Regression (Efron et al., 2004)

Sparse PCA (Zou, Hastie & Tibshirani, 2004):
$$\min_{\alpha, \beta} \ \sum_{i=1}^n \| x_i - \alpha \beta^T x_i \|^2 + \lambda \sum_{j=1}^k \|\beta_j\|^2 + \sum_{j=1}^k \lambda_{1,j} \|\beta_j\|_1, \qquad \alpha^T \alpha = I$$
$$v_j = \beta_j / \|\beta_j\|$$
Many open questions:
Orthogonality
Uniqueness of the solution; global solution
Max-margin formulations replace the squared loss with the hinge loss $\max(0,\, 1 - Y_{ia} X_{ia})$.
Column-partitioned data matrix, $n_1 + \cdots + n_k = n$, where the partitions are generated by clustering.

Centroid matrix $U = (u_1 \ldots u_k)$: each column $u_k$ is a cluster centroid.

Fix U, compute V:
$$\min \| X - UV^T \|_F^2 \ \Rightarrow \ V = X^T U (U^T U)^{-1}$$
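A small check of the least-squares step (illustration mine): with U fixed, V = X^T U (U^T U)^{-1} makes the residual orthogonal to the span of U.

```python
# Least-squares solve for V with the centroid matrix U held fixed.
import numpy as np

rng = np.random.default_rng(7)
X = rng.standard_normal((10, 40))
U = rng.standard_normal((10, 3))           # stand-in centroid matrix

V = X.T @ U @ np.linalg.inv(U.T @ U)

# the residual X - U V^T is orthogonal to the span of U
assert np.allclose(U.T @ (X - U @ V.T), 0, atol=1e-10)
```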
Two-dimensional SVD

A large number of data objects are 2-D: images, maps.

Standard method:
convert (re-order) each image into a 1-D vector
collect all 1-D vectors into a single (big) matrix
apply SVD to the big matrix
[Figure: an image flattened into a single pixel vector]
SVD: eigenvectors of $XX^T$ and $X^T X$,
$$\Sigma = U^T X V, \qquad X = U \Sigma V^T$$

2D-SVD: for a set of 2-D maps $\{A\} = \{A_1, A_2, \ldots, A_n\}$, eigenvectors of
$$F = \sum_i (A_i - \bar A)(A_i - \bar A)^T, \qquad G = \sum_i (A_i - \bar A)^T (A_i - \bar A)$$
with
$$M_i = U^T A_i V, \qquad A_i = U M_i V^T$$
2D-SVD

$\{A\} = \{A_1, A_2, \ldots, A_n\}$, assume $\bar A = 0$

row-row covariance:
$$F = \sum_i A_i A_i^T = \sum_k \lambda_k u_k u_k^T$$
col-col covariance:
$$G = \sum_i A_i^T A_i = \sum_k \zeta_k v_k v_k^T$$

Bilinear subspace:
$$M_i = U^T A_i V, \qquad A_i = U M_i V^T, \quad i = 1, \ldots, n$$
$$A_i \in \mathbb{R}^{r \times c}, \quad U \in \mathbb{R}^{r \times k}, \quad V \in \mathbb{R}^{c \times k}, \quad M_i \in \mathbb{R}^{k \times k}$$
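A runnable 2D-SVD sketch along the lines above (array shapes and einsum spellings are mine):

```python
# 2D-SVD: U from the row-row covariance F, V from the column-column
# covariance G, and per-map core matrices M_i = U^T A_i V.
import numpy as np

rng = np.random.default_rng(8)
A = rng.standard_normal((10, 12, 9))       # n=10 maps of size r=12, c=9
A -= A.mean(axis=0)                        # enforce the A-bar = 0 assumption

F = np.einsum('irc,isc->rs', A, A)         # sum_i A_i A_i^T   (12 x 12)
G = np.einsum('irc,irs->cs', A, A)         # sum_i A_i^T A_i   (9 x 9)

k = 3
U = np.linalg.eigh(F)[1][:, -k:]           # top-k eigenvectors of F
V = np.linalg.eigh(G)[1][:, -k:]           # top-k eigenvectors of G

M = np.einsum('rk,irc,cl->ikl', U, A, V)   # M_i = U^T A_i V
A_hat = np.einsum('rk,ikl,cl->irc', U, M, V)   # A_i ~ U M_i V^T
print(np.linalg.norm(A - A_hat) / np.linalg.norm(A))
```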
The four 2D-SVD representations and their reconstruction errors, with $A_i \in \mathbb{R}^{r \times c}$, $L \in \mathbb{R}^{r \times k}$, $R \in \mathbb{R}^{c \times k}$, $M_i \in \mathbb{R}^{k \times k}$:

$$\min J_1 = \sum_{i=1}^n \| A_i - L M_i \|^2 = \sum_{j=k+1}^r \lambda_j$$
$$\min J_2 = \sum_{i=1}^n \| A_i - M_i R^T \|^2 = \sum_{j=k+1}^c \zeta_j$$
$$\min J_3 = \sum_{i=1}^n \| A_i - L M_i R^T \|^2 \simeq \sum_{j=k+1}^r \lambda_j + \sum_{j=k+1}^c \zeta_j$$
$$\min J_4 = \sum_{i=1}^n \| A_i - L M_i L^T \|^2$$
[Figure: reconstructed images, SVD vs. 2DSVD. SVD (K=15): storage 160560. 2DSVD (K=15): storage 93060.]
2D-SVD Summary
2DSVD is an extension of the standard SVD
Provides optimal solutions for 4 representations of 2D images/maps
Substantial improvements in storage, computation, and quality of reconstruction
Captures 2D characteristics
K-means clustering
Also called isodata or vector quantization
Developed in the 1960s (Lloyd, MacQueen, Hartigan, etc.)
Computationally efficient (order mN)
Widely used in practice
Benchmark to evaluate other algorithms
$$X = (x_1, x_2, \ldots, x_n)$$

K-means objective:
$$\min J_K = \sum_{k=1}^K \sum_{i \in C_k} \| x_i - c_k \|^2$$
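For concreteness, a bare-bones Lloyd's algorithm minimizing J_K (the initialization scheme and iteration count are my choices):

```python
# Lloyd's algorithm: alternate nearest-centroid assignment and centroid update.
import numpy as np

def kmeans(X, K, iters=50, seed=0):
    rng = np.random.default_rng(seed)
    centers = X[rng.choice(len(X), K, replace=False)]
    for _ in range(iters):
        # assign each point to its nearest centroid
        d = ((X[:, None, :] - centers[None, :, :]) ** 2).sum(-1)
        labels = d.argmin(axis=1)
        # recompute centroids, keeping the old one if a cluster empties
        centers = np.array([X[labels == k].mean(axis=0) if (labels == k).any()
                            else centers[k] for k in range(K)])
    return labels, centers

rng = np.random.default_rng(9)
X = np.vstack([rng.normal(m, 0.3, (40, 2)) for m in (0, 3, 6)])
labels, centers = kmeans(X, K=3)
```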
For K = 2, encode the clustering in an indicator vector:
$$q(i) = \begin{cases} +\sqrt{n_2 / (n_1 n)} & \text{if } i \in C_1 \\ -\sqrt{n_1 / (n_2 n)} & \text{if } i \in C_2 \end{cases}$$
Then minimizing $J_K$ is equivalent to maximizing the distance objective $J_D$:
$$J_K = n\,\overline{x^2} - \frac{1}{2} J_D, \qquad J_D = \frac{n_1 n_2}{n} \left[ \frac{2\, d(C_1, C_2)}{n_1 n_2} - \frac{d(C_1, C_1)}{n_1^2} - \frac{d(C_2, C_2)}{n_2^2} \right]$$
where $d(C_k, C_l)$ is the sum of squared distances between points of the two clusters.
A simple illustration
Cluster membership indicator matrix, e.g. for 3 clusters:
$$H = (h_1, h_2, h_3), \qquad h_k(i) = \begin{cases} 1 & \text{if } i \in C_k \\ 0 & \text{otherwise} \end{cases}$$
Rewriting the K-means objective with normalized indicator vectors $\tilde h_k = h_k / n_k^{1/2}$:
$$J_K = \sum_i \| x_i \|^2 - \sum_{k=1}^K \frac{1}{n_k} \sum_{i,j \in C_k} x_i^T x_j = \sum_i \| x_i \|^2 - \sum_{k=1}^K \tilde h_k^T X^T X\, \tilde h_k$$
Regularized Relaxation

Rotate the indicators by an orthonormal $k \times k$ matrix $T$:
$$(q_1, \ldots, q_k) = (\tilde h_1, \ldots, \tilde h_k)\, T, \qquad Q_k = H_k T$$
Redundancy: $\sum_{k=1}^K n_k^{1/2}\, \tilde h_k = e$ (the all-ones vector), so we may fix
$$q_1 = e / n^{1/2}$$
The remaining columns $Q_{k-1} = (q_2, \ldots, q_k)$ solve
$$\max \ \mathrm{Tr}\left[ Q_{k-1}^T (X^T X)\, Q_{k-1} \right]$$
whose relaxed optimum is $\sum_{k=1}^{K-1} \lambda_k$, attained by the principal components.
For K = 2:
$$h_1 = (1 \cdots 1, 0 \cdots 0)^T, \qquad h_2 = (0 \cdots 0, 1 \cdots 1)^T$$
$$a = \sqrt{n_2 / (n_1 n)}, \qquad b = \sqrt{n_1 / (n_2 n)}$$
$$q_1 = (1 \cdots 1)^T / n^{1/2}, \qquad q_2 = (a, \ldots, a, -b, \ldots, -b)^T$$
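A quick numeric illustration of the relaxation (synthetic data, my construction): for two well-separated clusters, the sign pattern of the top principal component of X^T X recovers the partition, mirroring the shape of q_2.

```python
# The continuous maximizer of Tr[Q^T (X^T X) Q] is spanned by the top
# principal components; its signs reveal the 2-way clustering.
import numpy as np

rng = np.random.default_rng(10)
X = np.hstack([rng.normal(0, 0.3, (2, 30)), rng.normal(3, 0.3, (2, 30))])
X -= X.mean(axis=1, keepdims=True)         # centering removes q_1 = e/n^{1/2}

lam, V = np.linalg.eigh(X.T @ X)
v1 = V[:, -1]                              # top principal component (n-vector)

labels = (v1 > 0).astype(int)              # sign of v1 recovers the partition
print(labels)                              # first 30 vs. last 30 entries differ
```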
The centroid is given by
$$c_k = \sum_i h_k(i)\, x_i / n_k = X h_k / n_k$$
so the projection onto the cluster-centroid subspace is
$$P_{K\text{-means}} = \sum_k n_k\, c_k c_k^T = X \left( \sum_k \tilde h_k \tilde h_k^T \right) X^T \simeq X \left( \sum_k v_k v_k^T \right) X^T = \sum_k \lambda_k u_k u_k^T$$
compared with
$$P_{\mathrm{PCA}} = \sum_k u_k u_k^T$$

PCA automatically projects into the cluster subspace
PCA is an unsupervised version of LDA
Kernel K-means: map $x_i \to \phi(x_i)$ and cluster in feature space:
$$\min J_K = \sum_{k=1}^K \sum_{i \in C_k} \| \phi(x_i) - \bar\phi(c_k) \|^2 = \sum_i | \phi(x_i) |^2 - \sum_{k=1}^K \frac{1}{n_k} \sum_{i,j \in C_k} \phi(x_i)^T \phi(x_j)$$
The objective depends on the data only through inner products, i.e. the kernel matrix.
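A compact kernel K-means sketch (my own, using only the kernel matrix): the squared distance to a cluster mean in feature space expands as K_ii − (2/n_k) Σ_j K_ij + (1/n_k²) Σ_jl K_jl over the cluster's members.

```python
# Kernel K-means: reassign points using feature-space distances computed
# entirely from the kernel matrix K.
import numpy as np

def kernel_kmeans(K, k, iters=30, seed=0):
    n = len(K)
    labels = np.random.default_rng(seed).integers(0, k, n)
    for _ in range(iters):
        dist = np.zeros((n, k))
        for c in range(k):
            idx = np.flatnonzero(labels == c)
            if len(idx) == 0:
                dist[:, c] = np.inf        # skip emptied clusters
                continue
            dist[:, c] = (np.diag(K)
                          - 2 * K[:, idx].mean(axis=1)
                          + K[np.ix_(idx, idx)].mean())
        labels = dist.argmin(axis=1)
    return labels
```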