
What is a Principal Component Analysis?

YAP Von Bing, Statistics and Applied Probability, NUS


September 24, 2010

Let an $n \times p$ matrix $X$ contain measurements of $p$ variables on $n$ subjects. Each row of $X$ (subject) will be viewed as a point in $\mathbb{R}^p$. From each column (variable), subtract the column mean. This amounts to shifting the points so that they are centred at the origin, i.e., the mean of each variable is 0. The relative position of the points is unchanged, so $S = \frac{1}{n} X'X$ is the variance matrix of the variables. Let $x_i$ be the $i$th column of $X$, so that $S_{ii} = \frac{1}{n} x_i'x_i$ measures the spread of the points along the $i$th axis. There can be another direction along which the spread is larger, as can be seen from an example with $p = 2$ where the variables are linearly correlated. A direction which maximises this variance is called a first principal direction of $X$.
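
A minimal NumPy sketch of this setup, using a small simulated data set: the columns are centred and the variance matrix $S = \frac{1}{n}X'X$ is formed. The sizes $n = 100$, $p = 3$ and the variable names are chosen only for illustration.

```python
import numpy as np

rng = np.random.default_rng(0)
n, p = 100, 3                       # n subjects, p variables (toy sizes)
X = rng.normal(size=(n, p))         # simulated measurements

X = X - X.mean(axis=0)              # subtract each column mean: points centred at the origin
S = X.T @ X / n                     # variance matrix S = (1/n) X'X

# The diagonal entry S[i, i] = (1/n) x_i' x_i is the variance of the i-th variable.
print(np.allclose(np.diag(S), X.var(axis=0)))   # True
```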

To find a first principal direction, note that $x_i = Xe_i$, where $e_i$ is the $p \times 1$ vector with 1 at the $i$th entry and 0 elsewhere. Each $e_i$ has norm 1: $e_i'e_i = 1$. More generally, if $a \in \mathbb{R}^p$ has norm 1, then $c = Xa$ are the coordinates of the $n$ points along the direction $a$. Thus, a first principal direction is given by an $a$ with norm 1 that maximises $\frac{1}{n} c'c = a'Sa$, which can be found by the method of Lagrange multipliers. Setting the partial derivatives of $a'Sa - \lambda(a'a - 1)$ to 0 yields $Sa = \lambda a$, so a stationary point $a$ must be a normalised eigenvector of $S$, and the variance of $c$ is the associated eigenvalue $\lambda$. Since $S$ is symmetric, the spectral theorem implies that it has eigenvalues and eigenvectors. Furthermore, all its eigenvalues are non-negative because $S$ is positive semidefinite: for any $a \in \mathbb{R}^p$, $a'Sa \geq 0$. We have shown

Proposition 1: The normalised eigenvector $a_1$ of $S$ for the largest eigenvalue $\lambda_1$ is the first principal direction of $X$, and the variance of the coordinates $c_1 = Xa_1$ is $\lambda_1$.
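
Proposition 1 can be checked numerically; a short sketch on the simulated centred $X$ from the snippet above, using np.linalg.eigh (which returns eigenvalues in ascending order, so the last column corresponds to $\lambda_1$).

```python
import numpy as np

rng = np.random.default_rng(0)
n, p = 100, 3
X = rng.normal(size=(n, p))
X = X - X.mean(axis=0)
S = X.T @ X / n

eigvals, eigvecs = np.linalg.eigh(S)    # ascending eigenvalues, orthonormal eigenvectors
a1 = eigvecs[:, -1]                     # normalised eigenvector for the largest eigenvalue
lam1 = eigvals[-1]

c1 = X @ a1                             # coordinates along the first principal direction
print(np.isclose(c1 @ c1 / n, lam1))    # variance of c1 equals lambda_1: True
```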

The spectral theorem says there exist an orthogonal $A$ and a diagonal $\Lambda$, both $p \times p$, such that
$$SA = A\Lambda$$
The columns of $A$, the eigenvectors, are an orthonormal basis of $\mathbb{R}^p$. The diagonal entries of $\Lambda$ are the eigenvalues. Let the columns of $A$ be arranged in such a way that the eigenvalues are $\lambda_1 \geq \cdots \geq \lambda_p$, with associated eigenvectors $a_1, \ldots, a_p$. We have
$$S = A\Lambda A' = \sum_{i=1}^{p} \lambda_i a_i a_i' \qquad (1)$$
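
A brief numerical check of (1) on the same simulated data, after reordering the output of np.linalg.eigh so that the eigenvalues are decreasing.

```python
import numpy as np

rng = np.random.default_rng(0)
n, p = 100, 3
X = rng.normal(size=(n, p))
X = X - X.mean(axis=0)
S = X.T @ X / n

eigvals, eigvecs = np.linalg.eigh(S)
order = np.argsort(eigvals)[::-1]           # sort eigenvalues in decreasing order
lam, A = eigvals[order], eigvecs[:, order]  # columns a_1, ..., a_p

# S A = A Lambda and S = sum_i lambda_i a_i a_i'
print(np.allclose(S @ A, A @ np.diag(lam)))
print(np.allclose(S, sum(lam[i] * np.outer(A[:, i], A[:, i]) for i in range(p))))
```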

Recall that the coordinates of the $n$ points along the first principal direction $a_1$ are $c_1 = Xa_1$. The $n \times p$ matrix $c_1a_1'$ is the first principal component of $X$. Define $X^{(1)}$ by
$$X = c_1a_1' + X^{(1)}$$

The rows of $X^{(1)}$ are obtained by projecting those of $X$ onto the subspace spanned by $a_2, \ldots, a_p$, so the summands are orthogonal: $X^{(1)\prime} c_1a_1' = 0$. Let $S^{(1)} = \frac{1}{n} X^{(1)\prime} X^{(1)}$ be the variance matrix of $X^{(1)}$. Then by (1),
$$S^{(1)} = S - \tfrac{1}{n}\, a_1 c_1' c_1 a_1' = S - \lambda_1 a_1 a_1' = \sum_{i=2}^{p} \lambda_i a_i a_i'$$

Thus, we get a spectral decomposition of $S^{(1)}$ for free. It follows from Proposition 1 that the first principal direction of $X^{(1)}$ is $a_2$ and the variance of the coordinates is $\lambda_2$. Since
$$X^{(1)} a_2 = (X - c_1a_1')a_2 = Xa_2 := c_2$$
we define the second principal component of $X$ as $c_2a_2'$. An induction argument completes the construction of the remaining principal components, so that
$$X = c_1a_1' + \cdots + c_pa_p'$$
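
The deflation step can be verified on the same simulated data; in the sketch, X1 stands for the matrix written $X^{(1)}$ above.

```python
import numpy as np

rng = np.random.default_rng(0)
n, p = 100, 3
X = rng.normal(size=(n, p))
X = X - X.mean(axis=0)
S = X.T @ X / n

eigvals, eigvecs = np.linalg.eigh(S)
order = np.argsort(eigvals)[::-1]
lam, A = eigvals[order], eigvecs[:, order]

c1 = X @ A[:, 0]
X1 = X - np.outer(c1, A[:, 0])                  # X^(1) = X - c1 a1'
S1 = X1.T @ X1 / n

print(np.allclose(X1.T @ c1, 0))                # the two summands are orthogonal
print(np.allclose(S1, S - lam[0] * np.outer(A[:, 0], A[:, 0])))   # S^(1) = S - lambda_1 a1 a1'
print(np.allclose(X1 @ A[:, 1], X @ A[:, 1]))   # X^(1) a2 = X a2 = c2
```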

Some of the coordinates could be the zero vector, corresponding to an eigenvalue of 0. The number of non-zero components is then $r$, the number of non-zero eigenvalues, which is the rank of $X$ (and that of $X'$, $X'X$ and $XX'$). In general, $r \leq \min(n, p)$, but column centering implies $r \leq n - 1$. In summary:

Proposition 2 (Principal Component Analysis): Let $X$ be an $n \times p$ column-centred data matrix with rank $r \leq \min(n-1, p)$. Let the variance matrix $\frac{1}{n}X'X$ have orthonormal eigenvectors $\{a_1, \ldots, a_r\}$ and associated positive eigenvalues $\lambda_1 \geq \cdots \geq \lambda_r > 0$. Then $X$ splits into $r$ orthogonal principal components:
$$X = c_1a_1' + \cdots + c_ra_r' \qquad (2)$$
where $c_i = Xa_i$ are the $i$th coordinates, with variance $\frac{1}{n} c_i'c_i = \lambda_i$.
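
A compact numerical illustration of Proposition 2 on the simulated data (here $r = p = 3$, since all simulated eigenvalues are positive).

```python
import numpy as np

rng = np.random.default_rng(0)
n, p = 100, 3
X = rng.normal(size=(n, p))
X = X - X.mean(axis=0)

eigvals, eigvecs = np.linalg.eigh(X.T @ X / n)
order = np.argsort(eigvals)[::-1]
lam, A = eigvals[order], eigvecs[:, order]

C = X @ A                                        # column i is c_i = X a_i
# X splits into rank-one principal components, with variances lambda_i
print(np.allclose(X, sum(np.outer(C[:, i], A[:, i]) for i in range(p))))
print(np.allclose((C ** 2).mean(axis=0), lam))   # (1/n) c_i'c_i = lambda_i
```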

The coordinates are orthogonal: for $i \neq j$, $c_i'c_j = 0$. (2) implies $XX' = \sum_{i=1}^{r} c_ic_i'$, suggesting that the coordinates are eigenvectors of $\frac{1}{n}XX'$. Indeed, $\frac{1}{n}XX'c_i = \frac{1}{n}XX'Xa_i = X\lambda_i a_i = \lambda_i c_i$. Let $b_i = (n\lambda_i)^{-1/2} c_i$ be the normalised coordinates. Then $\lambda_1, \ldots, \lambda_r$ are eigenvalues of both $\frac{1}{n}X'X$ (orthonormal eigenvectors $a_1, \ldots, a_r$) and $\frac{1}{n}XX'$ (orthonormal eigenvectors $b_1, \ldots, b_r$), and (2) can be written:
$$X = (n\lambda_1)^{1/2}\, b_1a_1' + \cdots + (n\lambda_r)^{1/2}\, b_ra_r'$$
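
The duality between $\frac{1}{n}X'X$ and $\frac{1}{n}XX'$ can be checked in the same way; in the sketch, the columns of B are the normalised coordinates $b_i$.

```python
import numpy as np

rng = np.random.default_rng(0)
n, p = 100, 3
X = rng.normal(size=(n, p))
X = X - X.mean(axis=0)

eigvals, eigvecs = np.linalg.eigh(X.T @ X / n)
order = np.argsort(eigvals)[::-1]
lam, A = eigvals[order], eigvecs[:, order]

C = X @ A
B = C / np.sqrt(n * lam)                         # b_i = (n lambda_i)^(-1/2) c_i

print(np.allclose((X @ X.T / n) @ B, B * lam))   # (1/n) XX' b_i = lambda_i b_i
print(np.allclose(B.T @ B, np.eye(p)))           # the b_i are orthonormal
```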

Thus, we have shown

Proposition 3 (Singular Value Decomposition): Let $X$ be $n \times p$ with rank $r$. There are $p \times r$ $A$ and $n \times r$ $B$ with $A'A = B'B = I$, and a diagonal $r \times r$ $\Delta$ with strictly positive diagonal entries $\delta_i$ such that $X = B\Delta A'$. Let $a_i$ and $b_i$ be the $i$th columns of $A$ and $B$. Then $a_i$ is an eigenvector of $X'X$, and $b_i$ is an eigenvector of $XX'$, associated with the eigenvalue $\delta_i^2$.

The $\delta_i$'s are the singular values of $X$. Both PCA and SVD split $X$ into the same components; they only differ in interpretation.
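
Proposition 3 corresponds directly to np.linalg.svd; a final sketch on the simulated data, comparing the singular values with $(n\lambda_i)^{1/2}$ and checking the factorisation.

```python
import numpy as np

rng = np.random.default_rng(0)
n, p = 100, 3
X = rng.normal(size=(n, p))
X = X - X.mean(axis=0)

eigvals, _ = np.linalg.eigh(X.T @ X / n)
lam = np.sort(eigvals)[::-1]

B, d, At = np.linalg.svd(X, full_matrices=False)   # X = B diag(d) A'
print(np.allclose(d, np.sqrt(n * lam)))            # singular values are (n lambda_i)^(1/2)
print(np.allclose(X, B @ np.diag(d) @ At))         # X = B Delta A'
```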