
# NA_08: Principal Component Analysis

## Numerical algorithms, University of Konstanz, ST 2008

by Vladimir Bondarenko, 03-June.

In the last lecture, we learned one of the most important matrix factorizations, the Singular Value Decomposition (SVD). The SVD has many properties that are useful in applications. In this lab, we meet one particularly popular application of the SVD: Principal Component Analysis (PCA).

Contents

1. Image SVD
2. PCA

## 1. Image SVD

PCA employs the best low-rank approximation property of the SVD (stated in Theorems 5.8 and 5.9 of the NLA book). Having computed the SVD of an m-by-n matrix A, the matrix can be represented as a sum of rank-one matrices:

$$A = \sum_{j=1}^{\min(m,n)} \sigma_j u_j v_j^T,$$

where $u_j$ and $v_j$ are the left and right singular vectors and the $\sigma_j$ are the singular values. The rank-r approximation of the matrix is then the truncated sum:

$$A_r = \sum_{j=1}^{r} \sigma_j u_j v_j^T.$$
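As a small numerical check of the truncated sum, the following sketch builds the rank-r approximation of a test matrix in two equivalent ways. The random 6-by-4 test matrix, the choice r = 2, and the variable names (`Ar_sum`, `Ar_mat`) are illustrative assumptions and not part of the lab.

```matlab
% Minimal sketch: rank-r approximation as a truncated sum of rank-one matrices.
% The random 6-by-4 test matrix and r = 2 are arbitrary illustrative choices.
A = rand(6, 4);
[U, S, V] = svd(A, 0);          % economy-size SVD: U is 6-by-4, S and V are 4-by-4
sigma = diag(S);                % singular values in decreasing order

r = 2;
Ar_sum = zeros(size(A));
for j = 1:r
    Ar_sum = Ar_sum + sigma(j) * U(:, j) * V(:, j)';   % add the j-th rank-one term
end

% The same approximation written as a product of truncated factors
Ar_mat = U(:, 1:r) * S(1:r, 1:r) * V(:, 1:r)';

% In the Frobenius norm, the error is the root of the sum of the discarded sigma_j^2
err_fro  = norm(A - Ar_sum, 'fro');
err_pred = sqrt(sum(sigma(r+1:end).^2));
fprintf('difference between the two forms: %.2e\n', norm(Ar_sum - Ar_mat, 'fro'));
fprintf('error: %.6f, predicted from singular values: %.6f\n', err_fro, err_pred);
```

Both forms should agree up to rounding, and the measured error should match the value predicted from the discarded singular values.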

Let's illustrate this idea in a simple exercise, where we are looking for an approximation to a matrix that is a grayscale image.

## b ) Compute its SVD:

Use the MATLAB function [U, S, V] = svd(A, 0). The zero requests the "economy-size" SVD. In the second figure, plot the singular values on a logarithmic scale. YOUR CODE HERE:
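One possible shape of this step is sketched below. Because the lab's image is not reproduced here, the sketch assumes a synthetic 80-by-60 matrix as a stand-in; the test function and the variable name `sigma` are hypothetical. Note that svd(A, 0) only reduces the factor sizes when the matrix has more rows than columns, which is why the stand-in is tall.

```matlab
% Synthetic 80-by-60 "image" (an assumed stand-in; the lab uses a real grayscale image)
[X, Y] = meshgrid(linspace(0, 1, 60), linspace(0, 1, 80));
A = sin(6 * X .* Y) + 0.5 * cos(3 * (X - Y));

[U, S, V] = svd(A, 0);      % economy-size: U is 80-by-60, S and V are 60-by-60
sigma = diag(S);            % singular values in decreasing order

figure(2);
semilogy(sigma, 'o-');      % logarithmic vertical axis
xlabel('index j');
ylabel('\sigma_j');
title('Singular values (logarithmic scale)');
grid on;
```

A rapid decay of the plotted values suggests that a small rank already captures most of the matrix.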


## c ) Compute the low-rank approximation:

Compute also the relative approximation error with respect to the Frobenius norm (use the MATLAB norm(A,'fro') function). Plot the original image in the right subplot of the third figure and its rank-r approximation in the left one. Display the rank and the error value in the title. At what rank and error value do the original image and its approximation become visually indistinguishable? YOUR CODE HERE:
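A minimal sketch of this step follows, again on an assumed synthetic stand-in for the image. The rank r = 3, the test function, and the names `Ar` and `relerr` are illustrative choices only.

```matlab
% Synthetic 80-by-60 "image" (assumed stand-in for the lab's grayscale image)
[X, Y] = meshgrid(linspace(0, 1, 60), linspace(0, 1, 80));
A = sin(6 * X .* Y) + 0.5 * cos(3 * (X - Y));
[U, S, V] = svd(A, 0);
sigma = diag(S);

r = 3;                                            % illustrative rank
Ar = U(:, 1:r) * S(1:r, 1:r) * V(:, 1:r)';        % rank-r approximation
relerr = norm(A - Ar, 'fro') / norm(A, 'fro');    % relative Frobenius error

figure(3);
subplot(1, 2, 1);                                 % left: approximation
imagesc(Ar); colormap(gray); axis image off;
title(sprintf('rank %d, rel. error %.2e', r, relerr));
subplot(1, 2, 2);                                 % right: original
imagesc(A); axis image off;
title('original');
```

Wrapping the last part in a loop over several values of r makes it easy to see where the two panels stop being distinguishable.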

## 2. PCA

We are going to learn PCA through a simple but typical example. Suppose we measure the height and weight of six subjects and write the data row-wise into a 6-by-2 matrix A, one row per subject.


Each of the subjects is characterized by two components (coordinates), namely, height, H, and weight, W. The question the PCA answers is:

is there one underlying component - let's call it "size", S - that linearly predicts (approximates) both height and weight?
Let's attempt a qualitative answer to this question first. We would like to know whether there is a good approximation of height and weight in the form:

$$H \approx a\,S, \qquad W \approx b\,S, \tag{1}$$

where a and b are some constants. Put another way, we'd like to know whether the ratio of W and H is approximately constant:

$$\frac{W}{H} \approx \frac{b}{a} = \text{const}.$$

Let's look at the data to see whether this is the case.

## a ) Explore data:
In a single figure but different subplots, plot the height and weight values, and the weight vs. height. YOUR CODE HERE:
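A possible layout for these plots is sketched below. The lab's measurements are not reproduced here, so the height and weight values are made up purely for illustration, and the names `H`, `W`, `ratio`, and `R` are assumptions.

```matlab
% Made-up data for six subjects (NOT the lab's measurements):
% heights in cm, weights in kg, written row-wise into A.
H = [160; 165; 170; 175; 180; 185];
W = [ 63;  67;  67;  71;  73;  73];
A = [H W];

figure;
subplot(1, 3, 1); plot(A(:, 1), 'o-');
xlabel('subject'); ylabel('height'); title('Height');
subplot(1, 3, 2); plot(A(:, 2), 'o-');
xlabel('subject'); ylabel('weight'); title('Weight');
subplot(1, 3, 3); plot(A(:, 1), A(:, 2), 'o');
xlabel('height'); ylabel('weight'); title('Weight vs. height');

% Qualitative check: is W/H roughly constant, and are H and W correlated?
ratio = W ./ H;
R = corrcoef(H, W);
fprintf('W/H ranges from %.3f to %.3f\n', min(ratio), max(ratio));
fprintf('correlation coefficient: %.3f\n', R(1, 2));
```

If the points in the third subplot lie close to a line through the origin, a single "size" component is a plausible model.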


From the plots, it is clear that the "weight" and "height" data are strongly correlated, i.e., approximately linearly dependent. Thus, there must be a single predictive component, like "Size". Now the question becomes quantitative: how do we compute this component? As usual, there is more than one way to do this. One possibility is to let the SVD compute it. Let's rewrite Eq. (1) in matrix form:

$$A = \begin{bmatrix} H & W \end{bmatrix} \approx S \begin{bmatrix} a & b \end{bmatrix}.$$


So, we would like to find a column-vector S and a row vector v=[a,b] which produce the best possible rank-one approximation to A. That is exactly what the SVD does! Let's compute it.

## b ) Compute the SVD:

Notice that the first singular value is much larger than the second one, indicating that the rank-one approximation is fairly accurate.
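The comparison of the two singular values can be sketched as follows. The data are made up for illustration and are not the lab's measurements, so the printed numbers will differ from the ones in the lab.

```matlab
% Made-up data for six subjects (NOT the lab's measurements)
H = [160; 165; 170; 175; 180; 185];   % heights in cm
W = [ 63;  67;  67;  71;  73;  73];   % weights in kg
A = [H W];

[U, S, V] = svd(A, 0);   % economy-size: U is 6-by-2, S and V are 2-by-2
sigma = diag(S);

fprintf('sigma_1 = %.2f, sigma_2 = %.2f\n', sigma(1), sigma(2));
fprintf('sigma_2 / sigma_1 = %.1e\n', sigma(2) / sigma(1));

% sigma_2 is exactly the Frobenius error of the best rank-one approximation
A1 = sigma(1) * U(:, 1) * V(:, 1)';
fprintf('rank-one error: %.2f\n', norm(A - A1, 'fro'));
```

Since A has only two columns, the second singular value alone measures how far the data are from a perfect rank-one model.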

## c ) The principal component:

By comparing the two equations above, it is clear that the "Size" component is the scaled first left singular vector, $u_1$:

$$S = \sigma_1 u_1,$$

and the corresponding coefficients a and b are the components of the first right singular vector, $v_1$: $[a, b] = v_1^T$.
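In MATLAB, extracting the "Size" component and the coefficients might look like the sketch below. The data are made up, and the name `sizeComp` is a hypothetical choice that avoids reusing S, which already holds the singular values. The sign flip is an added convention: singular vectors are determined only up to a common sign, so the sketch fixes the sign to make a positive.

```matlab
% Made-up data for six subjects (NOT the lab's measurements)
H = [160; 165; 170; 175; 180; 185];   % heights in cm
W = [ 63;  67;  67;  71;  73;  73];   % weights in kg
A = [H W];
[U, S, V] = svd(A, 0);

sizeComp = S(1, 1) * U(:, 1);   % "Size": scaled first left singular vector
a = V(1, 1);                    % coefficient for height
b = V(2, 1);                    % coefficient for weight

% Singular vectors are unique only up to sign: flip u_1 and v_1 together
% so that the coefficients come out positive (the product is unchanged).
if a < 0
    sizeComp = -sizeComp;
    a = -a;
    b = -b;
end

fprintf('a = %.4f, b = %.4f, b/a = %.4f\n', a, b, b / a);
disp(sizeComp');
```

The ratio b/a plays the role of the approximately constant ratio W/H from the qualitative discussion.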

## d ) The rank-one approximation:

Given the "Size" and coefficient vectors, we can recompute the approximated "weight" and "height" data. Do so and plot (in a single figure but in different subplots) the approximated quantities together with the original data. Add a legend to your plots. Also plot the "Size" values in a separate subplot. YOUR CODE HERE:
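One way to lay out this comparison is sketched below, again with made-up data rather than the lab's measurements. The names `sizeComp`, `H1`, and `W1` are hypothetical.

```matlab
% Made-up data for six subjects (NOT the lab's measurements)
H = [160; 165; 170; 175; 180; 185];   % heights in cm
W = [ 63;  67;  67;  71;  73;  73];   % weights in kg
A = [H W];
[U, S, V] = svd(A, 0);

sizeComp = S(1, 1) * U(:, 1);   % "Size" component
coeffs   = V(:, 1)';            % row vector [a b]
A1 = sizeComp * coeffs;         % rank-one approximation of A
H1 = A1(:, 1);                  % approximated heights
W1 = A1(:, 2);                  % approximated weights

figure;
subplot(1, 3, 1);
plot(H, 'o-'); hold on; plot(H1, 'x--'); hold off;
xlabel('subject'); ylabel('height'); legend('original', 'rank-one');
subplot(1, 3, 2);
plot(W, 'o-'); hold on; plot(W1, 'x--'); hold off;
xlabel('subject'); ylabel('weight'); legend('original', 'rank-one');
subplot(1, 3, 3);
plot(sizeComp, 's-');
xlabel('subject'); ylabel('size'); title('"Size" component');
```

The product sizeComp * coeffs does not depend on the sign chosen for the singular vectors, although the plotted "Size" curve itself may appear with either sign.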

## Published with MATLAB 7.6
