
Principal Component Analysis

Principal component analysis is a statistical technique that is used to analyze the interrelationships among a large number of variables and to explain these variables in terms of a smaller number of variables, called principal components, with a minimum loss of information.
Definition 1: Let X = [x_j] be any k × 1 random vector. We now define a k × 1 random vector Y = [y_i], where for each i the ith principal component of X is

y_i = \sum_{j=1}^{k} \beta_{ij} x_j

for some regression coefficients β_ij. Since each y_i is a linear combination of the x_j, Y is a random vector.

Now define the k × k coefficient matrix B = [β_ij] whose rows are the 1 × k vectors β_i = [β_i1 ⋯ β_ik]. Thus

y_i = \beta_i X \qquad Y = BX

For reasons that will become apparent shortly, we choose to view the rows of B as column vectors γ_i, and so the rows themselves are the transposes γ_i^T.
Observation: Let Σ = [σ_ij] be the k × k population covariance matrix for X. Then the covariance matrix for Y is given by

\Sigma_Y = B \Sigma B^T

i.e. the population variances and covariances of the y_i are given by

\operatorname{var}(y_i) = \gamma_i^T \Sigma \gamma_i \qquad \operatorname{cov}(y_i, y_j) = \gamma_i^T \Sigma \gamma_j
Observation: Our objective is to choose values for the regression coefficients so as to maximize var(y_i) subject to the constraint that cov(y_i, y_j) = 0 for all i ≠ j. We find such coefficients using the Spectral Decomposition Theorem (Theorem 1 of Linear Algebra Background). Since the covariance matrix Σ is symmetric, by Theorem 1 of Symmetric Matrices, it follows that

\Sigma = \Gamma D \Gamma^T

where Γ is a k × k matrix whose columns are unit eigenvectors γ_1, …, γ_k corresponding to the eigenvalues λ_1, …, λ_k of Σ, and D is the k × k diagonal matrix whose main diagonal consists of λ_1, …, λ_k. Alternatively, the spectral theorem can be expressed as

\Sigma = \sum_{j=1}^{k} \lambda_j \gamma_j \gamma_j^T
Property 1: If λ_1 ≥ λ_2 ≥ ⋯ ≥ λ_k are the eigenvalues of Σ with corresponding unit eigenvectors γ_1, …, γ_k, then

\Sigma = \sum_{j=1}^{k} \lambda_j \gamma_j \gamma_j^T

and furthermore, taking the regression coefficients to be β_i = γ_i^T, for all i and j ≠ i

var(y_i) = λ_i
cov(y_i, y_j) = 0

Proof: The first statement results from Theorem 1 as explained above. Since the column vectors γ_1, …, γ_k are orthonormal, γ_i^T γ_j = 0 if j ≠ i and γ_i^T γ_j = 1 if j = i. Thus

\operatorname{var}(y_i) = \gamma_i^T \Sigma \gamma_i = \gamma_i^T \Big(\sum_{j=1}^{k} \lambda_j \gamma_j \gamma_j^T\Big) \gamma_i = \lambda_i

and similarly cov(y_i, y_j) = γ_i^T Σ γ_j = 0 for j ≠ i.
Property 2: var(x_1) + ⋯ + var(x_k) = λ_1 + ⋯ + λ_k

Proof: By definition of the covariance matrix, the main diagonal of Σ contains the values var(x_1), …, var(x_k), and so trace(Σ) = var(x_1) + ⋯ + var(x_k). But by Property 1 of Eigenvalues and Eigenvectors, trace(Σ) = λ_1 + ⋯ + λ_k.
Observation: Thus the total variance for X can be expressed as trace(Σ) = λ_1 + ⋯ + λ_k, but by Property 1, this is also the total variance for Y.

Thus the portion of the total variance (of X or Y) explained by the ith principal component y_i is λ_i / (λ_1 + ⋯ + λ_k). Assuming that λ_1 ≥ λ_2 ≥ ⋯ ≥ λ_k, the portion of the total variance explained by the first m principal components is therefore

\frac{\sum_{i=1}^{m} \lambda_i}{\sum_{i=1}^{k} \lambda_i}

Our goal is to find a reduced number of principal components that can explain most of the total variance, i.e. we seek a value of m that is as low as possible but such that this ratio is close to 1.
Observation: Since the population covariance matrix Σ is unknown, we will use the sample covariance matrix S = [s_ij] as an estimate and proceed as above using S in place of Σ. Recall that S is given by the formula

s_{ij} = \frac{1}{n-1} \sum_{t=1}^{n} (x_{it} - \bar{x}_i)(x_{jt} - \bar{x}_j)

where we now consider X = [x_ij] to be a k × n matrix such that for each i, {x_ij : 1 ≤ j ≤ n} is a random sample for the random variable x_i. Since the sample covariance matrix is symmetric, there is a similar spectral decomposition

S = B D B^T = \sum_{j=1}^{k} \lambda_j b_j b_j^T

where the columns b_j = [b_ij] of B are the unit eigenvectors of S corresponding to the eigenvalues λ_1, …, λ_k of S (actually this is a bit of an abuse of notation since these are not the same as the eigenvalues of Σ).

We now use the elements b_ij of these eigenvectors as the regression coefficients, and so have

y_i = \sum_{j=1}^{k} b_{ji} x_j

(equivalently Y = B^T X, where X is the k × 1 vector of the x_j), and as above, for all i and j ≠ i

var(y_i) = λ_i
cov(y_i, y_j) = 0

As before, assuming that λ_1 ≥ λ_2 ≥ ⋯ ≥ λ_k, we want to find a value of m so that the first m principal components explain as much of the total variance as possible, i.e. so that

\frac{\sum_{i=1}^{m} \lambda_i}{\sum_{i=1}^{k} \lambda_i}

is close to 1. In this way we reduce the number of principal components needed to explain most of the variance.
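The whole procedure described above can be sketched in a few lines of code. The following NumPy fragment is our own minimal illustration (it is not part of the original Excel workbook, and the function and variable names are ours): it forms the sample covariance matrix, extracts its eigenvalues and unit eigenvectors, computes the principal component scores, and reports the portion of the total variance explained by the first m components.

```python
import numpy as np

def pca_from_covariance(data, m):
    """data: n x k array (n samples, k variables); m: number of components to keep."""
    # Sample covariance matrix S (k x k), using the n-1 denominator
    S = np.cov(data, rowvar=False)

    # Eigenvalues and unit eigenvectors of the symmetric matrix S
    eigvals, eigvecs = np.linalg.eigh(S)              # eigh returns them in ascending order
    order = np.argsort(eigvals)[::-1]                 # sort so that lambda_1 >= ... >= lambda_k
    eigvals, eigvecs = eigvals[order], eigvecs[:, order]

    # Principal component scores: y = B^T (x - mean) for each sample
    centered = data - data.mean(axis=0)
    scores = centered @ eigvecs                       # n x k; column i holds the ith principal component

    # Portion of the total variance explained by the first m components
    explained = eigvals[:m].sum() / eigvals.sum()
    return eigvals, eigvecs, scores[:, :m], explained
```

For the teacher-rating example below, one would pass the 120 × 9 matrix of scores (standardized first, if the correlation matrix is preferred) and read off the explained-variance ratio for, say, m = 4.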
Example 1: The school system of a major city wanted to determine the
characteristics of a great teacher, and so they asked 120 students to rate the
importance of each of the following 9 criteria using a Likert scale of 1 to 10 with 10
representing that a particular characteristic is extremely important and 1
representing that the characteristic is not important.
1. Setting high expectations for the students
2. Entertaining
3. Able to communicate effectively
4. Having expertise in their subject
5. Able to motivate
6. Caring
7. Charismatic
8. Having a passion for teaching
9. Friendly and easy-going
Figure 1 shows the scores from the first 10 students in the sample and Figure 2 shows
some descriptive statistics about the entire 120 person sample.

Figure 1 Teacher evaluation scores

Figure 2 Descriptive statistics for teacher evaluations

The sample covariance matrix S is shown in Figure 3 and can be calculated directly as
=MMULT(TRANSPOSE(B4:J123-B126:J126),B4:J123-B126:J126)/(COUNT(B4:B123)-1)
Here B4:J123 is the range containing all the evaluation scores and B126:J126 is the
range containing the means for each criterion. Alternatively we can simply use the
Real Statistics supplemental function COV(B4:J123) to produce the same result.
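The same computation can be expressed in code. The following NumPy sketch is our own illustration (the name `ratings` is ours and stands for the 120 × 9 block B4:J123); it mirrors the array formula above, i.e. it computes S = Xcᵀ Xc / (n − 1) with Xc the column-centered data.

```python
import numpy as np

def sample_covariance(ratings):
    """ratings: 120 x 9 array of evaluation scores (the block B4:J123)."""
    n = ratings.shape[0]
    centered = ratings - ratings.mean(axis=0)   # subtract the column means (B126:J126)
    S = centered.T @ centered / (n - 1)         # MMULT(TRANSPOSE(Xc), Xc) / (n - 1)
    return S                                    # agrees with np.cov(ratings, rowvar=False)
```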

Figure 3 Covariance Matrix

In practice, we usually prefer to standardize the sample scores. This gives each of the nine criteria equal weight regardless of its variance, and is equivalent to using the correlation matrix. Let R = [r_ij] where r_ij is the correlation between x_i and x_j, i.e.

r_{ij} = \frac{s_{ij}}{s_i s_j}

where s_ij is the sample covariance between x_i and x_j and s_i, s_j are their sample standard deviations.
The sample correlation matrix R is shown in Figure 4 and can be calculated directly
as
=MMULT(TRANSPOSE((B4:J123-B126:J126)/B127:J127),(B4:J123-B126:J126)/B127:J127)/(COUNT(B4:B123)-1)
Here B127:J127 is the range containing the standard deviations for each criterion.
Alternatively we can simply use the Real Statistics supplemental function
CORR(B4:J123) to produce the same result.
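As a cross-check in code (again our own sketch, with `ratings` as above), the correlation matrix can be obtained either by standardizing the columns first or directly with NumPy's built-in routine:

```python
import numpy as np

def sample_correlation(ratings):
    z = (ratings - ratings.mean(axis=0)) / ratings.std(axis=0, ddof=1)  # standardized scores
    n = ratings.shape[0]
    R = z.T @ z / (n - 1)            # covariance matrix of the standardized data
    return R                         # agrees with np.corrcoef(ratings, rowvar=False)
```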

Figure 4 Correlation Matrix

Note that all the values on the main diagonal are 1, as we would expect since the
variances have been standardized. We next calculate the eigenvalues and eigenvectors of the correlation matrix using the eVECTORS(M4:U12) supplemental function, as described in Linear Algebra Background. The result appears in range M18:U27 of
Figure 5.

Figure 5 Eigenvalues and eigenvectors of the correlation matrix

The first row in Figure 5 contains the eigenvalues for the correlation matrix in Figure 4. Below each eigenvalue is a corresponding unit eigenvector. E.g. the largest eigenvalue is λ_1 = 2.880437. Corresponding to this eigenvalue is the 9 × 1 column eigenvector B_1 whose elements are 0.108673, -0.41156, etc.

As we described above, the coefficients of the eigenvectors serve as the regression coefficients of the 9 principal components. For example, the first principal component can be expressed by

y_1 = B_1^T X

i.e.

y_1 = 0.108673 x_1 − 0.41156 x_2 + ⋯
Thus for any set of scores (for the x_j) you can calculate each of the corresponding principal components. Keep in mind that you need to standardize the values of the x_j first, since this is how the correlation matrix was obtained. For the first sample (row 4 of Figure 1), we can calculate the nine principal components using the matrix equation Y = B^T X, as shown in Figure 6.

Figure 6 Calculation of PC1 for first sample

Here B (range AI61:AQ69) is the set of eigenvectors from Figure 5, X (range AS61:AS69) is simply the transpose of row 4 from Figure 1, X′ (range AU61:AU69) contains the standardized scores in X (e.g. cell AU61 contains the formula =STANDARDIZE(AS61, B126, B127), referring to Figure 2) and Y (range AW61:AW69) is calculated by =MMULT(TRANSPOSE(AI61:AQ69),AU61:AU69). Thus the principal component values corresponding to the first sample are 0.782502 (PC1), -1.9758 (PC2), etc.
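In code, the same per-sample calculation looks roughly like this (our own sketch; `B` stands for the 9 × 9 matrix of unit eigenvectors from Figure 5, `x` for one row of raw scores, and `means`/`sds` for the statistics in Figure 2):

```python
import numpy as np

def pc_scores(x, B, means, sds):
    """x: length-9 vector of raw scores; B: 9 x 9 matrix whose columns are unit eigenvectors."""
    z = (x - means) / sds      # standardize each score, like Excel's STANDARDIZE
    return B.T @ z             # Y = B^T X', matching MMULT(TRANSPOSE(B), X')
```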
As observed previously, the total variance for the nine random variables is 9 (since
the variance was standardized to 1 in the correlation matrix), which is, as expected,
equal to the sum of the nine eigenvalues listed in Figure 5. In fact, in Figure 7 we list
the eigenvalues in decreasing order and show the percentage of the total variance accounted for by each eigenvalue.

Figure 7 Variance accounted for by each eigenvalue

The values in column M are simply the eigenvalues listed in the first row of Figure 5,
with cell M41 containing the formula =SUM(M32:M40) and producing the value 9 as
expected. Each cell in column N contains the percentage of the variance accounted
for by the corresponding eigenvalue. E.g. cell N32 contains the formula =M32/M41,
and so we see that 32% of the total variance is accounted for by the largest

eigenvalue. Column O simply contains the cumulative percentages, and so we see that the first four eigenvalues account for 72.3% of the variance.
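A code equivalent of Figure 7 (a sketch under the assumption that `eigvals` already holds the nine eigenvalues from Figure 5):

```python
import numpy as np

eigvals = np.sort(eigvals)[::-1]            # decreasing order, as in column M
share = eigvals / eigvals.sum()             # column N: portion of the total variance
cumulative = np.cumsum(share)               # column O: cumulative portion
m = int(np.argmax(cumulative >= 0.70)) + 1  # smallest m reaching an (illustrative) 70% threshold
```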
Using Excel's charting capability, we can plot the values in column N of Figure 7 to obtain a graphical representation, called a scree plot.

Figure 8 Scree Plot

We decide to retain the first four eigenvalues, which explain 72.3% of the variance. In Basic Concepts of Factor Analysis we will explain in more detail how to determine how many eigenvalues to retain. The portion of Figure 5 that refers to these eigenvalues is shown in Figure 9. Since all the values for PC1 except Expect are negative, we first decide to negate all the values. This is not a problem since the negative of a unit eigenvector is also a unit eigenvector.

Figure 9 Principal component coefficients (Reduced Model)

Those values that are sufficiently large, i.e. the values that show a high correlation
between the principal components and the (standardized) original variables, are
highlighted. We use a threshold of 0.4 for this purpose.
This is done by highlighting the range R32:U40 and selecting Home > Styles|
Conditional Formatting and then choosing Highlight Cell Rules > Greater
Than and inserting the value .4 and then selecting Home > Styles|Conditional
Formatting and then choosing Highlight Cell Rules > Less Than and inserting
the value -.4.
Note that Entertainment, Communications, Charisma and Passion are highly
correlated with PC1, Motivation and Caring are highly correlated with PC3 and
Expertise is highly correlated with PC4. Also Expectation is highly positively
correlated with PC2 while Friendly is negatively correlated with PC2.
Ideally we would like to see that each variable is highly correlated with only one principal component. As we can see from Figure 9, this is the case in our example. Usually this is not the case, however, and we will show what to do about this in Basic Concepts of Factor Analysis when we discuss rotation in factor analysis.
In our analysis we retain 4 of the 9 principal components. As noted previously, each of the principal components can be calculated by

Y = B^T X

where Y is a k × 1 vector of principal components, B is a k × k matrix whose columns are the unit eigenvectors and X is a k × 1 vector of the standardized scores for the original variables.

If we retain only m principal components, then Y = B^T X where Y is an m × 1 vector, B is a k × m matrix (consisting of the m unit eigenvectors corresponding to the m largest eigenvalues) and X is the k × 1 vector of standardized scores as before.

The interesting thing is that if Y is known we can calculate estimates X′ of the standardized values for X using the fact that X′ = BB^T X′ = B(B^T X′) = BY (since B is an orthogonal matrix, and so BB^T = I). From X′ it is then easy to calculate estimates of the original scores X.

Figure 10 Estimate of original scores using reduced model

In Figure 10 we show how this is done using the four principal components that we calculated from the first sample in Figure 6. B (range AN74:AQ82) is the reduced set of coefficients (Figure 9), Y (range AS74:AS77) contains the principal components as calculated in Figure 6, X′ (range AU74:AU82) contains the estimated standardized values for the first sample, using the formula =MMULT(AN74:AQ82,AS74:AS77), and finally X (range AW74:AW82) contains the estimated scores for the first sample, using the formula =AU74:AU82*TRANSPOSE(B127:J127)+TRANSPOSE(B126:J126).

As you can see, the values for X in Figure 10 are similar to, but not exactly the same as, the values for X in Figure 6, demonstrating both the effectiveness and the limitations of the reduced principal component model (at least for this sample data).
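A compact code version of this round trip (our own sketch; `B4` stands for the 9 × 4 matrix of retained eigenvectors from Figure 9, `x` for one row of raw scores, and `means`/`sds` as before):

```python
import numpy as np

def reconstruct(x, B4, means, sds):
    """Project a raw score vector onto the m retained components and map back."""
    z = (x - means) / sds       # standardized scores X'
    y = B4.T @ z                # the m retained principal components, Y = B^T X'
    z_hat = B4 @ y              # estimated standardized scores, X' ~ B Y
    return z_hat * sds + means  # estimated raw scores
```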

Covariance
From Wikipedia, the free encyclopedia

In probability theory and statistics, covariance is a measure of how much two random
variables change together. If the greater values of one variable mainly correspond with the
greater values of the other variable, and the same holds for the smaller values, i.e., the variables
tend to show similar behavior, the covariance is positive. [1] In the opposite case, when the greater
values of one variable mainly correspond to the smaller values of the other, i.e., the variables
tend to show opposite behavior, the covariance is negative. The sign of the covariance therefore
shows the tendency in the linear relationship between the variables. The magnitude of the
covariance is not easy to interpret. The normalized version of the covariance, the correlation
coefficient, however, shows by its magnitude the strength of the linear relation.
A distinction must be made between (1) the covariance of two random variables, which is
a population parameter that can be seen as a property of the joint probability distribution, and (2)
the sample covariance, which serves as an estimated value of the parameter.


Definition

The covariance between two jointly distributed real-valued random variables X and Y with finite second moments is defined as[2]

\operatorname{Cov}(X, Y) = \operatorname{E}\big[(X - \operatorname{E}[X])(Y - \operatorname{E}[Y])\big]

where E[X] is the expected value of X, also known as the mean of X. By using the linearity property of expectations, this can be simplified to

\operatorname{Cov}(X, Y) = \operatorname{E}[XY] - \operatorname{E}[X]\operatorname{E}[Y]

However, when \operatorname{E}[XY] \approx \operatorname{E}[X]\operatorname{E}[Y], this last equation is prone to catastrophic cancellation when computed with floating point arithmetic and thus should be avoided in computer programs when the data have not been centered before.[3]
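A small illustration of why the centered form matters numerically (our own sketch, not part of the article): computing E[XY] − E[X]E[Y] on data with a large common offset loses precision, while centering the data first does not.

```python
import numpy as np

rng = np.random.default_rng(0)
x = rng.normal(0.0, 1.0, 100_000) + 1e9      # data with a huge common offset
y = x + rng.normal(0.0, 1.0, 100_000)        # true covariance with x is about 1

naive = np.mean(x * y) - np.mean(x) * np.mean(y)     # E[XY] - E[X]E[Y]: catastrophic cancellation
stable = np.mean((x - x.mean()) * (y - y.mean()))    # centered form: numerically stable
print(naive, stable)   # naive can be wildly off; stable is close to 1
```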

For random vectors X and Y (both of dimension m) the m × m cross-covariance matrix (also known as dispersion matrix or variance–covariance matrix,[4] or simply called covariance matrix) is equal to

\operatorname{Cov}(X, Y) = \operatorname{E}\big[(X - \operatorname{E}[X])(Y - \operatorname{E}[Y])^T\big]

where the superscript T denotes the transpose of the vector (or matrix).

The (i, j)-th element of this matrix is equal to the covariance Cov(X_i, Y_j) between the i-th scalar component of X and the j-th scalar component of Y. In particular, Cov(Y, X) is the transpose of Cov(X, Y).

For a vector X = [X_1, …, X_m]^T of m jointly distributed random variables with finite second moments, its covariance matrix is defined as

\Sigma(X) = \operatorname{Cov}(X, X)

Random variables whose covariance is zero are called uncorrelated.

The units of measurement of the covariance Cov(X, Y) are those of X times those of Y. By contrast, correlation coefficients, which depend on the covariance, are a dimensionless measure of linear dependence. (In fact, correlation coefficients can simply be understood as a normalized version of covariance.)

Properties

Variance is a special case of the covariance when the two variables are identical:

\operatorname{Cov}(X, X) = \operatorname{Var}(X) = \sigma^2(X)

If X, Y, W, and V are real-valued random variables and a, b, c, d are constant ("constant" in this context means non-random), then the following facts are a consequence of the definition of covariance:

\operatorname{Cov}(X, a) = 0
\operatorname{Cov}(X, X) = \operatorname{Var}(X)
\operatorname{Cov}(X, Y) = \operatorname{Cov}(Y, X)
\operatorname{Cov}(aX, bY) = ab\,\operatorname{Cov}(X, Y)
\operatorname{Cov}(X + a, Y + b) = \operatorname{Cov}(X, Y)
\operatorname{Cov}(aX + bY, cW + dV) = ac\,\operatorname{Cov}(X, W) + ad\,\operatorname{Cov}(X, V) + bc\,\operatorname{Cov}(Y, W) + bd\,\operatorname{Cov}(Y, V)

For a sequence X_1, …, X_n of random variables, and constants a_1, …, a_n, we have

\operatorname{Var}\Big(\sum_{i=1}^{n} a_i X_i\Big) = \sum_{i=1}^{n} a_i^2 \operatorname{Var}(X_i) + 2\sum_{i<j} a_i a_j \operatorname{Cov}(X_i, X_j)
A more general identity for covariance matrices

Let X be a random vector with covariance matrix Σ(X), and let A be a matrix that can act on X. The covariance matrix of the vector AX is:

\Sigma(AX) = A\,\Sigma(X)\,A^T

This is a direct result of the linearity of expectation and is useful when applying a linear transformation, such as a whitening transformation, to a vector.
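This identity, too, can be verified numerically; the following NumPy sketch (our own) compares the covariance of the transformed data with A Σ(X) Aᵀ.

```python
import numpy as np

rng = np.random.default_rng(3)
X = rng.normal(size=(100_000, 3))          # rows are observations of a 3-dimensional vector
A = np.array([[1.0, 2.0, 0.0],
              [0.0, 1.0, -1.0]])           # a 2 x 3 linear transformation

lhs = np.cov(X @ A.T, rowvar=False)        # covariance matrix estimated from the transformed data
rhs = A @ np.cov(X, rowvar=False) @ A.T    # A Sigma(X) A^T
print(np.allclose(lhs, rhs))               # True up to floating-point error
```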

Uncorrelatedness and independence

If X and Y are independent, then their covariance is zero. This follows because under independence,

\operatorname{E}[XY] = \operatorname{E}[X]\operatorname{E}[Y]

The converse, however, is not generally true. For example, let X be uniformly distributed in [-1, 1] and let Y = X^2. Clearly, X and Y are dependent, but

\operatorname{Cov}(X, Y) = \operatorname{E}[XY] - \operatorname{E}[X]\operatorname{E}[Y] = \operatorname{E}[X^3] - \operatorname{E}[X]\operatorname{E}[X^2] = 0 - 0 = 0

In this case, the relationship between Y and X is non-linear, while correlation and covariance are measures of linear dependence between two variables. This example shows that if two variables are uncorrelated, that does not in general imply that they are independent. However, if two variables are jointly normally distributed (but not if they are merely individually normally distributed), uncorrelatedness does imply independence.
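A quick numerical check of this example (our own sketch):

```python
import numpy as np

rng = np.random.default_rng(2)
x = rng.uniform(-1.0, 1.0, 1_000_000)
y = x**2                                          # clearly dependent on x

cov_xy = np.mean((x - x.mean()) * (y - y.mean()))
print(cov_xy)    # approximately 0: uncorrelated despite the dependence
```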

Relationship to inner products

Many of the properties of covariance can be extracted elegantly by observing that it satisfies similar properties to those of an inner product:

1. bilinear: for constants a and b and random variables X, Y, Z, Cov(aX + bY, Z) = a Cov(X, Z) + b Cov(Y, Z);
2. symmetric: Cov(X, Y) = Cov(Y, X);
3. positive semi-definite: σ²(X) = Cov(X, X) ≥ 0 for all random variables X, and Cov(X, X) = 0 implies that X is a constant random variable (K).

In fact these properties imply that the covariance defines an inner product over the quotient vector space obtained by taking the subspace of random variables with finite second moment and identifying any two that differ by a constant. (This identification turns the positive semi-definiteness above into positive definiteness.) That quotient vector space is isomorphic to the subspace of random variables with finite second moment and mean zero; on that subspace, the covariance is exactly the L2 inner product of real-valued functions on the sample space.

As a result, for random variables with finite variance, the inequality

|\operatorname{Cov}(X, Y)| \le \sqrt{\sigma^2(X)\,\sigma^2(Y)}

holds via the Cauchy–Schwarz inequality.

Proof: If σ²(Y) = 0, then it holds trivially. Otherwise, let the random variable

Z = X - \frac{\operatorname{Cov}(X, Y)}{\sigma^2(Y)}\, Y

Then we have

0 \le \sigma^2(Z) = \sigma^2(X) - \frac{\operatorname{Cov}(X, Y)^2}{\sigma^2(Y)}

from which the inequality follows.
Calculating the sample covariance

Main article: Sample mean and sample covariance

The sample covariance of N observations of K variables is the K-by-K matrix Q = [q_jk] with the entries

q_{jk} = \frac{1}{N-1} \sum_{i=1}^{N} (X_{ij} - \bar{X}_j)(X_{ik} - \bar{X}_k)

which is an estimate of the covariance between variable j and variable k.

The sample mean and the sample covariance matrix are unbiased estimates of the mean and the covariance matrix of the random vector X, a row vector whose jth element (j = 1, ..., K) is one of the random variables. The reason the sample covariance matrix has N − 1 in the denominator rather than N is essentially that the population mean E(X) is not known and is replaced by the sample mean X̄. If the population mean E(X) is known, the analogous unbiased estimate is given by

q_{jk} = \frac{1}{N} \sum_{i=1}^{N} (X_{ij} - \operatorname{E}(X_j))(X_{ik} - \operatorname{E}(X_k))
Comments
The covariance is sometimes
called a measure of "linear
dependence" between the two
random variables. That does not
mean the same thing as in the
context of linear algebra (see linear
dependence). When the
covariance is normalized, one
obtains the correlation coefficient.
From it, one can obtain
the Pearson coefficient, which
gives the goodness of the fit for
the best possible linear function
describing the relation between the
variables. In this sense covariance
is a linear gauge of dependence.

Applications

In genetics and molecular biology
Covariance is an important
measure in biology. Certain
sequences of DNA are conserved
more than others among species,
and thus to study secondary and
tertiary structures of proteins, or of
RNA structures, sequences are
compared in closely related
species. If sequence changes are
found or no changes at all are
found in noncoding RNA (such as
microRNA), sequences are found
to be necessary for common
structural motifs, such as an RNA
loop.

In financial economics
Covariances play a key role
in financial economics, especially
in portfolio theory and in the capital
asset pricing model. Covariances
among various assets' returns are
used to determine, under certain
assumptions, the relative amounts
of different assets that investors
should (in a normative analysis) or
are predicted to (in a positive
analysis) choose to hold in a
context of diversification.

In meteorological data assimilation
The covariance matrix is important
in estimating the initial conditions
required for running weather
forecast models.
