Академический Документы
Профессиональный Документы
Культура Документы
docx
Factor Analysis
The methods we have employed so far attempt to repackage all of the variance in the p
variables into principal components. We may wish to restrict our analysis to variance that is common
among variables. That is, when repackaging the variables variance we may wish not to redistribute
variance that is unique to any one variable. This is Common Factor Analysis. A common factor is
an abstraction, a hypothetical dimension that affects at least two of the variables. We assumed that
there is also one unique factor for each variable, a factor that affects that variable but does not affect
any other variables. We assume that the p unique factors are uncorrelated with one another and with
the common factors. It is the variance due to these unique factors that we shall exclude from our FA.
The most common sort of FA is principal axis FA, also known as principal factor analysis.
This analysis proceeds very much like that for a PCA. We eliminate the variance due to unique
factors by replacing the 1s on the main diagonal of the correlation matrix with estimates of the
variables communalities. Recall that a variables communality, its SSL across components or factors,
is the amount of the variables variance that is accounted for by the components or factors. Since our
factors will be common factors, a variables communality will be the amount of its variance that is
common rather than unique. The R
2
between a variable and all other variables is most often used
initially to estimate a variables communality.
For the beer data, I used the following SAS statement:
PROC FACTOR PRIORS=SMC MIN=1 REORDER ROTATE=VARIMAX;
Specifying PRIORS=SMC replaced the 1s with R
2
s in the correlation matrix. Here are selected
parts of the output:
COST SIZE ALCOHOL REPUTAT COLOR AROMA TASTE
SMC .74 .91 .87 .50 .92 .86 .88
POST-ROTATION .74 .91 .86 .41 .91 .87 .89
COMMUNALITY
Note that the R
2
s are generally quite large, indicating considerable shared variance. That for
REPUTAT is less impressive, indicating that REPUTAT had considerable unique variance.
Here are the eigenvalues from the reduced correlation matrix (with R
2
s in the main diagonal).
Factor 1 2 3 4 5 6 7
Eigenvalue 3.13 2.47 .19 .03 .00 -.06 -.09
Cumulative .55 .99 1.02 1.03 1.03 1.02 1.00
The sum of these eigenvalues is 5.68. That means that 5.68/7 = 81% of the variance was
distributed among seven factors (recall that our PCA distributed all of the variance, that is, the sum of
its seven eigenvalues was seven).
Do not be concerned about the small negative eigenvalues. While some authors warn that
negative or near zero eigenvalues indicate multicollinearity (very high R
2
s), which may make the
results unstable [fluctuating wildly with only minor changes in the correlations], necessitating the
removal of redundant variables, I dont think this is often a problem in FA, where we are seeking to
collect somewhat redundant variables into factors. Of course, if you do something stupid like have
more variables than subjects or have a perfect linear combination among the variables, for example,
SAT total score, SAT verbal score, and SAT math score, you will have a singular matrix that cant be
inverted and the FA will crash. The SAS users guide says, with data that do not fit the common