
Factor Analysis

Introduction
Factor analysis is an interdependence technique whose primary purpose is to define the underlying structure among the variables in the analysis.
It examines the interrelationships among a large number of variables and then attempts to explain them in terms of their common underlying dimensions. These common underlying dimensions are referred to as factors.
Factor analysis is a summarization and data reduction technique that has no independent and dependent variables; it is an interdependence technique in which all variables are considered simultaneously.

Figure: Correlation matrix of variables after grouping using factor analysis. Shaded areas represent variables likely to be grouped together by factor analysis.

Factor Analysis Objective


Data summarization: derives underlying dimensions that, when interpreted and understood, describe the data in a much smaller number of concepts than the original individual variables.
Data reduction: extends the process of data summarization by deriving an empirical value (factor score or summated scale) for each dimension (factor) and then substituting this value for the original values.
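As a concrete sketch of data reduction, the example below assumes the third-party factor_analyzer package and a hypothetical six-item survey: two factors are extracted and factor scores are substituted for the original items.

```python
# Minimal data-reduction sketch (hypothetical data; assumes the
# third-party factor_analyzer package: pip install factor_analyzer).
import numpy as np
import pandas as pd
from factor_analyzer import FactorAnalyzer

rng = np.random.default_rng(0)
f = rng.normal(size=(200, 2))                       # two latent factors
items = f @ rng.normal(size=(2, 6)) + rng.normal(scale=0.5, size=(200, 6))
df = pd.DataFrame(items, columns=[f"item{i+1}" for i in range(6)])

fa = FactorAnalyzer(n_factors=2, rotation="varimax")
fa.fit(df)
scores = fa.transform(df)   # one factor score per respondent per factor
print(scores.shape)         # (200, 2): six items reduced to two columns
```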

Two Most Prominent Types of Factoring (Extraction)
There are two major methods of extracting the factors from a set of variables:
1) Principal components analysis (PCA): used in exploratory research, as when a researcher simply wants to reduce a large number of items (e.g., an 80-item survey) to a smaller number of underlying latent dimensions (e.g., 7 factors). This is the most common type of "factor analysis."
2) Principal factor analysis (PFA), also called Principal Axis Factoring (PAF) or Common Factor Analysis (CFA): used in confirmatory research, as when the researcher has a causal model. As such it is used in conjunction with causal modeling techniques such as path analysis, partial least squares modeling, and structural equation modeling.

Principal Component Analysis (PCA) vs. Principal Axis Factoring (PAF)

PCA analyzes a correlation matrix in which the diagonals contain 1's; PAF analyzes a correlation matrix in which the diagonals contain the communalities.
PCA accounts for the total variance of the variables: factors (components) reflect the common variance plus the unique variance. PFA accounts for the co-variation among variables: factors reflect the common variance of the variables, excluding unique variances.
PCA is a variance-focused technique; PFA is a correlation-focused technique.
PCA is used for exploratory purposes, when a researcher does not have a causal model but simply wants to reduce a large number of items to a smaller number; PFA is typically used in confirmatory research.
Adding variables to a PCA model will change the factor loadings; in PAF it is possible to add variables to the model without affecting the factor loadings.
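A short contrast of the two extractions, using scikit-learn on hypothetical data. Note that sklearn's FactorAnalysis fits the common factor model by maximum likelihood rather than principal-axis iteration, but the total-variance versus common-variance distinction is the same.

```python
# PCA vs. common factor analysis on the same standardized data.
# PCA works from the full correlation matrix (1's on the diagonal);
# FactorAnalysis estimates a separate unique (noise) variance per item.
import numpy as np
from sklearn.decomposition import PCA, FactorAnalysis
from sklearn.preprocessing import StandardScaler

rng = np.random.default_rng(1)
X = rng.normal(size=(300, 2)) @ rng.normal(size=(2, 8)) \
    + rng.normal(scale=0.7, size=(300, 8))
Z = StandardScaler().fit_transform(X)

pca = PCA(n_components=2).fit(Z)
cfa = FactorAnalysis(n_components=2).fit(Z)

print(pca.explained_variance_ratio_)   # share of *total* variance
print(cfa.noise_variance_)             # per-variable unique variance
```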

Assumptions

1) Multi-collinearity: assessed using the MSA (measure of sampling adequacy). The MSA is measured by the Kaiser-Meyer-Olkin (KMO) statistic. As a measure of sampling adequacy, the KMO predicts if data are likely to factor well based on correlation and partial correlation. KMO can be used to identify which variables to drop from the factor analysis because they lack multi-collinearity. There is a KMO statistic for each individual variable, and an overall KMO statistic computed across all of them. KMO varies from 0 to 1.0. Overall KMO should be .50 or higher to proceed with factor analysis. If it is not, remove the variable with the lowest individual KMO statistic value one at a time until overall KMO rises above .50 and each individual variable's KMO is above .50 (see the sketch after this list).
2) There must be a strong conceptual foundation to support the assumption that a structure does exist before the factor analysis is performed.
3) Multivariate Normality
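A minimal numpy sketch of the KMO computation, assuming the standard anti-image formulation (squared correlations compared against squared partial correlations) and hypothetical data:

```python
# KMO / MSA from scratch: per-variable and overall statistics.
import numpy as np

def kmo(X):
    R = np.corrcoef(X, rowvar=False)
    Rinv = np.linalg.inv(R)
    # Partial correlations (anti-image) from the inverse correlation matrix.
    P = -Rinv / np.sqrt(np.outer(np.diag(Rinv), np.diag(Rinv)))
    np.fill_diagonal(P, 0.0)
    R0 = R - np.eye(R.shape[0])      # zero the diagonal of R
    r2 = (R0 ** 2).sum(axis=0)       # squared correlations per variable
    p2 = (P ** 2).sum(axis=0)        # squared partials per variable
    kmo_per_item = r2 / (r2 + p2)
    kmo_overall = r2.sum() / (r2.sum() + p2.sum())
    return kmo_per_item, kmo_overall

rng = np.random.default_rng(2)
X = rng.normal(size=(250, 2)) @ rng.normal(size=(2, 6)) \
    + rng.normal(scale=0.6, size=(250, 6))
per_item, overall = kmo(X)
print(per_item.round(2), round(overall, 2))   # proceed if overall >= .50
```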

Factor loadings
The factor loadings are the correlation coefficients between the items (rows) and factors (columns).
Analogous to Pearson's r, the squared factor loading is the percent of variance in that indicator variable explained by the factor.
To get the percent of variance in all the variables accounted for by each factor, take the sum of the squared factor loadings for that factor (column) and divide by the number of variables.
Factor loadings should be .5 or higher to confirm that independent variables identified a priori are represented by a particular factor, on the rationale that the .5 level corresponds to about half of the variance in the indicator being explained by the factor.
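To make the column-wise arithmetic concrete, here is a minimal numpy sketch; the loadings matrix is hypothetical:

```python
# Percent of total variance explained by each factor = column sum of
# squared loadings divided by the number of variables.
import numpy as np

loadings = np.array([[0.80, 0.10],    # rows: items
                     [0.75, 0.05],    # columns: factors
                     [0.20, 0.70],
                     [0.15, 0.65]])
ss_loadings = (loadings ** 2).sum(axis=0)       # per-factor sum of squares
pct_variance = ss_loadings / loadings.shape[0]  # divide by no. of variables
print(pct_variance.round(3))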

Communality

Communality (h2) measures the percent of variance in a given variable explained by all the factors jointly and may be interpreted as the reliability of the indicator.
Technique: computed row-wise for each variable, h2 = loading1^2 + loading2^2 + loading3^2 + ... (the sum of that variable's squared loadings across all factors).
In an example focused on subjects' music preferences, the extracted factors explain over 95% of preferences for rap music but only 56% for country western music. In general, communalities show for which measured variables the factor analysis is working best and least well.
Low communality: when an indicator variable has a low communality, the factor model is not working well for that indicator and possibly it should be removed from the model. Low communalities across the set of variables indicate the variables are little related to each other.
However, communalities must be interpreted in relation to the interpretability of the factors. A communality of .75 seems high but is meaningless unless the factor on which the variable is loaded is interpretable, though it usually will be. A communality of .25 seems low but may be meaningful if the item is contributing to a well-defined factor. What is critical is not the communality coefficient per se, but rather the extent to which the item plays a role in the interpretation of the factor, though often this role is greater when communality is high.

Spurious solutions: if a communality exceeds 1.0, there is a spurious solution, which may reflect too small a sample or too many or too few factors.
In PCA the initial communality will be 1.0 for all variables, and all of the variance in the variables will be explained by all of the factors, which will be as many as there are variables.
The "extracted" communality is the percent of variance in a given variable explained by the factors which are extracted, which will usually be fewer than all the possible factors, resulting in coefficients less than 1.0.
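A matching sketch for the row-wise communality arithmetic, using the same kind of hypothetical loadings matrix as above:

```python
# Communality h^2 per variable = row sum of squared loadings.
import numpy as np

loadings = np.array([[0.80, 0.10],
                     [0.75, 0.05],
                     [0.20, 0.70],
                     [0.15, 0.65]])
h2 = (loadings ** 2).sum(axis=1)   # one communality per variable (row)
print(h2.round(3))                 # low values flag poorly fitted items
```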

Eigenvalues
The eigenvalue for a given factor measures the variance in all the variables which is accounted for by that factor.
The ratio of eigenvalues is the ratio of explanatory importance of the factors with respect to the variables. If a factor has a low eigenvalue, then it is contributing little to the explanation of variances in the variables and may be ignored as redundant with more important factors.
Technique: computed column-wise for each factor, eigenvalue = loading1^2 + loading2^2 + loading3^2 + ... (the sum of the squared loadings down that factor's column).
In the music preferences example, 18 components (factors) would be needed to explain 100% of the variance in the data. However, using the conventional criterion of stopping when the initial eigenvalue drops below 1.0, only 6 of the 18 factors were actually extracted in this analysis. These six account for 72% of the variance in the data.

Criteria for determining the number of factors


Scree plot: the scree test plots the components on the X axis and the corresponding eigenvalues on the Y axis. As one moves to the right, toward later components, the eigenvalues drop. When the drop ceases and the curve makes an elbow toward a less steep decline, the scree test says to drop all further components after the one starting the elbow.
Kaiser criterion: the Kaiser rule is to drop all components with eigenvalues under 1.0. It may overestimate or underestimate the true number of factors.
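The sketch below ties the two criteria together, assuming numpy and matplotlib and hypothetical data: eigenvalues are taken from the correlation matrix, the Kaiser rule counts those above 1.0, and the scree plot is simply eigenvalues plotted against component number.

```python
# Eigenvalues of the correlation matrix, the Kaiser rule, and a scree plot.
import numpy as np
import matplotlib.pyplot as plt

rng = np.random.default_rng(3)
X = rng.normal(size=(300, 3)) @ rng.normal(size=(3, 10)) \
    + rng.normal(scale=0.8, size=(300, 10))
R = np.corrcoef(X, rowvar=False)
eigvals = np.sort(np.linalg.eigvalsh(R))[::-1]   # descending eigenvalues

print("Kaiser rule keeps", (eigvals > 1.0).sum(), "factors")

plt.plot(range(1, len(eigvals) + 1), eigvals, "o-")
plt.axhline(1.0, linestyle="--")     # Kaiser cutoff for reference
plt.xlabel("Component number")
plt.ylabel("Eigenvalue")
plt.title("Scree plot: keep components before the elbow")
plt.show()
```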

Rotation methods
Rotation serves to make the output more understandable and is usually necessary to facilitate the interpretation of factors. The sum of the eigenvalues is not affected by rotation, but rotation will alter the eigenvalues (and percent of variance explained) of particular factors and will change the factor loadings.
No rotation: the original, unrotated principal components solution maximizes the sum of squared factor loadings, efficiently creating a set of factors which explain as much of the variance in the original variables as possible. The amount explained is reflected in the sum of the eigenvalues of all factors. However, unrotated solutions are hard to interpret because variables tend to load on multiple factors.
An important consideration in selecting the rotation technique is whether the researcher wants orthogonality: under orthogonality, the factors should be uncorrelated with each other.

Varimax rotation is an orthogonal rotation method that minimizes the number of variables that have high loadings on each factor. This method simplifies the interpretation of the factors and is the most common rotation option. The assumption is that the factors are uncorrelated.
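The varimax criterion can be sketched directly in numpy. The following is the standard SVD-based iteration (a sketch under the usual formulation, not any particular package's exact implementation), applied to a hypothetical loadings matrix:

```python
# SVD-based varimax iteration: rotates loadings so each variable's
# variance concentrates on one factor (simple structure).
import numpy as np

def varimax(L, gamma=1.0, max_iter=100, tol=1e-6):
    p, k = L.shape
    R = np.eye(k)
    d = 0.0
    for _ in range(max_iter):
        LR = L @ R
        # Gradient of the varimax criterion, diagonalized via SVD.
        u, s, vt = np.linalg.svd(
            L.T @ (LR ** 3 - (gamma / p) * LR @ np.diag((LR ** 2).sum(axis=0)))
        )
        R = u @ vt
        d_new = s.sum()
        if d_new < d * (1 + tol):   # stop when the criterion stalls
            break
        d = d_new
    return L @ R

L = np.array([[0.6, 0.6], [0.7, 0.5], [0.6, -0.5], [0.5, -0.6]])
print(varimax(L).round(2))   # rotated loadings approach simple structure
```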
Direct oblimin rotation, sometimes called just "oblique rotation," is the standard method when one wishes a non-orthogonal (oblique) solution, that is, one in which the factors are allowed to be correlated. This will result in higher eigenvalues but diminished interpretability of the factors. Example: salary, incentives, and ESOPs (compensation items whose factors would plausibly correlate).
Oblique rotations allow the factors to be correlated. In statistical output, a factor correlation matrix is generated when oblique rotation is requested. Normally, however, an orthogonal method such as varimax is selected and no factor correlation matrix is produced, as the correlation of any factor with another is zero in orthogonal solutions. In PCA, a component transformation matrix in SPSS output shows the correlation of the factors before and after rotation.

Factor scores
A factor score is a score for a given individual or observation on a given factor. Factor scores can be correlated even when an orthogonal factor extraction was performed.
Regression scores: the most common type of factor score is the regression score, based on ordinary least squares (OLS) estimates.
Bartlett scores: Bartlett scores may be preferred over regression scores on the argument that they better conform to the original factor structure. Bartlett scores may be correlated.
Anderson-Rubin scores: Anderson-Rubin factor scores are a modification of Bartlett scores that ensures orthogonality; therefore Anderson-Rubin scores are uncorrelated.
Computing factor scores allows one to look for factor outliers. Also, factor scores may be used as variables in subsequent modeling.
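A sketch of the regression (Thurstone) scoring method that underlies regression scores, using numpy and hypothetical data. The weight matrix W = R^-1 L is assumed, where R is the item correlation matrix and L the loadings; scores are Z W for standardized data Z.

```python
# Regression (Thurstone) factor scores: W = R^-1 L, scores = Z @ W.
import numpy as np

rng = np.random.default_rng(4)
X = rng.normal(size=(200, 2)) @ rng.normal(size=(2, 5)) \
    + rng.normal(scale=0.5, size=(200, 5))
Z = (X - X.mean(axis=0)) / X.std(axis=0)    # standardize the items
R = np.corrcoef(Z, rowvar=False)

# Loadings would normally come from the extraction step; here the first
# two eigenvectors scaled by sqrt(eigenvalue) serve as a stand-in.
w, V = np.linalg.eigh(R)
order = np.argsort(w)[::-1][:2]
L = V[:, order] * np.sqrt(w[order])

scores = Z @ np.linalg.solve(R, L)          # regression factor scores
# Uncorrelated here because PCA-style loadings were used; with
# common-factor loadings, regression scores can correlate even after
# an orthogonal rotation, as noted above.
print(np.corrcoef(scores, rowvar=False).round(2))
```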

Dropping Variables from the Analysis

A KMO statistic is generated for each predictor. Predictors whose KMO does not rise to some criterion level (e.g., .5 or higher) may be dropped from the analysis. Doing so is dropping predictors based on low partial correlation.
The more prevalent criterion for dropping predictors is low communality, based on the factor analysis itself. As the two dropping criteria (KMO and communality) may differ, the latter is generally preferred, though both may be considered. A low KMO indicates the variable in question may be too multi-collinear with others in the model. A low communality indicates that the variable is not well explained by the factor model. One strategy is to drop the indicator variables with the lowest individual KMO one at a time until overall KMO rises above .50.
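The one-at-a-time dropping strategy can be sketched by reusing the hand-rolled kmo() function from the Assumptions section (hypothetical data; in practice a statistics package would report these values):

```python
# Drop the lowest-KMO variable one at a time until overall KMO >= .50.
import numpy as np

def kmo(X):
    R = np.corrcoef(X, rowvar=False)
    Rinv = np.linalg.inv(R)
    P = -Rinv / np.sqrt(np.outer(np.diag(Rinv), np.diag(Rinv)))
    np.fill_diagonal(P, 0.0)
    R0 = R - np.eye(R.shape[0])
    r2, p2 = (R0 ** 2).sum(axis=0), (P ** 2).sum(axis=0)
    return r2 / (r2 + p2), r2.sum() / (r2.sum() + p2.sum())

rng = np.random.default_rng(5)
X = rng.normal(size=(150, 2)) @ rng.normal(size=(2, 7)) \
    + rng.normal(scale=1.5, size=(150, 7))
cols = list(range(X.shape[1]))
per_item, overall = kmo(X[:, cols])
while overall < 0.50 and len(cols) > 2:
    cols.pop(int(np.argmin(per_item)))      # drop the worst variable
    per_item, overall = kmo(X[:, cols])
print("kept columns:", cols, "overall KMO:", round(overall, 2))
```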

Predicting if data will factor well

An overall KMO of .5 or higher is considered adequate, and .8 or higher is considered good factorability.
The KMO increases as 1) the sample size increases, 2) the average correlation increases, 3) the number of variables increases, or 4) the number of factors decreases. The researcher should always have an overall KMO of .50 or higher before proceeding with the factor analysis.