Imb 2011 Seminar6 Pca Fa

Seminar 6: PCA and Factor Analysis
January 19 2011
PRINCIPAL COMPONENT ANALYSIS

The main idea of this method is to form, from a set of existing variables, a new variable (or new variables, but as few as possible) that contains as much of the variability of the original data as possible. This is a method of data reduction: we reduce the number of variables in order to handle the data more easily. In most cases we wish to get only one dimension (variable) that contains most of the variability of the original data. This variable then represents a kind of index of the property that is measured by the original variables. For example: we are measuring the development of a region. We measure the differences with several variables (e.g. GDP/pc, infant mortality, ...). With the help of principal component analysis we can construct an index of development. Similarly, a controller in a factory has several indicators of quality; with principal components analysis we can construct a quality index.
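As a minimal sketch of this idea, the following Python snippet builds a one-dimensional index from two indicators. The data and the names `gdp` and `life_expectancy` are invented for illustration and do not come from the seminar. The sketch relies on a known property of two positively correlated standardized variables: the first principal component weights both equally, and its variance equals 1 + r.

```python
from math import sqrt

def standardize(values):
    """Return z-scores using the sample standard deviation (n - 1)."""
    n = len(values)
    mean = sum(values) / n
    sd = sqrt(sum((v - mean) ** 2 for v in values) / (n - 1))
    return [(v - mean) / sd for v in values]

def sample_variance(values):
    """Sample variance with the n - 1 denominator."""
    n = len(values)
    mean = sum(values) / n
    return sum((v - mean) ** 2 for v in values) / (n - 1)

# Hypothetical indicators for six regions (illustrative numbers only).
gdp = [12.0, 18.5, 25.0, 31.0, 40.5, 22.0]
life_expectancy = [68.0, 71.5, 74.0, 77.0, 80.5, 72.0]

z1 = standardize(gdp)
z2 = standardize(life_expectancy)

# Correlation of the two indicators, computed from the z-scores.
r = sum(a * b for a, b in zip(z1, z2)) / (len(z1) - 1)

# For two standardized variables with r > 0 the first principal component
# weights both variables equally: index = (z1 + z2) / sqrt(2).
index = [(a + b) / sqrt(2) for a, b in zip(z1, z2)]

# The total variance of two standardized variables is 2.
share_explained = sample_variance(index) / 2

print(f"r = {r:.3f}, share of variance in the index = {share_explained:.3f}")
```

The stronger the correlation between the indicators, the larger the share of the total variability that a single index can carry.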
PRINCIPAL COMPONENT ANALYSIS WITH SPSS PROCEDURE FACTOR
SPSS can perform principal component analysis, but the procedure for doing so is hidden within the procedure for factor analysis. The procedure can perform the analysis with standardized and with original (non-standardized) data. With this procedure we can: compute descriptive statistics for all variables; produce the correlation matrix; compute communalities; compute the share of variance of the original data explained by each component and by all components together; plot the scree plot.
COMPUTATION OF THE PARAMETERS OF PRINCIPAL COMPONENTS ANALYSIS

1. Enter or load the data.
2. Select Analyze | Dimension Reduction | Factor; we get the menu Factor Analysis (Figure 1).
Marko Pahor
Figure 1: Dialog window Factor Analysis
3. In the left box we select the variables that we want to enter into the principal components analysis and transfer them into the right box.
4. Click Extraction...; we get the menu Factor Analysis: Extraction (Figure 2). The option for performing principal components analysis is Principal Components in the field Method. The other options in this field are for factor analysis.
5. We click OK; the window Factor Analysis closes and the results of the analysis appear in the Viewer window.
Figure 2: Dialog window Factor Analysis: Extraction
In the box Analyze we set whether the analysis will be performed on the original (non-standardized) data (Covariance matrix) or on standardized data (Correlation matrix). When the analysis is done on the original data, the importance of a variable is determined by the relative size of its variance: higher variance means higher importance of that variable. If we do not want the variability of a variable to determine its importance, we standardize the data and so use the correlation matrix. The decision which one to use depends on the nature of the problem. If we think the variables are more or less equally important, we decide for standardization; if the variability of the variables matters, we use the covariance matrix in the analysis. When the variables are measured in very different units (e.g. infant mortality in % against GDP/pc in $), standardization is usually the only sensible choice.
The field Display offers the possibility of printing the unrotated solution (the only one in principal component analysis). The solution can contain only some of the components; the number of components is set by the rules in the field Extract. The field Display also sets the display of the scree plot, which is useful in determining the number of components needed.
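The effect of the choice between the covariance and the correlation matrix can be illustrated with a few lines of Python. The two standard deviations below are hypothetical values, chosen only to mimic variables measured in very different units; the function name `variance_shares` is likewise just an illustrative helper.

```python
# Hypothetical standard deviations, chosen only to mimic very different units.
sd_infant_mortality = 2.0      # percentage points
sd_gdp_per_capita = 5000.0     # dollars

def variance_shares(std_devs):
    """Share of the total variance contributed by each variable."""
    variances = [sd ** 2 for sd in std_devs]
    total = sum(variances)
    return [v / total for v in variances]

# Covariance matrix: total variance is the sum of the raw variances.
raw_shares = variance_shares([sd_infant_mortality, sd_gdp_per_capita])

# Correlation matrix: every standardized variable has variance 1.
standardized_shares = variance_shares([1.0, 1.0])

print(raw_shares)           # GDP/pc carries almost all of the total variance
print(standardized_shares)  # both variables contribute equally
```

With the raw data the first component would essentially reproduce GDP/pc, which is why standardization is the sensible choice in such a case.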
In the field Extract we set how many components we want to be displayed. We can either set the number of components directly or set the cut-off eigenvalue. The default cut-off is 1 in the case of standardized data, or the average eigenvalue in the case of original data.
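The default cut-off rule can be sketched as follows. This is an illustration of the rule as described above, not SPSS code; the function name `components_to_keep` is an assumption, and the eigenvalues are those reported in the Total Variance Explained table of the example in this seminar.

```python
def components_to_keep(eigenvalues, standardized=True):
    """Count components whose eigenvalue exceeds the default cut-off."""
    if standardized:
        cutoff = 1.0                                  # correlation matrix
    else:
        cutoff = sum(eigenvalues) / len(eigenvalues)  # covariance matrix
    return sum(1 for value in eigenvalues if value > cutoff)

# Eigenvalues reported in the example output (correlation matrix, 4 variables).
eigenvalues = [2.796, 0.898, 0.180, 0.125]

print(components_to_keep(eigenvalues, standardized=True))
```

The two defaults are really the same rule: the eigenvalues of a correlation matrix sum to the number of variables, so their average is 1.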
DESCRIPTIVE STATISTICS AND CORRELATION MATRICES
Click Descriptives..., which opens the dialog window Factor Analysis: Descriptives (Figure 3). In this dialog we set:
in the field Statistics, the display of descriptive statistics and of the initial solution (all components);
in the field Correlation Matrix, the display of the correlation matrix, significances, ...
Figure 3: Dialog window Factor Analysis: Descriptives
KMO, the Kaiser-Meyer-Olkin measure of sampling adequacy, shows the strength of the connection between the variables; it lies between 0 and 1, and values closer to 1 are more desirable. Bartlett's test of sphericity tests the hypothesis that the correlation matrix is an identity matrix (the variables are not correlated). If this hypothesis cannot be rejected, principal component analysis cannot be meaningfully performed.
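Bartlett's statistic can be checked by hand with the usual formula, chi-square = -(n - 1 - (2p + 5)/6) ln|R| with p(p - 1)/2 degrees of freedom. The sketch below applies it to the correlation matrix and N = 504 reported in the example of this seminar; the helper names are illustrative. Because the published correlations are rounded to three decimals, the result should come out close to, but not exactly equal to, the chi-square that SPSS prints.

```python
from math import log

def determinant(matrix):
    """Determinant by Gaussian elimination with partial pivoting."""
    a = [row[:] for row in matrix]
    n = len(a)
    det = 1.0
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(a[r][col]))
        if a[pivot][col] == 0.0:
            return 0.0
        if pivot != col:
            a[col], a[pivot] = a[pivot], a[col]
            det = -det
        det *= a[col][col]
        for row in range(col + 1, n):
            factor = a[row][col] / a[col][col]
            for k in range(col, n):
                a[row][k] -= factor * a[col][k]
    return det

def bartlett_sphericity(corr, n_obs):
    """Return (chi-square, degrees of freedom) for Bartlett's test."""
    p = len(corr)
    chi_square = -(n_obs - 1 - (2 * p + 5) / 6) * log(determinant(corr))
    df = p * (p - 1) // 2
    return chi_square, df

# Correlation matrix as reported in the example output
# (total_liters, value_sum, transactions, share_olive_oil).
R = [
    [1.000, 0.824, 0.842, 0.249],
    [0.824, 1.000, 0.867, 0.299],
    [0.842, 0.867, 1.000, 0.210],
    [0.249, 0.299, 0.210, 1.000],
]

chi_square, df = bartlett_sphericity(R, 504)
print(round(chi_square, 1), df)
```

For an identity matrix the determinant is 1, its logarithm is 0 and the statistic vanishes, which is exactly the null hypothesis of the test.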
EXAMPLE
FACTOR
  /VARIABLES total_liters value_sum transactions share_olive_oil
  /MISSING LISTWISE
  /ANALYSIS total_liters value_sum transactions share_olive_oil
  /PRINT UNIVARIATE INITIAL CORRELATION KMO EXTRACTION
  /PLOT EIGEN
  /CRITERIA MINEIGEN(1) ITERATE(25)
  /EXTRACTION PC
  /ROTATION NOROTATE
  /METHOD=CORRELATION.

Factor Analysis

Descriptive Statistics
                      Mean   Std. Deviation   Analysis N
total_liters        1.5709          1.49828          504
value_sum          10.1272          9.69014          504
transactions          1.90            1.597          504
share_olive_oil     8.6048         11.73409          504
Correlation Matrix
Correlation        total_liters   value_sum   transactions   share_olive_oil
total_liters              1.000        .824           .842              .249
value_sum                  .824       1.000           .867              .299
transactions               .842        .867          1.000              .210
share_olive_oil            .249        .299           .210             1.000
KMO and Bartlett's Test
Kaiser-Meyer-Olkin Measure of Sampling Adequacy         .767
Bartlett's Test of Sphericity   Approx. Chi-Square  1436.940
                                df                         6
                                Sig.                    .000
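The overall KMO measure compares the squared correlations with the squared partial correlations: KMO = sum(r²) / (sum(r²) + sum(q²)), where both sums run over the off-diagonal elements and the partial correlations q are obtained from the inverse of the correlation matrix. A minimal sketch, using the rounded correlation matrix from the output above (the helper names `invert` and `kmo` are illustrative, and the result should land close to the reported value rather than match it digit for digit):

```python
from math import sqrt

def invert(matrix):
    """Invert a square matrix by Gauss-Jordan elimination."""
    n = len(matrix)
    # Augment each row with the matching row of the identity matrix.
    a = [row[:] + [1.0 if i == j else 0.0 for j in range(n)]
         for i, row in enumerate(matrix)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(a[r][col]))
        a[col], a[pivot] = a[pivot], a[col]
        pivot_value = a[col][col]
        a[col] = [x / pivot_value for x in a[col]]
        for row in range(n):
            if row != col:
                factor = a[row][col]
                a[row] = [x - factor * y for x, y in zip(a[row], a[col])]
    return [row[n:] for row in a]

def kmo(corr):
    """Overall Kaiser-Meyer-Olkin measure of sampling adequacy."""
    p = len(corr)
    inv = invert(corr)
    sum_r2 = 0.0
    sum_q2 = 0.0
    for i in range(p):
        for j in range(p):
            if i != j:
                # Partial correlation of i and j given all other variables.
                partial = -inv[i][j] / sqrt(inv[i][i] * inv[j][j])
                sum_r2 += corr[i][j] ** 2
                sum_q2 += partial ** 2
    return sum_r2 / (sum_r2 + sum_q2)

R = [
    [1.000, 0.824, 0.842, 0.249],
    [0.824, 1.000, 0.867, 0.299],
    [0.842, 0.867, 1.000, 0.210],
    [0.249, 0.299, 0.210, 1.000],
]

print(round(kmo(R), 3))
```

When the partial correlations are small relative to the ordinary correlations, the variables share common variance and KMO approaches 1.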
Communalities
                   Initial   Extraction
total_liters         1.000         .863
value_sum            1.000         .894
transactions         1.000         .881
share_olive_oil      1.000         .157
Extraction Method: Principal Component Analysis.
Total Variance Explained
            Initial Eigenvalues                      Extraction Sums of Squared Loadings
Component   Total   % of Variance   Cumulative %     Total   % of Variance   Cumulative %
1           2.796          69.898         69.898     2.796          69.898         69.898
2            .898          22.461         92.359
3            .180           4.511         96.870
4            .125           3.130        100.000
Extraction Method: Principal Component Analysis.
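The percentage columns of this table follow directly from the eigenvalues: with standardized data the total variance equals the number of variables, so each percentage is the eigenvalue divided by 4. The sketch below recomputes them from the rounded eigenvalues in the table (the function name `explained_variance` is illustrative), so the results agree with the SPSS output only up to rounding.

```python
def explained_variance(eigenvalues, total_variance):
    """Return (% of variance, cumulative %) for each component."""
    percents = [100.0 * value / total_variance for value in eigenvalues]
    cumulative = []
    running = 0.0
    for pct in percents:
        running += pct
        cumulative.append(running)
    return percents, cumulative

# Eigenvalues from the table above; 4 standardized variables were analysed.
eigenvalues = [2.796, 0.898, 0.180, 0.125]
n_variables = len(eigenvalues)

percents, cumulative = explained_variance(eigenvalues, n_variables)
for number, (pct, cum) in enumerate(zip(percents, cumulative), start=1):
    print(number, round(pct, 2), round(cum, 2))
```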
Component Matrix(a)
                   Component 1
total_liters              .929
value_sum                 .946
transactions              .939
share_olive_oil           .396
Extraction Method: Principal Component Analysis.
a. 1 components extracted.
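The loadings in the component matrix are the elements of the first eigenvector of the correlation matrix multiplied by the square root of the first eigenvalue, and the extraction communalities are the squared loadings. The following sketch recovers them from the rounded correlation matrix of the example with a simple power iteration. This is an illustrative method and not the algorithm SPSS uses internally; the function name `first_component` is an assumption.

```python
from math import sqrt

def first_component(matrix, iterations=200):
    """Largest eigenvalue and unit eigenvector by power iteration."""
    n = len(matrix)
    vector = [1.0 / sqrt(n)] * n
    eigenvalue = 0.0
    for _ in range(iterations):
        product = [sum(matrix[i][j] * vector[j] for j in range(n))
                   for i in range(n)]
        eigenvalue = sqrt(sum(x * x for x in product))
        vector = [x / eigenvalue for x in product]
    return eigenvalue, vector

# Correlation matrix as reported in the example output
# (total_liters, value_sum, transactions, share_olive_oil).
R = [
    [1.000, 0.824, 0.842, 0.249],
    [0.824, 1.000, 0.867, 0.299],
    [0.842, 0.867, 1.000, 0.210],
    [0.249, 0.299, 0.210, 1.000],
]

eigenvalue, eigenvector = first_component(R)

# Loadings: eigenvector elements scaled by the square root of the eigenvalue.
loadings = [v * sqrt(eigenvalue) for v in eigenvector]

# With one extracted component the communality is the squared loading.
communalities = [a * a for a in loadings]

print([round(a, 3) for a in loadings])
print([round(h, 3) for h in communalities])
```

The sum of the communalities equals the eigenvalue of the extracted component, which is why the Extraction Sums of Squared Loadings repeat the first eigenvalue. The low communality of share_olive_oil shows that the single component says little about that variable.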
FACTOR ANALYSIS
With principal component analysis we tried to explain as much of the variance of the original data as possible by forming new, synthetic variables. In factor analysis we try to find dimensions or traits that cannot be measured directly but that affect certain variables that can be measured. An example is measuring intelligence: we cannot measure intelligence itself, but we can measure certain capabilities of an individual (mathematical, logical, ...) that are affected by intelligence.
FACTOR ANALYSIS WITH SPSS
DIFFERENCES FROM PRINCIPAL COMPONENTS ANALYSIS
Although the logic of the two is different, both principal components and factor analysis are supported by the same SPSS procedure. In factor analysis the following methods of extraction are used:
1. Principal factors: this method differs from principal components only in logic and explanation. The initial solution is always based on this method. The method creates factors that are mutually uncorrelated linear combinations of the initial variables.
2. Principal axes: the method creates factors from the modified correlation matrix, which has diagonal values less than 1. This is an iterative method; in the first step the diagonal values are the communalities of the initial (principal factors) solution. In the following steps, the communalities from the previous step are used until the solution converges.
3. Alpha factoring: the method assumes that we deal with a sample and tests for significance.
4. Image factoring: this is actually the first step of the principal axes method; a modified correlation matrix with multiple determination coefficients on the diagonal is used.
5. Ordinary least squares (listed as Unweighted least squares in the SPSS dialog): minimizes the differences between the actual and the estimated correlation matrix, not taking account of the diagonal values.
6. Generalized least squares: minimizes the differences between the actual and the estimated correlation matrix, not taking account of the diagonal values; the variables are weighted by the inverse of their uniqueness.
Most commonly used is the method of principal axes. Principal factors is less appropriate, because it does not take account of the existence of specific factors that influence the variables, whose existence is shown by communalities less than 1. It is only used when the other methods do not converge.
Rotation is used in order to improve the solution and get a clearer picture. There are orthogonal and oblique (non-orthogonal) rotations. Rotations in SPSS:
1. Varimax: orthogonal rotation that minimizes the number of variables that have high loadings on each factor; it simplifies the interpretation of the factors.
2. Quartimax: orthogonal rotation that minimizes the number of factors needed to explain each variable; it simplifies the interpretation of the observed variables.
3. Equamax: orthogonal rotation, a combination of varimax and quartimax.
4. Oblimin: oblique rotation; non-orthogonal rotations are used when orthogonal rotations do not give an interpretable solution. Delta determines the obliqueness, 0 meaning the most oblique rotation.
5. Promax: oblique rotation.
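What an orthogonal rotation does, and does not, change can be shown on a small two-factor example. The loadings below are hypothetical, and the criterion computed is the raw varimax criterion (the sum over factors of the variance of the squared loadings), without the Kaiser normalization that SPSS applies. This is a sketch of the principle, not of the SPSS algorithm.

```python
from math import cos, sin, radians

def rotate(loadings, angle_degrees):
    """Orthogonally rotate a two-factor loading matrix by the given angle."""
    t = radians(angle_degrees)
    return [[a * cos(t) + b * sin(t), -a * sin(t) + b * cos(t)]
            for a, b in loadings]

def communalities(loadings):
    """Sum of squared loadings of each variable."""
    return [sum(x * x for x in row) for row in loadings]

def varimax_criterion(loadings):
    """Sum over factors of the variance of the squared loadings."""
    n = len(loadings)
    total = 0.0
    for k in range(len(loadings[0])):
        squared = [row[k] ** 2 for row in loadings]
        mean = sum(squared) / n
        total += sum((s - mean) ** 2 for s in squared) / n
    return total

# Hypothetical loadings with a clean two-cluster structure.
simple = [[0.8, 0.0], [0.7, 0.0], [0.0, 0.6], [0.0, 0.9]]

# The same solution turned by 45 degrees: every variable loads on both factors.
unrotated = rotate(simple, 45)

print(communalities(simple))
print(communalities(unrotated))
print(varimax_criterion(unrotated), varimax_criterion(simple))
```

The communalities are identical in both solutions, so the rotation explains exactly the same variance; only the criterion, and with it the interpretability of the factors, differs.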
Difference between pattern and structure loadings
Structure loadings are correlation coefficients between a variable and a factor.
Pattern loadings are regression coefficients between a variable and a factor.
The products of the pattern loadings of two variables, summed over the factors, give the correlation between these two variables as reproduced by the model (for uncorrelated factors).
Structure loadings are the ones commonly interpreted.
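The relation between the two kinds of loadings can be written as structure = pattern x factor correlation matrix. A small sketch with hypothetical pattern loadings and an assumed factor correlation of 0.3 (the function names are illustrative):

```python
def structure_loadings(pattern, factor_corr):
    """Structure matrix = pattern matrix times factor correlation matrix."""
    m = len(factor_corr)
    return [[sum(row[k] * factor_corr[k][j] for k in range(m))
             for j in range(m)]
            for row in pattern]

def reproduced_correlation(pattern, factor_corr, i, j):
    """Model-implied correlation between two different variables i and j."""
    structure = structure_loadings(pattern, factor_corr)
    return sum(s * p for s, p in zip(structure[i], pattern[j]))

# Hypothetical pattern loadings for three variables on two factors.
pattern = [[0.7, 0.1], [0.6, 0.2], [0.1, 0.8]]

orthogonal = [[1.0, 0.0], [0.0, 1.0]]   # uncorrelated factors
oblique = [[1.0, 0.3], [0.3, 1.0]]      # assumed factor correlation of 0.3

print(structure_loadings(pattern, orthogonal))
print(structure_loadings(pattern, oblique))
print(reproduced_correlation(pattern, orthogonal, 0, 1))
```

With uncorrelated factors the pattern and structure loadings coincide, which is why the distinction only matters after an oblique rotation.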
EXAMPLE Factor Analysis

This example is done on the personality questions in the database. We do the factor analysis following the same steps as with principal component analysis.
FACTOR
  /VARIABLES Q17.1 Q17.2 Q17.3 Q17.4 Q17.5 Q17.6 Q17.7 Q17.8 Q17.9 Q17.10
    Q17.11 Q17.12 Q17.13 Q17.14 Q17.15 Q17.16 Q17.17 Q17.18 Q17.19 Q17.20
  /MISSING LISTWISE
  /ANALYSIS Q17.1 Q17.2 Q17.3 Q17.4 Q17.5 Q17.6 Q17.7 Q17.8 Q17.9 Q17.10
    Q17.11 Q17.12 Q17.13 Q17.14 Q17.15 Q17.16 Q17.17 Q17.18 Q17.19 Q17.20
  /PRINT UNIVARIATE INITIAL CORRELATION KMO EXTRACTION ROTATION
  /PLOT EIGEN
  /CRITERIA MINEIGEN(1) ITERATE(25)
  /EXTRACTION PAF
  /CRITERIA ITERATE(25)
  /ROTATION VARIMAX
  /METHOD=CORRELATION.
Correlation matrix
ADEQUACY OF DATA
From the correlation matrix we can see that most correlations are not high, but some are, and many more are statistically significant.
Bartlett's test is significant, so the hypothesis that the variables are uncorrelated is rejected, and the KMO measure of 0.738 shows that the data are appropriate for this type of analysis.
STANDARDIZED OR ORIGINAL DATA?

As all questions are measured on the same scale, one could use the covariance matrix (non-standardized data) for the analysis. However, the use of standardized data is still correct. Because the output is simpler and because it is much more common in practice, the correlation matrix is used in the example.
NUMBER OF FACTORS
Based on the scree plot one would use four factors, although the Kaiser rule (eigenvalues greater than 1) suggests five factors.
INTERPRETATION OF FACTORS
Factors are interpreted based on structure loadings. We can interpret the non-rotated solution or use one of the rotations. In the example, we used varimax rotation. We have four factors that can be interpreted as follows:
1. optimism and self-esteem
2. sociability
3. desperation and indecisiveness
4. artism
When an orthogonal rotation does not give a sensible interpretation, we use an oblique rotation.
In our case there are not many differences between the orthogonal and the oblique rotation. The factor correlation matrix shows the obliqueness: the higher the correlations, the more oblique the rotation.