
Exploratory Factor Analysis

1. Glossary
Anti-image correlation matrix: Matrix of partial correlations among variables after factor analysis, representing the degree to which the factors explain each other in the results. The diagonal contains the measures of sampling adequacy for each variable, and the off-diagonal values are partial correlations among variables.
Bartlett test of sphericity: Statistical test for the overall significance of all correlations within a correlation matrix.
Cluster analysis: Multivariate technique with the objective of grouping respondents or cases with similar profiles on a defined set of characteristics. Similar to Q factor analysis.
Common factor analysis: Factor model in which the factors are based on a reduced correlation matrix. That is, communalities are inserted in the diagonal of the correlation matrix, and the extracted factors are based only on the common variance, with specific and error variance excluded.
Common variance: Variance shared with other variables in factor analysis.
Communality: Total amount of variance an original variable shares with all other variables included in the analysis.
Component analysis: Factor model in which the factors are based on the total variance. With component analysis, unities are used in the diagonal of the correlation matrix; this procedure computationally implies that all the variance is common or shared.
Composite measure: See summated scales.
Conceptual definition: Specification of the theoretical basis for a concept that is represented by a factor.
Content validity: Assessment of the degree of correspondence between the items selected to constitute a summated scale and its conceptual definition.
Correlation matrix: Table showing the intercorrelations among all variables.
Cronbach's alpha: Measure of reliability that ranges from 0 to 1, with values of 0.60 to 0.70 deemed the lower limit of acceptability.
Cross-loading: A variable has two or more factor loadings exceeding the threshold value deemed necessary for inclusion in the factor interpretation process.
Dummy variable: Binary metric variable used to represent a single category of a nonmetric variable.
Eigenvalue: Column sum of squared loadings for a factor; also referred to as the latent root. It represents the amount of variance accounted for by a factor.
EQUIMAX: One of the orthogonal factor rotation methods; a compromise between the VARIMAX and QUARTIMAX approaches, but not widely used.
Error variance: Variance of a variable due to errors in data collection or measurement.
Face validity: See content validity.
Factor: Linear combination (variate) of the original variables. Factors also represent the underlying dimensions (constructs) that summarize or account for the original set of observed variables.
Factor indeterminacy: Characteristic of common factor analysis such that several different factor scores can be calculated for a respondent, each fitting the estimated factor model. It means the factor scores are not unique for each individual.
Factor loading: Correlation between the original variables and the factors, and the key to understanding the nature of a particular factor. Squared factor loadings indicate what percentage of the variance in an original variable is explained by a factor.
Factor matrix: Table displaying the factor loadings of all variables on each factor.
Factor pattern matrix: One of the two factor matrices found in an oblique rotation; the one most comparable to the factor matrix in an orthogonal rotation.
Factor rotation: Process of manipulating or adjusting the factor axes to achieve a simpler and pragmatically more meaningful factor solution.
Factor score: Composite measure created for each observation on each factor extracted in the factor analysis. The factor weights are used in conjunction with the original variable values to calculate each observation's score. The factor score then can be used to represent the factor(s) in subsequent analyses. Factor scores are standardized to have a mean of 0 and a standard deviation of 1.
Factor structure matrix: A factor matrix found in an oblique rotation that represents the simple correlations between variables and factors, incorporating the unique variance and the correlations between factors. Most researchers prefer to use the factor pattern matrix when interpreting an oblique solution.
Indicator: Single variable used in conjunction with one or more other variables to form a composite measure.
Latent root: See eigenvalue.
Measure of sampling adequacy (MSA): Measure calculated both for the entire correlation matrix and for each individual variable, evaluating the appropriateness of applying factor analysis. Values above 0.50 for either the entire matrix or an individual variable indicate appropriateness.
Measurement error: Inaccuracies in measuring the true variable values due to the fallibility of the measurement instrument (i.e., inappropriate response scales), data entry errors, or respondent errors.
Multicollinearity: Extent to which a variable can be explained by the other variables in the analysis.
Oblique factor rotation: Factor rotation computed so that the extracted factors are correlated. Rather than arbitrarily constraining the factor rotation to an orthogonal solution, the oblique rotation identifies the extent to which each of the factors is correlated.
Orthogonal: Mathematical independence (no correlation) of factor axes to each other (i.e., at right angles, or 90 degrees).
Orthogonal factor rotation: Factor rotation in which the factors are extracted so that their axes are maintained at 90 degrees. Each factor is independent of, or orthogonal to, all other factors. The correlation between the factors is determined to be 0.
Q factor analysis: Forms groups of respondents or cases based on their similarity on a set of characteristics.
QUARTIMAX: A type of orthogonal factor rotation method focusing on simplifying the rows of a factor matrix. Generally considered less effective than the VARIMAX rotation.
R factor analysis: Analyzes relationships among variables to identify groups of variables forming latent dimensions (factors).
Reliability: Extent to which a variable or set of variables is consistent in what it is intended to measure. If multiple measurements are taken, reliable measures will all be consistent in their values. It differs from validity in that it relates not to what should be measured, but instead to how it is measured.
Reverse scoring: Process of reversing the scores of a variable, while retaining the distributional characteristics, to change the relationships (correlations) between two variables. Used in summated scale construction to avoid a canceling out between variables with positive and negative factor loadings on the same factor.
Specific variance: Variance of each variable unique to that variable and not explained or associated with other variables in the factor analysis.
Summated scales: Method of combining several variables that measure the same concept into a single variable in an attempt to increase the reliability of the measurement. In most instances, the separate variables are summed and then their total or average score is used in the analysis.
Surrogate variable: Selection of a single variable with the highest factor loading to represent a factor in the data reduction stage instead of using a summated scale or factor score.
Trace: Represents the total amount of variance on which the factor solution is based. The trace is equal to the number of variables, based on the assumption that the variance in each variable is equal to 1.
Unique variance: See specific variance.
Validity: Extent to which a measure or set of measures correctly represents the concept of study, i.e., the degree to which it is free from any systematic or nonrandom error. Validity is concerned with how well the concept is defined by the measure(s), whereas reliability relates to the consistency of the measure(s).
Variate: Linear combination of variables formed by deriving empirical weights applied to a set of variables specified by the researcher.
VARIMAX: The most popular orthogonal factor rotation method, focusing on simplifying the columns in a factor matrix. Generally considered superior to other orthogonal factor rotation methods in achieving a simplified factor structure.

2. What is Factor Analysis?


1. Factor analysis is an interdependence technique whose primary purpose is to define the underlying structure among the variables in the analysis.
2. Variables are the building blocks of relationships.
3. As multivariate techniques are employed, the number of variables increases to tens, hundreds, or even thousands.
4. When the number of variables is low, all variables can be unique, distinct, and different.
5. But as the number of variables rises, we see that the overlap (correlation) between variables increases significantly.
6. In some instances, the researcher may even want ways to group the variables together into new composite measures that represent each of these groups.
7. Factor analysis provides the tools for analyzing the structure of the interrelationships (correlations) among a large number of variables by defining sets of variables that are highly correlated, known as factors.
8. These groups are (by definition) highly correlated and are assumed to represent dimensions within the data.
9. Two options:

a. If we are trying to reduce the number of variables, then the dimensions can guide in creating new composite measures.
b. However, if we have a conceptual basis for understanding the relationships between variables, then the dimensions may actually have meaning for what they collectively represent. These dimensions may correspond to concepts that cannot be adequately described by a single measure (a person's health is defined by various indicators that are measured separately but are all interrelated).
10. Factor analysis presents several ways of representing these groups of variables for use in other multivariate techniques.
11. Two approaches:
a. Exploratory approach: useful in searching for structure among a set of variables or as a data reduction method. In this approach, factor analysis takes into consideration no a priori constraints on the estimation of components or the number of components to be extracted.
b. Confirmatory approach: here the researcher has preconceived notions about the actual structure of the data, based on theoretical support or prior research. For example, the researcher may wish to test hypotheses involving issues such as which variables should be grouped together on a factor or the precise number of factors. In this approach, the researcher requires that factor analysis take a confirmatory approach, that is, assess the degree to which the data meet the expected structure.

3. A Hypothetical Example of Factor Analysis


1. A retail firm intends to identify what determines consumers' patronage of its various stores and services.
2. The retailer wants to understand how consumers make decisions but feels that it cannot evaluate 80 separate characteristics or develop action plans for this many variables, because they are too specific.
3. Instead, it would like to know if consumers think in more general evaluative dimensions rather than in just the specific items.
4. For example, consumers may consider salespersons to be a more general evaluative dimension that is composed of many more specific characteristics, such as knowledge, courtesy, likeability, sensitivity, friendliness, helpfulness, and so on.
5. To identify these broader dimensions, the retailer could commission a survey asking for consumer evaluations on each of the 80 specific items.
6. Factor analysis would then be used to identify the broader underlying evaluative dimensions.
7. Specific items that correlate highly are assumed to be members of the same broader dimension.
8. These dimensions become composites of specific variables, which in turn allow the dimensions to be interpreted and described.
9. In our example, the factor analysis might identify such dimensions as product assortment, product quality, prices, store personnel, service, and store atmosphere as the broader evaluative dimensions used by the respondents.
10. Each of these dimensions contains specific items that are a facet of the broader evaluative dimension.
11. From these findings, the retailer may then use the dimensions (factors) to define broad areas for planning and action.

12. An illustrative example of a simple application of factor analysis is shown in figure 1, which represents the correlation matrix for nine store image elements.


Figure 1: Original Correlation Matrix

13. Included in this set are measures of the product offering, store personnel, price levels, and in-store service and experiences.
14. Question: are all of these elements separate in their evaluative properties, or do they group into some more general areas of evaluation?
15. For example, do all of the product elements group together?
16. Where does price level fit, or is it separate?
17. How do the in-store features like store personnel, service, and atmosphere relate to each other?
18. Visual inspection of the original correlation matrix (figure 1) does not necessarily reveal any specific pattern.
19. Among scattered high correlations, variable groupings are not apparent.
20. The application of factor analysis results in the groupings of the variables as reflected in figure 2.

Figure 2: Correlation Matrix of Variables after Grouping According to Factor Analysis

21. Here some interesting patterns emerge:
a. First, four variables all relating to the in-store experience of shoppers are grouped together.
b. Second, three variables describing the product assortment and availability are grouped together.
c. Lastly, product quality and price levels are grouped.
22. Each group represents a set of highly interrelated variables that may reflect a more general evaluative dimension.
23. In this case, we might label the three variable groupings as in-store experience, product offering, and value.
24. This simple example of factor analysis demonstrates its basic objective of grouping highly intercorrelated variables into distinct sets (factors).
25. In many situations, these factors can provide a wealth of information about the interrelationships of the variables.
26. In this example, factor analysis identified for store management a smaller set of concepts to consider in any strategic or tactical marketing plans, while still providing insight into what constitutes each general area (i.e., the individual variables defining each factor).
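The block structure that figure 2 illustrates can be reproduced with a short simulation. The following numpy sketch is purely illustrative (the group sizes 4, 3, and 2 mirror the store-image example; all names and coefficients are hypothetical): items are generated already ordered by group, so the high within-group correlations appear as blocks along the diagonal of the printed correlation matrix, which is exactly what reordering achieves in figure 2.

```python
import numpy as np

rng = np.random.default_rng(0)
n = 500

# Three latent dimensions, e.g. in-store experience, product offering, value
latent = rng.normal(size=(n, 3))

# Each observed item = its group's latent dimension + item-specific noise
groups = [0, 0, 0, 0, 1, 1, 1, 2, 2]
X = np.column_stack([latent[:, g] + 0.6 * rng.normal(size=n) for g in groups])

R = np.corrcoef(X, rowvar=False)
print(np.round(R, 2))  # high correlations cluster within the three blocks
```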

4. Factor Analysis Decision Process


1. There are seven stages in factor analysis:
a. Objectives of factor analysis
b. Designing a factor analysis
c. Assumptions in factor analysis
d. Deriving factors and assessing overall fit
e. Interpreting the factors
f. Validation of factor analysis
g. Additional uses of factor analysis results
2. Each stage merits a detailed discussion.

4.1. Stage 1: Objectives of Factor Analysis


1. The starting point in factor analysis is the research problem.
2. The general purpose of factor analysis is to find a way to condense (or summarize) the information contained in a number of original variables into a smaller set of new, composite dimensions or variates (factors) with a minimum loss of information.
3. There are four key issues here:
a. Specifying the unit of analysis
b. Achieving data summarization and/or data reduction
c. Variable selection
d. Using factor analysis results with other multivariate techniques
4.1.1. Specifying the Unit of Analysis
1. Up to this time, we defined factor analysis solely in terms of identifying structure among a set of variables.
2. Factor analysis is actually a more general model in that it can identify the structure of relationships among either variables or respondents by examining either the correlations between the variables or the correlations between the respondents.

3. If the objective of the research were to summarize the characteristics, factor analysis would be applied to a correlation matrix of the variables (a.k.a. R factor analysis). This technique analyzes a set of variables to identify the dimensions that are latent (not easily observed).
4. Factor analysis also may be applied to a correlation matrix of the individual respondents based on their characteristics (a.k.a. Q factor analysis). This method combines a large number of people into distinctly different groups within a large population. It is seldom used because of computational difficulties.
4.1.2. Achieving Data Summarization versus Data Reduction
1. Factor analysis provides the researcher with two distinct, but interrelated, outcomes:
a. Data summarization: FA derives underlying dimensions that describe the data in a much smaller number of concepts than the original individual variables.
b. Data reduction: the concept of data summarization is extended by deriving an empirical value (factor score) for each dimension (factor) and then substituting this value for the original values.
2. Data summarization:
a. The fundamental concept involved in data summarization is the definition of structure.
b. The structure will be used to view the set of variables at various levels of generalization, ranging from the most detailed level (the individual variables themselves) to the more generalized level, where individual variables are grouped and then viewed not for what they represent individually, but for what they represent collectively in expressing a concept.
c. For example, variables at the individual level might be: "I shop for specials," "I usually look for the lowest prices," "I shop for bargains," "National brands are worth more than store brands."
d. Collectively, these variables can be used to classify people as either price conscious or bargain hunters.
e. Remember that FA is an interdependence technique in which all variables are simultaneously considered, with no distinction as to dependent or independent variables.
f. FA still employs the concept of the variate, the linear composite of variables, but in FA the variates (factors) are formed to maximize their explanation of the entire variable set, not to predict a dependent variable(s).
g. The goal of data summarization is achieved by defining a small number of factors that adequately represent the original set of variables.
3. Data reduction:
a. FA can also be used to achieve data reduction by
i. Identifying representative variables from a much larger set of variables for use in subsequent multivariate analyses
ii. Creating an entirely new set of variables, much smaller in number, to partially or completely replace the original set of variables
b. In both instances the purpose is to reduce the number of variables without losing the nature and character of the individual variables.
c. The identification of the underlying dimensions or factors is an end result of data summarization.
d. Estimates of the factors and the contributions of each variable to the factors (termed loadings) are all that is required for the analysis.

e. Data reduction relies on the factor loadings as well, but uses them as the basis for either identifying variables for subsequent analysis with other techniques or making estimates of the factors themselves (factor scores or summated scales), which then replace the original variables in subsequent analyses.
f. The method of calculating and interpreting factor loadings is discussed later.
4.1.3. Variable Selection
1. In both uses of factor analysis, the researcher implicitly specifies the potential dimensions that can be identified through the character and nature of the variables submitted to factor analysis. For example, in assessing the dimensions of store image, if no question on store personnel were included, factor analysis would not be able to identify this dimension.
2. The researcher also must remember that FA will always produce factors. Thus, factor analysis is always susceptible to "garbage in, garbage out." If the researcher indiscriminately includes a large number of variables hoping that FA will "figure it out," then the possibility of poor results is high.
3. The quality and meaning of the derived factors reflect the conceptual underpinnings of the variables included in the analysis.
4. Even when used solely for data reduction, FA is most efficient when conceptually defined dimensions can be represented by the derived factors.
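To make the data-reduction outputs concrete, here is a minimal numpy sketch (the data are hypothetical placeholders) of the summated-scale route: standardize the items that load on one factor, average them into a single composite, and check reliability with Cronbach's alpha, for which the glossary above gives 0.60 to 0.70 as the lower limit of acceptability.

```python
import numpy as np

def cronbach_alpha(items):
    """Cronbach's alpha for an n x k matrix of items on one scale."""
    k = items.shape[1]
    item_vars = items.var(axis=0, ddof=1)
    total_var = items.sum(axis=1).var(ddof=1)
    return (k / (k - 1)) * (1 - item_vars.sum() / total_var)

# Hypothetical responses: four items driven by one underlying factor
rng = np.random.default_rng(1)
f = rng.normal(size=(300, 1))
items = f + 0.7 * rng.normal(size=(300, 4))

z = (items - items.mean(axis=0)) / items.std(axis=0, ddof=1)
summated_scale = z.mean(axis=1)          # one composite replaces four items
print(round(cronbach_alpha(items), 2))   # should exceed the 0.60-0.70 floor
```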

4.2. Stage 2: Designing a Factor Analysis


1. Three basic decisions:
a. Calculation of the input data (a correlation matrix) to meet the specified objectives of grouping variables or respondents
b. Design of the study in terms of the number of variables, measurement properties of variables, and the types of allowable variables
c. The sample size necessary, both in absolute terms and as a function of the number of variables in the analysis
4.2.1. Correlations among Variables or Respondents
1. Remember that there are two forms of FA: R-type and Q-type.
2. Both types utilize a correlation matrix as the basic data input.
3. With R-type factor analysis, the researcher would use a traditional correlation matrix (correlations among variables) as input.
4. But the researcher could also elect to derive the correlation matrix from the correlations between individual respondents.
5. In this Q-type FA, the results would be a factor matrix that would identify similar individuals.
4.2.2. Variable Selection and Measurement Issues
1. Two specific questions must be answered at this point:
a. What type of variables can be used in factor analysis?
b. How many variables should be included?
2. In terms of the types of variables included, the primary requirement is that a correlation value can be calculated among all variables.
3. Metric variables are easily measured by several types of correlations.
4. Nonmetric variables are more problematic because they cannot use the same types of correlation measures used by metric variables.

5. Although some specialized methods calculate correlations among nonmetric variables, the most prudent approach is to avoid nonmetric variables.
6. If a nonmetric variable must be included, one approach is to define dummy variables to represent categories of the nonmetric variables.
7. If all variables are dummy variables, then specialized forms of FA, such as Boolean factor analysis, are more appropriate.
8. The researcher should also attempt to minimize the number of variables included but still maintain a reasonable number of variables per factor.
9. If a study is being designed to assess a proposed structure, the researcher should be sure to include several variables (five or more) that may represent each proposed factor.
10. The strength of FA lies in finding patterns among groups of variables, and it is of little use in identifying factors composed of only a single variable.
11. Finally, when designing a study to be factor analyzed, the researcher should, if possible, identify several key variables that closely reflect the hypothesized underlying factors.
12. This identification will aid in validating the derived factors and assessing whether the results have practical significance.
4.2.3. Sample Size
1. Regarding the sample size question, the researcher generally would not factor analyze a sample of fewer than 50 observations; preferably the sample size should be 100 or larger.
2. As a general rule, the minimum is to have at least five times as many observations as the number of variables to be analyzed, and a more acceptable sample size would have a 10:1 ratio.
3. Some researchers even propose a minimum of 20 cases for each variable.
4. One must remember, however, that 30 variables, for example, require computing 435 correlations in factor analysis.
5. At a 0.05 significance level, perhaps even 20 of those correlations would be deemed significant and appear in the factor analysis just by chance.
6. The researcher should always try to obtain the highest cases-per-variable ratio to minimize the chances of overfitting the data, i.e., deriving factors that are sample-specific with little generalizability.
7. In order to do so, the researcher may employ the most parsimonious set of variables, guided by conceptual and practical considerations, and then obtain an adequate sample size for the number of variables examined.
8. When dealing with smaller sample sizes and/or a lower cases-to-variable ratio, the researcher should always interpret any findings cautiously.
9. The issue of sample size will also be addressed in a later section on interpreting factor loadings.
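A quick check of the arithmetic behind the caution above: with p variables there are p(p-1)/2 distinct correlations, and at the 0.05 level roughly 5 percent of them would appear significant by chance alone.

```python
p = 30                       # number of variables
n_corr = p * (p - 1) // 2    # distinct off-diagonal correlations
print(n_corr)                # 435
print(0.05 * n_corr)         # ~22 correlations "significant" by chance alone
```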

4.3. Stage 3: Assumptions in Factor Analysis


1. The critical assumptions underlying factor analysis are more conceptual than statistical.
2. The researcher is always concerned with meeting the statistical requirements for any multivariate technique, but in FA the overriding concerns center as much on the character and composition of the variables included in the analysis as on their statistical qualities.
4.3.1. Conceptual Issues
1. The conceptual assumptions underlying factor analysis relate to the set of variables selected and the sample chosen.

2. A basic assumption of FA is that some underlying structure does exist in the set of selected variables.
3. The presence of correlated variables and the subsequent definition of factors do not guarantee relevance, even if they meet the statistical requirements.
4. It is the responsibility of the researcher to ensure that the observed patterns are conceptually valid and appropriate to study with factor analysis, because the technique has no means of determining appropriateness other than the correlations among variables.
5. For example, mixing dependent and independent variables in a single factor analysis and then using the derived factors to support dependence relationships is inappropriate.
6. The researcher must also ensure that the sample is homogeneous with respect to the underlying factor structure.
7. It is inappropriate to apply FA to a sample of males and females for a set of items known to differ because of gender.
8. When the two subsamples (male and female) are combined, the resulting correlations and factor structure will be a poor representation of the unique structure of each group.
9. Thus, whenever differing groups are expected in the sample, separate factor analyses should be performed, and the results should be compared to identify differences not reflected in the results of the combined sample.
4.3.2. Statistical Issues
1. Some degree of multicollinearity is desirable, because the objective is to identify interrelated sets of variables.
2. Assuming the researcher has met the conceptual requirements for the variables included in the analysis, the next step is to ensure that the variables are sufficiently intercorrelated to produce representative factors.
3. As we will see, we can assess this degree of interrelatedness from both overall and individual variable perspectives.
4. The following are several empirical measures to aid in diagnosing the factorability of the correlation matrix.
4.3.2.1. Overall Measures of Intercorrelation
1. The researcher must ensure that the data matrix has sufficient correlations to justify the application of factor analysis.
2. If it is found that all of the correlations are low, or that all of the correlations are equal (denoting that no structure exists to group the variables), then the researcher should question the application of factor analysis.
3. To this end, several techniques are available:
a. Partial correlation
b. Bartlett's test of sphericity
c. Measure of sampling adequacy (MSA)
4.3.2.1.1. Partial Correlation
1. If visual inspection reveals no substantial number of correlations greater than 0.30, then FA is probably inappropriate.
2. The correlations among variables can also be analyzed by computing the partial correlations among variables.

3. A partial correlation is the correlation that is unexplained when the effects of other variables are taken into account.
4. If true factors exist in the data, the partial correlations should be small, because the variable can be explained by the variables loading on the factors.
5. If the partial correlations are high, indicating no underlying factors, then FA is not appropriate.
6. The researcher is looking for a pattern of high partial correlations, denoting a variable not correlated with a large number of other variables in the analysis.
7. The one exception occurs when two variables are highly correlated and have substantially higher loadings than other variables on that factor. Then, their partial correlation may be high because they are not explained to any great extent by the other variables, but do explain each other. This exception is also to be expected when a factor has only two variables loading highly.
8. A rule of thumb would be to consider partial correlations above 0.7 to be high.
9. SPSS provides the anti-image correlation matrix, which is just the negative value of the partial correlation, whereas BMDP directly provides the partial correlations.
10. In each case, high partial or anti-image correlations are indicative of a data matrix perhaps not suited to FA.
4.3.2.1.2. Bartlett Test of Sphericity
1. Another method analyzes the entire correlation matrix.
2. The Bartlett test is one such measure.
3. It provides the statistical significance that the correlation matrix has significant correlations among at least some of the variables.
4. A statistically significant Bartlett's test has a significance of less than 0.05 and indicates that sufficient correlations exist among the variables to proceed.
5. The researcher should note that increasing the sample size causes the Bartlett test to become more sensitive in detecting correlations among the variables.
4.3.2.1.3. Measure of Sampling Adequacy (MSA)
1. This index ranges from 0 to 1.
2. It reaches 1 when each variable is perfectly predicted without error by the other variables.
3. The measure can be interpreted with the following guidelines:
a. 0.8 or above: meritorious
b. 0.7 or above: middling
c. 0.6 or above: mediocre
d. 0.5 or above: miserable
e. Below 0.5: unacceptable
4. MSA increases as:
a. Sample size increases
b. Average correlations increase
c. The number of variables increases
d. The number of factors decreases
5. If the MSA value falls below 0.5, then the variable-specific MSA values can identify variables for deletion to achieve an overall value of 0.5.
4.3.2.2. Variable-Specific Measures of Intercorrelation
1. We can extend the MSA to evaluate individual variables.

2. Variables with MSA values less than 0.5 should be omitted from the factor analysis one at a time, with the variable having the smallest MSA omitted each time.
3. After each deletion, recalculate the MSA values.
4. Continue deleting until all remaining variables (and the overall index) reach an MSA of 0.5 or above.
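The overall measures just described translate directly into code. Below is a minimal numpy/scipy sketch of the textbook formulas (not the SPSS implementation): Bartlett's test of sphericity computed from a p x p correlation matrix R estimated on n observations, and the overall and per-variable MSA (KMO) index. The partial correlations are obtained from the inverse of R, the same information SPSS reports in the anti-image correlation matrix.

```python
import numpy as np
from scipy.stats import chi2

def bartlett_sphericity(R, n):
    """Bartlett's test that R (p x p correlation matrix, n observations)
    is an identity matrix, i.e. that the variables are uncorrelated."""
    p = R.shape[0]
    statistic = -(n - 1 - (2 * p + 5) / 6) * np.log(np.linalg.det(R))
    df = p * (p - 1) / 2
    return statistic, chi2.sf(statistic, df)   # significance < 0.05 -> proceed

def msa(R):
    """Overall and per-variable measure of sampling adequacy (KMO index)."""
    inv = np.linalg.inv(R)
    d = np.sqrt(np.diag(inv))
    partial = -inv / np.outer(d, d)            # partial (anti-image) correlations
    np.fill_diagonal(partial, 0.0)
    r2 = R ** 2
    np.fill_diagonal(r2, 0.0)
    p2 = partial ** 2
    overall = r2.sum() / (r2.sum() + p2.sum())                      # want >= 0.50
    per_variable = r2.sum(axis=0) / (r2.sum(axis=0) + p2.sum(axis=0))
    return overall, per_variable
```

Per the guidelines above, an overall value of 0.8 or more is meritorious, and any variable whose per-variable value falls below 0.5 is a candidate for deletion.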

4.4. Stage 4: Deriving Factors and Assessing Overall Fit


1. Once the variables are specified and the correlation matrix is prepared, the researcher is ready to apply factor analysis to identify the underlying structure of relationships.
2. In doing so, decisions must be made concerning:
a. The method of extracting the factors (common factor analysis vs. component analysis)
b. The number of factors selected to represent the underlying structure in the data
4.4.1. Selecting the Factor Extraction Method
1. The researcher can choose from two similar, yet unique, methods for defining (extracting) the factors to represent the structure of the variables in the analysis.
2. This decision on the method to use must combine the objectives of the factor analysis with knowledge about some basic characteristics of the relationships between variables.
3. Before discussing the two methods, a brief introduction to partitioning a variable's variance is presented.
4.4.1.1. Partitioning the Variance of a Variable
1. Knowledge of how to partition or divide the variance of a variable is necessary before proceeding to the selection of the factor extraction technique.
2. Variance is a value (the square of the standard deviation) that represents the total amount of dispersion of values for a single variable about its mean.
3. When a variable is correlated with another variable, we often say it shares variance with the other variable, and the amount of sharing between the two is simply the squared correlation.
4. For example, if two variables have a correlation of 0.5, each variable shares 25 percent (0.5² = 0.25) of its variance with the other variable.
5. In factor analysis, we group variables by their correlations, such that variables in a group (factor) have high correlations with each other.
6. Thus, for the purposes of FA, it is important to understand how much of a variable's variance is shared with the other variables in that factor versus what cannot be shared (i.e., is unexplained).
7. The total variance of any variable can be divided (partitioned) into three types of variance:
a. Common: variance that is shared with all other variables in the analysis. This variance is accounted for (shared) based on a variable's correlations with all other variables in the analysis. A variable's communality is the estimate of its shared, or common, variance among the variables as represented by the derived factors.
b. Specific: variance associated with only a specific variable. This variance cannot be explained by the correlations to the other variables but is still associated uniquely with a single variable.
c. Error: variance that cannot be explained by correlations with other variables but is due to unreliability in the data-gathering process, measurement error, or a random component in the measured phenomenon.
8. Thus, the total variance of any variable is composed of its common, specific (unique), and error variances.


9. As a variable is more highly correlated with one or more variables, the common variance (communality) increases.
10. However, if unreliable measures or other sources of extraneous error variance are introduced, then the amount of possible common variance and the ability to relate the variable to any other variable are reduced.
4.4.1.2. Common Factor Analysis versus Component Analysis
1. With a basic understanding of how variance can be partitioned, the researcher is ready to address the differences between the two methods.
2. The selection of one method over the other is based on two criteria:
a. The objective of the factor analysis
b. The amount of prior knowledge about the variance in the variables
3. Component analysis is used when the objective is to summarize most of the original information (variance) in a minimum number of factors for prediction purposes.
4. In contrast, common factor analysis is used primarily to identify underlying factors or dimensions that reflect what the variables share in common.
5. The most direct comparison between the two methods is by their use of explained versus unexplained variance.
6. Component analysis:
a. Also known as principal component analysis.
b. It considers the total variance and derives factors that contain small proportions of unique variance and, in some instances, error variance.
c. However, the first few factors do not contain enough unique or error variance to distort the overall factor structure.
d. Specifically, with component analysis, unities are inserted in the diagonal of the correlation matrix, so that the full variance is brought into the factor matrix.
7. Common factor analysis:
a. It considers only common or shared variance, assuming that both the unique and error variance are not of interest in defining the structure of the variables.
b. To employ only common variance in the estimation of the factors, communalities (instead of unities) are inserted in the diagonal.
c. Thus, factors resulting from common factor analysis are based only on the common variance.
d. Common factor analysis excludes a portion of the variance included in a component analysis.
8. How is the researcher to choose between the two methods?
a. First, common factor analysis and component analysis are both widely used.
b. As a practical matter, the components model is the typical default method of most statistical programs when performing factor analysis.
c. Beyond the program defaults, distinct instances indicate which of the two methods is most appropriate.
9. Component factor analysis is most appropriate when:
a. Data reduction is a primary concern, focusing on the minimum number of factors needed to account for the maximum portion of the total variance represented in the original set of variables.

b. Prior knowledge suggests that specific and error variance represent a relatively small proportion of the total variance.
10. Common factor analysis is most appropriate when:
a. The primary objective is to identify the latent dimensions or constructs represented in the original variables.
b. The researcher has little knowledge about the amount of specific and error variance and therefore wishes to eliminate this variance.
11. Common factor analysis, with its more restrictive assumptions and use of only the latent dimensions (shared variance), is often viewed as more theoretically based.
12. Although theoretically sound, however, common factor analysis has several problems:
a. First, common factor analysis suffers from factor indeterminacy, which means that for any individual respondent, several different factor scores can be calculated from a single factor model result; thus, no single unique solution is found.
b. The second issue involves the calculation of the estimated communalities used to represent the shared variance. Sometimes the communalities are not estimable or may be invalid (e.g., values greater than 1 or less than 0), requiring the deletion of the variable from the analysis.
13. The complications of common factor analysis have contributed to the widespread use of component analysis.
14. Proponents of the common factor model argue otherwise.
15. Although considerable debate remains over which factor model is the more appropriate, empirical research demonstrates similar results in many instances.
16. In most applications, both methods arrive at essentially identical results if the number of variables exceeds 30 or the communalities exceed 0.60 for most variables.
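A minimal sketch of component extraction under the definitions above: with unities in the diagonal of the correlation matrix R, the un-rotated loadings come from the eigenvalue decomposition, and each variable's communality is the row sum of its squared loadings. (A common factor analysis would instead place communality estimates in the diagonal and iterate; that step is omitted here.)

```python
import numpy as np

def principal_components(R, n_factors):
    """Un-rotated component loadings from a correlation matrix R with
    unities in the diagonal (i.e., the total variance is analyzed)."""
    eigvals, eigvecs = np.linalg.eigh(R)
    order = np.argsort(eigvals)[::-1]                  # largest latent roots first
    eigvals, eigvecs = eigvals[order], eigvecs[:, order]
    loadings = eigvecs[:, :n_factors] * np.sqrt(eigvals[:n_factors])
    communalities = (loadings ** 2).sum(axis=1)        # shared variance per variable
    return loadings, communalities
```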

4.4.2. Criteria for the Number of Factors to Extract
1. How do we decide on the number of factors to extract?
2. Both methods are interested in the best linear combination of variables, best in the sense that the particular combination of original variables accounts for more of the variance in the data as a whole than any other linear combination of variables.
3. Therefore, the first factor may be viewed as the single best summary of the linear relationships exhibited in the data.
4. The second factor is defined as the second-best linear combination of the variables, subject to the constraint that it is orthogonal to the first factor.
5. To be orthogonal to the first factor, the second factor must be derived from the variance remaining after the first factor has been extracted.
6. Thus, the second factor may be defined as the linear combination of variables that accounts for the most variance that is still unexplained after the effect of the first factor has been removed from the data.
7. The process continues extracting factors accounting for smaller and smaller amounts of variance until all the variance is explained.
8. For example, the components method actually extracts n factors, where n is the number of variables in the analysis.
9. Thus, if 30 variables are in the analysis, 30 factors are extracted.
10. So, what is gained by factor analysis?

11. Although our example contains 30 factors, the first few can represent a substantial portion of the total variance across all the variables.
12. Hopefully, the researcher can retain or use only a small number of the factors and still adequately represent the entire set of variables.
13. Thus, the key question is: how many factors to extract or retain?
14. In deciding when to stop factoring, the researcher must combine a conceptual foundation (how many factors should be in the structure?) with some empirical evidence (how many factors can be reasonably supported?).
15. The researcher generally begins with some predetermined criteria, such as the general number of factors plus some general thresholds of practical relevance (e.g., a required percentage of variance explained).
16. These criteria are combined with empirical measures of the factor structure.
17. An exact quantitative basis for deciding the number of factors to extract has not been developed.
18. However, the following stopping criteria for the number of factors to extract are currently being utilized.
4.4.2.1. Latent Root Criterion
1. The most commonly used technique.
2. Simple to apply to either component analysis or common factor analysis.
3. Rationale: any individual factor should account for the variance of at least a single variable if it is to be retained for interpretation.
4. With component analysis, each variable contributes a value of 1 to the total eigenvalue.
5. Thus, only the factors having latent roots or eigenvalues greater than 1 are considered significant.
6. All factors with latent roots less than 1 are considered insignificant and are disregarded.
7. Using the eigenvalue for establishing a cutoff is most reliable when the number of variables is between 20 and 50.
8. If the number of variables is less than 20, the tendency is for this method to extract a conservative number of factors (too few).
9. If the number of variables is greater than 50, it is not uncommon for too many factors to be extracted.
4.4.2.2. A Priori Criterion
1. The a priori criterion is a simple yet reasonable criterion under certain circumstances.
2. When applying it, the researcher already knows how many factors to extract before undertaking the factor analysis.
3. The researcher simply instructs the computer to stop the analysis when the desired number of factors has been extracted.
4. This approach is useful when testing a theory or hypothesis about the number of factors to be extracted.
5. It can also be justified in attempting to replicate another researcher's work and extract the same number of factors that was previously found.
4.4.2.3. Percentage of Variance Criterion
1. The percentage of variance criterion is an approach based on achieving a specified cumulative percentage of total variance extracted by successive factors.

2. The purpose is to ensure practical significance for the derived factors by ensuring that they explain at least a specified amount of variance.
3. No absolute threshold has been adopted for all applications.
4. However, in the natural sciences the factoring procedure usually should not be stopped until the extracted factors account for at least 95 percent of the variance, or until the last factor accounts for only a small portion (less than 5 percent).
5. In contrast, in the social sciences, where information is often less precise, it is not uncommon to consider a solution that accounts for 60 percent of the total variance (and in some instances even less) as satisfactory.
6. A variant of this criterion involves selecting enough factors to achieve a prespecified communality for each of the variables.
7. If theoretical or practical reasons require a certain communality for each variable, then the researcher will include as many factors as necessary to adequately represent each of the original variables.
8. This approach differs from focusing on just the total amount of variance explained, which neglects the degree of explanation for the individual variables.
4.4.2.4. Scree Test Criterion
1. Recall that with the component analysis factor model, the later factors extracted contain both common and unique variance.
2. Although all factors contain at least some unique variance, the proportion of unique variance is substantially higher in later factors.
3. The scree test is used to identify the optimum number of factors that can be extracted before the amount of unique variance begins to dominate the common variance structure.
4. The scree test is derived by plotting the latent roots against the number of factors in their order of extraction, and the shape of the resulting curve is used to evaluate the cutoff point.


Figure 3: The Scree Plot

5. The figure above plots the first 18 factors extracted in a study.
6. Starting with the first factor, the plot slopes steeply downward initially and then slowly becomes an approximately horizontal line.
7. The point at which the curve first begins to straighten is considered to indicate the maximum number of factors to extract.
8. In the present curve, the first 10 factors would qualify.
9. Beyond 10, too large a proportion of unique variance would be included; thus these factors would not be acceptable.
10. Note that in using the latent root criterion, only 8 factors would have been considered.
4.4.2.5. Heterogeneity of the Respondents
1. Shared variance among variables is the basis for both the common and component factor models.
2. An underlying assumption is that shared variance extends across the entire sample.
3. If the sample is heterogeneous with regard to at least one subset of the variables, then the first factors will represent those variables that are more homogeneous across the entire sample.
4. Variables that are better discriminators between the subgroups of the sample will load on later factors, many times those not selected by the criteria discussed previously.
5. When the objective is to identify factors that discriminate among the subgroups of a sample, the researcher should extract additional factors beyond those indicated by the methods just discussed and examine the additional factors' ability to discriminate among the groups.
6. If they prove less beneficial in discrimination, the solution can be run again and these later factors eliminated.
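The latent root and percentage-of-variance criteria are easy to compute, and the scree test simply plots the same eigenvalues. A sketch, assuming a data matrix X with observations in rows (the random data here are placeholders for real survey responses):

```python
import numpy as np

rng = np.random.default_rng(0)
X = rng.normal(size=(200, 9))                   # placeholder data matrix (n x p)
R = np.corrcoef(X, rowvar=False)

eigvals = np.sort(np.linalg.eigvalsh(R))[::-1]  # latent roots, largest first

# Latent root criterion: retain factors with eigenvalue > 1
n_latent_root = int((eigvals > 1.0).sum())

# Percentage-of-variance criterion: the trace of R equals the number of
# variables, so each eigenvalue's share of the trace is its share of variance
cum_pct = 100 * np.cumsum(eigvals) / eigvals.sum()
n_60_pct = int(np.searchsorted(cum_pct, 60.0)) + 1   # smallest k reaching 60%

# Scree test: plot eigvals against factor number and look for the elbow, e.g.
# import matplotlib.pyplot as plt
# plt.plot(range(1, len(eigvals) + 1), eigvals, "o-"); plt.show()
print(n_latent_root, n_60_pct)
```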

4.5. Stage 5: Interpreting the Factors


1. Although no unequivocal processes or guidelines determine the interpretation of factors, the researcher with a strong conceptual foundation for the anticipated structure and its rationale has the greatest chance of success.
2. We cannot state strongly enough the importance of a strong conceptual foundation, whether it comes from prior research, theoretical paradigms, or commonly accepted principles.
3. As we will see, the researcher must repeatedly make subjective judgments in such decisions as the number of factors, which relationships are sufficient to warrant grouping variables, and how these groupings can be identified.
4. As the experienced researcher can attest, almost anything can be uncovered if one tries long and hard enough (e.g., using different factor models, extracting different numbers of factors, or using various forms of rotation).
5. It is therefore left up to the researcher to be the final arbiter as to the form and appropriateness of a factor solution, and such decisions are best guided by conceptual rather than empirical bases.
6. To assist in the process of interpreting a factor structure and selecting a final factor solution, three fundamental processes are described.
7. Within each process, several substantive issues (factor rotation, factor-loading significance, and factor interpretation) are encountered.
8. Thus, after each process is briefly described, each of these issues will be discussed in more detail.
4.5.1. The Three Processes of Factor Interpretation
1. Factor interpretation is circular in nature.
2. The researcher first evaluates the initial results, then makes a number of judgments in viewing and refining these results, with the distinct possibility that the analysis is re-specified, requiring a return to the evaluative step.
3. Thus, the researcher should not be surprised to engage in several iterations until a final solution is achieved.
4. Estimate the factor matrix:
a. First, the initial un-rotated factor matrix is calculated, containing the factor loadings for each variable on each factor.
b. Factor loadings are the correlation of each variable and the factor.
c. Loadings indicate the degree of correspondence between the variable and the factor, with higher loadings making the variable more representative of the factor.
d. Factor loadings are the means of interpreting the role each variable plays in defining each factor.
5. Factor rotation:
a. Un-rotated factors achieve the objective of data reduction, but the researcher must ask whether the un-rotated factor solution (which fulfills desirable mathematical requirements) will provide information that offers the most adequate interpretation of the variables under examination.
b. In most instances the answer to this question is no, because factor rotation should simplify the factor structure.
c. Therefore, the researcher next employs a rotational method to achieve simpler and theoretically more meaningful factor solutions.

d. In most cases rotation of the factors improves the interpretation by reducing some of the ambiguities that often accompany initial un-rotated factor solutions.
6. Factor interpretation and re-specification:
a. As a final process, the researcher evaluates the (rotated) factor loadings for each variable in order to determine that variable's role and contribution in determining the factor structure.
b. In the course of this evaluation process, the need may arise to re-specify the factor model owing to:
i. The deletion of a variable(s) from the analysis
ii. The desire to employ a different rotational method for interpretation
iii. The need to extract a different number of factors
iv. The desire to change from one extraction method to another
c. Re-specification of a factor model involves returning to the extraction stage (stage 4), extracting factors, and then beginning the process of interpretation once again.
4.5.1.2. Rotation of Factors
1. Perhaps the most important tool in interpreting factors is factor rotation.
2. The term rotation means exactly what it implies.
3. Specifically, the reference axes of the factors are turned about the origin until some other position has been reached.
4. As indicated earlier, un-rotated factor solutions extract factors in the order of their variance extracted.
5. The first factor tends to be a general factor with almost every variable loading significantly, and it accounts for the largest amount of variance.
6. The second and subsequent factors are then based on the residual amount of variance.
7. Each accounts for successively smaller portions of variance.
8. The ultimate effect of rotating the factor matrix is to redistribute the variance from earlier factors to later ones to achieve a simpler, theoretically more meaningful factor pattern.
9. The simplest case of rotation is an orthogonal factor rotation, in which the axes are maintained at 90 degrees.
10. It is also possible to rotate the axes without retaining the 90-degree angle between the reference axes.
11. When not constrained to being orthogonal, the rotational procedure is called an oblique factor rotation.
12. Orthogonal and oblique factor rotations are demonstrated in figures 4 and 5, respectively.
13. Figure 4, in which five variables are depicted in a two-dimensional factor diagram, illustrates factor rotation.
14. The vertical axis represents un-rotated factor II, and the horizontal axis represents un-rotated factor I.
15. The axes are labeled with 0 at the origin and extend outward to +1.0 or -1.0.
16. The numbers on the axes represent the factor loadings.
17. The five variables are labeled V1, V2, V3, V4, and V5.
18. The factor loading for variable 2 (V2) on un-rotated factor II is determined by drawing a dashed line horizontally from the data point to the vertical axis for factor II.


Figure 4: Orthogonal Factor Rotation

19. Similarly, a vertical line is drawn from variable 2 to the horizontal axis of un-rotated factor I to determine the loading of variable 2 on factor I.
20. A similar procedure followed for the remaining variables determines the factor loadings for the un-rotated and rotated solutions, as displayed in table 1 for comparison purposes.
21. On the un-rotated first factor, all the variables load fairly high.
22. On the un-rotated second factor, variables 1 and 2 are very high in the positive direction.
23. Variable 5 is moderately high in the negative direction, and variables 3 and 4 have considerably lower loadings in the negative direction.
24. From visual inspection of figure 4, two clusters of variables are obvious.
25. Variables 1 and 2 go together, as do variables 3, 4, and 5.
26. However, such patterning of variables is not so obvious from the un-rotated factor loadings.
27. By rotating the original axes clockwise, as indicated in figure 4, we obtain a completely different factor-loading pattern.
28. Note that in rotating the factors, the axes are maintained at 90 degrees.
29. This procedure signifies that the factors are mathematically independent and that the rotation has been orthogonal.
30. After rotating the factor axes, variables 3, 4, and 5 load high on factor I, and variables 1 and 2 load high on factor II.

31. Thus, the clustering or patterning of these variables into two groups is more obvious after the rotation than before, even though the relative position or configuration of the variables remains unchanged.
32. The same general principles of orthogonal rotations pertain to oblique rotations.
33. The oblique rotational method is more flexible, however, because the factor axes need not be orthogonal.
34. It is also more realistic because the theoretically important underlying dimensions are not assumed to be uncorrelated with each other.
35. In figure 5 the two rotational methods are compared.
36. Note that the oblique factor rotation represents the clustering of variables more accurately.
37. This accuracy is a result of the fact that each rotated factor axis is now closer to the respective group of variables.
38. Also, the oblique solution provides information about the extent to which the factors are actually correlated with each other.
39. Most researchers agree that most un-rotated solutions are not sufficient.
40. That is, in most cases rotation will improve the interpretation by reducing some of the ambiguities that often accompany the preliminary analysis.
41. The major option available is to choose an orthogonal or oblique rotation method.
42. The ultimate goal of any rotation is to obtain some theoretically meaningful factors and, if possible, the simplest factor structure.
43. Orthogonal rotational approaches are more widely used because all computer packages with factor analysis contain orthogonal rotation options, whereas the oblique methods are not as widespread.
44. Orthogonal rotations are also utilized more frequently because the analytical procedures for performing oblique rotations are not as well developed and are still subject to some controversy.
45. Several different approaches are available for performing either orthogonal or oblique rotations.
46. However, only a limited number of oblique rotational procedures are available in most statistical packages.
47. Thus, the researcher will probably have to accept the one that is provided.


Figure 5: Oblique Rotation

              Un-rotated Factor Loadings      Rotated Factor Loadings
Variables          I           II                  I           II
V1               0.50         0.80               0.03         0.94
V2               0.60         0.70               0.16         0.90
V3               0.90        -0.25               0.95         0.24
V4               0.80        -0.30               0.84         0.15
V5               0.60        -0.50               0.76        -0.13

Table 1: Comparison between Rotated and Un-rotated Factor Loadings
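The rotated loadings in table 1 can be reproduced, to within the rounding of the hand-drawn figure, by a single orthogonal rotation of the axes through roughly 30 degrees; the angle is inferred from the table values, not stated in the text. The sketch below also confirms the point made above about rotation only redistributing variance: each variable's communality (row sum of squared loadings) is unchanged.

```python
import numpy as np

L = np.array([[0.50, 0.80],      # V1  un-rotated loadings from table 1
              [0.60, 0.70],      # V2
              [0.90, -0.25],     # V3
              [0.80, -0.30],     # V4
              [0.60, -0.50]])    # V5

phi = np.radians(30)             # clockwise rotation of the axes, as in figure 4
T = np.array([[np.cos(phi), np.sin(phi)],
              [-np.sin(phi), np.cos(phi)]])

print(np.round(L @ T, 2))        # close to the rotated loadings in table 1

# Communalities are identical before and after rotation
print(np.round((L ** 2).sum(axis=1), 3))
print(np.round(((L @ T) ** 2).sum(axis=1), 3))
```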


4.5.1.3. Orthogonal Rotation Methods
1. In practice, the objective of all methods of rotation is to simplify the rows and columns of the factor matrix to facilitate interpretation.
2. In a factor matrix, columns represent factors, with each row corresponding to a variable's loadings across the factors.
3. By simplifying the rows, we mean making as many values in each row as close to zero as possible (i.e., maximizing a variable's loading on a single factor).
4. By simplifying the columns, we mean making as many values in each column as close to zero as possible (i.e., making the number of high loadings as few as possible).
5. Three major orthogonal approaches have been developed:
   a. QUARTIMAX
   b. VARIMAX
   c. EQUIMAX
6. QUARTIMAX:
   a. The ultimate goal of a QUARTIMAX rotation is to simplify the rows of a factor matrix; that is, QUARTIMAX focuses on rotating the initial factors so that a variable loads high on one factor and as low as possible on all other factors.
   b. In these rotations, many variables can load high or near high on the same factor because the technique centers on simplifying the rows.
   c. The QUARTIMAX method has not proved especially successful in producing simpler structures.
   d. Its difficulty is that it tends to produce a general factor as the first factor on which most, if not all, of the variables have high loadings.
   e. Regardless of one's concept of a simpler structure, inevitably it involves dealing with clusters of variables; a method that tends to create a large general factor (QUARTIMAX) is not in line with the goals of rotation.
7. VARIMAX:
   a. In contrast to QUARTIMAX, the VARIMAX criterion centers on simplifying the columns of the factor matrix.
   b. With the VARIMAX rotational approach, the maximum possible simplification is reached if there are only 1s and 0s in a column.
   c. That is, the VARIMAX method maximizes the sum of variances of required loadings of the factor matrix.
   d. Recall that in QUARTIMAX approaches, many variables can load high or near high on the same factor because the technique centers on simplifying the rows.
   e. With the VARIMAX rotational approach, some high loadings (i.e., close to -1 or +1) are likely, as are some loadings near 0 in each column of the matrix.
   f. The logic is that interpretation is easiest when the variable-factor correlations are:
      i. Close to either +1 or -1, thus indicating a clear positive or negative association between the variable and the factor; or
      ii. Close to 0, indicating a clear lack of association.
   g. This structure is fundamentally simple.
   h. Although the QUARTIMAX solution is analytically simpler than the VARIMAX solution, VARIMAX seems to give a clearer separation of the factors.

   i. In general, Kaiser's experiment indicates that the factor pattern obtained by VARIMAX rotation tends to be more invariant than that obtained by the QUARTIMAX method when different subsets of variables are analyzed.
   j. The VARIMAX method has proved successful as an analytic approach to obtaining an orthogonal rotation of factors.
8. EQUIMAX:
   a. The EQUIMAX approach is a compromise between the QUARTIMAX and VARIMAX approaches.
   b. Rather than concentrating either on simplification of the rows or on simplification of the columns, it tries to accomplish some of each.
   c. EQUIMAX has not gained widespread acceptance and is used infrequently.

4.5.1.4. Oblique Rotation Methods
1. Oblique rotations are similar to orthogonal rotations, except that oblique rotations allow correlated factors instead of maintaining independence between the rotated factors.
2. Whereas several choices are available among orthogonal approaches, most statistical packages typically provide only limited choices for oblique rotations (see the sketch following this list).
3. For example, SPSS provides OBLIMIN.
4. The objectives of simplification are comparable to the orthogonal methods, with the added feature of correlated factors.
5. With the possibility of correlated factors, the researcher must take additional care to validate obliquely rotated factors, because they have an additional way (non-orthogonality) of becoming specific to the sample and not generalizable, particularly with small samples or a low cases-to-variable ratio.
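How these rotation options are requested depends on the software. The snippet below is a hypothetical illustration using the open-source factor_analyzer package for Python; the package, the placeholder data matrix, and the two-factor choice are assumptions made here, not part of the original text (note the package spells EQUIMAX as "equamax").

```python
import numpy as np
from factor_analyzer import FactorAnalyzer  # assumed to be installed

# Placeholder data for illustration only: 200 cases, 5 metric variables
rng = np.random.default_rng(0)
X = rng.normal(size=(200, 5))

# Orthogonal options include "varimax", "quartimax", and "equamax"
fa_orth = FactorAnalyzer(n_factors=2, rotation="varimax")
fa_orth.fit(X)
print(fa_orth.loadings_)   # rotated factor-loading matrix

# Oblique options include "oblimin" and "promax"
fa_obl = FactorAnalyzer(n_factors=2, rotation="oblimin")
fa_obl.fit(X)
print(fa_obl.loadings_)    # factor pattern matrix
print(fa_obl.phi_)         # factor correlation matrix (oblique only)
```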

4.5.1.5. Selecting among Rotational Methods
1. No specific rules have been developed to guide the researcher in selecting a particular orthogonal rotational technique.
2. In most instances, the researcher simply utilizes the rotational technique provided by the computer program.
3. Most programs have the default rotation of VARIMAX, but all the major rotational methods are widely available.
4. However, no compelling analytical reason suggests favoring one rotational method over another.
5. The choice of an orthogonal or oblique rotation should be made on the basis of the particular needs of a given research problem.
6. To this end, several considerations should guide the selection of the rotational method.

4.5.2. Judging the Significance of Factor Loadings
1. In interpreting factors, a decision must be made regarding which factor loadings are worth consideration and attention.
2. The following discussion details issues regarding practical and statistical significance, as well as the number of variables, that affect the interpretation of factor loadings.

4.5.2.1. Ensuring Practical Significance
1. The first guideline is not based on any mathematical proposition but relates to practical significance by making a preliminary examination of the factor matrix in terms of the factor loadings.
2. Because a factor loading is the correlation of the variable and the factor, the squared loading is the amount of the variable's total variance accounted for by the factor.
3. Thus, a 0.30 loading translates to approximately 10 percent explanation, and a 0.50 loading denotes that 25 percent of the variance is accounted for by the factor.
4. The loading must exceed 0.70 for the factor to account for 50 percent of the variance of a variable.
5. Thus, the larger the absolute size of the factor loading, the more important the loading in interpreting the factor matrix.
6. Using practical significance as the criterion, we can assess the loadings as follows:
   a. Factor loadings in the range of ±0.30 to ±0.40 are considered to meet the minimal level for interpretation of structure.
   b. Loadings of ±0.50 or greater are considered practically significant.
   c. Loadings exceeding ±0.70 are considered indicative of well-defined structure and are the goal of any factor analysis.
7. The researcher should realize that extremely high loadings (0.80 and above) are not typical and that the practical significance of the loadings is an important criterion.
8. These guidelines are applicable when the sample size is 100 or larger and where the emphasis is on practical, not statistical, significance.
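The arithmetic behind these thresholds is simply the squared loading. The short sketch below is a minimal illustration written for these notes; it reproduces the figures cited above.

```python
import math

# Squared loading = share of a variable's variance explained by the factor
for loading in (0.30, 0.50, 0.70):
    print(f"loading {loading:.2f} -> {loading ** 2:.0%} of variance explained")

# Smallest loading at which a factor explains half of a variable's variance
print(f"loading needed for 50% explanation: {math.sqrt(0.5):.3f}")  # ~0.707
```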

4.5.2.2. Assessing Statistical Significance
1. As previously noted, a factor loading represents the correlation between an original variable and its factor.
2. In determining a significance level for the interpretation of loadings, an approach similar to determining the statistical significance of a correlation coefficient could be used.
3. However, research has demonstrated that factor loadings have substantially larger standard errors than typical correlations.
4. Thus, factor loadings should be evaluated at considerably stricter levels.
5. The researcher can employ the concept of statistical power to specify factor loadings considered significant for differing sample sizes.
6. With the stated objective of obtaining a power level of 80 percent, the use of a 0.05 significance level, and the proposed inflation of the standard errors of factor loadings, the table below contains the sample sizes necessary for each factor loading value to be considered significant.
7. For example, in a sample of 100 respondents, factor loadings of 0.55 and above are significant.
8. However, in a sample of 50, a factor loading of 0.75 is required for significance.
9. In comparison with the prior rule of thumb, which denoted all loadings of 0.30 as having practical significance, this approach would consider loadings of 0.30 significant only for sample sizes of 350 or greater.

Factor Loading    Sample Size Needed for Significance
     0.30                        350
     0.35                        250
     0.40                        200
     0.45                        150
     0.50                        120
     0.55                        100
     0.60                         85
     0.65                         70
     0.70                         60
     0.75                         50


Table 2: Guidelines for Identifying Significant Factor Loadings Based on Sample Size
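Table 2 is easy to encode for quick checks during interpretation. The helper below is a hypothetical sketch written for these notes; the function name and structure are assumptions, and it simply looks up the tabled values.

```python
# Required sample sizes from Table 2, keyed by factor loading
SAMPLE_SIZE_FOR_LOADING = {
    0.30: 350, 0.35: 250, 0.40: 200, 0.45: 150, 0.50: 120,
    0.55: 100, 0.60: 85, 0.65: 70, 0.70: 60, 0.75: 50,
}

def min_significant_loading(sample_size):
    """Smallest tabled loading treated as significant at this sample size."""
    ok = [loading for loading, n in SAMPLE_SIZE_FOR_LOADING.items()
          if n <= sample_size]
    return min(ok) if ok else None  # None: no tabled loading qualifies

print(min_significant_loading(100))  # 0.55, matching the example in the text
print(min_significant_loading(50))   # 0.75
```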

10. These guidelines are quite conservative when compared with the guidelines of the previous section or even the statistical levels associated with conventional correlation coefficients.
11. Thus, these guidelines should be used as a starting point in factor-loading interpretation, with lower loadings considered significant and added to the interpretation based on other considerations.
12. The next section details the interpretation process and the role that other considerations can play.

4.5.2.3. Adjustments based on the Number of Variables
1. A disadvantage of both of the prior approaches is that the number of variables being analyzed and the specific factor being examined are not considered.
2. It has been shown that as the researcher moves from the first factor to later factors, the acceptable level for a loading to be judged significant should increase.
3. The fact that unique variance and error variance begin to appear in later factors means that some upward adjustment in the level of significance should be included.
4. The number of variables being analyzed is also important in deciding which loadings are significant.
5. As the number of variables being analyzed increases, the acceptable level for considering a loading significant decreases.
6. Adjustment for the number of variables is increasingly important as one moves from the first factor extracted to later factors.
7. Rules of Thumb 5 summarize the criteria for the practical or statistical significance of factor loadings.

4.5.3. Interpreting a Factor Matrix
1. The task of interpreting a factor-loading matrix to identify the structure among the variables can at first seem overwhelming.
2. The researcher must sort through all the factor loadings (remember, each variable has a loading on each factor) to identify those most indicative of the underlying structure.
3. Even a fairly simple analysis of 15 variables on four factors necessitates evaluating and interpreting 60 factor loadings.
4. Using the criteria for interpreting loadings described in the previous section, the researcher finds those distinctive variables for each factor and looks for a correspondence to the conceptual foundation or the managerial expectations for the research to assess practical significance.
5. Thus, interpreting the complex interrelationships represented in a factor matrix requires a combination of applying objective criteria with managerial judgment.
6. By following the five-step procedure outlined next, the process can be simplified considerably.
7. After the process is discussed, a brief example will be used to illustrate the process.

4.5.3.1. Step 1: Examine the Factor Matrix of Loadings
1. The factor-loading matrix contains the factor loading of each variable on each factor.
2. They may be either rotated or unrotated loadings, but as discussed earlier, rotated loadings are usually used in factor interpretation unless data reduction is the sole objective.
3. Typically, the factors are arranged as columns; thus, each column of numbers represents the loadings of a single factor.
4. If an oblique rotation has been used, two matrices of factor loadings are provided.
5. The first is the factor pattern matrix, which has loadings that represent the unique contribution of each variable to the factor.
6. The second is the factor structure matrix, which has simple correlations between variables and factors, but these loadings contain both the unique variance between variables and factors and the correlation among factors.
7. As the correlation among factors becomes greater, it becomes more difficult to distinguish which variables load uniquely on each factor in the factor structure matrix.
8. Thus, most researchers report the results of the factor pattern matrix.

4.5.3.2. Step 2: Identify the Significant Loading(s) for Each Variable
1. The interpretation should start with the first variable on the first factor and move horizontally from left to right, looking for the highest loading for that variable on any factor.
2. When the highest loading (largest absolute factor loading) is identified, it should be underlined if significant as determined by the criteria discussed earlier.
3. Attention then focuses on the second variable and, again moving from left to right horizontally, looking for the highest loading for that variable on any factor and underlining it.
4. This procedure should continue for each variable until all variables have been reviewed for their highest loading on a factor.
5. Most factor solutions, however, do not result in a simple structure solution (a single high loading for each variable on only one factor).
6. Thus, the researcher will, after underlining the highest loading for a variable, continue to evaluate the factor matrix by underlining all significant loadings for a variable on all the factors.
7. The process of interpretation would be greatly simplified if each variable had only one significant loading.
8. In practice, however, the researcher may find that one or more variables each has moderate-size loadings on several factors, all of which are significant, and the job of interpreting the factors is much more difficult.
9. When a variable is found to have more than one significant loading, it is termed a cross-loading.
10. The difficulty arises because a variable with several significant loadings (cross-loading) must be used in labeling all the factors on which it has a significant loading.
11. Yet how can the factors be distinct and potentially represent separate concepts when they share variables?
12. Ultimately, the objective is to minimize the number of significant loadings on each row of the factor matrix (i.e., make each variable associate with only one factor).
13. The researcher may find that different rotation methods eliminate any cross-loadings and thus define a simple structure.
14. If a variable persists in having cross-loadings, it becomes a candidate for deletion.
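Step 2 can be automated once a significance threshold is chosen. The sketch below is a minimal illustration written for these notes; the 0.50 threshold and variable names are assumptions, and it flags each variable's significant loadings and reports any cross-loadings.

```python
import numpy as np

def flag_loadings(loadings, threshold=0.50):
    """Report each variable's significant loadings and any cross-loadings."""
    for i, row in enumerate(np.abs(loadings) >= threshold):
        factors = np.flatnonzero(row)
        if factors.size == 0:
            print(f"V{i + 1}: no significant loading")
        elif factors.size == 1:
            print(f"V{i + 1}: loads on factor {factors[0] + 1}")
        else:
            print(f"V{i + 1}: cross-loading on factors {(factors + 1).tolist()}")

# Rotated loadings from Table 1: a clean simple structure, no cross-loadings
rotated = np.array([[0.03,  0.94],
                    [0.16,  0.90],
                    [0.95,  0.24],
                    [0.84,  0.15],
                    [0.76, -0.13]])
flag_loadings(rotated)
```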

4.5.3.3. Step 3: Assess the Communalities of the Variables
1. Once all the significant loadings have been identified, the researcher should look for any variables that are not adequately accounted for by the factor solution.
2. One simple approach is to examine each variable's communality, representing the amount of variance accounted for by the factor solution for each variable.
3. The researcher should view the communalities to assess whether the variables meet acceptable levels of explanation.
4. For example, a researcher may specify that at least one-half of the variance of each variable must be taken into account.
5. Using this guideline, the researcher would identify all variables with communalities less than 0.50 as not having sufficient explanation.
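For an orthogonal solution, a variable's communality can be computed as the row sum of its squared loadings. The sketch below is a minimal illustration using the rotated loadings from Table 1 and the 0.50 guideline above.

```python
import numpy as np

# Communality of each variable = row sum of squared loadings
# (this simple identity holds for orthogonal solutions)
rotated = np.array([[0.03,  0.94],
                    [0.16,  0.90],
                    [0.95,  0.24],
                    [0.84,  0.15],
                    [0.76, -0.13]])
communalities = (rotated ** 2).sum(axis=1)
for i, h2 in enumerate(communalities):
    status = "ok" if h2 >= 0.50 else "below the 0.50 guideline"
    print(f"V{i + 1}: communality = {h2:.2f} ({status})")
```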

4.5.3.4. Step 4: Re-specify the Factor Model if Needed
1. Once all the significant loadings have been identified and the communalities examined, the researcher may find any one of several problems:
   a. A variable has no significant loadings.
   b. Even with a significant loading, a variable's communality is deemed too low.
   c. A variable has a cross-loading.
2. In this situation, the researcher can take any combination of the following remedies:
   a. Ignore the problematic variables and interpret the solution as is, which is appropriate if the objective is solely data reduction, but the researcher must still note that the variables in question are poorly represented in the factor solution.
   b. Evaluate each of those variables for possible deletion, depending on the variable's overall contribution to the research as well as its communality index. If the variable is of minor importance to the study's objective or has an unacceptable communality value, it may be eliminated and the factor model then respecified by deriving a new factor solution with those variables eliminated.
   c. Employ an alternative rotation method, particularly an oblique method if only orthogonal methods had been used.
   d. Decrease/increase the number of factors retained to see whether a smaller/larger factor structure will represent those problematic variables.
   e. Modify the type of factor model used (component vs. common factor) to assess whether varying the type of variance considered affects the factor structure.
3. Many tricks can be used to improve upon the structure, but the ultimate responsibility rests with the researcher and the conceptual foundation underlying the analysis.

4.5.3.5. Step 5: Label the Factors
1. When an acceptable factor solution has been obtained in which all variables have a significant loading on a factor, the researcher attempts to assign some meaning to the pattern of factor loadings.
2. Variables with higher loadings are considered more important and have greater influence on the name or label selected to represent a factor.
3. The signs are interpreted just like any other correlation coefficients.
4. This label is not derived or assigned by the factor analysis computer program; rather, the label is intuitively developed by the researcher based on its appropriateness for representing the underlying dimensions of a particular factor.
5. The procedure is followed for each extracted factor, and the final result will be a name or label that represents each of the derived factors as accurately as possible.

4.5.3.5.1. An Example of Factor Interpretation
1. To serve as an illustration of factor interpretation, nine measures were obtained in a pilot test based on a sample of 202 respondents.
2. After estimation of the initial results, further analysis indicated a three-factor solution was appropriate.
3. Thus, the researcher now has the task of interpreting the factor loadings of the nine variables.

4. The above figure shows the unrotated factor matrix.

5. The above figure shows the VARIMAX rotated factor-loading matrix.

6. The above figure shows the simplified rotated factor-loading matrix.

7. The above figure shows the rotated factor-loading matrix with V1 deleted.
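The deletion of V1 in the example can be mirrored in code. The sketch below is hypothetical: the factor_analyzer package, the placeholder data, and the assumption that V1 sits in column 0 are all additions for illustration, not part of the original example.

```python
import numpy as np
from factor_analyzer import FactorAnalyzer  # assumed to be installed

# X stands in for the nine pilot-test measures (202 cases x 9 variables);
# placeholder data for illustration only
rng = np.random.default_rng(0)
X = rng.normal(size=(202, 9))

# Re-specify the model with V1 (assumed column 0) deleted, as in the example
X_reduced = np.delete(X, 0, axis=1)
fa = FactorAnalyzer(n_factors=3, rotation="varimax")
fa.fit(X_reduced)
print(np.round(fa.loadings_, 2))  # rotated loadings of the remaining eight
```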

4.6. Validation of Factor Analysis


1. The sixth stage involves assessing the degree of generalizability of the results to the population and the potential influence of individual cases or respondents on the overall results.
2. The issue of generalizability is critical for each of the multivariate methods, but it is especially relevant for interdependence methods because they describe a data structure that should be representative of the population as well.
3. In the validation process, the researcher must address a number of issues in the areas of research design and data characteristics, as discussed next.

4.6.1. Use of a Confirmatory Perspective
1. The most direct method of validating the results is to move to a confirmatory perspective and assess the replicability of the results, either with a split sample in the original dataset or with a separate sample.
2. The comparison of two or more factor model results has always been problematic.
3. However, several options exist for making an objective comparison.
4. The emergence of confirmatory factor analysis (CFA) through structural equation modeling has provided one option, but it is generally more complicated and requires additional software packages such as LISREL or EQS.
5. Apart from CFA, several other methods have been proposed, ranging from a simple matching index to programs designed specifically to assess the correspondence between factor matrices.
6. These methods have had sporadic use, owing in part to:
   a. Their perceived lack of sophistication
   b. The unavailability of software or analytical programs to automate the comparisons
7. Thus, when CFA is not appropriate, these methods provide some objective basis for comparison.

4.6.1.1. Assessing Factor Structure Stability
1. Another aspect of generalizability is the stability of the factor model results.
2. Factor stability is primarily dependent on the sample size and on the number of cases per variable.
3. The researcher is always encouraged to obtain the largest sample possible and develop parsimonious models to increase the cases-to-variables ratio.
4. If sample size permits, the researcher may wish to randomly split the sample into two subsets and estimate factor models for each subset.
5. Comparison of the two resulting factor matrices will provide an assessment of the robustness of the solution across the sample (a brief sketch follows at the end of this section).

4.6.1.2. Detecting Influential Observations
1. In addition to generalizability, another issue of importance to the validation of factor analysis is the detection of influential observations.
2. The researcher is encouraged to estimate the model with and without observations identified as outliers to assess their impact on the results.
3. If omission of the outliers is justified, the results should have greater generalizability.
4. Also, several measures of influence that reflect one observation's position relative to all others (e.g., covariance ratio) are applicable to factor analysis as well.
5. Finally, the complexity of methods proposed for identifying influential observations specific to factor analysis limits the application of these methods.
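As a concrete illustration of the split-sample check in 4.6.1.1, the sketch below fits the same model to two random halves and compares matched factors with Tucker's congruence coefficient. The factor_analyzer package, the data matrix, and the cutoff interpretation are assumptions, and in practice factor order and signs may need to be aligned before comparison.

```python
import numpy as np
from factor_analyzer import FactorAnalyzer  # assumed to be installed

def tucker_phi(L1, L2):
    """Tucker's congruence coefficient for column-matched factors."""
    num = (L1 * L2).sum(axis=0)
    den = np.sqrt((L1 ** 2).sum(axis=0) * (L2 ** 2).sum(axis=0))
    return num / den

def split_half_stability(X, n_factors, seed=0):
    # Randomly split the sample, fit the same model to each half,
    # and compare the matched rotated loadings
    rng = np.random.default_rng(seed)
    halves = np.array_split(X[rng.permutation(len(X))], 2)
    loadings = []
    for half in halves:
        fa = FactorAnalyzer(n_factors=n_factors, rotation="varimax")
        fa.fit(half)
        loadings.append(fa.loadings_)
    # Columns (and signs) may need reordering before this comparison;
    # congruence values near 1 suggest a stable, replicable structure
    return tucker_phi(*loadings)

# Example usage (X assumed to be an observations-by-variables array):
# print(split_half_stability(X, n_factors=3))
```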
