
Main Page

Welcome to Gaskination's StatWiki!


Supported by the Doctor of Management Program at Case Western Reserve University and
by Brigham Young University
This wiki has been created to provide you with all sorts of statistics tutorials to guide you through
the standard statistical analyses common to hypothesis testing in the social sciences. Examples
are geared toward organizational, business, and management fields. AMOS, SPSS, Excel,
SmartPLS and PLS-graph are used to perform all analyses provided on this wiki. This wiki is not
exhaustive, or even very comprehensive. I provide brief explanations of concepts rather than full-length instruction. My main focus is on providing guidance on how to perform the statistics. This is
very much a mechanically oriented resource. For more comprehensive instruction on the
methods demonstrated in this wiki, please refer to Hair et al. 2010 (Multivariate Data Analysis), as well as to the PowerPoint presentations offered for most of the topics. I hope you find the
resources here useful. I will likely update them from time to time.
This teaching material has been developed as part of a quantitative social science method
sequence aimed to prepare Doctor of Management students for their quantitative research
project. These students are working executives who carry out a rigorous quantitative project as
part of their research stream. Examples of these projects and examples of how to report results of
quantitative research projects in academic papers can be found in the DM Research Library.
Acknowledgments
The materials and teaching approach adopted in these materials have been developed by a team
of teachers consisting of Jagdip Singh, Toni Somers, Kalle Lyytinen, Nick Berente, Shyam
Giridharadas and me over the last several years. Although I have developed and refined much of
the material and the resources in this Wiki, I am not the sole contributor. I greatly appreciate the
work done by Kalle Lyytinen (Case Western Reserve University), Toni Somers (Wayne State
University), Nick Berente (University of Georgia), Shyam Giridharadas (University of Washington)
and Jagdip Singh (Case Western Reserve University) who selected and identified much of the
literature underlying the materials and also originally developed many of the Powerpoint slides.
They also helped me refine these materials by providing useful feedback on the slides and
videos. I also appreciate the contribution and help of Jagdip Singh (Case Western Reserve
University), who is the owner of the Sohana and Bencare datasets used in the examples and
which are made available below. I also acknowledge the continued support of the Doctor of
Management Program at the Weatherhead School of Management at Case Western Reserve
University, Cleveland, Ohio, for their involvement, support, and sponsorship of this wiki, as well as Brigham Young University for encouraging me in all my SEM-related endeavors.
Please report any problems with the wiki to james.eric.gaskin@gmail.com

 If you are having trouble and cannot figure out what to do, even after using the resources on
this wiki or on Gaskination, then you might benefit from the archive of support emails I have
received and responded to over the past years: Stats Help Archive.

 You may find this set of Excel tools useful/necessary for many of the analyses you
will learn about in this wiki: Stats Tools Package. Please note that this is the most recently updated version; it does not include a variance column in the Validity Master sheet, because it was a mistake to include variances when working with standardized estimates.
 You may also find this basics tutorial for AMOS and SPSS useful as a starter.
 VIDEO TUTORIAL: Basic Analysis in AMOS and SPSS
Datasets
Here are some links to the datasets and related resources I use in many of the video tutorials.

 YouTube SEM Series (this data goes along with this YouTube playlist: SEM Series 2016)
 Sohana
 Bencare
 Sales Performance
How to cite Gaskination resources
IEEE TPC PLS article:

 Paul Benjamin Lowry & James Gaskin (2014). “Partial Least Squares (PLS) Structural
Equation Modeling (SEM) for Building and Testing Behavioral Causal Theory: When to
Choose It and How to Use It,” IEEE TPC (57:2), pp. 123-146.
Wiki:

 Gaskin, J., (2016), "Name of section", Gaskination's


StatWiki. http://statwiki.kolobkreations.com
YouTube videos:

 Gaskin, J. (Year video uploaded), "Name of video", Gaskination's Statistics. http://youtube.com/Gaskination
Stats Tools Package:

 Gaskin, J., (2016), "Name of tab", Stats Tools Package. http://statwiki.kolobkreations.com


Plugin or Estimand:

 Gaskin, J., (2016), "Name of Plugin or Estimand", Gaskination's


Statistics. http://statwiki.kolobkreations.com
StatWiki Contents
1. Data screening

 Missing Data
 Outliers
 Normality
 Linearity
 Homoscedasticity
 Multicollinearity
2. Exploratory Factor Analysis (EFA)

 Rotation types
 Factoring methods
 Appropriateness of data
 Communalities
 Dimensionality
 Factor Structure
 Convergent validity
 Discriminant validity
 Face validity
 Reliability
 Formative vs. Reflective
3. Confirmatory Factor Analysis (CFA)

 Model Fit
 Validity and Reliability
 Common Method Bias (CMB)
 Invariance
 2nd Order Factors
4. Structural Equation Modeling (SEM)

 Hypotheses
 Controls
 Mediation
 Interaction
 Model fit again
 Multi-group
 From Measurement Model to Structural Model
 Creating Composites from Latent Factors
5. PLS (Partial Least Squares)

 Installing PLS-graph
 Troubleshooting
 Sample Size Rule
 Factor Analysis
 Testing Causal Models
 Testing Group Differences
 Handling Missing Data
 Convergent and Discriminant Validity
 Common Method Bias
 Interaction
 SmartPLS
6. General Guidelines

 Example Analysis
 Ten Steps to Building a Good Quant Model
 Order of Operations
 General Guidelines to Writing a Quant Paper
7. Cluster Analysis

 Just a bunch of videos here


Data screening
 LESSON: Data Screening
 VIDEO TUTORIAL: Data Screening
Data screening (sometimes referred to as "data screaming") is the process of ensuring your data
is clean and ready to go before you conduct further statistical analyses. Data must be screened in
order to ensure the data is useable, reliable, and valid for testing causal theory. In this section I
will focus on six specific issues that need to be addressed when cleaning (not cooking) your data.

Do you know of some citations that could be used to support the topics and procedures
discussed in this section? Please email them to me with the name of the section, procedure, or
subsection that they support. Thanks!

Contents

 1 Missing Data
 2 Outliers
o 2.1 Univariate
o 2.2 Multivariate
 3 Normality
 4 Linearity
 5 Homoscedasticity
 6 Multicollinearity

Missing Data
If you are missing much of your data, this can cause several problems. The most apparent
problem is that there simply won't be enough data points to run your analyses. The EFA, CFA,
and path models require a certain number of data points in order to compute estimates. This
number increases with the complexity of your model. If you are missing several values in your
data, the analysis just won't run.
Additionally, missing data might represent bias issues. Some people may not have answered
particular questions in your survey because of some common issue. For example, if you asked
about gender, and females are less likely to report their gender than males, then you will have
male-biased data. Perhaps only 50% of the females reported their gender, but 95% of the males
reported gender. If you use gender in your causal models, then you will be heavily biased toward
males, because you will not end up using the unreported responses.
To find out how many missing values each variable has, in SPSS go to Analyze, then Descriptive
Statistics, then Frequencies. Enter the variables in the variables list. Then click OK. The table in
the output will show the number of missing values for each variable.
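If your data is also available outside SPSS, a quick way to get the same per-variable missing counts is with Python's pandas. This is a minimal sketch; the file name and any column names are hypothetical.

```python
import pandas as pd

df = pd.read_csv("survey.csv")        # hypothetical export of your SPSS data
missing_counts = df.isna().sum()      # number of missing values per variable
missing_pct = df.isna().mean() * 100  # percent missing, useful for the 10% guideline below
print(pd.concat([missing_counts, missing_pct.round(1)], axis=1, keys=["missing", "% missing"]))
```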
The threshold for missing data is flexible, but generally, if you are missing more than 10% of the
responses on a particular variable, or from a particular respondent, that variable or respondent
may be problematic. There are several ways to deal with problematic variables.
 Just don't use that variable.
 If it makes sense, impute the missing values. This should only be done for continuous or
interval data (like age or Likert-scale responses), not for categorical data (like gender).
 If your dataset is large enough, just don't use the responses that had missing values for that
variable. This may create a bias, however, if the number of missing responses is greater than
10%.
To impute values in SPSS, go to Transform, Replace Missing Values; then select the variables
that need imputing, and hit OK. See the screenshots below. In this screenshot, I use the Mean
replacement method. But there are other options, including Median replacement. Typically with
Likert-type data, you want to use median replacement, because means are less meaningful in
these scenarios. For more information on when to use which type of imputation, refer to: Lynch
(2003)
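Outside SPSS, the same replacement logic can be sketched in pandas. The column names below are hypothetical, and, as noted above, imputation should only be done for continuous or interval data.

```python
import pandas as pd

df = pd.read_csv("survey.csv")                      # hypothetical data export
likert_items = ["trust1", "trust2", "trust3"]       # hypothetical Likert-type columns
for col in likert_items:
    df[col] = df[col].fillna(df[col].median())      # median replacement for Likert-type data

df["age"] = df["age"].fillna(df["age"].mean())      # mean replacement for a continuous variable
```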

Handling problematic respondents is somewhat more difficult. If a respondent did not answer a
large portion of the questions, their other responses may be useless when it comes to testing
causal models. For example, if they answered questions about diet, but not about weight loss, for
this individual we cannot test a causal model that argues that diet has a positive effect on weight
loss. We simply do not have the data for that person. My recommendation is to first determine
which variables will actually be used in your model (often we collect data on more variables than
we actually end up using in our model), then determine if the respondent is problematic. If so,
then remove that respondent from the analysis.

Outliers
Outliers can influence your results, pulling the mean away from the median. Two types of outliers
exist: outliers for individual variables, and outliers for the model.
Univariate

 VIDEO TUTORIAL: Detecting Univariate Outliers


To detect outliers on each variable, just produce a boxplot in SPSS (as demonstrated in the
video). Outliers will appear at the extremes, and will be labeled, as in the figure below. If you have
a really high sample size, then you may want to remove the outliers. If you are working with a
smaller dataset, you may want to be less liberal about deleting records. However, this is a trade-
off, because outliers will influence small datasets more than large ones. Lastly, outliers do not really exist in Likert scales; answering at the extreme (1 or 5) is not truly outlier behavior.
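For continuous variables (not Likert items), the boxplot check can be approximated programmatically with the 1.5 x IQR rule that boxplots are built on. A sketch, where "age" is a hypothetical column:

```python
import pandas as pd

df = pd.read_csv("survey.csv")                     # hypothetical data export
q1, q3 = df["age"].quantile([0.25, 0.75])
iqr = q3 - q1
outliers = df[(df["age"] < q1 - 1.5 * iqr) | (df["age"] > q3 + 1.5 * iqr)]
print(outliers[["age"]])                           # the cases a boxplot would flag at the extremes
```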
Another type of outlier is an unengaged respondent. Sometimes respondents will enter '3, 3, 3,
3,...' for every single survey item. This participant was clearly not engaged, and their responses
will throw off your results. Other patterns indicative of unengaged respondents are '1, 2, 3, 4, 5, 1,
2, ...' or '1, 1, 1, 1, 5, 5, 5, 5, 1, 1, ...'. There are multiple ways to identify and eliminate these
unengaged respondents:

 Include attention traps that request the respondent to "answer somewhat agree for this item if
you are paying attention". I usually include two of these in opposite directions (i.e., one says
somewhat agree and one says somewhat disagree) at about a third and two-thirds of the way
through my surveys. I am always astounded at how many I catch this way...
 See if the participant answered reverse-coded questions in the same direction as normal
questions. For example, if they responded strongly agree to both of these items, then they
were not paying attention: "I am very hungry", "I don't have much appetite right now".
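One additional programmatic check (not listed above) is to compute each respondent's standard deviation across the survey items; a value at or near zero means the person gave essentially the same answer to everything. A sketch, with hypothetical item columns and an illustrative cutoff:

```python
import pandas as pd

df = pd.read_csv("survey.csv")                     # hypothetical data export
items = df.filter(regex="^(trust|loyalty|intent)") # hypothetical Likert item columns
df["response_sd"] = items.std(axis=1)              # per-respondent standard deviation across items
suspects = df[df["response_sd"] < 0.25]            # near-zero variation suggests straight-lining
print(len(suspects), "potentially unengaged respondents")
```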
Multivariate

 VIDEO TUTORIAL: Detecting Multivariate Influential Outliers


Multivariate outliers refer to records that do not fit the standard sets of correlations exhibited by
the other records in the dataset, with regards to your causal model. So, if all but one person in the
dataset reports that diet has a positive effect on weight loss, but this one guy reports that he
gains weight when he diets, then his record would be considered a multivariate outlier. To detect
these influential multivariate outliers, you need to calculate the Mahalanobis d-squared. This is a
simple matter in AMOS. See the video tutorial for the particulars. As a warning, however, I almost never address multivariate outliers, as it is very difficult to justify removing them just because they don't match your theory. Additionally, you will nearly always find multivariate outliers; even if you remove them, more will show up. It is a slippery slope.
A more conservative approach that I would recommend is to examine the influential cases
indicated by Cook's distance. Here is a video explaining what this is and how to do it. This
video also discusses multicollinearity.

 VIDEO TUTORIAL: Multivariate Assumptions
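If you want to compute these outside AMOS and SPSS, here is a rough Python sketch of both the Mahalanobis d-squared and the Cook's distance checks. The variable names and the flagging thresholds are illustrative assumptions, not part of the wiki's procedure.

```python
import numpy as np
import pandas as pd
import statsmodels.api as sm
from scipy import stats

df = pd.read_csv("survey.csv")                             # hypothetical data export
data = df[["diet", "exercise", "weight_loss"]].dropna()    # hypothetical model variables

# Mahalanobis d-squared for each case, with a chi-square p-value (similar to what AMOS reports)
X = data.to_numpy()
diff = X - X.mean(axis=0)
inv_cov = np.linalg.inv(np.cov(X, rowvar=False))
d2 = np.einsum("ij,jk,ik->i", diff, inv_cov, diff)
p = stats.chi2.sf(d2, df=X.shape[1])

# Cook's distance from a regression of the DV on the IVs (the more conservative check)
fit = sm.OLS(data["weight_loss"], sm.add_constant(data[["diet", "exercise"]])).fit()
cooks_d = fit.get_influence().cooks_distance[0]
influential = np.where(cooks_d > 4 / len(data))[0]         # 4/n is one common rule of thumb
print(len(influential), "influential cases by Cook's distance")
```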

Normality

 VIDEO TUTORIAL: Detecting Normality Issues


Normality refers to the distribution of the data for a particular variable. We usually assume that
the data is normally distributed, even though it usually is not! Normality is assessed in many
different ways: shape, skewness, and kurtosis (flat/peaked).

 Shape: To discover the shape of the distribution in SPSS, build a histogram (as shown in the
video tutorial) and plot the normal curve. If the histogram does not match the normal curve,
then you likely have normality issues. You can also look at the boxplot to determine
normality.
 Skewness: Skewness means that the responses did not fall into a normal distribution, but
were heavily weighted toward one end of the scale. Income is an example of a commonly
right-skewed variable: most people in the USA make between 20 and 70 thousand dollars, a smaller group makes between 70 and 100, an even smaller group between 100 and 150, a much smaller group between 150 and 250, and so on all the way up to Bill Gates and Mark Zuckerberg. Addressing skewness may require
transformations of your data (if continuous), or removing influential outliers. There are two
rules on Skewness:
 (1) If your skewness value is greater than 1, you are positively (right) skewed; if it is less than -1, you are negatively (left) skewed; if it is in between, you are fine. Some published thresholds are a bit more liberal and allow for up to +/-2.2, instead of +/-1.
 (2) If the absolute value of the skewness is less than three times its standard error, you are fine; otherwise you are skewed.
Using these rules, we can see from the table below, that all three variables are fine using the first
rule, but using the second rule, they are all negative (left) skewed.
Skewness looks like this:

 Kurtosis:
Kurtosis refers to the outliers of the distribution of data. Data that have outliers have large
kurtosis. Data without outliers have low kurtosis. The kurtosis (excess kurtosis) of the normal
distribution is 0. The rule for evaluating whether or not your kurtosis is problematic is the same as
rule two above:

 If the absolute value of the kurtosis is less than three times the standard error, then the kurtosis is not significantly different from that of the normal distribution; otherwise you have kurtosis issues. A looser rule is an overall kurtosis score of 2.200 or less (rather than 1.00) (Sposito et al., 1983).
Kurtosis looks like this:
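Outside SPSS, both the skewness and kurtosis rules can be checked with scipy. A sketch; "income" is a hypothetical column, and the standard errors below are the common large-sample approximations rather than the exact formulas SPSS prints.

```python
import numpy as np
import pandas as pd
from scipy import stats

df = pd.read_csv("survey.csv")                 # hypothetical data export
x = df["income"].dropna().to_numpy()
n = len(x)

skew = stats.skew(x)
kurt = stats.kurtosis(x)                       # excess kurtosis; 0 for a normal distribution
se_skew, se_kurt = np.sqrt(6 / n), np.sqrt(24 / n)

print("Rule 1 (skewness):", "fine" if abs(skew) <= 1 else "skewed")
print("Rule 2 (skewness):", "fine" if abs(skew) < 3 * se_skew else "skewed")
print("Kurtosis rule:", "fine" if abs(kurt) < 3 * se_kurt else "kurtosis issue")
```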

 Bimodal:
One other issue you may run into with the distribution of your data is a bimodal distribution. This
means that the data has multiple (two) peaks, rather than peaking at the mean. This may indicate
there are moderating variables affecting this data. A bimodal distribution looks like this:
 Transformations:
 VIDEO TUTORIAL: Transformations
When you have extremely non-normal data, it will influence your regressions in SPSS and
AMOS. In such cases, if you have non-Likert-scale variables (so, variables like age, income,
revenue, etc.), you can transform them prior to including them in your model. Gary Templeton has
published an excellent article on this and created a YouTube video showing how to conduct the
transformation. He also references his article in the video.
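As a simple illustration (this is not Templeton's specific procedure), a log transformation is a common first remedy for a right-skewed continuous variable such as income:

```python
import numpy as np
import pandas as pd

df = pd.read_csv("survey.csv")              # hypothetical data export
df["income_log"] = np.log1p(df["income"])   # log(1 + x) avoids problems with zero values
# Re-check skewness of the transformed variable before using it in your model.
```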

Linearity
Linearity refers to the consistent slope of change that represents the relationship between an IV
and a DV. If the relationship between the IV and the DV is radically inconsistent, then it will throw
off your SEM analyses. There are dozens of ways to test for linearity. Perhaps the most elegant
(easy and clear-cut, yet rigorous), is the deviation from linearity test available in the ANOVA test
in SPSS. In SPSS go to Analyze, Compare Means, Means. Put the composite IVs and DVs in the
lists, then click on options, and select "Test for Linearity". Then in the ANOVA table in the output
window, if the Sig value for Deviation from Linearity is less than 0.05, the relationship between IV
and DV is not linear, and thus is problematic (see the screenshots below). Issues of linearity can
sometimes be fixed by removing outliers (if the significance is borderline), or through transforming
the data. In the screenshot below, we can see that the first relationship is linear (Sig = .268), but
the second relationship is nonlinear (Sig = .003).

 If this test turns up odd results, then simply perform an OLS linear regression between each
IV->DV pair. If the sig value is less than 0.05, then the relationship can be considered
"sufficiently" linear. While this approach is somewhat less rigorous, it has the benefit of
working every time! You can also do a curvilinear regression ("curve estimation") to see if
the relationship is more linear than non-linear.
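The simple regression fallback described in the bullet above can be sketched as follows; the composite column names are hypothetical:

```python
import pandas as pd
import statsmodels.api as sm

df = pd.read_csv("survey.csv")                           # hypothetical data export
ivs, dv = ["diet", "exercise", "tv_time"], "weight_loss" # hypothetical composites

for iv in ivs:
    fit = sm.OLS(df[dv], sm.add_constant(df[iv]), missing="drop").fit()
    print(f"{iv} -> {dv}: p = {fit.pvalues[iv]:.3f}")    # p < .05 suggests a sufficiently linear relationship
```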
Homoscedasticity

 VIDEO TUTORIAL: Plotting Homoscedasticity

 Encyclopedia of Research Design, Volume 1 (2010), Sage Publications, pg. 581


Homoscedasticity is a nasty word that means that the variable's residual (error) exhibits
consistent variance across different levels of the variable. There are good reasons for desiring
this. For more information, see Hair et al. 2010 chapter 2. :) A simple way to determine if a
relationship is homoscedastic is to do a simple scatter plot with the variable on the y-axis and the
variable's residual on the x-axis. To see a step by step guide on how to do this, watch the video
tutorial. If the plot shows a consistent, even spread (as in the figure below), then we are good - we have homoscedasticity! If the spread is not consistent, then the relationship is considered
heteroskedastic. This can be fixed by transforming the data or by splitting the data by subgroups
(such as two groups for gender). You can read more about transformations in Hair et al. 2010 ch.
4.
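If you would rather build the residual plot outside SPSS, here is a sketch (the variable names are hypothetical); the Breusch-Pagan test at the end is an optional formal check I have added that is not part of the plotting approach described above.

```python
import pandas as pd
import matplotlib.pyplot as plt
import statsmodels.api as sm
from statsmodels.stats.diagnostic import het_breuschpagan

df = pd.read_csv("survey.csv")                          # hypothetical data export
fit = sm.OLS(df["weight_loss"], sm.add_constant(df["diet"]), missing="drop").fit()

plt.scatter(fit.fittedvalues, fit.resid)                # look for an even band with no funnel shape
plt.xlabel("Fitted values"); plt.ylabel("Residuals")
plt.show()

lm_stat, lm_p, f_stat, f_p = het_breuschpagan(fit.resid, fit.model.exog)
print(f"Breusch-Pagan p = {lm_p:.3f}")                  # p < .05 suggests heteroskedasticity
```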
The jury is still out on homoscedasticity. Some suggest that evidence of
heteroskedasticity is not a problem (and is actually desirable and expected in moderated models),
and so we shouldn't worry about testing for homoscedasticity. I never conduct this test unless
specifically requested to by a reviewer.

Multicollinearity

 VIDEO TUTORIAL: Detecting Multicollinearity


Multicollinearity is not desirable. It means that the variance our independent variables explain in our dependent variable overlaps, so the variables are not each explaining unique variance in the dependent variable. The way to check this is to calculate a Variance Inflation Factor (VIF) for each independent variable after running a multivariate regression. The rules of
thumb for the VIF are as follows:

 VIF < 3: not a problem


 VIF > 3; potential problem
 VIF > 5; very likely problem
 VIF > 10; definitely problem
The tolerance value in SPSS is directly related to the VIF, and values less than 0.10 are strong
indications of multicollinearity issues. For particulars on how to calculate the VIF in SPSS, watch
the step-by-step video tutorial. The easiest method for fixing multicollinearity issues is to drop one of the problematic variables. This won't hurt your R-square much because that variable doesn't add
much unique explanation of variance anyway.
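The VIFs can also be computed directly with statsmodels. A sketch; the IV names are hypothetical:

```python
import pandas as pd
import statsmodels.api as sm
from statsmodels.stats.outliers_influence import variance_inflation_factor

df = pd.read_csv("survey.csv")                        # hypothetical data export
ivs = ["diet", "exercise", "tv_time"]                 # hypothetical independent variables
X = sm.add_constant(df[ivs].dropna())

for i, name in enumerate(ivs, start=1):               # index 0 is the constant, so skip it
    print(f"{name}: VIF = {variance_inflation_factor(X.values, i):.2f}")
```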
For a more critical examination of multicollinearity, please refer to:

 O’Brien, R. M. (2007). A Caution Regarding Rules of Thumb for Variance Inflation Factors. Quality & Quantity, 41, 673-690.
Exploratory Factor Analysis
Exploratory Factor Analysis (EFA) is a statistical approach for determining the correlation among
the variables in a dataset. This type of analysis provides a factor structure (a grouping of
variables based on strong correlations). In general, an EFA prepares the variables to be used for
cleaner structural equation modeling. An EFA should always be conducted for new datasets. The
beauty of an EFA over a CFA (confirmatory) is that no a priori theory about which items belong to
which constructs is applied. This means the EFA will be able to spot problematic variables much
more easily than the CFA. A critical assumption of the EFA is that it is only appropriate for
sets of non-nominal items which theoretically belong to reflective
latent factors. Categorical/nominal variables (e.g., marital status, gender) should not be
included. Formative measures should not be included. Very rarely should objective (rather than
perceptual) variables be included, as objective variables rarely belong to reflective latent factors.
For those wondering why I default to Maximum Likelihood and Promax, here is a good
explanation: https://jonathantemplin.com/files/sem/sem13psyc948/sem13psyc948_lecture10.pdf

 VIDEO TUTORIAL: How to do an EFA

 LESSON: Exploratory Factor Analysis

Do you know of some citations that could be used to support the topics and procedures
discussed in this section? Please email them to me with the name of the section, procedure, or
subsection that they support. Thanks!

Contents

 1 Rotation types
o 1.1 Orthogonal
o 1.2 Oblique
 2 Factoring methods
o 2.1 Principal Component Analysis (PCA)
o 2.2 Principal Axis Factoring (PAF)
o 2.3 Maximum Likelihood (ML)
 3 Appropriateness of data (adequacy)
o 3.1 KMO Statistics
o 3.2 Bartlett’s Test of Sphericity
 4 Communalities
 5 Factor Structure
 6 Convergent validity
 7 Discriminant validity
 8 Face validity
 9 Reliability
 10 Formative vs. Reflective
 11 Common EFA Problems
 12 Some Thoughts on Messy EFAs

Rotation types
Rotation causes factor loadings to be more clearly differentiated, which is often necessary to
facilitate interpretation. Several types of rotation are available for your use.
Orthogonal
Varimax (most common)

 minimizes number of variables with extreme loadings (high or low) on a factor


 makes it possible to identify a variable with a factor
Quartimax

 minimizes the number of factors needed to explain each variable


 tends to generate a general factor on which most variables load with medium to high values
 not very helpful for research
Equimax

 combination of Varimax and Quartimax


Oblique
The variables are assessed for the unique relationship between each factor and the variables
(removing relationships that are shared by multiple factors).
Direct oblimin (DO)

 factors are allowed to be correlated


 diminished interpretability
Promax (Use this one if you're not sure)

 computationally faster than DO


 used for large datasets

Factoring methods
There are three main methods for factor extraction.
Principal Component Analysis (PCA)
Use for a softer solution

 Considers all of the available variance (common + unique) (places 1’s on diagonal of
correlation matrix).
 Seeks a linear combination of variables such that maximum variance is extracted—repeats
this step.
 Use when the concern is prediction and parsimony, and you know the specific and error variance are small.
 Results in orthogonal (uncorrelated) factors.
Principal Axis Factoring (PAF)

 Considers only common variance (places communality estimates on diagonal of correlation matrix).
 Seeks the least number of factors that can account for the common variance (correlation) of a set of variables.
 PAF analyzes only common factor variability, removing the unique or unexplained variability from the model.
 PAF is preferred because it accounts for co-variation, whereas PCA accounts for total variance.
Maximum Likelihood (ML)
Use this method if you are unsure

 Maximizes differences between factors. Provides Model Fit estimate.


 This is the approach used in AMOS, so if you are going to use AMOS for CFA and structural
modeling, you should use this one during the EFA.
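For those working outside SPSS, the same Maximum Likelihood plus Promax combination can be run with the third-party factor_analyzer package in Python. A sketch; the item columns and the number of factors are hypothetical:

```python
import pandas as pd
from factor_analyzer import FactorAnalyzer

df = pd.read_csv("survey.csv")                         # hypothetical data export
items = df.filter(regex="^(trust|loyalty)").dropna()   # hypothetical reflective items only

fa = FactorAnalyzer(n_factors=4, rotation="promax", method="ml")
fa.fit(items)

pattern = pd.DataFrame(fa.loadings_, index=items.columns)   # rotated loadings, akin to the SPSS pattern matrix
print(pattern.round(3))
```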

Appropriateness of data (adequacy)


KMO Statistics

 Marvelous: .90s
 Meritorious: .80s
 Middling: .70s
 Mediocre: .60s
 Miserable: .50s
 Unacceptable: <.50
Bartlett’s Test of Sphericity
Tests the hypothesis that the correlation matrix is an identity matrix.

 Diagonals are ones


 Off-diagonals are zeros
A significant result (Sig. < 0.05) indicates the matrix is not an identity matrix; i.e., the variables do
relate to one another enough to run a meaningful EFA.
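Both adequacy checks are also available in the factor_analyzer package, if you are working in Python. A sketch with hypothetical item columns:

```python
import pandas as pd
from factor_analyzer.factor_analyzer import calculate_kmo, calculate_bartlett_sphericity

df = pd.read_csv("survey.csv")                          # hypothetical data export
items = df.filter(regex="^(trust|loyalty)").dropna()

chi_square, p_value = calculate_bartlett_sphericity(items)   # want p < .05
kmo_per_item, kmo_total = calculate_kmo(items)               # interpret with the KMO labels above
print(f"Bartlett p = {p_value:.4f}, overall KMO = {kmo_total:.3f}")
```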
Communalities
A communality is the extent to which an item correlates with all other items. Higher
communalities are better. If communalities for a particular variable are low (between 0.0-0.4),
then that variable may struggle to load significantly on any factor. In the table below, you should
identify low values in the "Extraction" column. Low values indicate candidates for removal after
you examine the pattern matrix.

Factor Structure
Factor structure refers to the intercorrelations among the variables being tested in the EFA. Using
the pattern matrix below as an illustration, we can see that variables group into factors - more
precisely, they "load" onto factors. The example below illustrates a very clean factor structure in
which convergent and discriminant validity are evident by the high loadings within factors, and no
major cross-loadings between factors (i.e., a primary loading should be at least 0.200 larger than
its secondary loading).
Convergent validity
Convergent validity means that the variables within a single factor are highly correlated. This is
evident by the factor loadings. Sufficient/significant loadings depend on the sample size of your
dataset. The table below outlines the thresholds for sufficient/significant factor loadings.
Generally, the smaller the sample size, the higher the required loading. We can see that in the
pattern matrix above, we would need a sample size of 60-70 at a minimum to achieve significant
loadings for variables loyalty1 and loyalty7. Regardless of sample size, it is best to have loadings
greater than 0.500 and averaging out to greater than 0.700 for each factor.
Discriminant validity
Discriminant validity refers to the extent to which factors are distinct and uncorrelated. The rule is
that variables should relate more strongly to their own factor than to another factor. Two primary
methods exist for determining discriminant validity during an EFA. The first method is to examine
the pattern matrix. Variables should load significantly only on one factor. If "cross-loadings" do
exist (variable loads on multiple factors), then the cross-loadings should differ by more than 0.2.
The second method is to examine the factor correlation matrix, as shown below. Correlations
between factors should not exceed 0.7. A correlation greater than 0.7 indicates a majority of
shared variance (0.7 * 0.7 = 49% shared variance). As we can see from the factor correlation
matrix below, factor 2 is too highly correlated with factors 1, 3, and 4.

What if you have discriminant validity problems - for example, the items from two theoretically
different factors end up loading on the same extracted factor (instead of on separate factors)? I
have found the best way to resolve this type of issue is to do a separate EFA with just the items
from the offending factors. Work out this smaller EFA (by removing items one at a time that have
the worst cross-loadings), then reinsert the remaining items into the full EFA. This will usually
resolve the issue. If it doesn't, then consider whether these two factors are actually just two
dimensions or manifestations of some higher order factor. If this is the case, then you might
consider doing the EFA for this higher order factor separate from all the items belonging to first
order factors. Then during the CFA, make sure to model the higher order factor properly by
making a 2nd order latent variable.

Face validity
Face validity is very simple. Do the factors make sense? For example, are variables that are
similar in nature loading together on the same factor? If there are exceptions, are they
explainable? Factors that demonstrate sufficient face validity should be easy to label. For
example, in the pattern matrix above, we could easily label factor 1 "Trust in the Agent"
(assuming the variable names are representative of the measure used to collect data for this
variable). If all the "Trust" variables in the pattern matrix above loaded onto a single factor, we
may have to abstract a bit and call this factor "Trust" rather than "Trust in Agent" and "Trust in
Company".

Reliability
Reliability refers to the consistency of the item-level errors within a single factor. Reliability means
just what it sounds like: a "reliable" set of variables will consistently load on the same factor. The
way to test reliability in an EFA is to compute Cronbach's alpha for each factor. Cronbach's alpha
should be above 0.7; although, ceteris paribus, the value will generally increase for factors with
more variables, and decrease for factors with fewer variables. Each factor should aim to have at
least 3 variables, although 2 variables is sometimes permissible.
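Cronbach's alpha is easy to compute by hand if you are not in SPSS. Here is a sketch of the standard formula, applied to hypothetical item columns for one factor:

```python
import pandas as pd

def cronbach_alpha(items):
    """Cronbach's alpha for a DataFrame of items belonging to a single factor."""
    k = items.shape[1]
    item_variances = items.var(axis=0, ddof=1).sum()
    total_variance = items.sum(axis=1).var(ddof=1)
    return (k / (k - 1)) * (1 - item_variances / total_variance)

df = pd.read_csv("survey.csv")                                               # hypothetical data export
print(round(cronbach_alpha(df[["trust1", "trust2", "trust3"]].dropna()), 3))  # aim for > 0.7
```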

Formative vs. Reflective

 LESSON: Variables and Factor Analysis, including Specification


Specifying formative versus reflective constructs is a critical preliminary step prior to further
statistical analysis. Formative constructs should not be expected to properly factor in SPSS, and
cannot be modeled appropriately in AMOS. If you need to work with formative factors, either use
a Partial Least Squares approach (see PLS section), or create a score (new variable) for each set
of formative indicators. This score could be an average or a sum, or some sort of weighted
scoring. Here is how you know whether you're working with formative or reflective constructs:
Formative

 Direction of causality is from measure to construct


 No reason to expect the measures are correlated
 Indicators are not interchangeable
Reflective

 Direction of causality is from construct to measure


 Measures expected to be correlated
 Indicators are interchangeable
An example of formative versus reflective constructs is given in the figure below.
Common EFA Problems
1. EFA that results in too many or too few factors (contrary to expected number of factors).

 This happens all the time when you extract based on eigenvalues. I encourage students to
use eigenvalues first, but then also to try constraining to the exact number of expected
factors. Concerns arise when the eigenvalues extract fewer than expected, so constraining
ends up extracting factors with very low eigenvalues (and therefore not very useful factors).
2. EFA with low communalities for some items.

 This is a sign of low correlation and is usually corroborated by a low pattern matrix loading. I
tell students not to remove an item just because of a low communality, but to watch it
carefully throughout the rest of the EFA.
3. EFA with a 2nd order construct involved, as well as several first order constructs.

 Often when there is a 2nd order factor in an EFA, the subdimensions of that factor will all load
together, instead of in separate factors. In such cases, I recommend doing a separate EFA
for the items of that 2nd order factor. Then, if that EFA results in removing some items to
achieve discriminant validity, you can try putting the EFA back together with the remaining
items (although it still might not work). Then, during the CFA, be sure to properly model the
2nd order factor with an additional latent variable connected to its sub-factors.
4. EFA with Heywood cases

 Sometimes loadings are greater than 1.00. I don’t address these until I’ve addressed all other
problems. Once I have a good EFA solution, if the Heywood case is still there (usually it resolves itself), I try a different rotation method (Varimax will fix it every time).

Some Thoughts on Messy EFAs


Let us say that you are doing an EFA and your pattern matrix ends up a mess. Let’s say that the
items from one or two constructs do not load as expected no matter how you manipulate the EFA.
What can you do about it? There is no right answer (this is statistics after all), but you do have a
few options:
1. You can remove those constructs from the model and move forward without them.

 This option is not recommended as it is usually the last course of action to take. You should
always do everything in your power to retain constructs that are key to your theory.
2. You can run the EFA using a more exploratory approach without regard to expected loadings.
For example, if you expected item foo3 to load with items foo1 and foo2, but instead it loaded with
items moo1-3, then you should just let it. Then rename your factors according to what loaded on
them.

 This option is acceptable, but will lead you to produce a model that is probably somewhat
different from the one you had expected to end up with.
3. You can say to yourself, “Why am I doing an EFA? These are established scales and I already
know which items belong to which constructs (theoretically). I do not need to explore the
relationships between the items because I already know the relationships. So shouldn’t I be doing
a CFA instead – to confirm these expectations?” And then you would simply jump to the CFA first
to refine your measurement model (but then you return to your EFA after your CFA).

 Surveys are usually built with a priori constructs and theory in mind – or surveys are built
from existing scales that have been validated in previous literature. Thus, we are less inclined
to “explore” and more inclined to “confirm” when doing factor analysis. The point of a factor
analysis is to show that you have distinct constructs (discriminant validity) that each
measures a single thing (convergent validity), and that are reliable (reliability). This can all be
achieved in the CFA. However, you should then go back to the EFA and "confirm" the
CFA in the EFA by setting up the EFA as your CFA turned out.
Why do I bring this up? Mainly because your EFAs are nearly always going to run messy, and
because you can endlessly mess around with an EFA, and if you believe everything your EFA is telling you, you will end up throwing away items and constructs unnecessarily. Thus you will end up letting statistics drive your theory, instead of letting theory drive your statistics. EFAs are exploratory and can be treated as such. We want to retain as much as possible while still producing valid results. I don’t know if this is emphasized enough in our quant courses. I also
bring this up because I ran an EFA recently and got something I could not salvage without
hacking a couple constructs. However, after running the CFA with the full model (ignoring the
EFA), I was able to retain all constructs by only removing a few items (and not the ones I
expected based on the EFA!). I now have excellent reliability, convergent validity, and only a
minor issue with discriminant validity that I’m willing to justify for the greater good of the model. I
can now go back and reconcile my CFA with an EFA.
For a very rocky but successful demonstration of handling a troublesome EFA, watch my SEM
Boot Camp 2014 Day 3 Afternoon Video towards the end. The link below will start you at the right
time position. In this video, I take one of the seminar participant's data, which I had never seen
before, and with which he had been unable to arrive at a clean EFA, and I struggle through it until
we arrive at something valid and usable.

 VIDEO TUTORIAL: Tackling a Difficult EFA: http://youtu.be/XYHrmDs68Bg?t=1h22m55s
Confirmatory Factor Analysis
Confirmatory Factor Analysis (CFA) is the next step after exploratory factor analysis to determine
the factor structure of your dataset. In the EFA we explore the factor structure (how the variables
relate and group based on inter-variable correlations); in the CFA we confirm the factor structure
we extracted in the EFA.

 LESSON: Confirmatory Factor Analysis


 VIDEO TUTORIAL: CFA part 1
 VIDEO TUTORIAL: CFA part 2

Do you know of some citations that could be used to support the topics and procedures
discussed in this section? Please email them to me with the name of the section, procedure, or
subsection that they support. Thanks!

Contents

 1 Model Fit
o 1.1 Metrics
o 1.2 Modification indices
o 1.3 Standardized Residual Covariances
 2 Validity and Reliability
 3 Common Method Bias (CMB)
o 3.1 Harman’s single factor test
o 3.2 Common Latent Factor
o 3.3 Marker Variable
o 3.4 Zero and Equal Constraints
 4 Measurement Model Invariance
o 4.1 Configural
o 4.2 Metric
o 4.3 Contingency Plans
 5 2nd Order Factors
 6 Common CFA Problems

Model Fit

 VIDEO TUTORIAL: Handling Model Fit


Model fit refers to how well our proposed model (in this case, the model of the factor structure)
accounts for the correlations between variables in the dataset. If we are accounting for all the
major correlations inherent in the dataset (with regards to the variables in our model), then we will
have good fit; if not, then there is a significant "discrepancy" between the correlations proposed
and the correlations observed, and thus we have poor model fit. Our proposed model does not
"fit" the observed or "estimated" model (i.e., the correlations in the dataset). Refer to the CFA
video tutorial for specifics on how to go about performing a model fit analysis during the CFA.
Metrics
There are specific measures that can be calculated to determine goodness of fit. The metrics that
ought to be reported are listed below, along with their acceptable thresholds. Goodness of fit is
inversely related to sample size and the number of variables in the model. Thus, the thresholds
below are simply a guideline. For more contextualized thresholds, see Table 12-4 in Hair et al.
2010 on page 654. The thresholds listed in the table below are from Hu and Bentler (1999).

Modification indices
Modification indices offer suggested remedies to discrepancies between the proposed and
estimated model. In a CFA, there is not much we can do by way of adding regression lines to fix
model fit, as all regression lines between latent and observed variables are already in place.
Therefore, in a CFA, we look to the modification indices for the covariances. Generally, we should
not covary error terms with observed or latent variables, or with other error terms that are not part
of the same factor. Thus, the most appropriate modification available to us is to covary error
terms that are part of the same factor. The figure below illustrates this guideline - however, there
are exceptions. In general, you want to address the largest modification indices before
addressing more minor ones. For more information on when it is okay to covary error terms
(because there are other appropriate reasons), refer to David Kenny's thoughts on the
matter: David's website
Standardized Residual Covariances
Standardized Residual Covariances (SRCs) are much like modification indices; they point out
where the discrepancies are between the proposed and estimated models. However, they also
indicate whether or not those discrepancies are significant. A significant standardized residual
covariance is one with an absolute value greater than 2.58. Significant residual covariances
significantly decrease your model fit. Fixing model fit per the residuals matrix is similar to fixing
model fit per the modification indices. The same rules apply. For a more specific run-down of how
to calculate and locate residuals, refer to the CFA video tutorial. It should be noted however, that
in practice, I never address SRCs unless I cannot achieve adequate fit via modification indices,
because addressing the SRCs requires the removal of items.

Validity and Reliability

 VIDEO TUTORIAL: Testing Validity and Reliability in a CFA


It is absolutely necessary to establish convergent and discriminant validity, as well as reliability,
when doing a CFA. If your factors do not demonstrate adequate validity and reliability, moving on
to test a causal model will be useless - garbage in, garbage out! There are a few measures that
are useful for establishing validity and reliability: Composite Reliability (CR), Average Variance
Extracted (AVE), Maximum Shared Variance (MSV), and Average Shared Variance (ASV). The
video tutorial will show you how to calculate these values. The thresholds for these values are as
follows:
Reliability

 CR > 0.7
Convergent Validity

 AVE > 0.5


Discriminant Validity

 MSV < AVE


 Square root of AVE greater than inter-construct correlations
If you have convergent validity issues, then your variables do not correlate well with each other
within their parent factor; i.e., the latent factor is not well explained by its observed variables. If
you have discriminant validity issues, then your variables correlate more highly with variables
outside their parent factor than with the variables within their parent factor; i.e., the latent factor is
better explained by some other variables (from a different factor), than by its own observed
variables.
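The Stats Tools Package computes these values for you, but if you want to sanity-check the numbers, here is a sketch of the standard CR and AVE formulas applied to one factor's standardized loadings (the loadings below are made up):

```python
import numpy as np

def cr_and_ave(loadings):
    """Composite Reliability and AVE from one factor's standardized loadings."""
    lam = np.asarray(loadings)
    cr = lam.sum() ** 2 / (lam.sum() ** 2 + (1 - lam ** 2).sum())
    ave = (lam ** 2).mean()
    return cr, ave

cr, ave = cr_and_ave([0.82, 0.77, 0.85, 0.71])          # hypothetical loadings for one factor
print(f"CR = {cr:.3f} (want > 0.7), AVE = {ave:.3f} (want > 0.5)")
```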
If you need to cite these suggested thresholds, please use the following:

 Hair, J., Black, W., Babin, B., and Anderson, R. (2010). Multivariate data analysis (7th ed.):
Prentice-Hall, Inc. Upper Saddle River, NJ, USA.
AVE is a strict measure of convergent validity. Malhotra and Dash (2011) note that "AVE is a
more conservative measure than CR. On the basis of CR alone, the researcher may conclude
that the convergent validity of the construct is adequate, even though more than 50% of the
variance is due to error.” (Malhotra and Dash, 2011, p.702).

 Malhotra, N. K., & Dash, S. (2011). Marketing Research: An Applied Orientation. London: Pearson Publishing.
Here is an updated video that uses the most recent Stats Tools Package, which includes a more
accurate measure of AVE and CR.

 VIDEO TUTORIAL: SEM Series (2016) 5. Confirmatory Factor Analysis Part 2

Common Method Bias (CMB)

 VIDEO TUTORIAL: Zero-constraint approach to CMB


 REF: Podsakoff, P.M., MacKenzie, S.B., Lee, J.Y., and Podsakoff, N.P. "Common method
biases in behavioral research: a critical review of the literature and recommended
remedies," Journal of Applied Psychology (88:5) 2003, p 879.
Common method bias refers to a bias in your dataset due to something external to the measures.
Something external to the question may have influenced the response given. For example,
collecting data using a single (common) method, such as an online survey, may introduce
systematic response bias that will either inflate or deflate responses. A study that has significant
common method bias is one in which a majority of the variance can be explained by a single
factor. To test for a common method bias you can do a few different tests. Each will be described
below. For a step by step guide, refer to the video tutorials.
Harman’s single factor test
 It should be noted that the Harman's single factor test is no longer widely accepted and is
considered an outdated and inferior approach.
Harman's single factor test checks whether the majority of the variance can be explained by a
single factor. To do this, constrain the number of factors extracted in your EFA to be just one
(rather than extracting via eigenvalues). Then examine the unrotated solution. If CMB is an issue,
a single factor will account for the majority of the variance in the model (as in the figure below).
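If you want to run this single-factor check outside SPSS, the factor_analyzer package can do it. A sketch with hypothetical item columns (and remember this test is considered outdated):

```python
import pandas as pd
from factor_analyzer import FactorAnalyzer

df = pd.read_csv("survey.csv")                          # hypothetical data export
items = df.filter(regex="^(trust|loyalty)").dropna()

fa = FactorAnalyzer(n_factors=1, rotation=None)         # force a single unrotated factor
fa.fit(items)
_, proportion, _ = fa.get_factor_variance()             # (SS loadings, proportion, cumulative)
print(f"Variance explained by one factor: {proportion[0]:.1%}")   # a majority suggests a CMB concern
```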

Common Latent Factor


This method uses a common latent factor (CLF) to capture the common variance among all
observed variables in the model. To do this, simply add a latent factor to your AMOS CFA model
(as in the figure below), and then connect it to all observed items in the model. Then compare the
standardized regression weights from this model to the standardized regression weights of a model without the CLF. If there are large differences (like greater than 0.200), then you will want to retain the CLF as you either impute composites from factor scores or as you move into the structural model. The CLF video tutorial demonstrates how to do this.

Marker Variable
This method is an extended, and more accurate, version of the common latent factor
method. For this method, just add another latent factor to the model (as in the figure below), but
make sure it is something that you would not expect to correlate with the other latent factors in
the model (i.e., the observed variables for this new factor should have low, or no, correlation with
the observed variables from the other factors). Then add the common latent factor. This method
teases out truer common variance than the basic common latent factor method because it is
finding the common variance between unrelated latent factors. Thus, any common variance is
likely due to a common method bias, rather than natural correlations. This method is
demonstrated in the common method bias video tutorial.
Zero and Equal Constraints
The most current and best approach is outlined below.

1. Do an EFA, and make sure to include “marker” or specific bias (SB) constructs.
1. Specific bias constructs are just like any other multi-item constructs but measure
specific sources of bias that may account for shared variance not due to a causal
relationship between key variables in the study. A common one is Social
Desirability Bias.
2. Do the CFA with SB constructs covaried to other constructs (this looks like a normal
CFA).
1. Assess and adjust to achieve adequate goodness of fit
2. Assess and adjust to achieve adequate validity and reliability
3. Then conduct the CFA with the SB constructs shown to influence ALL indicators of other
constructs in the study. Do not correlate the SB constructs with the other constructs of
study. If there is more than one SB construct, they follow the same approach and can
correlate with each other.
1. Retest validity, but be willing to accept lower thresholds
2. If change in AVE is extreme (e.g., >.300) then there is too much shared variance
attributable to a response variable. This means that variable is compromised and
any subsequent analysis with it may be biased.
3. If the majority of factors have extreme changes to their AVE, you might consider
rethinking your data collection instrument and how to reduce specific response
biases.
4. If the validities are still sufficient, then conduct the zero-constrained test. This test
determines whether the response bias is any different from zero.
1. To do this, constrain all paths from the SB constructs to all indicators (but do not
constrain their own) to zero. Then conduct a chi-square difference test between
the constrained and unconstrained models.
2. If the null hypothesis cannot be rejected (i.e., the constrained and unconstrained
models are the same or "invariant"), you have demonstrated that you were
unable to detect any specific response bias affecting your model. You can move
on to causal modeling, but make sure to retain the SB construct(s) to include as
control in the causal model.
3. If you changed your model while testing for specific bias, you should retest
validities and model fit with this final (unconstrained) measurement model, as it
may have changed.
5. If the zero-constrained chi-square difference test resulted in a significant result (i.e., reject
null, i.e., response bias is not zero), then you should run an equal-constrained test. This
test determines whether the response bias is evenly distributed across factors.
1. To do this, constrain all paths from the SB construct to all indicators (not including
their own) to be equal. There are multiple ways to do this. One easy way is
simply to name them all the same thing (e.g., "aaa").
2. If the chi-square difference test between the constrained (to be equal) and
unconstrained models indicates invariance (i.e., fail to reject null - that they are
equal), then the bias is equally distributed. Make note of this in your report. e.g.,
"A test of equal specific bias demonstrated evenly distributed bias."
3. Move on to causal modeling with the SB constructs retained (keep them).
4. If the chi-square test is significant (i.e., unevenly distributed bias), which is more
common, you should still retain the SB construct for subsequent causal
analyses. Make note of this in your report. e.g., "A test of equal specific bias
demonstrated unevenly distributed bias."

Measurement Model Invariance

 VIDEO TUTORIAL: Measurement Model Invariance


Before creating composite variables for a path analysis, configural and metric invariance should
be tested during the CFA to validate that the factor structure and loadings are sufficiently
equivalent across groups, otherwise your composite variables will not be very useful (because
they are not actually measuring the same underlying latent construct for both groups).
Configural
Configural invariance tests whether the factor structure represented in your CFA achieves
adequate fit when both groups are tested together and freely (i.e., without any cross-group path
constraints). To do this, simply build your measurement model as usual, create two groups in
AMOS (e.g., male and female), and then split the data along gender. Next, attend to model fit as
usual (here’s a reminder: Model Fit). If the resultant model achieves good fit, then you have
configural invariance. If you don’t pass the configural invariance test, then you may need to look
at the modification indices to improve your model fit or to see how to restructure your CFA.
Metric
If we pass the test of configural invariance, then we need to test for metric invariance. To test for
metric invariance, simply perform a chi-square difference test on the two groups just as you would
for a structural model. The evaluation is the same as in the structural model invariance test: if you
have a significant p-value for the chi-square difference test, then you have evidence of
differences between groups, otherwise, they are invariant and you may proceed to make your
composites from this measurement model (but make sure you use the whole dataset when you
create composites, instead of using the split dataset).
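The chi-square difference test itself is just arithmetic on the fit statistics AMOS reports for the two models. Here is a sketch with made-up numbers:

```python
from scipy import stats

# Hypothetical chi-square and df values read from the AMOS output
chi2_constrained, df_constrained = 512.4, 230     # loadings constrained equal across groups
chi2_free, df_free = 498.1, 224                   # unconstrained (configural) model

delta_chi2 = chi2_constrained - chi2_free
delta_df = df_constrained - df_free
p = stats.chi2.sf(delta_chi2, delta_df)
print(f"delta chi-square = {delta_chi2:.1f}, delta df = {delta_df}, p = {p:.3f}")
# p < .05 -> the groups differ (metric invariance not supported); otherwise they are invariant
```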
An even simpler and less time-consuming approach to metric invariance is to conduct a
multigroup moderation test using critical ratios for differences in AMOS. Below is a video to
explain how to do this. The video is about a lot of things in the CFA, but the link below will start
you at the time point for testing metric invariance with critical ratios.

 VIDEO TUTORIAL: Metric Invariance


Contingency Plans
If you do not achieve invariant models, here are some appropriate approaches in the order I
would attempt them.

 1. Modification indices: Fit the model for each group using the unconstrained measurement
model. You can toggle between groups when looking at modification indices. So, for
example, for males, there might be a high MI for the covariance between e1 and e2, but for
females this might not be the case. Go ahead and add those covariances appropriately for
both groups. When you add a covariance to the model, AMOS adds it for both groups, even if you only needed it for one of them. If fitting the model this way does not solve your invariance
issues, then you will need to look at differences in regression weights.
 2. Regression weights: You need to figure out which item or items are causing the trouble
(i.e., which ones do not measure the same across groups). The cause of the lack of
invariance is most likely due to one of two things: the strength of the loading for one or more
items differs significantly across groups, or, an item or two load better on a factor other than
their own for one or more groups. To address the first issue, just look at the standardized
regression weights for each group to see if there are any major differences (just eyeball it). If
you find a regression weight that is exceptionally different (for example, item2 on Factor 3
has a loading of 0.34 for males and 0.88 for females), then you may need to remove that item
if possible. Retest and see if invariance issues are solved. If not, try addressing the second
issue (explained next).
 3. Standardized Residual Covariances: To address the second issue, you need to analyze
the standardized residual covariances (check the residual moments box in the output tab). I
talk about this a little bit in my video called “Model fit during a Confirmatory Factor Analysis
(CFA) in AMOS” around the 8:35 mark. This matrix can also be toggled between groups.
Here is a small example for CSRs and BCRs. We observe that for the BCR group rd3 and q5
have high standardized residual covariances with sw1. So, we could remove sw1 and see if
that fixes things, but SW only has three items right now, so another option is to remove rd3 or
q5 and see if that fixes things, and if not, then return to this matrix after rerunning things, and
see if there are any other issues. Remove items sparingly, and only one at a time, trying your
best to leave at least three items with each factor, although two items will also sometimes
work if necessary (two just becomes unstable). If you still have issues, then your groups are
exceptionally different… This may be due to small sample size for one of the groups. If such
is the case, then you may have to list that as a limitation and just move on.

2nd Order Factors

 VIDEO TUTORIAL: Handling 2nd Order Factors


Handling 2nd order factors in AMOS is not difficult, but it is tricky. And, if you don't get it right, it
won't run. The pictures below offer a simple example of how you would model a 2nd order factor
in a measurement model and in a structural model. The YouTube video tutorial above
demonstrates how to handle 2nd order factors, and explains how to report them.
Common CFA Problems
1. CFA that reaches iteration limit.

 Here is a video: VIDEO TUTORIAL: Iteration limit reached in AMOS


2. CFA that shows CMB = 0 (sometimes happens when paths from CLF are constrained to be
equal)

 The best approach to CMB is just to not constrain them to be equal. Instead, it is best to do a
chi-square difference test between the unconstrained model (with CLF and marker if
available) and the same model but with all paths from the CLF constrained to zero. This tells
us whether the common variance is different from zero.
3. CFA with negative error variances

 This shouldn’t happen if all data screening and EFA worked out well, but it still happens… In
such cases, it is permitted to constrain the error variance to a small positive number (e.g.,
0.001)
4. CFA with negative error covariances (sometimes shows up as “not positive definite”)
 In such cases, there is usually a measurement issue deeper down (like skewness or kurtosis,
or too much missing data, or a variable that is nominal). If it cannot be fixed by addressing
these deeper down issues, then you might be able to correct it by moving the latent variable
path constraint (usually 1) to another path. Usually this issue accompanies the negative error
variance, so we can usually fix it by fixing the negative error variance first.
5. CFA with Heywood cases

 This often happens when we have only two items for a latent variable, and one of them is
very dominant. First try moving the latent variable path constraint to a different path. If this
doesn’t work then, move the path constraint up to the latent variable variance constraint AND
constrain the paths to be equal (by naming them both the same thing, like “aaa”).

 VIDEO TUTORIAL: AMOS Heywood Case


6. CFA with discriminant validity issues

 This shouldn’t happen if the EFA solution was satisfactory. However, it still happens
sometimes when two latent factors are strongly correlated. This strong correlation is a sign of
overlapping traits. For example, confidence and self-efficacy. These two traits are too similar.
Either one could be dropped, or you could create a 2nd order factor out of them:

 VIDEO TUTORIAL: Handling 2nd Order Factors


7. CFA with “missing constraint” error

 Sometimes the CFA will say you need to impose 1 additional constraint (sometimes it says
more than this). This is usually caused by drawing the model incorrectly. Check to see if all
latent variables have a single path constrained to 1 (or the latent variable variance
constrained to 1).
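The chi-square difference test described in problem 2 above only needs the chi-square and degrees of freedom reported by the two AMOS runs. Here is a minimal sketch in Python of the significance calculation; the fit statistics shown are hypothetical placeholders, not values from any real model:

```python
from scipy.stats import chi2

# Hypothetical fit statistics copied from the two AMOS runs (placeholders only)
chi2_unconstrained, df_unconstrained = 845.2, 512  # CLF (and marker) paths freely estimated
chi2_zero_clf, df_zero_clf = 903.7, 538            # all CLF paths constrained to zero

delta_chi2 = chi2_zero_clf - chi2_unconstrained
delta_df = df_zero_clf - df_unconstrained
p_value = chi2.sf(delta_chi2, delta_df)            # upper-tail probability of the chi-square difference

print(f"delta chi-square = {delta_chi2:.1f}, delta df = {delta_df}, p = {p_value:.4f}")
# A significant result (p < .05) suggests the shared variance captured by the CLF
# differs from zero, i.e., common method bias may be a concern.
```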
Structural Equation Modeling
“Structural equation modeling (SEM) grows out of and serves purposes similar to multiple
regression, but in a more powerful way which takes into account the modeling of interactions,
nonlinearities, correlated independents, measurement error, correlated error terms, multiple latent
independents each measured by multiple indicators, and one or more latent dependents also
each with multiple indicators. SEM may be used as a more powerful alternative to multiple
regression, path analysis, factor analysis, time series analysis, and analysis of covariance. That
is, these procedures may be seen as special cases of SEM, or, to put it another way, SEM is an
extension of the general linear model (GLM) of which multiple regression is a
part.” http://www.pire.org/
SEM is an umbrella concept for analyses such as mediation and moderation. This wiki page
provides general instruction and guidance regarding how to write hypotheses for different types of
SEMs, what to do with control variables, mediation, interaction, multi-group analyses, and model
fit for structural models. Videos and slide presentations are provided in the subsections.

Do you know of some citations that could be used to support the topics and procedures
discussed in this section? Please email them to me with the name of the section, procedure, or
subsection that they support. Thanks!

Contents

• 1 Hypotheses
  • 1.1 Direct effects
  • 1.2 Mediated effects
  • 1.3 Interaction effects
  • 1.4 Multi-group effects
  • 1.5 Mediated Moderation
  • 1.6 Handling controls
  • 1.7 Logical Support for Hypotheses
  • 1.8 Statistical Support for Hypotheses through global and local tests
• 2 Controls
• 3 Mediation
  • 3.1 Concept
• 4 Interaction
  • 4.1 Concept
  • 4.2 Types
• 5 Model fit again
• 6 Multi-group
• 7 From Measurement Model to Structural Model
• 8 Creating Factor Scores from Latent Factors
• 9 Need more degrees of freedom

Hypotheses
Hypotheses are a keystone to causal theory. However, wording hypotheses is clearly a struggle
for many researchers (just select at random any article from a good academic journal, and count
the wording issues!). In this section I offer examples of how you might word different types of
hypotheses. These examples are not exhaustive, but they are safe.
Direct effects
"Diet has a positive effect on weight loss"
"An increase in hours spent watching television will negatively effect weight loss"
Mediated effects
For mediated effects, be sure to indicate the direction of the mediation (positive or negative), the
degree of the mediation (partial, full, or simply indirect), and the direction of the mediated
relationship (positive or negative).
"Exercise positively and partially mediates the positive relationship between diet and weight loss"
"Television time positively and fully mediates the positive relationship between diet and weight
loss"
"Diet affects weight loss positively and indirectly through exercise"
Interaction effects
"Exercise positively moderates the positive relationship between diet and weight loss"
"Exercise amplifies the positive relationship between diet and weight loss"
"TV time negatively moderates (dampens) the positive relationship between diet and weight loss"
Multi-group effects
"Body Mass Index (BMI) moderates the relationship between exercise and weight loss, such that
for those with a low BMI, the effect is negative (i.e., you gain weight - muscle mass), and for
those with a high BMI, the effect is positive (i.e., exercising leads to weight loss)"
"Age moderates the relationship between exercise and weight loss, such that for age < 40, the
positive effect is stronger than for age > 40"
"Diet moderates the relationship between exercise and weight loss, such that for western diets
the effect is positive and weak, for eastern (asia) diets, the effect is positive and strong"
Mediated Moderation
An example of a mediated moderation hypothesis would be something like:
“Ethical concerns strengthen the negative indirect effect (through burnout) between customer
rejection and job satisfaction.”
In this case, the IV is customer rejection, the DV is job satisfaction, burnout is the mediator, and
the moderator is ethical concerns. The moderation is conducted through an interaction. However,
if you have a categorical moderator, it would be something more like this (using gender as the
moderator):
“The negative indirect effect between customer rejection and job satisfaction (through burnout) is
stronger for men than for women.”
Handling controls
When including controls in hypotheses (yes, you should include them), simply add at the end of
any hypothesis, "when controlling for...[list control variables here]". For example:
"Exercise positively moderates the positive relationship between diet and weight loss when controlling for TV time and age"
"Diet has a positive effect on weight loss when controlling for TV time and age"
Another approach is to state somewhere above your hypotheses (while you're setting up your
theory) that all your hypotheses take into account the effects of the following controls: A, B, and
C. And then make sure to explain why.
Logical Support for Hypotheses
Getting the wording right is only part of the battle, and is mostly useless if you cannot support
your reasoning for WHY you think the relationships proposed in the hypotheses should exist.
Simply saying X has a positive effect on Y is not sufficient to make a causal statement. You must
then go on to explain the various reasons behind your hypothesized relationship. Take Diet and
Weight loss for example. The hypothesis is, "Diet has a positive effect on weight loss". The
supporting logic would then be something like:

 Weight is gained as we consume calories. Diet reduces the number of calories consumed.
Therefore, the more we diet, the more weight we should lose (or the less weight we should
gain).
Statistical Support for Hypotheses through global and local tests
In order for a hypothesis to be supported, many criteria must be met. These criteria can be classified as global or local tests. The local test must be met for the hypothesis to be supported, but the local test only has meaning if all global tests are met. Global tests of
model fit are the first necessity. If a hypothesized relationship has a significant p-value, but the
model has poor fit, we cannot have confidence in that p-value. Next is the global test of variance
explained or R-squared. We might observe significant p-values and good model fit, but if R-
square is only 0.025, then the relationships we are testing are not very meaningful because they
do not explain sufficient variance in the dependent variable. The figure below illustrates the
precedence of global and local tests. Lastly, and almost needless to explain, if a regression
weight is significant, but is in the wrong direction, our hypothesis is not supported. Instead, there
is counter-evidence. For example, if we theorized that exercise would increase weight loss, but
instead, exercise decreased weight loss, then we would have counter-evidence.

Controls
• LESSON: Controls
Controls are potentially confounding variables that we need to account for, but that don’t drive our
theory. For example, in Dietz and Gortmaker 1985, their theory was that TV time had a negative
effect on school performance. But there are many things that could affect school performance,
possibly even more than the amount of time spent in front of the TV. So, in order to account for
these other potentially confounding variables, the authors control for them. They are basically
saying that regardless of IQ, time spent reading for pleasure, hours spent doing homework, or
the amount of time parents spend reading to their child, an increase in TV time still significantly
decreases school performance. These relationships are shown in the figure below.

As a cautionary note, you should nearly always include some controls; however, these control
variables still count against your sample size calculations. So, the more controls you have, the
higher your sample size needs to be. Also, each added control raises the R square, but with increasingly smaller gains. Sometimes you may even find that adding a control “drowns out” all the effects of the IVs; in such a case, you may need to run your tests without that control variable (but then you can only say that your IVs, though significant, only account for a small amount of the variance in the DV). With that in mind, you can’t and shouldn’t control for everything, and as
always, your decision to include or exclude controls should be based on theory.
Handling controls in AMOS is easy, but messy (see the figure below). You simply treat them like
the other exogenous variables (the ones that don’t have arrows going into them), and have them
regress on whichever endogenous variables they may logically affect. In this case, I have
valShort, a potentially confounding variable, as a control, with regards to valLong. And I have
LoyRepeat as a control on LoyLong. I’ve also covaried the Controls with each other and with the
other exogenous variables. When using controls in a moderated mediation analysis, go ahead
and put the controls in at the very beginning. Covarying control variables with the other
exogenous variables can be done based on theory, rather than by default. However, there are
different schools of thought on this. The downside of covarying with all exogenous variables is
that you gain no degrees of freedom. If you are in need of degrees of freedom, then try removing
the non-significant covariances with controls.
When reporting the model, you do need to include the controls in all your tests and output, but
you should consolidate them at the bottom where they can be out of the way. Also, just so you
don’t get any crazy ideas, you would not test for any mediation between a control and a
dependent variable. However, you may report how the control affects a dependent variable differently based on a moderating variable. For example, valShort may have a stronger effect on
valLong for males than for females. This is something that should be reported, but not necessarily
focused on, as it is not likely a key part of your theory. Lastly, even if effects from controls are not
significant, you do not need to trim them from your model (although there are also other schools of
thought on this issue).

Mediation

• Lesson: Testing Mediation using Bootstrapping
• Video Lecture: A Simpler Guide to Mediation
• VIDEO TUTORIAL: Mediation in AMOS
• Hair et al.: pp. 751-755
Concept
Mediation models are used to describe chains of causation. Mediation is often used to provide a
more accurate explanation for the causal effect the antecedent has on the dependent variable.
The mediator is usually the variable that is the missing link in a chain of causation. For example, intelligence leads to increased performance, but not in all cases, as not all intelligent people are high performers. Thus, some other variable is needed to explain the reason for the inconsistent relationship between IV and DV. This other variable is called a mediator. In this example, work effectiveness may be a good mediator. We would say that work effectiveness mediates the
relationship between intelligence and performance. Thus, the direct relationship between
intelligence and performance is better explained through the mediator of work effectiveness. The
logic is, intelligent workers tend to perform better because they work more efficiently. Thus, when
intelligence leads to working smarter, then we observe greater performance.
We used to theorize three main types of mediation based on the Baron and Kenny approach; namely: 1) partial, 2) full, and 3) indirect. However, recent literature suggests that mediation is less nuanced than this: simply, if a significant indirect effect exists, then mediation is present.
Here is another useful site for mediation: https://msu.edu/~falkcarl/mediation.html
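If it helps to see the logic of the bootstrap outside of AMOS, here is a minimal sketch in Python for a simple one-mediator model. The variable names (x, m, y) are hypothetical numpy arrays; AMOS's built-in bootstrapping of indirect effects is the approach recommended in the materials above.

```python
import numpy as np

def bootstrap_indirect(x, m, y, n_boot=5000, seed=1):
    """Percentile bootstrap CI for the indirect effect a*b in x -> m -> y (numpy arrays)."""
    rng = np.random.default_rng(seed)
    n = len(x)
    estimates = []
    for _ in range(n_boot):
        idx = rng.integers(0, n, n)                   # resample respondents with replacement
        xb, mb, yb = x[idx], m[idx], y[idx]
        a = np.polyfit(xb, mb, 1)[0]                  # slope of m regressed on x
        X = np.column_stack([np.ones(n), mb, xb])     # y regressed on m, controlling for x
        b = np.linalg.lstsq(X, yb, rcond=None)[0][1]
        estimates.append(a * b)
    return np.percentile(estimates, [2.5, 97.5])      # 95% confidence interval for a*b
```

If the resulting interval excludes zero, the indirect effect is significant, which (per the view above) is evidence of mediation.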

Interaction

• VIDEO TUTORIAL: Testing Interaction Effects
• LESSON: Interaction Effects
Concept
In factorial designs, interaction effects are the joint effects of two predictor variables in addition to
the individual main effects. This is another form of moderation (along with multi-grouping) – i.e.,
the X to Y relationship changes form (gets stronger, weaker, changes signs) depending on the
value of another explanatory variable (the moderator). So, for example

• you lose 1 pound of weight for every hour you exercise
• you lose 1 pound of weight for every 500 calories you cut back from your regular diet
• but when you exercise while dieting, you lose 2 pounds for every 500 calories you cut back from your regular diet, in addition to the 1 pound you lose for exercising for one hour; thus, in total, you lose three pounds
So, the multiplicative effect of exercising while dieting is greater than the additive effects of doing one or the other. Here is another simple example:

• Chocolate is yummy
• Cheese is yummy
• but combining chocolate and cheese is yucky!
The following figure is an example of a simple interaction model.
Types
Interactions enable more precise explanation of causal effects by providing a method for
explaining not only how X affects Y, but also under what circumstances the effect of X changes
depending on the moderating variable of Z. Interpreting interactions is somewhat tricky.
Interactions should be plotted (as demonstrated in the tutorial video). Once plotted, the
interpretation can be made using the following four examples (in the figures below) as a guide.
My most recent Stats Tools Package provides these interpretations automatically.
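For readers curious about the mechanics behind those plots, here is a minimal sketch of the product-term approach in Python, assuming x is the predictor, z the moderator, and y the outcome (all hypothetical arrays). The mean-centering and the ±1 SD probing mirror what the Stats Tools Package does for you:

```python
import numpy as np

def simple_slopes(x, z, y):
    """Fit y = b0 + b1*x + b2*z + b3*x*z and probe the slope of x at low/high z."""
    x_c, z_c = x - x.mean(), z - z.mean()            # mean-center the constituent variables
    X = np.column_stack([np.ones(len(x_c)), x_c, z_c, x_c * z_c])
    b = np.linalg.lstsq(X, y, rcond=None)[0]         # b[3] is the interaction effect
    for label, z_val in [("-1 SD of moderator", -z_c.std()), ("+1 SD of moderator", z_c.std())]:
        print(f"slope of x at {label}: {b[1] + b[3] * z_val:.3f}")
    return b
```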

Model fit again


You already did model fit in your CFA, but you need to do it again in your structural model in
order to demonstrate sufficient exploration of alternative models. Every time the model changes
and a hypothesis is tested, model fit must be assessed. If multiple hypotheses are tested on the
same model, model fit will not change, so it only needs to be addressed once for that set of
hypotheses. The method for assessing model fit in a causal model is the same as for a
measurement model: look at modification indices, residuals, and standard fit measures like CFI,
RMSEA, etc. The one thing that should be noted here in particular, however, is the logic that should determine how you apply the modification indices to error terms.

 If the correlated variables are not logically causally correlated, but merely statistically
correlated, then you may covary the error terms in order to account for the systematic
statistical correlations without implying a causal relationship.
 e.g., burnout from customers is highly correlated with burnout from management
 We expect these to have similar values (residuals) because they are logically similar and
have similar wording in our survey, but they do not necessarily have any causal ties.
 If the correlated variables are logically causally correlated, then simply add a regression line.
 e.g., burnout from customers is highly correlated with satisfaction with customers
 We expect burnC to predict satC, so not accounting for it is negligent.
Lastly, remember, you don't need to create the BEST fit, just good fit. If a BEST fit model (i.e.,
one in which all modification indices are addressed) isn't logical, or does not fit with your theory,
you may need to simply settle for a model that has worse (yet sufficient) fit, and then explain why
you did not choose the better fitting model. For more information on when it is okay to covary
error terms (because there are other appropriate reasons), refer to David Kenny's thoughts on the
matter: David's website

Multi-group

• VIDEO TUTORIAL: Testing Multi-group Moderation using Chi-square difference test
• VIDEO TUTORIAL: Testing Multi-group differences using AMOS's multigroup function
• LESSON: Mediation versus Moderation
Multi-group comparisons are a special form of moderation in which a dataset is split along values
of a grouping variable (such as gender), and then a given model is tested with each set of data.
Using the gender example, the model is tested for males and females separately. Multi-group comparisons are used to determine whether relationships hypothesized in a model differ based on the value of the moderator (e.g., gender). Take the diet and weight loss hypothesis for example. A multi-group analysis would answer the question: does dieting affect weight loss differently for males than for females? In the videos above, you will learn how to set up a multigroup analysis in AMOS and test it using chi-square differences and AMOS's built-in multigroup function. For those who have seen my video on the critical ratios approach, be warned that currently the chi-square approach is the most widely accepted, because the critical ratios approach doesn't take into account family-wise error, which affects a model when testing multiple hypotheses simultaneously. For now, I recommend using the chi-square approach. The AMOS built-in multigroup function uses the chi-square approach as well.
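Once AMOS reports the fit of the unconstrained multigroup model and the model with the path of interest constrained to be equal across groups, the difference test itself is simple. A minimal sketch, with hypothetical placeholder numbers:

```python
from scipy.stats import chi2

def multigroup_moderation(chi2_free, df_free, chi2_equal, df_equal):
    """Chi-square difference test: the 'equal' model constrains the path equal across groups."""
    d_chi2 = chi2_equal - chi2_free
    d_df = df_equal - df_free
    return d_chi2, d_df, chi2.sf(d_chi2, d_df)

# Hypothetical example: males vs. females, one structural path constrained equal
print(multigroup_moderation(512.3, 310, 518.9, 311))
# A p-value below .05 means the path differs across groups, i.e., moderation is present.
```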

From Measurement Model to Structural Model

• VIDEO TUTORIAL: From CFA to SEM in AMOS


Many of the examples in the videos so far have taught concepts using a set of composite
variables (instead of latent factors with observed items). Many will want to utilize the full power of
SEM by building true structural models (with latent factors). This is not a difficult thing. Simply
remove the covariance arrows from your measurement model (after CFA), then draw single-
headed arrows from IVs to DVs. Make sure you put error terms on the DVs, then run it. It's that
easy. Refer to the video for a demonstration.

Creating Factor Scores from Latent Factors

• VIDEO TUTORIAL: Imputing Factor Scores in AMOS


If you would like to create factor scores (as used in many of the videos) from latent factors, it is an
easy thing to do. However, you must remember two very important caveats:

 You are not allowed to have any missing values in the data used. These will need to be
imputed beforehand in SPSS or Excel (I have two tools for this in my Stats Tools Package -
one for imputing, and one for simply removing the entire row that has missing data).
 Latent factor names must not have any spaces or hard returns in them. They must be single
continuous strings ("FactorOne" or "Factor_One" instead of "Factor One").
After those two caveats are addressed, then you can simply go to the Analyze menu, and
select Data Imputation. Select Regression Imputation, and then click on the Impute button. This
will create a new SPSS dataset with the same name as the current dataset except it will be
followed by an "_C". This can be found in the same folder as your current dataset.
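As a minimal illustration of the first caveat, here is one way you might fill in missing item responses outside of SPSS or Excel before running the AMOS imputation. This is only a sketch: the file name and item prefixes are hypothetical, and median replacement is just one of several defensible choices.

```python
import pandas as pd

df = pd.read_csv("survey.csv")                    # hypothetical data file
items = df.filter(regex="^(enjoy|useful)")        # hypothetical item-name prefixes
df[items.columns] = items.fillna(items.median())  # replace missing responses with the item median
df.to_csv("survey_imputed.csv", index=False)      # save the completed dataset for AMOS
```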

Need more degrees of freedom


Did you run your model and observe that DF = 0 or CFI = 1.000? That sounds like you need more
degrees of freedom. There are a few ways to do this:

1. If there are opportunities to use latent variables instead of computed variables, use
latents.
2. If you have control variables, do not link them to every other variable.
3. Do not include all paths by default. Just include the ones that make good theoretical
sense.
4. If a path is not significant, omit it. If you do this, make sure to argue that the reason for
doing this was to increase degrees of freedom (and also because the path was not
significant).
Increasing the degrees of freedom allows AMOS to calculate model fit measures. If you have
zero degrees of freedom, model fit is irrelevant because you are "perfectly" accounting for all
possible relationships in the model.
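If it helps to see why these changes buy you degrees of freedom: with p observed variables, the sample covariance matrix supplies p(p+1)/2 distinct pieces of information, and every freely estimated parameter spends one of them. A quick sketch with hypothetical numbers:

```python
p = 12                       # hypothetical number of observed variables in the model
q = 28                       # hypothetical number of freely estimated parameters
df = p * (p + 1) // 2 - q    # 78 - 28 = 50 degrees of freedom
print(df)                    # df = 0 would be a just-identified model, so fit indices are meaningless
```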
Guidelines
On this wiki page I share my thoughts on various academic topics, including my 10 Steps to
building a good quantitative variance model that can be addressed using a well-
designed survey, as well as some general guidelines for structuring a quantitative model
building/testing paper. These are just off the top of my head and do not come from any sort of
published work. However, I have found them useful and hope you do as well.

Contents

• 1 Example Analysis - needs to be updated...
• 2 How to start any (EVERY) Research Project
• 3 Developing Your Quantitative Model
  • 3.1 Ten Steps for Formulating a Decent Quantitative Model
• 4 From Model Development to Model Testing
  • 4.1 Critical tasks that happen between model development and model testing
• 5 Guidelines on Survey Design
• 6 Order of Operations for Testing your Model
  • 6.1 Some general guidelines for the order to conduct each procedure
• 7 Structuring a Quantitative Paper
  • 7.1 Standard outline for quantitative model building/testing paper
• 8 My Thoughts on Conference Presentations
  • 8.1 What to include in a conference presentation
  • 8.2 What to avoid in a conference presentation

Example Analysis - needs to be updated...


I've created an example of some quantitative analyses. The most useful part of this example is
probably the wording. It is often difficult to figure out how to word your findings, or to figure out
how much space to use on findings, or which measures to report and how to report them. This
offers just one example of how you might do it.

• Click here to access the example analysis.

How to start any (EVERY) Research Project


In a page or less, using only bullet points, answer these questions (or fill out this outline). Then
share it with a trusted advisor (not me unless I am actually your advisor) to get early feedback.
This way you don't waste your time on a bad or half-baked idea. You might also consider
reviewing the editorial by Arun Rai at MISQ called: "Avoiding Type III Errors: Formulating
Research Problems that Matter." This is written for the information systems field, but is
generalizable to all fields.
1. What is the problem you are seeking to address? (If there is no problem, then there is
usually no research required.)
2. Why is this an important (not just interesting) contemporary or upcoming problem? (i.e.,
old problems don't need to be readdressed if they are not still a problem)
3. Who else has addressed this problem? (Very rarely is the answer to this: "nobody". Be
creative. Someone has studied something related to this problem, even if it isn't the exact
same problem. This requires a lit review.)
4. In what way are the prior efforts of others incomplete? (i.e., if others have already
addressed the problem, what is left to study - what are the "gaps"?)
5. How will you go about filling these gaps in prior research? (i.e., study design)
1. Why is this an appropriate approach?
6. (If applicable) Who is your target population for studying this problem? (Where are you
going to get your data?)
1. How are you going to get the data you want? (quantity and quality)

Developing Your Quantitative Model


Ten Steps for Formulating a Decent Quantitative Model

1. Identify and define your dependent variables. These should be the outcome(s) of the
phenomenon you are interested in better understanding. They should be the affected
thing(s) in your research questions.
2. Figure out why explaining and predicting these DVs is important.
1. Why should we care?
2. For whom will it make a difference?
3. What can we possibly contribute to knowledge that is not already known?
4. If these are all answerable and suggest continuing the study, then go to #3,
otherwise, go to #1 and try different DVs.
3. Form one or two research questions around explaining and predicting these DVs.
1. Scoping your research questions may also require you to identify your population.
4. Is there some existing theory that would help explore these research questions?
1. If so, then how can we adopt it for specifically exploring these research
questions?
2. Does that theory also suggest other variables we are not considering?
5. What do you think (and what has research said) impacts the DVs we have chosen?
1. These become IVs.
6. What is it about these IVs that is causing the effect on the DVs?
1. These become Mediators.
7. Do these relationships depend on other factors, such as age, gender, race, religion,
industry, organization size and performance, etc.?
1. These become Moderators
8. What variables could potentially explain and predict the DVs, but are not directly related
to our interests?
1. These become control variables. These are often some of those moderators like
age and gender, or variables in extant literature.
9. Identify your population.
1. Do you have access to this population?
2. Why is this population appropriate to sample in order to answer the research
questions?
10. Based on all of the above, but particularly #4, develop an initial conceptual model
involving the IVs, DVs, Mediators, Moderators, and Controls.
1. If tested, how will this model contribute to research (make us think differently) and
practice (make us act differently)?

From Model Development to Model Testing


Video explanation of this section
Critical tasks that happen between model development and model testing

1. Develop a decent quantitative model


1. see previous section
2. Find existing scales and develop your own if necessary
1. You need to find ways to measure the constructs you want to include in your
model. Usually this is done through reflective latent measures on a Likert scale.
It is conventional and encouraged to leverage existing scales that have already
been either proposed or, better yet, validated in extant literature. If you can’t find
existing scales that match your construct, then you might need to develop your
own. For guidelines on how to design your survey, please see the next
section #Guidelines_on_Survey_Design
2. Find existing scales
1. I’ve made a VIDEO TUTORIAL about finding existing scales. The
easy way is to go to http://inn.theorizeit.org/ and search their database.
You can also search Google Scholar for scale development of your
construct. Make sure to note the source for the items, as you will need to
report this in your manuscript.
2. Once you’ve found the measures you need, you’ll most likely need to
adapt them to your context. For example, let’s say you’re studying the
construct of Enjoyment in the context of Virtual Reality. If the existing
scale was “I enjoy using the website”, you’ll want to change that to “I
enjoyed the Virtual Reality experience” (or something like that). The key
consideration is to retain the “spirit” or intent of the item and construct. If
you do adapt the measures, be sure to report your adaptations in the
appendix of any paper that uses these adapted measures.
3. Along this idea of adapting, you can also trim the scale as needed. Many
established scales are far too large, consisting of more than 10 items. A
reflective construct never requires more than 4 or 5 items. Simply pick
the 4-5 items that best capture the construct of interest. If the scale is
multidimensional, it is likely formative. In this case, you can either:
1. Keep the entire scale (this can greatly inflate your survey, but it
allows you to use a latent structure)
2. Keep only one dimension (just pick the one that best reflects the
construct you are interested in)
3. Keep one item from each dimension (this allows you to create
an aggregate score; i.e., sum, average, or weighted average)
3. Develop new scales
1. Developing new scales is a bit trickier, but is perhaps less daunting than
many make it out to be. The first thing you must do before developing
your own scales is to precisely define your construct. You cannot
develop new measures for a construct if you do not know precisely what
it is you are hoping to measure.
2. Once you have defined your construct, I strongly recommend developing
reflective scales where applicable. These are far easier to handle
statistically, and are more amenable to conventional SEM approaches.
Formative measures can also be used, but they involve several caveats
and considerations during the data analysis stage.
1. For reflective measures, simply create 5 interchangeable
statements that can be measured on a 5-point Likert scale of
agreement, frequency, or intensity. We develop 5 items so that
we have some flexibility in dropping 1 or 2 during the EFA if
needed. If the measures are truly reflective, using more than 5
items would be unnecessarily redundant. If we were to create a
scale for Enjoyment (defined in our study as the extent to which
a user receives joy from interacting with the VR), we might have
the following items that the user can answer from strongly
disagree to strongly agree:
1. I enjoyed using the VR
2. Interacting with the VR was fun
3. I was happy while using the VR
4. Using the VR was boring (reverse coded)
5. Using the VR was pleasurable
3. If developing your own scales, do pretesting (talk aloud, Q-sort)
1. To ensure the newly developed scales make sense to others and will hopefully
measure the construct you think they should measure, you need to do some
pretesting. Two very common pretesting exercises are ‘talk-aloud’ and ‘Q-sort’.
1. Talk-aloud exercises include sitting down with between five and eight
individuals who are within, or close to, your target population. For
example, if you plan on surveying nurses, then you should do talk-
alouds with nurses. If you are surveying a more difficult to access
population, such as CEOs, you can probably get away with doing talk-
alouds with upper level management instead. The purpose of the talk-
aloud is to see if the newly developed items make sense to others. Invite
the participant (just one participant at a time) to read out loud each item
and respond to it. If they struggle to read it, then it is worded poorly. If
they have to think very long about how to answer, then it needs to be
more direct. If they are unsure how to answer, then it needs to be
clarified. If they say “well, it depends” then it needs to be simplified or
made more contextually specific. You get the idea. After the first talk-
aloud, revise your items accordingly, and then do the second talk-aloud.
Repeat until you stop getting meaningful corrections.
2. Q-sort is an exercise where the participant (ideally from the target
population, but not strictly required) has a card (physical or digital) for
each item in your survey, even existing scales. They then sort these
cards into piles based on what construct they think the item is
measuring. To do this, you’ll need to let them know your constructs and
the construct definitions. This should be done for formative and reflective
constructs, but not for non-latent constructs (e.g., gender, industry,
education). Here is a video I’ve made for Q-sorting: Q-sorting in
Qualtrics. You should have at least 8 people participate in the Q-sort. If
you arrive at consensus (>70% agreement between participants) after
the first Q-sort, then move on. If not, identify the items that did not
achieve adequate consensus, and then try to reword them to be more
conceptually distinct from the construct they mis-loaded on while being
more conceptually similar to the construct they should have loaded on.
Repeat the Q-sort (with different participants) until you arrive at
adequate consensus.
4. Identify target sample and, if necessary, get approval to contact
1. Before you can submit your study for IRB approval, you must identify who you will
be collecting data from. Obtain approval and confirmation from whoever has
stewardship over that population. For example, if you plan to collect data from
employees at your current or former organization, you should obtain approval
from the proper manager over the group you plan to solicit. If you are going to
collect data from students, get approval from their professor(s).
5. Conduct a Pilot Study
1. It is exceptionally helpful to conduct a pilot study if time and target population
permit. A pilot study is a smaller data collection effort (between 30 and 100
participants) used to obtain reliability scores (like Cronbach’s alpha) for your
reflective latent factors, and to confirm the direction of relationships, as well as to
do preliminary manipulation checks (where applicable). Usually the sample size
of a pilot study will not allow you to test the full model (either measurement or
structural) altogether, but it can give you sufficient power to test pieces at a time.
For example, you could do an EFA with 20 items at a time, or you could run
simple linear regressions between an IV and a DV (a minimal sketch for computing Cronbach's alpha on pilot data appears after this list).
2. Often time and target population do not make a pilot study feasible. For example,
you would never want to cannibalize your target population if that population is
difficult to access and you are concerned about final sample size. Surgeons, for
example, are a hard population to access. Doing a pilot study of surgeons will
cannibalize your final sample size. Instead, you could do a pilot study of nurses,
or possibly resident surgeons. Deadlines are also real, and pilot studies take
time – although, they may save you time in the end. If the results of the pilot
study reveal poor Cronbach’s alphas, or poor loadings, or significant cross-
loadings, you should revise your items accordingly. Poor Cronbach’s alphas and
poor loadings indicate too much conceptual inconsistency between the items
within a construct. Significant cross-loadings indicate too much conceptual
overlap between items across separate constructs.
6. Get IRB approval
1. Once you’ve identified your population and obtained confirmation that you’ll be
able to collect data from them, you are now ready to submit your study for
approval to your local IRB. You cannot publish any work that includes data
collected prior to obtaining IRB approval. This means that if you did a pilot study
before obtaining approval, you cannot use that data in the final sample (although
you can still say that you did a pilot study). IRB approval can take between 3
days and 6 weeks (or more), depending on the nature of your study and the
population you intend to target. Typically studies of organizations regarding
performance and employee dispositions and intentions are simple and do not get
held up in IRB review. Studies that involve any form of deception or risk
(physical, psychological, or financial) to participants require extra consideration
and may require oral defense in front of the IRB.
7. Collect Data
1. You’ve made it! Time to collect your data. This could take anywhere between
three days and three months, depending on many factors. Be prepared to send
reminders. Incentives won’t hurt either. Also be prepared to only obtain a fraction
of the responses you expected. For example, if you are targeting an email list of
10,000 brand managers, expect half of the emails to return abandoned, three
quarters of the remainder to go unread, and then 90% of the remainder to go
ignored. That leaves us with only 125 responses, 20% of which may be
unusable, thus leaving us with only 100 usable responses from our original
10,000.
8. Test your model
1. see next section
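As referenced in the pilot study step above, here is a minimal sketch for computing Cronbach's alpha on pilot data, assuming the item responses for a single reflective construct are in a respondents-by-items array. SPSS's reliability analysis returns the same number; values around .70 or above are conventionally considered acceptable.

```python
import numpy as np

def cronbach_alpha(items):
    """items: an (n_respondents x k_items) array for one reflective construct."""
    items = np.asarray(items, dtype=float)
    k = items.shape[1]
    item_variances = items.var(axis=0, ddof=1).sum()  # sum of the individual item variances
    total_variance = items.sum(axis=1).var(ddof=1)    # variance of the summed scale score
    return (k / (k - 1)) * (1 - item_variances / total_variance)
```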

Guidelines on Survey Design

1. Make sure you are using formative or reflective measures intentionally (i.e., know which
ones are which and be consistent). If you are planning on using AMOS, make sure all
measures are reflective, or be willing to create calculated scores out of your formative
measures.
2. If reflective measures are used, make sure they are truly reflective (i.e., that all items
must move together).
3. If any formative measures are used, make sure that they are not actually 2nd order
factors with multiple dimensions. If they are, then make sure there is sufficient and equal
representation from each dimension (i.e., same number of items per dimension).
4. Make sure you are using the proper scale for each measure. Many scholars will
mistakenly use a 5-point Likert scale of agreement (1=strongly disagree, 5=strongly
agree) for everything, even when it is not appropriate. For example, if the item is “I have
received feedback from my direct supervisor”, a scale of agreement makes no sense. It is
a yes/no question. You could perhaps change it to a scale of frequency: 1=never,
5=daily, but a scale of agreement is not correct.
5. Along these same lines, make sure your measures are not yes/no, true/false, etc. if they
are intended to belong to reflective constructs.
6. Make sure scales go from left to right, low to high, negative to positive, absence to
presence, and so on. This is so that when you start using statistics on the data, an
increase in the value of the response represents an increase in the trait measured.
7. Use exact numbers wherever possible, rather than buckets. This allows you much more
flexibility to later create buckets of even size if you want to. This also gives you richer
data.
8. However, make sure to restrict what types of responses can be given for numbers. For
example, instead of asking someone’s age with a text box entry, use a slider. This
prevents them from giving responses like: “twenty seven”, “twenty-seven”, “twentisven”,
“227”, and “none of your business”.
9. Avoid including “N/A” and “other” if possible. These get coded as either 0 or 6 or 8, etc.
but the number is completely invalid. However, when you’re doing statistics on it, your
statistics software doesn’t know that those numbers are invalid, so it uses them as actual
datapoints.
10. Despite literature stating the contrary, I’ve found reverse coded questions a perpetual
nightmare. They nearly always fail in the factor analysis because some cultures are
drawn to the positive end of the scale, while others are drawn to the negative end of the
scale. So they rarely actually capture the trait the way you intend. When I design new
surveys, I nearly always re-reverse reverse coded questions so that they are in the same
direction as the regular items (see the sketch after this list).
11. Measure only one thing with each item. Don’t ask about two things at once. For example,
don’t include items like this: “I prefer face to face communication and don’t like talking via
web conferencing.” This asks about two separate things. What if they like both?
12. Don’t make assumptions with your measures. For example, this item assumes everyone
loses their temper: “When I lose my temper, it is difficult to think long term.”
13. Make sure your items are applicable to everyone within your sampled population. For
example, don’t include items like this: “My children are a handful.” What if this respondent
doesn’t have children? How should they respond?
14. Be careful including sensitive questions, or questions that have a socially desirable way to
respond. Obvious ones might be like: “I occasionally steal from the office” or “I don’t
report all my assets on my tax forms”. Regardless of the actual truth, respondents will
enter the more favorable response. More subtle such measures might include: “I consider
myself a critical thinker” or “sometimes I lose self-control”. These are less obvious, but
still will result in biased responses because everyone thinks they are critical thinkers and
no one wants to admit that they have anything less than full control over their emotions
and self.
15. Include an occasional attention trap so that you can catch those who are responding
without thinking. Such items should be mixed in with the regular items and should not
stand out. For example, if a set of regular items all start with “My project team often…”
then make sure to word your attention trap the same way. For example, “My project team
often, never mind, please respond with somewhat disagree”.
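As referenced in guidelines 10 and 15 above, here is a minimal sketch of how re-reversing a reverse coded item and screening an attention trap might look. The file and column names are hypothetical, and the trap's correct answer (somewhat disagree = 2) follows the example item above.

```python
import pandas as pd

df = pd.read_csv("responses.csv")      # hypothetical raw survey export

# Re-reverse a reverse coded item on a 5-point scale so it runs in the same direction as the rest
df["enjoy4"] = 6 - df["enjoy4"]        # hypothetical reverse coded item

# Drop respondents who failed the attention trap ("please respond with somewhat disagree" = 2)
df = df[df["trap1"] == 2]              # hypothetical attention-trap column
```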

Order of Operations for Testing your Model


Some general guidelines for the order to conduct each procedure

• VIDEO TUTORIAL: SEM Speed Run (does everything below)

1. Develop a good theoretical model


1. See the Ten Steps above
2. Develop hypotheses to represent your model
2. Case Screening
1. Missing data in rows
2. Unengaged responses
3. Outliers (on continuous variables)
3. Variable Screening
1. Missing data in columns
2. Skewness & Kurtosis
4. Exploratory Factor Analysis
1. Iterate until you arrive at a clean pattern matrix
2. Adequacy
3. Convergent validity
4. Discriminant validity
5. Reliability
5. Confirmatory Factor Analysis
1. Obtain a roughly decent model quickly (cursory model fit, validity)
2. Do configural, metric, and scalar invariance tests (if using grouping variable in
causal model)
3. Validity and Reliability check
4. Response bias (aka common method bias, use specific bias variable(s) if
possible)
5. Final measurement model fit
6. Optionally, impute factor scores
6. Structural Models
1. Multivariate Assumptions
1. Outliers and Influentials
2. Multicollinearity
2. Include control variables in all of the following analyses
3. Mediation
1. Test indirect effects using bootstrapping
2. If you have multiple indirect paths from same IV to same DV, use AxB
estimand
4. Interactions
1. Optionally standardize constituent variables
2. Compute new product terms
3. Plot significant interactions
5. Multigroup Comparisons
1. Create multiple models
2. Assign them the proper group data
3. Test significance of moderation via chi-square difference test
7. Report findings in a concise table
1. Ensure global and local tests are met
2. Include post-hoc power analyses for unsupported direct effects hypotheses
8. Write paper
1. See guidelines below

Structuring a Quantitative Paper


Standard outline for quantitative model building/testing paper

 Title (something catchy and accurate)


 Abstract (concise – 150-250 words – to explain paper): roughly one sentence each:
 What is the problem?
 Why does it matter?
 How do you address the problem?
 What did you find?
 How does this change practice (what people in business do), and how does it change
research (existing or future)?
 Keywords (4-10 keywords that capture the contents of the study)
 Introduction (2-4 pages)
 What is the problem and why does it matter? And what have others done to try to
address this problem, and why have their efforts been insufficient (i.e., what is the gap in
the literature)? (1-2 paragraphs)
 What is your DV(s) and what is the context you are studying it in? Also briefly define the
DV(s). (1-2 paragraphs)
 One sentence about sample (e.g., "377 undergraduate university students using Excel").
 How does studying this DV(s) in this context adequately address the problem? (1-2
paragraphs)
 What existing theory/theories do you leverage, if any, to pursue this study, and why are
these appropriate? (1-2 paragraphs)
 Briefly discuss the primary contributions of this study in general terms without discussing
exact findings (i.e., no p-values here).
 How is the rest of the paper organized? (1 paragraph)
 Literature review (1-3 pages)
 Fully define your dependent variable(s) and summarize how it has been studied in
existing literature within your broader context (like Information systems, or,
Organizations, etc.).
 If you are basing your model on an existing theory/model, use this next space to explain
that theory (1 page) and then explain how you have adapted that theory to your study.
 If you are not basing your model on an existing theory/model, then use this next space to
explain how existing literature in your field has tried to predict your DV(s) or tried to
understand related research questions.
 (Optionally) Explain what other constructs you suspect will help predict your DV(s) and
why. Inclusion of a construct should have good logical/theoretical and/or literature
support. For example, “we are including construct xyz because the theory we are basing
our model on includes xyz.” Or, “we are including construct xyz because the following
logic (abc) constrains us to include this variable lest we be careless”. Try to do this
without repeating everything you are just going to say in the theory section anyway.
 (Optionally) Briefly discuss control variables and why they are being included.
 Theory & Hypotheses (take what space you need, but try to be parsimonious)
 Briefly summarize your conceptual model and show it with the Hypotheses labeled (if
possible).
 Begin supporting H1 then state H1 formally. Support should include strong causal logic
and literature.
 H2, H3, etc. If you have sub-hypotheses, list them as H1a, H1b, H2a, H2b, etc.
 Methods (keep it brief; many approaches; this is just a common template)
 Construct operationalization (where did you get your measures?)
 Instrument development (if you created your own measures)
 Explanation of study design (e.g., pretest, pilot, and online survey)
 Sampling (some descriptive statistics, like demographics (education, experience, etc.),
sample size; don't forget to discuss response rate (number of responses as a percentage
of number of people invited to do the study)).
 Mention that IRB exempt status was granted and protocols were followed if applicable.
 Method for testing hypotheses (e.g., structural equation modeling in AMOS). If you
conducted multi-group comparisons, mediation, and/or interaction, explain how you kept
them all straight and how you went about analyzing them. For example, if you did
mediation, what approach did you take (hopefully bootstrapping)? Were there multiple
models tested, or did you keep all the variables in for all analyses? If you did interaction,
did you add that in afterward, or was it in from the beginning?
 Analysis (1-3 pages; sometimes combined with methods section)
 Data Screening
 EFA (report pattern matrix and Cronbach's alphas in appendix) – mention if items were
dropped.
 CFA (just mention that you did it and bring up any issues you found) – mention any items
dropped during CFA. Report model fit for the final measurement model. Supporting
material can be placed in the Appendices if necessary.
 Mention CMB approach and results and actions taken if any (e.g., if you found CMB and
had to keep the CLF).
 Report the correlation matrix, CR and AVE (you can include MSV and ASV if you want),
and briefly discuss any issues with validity and reliability – if any.
 Report whether you used the full latent SEM, or if you imputed factor scores for a path
model.
 Report the final structural model(s) (include R-squares and betas) and the model fit for
the model(s).
 Findings (1-2 pages)
 Report the results for each hypothesis (supported or not, with evidence).
 Point out any unsupported or counter-evidence (significant in opposite direction)
hypotheses.
 Provide a table that concisely summarizes your findings.
 Discussion (2-5 pages)
 Summarize briefly the study and its intent and findings, focusing mainly on the research
question(s) (one paragraph).
 What insights did we gain from the study that we could not have gained without doing the
study?
 How do these insights change the way practitioners do their work?
 How do these insights shed light on existing literature and shape future research in this
area?
 What limitations is our study subject to (e.g., surveying students, just survey rather than
experiment, statistical limitations like CMB etc.)?
 What are some opportunities for future research based on the insights of this study?
 Conclusion (1-2 paragraphs)
 Summarize the insights gained from this study and how they address existing gaps or
problems.
 Explain the primary contribution of the study.
 Express your vision for moving forward or how you hope this work will affect the world.
 References (Please use a reference manager like EndNote)
 Appendices (Any additional information, like the instrument and measurement model stuff
that is necessary for validating or understanding or clarifying content in the main body text.)
 DO NOT pad the appendices with unnecessary statistics tables and illegible statistical
models. Everything in the appendix should add value to the manuscript. If it doesn't add
value, remove it.

My Thoughts on Conference Presentations


I've presented at and attended many many conferences. Over the years, I've seen the good, the
bad, and the ugly in terms of presentation structure, content, and delivery. Here are a few of my
thoughts on what to include and what to avoid.
What to include in a conference presentation

 What’s the problem and why is it important to study?


 Don’t short-change this part. If the audience doesn’t understand the problem, or why it is
important, they won’t follow anything else you say.
 Who else has researched this and what did they miss?
 Keep this short; just mention the key studies you’re building off of.
 How did we fill that gap?
 Theoretically and methodologically
 What did we find, what does it mean, and why does it matter?
 Spend most of your time here.
 The end. Short and sweet.
What to avoid in a conference presentation

 Long lit review


 Completely unnecessary. You don’t have time for this. Just mention the key pieces you’re
building off of
 Listing all hypotheses and explaining each one
 Just show a model (or some illustrative figure) and point out the most important parts
 Including big tables of statistics (for quant) or quotes (for qual)
 Just include a model with indications of significance if a quantitative study.
 Just include a couple key quotes (no more than one per slide) if a qualitative study.
 Back story on origination of the idea
 Don’t care unless it’s crazy fascinating and would make a great movie
 Travel log of methodology
 Again, don’t care. We figure you did the thing right.
 Statistics on model validation and measurement validation.
 Again, we figure you did the thing right. We’ll read the paper if we want to check your
measurement model.
 Repeating yourself too much
 The time is short. There is no need to be redundant.
 Using more words than images
 Presentations are short and so are attention spans. Use pictures with a few words only. I
can’t read your slide and listen to you at the same time. I bet you’d rather I listen to you
than read your slide.
 Reading the entire prepared presentation...
 Yes, that has happened, more than once... cringe...
 Failing to take notes of feedback
 Literally write down on paper the feedback you get, even if it is stupid. This is just
respectful.
