
SW388R7
Data Analysis & Computers II

Slide 1: Hierarchical Multiple Regression

• Differences between hierarchical and standard multiple regression
• Sample problem
• Steps in hierarchical multiple regression
• Homework Problems
Slide 2: Differences between standard and hierarchical multiple regression

• Standard multiple regression is used to evaluate the relationship between a set of independent variables and a dependent variable.

• Hierarchical regression is used to evaluate the relationship between a set of independent variables and the dependent variable, controlling for or taking into account the impact of a different set of independent variables on the dependent variable.

• For example, a research hypothesis might state that there are differences between the average salary for male employees and female employees, even after we take into account differences between education levels and prior work experience.

• In hierarchical regression, the independent variables are entered into the analysis in a sequence of blocks, or groups that may contain one or more variables. In the example above, education and work experience would be entered in the first block and sex would be entered in the second block.
Slide 3: Differences in statistical results

• SPSS shows the statistical results (Model Summary, ANOVA, Coefficients, etc.) as each block of variables is entered into the analysis.

• In addition (if requested), SPSS prints and tests the key statistic used in evaluating the hierarchical hypothesis: change in R² for each additional block of variables.

• The null hypothesis for the addition of each block of variables to the analysis is that the change in R² (contribution to the explanation of the variance in the dependent variable) is zero.

• If the null hypothesis is rejected, then our interpretation indicates that the variables in block 2 had a relationship to the dependent variable, after controlling for the relationship of the block 1 variables to the dependent variable.
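The test of the change in R² can be sketched outside SPSS. The following is a hypothetical pure-Python illustration of the quantity SPSS reports as "F Change", plugged with the rounded values from this example (R² = .000 for the control block, R² = .281 after adding the predictor, 136 cases, 3 independent variables in the full model); the function name and arguments are ours, not SPSS's.

```python
# F test for the change in R-squared when a block of variables is added.
# Hypothetical sketch of the statistic SPSS labels "F Change".

def f_change(r2_full, r2_reduced, n, k_full, k_added):
    """F statistic for the increment in R-squared.

    r2_full    -- R-squared with all blocks entered
    r2_reduced -- R-squared with only the earlier block(s) entered
    n          -- number of valid cases
    k_full     -- number of independent variables in the full model
    k_added    -- number of variables in the newly added block
    """
    df1 = k_added
    df2 = n - k_full - 1
    f = ((r2_full - r2_reduced) / df1) / ((1.0 - r2_full) / df2)
    return f, df1, df2

# Values from this example: the controls alone explain nothing (.000);
# adding highest academic degree raises R-squared to .281 with 136 cases.
f, df1, df2 = f_change(0.281, 0.000, n=136, k_full=3, k_added=1)
print(round(f, 1), df1, df2)
# close to the F(1, 132) = 51.670 SPSS reports; the small gap comes
# from plugging in the rounded R-squared values.
```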
Slide 4: Variations in hierarchical regression - 1

• A hierarchical regression can have as many blocks as there are independent variables, i.e. the analyst can specify a hypothesis that specifies an exact order of entry for variables.

• A more common hierarchical regression specifies two blocks of variables: a set of control variables entered in the first block and a set of predictor variables entered in the second block.

• Control variables are often demographics which are thought to make a difference in scores on the dependent variable. Predictors are the variables whose effect our research question is really interested in, but whose effect we want to separate out from the control variables.
Slide 5: Variations in hierarchical regression - 2

• Support for a hierarchical hypothesis would be expected to require statistical significance for the addition of each block of variables.

• However, many times we want to exclude the effect of blocks of variables previously entered into the analysis, whether or not a previous block was statistically significant. The analysis is interested in obtaining the best indicator of the effect of the predictor variables. The statistical significance of previously entered variables is not interpreted.

• The latter strategy is the one that we will employ in our problems.
Slide 6: Differences in solving hierarchical regression problems

• R² change, i.e. the increase in R² when the predictor variables are added to the analysis, is interpreted rather than the overall R² for the model with all variables entered.

• In the interpretation of individual relationships, the relationship between the predictors and the dependent variable is presented.

• Similarly, in the validation analysis, we are only concerned with verifying the significance of the predictor variables. Differences in control variables are ignored.
Slide 7: A hierarchical regression problem

The problem asks us to examine the feasibility of doing multiple regression to evaluate the relationships among these variables. The inclusion of the "controlling for" phrase indicates that this is a hierarchical multiple regression problem.

Multiple regression is feasible if the dependent variable is metric and the independent variables (both predictors and controls) are metric or dichotomous, and the available data is sufficient to satisfy the sample size requirements.
Slide 8: Level of measurement - answer

Hierarchical multiple regression requires that the dependent variable be metric and the independent variables be metric or dichotomous.

"Spouse's highest academic degree" [spdeg] is ordinal, satisfying the metric level of measurement requirement for the dependent variable, if we follow the convention of treating ordinal level variables as metric. Since some data analysts do not agree with this convention, a note of caution should be included in our interpretation.

"Age" [age] is interval, satisfying the metric or dichotomous level of measurement requirement for independent variables.

"Highest academic degree" [degree] is ordinal, satisfying the metric or dichotomous level of measurement requirement for independent variables, if we follow the convention of treating ordinal level variables as metric. Since some data analysts do not agree with this convention, a note of caution should be included in our interpretation.

"Sex" [sex] is dichotomous, satisfying the metric or dichotomous level of measurement requirement for independent variables.

True with caution is the correct answer.
Slide 9: Sample size - question

The second question asks about the sample size requirements for multiple regression.

To answer this question, we will run the initial or baseline multiple regression to obtain some basic data about the problem and solution.
Slide 10: The baseline regression - 1

After we check for violations of assumptions and outliers, we will make a decision whether we should interpret the model that includes the transformed variables and omits outliers (the revised model), or whether we will interpret the model that uses the untransformed variables and includes all cases, including the outliers (the baseline model).

In order to make this decision, we run the baseline regression before we examine assumptions and outliers, and record the R² for the baseline model. If using transformations and omitting outliers substantially improves the analysis (a 2% or more increase in R²), we interpret the revised model. If the increase is smaller, we interpret the baseline model.

To run the baseline model, select Regression | Linear… from the Analyze menu.
Slide 11: The baseline regression - 2

First, move the dependent variable spdeg to the Dependent text box.

Second, move the control independent variables, age and sex, to the Independent(s) list box.

Third, select the method for entering the variables into the analysis from the drop-down Method menu. In this example, we accept the default of Enter for direct entry of all variables in the first block, which will force the controls into the regression.

Fourth, click on the Next button to tell SPSS to add another block of variables to the regression analysis.
Slide 12: The baseline regression - 3

SPSS identifies that we will now be adding variables to a second block.

First, move the predictor independent variable degree to the Independent(s) list box for block 2.

Second, click on the Statistics… button to specify the statistics options that we want.
Slide 13: The baseline regression - 4

First, mark the checkboxes for Estimates on the Regression Coefficients panel.

Second, mark the checkboxes for Model fit, Descriptives, and R squared change. The R squared change statistic will tell us whether or not the variables added after the controls have a relationship to the dependent variable.

Third, mark the Durbin-Watson statistic on the Residuals panel.

Fourth, mark the Collinearity diagnostics checkbox to get tolerance values for testing multicollinearity.

Fifth, click on the Continue button to close the dialog box.
Slide 14: The baseline regression - 5

Click on the OK button to request the regression output.
Slide 15: R² for the baseline model

The R² of 0.281 is the benchmark that we will use to evaluate the utility of transformations and the elimination of outliers.

Prior to any transformations of variables to satisfy the assumptions of multiple regression or the removal of outliers, the proportion of variance in the dependent variable explained by the independent variables (R²) was 28.1%.

The relationship is statistically significant, though we would not stop if it were not significant because the lack of significance may be a consequence of violation of assumptions or the inclusion of outliers.
Slide 16: Sample size – evidence and answer

Descriptive Statistics

                          Mean     Std. Deviation    N
SPOUSES HIGHEST DEGREE    1.78     1.281             136
AGE OF RESPONDENT         45.80    14.534            136
RESPONDENTS SEX           1.60     .491              136
RS HIGHEST DEGREE         1.65     1.220             136

Hierarchical multiple regression requires that the minimum ratio of valid cases to independent variables be at least 5 to 1. The ratio of valid cases (136) to number of independent variables (3) was 45.3 to 1, which was equal to or greater than the minimum ratio. The requirement for a minimum ratio of cases to independent variables was satisfied.

In addition, the ratio of 45.3 to 1 satisfied the preferred ratio of 15 cases per independent variable.

The answer to the question is true.


Slide 17: Assumption of normality for the dependent variable - question

Having satisfied the level of measurement and sample size requirements, we turn our attention to conformity with three of the assumptions of multiple regression: normality, linearity, and homoscedasticity.

First, we will evaluate the assumption of normality for the dependent variable.
Slide 18: Run the script to test normality

First, move the variables to the list boxes based on the role that the variable plays in the analysis and its level of measurement.

Second, click on the Normality option button to request that SPSS produce the output needed to evaluate the assumption of normality.

Third, mark the checkboxes for the transformations that we want to test in evaluating the assumption.

Fourth, click on the OK button to produce the output.
Slide 19: Normality of the dependent variable: spouse's highest degree

Descriptives — SPOUSES HIGHEST DEGREE

                                     Statistic    Std. Error
Mean                                 1.78         .110
95% Confidence     Lower Bound       1.56
Interval for Mean  Upper Bound       2.00
5% Trimmed Mean                      1.75
Median                               1.00
Variance                             1.640
Std. Deviation                       1.281
Minimum                              0
Maximum                              4
Range                                4
Interquartile Range                  2.00
Skewness                             .573         .208
Kurtosis                             -1.051       .413

The dependent variable "spouse's highest academic degree" [spdeg] did not satisfy the criteria for a normal distribution. The skewness of the distribution (0.573) was between -1.0 and +1.0, but the kurtosis of the distribution (-1.051) fell outside the range from -1.0 to +1.0. The answer to the question is false.
Slide 20: Normality of the transformed dependent variable: spouse's highest degree

The "log of spouse's highest academic degree [LGSPDEG=LG10(1+SPDEG)]" satisfied the criteria for a normal distribution. The skewness of the distribution (-0.091) was between -1.0 and +1.0 and the kurtosis of the distribution (-0.678) was between -1.0 and +1.0.

The "log of spouse's highest academic degree [LGSPDEG=LG10(1+SPDEG)]" was substituted for "spouse's highest academic degree" [spdeg] in the analysis.
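The rule of thumb used throughout these problems (skewness and kurtosis both between -1.0 and +1.0) and the LG10(1+x) transformation can be checked by hand. A hypothetical pure-Python sketch, using the standard small-sample-adjusted formulas (the ones SPSS's Descriptives output is based on) and made-up degree codes:

```python
import math

def skewness(xs):
    """Sample skewness with the standard small-sample adjustment."""
    n = len(xs)
    m = sum(xs) / n
    s = math.sqrt(sum((x - m) ** 2 for x in xs) / (n - 1))
    return (n / ((n - 1) * (n - 2))) * sum(((x - m) / s) ** 3 for x in xs)

def kurtosis(xs):
    """Sample excess kurtosis with the standard small-sample adjustment."""
    n = len(xs)
    m = sum(xs) / n
    s = math.sqrt(sum((x - m) ** 2 for x in xs) / (n - 1))
    g = (n * (n + 1)) / ((n - 1) * (n - 2) * (n - 3))
    return g * sum(((x - m) / s) ** 4 for x in xs) \
        - 3 * (n - 1) ** 2 / ((n - 2) * (n - 3))

def lg10_plus_one(xs):
    """The LG10(1+x) transformation used for spouse's degree."""
    return [math.log10(1 + x) for x in xs]

# Hypothetical degree codes (0-4); a symmetric sample has zero skewness.
symmetric = [0, 1, 2, 3, 4]
print(round(skewness(symmetric), 3))   # 0.0

# A right-skewed sample has positive skewness; the log pulls it toward 0.
skewed = [0, 0, 0, 0, 1, 1, 2, 4]
print(skewness(skewed) > skewness(lg10_plus_one(skewed)))   # True
```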
Slide 21: Normality of the control variable: age

Next, we will evaluate the assumption of normality for the control variable, age.
Slide 22: Normality of the control variable: age

Descriptives — AGE OF RESPONDENT

                                     Statistic    Std. Error
Mean                                 45.99        1.023
95% Confidence     Lower Bound       43.98
Interval for Mean  Upper Bound       48.00
5% Trimmed Mean                      45.31
Median                               43.50
Variance                             282.465
Std. Deviation                       16.807
Minimum                              19
Maximum                              89
Range                                70
Interquartile Range                  24.00
Skewness                             .595         .148
Kurtosis                             -.351        .295

The independent variable "age" [age] satisfied the criteria for a normal distribution. The skewness of the distribution (0.595) was between -1.0 and +1.0 and the kurtosis of the distribution (-0.351) was between -1.0 and +1.0.
Slide 23: Normality of the predictor variable: highest academic degree

Next, we will evaluate the assumption of normality for the predictor variable, highest academic degree.
Slide 24: Normality of the predictor variable: respondent's highest academic degree

Descriptives — RS HIGHEST DEGREE

                                     Statistic    Std. Error
Mean                                 1.41         .071
95% Confidence     Lower Bound       1.27
Interval for Mean  Upper Bound       1.55
5% Trimmed Mean                      1.35
Median                               1.00
Variance                             1.341
Std. Deviation                       1.158
Minimum                              0
Maximum                              4
Range                                4
Interquartile Range                  1.00
Skewness                             .948         .149
Kurtosis                             -.051        .297

The independent variable "highest academic degree" [degree] satisfied the criteria for a normal distribution. The skewness of the distribution (0.948) was between -1.0 and +1.0 and the kurtosis of the distribution (-0.051) was between -1.0 and +1.0.
Slide 25: Assumption of linearity for spouse's degree and respondent's degree - question

The metric independent variables satisfied the criteria for normality, but the dependent variable did not.

However, the logarithmic transformation of "spouse's highest academic degree" produced a variable that was normally distributed and will be tested as a substitute in the analysis.

The script for linearity will support our using the transformed dependent variable without having to add it to the data set.
Slide 26: Run the script to test linearity

When the linearity option is selected, a default set of transformations to test is marked.

First, click on the Linearity option button to request that SPSS produce the output needed to evaluate the assumption of linearity.

Second, since we have decided to use the log transformation of the dependent variable, we mark the check box for the Logarithmic transformation and clear the check box for the Untransformed version of the dependent variable.

Third, click on the OK button to produce the output.
Slide 27: Linearity test: spouse's highest degree and respondent's highest academic degree

The correlation between "highest academic degree" and the logarithmic transformation of "spouse's highest academic degree" was statistically significant (r=.519, p<0.001). A linear relationship exists between these variables.
Slide 28: Linearity test: spouse's highest degree and respondent's age

The assessment of the linear relationship between the logarithmic transformation of "spouse's highest academic degree" [LGSPDEG=LG10(1+SPDEG)] and "age" [age] indicated that the relationship was weak, rather than nonlinear. Neither the correlation between the logarithmic transformation of "spouse's highest academic degree" and "age" nor the correlations with the transformations of age were statistically significant.

The correlation between "age" and the logarithmic transformation of "spouse's highest academic degree" was not statistically significant (r=.009, p=0.921). The correlations for the transformations of age were: the logarithmic transformation (r=.061, p=0.482); the square root transformation (r=.034, p=0.692); the inverse transformation (r=.112, p=0.194); and the square transformation (r=-.037, p=0.668).
Slide 29: Assumption of homogeneity of variance - question

Sex is the only dichotomous independent variable in the analysis. We will test it for homogeneity of variance using the logarithmic transformation of the dependent variable, which we have already decided to use.
Slide 30: Run the script to test homogeneity of variance

When the homogeneity of variance option is selected, a default set of transformations to test is marked.

First, click on the Homogeneity of variance option button to request that SPSS produce the output needed to evaluate the assumption of homogeneity of variance.

Second, since we have decided to use the log transformation of the dependent variable, we mark the check box for the Logarithmic transformation and clear the check box for the Untransformed version of the dependent variable.

Third, click on the OK button to produce the output.
Slide 31: Assumption of homogeneity of variance – evidence and answer

Based on the Levene Test, the variance in "log of spouse's highest academic degree [LGSPDEG=LG10(1+SPDEG)]" was homogeneous for the categories of "sex" [sex]. The probability associated with the Levene statistic (0.687) was p=0.409, greater than the level of significance for testing assumptions (0.01). The null hypothesis that the group variances were equal was not rejected.

The homogeneity of variance assumption was satisfied. The answer to the question is true.
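The Levene statistic itself is just a one-way ANOVA performed on the absolute deviations of each score from its group mean. A hypothetical pure-Python sketch with two made-up groups (as with the two categories of sex here):

```python
def levene(groups):
    """Levene test statistic: a one-way ANOVA F computed on the
    absolute deviations from each group's mean (the mean-centered
    form of the test)."""
    # absolute deviations of each score from its own group mean
    z = []
    for g in groups:
        m = sum(g) / len(g)
        z.append([abs(x - m) for x in g])
    n = sum(len(g) for g in z)
    k = len(z)
    grand = sum(sum(g) for g in z) / n
    # between-groups and within-groups sums of squares on the deviations
    ssb = sum(len(g) * (sum(g) / len(g) - grand) ** 2 for g in z)
    ssw = sum(sum((x - sum(g) / len(g)) ** 2 for x in g) for g in z)
    return (ssb / (k - 1)) / (ssw / (n - k))

# Hypothetical data: the second group is visibly more spread out.
w = levene([[1, 2, 3, 4], [2, 4, 6, 8]])
print(round(w, 2))   # 2.4
```

The statistic is then referred to an F distribution with (k - 1, n - k) degrees of freedom, which is where the p-value SPSS prints comes from.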
Slide 32: Including the transformed variable in the data set - 1

In the evaluation for normality, we resolved a problem with normality for spouse's highest academic degree with a logarithmic transformation. We need to add this transformed variable to the data set, so that we can incorporate it in our detection of outliers.

We can use the script to compute transformed variables and add them to the data set.

We select an assumption to test (Normality is the easiest), mark the check box for the transformation we want to retain, and clear the check box "Delete variables created in this analysis."

NOTE: this will leave the transformed variable in the data set. To remove it, you can delete the column or close the data set without saving.
Slide 33: Including the transformed variable in the data set - 2

First, move the variable SPDEG to the list box for the dependent variable.

Second, click on the Normality option button to request that SPSS do the test for normality, including the transformation we will mark.

Third, mark the transformation we want to retain (Logarithmic) and clear the checkboxes for the other transformations.

Fourth, clear the check box for the option "Delete variables created in this analysis".

Fifth, click on the OK button.
Slide 34: Including the transformed variable in the data set - 3

If we scroll to the rightmost column in the data editor, we see that the log of SPDEG is included in the data set.
Slide 35: Including the transformed variable in the list of variables in the script - 1

If we scroll to the bottom of the list of variables, we see that the log of SPDEG is not included in the list of available variables.

To tell the script to add the log of SPDEG to the list of variables in the script, click on the Reset button. This will start the script over again, with a new list of variables from the data set.
Slide 36: Including the transformed variable in the list of variables in the script - 2

If we scroll to the bottom of the list of variables now, we see that the log of SPDEG is included in the list of available variables.
Slide 37: Detection of outliers - question

In multiple regression, an outlier in the solution can be defined as a case that has a large residual because the equation did a poor job of predicting its value.

We will run the regression again incorporating any transformations we have decided to test, and have SPSS compute the standardized residual for each case. Cases with a standardized residual larger than +/- 3.0 will be treated as outliers.
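The flagging rule can be sketched directly: a standardized residual is the raw residual divided by the standard error of the estimate, and cases beyond ±3.0 are flagged. A hypothetical illustration with made-up residuals (the function and data are ours, not SPSS output):

```python
import math

def outlier_cases(residuals, n_predictors, cutoff=3.0):
    """Indices of cases whose standardized residual exceeds +/- cutoff.

    The standardized residual is the raw residual divided by the
    standard error of the estimate, sqrt(SSE / (n - k - 1)).
    """
    n = len(residuals)
    sse = sum(e ** 2 for e in residuals)
    s = math.sqrt(sse / (n - n_predictors - 1))
    return [i for i, e in enumerate(residuals) if abs(e / s) > cutoff]

# Hypothetical residuals: 40 small ones plus one case the model
# badly mispredicts; only the last case is flagged.
resid = [0.1, -0.1] * 20 + [4.0]
print(outlier_cases(resid, n_predictors=3))   # [40]
```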
Slide 38: The revised regression using transformations

To run the regression to detect outliers, select the Linear Regression command from the menu that drops down when you click on the Dialog Recall button.
Slide 39: The revised regression: substituting transformed variables

Remove the variable SPDEG from the Dependent text box. Include the log of the variable, LGSPDEG.

Click on the Statistics… button to select the statistics we will need for the analysis.
Slide 40: The revised regression: selecting statistics

First, mark the checkboxes for Estimates on the Regression Coefficients panel.

Second, mark the checkboxes for Model fit, Descriptives, and R squared change. The R squared change statistic will tell us whether or not the variables added after the controls have a relationship to the dependent variable.

Third, mark the Durbin-Watson statistic on the Residuals panel.

Fourth, mark the checkbox for the Casewise diagnostics, which will be used to identify outliers.

Fifth, mark the Collinearity diagnostics checkbox to get tolerance values for testing multicollinearity.

Sixth, click on the Continue button to close the dialog box.
Slide 41: The revised regression: saving standardized residuals

Mark the checkbox for Standardized Residuals so that SPSS saves a new variable in the data editor. We will use this variable to omit outliers in the revised regression model.

Click on the Continue button to close the dialog box.
Slide 42: The revised regression: obtaining output

Click on the OK button to obtain the output for the revised model.
Slide 43: Outliers in the analysis

If cases have a standardized residual larger than +/- 3.0, SPSS creates a table titled Casewise Diagnostics, in which it lists the cases and the values that result in their being outliers.

If there are no outliers, SPSS does not print the Casewise Diagnostics table. There was no table for this problem. The answer to the question is true.

We can verify that all standardized residuals were less than +/- 3.0 by looking at the minimum and maximum standardized residuals in the table of Residual Statistics. Both the minimum and maximum fell in the acceptable range.

Since there were no outliers, we can use the regression just completed to make our decision about which model to interpret.
Slide 44: Selecting the model to interpret - question

Since there were no outliers, we can use the regression just completed to make our decision about which model to interpret.

If the R² for the revised model is higher by 2% or more, we will base our interpretation on the revised model; otherwise, we will interpret the baseline model.
Slide 45: Selecting the model to interpret – evidence and answer

Prior to any transformations of variables to satisfy the assumptions of multiple regression and the removal of outliers, the proportion of variance in the dependent variable explained by the independent variables (R²) was 28.1%. After substituting transformed variables, the proportion of variance in the dependent variable explained by the independent variables (R²) was 27.1%.

Since the revised regression model did not explain at least two percent more variance than explained by the baseline regression analysis, the baseline regression model with all cases and the original form of all variables should be used for the interpretation.

The transformations used to satisfy the assumptions will not be used, so cautions should be added for the assumptions violated.

False is the correct answer to the question.


Slide 46: Re-running the baseline regression - 1

Having decided to use the baseline model for the interpretation of this analysis, the SPSS regression output was re-created.

To run the baseline regression again, select the Linear Regression command from the menu that drops down when you click on the Dialog Recall button.
Slide 47: Re-running the baseline regression - 2

Remove the transformed variable lgspdeg from the dependent variable textbox and add the variable spdeg.

Click on the Save button to remove the request to save standardized residuals to the data editor.
Slide 48: Re-running the baseline regression - 3

Clear the checkbox for Standardized Residuals so that SPSS does not save a new set of them in the data editor when it runs the new regression.

Click on the Continue button to close the dialog box.
Slide 49: Re-running the baseline regression - 4

Click on the OK button to request the regression output.
Slide 50: Assumption of independence of errors - question

We can now check the assumption of independence of errors for the analysis we will interpret.
Slide 51: Assumption of independence of errors: evidence and answer

Having selected a regression model for interpretation, we can now examine the final assumption of independence of errors.

Model Summary(c)

Model   R       R Square   Adjusted R Square   Std. Error of the Estimate   R Square Change   F Change   df1   df2   Sig. F Change   Durbin-Watson
1       .014a   .000       -.015               1.290                        .000              .013       2     133   .987
2       .531b   .281       .265                1.098                        .281              51.670     1     132   .000            1.754

a. Predictors: (Constant), RESPONDENTS SEX, AGE OF RESPONDENT
b. Predictors: (Constant), RESPONDENTS SEX, AGE OF RESPONDENT, RS HIGHEST DEGREE
c. Dependent Variable: SPOUSES HIGHEST DEGREE

The Durbin-Watson statistic is used to test for the presence of serial correlation among the residuals, i.e., the assumption of independence of errors, which requires that the residuals or errors in prediction do not follow a pattern from case to case.

The value of the Durbin-Watson statistic ranges from 0 to 4. As a general rule of thumb, the residuals are not correlated if the Durbin-Watson statistic is approximately 2, and an acceptable range is 1.50 - 2.50.

The Durbin-Watson statistic for this problem is 1.754, which falls within the acceptable range. If the Durbin-Watson statistic was not in the acceptable range, we would add a caution to the findings for a violation of regression assumptions.

The answer to the question is true.
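The Durbin-Watson statistic is simple to compute from the residuals taken in case order. A hypothetical sketch of the formula behind the value SPSS reports, with made-up residuals:

```python
def durbin_watson(residuals):
    """Durbin-Watson statistic: the sum of squared successive
    differences of the residuals divided by the sum of squared
    residuals. Values near 2 suggest uncorrelated errors;
    1.50 - 2.50 is the rule-of-thumb range used in these problems."""
    num = sum((residuals[i] - residuals[i - 1]) ** 2
              for i in range(1, len(residuals)))
    den = sum(e ** 2 for e in residuals)
    return num / den

# Hypothetical residuals with no obvious case-to-case pattern.
dw = durbin_watson([1.0, -0.5, 0.3, -0.2, 0.4])
print(round(dw, 2))   # 2.27 — inside the 1.50-2.50 acceptable range
```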
Slide 52: Multicollinearity - question

The final condition that can have an impact on our interpretation is multicollinearity.
Slide 53: Multicollinearity – evidence and answer

The tolerance values for all of the independent variables are larger than 0.10: "highest academic degree" [degree] (.990), "age" [age] (.954) and "sex" [sex] (.947).

Multicollinearity is not a problem in this regression analysis.

True is the correct answer to the question.
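Tolerance for an independent variable is 1 minus the R² obtained when that variable is regressed on the other independent variables; values below 0.10 signal multicollinearity. With just two predictors this reduces to 1 - r², which makes the idea easy to sketch with hypothetical data:

```python
import math

def tolerance_two_ivs(x1, x2):
    """Tolerance of x1 given a single other predictor x2: 1 - r^2.

    With more than two predictors, r^2 is replaced by the R-squared
    from regressing x1 on all of the other independent variables."""
    n = len(x1)
    m1, m2 = sum(x1) / n, sum(x2) / n
    cov = sum((a - m1) * (b - m2) for a, b in zip(x1, x2))
    ss1 = sum((a - m1) ** 2 for a in x1)
    ss2 = sum((b - m2) ** 2 for b in x2)
    r = cov / math.sqrt(ss1 * ss2)
    return 1.0 - r ** 2

# Hypothetical, nearly collinear predictors: tolerance close to zero,
# far below the 0.10 rule of thumb used in these problems.
tol = tolerance_two_ivs([1, 2, 3, 4, 5], [1, 2, 3, 4, 6])
print(round(tol, 3))   # 0.027
```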


Slide 54: Overall relationship between dependent variable and independent variables - question

The first finding we want to confirm concerns the relationship between the dependent variable and the set of predictors after including the control variables in the analysis.
Slide 55: Overall relationship between dependent variable and independent variables – evidence and answer

Hierarchical multiple regression was performed to test the hypothesis that there was a relationship between the dependent variable "spouse's highest academic degree" [spdeg] and the predictor independent variable "highest academic degree" [degree] after controlling for the effect of the control independent variables "age" [age] and "sex" [sex]. In hierarchical regression, the interpretation for the overall relationship focuses on the change in R². If the change in R² is statistically significant, the overall relationship for all independent variables will be significant as well.
Slide 56: Overall relationship between dependent variable and independent variables – evidence and answer

Based on model 2 in the Model Summary table, where the predictors were added (F(1, 132) = 51.670, p<0.001), the predictor variable, highest academic degree, did contribute to the overall relationship with the dependent variable, spouse's highest academic degree. Since the probability of the F statistic (p<0.001) was less than or equal to the level of significance (0.05), the null hypothesis that the change in R² was equal to 0 was rejected. The research hypothesis that highest academic degree reduced the error in predicting spouse's highest academic degree was supported.
Slide 57: Overall relationship between dependent variable and independent variables – evidence and answer

The increase in R² from including the predictor variable ("highest academic degree") in the analysis was 0.281, not 0.241.

Using a proportional reduction in error interpretation for R², information provided by the predictor variable reduced our error in predicting "spouse's highest academic degree" [spdeg] by 28.1%, not 24.1%.

The answer to the question is false because the problem stated an incorrect statistical value.
Slide 58: Relationship of the predictor variable and the dependent variable - question

In these hierarchical regression problems, we will focus the interpretation of individual relationships on the predictor variables and ignore the contribution of the control variables.
Slide 59: Relationship of the predictor variable and the dependent variable – evidence and answer

Coefficients(a)

                         Unstandardized      Standardized
Model                    B       Std. Error  Beta     t       Sig.   Tolerance   VIF
1  (Constant)            1.781   .577                 3.085   .002
   AGE OF RESPONDENT     .001    .008        .009     .100    .920   .956        1.046
   RESPONDENTS SEX       -.023   .231        -.009    -.100   .920   .956        1.046
2  (Constant)            .525    .521                 1.007   .316
   AGE OF RESPONDENT     .003    .007        .037     .495    .622   .954        1.049
   RESPONDENTS SEX       .114    .198        .044     .575    .566   .947        1.056
   RS HIGHEST DEGREE     .559    .078        .533     7.188   .000   .990        1.010

a. Dependent Variable: SPOUSES HIGHEST DEGREE

Based on the statistical test of the b coefficient (t = 7.188, p<0.001) for the independent variable "highest academic degree" [degree], the null hypothesis that the slope or b coefficient was equal to 0 (zero) was rejected. The research hypothesis that there was a relationship between "highest academic degree" and "spouse's highest academic degree" was supported.
Slide 60: Relationship of the predictor variable and the dependent variable – evidence and answer

The b coefficient for the relationship between the dependent variable "spouse's highest academic degree" [spdeg] and the independent variable "highest academic degree" [degree] was .559, which implies a direct relationship because the sign of the coefficient is positive. Higher numeric values for the independent variable "highest academic degree" [degree] are associated with higher numeric values for the dependent variable "spouse's highest academic degree" [spdeg].

The statement in the problem that "survey respondents who had higher academic degrees had spouses with higher academic degrees" is correct. The answer to the question is true with caution. Caution in interpreting the relationship should be exercised because of an ordinal variable treated as metric and the violation of the assumption of normality.
Slide 61: Validation analysis – question

The problem states the random number seed to use in the validation analysis.
Slide 62: Validation analysis – set the random number seed

Validate the results of your regression analysis by conducting a 75/25% cross-validation, using 998794 as the random number seed.

To set the random number seed, select the Random Number Seed… command from the Transform menu.
Slide 63: Set the random number seed

First, click on the Set seed to option button to activate the text box.

Second, type in the random seed stated in the problem.

Third, click on the OK button to complete the dialog box.

Note that SPSS does not provide you with any feedback about the change.
Slide 64: Validation analysis – compute the split variable

To enter the formula for the variable that will split the sample into two parts, click on the Compute… command.
Slide 65: The formula for the split variable

First, type the name for the new variable, split, into the Target Variable text box.

Second, the formula for the value of split is shown in the text box. The uniform(1) function generates a random decimal number between 0 and 1. The random number is compared to the value 0.75. If the random number is less than or equal to 0.75, the value of the formula will be 1, the SPSS numeric equivalent to true. If the random number is larger than 0.75, the formula will return a 0, the SPSS numeric equivalent to false.

Third, click on the OK button to complete the dialog box.
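The SPSS formula split = uniform(1) <= 0.75 can be mimicked in Python with numpy. This is only a sketch: numpy's random generator is not SPSS's, so even with the same seed the actual 0/1 pattern will differ:

```python
import numpy as np

# Mimic of the SPSS split formula: split = uniform(1) <= 0.75.
# The seed 998794 echoes the problem statement, but numpy's generator
# is not SPSS's, so the 0/1 pattern it produces is different.
rng = np.random.RandomState(998794)
n_cases = 270                                   # illustrative sample size
split = (rng.uniform(0, 1, n_cases) <= 0.75).astype(int)

print(split[:10])               # a random pattern of ones and zeros
print(round(split.mean(), 2))   # roughly 0.75 of cases marked for training
```

Cases with split = 1 form the 75% training sample; cases with split = 0 form the 25% validation sample.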
Slide 66: The split variable in the data editor

In the data editor, the split variable shows a random pattern of zeros and ones.

To select the cases for the training sample, we select the cases where split = 1.
Slide 67: Repeat the regression for the validation

To run the regression for the validation training sample, select the Linear Regression command from the menu that drops down when you click on the Dialog Recall button.
Slide 68: Using "split" as the selection variable

First, scroll down the list of variables and highlight the variable split.

Second, click on the right arrow button to move the split variable to the Selection Variable text box.
Slide 69: Setting the value of split to select cases

When the variable named split is moved to the Selection Variable text box, SPSS adds "=?" after the name to prompt us to enter a specific value for split. Click on the Rule… button to enter a value for split.
Slide 70: Completing the value selection

First, type the value for the training sample, 1, into the Value text box.

Second, click on the Continue button to complete the value entry.
Slide 71: Requesting output for the validation analysis

When the value entry dialog box is closed, SPSS adds the value we entered after the equal sign. This specification now tells SPSS to include in the analysis only those cases that have a value of 1 for the split variable.

Click on the OK button to request the output.
Slide 72: Validation analysis - 1

The validation analysis requires that the regression model for the 75% training sample replicate the pattern of statistical significance found for the full data set.

In the analysis of the 75% training sample, the relationship between the set of independent variables and the dependent variable was statistically significant, F(3, 103) = 11.569, p<0.001, as was the overall relationship in the analysis of the full data set, F(3, 132) = 17.235, p<0.001.
Slide 73: Validation analysis - 2

The validation of a hierarchical regression model also requires that the change in R² demonstrate statistical significance in the analysis of the 75% training sample.

The R² change of 0.249 satisfied this requirement (F change(1, 103) = 34.319, p<0.001).
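The F test of the R² change follows a standard formula, which can be checked against the slide's numbers. This is a sketch; the small discrepancy below comes from using the rounded R² values reported in the output:

```python
def f_change(r2_full, r2_reduced, m_added, df_resid):
    """F test of the change in R² when a block of m predictors is added:
    F = (delta R² / m) / ((1 - R²_full) / df_resid), df = (m, df_resid)."""
    delta_r2 = r2_full - r2_reduced
    return (delta_r2 / m_added) / ((1 - r2_full) / df_resid)

# Rounded values from the training-sample output: R² change = 0.249 for one
# added predictor, full-model R² ≈ 0.502² ≈ 0.252, residual df = 103.
f = f_change(r2_full=0.252, r2_reduced=0.252 - 0.249, m_added=1, df_resid=103)
print(round(f, 1))   # ≈ 34.3; SPSS reports F change(1, 103) = 34.319
```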
Slide 74: Validation analysis - 3

The pattern of significance for the individual relationships between the dependent variable and the predictor variable was the same for the analysis using the full data set and the 75% training sample.

The relationship between highest academic degree and spouse's highest academic degree was statistically significant in both the analysis using the full data set (t=7.188, p<0.001) and the analysis using the 75% training sample (t=5.484, p<0.001). The pattern of statistical significance of the independent variables for the analysis using the 75% training sample matched the pattern identified in the analysis of the full data set.
Slide 75: Validation analysis - 4

The total proportion of variance explained in the model using the training sample was 25.2% (.502²), compared to 40.6% (.637²) for the validation sample. The value of R² for the validation sample was actually larger than the value of R² for the training sample, implying a better fit than obtained for the training sample. This supports a conclusion that the regression model would be effective in predicting scores for cases other than those included in the sample.

The validation analysis supported the generalizability of the findings of the analysis to the population represented by the sample in the data set. The answer to the question is true.
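The shrinkage computation behind this conclusion is simple arithmetic; a sketch using the rounded multiple R values reported on the slide:

```python
# Shrinkage in R² = R² (training) - R² (validation), computed from the
# rounded multiple R values reported on the slide.
r2_training = 0.502 ** 2      # ≈ 0.252, i.e. 25.2%
r2_validation = 0.637 ** 2    # ≈ 0.406, i.e. 40.6%
shrinkage = r2_training - r2_validation

# Negative shrinkage means the validation sample actually fit better,
# which easily satisfies the "< 2%" generalizability criterion.
print(round(shrinkage, 3))    # → -0.154
```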
Slide 76: Steps in complete hierarchical regression analysis

The following flow charts depict the process for solving the complete regression problem and determining the answer to each of the questions encountered in the complete analysis.

Text in italics (e.g. True, False, True with caution, Incorrect application of a statistic) represents the answers to each specific question.

Many of the steps in hierarchical regression analysis are identical to the steps in standard regression analysis. Steps that are different are identified with a magenta background, with the specifics of the difference underlined.
Slide 77: Complete hierarchical multiple regression analysis – level of measurement

Question: Do variables included in the analysis satisfy the level of measurement requirements?

Is the dependent variable metric and the independent variables metric or dichotomous? (Examine all independent variables – controls as well as predictors.)
  No → Incorrect application of a statistic
  Yes → continue

Ordinal variables included in the relationship?
  Yes → True with caution
  No → True
Slide 78: Complete hierarchical multiple regression analysis – sample size

Question: Number of variables and cases satisfy sample size requirements?

Compute the baseline regression in SPSS. (Include both controls and predictors in the count of independent variables.)

Ratio of cases to independent variables at least 5 to 1?
  No → Inappropriate application of a statistic
  Yes → continue

Ratio of cases to independent variables at preferred sample size of at least 15 to 1?
  No → True with caution
  Yes → True
Slide 79: Complete hierarchical multiple regression analysis – assumption of normality

Question: Each metric variable satisfies the assumption of normality? (Test the dependent variable and both control and predictor independent variables.)

The variable satisfies criteria for a normal distribution?
  Yes → True
  No → False; then examine transformations:

Log, square root, or inverse transformation satisfies normality? (If more than one transformation satisfies normality, use the one with the smallest skew.)
  Yes → Use transformation in revised model, no caution needed
  No → Use untransformed variable in analysis, add caution to interpretation for violation of normality
Slide 80: Complete hierarchical multiple regression analysis – assumption of linearity

Question: Relationship between dependent variable and metric independent variable satisfies assumption of linearity? (Test both control and predictor independent variables. If the dependent variable was transformed for normality, use the transformed dependent variable in the test for linearity. If the independent variable was transformed to satisfy normality, skip the check for linearity.)

Probability of Pearson correlation (r) <= level of significance?
  Yes → True
  No → continue

Probability of correlation (r) for relationship with any transformation of the IV <= level of significance? (If more than one transformation satisfies linearity, use the one with the largest r.)
  Yes → Use transformation in revised model
  No → Weak relationship; no caution needed
Slide 81: Complete hierarchical multiple regression analysis – assumption of homogeneity of variance

Question: Variance in dependent variable is uniform across the categories of a dichotomous independent variable? (Test both control and predictor independent variables. If the dependent variable was transformed for normality, substitute the transformed dependent variable in the test for the assumption of homogeneity of variance.)

Probability of Levene statistic <= level of significance?
  Yes → False; do not test transformations of the dependent variable, and add caution to interpretation for violation of homoscedasticity
  No → True
Slide 82: Complete hierarchical multiple regression analysis – detecting outliers

Question: After incorporating any transformations, no outliers were detected in the regression analysis? (If any variables were transformed for normality or linearity, substitute the transformed variables in the regression for the detection of outliers.)

Is the standardized residual for any case greater than +/-3.00?
  Yes → False; remove outliers and run the revised regression again
  No → True
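The standardized-residual screen can be sketched in Python with numpy. The residuals below are simulated, and the standardization here is a simple z-score of the residuals, which approximates SPSS's ZRESID (raw residual divided by the standard error of estimate):

```python
import numpy as np

# Sketch of the outlier screen: flag any case whose standardized residual
# exceeds +/-3.00. Residuals are simulated, not from the course data set.
rng = np.random.RandomState(0)
residuals = rng.normal(0, 1, 200)
residuals[17] = 5.2                      # plant one extreme case
std_resid = residuals / residuals.std(ddof=1)
outliers = np.where(np.abs(std_resid) > 3.0)[0]
print(outliers)                          # indices of cases to inspect/remove
```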
Slide 83: Complete hierarchical multiple regression analysis – picking regression model for interpretation

Question: Interpretation based on model that includes transformation of variables and removes outliers?

R² for revised regression greater than R² for baseline regression by 2% or more?
  Yes → Pick revised regression with transformations and omitting outliers for interpretation (True)
  No → Pick baseline regression with untransformed variables and all cases for interpretation (False)
Slide 84: Complete hierarchical multiple regression analysis – assumption of independence of errors

Question: Serial correlation of errors is not a problem in this regression analysis?

Residuals are independent (Durbin-Watson between 1.5 and 2.5)?
  Yes → True
  No → False; add caution for violation of the assumption of independence of errors
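The Durbin-Watson statistic itself is a short computation on the regression residuals; a sketch with simulated residuals:

```python
import numpy as np

# Durbin-Watson statistic: d = sum((e_t - e_{t-1})^2) / sum(e_t^2).
# Values near 2 suggest independent errors; the slides treat 1.5-2.5
# as the acceptable range.
def durbin_watson(residuals):
    e = np.asarray(residuals, dtype=float)
    return np.sum(np.diff(e) ** 2) / np.sum(e ** 2)

rng = np.random.RandomState(42)
independent_errors = rng.normal(0, 1, 500)   # illustrative iid residuals
print(round(durbin_watson(independent_errors), 2))   # near 2.0
```

Values well below 2 indicate positive serial correlation; values well above 2 (the maximum is 4) indicate negative serial correlation.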
Slide 85: Complete hierarchical multiple regression analysis – multicollinearity

Question: Multicollinearity is not a problem in this regression analysis?

Tolerance for all IVs greater than 0.10, indicating no multicollinearity?
  Yes → True
  No → False; halt the analysis until the problem is diagnosed
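The tolerance statistic SPSS reports can be computed by hand: regress each independent variable on the other independent variables; tolerance is 1 minus the R² of that regression, and VIF is its reciprocal. A sketch on simulated data:

```python
import numpy as np

# Tolerance for each IV: regress it on the remaining IVs;
# tolerance = 1 - R² of that regression, VIF = 1 / tolerance.
def tolerances(X):
    X = np.asarray(X, dtype=float)
    out = []
    for j in range(X.shape[1]):
        others = np.column_stack([np.ones(len(X)), np.delete(X, j, axis=1)])
        coef, *_ = np.linalg.lstsq(others, X[:, j], rcond=None)
        resid = X[:, j] - others @ coef
        ss_res = np.sum(resid ** 2)
        ss_tot = np.sum((X[:, j] - X[:, j].mean()) ** 2)
        out.append(ss_res / ss_tot)      # = 1 - R²_j
    return np.array(out)

rng = np.random.RandomState(1)
X = rng.normal(0, 1, size=(200, 3))      # three nearly independent IVs
tol = tolerances(X)
print(np.round(tol, 2))                  # all near 1.0
print(bool(np.all(tol > 0.10)))          # True → no multicollinearity problem
```

If any IV is an exact linear combination of the others, its tolerance collapses toward 0 and the 0.10 screen fails.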
Slide 86: Complete hierarchical multiple regression analysis – overall relationship

Question: Finding about overall relationship between dependent variable and independent variables.

Probability of F test of R² change less than/equal to level of significance?
  No → False
  Yes → continue

Strength of R² change for predictor variables interpreted correctly?
  No → False
  Yes → continue

Small sample, ordinal variables, or violation of assumption in the relationship?
  Yes → True with caution
  No → True
Slide 87: Complete hierarchical multiple regression analysis – individual relationships

Question: Finding about individual relationship between independent variable and dependent variable.

Probability of t test between predictors and DV <= level of significance?
  No → False
  Yes → continue

Direction of relationship between predictors and DV interpreted correctly?
  No → False
  Yes → continue

Small sample, ordinal variables, or violation of assumption in the relationship?
  Yes → True with caution
  No → True
Slide 88: Complete hierarchical multiple regression analysis – individual relationships

Question: Finding about independent variable with largest impact on dependent variable.

Does the stated variable have the largest beta coefficient (ignoring sign) among predictors?
  No → False
  Yes → continue

Small sample, ordinal variables, or violation of assumption in the relationship?
  Yes → True with caution
  No → True
Slide 89: Complete hierarchical multiple regression analysis – validation analysis - 1

Question: The validation analysis supports the generalizability of the findings? (Set the random seed and randomly split the sample into a 75% training sample and a 25% validation sample.)

Probability of ANOVA test for training sample <= level of significance?
  No → False
  Yes → continue

Probability of F for R² change for training sample <= level of significance?
  No → False
  Yes → continue
Slide 90: Complete hierarchical multiple regression analysis – validation analysis - 2

Pattern of significance for predictor variables in training sample matches pattern for full data set?
  No → False
  Yes → continue

Shrinkage in R² (R² for training sample - R² for validation sample) < 2%?
  No → False
  Yes → True
Slide 91: Homework Problems – Multiple Regression – Hierarchical Problems - 1

The hierarchical regression homework problems parallel the complete standard regression problems. The only assumption made in the problems is that there is no problem with missing data.

The complete hierarchical multiple regression will include:
•Testing assumptions of normality and linearity,
•Testing for outliers,
•Determining whether to use transformations or exclude outliers,
•Testing for independence of errors,
•Checking for multicollinearity, and
•Validating the generalizability of the analysis.
Slide 92: Homework Problems – Multiple Regression – Hierarchical Problems - 2

The statement of the hierarchical regression problem identifies the dependent variable, the predictor independent variables, and the control independent variables.
Slide 93: Homework Problems – Multiple Regression – Hierarchical Problems - 3

The findings, which must all be correct for a problem to be true, include:
•a finding about the R² change when the predictor independent variables are included in the regression, and
•an interpretive statement about each of the predictor independent variables.

The findings do not specify any finding about the control independent variables.
Slide 94: Homework Problems – Multiple Regression – Hierarchical Problems - 4

The first prerequisite for a problem is the satisfaction of the level of measurement and minimum sample size requirements.

Failing to satisfy either of these requirements results in an inappropriate application of a statistic.
Slide 95: Homework Problems – Multiple Regression – Hierarchical Problems - 5

The assumption of normality requires that each metric variable be tested. If the variable is not normal, transformations should be examined to see if we can improve the distribution of the variable. If transformations are unsuccessful, a caution is added to any true findings.
Slide 96: Homework Problems – Multiple Regression – Hierarchical Problems - 6

The assumption of linearity is examined for any metric independent variables that were not transformed for the assumption of normality.
Slide 97: Homework Problems – Multiple Regression – Hierarchical Problems - 7

After incorporating any transformations, we look for outliers using standardized residuals as the criterion.
Slide 98: Homework Problems – Multiple Regression – Hierarchical Problems - 8

We compare the results of the baseline regression (without transformations or exclusion of outliers) to the revised model (with transformations and outliers excluded) to determine whether we will base our interpretation on the baseline or the revised analysis.
Slide 99: Homework Problems – Multiple Regression – Hierarchical Problems - 9

We test for the assumption of independence of errors and the presence of multicollinearity.

If we violate the assumption of independence, we attach a caution to our findings.

If there is a multicollinearity problem, we halt the analysis, since we may be reporting erroneous findings.
Slide 100: Homework Problems – Multiple Regression – Hierarchical Problems - 9

In hierarchical regression, we interpret the change in R² in the overall relationship that is associated with the inclusion of the predictor independent variables. The change in R² must be statistically significant and the magnitude of the R² change must be correctly stated.
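The whole two-block logic can be sketched with numpy least squares on simulated data. This is an illustration of the technique, not the course's SPSS output; the variable names merely echo the slides' example:

```python
import numpy as np

# Two-block hierarchical regression sketch: block 1 enters the control
# variables, block 2 adds the predictor, and the R² change is tested.
def r2(X, y):
    design = np.column_stack([np.ones(len(y)), X])
    coef, *_ = np.linalg.lstsq(design, y, rcond=None)
    resid = y - design @ coef
    return 1 - np.sum(resid ** 2) / np.sum((y - y.mean()) ** 2)

rng = np.random.RandomState(7)
n = 150
age = rng.normal(40, 10, n)                 # control
sex = rng.randint(0, 2, n).astype(float)    # control (dichotomous)
degree = rng.normal(2, 1, n)                # predictor
y = 0.5 * degree + rng.normal(0, 1, n)      # outcome driven by the predictor

r2_block1 = r2(np.column_stack([age, sex]), y)           # controls only
r2_block2 = r2(np.column_stack([age, sex, degree]), y)   # controls + predictor
delta_r2 = r2_block2 - r2_block1

df_resid = n - 3 - 1                        # n - (total IVs) - 1
f_chg = (delta_r2 / 1) / ((1 - r2_block2) / df_resid)
print(round(delta_r2, 3), round(f_chg, 1))
```

A large, significant F for the R² change is what licenses the conclusion that the predictor block explains variance beyond the controls.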
Slide 101: Homework Problems – Multiple Regression – Hierarchical Problems - 10

The relationships between predictor independent variables and the dependent variable stated in the problem must be statistically significant, and worded correctly for the direction of the relationship.
Slide 102: Homework Problems – Multiple Regression – Hierarchical Problems - 11

We use a 75-25% validation strategy to support the generalizability of our findings. The validation must support:
•the significance of the overall relationship,
•the statistical significance of the change in R²,
•the pattern of significance for the individual predictors, and
•a shrinkage in R² of less than 2%, i.e. R² for the validation sample must not be more than 2% less than R² for the training sample.
Slide 103: Homework Problems – Multiple Regression – Hierarchical Problems - 12

Cautions are added as limitations to the analysis, if needed.