
SW388R7
Data Analysis & Computers II

Slide 1: Hierarchical Multiple Regression

• Differences between hierarchical and standard multiple regression
• Sample problem
• Steps in hierarchical multiple regression
• Homework Problems
Slide 2: Differences between standard and hierarchical multiple regression

• Standard multiple regression is used to evaluate the relationship between a set of independent variables and a dependent variable.

• Hierarchical regression is used to evaluate the relationship between a set of independent variables and the dependent variable, controlling for or taking into account the impact of a different set of independent variables on the dependent variable.

• For example, a research hypothesis might state that there are differences between the average salary for male employees and female employees, even after we take into account differences between education levels and prior work experience.

• In hierarchical regression, the independent variables are entered into the analysis in a sequence of blocks, or groups that may contain one or more variables. In the example above, education and work experience would be entered in the first block and sex would be entered in the second block.
Slide 3: Differences in statistical results

• SPSS shows the statistical results (Model Summary, ANOVA, Coefficients, etc.) as each block of variables is entered into the analysis.

• In addition (if requested), SPSS prints and tests the key statistic used in evaluating the hierarchical hypothesis: change in R² for each additional block of variables.

• The null hypothesis for the addition of each block of variables to the analysis is that the change in R² (contribution to the explanation of the variance in the dependent variable) is zero.

• If the null hypothesis is rejected, then our interpretation indicates that the variables in block 2 had a relationship to the dependent variable, after controlling for the relationship of the block 1 variables to the dependent variable.
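The test of the change in R² can be sketched outside SPSS. The following is a hypothetical pure-Python illustration of the quantity SPSS reports as "F Change", plugged with the rounded values from this example (R² = .000 for the control block, R² = .281 after adding the predictor, 136 cases, 3 independent variables in the full model); the function name and arguments are ours, not SPSS's.

```python
# F test for the change in R-squared when a block of variables is added.
# Hypothetical sketch of the statistic SPSS labels "F Change".

def f_change(r2_full, r2_reduced, n, k_full, k_added):
    """F statistic for the increment in R-squared.

    r2_full    -- R-squared with all blocks entered
    r2_reduced -- R-squared with only the earlier block(s) entered
    n          -- number of valid cases
    k_full     -- number of independent variables in the full model
    k_added    -- number of variables in the newly added block
    """
    df1 = k_added
    df2 = n - k_full - 1
    f = ((r2_full - r2_reduced) / df1) / ((1.0 - r2_full) / df2)
    return f, df1, df2

# Values from this example: the controls alone explain nothing (.000);
# adding highest academic degree raises R-squared to .281 with 136 cases.
f, df1, df2 = f_change(0.281, 0.000, n=136, k_full=3, k_added=1)
print(round(f, 1), df1, df2)
# close to the F(1, 132) = 51.670 SPSS reports; the small gap comes
# from plugging in the rounded R-squared values.
```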
Slide 4: Variations in hierarchical regression - 1

• A hierarchical regression can have as many blocks as there are independent variables, i.e. the analyst can specify a hypothesis that specifies an exact order of entry for variables.

• A more common hierarchical regression specifies two blocks of variables: a set of control variables entered in the first block and a set of predictor variables entered in the second block.

• Control variables are often demographics which are thought to make a difference in scores on the dependent variable. Predictors are the variables whose effect our research question is really interested in, but whose effect we want to separate out from the control variables.
Slide 5: Variations in hierarchical regression - 2

• Support for a hierarchical hypothesis would be expected to require statistical significance for the addition of each block of variables.

• However, many times we want to exclude the effect of blocks of variables previously entered into the analysis, whether or not a previous block was statistically significant. The analysis is interested in obtaining the best indicator of the effect of the predictor variables. The statistical significance of previously entered variables is not interpreted.

• The latter strategy is the one that we will employ in our problems.
Slide 6: Differences in solving hierarchical regression problems

• R² change, i.e. the increase in R² when the predictor variables are added to the analysis, is interpreted rather than the overall R² for the model with all variables entered.

• In the interpretation of individual relationships, the relationship between the predictors and the dependent variable is presented.

• Similarly, in the validation analysis, we are only concerned with verifying the significance of the predictor variables. Differences in control variables are ignored.
Slide 7: A hierarchical regression problem

The problem asks us to examine the feasibility of doing multiple regression to evaluate the relationships among these variables. The inclusion of the "controlling for" phrase indicates that this is a hierarchical multiple regression problem.

Multiple regression is feasible if the dependent variable is metric and the independent variables (both predictors and controls) are metric or dichotomous, and the available data is sufficient to satisfy the sample size requirements.
Slide 8: Level of measurement - answer

Hierarchical multiple regression requires that the dependent variable be metric and the independent variables be metric or dichotomous.

"Spouse's highest academic degree" [spdeg] is ordinal, satisfying the metric level of measurement requirement for the dependent variable, if we follow the convention of treating ordinal level variables as metric. Since some data analysts do not agree with this convention, a note of caution should be included in our interpretation.

"Age" [age] is interval, satisfying the metric or dichotomous level of measurement requirement for independent variables.

"Highest academic degree" [degree] is ordinal, satisfying the metric or dichotomous level of measurement requirement for independent variables, if we follow the convention of treating ordinal level variables as metric. Since some data analysts do not agree with this convention, a note of caution should be included in our interpretation.

"Sex" [sex] is dichotomous, satisfying the metric or dichotomous level of measurement requirement for independent variables.

True with caution is the correct answer.
Slide 9: Sample size - question

The second question asks about the sample size requirements for multiple regression.

To answer this question, we will run the initial or baseline multiple regression to obtain some basic data about the problem and solution.
Slide 10: The baseline regression - 1

After we check for violations of assumptions and outliers, we will make a decision whether we should interpret the model that includes the transformed variables and omits outliers (the revised model), or whether we will interpret the model that uses the untransformed variables and includes all cases, including the outliers (the baseline model).

In order to make this decision, we run the baseline regression before we examine assumptions and outliers, and record the R² for the baseline model. If using transformations and omitting outliers substantially improves the analysis (a 2% or more increase in R²), we interpret the revised model. If the increase is smaller, we interpret the baseline model.

To run the baseline model, select Regression | Linear… from the Analyze menu.
Slide 11: The baseline regression - 2

First, move the dependent variable spdeg to the Dependent text box.

Second, move the control independent variables, age and sex, to the Independent(s) list box.

Third, select the method for entering the variables into the analysis from the drop-down Method menu. In this example, we accept the default of Enter for direct entry of all variables in the first block, which will force the controls into the regression.

Fourth, click on the Next button to tell SPSS to add another block of variables to the regression analysis.
Slide 12: The baseline regression - 3

SPSS identifies that we will now be adding variables to a second block.

First, move the predictor independent variable degree to the Independent(s) list box for block 2.

Second, click on the Statistics… button to specify the statistics options that we want.
Slide 13: The baseline regression - 4

First, mark the checkboxes for Estimates on the Regression Coefficients panel.

Second, mark the checkboxes for Model fit, Descriptives, and R squared change. The R squared change statistic will tell us whether or not the variables added after the controls have a relationship to the dependent variable.

Third, mark the Durbin-Watson statistic on the Residuals panel.

Fourth, mark the Collinearity diagnostics checkbox to get tolerance values for testing multicollinearity.

Fifth, click on the Continue button to close the dialog box.
Slide 14: The baseline regression - 5

Click on the OK button to request the regression output.
Slide 15: R² for the baseline model

The R² of 0.281 is the benchmark that we will use to evaluate the utility of transformations and the elimination of outliers.

Prior to any transformations of variables to satisfy the assumptions of multiple regression or the removal of outliers, the proportion of variance in the dependent variable explained by the independent variables (R²) was 28.1%.

The relationship is statistically significant, though we would not stop if it were not significant because the lack of significance may be a consequence of violation of assumptions or the inclusion of outliers.
Slide 16: Sample size – evidence and answer

Descriptive Statistics

                          Mean     Std. Deviation    N
SPOUSES HIGHEST DEGREE    1.78     1.281             136
AGE OF RESPONDENT         45.80    14.534            136
RESPONDENTS SEX           1.60     .491              136
RS HIGHEST DEGREE         1.65     1.220             136

Hierarchical multiple regression requires that the minimum ratio of valid cases to independent variables be at least 5 to 1. The ratio of valid cases (136) to number of independent variables (3) was 45.3 to 1, which was equal to or greater than the minimum ratio. The requirement for a minimum ratio of cases to independent variables was satisfied.

In addition, the ratio of 45.3 to 1 satisfied the preferred ratio of 15 cases per independent variable.

The answer to the question is true.


Slide 17: Assumption of normality for the dependent variable - question

Having satisfied the level of measurement and sample size requirements, we turn our attention to conformity with three of the assumptions of multiple regression: normality, linearity, and homoscedasticity.

First, we will evaluate the assumption of normality for the dependent variable.
Slide 18: Run the script to test normality

First, move the variables to the list boxes based on the role that the variable plays in the analysis and its level of measurement.

Second, click on the Normality option button to request that SPSS produce the output needed to evaluate the assumption of normality.

Third, mark the checkboxes for the transformations that we want to test in evaluating the assumption.

Fourth, click on the OK button to produce the output.
Slide 19: Normality of the dependent variable: spouse's highest degree

Descriptives — SPOUSES HIGHEST DEGREE

                                     Statistic    Std. Error
Mean                                 1.78         .110
95% Confidence     Lower Bound       1.56
Interval for Mean  Upper Bound       2.00
5% Trimmed Mean                      1.75
Median                               1.00
Variance                             1.640
Std. Deviation                       1.281
Minimum                              0
Maximum                              4
Range                                4
Interquartile Range                  2.00
Skewness                             .573         .208
Kurtosis                             -1.051       .413

The dependent variable "spouse's highest academic degree" [spdeg] did not satisfy the criteria for a normal distribution. The skewness of the distribution (0.573) was between -1.0 and +1.0, but the kurtosis of the distribution (-1.051) fell outside the range from -1.0 to +1.0. The answer to the question is false.
Slide 20: Normality of the transformed dependent variable: spouse's highest degree

The "log of spouse's highest academic degree [LGSPDEG=LG10(1+SPDEG)]" satisfied the criteria for a normal distribution. The skewness of the distribution (-0.091) was between -1.0 and +1.0 and the kurtosis of the distribution (-0.678) was between -1.0 and +1.0.

The "log of spouse's highest academic degree [LGSPDEG=LG10(1+SPDEG)]" was substituted for "spouse's highest academic degree" [spdeg] in the analysis.
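The rule of thumb used throughout these problems (skewness and kurtosis both between -1.0 and +1.0) and the LG10(1+x) transformation can be checked by hand. A hypothetical pure-Python sketch, using the standard small-sample-adjusted formulas (the ones SPSS's Descriptives output is based on) and made-up degree codes:

```python
import math

def skewness(xs):
    """Sample skewness with the standard small-sample adjustment."""
    n = len(xs)
    m = sum(xs) / n
    s = math.sqrt(sum((x - m) ** 2 for x in xs) / (n - 1))
    return (n / ((n - 1) * (n - 2))) * sum(((x - m) / s) ** 3 for x in xs)

def kurtosis(xs):
    """Sample excess kurtosis with the standard small-sample adjustment."""
    n = len(xs)
    m = sum(xs) / n
    s = math.sqrt(sum((x - m) ** 2 for x in xs) / (n - 1))
    g = (n * (n + 1)) / ((n - 1) * (n - 2) * (n - 3))
    return g * sum(((x - m) / s) ** 4 for x in xs) \
        - 3 * (n - 1) ** 2 / ((n - 2) * (n - 3))

def lg10_plus_one(xs):
    """The LG10(1+x) transformation used for spouse's degree."""
    return [math.log10(1 + x) for x in xs]

# Hypothetical degree codes (0-4); a symmetric sample has zero skewness.
symmetric = [0, 1, 2, 3, 4]
print(round(skewness(symmetric), 3))   # 0.0

# A right-skewed sample has positive skewness; the log pulls it toward 0.
skewed = [0, 0, 0, 0, 1, 1, 2, 4]
print(skewness(skewed) > skewness(lg10_plus_one(skewed)))   # True
```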
Slide 21: Normality of the control variable: age

Next, we will evaluate the assumption of normality for the control variable, age.
Slide 22: Normality of the control variable: age

Descriptives — AGE OF RESPONDENT

                                     Statistic    Std. Error
Mean                                 45.99        1.023
95% Confidence     Lower Bound       43.98
Interval for Mean  Upper Bound       48.00
5% Trimmed Mean                      45.31
Median                               43.50
Variance                             282.465
Std. Deviation                       16.807
Minimum                              19
Maximum                              89
Range                                70
Interquartile Range                  24.00
Skewness                             .595         .148
Kurtosis                             -.351        .295

The independent variable "age" [age] satisfied the criteria for a normal distribution. The skewness of the distribution (0.595) was between -1.0 and +1.0 and the kurtosis of the distribution (-0.351) was between -1.0 and +1.0.
Slide 23: Normality of the predictor variable: highest academic degree

Next, we will evaluate the assumption of normality for the predictor variable, highest academic degree.
Slide 24: Normality of the predictor variable: respondent's highest academic degree

Descriptives — RS HIGHEST DEGREE

                                     Statistic    Std. Error
Mean                                 1.41         .071
95% Confidence     Lower Bound       1.27
Interval for Mean  Upper Bound       1.55
5% Trimmed Mean                      1.35
Median                               1.00
Variance                             1.341
Std. Deviation                       1.158
Minimum                              0
Maximum                              4
Range                                4
Interquartile Range                  1.00
Skewness                             .948         .149
Kurtosis                             -.051        .297

The independent variable "highest academic degree" [degree] satisfied the criteria for a normal distribution. The skewness of the distribution (0.948) was between -1.0 and +1.0 and the kurtosis of the distribution (-0.051) was between -1.0 and +1.0.
Slide 25: Assumption of linearity for spouse's degree and respondent's degree - question

The metric independent variables satisfied the criteria for normality, but the dependent variable did not.

However, the logarithmic transformation of "spouse's highest academic degree" produced a variable that was normally distributed and will be tested as a substitute in the analysis.

The script for linearity will support our using the transformed dependent variable without having to add it to the data set.
Slide 26: Run the script to test linearity

When the linearity option is selected, a default set of transformations to test is marked.

First, click on the Linearity option button to request that SPSS produce the output needed to evaluate the assumption of linearity.

Second, since we have decided to use the log transformation of the dependent variable, we mark the check box for the Logarithmic transformation and clear the check box for the Untransformed version of the dependent variable.

Third, click on the OK button to produce the output.
Slide 27: Linearity test: spouse's highest degree and respondent's highest academic degree

The correlation between "highest academic degree" and the logarithmic transformation of "spouse's highest academic degree" was statistically significant (r=.519, p<0.001). A linear relationship exists between these variables.
Slide 28: Linearity test: spouse's highest degree and respondent's age

The assessment of the linear relationship between the logarithmic transformation of "spouse's highest academic degree" [LGSPDEG=LG10(1+SPDEG)] and "age" [age] indicated that the relationship was weak, rather than nonlinear. Neither the correlation between the logarithmic transformation of "spouse's highest academic degree" and "age" nor the correlations with the transformations of age were statistically significant.

The correlation between "age" and the logarithmic transformation of "spouse's highest academic degree" was not statistically significant (r=.009, p=0.921). The correlations for the transformations of age were: the logarithmic transformation (r=.061, p=0.482); the square root transformation (r=.034, p=0.692); the inverse transformation (r=.112, p=0.194); and the square transformation (r=-.037, p=0.668).
Slide 29: Assumption of homogeneity of variance - question

Sex is the only dichotomous independent variable in the analysis. We will test it for homogeneity of variance using the logarithmic transformation of the dependent variable, which we have already decided to use.
Slide 30: Run the script to test homogeneity of variance

When the homogeneity of variance option is selected, a default set of transformations to test is marked.

First, click on the Homogeneity of variance option button to request that SPSS produce the output needed to evaluate the assumption of homogeneity of variance.

Second, since we have decided to use the log transformation of the dependent variable, we mark the check box for the Logarithmic transformation and clear the check box for the Untransformed version of the dependent variable.

Third, click on the OK button to produce the output.
Slide 31: Assumption of homogeneity of variance – evidence and answer

Based on the Levene Test, the variance in "log of spouse's highest academic degree [LGSPDEG=LG10(1+SPDEG)]" was homogeneous for the categories of "sex" [sex]. The probability associated with the Levene statistic (0.687) was p=0.409, greater than the level of significance for testing assumptions (0.01). The null hypothesis that the group variances were equal was not rejected.

The homogeneity of variance assumption was satisfied. The answer to the question is true.
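The Levene statistic itself is just a one-way ANOVA performed on the absolute deviations of each score from its group mean. A hypothetical pure-Python sketch with two made-up groups (as with the two categories of sex here):

```python
def levene(groups):
    """Levene test statistic: a one-way ANOVA F computed on the
    absolute deviations from each group's mean (the mean-centered
    form of the test)."""
    # absolute deviations of each score from its own group mean
    z = []
    for g in groups:
        m = sum(g) / len(g)
        z.append([abs(x - m) for x in g])
    n = sum(len(g) for g in z)
    k = len(z)
    grand = sum(sum(g) for g in z) / n
    # between-groups and within-groups sums of squares on the deviations
    ssb = sum(len(g) * (sum(g) / len(g) - grand) ** 2 for g in z)
    ssw = sum(sum((x - sum(g) / len(g)) ** 2 for x in g) for g in z)
    return (ssb / (k - 1)) / (ssw / (n - k))

# Hypothetical data: the second group is visibly more spread out.
w = levene([[1, 2, 3, 4], [2, 4, 6, 8]])
print(round(w, 2))   # 2.4
```

The statistic is then referred to an F distribution with (k - 1, n - k) degrees of freedom, which is where the p-value SPSS prints comes from.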
Slide 32: Including the transformed variable in the data set - 1

In the evaluation for normality, we resolved a problem with normality for spouse's highest academic degree with a logarithmic transformation. We need to add this transformed variable to the data set, so that we can incorporate it in our detection of outliers.

We can use the script to compute transformed variables and add them to the data set.

We select an assumption to test (Normality is the easiest), mark the check box for the transformation we want to retain, and clear the check box "Delete variables created in this analysis."

NOTE: this will leave the transformed variable in the data set. To remove it, you can delete the column or close the data set without saving.
Slide 33: Including the transformed variable in the data set - 2

First, move the variable SPDEG to the list box for the dependent variable.

Second, click on the Normality option button to request that SPSS do the test for normality, including the transformation we will mark.

Third, mark the transformation we want to retain (Logarithmic) and clear the checkboxes for the other transformations.

Fourth, clear the check box for the option "Delete variables created in this analysis".

Fifth, click on the OK button.
Slide 34: Including the transformed variable in the data set - 3

If we scroll to the rightmost column in the data editor, we see that the log of SPDEG is included in the data set.
Slide 35: Including the transformed variable in the list of variables in the script - 1

If we scroll to the bottom of the list of variables, we see that the log of SPDEG is not included in the list of available variables.

To tell the script to add the log of SPDEG to the list of variables in the script, click on the Reset button. This will start the script over again, with a new list of variables from the data set.
Slide 36: Including the transformed variable in the list of variables in the script - 2

If we scroll to the bottom of the list of variables now, we see that the log of SPDEG is included in the list of available variables.
Slide 37: Detection of outliers - question

In multiple regression, an outlier in the solution can be defined as a case that has a large residual because the equation did a poor job of predicting its value.

We will run the regression again incorporating any transformations we have decided to test, and have SPSS compute the standardized residual for each case. Cases with a standardized residual larger than +/- 3.0 will be treated as outliers.
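The flagging rule can be sketched directly: a standardized residual is the raw residual divided by the standard error of the estimate, and cases beyond ±3.0 are flagged. A hypothetical illustration with made-up residuals (the function and data are ours, not SPSS output):

```python
import math

def outlier_cases(residuals, n_predictors, cutoff=3.0):
    """Indices of cases whose standardized residual exceeds +/- cutoff.

    The standardized residual is the raw residual divided by the
    standard error of the estimate, sqrt(SSE / (n - k - 1)).
    """
    n = len(residuals)
    sse = sum(e ** 2 for e in residuals)
    s = math.sqrt(sse / (n - n_predictors - 1))
    return [i for i, e in enumerate(residuals) if abs(e / s) > cutoff]

# Hypothetical residuals: 40 small ones plus one case the model
# badly mispredicts; only the last case is flagged.
resid = [0.1, -0.1] * 20 + [4.0]
print(outlier_cases(resid, n_predictors=3))   # [40]
```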
Slide 38: The revised regression using transformations

To run the regression to detect outliers, select the Linear Regression command from the menu that drops down when you click on the Dialog Recall button.
Slide 39: The revised regression: substituting transformed variables

Remove the variable SPDEG from the Dependent text box. Include the log of the variable, LGSPDEG.

Click on the Statistics… button to select the statistics we will need for the analysis.
Slide 40: The revised regression: selecting statistics

First, mark the checkboxes for Estimates on the Regression Coefficients panel.

Second, mark the checkboxes for Model fit, Descriptives, and R squared change. The R squared change statistic will tell us whether or not the variables added after the controls have a relationship to the dependent variable.

Third, mark the Durbin-Watson statistic on the Residuals panel.

Fourth, mark the checkbox for the Casewise diagnostics, which will be used to identify outliers.

Fifth, mark the Collinearity diagnostics checkbox to get tolerance values for testing multicollinearity.

Sixth, click on the Continue button to close the dialog box.
Slide 41: The revised regression: saving standardized residuals

Mark the checkbox for Standardized Residuals so that SPSS saves a new variable in the data editor. We will use this variable to omit outliers in the revised regression model.

Click on the Continue button to close the dialog box.
Slide 42: The revised regression: obtaining output

Click on the OK button to obtain the output for the revised model.
Slide 43: Outliers in the analysis

If cases have a standardized residual larger than +/- 3.0, SPSS creates a table titled Casewise Diagnostics, in which it lists the cases and the values that result in their being outliers.

If there are no outliers, SPSS does not print the Casewise Diagnostics table. There was no table for this problem. The answer to the question is true.

We can verify that all standardized residuals were less than +/- 3.0 by looking at the minimum and maximum standardized residuals in the table of Residual Statistics. Both the minimum and maximum fell in the acceptable range.

Since there were no outliers, we can use the regression just completed to make our decision about which model to interpret.
Slide 44: Selecting the model to interpret - question

Since there were no outliers, we can use the regression just completed to make our decision about which model to interpret.

If the R² for the revised model is higher by 2% or more, we will base our interpretation on the revised model; otherwise, we will interpret the baseline model.
Slide 45: Selecting the model to interpret – evidence and answer

Prior to any transformations of variables to satisfy the assumptions of multiple regression and the removal of outliers, the proportion of variance in the dependent variable explained by the independent variables (R²) was 28.1%. After substituting transformed variables, the proportion of variance in the dependent variable explained by the independent variables (R²) was 27.1%.

Since the revised regression model did not explain at least two percent more variance than explained by the baseline regression analysis, the baseline regression model with all cases and the original form of all variables should be used for the interpretation.

The transformations used to satisfy the assumptions will not be used, so cautions should be added for the assumptions violated.

False is the correct answer to the question.


Slide 46: Re-running the baseline regression - 1

Having decided to use the baseline model for the interpretation of this analysis, the SPSS regression output was re-created.

To run the baseline regression again, select the Linear Regression command from the menu that drops down when you click on the Dialog Recall button.
Slide 47: Re-running the baseline regression - 2

Remove the transformed variable lgspdeg from the dependent variable textbox and add the variable spdeg.

Click on the Save button to remove the request to save standardized residuals to the data editor.
Slide 48: Re-running the baseline regression - 3

Clear the checkbox for Standardized Residuals so that SPSS does not save a new set of them in the data editor when it runs the new regression.

Click on the Continue button to close the dialog box.
Slide 49: Re-running the baseline regression - 4

Click on the OK button to request the regression output.
Slide 50: Assumption of independence of errors - question

We can now check the assumption of independence of errors for the analysis we will interpret.
Slide 51: Assumption of independence of errors: evidence and answer

Having selected a regression model for interpretation, we can now examine the final assumption of independence of errors.

Model Summary(c)

Model   R       R Square   Adjusted R Square   Std. Error of the Estimate   R Square Change   F Change   df1   df2   Sig. F Change   Durbin-Watson
1       .014a   .000       -.015               1.290                        .000              .013       2     133   .987
2       .531b   .281       .265                1.098                        .281              51.670     1     132   .000            1.754

a. Predictors: (Constant), RESPONDENTS SEX, AGE OF RESPONDENT
b. Predictors: (Constant), RESPONDENTS SEX, AGE OF RESPONDENT, RS HIGHEST DEGREE
c. Dependent Variable: SPOUSES HIGHEST DEGREE

The Durbin-Watson statistic is used to test for the presence of serial correlation among the residuals, i.e., the assumption of independence of errors, which requires that the residuals or errors in prediction do not follow a pattern from case to case.

The value of the Durbin-Watson statistic ranges from 0 to 4. As a general rule of thumb, the residuals are not correlated if the Durbin-Watson statistic is approximately 2, and an acceptable range is 1.50 - 2.50.

The Durbin-Watson statistic for this problem is 1.754, which falls within the acceptable range. If the Durbin-Watson statistic was not in the acceptable range, we would add a caution to the findings for a violation of regression assumptions.

The answer to the question is true.
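The Durbin-Watson statistic is simple to compute from the residuals taken in case order. A hypothetical sketch of the formula behind the value SPSS reports, with made-up residuals:

```python
def durbin_watson(residuals):
    """Durbin-Watson statistic: the sum of squared successive
    differences of the residuals divided by the sum of squared
    residuals. Values near 2 suggest uncorrelated errors;
    1.50 - 2.50 is the rule-of-thumb range used in these problems."""
    num = sum((residuals[i] - residuals[i - 1]) ** 2
              for i in range(1, len(residuals)))
    den = sum(e ** 2 for e in residuals)
    return num / den

# Hypothetical residuals with no obvious case-to-case pattern.
dw = durbin_watson([1.0, -0.5, 0.3, -0.2, 0.4])
print(round(dw, 2))   # 2.27 — inside the 1.50-2.50 acceptable range
```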
Slide 52: Multicollinearity - question

The final condition that can have an impact on our interpretation is multicollinearity.
Slide 53: Multicollinearity – evidence and answer

The tolerance values for all of the independent variables are larger than 0.10: "highest academic degree" [degree] (.990), "age" [age] (.954) and "sex" [sex] (.947).

Multicollinearity is not a problem in this regression analysis.

True is the correct answer to the question.
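Tolerance for an independent variable is 1 minus the R² obtained when that variable is regressed on the other independent variables; values below 0.10 signal multicollinearity. With just two predictors this reduces to 1 - r², which makes the idea easy to sketch with hypothetical data:

```python
import math

def tolerance_two_ivs(x1, x2):
    """Tolerance of x1 given a single other predictor x2: 1 - r^2.

    With more than two predictors, r^2 is replaced by the R-squared
    from regressing x1 on all of the other independent variables."""
    n = len(x1)
    m1, m2 = sum(x1) / n, sum(x2) / n
    cov = sum((a - m1) * (b - m2) for a, b in zip(x1, x2))
    ss1 = sum((a - m1) ** 2 for a in x1)
    ss2 = sum((b - m2) ** 2 for b in x2)
    r = cov / math.sqrt(ss1 * ss2)
    return 1.0 - r ** 2

# Hypothetical, nearly collinear predictors: tolerance close to zero,
# far below the 0.10 rule of thumb used in these problems.
tol = tolerance_two_ivs([1, 2, 3, 4, 5], [1, 2, 3, 4, 6])
print(round(tol, 3))   # 0.027
```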


Slide 54: Overall relationship between dependent variable and independent variables - question

The first finding we want to confirm concerns the relationship between the dependent variable and the set of predictors after including the control variables in the analysis.
Slide 55: Overall relationship between dependent variable and independent variables – evidence and answer

Hierarchical multiple regression was performed to test the hypothesis that there was a relationship between the dependent variable "spouse's highest academic degree" [spdeg] and the predictor independent variable "highest academic degree" [degree] after controlling for the effect of the control independent variables "age" [age] and "sex" [sex]. In hierarchical regression, the interpretation for the overall relationship focuses on the change in R². If the change in R² is statistically significant, the overall relationship for all independent variables will be significant as well.
Slide 56: Overall relationship between dependent variable and independent variables – evidence and answer

Based on model 2 in the Model Summary table, where the predictors were added (F(1, 132) = 51.670, p<0.001), the predictor variable, highest academic degree, did contribute to the overall relationship with the dependent variable, spouse's highest academic degree. Since the probability of the F statistic (p<0.001) was less than or equal to the level of significance (0.05), the null hypothesis that the change in R² was equal to 0 was rejected. The research hypothesis that highest academic degree reduced the error in predicting spouse's highest academic degree was supported.
Slide 57: Overall relationship between dependent variable and independent variables – evidence and answer

The increase in R² from including the predictor variable ("highest academic degree") in the analysis was 0.281, not 0.241.

Using a proportional reduction in error interpretation for R², information provided by the predictor variable reduced our error in predicting "spouse's highest academic degree" [spdeg] by 28.1%, not 24.1%.

The answer to the question is false because the problem stated an incorrect statistical value.
Slide 58: Relationship of the predictor variable and the dependent variable - question

In these hierarchical regression problems, we will focus the interpretation of individual relationships on the predictor variables and ignore the contribution of the control variables.
Slide 59: Relationship of the predictor variable and the dependent variable – evidence and answer

Coefficients(a)

                         Unstandardized      Standardized
Model                    B       Std. Error  Beta     t       Sig.   Tolerance   VIF
1  (Constant)            1.781   .577                 3.085   .002
   AGE OF RESPONDENT     .001    .008        .009     .100    .920   .956        1.046
   RESPONDENTS SEX       -.023   .231        -.009    -.100   .920   .956        1.046
2  (Constant)            .525    .521                 1.007   .316
   AGE OF RESPONDENT     .003    .007        .037     .495    .622   .954        1.049
   RESPONDENTS SEX       .114    .198        .044     .575    .566   .947        1.056
   RS HIGHEST DEGREE     .559    .078        .533     7.188   .000   .990        1.010

a. Dependent Variable: SPOUSES HIGHEST DEGREE

Based on the statistical test of the b coefficient (t = 7.188, p<0.001) for the independent variable "highest academic degree" [degree], the null hypothesis that the slope or b coefficient was equal to 0 (zero) was rejected. The research hypothesis that there was a relationship between "highest academic degree" and "spouse's highest academic degree" was supported.
Slide 60: Relationship of the predictor variable and the dependent variable – evidence and answer

The b coefficient for the relationship between the dependent variable "spouse's highest academic degree" [spdeg] and the independent variable "highest academic degree" [degree] was .559, which implies a direct relationship because the sign of the coefficient is positive. Higher numeric values for the independent variable "highest academic degree" [degree] are associated with higher numeric values for the dependent variable "spouse's highest academic degree" [spdeg].

The statement in the problem that "survey respondents who had higher academic degrees had spouses with higher academic degrees" is correct. The answer to the question is true with caution. Caution in interpreting the relationship should be exercised because of an ordinal variable treated as metric and the violation of the assumption of normality.
Slide 61: Validation analysis – question

The problem states the random number seed to use in the validation analysis.
Slide 62: Validation analysis – set the random number seed

Validate the results of your regression analysis by conducting a 75/25% cross-validation, using 998794 as the random number seed.

To set the random number seed, select the Random Number Seed… command from the Transform menu.
Slide 63: Set the random number seed

First, click on the Set seed to option button to activate the text box.

Second, type in the random seed stated in the problem.

Third, click on the OK button to complete the dialog box.

Note that SPSS does not provide you with any feedback about the change.
Slide 64: Validation analysis – compute the split variable

To enter the formula for the variable that will split the sample into two parts, click on the Compute… command.
Slide 65: The formula for the split variable

First, type the name for the new variable, split, into the Target Variable text box.

Second, the formula for the value of split is shown in the text box. The uniform(1) function generates a random decimal number between 0 and 1. The random number is compared to the value 0.75. If the random number is less than or equal to 0.75, the value of the formula will be 1, the SPSS numeric equivalent to true. If the random number is larger than 0.75, the formula will return a 0, the SPSS numeric equivalent to false.

Third, click on the OK button to complete the dialog box.
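The SPSS formula split = uniform(1) <= 0.75 can be mimicked in Python with numpy. This is only a sketch: numpy's random generator is not SPSS's, so even with the same seed the actual 0/1 pattern will differ:

```python
import numpy as np

# Mimic of the SPSS split formula: split = uniform(1) <= 0.75.
# The seed 998794 echoes the problem statement, but numpy's generator
# is not SPSS's, so the 0/1 pattern it produces is different.
rng = np.random.RandomState(998794)
n_cases = 270                                   # illustrative sample size
split = (rng.uniform(0, 1, n_cases) <= 0.75).astype(int)

print(split[:10])               # a random pattern of ones and zeros
print(round(split.mean(), 2))   # roughly 0.75 of cases marked for training
```

Cases with split = 1 form the 75% training sample; cases with split = 0 form the 25% validation sample.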
Slide 66: The split variable in the data editor

In the data editor, the split variable shows a random pattern of zeros and ones.

To select the cases for the training sample, we select the cases where split = 1.
Slide 67: Repeat the regression for the validation

To run the regression for the validation training sample, select the Linear Regression command from the menu that drops down when you click on the Dialog Recall button.
Slide 68: Using "split" as the selection variable

First, scroll down the list of variables and highlight the variable split.

Second, click on the right arrow button to move the split variable to the Selection Variable text box.
Slide 69: Setting the value of split to select cases

When the variable named split is moved to the Selection Variable text box, SPSS adds "=?" after the name to prompt us to enter a specific value for split. Click on the Rule… button to enter a value for split.
Slide 70: Completing the value selection

First, type the value for the training sample, 1, into the Value text box.

Second, click on the Continue button to complete the value entry.
Slide 71: Requesting output for the validation analysis

When the value entry dialog box is closed, SPSS adds the value we entered after the equal sign. This specification now tells SPSS to include in the analysis only those cases that have a value of 1 for the split variable.

Click on the OK button to request the output.
Slide 72: Validation analysis - 1

The validation analysis requires that the regression model for the 75% training sample replicate the pattern of statistical significance found for the full data set.

In the analysis of the 75% training sample, the relationship between the set of independent variables and the dependent variable was statistically significant, F(3, 103) = 11.569, p<0.001, as was the overall relationship in the analysis of the full data set, F(3, 132) = 17.235, p<0.001.
Slide 73: Validation analysis - 2

The validation of a hierarchical regression model also requires that the change in R² demonstrate statistical significance in the analysis of the 75% training sample.

The R² change of 0.249 satisfied this requirement (F change(1, 103) = 34.319, p<0.001).
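The F test of the R² change follows a standard formula, which can be checked against the slide's numbers. This is a sketch; the small discrepancy below comes from using the rounded R² values reported in the output:

```python
def f_change(r2_full, r2_reduced, m_added, df_resid):
    """F test of the change in R² when a block of m predictors is added:
    F = (delta R² / m) / ((1 - R²_full) / df_resid), df = (m, df_resid)."""
    delta_r2 = r2_full - r2_reduced
    return (delta_r2 / m_added) / ((1 - r2_full) / df_resid)

# Rounded values from the training-sample output: R² change = 0.249 for one
# added predictor, full-model R² ≈ 0.502² ≈ 0.252, residual df = 103.
f = f_change(r2_full=0.252, r2_reduced=0.252 - 0.249, m_added=1, df_resid=103)
print(round(f, 1))   # ≈ 34.3; SPSS reports F change(1, 103) = 34.319
```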
Slide 74: Validation analysis - 3

The pattern of significance for the individual relationships between the dependent variable and the predictor variable was the same for the analysis using the full data set and the 75% training sample.

The relationship between highest academic degree and spouse's highest academic degree was statistically significant in both the analysis using the full data set (t=7.188, p<0.001) and the analysis using the 75% training sample (t=5.484, p<0.001). The pattern of statistical significance of the independent variables for the analysis using the 75% training sample matched the pattern identified in the analysis of the full data set.
Slide 75: Validation analysis - 4

The total proportion of variance explained in the model using the training sample was 25.2% (.502²), compared to 40.6% (.637²) for the validation sample. The value of R² for the validation sample was actually larger than the value of R² for the training sample, implying a better fit than obtained for the training sample. This supports a conclusion that the regression model would be effective in predicting scores for cases other than those included in the sample.

The validation analysis supported the generalizability of the findings of the analysis to the population represented by the sample in the data set. The answer to the question is true.
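The shrinkage computation behind this conclusion is simple arithmetic; a sketch using the rounded multiple R values reported on the slide:

```python
# Shrinkage in R² = R² (training) - R² (validation), computed from the
# rounded multiple R values reported on the slide.
r2_training = 0.502 ** 2      # ≈ 0.252, i.e. 25.2%
r2_validation = 0.637 ** 2    # ≈ 0.406, i.e. 40.6%
shrinkage = r2_training - r2_validation

# Negative shrinkage means the validation sample actually fit better,
# which easily satisfies the "< 2%" generalizability criterion.
print(round(shrinkage, 3))    # → -0.154
```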
Slide 76: Steps in complete hierarchical regression analysis

The following flow charts depict the process for solving the complete regression problem and determining the answer to each of the questions encountered in the complete analysis.

Text in italics (e.g. True, False, True with caution, Incorrect application of a statistic) represents the answers to each specific question.

Many of the steps in hierarchical regression analysis are identical to the steps in standard regression analysis. Steps that are different are identified with a magenta background, with the specifics of the difference underlined.
Slide 77: Complete hierarchical multiple regression analysis – level of measurement

Question: Do variables included in the analysis satisfy the level of measurement requirements?

Is the dependent variable metric and the independent variables metric or dichotomous? (Examine all independent variables – controls as well as predictors.)
  No → Incorrect application of a statistic
  Yes → continue

Ordinal variables included in the relationship?
  Yes → True with caution
  No → True
Slide 78: Complete hierarchical multiple regression analysis – sample size

Question: Number of variables and cases satisfy sample size requirements?

Compute the baseline regression in SPSS. (Include both controls and predictors in the count of independent variables.)

Ratio of cases to independent variables at least 5 to 1?
  No → Inappropriate application of a statistic
  Yes → continue

Ratio of cases to independent variables at preferred sample size of at least 15 to 1?
  No → True with caution
  Yes → True
Slide 79: Complete hierarchical multiple regression analysis – assumption of normality

Question: Each metric variable satisfies the assumption of normality? (Test the dependent variable and both control and predictor independent variables.)

The variable satisfies criteria for a normal distribution?
  Yes → True
  No → False; then examine transformations:

Log, square root, or inverse transformation satisfies normality? (If more than one transformation satisfies normality, use the one with the smallest skew.)
  Yes → Use transformation in revised model, no caution needed
  No → Use untransformed variable in analysis, add caution to interpretation for violation of normality
Slide 80: Complete hierarchical multiple regression analysis – assumption of linearity

Question: Relationship between dependent variable and metric independent variable satisfies assumption of linearity? (Test both control and predictor independent variables. If the dependent variable was transformed for normality, use the transformed dependent variable in the test for linearity. If the independent variable was transformed to satisfy normality, skip the check for linearity.)

Probability of Pearson correlation (r) <= level of significance?
  Yes → True
  No → continue

Probability of correlation (r) for relationship with any transformation of the IV <= level of significance? (If more than one transformation satisfies linearity, use the one with the largest r.)
  Yes → Use transformation in revised model
  No → Weak relationship; no caution needed
Slide 81: Complete hierarchical multiple regression analysis – assumption of homogeneity of variance

Question: Variance in dependent variable is uniform across the categories of a dichotomous independent variable? (Test both control and predictor independent variables. If the dependent variable was transformed for normality, substitute the transformed dependent variable in the test for the assumption of homogeneity of variance.)

Probability of Levene statistic <= level of significance?
  Yes → False; do not test transformations of the dependent variable, and add caution to interpretation for violation of homoscedasticity
  No → True
Slide 82: Complete hierarchical multiple regression analysis – detecting outliers

Question: After incorporating any transformations, no outliers were detected in the regression analysis? (If any variables were transformed for normality or linearity, substitute the transformed variables in the regression for the detection of outliers.)

Is the standardized residual for any case greater than +/-3.00?
  Yes → False; remove outliers and run the revised regression again
  No → True
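The standardized-residual screen can be sketched in Python with numpy. The residuals below are simulated, and the standardization here is a simple z-score of the residuals, which approximates SPSS's ZRESID (raw residual divided by the standard error of estimate):

```python
import numpy as np

# Sketch of the outlier screen: flag any case whose standardized residual
# exceeds +/-3.00. Residuals are simulated, not from the course data set.
rng = np.random.RandomState(0)
residuals = rng.normal(0, 1, 200)
residuals[17] = 5.2                      # plant one extreme case
std_resid = residuals / residuals.std(ddof=1)
outliers = np.where(np.abs(std_resid) > 3.0)[0]
print(outliers)                          # indices of cases to inspect/remove
```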
Slide 83: Complete hierarchical multiple regression analysis – picking regression model for interpretation

Question: Interpretation based on model that includes transformation of variables and removes outliers?

R² for revised regression greater than R² for baseline regression by 2% or more?
  Yes → Pick revised regression with transformations and omitting outliers for interpretation (True)
  No → Pick baseline regression with untransformed variables and all cases for interpretation (False)
Slide 84: Complete hierarchical multiple regression analysis – assumption of independence of errors

Question: Serial correlation of errors is not a problem in this regression analysis?

Residuals are independent (Durbin-Watson between 1.5 and 2.5)?
  Yes → True
  No → False; add caution for violation of the assumption of independence of errors
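The Durbin-Watson statistic itself is a short computation on the regression residuals; a sketch with simulated residuals:

```python
import numpy as np

# Durbin-Watson statistic: d = sum((e_t - e_{t-1})^2) / sum(e_t^2).
# Values near 2 suggest independent errors; the slides treat 1.5-2.5
# as the acceptable range.
def durbin_watson(residuals):
    e = np.asarray(residuals, dtype=float)
    return np.sum(np.diff(e) ** 2) / np.sum(e ** 2)

rng = np.random.RandomState(42)
independent_errors = rng.normal(0, 1, 500)   # illustrative iid residuals
print(round(durbin_watson(independent_errors), 2))   # near 2.0
```

Values well below 2 indicate positive serial correlation; values well above 2 (the maximum is 4) indicate negative serial correlation.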
Slide 85: Complete hierarchical multiple regression analysis – multicollinearity

Question: Multicollinearity is not a problem in this regression analysis?

Tolerance for all IVs greater than 0.10, indicating no multicollinearity?
  Yes → True
  No → False; halt the analysis until the problem is diagnosed
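The tolerance statistic SPSS reports can be computed by hand: regress each independent variable on the other independent variables; tolerance is 1 minus the R² of that regression, and VIF is its reciprocal. A sketch on simulated data:

```python
import numpy as np

# Tolerance for each IV: regress it on the remaining IVs;
# tolerance = 1 - R² of that regression, VIF = 1 / tolerance.
def tolerances(X):
    X = np.asarray(X, dtype=float)
    out = []
    for j in range(X.shape[1]):
        others = np.column_stack([np.ones(len(X)), np.delete(X, j, axis=1)])
        coef, *_ = np.linalg.lstsq(others, X[:, j], rcond=None)
        resid = X[:, j] - others @ coef
        ss_res = np.sum(resid ** 2)
        ss_tot = np.sum((X[:, j] - X[:, j].mean()) ** 2)
        out.append(ss_res / ss_tot)      # = 1 - R²_j
    return np.array(out)

rng = np.random.RandomState(1)
X = rng.normal(0, 1, size=(200, 3))      # three nearly independent IVs
tol = tolerances(X)
print(np.round(tol, 2))                  # all near 1.0
print(bool(np.all(tol > 0.10)))          # True → no multicollinearity problem
```

If any IV is an exact linear combination of the others, its tolerance collapses toward 0 and the 0.10 screen fails.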
Slide 86: Complete hierarchical multiple regression analysis – overall relationship

Question: Finding about overall relationship between dependent variable and independent variables.

Probability of F test of R² change less than/equal to level of significance?
  No → False
  Yes → continue

Strength of R² change for predictor variables interpreted correctly?
  No → False
  Yes → continue

Small sample, ordinal variables, or violation of assumption in the relationship?
  Yes → True with caution
  No → True
Slide 87: Complete hierarchical multiple regression analysis – individual relationships

Question: Finding about individual relationship between independent variable and dependent variable.

Probability of t test between predictors and DV <= level of significance?
  No → False
  Yes → continue

Direction of relationship between predictors and DV interpreted correctly?
  No → False
  Yes → continue

Small sample, ordinal variables, or violation of assumption in the relationship?
  Yes → True with caution
  No → True
Slide 88: Complete hierarchical multiple regression analysis – individual relationships

Question: Finding about independent variable with largest impact on dependent variable.

Does the stated variable have the largest beta coefficient (ignoring sign) among predictors?
  No → False
  Yes → continue

Small sample, ordinal variables, or violation of assumption in the relationship?
  Yes → True with caution
  No → True
Slide 89: Complete hierarchical multiple regression analysis – validation analysis - 1

Question: The validation analysis supports the generalizability of the findings? (Set the random seed and randomly split the sample into a 75% training sample and a 25% validation sample.)

Probability of ANOVA test for training sample <= level of significance?
  No → False
  Yes → continue

Probability of F for R² change for training sample <= level of significance?
  No → False
  Yes → continue
Slide 90: Complete hierarchical multiple regression analysis – validation analysis - 2

Pattern of significance for predictor variables in training sample matches pattern for full data set?
  No → False
  Yes → continue

Shrinkage in R² (R² for training sample - R² for validation sample) < 2%?
  No → False
  Yes → True
Slide 91: Homework Problems – Multiple Regression – Hierarchical Problems - 1

The hierarchical regression homework problems parallel the complete standard regression problems. The only assumption made in the problems is that there is no problem with missing data.

The complete hierarchical multiple regression will include:
•Testing assumptions of normality and linearity,
•Testing for outliers,
•Determining whether to use transformations or exclude outliers,
•Testing for independence of errors,
•Checking for multicollinearity, and
•Validating the generalizability of the analysis.
Slide 92: Homework Problems – Multiple Regression – Hierarchical Problems - 2

The statement of the hierarchical regression problem identifies the dependent variable, the predictor independent variables, and the control independent variables.
Slide 93: Homework Problems – Multiple Regression – Hierarchical Problems - 3

The findings, which must all be correct for a problem to be true, include:
•a finding about the R² change when the predictor independent variables are included in the regression, and
•an interpretive statement about each of the predictor independent variables.

The findings do not specify any finding about the control independent variables.
Slide 94: Homework Problems – Multiple Regression – Hierarchical Problems - 4

The first prerequisite for a problem is the satisfaction of the level of measurement and minimum sample size requirements.

Failing to satisfy either of these requirements results in an inappropriate application of a statistic.
Slide 95: Homework Problems – Multiple Regression – Hierarchical Problems - 5

The assumption of normality requires that each metric variable be tested. If the variable is not normal, transformations should be examined to see if we can improve the distribution of the variable. If transformations are unsuccessful, a caution is added to any true findings.
Slide 96: Homework Problems – Multiple Regression – Hierarchical Problems - 6

The assumption of linearity is examined for any metric independent variables that were not transformed for the assumption of normality.
Slide 97: Homework Problems – Multiple Regression – Hierarchical Problems - 7

After incorporating any transformations, we look for outliers using standardized residuals as the criterion.
Slide 98: Homework Problems – Multiple Regression – Hierarchical Problems - 8

We compare the results of the baseline regression (without transformations or exclusion of outliers) to the revised model (with transformations and outliers excluded) to determine whether we will base our interpretation on the baseline or the revised analysis.
Slide 99: Homework Problems – Multiple Regression – Hierarchical Problems - 9

We test for the assumption of independence of errors and the presence of multicollinearity.

If we violate the assumption of independence, we attach a caution to our findings.

If there is a multicollinearity problem, we halt the analysis, since we may be reporting erroneous findings.
Slide 100: Homework Problems – Multiple Regression – Hierarchical Problems - 9

In hierarchical regression, we interpret the change in R² in the overall relationship that is associated with the inclusion of the predictor independent variables. The change in R² must be statistically significant and the magnitude of the R² change must be correctly stated.
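The whole two-block logic can be sketched with numpy least squares on simulated data. This is an illustration of the technique, not the course's SPSS output; the variable names merely echo the slides' example:

```python
import numpy as np

# Two-block hierarchical regression sketch: block 1 enters the control
# variables, block 2 adds the predictor, and the R² change is tested.
def r2(X, y):
    design = np.column_stack([np.ones(len(y)), X])
    coef, *_ = np.linalg.lstsq(design, y, rcond=None)
    resid = y - design @ coef
    return 1 - np.sum(resid ** 2) / np.sum((y - y.mean()) ** 2)

rng = np.random.RandomState(7)
n = 150
age = rng.normal(40, 10, n)                 # control
sex = rng.randint(0, 2, n).astype(float)    # control (dichotomous)
degree = rng.normal(2, 1, n)                # predictor
y = 0.5 * degree + rng.normal(0, 1, n)      # outcome driven by the predictor

r2_block1 = r2(np.column_stack([age, sex]), y)           # controls only
r2_block2 = r2(np.column_stack([age, sex, degree]), y)   # controls + predictor
delta_r2 = r2_block2 - r2_block1

df_resid = n - 3 - 1                        # n - (total IVs) - 1
f_chg = (delta_r2 / 1) / ((1 - r2_block2) / df_resid)
print(round(delta_r2, 3), round(f_chg, 1))
```

A large, significant F for the R² change is what licenses the conclusion that the predictor block explains variance beyond the controls.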
Slide 101: Homework Problems – Multiple Regression – Hierarchical Problems - 10

The relationships between predictor independent variables and the dependent variable stated in the problem must be statistically significant, and worded correctly for the direction of the relationship.
Slide 102: Homework Problems – Multiple Regression – Hierarchical Problems - 11

We use a 75-25% validation strategy to support the generalizability of our findings. The validation must support:
•the significance of the overall relationship,
•the statistical significance of the change in R²,
•the pattern of significance for the individual predictors, and
•a shrinkage in R² of less than 2%, i.e. R² for the validation sample must not be more than 2% less than R² for the training sample.
Slide 103: Homework Problems – Multiple Regression – Hierarchical Problems - 12

Cautions are added as limitations to the analysis, if needed.