
REGRESSION

This procedure performs multiple linear regression with five methods for entry and
removal of variables. It also provides extensive analysis of residuals and influential
cases. A caseweight (CASEWEIGHT) and a regression weight (REGWGT) can be
specified in the model fitting.

Notation

The following notation is used throughout this chapter unless otherwise stated:

y_i      Dependent variable for case i, with variance σ²/g_i
c_i      Caseweight for case i; c_i = 1 if CASEWEIGHT is not specified
g_i      Regression weight for case i; g_i = 1 if REGWGT is not specified
l        Number of distinct cases
w_i      c_i g_i
W        Sum of weights: W = Σ_{i=1}^l w_i
p        Number of independent variables
C        Sum of caseweights: C = Σ_{i=1}^l c_i
x_ki     The kth independent variable for case i
X̄_k      Sample mean for the kth independent variable: X̄_k = Σ_{i=1}^l w_i x_ki / W
Ȳ        Sample mean for the dependent variable: Ȳ = Σ_{i=1}^l w_i y_i / W
h_i      Leverage for case i
h̃_i      g_i/W + h_i
S_kj     Sample covariance for X_k and X_j
S_yy     Sample variance for Y
S_ky     Sample covariance for X_k and Y
p*       Number of coefficients in the model; p* = p if the intercept is not included,
         otherwise p* = p + 1
R        The sample correlation matrix for X_1, ..., X_p and Y

Descriptive Statistics

The sample correlation matrix is

        | r_11  ...  r_1p  r_1y |
        | r_21  ...  r_2p  r_2y |
    R = |  ...  ...   ...   ... |
        | r_y1  ...  r_yp  r_yy |

where

    r_kj = S_kj / √(S_kk S_jj)

and

    r_yk = r_ky = S_ky / √(S_kk S_yy)
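As a minimal NumPy sketch (not SPSS code), the augmented correlation matrix above can be formed from the weighted covariances; the function name and the default unit weights are illustrative assumptions.

```python
import numpy as np

def augmented_correlation_matrix(X, y, c=None, g=None):
    """Sample correlation matrix R for (X_1, ..., X_p, Y).

    c: caseweights (default 1), g: regression weights (default 1).
    Combined weight w_i = c_i * g_i; the covariance divisor is C - 1,
    where C is the sum of caseweights, as in the text.
    """
    X = np.asarray(X, float)
    y = np.asarray(y, float)
    n = len(y)
    c = np.ones(n) if c is None else np.asarray(c, float)
    g = np.ones(n) if g is None else np.asarray(g, float)
    w = c * g
    Z = np.column_stack([X, y])                      # augment predictors with Y
    mean = (w[:, None] * Z).sum(axis=0) / w.sum()    # X̄_k = Σ w_i x_ki / W
    D = Z - mean
    S = (w[:, None] * D).T @ D / (c.sum() - 1.0)     # S_kj, divisor C − 1
    d = np.sqrt(np.diag(S))
    return S / np.outer(d, d)                        # r_kj = S_kj / √(S_kk S_jj)
```

With unit weights this reduces to the ordinary correlation matrix of the augmented data.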

The sample means X̄_i and covariances S_ij are computed by a provisional means
algorithm. Define

    W_k = Σ_{i=1}^k w_i = cumulative weight up to case k

then

    X̄_i^(k) = X̄_i^(k−1) + (x_ik − X̄_i^(k−1)) w_k / W_k

and, if the intercept is included,

    C_ij^(k) = C_ij^(k−1) + (x_ik − X̄_i^(k−1))(x_jk − X̄_j^(k−1)) (w_k − w_k²/W_k)

Otherwise,

    C_ij^(k) = C_ij^(k−1) + w_k x_ik x_jk

where

    X̄_i^(1) = x_i1   and   C_ij^(1) = 0

The sample covariance S_ij is computed as the final C_ij divided by C − 1.
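The provisional means recursion can be sketched directly; this is an illustrative one-pass implementation, assuming the update order shown above (cumulate W_k first, then update C before the mean).

```python
import numpy as np

def provisional_stats(Z, w=None, intercept=True):
    """One-pass (provisional means) accumulation of means and cross-products.

    Implements
      X̄^(k) = X̄^(k−1) + (z_k − X̄^(k−1)) w_k / W_k
      C^(k)  = C^(k−1) + outer(z_k − X̄^(k−1)) * (w_k − w_k²/W_k)   (intercept case)
    and raw weighted cross-products in the no-intercept case.
    """
    Z = np.asarray(Z, float)
    n, m = Z.shape
    w = np.ones(n) if w is None else np.asarray(w, float)
    mean = np.zeros(m)
    C = np.zeros((m, m))
    W = 0.0
    for k in range(n):
        W += w[k]                    # cumulative weight W_k
        d = Z[k] - mean              # deviation from the provisional mean
        if intercept:
            C += np.outer(d, d) * (w[k] - w[k] ** 2 / W)
        else:
            C += w[k] * np.outer(Z[k], Z[k])
        mean += d * w[k] / W
    return mean, C
```

For unit weights the final C equals the ordinary centered (or raw) cross-product matrix, which is the usual correctness check for this recursion.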

Sweep Operations (Dempster, 1969)

For a regression model of the form

    Y_i = β_0 + β_1 X_1i + β_2 X_2i + ... + β_p X_pi + e_i

sweep operations are used to compute the least squares estimates b of β and the
associated regression statistics. The sweeping starts with the correlation matrix R.

Let R̃ be the new matrix produced by sweeping on the kth row and column of R.
The elements of R̃ are

    r̃_kk = 1 / r_kk

    r̃_ik = r_ik / r_kk,    i ≠ k

    r̃_kj = −r_kj / r_kk,   j ≠ k

and

    r̃_ij = (r_ij r_kk − r_ik r_kj) / r_kk,   i ≠ k, j ≠ k

If the above sweep operations are repeatedly applied to each row of R_11 in

    R = | R_11  R_12 |
        | R_21  R_22 |

where R_11 contains the independent variables in the equation at the current step,
the result is

    R̃ = | R_11^(−1)         −R_11^(−1) R_12             |
        | R_21 R_11^(−1)     R_22 − R_21 R_11^(−1) R_12 |

The last row of R_21 R_11^(−1) contains the standardized coefficients (also called
BETA), and R_22 − R_21 R_11^(−1) R_12 can be used to obtain the partial
correlations for the variables not in the equation, controlling for the variables
already in the equation. Note that this routine is its own inverse; that is, exactly the
same operations are performed to remove a variable as to enter a variable.
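A minimal sweep-operator sketch in NumPy, following the element formulas above (the sign convention on the swept row is the one used in this reconstruction). It can be checked against a direct least-squares solve and against the self-inverse property.

```python
import numpy as np

def sweep(R, k):
    """Sweep the symmetric matrix R on row/column k:
    r̃_kk = 1/r_kk, r̃_ik = r_ik/r_kk, r̃_kj = −r_kj/r_kk,
    r̃_ij = (r_ij r_kk − r_ik r_kj)/r_kk.
    """
    R = np.asarray(R, float)
    n = R.shape[0]
    S = np.empty_like(R)
    d = R[k, k]
    for i in range(n):
        for j in range(n):
            if i == k and j == k:
                S[i, j] = 1.0 / d
            elif i == k:
                S[i, j] = -R[i, j] / d          # swept row changes sign
            elif j == k:
                S[i, j] = R[i, j] / d           # swept column keeps sign
            else:
                S[i, j] = (R[i, j] * d - R[i, k] * R[k, j]) / d
    return S
```

After sweeping the rows of R_11, the y-row entries over the swept columns are the standardized coefficients R_21 R_11^(−1), and the y-diagonal becomes 1 − R².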

Variable Selection Criteria

Let r_ij be the element in the current swept matrix associated with X_i and X_j.
Variables are entered or removed one at a time. X_k is eligible for entry if it is an
independent variable not currently in the model with

    r_kk ≥ t   (tolerance, with a default of 0.0001)

and also, for each variable X_j that is currently in the model,

    r_jj − (r_jk r_kj) / r_kk ≤ t^(−1)

The above condition is imposed so that entry of the variable does not reduce the
tolerance of variables already in the model to unacceptable levels.

The F-to-enter value for X_k is computed as

    F-to-enter_k = (C − p* − 1) V_k / (r_yy − V_k)

with 1 and C − p* − 1 degrees of freedom, where p* is the number of coefficients
currently in the model and

    V_k = r_yk r_ky / r_kk

The F-to-remove value for X_k is computed as

    F-to-remove_k = (C − p*) V_k / r_yy

with 1 and C − p* degrees of freedom.

Methods for Variable Entry and Removal


Five methods for entry and removal of variables are available. The selection
process is repeated until the maximum number of steps (MAXSTEP) is reached or
no more independent variables qualify for entry or removal. The algorithms for
these five methods are described below.

Stepwise

If there are independent variables currently entered in the model, choose X_k such
that F-to-remove_k is minimum. X_k is removed if F-to-remove_k < F_out
(default = 2.71) or, if probability criteria are used, P(F-to-remove_k) > P_out
(default = 0.1). If the inequality does not hold, no variable is removed from the
model.

If there are no independent variables currently entered in the model, or if no
entered variable is to be removed, choose X_k such that F-to-enter_k is maximum.
X_k is entered if F-to-enter_k > F_in (default = 3.84) or, if probability criteria are
used, P(F-to-enter_k) < P_in (default = 0.05). If the inequality does not hold, no
variable is entered.

At each step, all eligible variables are considered for removal and entry.

Forward
This procedure is the entry phase of the stepwise procedure.

Backward
This procedure is the removal phase of the stepwise procedure and can be used only
after at least one independent variable has been entered in the model.

Enter (Forced Entry)

Choose X_k such that r_kk is maximum and enter X_k. Repeat for all variables to
be entered.

Remove (Forced Removal)

Choose X_k such that r_kk is minimum and remove X_k. Repeat for all variables to
be removed.

Statistics
Summary
For the summary statistics, assume p independent variables are currently entered in
the equation, of which a block of q variables have been entered or removed in the
current step.
Multiple R

    R = √(1 − r_yy)

R Square

    R² = 1 − r_yy

Adjusted R Square

    R²_adj = R² − (1 − R²) p / (C − p*)

R Square Change (when a block of q independent variables was added or removed)

    ΔR² = R²_current − R²_previous

F Change and Significance of F Change

    F = { ΔR² (C − p*) / [q (1 − R²_current)]          for the addition of q independent variables
        { −ΔR² (C − p* − q) / [q (1 − R²_previous)]    for the removal of q independent variables

The degrees of freedom for the addition are q and C − p*, while the degrees of
freedom for the removal are q and C − p* − q.

Residual Sum of Squares

    SS_e = r_yy (C − 1) S_yy

with degrees of freedom C − p*.

Sum of Squares Due to Regression

    SS_R = R² (C − 1) S_yy

with degrees of freedom p.
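A small NumPy sketch of the summary statistics above for the unweighted intercept case (so C is the number of cases and p* = p + 1); the function name and return structure are illustrative, not SPSS's.

```python
import numpy as np

def summary_stats(X, y):
    """R, R², adjusted R², SS_e and SS_R for an intercept model with unit
    weights, using the identities R² = 1 − r_yy (swept) and
    SS_e = r_yy (C − 1) S_yy from the text."""
    X = np.asarray(X, float)
    y = np.asarray(y, float)
    n, p = X.shape
    p_star = p + 1                               # intercept included
    A = np.column_stack([np.ones(n), X])
    b, *_ = np.linalg.lstsq(A, y, rcond=None)
    resid = y - A @ b
    sse = resid @ resid                          # SS_e
    syy = y.var(ddof=1)                          # S_yy
    r2 = 1.0 - sse / ((n - 1) * syy)             # R² = 1 − r_yy
    adj = r2 - (1.0 - r2) * p / (n - p_star)     # R²_adj = R² − (1 − R²) p / (C − p*)
    ssr = r2 * (n - 1) * syy                     # SS_R = R² (C − 1) S_yy
    return {"R": np.sqrt(r2), "R2": r2, "adjR2": adj, "SSe": sse, "SSR": ssr}
```

A quick sanity check is the decomposition SS_e + SS_R = (C − 1) S_yy.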

ANOVA Table

Analysis of Variance

                 df          Sum of Squares    Mean Square
    Regression   p           SS_R              SS_R / p
    Residual     C − p*      SS_e              SS_e / (C − p*)

Variance-Covariance Matrix for Unstandardized Regression Coefficient Estimates

A square matrix of size p with diagonal elements equal to the variances, the
below-diagonal elements equal to the covariances, and the above-diagonal elements
equal to the correlations:

    var(b_k) = r_kk r_yy S_yy / ( S_kk (C − p*) )

    cov(b_k, b_j) = r_kj r_yy S_yy / ( √(S_kk S_jj) (C − p*) )

    cor(b_k, b_j) = r_kj / √(r_kk r_jj)

Selection Criteria

Akaike Information Criterion (AIC)

    AIC = C ln(SS_e / C) + 2 p*

Amemiya's Prediction Criterion (PC)

    PC = (1 − R²)(C + p*) / (C − p*)

Mallows' Cp (CP)

    CP = SS_e / σ̂² + 2 p* − C

where σ̂² is the mean square error from fitting the model that includes all the
variables in the variable list.

Schwarz Bayesian Criterion (SBC)

    SBC = C ln(SS_e / C) + p* ln(C)
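The four criteria above are direct arithmetic; a minimal sketch (function name illustrative, inputs as defined in the text):

```python
import math

def selection_criteria(sse, r2, sigma2_full, C, p_star):
    """AIC, Amemiya's PC, Mallows' CP and SBC as defined above.
    sigma2_full is the mean square error of the model containing all
    candidate variables; C is the sum of caseweights, p_star = p*."""
    aic = C * math.log(sse / C) + 2 * p_star
    pc = (1 - r2) * (C + p_star) / (C - p_star)
    cp = sse / sigma2_full + 2 * p_star - C
    sbc = C * math.log(sse / C) + p_star * math.log(C)
    return aic, pc, cp, sbc
```

Note that for the full model itself, SS_e = σ̂² (C − p*), so CP reduces to p*, which is the usual check for a Mallows' Cp implementation.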

Collinearity

Variance Inflation Factors

    VIF_i = 1 / r_ii

Tolerance

    Tolerance_i = r_ii

Eigenvalues

The eigenvalues λ_i of the scaled and uncentered cross-products matrix for the
independent variables in the equation are computed by the QL method
(Wilkinson and Reinsch, 1971).

Condition Indices

    η_k = √( λ_max / λ_k )

Variance-Decomposition Proportions

Let

    v_i = (v_i1, ..., v_ip)

be the eigenvector associated with eigenvalue λ_i. Also, let

    Φ_ij = v_ij² / λ_i   and   Φ_j = Σ_{i=1}^p Φ_ij

The variance-decomposition proportion for the jth regression coefficient associated
with the ith component is defined as

    π_ij = Φ_ij / Φ_j
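A NumPy sketch of the collinearity diagnostics, assuming "scaled and uncentered" means columns (including the constant, if present) scaled to unit length; it uses a symmetric eigensolver rather than the QL routine cited above, and the function name is illustrative.

```python
import numpy as np

def collinearity_diagnostics(X, intercept=True):
    """Condition indices η_k = √(λ_max/λ_k) and variance-decomposition
    proportions π_ij = Φ_ij / Φ_j from the scaled, uncentered
    cross-products matrix."""
    X = np.asarray(X, float)
    A = np.column_stack([np.ones(len(X)), X]) if intercept else X
    A = A / np.linalg.norm(A, axis=0)        # scale columns to unit length
    lam, V = np.linalg.eigh(A.T @ A)         # eigenvalues λ_i, eigenvectors v_i
    phi = V.T ** 2 / lam[:, None]            # Φ_ij = v_ij² / λ_i (row i = component)
    pi = phi / phi.sum(axis=0)               # π_ij = Φ_ij / Φ_j
    eta = np.sqrt(lam.max() / lam)           # condition indices
    return lam, eta, pi
```

By construction each column of π sums to 1, and the component with the largest eigenvalue has condition index 1.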

Statistics for Variables in the Equation

Regression Coefficient b_k

    b_k = r_yk √( S_yy / S_kk )   for k = 1, ..., p

The standard error of b_k is computed as

    σ̂_bk = √( r_kk r_yy S_yy / ( S_kk (C − p*) ) )

A 95% confidence interval for β_k is constructed from

    b_k ± σ̂_bk t(0.025, C − p*)

If the model includes the intercept, the intercept is estimated as

    b_0 = Ȳ − Σ_{k=1}^p b_k X̄_k

The variance of b_0 is estimated by

    σ̂²_b0 = (C − 1) r_yy S_yy / ( C (C − p*) )
             + Σ_{k=1}^p X̄_k² σ̂²_bk
             + 2 Σ_{j=1}^{p−1} Σ_{k=j+1}^p X̄_k X̄_j est. cov(b_k, b_j)

Beta Coefficients

    Beta_k = r_yk

The standard error of Beta_k is estimated by

    σ̂_Betak = √( r_yy r_kk / (C − p*) )

F-test for Beta_k

    F = ( Beta_k / σ̂_Betak )²

with 1 and C − p* degrees of freedom.

Part Correlation of X_k with Y

    Part Corr(X_k) = r_yk / √(r_kk)

Partial Correlation of X_k with Y

    Partial Corr(X_k) = r_yk / √( r_kk r_yy − r_yk r_ky )

Statistics for Variables Not in the Equation

The standardized regression coefficient Beta_k, if X_k enters the equation at the
next step, is

    Beta_k = r_yk / r_kk

The F-test for Beta_k is

    F = (C − p* − 1) r_yk² / ( r_kk r_yy − r_yk² )

with 1 and C − p* − 1 degrees of freedom.

Partial Correlation of X_k with Y

    Partial(X_k) = r_yk / √( r_yy r_kk )

Tolerance of X_k

    Tolerance_k = r_kk

The minimum tolerance among variables already in the equation, if X_k enters at
the next step, is

    min( min_{1≤j≤p} [ 1 / ( r_jj − r_kj r_jk / r_kk ) ], r_kk )

Residuals and Associated Statistics

There are 19 temporary variables that can be added to the active system file. These
variables can be requested with the RESIDUAL subcommand.

Centered Leverage Values

For all cases, compute

    h_i = { (g_i/C) Σ_{j=1}^p Σ_{k=1}^p (x_ji − X̄_j)(x_ki − X̄_k) r̃_jk / √(S_jj S_kk)   if the intercept is included
          { (g_i/C) Σ_{j=1}^p Σ_{k=1}^p x_ji x_ki r̃_jk / √(S_jj S_kk)                    otherwise

where r̃_jk denotes the (j, k)th element of R_11^(−1).

For selected cases, leverage is h_i; for unselected case i with positive caseweight,
leverage is

    h'_i = { g_i (1/W + h_i) / [ 1 + g_i (1/W + h_i) ]   if the intercept is included
           { g_i h_i / ( 1 + g_i h_i )                   otherwise

Unstandardized Predicted Values

    Ŷ_i = { Σ_{k=1}^p b_k x_ki          if no intercept is included
          { b_0 + Σ_{k=1}^p b_k x_ki    otherwise

Unstandardized Residuals

    e_i = y_i − Ŷ_i

Standardized Residuals

    ZRESID_i = { e_i / s    if no regression weight is specified
               { SYSMIS     otherwise

where s is the square root of the residual mean square.

Standardized Predicted Values

    ZPRED_i = { (Ŷ_i − Ȳ) / sd    if no regression weight is specified
              { SYSMIS            otherwise

where sd is computed as

    sd = √( Σ_{i=1}^l c_i (Ŷ_i − Ȳ)² / (C − 1) )

Studentized Residuals

    SRES_i = { e_i / ( s √( (1 − h̃_i) / g_i ) )    for selected cases with c_i > 0
             { e_i / ( s̃ √( (1 + h_i) / g_i ) )    otherwise

Deleted Residuals

    DRESID_i = { e_i / (1 − h̃_i)    for selected cases with c_i > 0
               { e_i                 otherwise

Studentized Deleted Residuals

    SDRESID_i = { DRESID_i / s(i)                    for selected cases with c_i > 0
                { e_i / ( s̃ √( (1 + h_i) / g_i ) )  otherwise

where s(i) is computed as

    s(i) = √( [ (C − p*) s² − (1 − h̃_i) DRESID_i² ] / (C − p* − 1) )
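A NumPy sketch of the deleted and studentized deleted residuals for the unweighted intercept case, where h̃_i = 1/n + h_i is simply the uncentered hat diagonal; the function name is illustrative. The closed forms can be verified against a brute-force refit with the case removed.

```python
import numpy as np

def residual_diagnostics(X, y):
    """DRESID_i = e_i/(1 − h̃_i) and SDRESID_i = DRESID_i/s(i) with
    s(i)² = [(C − p*)s² − (1 − h̃_i)DRESID_i²] / (C − p* − 1),
    assuming c_i = g_i = 1."""
    X = np.asarray(X, float)
    y = np.asarray(y, float)
    n, p = X.shape
    p_star = p + 1
    A = np.column_stack([np.ones(n), X])
    H = A @ np.linalg.inv(A.T @ A) @ A.T      # hat matrix; h̃_i = H_ii
    h = np.diag(H)
    e = y - H @ y
    s2 = e @ e / (n - p_star)                 # residual mean square s²
    dresid = e / (1 - h)
    s_i = np.sqrt(((n - p_star) * s2 - (1 - h) * dresid ** 2) / (n - p_star - 1))
    return dresid, dresid / s_i
```

DRESID_i equals the prediction error for case i from the model fitted without case i, and s(i) equals that model's residual standard deviation.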

Adjusted Predicted Values

    ADJPRED_i = Ŷ_i − DRESID_i

DfBeta

    DFBETA_i = b − b(i) = g_i e_i (X^t W X)^(−1) X_i^t / (1 − h̃_i)

where

    X_i^t = { (1, x_1i, ..., x_pi)    if the intercept is included
            { (x_1i, ..., x_pi)       otherwise

and W = diag(w_1, ..., w_l).

Standardized DfBeta

    SDBETA_ij = ( b_j − b_j(i) ) / ( s(i) √( [(X^t W X)^(−1)]_jj ) )

where b_j − b_j(i) is the jth component of b − b(i).

DfFit

    DFFIT_i = X_i (b − b(i)) = h̃_i e_i / (1 − h̃_i)

Standardized DfFit

    SDFIT_i = DFFIT_i / ( s(i) √(h̃_i) )

Covratio

    COVRATIO_i = ( s(i) / s )^(2p*) × 1 / (1 − h̃_i)
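The closed-form influence quantities above avoid refitting the model once per case. A minimal NumPy sketch for the unweighted intercept case (W = I, g_i = 1), with illustrative names, verifiable against an explicit leave-one-out refit:

```python
import numpy as np

def dfbeta_dffit(X, y):
    """Case-deletion influence via the closed forms
    DFBETA_i = (X'X)⁻¹ x_i e_i / (1 − h̃_i) and
    DFFIT_i  = h̃_i e_i / (1 − h̃_i),
    with h̃_i the uncentered hat diagonal (unit weights)."""
    X = np.asarray(X, float)
    y = np.asarray(y, float)
    n = len(y)
    A = np.column_stack([np.ones(n), X])
    G = np.linalg.inv(A.T @ A)
    h = np.einsum('ij,jk,ik->i', A, G, A)       # h̃_i = x_i (X'X)⁻¹ x_i'
    e = y - A @ (G @ (A.T @ y))
    dfbeta = (A @ G) * (e / (1 - h))[:, None]   # row i = b − b(i)
    dffit = h * e / (1 - h)
    return dfbeta, dffit
```

The design choice here mirrors the text: one fit plus O(1) work per case, instead of l separate regressions.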

Mahalanobis Distance

For selected cases with c_i > 0,

    MAHAL_i = { (C − 1) h_i    if the intercept is included
              { C h_i          otherwise

For unselected cases with c_i > 0,

    MAHAL_i = { C h_i          if the intercept is included
              { (C + 1) h_i    otherwise

Cook's Distance (Cook, 1977)

For selected cases with c_i > 0,

    COOK_i = { DRESID_i² h̃_i g_i / ( s² (p + 1) )    if the intercept is included
             { DRESID_i² h_i g_i / ( s² p )           otherwise

For unselected cases with c_i > 0,

    COOK_i = { DRESID_i² (h_i + 1/W) g_i / ( s̃² (p + 1) )    if the intercept is included
             { DRESID_i² h_i g_i / ( s̃² p )                   otherwise

where h_i is the leverage for unselected case i, and s̃² is computed as

    s̃² = { [ SS_e + e_i² / (1 + h_i + 1/W) ] / (C − p*)       if the intercept is included
         { [ SS_e + e_i² / (1 + h_i) ] / (C − p* + 1)          otherwise

Standard Errors of the Mean Predicted Values

For all cases with positive caseweight,

    SEPRED_i = { s √( h̃_i / g_i )    if the intercept is included
               { s √( h_i / g_i )     otherwise

95% Confidence Interval for Mean Predicted Response

    LMCIN_i = Ŷ_i − t(0.025, C − p*) SEPRED_i

    UMCIN_i = Ŷ_i + t(0.025, C − p*) SEPRED_i

95% Confidence Interval for a Single Observation

    LICIN_i = { Ŷ_i − t(0.025, C − p*) s √( (h̃_i + 1) / g_i )    if the intercept is included
              { Ŷ_i − t(0.025, C − p*) s √( (h_i + 1) / g_i )     otherwise

    UICIN_i = { Ŷ_i + t(0.025, C − p*) s √( (h̃_i + 1) / g_i )    if the intercept is included
              { Ŷ_i + t(0.025, C − p*) s √( (h_i + 1) / g_i )     otherwise

Durbin-Watson Statistic

    DW = Σ_{i=2}^l (ẽ_i − ẽ_{i−1})² / Σ_{i=1}^l c_i ẽ_i²

where ẽ_i = e_i √g_i.
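The Durbin-Watson ratio is a one-liner; this sketch assumes the √g_i standardization of the residuals given in this reconstruction, with caseweights defaulting to 1 and an illustrative function name.

```python
import numpy as np

def durbin_watson(e, c=None, g=None):
    """DW = Σ_{i=2}^l (ẽ_i − ẽ_{i−1})² / Σ_{i=1}^l c_i ẽ_i²
    with ẽ_i = e_i √g_i; c and g default to 1."""
    e = np.asarray(e, float)
    n = len(e)
    c = np.ones(n) if c is None else np.asarray(c, float)
    g = np.ones(n) if g is None else np.asarray(g, float)
    et = e * np.sqrt(g)                           # weighted residuals ẽ_i
    return np.sum(np.diff(et) ** 2) / np.sum(c * et ** 2)
```

Values near 2 indicate little first-order autocorrelation; perfectly alternating residuals push DW toward 4, constant ones toward 0.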

Partial Residual Plots


The scatterplots of the residuals of the dependent variable and an independent
variable when both of these variables are regressed on the rest of the independent
variables can be requested in the RESIDUAL branch. The algorithm for these
residuals is described in Velleman and Welsch (1981).

Missing Values

By default, a case that has a missing value for any variable is deleted from the
computation of the correlation matrix on which all subsequent computations are
based. Users are allowed to change the treatment of cases with missing values.

References

Cook, R. D. 1977. Detection of influential observations in linear regression.
Technometrics, 19: 15–18.

Dempster, A. P. 1969. Elements of Continuous Multivariate Analysis. Reading,
Mass.: Addison-Wesley.

Velleman, P. F., and Welsch, R. E. 1981. Efficient computing of regression
diagnostics. The American Statistician, 35: 234–242.

Wilkinson, J. H., and Reinsch, C. 1971. Linear algebra. In: Handbook for
Automatic Computation, Volume II, J. H. Wilkinson and C. Reinsch, eds. New
York: Springer-Verlag.