LINEAR REGRESSION

K. F. Turkman
Contents

1 Introduction
2 Straight line
  2.1 Examining the regression equation
  2.2 Some distributional theory
  2.3 Confidence intervals and tests of hypotheses regarding (β_0, β_1)
  2.4 Predicted future value of y
  2.5 Straight line regression in matrix terms
3 Generalization to multivariate (multiple) regression
  3.1 Precision of the regression equation
  3.2 Is our model correct?
  3.3 R² when there are repeated observations
  3.4 Correlation coefficients
  3.5 Partial correlation coefficient
  3.6 Use of qualitative variables in the regression equation
4 Selecting the best regression equation
  4.1 Extra sums of squares and partial F-tests
  4.2 Methods of selecting the best regression
5 Examination of residuals
  5.1 Testing the independence of residuals
  5.2 Checking for normality
  5.3 Plots of residuals
  5.4 Outliers and test for influential observations
6 Further Topics
  6.1 Transformations
  6.2 Unequal variances
  6.3 Ill-conditioned regression, collinearity and Ridge regression
  6.4 Generalized Linear models (GLIM)
  6.5 Nonlinear models
Approximate duration of the course: 12 hours.

References:
1. A. Sen and M. Srivastava (1990). Regression Analysis. Springer-Verlag.
2. N. Draper and H. Smith (1998). Applied Regression Analysis. Wiley.
3. V. K. Rohatgi (1976). An Introduction to Probability Theory and Mathematical Statistics. J. Wiley and Sons.

Recommended software: STATISTICA
1 Introduction

A common question in experimental science is how some sets of variables affect others. Some relations are deterministic and easy to interpret; others are too complicated to grasp or describe in simple terms, possibly having a random component. In these cases we approximate the actual relationships by simple functions or random processes, using relatively simple empirical methods. Among all the methods available for approximating such complex relationships, linear regression is possibly the most widely used. A common feature of this methodology is to assume a functional, parametric relationship between the variables in question, typically linear in unknown parameters which are to be estimated from the available data.

Two sets of variables can be distinguished at this stage: predictor variables and response variables. Predictor variables are those that can either be set to a desired value (controlled) or else take values that can be observed without error. Our objective is to find out how changes in the predictor variables affect the values of the response variables. Other names frequently attached to these variables in different books by different authors are the following:
Predictor variables = input variables = x-variables = regressors = independent variables.

Response variable = output variable = y-variable = dependent variable.
We shall be concerned with relationships of the form

Response variable = linear model function of input variables + random error.
In the simplest case, when we have data (y_1, x_1), (y_2, x_2), …, (y_n, x_n), the linear function of the form

    y_i = β_0 + β_1 x_i + ε_i,   i = 1, 2, …, n,    (1)

can be used to relate y to x. We will also write this model in generic terms as

    y = β_0 + β_1 x + ε.

Here, ε is a random quantity measuring the amount by which any individual y may fall off the regression line; equivalently, it is a random quantity measuring the variation in y not explained by x. We also assume that the input variable x is either controlled or measured without error, and thus is not a random variable. (As long as any measurement error in x is smaller than the measurement error in y, this assumption is fairly robust.) If the relation between y and x is more complex than the linear relationship given in (1), then models of the form

    y = β_0 + β_1 x + β_2 x² + ⋯ + β_p x^p + ε    (2)
can be used. Note that we say the model is linear in the sense that the model is linear in the parameters. For example,

    y = β_0 x_1 + β_1 x_2² + β_3 x_1 x_2 + ε

is linear, whereas

    y = α_0 + α_1 x_1^β x_2^γ + ε

is not.
2 Straight line

Suppose that we observe the data (y_1, x_1), (y_2, x_2), …, (y_n, x_n) and we think that the model (1),

    y_i = β_0 + β_1 x_i + ε_i,   i = 1, 2, …, n,    (3)

is the right model. Here β_0, β_1 are fixed but unknown model parameters to be estimated from the data. One way of obtaining estimators b_0, b_1 of β_0, β_1 is by minimizing

    S = Σ_{i=1}^n ε_i² = Σ_{i=1}^n (y_i − β_0 − β_1 x_i)²    (4)

in terms of (β_0, β_1), where (b_0, b_1) is the value of (β_0, β_1) corresponding to the minimal value of S. Here S is called the sum of squares of errors, and this method is called the least squares method. Under certain general conditions, the estimators (b_0, b_1) obtained this way also turn out to be the minimum variance unbiased estimators as well as the maximum likelihood estimators. We can determine (b_0, b_1) by differentiating S with respect to (β_0, β_1), setting the derivatives to 0 and solving for (β_0, β_1):

    ∂S/∂β_0 = −2 Σ_{i=1}^n (y_i − β_0 − β_1 x_i) = 0,
    ∂S/∂β_1 = −2 Σ_{i=1}^n (y_i − β_0 − β_1 x_i) x_i = 0,    (5)

resulting in

    b_1 = Σ_{i=1}^n (x_i − x̄)(y_i − ȳ) / Σ_{i=1}^n (x_i − x̄)²,    (6)

    b_0 = ȳ − b_1 x̄.    (7)

Here x̄ = (1/n) Σ_{i=1}^n x_i is the sample average, and the equations (5) are called the normal equations. We will call ŷ_i = b_0 + b_1 x_i the fitted value of y at x = x_i, and ε̂_i = y_i − ŷ_i the residual for the ith observation. Note that

    ŷ_i = ȳ + b_1 (x_i − x̄),

so that

    Σ_{i=1}^n ε̂_i = Σ_{i=1}^n (y_i − ŷ_i) = 0.
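As a quick illustration of equations (6) and (7), the least squares estimates can be computed directly; the following is a minimal sketch in Python, with invented data values:

```python
# Least squares estimates b0, b1 for the straight-line model (1),
# computed from the closed-form solutions (6) and (7).
x = [1.0, 2.0, 3.0, 4.0, 5.0]
y = [2.1, 3.9, 6.1, 8.0, 9.9]          # invented data, roughly y = 2x

n = len(x)
x_bar = sum(x) / n
y_bar = sum(y) / n

# b1 = sum (x_i - x_bar)(y_i - y_bar) / sum (x_i - x_bar)^2   -- equation (6)
s_xy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
s_xx = sum((xi - x_bar) ** 2 for xi in x)
b1 = s_xy / s_xx
b0 = y_bar - b1 * x_bar                 # equation (7)

# Fitted values and residuals; the residuals sum to zero.
y_hat = [b0 + b1 * xi for xi in x]
residuals = [yi - yh for yi, yh in zip(y, y_hat)]

print(b0, b1, sum(residuals))
```

For these invented points the slope comes out near 2 and the intercept near 0, matching the line the data were generated around, and the residuals sum to zero up to rounding, as shown above.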
2.1 Examining the regression equation

So far we have made no assumption involving the probability structure of ε, and hence of y. We now give the basic assumptions regarding the model (1). In model (1), we assume:

1. ε_i, i = 1, …, n, are identically distributed uncorrelated random variables with mean E(ε_i) = 0 and variance V(ε_i) = σ², so that (assuming the x variables are measured without error or controlled) the y_i are random variables with E(y_i) = β_0 + β_1 x_i and V(y_i) = σ². (In fact, the correct notation for E(y_i) is E(Y | X = x_i).) Note that if both the independent variable X and the response variable Y are random variables, then the function f(X) that minimizes E[(Y − f(X))²] is given by f(X) = E(Y | X). If further (X, Y) have a joint normal distribution, then E(Y | X = x) is a linear function of the form

       E(Y | X = x) = β_0 + β_1 x.

   Hence ŷ_i = b_0 + b_1 x_i = Ê(Y | X = x_i) can and should be seen as an estimator of this conditional mean.

2. ε_i and ε_j, for all i ≠ j, are uncorrelated; hence y_i, y_j are also uncorrelated.

3. ε_i ∼ N(0, σ²); combined with assumption 2, the ε_i are then independent. Hence

       Y | X = x_i ∼ N(β_0 + β_1 x_i, σ²),

   and the y_i are independent but not identically distributed random variables. (With some abuse of notation, let y_i = (Y | X = x_i) ∼ N(β_0 + β_1 x_i, σ²).) Note that in the regression model we specify only the conditional distribution of Y given X = x_i; full inference on (Y, X) would require the specification of the joint distribution of (Y, X).
2.2 Some distributional theory

While examining the regression equation, we will need to test various hypotheses which, in general, depend on the distributional properties of sums of squares of independent normal variables and their ratios. In this section we give a brief summary of the distributional results for these quadratic forms.

• Normal density: a random variable X has a normal distribution with mean µ and variance σ² if it has the density

    f(x) = (1 / (σ(2π)^{1/2})) exp[−(x − µ)² / (2σ²)],

for −∞ < x < ∞. Z = (X − µ)/σ transforms X to a standard normal variate with mean 0 and variance 1.

• (central) t-distribution: X has a t-distribution with v degrees of freedom (denoted t(v)) if it has the density

    f_v(t) = [Γ((v + 1)/2) / ((vπ)^{1/2} Γ(v/2))] (1 + t²/v)^{−(v+1)/2},

for −∞ < t < ∞. Here Γ(q) = ∫_0^∞ e^{−x} x^{q−1} dx is the gamma function. In general the t-distribution looks like a normal distribution with heavier tails. As v → ∞ the t-distribution tends to the normal distribution, and in fact t(∞) = N(0, 1). For all practical purposes, when v > 30 they are equal.

• (central) F-distribution: X has an F-distribution with m and n degrees of freedom (F_{m,n}) if it has the density

    f_{m,n}(x) = [Γ((m + n)/2)(m/n)^{m/2} / (Γ(m/2)Γ(n/2))] x^{m/2−1} / (1 + mx/n)^{(m+n)/2},

for x ≥ 0. If X has an F_{1,n} distribution, then it is equivalent to the square of a random variable with a t(n) distribution; that is, F_{1,n} = t_n². Another property is that if X ∼ F_{m,n}, then 1/X ∼ F_{n,m}.

• X is said to have a χ² distribution with n degrees of freedom (χ²(n)) if it has the density function

    f(x) = [1 / (Γ(n/2) 2^{n/2})] e^{−x/2} x^{n/2−1},

for 0 < x < ∞.

How do these distributions appear in regression analysis? As we will see, most tests of hypotheses as well as estimators of model parameters depend on sums of squares of independent, normally distributed random variables and their ratios; these sums usually have χ² distributions, whereas the ratios of independent χ² random variables have F distributions. Here is a summary of the distributional results we need. For details, see Sen and Srivastava (1990) or any good statistics book such as Rohatgi (1976).
1. If X_1, …, X_n are independent, normally distributed random variables with means (µ_1, µ_2, …, µ_n) and common variance σ², then σ^{−2} Zᵀ A Z, where Zᵀ = (X_1 − µ_1, …, X_n − µ_n) and A is any symmetric idempotent matrix with r = tr(A), has a (central) χ² distribution with r degrees of freedom. (The sum of the diagonal elements of a symmetric matrix A is called the trace of the matrix, denoted tr(A).)

2. In particular,

       Σ_{i=1}^n (X_i − µ_i)² / σ²

   has a χ²(n) distribution, whereas, if µ_i = µ is constant and is estimated by X̄, then

       Σ_{i=1}^n (X_i − X̄)² / σ²

   has a χ²(n − 1) distribution. (This is due to the loss of one degree of freedom in estimating the mean µ by X̄.)

3. The ratio of two independent χ² random variables, each divided by its respective degrees of freedom, has an F distribution. That is, if X ∼ χ²(m), Y ∼ χ²(n) and X, Y are independent, then F_{m,n} = (X/m)/(Y/n) has an F distribution with (m, n) degrees of freedom.

4. If X ∼ N(µ, σ²) and Y ∼ χ²(n), and if further X and Y are independent, then t = ((X − µ)/σ) / (Y/n)^{1/2} has a t distribution with n degrees of freedom. Thus we immediately see that t² = ((X − µ)²/σ²)/(Y/n) has an F distribution with (1, n) degrees of freedom. This distribution appears when we want to look at the distribution of (X − µ)/σ when X has a normal distribution but σ is not known and is substituted by the empirical standard deviation.

This is all the distributional theory we need to deal with inference on the regression equation. Most of the effort in proving results on the distributional properties of estimators and tests of hypotheses falls on showing the independence of various quadratic forms.
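Result 1 is the reason residual sums of squares have χ² distributions: for the straight line, the residual vector can be written as a quadratic form ZᵀAZ with A = I − H, where H = X(XᵀX)⁻¹Xᵀ is symmetric and idempotent, so the degrees of freedom equal tr(I − H) = n − 2. A small numerical check of this trace identity (a sketch in pure Python; the design points are invented):

```python
# Check that tr(I - H) = n - 2 for a straight-line design matrix,
# where H = X (X^T X)^{-1} X^T is the "hat" matrix (symmetric, idempotent).
x = [1.0, 2.0, 3.0, 4.0, 6.0, 9.0]     # invented design points
n = len(x)
X = [[1.0, xi] for xi in x]            # n x 2 design matrix

# X^T X is 2x2: [[n, sum x], [sum x, sum x^2]]; invert it directly.
sx, sxx = sum(x), sum(xi * xi for xi in x)
det = n * sxx - sx * sx
inv = [[sxx / det, -sx / det], [-sx / det, n / det]]   # (X^T X)^{-1}

# Diagonal of H: h_ii = row_i (X^T X)^{-1} row_i^T
h_diag = []
for row in X:
    t0 = row[0] * inv[0][0] + row[1] * inv[1][0]
    t1 = row[0] * inv[0][1] + row[1] * inv[1][1]
    h_diag.append(t0 * row[0] + t1 * row[1])

trace_I_minus_H = n - sum(h_diag)
print(trace_I_minus_H)                  # n - 2 = 4, up to rounding
```

The trace comes out as n − 2 regardless of the x values chosen (as long as XᵀX is invertible), which is exactly the degrees of freedom of s² in the next section.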
2.3 Confidence intervals and tests of hypotheses regarding (β_0, β_1)

A simple calculation shows that

    b_1 = Σ_{i=1}^n (x_i − x̄)(y_i − ȳ) / Σ_{i=1}^n (x_i − x̄)² = Σ_{i=1}^n (x_i − x̄) y_i / Σ_{i=1}^n (x_i − x̄)²,    (8)

hence

    V(b_1) = σ² / Σ_{i=1}^n (x_i − x̄)².    (9)

In general σ² is not known (this is usually the case), and then a suitable estimator replaces σ². If the assumed model (1) is correct, then it is known that, under the normality assumption on the residuals,

    s² = (1/(n − 2)) Σ_{i=1}^n (y_i − ŷ_i)²

is the minimum variance unbiased estimator of σ². Note that

    (1/n) Σ_{i=1}^n (y_i − ŷ_i)²

is the maximum likelihood estimator, but it is biased. Hence, under the assumption that the ε_i are normal, we can construct the usual 100(1 − α)% confidence interval for β_1:

    b_1 ± t(n − 2, 1 − α/2) s / [Σ_{i=1}^n (x_i − x̄)²]^{1/2}.

Here t(n − 2, 1 − α/2) is the 1 − α/2 percentage point of a t-distribution with n − 2 degrees of freedom. (It is left to the reader to verify that

    (b_1 − β_1) / s.e.(b_1)

has a t-distribution with n − 2 degrees of freedom; here s.e. stands for the standard error.) The test of the hypotheses

    H_0: β_1 = β_1*   vs.   H_1: β_1 ≠ β_1*

can be performed by calculating the test statistic

    t = (b_1 − β_1*) (Σ_{i=1}^n (x_i − x̄)²)^{1/2} / s,

and comparing |t| with the table value t(n − 2, 1 − α/2). The standard error of b_0 can similarly be calculated:

    s.e.(b_0) = [Σ_{i=1}^n x_i² / (n Σ_{i=1}^n (x_i − x̄)²)]^{1/2} σ,

hence the 100(1 − α)% confidence interval for β_0 is given by

    b_0 ± t(n − 2, 1 − α/2) [Σ_{i=1}^n x_i² / (n Σ_{i=1}^n (x_i − x̄)²)]^{1/2} s,

and the test

    H_0: β_0 = β_0*   vs.   H_1: β_0 ≠ β_0*

can be performed by comparing the absolute value of

    t = (b_0 − β_0*) / (s [Σ_{i=1}^n x_i² / (n Σ_{i=1}^n (x_i − x̄)²)]^{1/2})

with t(n − 2, 1 − α/2).
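Putting the pieces of this section together, here is a sketch in Python of the 95% confidence interval and t-test for β_1; the data are invented, and t(3, 0.975) ≈ 3.182 is taken from a t-table:

```python
# 95% confidence interval for beta_1 in the straight-line model,
# using s^2 = SSE/(n-2) and the standard error from equation (9).
x = [1.0, 2.0, 3.0, 4.0, 5.0]
y = [2.1, 3.9, 6.1, 8.0, 9.9]          # invented data
n = len(x)
x_bar, y_bar = sum(x) / n, sum(y) / n

s_xx = sum((xi - x_bar) ** 2 for xi in x)
b1 = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y)) / s_xx
b0 = y_bar - b1 * x_bar

sse = sum((yi - (b0 + b1 * xi)) ** 2 for xi, yi in zip(x, y))
s = (sse / (n - 2)) ** 0.5             # unbiased estimate of sigma

t_crit = 3.182                         # t(n-2, 0.975) for n = 5, from tables
se_b1 = s / s_xx ** 0.5
ci = (b1 - t_crit * se_b1, b1 + t_crit * se_b1)

# t statistic for H0: beta_1 = 0
t_stat = b1 / se_b1
print(ci, t_stat)
```

For these data, 0 lies far outside the interval and |t| greatly exceeds the table value, so H_0: β_1 = 0 would be rejected at the 5% level.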
2.4 Predicted future value of y

Suppose that we want to predict the future value y_k of the response variable y at the observed value x_k. The predicted value is ŷ_k = Ê(Y | X = x_k) = b_0 + b_1 x_k, that is, the estimator of the mean of Y conditional on X = x_k. Substituting the expressions (6) and (7) for b_1 and b_0 and simplifying, we get ŷ_k = ȳ + b_1 (x_k − x̄). One can easily check that b_1 and ȳ are uncorrelated, so that Cov(b_1, ȳ) = 0, and hence

    V(ŷ_k) = V(ȳ) + (x_k − x̄)² V(b_1)
           = σ²/n + (x_k − x̄)² σ² / Σ_{i=1}^n (x_i − x̄)².    (10)

Now, a future observation y_k varies around its mean E(Y | X = x_k) with variance σ², hence the variance of the prediction error is

    V(y_k − ŷ_k) = σ² (1 + 1/n + (x_k − x̄)² / Σ_{i=1}^n (x_i − x̄)²).    (11)

Hence the 100(1 − α)% prediction interval for the future observation y_k is given by

    ŷ_k ± t(n − 2, 1 − α/2) s [1 + 1/n + (x_k − x̄)² / Σ_{i=1}^n (x_i − x̄)²]^{1/2}.    (12)
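A consequence of (11) worth noting is that the interval (12) is narrowest at x_k = x̄ and widens as x_k moves away from it. A sketch in Python (invented data, with t(3, 0.975) ≈ 3.182 from tables):

```python
# Prediction interval (12) for a future observation y_k at x = x_k.
x = [1.0, 2.0, 3.0, 4.0, 5.0]
y = [2.1, 3.9, 6.1, 8.0, 9.9]          # invented data
n = len(x)
x_bar, y_bar = sum(x) / n, sum(y) / n
s_xx = sum((xi - x_bar) ** 2 for xi in x)
b1 = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y)) / s_xx
b0 = y_bar - b1 * x_bar
s = (sum((yi - (b0 + b1 * xi)) ** 2 for xi, yi in zip(x, y)) / (n - 2)) ** 0.5
t_crit = 3.182                          # t(3, 0.975) from tables

def prediction_interval(x_k):
    """100(1-alpha)% interval for a future y at x = x_k, equation (12)."""
    y_hat_k = b0 + b1 * x_k
    half = t_crit * s * (1 + 1 / n + (x_k - x_bar) ** 2 / s_xx) ** 0.5
    return (y_hat_k - half, y_hat_k + half)

lo_c, hi_c = prediction_interval(3.0)   # at the sample mean
lo_f, hi_f = prediction_interval(6.0)   # extrapolating beyond the data
print(hi_c - lo_c, hi_f - lo_f)         # the second interval is wider
```

The widening away from x̄ is one reason extrapolation beyond the observed range of x is risky even when the model fits well inside it.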
2.5 Straight line regression in matrix terms

Suppose we have the observations (y_1, x_1), (y_2, x_2), …, (y_n, x_n). Let

    Y = (y_1, y_2, …, y_n)ᵀ,   ε = (ε_1, ε_2, …, ε_n)ᵀ,   β = (β_0, β_1)ᵀ,   b = (b_0, b_1)ᵀ,

and let X be the n × 2 matrix whose ith row is (1, x_i):

    X = [ 1  x_1
          1  x_2
          ⋮   ⋮
          1  x_n ].

Then the model (1) can be written as

    Y = Xβ + ε.    (13)

Note that

    εᵀε = (Y − Xβ)ᵀ(Y − Xβ) = Σ_{i=1}^n (y_i − β_0 − β_1 x_i)²,

hence the least squares estimates are obtained by minimizing εᵀε, and the normal equations (5) are given in matrix form by

    XᵀX b = XᵀY,    (14)

and hence

    b = (XᵀX)^{−1} XᵀY,    (15)

provided that the inverse of the matrix XᵀX exists. As we will often see, the matrix XᵀX and its inverse (XᵀX)^{−1} are the backbone of multiple regression analysis. Note that

    V(b) = (XᵀX)^{−1} σ²,    (16)

hence, in particular,

    Cov(b_0, b_1) = Cov(ȳ − b_1 x̄, b_1) = −x̄ σ² / Σ_{i=1}^n (x_i − x̄)².

Letting a_k = (1, x_k), we can write Ê(Y | X = x_k) = b_0 + b_1 x_k = a_k b, and

    V(Ê(Y | X = x_k)) = V(b_0) + 2 x_k Cov(b_0, b_1) + x_k² V(b_1)
                      = a_k V(b) a_kᵀ
                      = a_k (XᵀX)^{−1} a_kᵀ σ².    (17)

Here, V(b) is the covariance matrix of b.
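Equation (15) can be checked numerically against the closed-form solutions (6) and (7): for the straight line, XᵀX is 2 × 2 and can be inverted directly. A sketch in Python (invented data):

```python
# b = (X^T X)^{-1} X^T Y for the straight line, equation (15).
x = [1.0, 2.0, 3.0, 4.0, 5.0]
y = [2.1, 3.9, 6.1, 8.0, 9.9]          # invented data
n = len(x)

# X^T X = [[n, sum x], [sum x, sum x^2]],  X^T Y = [sum y, sum x*y]
sx = sum(x)
sxx = sum(xi * xi for xi in x)
sy = sum(y)
sxy = sum(xi * yi for xi, yi in zip(x, y))

det = n * sxx - sx * sx                 # determinant of X^T X
inv = [[sxx / det, -sx / det],          # (X^T X)^{-1}, direct 2x2 inverse
       [-sx / det, n / det]]

b0 = inv[0][0] * sy + inv[0][1] * sxy
b1 = inv[1][0] * sy + inv[1][1] * sxy
print(b0, b1)                           # same values as equations (6) and (7)
```

The matrix route and the closed-form route give identical estimates; the advantage of the matrix form is that it carries over unchanged to any number of predictors, as the next section shows.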
3 Generalization to multivariate (multiple) regression

Suppose that we have p independent variables (x_1, x_2, …, x_p) and we want to know the effect of these variables on the response variable y through the linear relationship

    y_i = β_0 + β_1 x_{1i} + β_2 x_{2i} + ⋯ + β_p x_{pi} + ε_i,    (18)

for i = 1, 2, …, n. We can write this model in matrix terms as

    Y = Xβ + ε,    (19)

where

    Y = (y_1, y_2, …, y_n)ᵀ,   ε = (ε_1, ε_2, …, ε_n)ᵀ,
    β = (β_0, β_1, …, β_p)ᵀ,   b = (b_0, b_1, …, b_p)ᵀ,

and X is the n × (p + 1) design matrix whose ith row is (1, x_{1i}, x_{2i}, …, x_{pi}).

For this model, we again assume:

1. E(ε_j) = 0, V(ε_j) = σ², for every j;

2. ε_i, ε_j are uncorrelated for i ≠ j;

3. the ε_i have a normal distribution, and hence ε ∼ N(0, Σ), where Σ = Iσ², I being the identity matrix.

The generalization of the straight-line results gives:

    b = (XᵀX)^{−1} XᵀY,
    V(b) = (XᵀX)^{−1} σ²,

and

    V(Ê(Y | X = x_k)) = a_k (XᵀX)^{−1} a_kᵀ σ².

Hence, tests of hypotheses as well as confidence intervals on the individual parameters β_0, …, β_p and on a future observation Y_k can easily be constructed based on the individual standard errors of the b_i. (Students are strongly urged to construct these confidence intervals and tests of hypotheses, which are standard exercises in basic statistics.)
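The multiple regression estimates can be computed by solving the normal equations XᵀXb = XᵀY directly. A minimal sketch in pure Python, using a small Gaussian-elimination solver; the two-predictor data are invented, generated exactly from known coefficients so the fit should recover them:

```python
# Multiple regression: solve the normal equations X^T X b = X^T Y.
def solve(A, c):
    """Solve A m = c by Gaussian elimination with partial pivoting."""
    m = len(c)
    M = [row[:] + [ci] for row, ci in zip(A, c)]   # augmented matrix
    for k in range(m):
        p = max(range(k, m), key=lambda r: abs(M[r][k]))
        M[k], M[p] = M[p], M[k]
        for r in range(k + 1, m):
            f = M[r][k] / M[k][k]
            for j in range(k, m + 1):
                M[r][j] -= f * M[k][j]
    sol = [0.0] * m
    for k in range(m - 1, -1, -1):
        sol[k] = (M[k][m] - sum(M[k][j] * sol[j] for j in range(k + 1, m))) / M[k][k]
    return sol

# Invented data: y generated exactly as 1 + 2*x1 - x2 (no noise),
# so the fit should recover these coefficients.
x1 = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0]
x2 = [2.0, 1.0, 4.0, 3.0, 6.0, 5.0]
y  = [1 + 2 * a - b for a, b in zip(x1, x2)]

X = [[1.0, a, b] for a, b in zip(x1, x2)]          # design matrix with intercept
XtX = [[sum(X[i][r] * X[i][c] for i in range(len(X))) for c in range(3)]
       for r in range(3)]
XtY = [sum(X[i][r] * y[i] for i in range(len(X))) for r in range(3)]

b = solve(XtX, XtY)
print(b)                                           # close to [1.0, 2.0, -1.0]
```

In practice one would not invert or eliminate XᵀX by hand; statistical software solves these equations with numerically stabler decompositions, but the algebra is the same.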
3.1 Precision of the regression equation

So far, we have looked at the problem of inference on the individual parameters of the model (18),

    y_i = β_0 + β_1 x_{1i} + β_2 x_{2i} + ⋯ + β_p x_{pi} + ε_i;    (20)

however, there are many more important questions to ask:

1. Is the model we use correct?

2. If it is the correct model, how significant is it, in the sense of how much the independent variables (x_1, …, x_p) contribute to explaining the variation in the response variable y?

3. How can we reach a more parsimonious model by excluding those independent variables which do not contribute significantly to explaining the variation in y?

We start by answering the second question. Let us assume that the model in (18) is the correct model. Then the second question can be formulated as testing the hypotheses

    H_0: β_1 = β_2 = ⋯ = β_p = 0,   H_1: not all are 0.    (21)

If we do not reject the null hypothesis, then the model is not statistically different from the model

    y = β_0 + ε,

which means that whatever the variation in the independent variables, E(Y) remains constant, indicating that the independent variables do not contribute anything to explaining y. However, before going any further to see how such a test can be performed, we note that the test

    H_0: β_1 = β_2 = ⋯ = β_p = 0   vs.   H_1: not all are 0

is not equivalent to testing p separate hypotheses

    H_0^{(i)}: β_i = 0   vs.   H_1^{(i)}: β_i ≠ 0.

Let α_i = α be the type one error in testing the hypothesis H_0^{(i)}, that is,

    α = P(rejecting H_0^{(i)} when it is true).

Suppose that we perform these p tests independently. Then the probability of rejecting at least one of the p hypotheses when all are true is 1 − (1 − α)^p, which is the type 1 error for the composite hypothesis (carried out independently). For example, with α = 0.05 and p = 10, this error is 1 − 0.95^{10} ≈ 0.40. Note that this type of error increases as p increases; hence, individual hypotheses cannot substitute for a composite hypothesis without increasing the type one error. Now let us see how we can perform the test of the composite hypothesis (21). We can write

    Σ_{i=1}^n (y_i − ȳ)² = Σ_{i=1}^n (y_i − ŷ_i + ŷ_i − ȳ)²
                         = Σ_{i=1}^n (y_i − ŷ_i)² + Σ_{i=1}^n (ŷ_i − ȳ)² + 2 Σ_{i=1}^n (ŷ_i − ȳ)(y_i − ŷ_i),    (22)

and we can show that

    Σ_{i=1}^n (ŷ_i − ȳ)(y_i − ŷ_i) = 0.

Hence

    Σ_{i=1}^n (y_i − ȳ)² = Σ_{i=1}^n (ŷ_i − ȳ)² + Σ_{i=1}^n (y_i − ŷ_i)².    (23)
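The decomposition (23) — total sum of squares = regression sum of squares + residual sum of squares — can be verified numerically on a toy straight-line fit; a sketch in Python with invented data:

```python
# Verify the sum-of-squares decomposition (23):
#   sum (y_i - y_bar)^2 = sum (yhat_i - y_bar)^2 + sum (y_i - yhat_i)^2
x = [1.0, 2.0, 3.0, 4.0, 5.0]
y = [2.1, 3.9, 6.1, 8.0, 9.9]          # invented data
n = len(x)
x_bar, y_bar = sum(x) / n, sum(y) / n
s_xx = sum((xi - x_bar) ** 2 for xi in x)
b1 = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y)) / s_xx
b0 = y_bar - b1 * x_bar
y_hat = [b0 + b1 * xi for xi in x]

sst = sum((yi - y_bar) ** 2 for yi in y)               # total
ssr = sum((yh - y_bar) ** 2 for yh in y_hat)           # explained by regression
sse = sum((yi - yh) ** 2 for yi, yh in zip(y, y_hat))  # residual

print(sst, ssr + sse)                   # equal up to rounding

# The cross term in (22) vanishes, which is what makes (23) work:
cross = sum((yh - y_bar) * (yi - yh) for yi, yh in zip(y, y_hat))
print(cross)
```

The vanishing cross term reflects the orthogonality of the fitted values and the residuals, a direct consequence of the normal equations (5).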