Model
Estimating the Coefficients
Error Variable: Required Conditions
Assessing the Model
Using the Regression Equation
Regression Diagnostics (Part 1)
Introduction
16.1 Model
$y = \beta_0 + \beta_1 x + \varepsilon$
$y$ = dependent variable
$x$ = independent variable
$\beta_0$ = y-intercept
$\beta_1$ = slope of the line
$\varepsilon$ = error variable
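The least squares line minimises the sum of squared deviations between the data points and the line ($\sum_{i=1}^{n} (y_i - \hat{y}_i)^2$).
The sum of squares for error (SSE) is the minimised sum of squared
deviations.
Residuals are the deviations between the actual data points and the line:
o $e_i = y_i - \hat{y}_i$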
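$b_1 = \frac{s_{xy}}{s_x^2}$
$b_0 = \bar{y} - b_1 \bar{x}$
$s_{xy} = \frac{\sum_{i=1}^{n} (x_i - \bar{x})(y_i - \bar{y})}{n-1}$
$s_x^2 = \frac{\sum_{i=1}^{n} (x_i - \bar{x})^2}{n-1}$
$\bar{x} = \frac{\sum_{i=1}^{n} x_i}{n}$
$\bar{y} = \frac{\sum_{i=1}^{n} y_i}{n}$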
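Shortcuts:
o $s_{xy} = \frac{1}{n-1}\left[\sum_{i=1}^{n} x_i y_i - \frac{\sum_{i=1}^{n} x_i \sum_{i=1}^{n} y_i}{n}\right]$
o $s_x^2 = \frac{1}{n-1}\left[\sum_{i=1}^{n} x_i^2 - \frac{\left(\sum_{i=1}^{n} x_i\right)^2}{n}\right]$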
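As a rough check of the formulas above, the following Python sketch computes $b_0$ and $b_1$ from both the definitional and the shortcut forms of $s_{xy}$ and $s_x^2$. The data values are hypothetical and chosen only for illustration.

```python
# Hypothetical sample data, chosen only for illustration.
x = [1, 2, 3, 4, 5]
y = [2, 4, 5, 4, 5]
n = len(x)

# Sample means.
x_bar = sum(x) / n
y_bar = sum(y) / n

# Definitional formulas for the sample covariance and the variance of x.
s_xy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y)) / (n - 1)
s2_x = sum((xi - x_bar) ** 2 for xi in x) / (n - 1)

# Shortcut formulas; these should agree with the definitional ones.
s_xy_short = (sum(xi * yi for xi, yi in zip(x, y)) - sum(x) * sum(y) / n) / (n - 1)
s2_x_short = (sum(xi ** 2 for xi in x) - sum(x) ** 2 / n) / (n - 1)

# Least squares coefficients.
b1 = s_xy / s2_x
b0 = y_bar - b1 * x_bar

print(f"b0 = {b0:.4f}, b1 = {b1:.4f}")
```

For these made-up numbers the slope comes out near 0.6 and the intercept near 2.2.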
Excel:
o Have two columns of data: one for the dependent variable; the
other for the independent variable.
o Click Data, Data Analysis, and Regression.
o Specify the Input Y Range and the Input X Range.
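The model is linear in the parameters ($\beta_0$ and $\beta_1$).
4. The mean of $\varepsilon$ is 0, regardless of $x$: $E(\varepsilon_i \mid x_i) = 0$.
a. Therefore, $\varepsilon$ and $x$ are uncorrelated.
5. The variance of $\varepsilon_i$ is a constant: $\mathrm{Var}(\varepsilon_i) = \sigma^2$.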
a. In reality this does not necessarily hold (e.g. higher income may increase
the variance in expenditure because higher earners have a greater range of
choices)
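6. The error variables are uncorrelated: $\mathrm{Cov}(\varepsilon_i, \varepsilon_j) = 0$.
7. The error variable is normally distributed: $\varepsilon_i \sim N(0, \sigma^2)$.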
$SSE = \sum_{i=1}^{n} (y_i - \hat{y}_i)^2 = (n-1)\left(s_y^2 - \frac{s_{xy}^2}{s_x^2}\right)$
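The two expressions for SSE can be compared numerically. This is a minimal Python sketch using hypothetical data: it computes SSE once from the residuals and once from the sample variances and covariance.

```python
# Hypothetical sample data, chosen only for illustration.
x = [1, 2, 3, 4, 5]
y = [2, 4, 5, 4, 5]
n = len(x)
x_bar, y_bar = sum(x) / n, sum(y) / n

# Sample covariance and variances.
s_xy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y)) / (n - 1)
s2_x = sum((xi - x_bar) ** 2 for xi in x) / (n - 1)
s2_y = sum((yi - y_bar) ** 2 for yi in y) / (n - 1)

# Least squares line.
b1 = s_xy / s2_x
b0 = y_bar - b1 * x_bar

# SSE computed directly from the residuals e_i = y_i - y_hat_i.
residuals = [yi - (b0 + b1 * xi) for xi, yi in zip(x, y)]
sse_direct = sum(e ** 2 for e in residuals)

# SSE computed from the shortcut (n - 1)(s_y^2 - s_xy^2 / s_x^2).
sse_shortcut = (n - 1) * (s2_y - s_xy ** 2 / s2_x)

print(f"direct = {sse_direct:.4f}, shortcut = {sse_shortcut:.4f}")
```

Both routes give about 2.4 for these made-up numbers, up to floating-point rounding.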
If $\beta_1 = 0$, the regression line is horizontal (there is no linear
relationship).
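The standard error of $b_1$, $s_{b_1} = \frac{s_\varepsilon}{\sqrt{(n-1)s_x^2}}$, decreases as $n$ or $s_x^2$ increases, where $s_\varepsilon = \sqrt{SSE/(n-2)}$ is the standard error of estimate.
Test statistic for $\beta_1$: $t = \frac{b_1 - \beta_1}{s_{b_1}}$ [where $\nu = n-2$]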
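The slope test can be sketched in Python as follows. The data are hypothetical, the null hypothesis is assumed to be $\beta_1 = 0$, and the standard error of estimate is taken as $\sqrt{SSE/(n-2)}$. The sketch stops at the test statistic; the critical value or p-value would still come from a t table with $n-2$ degrees of freedom.

```python
import math

# Hypothetical sample data, chosen only for illustration.
x = [1, 2, 3, 4, 5]
y = [2, 4, 5, 4, 5]
n = len(x)
x_bar, y_bar = sum(x) / n, sum(y) / n

s_xy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y)) / (n - 1)
s2_x = sum((xi - x_bar) ** 2 for xi in x) / (n - 1)

# Least squares line and its sum of squares for error.
b1 = s_xy / s2_x
b0 = y_bar - b1 * x_bar
sse = sum((yi - (b0 + b1 * xi)) ** 2 for xi, yi in zip(x, y))

# Standard error of estimate (assumed definition: sqrt(SSE / (n - 2))).
s_eps = math.sqrt(sse / (n - 2))

# Standard error of the slope.
s_b1 = s_eps / math.sqrt((n - 1) * s2_x)

# Test statistic under the assumed null hypothesis beta_1 = 0.
beta1_null = 0.0
t_stat = (b1 - beta1_null) / s_b1
df = n - 2

print(f"s_b1 = {s_b1:.4f}, t = {t_stat:.4f}, df = {df}")
```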
Coefficient of Determination
$R^2 = \frac{s_{xy}^2}{s_x^2 s_y^2} = 1 - \frac{SSE}{\sum (y_i - \bar{y})^2}$
$(y_i - \bar{y}) = (y_i - \hat{y}_i) + (\hat{y}_i - \bar{y})$
$\sum (y_i - \bar{y})^2 = \sum (y_i - \hat{y}_i)^2 + \sum (\hat{y}_i - \bar{y})^2$
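The decomposition of the total variation can be checked numerically. The Python sketch below uses hypothetical data. The names `sst`, `sse` and `ssr` are labels chosen here for the three sums of squares, not notation from these notes.

```python
# Hypothetical sample data, chosen only for illustration.
x = [1, 2, 3, 4, 5]
y = [2, 4, 5, 4, 5]
n = len(x)
x_bar, y_bar = sum(x) / n, sum(y) / n

s_xy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y)) / (n - 1)
s2_x = sum((xi - x_bar) ** 2 for xi in x) / (n - 1)
s2_y = sum((yi - y_bar) ** 2 for yi in y) / (n - 1)

# Fitted values from the least squares line.
b1 = s_xy / s2_x
b0 = y_bar - b1 * x_bar
y_hat = [b0 + b1 * xi for xi in x]

# Total, unexplained and explained variation.
sst = sum((yi - y_bar) ** 2 for yi in y)
sse = sum((yi - yh) ** 2 for yi, yh in zip(y, y_hat))
ssr = sum((yh - y_bar) ** 2 for yh in y_hat)

# Two equivalent routes to the coefficient of determination.
r2_from_sse = 1 - sse / sst
r2_from_cov = s_xy ** 2 / (s2_x * s2_y)

print(f"SST = {sst:.4f}, SSE + SSR = {sse + ssr:.4f}, R^2 = {r2_from_sse:.4f}")
```

With these made-up numbers, about 60% of the variation in $y$ is explained by the line.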
Coefficient of Correlation
$r = \frac{s_{xy}}{s_x s_y}$
$t = r\sqrt{\frac{n-2}{1-r^2}}$ [where $\nu = n-2$]
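A short Python sketch of the correlation test, again on hypothetical data. It computes $r$ and the corresponding t statistic. In simple linear regression this statistic equals the one obtained when testing $\beta_1 = 0$.

```python
import math

# Hypothetical sample data, chosen only for illustration.
x = [1, 2, 3, 4, 5]
y = [2, 4, 5, 4, 5]
n = len(x)
x_bar, y_bar = sum(x) / n, sum(y) / n

# Sample covariance and standard deviations.
s_xy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y)) / (n - 1)
s_x = math.sqrt(sum((xi - x_bar) ** 2 for xi in x) / (n - 1))
s_y = math.sqrt(sum((yi - y_bar) ** 2 for yi in y) / (n - 1))

# Coefficient of correlation.
r = s_xy / (s_x * s_y)

# t statistic with n - 2 degrees of freedom.
t_stat = r * math.sqrt((n - 2) / (1 - r ** 2))

print(f"r = {r:.4f}, t = {t_stat:.4f}, df = {n - 2}")
```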
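$\hat{y}$ is a point estimator.
Prediction interval for a particular value of $y$ at a given $x_g$: $\hat{y} \pm t_{\alpha/2,\,n-2}\, s_\varepsilon \sqrt{1 + \frac{1}{n} + \frac{(x_g - \bar{x})^2}{(n-1)s_x^2}}$
Confidence interval estimator of the expected value of $y$: $\hat{y} \pm t_{\alpha/2,\,n-2}\, s_\varepsilon \sqrt{\frac{1}{n} + \frac{(x_g - \bar{x})^2}{(n-1)s_x^2}}$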
The farther $x_g$ is from $\bar{x}$, the larger $\frac{(x_g - \bar{x})^2}{(n-1)s_x^2}$ becomes and the wider the interval.
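The two intervals can be sketched in Python as below. The function name `regression_intervals` and the data are hypothetical. The critical value `t_crit` is passed in by the caller (for example, read from a t table) because the standard library has no inverse t distribution. The standard error of estimate is assumed to be $\sqrt{SSE/(n-2)}$.

```python
import math

def regression_intervals(x, y, x_g, t_crit):
    """Return (y_hat, prediction half-width, mean half-width) at x_g.

    t_crit is the t critical value for alpha/2 with n - 2 degrees of
    freedom, supplied by the caller.
    """
    n = len(x)
    x_bar, y_bar = sum(x) / n, sum(y) / n
    s_xy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y)) / (n - 1)
    s2_x = sum((xi - x_bar) ** 2 for xi in x) / (n - 1)

    # Least squares line and point estimate at x_g.
    b1 = s_xy / s2_x
    b0 = y_bar - b1 * x_bar
    y_hat = b0 + b1 * x_g

    # Standard error of estimate (assumed definition: sqrt(SSE / (n - 2))).
    sse = sum((yi - (b0 + b1 * xi)) ** 2 for xi, yi in zip(x, y))
    s_eps = math.sqrt(sse / (n - 2))

    # Term shared by both intervals.
    core = 1 / n + (x_g - x_bar) ** 2 / ((n - 1) * s2_x)

    pred_half = t_crit * s_eps * math.sqrt(1 + core)  # particular value of y
    mean_half = t_crit * s_eps * math.sqrt(core)      # expected value of y
    return y_hat, pred_half, mean_half

# Hypothetical data; 3.182 is the tabulated t value for alpha/2 = 0.025, 3 df.
x = [1, 2, 3, 4, 5]
y = [2, 4, 5, 4, 5]
y_hat, pred_half, mean_half = regression_intervals(x, y, x_g=4, t_crit=3.182)
print(f"prediction interval: {y_hat:.3f} +/- {pred_half:.3f}")
print(f"interval for the mean: {y_hat:.3f} +/- {mean_half:.3f}")
```

The prediction interval is always the wider of the two because of the extra 1 under the square root.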
where $h_i = \frac{1}{n} + \frac{(x_i - \bar{x})^2}{(n-1)s_x^2}$
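The quantity $h_i$ can be computed for every observation, as in the Python sketch below with hypothetical data. The sketch assumes $h_i$ is used in the usual way to standardise residuals, as $e_i / (s_\varepsilon \sqrt{1 - h_i})$. That formula is an assumption here, since it is not shown in these notes.

```python
import math

# Hypothetical sample data, chosen only for illustration.
x = [1, 2, 3, 4, 5]
y = [2, 4, 5, 4, 5]
n = len(x)
x_bar, y_bar = sum(x) / n, sum(y) / n

s_xy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y)) / (n - 1)
s2_x = sum((xi - x_bar) ** 2 for xi in x) / (n - 1)

# Least squares line, residuals and standard error of estimate.
b1 = s_xy / s2_x
b0 = y_bar - b1 * x_bar
residuals = [yi - (b0 + b1 * xi) for xi, yi in zip(x, y)]
s_eps = math.sqrt(sum(e ** 2 for e in residuals) / (n - 2))

# h_i = 1/n + (x_i - x_bar)^2 / ((n - 1) s_x^2) for each observation.
h = [1 / n + (xi - x_bar) ** 2 / ((n - 1) * s2_x) for xi in x]

# Assumed use of h_i: standardised residuals e_i / (s_eps * sqrt(1 - h_i)).
standardised = [e / (s_eps * math.sqrt(1 - hi)) for e, hi in zip(residuals, h)]

for xi, hi, z in zip(x, h, standardised):
    print(f"x = {xi}: h = {hi:.2f}, standardised residual = {z:.3f}")
```

Points whose $x_i$ lies far from $\bar{x}$ get the largest $h_i$. In simple linear regression the $h_i$ values sum to 2.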
Normality
Homoscedasticity
Outliers
Outliers may be:
1. Recording errors
2. Points that should not have been included in the sample
3. Valid and should belong to the sample
Influential observations
Some points are influential in determining the least squares line: removing
such a point would produce a substantially different line.
Procedure
1. Develop a model that has a theoretical basis; find an independent variable
that you believe is linearly related to the dependent variable.
2. Gather data for the two variables from (preferably) a controlled
experiment, or observational data.
3. Draw a scatter diagram. Determine whether a linear model is appropriate.
Identify outliers and influential observations.
4. Determine the regression equation.
5. Calculate the residuals and check the required conditions:
a. Is the error variable normal?
b. Is the variance constant?
c. Are the errors independent?
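6. Assess the model's fit:
a. Test $\beta_1$ or $\rho$ to determine whether there is a linear
relationship.
b. Compute the coefficient of determination.
7. If the model fits the data, use the regression equation to:
a. Predict a particular value of the dependent variable
b. Estimate its mean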