Linear Regression: $Y_i = \beta_0 + \beta_1 X_i + \varepsilon_i$
Note that $\beta_0$ and $\beta_1$ are unknown parameters. We estimate them by the least squares method.
This model allows for a curvilinear (as opposed to straight line) relation. Both linear and
polynomial regression are susceptible to problems when predictions of Y are made
outside the range of the X values used to fit the model. This is referred to as
extrapolation.
Coefficient of Correlation
Measure of the direction and strength of the linear association between Y and X
$r = \operatorname{sign}(b_1)\sqrt{r^2} \qquad -1 \le r \le 1$
Residual Analysis
Residuals: $e_i = Y_i - \hat{Y}_i = Y_i - (b_0 + b_1 X_i)$
Plot of $e_i$ vs $\hat{Y}_i$: Can be used to check for linear relation, constant variance
If relation is nonlinear, U-shaped pattern appears
If error variance is nonconstant, funnel-shaped pattern appears
If assumptions are met, random cloud of points appears
Plot of $e_i$ vs $i$: Can be used to check for independence when collected over time
(see next section)
Histogram of $e_i$: If distribution is normal, histogram of residuals will be mound-shaped, centered around 0
Measuring Autocorrelation – Durbin-Watson Test
Test Statistic: $D = \dfrac{\sum_{i=2}^{n}(e_i - e_{i-1})^2}{\sum_{i=1}^{n} e_i^2}$
NOTE: This is a test where you "want" to conclude in favor of the null hypothesis. If you reject the null in this test, a more complex model needs to be fit (discussed in Chapter 13).
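As an illustrative sketch (not from the text), the Durbin-Watson statistic is easy to compute directly from the time-ordered residuals with numpy:

```python
import numpy as np

def durbin_watson(e):
    """D = sum((e_i - e_{i-1})^2) / sum(e_i^2).
    Values near 2 suggest no autocorrelation; values well below 2
    suggest positive autocorrelation."""
    e = np.asarray(e, dtype=float)
    return np.sum(np.diff(e) ** 2) / np.sum(e ** 2)

# Hypothetical residuals from a fitted regression, in time order
e = [0.5, 0.6, 0.2, -0.1, -0.4, -0.3, 0.1, 0.4]
print(durbin_watson(e))
```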
Inferences Concerning the Slope
t-test
Test used to determine whether the population slope parameter ($\beta_1$) is equal to a pre-determined value (often, but not necessarily, 0). Tests can be one-sided (pre-determined direction) or two-sided (either direction).
2-sided t-test:
$H_0: \beta_1 = \beta_{10} \qquad H_A: \beta_1 \ne \beta_{10}$
TS: $t_{obs} = \dfrac{b_1 - \beta_{10}}{S_{b_1}}$ where $S_{b_1} = \dfrac{S_{YX}}{\sqrt{\sum_{i=1}^{n}(X_i - \bar{X})^2}}$
RR: $|t_{obs}| \ge t_{\alpha/2,\, n-2}$
1-sided t-test (upper tail):
$H_0: \beta_1 \le \beta_{10} \qquad H_A: \beta_1 > \beta_{10}$
TS: $t_{obs} = \dfrac{b_1 - \beta_{10}}{S_{b_1}}$
RR: $t_{obs} \ge t_{\alpha,\, n-2}$
Analysis of Variance (based on k Predictor Variables)
A test based directly on sums of squares that tests the specific hypothesis of whether the slope parameter is 0 (2-sided). The book describes the general case of k predictor variables; for simple linear regression, k = 1.
$H_0: \beta_1 = 0 \qquad H_A: \beta_1 \ne 0$
TS: $F_{obs} = \dfrac{MSR}{MSE} = \dfrac{SSR/k}{SSE/(n-k-1)}$
RR: $F_{obs} \ge F_{\alpha,\, k,\, n-k-1}$
Confidence Interval for $\beta_1$: $b_1 \pm t_{\alpha/2,\, n-2}\, S_{b_1}$
Confidence Interval for the Mean Response
Parameter: $E[Y \mid X = X_i] = \beta_0 + \beta_1 X_i$
Point Estimate: $\hat{Y}_i = b_0 + b_1 X_i$
$\hat{Y}_i \pm t_{\alpha/2,\, n-2}\, S_{YX}\sqrt{h_i}$ where $h_i = \dfrac{1}{n} + \dfrac{(X_i - \bar{X})^2}{\sum_{i=1}^{n}(X_i - \bar{X})^2}$
Prediction Interval for a Future Outcome
Future Outcome: $Y_i = \beta_0 + \beta_1 X_i + \varepsilon_i$
Point Prediction: $\hat{Y}_i = b_0 + b_1 X_i$
$\hat{Y}_i \pm t_{\alpha/2,\, n-2}\, S_{YX}\sqrt{1 + h_i}$
Problem Areas in Regression Analysis
The most important means of identifying problems with model assumptions is based on
plots of data and residuals. Often transformations of the dependent and/or independent
variables can solve the problems. Other, more complex methods may need to be used.
See Section 11-5 for plots used to check assumptions and Exhibit 11.3 on pages 555-556
of text.
Normal Equations:
$\sum_{i=1}^{n} Y_i = n b_0 + b_1 \sum_{i=1}^{n} X_i$
$\sum_{i=1}^{n} X_i Y_i = b_0 \sum_{i=1}^{n} X_i + b_1 \sum_{i=1}^{n} X_i^2$
Computational Formula for the Slope, b1
$b_1 = \dfrac{SS_{XY}}{SS_{XX}}$
$SS_{XY} = \sum_{i=1}^{n}(X_i - \bar{X})(Y_i - \bar{Y}) = \sum_{i=1}^{n} X_i Y_i - \dfrac{\left(\sum_{i=1}^{n} X_i\right)\left(\sum_{i=1}^{n} Y_i\right)}{n}$
$SS_{XX} = \sum_{i=1}^{n}(X_i - \bar{X})^2 = \sum_{i=1}^{n} X_i^2 - \dfrac{\left(\sum_{i=1}^{n} X_i\right)^2}{n}$
Computational Formula for the Y-intercept, b0
$b_0 = \bar{Y} - b_1 \bar{X}$
Standard Error of the Slope Estimate, $S_{b_1}$
$S_{b_1} = \dfrac{S_{YX}}{\sqrt{SS_{XX}}} = \dfrac{S_{YX}}{\sqrt{\sum_{i=1}^{n}(X_i - \bar{X})^2}}$
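A minimal numpy sketch of these computational formulas, using made-up data (everything below is illustrative, not from the text):

```python
import numpy as np

# Hypothetical data
X = np.array([1.0, 2.0, 3.0, 4.0, 5.0])
Y = np.array([2.1, 3.9, 6.2, 7.8, 10.1])
n = len(X)

SSXY = np.sum((X - X.mean()) * (Y - Y.mean()))
SSXX = np.sum((X - X.mean()) ** 2)

b1 = SSXY / SSXX                 # slope
b0 = Y.mean() - b1 * X.mean()    # Y-intercept

resid = Y - (b0 + b1 * X)
SYX = np.sqrt(np.sum(resid ** 2) / (n - 2))   # std. error of the estimate
Sb1 = SYX / np.sqrt(SSXX)                     # std. error of the slope
print(b0, b1, Sb1)
```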
Chapter 12 – Multiple Linear Regression
General Case: We have k variables that we control, or know in advance of outcome, that
are used to predict Y, the response (dependent variable). The k independent variables are
labeled X1, X2,...,Xk. The levels of these variables for the ith case are labeled X1i ,..., Xki .
Note that simple linear regression is a special case where k=1, thus the methods used are
just basic extensions of what we have previously done.
Population Model: $Y_i = \beta_0 + \beta_1 X_{1i} + \cdots + \beta_k X_{ki} + \varepsilon_i$
where $\beta_j$ is the change in the mean of Y when variable $X_j$ increases by 1 unit, while holding the k−1 remaining independent variables constant (partial regression coefficient). This is also referred to as the slope of Y with variable $X_j$, holding the other predictors constant.
Fitted Model: $\hat{Y}_i = b_0 + b_1 X_{1i} + \cdots + b_k X_{ki}$
Coefficient of Multiple Determination
$R^2 = r^2_{Y\cdot 1,\dots,k} = \dfrac{SSR}{SST} = 1 - \dfrac{SSE}{SST} = \dfrac{\sum_{i=1}^{n}(\hat{Y}_i - \bar{Y})^2}{\sum_{i=1}^{n}(Y_i - \bar{Y})^2}$
Adjusted-R2
$r^2_{adj} = 1 - \left[(1 - r^2_{Y\cdot 1,\dots,k})\dfrac{n-1}{n-k-1}\right] = 1 - \dfrac{SSE/(n-k-1)}{SST/(n-1)}$
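Both measures follow directly from the sums of squares; a small illustrative sketch (Y, Yhat, and k are assumed to come from an already-fitted model):

```python
import numpy as np

def r2_and_adj_r2(Y, Yhat, k):
    """R^2 = 1 - SSE/SST and adjusted R^2 = 1 - (1-R^2)(n-1)/(n-k-1),
    where k is the number of independent variables."""
    Y, Yhat = np.asarray(Y, float), np.asarray(Yhat, float)
    n = len(Y)
    SSE = np.sum((Y - Yhat) ** 2)
    SST = np.sum((Y - Y.mean()) ** 2)
    r2 = 1 - SSE / SST
    adj = 1 - (1 - r2) * (n - 1) / (n - k - 1)
    return r2, adj
```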
Residual Analysis for Multiple Regression (Sec. 12-2)
Very similar to case for Simple Regression. Only difference is to plot residuals versus
each independent variable. Similar interpretations of plots. A review of plots and
interpretations:
Residuals: $e_i = Y_i - \hat{Y}_i = Y_i - (b_0 + b_1 X_{1i} + \cdots + b_k X_{ki})$
^
Plot of ei vs Y i Can be used to check for linear relation, constant variance
If relation is nonlinear, U-shaped pattern appears
If error variance is nonconstant, funnel-shaped pattern appears
If assumptions are met, random cloud of points appears
Plot of $e_i$ vs $X_{ji}$ for each j: Can be used to check for linear relation with respect to $X_j$
If relation is nonlinear, U-shaped pattern appears
If assumptions are met, random cloud of points appears
Plot of $e_i$ vs $i$: Can be used to check for independence when collected over time
If errors are dependent, smooth pattern will appear
If errors are independent, random cloud of points appears
Histogram of $e_i$: If distribution is normal, histogram of residuals will be mound-shaped, centered around 0
F-test for the Overall Model
Used to test whether any of the independent variables are linearly associated with Y
Analysis of Variance
Total Sum of Squares and df: $SST = \sum_{i=1}^{n}(Y_i - \bar{Y})^2 \qquad df_T = n-1$
Regression Sum of Squares: $SSR = \sum_{i=1}^{n}(\hat{Y}_i - \bar{Y})^2 \qquad df_R = k$
Error Sum of Squares: $SSE = \sum_{i=1}^{n}(Y_i - \hat{Y}_i)^2 \qquad df_E = n-k-1$
Source df SS MS F
Regression k SSR MSR=SSR/k Fobs=MSR/MSE
Error n-k-1 SSE MSE=SSE/(n-k-1) ---
Total n-1 SST --- ---
$H_0: \beta_1 = \beta_2 = \cdots = \beta_k = 0$ (Y is not linearly associated with any of the independent variables)
$H_A:$ Not all $\beta_j = 0$ (At least one of the independent variables is associated with Y)
TS: $F_{obs} = \dfrac{MSR}{MSE}$
RR: $F_{obs} \ge F_{\alpha,\, k,\, n-k-1}$
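A hedged numpy/scipy sketch of the overall F-test (X is an n-by-k predictor matrix; using scipy for the critical value is an assumption about available tooling, not something the text prescribes):

```python
import numpy as np
from scipy import stats

def overall_f_test(X, Y, alpha=0.05):
    """F = MSR/MSE for H0: all slopes are 0, in a model with intercept."""
    n, k = X.shape
    Xd = np.column_stack([np.ones(n), X])        # add intercept column
    b, *_ = np.linalg.lstsq(Xd, Y, rcond=None)
    Yhat = Xd @ b
    SSR = np.sum((Yhat - Y.mean()) ** 2)
    SSE = np.sum((Y - Yhat) ** 2)
    F = (SSR / k) / (SSE / (n - k - 1))
    F_crit = stats.f.ppf(1 - alpha, k, n - k - 1)
    return F, F_crit, F >= F_crit                # reject H0 if last is True
```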
Used to test or estimate the slope of Y with respect to Xj, after controlling for all other
predictor variables.
t-test for $\beta_j$
$H_0: \beta_j = \beta_{j0} \qquad H_A: \beta_j \ne \beta_{j0}$
TS: $t_{obs} = \dfrac{b_j - \beta_{j0}}{S_{b_j}}$
RR: $|t_{obs}| \ge t_{\alpha/2,\, n-k-1}$
Confidence Interval for $\beta_j$: $b_j \pm t_{\alpha/2,\, n-k-1}\, S_{b_j}$
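In matrix form the standard errors $S_{b_j}$ come from the diagonal of $MSE \cdot (X'X)^{-1}$; a sketch under that standard setup (function name hypothetical):

```python
import numpy as np
from scipy import stats

def slope_inference(X, Y, j, alpha=0.05):
    """t statistic (for beta_j = 0) and CI for coefficient j
    (j = 0 is the intercept) in an OLS fit with intercept."""
    n, k = X.shape
    Xd = np.column_stack([np.ones(n), X])
    b, *_ = np.linalg.lstsq(Xd, Y, rcond=None)
    resid = Y - Xd @ b
    MSE = resid @ resid / (n - k - 1)
    Sb = np.sqrt(MSE * np.linalg.inv(Xd.T @ Xd).diagonal())
    t_obs = b[j] / Sb[j]
    t_crit = stats.t.ppf(1 - alpha / 2, n - k - 1)
    ci = (b[j] - t_crit * Sb[j], b[j] + t_crit * Sb[j])
    return t_obs, ci
```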
F-test for a Set of Predictors (Blocks)
We wish to test whether the set of independent variables in block B can be dropped from a model containing the sets of independent variables in blocks A and B. That is, the variables in block B add no predictive value beyond the variables in block A.
Notation:
SSR(A&B) is the regression sum of squares for model containing both blocks
SSR(A) is the regression sum of squares for model containing block A
MSE(A&B) is the error mean square for model containing both blocks
k is the number of predictors in the model containing both blocks
q is the number of predictors in block B
$H_0:$ The $\beta$'s in block B are all 0, given that the variables in block A have been included in the model
$H_A:$ Not all $\beta$'s in block B are 0, given that the variables in block A have been included in the model
TS: $F_{obs} = \dfrac{[SSR(A\&B) - SSR(A)]/q}{MSE(A\&B)}$
RR: $F_{obs} \ge F_{\alpha,\, q,\, n-k-1}$
Coefficient of Partial Determination
Measures the fraction of variation in Y explained by $X_j$ that is not explained by the other independent variables.
$SSR(X_k \mid X_1, \dots, X_{k-1}) = SSR(X_1, \dots, X_k) - SSR(X_1, \dots, X_{k-1})$
$r^2_{Yk\cdot 1,2,\dots,k-1} = \dfrac{SSR(X_k \mid X_1, \dots, X_{k-1})}{SST - SSR(X_1, \dots, X_{k-1})}$
Note that since labels are arbitrary, these can be constructed for any of the independent
variables.
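Both the block F-test and the partial coefficient of determination reduce to comparing SSR from nested least squares fits; an illustrative sketch (function names hypothetical):

```python
import numpy as np

def ssr(X, Y):
    """Regression sum of squares for an OLS fit with intercept."""
    Xd = np.column_stack([np.ones(len(Y)), X])
    b, *_ = np.linalg.lstsq(Xd, Y, rcond=None)
    return np.sum((Xd @ b - Y.mean()) ** 2)

def partial_f(XA, XB, Y):
    """F statistic for dropping block B from the model with blocks A and B."""
    n, q = len(Y), XB.shape[1]
    XAB = np.column_stack([XA, XB])
    k = XAB.shape[1]
    SST = np.sum((Y - Y.mean()) ** 2)
    extra = ssr(XAB, Y) - ssr(XA, Y)              # SSR(B | A)
    MSE_full = (SST - ssr(XAB, Y)) / (n - k - 1)
    return (extra / q) / MSE_full
```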
When the relation between Y and X is not linear, it can often be approximated by a
quadratic model.
Population Model: $Y_i = \beta_0 + \beta_1 X_{1i} + \beta_2 X_{1i}^2 + \varepsilon_i$
Fitted Model: $\hat{Y}_i = b_0 + b_1 X_{1i} + b_2 X_{1i}^2$
Dummy variables are used to include categorical variables in the model. If the variable has m levels, we include m−1 dummy variables. The simplest case is a binary variable with 2 levels, such as gender. The dummy variable takes on the value 1 if the characteristic of interest is present, 0 if it is absent.
Consider a model with one numeric independent variable (X1) and one dummy variable
(X2). Then the model for the two groups (characteristic present and absent) have the
following relationships with Y:
$Y_i = \beta_0 + \beta_1 X_{1i} + \beta_2 X_{2i} + \varepsilon_i \qquad E(\varepsilon_i) = 0$
$X_{2i} = 1:\; E(Y_i) = (\beta_0 + \beta_2) + \beta_1 X_{1i} \qquad X_{2i} = 0:\; E(Y_i) = \beta_0 + \beta_1 X_{1i}$
Note that the two groups have different Y-intercepts, but the slopes with respect to X1 are
equal.
This model allows the slope of Y with respect to X1 to be different for the two groups with
respect to the categorical variable:
$Y_i = \beta_0 + \beta_1 X_{1i} + \beta_2 X_{2i} + \beta_3 X_{1i} X_{2i} + \varepsilon_i \qquad E(\varepsilon_i) = 0$
$X_{2i} = 1:\; E(Y_i) = \beta_0 + \beta_1 X_{1i} + \beta_2(1) + \beta_3 X_{1i}(1) = (\beta_0 + \beta_2) + (\beta_1 + \beta_3) X_{1i}$
$X_{2i} = 0:\; E(Y_i) = \beta_0 + \beta_1 X_{1i} + \beta_2(0) + \beta_3 X_{1i}(0) = \beta_0 + \beta_1 X_{1i}$
Note that the two groups have different Y-intercepts and slopes. More complex models
can be fit with multiple numeric predictors and dummy variables. See examples.
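An illustrative sketch of building the dummy and interaction columns by hand, with hypothetical data:

```python
import numpy as np

# Hypothetical data: X1 numeric, group is a binary category
X1 = np.array([1.0, 2.0, 3.0, 4.0, 5.0, 6.0])
group = np.array(["B", "A", "B", "A", "B", "A"])
Y = np.array([3.1, 4.0, 6.9, 8.2, 10.8, 12.1])

X2 = (group == "A").astype(float)   # dummy: 1 if characteristic present
X1X2 = X1 * X2                      # interaction term

# Interaction model: Y = b0 + b1*X1 + b2*X2 + b3*X1*X2
Xd = np.column_stack([np.ones(len(Y)), X1, X2, X1X2])
b, *_ = np.linalg.lstsq(Xd, Y, rcond=None)
# Group A line: intercept b0+b2, slope b1+b3; group B line: b0, b1
print(b)
```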
Transformations to Meet Model Assumptions of Regression (Sec. 12-8)
Logarithmic Transformations
These transformations can often fix problems with nonlinearity, unequal variances, and
non-normality of errors. We will consider base 10 logarithms, and natural logarithms.
$X' = \log_{10}(X) \iff X = 10^{X'}$
$X' = \ln(X) \iff X = e^{X'}$
Multiplicative Model: $Y_i = \beta_0 X_{1i}^{\beta_1} X_{2i}^{\beta_2} \varepsilon_i$
Taking base 10 logarithms of both sides gives the model:
$\log_{10}(Y_i) = \log_{10}(\beta_0 X_{1i}^{\beta_1} X_{2i}^{\beta_2} \varepsilon_i) = \log_{10}(\beta_0) + \beta_1 \log_{10}(X_{1i}) + \beta_2 \log_{10}(X_{2i}) + \log_{10}(\varepsilon_i)$
Procedure: First, take base 10 logarithms of Y and X1 and X2, and fit the linear regression
model on the transformed variables. The relationship will be linear for the transformed
variables.
Exponential Model: $Y_i = e^{\beta_0 + \beta_1 X_{1i} + \beta_2 X_{2i}} \varepsilon_i$
Taking natural logarithms of the Y values gives the model:
$\ln(Y_i) = \beta_0 + \beta_1 X_{1i} + \beta_2 X_{2i} + \ln(\varepsilon_i)$
Multicollinearity (Sec. 12-9)
Problem: In observational studies and models with polynomial terms, the independent
variables may be highly correlated among themselves. Two classes of problems arise.
Intuitively, the variables explain the same part of the random variation in Y. This causes
the partial regression coefficients to not appear significant for individual terms (since
they are explaining the same variation). Mathematically, the standard errors of estimates
are inflated, causing wider confidence intervals for individual parameters, and smaller t-
statistics for testing whether individual partial regression coefficients are 0.
Variance Inflation Factor: $VIF_j = \dfrac{1}{1 - r_j^2}$
where $r_j^2$ is the coefficient of determination when variable $X_j$ is regressed on the remaining k−1 independent variables. There is a VIF for each independent variable. A variable is considered to be problematic if its VIF is 10.0 or larger. When multicollinearity is present, theory and common sense should be used to choose the best variables to keep in the model. Complex methods such as principal components regression and ridge regression have been developed; we will not pursue them here, as they do not help explain which independent variables are associated with, and cause changes in, Y.
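A sketch computing each VIF by regressing one predictor on the rest (illustrative, with the usual intercept included in each auxiliary regression):

```python
import numpy as np

def vifs(X):
    """VIF_j = 1 / (1 - r_j^2), where r_j^2 comes from regressing
    column j of X on the remaining columns (with intercept)."""
    n, k = X.shape
    out = []
    for j in range(k):
        y = X[:, j]
        others = np.delete(X, j, axis=1)
        Xd = np.column_stack([np.ones(n), others])
        b, *_ = np.linalg.lstsq(Xd, y, rcond=None)
        r2 = 1 - np.sum((y - Xd @ b) ** 2) / np.sum((y - y.mean()) ** 2)
        out.append(1.0 / (1.0 - r2))
    return np.array(out)    # flag any value >= 10
```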
Stepwise Regression
1. Choose a significance level (P-value) to enter the model (SLE) and a significance
level to stay in the model (SLS). Some computer packages require that SLE < SLS.
2. Fit all k simple regression models; choose the independent variable with the largest t-statistic (smallest P-value). Check that its P-value is less than SLE. If yes, keep this predictor in the model; otherwise stop (no model works).
3. Fit all k−1 two-variable models that include the winning variable from step 2. Choose the model in which the new variable has the lowest P-value. Check that this new variable has a P-value less than SLE. If no, then stop and keep the model with only the winning variable from step 2. If yes, then check that the variable from step 2 is still significant (its P-value is less than SLS); if not, drop the variable from step 2.
4. Fit all k−2 three-variable models containing the winners from steps 2 and 3. Follow the process described in step 3, where both variables already in the model are checked against SLS.
5. Continue until no new terms can enter the model because none meets the SLE criterion (a minimal sketch of the procedure follows this list).
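A minimal sketch of this procedure under the stated SLE/SLS rules (function names hypothetical; it re-checks only previously entered variables and omits the safeguards a production routine would need):

```python
import numpy as np
from scipy import stats

def slope_p_values(X, y):
    """Two-sided p-values for the slopes in an OLS fit with intercept."""
    n, m = X.shape
    Xd = np.column_stack([np.ones(n), X])
    b, *_ = np.linalg.lstsq(Xd, y, rcond=None)
    resid = y - Xd @ b
    mse = resid @ resid / (n - m - 1)
    se = np.sqrt(mse * np.linalg.inv(Xd.T @ Xd).diagonal())
    t = b[1:] / se[1:]
    return 2 * stats.t.sf(np.abs(t), df=n - m - 1)

def stepwise(X, y, SLE=0.05, SLS=0.10):
    chosen, candidates = [], list(range(X.shape[1]))
    while candidates:
        # best entering variable = smallest p-value among candidates
        p_best, j_best = min(
            (slope_p_values(X[:, chosen + [j]], y)[-1], j) for j in candidates
        )
        if p_best >= SLE:
            break                        # no candidate meets SLE: stop
        chosen.append(j_best)
        candidates.remove(j_best)
        # drop previously entered variables that no longer meet SLS
        p_in = slope_p_values(X[:, chosen], y)[:-1]
        for pos in sorted(np.where(p_in >= SLS)[0], reverse=True):
            candidates.append(chosen.pop(pos))
    return chosen
```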
Consider a situation with T−1 potential independent variables (candidates). There are $2^{T-1}$ possible models containing subsets of the T−1 candidates. Note that the full model containing all candidates actually has T parameters (including the intercept term).
3. Compute $C_p = \dfrac{(1 - r_k^2)(n - T)}{1 - r_T^2} - [n - 2(k+1)]$
Chapter 13 – Time Series Forecasting
Time series can generally be decomposed into components representing overall level, linear trend, cyclical patterns, seasonal shifts, and random irregularities. Series that are measured at the annual level cannot demonstrate seasonal patterns. In the multiplicative model, the effects of each component enter in a multiplicative manner.
Annual data: $Y_i = T_i \times C_i \times I_i$ where the components are trend, cycle, and irregularity effects
Quarterly or monthly data: $Y_i = T_i \times C_i \times S_i \times I_i$ where the components are trend, cycle, seasonal, and irregularity effects
Time series data are usually smoothed to reduce the impact of the random irregularities in
individual data points.
Moving Averages
Centered moving averages are averages of the value at the current point and neighboring values (above and below the point). Due to the centering, they typically use an odd number of periods. The 3- and 5-period moving averages at time point i would be:
$MA(3)_i = \dfrac{Y_{i-1} + Y_i + Y_{i+1}}{3} \qquad MA(5)_i = \dfrac{Y_{i-2} + Y_{i-1} + Y_i + Y_{i+1} + Y_{i+2}}{5}$
Clearly, there can't be moving averages at the lower and upper ends of the series.
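An illustrative sketch using np.convolve for the centered averages (the ends of the series are lost, as just noted):

```python
import numpy as np

def centered_ma(y, L):
    """Centered L-period moving average (L odd); returns n - L + 1 values,
    one for each time point that has (L-1)/2 neighbors on both sides."""
    return np.convolve(y, np.ones(L) / L, mode="valid")

y = np.array([10.0, 12.0, 9.0, 14.0, 13.0, 15.0, 18.0])  # hypothetical
print(centered_ma(y, 3))   # 3-period moving averages
print(centered_ma(y, 5))   # 5-period moving averages
```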
Exponential Smoothing
This method uses all previous data in a series to smooth it, with exponentially decreasing
weights on observations further in the past:
$E_1 = Y_1 \qquad E_i = W Y_i + (1 - W) E_{i-1} \quad i = 2, 3, \dots \qquad 0 < W < 1$
The exponentially smoothed value for a current time point can be used as forecast for the
next time point.
$\hat{Y}_{i+1} = E_i = W Y_i + (1 - W)\hat{Y}_i$
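A sketch of the recursion with a chosen smoothing constant (W = 0.2 here is purely illustrative):

```python
import numpy as np

def exp_smooth(y, W=0.2):
    """E_1 = Y_1; E_i = W*Y_i + (1 - W)*E_{i-1}. The last smoothed
    value serves as the forecast for the next period."""
    E = np.empty(len(y))
    E[0] = y[0]
    for i in range(1, len(y)):
        E[i] = W * y[i] + (1 - W) * E[i - 1]
    return E

y = np.array([10.0, 12.0, 9.0, 14.0, 13.0, 15.0, 18.0])  # hypothetical
E = exp_smooth(y, W=0.2)
print(E[-1])   # forecast for period n + 1
```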
We consider three trend models: linear, quadratic, and exponential. In each model the
independent variable Xi is the time period of observation i. If the first observation is
period 1, then Xi = i.
Linear Trend
Population Model: $Y_i = \beta_0 + \beta_1 X_i + \varepsilon_i$
Fitted Forecasting Equation: $\hat{Y}_i = b_0 + b_1 X_i$
Quadratic Trend
Population Model: $Y_i = \beta_0 + \beta_1 X_i + \beta_2 X_i^2 + \varepsilon_i$
Fitted Forecasting Equation: $\hat{Y}_i = b_0 + b_1 X_i + b_2 X_i^2$
Exponential Trend
Fitted Forecasting Equation: $\log_{10}(\hat{Y}_i) = b_0 + b_1 X_i \quad\Rightarrow\quad \hat{Y}_i = (10^{b_0})(10^{b_1})^{X_i}$
This exponential model’s forecasting equation is obtained by first fitting a simple linear
regression of the logarithm of Y on X, then putting the least squares estimates of the Y-
intercept and the slope into the second equation.
First Differences: $Y_i - Y_{i-1} \quad i = 2, \dots, n$
Second Differences: $(Y_i - Y_{i-1}) - (Y_{i-1} - Y_{i-2}) = Y_i - 2Y_{i-1} + Y_{i-2} \quad i = 3, \dots, n$
Percent Differences: $\dfrac{Y_i - Y_{i-1}}{Y_{i-1}} \times 100\% \quad i = 2, \dots, n$
If first differences are approximately constant, then the linear trend model is appropriate.
If the second differences are approximately constant, then the quadratic trend model is
appropriate. If percent differences are approximately constant, then the exponential trend
model is appropriate.
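A quick sketch comparing the three kinds of differences on a hypothetical series:

```python
import numpy as np

y = np.array([100.0, 110.0, 121.0, 133.1, 146.4])   # made-up series

first = np.diff(y)                      # Y_i - Y_{i-1}
second = np.diff(y, n=2)                # Y_i - 2Y_{i-1} + Y_{i-2}
percent = 100 * np.diff(y) / y[:-1]     # (Y_i - Y_{i-1}) / Y_{i-1} * 100%

# Here the percent differences are roughly constant (about 10%),
# pointing to the exponential trend model.
print(first, second, percent)
```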
Autoregressive models allow for the possibility that the mean response is a linear
function of previous response(s).
Population Model: $Y_i = A_0 + A_1 Y_{i-1} + \varepsilon_i$
Fitted Equation: $\hat{Y}_i = a_0 + a_1 Y_{i-1}$
Population Model: $Y_i = A_0 + A_1 Y_{i-1} + \cdots + A_P Y_{i-P} + \varepsilon_i$
Fitted Equation: $\hat{Y}_i = a_0 + a_1 Y_{i-1} + \cdots + a_P Y_{i-P}$
Significance Test for Highest Order Term
$H_0: A_P = 0 \qquad H_A: A_P \ne 0$
TS: $t_{obs} = \dfrac{a_P}{S_{a_P}}$
RR: $|t_{obs}| \ge t_{\alpha/2,\, n-2P-1}$
This test can be used repeatedly to determine the order of the autoregressive model. You
may start with a large value of P (say 4 or 5, depending on length of series), and
then keep fitting simpler models until the test for the last term is significant.
Start by fitting the Pth-order autoregressive model on a dataset containing n observations. Then, we can get forecasts into the future as follows:
$\hat{Y}_{n+1} = a_0 + a_1 Y_n + a_2 Y_{n-1} + \cdots + a_P Y_{n-P+1}$
$\hat{Y}_{n+2} = a_0 + a_1 \hat{Y}_{n+1} + a_2 Y_n + \cdots + a_P Y_{n-P+2}$
$\hat{Y}_{n+j} = a_0 + a_1 \hat{Y}_{n+j-1} + a_2 \hat{Y}_{n+j-2} + \cdots + a_P \hat{Y}_{n+j-P}$
In the third equation (the general case), observed Y values replace the forecasts $\hat{Y}$ whenever the subscript is at most n, which occurs for some terms when $j \le P$.
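A sketch fitting a Pth-order autoregressive model by least squares on lagged values and forecasting recursively (function names hypothetical):

```python
import numpy as np

def fit_ar(y, P):
    """Regress Y_i on Y_{i-1}, ..., Y_{i-P}; returns (a0, a1, ..., aP)."""
    rows = [y[i - P:i][::-1] for i in range(P, len(y))]   # lags 1..P
    Xd = np.column_stack([np.ones(len(rows)), np.array(rows)])
    a, *_ = np.linalg.lstsq(Xd, y[P:], rcond=None)
    return a

def forecast_ar(y, a, steps):
    """Recursive forecasts: earlier forecasts feed later ones."""
    hist = list(y)
    P = len(a) - 1
    out = []
    for _ in range(steps):
        lags = hist[-1:-P - 1:-1]          # Y_n, Y_{n-1}, ..., most recent first
        yhat = a[0] + np.dot(a[1:], lags)
        out.append(yhat)
        hist.append(yhat)
    return np.array(out)
```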
To determine which of a set of potential forecasting models to use, first obtain the
prediction errors (residuals). Two widely used measures are the mean absolute deviation
(MAD) and mean square error (MSE).
$e_i = Y_i - \hat{Y}_i \qquad MAD = \dfrac{\sum_{i=1}^{n}|Y_i - \hat{Y}_i|}{n} = \dfrac{\sum_{i=1}^{n}|e_i|}{n} \qquad MSE = \dfrac{\sum_{i=1}^{n}(Y_i - \hat{Y}_i)^2}{n} = \dfrac{\sum_{i=1}^{n}e_i^2}{n}$
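Both measures are one-liners given the forecast errors; a small sketch:

```python
import numpy as np

def mad_mse(Y, Yhat):
    """Mean absolute deviation and mean square error of forecast errors."""
    e = np.asarray(Y, float) - np.asarray(Yhat, float)
    return np.mean(np.abs(e)), np.mean(e ** 2)
```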
Plots of residuals versus time should be a random cloud, centered at 0. Linear, cyclical,
and seasonal patterns may also appear, leading to fitting models accounting for them. See
prototype plots on page 695 of the textbook.
Time Series Forecasting of Quarterly or Monthly Series (Sec. 13-7)
Consider the exponential trend model with quarterly or monthly dummy variables, depending on the data in a series.
Create three dummy variables: Q1, Q2, and Q3, where Qj = 1 if the observation occurred in quarter j, 0 otherwise. Let Xi be the coded quarterly value (if the first quarter of the data is to be labeled as 1, then Xi = i). The population model and its transformed linear model are:
$Y_i = \beta_0\, \beta_1^{X_i}\, \beta_2^{Q_1}\, \beta_3^{Q_2}\, \beta_4^{Q_3}\, \varepsilon_i$
$\log_{10}(Y_i) = \log_{10}(\beta_0) + X_i \log_{10}(\beta_1) + Q_1 \log_{10}(\beta_2) + Q_2 \log_{10}(\beta_3) + Q_3 \log_{10}(\beta_4) + \log_{10}(\varepsilon_i)$
Fit the multiple regression model with the base 10 logarithm of Yi as the response and Xi
(which is usually i), Q1, Q2, and Q3 as the independent variables. The fitted model is:
$\log_{10}(\hat{Y}_i) = b_0 + b_1 X_i + b_2 Q_1 + b_3 Q_2 + b_4 Q_3$
$\hat{Y}_i = 10^{\,b_0 + b_1 X_i + b_2 Q_1 + b_3 Q_2 + b_4 Q_3}$
A similar approach is taken when data are monthly. We create 11 dummy variables: M1, ..., M11, each taking the value 1 if the month of the observation is the month corresponding to the dummy variable. That is, if the observation is for January, M1 = 1 and all other Mj terms are 0. The population model is:
$Y_i = \beta_0\, \beta_1^{X_i}\, \beta_2^{M_1}\, \beta_3^{M_2} \cdots \beta_{12}^{M_{11}}\, \varepsilon_i$
The fitted model is:
$\log_{10}(\hat{Y}_i) = b_0 + b_1 X_i + b_2 M_1 + b_3 M_2 + \cdots + b_{12} M_{11}$
$\hat{Y}_i = 10^{\,b_0 + b_1 X_i + b_2 M_1 + b_3 M_2 + \cdots + b_{12} M_{11}}$
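A sketch of the quarterly version: regress log10(Y) on X and the three quarter dummies, then exponentiate to forecast (data and layout are hypothetical):

```python
import numpy as np

# Hypothetical quarterly series: Xi = 1, 2, ..., n; quarters cycle 1-4
Y = np.array([112.0, 98.0, 105.0, 130.0, 125.0, 108.0, 116.0, 144.0])
n = len(Y)
X = np.arange(1, n + 1, dtype=float)
quarter = (np.arange(n) % 4) + 1

Q1 = (quarter == 1).astype(float)
Q2 = (quarter == 2).astype(float)
Q3 = (quarter == 3).astype(float)

Xd = np.column_stack([np.ones(n), X, Q1, Q2, Q3])
b, *_ = np.linalg.lstsq(Xd, np.log10(Y), rcond=None)

# Forecast the next period (X = n + 1, which falls in quarter 1 here)
x_new = np.array([1.0, n + 1, 1.0, 0.0, 0.0])
print(10 ** (x_new @ b))
```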