
Chapter 12: Regression.

A relationship is sought between two numerical variables.

Explanatory Variable (Independent or Predictor).

Response Variable (Dependent).

Example. The included data give information on used Toyota Corollas for sale in Michigan. Which variables would be considered explanatory and which response?

Remark. The values of the explanatory variable are considered fixed. The response variable is considered to be random.

Scatterplots.
The explanatory variable is always plotted on the x axis. The response variable is plotted on the y axis. Below is a scatter plot of price vs mileage. What kind of relationship is apparent in the scatterplot?

[Figure: scatterplot of PRICE (y axis) vs. MILEAGE (x axis).]

The Simple Linear Regression Model (Section 12.1)

YEAR     PRICE  MILEAGE
2000  13995.00    20906
2000  13995.00    14504
2000  13995.00    18858
2000  13995.00    13734
2000  14300.00     7132
2000  14300.00    28843
2000  14300.00     8709
2001  14903.00       24
2000  14995.00    12675
2001  15274.00       19
2001  16138.00       14
2000  11995.00    19269
2000  11995.00    19828
2000  11995.00    15457
1998  11995.00    41678
2000  11995.00    16768
2000  11995.00    19601
1998  11995.00    35079
2000  11995.00    13392
1997  11995.00    34550
2000  11995.00    19468
1998  11995.00    48367
2000  11995.00    19082
2000  11995.00    19346
1998  12900.00    46349
1998  12900.00    46139
1998  12995.00    31344
1998  12995.00    35302
2000  12995.00    13320
1998  12995.00     7627
1997  12995.00    33694
2000  12995.00    30104
2000  12995.00    21363
1998  13495.00    29123
2000  13695.00    19629
2000  13995.00    11049
1993   3995.00   130935
1992   4995.00   136378
1995   4995.00    85779
1991   4995.00   118868
1993   6795.00    58799
1997   7950.00    71236
1997   8495.00    93448
1997   8880.00    59620
1998   8995.00    49832
1997   8995.00    63660
1995   8995.00    89006
1997   9950.00    44442
1998  10900.00    35969
1997  10900.00    57883
1998  10995.00    29564
1998  10995.00    40416
1999  10995.00    46428
1999  11410.00    27308
2000  11995.00    19050
2000  11995.00    19834
2000  11995.00    19082


If the linear relationship were deterministic, then we would have $y = \beta_0 + \beta_1 x$. In this case, we think of $Y$ as a random variable. We will assume that the expected value of $Y$ is a linear function of $x$, but that the variable $Y$ differs from its expected value by a random amount. Thus, the linear model is $Y = \beta_0 + \beta_1 x + \varepsilon$. Furthermore, we will assume that the random variables $\varepsilon$ are normal with mean 0 and variance $\sigma^2$.

What are the parameters of the model?
1.
2.
3.

What does the line $y = \beta_0 + \beta_1 x$ represent?
1.
2.
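To make the model concrete, here is a minimal Python sketch that simulates responses from $Y = \beta_0 + \beta_1 x + \varepsilon$ at a few fixed values of $x$. The parameter values (14000, -0.08, 1000) and the function name `simulate_responses` are made up for illustration; they are not estimates from the Corolla data.

```python
import random

def simulate_responses(xs, beta0, beta1, sigma, seed=0):
    """Draw one Y for each fixed x from Y = beta0 + beta1*x + eps, eps ~ N(0, sigma^2)."""
    rng = random.Random(seed)  # seeded so repeated runs give the same draws
    return [beta0 + beta1 * x + rng.gauss(0.0, sigma) for x in xs]

# Hypothetical parameter values, chosen only for illustration.
BETA0, BETA1, SIGMA = 14000.0, -0.08, 1000.0

xs = [10000, 50000, 100000]  # fixed values of the explanatory variable
ys = simulate_responses(xs, BETA0, BETA1, SIGMA)

for x, y in zip(xs, ys):
    mean_y = BETA0 + BETA1 * x  # point on the line: expected value of Y at this x
    print(f"x = {x:6d}  E[Y] = {mean_y:8.1f}  simulated Y = {y:8.1f}")
```

Each simulated Y differs from the value on the line by a random amount, which is the behavior the model assumes.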

Visualization of Model.

Estimating Model Parameters from Data (Section 12.2)


We desire to estimate the parameters $\beta_0$, $\beta_1$, and $\sigma^2$ from our data. The estimators are statistics and are denoted with a hat.

Estimating $\beta_0$ and $\beta_1$.

[Figure: scatterplot of PRICE (y axis) vs. MILEAGE (x axis).]

Principle of least squares.

Finding $\beta_0$ and $\beta_1$ that minimize $f(\beta_0, \beta_1)$.

Normal equations.

Least Squares Estimates for $\beta_0$ and $\beta_1$.
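The steps named above can be sketched as follows, assuming $f$ denotes the usual sum of squared vertical deviations of the observed points $(x_i, y_i)$ from a candidate line $y = b_0 + b_1 x$. The symbols $b_0$ and $b_1$ are introduced here for illustration only.

```latex
f(b_0, b_1) = \sum_{i=1}^{n} \bigl( y_i - (b_0 + b_1 x_i) \bigr)^2

% Set both partial derivatives to zero:
\frac{\partial f}{\partial b_0} = -2 \sum_{i=1}^{n} (y_i - b_0 - b_1 x_i) = 0
\frac{\partial f}{\partial b_1} = -2 \sum_{i=1}^{n} x_i (y_i - b_0 - b_1 x_i) = 0

% Rearranged, these are the normal equations:
n b_0 + \Bigl( \sum x_i \Bigr) b_1 = \sum y_i
\Bigl( \sum x_i \Bigr) b_0 + \Bigl( \sum x_i^2 \Bigr) b_1 = \sum x_i y_i
```

Solving this pair of linear equations for $b_0$ and $b_1$ yields the least squares estimates.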

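Notation.
$S_{xx} := \sum (x_i - \bar{x})^2 = \sum x_i^2 - \frac{1}{n} \left( \sum x_i \right)^2$

Similarly,
$S_{xY} := \sum (x_i - \bar{x})(Y_i - \bar{Y}) = \sum x_i Y_i - \frac{1}{n} \left( \sum x_i \right) \left( \sum Y_i \right)$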

The first equality on each line is by definition. Verification of the second is a fun algebra problem.

Give the least squares estimates of $\beta_0$ and $\beta_1$ using the above notation.

Find the least squares regression line for the Corollas example.

Summary statistics.
(prices) $\sum_{i=1}^{61} y_i = 712270$, $\sum_{i=1}^{61} y_i^2 = 8719043204$
(mileage) $\sum_{i=1}^{61} x_i = 2161191$, $\sum_{i=1}^{61} x_i^2 = 130556889647$
$\sum_{i=1}^{61} x_i y_i = 21081436470$
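As a check on the hand calculation, the following Python sketch computes the least squares line from the summary statistics alone, using the standard shortcut formulas (slope = S_xy / S_xx, intercept = y-bar minus slope times x-bar). The two sums whose digits are split in the source are read here as 130556889647 and 21081436470. The variable names are illustrative.

```python
# Summary statistics for the n = 61 Corollas (x = mileage, y = price).
n = 61
sum_x = 2161191
sum_y = 712270
sum_xx = 130556889647  # sum of x_i^2
sum_xy = 21081436470   # sum of x_i * y_i

# Shortcut forms of S_xx and S_xy.
s_xx = sum_xx - sum_x**2 / n
s_xy = sum_xy - sum_x * sum_y / n

slope = s_xy / s_xx                        # estimate of beta_1
intercept = sum_y / n - slope * sum_x / n  # estimate of beta_0

print(f"PRICE = {intercept:.1f} + ({slope:.6f}) * MILEAGE")
```

The result should agree, up to rounding, with the Minitab regression equation reported later in these notes (PRICE = 14403 - 0.0769 MILEAGE).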

Interpret what $\beta_0$ and $\beta_1$ represent.

Fitted or predicted values.

What is the fitted value for the first data point?

Residuals.

What is the residual for the first data point?

Estimating $\sigma^2$.
1. Error Sum of Squares.
2. Degrees of freedom.
3. Estimator of $\sigma^2$.
4. Estimate $\sigma^2$ for this example.
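The quantities asked for above can be sketched in Python from the summary statistics. Two assumptions are made here: the "first data point" is taken to be the first row as listed in the data table (mileage 20906, price 13995), which may not match the original ordering, and the error sum of squares is computed with the standard shortcut SSE = S_yy - slope * S_xy rather than by summing squared residuals.

```python
import math

# Summary statistics for the n = 61 Corollas (x = mileage, y = price).
n = 61
sum_x, sum_y = 2161191, 712270
sum_xx, sum_yy, sum_xy = 130556889647, 8719043204, 21081436470

s_xx = sum_xx - sum_x**2 / n
s_yy = sum_yy - sum_y**2 / n
s_xy = sum_xy - sum_x * sum_y / n

slope = s_xy / s_xx
intercept = sum_y / n - slope * sum_x / n

# Fitted value and residual for one point (assumed to be the first listed row).
x1, y1 = 20906, 13995.0
fitted_1 = intercept + slope * x1
residual_1 = y1 - fitted_1

# Error sum of squares, degrees of freedom, and the estimate of sigma^2.
sse = s_yy - slope * s_xy
df = n - 2            # two parameters (intercept and slope) were estimated
s_squared = sse / df
s = math.sqrt(s_squared)

print(f"fitted = {fitted_1:.0f}, residual = {residual_1:.0f}")
print(f"SSE = {sse:.0f}, df = {df}, s^2 = {s_squared:.0f}, s = {s:.2f}")
```

The values of SSE, s^2, and s should match the Residual Error row and S in the Minitab output later in these notes (82581569, 1399688, and 1183).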

Examining Model
How well does the simple linear regression model fit the data? Residual plots address this question.

[Figure: Residual Model Diagnostics, four panels. NW: Normal Plot of Residuals (Residual vs. Normal Score). NE: I Chart of Residuals (Residual vs. Observation Number; UCL = 3147, Mean = 5.37E-13, LCL = -3147). SW: Histogram of Residuals (Frequency vs. Residual). SE: Residuals vs. Fits (Residual vs. Fit).]

SE corner. A plot of the residuals vs the explanatory variable helps one check a linear fit and the assumption of constant variance. If there is any pattern left in this plot, then the model is not adequate. Some common patterns to watch out for in a plot of residuals.

West Side. If the model appears to fit well, the normal probability plot and histogram help one check if the residuals do indeed have a normal distribution. Comment on normality for this example.

Inference for $\beta_1$ (Section 12.3)


Write $\hat{\beta}_1$ as a linear combination of the $Y_i$'s.

What is the distribution of $\hat{\beta}_1$?

Comment on variance of $\hat{\beta}_1$.

Give formula for 95% confidence interval for $\beta_1$.

Give test statistic for testing the null hypothesis that $\beta_1 = 0$ against the alternative $\beta_1 > 0$, $\beta_1 < 0$, or $\beta_1 \neq 0$.

When is each alternative appropriate?

Calculating P-value.

Minitab Output: Stat>Regression>Regression.


Regression Analysis: PRICE versus MILEAGE

The regression equation is
PRICE = 14403 - 0.0769 MILEAGE

Predictor       Coef    SE Coef       T      P
Constant     14402.5      235.6   61.14  0.000
MILEAGE    -0.076941   0.005092  -15.11  0.000

S = 1183   R-Sq = 79.5%   R-Sq(adj) = 79.1%

Analysis of Variance
Source          DF         SS         MS       F      P
Regression       1  319600113  319600113  228.34  0.000
Residual Error  59   82581569    1399688
Total           60  402181681
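The MILEAGE row of this output can be reproduced from the summary statistics with a short Python sketch. It assumes the standard formulas for simple linear regression: the estimated standard error of the slope is s / sqrt(S_xx), and the test statistic for a zero slope is the slope divided by that standard error. The critical value 2.001 for 59 degrees of freedom is an assumption read from a t table, not computed by the code.

```python
import math

# Summary statistics for the n = 61 Corollas (x = mileage, y = price).
n = 61
sum_x, sum_y = 2161191, 712270
sum_xx, sum_yy, sum_xy = 130556889647, 8719043204, 21081436470

s_xx = sum_xx - sum_x**2 / n
s_yy = sum_yy - sum_y**2 / n
s_xy = sum_xy - sum_x * sum_y / n

slope = s_xy / s_xx
s = math.sqrt((s_yy - slope * s_xy) / (n - 2))  # estimate of sigma

se_slope = s / math.sqrt(s_xx)  # estimated standard error of the slope
t_stat = slope / se_slope       # test statistic for H0: beta_1 = 0

T_CRIT = 2.001  # assumed t critical value (upper 2.5% point, 59 df), from a table
ci_low = slope - T_CRIT * se_slope
ci_high = slope + T_CRIT * se_slope

print(f"SE Coef = {se_slope:.6f}, T = {t_stat:.2f}")
print(f"95% CI for the slope: ({ci_low:.4f}, {ci_high:.4f})")
```

Because the whole interval lies below zero, it is consistent with the very small P-value Minitab reports for MILEAGE.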

Inference for Y's when $x = x^*$ (Section 12.4)


Write $\hat{Y}$ as a linear combination of the $Y_i$'s.

What is the distribution of $\hat{Y}$?

Comment on variance of $\hat{Y}$.

Give formula for 95% confidence interval for $\mu_{Y \cdot x^*}$.

Give formula for 95% prediction interval for the next value of $Y$ when $x = x^*$.

When is each interval appropriate?

[Figure: Regression Plot of PRICE (y axis) vs. MILEAGE (x axis), showing the fitted line PRICE = 14402.5 - 0.0769410 MILEAGE with 95% CI and 95% PI bands. S = 1183.08, R-Sq = 79.5%, R-Sq(adj) = 79.1%.]

Minitab Output:
In Stat>Regression>Regression, under options give an x value for prediction. In Stat>Regression>Fitted Line Plot, under options, have it display confidence and prediction bands.

Predicted Values for New Observations
New Obs    Fit  SE Fit        95.0% CI        95.0% PI
1        10555     169  (10218, 10893)  (8164, 12947)

Values of Predictors for New Observations
New Obs  MILEAGE
1          50000
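The predicted-value output for MILEAGE = 50000 can be reproduced with the Python sketch below. It assumes the standard simple linear regression formulas: the standard error of the fit is s * sqrt(1/n + (x* - x-bar)^2 / S_xx), and the prediction interval uses the same expression with an extra 1 under the square root. As before, the critical value 2.001 for 59 degrees of freedom is an assumption read from a t table.

```python
import math

# Summary statistics for the n = 61 Corollas (x = mileage, y = price).
n = 61
sum_x, sum_y = 2161191, 712270
sum_xx, sum_yy, sum_xy = 130556889647, 8719043204, 21081436470

x_bar = sum_x / n
s_xx = sum_xx - sum_x**2 / n
s_yy = sum_yy - sum_y**2 / n
s_xy = sum_xy - sum_x * sum_y / n

slope = s_xy / s_xx
intercept = sum_y / n - slope * x_bar
s = math.sqrt((s_yy - slope * s_xy) / (n - 2))  # estimate of sigma

x_star = 50000
fit = intercept + slope * x_star

# Variability of the estimated mean response at x_star.
leverage = 1 / n + (x_star - x_bar) ** 2 / s_xx
se_fit = s * math.sqrt(leverage)
# A new observation adds its own error variance on top of that.
se_pred = s * math.sqrt(1 + leverage)

T_CRIT = 2.001  # assumed t critical value (upper 2.5% point, 59 df), from a table
ci = (fit - T_CRIT * se_fit, fit + T_CRIT * se_fit)
pi = (fit - T_CRIT * se_pred, fit + T_CRIT * se_pred)

print(f"Fit = {fit:.0f}, SE Fit = {se_fit:.0f}")
print(f"95% CI: ({ci[0]:.0f}, {ci[1]:.0f})")
print(f"95% PI: ({pi[0]:.0f}, {pi[1]:.0f})")
```

The prediction interval is much wider than the confidence interval because it accounts for the scatter of an individual car's price about the mean, not only the uncertainty in the estimated line.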

Joint Intervals.
