A relationship is sought between two numerical variables:
Explanatory variable (independent or predictor variable).
Response variable (dependent variable).
Example. The included data give information on used Toyota Corollas for sale in Michigan. Which variables would be considered explanatory, and which response?
Remark. The values of the explanatory variable are considered fixed. The response variable is considered to be random.
Scatterplots.
The explanatory variable is always plotted on the x-axis and the response variable on the y-axis. Below is a scatterplot of price vs mileage. What kind of relationship is apparent in the scatterplot?
[Figure: scatterplot of PRICE vs MILEAGE]

YEAR     PRICE  MILEAGE
2000  13995.00    20906
2000  13995.00    14504
2000  13995.00    18858
2000  13995.00    13734
2000  14300.00     7132
2000  14300.00    28843
2000  14300.00     8709
2001  14903.00       24
2000  14995.00    12675
2001  15274.00       19
2001  16138.00       14
2000  11995.00    19269
2000  11995.00    19828
2000  11995.00    15457
1998  11995.00    41678
2000  11995.00    16768
2000  11995.00    19601
1998  11995.00    35079
2000  11995.00    13392
1997  11995.00    34550
2000  11995.00    19468
1998  11995.00    48367
2000  11995.00    19082
2000  11995.00    19346
1998  12900.00    46349
1998  12900.00    46139
1998  12995.00    31344
1998  12995.00    35302
2000  12995.00    13320
1998  12995.00     7627
1997  12995.00    33694
2000  12995.00    30104
2000  12995.00    21363
1998  13495.00    29123
2000  13695.00    19629
2000  13995.00    11049
1993   3995.00   130935
1992   4995.00   136378
1995   4995.00    85779
1991   4995.00   118868
1993   6795.00    58799
1997   7950.00    71236
1997   8495.00    93448
1997   8880.00    59620
1998   8995.00    49832
1997   8995.00    63660
1995   8995.00    89006
1997   9950.00    44442
1998  10900.00    35969
1997  10900.00    57883
1998  10995.00    29564
1998  10995.00    40416
1999  10995.00    46428
1999  11410.00    27308
2000  11995.00    19050
2000  11995.00    19834
2000  11995.00    19082
If the linear relationship were deterministic, we would have $y = \beta_0 + \beta_1 x$. Instead, we think of Y as a random variable. We assume that the expected value of Y is a linear function of x, but that Y differs from its expected value by a random amount. Thus, the linear model is $Y = \beta_0 + \beta_1 x + \varepsilon$. Furthermore, we assume that the random errors $\varepsilon$ are normal with mean 0 and variance $\sigma^2$.
What are the parameters of the model?
1.
2.
3.
What does the line $y = \beta_0 + \beta_1 x$ represent?
1.
2.
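A minimal Python sketch of the model, using only the standard library. The parameter values are illustrative assumptions (chosen to be of the same order as the Corolla fit reported later), not estimates. The x values are held fixed and only the error term $\varepsilon \sim N(0, \sigma^2)$ is random, so repeated draws at the same x scatter around the line $y = \beta_0 + \beta_1 x$, which is the mean of Y at that x.

```python
import random


def simulate_slr(xs, beta0, beta1, sigma, seed=None):
    """Draw one response per x from Y = beta0 + beta1*x + eps, eps ~ N(0, sigma^2)."""
    rng = random.Random(seed)
    # The x values are fixed; only the error term is random.
    return [beta0 + beta1 * x + rng.gauss(0.0, sigma) for x in xs]


# Illustrative (assumed) parameter values, not fitted estimates.
beta0, beta1, sigma = 14_400.0, -0.077, 1_200.0
mileages = [10_000, 50_000, 100_000]
prices = simulate_slr(mileages, beta0, beta1, sigma, seed=1)
for x, y in zip(mileages, prices):
    mean_y = beta0 + beta1 * x  # E[Y | x], the point on the true line
    print(f"x={x:>7}  E[Y]={mean_y:9.1f}  simulated Y={y:9.1f}")
```

Setting `sigma` to 0 recovers the deterministic line, which is a quick way to see what the error term adds.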
Visualization of Model.
[Figure: PRICE vs MILEAGE]
Normal equations.
Notation.
$$S_{xx} := \sum_{i=1}^{n} (x_i - \bar{x})^2 = \sum_{i=1}^{n} x_i^2 - \frac{1}{n}\left(\sum_{i=1}^{n} x_i\right)^2$$
Similarly,
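$$S_{xY} := \sum_{i=1}^{n} (x_i - \bar{x})(Y_i - \bar{Y}) = \sum_{i=1}^{n} x_i Y_i - \frac{1}{n}\left(\sum_{i=1}^{n} x_i\right)\left(\sum_{i=1}^{n} Y_i\right)$$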
The first equality on each line is by definition. Verification of the second is a fun algebra problem. Give the least squares estimates of $\beta_0$ and $\beta_1$ using the above notation.
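A minimal Python sketch of this notation, assuming the data are plain lists of numbers; the function names are our own. It compares the defining and computational forms of $S_{xx}$ and $S_{xY}$ on the first five rows of the Corolla table (a numerical spot check, not a proof), and implements the standard least squares estimates $\hat\beta_1 = S_{xY}/S_{xx}$ and $\hat\beta_0 = \bar{Y} - \hat\beta_1 \bar{x}$ obtained by solving the normal equations.

```python
import math


def s_xx(xs):
    """S_xx by definition: sum of squared deviations from the mean."""
    xbar = sum(xs) / len(xs)
    return sum((x - xbar) ** 2 for x in xs)


def s_xy(xs, ys):
    """S_xY by definition: sum of products of deviations from the means."""
    xbar, ybar = sum(xs) / len(xs), sum(ys) / len(ys)
    return sum((x - xbar) * (y - ybar) for x, y in zip(xs, ys))


def s_xx_shortcut(xs):
    """Computational form: sum(x_i^2) - (sum x_i)^2 / n."""
    return sum(x * x for x in xs) - sum(xs) ** 2 / len(xs)


def s_xy_shortcut(xs, ys):
    """Computational form: sum(x_i * y_i) - (sum x_i)(sum y_i) / n."""
    return sum(x * y for x, y in zip(xs, ys)) - sum(xs) * sum(ys) / len(xs)


def least_squares(xs, ys):
    """Return (b0, b1) with b1 = S_xY / S_xx and b0 = ybar - b1 * xbar."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two equal-length sequences with at least 2 points")
    sxx = s_xx(xs)
    if sxx == 0:
        raise ValueError("all x values are equal; the slope is not defined")
    b1 = s_xy(xs, ys) / sxx
    b0 = sum(ys) / len(ys) - b1 * sum(xs) / len(xs)
    return b0, b1


# First five rows of the table: MILEAGE as x, PRICE as y.
mileage = [20906, 14504, 18858, 13734, 7132]
price = [13995.00, 13995.00, 13995.00, 13995.00, 14300.00]
assert math.isclose(s_xx(mileage), s_xx_shortcut(mileage), rel_tol=1e-9)
assert math.isclose(s_xy(mileage, price), s_xy_shortcut(mileage, price), rel_tol=1e-9)

print(least_squares([0, 1, 2, 3], [2, 5, 8, 11]))  # points on y = 2 + 3x: (2.0, 3.0)
```

The computational forms are convenient by hand, but with large, nearly equal values they can lose precision to cancellation, so code usually prefers the deviation form.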
Find the least squares regression line for the Corolla example.

Summary statistics.
Prices: $\sum_{i=1}^{61} y_i = 712270$, $\sum_{i=1}^{61} y_i^2 = 8719043204$.
Mileage: $\sum_{i=1}^{61} x_i = 2161191$, $\sum_{i=1}^{61} x_i^2 = 130556889647$.
$\sum_{i=1}^{61} x_i y_i = 21081436470$.
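Plugging the summary statistics into $S_{xx}$, $S_{xY}$ and the least squares formulas takes only a few lines. This sketch uses nothing but the sums listed above (x = mileage, y = price); its result should agree with the Minitab line PRICE = 14402.5 - 0.0769410 MILEAGE reported further on.

```python
N = 61
SUM_X = 2_161_191            # sum of mileages
SUM_X2 = 130_556_889_647     # sum of squared mileages
SUM_Y = 712_270              # sum of prices
SUM_XY = 21_081_436_470      # sum of mileage * price


def fit_from_sums(n, sum_x, sum_x2, sum_y, sum_xy):
    """Least squares (b0, b1) from the summary sums."""
    s_xx = sum_x2 - sum_x ** 2 / n
    s_xy = sum_xy - sum_x * sum_y / n
    b1 = s_xy / s_xx
    b0 = sum_y / n - b1 * sum_x / n   # ybar - b1 * xbar
    return b0, b1


b0, b1 = fit_from_sums(N, SUM_X, SUM_X2, SUM_Y, SUM_XY)
# The format string assumes a negative slope, as in this example.
print(f"PRICE = {b0:.1f} - {abs(b1):.7f} MILEAGE")
```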
What is the fitted value for the first data point?

Residuals.
What is the residual for the first data point?

Estimating $\sigma^2$.
1. Error sum of squares.
2. Degrees of freedom.
3. Estimator of $\sigma^2$.
4. Estimate $\sigma^2$ for this example.
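The fitted value, the residual, and the estimate of $\sigma^2$ can be sketched the same way. Assumptions: the first data point is taken to be the first row of the table (MILEAGE 20906, PRICE 13995.00), and the sums are the summary statistics above. The error sum of squares uses the identity $SSE = S_{YY} - \hat\beta_1 S_{xY}$ with $S_{YY} = \sum y_i^2 - (\sum y_i)^2/n$; there are $n - 2$ degrees of freedom because two parameters, $\beta_0$ and $\beta_1$, are estimated, and the estimator is $s^2 = SSE/(n-2)$.

```python
import math

N = 61
SUM_X, SUM_X2 = 2_161_191, 130_556_889_647    # mileage sums
SUM_Y, SUM_Y2 = 712_270, 8_719_043_204        # price sums
SUM_XY = 21_081_436_470


def corolla_summary():
    """Return b0, b1, SSE, degrees of freedom, and s^2 from the summary sums."""
    s_xx = SUM_X2 - SUM_X ** 2 / N
    s_xy = SUM_XY - SUM_X * SUM_Y / N
    s_yy = SUM_Y2 - SUM_Y ** 2 / N
    b1 = s_xy / s_xx
    b0 = SUM_Y / N - b1 * SUM_X / N
    sse = s_yy - b1 * s_xy        # error sum of squares
    df = N - 2                    # n minus the two estimated coefficients
    return b0, b1, sse, df, sse / df


b0, b1, sse, df, s2 = corolla_summary()
x1, y1 = 20906, 13995.00          # assumed first data point (first table row)
fitted = b0 + b1 * x1
residual = y1 - fitted
print(f"fitted = {fitted:.2f}, residual = {residual:.2f}")
print(f"SSE = {sse:.0f}, df = {df}, s^2 = {s2:.1f}, s = {math.sqrt(s2):.2f}")
```

The SSE, degrees of freedom and $s$ printed here can be compared with the Residual Error line of the Minitab analysis of variance and with S = 1183.08.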
Examining Model
How well does the simple linear regression model fit the data? Residual plots address this question.
[Figure: four residual plots. Northwest: normal probability plot (Residual vs Normal Score). Northeast: I chart of residuals vs observation number (Mean = 5.37E-13, UCL = 3147, LCL = -3147). Southwest: histogram of residuals (Frequency vs Residual). Southeast: residuals vs fitted values (Residual vs Fit).]
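SE corner. A plot of the residuals vs the explanatory variable (in simple linear regression, the plot vs the fitted values carries the same information, since each fitted value is a linear function of x) helps one check the linear fit and the assumption of constant variance. If any systematic pattern remains in this plot, the model is not adequate. Some common patterns to watch out for in a plot of residuals.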
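A residual is $e_i = y_i - \hat{y}_i$. For a least squares fit with an intercept, the residuals always sum to zero and are uncorrelated with x, so a pattern left in a residual plot shows up as curvature, changing spread, or outliers rather than a straight-line trend. (This is also why the I chart center line, Mean = 5.37E-13, is zero up to rounding.) A minimal sketch on a small hypothetical data set, listing the $(x_i, e_i)$ pairs in x order, which is what a residual plot displays:

```python
def residuals(xs, ys):
    """Least squares residuals e_i = y_i - (b0 + b1*x_i)."""
    n = len(xs)
    xbar, ybar = sum(xs) / n, sum(ys) / n
    s_xx = sum((x - xbar) ** 2 for x in xs)
    s_xy = sum((x - xbar) * (y - ybar) for x, y in zip(xs, ys))
    b1 = s_xy / s_xx
    b0 = ybar - b1 * xbar
    return [y - (b0 + b1 * x) for x, y in zip(xs, ys)]


# Hypothetical toy data, not from the Corolla example.
xs = [1, 2, 3, 4, 5]
ys = [2, 4, 5, 4, 5]
es = residuals(xs, ys)
for x, e in sorted(zip(xs, es)):
    print(f"x={x}  residual={e:+.2f}")
print("sum of residuals:", round(sum(es), 10))
print("sum of x*residual:", round(sum(x * e for x, e in zip(xs, es)), 10))
```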
West side. If the model appears to fit well, the normal probability plot and the histogram help one check whether the residuals do indeed have a normal distribution. Comment on normality for this example.
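A normal probability plot pairs the sorted residuals with normal scores, the approximate expected values of a sorted standard normal sample. If the residuals are normal, the points fall near a straight line and the correlation of the pairs is close to 1. A minimal sketch using Python's `statistics.NormalDist`. Assumption: the normal scores use Blom's plotting positions $(i - 3/8)/(n + 1/4)$; Minitab's convention may differ slightly, so these values need not match its plot exactly.

```python
import math
from statistics import NormalDist


def normal_scores(n):
    """Approximate expected normal order statistics (Blom plotting positions; assumed)."""
    std_normal = NormalDist()
    return [std_normal.inv_cdf((i - 0.375) / (n + 0.25)) for i in range(1, n + 1)]


def probability_plot_correlation(resids):
    """Correlation between the sorted residuals and their normal scores."""
    e = sorted(resids)
    q = normal_scores(len(e))
    e_bar, q_bar = sum(e) / len(e), sum(q) / len(q)
    num = sum((a - e_bar) * (b - q_bar) for a, b in zip(e, q))
    den = math.sqrt(sum((a - e_bar) ** 2 for a in e) * sum((b - q_bar) ** 2 for b in q))
    return num / den


# Values lying exactly on a normal pattern give r = 1; one large outlier pulls r down.
print(probability_plot_correlation([3 * z + 5 for z in normal_scores(20)]))
print(probability_plot_correlation([0] * 9 + [10]))
```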
Comment on the variance of $\hat\beta_1$.
Give the test statistic for testing the null hypothesis $H_0: \beta_1 = 0$ against the alternative $\beta_1 > 0$, $\beta_1 < 0$, or $\beta_1 \neq 0$.
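Under the model, $\hat\beta_1$ is normal with mean $\beta_1$ and variance $\sigma^2/S_{xx}$. Replacing $\sigma$ by $s$ gives $t = \hat\beta_1 / (s/\sqrt{S_{xx}})$, which under $H_0: \beta_1 = 0$ has a t distribution with $n - 2$ degrees of freedom. A sketch computing it from the Corolla summary sums; in simple linear regression $t^2$ equals the ANOVA F statistic, which gives a check against the Minitab output.

```python
import math

N = 61
SUM_X, SUM_X2 = 2_161_191, 130_556_889_647
SUM_Y, SUM_Y2 = 712_270, 8_719_043_204
SUM_XY = 21_081_436_470


def slope_t_test():
    """Return (b1, se_b1, t) for H0: beta1 = 0 using the summary sums."""
    s_xx = SUM_X2 - SUM_X ** 2 / N
    s_xy = SUM_XY - SUM_X * SUM_Y / N
    s_yy = SUM_Y2 - SUM_Y ** 2 / N
    b1 = s_xy / s_xx
    s = math.sqrt((s_yy - b1 * s_xy) / (N - 2))   # estimate of sigma
    se_b1 = s / math.sqrt(s_xx)                    # estimated sd of b1
    return b1, se_b1, b1 / se_b1


b1, se_b1, t = slope_t_test()
print(f"b1 = {b1:.7f}, SE(b1) = {se_b1:.7f}, t = {t:.2f}, t^2 = {t * t:.2f}")
```

The P-value then comes from the t distribution with 59 degrees of freedom, one- or two-sided according to the alternative; with |t| around 15 it is far below 0.001, consistent with the reported P = 0.000.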
Calculating P-value.
R-Sq = 79.5%   R-Sq(adj) = 79.1%

Analysis of Variance

Source          DF         SS         MS       F      P
Regression       1  319600113  319600113  228.34  0.000
Residual Error  59   82581569    1399688
Total           60  402181681
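The summary numbers in this output are linked by simple formulas: $MS = SS/DF$, $F = MS_{\text{Regression}}/MS_{\text{Error}}$, $R^2 = SS_{\text{Regression}}/SS_{\text{Total}}$, $R^2_{\text{adj}} = 1 - MS_{\text{Error}}/(SS_{\text{Total}}/(n-1))$, and $S = \sqrt{MS_{\text{Error}}}$. A sketch recomputing them from the table entries:

```python
import math

SS_REG, SS_ERR, SS_TOT = 319_600_113, 82_581_569, 402_181_681
DF_REG, DF_ERR, DF_TOT = 1, 59, 60


def anova_summary():
    """Recompute F, R-Sq, R-Sq(adj) and S from the ANOVA table entries."""
    ms_reg = SS_REG / DF_REG
    ms_err = SS_ERR / DF_ERR
    f_stat = ms_reg / ms_err
    r_sq = SS_REG / SS_TOT
    r_sq_adj = 1 - ms_err / (SS_TOT / DF_TOT)
    return f_stat, r_sq, r_sq_adj, math.sqrt(ms_err)


f_stat, r_sq, r_sq_adj, s = anova_summary()
print(f"F = {f_stat:.2f}, R-Sq = {100 * r_sq:.1f}%, "
      f"R-Sq(adj) = {100 * r_sq_adj:.1f}%, S = {s:.2f}")
```

Note that 319600113 + 82581569 = 402181682, one more than the reported total; this is presumably rounding of the displayed sums of squares.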
Comment on the variance of Y.
Y-x*
Give the formula for a 95% prediction interval for the next value of Y when $x = x^*$.
Regression Plot
PRICE = 14402.5 - 0.0769410 MILEAGE
S = 1183.08   R-Sq = 79.5 %   R-Sq(adj) = 79.1 %
Minitab Output:
In Stat>Regression>Regression, under Options, enter an x value for prediction. In Stat>Regression>Fitted Line Plot, under Options, choose to display confidence and prediction bands.
Predicted Values for New Observations
New Obs    Fit  SE Fit        95.0% CI
1        10555     169  (10218, 10893)  (
[Figure: PRICE vs MILEAGE with the fitted regression line, 95% CI bands, and 95% PI bands]
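For a new observation at $x = x^*$ the fitted value is $\hat{y}^* = \hat\beta_0 + \hat\beta_1 x^*$, and the standard intervals are

$$\text{CI for the mean response: } \hat{y}^* \pm t_{0.025,\,n-2}\, s\sqrt{\frac{1}{n} + \frac{(x^* - \bar{x})^2}{S_{xx}}}$$
$$\text{PI for the next value of } Y: \hat{y}^* \pm t_{0.025,\,n-2}\, s\sqrt{1 + \frac{1}{n} + \frac{(x^* - \bar{x})^2}{S_{xx}}}$$

The extra 1 under the square root accounts for the variance $\sigma^2$ of the new Y itself. A sketch from the summary sums, with two assumptions: the output does not show which x value was entered, so $x^* = 50000$ miles is used because it appears consistent with the reported Fit of 10555; and $t_{0.025,59} \approx 2.001$ is taken from a t table rather than computed.

```python
import math

N = 61
SUM_X, SUM_X2 = 2_161_191, 130_556_889_647
SUM_Y, SUM_Y2 = 712_270, 8_719_043_204
SUM_XY = 21_081_436_470
T_CRIT = 2.001   # t_{0.025, 59}, assumed from a t table


def intervals_at(x_star, t_crit=T_CRIT):
    """Return (fit, se_fit, 95% CI, 95% PI) at x = x_star."""
    xbar, ybar = SUM_X / N, SUM_Y / N
    s_xx = SUM_X2 - SUM_X ** 2 / N
    s_xy = SUM_XY - SUM_X * SUM_Y / N
    s_yy = SUM_Y2 - SUM_Y ** 2 / N
    b1 = s_xy / s_xx
    b0 = ybar - b1 * xbar
    s = math.sqrt((s_yy - b1 * s_xy) / (N - 2))
    fit = b0 + b1 * x_star
    h = 1 / N + (x_star - xbar) ** 2 / s_xx   # leverage of x_star
    se_fit = s * math.sqrt(h)                 # sd of the estimated mean response
    se_pred = s * math.sqrt(1 + h)            # sd of (new Y - fit)
    ci = (fit - t_crit * se_fit, fit + t_crit * se_fit)
    pi = (fit - t_crit * se_pred, fit + t_crit * se_pred)
    return fit, se_fit, ci, pi


fit, se_fit, ci, pi = intervals_at(50_000)   # x* = 50000 is an assumption
print(f"Fit = {fit:.0f}, SE Fit = {se_fit:.0f}")
print(f"95% CI = ({ci[0]:.0f}, {ci[1]:.0f}), 95% PI = ({pi[0]:.0f}, {pi[1]:.0f})")
```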
Joint Intervals.