QT2 Topic 09 Simple Regression

Topic 9
Simple Linear Regression
Week 10 & 11 -1
Learning Objectives
• Determine the least squares regression equation, and make point and interval
estimates for the dependent variable.
• Determine and interpret the value of the:
 Coefficient of correlation.
 Coefficient of determination.
• Construct confidence intervals and carry out
hypothesis tests involving the slope of the
regression line.
Learning Outcome
1. Compute statistical variables, which involves using sample information to
draw conclusions about the population of an event
2. Select appropriate hypothesis testing methods
for different types of data
3. Use the hypothesis testing methods to test the
significance levels of an event
4. Interpret the statistical results of the
regression models

Correlation vs. Regression
 A scatter plot (or scatter diagram) can be used
to show the relationship between two variables
 Correlation analysis is used to measure
strength of the association (linear relationship)
between two variables
 Correlation is only concerned with strength of the
relationship
 No causal effect is implied with correlation
 Correlation was first presented in Chapter 3

Scatter Plots
• A scatter plot is a graph of the ordered pairs (x, y) of numbers consisting of
the independent variable, x, and the dependent variable, y.

Scatter Plots - Example
• Construct a scatter plot for the data obtained in a study of age and systolic
blood pressure of six randomly selected subjects.
• The data are given in the table below.

Subject Age, x Pressure, y

A 43 128
B 48 120
C 56 135
D 61 143
E 67 141
F 70 152

Positive Relationship
[Figure: scatter plot of systolic blood pressure (y, 120 to 150) against age (x, 40 to 70), showing a positive relationship]

Scatter Plots - Other Examples
Negative Relationship
[Figure: scatter plot of final grade (y, 40 to 90) against number of absences (x, 5 to 15), showing a negative relationship]

Scatter Plots - Other Examples
No Relationship
[Figure: scatter plot of y (0 to 10) against x (0 to 70), showing no relationship]

Scatter Plot Examples
(continued)
[Figure: four scatter plots of y against x; the left pair shows strong relationships, the right pair weak relationships]

Scatter Plot Examples
(continued)
[Figure: scatter plot of y against x showing no relationship]

Correlation Coefficient
• The correlation coefficient computed from the sample data measures the strength
and direction of a relationship between two variables.
• Sample correlation coefficient, r.
• Population correlation coefficient, ρ.

Range of Values for the Correlation
Coefficient
[Figure: number line from -1 to +1. Values near -1 indicate a strong negative relationship, values near 0 indicate no linear relationship, and values near +1 indicate a strong positive relationship]

Features of Correlation Coefficient, r
Correlation Coefficient, r     Value
Perfect positive correlation   +1.00
Strong positive correlation    0.50 to 0.99
Medium positive correlation    0.30 to 0.49
Weak positive correlation      0.01 to 0.29
No correlation                 0
Weak negative correlation      -0.01 to -0.29
Medium negative correlation    -0.30 to -0.49
Strong negative correlation    -0.50 to -0.99
Perfect negative correlation   -1.00

Coefficient of Correlation
• Measures the relative strength of the linear
relationship between two variables
• Sample coefficient of correlation:
$$r = \frac{\mathrm{cov}(X, Y)}{S_X S_Y} = \frac{\sum_{i=1}^{n} (X_i - \bar{X})(Y_i - \bar{Y})}{\sqrt{\sum_{i=1}^{n} (X_i - \bar{X})^2}\,\sqrt{\sum_{i=1}^{n} (Y_i - \bar{Y})^2}}$$

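Formula for the Correlation Coefficient
$$r = \frac{n\left(\sum xy\right) - \left(\sum x\right)\left(\sum y\right)}{\sqrt{\left[n\left(\sum x^2\right) - \left(\sum x\right)^2\right]\left[n\left(\sum y^2\right) - \left(\sum y\right)^2\right]}}$$
where n is the number of data pairs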

Correlation Coefficient - Example
Subject Age, x Pressure, y

A 43 128
B 48 120
C 56 135
D 61 143
E 67 141
F 70 152

Correlation Coefficient - Example (Verify)
• Compute the correlation coefficient for the
age and blood pressure data.
$\sum x = 345$, $\sum y = 819$, $\sum xy = 47{,}634$,
$\sum x^2 = 20{,}399$, $\sum y^2 = 112{,}443$.
Substituting in the formula for r gives
$r = 0.897$.
Strong positive relationship between age (x) and
blood pressure (y).
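
The verification above can be reproduced with a short script. This is a minimal sketch using only the Python standard library; the function name `pearson_r` and the variable names are illustrative and do not come from the slides.

```python
import math

# Age (x) and systolic blood pressure (y) for subjects A to F, from the slide.
ages = [43, 48, 56, 61, 67, 70]
pressures = [128, 120, 135, 143, 141, 152]

def pearson_r(x, y):
    """Sample correlation coefficient via the computational (sums) formula."""
    n = len(x)
    sum_x, sum_y = sum(x), sum(y)
    sum_xy = sum(a * b for a, b in zip(x, y))
    sum_x2 = sum(a * a for a in x)
    sum_y2 = sum(b * b for b in y)
    numerator = n * sum_xy - sum_x * sum_y
    denominator = math.sqrt((n * sum_x2 - sum_x ** 2) * (n * sum_y2 - sum_y ** 2))
    return numerator / denominator

r = pearson_r(ages, pressures)
print(round(r, 3))  # 0.897
```

The intermediate sums inside the function match the slide values (345, 819, 47,634, 20,399 and 112,443), so any discrepancy in r would point to an arithmetic slip rather than a data problem.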
Introduction to
Regression Analysis
 Regression analysis is used to:
 Predict the value of a dependent variable based on
the value of at least one independent variable
 Explain the impact of changes in an independent
variable on the dependent variable
Dependent variable: the variable we wish to
explain
Independent variable: the variable used to explain
the dependent variable

Model
 Only ONE Independent Variable, X

 Relationship between X and Y is
described by a linear function
 Changes in Y are assumed to be caused
by changes in X

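Types of Relationships
[Figure: four scatter plots of Y against X; the left pair shows positive and negative linear relationships, the right pair curvilinear relationships]

Types of Relationships
(continued)
[Figure: four scatter plots of Y against X; the left pair shows strong relationships, the right pair weak relationships]

Types of Relationships
(continued)
[Figure: scatter plot of Y against X showing no relationship]

Model
The population regression model:
$$Y_i = \beta_0 + \beta_1 X_i + \varepsilon_i$$
where:
$Y_i$ = dependent variable
$\beta_0$ = population Y intercept
$\beta_1$ = population slope coefficient
$X_i$ = independent variable
$\varepsilon_i$ = random error term
$\beta_0 + \beta_1 X_i$ is the linear component and $\varepsilon_i$ is the random error component.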

Model
(continued)
[Figure: the population line $Y_i = \beta_0 + \beta_1 X_i + \varepsilon_i$ plotted against X, with intercept $\beta_0$ and slope $\beta_1$. At a given $X_i$, the random error $\varepsilon_i$ is the vertical gap between the observed value of Y and the predicted value of Y on the line]

Equation
The simple linear regression equation provides an
estimate of the population regression line:
$$\hat{Y}_i = b_0 + b_1 X_i$$
where:
$\hat{Y}_i$ = estimated (or predicted) Y value for observation i
$b_0$ = estimate of the regression intercept
$b_1$ = estimate of the regression slope
$X_i$ = value of X for observation i
The individual random error terms $e_i$ have a mean of zero

Least Squares Method
• b0 and b1 are obtained by finding the values of b0 and b1 that minimize the sum
of the squared differences between Y and Ŷ:
$$\min \sum (Y_i - \hat{Y}_i)^2 = \min \sum \left(Y_i - (b_0 + b_1 X_i)\right)^2$$

Finding the Least Squares
Equation
• The coefficients b0 and b1, and other regression results in this chapter, will be
found using Excel
• Formulas are shown in the text at the end of the chapter for those who are
interested

Interpretation of the
Slope and the Intercept
• b0 is the estimated average value of Y when the value of X is zero
• b1 is the estimated change in the average value of Y as a result of a
one-unit change in X

Example
 A real estate agent wishes to examine the
relationship between the selling price of a home
and its size (measured in square feet)
 A random sample of 10 houses is selected

 Dependent variable (Y) = house price in $1000s
 Independent variable (X) = square feet

Sample Data for House Price
Model
House Price in $1000s (Y)   Square Feet (X)
245                         1,400
312                         1,600
279                         1,700
308                         1,875
199                         1,100
219                         1,550
405                         2,350
324                         2,450
319                         1,425
255                         1,700

Graphical Presentation
• House price model: scatter plot
[Figure: scatter plot of house price in $1000s (0 to 450) against square feet (0 to 3000)]

$\sum Y = 2865$
$\sum X = 17150$
$\sum XY = 5085975$
$\sum X^2 = 30983750$
Find the simple linear equation.
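
One way to work this out by hand is with the sums formulas for the least squares slope and intercept. The sketch below is illustrative: the slides obtain the coefficients from Excel, and the variable names here are assumptions rather than anything from the course material.

```python
# Sums from the slide for the sample of 10 houses.
n = 10
sum_y = 2865        # house prices in $1000s
sum_x = 17150       # square feet
sum_xy = 5085975
sum_x2 = 30983750

# Least squares slope, then intercept from the sample means.
b1 = (n * sum_xy - sum_x * sum_y) / (n * sum_x2 - sum_x ** 2)
b0 = sum_y / n - b1 * sum_x / n

print(f"house price = {b0:.5f} + {b1:.5f} (square feet)")
```

The result agrees with the Excel coefficients quoted in the slides (intercept about 98.248, slope about 0.10977).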

Excel Output
The regression equation is:
house price = 98.24833 + 0.10977 (square feet)

Regression Statistics
Multiple R           0.76211
R Square             0.58082
Adjusted R Square    0.52842
Standard Error       41.33032
Observations         10

ANOVA
             df   SS           MS           F         Significance F
Regression    1   18934.9348   18934.9348   11.0848   0.01039
Residual      8   13665.5652    1708.1957
Total         9   32600.5000

              Coefficients   Standard Error   t Stat    P-value   Lower 95%   Upper 95%
Intercept     98.24833       58.03348         1.69296   0.12892   -35.57720   232.07386
Square Feet    0.10977        0.03297         3.32938   0.01039     0.03374     0.18580

Graphical Presentation
• House price model: scatter plot and regression line
[Figure: scatter plot of house price in $1000s against square feet with the fitted line; slope = 0.10977, intercept = 98.248]
house price = 98.24833 + 0.10977 (square feet)

Intercept, b0
• b0 is the estimated average value of Y when the value of X is zero (if X = 0 is
in the range of observed X values)
• Here, no houses had 0 square feet, so b0 = 98.24833 just indicates that, for
houses within the range of sizes observed, $98,248.33 is the portion of the
house price not explained by square feet

Slope Coefficient, b1
• b1 measures the estimated change in the average value of Y as a result of a
one-unit change in X
• Here, b1 = 0.10977 tells us that the average value of a house increases by
0.10977($1000) = $109.77 for each additional square foot of size

Predictions using
Regression Analysis
Predict the price for a house
with 2000 square feet:
house price = 98.25 + 0.1098 (sq. ft.)
= 98.25 + 0.1098(2000)
= 317.85
The predicted price for a house with 2000
square feet is 317.85($1,000s) = $317,850

Interpolation vs. Extrapolation
 When using a regression model for prediction,
only predict within the relevant range of data
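
A prediction helper can enforce the relevant-range rule directly. The sketch below is illustrative only: the function name `predict_price` and the choice to raise an error on extrapolation are assumptions, and the coefficients are the unrounded Excel values, so the result differs slightly from the 317.85 obtained with rounded coefficients.

```python
# Fitted coefficients from the Excel output (house price in $1000s vs. square feet).
B0 = 98.24833
B1 = 0.10977
# Smallest and largest square footage in the sample of 10 houses.
X_MIN, X_MAX = 1100, 2450

def predict_price(square_feet):
    """Return the predicted price in $1000s, refusing to extrapolate."""
    if not X_MIN <= square_feet <= X_MAX:
        raise ValueError("square_feet is outside the observed range of X")
    return B0 + B1 * square_feet

print(round(predict_price(2000), 2))  # 317.79
```

Calling `predict_price(3000)` raises `ValueError`, because 3000 square feet lies beyond the largest house in the sample.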
[Figure: house price in $1000s against square feet with the fitted regression line. The relevant range for interpolation is the span of observed X values; do not try to extrapolate beyond the range of observed X's]

Measures of Variation
• Total variation is made up of two parts:
$$SST = SSR + SSE$$
Total sum of squares: $SST = \sum (Y_i - \bar{Y})^2$
Regression sum of squares: $SSR = \sum (\hat{Y}_i - \bar{Y})^2$
Error sum of squares: $SSE = \sum (Y_i - \hat{Y}_i)^2$
where:
$\bar{Y}$ = average value of the dependent variable
$Y_i$ = observed values of the dependent variable
$\hat{Y}_i$ = predicted value of Y for the given $X_i$ value

Measures of Variation (continued)
 SST = total sum of squares

 Measures the variation of the Yi values around their
mean Y
 SSR = regression sum of squares
 Explained variation attributable to the relationship
between X and Y
 SSE = error sum of squares
 Variation attributable to factors other than the
relationship between X and Y
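
The decomposition can be checked numerically on the house price sample. This is a minimal sketch: the variable names are illustrative, and the line is fitted with the deviation form of the least squares formulas rather than with Excel.

```python
prices = [245, 312, 279, 308, 199, 219, 405, 324, 319, 255]           # Y, $1000s
sizes = [1400, 1600, 1700, 1875, 1100, 1550, 2350, 2450, 1425, 1700]  # X, sq. ft.

n = len(prices)
x_bar = sum(sizes) / n
y_bar = sum(prices) / n

# Least squares slope and intercept.
ssx = sum((x - x_bar) ** 2 for x in sizes)
b1 = sum((x - x_bar) * (y - y_bar) for x, y in zip(sizes, prices)) / ssx
b0 = y_bar - b1 * x_bar
fitted = [b0 + b1 * x for x in sizes]

sst = sum((y - y_bar) ** 2 for y in prices)               # total variation
ssr = sum((f - y_bar) ** 2 for f in fitted)               # explained variation
sse = sum((y - f) ** 2 for y, f in zip(prices, fitted))   # unexplained variation
explained_share = ssr / sst

print(round(sst, 4), round(ssr, 4), round(sse, 4), round(explained_share, 5))
```

The three sums should agree with the SS column of the ANOVA table in the Excel output (32600.5, 18934.9348 and 13665.5652), and SSR + SSE should reproduce SST up to floating-point rounding.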

Coefficient of Determination, $r^2$
• The coefficient of determination is the portion
of the total variation in the dependent variable
that is explained by variation in the
independent variable
• The coefficient of determination is also called
r-squared and is denoted as $r^2$
$$r^2 = \frac{SSR}{SST} = \frac{\text{regression sum of squares}}{\text{total sum of squares}}$$
note: $0 \le r^2 \le 1$

Examples of Approximate
$r^2$ Values
• $r^2 = 1$: perfect linear relationship between X and Y. 100% of the variation
in Y is explained by variation in X.
[Figure: two plots of Y against X with every point on a straight line, one rising and one falling]
• $0 < r^2 < 1$: weaker linear relationships between X and Y. Some but not all of
the variation in Y is explained by variation in X.
[Figure: two plots of Y against X with points scattered around a straight line]
• $r^2 = 0$: no linear relationship between X and Y. The value of Y does not
depend on X (none of the variation in Y is explained by variation in X).
[Figure: plot of Y against X with a horizontal fitted line]

Excel Output
$$r^2 = \frac{SSR}{SST} = \frac{18934.9348}{32600.5000} = 0.58082$$
58.08% of the variation in house prices is explained by
variation in square feet
Relevant Excel entries: R Square = 0.58082, Regression SS = 18934.9348,
Total SS = 32600.5000

Standard Error of Estimate
• The standard deviation of the variation of
observations around the regression line is
estimated by
$$S_{YX} = \sqrt{\frac{SSE}{n-2}} = \sqrt{\frac{\sum_{i=1}^{n} (Y_i - \hat{Y}_i)^2}{n-2}}$$
Where
SSE = error sum of squares
n = sample size
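
As a quick arithmetic check, the formula can be applied to the error sum of squares reported in the ANOVA table for the house price model. The variable names below are illustrative.

```python
import math

sse = 13665.5652  # error (residual) sum of squares from the ANOVA table
n = 10            # sample size

# Standard error of the estimate: square root of SSE over its degrees of freedom.
s_yx = math.sqrt(sse / (n - 2))
print(round(s_yx, 5))
```

The value should match the Standard Error entry in the Excel regression statistics (41.33032), which is in the same units as Y, here $1000s.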

Excel Output
$S_{YX} = 41.33032$
This is the Standard Error entry under Regression Statistics in the Excel output;
it equals the square root of the Residual MS (1708.1957).

Comparing Standard Errors
$S_{YX}$ is a measure of the variation of observed
Y values from the regression line
[Figure: two plots of Y against X; points lie close to the line when $S_{YX}$ is small and far from it when $S_{YX}$ is large]
The magnitude of $S_{YX}$ should always be judged relative to the
size of the Y values in the sample data
e.g., $S_{YX}$ = $41.33K is moderately small relative to house prices in
the $200K to $300K range

Assumptions of Regression
 Normality of Error
 Error values (ε) are normally distributed for any given
value of X
 Homoscedasticity
 The probability distribution of the errors has constant
variance
 Independence of Errors
 Error values are statistically independent

Normality of Error

Residual Analysis
$$e_i = Y_i - \hat{Y}_i$$
• The residual for observation i, $e_i$, is the difference
between its observed and predicted value
• Check the assumptions of regression by examining the
residuals
  • Examine for linearity assumption
  • Examine for constant variance for all levels of X
  (homoscedasticity)
  • Evaluate normal distribution assumption
  • Evaluate independence assumption
• Graphical Analysis of Residuals
  • Can plot residuals vs. X

How to Compute Residuals
Observation  House Price (Yi)  Square Feet (Xi)  Predicted Y (Ŷi)  Residual (ei)
1 245 1400 251.9232 -6.9232
2 312 1600 273.8767 38.1233
3 279 1700 284.8535 -5.8535
4 308 1875 304.0628 3.9372
5 199 1100 218.9928 -19.9928
6 219 1550 268.3883 -49.3883
7 405 2350 356.2025 48.7975
8 324 2450 367.1793 -43.1793
9 319 1425 254.6674 64.3326
10 255 1700 284.8535 -29.8535
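
The residual column can be regenerated from the raw data. The following is a small sketch with illustrative names; it refits the line from the sample instead of reusing rounded coefficients, so the predicted values agree with the table to the digits shown.

```python
prices = [245, 312, 279, 308, 199, 219, 405, 324, 319, 255]           # Y, $1000s
sizes = [1400, 1600, 1700, 1875, 1100, 1550, 2350, 2450, 1425, 1700]  # X, sq. ft.

n = len(prices)
x_bar = sum(sizes) / n
y_bar = sum(prices) / n
b1 = (sum((x - x_bar) * (y - y_bar) for x, y in zip(sizes, prices))
      / sum((x - x_bar) ** 2 for x in sizes))
b0 = y_bar - b1 * x_bar

# Residual = observed Y minus predicted Y for each observation.
residuals = []
for i, (x, y) in enumerate(zip(sizes, prices), start=1):
    predicted = b0 + b1 * x
    residuals.append(y - predicted)
    print(i, y, x, round(predicted, 4), round(y - predicted, 4))
```

A useful sanity check is that least squares residuals sum to zero (up to floating-point rounding) whenever the model includes an intercept.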
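Residual Analysis for Linearity
[Figure: two cases, each with a plot of Y against x above a plot of residuals against x. Not linear: the residuals follow a curved pattern. Linear: the residuals scatter randomly around zero]

Residual Analysis for
Homoscedasticity
[Figure: two cases, each with a plot of Y against x above a plot of residuals against x. Non-constant variance: the spread of the residuals changes with x. Constant variance: the spread is even across x]

Residual Analysis for
Independence
[Figure: plots of residuals against X. Not independent: the residuals show a systematic pattern. Independent: the residuals show no pattern]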

Excel Residual Output
RESIDUAL OUTPUT
Observation   Predicted House Price   Residuals
1             251.92316               -6.923162
2             273.87671               38.12329
3             284.85348               -5.853484
4             304.06284               3.937162
5             218.99284               -19.99284
6             268.38832               -49.38832
7             356.20251               48.79749
8             367.17929               -43.17929
9             254.6674                64.33264
10            284.85348               -29.85348
[Figure: House Price Model Residual Plot, residuals (-60 to 80) against square feet (0 to 3000)]
Does not appear to violate
any regression assumptions

Inferences About the Slope
• The standard error of the regression slope coefficient (b1) is estimated by
$$S_{b_1} = \frac{S_{YX}}{\sqrt{SSX}} = \frac{S_{YX}}{\sqrt{\sum (X_i - \bar{X})^2}}$$
where:
$S_{b_1}$ = estimate of the standard error of the least squares slope
$S_{YX} = \sqrt{\dfrac{SSE}{n-2}}$ = standard error of the estimate

Excel Output
$S_{b_1} = 0.03297$
This is the Standard Error entry in the Square Feet row of the Excel coefficients table.

Comparing Standard Errors of
the Slope
Sb1 is a measure of the variation in the slope of regression
lines from different possible samples
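
For the house price sample, the standard error of the slope can be reproduced from the raw data. This sketch uses illustrative variable names and computes every ingredient (SSX, SSE, the standard error of the estimate) in plain Python rather than reading them from Excel.

```python
import math

prices = [245, 312, 279, 308, 199, 219, 405, 324, 319, 255]           # Y, $1000s
sizes = [1400, 1600, 1700, 1875, 1100, 1550, 2350, 2450, 1425, 1700]  # X, sq. ft.

n = len(sizes)
x_bar = sum(sizes) / n
y_bar = sum(prices) / n

ssx = sum((x - x_bar) ** 2 for x in sizes)  # sum of squared X deviations
b1 = sum((x - x_bar) * (y - y_bar) for x, y in zip(sizes, prices)) / ssx
b0 = y_bar - b1 * x_bar

sse = sum((y - (b0 + b1 * x)) ** 2 for x, y in zip(sizes, prices))
s_yx = math.sqrt(sse / (n - 2))   # standard error of the estimate
s_b1 = s_yx / math.sqrt(ssx)      # standard error of the slope

print(round(s_b1, 5))
```

The printed value should agree with the 0.03297 reported by Excel. A larger spread in X (larger SSX) makes the slope estimate more stable across samples.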
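[Figure: two plots of Y against X showing regression lines fitted to different samples; the lines stay close together when $S_{b_1}$ is small and fan out when $S_{b_1}$ is large]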

Inference about the Slope:
t Test
• t test for a population slope
  • Is there a linear relationship between X and Y?
• Null and alternative hypotheses
  H0: β1 = 0 (no linear relationship)
  H1: β1 ≠ 0 (linear relationship does exist)
• Test statistic
$$t = \frac{b_1 - \beta_1}{S_{b_1}}, \qquad \text{d.f.} = n - 2$$
where:
$b_1$ = regression slope coefficient
$\beta_1$ = hypothesized slope
$S_{b_1}$ = standard error of the slope

Inference about the Slope:
t Test
(continued)
House Price in $1000s (y)   Square Feet (x)
245                         1400
312                         1600
279                         1700
308                         1875
199                         1100
219                         1550
405                         2350
324                         2450
319                         1425
255                         1700
Estimated Regression Equation:
house price = 98.25 + 0.1098 (sq. ft.)
The slope of this model is 0.1098
Does square footage of the house affect its sales price?

Inferences about the Slope:
t Test Example
H0: β1 = 0
H1: β1 ≠ 0
From Excel output:
              Coefficients   Standard Error   t Stat    P-value
Intercept     98.24833       58.03348         1.69296   0.12892
Square Feet    0.10977 (b1)   0.03297 (Sb1)   3.32938   0.01039
$$t = \frac{b_1 - \beta_1}{S_{b_1}} = \frac{0.10977 - 0}{0.03297} = 3.32938$$

t Test Example
(continued)
Test Statistic: t = 3.329
d.f. = 10 - 2 = 8
α/2 = .025 in each tail, so the critical values are $\pm t_{\alpha/2} = \pm 2.3060$
[Figure: t distribution with rejection regions below -2.3060 and above 2.3060; the test statistic 3.329 falls in the upper rejection region]
Decision: Reject H0
Conclusion: There is sufficient evidence that square footage affects house price

t Test Example
(continued)
From Excel output, the P-value for Square Feet is 0.01039
This is a two-tail test, so the p-value is
P(t > 3.329) + P(t < -3.329) = 0.01039 (for 8 d.f.)
Decision: P-value < α so Reject H0
Conclusion: There is sufficient evidence that square footage affects house price

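
The critical-value version of this test is easy to script. In the sketch below the variable names are illustrative, and the critical value 2.3060 is taken from the slide instead of being computed, so no statistics library is needed.

```python
b1 = 0.10977          # slope estimate from the Excel output
s_b1 = 0.03297        # standard error of the slope
beta1_null = 0.0      # hypothesized slope under H0
t_critical = 2.3060   # two-tail critical value, alpha = 0.05, 8 d.f. (from the slide)

t_stat = (b1 - beta1_null) / s_b1
reject_h0 = abs(t_stat) > t_critical

print(round(t_stat, 3), "reject H0" if reject_h0 else "do not reject H0")
```

Because the inputs are rounded to five decimals, the computed statistic can differ from Excel's 3.32938 in the last digit or two; the decision is unaffected.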
F-Test for Significance
• F Test statistic:
$$F = \frac{MSR}{MSE}$$
where
$$MSR = \frac{SSR}{k}, \qquad MSE = \frac{SSE}{n - k - 1}$$
where F follows an F distribution with k numerator and (n - k - 1)
denominator degrees of freedom
(k = the number of independent variables in the regression model)

Excel Output
$$F = \frac{MSR}{MSE} = \frac{18934.9348}{1708.1957} = 11.0848$$
With 1 and 8 degrees of freedom
P-value for the F-Test (Significance F) = 0.01039

F-Test for Significance
(continued)
H0: β1 = 0
H1: β1 ≠ 0
α = .05
df1 = 1, df2 = 8
Critical Value: $F_{.05}$ = 5.32
Test Statistic: $F = \frac{MSR}{MSE} = 11.08$
[Figure: F distribution with the rejection region (area α = .05) to the right of $F_{.05}$ = 5.32 and the do-not-reject region to the left]
Decision: Reject H0 at α = 0.05
Conclusion: There is sufficient evidence that house size affects selling price

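
The F statistic can be rebuilt from the sums of squares in the ANOVA table. This is a minimal sketch with illustrative names; the critical value 5.32 is the one quoted on the slide, not a computed quantile.

```python
ssr = 18934.9348   # regression sum of squares
sse = 13665.5652   # error sum of squares
n = 10             # sample size
k = 1              # number of independent variables

msr = ssr / k
mse = sse / (n - k - 1)
f_stat = msr / mse

f_critical = 5.32  # F(.05) with 1 and 8 d.f., from the slide
reject_h0 = f_stat > f_critical

print(round(f_stat, 4), "reject H0" if reject_h0 else "do not reject H0")
```

In simple linear regression the F statistic equals the square of the slope's t statistic (3.32938 squared is about 11.0848), which is why both tests report the same P-value of 0.01039.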
Confidence Interval Estimate
for the Slope
Confidence Interval Estimate of the Slope:
$$b_1 \pm t_{n-2} S_{b_1}, \qquad \text{d.f.} = n - 2$$
where $t_{n-2}$ is the two-tail critical value $t_{\alpha/2}$ with n - 2 degrees of freedom
Excel Printout for House Prices:
              Coefficients   Standard Error   t Stat    P-value   Lower 95%   Upper 95%
Intercept     98.24833       58.03348         1.69296   0.12892   -35.57720   232.07386
Square Feet    0.10977        0.03297         3.32938   0.01039     0.03374     0.18580
At 95% level of confidence, the confidence interval for
the slope is (0.0337, 0.1858)
$0.0337 < \beta_1 < 0.1858$

Confidence Interval Estimate
for the Slope
(continued)
Since the units of the house price variable are
$1000s, we are 95% confident that the average
impact on sales price is between $33.74 and
$185.80 per square foot of house size
This 95% confidence interval does not include 0.
Conclusion: There is a significant relationship between
house price and square feet at the .05 level of significance
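
The interval limits can be recomputed from the slope, its standard error and the critical t value. The names below are illustrative, and 2.3060 is the two-tail critical value for 8 degrees of freedom quoted in the t test slides.

```python
b1 = 0.10977          # slope estimate
s_b1 = 0.03297        # standard error of the slope
t_critical = 2.3060   # t(alpha/2) for 95% confidence with n - 2 = 8 d.f.

margin = t_critical * s_b1
lower, upper = b1 - margin, b1 + margin

print(round(lower, 5), round(upper, 5))           # limits in $1000s per square foot
print(round(lower * 1000, 2), round(upper * 1000, 2))  # limits in dollars per square foot
```

The limits agree with the Lower 95% and Upper 95% columns in the Excel printout (0.03374 and 0.18580). Since the interval excludes 0, it leads to the same conclusion as the two-tail t test at the .05 level.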

t Test for a Correlation Coefficient
• Hypotheses
  H0: ρ = 0 (no correlation between X and Y)
  H1: ρ ≠ 0 (correlation exists)
• Test statistic
$$t = \frac{r - \rho}{\sqrt{\dfrac{1 - r^2}{n - 2}}}$$
(with n - 2 degrees of freedom)
where
$r = +\sqrt{r^2}$ if $b_1 > 0$
$r = -\sqrt{r^2}$ if $b_1 < 0$

Example: House Prices
Is there evidence of a linear relationship
between square feet and house price at the
.05 level of significance?
H0: ρ = 0 (No correlation)
H1: ρ ≠ 0 (correlation exists)
α = .05, df = 10 - 2 = 8
$$t = \frac{r - \rho}{\sqrt{\dfrac{1 - r^2}{n - 2}}} = \frac{0.762 - 0}{\sqrt{\dfrac{1 - 0.762^2}{10 - 2}}} = 3.33$$

Example: Test Solution
$$t = \frac{r - \rho}{\sqrt{\dfrac{1 - r^2}{n - 2}}} = \frac{0.762 - 0}{\sqrt{\dfrac{1 - 0.762^2}{10 - 2}}} = 3.33$$
d.f. = 10 - 2 = 8
α/2 = .025 in each tail, so the critical values are $\pm t_{\alpha/2} = \pm 2.3060$
[Figure: t distribution with rejection regions below -2.3060 and above 2.3060; the test statistic 3.33 falls in the upper rejection region]
Decision: Reject H0
Conclusion: There is evidence of a linear association at the 5% level of
significance

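
The same calculation in code, including the sign rule for recovering r from r-squared. The variable names are illustrative, and the critical value 2.3060 is taken from the slide.

```python
import math

r_squared = 0.58082   # R Square from the Excel output
b1 = 0.10977          # slope estimate; its sign determines the sign of r
n = 10
rho_null = 0.0        # hypothesized population correlation under H0
t_critical = 2.3060   # two-tail critical value, alpha = 0.05, 8 d.f. (from the slide)

# r takes the sign of the slope.
r = math.copysign(math.sqrt(r_squared), b1)

t_stat = (r - rho_null) / math.sqrt((1 - r ** 2) / (n - 2))
reject_h0 = abs(t_stat) > t_critical

print(round(r, 3), round(t_stat, 2))  # 0.762 3.33
```

The statistic matches the t statistic for the slope, as expected: in simple linear regression, testing ρ = 0 is equivalent to testing β1 = 0.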
Pitfalls of Regression Analysis
• Lacking an awareness of the assumptions underlying least-squares regression
• Not knowing how to evaluate the assumptions
• Not knowing the alternatives to least-squares
regression if a particular assumption is violated
• Using a regression model without knowledge of
the subject matter
• Extrapolating outside the relevant range

Strategies for Avoiding
the Pitfalls of Regression
• Start with a scatter plot of Y against X to observe a
possible relationship
• Perform residual analysis to check the
assumptions
  • Plot the residuals vs. X to check for violations of
  assumptions such as homoscedasticity
  • Use a histogram, stem-and-leaf display, box-and-whisker plot,
  or normal probability plot of the
  residuals to uncover possible non-normality

Strategies for Avoiding
the Pitfalls of Regression
(continued)
• If there is violation of any assumption, use alternative methods or models
• If there is no evidence of assumption violation,
then test for the significance of the regression
coefficients and construct confidence intervals
and prediction intervals
• Avoid making predictions or forecasts outside
the relevant range

Practice
• Suppose that the management of a chain of package
delivery stores would like to develop a model for
predicting the weekly sales (in thousands of dollars) for
individual stores based on the number of customers
who made purchases. A random sample of 20 stores
was selected from among all the stores in the
chain. Since we wish to predict Sales with number
of Customers, that makes Sales the dependent,
response, or "Y" variable, and number of Customers is
the independent, explanatory, or "X" variable.

 The regression output:

a) Interpret the meaning of the slope b1 in this
problem.
b) Predict the average weekly Sales (in
thousands of dollars) for stores that have 600
customers.
c) How much variation in Sales is explained by
number of Customers?
d) Using α = 0.05, is there evidence of a linear
relationship between Sales and number
of Customers?

Solution
a) As the number of Customers increases by
1, estimated average weekly Sales increase by $8.73.
b) Sales = 7.661 (in thousands of dollars)
c) 91.19%
d)

Summary
 Introduced types of regression models

 Reviewed assumptions of regression and
correlation
 Discussed determining the simple linear
regression equation
 Described measures of variation
 Discussed residual analysis
 Addressed measuring autocorrelation

Chapter Summary
(continued)
 Described inference about the slope

 Discussed correlation -- measuring the strength
of the association
 Addressed estimation of mean values and
prediction of individual values
 Discussed possible pitfalls in regression and
recommended strategies to avoid them
