
• Many problems in engineering and science involve exploring the relationships between two or more variables.

• Regression analysis is a statistical technique that is very useful for these types of problems.

• For example, in a chemical process, suppose that the yield of the product is related to the process-operating temperature.

• Regression analysis can be used to build a model to predict yield at a given temperature level.

Table 11-1 Oxygen and Hydrocarbon Levels

Observation  Hydrocarbon  Purity   | Observation  Hydrocarbon  Purity
Number       Level x (%)  y (%)    | Number       Level x (%)  y (%)
     1          0.99       90.01   |     11          1.19       93.54
     2          1.02       89.05   |     12          1.15       92.52
     3          1.15       91.43   |     13          0.98       90.56
     4          1.29       93.74   |     14          1.01       89.54
     5          1.46       96.73   |     15          1.11       89.85
     6          1.36       94.45   |     16          1.20       90.39
     7          0.87       87.59   |     17          1.26       93.25
     8          1.23       91.77   |     18          1.32       93.41
     9          1.55       99.42   |     19          1.43       94.98
    10          1.40       93.65   |     20          0.95       87.33

Figure 11-1 Scatter diagram of oxygen purity versus hydrocarbon level from Table 11-1.

Based on the scatter diagram, it is probably reasonable to assume
that the mean of the random variable Y is related to x by the
following straight-line relationship:

$$E(Y \mid x) = \mu_{Y|x} = \beta_0 + \beta_1 x$$

where the slope and intercept of the line are called regression coefficients.

The simple linear regression model is given by


$$Y = \beta_0 + \beta_1 x + \epsilon$$

where ε is the random error term.
We think of the regression model as an empirical model.

Suppose that the mean and variance of ε are 0 and σ², respectively, then

$$E(Y \mid x) = E(\beta_0 + \beta_1 x + \epsilon) = \beta_0 + \beta_1 x + E(\epsilon) = \beta_0 + \beta_1 x$$

The variance of Y given x is

$$V(Y \mid x) = V(\beta_0 + \beta_1 x + \epsilon) = V(\beta_0 + \beta_1 x) + V(\epsilon) = 0 + \sigma^2 = \sigma^2$$

• The true regression model is a line of mean values:

$$\mu_{Y|x} = \beta_0 + \beta_1 x$$

where β1 can be interpreted as the change in the mean of Y for a unit change in x.

• Also, the variability of Y at a particular value of x is determined by the error variance, σ².

• This implies there is a distribution of Y-values at each x and that the variance of this distribution is the same at each x.

Figure 11-2 The distribution of Y for a given value of x for the oxygen purity-hydrocarbon data.

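The idea of a distribution of Y-values at each x, with the same variance everywhere, can be illustrated by simulation. The sketch below is not part of the original notes: the "true" parameter values are hypothetical (borrowed from the fitted line later in these notes purely for illustration), and normally distributed errors are an added assumption, since the text only requires ε to have mean 0 and variance σ².

```python
import random
import statistics

# Hypothetical "true" parameters, chosen only for illustration.
BETA0, BETA1, SIGMA2 = 74.283, 14.947, 1.18


def simulate_y(x, n, rng):
    """Draw n observations of Y = beta0 + beta1*x + eps.

    Assumption of this sketch: eps is normal with mean 0 and variance SIGMA2.
    """
    sigma = SIGMA2 ** 0.5
    return [BETA0 + BETA1 * x + rng.gauss(0.0, sigma) for _ in range(n)]


rng = random.Random(0)  # fixed seed so the run is repeatable
for x in (1.00, 1.25):
    ys = simulate_y(x, 20000, rng)
    # The sample mean should be close to beta0 + beta1*x,
    # and the sample variance close to sigma^2 at every x.
    print(f"x = {x:.2f}: line value = {BETA0 + BETA1 * x:.3f}, "
          f"mean of Y = {statistics.fmean(ys):.3f}, "
          f"variance of Y = {statistics.variance(ys):.3f}")
```

The mean of the simulated Y-values moves along the line as x changes, while the spread around the line stays the same.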
• The case of simple linear regression considers a single regressor or predictor x and a dependent or response variable Y.

• The observation Y at each level of x is a random variable, with expected value

$$E(Y \mid x) = \beta_0 + \beta_1 x$$

• We assume that each observation, Y, can be described by the model

$$Y = \beta_0 + \beta_1 x + \epsilon$$

• Suppose that we have n pairs of observations $(x_1, y_1), (x_2, y_2), \ldots, (x_n, y_n)$.

• The method of least squares is used to estimate the parameters β0 and β1 by minimizing the sum of the squares of the vertical deviations in Figure 11-3.

Figure 11-3 Deviations of the data from the estimated regression model.

• Using Equation 11-2, the n observations in the sample can be expressed as

$$y_i = \beta_0 + \beta_1 x_i + \epsilon_i, \quad i = 1, 2, \ldots, n$$

• The sum of the squares of the deviations of the observations from the true regression line is

$$L = \sum_{i=1}^{n} \epsilon_i^2 = \sum_{i=1}^{n} (y_i - \beta_0 - \beta_1 x_i)^2$$

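The least squares estimators of β0 and β1, say $\hat{\beta}_0$ and $\hat{\beta}_1$, must satisfy

$$\left.\frac{\partial L}{\partial \beta_0}\right|_{\hat{\beta}_0, \hat{\beta}_1} = -2 \sum_{i=1}^{n} (y_i - \hat{\beta}_0 - \hat{\beta}_1 x_i) = 0$$

$$\left.\frac{\partial L}{\partial \beta_1}\right|_{\hat{\beta}_0, \hat{\beta}_1} = -2 \sum_{i=1}^{n} (y_i - \hat{\beta}_0 - \hat{\beta}_1 x_i)\, x_i = 0$$
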
Simplifying these two equations yields

$$n \hat{\beta}_0 + \hat{\beta}_1 \sum_{i=1}^{n} x_i = \sum_{i=1}^{n} y_i$$

$$\hat{\beta}_0 \sum_{i=1}^{n} x_i + \hat{\beta}_1 \sum_{i=1}^{n} x_i^2 = \sum_{i=1}^{n} y_i x_i$$

These equations are called the least squares normal equations. The solution to the normal equations results in the least squares estimators $\hat{\beta}_0$ and $\hat{\beta}_1$.

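Because the normal equations are just two linear equations in two unknowns, they can be solved directly. The sketch below does this with Cramer's rule for the oxygen purity data of Table 11-1. The variable names are illustrative, and Cramer's rule is one convenient choice for a 2-by-2 system rather than a method prescribed by the notes.

```python
# Hydrocarbon level x (%) and oxygen purity y (%) from Table 11-1.
x = [0.99, 1.02, 1.15, 1.29, 1.46, 1.36, 0.87, 1.23, 1.55, 1.40,
     1.19, 1.15, 0.98, 1.01, 1.11, 1.20, 1.26, 1.32, 1.43, 0.95]
y = [90.01, 89.05, 91.43, 93.74, 96.73, 94.45, 87.59, 91.77, 99.42, 93.65,
     93.54, 92.52, 90.56, 89.54, 89.85, 90.39, 93.25, 93.41, 94.98, 87.33]

n = len(x)
sum_x = sum(x)
sum_y = sum(y)
sum_xx = sum(xi * xi for xi in x)
sum_xy = sum(xi * yi for xi, yi in zip(x, y))

# Normal equations in matrix form:
#   [ n      sum_x  ] [b0]   [ sum_y  ]
#   [ sum_x  sum_xx ] [b1] = [ sum_xy ]
det = n * sum_xx - sum_x ** 2          # determinant of the coefficient matrix
b0 = (sum_y * sum_xx - sum_x * sum_xy) / det   # Cramer's rule for the intercept
b1 = (n * sum_xy - sum_x * sum_y) / det        # Cramer's rule for the slope

print(f"intercept estimate = {b0:.5f}")
print(f"slope estimate     = {b1:.5f}")
```

The determinant is zero only when every x value is identical, in which case no unique line can be fitted.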
Least Squares Estimates

The least squares estimates of the intercept and slope in the simple linear regression model are

$$\hat{\beta}_0 = \bar{y} - \hat{\beta}_1 \bar{x}$$

$$\hat{\beta}_1 = \frac{\sum_{i=1}^{n} y_i x_i - \dfrac{\left(\sum_{i=1}^{n} y_i\right)\left(\sum_{i=1}^{n} x_i\right)}{n}}{\sum_{i=1}^{n} x_i^2 - \dfrac{\left(\sum_{i=1}^{n} x_i\right)^2}{n}}$$

where $\bar{y} = (1/n) \sum_{i=1}^{n} y_i$ and $\bar{x} = (1/n) \sum_{i=1}^{n} x_i$.

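For readers who want the intermediate algebra, the closed-form estimates follow from the normal equations in two steps. The derivation below is a sketch that uses only symbols already defined in these notes; the one added assumption is that the x values are not all identical, so the denominator is nonzero.

```latex
\begin{align*}
% Step 1: divide the first normal equation by n.
n\hat{\beta}_0 + \hat{\beta}_1 \sum_{i=1}^{n} x_i = \sum_{i=1}^{n} y_i
  \;&\Longrightarrow\; \hat{\beta}_0 = \bar{y} - \hat{\beta}_1 \bar{x} \\[4pt]
% Step 2: substitute into the second normal equation.
\left(\bar{y} - \hat{\beta}_1 \bar{x}\right) \sum_{i=1}^{n} x_i
  + \hat{\beta}_1 \sum_{i=1}^{n} x_i^2 = \sum_{i=1}^{n} y_i x_i
  \;&\Longrightarrow\;
  \hat{\beta}_1 \left( \sum_{i=1}^{n} x_i^2 - \bar{x} \sum_{i=1}^{n} x_i \right)
  = \sum_{i=1}^{n} y_i x_i - \bar{y} \sum_{i=1}^{n} x_i \\[4pt]
% Step 3: replace the sample means by (1/n) times the sums.
\;&\Longrightarrow\;
  \hat{\beta}_1 =
  \frac{\sum_{i=1}^{n} y_i x_i - \dfrac{\left(\sum_{i=1}^{n} y_i\right)\left(\sum_{i=1}^{n} x_i\right)}{n}}
       {\sum_{i=1}^{n} x_i^2 - \dfrac{\left(\sum_{i=1}^{n} x_i\right)^2}{n}}
\end{align*}
```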
The fitted or estimated regression line is therefore

$$\hat{y} = \hat{\beta}_0 + \hat{\beta}_1 x$$

Note that each pair of observations satisfies the relationship

$$y_i = \hat{\beta}_0 + \hat{\beta}_1 x_i + e_i, \quad i = 1, 2, \ldots, n$$

where $e_i = y_i - \hat{y}_i$ is called the residual. The residual describes the error in the fit of the model to the ith observation $y_i$.

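As an illustration of residuals, the sketch below fits the line to the Table 11-1 data and computes each $e_i$. It also checks two facts that follow directly from the normal equations: the residuals sum to zero, and so do the products $x_i e_i$. Variable names are illustrative only.

```python
# Hydrocarbon level x (%) and oxygen purity y (%) from Table 11-1.
x = [0.99, 1.02, 1.15, 1.29, 1.46, 1.36, 0.87, 1.23, 1.55, 1.40,
     1.19, 1.15, 0.98, 1.01, 1.11, 1.20, 1.26, 1.32, 1.43, 0.95]
y = [90.01, 89.05, 91.43, 93.74, 96.73, 94.45, 87.59, 91.77, 99.42, 93.65,
     93.54, 92.52, 90.56, 89.54, 89.85, 90.39, 93.25, 93.41, 94.98, 87.33]

n = len(x)
x_bar = sum(x) / n
y_bar = sum(y) / n

# Least squares slope and intercept.
s_xx = sum((xi - x_bar) ** 2 for xi in x)
s_xy = sum(yi * (xi - x_bar) for xi, yi in zip(x, y))
b1 = s_xy / s_xx
b0 = y_bar - b1 * x_bar

# Residual e_i = y_i - yhat_i for every observation.
residuals = [yi - (b0 + b1 * xi) for xi, yi in zip(x, y)]

sum_e = sum(residuals)                                   # first normal equation
sum_xe = sum(xi * ei for xi, ei in zip(x, residuals))    # second normal equation

print(f"first residual e_1 = {residuals[0]:.4f}")
print(f"sum of residuals   = {sum_e:.2e}")
print(f"sum of x_i * e_i   = {sum_xe:.2e}")
```

Both sums come out as zero up to floating-point rounding, which is a quick sanity check on any least squares fit that includes an intercept.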
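Notation

$$S_{xx} = \sum_{i=1}^{n} (x_i - \bar{x})^2 = \sum_{i=1}^{n} x_i^2 - \frac{\left(\sum_{i=1}^{n} x_i\right)^2}{n}$$

$$S_{xy} = \sum_{i=1}^{n} y_i (x_i - \bar{x}) = \sum_{i=1}^{n} x_i y_i - \frac{\left(\sum_{i=1}^{n} x_i\right)\left(\sum_{i=1}^{n} y_i\right)}{n}$$

$$SS_T = S_{yy} = \sum_{i=1}^{n} (y_i - \bar{y})^2 = \sum_{i=1}^{n} y_i^2 - \frac{\left(\sum_{i=1}^{n} y_i\right)^2}{n}$$
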
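Each quantity in this notation has a definitional form (sums of deviations from the mean) and a computational form (raw sums). The sketch below evaluates both forms on the Table 11-1 data to show that they agree; the function and variable names are illustrative only.

```python
# Hydrocarbon level x (%) and oxygen purity y (%) from Table 11-1.
x = [0.99, 1.02, 1.15, 1.29, 1.46, 1.36, 0.87, 1.23, 1.55, 1.40,
     1.19, 1.15, 0.98, 1.01, 1.11, 1.20, 1.26, 1.32, 1.43, 0.95]
y = [90.01, 89.05, 91.43, 93.74, 96.73, 94.45, 87.59, 91.77, 99.42, 93.65,
     93.54, 92.52, 90.56, 89.54, 89.85, 90.39, 93.25, 93.41, 94.98, 87.33]

n = len(x)
x_bar = sum(x) / n
y_bar = sum(y) / n

# Definitional forms: sums over deviations from the sample means.
s_xx_def = sum((xi - x_bar) ** 2 for xi in x)
s_xy_def = sum(yi * (xi - x_bar) for xi, yi in zip(x, y))
s_yy_def = sum((yi - y_bar) ** 2 for yi in y)

# Computational forms: raw sums minus a correction term.
s_xx_raw = sum(xi ** 2 for xi in x) - sum(x) ** 2 / n
s_xy_raw = sum(xi * yi for xi, yi in zip(x, y)) - sum(x) * sum(y) / n
s_yy_raw = sum(yi ** 2 for yi in y) - sum(y) ** 2 / n

print(f"S_xx: {s_xx_def:.5f} (definition)  {s_xx_raw:.5f} (raw sums)")
print(f"S_xy: {s_xy_def:.5f} (definition)  {s_xy_raw:.5f} (raw sums)")
print(f"S_yy: {s_yy_def:.5f} (definition)  {s_yy_raw:.5f} (raw sums)")
```

The raw-sum forms subtract two large, nearly equal numbers, so with much larger data values the definitional forms are the numerically safer choice.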
Example 1

We will fit a simple linear regression model to the oxygen purity data in Table 11-1. The following quantities may be computed:

$$n = 20, \quad \sum_{i=1}^{20} x_i = 23.92, \quad \sum_{i=1}^{20} y_i = 1{,}843.21, \quad \bar{x} = 1.1960, \quad \bar{y} = 92.1605$$

$$\sum_{i=1}^{20} y_i^2 = 170{,}044.5321, \quad \sum_{i=1}^{20} x_i^2 = 29.2892, \quad \sum_{i=1}^{20} x_i y_i = 2{,}214.6566$$

$$S_{xx} = \sum_{i=1}^{20} x_i^2 - \frac{\left(\sum_{i=1}^{20} x_i\right)^2}{20} = 29.2892 - \frac{(23.92)^2}{20} = 0.68088$$

and

$$S_{xy} = \sum_{i=1}^{20} x_i y_i - \frac{\left(\sum_{i=1}^{20} x_i\right)\left(\sum_{i=1}^{20} y_i\right)}{20} = 2{,}214.6566 - \frac{(23.92)(1{,}843.21)}{20} = 10.17744$$

Therefore, the least squares estimates of the slope and intercept are

$$\hat{\beta}_1 = \frac{S_{xy}}{S_{xx}} = \frac{10.17744}{0.68088} = 14.94748$$

and

$$\hat{\beta}_0 = \bar{y} - \hat{\beta}_1 \bar{x} = 92.1605 - (14.94748)(1.196) = 74.28331$$

The fitted simple linear regression model (with the coefficients reported to three decimal places) is

$$\hat{y} = 74.283 + 14.947x$$

This model is plotted in Fig. 11–4, along with the sample data.

Figure 11-4 Scatter plot of oxygen purity y versus hydrocarbon level x and regression model $\hat{y} = 74.283 + 14.947x$.

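The fit in Example 1 can be reproduced in a few lines and then used for prediction. In the sketch below, `fit_line` and `predict` are hypothetical helper names, and the hydrocarbon level x = 1.00% used for the prediction is an illustrative value chosen here, not one taken from the example.

```python
# Hydrocarbon level x (%) and oxygen purity y (%) from Table 11-1.
x = [0.99, 1.02, 1.15, 1.29, 1.46, 1.36, 0.87, 1.23, 1.55, 1.40,
     1.19, 1.15, 0.98, 1.01, 1.11, 1.20, 1.26, 1.32, 1.43, 0.95]
y = [90.01, 89.05, 91.43, 93.74, 96.73, 94.45, 87.59, 91.77, 99.42, 93.65,
     93.54, 92.52, 90.56, 89.54, 89.85, 90.39, 93.25, 93.41, 94.98, 87.33]


def fit_line(xs, ys):
    """Return (intercept, slope) of the least squares line through (xs, ys)."""
    n = len(xs)
    x_bar = sum(xs) / n
    y_bar = sum(ys) / n
    s_xx = sum((xi - x_bar) ** 2 for xi in xs)
    s_xy = sum(yi * (xi - x_bar) for xi, yi in zip(xs, ys))
    slope = s_xy / s_xx
    intercept = y_bar - slope * x_bar
    return intercept, slope


def predict(intercept, slope, x_new):
    """Evaluate the fitted line at x_new."""
    return intercept + slope * x_new


b0, b1 = fit_line(x, y)
print(f"fitted line: y_hat = {b0:.3f} + {b1:.3f} x")

# Illustrative prediction at a hydrocarbon level of 1.00%.
y_hat = predict(b0, b1, 1.00)
print(f"predicted purity at x = 1.00%: {y_hat:.2f}%")
```

Predictions of this kind are only meaningful for x values inside the range of the observed data, here roughly 0.87% to 1.55%.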
Computer software programs are widely used in regression modeling. Table 11–2 shows a portion of the output from Minitab for this problem. The estimates $\hat{\beta}_0$ and $\hat{\beta}_1$ are highlighted.

Estimating σ²

The error sum of squares is

$$SS_E = \sum_{i=1}^{n} e_i^2 = \sum_{i=1}^{n} (y_i - \hat{y}_i)^2$$

It can be shown that the expected value of the error sum of squares is $E(SS_E) = (n - 2)\sigma^2$.

An unbiased estimator of σ² is

$$\hat{\sigma}^2 = \frac{SS_E}{n - 2}$$

where $SS_E$ can be easily computed using

$$SS_E = SS_T - \hat{\beta}_1 S_{xy}$$

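The two routes to $SS_E$ can be checked against each other numerically. The sketch below computes the error sum of squares for the Table 11-1 data, first directly from the residuals and then from the shortcut formula, and finally forms the estimate of σ². Variable names are illustrative only.

```python
# Hydrocarbon level x (%) and oxygen purity y (%) from Table 11-1.
x = [0.99, 1.02, 1.15, 1.29, 1.46, 1.36, 0.87, 1.23, 1.55, 1.40,
     1.19, 1.15, 0.98, 1.01, 1.11, 1.20, 1.26, 1.32, 1.43, 0.95]
y = [90.01, 89.05, 91.43, 93.74, 96.73, 94.45, 87.59, 91.77, 99.42, 93.65,
     93.54, 92.52, 90.56, 89.54, 89.85, 90.39, 93.25, 93.41, 94.98, 87.33]

n = len(x)
x_bar = sum(x) / n
y_bar = sum(y) / n
s_xx = sum((xi - x_bar) ** 2 for xi in x)
s_xy = sum(yi * (xi - x_bar) for xi, yi in zip(x, y))
ss_t = sum((yi - y_bar) ** 2 for yi in y)      # SS_T = S_yy

b1 = s_xy / s_xx
b0 = y_bar - b1 * x_bar

# Route 1: sum of squared residuals.
sse_direct = sum((yi - (b0 + b1 * xi)) ** 2 for xi, yi in zip(x, y))

# Route 2: shortcut SS_E = SS_T - b1 * S_xy.
sse_short = ss_t - b1 * s_xy

# Unbiased estimate of the error variance, with n - 2 degrees of freedom.
sigma2_hat = sse_direct / (n - 2)

print(f"SS_E (residuals) = {sse_direct:.4f}")
print(f"SS_E (shortcut)  = {sse_short:.4f}")
print(f"sigma^2 estimate = {sigma2_hat:.4f}")
```

The divisor is n − 2 rather than n because two parameters, the intercept and the slope, were estimated from the same data.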
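Slope Properties:

$$E(\hat{\beta}_1) = \beta_1, \qquad V(\hat{\beta}_1) = \frac{\sigma^2}{S_{xx}}$$

Intercept Properties:

$$E(\hat{\beta}_0) = \beta_0, \qquad V(\hat{\beta}_0) = \sigma^2 \left[ \frac{1}{n} + \frac{\bar{x}^2}{S_{xx}} \right]$$
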
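Estimated Standard Errors

In simple linear regression the estimated standard error of the slope and the estimated standard error of the intercept are

$$se(\hat{\beta}_1) = \sqrt{\frac{\hat{\sigma}^2}{S_{xx}}} \quad \text{and} \quad se(\hat{\beta}_0) = \sqrt{\hat{\sigma}^2 \left[ \frac{1}{n} + \frac{\bar{x}^2}{S_{xx}} \right]}$$
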
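To make the standard error formulas concrete, the sketch below evaluates them for the oxygen purity data of Table 11-1. It is an illustration only, with hypothetical variable names; the values it prints are computed here rather than quoted from the notes.

```python
import math

# Hydrocarbon level x (%) and oxygen purity y (%) from Table 11-1.
x = [0.99, 1.02, 1.15, 1.29, 1.46, 1.36, 0.87, 1.23, 1.55, 1.40,
     1.19, 1.15, 0.98, 1.01, 1.11, 1.20, 1.26, 1.32, 1.43, 0.95]
y = [90.01, 89.05, 91.43, 93.74, 96.73, 94.45, 87.59, 91.77, 99.42, 93.65,
     93.54, 92.52, 90.56, 89.54, 89.85, 90.39, 93.25, 93.41, 94.98, 87.33]

n = len(x)
x_bar = sum(x) / n
y_bar = sum(y) / n
s_xx = sum((xi - x_bar) ** 2 for xi in x)
s_xy = sum(yi * (xi - x_bar) for xi, yi in zip(x, y))

b1 = s_xy / s_xx
b0 = y_bar - b1 * x_bar

# Estimate of the error variance from the residuals.
sse = sum((yi - (b0 + b1 * xi)) ** 2 for xi, yi in zip(x, y))
sigma2_hat = sse / (n - 2)

# Estimated standard errors of the slope and intercept.
se_b1 = math.sqrt(sigma2_hat / s_xx)
se_b0 = math.sqrt(sigma2_hat * (1.0 / n + x_bar ** 2 / s_xx))

print(f"se(slope)     = {se_b1:.3f}")
print(f"se(intercept) = {se_b0:.3f}")
```

Both standard errors shrink as $S_{xx}$ grows, so spreading the x values over a wider range gives more precise estimates.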
11.4.1 Use of t-Tests

Hypothesis Tests about the Slope

Suppose we wish to test

$$H_0: \beta_1 = \beta_{1,0}$$
$$H_1: \beta_1 \neq \beta_{1,0}$$

An appropriate test statistic would be

$$T_0 = \frac{\hat{\beta}_1 - \beta_{1,0}}{\sqrt{\hat{\sigma}^2 / S_{xx}}}$$

The test statistic could also be written as:

$$T_0 = \frac{\hat{\beta}_1 - \beta_{1,0}}{se(\hat{\beta}_1)}$$

We would reject the null hypothesis if

$$|t_0| > t_{\alpha/2,\, n-2}$$

Hypothesis Tests about the Intercept

Suppose we wish to test

$$H_0: \beta_0 = \beta_{0,0}$$
$$H_1: \beta_0 \neq \beta_{0,0}$$

An appropriate test statistic would be

$$T_0 = \frac{\hat{\beta}_0 - \beta_{0,0}}{\sqrt{\hat{\sigma}^2 \left[ \dfrac{1}{n} + \dfrac{\bar{x}^2}{S_{xx}} \right]}} = \frac{\hat{\beta}_0 - \beta_{0,0}}{se(\hat{\beta}_0)}$$

We would reject the null hypothesis if

$$|t_0| > t_{\alpha/2,\, n-2}$$

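As a hedged illustration of the intercept test, the sketch below applies it to the Table 11-1 data. The hypothesized value $\beta_{0,0} = 0$ and the level α = 0.01 are assumptions made here for the example; the critical value $t_{0.005,18} = 2.88$ is the tabulated value quoted elsewhere in these notes and is hard-coded rather than computed.

```python
import math

# Hydrocarbon level x (%) and oxygen purity y (%) from Table 11-1.
x = [0.99, 1.02, 1.15, 1.29, 1.46, 1.36, 0.87, 1.23, 1.55, 1.40,
     1.19, 1.15, 0.98, 1.01, 1.11, 1.20, 1.26, 1.32, 1.43, 0.95]
y = [90.01, 89.05, 91.43, 93.74, 96.73, 94.45, 87.59, 91.77, 99.42, 93.65,
     93.54, 92.52, 90.56, 89.54, 89.85, 90.39, 93.25, 93.41, 94.98, 87.33]

BETA0_HYPOTHESIZED = 0.0   # assumed null value, for illustration only
T_CRITICAL = 2.88          # t_{0.005,18}, tabulated value for alpha = 0.01

n = len(x)
x_bar = sum(x) / n
y_bar = sum(y) / n
s_xx = sum((xi - x_bar) ** 2 for xi in x)
s_xy = sum(yi * (xi - x_bar) for xi, yi in zip(x, y))

b1 = s_xy / s_xx
b0 = y_bar - b1 * x_bar

sse = sum((yi - (b0 + b1 * xi)) ** 2 for xi, yi in zip(x, y))
sigma2_hat = sse / (n - 2)

# Test statistic for H0: beta0 = BETA0_HYPOTHESIZED.
se_b0 = math.sqrt(sigma2_hat * (1.0 / n + x_bar ** 2 / s_xx))
t0 = (b0 - BETA0_HYPOTHESIZED) / se_b0

reject_h0 = abs(t0) > T_CRITICAL
print(f"t0 = {t0:.2f}, critical value = {T_CRITICAL}, reject H0: {reject_h0}")
```

A different significance level or sample size would need a different critical value from the t table with n − 2 degrees of freedom.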
An important special case of the hypotheses of Equation 11-18 is

$$H_0: \beta_1 = 0$$
$$H_1: \beta_1 \neq 0$$

These hypotheses relate to the significance of regression. Failure to reject H0 is equivalent to concluding that there is no linear relationship between x and Y.

Figure 11-5 The hypothesis H0: β1 = 0 is not rejected.

Figure 11-6 The hypothesis H0: β1 = 0 is rejected.

Example 1

We will test for significance of regression using the model for the oxygen purity data from Example 11-1. The hypotheses are

$$H_0: \beta_1 = 0$$
$$H_1: \beta_1 \neq 0$$

and we will use α = 0.01. From Example 11-1 and Table 11-2 we have

$$\hat{\beta}_1 = 14.947, \quad n = 20, \quad S_{xx} = 0.68088, \quad \hat{\sigma}^2 = 1.18$$

so the t-statistic becomes

$$t_0 = \frac{\hat{\beta}_1}{\sqrt{\hat{\sigma}^2 / S_{xx}}} = \frac{\hat{\beta}_1}{se(\hat{\beta}_1)} = \frac{14.947}{\sqrt{1.18 / 0.68088}} = 11.35$$

Practical Interpretation:
Since the reference value of t is $t_{0.005,18} = 2.88$, the value of the test statistic is very far into the critical region, implying that H0: β1 = 0 should be rejected. There is strong evidence to support the claim that oxygen purity is linearly related to hydrocarbon level.
The P-value for this test is $P \approx 1.23 \times 10^{-9}$. This was obtained manually with a calculator.
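The test statistic and P-value can also be checked in code. The sketch below uses the rounded inputs quoted in the example and approximates the two-sided P-value by numerically integrating the t density with Simpson's rule. That integration is an assumption of this sketch, chosen to avoid external libraries, and is not the calculator method mentioned in the notes; `t_pdf` and `t_upper_tail` are hypothetical helper names.

```python
import math


def t_pdf(t, nu):
    """Density of Student's t distribution with nu degrees of freedom."""
    c = math.gamma((nu + 1) / 2) / (math.sqrt(nu * math.pi) * math.gamma(nu / 2))
    return c * (1 + t * t / nu) ** (-(nu + 1) / 2)


def t_upper_tail(t0, nu, upper=500.0, steps=200000):
    """Approximate P(T > t0) with composite Simpson's rule on [t0, upper].

    The tail beyond `upper` is ignored; for nu = 18 it is negligible.
    `steps` must be even.
    """
    h = (upper - t0) / steps
    total = t_pdf(t0, nu) + t_pdf(upper, nu)
    for k in range(1, steps):
        weight = 4 if k % 2 else 2
        total += weight * t_pdf(t0 + k * h, nu)
    return total * h / 3


# Rounded inputs quoted in the example.
b1_hat, n, s_xx, sigma2_hat = 14.947, 20, 0.68088, 1.18
T_CRITICAL = 2.88  # t_{0.005,18}, tabulated value for alpha = 0.01

t0 = b1_hat / math.sqrt(sigma2_hat / s_xx)
p_value = 2 * t_upper_tail(abs(t0), n - 2)   # two-sided test

print(f"t0 = {t0:.2f}")
print(f"reject H0 at alpha = 0.01: {abs(t0) > T_CRITICAL}")
print(f"approximate P-value = {p_value:.2e}")
```

With a statistics library available, a built-in t distribution function would normally replace the hand-written integration.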