
APPLIED STATISTICS (STAT6084)

LINEAR REGRESSION AND CORRELATION

Week 11-14
Learning Outcomes
LO 1: Apply statistical methods to real problems
LO 2: Use the proper statistical method for a real problem
LO 3: Use statistical software to conduct analysis
LO 4: Interpret the results of software output and statistical calculations
LO 5: Explain a suitable decision based on the solution from a statistical method
(1) Correlation Analysis
1.1 Definition

Correlation analysis is used to measure the strength of the association (linear relationship) between two variables
- Only concerned with the strength of the relationship
- No causal effect is implied
- In regression, by contrast, the variables are distinguished as dependent and independent variables

[Diagram: Variable 1 and Variable 2]

Bina Nusantara University 4


1.2 Correlation Coefficient

A scatter plot (or scatter diagram) is used to show the relationship between two variables

[Figure: scatter plots of y against x illustrating strong relationships, weak relationships, and no relationship]

The population correlation coefficient ρ (rho) measures the strength of the association between the variables

The sample correlation coefficient r is an estimate of ρ and is used to measure the strength of the linear relationship in the sample observations

- r ranges between -1 and 1
- The closer to -1, the stronger the negative linear relationship
- The closer to 1, the stronger the positive linear relationship
- The closer to 0, the weaker the linear relationship
Examples of the value of r in scatter plots

[Figure: five scatter plots of y against x with r = -1, r = -0.6, r = 0, r = +0.3, and r = +1]
Sample correlation coefficient:

$$ r_{xy} = \frac{\sum_{i=1}^{n} (x_i - \bar{x})(y_i - \bar{y})}{\sqrt{\left[\sum_{i=1}^{n} (x_i - \bar{x})^2\right]\left[\sum_{i=1}^{n} (y_i - \bar{y})^2\right]}} $$

r = sample correlation coefficient
n = sample size
x = first variable
y = second variable
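The sample correlation formula can be sketched in a few lines of Python. This is a minimal illustration using only the standard library; the function name `sample_correlation` and the small data lists are hypothetical and are not taken from the slides.

```python
import math

def sample_correlation(x, y):
    """Return the sample correlation coefficient r_xy of paired data."""
    n = len(x)
    if n != len(y) or n < 2:
        raise ValueError("x and y must have the same length (at least 2)")
    x_bar = sum(x) / n
    y_bar = sum(y) / n
    # Numerator: sum of cross products of deviations from the means
    s_xy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
    # Denominator: square root of the product of the two sums of squares
    s_xx = sum((xi - x_bar) ** 2 for xi in x)
    s_yy = sum((yi - y_bar) ** 2 for yi in y)
    return s_xy / math.sqrt(s_xx * s_yy)

# Hypothetical illustrative data (not from the slides)
x_demo = [1, 2, 3, 4, 5]
y_up = [2, 4, 6, 8, 10]      # points on an increasing straight line
y_down = [10, 8, 6, 4, 2]    # points on a decreasing straight line

r_up = sample_correlation(x_demo, y_up)
r_down = sample_correlation(x_demo, y_down)
print(r_up, r_down)
```

Points lying exactly on an increasing line give r at the upper end of the range, and points on a decreasing line give r at the lower end.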
1.3 Testing a Correlation

Test statistic (the same in all three cases):

$$ t = \frac{r\sqrt{n-2}}{\sqrt{1-r^2}} $$

Hypothesis                   Critical values (reject H0 if)
H0: ρ = 0, H1: ρ < 0         t < -t_{α,n-2}
H0: ρ = 0, H1: ρ > 0         t > t_{α,n-2}
H0: ρ = 0, H1: ρ ≠ 0         t < -t_{α/2,n-2} or t > t_{α/2,n-2}
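As a rough sketch of the two-sided test, the statistic can be computed as below. The inputs r = 0.95 and n = 10 are the values reported for the pizza parlor example in these slides; the critical value 2.306 for t_{0.025,8} is an assumption read from a standard t table, and the function name is hypothetical.

```python
import math

def correlation_t_statistic(r, n):
    """t = r * sqrt(n - 2) / sqrt(1 - r^2), with n - 2 degrees of freedom."""
    if n <= 2:
        raise ValueError("need more than two observations")
    if not -1 < r < 1:
        raise ValueError("r must lie strictly between -1 and 1")
    return r * math.sqrt(n - 2) / math.sqrt(1 - r ** 2)

r, n = 0.95, 10    # values reported in the slides' pizza parlor example
t_crit = 2.306     # assumed table value of t_{0.025, 8}

t_stat = correlation_t_statistic(r, n)
reject_h0 = abs(t_stat) > t_crit   # two-sided test of H0: rho = 0
print(round(t_stat, 2), reject_h0)
```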
Example 1

Correlation between student population and quarterly sales

$$ r_{xy} = \frac{\sum_{i=1}^{n} (x_i - \bar{x})(y_i - \bar{y})}{\sqrt{\left[\sum_{i=1}^{n} (x_i - \bar{x})^2\right]\left[\sum_{i=1}^{n} (y_i - \bar{y})^2\right]}} = \frac{2840}{\sqrt{(568)(15730)}} = 0.95 $$
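The arithmetic in Example 1 can be checked directly from the three sums shown on the slide. The variable names below are illustrative only.

```python
import math

# Sums reported in Example 1
s_xy = 2840.0    # sum of (x_i - x_bar)(y_i - y_bar)
s_xx = 568.0     # sum of (x_i - x_bar)^2
s_yy = 15730.0   # sum of (y_i - y_bar)^2

r_xy = s_xy / math.sqrt(s_xx * s_yy)
print(round(r_xy, 2))
```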
(2) The Simple Linear Regression Model
2.1 Modeling

Modeling is often performed by finding a functional relationship between the expected value of a dependent variable and a set of explanatory (independent) variables

[Diagram: Independent variable → Dependent variable]
2.2 Variable
Dependent variable:
The variable we wish to explain
Independent variable:
The variable used to explain the dependent variable

2.3 Linear Regression
Linear regression is a modeling technique in which the expected value of a dependent variable is modeled as a linear combination of a set of independent variables

[Figure: scatter plot of the dependent variable (Y) against the independent variable (X)]

Regression analysis is used to:

- Model the relationship between the independent and dependent variables
- Predict the value of a dependent variable based on the value of at least one independent variable
- Explain the impact of changes in an independent variable on the dependent variable

Example 2

A regression model for the timing of production runs

- Model: What is the model relationship between run size and run time?
- Predict: What is the run time when the run size is 200?
- Impact: Does run size have an effect on run time?

Answers:

- Model: [Equation: fitted regression model]
- Predict: 201.7
- Impact: Yes
2.4 Simple Linear Regression

- Only one independent variable
- The relationship between the independent variable and the dependent variable is described by a linear function
- Changes in the dependent variable are assumed to be caused by changes in the independent variable

When data are collected in pairs, the standard notation used to designate this is:

$$ (x_1, y_1), (x_2, y_2), \ldots, (x_n, y_n) $$

where
x = independent variable
y = dependent variable
n = number of data pairs
2.5 Simple Linear Regression Model

The population regression model:

$$ y = \beta_0 + \beta_1 x + \varepsilon $$

y = dependent variable
β0 = population y intercept
β1 = population slope coefficient
x = independent variable
ε = random error term, or residual

β0 + β1x is the linear component and ε is the random error component.
What is linear?

[Figure: scatterplot of the dependent variable (Y) against the independent variable (X)]

[Figure: scatter plots of y against x contrasting linear relationships with nonlinear relationships]
2.6 Estimation

- Usually we have a sample of data instead of the whole population.
- The slope β1 and intercept β0 are unknown, since these are the values for the whole population.
- We therefore use the given data to estimate the slope and the intercept.

$$ y = \beta_0 + \beta_1 x + \varepsilon $$

The estimated regression equation:

$$ \hat{y} = \hat{\beta}_0 + \hat{\beta}_1 x $$

ŷ = estimated (or predicted) y value
β̂0 = estimate of the regression intercept
β̂1 = estimate of the regression slope
x = independent variable

The individual random error terms e_i have a mean of zero

Estimation of slope and intercept

$$ \hat{\beta}_1 = \frac{S_{xy}}{S_{xx}} = \frac{\sum_{i=1}^{n} (x_i - \bar{x})(y_i - \bar{y})}{\sum_{i=1}^{n} (x_i - \bar{x})^2} $$

$$ \hat{\beta}_0 = \bar{y} - \hat{\beta}_1 \bar{x} $$
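The slope and intercept estimators can be sketched as a small function. Note the assumption: the data table for the pizza parlor example did not survive in this text, so the values below are the commonly published Armand's Pizza Parlors data, used here because they reproduce the sums 2840 and 568 and the means 14 and 130 quoted in the slides. The function name is hypothetical.

```python
def fit_simple_linear_regression(x, y):
    """Return (intercept, slope) of the least squares line."""
    n = len(x)
    x_bar = sum(x) / n
    y_bar = sum(y) / n
    s_xy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
    s_xx = sum((xi - x_bar) ** 2 for xi in x)
    slope = s_xy / s_xx                  # beta1_hat = S_xy / S_xx
    intercept = y_bar - slope * x_bar    # beta0_hat = y_bar - beta1_hat * x_bar
    return intercept, slope

# ASSUMED data: commonly published Armand's Pizza Parlors values,
# not reproduced in the slide text itself.
students = [2, 6, 8, 8, 12, 16, 20, 20, 22, 26]
sales = [58, 105, 88, 118, 117, 137, 157, 169, 149, 202]

b0, b1 = fit_simple_linear_regression(students, sales)
print(b0, b1)
```

With this data the fit agrees with the line ŷ = 60 + 5x derived in Example 3.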
Estimation of variance

$$ \hat{\sigma}^2 = \frac{\sum_{i=1}^{n} (y_i - \hat{y}_i)^2}{n - 2} $$
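A minimal sketch of the variance estimate follows. It assumes the fitted line ŷ = 60 + 5x from Example 3 and, as a labeled assumption, the commonly published Armand's Pizza Parlors data (the data table itself is not reproduced in this text). The function name is hypothetical.

```python
def residual_variance(x, y, intercept, slope):
    """Return sigma_hat^2 = SSE / (n - 2) for a fitted simple regression line."""
    n = len(x)
    # Sum of squared residuals y_i - y_hat_i
    sse = sum((yi - (intercept + slope * xi)) ** 2 for xi, yi in zip(x, y))
    return sse / (n - 2)

# ASSUMED data: commonly published Armand's Pizza Parlors values
students = [2, 6, 8, 8, 12, 16, 20, 20, 22, 26]
sales = [58, 105, 88, 118, 117, 137, 157, 169, 149, 202]

# Fitted line from Example 3: y_hat = 60 + 5x
sigma2_hat = residual_variance(students, sales, intercept=60.0, slope=5.0)
print(sigma2_hat)
```

The result matches the value 191.25 that appears in the test statistic of Example 4.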
2.7 Coefficient of Determination

- R², the coefficient of determination of the regression line, is defined as the proportion of the total sample variability in the y values explained by the regression model

$$ R^2 = r_{xy}^2 $$
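The relation between R² and r can be checked numerically from the sums quoted in the pizza parlor example. The sketch relies on the standard identity SSR = S_xy² / S_xx for a least squares line, which is not stated on the slides and is used here as an assumption.

```python
import math

# Sums reported in the slides' pizza parlor example
s_xy, s_xx, s_yy = 2840.0, 568.0, 15730.0

sst = s_yy                  # total variability in y
ssr = s_xy ** 2 / s_xx      # variability explained by the line (standard identity)
r_squared = ssr / sst       # proportion of variability explained

r_xy = s_xy / math.sqrt(s_xx * s_yy)
print(round(r_squared, 4), round(r_xy ** 2, 4))
```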
Example 3

Student population and quarterly sales data for 10 Armand's Pizza Parlors

[Figure: scatterplot of the data]

$$ \hat{\beta}_1 = \frac{\sum_{i=1}^{n} (x_i - \bar{x})(y_i - \bar{y})}{\sum_{i=1}^{n} (x_i - \bar{x})^2} = \frac{2840}{568} = 5 $$

$$ \hat{\beta}_0 = \bar{y} - \hat{\beta}_1 \bar{x} = 130 - 5(14) = 60 $$

$$ \hat{y} = 60 + 5x $$
(3) Inference on the Parameter
3.1 Parameter β1

The slope estimator β̂1 is normally distributed

Confidence interval estimation

$$ \hat{\beta}_1 \pm t_{\alpha/2,\,n-2} \frac{\hat{\sigma}}{\sqrt{S_{xx}}} $$

Hypothesis test:
H0: β1 = 0
H1: β1 ≠ 0

Test statistic:

$$ t = \frac{\hat{\beta}_1}{\hat{\sigma} / \sqrt{S_{xx}}} $$

Critical value: reject H0 if |t| > t_{α/2,n-2}

Example 4

Continued from Example 3

Hypothesis test:
H0: β1 = 0
H1: β1 ≠ 0

Test statistic:

$$ t = \frac{\hat{\beta}_1}{\hat{\sigma} / \sqrt{S_{xx}}} = \frac{5}{\sqrt{191.25 / 568}} = 8.62 $$

Critical value: t_{5%/2, 8} = 2.306

Conclusion: |t| = 8.62 > 2.306, so reject H0
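The slope test in Example 4, together with the matching confidence interval, can be sketched as follows. The inputs are the values quoted in the slides; the critical value 2.306 for t_{0.025,8} is an assumption read from a standard t table.

```python
import math

slope = 5.0           # beta1_hat from Example 3
sigma2_hat = 191.25   # estimated error variance
s_xx = 568.0          # sum of (x_i - x_bar)^2
n = 10                # number of restaurants
df = n - 2            # degrees of freedom for the t distribution
t_crit = 2.306        # assumed table value of t_{0.025, 8}

se_slope = math.sqrt(sigma2_hat / s_xx)   # sigma_hat / sqrt(S_xx)
t_stat = slope / se_slope                 # test statistic for H0: beta1 = 0
reject_h0 = abs(t_stat) > t_crit

# 95% confidence interval for the slope
ci = (slope - t_crit * se_slope, slope + t_crit * se_slope)
print(round(t_stat, 2), reject_h0, ci)
```

The interval excludes zero, which is consistent with rejecting H0 at the 5% level.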
3.2 Regression Line

An estimator of the unknown mean response β0 + β1x* is the value of the estimated regression equation at X = x*, namely

$$ \hat{y}|_{x^*} = \hat{\beta}_0 + \hat{\beta}_1 x^* $$

This estimator has a normal distribution

Confidence interval estimation

[Figure: fitted line ŷ = b0 + b1x with the confidence interval for the mean of y given xp and the prediction interval for an individual y given xp]
(4) The Analysis of Variance Table
4.1 Definition

The analysis of variance (ANOVA) provides a different test statistic, which can also be used when there is more than one predictor variable, that is, in multiple regression.
The ANOVA table is based upon the variability in the dependent variable (y) and provides a test of the hypotheses
H0: β1 = 0
H1: β1 ≠ 0
4.2 Hypothesis Test

- Hypothesis:
H0: β1 = 0
H1: β1 ≠ 0
- Test statistic:
F = MSR / MSE

4.3 ANOVA Table

Source       DF      SS    MS                    F
Regression   1       SSR   MSR = SSR / 1         MSR / MSE
Error        n - 2   SSE   MSE = SSE / (n - 2)
Total        n - 1   SST

$$ SST = \sum_{i=1}^{n} (y_i - \bar{y})^2 $$

$$ SSE = \sum_{i=1}^{n} (y_i - \hat{y}_i)^2 $$

$$ SSR = \sum_{i=1}^{n} (\hat{y}_i - \bar{y})^2 $$
4.4 The Sum of Squares for a Simple Linear Regression

Example 5

Continued from Example 3
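The sums of squares for the Example 3 data can be worked out from the sums already reported in the slides. The sketch below assumes the standard results SSR = β̂1 · S_xy and F = MSR / MSE for simple linear regression; the worked content of Example 5 itself is not reproduced in this text.

```python
# Sums reported in the slides' pizza parlor example
s_xy, s_xx, s_yy = 2840.0, 568.0, 15730.0
n = 10

slope = s_xy / s_xx      # beta1_hat = 5
sst = s_yy               # total sum of squares
ssr = slope * s_xy       # regression sum of squares (standard identity)
sse = sst - ssr          # error sum of squares

msr = ssr / 1            # regression mean square, 1 degree of freedom
mse = sse / (n - 2)      # error mean square, n - 2 degrees of freedom
f_stat = msr / mse       # F statistic for H0: beta1 = 0

print(sst, ssr, sse, mse, round(f_stat, 2))
```

The error mean square equals the variance estimate 191.25 used in Example 4, and the F statistic is the square of that example's t statistic.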
(5) Multiple Linear Regression
(5) Introduction and Example
5.1 Multiple Linear Regression

- More than one independent variable (x)
- The relationship between the independent variables and the dependent variable (y) is described by a linear function

$$ y = \beta_0 + \beta_1 x_1 + \beta_2 x_2 + \cdots + \beta_p x_p + \varepsilon $$

[Diagram: independent variables x1, x2, …, xp → dependent variable y]

Example 1

Car Plant Electricity Usage

The car plant is in a southern location with a warm climate, and for most of the year air conditioning is required to cool the plant to a working temperature of 65°F. The manager therefore expects that, in addition to the plant's production, the amount of air conditioner use required should have a significant impact on the plant's electricity usage. Air conditioner usage is measured by cooling degree days.

$$ \text{electricity usage} = \beta_0 + \beta_1(\text{production}) + \beta_2(\text{cooling degree}) + \varepsilon $$
No   Electricity usage (y)   Production (x1)   Cooling degree (x2)
1    2.48                    4.51              0
2    2.26                    3.58              0
3    2.47                    4.31              13
4    2.77                    5.06              56
5    2.99                    5.64              117
6    3.05                    4.99              306
7    3.18                    5.29              358
8    3.46                    5.83              330
9    3.03                    4.70              187
10   3.26                    5.61              94
11   2.67                    4.90              23
12   3.53                    4.20              0
(6) Matrix Algebra and Formulation
6.1 Model in Matrix

Model:

$$ y = \beta_0 + \beta_1 x_1 + \beta_2 x_2 + \cdots + \beta_p x_p + \varepsilon $$

Matrix form:

$$ Y = X\beta + \varepsilon $$

Matrix:

$$
X = \begin{bmatrix} 1 & x_{11} & x_{12} & \cdots & x_{1p} \\ 1 & x_{21} & x_{22} & \cdots & x_{2p} \\ \vdots & \vdots & \vdots & & \vdots \\ 1 & x_{n1} & x_{n2} & \cdots & x_{np} \end{bmatrix}, \quad
Y = \begin{bmatrix} y_1 \\ y_2 \\ \vdots \\ y_n \end{bmatrix}, \quad
\beta = \begin{bmatrix} \beta_0 \\ \beta_1 \\ \vdots \\ \beta_p \end{bmatrix}, \quad
\varepsilon = \begin{bmatrix} e_1 \\ e_2 \\ \vdots \\ e_n \end{bmatrix}
$$

Y = vector of the dependent variable (n×1)
X = matrix of the independent variables (n×(p+1))
β = vector of parameters ((p+1)×1)
ε = vector of random errors e_1, …, e_n (n×1)
6.2 Estimation

Parameter estimation of β0, β1, …, βp

$$ \hat{\beta} = (X'X)^{-1} X'Y $$

Where,

$$
X'X = \begin{bmatrix}
n & \sum x_{i1} & \sum x_{i2} & \cdots & \sum x_{ip} \\
\sum x_{i1} & \sum x_{i1}^2 & \sum x_{i1} x_{i2} & \cdots & \sum x_{i1} x_{ip} \\
\sum x_{i2} & \sum x_{i1} x_{i2} & \sum x_{i2}^2 & \cdots & \sum x_{i2} x_{ip} \\
\vdots & \vdots & \vdots & & \vdots \\
\sum x_{ip} & \sum x_{i1} x_{ip} & \sum x_{i2} x_{ip} & \cdots & \sum x_{ip}^2
\end{bmatrix}, \quad
X'Y = \begin{bmatrix} \sum y_i \\ \sum y_i x_{i1} \\ \sum y_i x_{i2} \\ \vdots \\ \sum y_i x_{ip} \end{bmatrix}
$$

All sums run over i = 1, …, n.

Example

Continued from Example 1

$$
X = \begin{bmatrix} 1 & 4.51 & 0 \\ 1 & 3.58 & 0 \\ 1 & 4.31 & 13 \\ \vdots & \vdots & \vdots \\ 1 & 4.20 & 0 \end{bmatrix}, \quad
Y = \begin{bmatrix} 2.48 \\ 2.26 \\ 2.47 \\ \vdots \\ 3.53 \end{bmatrix}, \quad
\beta = \begin{bmatrix} \beta_0 \\ \beta_1 \\ \beta_2 \end{bmatrix}
$$

β0 = intercept
β1 = coefficient of production
β2 = coefficient of cooling degree

Example

Continued from Example 1

$$
X'X = \begin{bmatrix} 12 & 58.62 & 1484 \\ 58.62 & 291.23 & 7863 \\ 1484 & 7863 & 392028 \end{bmatrix}, \quad
X'Y = \begin{bmatrix} 35.15 \\ 173.45 \\ 4685.06 \end{bmatrix}
$$

$$
(X'X)^{-1} = \begin{bmatrix} 6.821 & -1.474 & 0.00376 \\ -1.474 & 0.326 & -0.00096 \\ 0.00376 & -0.00096 & 0.0000076 \end{bmatrix}
$$

Example

Continued from Example 1

$$
\hat{\beta} = (X'X)^{-1} X'Y
= \begin{bmatrix} 6.821 & -1.474 & 0.00376 \\ -1.474 & 0.326 & -0.00096 \\ 0.00376 & -0.00096 & 0.0000076 \end{bmatrix}
\begin{bmatrix} 35.15 \\ 173.45 \\ 4685.06 \end{bmatrix}
= \begin{bmatrix} 1.62 \\ 0.245 \\ 0.0009 \end{bmatrix}
$$

electricity usage = 1.62 + 0.245(production) + 0.0009(cooling degree)
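The matrix computation for the car plant example can be reproduced with a short standard-library sketch. It builds X'X and X'Y from the data table in Example 1 and solves the normal equations (X'X)β̂ = X'Y by Gauss-Jordan elimination instead of forming the inverse explicitly; that substitution and the helper name `solve_linear_system` are choices made for this illustration, not part of the slides.

```python
def solve_linear_system(a, b):
    """Solve a x = b by Gauss-Jordan elimination with partial pivoting.

    Assumes the matrix a is square and nonsingular.
    """
    n = len(b)
    aug = [row[:] + [rhs] for row, rhs in zip(a, b)]
    for col in range(n):
        # Move the row with the largest entry in this column to the pivot position
        pivot = max(range(col, n), key=lambda r: abs(aug[r][col]))
        aug[col], aug[pivot] = aug[pivot], aug[col]
        for r in range(n):
            if r != col:
                factor = aug[r][col] / aug[col][col]
                aug[r] = [v - factor * p for v, p in zip(aug[r], aug[col])]
    return [aug[i][n] / aug[i][i] for i in range(n)]

# Data from the Example 1 table
y = [2.48, 2.26, 2.47, 2.77, 2.99, 3.05, 3.18, 3.46, 3.03, 3.26, 2.67, 3.53]
x1 = [4.51, 3.58, 4.31, 5.06, 5.64, 4.99, 5.29, 5.83, 4.70, 5.61, 4.90, 4.20]
x2 = [0, 0, 13, 56, 117, 306, 358, 330, 187, 94, 23, 0]

# Design matrix with a leading column of ones for the intercept
X = [[1.0, a, float(b)] for a, b in zip(x1, x2)]
k = 3  # number of columns of X (p + 1)

xtx = [[sum(row[i] * row[j] for row in X) for j in range(k)] for i in range(k)]
xty = [sum(row[i] * yi for row, yi in zip(X, y)) for i in range(k)]

beta_hat = solve_linear_system(xtx, xty)
print([round(b, 4) for b in beta_hat])
```

Up to rounding, the entries of `xtx` and `xty` agree with the X'X and X'Y matrices on the slides, and the solution agrees with the fitted equation.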


6.3 Test of Significance

6.4 ANOVA Table

$$ SST = \sum_{i=1}^{n} (y_i - \bar{y})^2, \quad SSE = \sum_{i=1}^{n} (y_i - \hat{y}_i)^2, \quad SSR = \sum_{i=1}^{n} (\hat{y}_i - \bar{y})^2 $$
(7) Evaluating Model
a. Multicollinearity
b. Residual Analysis
c. Influential Points


(8) Application with Minitab
Simple Linear Regression

Regression Analysis: Score2 versus Score1

The regression equation is


Score2 = 1,12 + 0,218 Score1

Predictor Coef SE Coef T P


Constant 1,1177 0,1093 10,23 0,000
Score1 0,21767 0,01740 12,51 0,000

S = 0,127419 R-Sq = 95,7% R-Sq(adj) = 95,1%

Analysis of Variance

Source DF SS MS F P
Regression 1 2,5419 2,5419 156,56 0,000
Residual Error 7 0,1136 0,0162
Total 8 2,6556

Unusual Observations

Obs Score1 Score2 Fit SE Fit Residual St Resid


9 7,50 2,5000 2,7502 0,0519 -0,2502 -2,15R

R denotes an observation with a large standardized residual.
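Several figures in this Minitab output can be cross-checked from its own ANOVA table. The sketch below uses only numbers printed above, written with decimal points, and the standard definitions of R-Sq and R-Sq(adj); small differences from the printed values come from rounding in the output.

```python
# Values copied from the Minitab output above (decimal commas written as points)
ss_regression, ss_error, ss_total = 2.5419, 0.1136, 2.6556
df_regression, df_error, df_total = 1, 7, 8
coef_score1, se_coef_score1 = 0.21767, 0.01740

r_sq = ss_regression / ss_total                                   # R-Sq
r_sq_adj = 1 - (ss_error / df_error) / (ss_total / df_total)      # R-Sq(adj)
f_stat = (ss_regression / df_regression) / (ss_error / df_error)  # F = MSR / MSE
t_slope = coef_score1 / se_coef_score1                            # T for Score1

# In simple linear regression the F statistic equals the squared slope t statistic
print(round(r_sq, 3), round(r_sq_adj, 3), round(f_stat, 1), round(t_slope ** 2, 1))
```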



MULTIPLE LINEAR REGRESSION
Regression Analysis: HeatFlux versus East; South; North

The regression equation is


HeatFlux = 389 + 2,12 East + 5,32 South - 24,1 North

Predictor Coef SE Coef T P


Constant 389,17 66,09 5,89 0,000
East 2,125 1,214 1,75 0,092
South 5,3185 0,9629 5,52 0,000
North -24,132 1,869 -12,92 0,000

S = 8,59782 R-Sq = 87,4% R-Sq(adj) = 85,9%

PRESS = 3089,67 R-Sq(pred) = 78,96%

Analysis of Variance

Source DF SS MS F P
Regression 3 12833,9 4278,0 57,87 0,000
Residual Error 25 1848,1 73,9
Total 28 14681,9

Source DF Seq SS
East 1 153,8
South 1 349,5
North 1 12330,6
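The summary statistics of the multiple regression output can be rebuilt from its ANOVA table in the same way. The sketch assumes the usual definitions R-Sq(adj) = 1 - MSE / (SST / (n - 1)) and R-Sq(pred) = 1 - PRESS / SST, and uses only numbers printed above, written with decimal points.

```python
# Values copied from the Minitab output above (decimal commas written as points)
ss_regression, ss_error, ss_total = 12833.9, 1848.1, 14681.9
df_regression, df_error, df_total = 3, 25, 28
press = 3089.67
seq_ss = [153.8, 349.5, 12330.6]   # sequential SS for East, South, North

r_sq = ss_regression / ss_total
r_sq_adj = 1 - (ss_error / df_error) / (ss_total / df_total)
r_sq_pred = 1 - press / ss_total
f_stat = (ss_regression / df_regression) / (ss_error / df_error)

# The sequential sums of squares add up to the regression sum of squares
seq_total = sum(seq_ss)

print(round(r_sq, 3), round(r_sq_adj, 3), round(r_sq_pred, 4), round(f_stat, 2))
```

The adjusted and predicted values sit below R-Sq because they penalize the number of predictors and the leave-one-out prediction error respectively.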
Exercises (1)

Exercises (2)

THANK YOU


Reference

Hayter, A. (2013). Probability and Statistics for Engineers and Scientists (4th ed.). Thomson Brooks/Cole. ISBN: 978-1133112143.