
3/23/2011

REGRESSION & CORRELATION

INTRODUCTION

• Regression and correlation analysis are used to describe a relationship between a dependent variable and independent variable(s).
• Example:
(i) To describe the relationship between household expenditure and number of children.
(ii) To predict the number of complaints based on type of room.
(iii) To investigate the relationship between house price and house area, number of bedrooms, number of bathrooms and location of the house.
Nornadiah Mohd Razali/FSKM/UiTM 2

INTRODUCTION

• Regression analysis - produces an equation that expresses the dependent variable (Y) as a function of independent variables (X).
• Correlation analysis - measures the strength of a relationship.
• Scatter Diagram / Scatter plot - An initial step to investigate the relationship between the dependent and independent variables.

SCATTER DIAGRAM

• In a scatter diagram, the independent variable is plotted along the horizontal X-axis and the dependent variable is plotted along the vertical Y-axis.
• Information available in a scatter plot:
(i) Type of relationship (Linear / Nonlinear / No relationship)
(ii) Direction of relationship (Positive / Negative)
• The less scattered the points in the scatter diagram, the higher the degree of relationship between the dependent and independent variables.

[Figure: example scatter plots of Y against X showing a positive linear relationship, a negative linear relationship, and a nonlinear relationship]


[Figure: example scatter plot of Y against X showing no relationship]

CONSTRUCTING A SCATTER DIAGRAM

EXAMPLE 1

A dietician wants to check the association between height (in inches) and weight (in pounds) of baseball players. A sample of 9 major league baseball players is selected at random. The data were recorded as follows:

Height   73   69   72   70   72   66   72   72   74
Weight  201  170  180  200  190  175  205  185  186

Construct a scatter plot for the above data. Comment on the relationship between the heights and weights.

CONSTRUCTING A SCATTER DIAGRAM

EXAMPLE 2

Suppose we take a sample of seven households from a low to moderate income neighborhood and collect information on their incomes and food expenditures for the past month. Construct a scatter diagram for the data obtained.

Income ($'00)    Food Expenditure ($'00)
35               9
49               15
21               7
39               11
15               5
28               8
25               9

CORRELATION ANALYSIS

• In a correlation analysis, the correlation between two variables is measured by using a linear correlation coefficient, r.
• The correlation coefficient tells us the strength and direction of a linear relationship between two variables.
• To measure a linear correlation between two quantitative variables, we use the Pearson correlation coefficient.

CORRELATION ANALYSIS

• The Pearson correlation coefficient is calculated as follows:

$$r = \frac{SS_{XY}}{\sqrt{SS_{XX}\,SS_{YY}}}$$

where

$$SS_{XY} = \sum XY - \frac{(\sum X)(\sum Y)}{n}, \qquad SS_{XX} = \sum X^2 - \frac{(\sum X)^2}{n}, \qquad SS_{YY} = \sum Y^2 - \frac{(\sum Y)^2}{n}$$

• r can only take a value between -1 and 1, i.e. $-1 \le r \le 1$.
• How to determine the strength of a correlation based on the value of r?

r value (ignore the sign)    Interpretation
0                            No correlation
r < 0.5                      Weak/low correlation
0.5 ≤ r ≤ 0.7                Moderate correlation
r > 0.7                      Strong/high correlation
1                            Perfect correlation
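The computational formulas for SS_XY, SS_XX, SS_YY and r translate directly into a few lines of code. The following is a minimal sketch in Python (a language choice made here, not one used in the slides); the function names `sums_of_squares` and `pearson_r` are illustrative, and the data are the income and food expenditure values from Example 2.

```python
from math import sqrt

def sums_of_squares(x, y):
    """Return (SS_XY, SS_XX, SS_YY) using the computational formulas."""
    n = len(x)
    sum_x, sum_y = sum(x), sum(y)
    ss_xy = sum(xi * yi for xi, yi in zip(x, y)) - sum_x * sum_y / n
    ss_xx = sum(xi * xi for xi in x) - sum_x ** 2 / n
    ss_yy = sum(yi * yi for yi in y) - sum_y ** 2 / n
    return ss_xy, ss_xx, ss_yy

def pearson_r(x, y):
    """Pearson correlation coefficient r = SS_XY / sqrt(SS_XX * SS_YY)."""
    ss_xy, ss_xx, ss_yy = sums_of_squares(x, y)
    return ss_xy / sqrt(ss_xx * ss_yy)

# Income and food expenditure (both in $'00) from Example 2
income = [35, 49, 21, 39, 15, 28, 25]
food = [9, 15, 7, 11, 5, 8, 9]

r = pearson_r(income, food)
print(f"r = {r:.4f}")
```

The sign of the printed value gives the direction of the relationship, and its magnitude can be read against the strength table.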


EXAMPLE 3

Refer to Example 1. Compute the Pearson coefficient of correlation and interpret the result.

$\sum X$ = _____   $\sum Y$ = _____   $\sum XY$ = _____   $\sum X^2$ = _____   $\sum Y^2$ = _____   n = _____

$SS_{XY} = \sum XY - \frac{(\sum X)(\sum Y)}{n}$ =

$SS_{XX} = \sum X^2 - \frac{(\sum X)^2}{n}$ =

$SS_{YY} = \sum Y^2 - \frac{(\sum Y)^2}{n}$ =

$r = \frac{SS_{XY}}{\sqrt{SS_{XX}\,SS_{YY}}}$ =

Interpretation: There is a ______ , ________ linear correlation between __________ and ___________.

EXAMPLE 4

Refer to Example 2. Compute the Pearson coefficient of correlation and interpret the result.

$\sum X$ = _____   $\sum Y$ = _____   $\sum XY$ = _____   $\sum X^2$ = _____   $\sum Y^2$ = _____   n = _____

$SS_{XY} = \sum XY - \frac{(\sum X)(\sum Y)}{n}$ =

$SS_{XX} = \sum X^2 - \frac{(\sum X)^2}{n}$ =

$SS_{YY} = \sum Y^2 - \frac{(\sum Y)^2}{n}$ =

$r = \frac{SS_{XY}}{\sqrt{SS_{XX}\,SS_{YY}}}$ =

Interpretation: There is a ______ , ________ linear correlation between __________ and ___________.

REGRESSION ANALYSIS

• Regression analysis – analyzes the relationship between dependent and independent variable(s).
• Simple linear regression – analyzes the relationship between one dependent variable and one independent variable.
• Multiple linear regression – analyzes the relationship between one dependent variable and more than one independent variable.

• Regression analysis produces an equation that expresses the dependent variable (Y) as a function of independent variables (X).
• The equation is called a linear regression model.
• A regression model describes the relationship between the dependent and independent variables.
• The regression line can be used to make a prediction about the value of y for a given value of x.


In general, a simple linear regression model is written as:

$$y = A + Bx + \varepsilon$$

where
y = Dependent variable
x = Independent variable
A = Y-intercept
B = Slope of the regression line / Regression coefficient
$\varepsilon$ = Random error term

ASSUMPTIONS OF A LINEAR REGRESSION ANALYSIS

1. The random error term, $\varepsilon$, has a zero mean.
2. The random error terms are independent.
3. The random error term is normally distributed.
4. The random error terms have a constant variance, $\sigma^2$.

• In the linear regression model, the values of A and B are unknown. Therefore, we have to estimate their values by using the least squares method.
• The regression model with the estimated values of A and B is called an estimated regression model/regression line and is written as follows:

$$\hat{y} = a + bx$$

• Interpretation of the regression coefficient:

a in the regression model means:
The value of y when x = 0 (if x = 0 is in the range of the dataset).
No practical meaning (if x = 0 is not in the range of the dataset).

b in the regression model means:
The change in Y for a one unit change in X.
A positive value of b means that when X increases by 1 unit, Y increases by b units.
A negative value of b means that when X increases by 1 unit, Y decreases by |b| units.

• By using the least squares method, we can estimate the values of A and B as follows:

$$b = \frac{SS_{xy}}{SS_{xx}} = \frac{\sum xy - \frac{(\sum x)(\sum y)}{n}}{\sum x^2 - \frac{(\sum x)^2}{n}}$$

$$a = \frac{\sum y}{n} - b\left(\frac{\sum x}{n}\right) \quad \text{or} \quad a = \bar{y} - b\bar{x}$$
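The least squares estimates can be computed in the same order as the formulas: first the slope b from SS_xy and SS_xx, then the intercept a from the two means. The sketch below is a hedged illustration in Python; `fit_line` is a hypothetical helper name, and the data are taken from Example 2.

```python
def fit_line(x, y):
    """Return (a, b) for the estimated regression line y-hat = a + b*x."""
    n = len(x)
    sum_x, sum_y = sum(x), sum(y)
    ss_xy = sum(xi * yi for xi, yi in zip(x, y)) - sum_x * sum_y / n
    ss_xx = sum(xi * xi for xi in x) - sum_x ** 2 / n
    b = ss_xy / ss_xx                 # slope: b = SS_xy / SS_xx
    a = sum_y / n - b * (sum_x / n)   # intercept: a = y-bar - b * x-bar
    return a, b

# Income and food expenditure (both in $'00) from Example 2
income = [35, 49, 21, 39, 15, 28, 25]
food = [9, 15, 7, 11, 5, 8, 9]

a, b = fit_line(income, food)
print(f"y-hat = {a:.4f} + {b:.4f}x")
```

Because the intercept is computed from the unrounded slope, the result may differ in the last decimal places from a hand calculation that rounds b first.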


EXAMPLE 5

Refer to Example 1. Find the equation of the regression line and interpret the values of the regression coefficient.

From the previous calculation,

$\sum X$ = _____   $\sum Y$ = _____   $SS_{xy}$ = _____   $SS_{xx}$ = _____

$b = \frac{SS_{xy}}{SS_{xx}}$ =

Interpretation:

$a = \frac{\sum y}{n} - b\left(\frac{\sum x}{n}\right)$ =

Interpretation:

EXAMPLE 6

Refer to Example 2. Find the regression equation and interpret the values of the regression coefficient.

PREDICTION

• Given a value of X, we can predict the value of Y by using the estimated regression equation.

Example 7:

Using the equation of the regression line found in Example 5, predict the weight of a baseball player whose height is 72 inches.

Using the equation of the regression line found in Example 6, predict the food expenditure of a household with income $3500 (that is, x = 35, since income is recorded in hundreds of dollars).
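Prediction amounts to substituting the given x into the estimated line ŷ = a + bx. The following Python sketch refits both lines from the Example 1 and Example 2 data and then evaluates them; `fit_line` and `predict` are illustrative names. It assumes that the $3500 income must be entered as x = 35, because the Example 2 data are recorded in hundreds of dollars.

```python
def fit_line(x, y):
    """Least squares estimates (a, b) for y-hat = a + b*x."""
    n = len(x)
    sum_x, sum_y = sum(x), sum(y)
    ss_xy = sum(xi * yi for xi, yi in zip(x, y)) - sum_x * sum_y / n
    ss_xx = sum(xi * xi for xi in x) - sum_x ** 2 / n
    b = ss_xy / ss_xx
    a = sum_y / n - b * (sum_x / n)
    return a, b

def predict(a, b, x):
    """Predicted value of y for a given x on the estimated line."""
    return a + b * x

# Example 1 data: height (inches) and weight (pounds)
height = [73, 69, 72, 70, 72, 66, 72, 72, 74]
weight = [201, 170, 180, 200, 190, 175, 205, 185, 186]
a1, b1 = fit_line(height, weight)
weight_at_72 = predict(a1, b1, 72)

# Example 2 data: income and food expenditure, both in $'00
income = [35, 49, 21, 39, 15, 28, 25]
food = [9, 15, 7, 11, 5, 8, 9]
a2, b2 = fit_line(income, food)
food_at_3500 = predict(a2, b2, 35) * 100  # convert from $'00 back to dollars

print(f"Predicted weight at 72 inches: {weight_at_72:.1f} pounds")
print(f"Predicted food expenditure at $3500 income: ${food_at_3500:.0f}")
```

Both x values lie inside the range of the observed data, which is the situation in which such predictions are usually considered reasonable.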

COEFFICIENT OF DETERMINATION

• The coefficient of determination, $r^2$, measures the proportion of the total variation in Y that is explained by the independent variable, X.

$$r^2 = b\,\frac{SS_{xy}}{SS_{yy}}$$

$$\therefore \text{correlation coefficient, } r = \sqrt{r^2} = \sqrt{b\,\frac{SS_{xy}}{SS_{yy}}} \quad \text{(taking the same sign as } b\text{)}$$

• If $r^2$ = 0.80, this means that 80% of the total variation in Y is explained by X.

EXAMPLE 8

Compute the coefficient of determination for Example 1. Interpret the value.

From the previous calculation,

$SS_{xy}$ = _____   $SS_{xx}$ = _____   b = _____

$r^2 = b\,\frac{SS_{xy}}{SS_{yy}}$ =

Interpretation:
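The formula r² = b · SS_xy / SS_yy can be evaluated directly once the three sums of squares are known. The sketch below, written in Python with the illustrative function name `coefficient_of_determination`, applies it to the height and weight data of Example 1.

```python
def coefficient_of_determination(x, y):
    """Return r^2 = b * SS_xy / SS_yy for a simple linear regression of y on x."""
    n = len(x)
    sum_x, sum_y = sum(x), sum(y)
    ss_xy = sum(xi * yi for xi, yi in zip(x, y)) - sum_x * sum_y / n
    ss_xx = sum(xi * xi for xi in x) - sum_x ** 2 / n
    ss_yy = sum(yi * yi for yi in y) - sum_y ** 2 / n
    b = ss_xy / ss_xx  # least squares slope
    return b * ss_xy / ss_yy

# Example 1 data: height (inches) and weight (pounds)
height = [73, 69, 72, 70, 72, 66, 72, 72, 74]
weight = [201, 170, 180, 200, 190, 175, 205, 185, 186]

r_squared = coefficient_of_determination(height, weight)
print(f"r^2 = {r_squared:.4f}")
```

Multiplying the printed value by 100 gives the percentage of the variation in weight that is explained by height.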


EXAMPLE 9

Compute the coefficient of determination for Example 2. Interpret the value.

From the previous calculation,

$SS_{xy}$ = _____   $SS_{xx}$ = _____   b = _____

$r^2 = b\,\frac{SS_{xy}}{SS_{yy}}$ =

Interpretation:

CHECKING THE ASSUMPTION

• To check the normality assumption, we use the normal probability plot or Q-Q plot.
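Diagnostic plots for the regression assumptions are drawn from the residuals, that is, the differences between each observed y and its predicted value ŷ. The slides do not show how these are obtained, so the following Python sketch is only one plausible way to compute them, using the Example 2 data; `fit_line` is an illustrative helper name.

```python
def fit_line(x, y):
    """Least squares estimates (a, b) for y-hat = a + b*x."""
    n = len(x)
    sum_x, sum_y = sum(x), sum(y)
    ss_xy = sum(xi * yi for xi, yi in zip(x, y)) - sum_x * sum_y / n
    ss_xx = sum(xi * xi for xi in x) - sum_x ** 2 / n
    b = ss_xy / ss_xx
    a = sum_y / n - b * (sum_x / n)
    return a, b

# Income and food expenditure (both in $'00) from Example 2
income = [35, 49, 21, 39, 15, 28, 25]
food = [9, 15, 7, 11, 5, 8, 9]

a, b = fit_line(income, food)
fitted = [a + b * x for x in income]                       # predicted values
residuals = [y - y_hat for y, y_hat in zip(food, fitted)]  # observed - predicted

# Each (predicted, residual) pair is one point on a residual plot
for y_hat, e in zip(fitted, residuals):
    print(f"predicted = {y_hat:7.3f}   residual = {e:7.3f}")
```

The residuals from a least squares line with an intercept sum to zero (up to rounding), which is a quick sanity check before plotting them.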

CHECKING THE ASSUMPTION

• To check the constant variance assumption, we use a plot of residuals versus predicted values.

EXAMPLE 10 (APRIL 2010)


EXAMPLE 11 (OCTOBER 2010)

EXERCISE 1

A random sample of eight drivers insured with a company and having similar auto insurance policies was selected. The following table lists their driving experiences (in years) and monthly auto insurance premiums.

Driving Experience (years)    Monthly Auto Insurance Premium
5                             64
2                             87
12                            50
9                             71
15                            44
6                             56
25                            42
16                            60

EXERCISE 1

a) Construct a scatter diagram for the data.
b) Compute SSxx, SSyy, and SSxy.
c) Calculate r and r² and explain the values obtained.
d) Obtain a regression equation using the least squares method.
e) Interpret the meaning of the values of a and b calculated in d).
f) Draw the regression line on the scatter diagram in a).
g) Predict the monthly auto insurance premium for a driver with 10 years of driving experience.

EXERCISE 2

The table below lists the amount (in millions of RM) spent on advertising and the total sales (in millions of RM) for the year 2007 for a sample of six different hotels:

Advertising Expenditure    Total Sales
2.0                        47
1.6                        35
1.0                        23
3.5                        74
1.2                        26
4.5                        85

EXERCISE 2

a) Construct a scatter diagram for the above data. Comment on the diagram.
b) Calculate the equation of the regression line. Interpret the coefficients of the regression line.
c) Is the model useful in explaining the total sales? If yes, give your reason.
d) Forecast the total sales of a hotel that plans to spend RM5 million on advertising for the year 2010.

EXERCISE 3

A regression analysis was done to examine the relationship between the working experience (in years) of tourist guides and their level of knowledge regarding the local places of interest. The following table gives the knowledge scores of 10 tourist guides and their working experience.


EXERCISE 3

Working Experience    Knowledge Scores
15                    80
8                     72
23                    89
18                    79
14                    75
7                     70
21                    88
12                    75
16                    82
10                    72

a) Construct a scatter diagram for the above data. Comment on the diagram.
b) Determine the correlation coefficient. Explain the value obtained.
c) Calculate the least squares regression line. Interpret the coefficients of the regression line.
d) Calculate the coefficient of determination. Interpret the value.
e) Estimate the knowledge score of a tourist guide with 20 years of working experience.
