The objective of many investigations is to understand and explain the relationship among variables.
Frequently, one wants to know how and to what extent a certain variable (response variable) is related to a
set of other variables (explanatory variables).
Regression analysis helps us to determine the nature and the strength of relationship among variables.
Types of relationship:
i) Deterministic relationship also called functional relationship
ii) Probabilistic relationship also called statistical relationship
In deterministic relationship the relationship between two variables is known exactly such as
a) Area of a circle = πr²
b) F = k(m1m2/r²) (Newton's law of gravitation)
c)The relationship between dollar sales (Y) of a product sold at a fixed price and the number of units sold.
In statistical relationship the relation between variables is not known exactly and we have to approximate
the relationship and develop models that characterize their main features. Regression analysis is
concerned with developing such “approximating” models.
For example, in a chemical process the yield of product is related to the operating temperature; it may be
of interest to build a model relating yield to temperature and then use the model for prediction, process
optimization, or process control.
The word regression is used to investigate the dependence of one variable called the dependent variable
denoted by Y, on one or more variables, called independent variables denoted by X’s and provides an
equation to be used for estimating or predicting the average value of the dependent variable from the
known values of the independent variables. When we study the dependence of a variable on a single
independent variable, it is called simple regression, whereas the dependence of a variable on two or more
independent variables is called multiple regression. When the parameters in the model are in
linear form, then we say that model is linear.
The dependent variable is also called the predictand, the response, or the regressand, whereas the
independent variable is also called the predictor, the explanatory, or the regressor variable.
The regression analysis is generally classified into two kinds.
1. Linear Regression
Simple Linear Regression
Multiple Linear Regression
Curvilinear Regression
2. Nonlinear Regression
Intrinsically Linear
Intrinsically Non-Linear
Linear:- The regression model is linear if the parameters in the model are in linear form (that is, no
parameter appears as an exponent or is multiplied or divided by another parameter). Otherwise, the
model is non-linear.
Suppose
Y = β0 + β1X1 + β2X2 + ε, where the β's are parameters. It is a linear model.
But if
Y = αX^β or Y = αβ^X, it is non-linear.
Non-Linear Model:- A non-linear model that can be linearized (that is, converted into a linear
model) by an appropriate transformation is called intrinsically linear, and one that cannot be so
transformed is called intrinsically non-linear.
e.g. Y = αX^β. Apply log on both sides: Log(Y) = Log(α) + β Log(X)
Y = αβ^X. Apply log on both sides: Log(Y) = Log(α) + X Log(β)
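As a quick illustration (with made-up α and β values, not data from these notes), a short Python sketch showing that fitting a straight line to Log(Y) recovers the parameters of Y = αβ^X:

```python
import math

# Illustrative parameters (assumption, not from the notes): Y = alpha * beta**X
alpha, beta = 2.0, 1.5
xs = [0, 1, 2, 3, 4, 5]
ys = [alpha * beta ** x for x in xs]

# Transform: log10(Y) = log10(alpha) + X * log10(beta)  -> simple linear form
ystar = [math.log10(y) for y in ys]

# Ordinary least squares of Y* on X
n = len(xs)
xbar = sum(xs) / n
ybar = sum(ystar) / n
sxy = sum((x - xbar) * (y - ybar) for x, y in zip(xs, ystar))
sxx = sum((x - xbar) ** 2 for x in xs)
slope = sxy / sxx                 # estimates log10(beta)
intercept = ybar - slope * xbar   # estimates log10(alpha)

print(10 ** intercept, 10 ** slope)  # antilogs recover alpha and beta
```

Because the generated data lie exactly on the curve, the antilogs of the fitted intercept and slope return α and β exactly (up to floating-point error).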
Regressor:- The variable that forms the basis of estimation or prediction is called the regressor. It is also
called independent variable, or explanatory or controlled or predictor variable, usually denoted by X.
STAT-602 [Muhammad Imran Khan is thankful for the contributors of these notes] Page 1
Regressand:- The variable whose resulting values depend upon the known values of the independent
variable, is called regressand. It is also called response, dependent, or random variable, usually denoted by
Y.
In simple regression, the dependence of response variable (Y) is investigated on only one
regressor (X). If the relationship of these variables can be described by a straight line, it is termed as
simple linear regression.
The population simple linear regression model is defined as:
Y = β0 + β1X + ε
Example: - The following data are the sparrow wing length in cm at various times in days after hatching
Wing Length (Y)   Age (X)   XY   X²   Y²   Ŷ   e = Y−Ŷ   e²
1.4 3 4.2 9 1.96 1.525 -0.125 0.015625
1.5 4 6.0 16 2.25 1.795 -0.295 0.087025
2.2 5 11 25 4.84 2.065 0.135 0.018225
2.4 6 14.4 36 5.76 2.335 0.065 0.004225
3.1 8 24.8 64 9.61 2.875 0.225 0.050625
3.2 9 28.8 81 10.24 3.145 0.055 0.003025
3.2 10 32.0 100 10.24 3.415 -0.215 0.046225
3.9 11 42.9 121 15.21 3.685 0.215 0.046225
4.1 12 49.2 144 16.81 3.955 0.145 0.021025
4.7 14 65.8 196 22.09 4.495 0.205 0.042025
4.5 15 67.5 225 20.25 4.765 -0.265 0.070225
5.2 16 83.2 256 27.04 5.035 0.165 0.027225
5.0 17 85.0 289 25.00 5.305 -0.305 0.093025
44.4 130 514.80 1562 171.3 44.395 0.005 0.525
(i):- Draw scatter plot for the data
(ii):- Fit simple linear regression and interpret the parameters
(iii):-Find Standard error of estimate, SE(b0) and SE(b1).
(iv):-Test the hypothesis that there is no linear relation between Y and X, i.e. β1 = 0
(v):- Test the hypothesis that β0 = 0.95
(vi):-Construct 90% C.I for regression parameters.
(vii):-Perform Analysis of Variance. Calculate coefficient of determination and interpret it.
(viii):- Test the hypothesis that the mean wing length of 13 day-old birds in the population is
4cm. Also find 95%C.I for mean value of Y when X=13.
(ix):-Test the hypothesis that the wing length of one 13 day-old birds in the population is 4.2 cm.
Also Construct 95 % C.I for single value of Y when X=13.
Solution:-
[Scatter plot of wing length (cm) against age (days)]
X̄ = 130/13 = 10        Ȳ = 44.4/13 = 3.415
S(XY) = Σ(Xi − X̄)(Yi − Ȳ) = ΣXY − (ΣX)(ΣY)/n = 514.80 − (130)(44.4)/13 = 70.8
S(XX) = Σ(Xi − X̄)² = ΣX² − (ΣX)²/n = 1562 − (130)²/13 = 262
S(YY) = Σ(Yi − Ȳ)² = ΣY² − (ΣY)²/n = 171.3 − (44.4)²/13 = 19.6569
b1 = S(XY)/S(XX) = 70.8/262 = 0.270 cm/day
b0 = Ȳ − b1X̄ = 3.415 − (0.270)(10) = 0.715 cm
So estimated simple linear regression equation is
Y=0.715 + 0.270 X
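The fitted coefficients can be reproduced from the column totals of the table above; the sketch below uses only those sums. (With the unrounded slope the intercept comes out 0.713; the notes' 0.715 results from rounding b1 to 0.270 first.)

```python
# Column totals from the sparrow table: n = 13 birds
n, sum_x, sum_y = 13, 130.0, 44.4
sum_xy, sum_x2 = 514.80, 1562.0

# Corrected sums of squares and products
s_xy = sum_xy - sum_x * sum_y / n   # S(XY) = 70.8
s_xx = sum_x2 - sum_x ** 2 / n      # S(XX) = 262

b1 = s_xy / s_xx                    # slope, about 0.270 cm/day
b0 = sum_y / n - b1 * sum_x / n     # intercept, about 0.715 cm
print(round(b1, 3), round(b0, 3))
```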
Interpretation of estimated regression parameter
The value of b1=0.270, indicates that the average wing length is expected to increase by 0.270 cm
with each one day increase in age.
The observed range of age(Explanatory Variable) in the experiment was 3 to 17 days(i.e scope of the
model), therefore it would be an unreasonable extrapolation to expect this rate of increase in wing length
to continue if number of days were to increase. It is safe to use the results of regression only within the
range of the observed values of the independent variable (i.e. within the scope of the model).
In the regression equation b0 = 0.715 is the average wing length when age = 0 days. In this example, since
the scope of the model does not cover X = 0, b0 does not have any particular meaning as a separate term
in the regression equation.
NOTE: Interpolation and Extrapolation
Interpolation is making a prediction within the range of values of the predictor in the sample used to
generate the model. Interpolation is generally safe. Extrapolation is making a prediction outside the range
of values of the predictor in the sample used to generate the model. The more removed the prediction is
from the range of values used to fit the model, the riskier the prediction becomes because there is no way
to check that the relationship continues to be linear.
Se² = Σ(Y − Ŷ)²/(n − 2) = [ΣY² − b0ΣY − b1ΣXY]/(n − 2) = 0.048
OR  Se² = Σe²/(n − 2) = 0.525/11 = 0.048
Se = 0.218
SE(b0) = Se √(1/n + X̄²/S(XX)) = 0.148        SE(b1) = Se √(1/S(XX)) = 0.0135
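A sketch reproducing these standard errors from the residual sum of squares Σe² = 0.525 and the summary values already computed:

```python
import math

# Summary values from the sparrow regression
n = 13
sse = 0.525        # residual sum of squares, sum of e^2 column
s_xx = 262.0
x_bar = 10.0

se2 = sse / (n - 2)                 # residual variance, about 0.048
se = math.sqrt(se2)                 # standard error of estimate, about 0.218

se_b1 = se * math.sqrt(1 / s_xx)                    # about 0.0135
se_b0 = se * math.sqrt(1 / n + x_bar ** 2 / s_xx)   # about 0.148
print(round(se, 3), round(se_b0, 3), round(se_b1, 4))
```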
Inference in Simple Linear Regression (From samples to population)
Generally, more is sought in regression analysis than a description of observed data. One usually wishes
to draw inferences about the relationship of the variables in the population from which the sample was
taken. To draw inferences about population values based on sample results, the following assumptions are
needed.
Linearity
Equal Variances for error
Independence of errors
Normality of errors
The slope and the intercept estimated from a single sample typically differ from the population values and
vary from sample to sample. To use these estimates for inference about the population values, the
sampling distributions of the two statistics are needed. When the assumptions of the linear regression
model are met, the sampling distributions of b0 & b1 are normal with means β0 and β1 and standard errors
SE(b0) = Se √(1/n + X̄²/S(XX))        SE(b1) = Se √(1/S(XX))
90% C.I for β1
b1 ± t(α/2, n−2) SE(b1) = 0.270 ± t(0.05, 11)(0.0135) = 0.270 ± (1.796)(0.0135)
(0.2458 , 0.2942)
A 90% C.I can be interpreted as follows: if we take 100 samples of the same size under the same conditions
and compute a C.I for the parameter from each sample, then about 90 such C.I's will contain the parameter
(i.e. not all of the constructed C.I's).
A confidence interval estimate of a parameter is more informative than a point estimate because it reflects
the precision of the estimate.
The width of the C.I (i.e. U.L − L.L) reflects the precision of the estimate. The precision can be increased
either by decreasing the confidence level or by increasing the sample size.
Confidence level C.I Width
99% (0.2281,0.3119) 0.0838
95% (0.2403,0.2997) 0.0594
90% (0.2458,0.2942) 0.0484
90% C.I for β0
b0 ± t(α/2, n−2) SE(b0) = 0.715 ± t(0.05, 11)(0.148) = 0.715 ± (1.796)(0.148)
(0.4492 , 0.9808)
R² = (Reg.SS / Total SS) × 100 = (19.1322 / 19.6569) × 100 = 97.33%
The value of R² indicates that about 97% of the variation in the dependent variable is explained by the
linear relationship with X; the remainder is due to other, unknown factors.
Ŷ13 = 0.715 + 0.270(13) = 4.225
SE(Ŷ13) = Se √(1/n + (X0 − X̄)²/S(XX)) = 0.218 √(1/13 + (13 − 10)²/262) = 0.073
95% C.I for the mean value of Y at X = 13:  Ŷ13 ± t(α/2, n−2) SE(Ŷ13) = 4.225 ± (2.201)(0.073) = (4.064 , 4.386)
For a single value of Y at X = 13:  SE = Se √(1 + 1/n + (X0 − X̄)²/S(XX)) = 0.230
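Both intervals at X = 13 can be reproduced as follows (a sketch using the summary values above; t(0.025, 11) = 2.201 is taken from a t-table):

```python
import math

# Fitted line and summary values from the sparrow example
b0, b1 = 0.715, 0.270
n, x_bar, s_xx, se = 13, 10.0, 262.0, 0.218
t_crit = 2.201            # t(0.025, 11) from a t-table

x0 = 13
y_hat = b0 + b1 * x0      # point estimate: 4.225

# Standard error for the MEAN response at x0
se_mean = se * math.sqrt(1 / n + (x0 - x_bar) ** 2 / s_xx)      # about 0.073
ci = (y_hat - t_crit * se_mean, y_hat + t_crit * se_mean)       # about (4.06, 4.39)

# Standard error for a SINGLE new observation at x0 (adds the extra 1)
se_pred = se * math.sqrt(1 + 1 / n + (x0 - x_bar) ** 2 / s_xx)  # about 0.230
pi = (y_hat - t_crit * se_pred, y_hat + t_crit * se_pred)
print(y_hat, ci, pi)
```

The prediction interval for a single bird is wider than the confidence interval for the mean because it also carries the variability of an individual observation around the line.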
1) Construction of hypotheses
Ho : Y13 = 4.2
H1 : Y13 ≠ 4.2
2) Level of significance
α = 5%
3) TEST STATISTIC
Transformation to a straight line
It is easy to deal with the regression, which is linear in parameters, but in some situations the
models are non-linear. The non-linear models can be divided into two types
(1):-Intrinsically Linear (2): - Intrinsically Non-Linear models
The models that can be transformed in to linear models after applying some suitable transformation are
called intrinsically linear models and the models that can not be transformed in to linear models are called
intrinsically non-linear models.
Following are the examples of some common non-linear models with suitable transformation to convert
them into linear models:
Non-linear Form          Transformation                                  Linear model
1. Y = aX^b              1. Log(Y) = Log(a) + b Log(X)                   1. Y* = a* + bX*
4. Y = ae^(bX)           4. Ln(Y) = Ln(a) + bX                           4. Y* = a* + bX
5. Y = a + b√X           5. Y = a + bX*, where X* = √X                   5. Y = a + bX*
6. Y = aX + bX²          6. Y* = Y/X = a + bX (divide both sides by X)   6. Y* = a + bX
Example:- The number (Y) of bacteria per unit volume present in a culture after X hours is given in the
following table
Y X Log(Y)=Y* XY* X2
32 0 1.50515 0 0
47 1 1.6721 1.6721 1
65 2 1.81291 3.6258 4
92 3 1.96379 5.8914 9
132 4 2.12057 8.4823 16
190 5 2.27875 11.3938 25
275 6 2.43933 14.636 36
833 21 13.7926 45.7014 91
Fit a least squares curve of the form Y = ab^X to the data. Estimate the value of Y when X = 7.
We have to estimate the model Y = ab^X, for which the transformed line takes the form:
Log(Y) = Log(a) + X Log(b)
Y* = a* + b*X
b* = S(XY*)/S(XX) = 4.33/28 = 0.154
a* = Ȳ* − b*X̄ = 1.51
The regression equation is
Log(Y) = 1.51 + 0.154 X
Now
Log(a) = 1.51        Log(b) = 0.154
a = Antilog(1.51) = 32.36        b = Antilog(0.154) = 1.43
So the fitted curve is Ŷ = 32.36 (1.43)^X.
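A sketch reproducing the fit from the column totals and then answering the "estimate Y when X = 7" part (the prediction, about 387, is not worked out in the notes; the small difference from a = 32.36 comes from carrying the unrounded a* = 1.507):

```python
import math

# Column totals from the bacteria table: n = 7 time points
n = 7
sum_x, sum_ystar = 21.0, 13.7926            # Y* = log10(Y)
sum_xystar, sum_x2 = 45.7014, 91.0

s_xy = sum_xystar - sum_x * sum_ystar / n   # S(XY*), about 4.32
s_xx = sum_x2 - sum_x ** 2 / n              # S(XX) = 28

b_star = s_xy / s_xx                        # log10(b), about 0.154
a_star = sum_ystar / n - b_star * sum_x / n # log10(a), about 1.51

a = 10 ** a_star                            # about 32.1
b = 10 ** b_star                            # about 1.43
y7 = a * b ** 7                             # predicted bacteria count at X = 7 hours
print(round(a, 2), round(b, 2), round(y7))
```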
Multiple Linear Regression
Multiple linear regression describes the dependence of the mean value of the response
variable (Y) on given values of two or more independent variables (X's).
There are many applications where several explanatory variables affect the dependent variable, for
example:
1) Yield of a crop depend upon the fertility of the land, dose of the fertilizer applied, quantity of seed
etc.
2) The grade point average of students depend on aptitude, mental ability , hours devoted to study,
type and nature of grading by teachers.
3) The systolic blood pressure of a person depends upon one’s weight, age, etc.
If there are only two independent variables then the Multiple Regression Model is:
Y = β0 + β1X1 + β2X2 + ε        (Population Regression Model)
Where
X1 & X2 are independent variables and Y is the dependent variable.
β0 : Y intercept
β1 & β2 : also called partial regression coefficients.
These parameters are estimated from sample information by b0, b1, b2 as:
b1 = [S(X2,X2) S(X1,Y) − S(X1,X2) S(X2,Y)] / [S(X1,X1) S(X2,X2) − [S(X1,X2)]²]
b2 = [S(X1,X1) S(X2,Y) − S(X1,X2) S(X1,Y)] / [S(X1,X1) S(X2,X2) − [S(X1,X2)]²]
b0 = Ȳ − b1X̄1 − b2X̄2
Interpretation of regression coefficients:
b0 is the mean value of Y when X1 = X2 = 0.
b1 is the average change (increase or decrease) in the response variable Y for a one-unit increase in the
explanatory variable X1 when the effect of X2 is held constant.
b2 measures the average change in Y for a one-unit increase in X2 when the effect of X1 is held constant.
EXAMPLE: The following data represent the performance of a chemical process as a function of several
controllable process variables:
CO2 Product (Y)   Total Solvent (X1)   Hydrogen Consumption (X2)   Y²   X1²   X2²   X1Y   X2Y   X1X2
36.98 2227.25 2.06 1367.52 4960643 4.2436 82364 76.179 4588.1
13.74 434.90 1.33 188.79 189138 1.7689 5976 18.274 578.4
10.08 481.19 0.97 101.61 231544 0.9409 4850 9.778 466.8
8.53 247.14 0.62 72.76 61078 0.3844 2108 5.289 153.2
36.42 1645.89 0.22 1326.42 2708954 0.0484 59943 8.012 362.1
26.59 907.59 0.76 707.03 823720 0.5776 24133 20.208 689.8
19.07 608.05 1.71 363.66 369725 2.9241 11596 32.610 1039.8
5.96 380.55 3.93 35.52 144818 15.4449 2268 23.423 1495.6
15.52 213.40 1.97 240.87 45540 3.8809 3312 30.574 420.4
56.61 2043.36 5.08 3204.69 4175320 25.8064 115675 287.579 10380.3
229.50 9189.32 18.65 7608.87 13710479 56.0201 312224 511.926 20174.4
Ŷ        e = Y − Ŷ        e²
47.3928 -10.41 108.42633
13.30172 0.44 0.1920925
13.68672 -3.61 13.008438
8.901963 -0.37 0.1383564
34.23853 2.18 4.7588268
21.29525 5.29 28.034412
16.99981 2.07 4.2857005
15.69707 -9.74 94.810435
10.04365 5.48 29.990376
47.9425 8.67 75.125472
229.50 0.00 358.77
1. Fit a multiple linear regression relating CO2 product to total solvent and hydrogen consumption
and calculate the value of R2
2. Test the significance of Regression
3. Test the significance of partial regression coefficients and construct confidence intervals
4. Can we conclude that total solvent and hydrogen consumption are sufficient number of
independent variables for explaining the variability in CO 2 product?
Ȳ = 229.50/10 = 22.95        X̄1 = 9189.32/10 = 918.932        X̄2 = 18.65/10 = 1.865
S(X1,X1) = ΣX1² − (ΣX1)²/n = 5266118.8        S(X2,X2) = ΣX2² − (ΣX2)²/n = 21.238
S(X1,X2) = ΣX1X2 − (ΣX1)(ΣX2)/n = 3036.32
S(X1,Y) = ΣX1Y − (ΣX1)(ΣY)/n = 101329.1        S(X2,Y) = ΣX2Y − (ΣX2)(ΣY)/n = 83.909
D = S(X1,X1) S(X2,X2) − [S(X1,X2)]² = 102633124.2
b1 = [S(X2,X2) S(X1,Y) − S(X1,X2) S(X2,Y)] / D ≈ 0.0185
b2 = [S(X1,X1) S(X2,Y) − S(X1,X2) S(X1,Y)] / D = 134212437.4 / 102633124.2 = 1.31
b0 = Ȳ − b1X̄1 − b2X̄2 = 3.52
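The coefficients can be reproduced from the column totals of the table; small differences from the hand computation above are rounding:

```python
# Column totals from the chemical-process table: n = 10 runs
n = 10
sy, sx1, sx2 = 229.50, 9189.32, 18.65
sy2, sx1sq, sx2sq = 7608.87, 13710479.0, 56.0201
sx1y, sx2y, sx1x2 = 312224.0, 511.926, 20174.4

# Corrected sums of squares and cross-products
s11 = sx1sq - sx1 ** 2 / n
s22 = sx2sq - sx2 ** 2 / n
s12 = sx1x2 - sx1 * sx2 / n
s1y = sx1y - sx1 * sy / n
s2y = sx2y - sx2 * sy / n

d = s11 * s22 - s12 ** 2                   # denominator D
b1 = (s22 * s1y - s12 * s2y) / d           # about 0.0185
b2 = (s11 * s2y - s12 * s1y) / d           # about 1.31
b0 = sy / n - b1 * sx1 / n - b2 * sx2 / n  # about 3.52
print(round(b1, 4), round(b2, 2), round(b0, 2))
```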
ANOVA TABLE
Source Of Variation (S.O.V)   Degrees of Freedom (DF)   Sum of Squares (SS)   Mean Sum of Squares (MSS = SS/df)   Fcal     Ftab
Regression                    2                         1983.07               991.54                              19.35*   F.05(2,7)=4.74
Error                         7                         358.77                51.25
TOTAL                         9                         2341.84
Coefficient of Determination
The co-efficient of determination tells us the proportion of variation in the dependent variable explained
by the independent variables
R² = (Reg.SS / Total SS) × 100 = (1983.07 / 2341.84) × 100 = 84.7%
The value of R² indicates that about 85% of the variation in the dependent variable is explained by the
linear relationship with X1 & X2; the remainder is due to other, unknown factors.
Test of hypothesis for β2
1) Construction of hypotheses
Ho : β2 = 0
H1 : β2 ≠ 0
2) Level of significance
α = 5%
3) TEST STATISTIC
t = (b2 − β2)/SE(b2) = (1.31 − 0)/1.622 = 0.81
where SE(b2) = Se √[ S(X1,X1) / (S(X1,X1) S(X2,X2) − [S(X1,X2)]²) ] = 7.16 √(5266118.8 / 102633124.2) = 1.622
Polynomial Regression
Example:- The data concern time in weeks [X] and the corresponding yield of cotton in kg [Y] per
plot in the specified period.
Put X1 = X and X2 = X²
S(X1,X1) = ΣX1² − (ΣX1)²/n = 82.50
S(X2,X2) = ΣX2² − (ΣX2)²/n = 10510.50
S(X1,Y) = ΣX1Y − (ΣX1)(ΣY)/n = 6860 − (55)(1266)/10 = −103
S(X2,Y) = ΣX2Y − (ΣX2)(ΣY)/n = 45974 − (385)(1266)/10 = −2767
S(X1,X2) = ΣX1X2 − (ΣX1)(ΣX2)/n = 907.50
S(Y,Y) = ΣY² − (ΣY)²/n = 6402.40
D = S(X1,X1) S(X2,X2) − [S(X1,X2)]² = 43560
b1 = [S(X2,X2) S(X1,Y) − S(X1,X2) S(X2,Y)] / D = 32.7932
b2 = [S(X1,X1) S(X2,Y) − S(X1,X2) S(X1,Y)] / D = −3.0947
b0 = Ȳ − b1X̄1 − b2X̄2 = 65.3800
ANALYSIS OF VARIANCE
The hypothesis β1 = β2 = 0 may be tested by the analysis of variance procedure.
Total SS=S(Y,Y)=6402.4
Reg.SS = b1 S(X1,Y) + b2 S(X2,Y) = (32.7932)(−103) + (−3.0947)(−2767) = 5185.3
ANOVA TABLE
Source Of Variation (S.O.V)   Degrees of Freedom (DF)   Sum of Squares (SS)   Mean Sum of Squares (MSS = SS/df)   Fcal     Ftab
Regression (X, X²)            2                         5185.3                2592.7                              14.91*   F.05(2,7)=4.74
Error                         7                         1217.1                173.9
TOTAL                         9                         6402.40
Test of significance of Quadratic regression
1) Construction of hypotheses
Ho : β2 = 0
H1 : β2 ≠ 0
2) Level of significance
α = 5%
3) TEST STATISTIC
t = (b2 − β2)/SE(b2) = (−3.0947 − 0)/0.5738 = −5.39
where SE(b2) = Se √[ S(X1,X1) / (S(X1,X1) S(X2,X2) − [S(X1,X2)]²) ] = 13.19 √(82.50 / 43560) = 0.5738
Coefficient of Determination
The co-efficient of determination tells us the proportion of variation in the dependent variable explained
by the independent variable
R² = (Reg.SS / Total SS) × 100 = (5185.33 / 6402.40) × 100 = 81%
The 2nd degree curve is therefore appropriate for the above data set.
The value of X at which the maximum or minimum of the quadratic regression occurs: X = −b1/(2b2) = 5.30
The maximum or minimum value of Y: b0 − b1²/(4b2) = 152.28
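A sketch reproducing the quadratic fit and its vertex from the corrected sums above:

```python
# Corrected sums from the cotton-yield example (X1 = week, X2 = week^2)
n = 10
s11, s22, s12 = 82.50, 10510.50, 907.50
s1y, s2y = -103.0, -2767.0
y_bar, x1_bar, x2_bar = 126.6, 5.5, 38.5   # means: 1266/10, 55/10, 385/10

d = s11 * s22 - s12 ** 2                   # 43560
b1 = (s22 * s1y - s12 * s2y) / d           # about 32.79
b2 = (s11 * s2y - s12 * s1y) / d           # about -3.09 (negative: curve opens downward)
b0 = y_bar - b1 * x1_bar - b2 * x2_bar     # about 65.38

x_vertex = -b1 / (2 * b2)                  # week of maximum yield, about 5.30
y_vertex = b0 - b1 ** 2 / (4 * b2)         # maximum yield, about 152.3
print(round(b1, 4), round(b2, 4), round(b0, 2), round(x_vertex, 2), round(y_vertex, 1))
```

Because b2 is negative, the vertex is a maximum: yield peaks at about week 5.3 and declines afterwards.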
Comparison of 1st degree and 2nd degree curve
[Scatter plot of yield (y) against week (x1), with the fitted simple linear regression (1st degree) line and
the curvilinear regression (2nd degree) curve overlaid]
CORRELATION ANALYSIS
SIMPLE CORRELATION
Q.1. The following data represent the wing length and tail length of sparrows
Wing length Tail length
(X) (Y) XY X2 Y2
10.4 7.4 76.96 108.16 54.76
10.8 7.6 82.08 116.64 57.76
11.1 7.9 87.69 123.21 62.41
10.2 7.2 73.44 104.04 51.84
10.3 7.4 76.22 106.09 54.76
10.2 7.1 72.42 104.04 50.41
10.7 7.4 79.18 114.49 54.76
10.5 7.2 75.6 110.25 51.84
10.8 7.8 84.24 116.64 60.84
11.2 7.7 86.24 125.44 59.29
10.6 7.8 82.68 112.36 60.84
11.4 8.3 94.62 129.96 68.89
ΣX = 128.2   ΣY = 90.8   ΣXY = 971.37   ΣX² = 1371.31   ΣY² = 688.40
(a) Find Coefficient of Correlation between wing length and Tail length.
(b) Test the hypothesis H 0 : 12 0
Solution
(a) Coefficient of Correlation between wing length and tail length
X̄ = 10.68        Ȳ = 7.57
SXY = ΣXY − nX̄Ȳ = 1.32
SX² = ΣX² − n(X̄)² = 1.72
SY² = ΣY² − n(Ȳ)² = 1.35
r = SXY / √(SX² · SY²) = 0.866
(b) Test of hypothesis for ρ12 = 0
1) Construction of hypotheses:    H0 : ρ12 = 0        H1 : ρ12 ≠ 0
2) Level of significance :    α = 5%
3) TEST STATISTIC:    t = (r12 − ρ12) / SE(r12)
4) Calculation:    tcal = (0.866 − 0)/0.158 = 5.47,  where SE(r12) = √[(1 − r12²)/(n − 2)] = √[(1 − 0.866²)/(12 − 2)] = 0.158
5) Critical Region:-    tTab = t(α/2, n−2) = t(0.025, 10) = 2.228
6) Conclusion:- Since tcal > tTab we reject Ho and conclude that there is a significant linear relationship
between wing and tail length.
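A sketch computing r and the t statistic from the raw column totals; carrying full precision gives r ≈ 0.873 and t ≈ 5.66, which the hand computation's rounded sums (1.32, 1.72, 1.35) turn into 0.866 and 5.47 — the conclusion is unchanged:

```python
import math

# Column totals from the sparrow wing/tail table: n = 12 birds
n = 12
sum_x, sum_y = 128.2, 90.8
sum_xy, sum_x2, sum_y2 = 971.37, 1371.31, 688.40

# Corrected sums of squares and products
s_xy = sum_xy - sum_x * sum_y / n
s_xx = sum_x2 - sum_x ** 2 / n
s_yy = sum_y2 - sum_y ** 2 / n

r = s_xy / math.sqrt(s_xx * s_yy)          # about 0.87
se_r = math.sqrt((1 - r ** 2) / (n - 2))   # SE(r)
t = r / se_r                               # well above t(0.025, 10) = 2.228
print(round(r, 3), round(t, 2))
```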
Q.2. A random sample of 10 families had the following income and expenditure per week
Let Y=Family Expenditure and X=Family Income
Y X Y2 X2 XY
7 20 49 400 140
9 30 81 900 270
8 33 64 1089 264
11 40 121 1600 440
5 15 25 225 75
4 13 16 169 52
8 26 64 676 208
10 38 100 1444 380
9 35 81 1225 315
10 43 100 1849 430
ΣY = 81   ΣX = 293   ΣY² = 701   ΣX² = 9577   ΣXY = 2574
X̄ = 293/10 = 29.3        Ȳ = 81/10 = 8.1
SXY = ΣXY − nX̄Ȳ = 2574 − 10(29.3)(8.1) = 200.70
SX² = ΣX² − n(X̄)² = 9577 − 10(29.3)² = 992.10
SY² = ΣY² − n(Ȳ)² = 701 − 10(8.1)² = 44.90
r12 = SXY / √(SX² · SY²) = 200.70 / √[(992.10)(44.90)] = 200.70 / 211.06 = 0.95
1) Construction of hypotheses:    H0 : ρ12 = 0        H1 : ρ12 ≠ 0
2) Level of significance :    α = 5%
3) TEST STATISTIC:    t = (r12 − ρ12) / SE(r12)
4) Calculation:    tcal = (0.95 − 0)/0.1104 = 8.61,  where SE(r12) = √[(1 − r12²)/(n − 2)] = √[(1 − 0.95²)/(10 − 2)] = 0.1104
5) Critical Region:-    tTab = t(α/2, n−2) = t(0.025, 8) = 2.306
6) Conclusion:- Since tcal > tTab we reject Ho and conclude that there is a significant linear relationship
between family income and expenditure.
Q.3.The following data represent the city size and Expenditure.
Let X=City size Y= Expenditures
X Y X2 Y2 XY
30 65 900 4225 1950
50 77 2500 5929 3850
75 79 5625 6241 5925
100 80 10000 6400 8000
150 82 22500 6724 12300
200 90 40000 8100 18000
175 84 30625 7056 14700
120 81 14400 6561 9720
900 638 126550 51236 74445
Mean:   112.5   79.75   15818.75   6404.5   9305.625
(a) Find Coefficient of Correlation between city size and expenditure.
(b) Test the hypothesis H 0 : 12 0
Solution. (a)
X̄ = 900/8 = 112.50        Ȳ = 638/8 = 79.75
SXY = ΣXY − nX̄Ȳ = 74445 − 8(112.50)(79.75) = 2670.00
SX² = ΣX² − n(X̄)² = 126550 − 8(112.50)² = 25300
SY² = ΣY² − n(Ȳ)² = 51236 − 8(79.75)² = 355.50
r12 = SXY / √(SX² · SY²) = 2670.00 / √[(25300)(355.50)] = 2670.00 / 2999.0 = 0.89
1) Construction of hypotheses:    H0 : ρ12 = 0        H1 : ρ12 ≠ 0
2) Level of significance :    α = 5%
3) TEST STATISTIC:    t = (r12 − ρ12) / SE(r12)
4) Calculation:    tcal = (0.89 − 0)/0.1861 = 4.78,  where SE(r12) = √[(1 − r12²)/(n − 2)] = √[(1 − 0.89²)/(8 − 2)] = 0.1861
5) Critical Region:-    tTab = t(α/2, n−2) = t(0.025, 6) = 2.447
6) Conclusion:- Since tcal tTab so we reject Ho and conclude that there is significant linear relationship
between City size and development expenditure.
PARTIAL CORRELATION
Q.1. :- Suppose that X1 = Fish Length, X2 = Fish Weight, X3 = Fish Age and r12 = 0.60, r13 = 0.70,
r23 = 0.65, n = 15.
(a) Find partial correlation coefficient between X1 and X2 while the effect of X3 kept constant. Or find r12.3 .
(b) Test the hypothesis H 0 : 12.3 0
Solution. (a)
r12.3 = (r12 − r13 r23) / √[(1 − r13²)(1 − r23²)] = [(0.60) − (0.70)(0.65)] / √[(1 − 0.70²)(1 − 0.65²)] = 0.27
2 2
Q.2. :- Suppose that X1 = Fish Length, X2 = Fish Weight, X3 = Fish Age and r12 = 0.60, r13 = 0.70,
r23 = 0.65, n = 15.
(a) Find partial correlation coefficient between X1 and X3 while the effect of X2 kept constant. Or find r13.2 .
(b) Test the hypothesis H 0 : 13.2 0
Solution. (a)
r13.2 = (r13 − r12 r23) / √[(1 − r12²)(1 − r23²)] = [(0.70) − (0.60)(0.65)] / √[(1 − 0.60²)(1 − 0.65²)] = 0.51
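Both partial correlations can be computed in a few lines:

```python
import math

# Pairwise correlations from the fish example
r12, r13, r23 = 0.60, 0.70, 0.65

# Partial correlation of length & weight, with age held constant
r12_3 = (r12 - r13 * r23) / math.sqrt((1 - r13 ** 2) * (1 - r23 ** 2))  # about 0.27

# Partial correlation of length & age, with weight held constant
r13_2 = (r13 - r12 * r23) / math.sqrt((1 - r12 ** 2) * (1 - r23 ** 2))  # about 0.51
print(round(r12_3, 2), round(r13_2, 2))
```

Note how removing the effect of age shrinks the apparent length–weight correlation from 0.60 to 0.27, because age drives both variables.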
MULTIPLE CORRELATION
Q.3. :- Suppose that X1 = Fish Length, X2 = Fish Weight, X3 = Fish Age and r12 = 0.60, r13 = 0.70,
r23 = 0.65, n = 15.
(a) Find Multiple correlation coefficient between X1 and joint effect of X2 and X3.
(b) Find R1.23 and Test the hypothesis H 0 : 1.23 0
Solution. (a)
R1.23 = √[(r12² + r13² − 2 r12 r13 r23) / (1 − r23²)] = √[(0.60² + 0.70² − 2(0.60)(0.70)(0.65)) / (1 − 0.65²)] = 0.73
(b)
1) Construction of hypotheses:    H0 : ρ1.23 = 0        H1 : ρ1.23 ≠ 0
2) Level of significance :    α = 5%
3) TEST STATISTIC:    F = [R1.23² / k] / [(1 − R1.23²) / (n − k − 1)]
4) Calculation:    Fcal = [(15 − 2 − 1)(0.73)²] / [2(1 − (0.73)²)] = 6.85
5) Critical Region:-    FTab = F(0.05; k, n−k−1) = F(0.05; 2, 12) = 3.89
6) Conclusion:- Since Fcal > FTab we reject Ho and conclude that there is a significant linear relationship
between X1 and the joint effect of X2 and X3.
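A sketch of the multiple correlation and its F test; carrying the unrounded R² = 0.5264 gives Fcal ≈ 6.67 rather than 6.85, which does not change the decision:

```python
import math

# Pairwise correlations from the fish example
r12, r13, r23 = 0.60, 0.70, 0.65
n, k = 15, 2   # k = number of explanatory variables (X2, X3)

# Squared multiple correlation of X1 with the joint effect of X2 and X3
r2 = (r12 ** 2 + r13 ** 2 - 2 * r12 * r13 * r23) / (1 - r23 ** 2)
R = math.sqrt(r2)                          # about 0.73

# F test of H0: rho_1.23 = 0, with (k, n-k-1) degrees of freedom
f = ((n - k - 1) * r2) / (k * (1 - r2))    # compare with F(0.05; 2, 12) = 3.89
print(round(R, 2), round(f, 2))
```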