x                               y
Age of a plant                  Quantity of fruit produced
Height of students              Weight of students
Weight at the end of a spring   Length of the spring
Diameter of stem of a plant     Average length of leaf of the plant
No. of hrs spent studying       Marks achieved
Time                            Temperature of cooling object
Scatter Diagram
The most common and convenient method of displaying a set of bivariate data is by means of a scatter diagram.
We treat the bivariate pairs as a set of (x, y) coordinates and plot them as a graph to obtain a set of points. The scatter diagram will reveal the relationship between the two variables.
Eg 2 The marks of a class of 10 students in a Mathematics examination are given in the table.
Student                A   B   C   D   E   F   G   H   I   J
x (mark in Paper 1)    12  84  50  42  33  50  69  81  50  [illegible]
y (mark in Paper 2)    [partly illegible; legible values: 73, 40, 31, 83, 42, 60, 63, 59, 92]
GC:
Step 1: Enter data
<STAT> <EDIT> <ENTER>
Before plotting,
. go to the "Y=" screen and unbold all "=" signs
Step 2: Plot the data (STAT PLOT)
. Set Plot to 'ON'
  <1:Plot 1...> <ENTER> <ON> <ENTER>
. Choose type of graph 'scatter plot'
. Xlist: L1 (x-coordinates)
. Ylist: L2 (y-coordinates)
. Mark: Any
. <TRACE>
. <ZOOM> <9:ZoomStat>
  (to view full plot)
[Scatter diagram of the marks in Eg 2: y against x, both axes running from 0 to 100]
Do it yourself
Qn 1: The height and weight of a class of 10 students are given in the table below:
Student            A     B     C     D     E     F     G     H     I     J
x (Height in m)    1.5   1.58  1.6   1.61  1.65  1.72  1.73  1.78  1.8   1.85
y (Weight in kg)   53    57    62    65    66    70    72    75    90    85
Soln:
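Analysis of Scatter Diagram
[Scatter diagram: points rising from lower left to upper right]
x and y related in this way are said to have a positive correlation (linear relationship),
i.e. as x gets larger, y gets larger.
[Scatter diagram: points falling from upper left to lower right]
x and y related in this way are said to have a negative correlation (linear relationship),
i.e. as x gets larger, y gets smaller.
[Scatter diagram: points scattered with no pattern]
x and y related in this way are said to have no correlation (no clear relationship).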
We will only be dealing with linear relationships. If points in the scatter diagram seem to lie near a ...
Note:
. Scatter diagrams are used only for quantitative variables (e.g. height, mass, counts, etc).
. Interpretation of the strength solely based on the scatter diagram is subjective and it can be deceiving when different scales for the axes are used.
To measure the degree of linear relationship between two variables x and y (which is called correlation), a quantity called the product moment correlation coefficient, r, will be needed.
The estimated product-moment correlation coefficient of a sample is given by:
$$r = \frac{\sum xy - \frac{\sum x \sum y}{n}}{\sqrt{\left(\sum x^2 - \frac{(\sum x)^2}{n}\right)\left(\sum y^2 - \frac{(\sum y)^2}{n}\right)}}$$
r > 0: positive correlation
E.g. The correlation between the two sets of variables is said to be curvilinear. There is no linear correlation and such a scatter diagram will give a very low value of r (i.e. r ≈ 0). But there is a curvilinear kind of correlation, or quadratic correlation, between the variables.
6. r is a measure of the degree of scatter and is independent of the units in which the data is measured. r is unaffected by changes in the scale of the axes and changes of units of the variables.
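The two remarks above can be checked numerically. The following is a minimal Python sketch, not part of the original notes: the function name `pmcc` and all data values are hypothetical and chosen only for illustration. It shows r staying the same after a change of units, and r coming out as 0 for a purely quadratic relationship.

```python
from math import sqrt

def pmcc(xs, ys):
    """Product moment correlation coefficient r of paired data."""
    n = len(xs)
    sx, sy = sum(xs), sum(ys)
    s_xy = sum(x * y for x, y in zip(xs, ys)) - sx * sy / n
    s_xx = sum(x * x for x in xs) - sx ** 2 / n
    s_yy = sum(y * y for y in ys) - sy ** 2 / n
    return s_xy / sqrt(s_xx * s_yy)

# Hypothetical heights (m) and weights (kg), for illustration only.
heights_m = [1.50, 1.60, 1.65, 1.72, 1.80]
weights_kg = [53, 62, 66, 72, 90]

r_metric = pmcc(heights_m, weights_kg)
# Change of units: metres -> centimetres, kilograms -> pounds.
r_other = pmcc([100 * h for h in heights_m], [2.2046 * w for w in weights_kg])

# Curvilinear (quadratic) relationship, symmetric about x = 0.
xs = [-3, -2, -1, 0, 1, 2, 3]
r_quadratic = pmcc(xs, [x * x for x in xs])

print(r_metric, r_other, r_quadratic)
```

The first two printed values should agree up to rounding error, while the third is zero even though y is completely determined by x.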
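Eg 3a
The marks in Mathematics (x) and Chemistry (y) obtained by ten randomly chosen JC 2 students were taken and the summarised data were given as follows
Find the product moment correlation coefficient r and comment on the value of r obtained.
Soln:
$$r = \frac{\sum xy - \frac{\sum x \sum y}{n}}{\sqrt{\left(\sum x^2 - \frac{(\sum x)^2}{n}\right)\left(\sum y^2 - \frac{(\sum y)^2}{n}\right)}} =$$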
Eg 3b
The data in the above example is given in the table below instead of the summarised statistics. Find the product moment correlation coefficient r and comment on the value of r obtained.
x   18  20  30  40  46  54  60  80  88  92
y   42  54  60  54  62  68  80  66  80  100
Soln:
Use of GC to obtain r
Step 1: Key in the data using <STAT> <EDIT>
[GC screen: list editor with the x values in L1 and the y values in L2; L1(1) = 18]
Step 2: Turn diagnostics on
<CATALOG> <DiagnosticOn> <ENTER>
Step 3: <STAT> <CALC> <8:LinReg(a+bx)> <LIST> <NAMES> <L1> <,> <LIST> <NAMES> <L2> <ENTER>
[GC screen: LinReg y=a+bx]
r=.8626339159
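As a cross-check on the GC output, the same value of r can be obtained from the definition. This is a small Python sketch rather than part of the handout; it assumes the two hard-to-read x entries in the Eg 3b table are 46 and 92, and the variable names are illustrative.

```python
from math import sqrt

# Marks from the Eg 3b table: Mathematics (x) and Chemistry (y).
# Assumption: the two unclear x entries are read as 46 and 92.
x = [18, 20, 30, 40, 46, 54, 60, 80, 88, 92]
y = [42, 54, 60, 54, 62, 68, 80, 66, 80, 100]
n = len(x)

sum_x, sum_y = sum(x), sum(y)
sum_xy = sum(a * b for a, b in zip(x, y))
sum_x2 = sum(a * a for a in x)
sum_y2 = sum(b * b for b in y)

s_xy = sum_xy - sum_x * sum_y / n
s_xx = sum_x2 - sum_x ** 2 / n
s_yy = sum_y2 - sum_y ** 2 / n
r = s_xy / sqrt(s_xx * s_yy)

print(sum_x, sum_y, sum_xy)  # summary statistics
print(r)                     # compare with the value of r shown by the GC
```

Working from the sums rather than from the deviations mirrors the formula used in the notes, so each intermediate quantity can be compared with a manual calculation.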
Do it yourself
Qn 2: The height and weight of a class of 10 students are given in the table below:
Find the product moment correlation coefficient r and comment on the value of r obtained.
Student            A     B     C     D     E     F     G     H     I     J
x (Height in m)    1.5   1.58  1.6   1.63  1.65  1.72  1.73  1.78  1.8   1.85
y (Weight in kg)   53    57    62    65    66    70    72    75    90    85
Soln:
3. Regression Lines
r measures how well the data fits a linear model. If the fit is good, we can consider formulating an equation of a straight line to model the relationship. This straight line is called a regression line.
(a) For any bivariate set of data, connecting variables x and y, there are always two uniquely defined regression lines.
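Equation of the least squares regression line of y on x
y = a + bx
y = a + bx is obtained by finding values of a and b such that $\sum e^2$ is minimum. (e is the difference between the observed and expected y, also known as the residual.)
$$b = \frac{\sum (x - \bar{x})(y - \bar{y})}{\sum (x - \bar{x})^2} = \frac{\sum xy - \frac{\sum x \sum y}{n}}{\sum x^2 - \frac{(\sum x)^2}{n}} \quad \text{and} \quad a = \bar{y} - b\bar{x}$$
and $y - \bar{y} = b(x - \bar{x})$ (in MF15)
Thus, $y = \bar{y} + b(x - \bar{x})$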
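The formulas for a and b can be tried out directly. The sketch below is illustrative only: the helper names `regression_y_on_x` and `sum_squared_residuals` and the five data pairs are hypothetical, not taken from the notes. It computes a and b, then checks that moving a away from its least squares value increases the sum of squared residuals.

```python
def regression_y_on_x(xs, ys):
    """Least squares estimates (a, b) for the line y = a + bx."""
    n = len(xs)
    x_bar, y_bar = sum(xs) / n, sum(ys) / n
    s_xy = sum(x * y for x, y in zip(xs, ys)) - sum(xs) * sum(ys) / n
    s_xx = sum(x * x for x in xs) - sum(xs) ** 2 / n
    b = s_xy / s_xx
    a = y_bar - b * x_bar
    return a, b

def sum_squared_residuals(xs, ys, a, b):
    """Sum of e^2, where e = observed y - expected y."""
    return sum((y - (a + b * x)) ** 2 for x, y in zip(xs, ys))

# Hypothetical data, for illustration only.
xs = [1, 2, 3, 4, 5]
ys = [2, 4, 5, 4, 5]

a, b = regression_y_on_x(xs, ys)
best = sum_squared_residuals(xs, ys, a, b)
# Nudging a away from its least squares value should increase the sum.
worse = sum_squared_residuals(xs, ys, a + 0.1, b)
print(a, b, best, worse)
```

For this data the line passes through the mean point (3, 4), in line with the form $y - \bar{y} = b(x - \bar{x})$.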
Note:
. $\bar{x} = \frac{\sum x}{n}$, $\bar{y} = \frac{\sum y}{n}$
. Regression line passes through $(\bar{x}, \bar{y})$, the mean of the set of bivariate data.
Eg 4a The marks in Mathematics (x) and Chemistry (y) obtained by ten randomly chosen JC 2 students were taken and the data were given as follows
x   18  20  30  40  46  54  60  80  88  92
y   42  54  60  54  62  68  80  66  80  100
Find the equation of the estimated regression line of y on x.
Soln:
Use of GC to obtain regression line
Step 1: Key in the data using <STAT> <EDIT>
[GC screen: list editor with the x values in L1 and the y values in L2; L1(1) = 18]
Step 2: Turn diagnostics on
<CATALOG> <DiagnosticOn> <ENTER>
Step 3: <STAT> <CALC> <8:LinReg(a+bx)> <LIST> <NAMES> <L1> <,> <LIST> <NAMES> <L2> <ENTER>
[GC screen: LinReg y=a+bx, with a = 38.7 and b = 0.528 (to 3 s.f.)]
r=.8626339159
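Eg 4b Suppose that the table in Eg 4a is not given and the data is summarised as follows
$$b = \frac{\sum xy - \frac{\sum x \sum y}{n}}{\sum x^2 - \frac{(\sum x)^2}{n}} = \frac{38640 - \frac{(528)(666)}{10}}{\sum x^2 - \frac{(528)^2}{10}} = 0.528$$
(b) Slope: For an increase of 1 in the Mathematics score, there is an increase of 0.528 in the Chemistry score.
y-intercept: A student is estimated to score 38.7 for Chemistry when he/she scores 0 for Mathematics.
$$r = \frac{\sum xy - \frac{\sum x \sum y}{n}}{\sqrt{\left(\sum x^2 - \frac{(\sum x)^2}{n}\right)\left(\sum y^2 - \frac{(\sum y)^2}{n}\right)}} = \frac{38640 - \frac{(528)(666)}{10}}{\sqrt{\left(\sum x^2 - \frac{(528)^2}{10}\right)\left(\sum y^2 - \frac{(666)^2}{10}\right)}}$$
Since r ≈ 0.863, it indicates a high positive linear correlation between Mathematics and Chemistry scores. Hence, the predicted score is reliable.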
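The calculation from summarised data can be sketched in Python as follows. This is an illustration under stated assumptions: n, Σx, Σy and Σxy are the values appearing in the working above, while Σx² = 34464 and Σy² = 46820 are not legible in the working and are computed here from the table of marks (with its two unclear x entries read as 46 and 92).

```python
from math import sqrt

# Summary statistics for the Mathematics (x) and Chemistry (y) marks.
n, sum_x, sum_y, sum_xy = 10, 528, 666, 38640
# Assumption: these two sums are derived from the table of marks,
# not quoted from the summarised data.
sum_x2, sum_y2 = 34464, 46820

s_xy = sum_xy - sum_x * sum_y / n
s_xx = sum_x2 - sum_x ** 2 / n
s_yy = sum_y2 - sum_y ** 2 / n

b = s_xy / s_xx                      # slope of the regression line of y on x
a = sum_y / n - b * sum_x / n        # y-intercept
r = s_xy / sqrt(s_xx * s_yy)         # product moment correlation coefficient

print(f"y = {a:.3g} + {b:.3g}x, r = {r:.3g}")
```

The rounded values of a, b and r can then be compared with 38.7, 0.528 and 0.863 quoted in the solution.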
Do it yourself
Qn 3:
The no. of hours spent studying for a particular subject in a week and the marks obtained for a test for 10 students are given in the table below:
x (No. of hours per week)   5   7   8   10  11  12  13  15  20  21
(a) Find the equation of the estimated regression line of y on x.
(b) Interpret the slope and y-intercept in the context of the question.
(c) Estimate the no. of hours a student needs to spend in order to achieve a mark of 80 in the test. Comment on the reliability of the value obtained.
Soln:
Equation of the least squares regression line of x on y
x = c + dy
[Scatter diagram with the least squares regression line of x on y, x = c + dy, passing through $(\bar{x}, \bar{y})$]
x = c + dy is obtained by finding values of c and d such that $\sum e^2$ is minimum.
(e is the difference between the observed and expected x, also known as residuals)
$$d = \frac{\sum (x - \bar{x})(y - \bar{y})}{\sum (y - \bar{y})^2} = \frac{\sum xy - \frac{\sum x \sum y}{n}}{\sum y^2 - \frac{(\sum y)^2}{n}} \quad \text{and} \quad c = \bar{x} - d\bar{y}$$
and $x - \bar{x} = d(y - \bar{y})$ (in MF15)
Thus, $x = \bar{x} + d(y - \bar{y})$
Note:
. $\bar{x} = \frac{\sum x}{n}$, $\bar{y} = \frac{\sum y}{n}$
. Regression line passes through $(\bar{x}, \bar{y})$, the mean of the set of bivariate data.
. d is known as the estimated regression coefficient (slope of graph).
[Diagram: the regression line of y on x and the regression line of x on y drawn on the same axes]
The larger the numerical value of r, the nearer the lines approach coincidence and ...
No linear correlation: r = 0
Eg 5a Find the regression lines of y on x and x on y for the data below and also calculate the product moment correlation coefficient.
y   10  14  12  13  15  12  13
Soln:
Step 1: Place x values in L1 and y values in L2.
<STAT> <EDIT>
Step 2: To get product moment correlation coefficient.
<CATALOG> <DiagnosticOn> <ENTER>
Step 3: To get regression line of y on x.
<STAT> <CALC> <8:LinReg(a+bx)> <LIST> <NAMES> <L1> <,> <LIST> <NAMES> <L2> <,> <VARS> <Y-VARS> <1:FUNCTION> <Y1> <ENTER>
[GC screen: LinReg(a+bx) L1,L2,Y1]
y=a+bx
a=11.70403587
b=.1860986547
r²=.1430202624
r=.3781801983
y = 11.7 + 0.186x
Step 4: To get regression line of x on y.
<STAT> <CALC> <8:LinReg(a+bx)> <LIST> <NAMES> <L2> <,> <LIST> <NAMES> <L1> <,> <VARS> <Y-VARS> <1:FUNCTION> <Y2> <ENTER>
[GC screen: LinReg(a+bx) L2,L1,Y2]
y=a+bx
a=-4.342592593
b=.7685185185
r²=.1430202624
r=.3781801983
x = -4.34 + 0.769y
∴ y = (x + 4.34)/0.769; store this as Y2
Note: The above regression lines are stored in Y1, Y2 respectively so that the regression lines can be obtained graphically (not really a must-do).
Soln:
Regression line y on x is
Regression line x on y is
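Eg 5b Find the regression lines of y on x and x on y for the data below and also calculate the product moment correlation coefficient.
$\sum x = 38$, $\sum y = 89$, $\sum x^2 = 270$, $\sum y^2 = 1147$, $\sum xy = 495$, $n = 7$
Soln:
$\bar{x} = \frac{38}{7} = 5.43$, $\bar{y} = \frac{89}{7} = 12.7$
$$b = \frac{\sum xy - \frac{\sum x \sum y}{n}}{\sum x^2 - \frac{(\sum x)^2}{n}} = \frac{495 - \frac{(38)(89)}{7}}{270 - \frac{(38)^2}{7}} = 0.186$$
$$d = \frac{\sum xy - \frac{\sum x \sum y}{n}}{\sum y^2 - \frac{(\sum y)^2}{n}} = \frac{495 - \frac{(38)(89)}{7}}{1147 - \frac{(89)^2}{7}} = 0.769$$
$$r = \frac{\sum xy - \frac{\sum x \sum y}{n}}{\sqrt{\left(\sum x^2 - \frac{(\sum x)^2}{n}\right)\left(\sum y^2 - \frac{(\sum y)^2}{n}\right)}} = \frac{495 - \frac{(38)(89)}{7}}{\sqrt{\left[270 - \frac{38^2}{7}\right]\left[1147 - \frac{89^2}{7}\right]}} = 0.378$$
(Compare these answers with those you obtained using GC)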
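For comparison with the GC, the manual working can be reproduced in a few lines of Python. This is a sketch only: the summary statistics are those of Eg 5b with the partly illegible Σx² read as 270 (an assumption), and the variable names are illustrative. It also shows that the product of the two regression coefficients equals r².

```python
from math import sqrt

# Summary statistics of Eg 5b (n = 7).
# Assumption: the partly illegible value of sum of x^2 is read as 270.
n, sum_x, sum_y = 7, 38, 89
sum_x2, sum_y2, sum_xy = 270, 1147, 495

s_xy = sum_xy - sum_x * sum_y / n
s_xx = sum_x2 - sum_x ** 2 / n
s_yy = sum_y2 - sum_y ** 2 / n
x_bar, y_bar = sum_x / n, sum_y / n

b = s_xy / s_xx              # regression coefficient of y on x
a = y_bar - b * x_bar
d = s_xy / s_yy              # regression coefficient of x on y
c = x_bar - d * y_bar
r = s_xy / sqrt(s_xx * s_yy)

print(f"y on x: y = {a:.3g} + {b:.3g}x")
print(f"x on y: x = {c:.3g} + {d:.3g}y")
print(f"r = {r:.3g}, b*d = {b * d:.4f}, r^2 = {r * r:.4f}")
```

Because b·d = r², the two regression lines have reciprocal slopes in the x-y plane only when r = ±1, which is why they move closer together as the numerical value of r increases.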
Eg 6
Soln:
[Scatter diagram showing the regression line of y on x and an outlier $(x_r, y_r)$]
. Identify the outlier data pair $(x_r, y_r)$
. Remove data $(x_r, y_r)$ from GC
. Recalculate the correlation coefficient for the revised data
. Recalculate the line of regression of y on x for the revised data.
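The steps above can be sketched in Python. Everything here is hypothetical: the data points are made up for illustration, the helper `fit` is not from the notes, and picking the point with the largest residual is just one simple way of flagging an outlier (in the notes it is identified from the scatter diagram).

```python
from math import sqrt

def fit(points):
    """Return (a, b, r) for the least squares line y = a + bx."""
    n = len(points)
    sx = sum(x for x, _ in points)
    sy = sum(y for _, y in points)
    s_xy = sum(x * y for x, y in points) - sx * sy / n
    s_xx = sum(x * x for x, _ in points) - sx ** 2 / n
    s_yy = sum(y * y for _, y in points) - sy ** 2 / n
    b = s_xy / s_xx
    return sy / n - b * sx / n, b, s_xy / sqrt(s_xx * s_yy)

# Hypothetical data; the last pair is deliberately out of line with the rest.
data = [(1, 2.1), (2, 3.9), (3, 6.2), (4, 8.1), (5, 9.8), (6, 3.0)]

a0, b0, r_before = fit(data)
# Assumed heuristic: treat the pair with the largest |residual| as the outlier.
outlier = max(data, key=lambda p: abs(p[1] - (a0 + b0 * p[0])))
revised = [p for p in data if p != outlier]
a1, b1, r_after = fit(revised)

print(outlier, r_before, r_after)
print(f"revised line: y = {a1:.3g} + {b1:.3g}x")
```

Removing a single outlying pair can change both r and the regression line substantially, which is why the coefficient and the line are recalculated for the revised data.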
4. Interpolation and Extrapolation
Once the regression lines are found, we can use them for interpolation.
Extrapolation of the sample should be used with caution as the relationship between x and y may ...
Eg 7 (continued from Eg 5a) In the above example 5a, find the value of
(i) y when x = 5 (interpolation - within the range of x)
(ii) x when y = 5 (extrapolation - outside the range of y)
Soln:
(i) From GC, when x = 5, y = 12.6 using the y on x regression line (Y1 graph)
(ii) From GC, when y = 5, x = -0.5 using the x on y regression line (Y2 graph)
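The distinction between interpolation and extrapolation amounts to a range check on the value being substituted. A minimal Python sketch follows; the function `predict` is hypothetical, the coefficients are the rounded ones from the line x = -4.34 + 0.769y of Eg 5a, and the observed y values are taken to run from 10 to 15 as in that example's table.

```python
def predict(intercept, slope, value, observed_min, observed_max):
    """Evaluate intercept + slope * value and classify the estimate.

    The estimate is an interpolation if value lies within the observed
    range of the independent variable, otherwise an extrapolation.
    """
    inside = observed_min <= value <= observed_max
    kind = "interpolation" if inside else "extrapolation"
    return intercept + slope * value, kind

# Regression line of x on y from Eg 5a (rounded coefficients).
c, d = -4.34, 0.769
y_min, y_max = 10, 15  # range of the observed y values

x_at_5, kind_at_5 = predict(c, d, 5, y_min, y_max)
x_at_12, kind_at_12 = predict(c, d, 12, y_min, y_max)

print(x_at_5, kind_at_5)    # y = 5 lies outside the data
print(x_at_12, kind_at_12)  # y = 12 lies inside the data
```

An estimate flagged as an extrapolation should be treated with caution, since the linear relationship has only been observed within the range of the data.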
Eg 8 The ages, x years and heights, y cm, of 10 boys were given as follows:
Soln:
(i) Linear correlation coeff. bet. x & y
$$r = \frac{\sum xy - \frac{\sum x \sum y}{n}}{\sqrt{\left(\sum x^2 - \frac{(\sum x)^2}{n}\right)\left(\sum y^2 - \frac{(\sum y)^2}{n}\right)}}, \quad n = 10$$
(ii) Eqn. of regression line of y on x:
(iii)
Eg 9
The average densities of blackbirds (in pairs per thousand hectares) over very large areas of farmland and of woodland are shown, for the years 1976 to 1982, in the table below.
Year   1976  1977  1978  1979  1980  1981  1982
Soln: (i)
Manual method
∴ y = 11.7 + 0.226x
As extrapolation is being carried out in this case, the linear correlation may not be valid outside of the range of values. Hence, the estimate is unreliable.
Do it yourself
Qn 4:
The no. of hours spent studying for a particular subject in a week and the marks obtained for a test for 10 students are given in the table below:
Student                     A   B   C   D   E   F   G   H   I   J
x (No. of hours per week)   5   7   8   10  11  12  13  15  20  21
y (Mark)                    55  60  62  63  66  73  74  75  84  89
(d) Estimate the no. of hours a student needs to spend in order to achieve full marks in the test. Comment on the reliability of the value obtained.
Soln:

Obtain the least square estimates for α and β using an equation of the form
(i) y = α + β log10 x and
(ii) y = α + βx²
as a fit for the set of data shown above.
Determine which equation is a better fit, giving reasons to support your answer.
Soln: (i) y = α + β log10 x
Key in the data for x, y and z into L1, L2 and L3 respectively using <STAT> <EDIT>
[GC screen: lists L1, L2 and L3, with L3 = log(L1) and L3(1) = .748188027...]
(ii) y = α + βx²
Key in the data for x, y and z into L1, L2 and L3 respectively using <STAT> <EDIT>
[GC screen: lists L1, L2 and L3, with L3 holding the values of x²]
Using GC,
Since the correlation coefficient for part (i) is larger than that in part (ii), there is a much better positive linear correlation. Therefore, y = α + β log10 x is a better fit.
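The comparison of the two models can be sketched in Python by transforming x and computing r between the transformed values and y. The data below are hypothetical and are not the data of this example; `pmcc` is an illustrative helper, not a GC command.

```python
from math import log10, sqrt

def pmcc(us, vs):
    """Product moment correlation coefficient of paired data."""
    n = len(us)
    su, sv = sum(us), sum(vs)
    s_uv = sum(u * v for u, v in zip(us, vs)) - su * sv / n
    s_uu = sum(u * u for u in us) - su ** 2 / n
    s_vv = sum(v * v for v in vs) - sv ** 2 / n
    return s_uv / sqrt(s_uu * s_vv)

# Hypothetical data for illustration only.
x = [2, 4, 6, 8, 10, 12]
y = [5.1, 8.0, 9.9, 11.0, 12.1, 12.8]

# Model (i): y = alpha + beta*log10(x), so correlate y with z = log10(x).
r_log = pmcc([log10(v) for v in x], y)
# Model (ii): y = alpha + beta*x^2, so correlate y with z = x^2.
r_square = pmcc([v * v for v in x], y)

better = "log10 model" if abs(r_log) > abs(r_square) else "square model"
print(r_log, r_square, better)
```

The model whose transformed variable gives the numerically larger r is the one for which a straight line in z describes the data better.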
6. Miscellaneous Examples
Eg 11
A random sample of eight pairs of values of x and y is used to obtain the following equations of the regression lines of y on x and of x on y, respectively.
$$y = -\frac{7}{10}x + \frac{151}{10}$$
$$x = -\frac{7}{6}y + 20$$
Seven pairs of data are given in the table.
x   10  11  12  11  17  14  19
y   9   8   7   6   5   4   1
Find the 8th pair of values of (x, y). Determine the value of the product moment correlation coefficient and comment on what its value implies about the 2 regression lines given above.
Let $\hat{y}$ be the value obtained by substituting a sample value of x into the equation of the regression line of y on x. Evaluate $\hat{y}$ for each of the eight values of x and verify that $\sum (y - \hat{y})^2 = 8.8$.
For each of the sample values of x, y' is given by y' = a + bx, where $a \neq \frac{151}{10}$, $b \neq -\frac{7}{10}$. What can you say about the value of $\sum (y - y')^2$?
Soln:
$y = -\frac{7}{10}x + \frac{151}{10}$ ...... (1)
$x = -\frac{7}{6}y + 20$ ...... (2)
Using GC, r =
Since r = -0.904, which is very close to -1, it indicates a high negative linear correlation between x and y. Hence the regression lines are very close.
x                                         10   11   12   11   17   14   19   10
y                                         9    8    7    6    5    4    1    8
$\hat{y} = -\frac{7}{10}x + \frac{151}{10}$   8.1  7.4  6.7  7.4  3.2  5.3  1.8  8.1
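The verification asked for in this example can be sketched in Python. The data are read from the solution table, with the eighth pair taken to be (10, 8); that reading is an assumption based on the partly legible table, and the helper `sse` is illustrative.

```python
from math import sqrt

# Eight data pairs; the eighth pair (10, 8) is an assumed reading.
x = [10, 11, 12, 11, 17, 14, 19, 10]
y = [9, 8, 7, 6, 5, 4, 1, 8]

def sse(a, b):
    """Sum of (y - (a + b*x))^2 over the sample."""
    return sum((yi - (a + b * xi)) ** 2 for xi, yi in zip(x, y))

# Regression line of y on x: y = 151/10 - (7/10)x.
best = sse(15.1, -0.7)

# Any other choice of a and b should give a larger sum of squares.
others = [
    sse(15.1 + da, -0.7 + db)
    for da in (-0.5, 0.0, 0.5)
    for db in (-0.05, 0.0, 0.05)
    if (da, db) != (0.0, 0.0)
]

# r^2 is the product of the two regression coefficients; r takes their sign.
r = -sqrt((-7 / 10) * (-7 / 6))

print(best, min(others), r)
```

Since the regression line of y on x minimises the sum of squared residuals, the sum for any other line y' = a + bx is greater than 8.8.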
Eg 12
The daily rate charged by a car-hire firm varies with the length of the hire period. The firm's ...
x (Days)
y (Daily Rate $)   149  119  115  112  109  105  103  101
For the appropriate model, calculate the least squares estimates of a and b. Find also the product moment correlation coefficient and comment on the suitability of the model.
Soln:
. Enter the data into the GC as two lists (say x and y)
. With the command DiagnosticOn, on the Home Screen, find the regression equation.
(Follow previous example to find the regression equation)
Your screen shot should look like this:
LinReg
y=ax+b
a=-.4986649635
b=128.5658301
r²=.317465457
r=-.5634407309
(i) The scatter plot of y and x shows that the relationship between y and x is non-linear. Moreover, the r value indicates a low negative linear correlation. Hence, the regression line of y on x is not suitable.
(ii) It can be easily identified as C since y tends to a limit for larger values of x.
Take z = 1/x, so that y = a + b/x becomes y = a + bz. Draw the regression line y on z.
The screen should look like this:
b=47.87...
r²=.9837837183
r=.9918587189
Now r = 0.992 which is close to 1. Therefore there is a very high positive linear correlation which implies that the model is suitable.
Eg 13 Research is being carried out into how the concentration of a drug in the bloodstream varies with time, measured from when the drug is given. Observations at successive times give the data shown in the following table.
Time (t minutes)                         90
Concentration (x micrograms per litre)
It is given that the value of the product moment correlation coefficient for this data is -0.912, correct to 3 decimal places. The scatter diagram for the data is shown below.
[Scatter diagram of concentration x (0 to 100) against time t in minutes (0 to 350)]
Soln:
Equation of the regression line of x on t:
When t = 300, x =
It is not a suitable model as the concentration cannot be a negative value.
Y = 4.62 - 0.0123t
As r is close to -1, the regression lines of Y on t and t on Y are almost identical, therefore we can use Y on t to estimate t.