

Statistical Tools

Manfred te Grotenhuis
Theo van der Weegen

Original Dutch edition: Statistiek als hulpmiddel, Assen: Koninklijke Van Gorcum
Preface
Statistical Tools
Statistical Data
Introduction
Four Levels of Measurement
Selecting Units of Analysis: Random Sampling
Collecting Statistical Data
Data Quality
From Collecting Data to Answering Research Questions
Descriptive Statistics
Introduction
2.2 Graphical Description of a Single Variable
Bar Chart
Pie Chart
Histogram
Stem-and-leaf Plot
2.3 Numerical Description of a Single Variable
Frequency Table
2.3.1 Measures of Central Tendency
Mode
Median
Mean
Measures of Variability
Range
Interquartile Range (IQR)
Detecting Outliers with Box Plots
Standard Deviation and Variance
Measures of Relative Standing
Percentiles
Z-scores
Chebyshev's Rule and Empirical Rule
Statistical Relations between Two Variables
Graphical Description of a Bivariate Relation
Box Plot
Scatter Plot
Line Graph
2.5 Summary

Inferential Statistics

There are many textbooks on statistics. Although many of them are introductory, they often cover a lot of statistical ground, resulting in massive volumes. This textbook has only 120 pages and does not have statistical theory as its main theme. Instead, Statistical Tools intends for students in the social sciences to become familiar with commonly used statistical applications.

Quantitative data analysis is common practice in the social sciences, and knowledge about statistics is therefore essential. This, however, does not necessarily mean that students need advanced knowledge of mathematics. We think it is more important to have a thorough understanding of the practical applications and the interpretation of the statistical results. Consequently, no mathematical knowledge is required to understand this book.

All statistical applications are exemplified using data from current research in the social sciences. Due to its popularity, we use the computer program SPSS to produce all statistical outcomes.

We would like to express our gratitude to Rob Eisinga, Bert Felling, Nan Dirk de Graaf, Ariana Need, and Peer Scheepers for providing all relevant statistical data collected in the Netherlands during 1979-2005.

Rense Nieuwenhuis greatly helped in translating the original Dutch textbook Statistiek als hulpmiddel and helped build the supporting website: www.ru.nl/mt/statistics/home. Special thanks also to Matthew Bennett for correcting our initial manuscripts and for providing indispensable advice about (Oxford) English usage.

We extend special thanks to our students from Radboud University Nijmegen, who contributed to the improvement of our lecture materials over the last ten years that now find themselves bundled here.

Finally, we would like to thank Hans Schmeets and Peer Scheepers for their efforts in making Statistical Tools come alive.

Manfred te Grotenhuis
Theo van der Weegen
Radboud University Nijmegen, The Netherlands
tool (tool)
1. A device, such as a saw, used to perform or facilitate manual or mechanical work.
2. A machine, such as a lathe, used to cut and shape.
3. Something regarded as necessary to the carrying out of one's occupation or profession.

Statistical Tools
Statistical science comes in all shapes and forms. Nonetheless, it is often associated with its more complex aspects, like probability theory. As a consequence, people often think of statistics as something quite difficult. For students in the field of the social sciences (e.g., anthropology and sociology), statistical knowledge is typically not an end in itself, but a practical means to help answer research questions. Therefore, it does not make much sense to teach these students how to derive various complex formulas or to teach the fundamentals of statistics at great length. There have been (and still are) courses in statistics that focus on the fundamentals. However, as a result, students may attain a deeper understanding of statistical theory, but lack the ability to apply this knowledge in a practical research setting, analogous to receiving a driver's license for demonstrating competence in repairing a gearbox.

Thus, Statistical Tools does not focus on complex statistical theory (interested readers can find additional information in our endnotes), but on the practical applicability of statistics. Using datasets from recent research, we illustrate how statistics can be an indispensable tool in social science research. Our main goal is not to provide students with exhaustive statistical knowledge, but we do hope that this book contributes to the proper use of a variety of statistical tools that help in answering questions arising from the research process.
Chapter one discusses quantitative data that are often collected using random samples. Since many data collections are available through the Internet, a short overview is given of where these data can be found.

Chapter two covers important topics in descriptive statistics, focusing on how large quantities of data can be summarized in a concise manner. These summaries can be made graphically using charts, such as a bar chart or histogram, or numerically using measures like the mean and the standard deviation. Various ways of describing statistical relations between two variables, which are of particular interest in social science research, will also be illustrated.
Chapter three deals with inferential statistics, providing answers on how to draw conclusions about a population when only information on a small part of that population (a sample) is available. Relatively simple tests on proportions and means are discussed alongside more complex tests based on regression analysis. Even so, the proper use of the statistical tests and the correct interpretation of the outcomes remain the focus of this chapter. The various sets of data used throughout this book are available as SPSS files on a special webpage: http://www.ru.nl/mt/statistics/home. To facilitate the use of these files, we italicize all variable names in the text. All exercises that relate to the statistical topics discussed throughout this book can also be found on our webpage.
For readers interested in more detailed and often more technical background information, we provide various endnotes. More advanced statistical applications which are relevant but not discussed in this book, and links to relevant literature for further reading, can also be found on our webpage.

Tables of probability distributions, often found in statistical textbooks, are not included here for reasons of space. Instead, procedures to calculate these probabilities using statistical software (SPSS) are provided.
Since the focus is on practical statistical applications, we cannot go without a proper toolbox, that is, a statistical computer program. We decided to use SPSS (originally: Statistical Package for the Social Sciences, see www.spss.com). This program is often used for teaching statistics due to its user-friendly interface. Since our goal is to write an affordable and accessible book, no explanation of the use of SPSS itself is given. For this, various books are already available (for Dutch readers we refer to two books published by Van Gorcum: Basiscursus SPSS and SPSS met ...).
To apply statistics, one needs data that fulfil certain requirements. One important requirement is that the data must be numerical, which means that all information is expressed in numbers. Of course, information is often expressed in words (often referred to as 'alphanumerical'), but it has to be transformed into numerical information before statistical procedures can be applied. Numerical data are often stored in a spreadsheet (see Figure 1.1). Generally, the rows of a spreadsheet represent the units of analysis. Conclusions drawn from statistical analyses refer to these units. In social science, the units of analysis are often people ('respondents'). The columns of the spreadsheet contain the variables, which contain information about the units of analysis. Should these units represent people, characteristics such as sex, year of birth, education, income, and marital status are typically represented in the data and are commonly used variables for statistical analyses. The numerical values of the variables are recorded in the cells of the spreadsheet. Figure 1.1 shows an SPSS spreadsheet with information on six variables from the respondents.
Figure 1.1 SPSS spreadsheet
A variable measures a specific characteristic of the units and holds various values. For example, all respondents have a specific age and a specific level of education. Generally, there is a lot of variation among these characteristics; for instance, respondents' ages may fall between 18 and 70 years old. Most variables have a limited set of categories to classify the units of analysis. These categories are identified through unique numerical codes in the spreadsheet. For example, the variable marital state in Figure 1.1 has four categories that are coded 1, 2, 3, and 4. Typically, information regarding the meaning of codes can be found in the dataset's accompanying codebook, but it is also often found in the data file itself. To illustrate the latter, Figure 1.2 shows the codes for the variable marital state, which represent 'Not married' (code 1), 'Married' (code 2), 'Divorced' (code 3), and 'Widow/Widower' (code 4).
Figure 1.2 SPSS Variable View (upper panel) and Value Labels (lower panel)
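The link between numerical codes and their meaning can be sketched in a few lines of Python. This is only an illustration outside SPSS: the dictionary mirrors the value labels listed in the text, while the column of five codes and the helper name `label_values` are made up for the example.

```python
# Value labels for the variable "marital state", as listed in the text.
MARITAL_STATE_LABELS = {
    1: "Not married",
    2: "Married",
    3: "Divorced",
    4: "Widow/Widower",
}


def label_values(codes, labels):
    """Translate numerical codes into their category labels (hypothetical helper)."""
    return [labels[code] for code in codes]


# Made-up column of codes for five respondents.
marital_state = [2, 1, 4, 2, 3]
labelled = label_values(marital_state, MARITAL_STATE_LABELS)
print(labelled)
```

The codebook plays the role of the dictionary here: without it, the codes in the spreadsheet cannot be interpreted.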
In statistics, variables can be categorized into one of the following levels:
Nominal
Ordinal
Interval
Ratio

Nominal variables represent the lowest measurement level. The categories of these variables are only distinguished by their names. The numerical codes representing the categories can therefore be chosen arbitrarily, as long as each code is associated with only one category. An example of a nominal variable is marital state: it has different categories without any logical order (unmarried people are not 'less' in any respect than married people). The lack of ordering means that the arbitrary coding in Figure 1.2 can be changed to 0 (Unmarried), 1 (Married), 4 (Divorced), and -6 (Widow/Widower) without changing the meaning of the categories.
Ordinal variables cannot be coded arbitrarily, because the categories are rank ordered. For example, the variable educational level is ordinal. It is assumed that the various levels can be ordered according to the level of knowledge that has been attained by the respondent. Generally, the lowest knowledge level has been attained by people with only elementary school education, a higher level is attained by college students, and the highest level of knowledge is attained at university. To express this ranking, the codes of an ordinal variable must be in ascending or descending order. Educational level may be coded 'Elementary'=1, 'College'=3, and 'University'=4. However, any other coding that assigns the lowest code to 'Elementary', a higher code to 'College', and the highest code to 'University' is equivalent because the rank order remains the same. This clearly shows that the spacing between subsequent values of an ordinal variable is arbitrary. This is not a problem for ordinal variables such as educational level, as the exact extent of the knowledge increase with each increase of educational level is unknown.
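The equivalence of two ordinal codings can be checked with a small Python sketch. The first coding is the one given in the text; the second coding (10, 20, 95), the five respondents, and the helper `rank_order` are hypothetical and serve only to show that the rank order is preserved while the spacing differs.

```python
def rank_order(values):
    """Return, for each value, its rank among the distinct sorted values."""
    ranks = {value: rank for rank, value in enumerate(sorted(set(values)))}
    return [ranks[value] for value in values]


# Made-up educational levels for five respondents.
levels = ["Elementary", "University", "College", "Elementary", "College"]

coding_a = {"Elementary": 1, "College": 3, "University": 4}     # coding from the text
coding_b = {"Elementary": 10, "College": 20, "University": 95}  # hypothetical alternative

codes_a = [coding_a[level] for level in levels]
codes_b = [coding_b[level] for level in levels]

# Both codings rank the respondents identically, although the distances differ.
same_order = rank_order(codes_a) == rank_order(codes_b)
print(same_order)
```

Any technique that uses only the ranks gives the same result under both codings, whereas a technique that uses the distances between codes does not.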
In contrast, interval variables have exact known differences (or intervals) between subsequent categories. An example of an interval variable is year of birth. The categories of this variable are rank ordered: the more recent one's year of birth, the younger that person is. But crucially, the intervals between subsequent categories all have the same distance (see Figure 1.3). In this example, the difference between two adjacent birth cohorts always represents exactly one year.
Figure 1.3 Year of birth: equal one-year intervals (1967 to 1972)

This means that people born in 1940 were born 30 years earlier than people born in 1970, which also holds for people born in 1970 and 2000, or in 1960 and 1990. Interval variables do not have an absolute zero value. The variable year of birth is a good example because the Western calendar uses the birth of Christ as its starting point. Since this zero point is arbitrary, it means that people born a hundred years BC have a year of birth of -100. As a consequence, the calculation of ratios is not meaningful. For instance, we cannot state that a person born in the year 1000 was born twice as early in history compared to a person born in the year 2000. This also holds for the variable temperature measured in degrees Celsius. An object with a temperature of 30 degrees Celsius is not twice as hot as an object that has a temperature of 15 degrees Celsius, because 0 degrees Celsius is not the absolute zero value (in fact, -273.15 degrees Celsius or 0 kelvin is the absolute zero value).
Ratio variables have rank ordered categories, equal distances between categories, and an absolute, non-arbitrary zero value. As a consequence, ratios can be calculated meaningfully. The temperature measured in kelvin is an example of a ratio variable. A temperature of 400 kelvin is exactly twice as high (hot) as a temperature of 200 kelvin. The same holds for the variable age: forty year olds are exactly four times as old as ten year olds. However, in many social science applications the difference between interval and ratio variables is irrelevant, as few calculations rely on an absolute zero value.
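The difference between an interval scale and a ratio scale can be made concrete with the temperature example. The sketch below, in Python, converts the two Celsius temperatures from the text to kelvin and compares the ratios; the function name `celsius_to_kelvin` is chosen for this illustration.

```python
ABSOLUTE_ZERO_CELSIUS = -273.15


def celsius_to_kelvin(celsius):
    """Shift a Celsius temperature so that zero coincides with absolute zero."""
    return celsius - ABSOLUTE_ZERO_CELSIUS


# Ratio computed on the interval scale (Celsius): suggests "twice as hot".
ratio_celsius = 30 / 15

# Ratio computed on the ratio scale (kelvin): only about 5 percent hotter.
ratio_kelvin = celsius_to_kelvin(30) / celsius_to_kelvin(15)

print(ratio_celsius, round(ratio_kelvin, 3))
```

The Celsius ratio of 2 depends entirely on where the arbitrary zero point happens to lie, which is why ratios are only meaningful on the kelvin scale.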
Dichotomous variables are a special category of variables. These always have exactly two categories, like the variable sex, for example. These variables allow the researcher to rank observations in terms of presence/absence (or yes/no). For the variable sex, respondents are female or they are not (i.e., male). In addition to this, it becomes irrelevant whether or not there are equal intervals between all categories because there is only one interval. Therefore, mathematically, a dichotomous variable has the same characteristics as an interval variable. In later sections we will illustrate the consequences of the interval character of dichotomous (or 'dummy') variables.
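One well-known consequence of coding a dichotomous variable as 0 and 1 is that its mean equals the proportion of units in the category coded 1. The Python sketch below illustrates this with five made-up respondents; the variable names are assumptions for the example and do not come from the book's data files.

```python
from statistics import mean

# Made-up values of the variable "sex" for five respondents.
sex = ["female", "male", "female", "female", "male"]

# Dummy coding: 1 = female (characteristic present), 0 = not female (absent).
female = [1 if value == "female" else 0 for value in sex]

# The mean of a 0/1 variable is the share of respondents coded 1.
proportion_female = mean(female)
print(female, proportion_female)
```

Because the mean is interpretable in this way, a dummy variable can be used in techniques that otherwise require interval variables.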
The levels of measurement are relevant in two ways. Firstly, each measurement level is associated with specific statistical techniques. So, once the measurement level is known, we also know which techniques are feasible and which are not. In the case of nominal variables, the frequency of occurrence (i.e., the total number of units of observation) in each category can be determined, and statistics are then restricted to analyzing data as counts and percentages. The categories of ordinal variables can be ranked, for example, from 'few' to 'many'. In later sections we will show how some statistical techniques make use of this ranking, such as the calculation of the median. Scores on interval variables can be added and subtracted, making it possible to calculate the mean score. As discussed before, scores on a ratio variable can be divided to calculate ratios.
Secondly, the hierarchy in levels of measurement is decisive in choosing the appropriate statistical technique from the multitude of techniques. If a variable of interest does not possess the level of measurement required for a particular statistical technique, then this technique cannot be applied. Likewise, if a technique is applicable to a particular variable, it generally also applies to variables measured at higher levels. The hierarchical ranking of the levels of measurement from low to high is nominal, ordinal, interval, and ratio. Thus, techniques that are suited for nominal variables can also be used for ordinal, interval, and ratio variables.
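The hierarchy described above can be written down as a simple lookup. The following Python sketch assumes the four example techniques named in the text (counts and percentages, median, mean, ratios) and their minimum required levels; the names `LEVELS`, `MINIMUM_LEVEL`, and `feasible_techniques` are hypothetical.

```python
# Levels of measurement, ordered from low to high.
LEVELS = ["nominal", "ordinal", "interval", "ratio"]

# Lowest level of measurement at which each technique is applicable.
MINIMUM_LEVEL = {
    "counts and percentages": "nominal",
    "median": "ordinal",
    "mean": "interval",
    "ratios": "ratio",
}


def feasible_techniques(level):
    """List the techniques that can be applied to a variable of the given level."""
    rank = LEVELS.index(level)
    return [
        technique
        for technique, minimum in MINIMUM_LEVEL.items()
        if LEVELS.index(minimum) <= rank
    ]


print(feasible_techniques("ordinal"))
print(feasible_techniques("ratio"))
```

A technique listed for a lower level reappears at every higher level, which is exactly the rule that techniques suited for nominal variables also apply to ordinal, interval, and ratio variables.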
The starting point in scientific research is always the research question. It explicates the subject of research and defines the units of analysis. Occasionally, the research question is highly specific and/or the target population is very small. In these (rather rare) cases all units within the population can be sampled. Consider, for example, a research project on city councils in a specific city, or on all firemen in some county. For this kind of research question, no practical problems arise regarding the selection of the units, and conclusions based on these observations do not need to be generalized to a larger population. Rather, descriptive statistical analyses are sufficient in these scenarios. Typically, however, not all units can be included in the research, and a selection is required. In this case, the important question is to what extent the selected units are valid or representative of the entire population.
A random sample is required in order to generalize findings to a population based on a limited number of selected units. This sample comprises a relatively small part of the entire population. In the Western world, several organizations (e.g., The World Values Survey Network (www.worldvaluessurvey.org)) regularly interview a large sample of people on various topics, such as political voting behavior. These samples often comprise several thousands of individuals (often referred to as 'respondents') from which a wide spectrum of data is collected. As will be shown in chapter 3, a random sample allows researchers to make statements about the entire population. To be valid, these general statements require the sample to be a good representation of the population. It is often said that a sample should be (sufficiently) representative, which means that the sample should possess the same key characteristics as the target population. A random sample with a female-to-male ratio of 1:5 is not representative, as in most populations the ratio is closer to 1:1.
Simple random sampling is a commonly used strategy to obtain a representative sample of the population. In this sampling procedure, respondents are chosen randomly from the target population and all respondents have the same probability of being selected. Simple random sampling is like a blindfolded person selecting name tags from a basket. To avoid non-random distortion ('bias'), the tags are mixed up thoroughly before they are drawn.
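The name-tag analogy corresponds closely to drawing without replacement from a list. The Python sketch below uses the standard library's `random.sample`; the population of 1,000 numbered respondents, the sample size of 50, and the seed are made-up values for illustration.

```python
import random


def simple_random_sample(population, n, seed=None):
    """Draw n distinct units; every unit has the same chance of selection."""
    rng = random.Random(seed)  # a seed makes the draw reproducible
    return rng.sample(population, n)


# Hypothetical sampling frame: a list of all units in the target population.
population = [f"respondent_{i}" for i in range(1, 1001)]

sample = simple_random_sample(population, 50, seed=42)
print(len(sample), sample[:3])
```

Note that the procedure presupposes a complete list of the population, which is the 'sampling frame' requirement that comes up again in the discussion of multistage sampling.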
Stratified random sampling is a strategy in which units are not directly selected at random but are first grouped into categories (called 'strata') from which independent random samples are drawn in a second stage. In a simple random sample there is still the possibility that the age distribution or the ratio between foreigners and natives in the sample differs substantially from that in the population. This can be problematic if the research question is about nationalism, as a biased sample can potentially endanger outcomes. To prevent this from happening, the population is typically grouped into different categories based on age and country of birth. Following this stratification, independent simple random samples are drawn from each stratum or from a combination of strata (e.g., young foreigners). Usually, these samples are drawn proportionally to the size of each stratum in the total population. So, if the population has a female-to-male ratio of 55-to-45, then approximately 55% of sampled respondents should be female. The stratified sample approximates perfect representation of the population and its characteristics, such as age, country of birth, sex, and marital status.
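Proportional stratified sampling can be sketched as follows in Python. The population of 550 women and 450 men reproduces the 55-to-45 ratio from the text; the function `stratified_sample`, the sample size of 100, and the seed are assumptions made for the example.

```python
import random


def stratified_sample(population, stratum_of, n, seed=None):
    """Draw a sample whose strata are proportional to the population strata."""
    rng = random.Random(seed)

    # Stage 1: group the units into strata.
    strata = {}
    for unit in population:
        strata.setdefault(stratum_of(unit), []).append(unit)

    # Stage 2: draw an independent simple random sample within each stratum.
    sample = []
    for members in strata.values():
        # Rounding means the total can differ slightly from n in general.
        k = round(n * len(members) / len(population))
        sample.extend(rng.sample(members, k))
    return sample


# Made-up population with a female-to-male ratio of 55-to-45.
population = [("female", i) for i in range(550)] + [("male", i) for i in range(450)]

sample = stratified_sample(population, lambda unit: unit[0], 100, seed=1)
n_female = sum(1 for sex, _ in sample if sex == "female")
print(len(sample), n_female)
```

Unlike a simple random sample, the share of women in this sample cannot drift away from the population share by chance, because it is fixed by design.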
Multistage sampling is a procedure that utilizes one or more random pre-selections from which a simple random or a stratified sample is drawn later, at a second stage. This sampling technique is considerably more cost effective than a simple random sample. This is because simple random sampling draws from the entire population, requiring the recruitment of interviewers from all over the country or the travelling of long distances to conduct interviews, both of which can prove expensive. Furthermore, simple random sampling requires a list of all people in the population (the 'sampling frame'), which is difficult to acquire in many countries due to privacy legislation. In multistage sampling, it is more efficient to first sample among communities (which might be stratified according to degree of urbanization). Secondly, the selected communities are then requested to provide a sample of inhabitants from their community database (again, possibly stratified, for instance by age and sex).
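The two stages can be sketched in Python as a random pre-selection of communities followed by a simple random sample within each selected community. The 20 communities of 200 inhabitants each, the function name `two_stage_sample`, and all sizes are hypothetical; stratification within the stages is left out to keep the sketch short.

```python
import random


def two_stage_sample(communities, n_communities, n_per_community, seed=None):
    """Stage 1: sample communities. Stage 2: sample inhabitants within them."""
    rng = random.Random(seed)
    chosen = rng.sample(sorted(communities), n_communities)
    return {
        name: rng.sample(communities[name], n_per_community)
        for name in chosen
    }


# Made-up community databases: 20 communities with 200 inhabitants each.
communities = {
    f"community_{c}": [f"c{c}_inhabitant_{i}" for i in range(200)]
    for c in range(20)
}

sample = two_stage_sample(communities, n_communities=4, n_per_community=25, seed=7)
print({name: len(units) for name, units in sample.items()})
```

Interviewers only need to visit the four selected communities, and no nationwide sampling frame is required, which reflects the cost argument made above.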
Besides cost efficiency, the motives underlying multistage sampling can be research driven. For example, if the research is about social networks, one or more of the respondents in the sample have to be related to other respondents. Likewise, suppose that a researcher wants to investigate the extent to which people choose their partners on the basis of social characteristics, such as educational attainment. This requires a random pre-selection at the household level, where both partners within a sampled household are subsequently interviewed. A disadvantage of multistage sampling is that respondents are not selected independently from each other. With respect to households, this means that interviewing the head of the household automatically requires that his/her partner is interviewed as well. This, of course, is exactly what (among other things) is needed to determine whether people choose partners because they share the same educational background (known as educational homogamy in this research area). However, this also means that interdependency within households has to be taken into account (see section 3.3.1). Statistical procedures that account for interdependence between units are known as 'Mixed Models' and can be performed using SPSS (www.spss.com), the popular MLwiN program (www.cmm.bristol.ac.uk), and the freeware package R.

There are four commonly used methods to collect statistical data.

In a survey, data are collected from a large number of (preferably) randomly selected respondents. For (PhD) students and researchers in general, it is close to impossible to carry out a survey independently, especially if a large sample is required. Therefore, generally only specialized research institutes, universities, and government agencies collect statistical data using large-scale surveys, in which several researchers contribute to the questionnaire. An example is the Dutch SOCON project (www.ru.nl/sociology/research/socon), in which researchers from the disciplines of psychology, sociology, and communication science at the Radboud University Nijmegen (Netherlands) interview 1,500 Dutch respondents every 5 years about a wide array of subjects including religion, media usage, attitudes towards (ethnic) minorities, and professions.
Experiments are the second way of collecting data, in which respondents are randomly assigned to groups, or pre-existing groups are used. In classic experiments, two groups exist: the treatment group, who receive a 'stimulus', and a comparison group, who do not (referred to as the control group). In a recent example, employees commuting by car were randomly assigned to a treatment or a control group. The employees within the treatment group were asked to commute by bicycle instead of by car (in this example the stimulus is the bicycle, resulting in more physical exercise). The employees in the control group continued commuting by car. After six months the physical condition of the employees was compared to their physical condition at the beginning of the experiment. The experimental results suggest that the physical condition of the bike commuters improved significantly, and they also reported fewer bouts of illness compared to participants in the control group.
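Random assignment to a treatment and a control group can be sketched in Python by shuffling the participants and splitting the list in two. The 40 numbered employees, the equal group sizes, and the function name `assign_groups` are assumptions for this illustration; the study mentioned in the text may have used a different procedure.

```python
import random


def assign_groups(participants, seed=None):
    """Randomly split participants into a treatment and a control group."""
    rng = random.Random(seed)
    shuffled = list(participants)  # copy, so the original order is untouched
    rng.shuffle(shuffled)
    half = len(shuffled) // 2
    return shuffled[:half], shuffled[half:]


# Made-up list of employees who currently commute by car.
employees = [f"employee_{i}" for i in range(40)]

treatment, control = assign_groups(employees, seed=3)
print(len(treatment), len(control))
```

Because chance alone decides who ends up in which group, systematic differences between the groups at the start of the experiment are unlikely, so later differences can be attributed to the stimulus with more confidence.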
Observation is a relatively labor-intensive method for collecting data. This data collection method requires researchers to become part of the group under investigation (participant observation). Alternatively, researchers can refrain from full participation, thus minimizing their level of influence on those under investigation (unobtrusive observation). Both observational strategies utilize the natural environment of the participants being studied. For example, participant observation is used in cultural anthropology, where researchers study (sub-)cultures by means of actual participation, and unobtrusive observation is used in psychological studies that explore the interactions between school children.
Surveys, experiments, and observations come with considerable time and financial constraints. Alternatively, researchers can make use of the enormous amount of digitally stored statistical data that has already been collected. Furthermore, these secondary data are widely available on the Internet. These data are routinely collected, often using high quality random samples, and can also capture entire populations (i.e., censuses). Here is just a short list of important websites that provide, or link to, secondary data:
www.cbs.nl/statline (data from the Netherlands)
www.dans.knaw.nl (idem)
http://ess.nsd.uib.no (European Social Surveys from 2002)
http://epp.eurostat.ec.europa.eu (other data from Europe)
http://factfinder.census.gov (census data in the USA)
https://international.ipums.org/international (census data)
www.measuredhs.com (demographic and health surveys)
http://ropercenter.uconn.edu/ (surveys)
http://sociosite.net/databases.php (worldwide library)
h| l p. //'s| apcs. o|p// s2 gi ( wor l dwi dv l i hr : r r y )
h| | p / /s| cs. org/i d:il a/ |sr : 1 1 r l r 1 ' 1 1 ) '
Although the primary focus of this book is on descriptive and inferential statistics, some attention will be paid to the quality of statistical data. On the one hand, it is often argued that results from statistical research are to be viewed with skepticism because the data were collected in an inappropriate manner. On the other hand, people often claim that statistical outcomes should not be challenged as they are based on 'representative' research samples. The truth, however, probably lies somewhere in between these extremes. Statistical research cannot prove something to be true, but it can often demonstrate that one option is more likely than another option, provided some fundamental conditions have been met. These conditions pertain to:
Validity
Reliability
Missing Data
Measurement validity refers to whether a measurement actually measures what it intends to measure. For example, teachers are taught not to ask students whether they understand a lecture's content. Of course any professional teacher wants to know whether he or she succeeded, but answers to that particular question are probably the product of peer pressure or stigma: who is willing to confess not having understood something? Since few students will do so, the teacher appears to have succeeded. The question 'Have I been unclear about certain aspects?' is far superior because this time the teacher's performance is being evaluated instead of the students' ability to withstand peer pressure and forgo stigmatization. This example demonstrates that questions can measure something quite different to what was intended. Therefore, in research terminology a distinction is made between valid and invalid measurements. The validity of measurements is often discussed and defined with the help of experts and prior research. For example, didactical experts understand that peer pressure in a classroom should be taken into account and will recognize this faulty (invalid) measure of a teacher's performance. In addition to expert evaluation, the measurement should relate to other measurements associated with the subject. For example, it might be expected that students who indicate that a course is too heavy also indicate that they do not understand parts of the course content. If that relationship does not exist, one might have good reason to question the validity of the question on understanding the lecture content.

Reliability [...] it should (roughly) result in a similar outcome each time.
[...] the possibility that a measurement is unreliable increases when
questions [...] can be interpreted in multiple ways. Suppose that respondents
were asked to answer this question:
Politics deals with the reduction of traffic jams, with minimizing crime
levels, and with the [...] of women's labor market participation. Doing all
that, the government can make good and bad decisions. Please indicate below
which answer best corresponds to [...]:
o About 100% of these decisions are good
o About 75% of these decisions are good, 25% are bad
o About 50% of these decisions are good, 50% are bad
o About 25% of these decisions are good, 75% are bad
o About 100% of these decisions are bad
This multi-barreled question is a biased measurement of the perceived quality
of government decisions. The indicated topics are very diverse, ranging from
opinions addressing governmental decisions on traffic jams to governmental
decisions on female labor market participation. There is a slim chance that
this measurement accurately captures respondents' opinions about the same
government decisions, but separating the different types of government
decisions into different questions will increase the reliability of this
measurement. If a researcher is interested in governmental decision making,
answers could simply be summed to create a Likert scale. This scale's
reliability will be higher than that of each separate question. Unreliability
undermines validity as well: if one does not (roughly) measure the same
concept each time for every respondent, one does not logically measure the
concept itself. This of course does not mean that reliable measurements are
also valid; reliability is just a necessary condition for validity and is not
a sufficient condition.
Hand-in-hand with reliable and valid data, representativity is a key
characteristic in statistical sampling. Unfortunately, researchers often
assume that the sample they use accurately represents the population, an
assumption that often goes unchecked. If the principles of random sampling are
strictly followed, a large sample will generally be sufficiently
representative. For instance, it can be shown that the ratio between men and
women in a random sample of a hundred individuals will be close to that in the
entire population. However, by sheer chance (i.e., bad luck) deviations from
the population can occur in the sample. Generally, this is not very
problematic to the generalization of statistical findings because a margin of
uncertainty is always taken into account (see Confidence Intervals). The
situation is more critical, though, when a sample is biased by nonresponse,
which means that particular sets of respondents are not (or under-)represented
in the sample. This could occur if interviewers predominantly visit selected
respondents during the afternoon, as people working full-time will not be
reached. The resulting sample will not be representative of the labor market
and the male-to-female ratio may also be distorted since in many societies
more men are in full-time employment than women.
Another source of nonresponse is when respondents refuse to answer parts of
the questionnaire. Classic nonresponse generators are questions about
political issues. Research suggests that people who are alienated from or
averse to politics are less likely to participate in political research.
Consequently, the level of political interest measured in the sample will be
overestimated. Because nonresponse can turn even a well designed random sample
into a non-representative collection of respondents, it is important to deal
with this problem at an early stage. Possible strategies to prevent serious
nonresponse include special instructions for the interviewers to deal with
sensitive subjects and eventually rewarding respondents initially refusing to
participate. Furthermore, a slightly biased sample induced by modest amounts
of nonresponse can be made more representative by weighting the sample.
However, a weighting strategy is always based on variables with well-known
population distributions. Unfortunately these are often not the variables
causing the statistical problems. For example, the ratio between men and women
in the population is often known exactly, but the distribution of educational
level is not, let alone the distribution of political alienation!
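The weighting idea can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the population shares and the sample counts below are hypothetical, and the simple rule "weight = population share / sample share" is one common way to weight on a single variable with a known population distribution, not necessarily the procedure the authors have in mind.

```python
# Hypothetical figures: the population split is assumed to be known exactly,
# the sample split is what the (nonresponse-affected) survey delivered.
population_share = {"men": 0.50, "women": 0.50}
sample_counts = {"men": 400, "women": 600}

n = sum(sample_counts.values())

# Weight per group = population share / sample share.
weights = {
    group: population_share[group] / (sample_counts[group] / n)
    for group in sample_counts
}

# After weighting, each group counts for its population proportion.
weighted_counts = {g: sample_counts[g] * weights[g] for g in sample_counts}

print(weights)
print(weighted_counts)
```

Underrepresented groups receive a weight above 1 and overrepresented groups a weight below 1; the approach only works for variables whose population distribution is known.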
In addition to weighting, it is possible to take into account
underrepresentation of a population (e.g., highly educated people) using
statistical controls (see chapter 3, Multivariate Analysis, page 101).
However, statistical controls and weighting procedures are only effective when
the highly educated respondents sampled are representative of all highly
educated in the population.
Finally, missing data can negatively influence the quality of the collected
data. If, for example, respondents in Europe are asked about their income,
they may be reluctant to answer because earnings are considered a private
matter. Consequently, it is not surprising that a lot of information remains
missing when respondents are asked to report their exact income. If this
reluctance to share information occurs randomly among respondents, not much
statistical harm is done. The situation becomes more troubling when
respondents in the upper classes systematically refuse to answer the question.
Consequently, the average estimated income in the sample is downwardly biased.
A potential solution to this problem is not to ask the exact income, but to
have respondents indicate their income by choosing from broadly defined income
categories.
Generally, attempts should be made to limit the amount of missing data to the
lowest possible levels. Several strategies include proper instructions to
interviewers when sensitive questions occur in questionnaires, or to have
interviewers trained to react appropriately when respondents give evasive
answers or simply refuse to answer the question. However, even when taking
these precautions samples may still suffer from missing data. Fortunately,
statistical techniques like multiple data imputation can be used to replace
missing data, provided that some specific conditions are met.
The previous sections provided a brief introduction to the conditions that
data must meet before they can be fruitfully used in statistical analyses. All
research fields require high quality data, but this is especially true of
scientific research. The method of data collection should closely correspond
to the goal of the research project, and the researchers should provide a
clear overview of the validity and reliability of the data, the sample's
representativity, and the ways (serious) missing data problems have been dealt
with. Furthermore, it is customary to check and correct the data for errors, a
process referred to as data cleaning. This should be done well before
presenting descriptive statistics (see chapter 2) or inferential statistics
(see chapter 3). The next chapter outlines various descriptive statistical
tools, including those used in the process of data cleaning.

When describing statistical data, it is not very useful to describe every unit
separately, a strategy more closely fitting with qualitative techniques such
as in-depth interviews. Because the number of observations in data sets is
relatively large, adequate summaries of the data are needed. These summaries
can be represented by diagrams (graphical) or with statistical measures
(numerical). This chapter will first introduce a number of graphical and
numerical summaries of a single variable. Second, descriptions of the
associations between two variables are introduced. Finally, this chapter ends
with a schematic overview of the descriptive statistical tools that were
introduced.

Bar chart
Bar charts are often used for summarizing the scores on nominal and ordinal
variables (see section 1.2). In bar charts, the variable's categories are
placed on the horizontal axis (x-axis) of the chart. On the vertical y-axis
the absolute or relative proportion (in percentages) of each category is
shown. Every category is represented in the chart by a bar. The height of
these bars is proportional to the frequency of occurrence. The bars have equal
width, while there is some spacing in between bars. To ensure readability of
the chart, the number of categories should not be too large (many software
packages allow for the exclusion of one or more categories from a bar chart),
allowing the bar chart to provide a clear picture of the counts. For example,
Figure 2.1 shows that many respondents have Lower Vocational School (15%),
Secondary Vocational School (24%), or College (20%) as their highest
educational level, whereas O levels and A levels are clearly attained less
(both approximately 5%). This is not surprising given that these educational
levels include little vocational training.


Figure 2.1 Bar Chart for Highest Completed Educational Level
Pie chart
Pie charts provide a useful alternative to bar charts. The diagram contains a
circle, and each segment of the circle represents a category. Each segment
covers an area that is proportional to the frequency of occurrence. Pie charts
are frequently used to show results in the media (e.g., during political
elections). In science, bar charts are generally preferred instead because
they are clearer and people are less likely to misjudge the proportions of
each area to the extent that they do when evaluating pie charts. If a pie
chart is chosen, corresponding percentages should be included in each section
to avoid misconception (see Figure 2.2). Pie charts are difficult to interpret
when many categories are represented, especially when there are no categories
with a high frequency of occurrence. In practice, the use of pie charts is
limited to nominal (and to a lesser extent ordinal) variables with a small
number of categories, while (preferably) only a few categories represent large
portions of all units, as in Figure 2.2.
Figure 2.2 Pie Chart for Marital State (percentages included)
Histogram
Since interval and ratio variables generally have a larger number of
categories, a description of these variables using a bar chart is preferable
to a pie chart. A bar chart, however, has spacing between adjacent categories
(see Figure 2.1), which symbolizes the fact that the exact distance between
all categories is unknown. As stated before, this is the case for both nominal
and ordinal variables. However, the subsequent intervals between categories in
interval and ratio variables are fixed. This characteristic is accounted for
in histograms as the spacing between bars is absent. A histogram for the
variable age (ratio scale) is shown in Figure 2.3. This figure provides good
insight into the distribution of the variable, which is somewhat hill-shaped.

Figure 2.3 Histogram for Age (range: 18-69 years, one-year interval)
Stem-and-leaf plot
A stem-and-leaf plot is an alternative way of graphically presenting variables
measured at interval and ratio levels. Like a histogram, stem-and-leaf plots
give information about the shape of a variable's distribution. In these plots,
a distinction is made between the stem and leaf. Figure 2.4 shows the
distribution of the weekly working hours. The stem of the chart contains the
first digit ('stem width = 10') and the leaves denote the second digit (where
every leaf represents a single observation ('each leaf: 1 case')). The first
row contains five respondents who work at least 10 hours per week (as
indicated by the stem of 1). The leaves indicate how many hours each
individual works. To illustrate: two respondents work 10 hours (10 + 0), the
other three work 12 (10 + 2), 13 (10 + 3), and 14 (10 + 4) hours,
respectively. The stem-and-leaf plot clearly shows that working forty hours a
week is most frequent: 42 respondents have a 'nine-to-five' job. The
interval/ratio character of stem-and-leaf plots is mirrored by the linear
increase in digits of stems, even when there are no observations attached to
the stem. The stem-and-leaf plot is especially suited to represent interval
and ratio variables with a limited number of observations. In large data sets,
the rows very quickly become too long. To counter this, statistical software
such as SPSS makes it possible for each leaf to represent more than a single
observation. This, however, may result in a slightly less accurate plot, where
the distribution is less readable. A more suited graphical description of
interval and ratio variables with many observations is the histogram.
Working Hours a Week
(Stem Width: 10; Each Leaf: 1 Case)

Frequency   Stem . Leaf
    5        1 . 00234
   10        1 . 5555668889
   13        2 . 0000000123344
    9        2 . 566778889
   19        3 . 0000000001222222222
   27        3 . 566666666666777888888888888
   42        4 . 000000000000000000000000000000000000000000
    2        4 . 55
    7        5 . 0000000
    2        5 . 55

Figure 2.4 Stem-and-Leaf Plot for Working Hours a Week
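To make the construction of such a plot concrete, the following Python sketch builds a simple stem-and-leaf listing. It is an illustration only: the function name `stem_and_leaf` is hypothetical, and unlike the SPSS output in Figure 2.4, which splits every stem into two rows (leaves 0-4 and 5-9), this sketch prints one row per stem.

```python
from collections import defaultdict


def stem_and_leaf(values, stem_width=10):
    """Return text rows 'count  stem . leaves', one row per stem."""
    rows = defaultdict(list)
    for value in sorted(values):
        # The stem is the leading digit(s), the leaf the remainder.
        rows[value // stem_width].append(value % stem_width)

    lines = []
    # Walk through every stem, also those without observations, so the
    # linear increase of the stems stays visible.
    for stem in range(min(rows), max(rows) + 1):
        leaves = rows.get(stem, [])
        leaf_text = "".join(str(leaf) for leaf in leaves)
        lines.append(f"{len(leaves):>3}  {stem} . {leaf_text}")
    return lines


# The five respondents from the first row of Figure 2.4, plus one
# illustrative extra observation of 25 hours.
for line in stem_and_leaf([10, 10, 12, 13, 14, 25]):
    print(line)
```

With a stem width of 10, each leaf is a single digit, which matches the 'each leaf: 1 case' convention described in the text.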
The previous section showed how a multitude of data can be appropriately
summarized using graphical tools. Nevertheless, presenting (the shape of) a
distribution is often not the only objective. In statistics there are also
various ways to numerically express specific characteristics of that
distribution. These numerical descriptions generally relate to the center and
the variability of a variable (see Figure 2.5). For example, it is instructive
to present both center and variation of the age distribution not only
graphically (see Figure 2.3) but also numerically.
Figure 2.5 Center and Variability of a Distribution
Frequency table
A frequency table is a useful and popular way of numerically presenting a
variable, irrespective of the level of measurement. It contains a list of all
the variable's categories along with absolute counts, percentages, and if
necessary, valid percentages and cumulative percentages. The number of
categories should be limited as a frequency table with ten or more categories
is often difficult to read. Table 2.6 is a frequency table of the highest
completed educational level (the same variable that was used and graphically
presented earlier in Figure 2.1).
Table 2.6 Frequency Table for Highest Completed Educational Level

Highest Completed               Counts   Percentage   Valid        Cumulative
Educational Level                                     Percentage   Percentage
1 Elementary school                 90        6.5         6.7          6.7
2 Lower Vocational school          215       15.6        15.9         22.6
3 Lower Secondary school           178       12.9        13.2         35.8
4 Secondary Vocational             334       24.3        24.7         60.5
5 O levels                          62        4.5         4.6         65.1
6 A levels                          79        5.7         5.8         70.9
7 College                          281       20.4        20.8         91.7
8 University                       112        8.1         8.3        100.0
9 Other educational levels          24        1.7
Total                            1,375      100.0       100.0
Table 2.6 shows 'O levels' to be least frequent: of all 1,375 respondents only
62 have completed this level of education, amounting to 4.5 percent
((62 / 1,375) * 100). Note that the denominator includes respondents from all
categories including 'Other Educational Levels'. To calculate percentages
based on all respondents with a classified educational level only, this ninth
category must be excluded (i.e., defined as a 'missing value'). Because the
denominator now is 1,351 (24 less) the valid percentages are slightly higher.
Based on cumulative percentages, 60.5% of all respondents
(= ((90 + 215 + 178 + 334) / 1,351) * 100) have secondary vocational school or
less. Again, the 24 respondents in the 'other' category are excluded. A
frequency table provides a lot of information and may be confusing to the
reader, especially if large and/or many tables are presented. If this is the
case, graphical representations are often more suitable, while it is also
possible to present relevant characteristics of a distribution with a single
value. These are introduced in the next section.
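The three percentage columns of Table 2.6 can be recomputed from the counts alone. The sketch below is a minimal illustration in plain Python; the variable names are assumptions made for this example, and treating 'Other educational levels' as the only missing category follows the description in the text.

```python
# Counts taken from Table 2.6.
counts = {
    "Elementary school": 90,
    "Lower Vocational school": 215,
    "Lower Secondary school": 178,
    "Secondary Vocational": 334,
    "O levels": 62,
    "A levels": 79,
    "College": 281,
    "University": 112,
    "Other educational levels": 24,
}
missing = {"Other educational levels"}  # defined as a 'missing value'

total = sum(counts.values())
valid_total = sum(c for name, c in counts.items() if name not in missing)

rows = []
running = 0  # running count of valid observations
for category, count in counts.items():
    percentage = round(100 * count / total, 1)
    if category in missing:
        # Missing categories get no valid or cumulative percentage.
        rows.append((category, count, percentage, None, None))
        continue
    running += count
    rows.append((
        category,
        count,
        percentage,
        round(100 * count / valid_total, 1),    # valid percentage
        round(100 * running / valid_total, 1),  # cumulative percentage
    ))

for row in rows:
    print(row)
```

The only difference between the percentage and the valid percentage is the denominator: all respondents in the first case, respondents with a classified educational level in the second.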
The least complicated way of describing the center of a distribution with a
single value is to report the category that has the highest frequency of
occurrence. This is called the mode. In Figure 2.1 and Table 2.6 the mode
equals 4, which is 'Secondary Vocational School', while in Figure 2.2 the mode
is 'Married' (code = 2).
The mode is often used when income distributions are described. It is highly
instructive to know what income category most working people fall into (also
known as the modal income class). By definition, the mode does not require any
rank order of the categories nor does it require fixed distances between
categories. Hence, the mode can be applied to every variable, although it is
typically applied to nominal variables. A disadvantage of the mode is that its
value is sometimes difficult to determine and it can be rather ambiguous. For
example, the mode in the age distribution (see Figure 2.3) can be 32 and 34 as
both ages occur equally frequently (62 observations each), while other ages
are almost as frequent (for instance, 42 occurs 61 times) but are not
represented at all in the mode.
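The ambiguity of the mode is easy to demonstrate. The short Python sketch below uses `statistics.multimode` from the standard library (Python 3.8 or later); the list of ages is hypothetical and only mimics the situation described above, with two ages tied for first place and a third one close behind.

```python
from statistics import multimode

# Hypothetical ages: 32 and 34 each occur three times, 42 occurs twice.
ages = [32, 32, 32, 34, 34, 34, 42, 42, 29]

# multimode returns every value that shares the highest frequency.
modes = multimode(ages)
print(modes)
```

The near-modal value 42 leaves no trace in the result, which is exactly the loss of information the text points out.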
The median describes another aspect of a distribution's center, namely the
point at which half of the total number of observations has been counted. To
determine the median, the data must be rank ordered first. To illustrate, the
range of numbers:
10, 70, 20, 50, 20, 30, 40, 40, 10, 60, 70, 80, 90, 90, 90
is first ranked to:
10, 10, 20, 20, 30, 40, 40, 50, 60, 70, 70, 80, 90, 90, 90.
The median in this ranked row of numbers is situated at observation no. 8,
because this is the most central observation (seven observations have lower
numbers and seven observations have higher numbers). This means that the
median equals 50. When the number of observations is even, the median lies
exactly between the two most central observations. For instance, if the number
100 is added to the range of numbers in the example shown above, the median
then becomes 55 (the number exactly in between numbers 50 and 60):
10, 10, 20, 20, 30, 40, 40, 50, 60, 70, 70, 80, 90, 90, 90, 100.
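The two cases (odd and even number of observations) can be checked with Python's standard library. The list below is the fifteen-number example as reconstructed from the ranked row; the original order of the unranked numbers is an assumption, but it does not affect the result.

```python
from statistics import median

scores = [10, 70, 20, 50, 20, 30, 40, 40, 10, 60, 70, 80, 90, 90, 90]

# Rank the observations and pick the most central one (no. 8 of 15).
ranked = sorted(scores)
central = ranked[len(ranked) // 2]

# Odd number of observations: the median is the central observation.
median_odd = median(scores)

# Even number of observations: the median lies halfway between the two
# most central observations (here 50 and 60).
median_even = median(scores + [100])

print(central, median_odd, median_even)
```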
Of course, the number of observations in statistical data sets is typically
far greater than in the example above. The median can also be determined from
a frequency table. For example, in Table 2.6 the median is the fourth category
(Secondary Vocational) because of all 1,351 valid observations the most
central observation is no. 676 (calculation: (1,351 + 1) / 2), and this
observation falls into the fourth category. The third category (Lower
Secondary) cannot be the median because this level contains only observations
up to no. 483 (90 + 215 + 178). Likewise, the fifth category (O Levels) is not
the median because it starts with observation no. 818 (483 + 334 + 1). This
can also be easily inferred from the cumulative percentages in Table 2.6. For
the third category this is 35.8%, and for the fourth category this amounts to
60.5%. The point at which 50% of all (ranked) observations are counted thus
resides within the fourth category. Generally, however, the median need not be
calculated manually in this way because its algorithm is included in all
statistical software packages.
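The manual procedure for finding the median category in a frequency table can be written down as a short loop. This is a sketch only; the counts are the eight valid categories from Table 2.6 and the variable names are chosen for this example.

```python
# Valid counts for categories 1 to 8 of Table 2.6 ('Other' excluded).
counts = [90, 215, 178, 334, 62, 79, 281, 112]

n = sum(counts)          # number of valid observations
position = (n + 1) / 2   # rank of the most central observation

running = 0
median_category = None
for category, count in enumerate(counts, start=1):
    running += count     # last observation number in this category
    if running >= position:
        median_category = category
        break

print(n, position, median_category)
```

The loop stops at the first category whose cumulative count reaches the central observation, which is the same reasoning as reading off where the cumulative percentage passes 50%.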
Table 2.7 Median of Highest Completed Educational Level
Median
Number of Valid Observations
A well-known example in which the median plays an important role is in the
determination of the poverty threshold. First, the median of all household
incomes is determined, i.e., the income of the household after 50% of all the
ranked households are counted. This is shown in Figure 2.8, where the median
of the income distribution equals 1,300 euros.
Figure 2.8 Definition of the Poverty Threshold (income distribution with the
median at 1,300 euros, 50% of all households below it, and the poverty
threshold at 780 euros)
From the median of 1,300, a percentage is taken to determine the poverty
threshold. In the European Union, this percentage is generally set to 60%. So,
the threshold amounts to 780 euros (1,300 * (60 / 100)), which means that
households with a net household income below 780 euros are considered to be
below the poverty line. The median is used to determine the threshold because
it is not sensitive to extremely high income values that are part of the
overall income distribution in many parts of the world. Consider, for
instance, a sample of 1,001 households in which the most central household
after ranking is no. 501. Suppose that after ranking, households no. 451
through 551 turn out to have an income of 1,300 euros. If 10 households are
added to the sample with an income of two million euros, the total number of
households rises to 1,011. As a result the median shifts from household
no. 501 to household no. 506. However, the total income of household no. 506
is 1,300 euros, so the median remains the same. More extremely, we could add
up to 100 households with extremely high incomes (the exact income is
irrelevant) to the original sample of 1,001 households without any change in
the median (the median will still be 1,300 euros, because the median is at
household number 551 in case 100 high incomes are added). In general, the
median is said to be a robust measure, which means that it is rather
insensitive to extreme scores (also called outliers). A necessary condition
for using the median is that variables need to be at least ordinal as the
observations have to be ranked meaningfully first.
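The robustness argument can be replayed numerically. In the Python sketch below only the 1,300-euro block (households no. 451 through 551) and the two-million-euro additions come from the text; the incomes of 1,000 and 2,000 euros assigned to the other households are hypothetical filler values chosen for this illustration.

```python
from statistics import mean, median

# 1,001 households: no. 451-551 earn 1,300 euros after ranking.
# The lower (1,000) and higher (2,000) incomes are hypothetical.
incomes = [1000] * 450 + [1300] * 101 + [2000] * 450

median_income = median(incomes)
threshold = median_income * 60 / 100   # 60% of the median

# Add 10, then 100, households earning two million euros each.
plus_10 = incomes + [2_000_000] * 10
plus_100 = incomes + [2_000_000] * 100

print(median_income, threshold)
print(median(plus_10), median(plus_100))   # the median does not move
print(round(mean(incomes)), round(mean(plus_10)))  # the mean does
```

Because the median only depends on which household sits in the middle of the ranking, the extreme incomes shift the central position by a few places at most, and those places are still inside the 1,300-euro block.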
The mean (or more accurately, the arithmetic mean, symbol: x̄) is the most
commonly used measure to indicate the center of a distribution. The principle
of the mean is that there is a point in a variable's distribution at which
equilibrium is found (see Figure 2.9). To calculate this point, the scores of
all observations are summed and divided by the total number of observations.
For example, in the following range of numbers, the mean equals 44:
5, 8, 10, 25, 25, 50, 50, 70, 70, 81, 90
5 + 8 + 10 + 25 + 25 + 50 + 50 + 70 + 70 + 81 + 90 = 484
484 / 11 = 44.
All numbers can now be simultaneously replaced by 44 without changing the sum
of all scores (11 * 44 = 484). So, on average, every observation has a score
of 44. As said, the mean is the point on the distribution at which the scores
perfectly balance each other. To illustrate this, we first subtract the mean
from each individual score: 5 - 44, 8 - 44, ..., 90 - 44, which results in the
following numbers: -39, -36, -34, -19, -19, 6, 6, 26, 26, 37, 46. The mean of
these numbers consequently is 0. The sum of all negative numbers equals -147
(-39 + -36 + -34 + -19 + -19) and the sum of all positive numbers equals 147
(6 + 6 + 26 + 26 + 37 + 46). In absolute terms, both sums are equal, and thus
balance each other out. In addition to this arithmetic exercise we can also
graphically show that the mean is the point at which the balance is in
equilibrium.
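The balance property can be verified directly. The following Python sketch uses the eleven scores of the example (5, 8, 10, 25, 25, 50, 50, 70, 70, 81, 90); the variable names are chosen for this illustration.

```python
scores = [5, 8, 10, 25, 25, 50, 50, 70, 70, 81, 90]

total = sum(scores)
mean = total / len(scores)

# Deviation of every score from the mean.
deviations = [score - mean for score in scores]

# Negative and positive deviations cancel each other out exactly.
negative_sum = sum(d for d in deviations if d < 0)
positive_sum = sum(d for d in deviations if d > 0)

print(total, mean)
print(negative_sum, positive_sum, sum(deviations))
```

That the deviations always sum to zero is not a coincidence of these numbers; it follows from the definition of the arithmetic mean.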
Figure 2.9 The Mean as the Center of a Balance in Equilibrium
An obvious disadvantage of the mean can be derived from this figure. If
outliers (very high or very low values) are added to the balance, the point at
which the balance is in equilibrium shifts profoundly. For example, suppose
that a value of 188 is added to the balance. The mean then becomes
(484 + 188) / 12 = 56! Note that adding 188 to the scores does not alter the
median (= 50)! Generally, the mean is an adequate measure for a distribution's
center as long as this distribution is not overly skewed to the left or the
right due to extreme scores (outliers). Highly skewed distributions can easily
be recognized because of their distinct shape (see Figure 2.10). By
definition, a distribution is skewed to the right if the mean is higher than
the median and vice versa for distributions skewed to the left (see
Figure 2.10). Generally, in strongly skewed distributions (such as income
distributions) the median is more appropriate than the mean.
Figure 2.10 (panel label: skewed to right)
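The different sensitivity of mean and median to a single outlier can be shown with the eleven scores of the balance example. The added value of 188 is an assumption: it is the value for which (484 + x) / 12 gives exactly 56, the mean stated in the text.

```python
from statistics import mean, median

scores = [5, 8, 10, 25, 25, 50, 50, 70, 70, 81, 90]
with_outlier = scores + [188]   # assumed outlier value, see lead-in

# The mean moves from 44 to 56; the median stays at 50.
print(mean(scores), median(scores))
print(mean(with_outlier), median(with_outlier))

# A mean above the median is the sign of a right-skewed distribution.
skewed_right = mean(with_outlier) > median(with_outlier)
print(skewed_right)
```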
Individual characteristics, such as body height and body weight, tend to have
a more or less symmetrical distribution, which means that an approximately
equal number of observations can be found to the left and to the right of the
mean (see Figures 2.3 and 2.23). The mean therefore is a very useful tool to
indicate the center of these two distributions. Table 2.11 shows the means for
the ratio variables body height and body weight.
Finally, we would like to note that the use of the mean is limited to interval
and ratio variables as calculations of the mean require summation of all
values, which is only meaningful when the intervals between adjacent
categories are known (or assumed to be known).
Table 2.11 Means of Body Height and Body Weight

          Height    Weight
Mean      173.83     76.24
When describing a distribution numerically, it is often not enough to report
the central tendency using the mode, median, and/or mean, because a
distribution also has a certain degree of variability around its center. As
shown in Figure 2.12, the variability of distributions can be quite different,
even when mode, median, and mean are equal.
Figure 2.12 Same Mode/Median/Mean but Different Variability
The most basic way to numerically describe a distribution's variability is to
calculate the difference between the maximum and minimum score: the range. The
range can be calculated for ordinal, interval and ratio variables. In the
sequence 10, 30, 50, 70, 90, the range equals 80 (90 - 10). However, the
downside of using this measure is its high sensitivity to extreme scores: when
just one score of 170 is added to the sequence above, the range is doubled.
Another disadvantage is that the range is not informative about the exact
shape of the distribution. To illustrate this, Figure 2.13 shows two quite
differently shaped distributions that have the same range (80).
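The sensitivity of the range to a single extreme score takes only a few lines to demonstrate. The helper name `value_range` is hypothetical; the numbers are those of the sequence 10, 30, 50, 70, 90 and the added score of 170.

```python
def value_range(values):
    """Difference between the maximum and the minimum score."""
    return max(values) - min(values)


sequence = [10, 30, 50, 70, 90]

original = value_range(sequence)            # 90 - 10
with_extreme = value_range(sequence + [170])  # 170 - 10

print(original, with_extreme)
```

One additional observation is enough to double the range, even though the other five scores are unchanged.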
Figure 2.13 Same Range but Different Shaped Distributions
Interquartile Range (IQR)
A more appropriate alternative is the interquartile range (IQR). This measure
indicates the range of the middle 50% of all observations. To determine this,
quartiles are used. Quartiles split the distribution in four equally sized
parts, where each part contains 25% of all observations. Previously, the
median was said to be the point at which half the number of observations has
been counted (after ranking). In terms of quartiles, the median is the second
quartile (indicated as Q2). The difference between the first and the third
quartile then is the interquartile range (Q3 - Q1 = IQR), as shown in
Figure 2.14.
Figure 2.14 Meaning of Quartiles and the Interquartile Range (IQR) (legend:
first 25% of observations, central 50%, last 25%)
As previously stated, the median (Q2) is robust, which means that this measure
is relatively insensitive to extreme scores. This means that Q1, Q3, and
consequently, the IQR share this robustness as well. The advantage of the IQR
over the range is that the differences in the degree of variability are better
represented. Figure 2.15 shows the distributions from Figure 2.13, but now
with the addition of the interquartile range. The IQR of the first
distribution is 40, while the IQR is only half of that (20) for the second.
This is because these distributions have quite different shapes. As with the
range, the IQR can be calculated with all types of variables except nominal
ones. Table 2.16 shows the median (Q2), range, minimum and maximum, and the
three quartiles for the variables body height and body weight. Notice that in
SPSS the IQR is not presented and has to be calculated from Q1 and Q3
afterwards (IQR body height = Q3 - Q1 = 180 - 167 = 13 and IQR body weight =
85 - 66 = 19).
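The subtraction Q3 - Q1 can be done by hand from reported quartiles, or the quartiles can be computed from raw data first. The Python sketch below shows both. The quartile values come from Table 2.16; the raw-data example reuses the sequence 10, 30, 50, 70, 90 and relies on `statistics.quantiles` with `method="inclusive"`. Note, as an assumption worth stating, that statistical packages use slightly different quartile definitions, so results on small samples may differ between programs.

```python
from statistics import quantiles

# Quartiles (Q1, Q3) as reported in Table 2.16.
table_quartiles = {"body height": (167, 180), "body weight": (66, 85)}
iqr_from_table = {
    name: q3 - q1 for name, (q1, q3) in table_quartiles.items()
}
print(iqr_from_table)

# Computing the quartiles from raw data before taking the difference.
data = [10, 30, 50, 70, 90]
q1, q2, q3 = quantiles(data, n=4, method="inclusive")
iqr = q3 - q1
print(q1, q2, q3, iqr)
```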
Figure 2.15 Different IQR due to Different Shaped Distributions
Table 2.16 Numerical Measures of the Variability of Body Height and Body Weight

                            Height    Weight
Number of Observations       1,209     1,209
Median                         173        75
Range                           52        81
Minimum                        152        44
Maximum                        204       125
Quartiles   1st (Q1)           167        66
            2nd (Q2)           173        75
            3rd (Q3)           180        85
Detecting Outliers with Box plots
Box plots were not illustrated while describing charts in section 2.2 for the
reason that they contain statistical measures that had not yet been introduced
at that point: the median, quartiles, and the interquartile range. Box plots
are well suited to detect exceptionally low and high scores, to describe the
overall distribution, and to compare distributions (the latter is described in
section 2.4.1).
As mentioned, some measures like the mean are sensitive to exceptionally low
and high scores (known as outliers). Outliers can originate from errors during
data entry (for instance, someone erroneously enters a score of 100 into the
data base instead of the intended 10). Also, it is common practice to
designate relatively high scores ([...] or [...]) to special categories such
as the answer 'don't know' in questions about attitudes. When analyzing data,
these scores need to be set to 'missing' during the data cleaning process but
occasionally mistakes occur. Finally, extreme scores can result from valid
observations: the income earned by top senior managers, for example. In box
plots created by SPSS the extreme low/high scores are indicated with o and *.
Observations indicated with o are located between Q1 - 1.5 IQR and Q1 - 3 IQR
(low scores), and between Q3 + 1.5 IQR and Q3 + 3 IQR (high scores).
Observations indicated with * are located outside Q1 - 3 IQR (extremely low
scores) and Q3 + 3 IQR (extremely high scores). Very extreme low/high scores
are potentially unwanted outliers that influence the results in an undesirable
way.
To illustrate, Figure 2.17 shows a box plot for the variable weekly
working hours. In this figure, Q1 equals 24 working hours per week and
Q3 equals 40 hours per week (IQR thus equals 16). The extreme scores
are located at the top of the distribution. Observations indicated with o
are between Q3 + 1.5 * IQR and Q3 + 3 * IQR, that is, between 64 and 88
hours (40 + 1.5 * 16 = 64 and 40 + 3 * 16 = 88). The observed scores 65,
66, 67, 70, 72, 75, and 80 fall into that interval. Some observations
(indicated with *) are located beyond the point Q3 + 3 * IQR = 88. Their
exact scores are 90 and 99 hours. Note that the box plot indicates
potential outliers but it does not show exactly how many observations have
extreme scores. A frequency table is suited to provide information about
the frequency of occurrence (see Table 2.18). Table 2.18 shows that 24
respondents work 65 hours or more. As mentioned earlier, the mean is
sensitive to such scores. The scores 90 and 99 will exert the strongest
influence, and the researcher may rightfully wonder whether these are valid
observations at all. On closer inspection, the codebook shows that these
are codes for the answers 'don't know' (90) and 'did not answer' (99). In
SPSS these codes should be designated as missing values, which excludes
them from any statistical analysis.
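The fences described above can be sketched in a few lines of Python. This is only an illustration of the 1.5 * IQR and 3 * IQR boundaries from the text, not a reproduction of SPSS itself; the function name `classify_score` and the handling of scores that fall exactly on a fence are assumptions.

```python
def classify_score(score, q1, q3):
    """Label a score using the 1.5 * IQR and 3 * IQR fences from the text."""
    iqr = q3 - q1
    inner_low, inner_high = q1 - 1.5 * iqr, q3 + 1.5 * iqr
    outer_low, outer_high = q1 - 3 * iqr, q3 + 3 * iqr
    # Assumption: a score exactly on a fence is not counted as beyond it.
    if score < outer_low or score > outer_high:
        return "extreme (*)"
    if score < inner_low or score > inner_high:
        return "outlier (o)"
    return "regular"


# Weekly working hours example: Q1 = 24, Q3 = 40, so the fences are 64 and 88.
q1, q3 = 24, 40
labels = {s: classify_score(s, q1, q3) for s in (60, 65, 80, 90, 99)}
```

With these quartiles, 65 and 80 are flagged as outliers, while 90 and 99 fall beyond the outer fence of 88.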
Figure 2.17 Box Plot for Weekly Working Hours
(Annotations in the figure: Q1 = 24; Q3 = 40; IQR = 16; lowest score within
Q1 and Q1 - 1.5 * IQR = 0; highest score within Q3 and Q3 + 1.5 * IQR = 60;
Q3 + 1.5 * IQR = 64; Q3 + 3 * IQR = 88.)
Table 2.18 Respondents Working More Than 64 Hours a Week
Hours     Frequency    Percentages    Cumulative
          Counts                      Percentages
65         1            4.2             4.2
66         1            4.2             8.3
67         1            4.2            12.5
70        11           45.8            58.3
72         1            4.2            62.5
75         2            8.3            70.8
80         5           20.8            91.7
90         1            4.2            95.8
99         1            4.2           100
Total     24          100
To examine the influence of these extreme cases, the mean and the
quartiles are calculated for three scenarios: inclusion of all cases
(including those that scored 90 and 99), exclusion of these two cases, and
exclusion of all cases with 65 or more working hours. The results are shown
in Table 2.19.
Table 2.19 Descriptive Statistics With and Without Outliers
Weekly Working Hours         Full Sample    90 and 99    >64
                                            Excluded     Excluded
Valid Observations           1,313          1,311        1,289
Mean                         33.74          33.65        32.99
First Quartile               24             24           24
Median (Second Quartile)     38             38           37
Third Quartile               40             40           40
When the cases with scores 90 and 99 are excluded from the sample, the
mean changes slightly from 33.74 to 33.65. There is only one case that
scored 90 and only one case with 99, which explains the rather minor
change. If there were a substantial proportion with these scores, the mean
would have been seriously affected. Note that all quartiles remained
exactly the same.
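The contrast between a sensitive mean and a more robust median can be tried out on a toy data set. The sketch below uses a handful of hypothetical weekly working hours (not the survey data behind Table 2.19) plus one stray missing-value code of 99; with so few cases the median moves a little too, but far less than the mean.

```python
import statistics

# Hypothetical weekly working hours; these are NOT the survey data from the text.
hours = [16, 24, 32, 38, 40, 40, 45]
hours_with_code = hours + [99]  # 99 is a missing-value code entered by mistake

mean_shift = statistics.mean(hours_with_code) - statistics.mean(hours)
median_shift = statistics.median(hours_with_code) - statistics.median(hours)
# The mean moves by roughly 8 hours, the median by only 1 hour.
```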
Exclusion of all extreme scores (more than 64 hours) has more serious
consequences for the mean, as it decreases by three quarters of an hour,
whereas the median decreases by one hour. Because it is plausible that
people work 80 hours per week (bearing in mind long working hours in, for
instance, bars, restaurants, and finance), the second column in Table 2.19
seems to be best for describing working hours. That is, exclusion
of 90 and 99 scores (not representing observed hours) and inclusion of
Standard Deviation and Variance
The standard deviation is the most commonly used measure for variability.
This measure is related to the distance between the observations and
the mean. For example, suppose we have the following range of numbers:
10, 20, 30, 40, 50, 60, 70, 80, 90, and 100. The mean is 55
((10 + 20 + 30 + ... + 100) / 10). How can the variability around the mean
be best defined? Simply adding up all distances from the mean is
inappropriate. The distances are -45 (10 - 55), -35, -25, -15, -5, 5, 15,
25, 35, and 45. The sum of these distances is always 0, which of course is
not informative.
Alternatively, it is more appropriate to turn all distances into
absolute distances, that is, to multiply the negative numbers by -1. The
sum then amounts to 250 (45 + 35 + 25 + 15 + 5 + 5 + 15 + 25 + 35 + 45).
This sum, divided by the number of observations, yields the mean
distance: 250 / 10 = 25. However, this absolute measure is not often used
because it does not relate well to inferential statistics (see chapter 3).
Another strategy is to sum the squared distances (a negative score
turns positive when squared). This results in a sum of 8,250
((-45)^2 + (-35)^2 + (-25)^2 + (-15)^2 + (-5)^2 + 5^2 + 15^2 + 25^2 +
35^2 + 45^2 = 2025 + 1225 + 625 + 225 + 25 + 25 + 225 + 625 + 1225 +
2025). By dividing this sum by the number of observations (10), the
average squared distance to the mean equals 825. In statistics, this
number is known as the variance. The variance can be compared to the area
of a square (see Figure 2.20).
Figure 2.20 Variance Compared to the Area of a Square
(Annotations in the figure: sides = 28.72; area = 28.72 * 28.72 = 825.)
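The arithmetic for the ten numbers can be retraced with a short Python sketch. It divides by the number of observations, as the text does; note that many statistical packages divide by n - 1 when the data are treated as a sample, so their output can differ slightly. The variable names are illustrative.

```python
import math

values = [10, 20, 30, 40, 50, 60, 70, 80, 90, 100]
n = len(values)

mean = sum(values) / n                                   # 55.0
deviations = [v - mean for v in values]                  # -45.0 ... 45.0
sum_of_deviations = sum(deviations)                      # always 0
mean_absolute_distance = sum(abs(d) for d in deviations) / n
variance = sum(d ** 2 for d in deviations) / n           # average squared distance
side_length = math.sqrt(variance)                        # side of the square in Figure 2.20
```

Here `mean_absolute_distance` is 25, `variance` is 825, and `side_length` is about 28.72.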
In statistics, the measure of variability is preferably indicated as a
distance instead of a squared distance (i.e., a square). The square root of
825 (= 28.72) is taken (this value equals the length of the sides in
Figure 2.20) and the resulting measure is called the standard deviation.
Roughly, the standard deviation can be interpreted as the average distance
from the mean, although mathematically this is not correct. However, for
practical reasons this interpretation suffices. The standard deviation
refers to a distinct and frequently used distribution: the normal
distribution. The normal distribution is crucial for many statistical
tests (see chapter 3). Unlike the variance, the standard deviation is
expressed in the same units of measurement as the variable itself. For
example, when the variable body weight is measured in kilograms, the
standard deviation indicates the average distance in kilograms rather than
in squared kilograms, as is the case with the variance (it is hard to
imagine what squared kilograms would mean in practice). Table 2.21 shows
the mean, the standard deviation, and the variance for the variables body
height and body weight.
Table 2.21 Mean, Standard Deviation and Variance of Body Height and
Body Weight
                        Height     Weight
Mean                    173.83     76.24
Standard Deviation      9.48       13.41
Variance                89.90      179.70
Roughly speaking, respondents diverge on average 9.48 centimeters from
the mean body height (173.83) and diverge on average approximately
13.41 kilograms from the mean body weight (76.24). The word 'diverge'
is used because the distinction between long/short and light/heavy is no
longer relevant. This is because all differences between observations and
the mean were squared to calculate the standard deviation. Consequently,
respondents weighing 3 kilograms below average and respondents weighing
3 kilograms above average both score '9 square kilograms', so the
information whether they are below or above average is lost.
Like the mean, the standard deviation can be compared to a balance in
equilibrium. We can redistribute all respondents in such a way that half of
them measure 164.35 centimeters (the mean minus the standard deviation
= 173.83 - 9.48) and weigh 62.83 kilograms (76.24 - 13.41) while the
others measure 183.31 centimeters (173.83 + 9.48) and weigh 89.65
kilograms (76.24 + 13.41). These newly constructed distributions of the
variables height and weight have the same mean and same standard deviation
as the original variables. The only difference is that now all respondents
are at a distance of 9.48 and 13.41 units from the mean respectively (see
Figure 2.22 which shows this for the variable body weight).
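The balance idea can be checked numerically. In the sketch below, half of a group is placed at the mean minus one standard deviation and the other half at the mean plus one standard deviation. The group size of 1,000 is an arbitrary assumption; any even split behaves the same way.

```python
import statistics

mean_weight, sd_weight = 76.24, 13.41
half = 500  # hypothetical group size per side

# Half of the respondents at 62.83 kg, the other half at 89.65 kg.
redistributed = [mean_weight - sd_weight] * half + [mean_weight + sd_weight] * half

new_mean = statistics.fmean(redistributed)
new_sd = statistics.pstdev(redistributed)  # population formula: divide by n
```

The reconstructed distribution returns the original mean (76.24) and standard deviation (13.41), up to floating-point rounding.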
Because the mean is part of the calculation, the standard deviation is
only suitable for interval and ratio variables. Also like the mean, the
standard deviation is sensitive to outliers, and in the case of extremely
right and left skewed distributions, the IQR is actually a better-suited
measure.
Figure 2.22 Standard Deviation as Distance to the Mean (Body Weight as
an Example)
(Annotations in the figure: 50% of all observations weigh less than average
(62.83) and 50% weigh more than average (89.65); mean = 76.24; distance to
the mean = 13.41.)
A common problem in research is incomparability. Often, data on various
variables are available, but the units of measurement are not identical.
This is problematic when these variables need to be compared. Previous
calculations demonstrated that the standard deviation is 9.48 centimeters
for body height and 13.41 kilograms for body weight (see Table 2.21).
From this we cannot infer that body height shows less variability than
body weight - this would be like comparing apples and oranges. This is
not to say that apples and oranges cannot be compared at all; one only has
to take into account their similarities. For example, the amount of
vitamins and/or calories in apples and oranges is perfectly comparable, and
something similar is possible with body height and body weight. Consider,
for instance, a person who measures 180 centimeters and weighs 90
kilograms. Based on the means (shown in Table 2.21), it can be concluded
that this person is both taller and heavier than the average. In order to
make a proper comparison though, the positions in the distributions of
height and weight must be compared. With regard to height, the person is
located to the right of the mean. This is also true for his/her weight, but
this observation lies far more to the right of the mean. However, the
question remains: how much more to the right exactly?
Figure 2.23 Position of the Respondent in the Distributions of Height and
Weight
(Annotations in the figure: height 180 against a mean of 173.83; weight 90
against a mean of 76.24.)
Percentiles
One answer to this question lies in percentages. In this case, we have to
calculate the percentage of respondents with a height of 180 centimeters
or less, and the percentage of respondents weighing 90 kilograms or less.
These percentages are called percentiles. In fact, we already explained
what percentiles mean, because the quartiles in Table 2.16 are equal to
the 25th, 50th, and 75th percentile. In other words, a percentile indicates
the percentage of (ranked) observations that is counted from observation
no. 1 onwards. To determine the exact percentile, cumulative percentages
are most useful. Table 2.24 shows (parts of) the frequency tables for body
height and body weight. The cumulative percentages indicate that the
(truncated) percentile scores are 76 (for 180 cm) and 86 (90 kg),
respectively. Based on these scores, a fair comparison is possible.
Compared to the person who measures 180 centimeters and weighs 90
kilograms, 24% of all respondents are taller, but 'only' 14% are heavier.
So the person in this example is relatively more heavy than he is tall.
Percentiles are commonly used, including in education for all kinds of
school performance tests. The percentile indicates the cumulative
percentage of people performing equally well or worse compared to a
pupil's test performance.
Table 2.24 Frequency Table for Body Height and Body Weight
Height    Frequency    Cumulative      Weight    Frequency    Cumulative
          Counts       Percentage                Counts       Percentage
178       72           70.9            88        21           81.5
179       12           71.9            89        15           82.7
180       58           76.7            90        46           86.5
181       18           78.2            91        12           87.5
182       35           81.1            92        15           88.8
Finally, it should be noted that percentiles can be computed for all
variables except for nominal variables. In the case of interval and ratio
variables, z-scores can be used in addition to percentiles to indicate the
relative standing. Z-scores are discussed in the next section.
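A percentile in the sense used above (the percentage of observations at or below a given score) can be sketched as follows. The function name `percentile_rank` and the ten heights are hypothetical; they are not the survey data behind Table 2.24.

```python
def percentile_rank(observations, score):
    """Percentage of observations that are less than or equal to `score`."""
    at_or_below = sum(1 for value in observations if value <= score)
    return 100 * at_or_below / len(observations)


# Hypothetical body heights in centimeters (illustration only).
heights = [160, 165, 170, 172, 175, 178, 180, 183, 188, 195]

rank_180 = percentile_rank(heights, 180)  # 7 of the 10 heights are <= 180
share_taller = 100 - rank_180             # remaining share that is taller
```

In this toy sample a height of 180 centimeters sits at the 70th percentile, so 30% of the heights are taller. The same subtraction gives the 24% and 14% mentioned in the text.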
To determine relative standings, the absolute differences between
observations and the mean are required. In our example, where a person
measures 180 centimeters and weighs 90 kilograms, these absolute
differences amount to 6.17 centimeters (180 - 173.83) and 13.76 kilograms
(90 - 76.24). It is incorrect to infer that this person differs
approximately twice as much from the mean body weight compared to the mean
body height. As stated before, the two variables are incomparable because
different units of measurement are being used (i.e., centimeters vs.
kilograms).
To obtain a common measure other than percentiles, the standard deviation
is very useful as it indicates the average deviation from the mean.
For example, in Figure 2.23 there are respondents who are exactly 1
standard deviation to the right of the mean. These respondents measure
183.31 centimeters (173.83 + 9.48) and weigh 89.65 kilograms (76.24 +
13.41). In absolute terms, they are located at 9.48 centimeters and 13.41
kilograms from their respective means. Relatively, however, these
respondents are equally tall and heavy, for their relative position to the
mean is the same - exactly 1 standard deviation! Two variables with
different units of measurement can be correctly compared when absolute
differences are replaced with relative differences. To achieve this we
have to determine the relative standing in terms of standard deviations.
We return to our example to illustrate this. The absolute differences
between observations and mean amount to 6.17 centimeters and 13.76
kilograms (see previous calculations). The standard deviations are 9.48
and 13.41, respectively. It is clear that the height differs less than 1
standard deviation from the mean height. The weight differs approximately
1 standard deviation from the mean weight. The exact differences, called
z-scores, are 0.65 (calculation: 6.17 / 9.48) and 1.03 (13.76 / 13.41).
The relative weight is thus about one and a half times as large as the
relative height (1.03 / 0.65 = 1.58). Z-scores can be calculated in
statistical software packages. In Table 2.25, the z-scores for respondents
180 centimeters tall (58 observations) and 46 respondents weighing 90
kilograms are shown (calculations: SPSS).
Table 2.25 Z-scores for Body Height = 180 cm and Body Weight = 90 kg
            Height = 180 cm    Weight = 90 kilo
Z-score     0.65               1.026
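The z-score calculation from this example can be written out as a short sketch. The means and standard deviations come from Table 2.21; the function name `z_score` is illustrative.

```python
def z_score(observation, mean, standard_deviation):
    """Distance from the mean, expressed in standard deviations."""
    return (observation - mean) / standard_deviation


z_height = z_score(180, 173.83, 9.48)   # 6.17 / 9.48
z_weight = z_score(90, 76.24, 13.41)    # 13.76 / 13.41

# Ratio of the rounded z-scores, as computed in the text.
ratio = round(z_weight, 2) / round(z_height, 2)
```

Rounded to two decimals this gives 0.65 and 1.03, and a ratio of about 1.58.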
Chebyshev's Rule and Empirical Rule
Besides comparing individual observations from different variables and
measures, z-scores are used to compare the relative standing of multiple
observations in one single distribution. When from any distribution the
observations are taken that lie within z-scores -2 and 2, then this
selection always comprises at least 3/4 (75%) of all observations. Between
the z-scores -3 and 3 at least 8/9 (88.9%) of all observations are always
found (see Figure 2.26). Generally, for any number of total observations,
a proportion of at least 1 - 1/z^2 is located between -z and z (where z is
the number of standard deviations). This formula is known as Chebyshev's
Rule, named after a nineteenth century Russian mathematician. Note that
when the formula is applied to z = 1, at least zero observations
(1 - 1/1^2 = 0) are located within z-scores -1 and 1. Clearly, this is not
informative. Chebyshev's rule therefore is useful for any z > 1, but it is
especially known for z = 2 (75%) and z = 3 (88.9%). Chebyshev's rule may
be used for any distribution, regardless of its shape. The distribution
shown in Figure 2.26 has multiple peaks and has a number of sudden rises
and falls. Also, this distribution is skewed to the right. Nevertheless,
Chebyshev's rule is valid!
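Chebyshev's lower bound is easy to tabulate, and it can be checked against a deliberately lopsided data set. In the sketch below the ten scores are hypothetical and strongly skewed to the right; the function name `chebyshev_lower_bound` is an assumption made for illustration.

```python
import statistics


def chebyshev_lower_bound(z):
    """Minimum share of observations between -z and +z standard deviations."""
    if z <= 0:
        raise ValueError("z must be positive")
    return max(0.0, 1 - 1 / z ** 2)


bounds = {z: chebyshev_lower_bound(z) for z in (1, 2, 3)}

# Hypothetical right-skewed scores (illustration only).
scores = [1, 1, 1, 2, 2, 3, 4, 5, 8, 40]
mean = statistics.fmean(scores)
sd = statistics.pstdev(scores)
share_within_2sd = sum(1 for s in scores if abs(s - mean) <= 2 * sd) / len(scores)
```

Nine of the ten scores lie within two standard deviations of the mean, which is comfortably above the guaranteed minimum of 75%.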
Figure 2.26 Chebyshev's Rule (applicable to any distribution)
(Horizontal axis: z-scores -3, -2, mean, 2, 3.)
When a distribution is approximately symmetrical and hill-shaped (see
Figure 2.14), the empirical rule is much more informative compared to
Chebyshev's rule. It states that for every roughly symmetrical hill-shaped
distribution, approximately 68% of all observations fall within the z-score
range -1 and 1. Between z-scores -2 and 2, approximately 95% of all
observations are located, and approximately all observations (99.7%) lie
within -3 and 3 (see Figure 2.27). Note that the word 'approximately' is
used in the empirical rule, because it is a derivative of the exact rule,
which states that 68.27% of all observations are between z-scores -1 and
1, that 95.45% is located between -2 and 2, and that 99.73% is located
between -3 and 3. The exact rule is only valid for the normal distribution,
which is symmetrical and hill-shaped and can be described with a relatively
simple formula (on our website an SPSS syntax file is available to
calculate percentages for any z-score in a normal distribution). We will
return to this distribution in chapter 3 because it plays a critical role
in inferential statistics.
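The exact percentages for the normal distribution can be reproduced with the error function from Python's standard library, using the identity that the share between -z and +z equals erf(z / √2). This is a stand-alone sketch and not the SPSS syntax file mentioned in the text; the function name `normal_share_within` is illustrative.

```python
import math


def normal_share_within(z):
    """Share of a normal distribution that lies between -z and +z."""
    return math.erf(z / math.sqrt(2))


shares = {z: round(100 * normal_share_within(z), 2) for z in (1, 2, 3)}
```

The resulting percentages are 68.27, 95.45, and 99.73, which the empirical rule rounds to 68%, 95%, and 99.7%.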
Figure 2.27 Empirical Rule (suitable for symmetrical, hill-shaped
distributions)
(Horizontal axis: z-scores -3, -2, -1, mean, 1, 2, 3; approximately 99.7%
of all observations lie between -3 and 3.)
A summary of both rules is shown below.
Within        Any Distribution                   Symmetric and Hill-shaped
              (Chebyshev's Rule)                 (Empirical Rule)
-1 and +1     -                                  approximately 68%
-2 and +2     at least 75% of all observations   approximately 95%
-3 and +3     at least 88.9%                     approximately 99.7%
Up to this point, all our descriptive statistics relate to one single
variable, and are thereby called univariate descriptions. With bivariate
statistics, the statistical relationship between two variables is
described. Instead of 'relationship', other words like 'association',
'interdependence', or 'correlation' are used to denote that two variables
are statistically related. Two variables are positively related when low
scores on a first variable coincide with low scores on a second variable
and high values on the first go together with high scores on the second
variable. When low values on one variable go together with high values on
the other variable and vice versa, the relationship is negative. A
bivariate statistical relationship can be shown graphically (using plots)
or numerically. Numerical statistical relationships are not just used in
descriptive statistics, but are also used in inferential statistics (i.e.,
generalizing to a larger population). Inferential statistical measures are
described in chapter 3.
Box plot
Box plots have already been described in section 2.3.2 to detect
extremely low and extremely high scores. Box plots, however, can also be
used to describe the distribution of a dependent variable (indicated as y
variable and found on the y-axis of the plot) for each category of an
independent variable (indicated as x variable and found on the x-axis).
Figure 2.28 shows an example in which the distribution of educational
attainment (see Table 2.… for details) is compared between three cohorts:
respondents born between 1935-1950, 1951-1970, and 1971-1980.

Figure 2.28 Box Plot for Highest Educational Level and Cohort
(Horizontal axis: cohorts 1935-1950, 1951-1970, and 1971-1980; vertical
axis: highest educational level; the figure marks the median (Q2) and the
IQR.)
Figure 2.28 illustrates that the median for people from the oldest cohort
equals 3 and that the median is higher in the middle and the youngest
cohort. This means that half of the people belonging to the oldest cohort
(people born between 1935 and 1950) completed education at the Lower
Secondary School level or lower. In the middle cohort, half of the people
completed Secondary Vocational School level or lower, while in the
youngest cohort, half of the respondents have A levels or lower as their
highest level of educational attainment. Furthermore, Figure 2.28 shows
that the first quartile (Q1) becomes increasingly higher, and that the
third quartile for the oldest cohort is clearly lower than it is for both
the other cohorts. The interquartile ranges (IQR) for the first two
cohorts are equal but larger than the IQR in the youngest cohort. The
shifting median and decreasing IQR characterize the process of educational
expansion that took place in the modern Western world. In this process
more and more (young) people tend to attain higher educational levels. As
a consequence, the share of lower educated people in the population
decreases while the shares of middle and higher educated people
increase over time. The box plot in Figure 2.28 indicates that the
educational expansion somewhat loses momentum. The range (maximum -
minimum) remains equal: in every cohort there are still respondents with
elementary school only, and there are respondents who graduated from
university.
Scatter plot
Scatter plots can be used for the bivariate description of the
relationship between two variables that are either interval or ratio. The
plot has a horizontal (x-)axis and a vertical (y-)axis. If a causal
relationship is assumed between the two variables, then the dependent
variable is positioned on the y-axis. For example, suppose one wants to
describe the relationship between the variables body weight and body
height. The causal relationship is indisputable: the taller the
respondent, the more he/she weighs. In this example it is hard to assume
reverse causality: a body weight increase does not increase body height.
This means that body weight is the dependent variable (y variable) and
body height is the independent variable (x variable). By convention, the
dependent variable is placed on the y-axis and the independent variable is
placed on the x-axis of the scatter plot, as shown in Figure 2.29.

Figure 2.29 Scatter Plot for the Relationship between Height and Weight
(Horizontal axis: height, measured in centimeters, 150 to 205; vertical
axis: weight.)
Line graph
The relationship between the variables body height and body weight is
clear to see in the scatter plot: taller people are indeed heavier.
However, this example is relatively clear-cut and the relationship is
quite strong. Generally, such strong relationships are rare in the social
sciences, rendering scatter plots difficult to read and interpret. For
example, the relationship between the number of hours someone watches
television and age is shown in Figure 2.30. In communication studies, it
is hypothesized that older people tend to watch television more than
younger people, but this is not obvious in the scatter plot. This is
because the relationship between age and television watching is relatively
weak and also because each observation is depicted separately in a scatter
plot. When the average hours of watching television are shown for each age
category, a much clearer picture arises. Such a plot is called a line
graph, which is typically better suited to gauging statistical
relationships than scatter plots are. For example, Figure 2.31 shows that
older people indeed tend to watch more television than younger people do.
Especially after age 55 the average time spent watching TV increases
sharply. However, it is impossible to conclude this from Figure 2.30, even
though both plots use the exact same sample of observations!
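The step from a scatter plot to a line graph amounts to averaging the dependent variable within each category of the independent variable. The sketch below shows that aggregation for a few hypothetical (age, hours of television) pairs; the ten-year bin width and the function name `mean_by_category` are assumptions, not the categories used in Figure 2.31.

```python
from collections import defaultdict


def mean_by_category(pairs, width=10):
    """Average the second value within bins (of `width`) of the first value."""
    grouped = defaultdict(list)
    for age, hours in pairs:
        lower_bound = (age // width) * width  # e.g. ages 20-29 share the key 20
        grouped[lower_bound].append(hours)
    return {lower: sum(h) / len(h) for lower, h in sorted(grouped.items())}


# Hypothetical observations: (age in years, daily hours of television).
observations = [(23, 1.0), (27, 3.0), (34, 2.0), (38, 2.5), (61, 4.0), (66, 3.0)]
line_points = mean_by_category(observations)
```

Plotting `line_points` (one mean per age category) instead of the six separate observations is what turns the scatter plot into a line graph.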


Figure 2.30 Scatter Plot for the Relationship between Age and Television
Watching
(Horizontal axis: age, measured in years.)



Figure 2.31 Line Graph for the Relationship between Age and Television
Watching
(Horizontal axis: age, measured in years, 18 to 68.)
2.5 Summary
We summarize this chapter's content schematically in Tables 2.32 and
2.33. For any given measurement level, one or more suitable graphical
and numerical descriptive tools are presented. For bivariate relationships,
Table 2.33 reports graphical descriptions only. Because numerical
descriptions of bivariate relationships are often generalized to a
population, we will discuss these in the next chapter on inferential
statistics.
Table 2.32 Descriptive Statistics for a Single Variable (univariate)
Measurement      Graphical description    Numerical description
level                                     Center      Variability
Nominal          Bar Chart                Mode        Frequency Table (when number
                 Pie Chart                            of categories is small)
Ordinal          Bar Chart                Mode*       Frequency Table (when number
                 Box Plot                 Median      of categories is small)
                                                      Interquartile range
Interval/Ratio   Box Plot                 Mode*       Frequency Table* (when number
                 Histogram                Median*     of categories is small)
                 Stem-and-Leaf Plot       Mean        Range*
                 (when observations                   IQR*
                 are limited)                         Variance
                                                      Standard Deviation
* This description is generally only used if other measures fall short, for
instance due to extreme skewness or to (extreme) outliers.
Table 2.33 Descriptive Statistics (graphical) for Two Variables (bivariate)
Dependent Independent variable (x)
variable (y)
Nominal Ordinal Interval/Ratio
Nominal None
None None
Box Plot
Ordinal (when categories are
Box Plot limited)
Interval/Ratio
Scatter Plot
Line Graph
The previous chapter addressed descriptive statistics, that is, the
graphical and numerical description of quantitative data. Inferential
statistics goes one important step further: based on data from a random
sample, generalizations are made about the population from which the
sample is drawn (see Figure 3.1). For instance, from calculating a
characteristic of the individuals in a random sample, generalizations can
be made about that characteristic in the population.
Figure 3.1 Generalizing Outcomes from a Sample to the Population
A statement like 'More than half of all respondents are over 4… years old'
is a descriptive statistic: a particular characteristic of the data set is
simply described without further generalization. In contrast, a statement
like 'Based on a random sample from 2007, 25 to 30 percent of the Dutch
people smoke' results from inferential statistics. To make these
generalizing statements correctly, some theoretical knowledge of
statistical inference is required. This theory will be exemplified below
using data from a census (i.e., a procedure to collect data from the
entire population).
Until 1971 in the Netherlands, it was customary to conduct censuses
in which several personal variables were collected for each and every
inhabitant, such as age, sex, and marital status. In the 1899 census,
the mean age of all 5.1 million Dutch was 27.1 years and the standard
deviation of the age distribution turned out to be 20.6 years. Such
characteristics of the population, called population parameters, are
generally indicated using Greek letters. The mean in a population is
indicated using μ (pronunciation: mu, as in music) and the standard
deviation is indicated using σ (sigma). Figure 3.2 shows the age
distribution from 1899, and the population parameters μ and σ.
Figure 3.2 Age Distribution (Age 0 - 101) in the Netherlands in 1899
(source: CBS, http://statline.cbs.nl/StatWeb/dome/? P, theme: population)
(Annotations in the figure: mean age (μ) = 27.1 years; standard deviation
(σ) = 20.6 years; horizontal axis: age, 10 to 100; vertical axis: number
of people, up to 140,000.)
Central Limit Theorem
Nowadays, due to high costs and strict privacy legislation, it is almost
impossible to hold a classical census in the Netherlands. However, it is
still possible to gain knowledge about the entire population. In
statistics it is not required to know, for example, the age of each and
every individual in a population to tell the mean age for that population.
Instead, a relatively small sample will provide a very good approximation
of this population mean.
To illustrate that a small random sample can indeed achieve this, a
thought experiment is described below. Suppose that in 1899 a simple
random sample of 1,000 respondents was drawn from the population of
5.1 million Dutch people. The question of interest is: what is the mean
age for all people in that sample? Given the mean age in the population
from 1899 (i.e., 27.1 years), it is highly improbable that this would have
been below 10 years. Such an improbable sample would have consisted of
predominantly young kids. This in turn would imply that in this sample of
1,000 respondents randomly drawn from the population of 5.1 million
people, hardly any adults were selected. This is quite unlikely because
there was about a fifty-fifty chance that a Dutch person younger than 21
was randomly selected from the 1899 population (in 1899, 2.35 million
out of 5.1 million Dutch people were younger than 21 years). The
probability that no person of at least 21 years of age was selected after
five consecutive draws equals 2.35/5.1 * 2.35/5.1 * 2.35/5.1 * 2.35/5.1 *
2.35/5.1 = .02, which is a chance of only two percent! Given the age
distribution in the population, it is most likely that quite a number of
adults will be represented in the sample. Although the mean age in the
sample will generally not be exactly equal to the mean age in the
population, it is highly unlikely that the mean age will be much lower or
higher.
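The two-percent figure can be verified with one line of arithmetic. The sketch below treats the five draws as independent, which is an approximation that works well when a handful of people is drawn from a population of millions; it uses the 5.1 million total stated in the text.

```python
population_millions = 5.1   # Dutch population in 1899, in millions (from the text)
under_21_millions = 2.35    # people younger than 21, in millions (from the text)

share_under_21 = under_21_millions / population_millions   # roughly 0.46
p_five_under_21 = share_under_21 ** 5                       # five draws in a row
```

`p_five_under_21` comes out at about 0.02. For a sample of 1,000 the same product shrinks to a number that is practically zero, which is why a sample without adults is so implausible.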
To determine which sample means (notation: x̄) may result from a random
sample of 1,000 Dutch people, the thought experiment is extended
further. This time, assuming time and money are infinite, we draw 100,000
random samples from the 1899 population, each consisting of 1,000
respondents. Next, the mean age in each of these samples is calculated,
hence resulting in 100,000 means. Using statistical software such as
SPSS, this thought experiment is easy to perform given Figure 3.2, and
the results of this are presented in Figure 3.3.
Figure 3.3 The sampling distribution of the mean age (100,000 samples; horizontal axis: mean age in sample, 25 to 30; E(x̄) = 27.1, σ_x̄ = .65)
Figure 3.3 shows the distribution of the 100,000 means for age, resulting from 100,000 random samples. This distribution is called a sampling distribution. Interestingly, the overall mean of all 100,000 sample means is almost identical to the real mean age in the population in 1899: 27.1 years! This is no coincidence; mathematically, the overall mean of all possible sample means (indicated with E(x̄)) equals the mean in the population (μ) exactly. Therefore, in statistics it is said that the sample mean is an unbiased estimator of the population mean. Furthermore, an interesting relationship exists between the original standard deviation (σ = 20.6, see Figure 3.2) and the standard deviation of the sampling distribution of means (σ_x̄). This standard deviation σ_x̄ appears to equal σ/√n. Thus, in our example the standard deviation of all sample means is .65 (calculation: 20.6/√1,000).
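The thought experiment can be imitated on a small scale. The sketch below is not based on the 1899 census: it uses a made-up, skewed population of ages, a sample size of 100 and only 2,000 samples so that it runs quickly. All names and settings are our own assumptions.

```python
import random
import statistics

random.seed(1)

# Hypothetical skewed "population" of ages, capped at 101 years (NOT census data).
population = [min(random.expovariate(1 / 27.0), 101.0) for _ in range(50_000)]
mu = statistics.fmean(population)
sigma = statistics.pstdev(population)

n = 100            # sample size (the text uses 1,000)
n_samples = 2_000  # number of repeated samples (the text uses 100,000)

# Draw simple random samples (without replacement) and store each sample mean.
sample_means = [
    statistics.fmean(random.sample(population, n)) for _ in range(n_samples)
]

mean_of_means = statistics.fmean(sample_means)
sd_of_means = statistics.pstdev(sample_means)
predicted_se = sigma / n ** 0.5

print(f"population mean {mu:.2f}, mean of sample means {mean_of_means:.2f}")
print(f"sd of sample means {sd_of_means:.2f}, sigma / sqrt(n) {predicted_se:.2f}")
```

The mean of the sample means should land very close to the population mean, and their standard deviation close to σ/√n, even though the population itself is far from normal.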
We would like to stress that the standard deviation (σ) of 20.6 years is roughly the average distance of an individual's age to the overall mean (27.1). The standard deviation (σ_x̄) of .65 informs us about the average distance of a random sample mean to the overall mean (again, 27.1). To avoid confusion, the standard deviation related to the sampling distribution (σ_x̄) is not called a standard deviation but a standard error.
Note that the standard error (σ_x̄) is much smaller than the standard deviation (σ). This is due to the replacement of all 5.1 million observations (see Figure 3.2) by the mean age of 1,000 people from a random sample (see Figure 3.3). An individual's age in 1899 varied between 0 and 101 years, and this variability resulted in a relatively large standard deviation of 20.6 years. Due to the design of a simple random sample of 1,000 individuals from a population of 5.1 million, the probability of extreme sample means like 0 and 101 is very small. As such, the possible sample means are located more closely to the population mean than the individual scores, which results in a relatively small standard error (σ_x̄) compared to the standard deviation (σ).
The most striking feature is the shape of the sampling distribution (see Figure 3.3). This closely resembles the symmetrical and hill-shaped distribution shown in Figure 2.27. More precisely, the sampling distribution resembles the normal distribution! This is quite remarkable because the age distribution itself was not normally distributed at all (see Figure 3.2). Generally, a sampling distribution tends to resemble the normal distribution irrespective of the shape of the original distribution from which the random samples are drawn. This is known as the central limit theorem, which generally applies to random samples consisting of 30 or more observations. The larger the number of observations in a sample, the more the sampling distribution resembles the normal distribution. With a sample containing 15 to 29 observations, the sampling distribution is only approximately normally distributed if the distribution of the original variable is symmetrical (an almost equal number of observations to the left and to the right of the mean). With even smaller sample sizes (2-14), the original variable should resemble a normal distribution to generate a (close to) normal sampling distribution.
Confidence Intervals
When the sampling distribution of the mean is approximately normally distributed (see Figure 3.3), the position of extreme high and low means can be easily calculated. For example, 95% of all possible sample means are located at a maximum distance of 2 (more precisely: 1.96) standard errors to the left and to the right of the population mean (μ). The value 2 is a z-score (see section 2.3.3), although in the case of sampling distributions, the term z-value is more frequently used. In the sampling distribution of the mean age in 1899, 95% of all sample means are located between 27.1 - 2 * .65 = 25.8 and 27.1 + 2 * .65 = 28.4. Because a normal distribution is symmetrical, 2.5% of all sample means lie below 25.8 whereas 2.5% are located above 28.4 (see the grey areas in Figure 3.4). In other words, of each 1,000 samples, approximately 25 samples will have a mean age lower than 25.8 and approximately 25 samples will have a mean age higher than 28.4.
Figure 3.4 The Percentage of Sample Means outside -2 and outside +2 Standard Errors from μ in a Normal Distribution (μ = 27.1, σ_x̄ = .65; 2.5% of the sample means lie below 25.8, z-value = -2, and 2.5% lie above 28.4, z-value = +2)
Sadly, our thought experiment is not realistic, as it would cost a fortune to draw 100,000 samples in order to find the exact population parameter for the mean age. Fortunately, one simple random sample suffices because scientists generally are not interested in the exact values of population parameters, but settle for very good approximations instead. Surprisingly, only one relatively small random sample is needed to achieve this.
Imagine that from the 100,000 samples shown in Figure 3.3, just one simple random sample is drawn from the population. What is the expected mean age in this sample? Means below 25.8 and above 28.4 are hardly to be expected; Figure 3.4 demonstrated that the chance of this is only 5%. This means that the chance of finding a mean (x̄) between 25.8 and 28.4 (27.1 ± 2 * .65) is very large: 95% (100% - 5%).
Clearly, the distance of the population mean (μ) to a certain sample mean (x̄) is equal to the distance of that specific sample mean to the population mean. Therefore, it is also correct to state that there is a 95% chance that a sample will be drawn in which the population mean (μ) is located in the interval x̄ ± 2 * σ_x̄. In Figure 3.5, we calculated such intervals - called confidence intervals (or CI) - from three samples. In the first sample, the sample mean age is 25.8 years. The confidence interval then equals 25.8 ± 2 * 0.65 = (24.5, 27.1). The mean age in the population (27.1) lies just within this interval. The same can be said for the interval associated with the second sample, the mean age of which is 28.4 years: 28.4 ± 2 * 0.65 = (27.1, 29.7). This means that every sample where the sample mean age is between 25.8 and 28.4 (the grey area in Figure 3.5) has a confidence interval including the population mean of 27.1! Together, these samples constitute 95% of all possible samples. The remaining 5% will have a 95% confidence interval excluding the population mean of 27.1. For example, the third sample (x̄ = 29.1) belongs to these 5%, as the confidence interval is 29.1 ± 2 * 0.65 = (27.8, 30.4).
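The three intervals can be reproduced with a small helper. This is a minimal sketch; the function name and the tolerance used for the borderline cases are our own choices.

```python
def confidence_interval(sample_mean, standard_error, z=2.0):
    """Return (lower, upper) for sample_mean ± z * standard_error."""
    margin = z * standard_error
    return sample_mean - margin, sample_mean + margin


mu = 27.1   # population mean age in 1899 (from the text)
se = 0.65   # standard error for samples of 1,000 (from the text)

for x_bar in (25.8, 28.4, 29.1):
    low, high = confidence_interval(x_bar, se)
    # A tiny tolerance keeps the borderline samples (25.8 and 28.4) inside,
    # as in the text, despite floating-point rounding.
    covers = low - 1e-9 <= mu <= high + 1e-9
    print(f"mean {x_bar}: ({low:.1f}, {high:.1f}) includes 27.1: {covers}")
```

Passing z=1.96 instead of the rounded value 2 gives the slightly narrower exact 95% interval.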
The crucial conclusion from Figure 3.5 is that with almost 100% certainty (95% to be precise), we will draw a sample in which the population mean (μ) is located somewhere in the interval x̄ ± 2 * σ_x̄. This means that of every 100 samples, an average of 95 samples holds a 95% confidence interval that includes the population mean. In other words, there is a rather slim chance (i.e., 5%) that we will draw a sample that does not include the population mean in its 95% CI. Of course, one could choose a very large confidence interval. For example, we could easily state that we are 100% confident that the mean age in the population - which is normally unknown of course - in 1899 was somewhere between 0 and 101 years. However, although we are 100% confident that this is true, it does not provide useful information. We could have said exactly the same thing without drawing a sample, and we could have done so comfortably without any knowledge of statistics.
Figure 3.5 The 95% Confidence Interval (CI) and the Population Parameter (Mean Age (μ) = 27.1) (sample size = 1,000, σ_x̄ = 0.65; grey area between 25.8 and 28.4: all samples, 95% of the total, where μ is within the 95% CI; the sample with mean 29.1 is from the 5% of the total where μ is not within the 95% CI)
However, it is also not desirable to have relatively low confidence levels instead. Imagine, for example, a random sample in which respondents on average are 27.7 years old. According to Figure 3.4, this sample mean is quite plausible. Statistical theory dictates that the borderlines (called confidence limits) of the 40% confidence interval are located at about .5 standard errors from the sample mean. This means that we are 40% confident that the population mean is located somewhere between 27.4 and 28.0 (calculation: 27.7 ± .5 * .65). This statistical statement is quite interesting for it narrows the interval, but this time it is rather questionable whether this narrow interval 'captures' the (unknown) population mean. Recall that in a 40% confidence interval, the population mean will be within this interval in approximately 40 out of 100 samples. Note that the sample with a mean age of 27.7 does not belong to these 40 samples (40% CI = (27.4, 28.0), while μ = 27.1). In general, one wants to arrive at rather narrow confidence intervals without losing too much certainty. This results in frequently used levels of confidence of 90% to 99%. Finally, it is interesting to note that the media typically do not report confidence intervals when the results of election polls are shown. A newspaper headline that reports a party's share of the votes with no confidence interval is certainly disputable (of course, assuming that election polls are based on random samples; otherwise the headline is even worse than 'disputable').
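The link between the confidence level and the number of standard errors can be made explicit with the standard normal distribution. The sketch below uses Python's standard library; the helper name is ours, and the sample mean of 27.7 is the one from the 40% example.

```python
from statistics import NormalDist


def z_for_confidence(level):
    """z-value that leaves (1 - level) / 2 in each tail of the normal distribution."""
    return NormalDist().inv_cdf(0.5 + level / 2)


se = 0.65       # standard error from the text
x_bar = 27.7    # sample mean used in the 40% example

for level in (0.40, 0.90, 0.95, 0.99):
    z = z_for_confidence(level)
    print(f"{level:.0%} CI: z = {z:.2f}, interval = "
          f"({x_bar - z * se:.1f}, {x_bar + z * se:.1f})")
```

The output illustrates the trade-off described above: a higher confidence level requires a larger z-value and therefore a wider interval.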
Testing Hypotheses
The previous section demonstrated that with just one simple random sample, highly confident statistical probability statements can be made about an unknown population parameter. Similarly, it is also possible to test assumed values of a certain population parameter. In social sciences, as a point of reference, it is frequently first assumed that parameters equal 0. This is called the null hypothesis (notation: H0), which always contradicts or opposes the researcher's theoretical expectation. For example, when the life expectancy in Europe is hypothesized to have risen over the years, H0 states that this demographical process did not take place (i.e., the change equals 0 or life expectancy has not risen). The null hypothesis does not necessarily have to be taken literally; one could hypothesize just as well that the life expectancy rose by more than 1 year. In this case, H0 states that the life expectancy did not rise by more than 1 year. The counterpart of the null hypothesis is called the research hypothesis or the alternative hypothesis (notation: Ha) and is frequently derived from new scientific insights or from new (or old) theories. Quite often Ha is directional, which means that a population parameter is said to be either larger or smaller than the value expressed in H0. In non-directional research hypotheses, the population parameter is said to deviate from some value. In science it is standard to rigidly test the alternative hypothesis, requiring very convincing evidence before Ha is accepted and H0 rejected. This means that the statistical results have to render the null hypothesis highly implausible before rejecting it. To determine how implausible H0 is, the confidence interval (CI) can be used. When the confidence interval does not include the population value stated in the null hypothesis, one can safely say that H0 is probably wrong. To build a strong case against H0, a large confidence interval should be taken (typically between 90% and 99%).
Although using a CI to test hypotheses is entirely appropriate, it is not often used this way. The most popular test strategy uses the level of significance (notation: α). We would like to stress, however, that both methods of testing are identical and lead to the same conclusion. Common levels of significance are 10%, 5%, and 1%, and constitute rejection areas - the null hypothesis is rejected when the sample result falls into the rejection area. The rationale is that it is then highly improbable that the population parameter is equal to the value hypothesized in the null hypothesis. This probability (also known as 'p-value' or just 'p') can be calculated with any statistical software package. This p-value can be either one-tailed or two-tailed. The one-tailed p-value is the probability that the sample result, or an even more extreme sample result, is found while H0 is assumed to be true. The one-tailed p-value is always used to test directional alternative hypotheses. The two-tailed p-value is twice the size of the one-tailed p-value and is used when Ha is non-directional. Once the p-value is calculated it can be compared to the level of significance.
When the one-tailed p-value is less than or equal to the level of significance, the null hypothesis is rejected and the directional alternative hypothesis is accepted.
When the two-tailed p-value is less than or equal to the level of significance, the null hypothesis is rejected and the non-directional alternative hypothesis is accepted.
This can also be summarized in symbols:
When p (one-tailed) ≤ α: H0 is rejected, Ha (directional) is accepted
When p (two-tailed) ≤ α: H0 is rejected, Ha (non-directional) is accepted

Most statistical software packages, like SPSS, present p-values expressed as proportions (range 0-1) instead of percentages (0-100%). Therefore, the levels of significance will be presented proportionally for the remainder of this book, for example .10 instead of 10%.
To illustrate a hypothesis test using p-values, we return to the mean age in the year 1899. Between 1899 and 1930, health care and work conditions improved greatly. Thus, our directional alternative hypothesis (Ha) is that due to these improvements, life expectancy rose, as did the mean age in the Netherlands during this period. Conversely, our null hypothesis (H0) is that the mean age in the Netherlands did not rise between 1899 and 1930. In other words, according to the null hypothesis, the mean age in the Netherlands was still 27.1 years in 1930.
Suppose that in 1930 a random sample consisting of 1,000 individuals was drawn from the population. Because this sample is large enough, the central limit theorem applies and the resulting sampling distribution will thus be (approximately) normally distributed.
The situation described in H0 is assumed to be true until conclusively falsified. Therefore, the mean of the sampling distribution is assumed to equal 27.1 years (the population mean in 1899). Further, it is assumed that the standard error of this sampling distribution is .65, which is based on the standard deviation in the 1899 census and the size of the 1930 sample (calculation: 20.6/√1,000). In section 3.2.1, we will show how to test a hypothesis when the standard deviation in the population is also unknown. The null hypothesis assumes that any random sample (including ours) of 1,000 individuals is part of the sampling distribution with μ = 27.1 and σ_x̄ = .65. Now assume that in the sample the mean age is 28.4 years. This clearly exceeds 27.1, which favors our directional hypothesis that the mean age rose between 1899 and 1930. The question remains, however, whether the difference between 27.1 and 28.4 is large enough to reject H0, because it is possible that the sample mean of 28.4 belongs to the sampling distribution with μ = 27.1 and σ_x̄ = .65. In deciding whether or not to reject H0, we must know this probability (p).
First we have to calculate the number of standard errors that lie between 28.4 and 27.1. Assuming a normal sampling distribution, the relative share of all sample means that are equal to or exceed 28.4 can be calculated using a z-value. The z-value appears to be 2 (calculation: (28.4 - 27.1) / .65 (see endnote 4)). According to the empirical rule (see page 44), the one-tailed p-value is about 2.5% or .025. Using the formula for the normal distribution (see the corresponding endnote), the exact p equals .0228. This result is compared to the level of significance (α) selected by the researcher before the hypothesis test. With α = .05, H0 is rejected and the directional alternative hypothesis (Ha) stating that the mean age in 1930 is higher than in 1899 is accepted because p is smaller than α (.0228 < .05), see Figure 3.6. Conventionally, the mean age found in the sample (28.4) is said to be significantly larger than 27.1.
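The same test can be sketched in Python with the standard library's normal distribution. The variable names are ours; the figures are those of the example.

```python
from statistics import NormalDist

mu_0 = 27.1     # mean age under H0 (the 1899 population mean)
se = 0.65       # standard error, 20.6 / sqrt(1,000), rounded as in the text
x_bar = 28.4    # mean age in the hypothetical 1930 sample
alpha = 0.05    # level of significance, chosen before the test

z = (x_bar - mu_0) / se
p_one_tailed = 1 - NormalDist().cdf(z)   # P(sample mean >= 28.4) if H0 is true

print(f"z = {z:.2f}, one-tailed p = {p_one_tailed:.4f}")
print("reject H0" if p_one_tailed <= alpha else "do not reject H0")
```

Because Ha is directional (the mean age rose), only the upper tail of the sampling distribution is used.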
Figure 3.6 Testing a hypothesis with a normally distributed sampling distribution (mean 27.1; 2 * σ_x̄ = 1.3; rejection area α = .05; one-tailed probability p = .0228)
The level of significance (α) also indicates the probability that H0 is rejected given that H0 is true. This incorrect decision is called a type I error. In science it is generally agreed that this type of error should be kept to a minimum. Hence, the level of significance rarely exceeds .10. In our example we used α = .05, which is the conventional standard. This means that before the test is conducted, on average 5 out of 100 times we will incorrectly conclude that the mean age did increase while the mean age in the population actually did not increase. Alternatively, when the alternative hypothesis is true (i.e., the population's mean age increased) while we do NOT reject the null hypothesis, this is called a type II error.
In our example, there is a unique opportunity to check whether a type I error was made. From the 1899 and 1930 censuses we know that the mean age increased from 27.1 to over 28 years. So, in the target population the alternative hypothesis is true, while the null hypothesis was rejected using the outcome from a random sample. Therefore, no type I error was made; that is, H0 was correctly rejected. However, researchers should proceed with caution, as checks for these errors are typically not possible because the true population parameters are generally unknown (we could do so in this instance because the census contains the entire population from which we drew the sample). We elaborate upon both types of errors later on.
Finally, four important points regarding hypothesis testing need to be considered. First, before testing directional hypotheses it must be checked whether the sample result (also called a sample estimate) indeed differs in the correct direction from the hypothesized population parameter in H0. If this is not the case, the sample result never lies within the (one-tailed) rejection area, and automatically results in not rejecting H0!
Second, in statistical software packages like SPSS, two-tailed p-values are typically presented. When testing non-directional hypotheses, this p-value can be compared directly to α. However, directional hypotheses are typically tested, so the two-tailed p-value must then be divided by two.
Third, when the null hypothesis is not rejected (because p > α) this does not mean that H0 is accepted to be true, for it is very difficult to sufficiently prove that a population parameter is exactly 0 (or any other value). Among other things, this is related to the level of significance. Suppose a researcher decides to use a very small α (e.g., .0001); consequently, the test of H0 is so strict that H0 will not be rejected in most instances. This of course does not imply that a low α causes H0 to be true!
Finally, accepting Ha does not mean that Ha is true. Among other things, this is due to the fact that a sample is used instead of the entire population. However, it can be convincingly stated that Ha is much more likely than H0, bearing in mind that there is still a (small) risk of committing a type I error.
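The first two points can be combined into a small decision helper. This is only a sketch under our own naming; the argument direction_ok is a hypothetical flag that the user sets after checking on which side of the H0 value the sample estimate lies.

```python
def decide(p_two_tailed, alpha=0.05, directional=False, direction_ok=True):
    """Return 'reject H0' or 'do not reject H0'.

    p_two_tailed : two-tailed p-value as reported by the software
    directional  : True when Ha states a direction (larger or smaller)
    direction_ok : True when the sample estimate deviates from the H0 value
                   in the direction stated by Ha
    """
    if directional:
        if not direction_ok:
            # The estimate lies on the wrong side of the H0 value, so it can
            # never fall into the one-tailed rejection area.
            return "do not reject H0"
        p = p_two_tailed / 2     # convert to a one-tailed p-value
    else:
        p = p_two_tailed
    return "reject H0" if p <= alpha else "do not reject H0"


print(decide(0.08, directional=True))                       # one-tailed p = .04
print(decide(0.08, directional=False))                      # two-tailed p = .08
print(decide(0.08, directional=True, direction_ok=False))   # wrong direction
```

Note that "do not reject H0" is deliberately not phrased as "accept H0", in line with the third point.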
In social sciences, there are two commonly used statistical tests for parameters of a single variable: a test for a population mean and a test for a population proportion. The former was used in the previous section to test whether there is conclusive evidence to reject the null hypothesis regarding the assumed mean age in the population. The latter is used to test whether a sample proportion (or fraction) differs from a hypothesized population proportion, as stated in the null hypothesis.
3.2.1 Test for a mean
Section 3.1 discussed several important principles of inferential statistics. To avoid unnecessary complexity, it was assumed that the population standard deviation (σ) was known. Because a census was used in the example, this assumption was not problematic at all. However, in most situations σ will be unknown. Fortunately, it can be demonstrated that the sample standard deviation (s) closely resembles σ. Recall that dividing the standard deviation (σ) by the square root of the number of observations in the sample yields the standard error of the mean (formula: σ_x̄ = σ/√n). Since σ is generally unknown, s offers a good approximation and the formula becomes SE_x̄ = s/√n, where SE_x̄ denotes the standard error of the mean (SE is short for standard error). The interpretation of SE_x̄, however, is equal to that of σ_x̄. Because σ is replaced by s, which is a sample estimate, additional statistical uncertainty is introduced. Statistician William Gosset (1876-1937), using the pseudonym 'Student', showed that this uncertainty results in a somewhat broader sampling distribution, which is known as Student's distribution, or t-distribution (see Figure 3.7).

Figure 3.7 The t-Distribution and the Normal (z-)Distribution
The interpretation of t-values in the t-distribution is equal to that of z-values: both indicate how many standard errors lie between the sample estimate and the hypothesized population mean (μ).
The standard error is a key element in statistical tests as it measures the relative distance of the sample estimate to the population parameter stated in the null hypothesis. Imagine, for instance, that one wants to test whether the average number of children of Dutch couples is lower than 2; should this directional alternative hypothesis be confirmed, then it is probable that the Dutch population will decrease in the long run. The null hypothesis states that the average number of children equals 2 (μ = 2). In the year 2000, a sample of 734 Dutch couples showed that the average number of children was 1.79 (x̄ = 1.79) with a standard deviation (s) of 1.24. The question is whether this sample mean is a random deviation from the assumed population mean in the null hypothesis, which is 2. Statistically, we have to calculate the probability that a sample mean of 1.79 (or less) is drawn from a sampling distribution with a mean of 2, given the sample standard deviation of 1.24. In absolute terms, the sample mean is located .21 (children) to the left of the assumed population mean (1.79 - 2 = -.21). Given that s = 1.24 and n = 734, SE_x̄ equals .0458 (= 1.24/√734). Therefore, in relative terms, the difference between the sample estimate and the assumed population mean is -4.59 (= -.21 / .0458) standard errors. So, the associated t-value is -4.59 (see Figure 3.8).

The final question is whether the relative distance of -4.59 is large enough to reject the null hypothesis and consequently accept the alternative hypothesis. Because the t-distribution is symmetrical and hill-shaped in cases of large samples (see Figure 3.7), the empirical rule applies. Recall that according to this rule approximately 99.7% of all sample means (x̄) lie within -3 and 3 standard errors of μ. This means that approximately .3% of all sample means are located outside these limits. Because the distribution is symmetrical, approximately .15% of these extreme means are to be found to the left of μ. The sample estimate of 1.79 is located in this area, for the associated t-value (-4.59) lies beyond -3. It is quite easy to calculate the exact cumulative probability associated with a t-value of -4.59. Given our first approximation using the empirical rule, it is not surprising to find this probability to be very small: .000003 (on our web page we offer easy-to-use SPSS programs to calculate probabilities for any t-value).
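For readers without SPSS, the calculation can be approximated in plain Python. The sketch below integrates the density of the t-distribution numerically (Simpson's rule) and assumes n - 1 = 733 degrees of freedom, the usual choice for a one-sample t-test; the helper names are ours.

```python
import math


def t_pdf(t, df):
    """Density of Student's t-distribution with df degrees of freedom."""
    log_c = (math.lgamma((df + 1) / 2) - math.lgamma(df / 2)
             - 0.5 * math.log(df * math.pi))
    return math.exp(log_c - (df + 1) / 2 * math.log1p(t * t / df))


def t_lower_tail(t, df, steps=2000):
    """P(T <= t) for t <= 0, via Simpson's rule on [0, |t|] and symmetry."""
    a = abs(t)
    h = a / steps                      # steps must be even for Simpson's rule
    total = t_pdf(0, df) + t_pdf(a, df)
    for i in range(1, steps):
        total += (4 if i % 2 else 2) * t_pdf(i * h, df)
    return 0.5 - total * h / 3


mu_0, x_bar, s, n = 2.0, 1.79, 1.24, 734    # values from the text
se = s / math.sqrt(n)
t_value = (x_bar - mu_0) / se
p_one_tailed = t_lower_tail(t_value, df=n - 1)

print(f"SE = {se:.4f}, t = {t_value:.2f}, one-tailed p = {p_one_tailed:.6f}")
```

A statistical package would normally supply the cumulative t-distribution directly; the integration here only serves to keep the example free of external libraries.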
How is it that a sample mean of 1.79 is found while the chances of finding this outcome are extremely low according to H0? There are two possible answers to this question. Firstly, this is pure 'bad luck' - by sheer chance an extreme sample was drawn. Perhaps, due to chance, many families with no children were sampled, thereby lowering the mean number of children per family. Secondly, the null hypothesis is incorrect; the true population mean is lower than 2. Of course, the second answer is much more likely than the first. Therefore, the null hypothesis is rejected (at the .01 level of significance (α)) and the alternative hypothesis is accepted (μ < 2).
For illustrative purposes we will take this test one step further. Due to the extreme p-value (.000003), it is almost without doubt that the population mean indeed is less than 2. So, one could also test whether it is less than 1.9, or even less than 1.8. Consequently, a point will be reached at which the null hypothesis will no longer be rejected. At the .05 significance level, this point is approximately located at the mean of 1.87. The t-value at this point is about -1.7 (calculation: (1.79 - 1.87) / (1.24/√734)). This t-value is associated with a one-tailed p-value of .045 and means that the null hypothesis can be rejected. A null hypothesis with an assumed population mean of 1.86 (or less), a sample estimate of 1.79, and a level of significance of .05, will no longer result in rejecting the null hypothesis.
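The turning point can also be located directly. The sketch below assumes that with n = 734 the t-distribution is close enough to the normal distribution to take the one-tailed critical value from the latter; because it uses unrounded inputs, its t-values may differ slightly from the rounded figures quoted above. The hypothesized means 1.87 and 1.86 are used as two nearby illustrative values.

```python
import math
from statistics import NormalDist

x_bar, s, n = 1.79, 1.24, 734      # values from the text
alpha = 0.05
se = s / math.sqrt(n)

# Assumption: normal approximation to the t-distribution (large sample).
critical = NormalDist().inv_cdf(alpha)         # about -1.645

# H0: mu = mu_0 is rejected in favour of mu < mu_0 when
# (x_bar - mu_0) / se <= critical, i.e. when mu_0 >= x_bar - critical * se.
smallest_rejected_mu_0 = x_bar - critical * se

for mu_0 in (2.0, 1.87, 1.86):
    t_value = (x_bar - mu_0) / se
    verdict = "reject H0" if t_value <= critical else "do not reject H0"
    print(f"mu_0 = {mu_0}: t = {t_value:.2f} -> {verdict}")

print(f"boundary is near mu_0 = {smallest_rejected_mu_0:.3f}")
```

The boundary is simply the upper limit of a one-sided 95% confidence interval around the sample mean, which again shows that testing with p-values and testing with confidence intervals lead to the same conclusion.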
Figure 3.8 A t-Test for a Mean (μ = 2, x̄ = 1.79, s = 1.24, and n = 734; one-tailed probability (p) = .000003, black area of the sampling distribution)
Finally, we would like to emphasize that a mean test using the t-distribution is statistically correct only when the sampling distribution approximates a t-distribution. With relatively large random samples this is generally true. When using sample sizes between 15 and 30 observations, this is only the case when the test variable (i.e., the variable for which the mean is calculated) is approximately symmetrical (as many observations are located to the left as to the right of the mean). With smaller numbers of observations, the test variable should be approximately normally distributed in the population. Inspecting the histogram of the test variable in the sample indirectly informs us about this. The test variable Number of Children is not symmetrical: in the Netherlands there are relatively many (young) couples without children and relatively few couples with more than three children. In this case, the appropriate statistical test for the mean number of children in the Netherlands must be carried out with random samples consisting of at least 30 couples.
In general, such analyses use far larger samples. The advantage is that the standard error is relatively small (calculated by dividing the standard deviation (s) by √n, where n is the sample size). A small standard error is desirable because it lowers the chance that we incorrectly do not reject the null hypothesis (type II error). In medical research this type of error (as well as a type I error) can be very important. For example, suppose a new type of medicine is indeed more effective than older types. Researchers of course do not know this population parameter, so as a precaution they want the chances of incorrectly not rejecting H0 to be as small as possible in a sample study. Imagine the consequences of not introducing a more effective drug to the market (a decision resulting from a type II error: the new drug was found not significantly more effective). Of course a type I error should be kept at a minimum also. Imagine taking an effective drug from the market while replacing it with a less effective one (that was erroneously found significantly more effective in a sample).
3.2.2 Test for a proportion
A proportion is the number of units (e.g., respondents) that have some particular characteristic of interest, divided by the total number of units. This characteristic is typically measured using a dichotomous variable coded 0 for those not having the characteristic and 1 for all that have the characteristic. For example, the dichotomous variable Overweight may be coded 0, indicating that a respondent is not overweight, and 1, indicating that a respondent is overweight. Thus, using this coding technique, the proportion overweight equals the sum of all people being overweight, divided by the total sample size (this equals calculating the mean for the variable Overweight). By definition, a proportion lies between 0 (not a single observation possesses the characteristic) and 1 (every observation carries the characteristic). In a 2005 research project, a little over 550 out of 1,209 respondents happened to be overweight. The proportion (notation: p1) is thus .46. This means that, on average, 46 of every 100 respondents were overweight. The proportion of respondents who were not overweight (notation: p0) equals .54 (p0 = 1 - p1).
Generally, when testing proportions, the null hypothesis does not state that the population proportion equals 0. This is because finding at least one single respondent in a sample with the characteristic of interest (i.e., code 1) would already contradict this null hypothesis. In many cases, therefore, researchers test whether a population proportion is smaller or larger than some specific value stated in the null hypothesis. For instance, suppose that there are reasons to believe that the proportion of overweight Dutch people equals .40 in 2005. However, other researchers argue that this proportion is more likely to mirror the proportions in the United States, which are well over .40. Thus, according to H0 the proportion of overweight people is .40, while in Ha a proportion larger than .40 is hypothesized, and .46 (notation: p1) was found in the 2005 sample.
To test whether p1 is conclusively higher than π0 (allowing us to reject
H0), a normal sampling distribution is used (see Figure 3.9). However, in
cases of small sample sizes and/or very small or large values for π0, the
sampling distribution is probably not normally distributed. Conventionally,
to determine whether the data allow for a test of proportions using a
normal sampling distribution, a generally applied rule of thumb states that
the 99.7% interval should not include the values 0 and/or 1.
Recall from section 2.3.3 that 99.7% of all sample estimates lie within the
range of -3 and 3 standard errors in a normal distribution. So, the 99.7%
interval equals π0 ± 3σ_π0, where σ_π0 is the standard error of the
sampling distribution. This parameter is calculated by dividing the standard
deviation of the test variable (σ) by the square root of the sample size (n).
Note that we use σ here instead of s because the population standard error
is a direct function of the assumed population parameter. According to
the null hypothesis, the population proportion of overweight people is .40
and the associated population standard deviation (σ) of the variable
overweight equals .49 (√(.40 × .60)). Consequently, the standard error
equals .014 (.49 / √1,209). According to the rule of thumb, 0 and/or 1
should not be part of the 99.7% interval. This is indeed the case
(π0 ± 3σ_π0 = .40 ± 3 × .014 = .40 ± .042 = (.36, .44)).
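The rule of thumb can be written out as a short calculation. The sketch below
assumes the values from the text (a hypothesized proportion of .40 and
n = 1,209); the names pi0, sigma and se are illustrative only.

```python
import math

pi0 = 0.40   # population proportion assumed under H0
n = 1209     # sample size

sigma = math.sqrt(pi0 * (1 - pi0))   # standard deviation of a 0/1 variable
se = sigma / math.sqrt(n)            # standard error of the proportion

# 99.7% interval: three standard errors on either side of pi0.
lower, upper = pi0 - 3 * se, pi0 + 3 * se
normal_ok = lower > 0 and upper < 1  # rule of thumb: 0 and 1 lie outside

print(f"sigma = {sigma:.2f}, se = {se:.3f}")
print(f"99.7% interval = ({lower:.2f}, {upper:.2f}); normal test allowed: {normal_ok}")
```

If either bound crossed 0 or 1, the flag would turn False and a binomial test
would be the safer choice.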
Following H0, with π0 = .40, the sample proportion that was actually
found (.46) is just one of the sample proportions that belong to the
sampling distribution with π0 = .40 and σ_π0 = .014 (see Figure 3.9). Recall
that inferential statistics lets us calculate the probability of finding
particular sample results given the assumed population parameter and the
standard error. The absolute difference between the sample proportion
and the assumed population proportion equals .06 (.46 - .40). Relatively,
this difference amounts to 4.3 standard errors (.06 / .014). Because σ is
used, 4.3 is not a t-value but a z-value. The one-tailed p-value for z = 4.3
is very small (.000009), suggesting that there is good reason to reject the
null hypothesis and to accept the alternative hypothesis. In other words,
the proportion of overweight people in the 2005 Dutch population is very
likely to exceed .40 (see Figure 3.9).
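As a rough check on this z-test, the calculation can be sketched in Python
using only the standard library. The helper upper_tail_p is a name introduced
here, not part of the text. Working with unrounded inputs gives a slightly
larger z than the 4.3 obtained from the rounded values .06 and .014, but the
conclusion is the same.

```python
import math

def upper_tail_p(z: float) -> float:
    """One-tailed (upper) p-value under the standard normal distribution."""
    return 0.5 * math.erfc(z / math.sqrt(2))

pi0, n, successes = 0.40, 1209, 558

p1 = successes / n
se = math.sqrt(pi0 * (1 - pi0) / n)
z = (p1 - pi0) / se                  # unrounded inputs
p_value = upper_tail_p(z)

# The text rounds first (.06 / .014), which gives z of about 4.3.
z_rounded = 0.06 / 0.014
p_rounded = upper_tail_p(z_rounded)

print(f"z = {z:.2f}, one-tailed p = {p_value:.6f}")
print(f"z (rounded inputs) = {z_rounded:.1f}, one-tailed p = {p_rounded:.6f}")
```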
Figure 3.9 A Test of a Proportion using the Normal Distribution
(π0 = .40, p1 = .46, and n = 1,209)
[Figure: sampling distribution (normally distributed) centered on π0 = .40;
the black area beyond p1 = .46 has probability (p) = 0.000009.]
Generally, a test for proportions using a normal distribution will be
correct when the proportions 0 and 1 are not included in the 99.7% interval
around π0. If this requirement is not met, the sampling distribution is
probably not (approximately) normal. In these cases, the binomial
distribution should be used instead. This distribution is most suited for
testing proportions but closely resembles the normal distribution if 0
and/or 1 fall outside the 99.7% interval.
To conclude this section, we present the results of the overweight
hypothesis test, where the null hypothesis is that the proportion of
overweight Dutch people does NOT exceed .40 and the alternative hypothesis
is that this proportion is higher than .40. This test was first performed
using a sample of 1,209 respondents and the results were compared to a
smaller randomly selected sample of 9 respondents, using both the normal
and the binomial distributions (see Table 3.10).
Table 3.10 Test of a Proportion Using the Normal Distribution and
Using the Binomial Distribution with n = 1,209 and n = 9
Overweight  Counts               Observed         Proportion  One-tailed p    One-tailed p
                                 proportion (p1)  in H0       (normal         (binomial)
                                                              distribution)
Yes         558                  .46              .40         .000009         .000008
No          651                                               .000008 *
Total       1,209 (= 558 + 651)
Yes         1                    .11              .40         .037            .071
No          8                                                 .076 *
Total       9 (= 1 + 8)
* with correction for continuity
In the small sample (n = 9), 0 does fall into the 99.7% interval, whose
lower bound is -.20. At a significance level of .05, H0 can clearly be
rejected when the normal distribution is erroneously used, but this is not
the case with the binomial distribution. Scrupulousness requires reporting
that this difference in test results disappears when a correction for
continuity is performed.
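The three one-tailed p-values for the small sample can be approximated with
the standard library alone. This is a sketch under the assumption that the
test looks at the lower tail (1 overweight respondent out of 9, against
π0 = .40); the function names are introduced here for illustration. The
results come out close to the values in Table 3.10, with small differences
due to rounding.

```python
import math

def normal_cdf(z: float) -> float:
    """Cumulative probability of the standard normal distribution."""
    return 0.5 * math.erfc(-z / math.sqrt(2))

def binomial_cdf(k: int, n: int, p: float) -> float:
    """P(X <= k) for X following a binomial distribution with n trials."""
    return sum(math.comb(n, i) * p**i * (1 - p)**(n - i) for i in range(k + 1))

n, k, pi0 = 9, 1, 0.40               # small sample: 1 of 9 is overweight
mean = n * pi0                       # expected count under H0
sd = math.sqrt(n * pi0 * (1 - pi0))  # standard deviation of the count

p_normal = normal_cdf((k - mean) / sd)            # no correction
p_corrected = normal_cdf((k + 0.5 - mean) / sd)   # continuity correction
p_binomial = binomial_cdf(k, n, pi0)              # exact binomial tail

print(f"normal: {p_normal:.3f}")
print(f"normal with continuity correction: {p_corrected:.3f}")
print(f"binomial: {p_binomial:.3f}")
```

Only the uncorrected normal approximation falls below .05, which is the
discrepancy the text warns about.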
In section 3.2.1, we tested whether a single sample mean differs
significantly from a hypothesized population mean. It is also possible to
test whether two or more means statistically differ from each other. In
these tests, a general distinction is made between comparing means within
two dependent groups and within two (or more) independent groups.
Two groups are said to be statistically dependent when each unit of
analysis (often respondents) within the first group is somehow related to a
unit in the second group. For obvious reasons these groups are often
referred to as paired groups. Consider a random sample of adult women
(group 1) and a second group consisting of their mothers. The goal of
such a design could be to determine differences in occupational careers.
Another example is a random sample of respondents interviewed at two
moments in time, for example, during elections held in 2003 and in 2006.
A third example is the comparison of two variables, such as the results on
a language test (group 1) and a math test (group 2), while both groups
contain the same respondents. A typical characteristic of these three
examples is that there is interdependency between the (paired) observations.
Obviously mothers and their daughters are related through family ties, but
they are also statistically related, as it is likely that their occupational
careers are more similar than those of any randomly chosen pair from the
sample of mothers and the sample of daughters. Respondents at time 0 are
related to themselves at time 1; respondents taking a language test and a
math test are the same respondents during both tests, which makes it highly
probable that the outcomes of both tests are related. Thus, the unit of
analysis is not a single unit but a pair of units with two scores that are
to be compared (see Table 3.11). Variable 3 in Table 3.11 indicates the
differences between variable 1 and variable 2 for each pair. Like any other
variable, this new variable has a mean and a standard deviation (s).
1 1 1 1 ' | l l l l l l nl 1 1 1 1 1 : 1 I.N |! l
Tahl c 3. 1 1 !OlO

il ' | i 'l ll 'l 'l t 'O l h' fJCt ulcnf Groups

Pai rs ( i ) vari abl e l vari abl e 2 vari abl e 3
(diference between 1 and 2)
1 2 6 4
2 5 1 4
3 1 1 0
n A y A y
Again, the standard error of the differences can be calculated by dividing
the standard deviation of the differences by the square root of the sample
size. Because differences between the two variables result in a single
variable (see Table 3.11), the test is equal to the mean test in section
3.2.1. The null hypothesis in this test will often state that there is no
difference (mean difference = 0). The alternative hypothesis is typically
directional, which means that the researcher expects the mean difference to
be either higher or lower than 0 (positive or negative).
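The steps of the paired test can be sketched in Python with the three pairs
listed in Table 3.11. Two assumptions are made here: the difference is taken
as variable 1 minus variable 2, and the three pairs serve purely as an
illustration, since such a small sample would not justify a real test.

```python
import math
import statistics

# Scores of the three pairs shown in Table 3.11.
variable_1 = [2, 5, 1]
variable_2 = [6, 1, 1]

# Assumed sign convention: difference = variable 1 - variable 2.
differences = [a - b for a, b in zip(variable_1, variable_2)]

n = len(differences)
mean_diff = statistics.mean(differences)
sd_diff = statistics.stdev(differences)   # sample standard deviation (s)
se_diff = sd_diff / math.sqrt(n)          # standard error of the mean difference

hypothesized_diff = 0                     # H0: mean difference = 0
t_value = (mean_diff - hypothesized_diff) / se_diff

print(differences, mean_diff, sd_diff, round(se_diff, 2), t_value)
```

Once the differences are collapsed into a single variable, the calculation is
the same as for a one-sample test of a mean.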

Concluding this section, we will present three examples from research. The
first example deals with inequality between men and women. It is expected
that women, on average, obtained a lower educational level than their
spouses (we measured educational level with total years of education to
obtain an interval variable). The second example utilizes a panel study
from 1985 and 1990. In both years, the same groups of respondents (the
panel) were asked about their church attendance (measured as the number of
days they attended church a year). The alternative hypothesis is that in
those five years the mean level of church attendance had decreased. The
third example comes from research on occupational mobility. It is generally
expected that the social prestige of one's first job (measured at interval
level) is lower than that of the respondent's current job.
Table 3.12 Three Paired Sample t-Tests (Dependent Groups)
Example  Pairs (i)                       Difference in:                   Mean        p
                                                                          difference  (one-tailed)
1        Female - Male                   Education (years)                -.44        < .001
2        Individual in 1985 and in 1990  Church attendance (days a year)  -.83        .03
3        First job - current job         Occupation (prestige)            -3.80       < .001
Table 3.12 suggests that the null hypothesis (H0) must be rejected for all
three examples at the .05 significance level. Note that all p-values are
one-tailed, as all alternative hypotheses are directional. Women on average
have lower educational levels compared to their partners (sample mean
difference -.44 years). The frequency of attending church did decrease as
expected (on average by .83 times a year), and the prestige of the
respondent's first job is indeed lower than the prestige of their current
job (3.8 points on the prestige scale). Note that if these hypotheses were
tested more rigorously using a .01 level of significance (α), the decline in
church attendance would not have been significant.
The test for a difference in means with dependent groups is statistically
correct if the random sample is sufficiently large (n ≥ 30). With smaller
numbers (30 > n > 4), it is assumed that the test variable is approximately
normally distributed in the population. A histogram of the test variable's
distribution in the sample may provide the researcher with information
about this. With samples smaller than 5, it is assumed that the variable is
distributed normally.
The previous section discussed tests for mean differences between women
and their spouses, between individuals at two time points, and between
individuals' performance on two comparable tests. In all three cases the
groups were related or dependent. Groups can also be independent, such as
a random sample from a population of women and a random sample from a
population of men. Technically this means that the random selection of
women from the population does not determine which men are selected from
the population in any way. In case the first group contains randomly
selected women from a population while the second group consists of their
spouses, both groups are not independent, as illustrated in the previous
section.
To compare means in two independent groups, individual scores cannot be
subtracted like they are in Table 3.11. The mean difference is now
calculated by subtracting the mean in group 1 from the mean in group 2.
The standard error associated with this difference is more difficult to
calculate compared to paired groups. Besides group sizes, this calculation
depends upon the difference in variances within the two groups. In the
population, these two variances can be equal to each other (which is called
'homoscedastic') or different (which is called 'heteroscedastic'). This is
shown in Figure 3.13.
Levene's test may be used to test whether there is homoscedasticity or
heteroscedasticity. This test assumes equal variances in the population
(H0) and is tested against the opposite (i.e., unequal variances). Often, a
small α (e.g., .05) is used in these tests, which can result in not
rejecting the null hypothesis (population variances are equal) even though
there are relevant differences found in the variances from the sample. When
there are vast differences, and when groups have unequal sizes, it is
advised to assume unequal variances irrespective of the outcomes of
Levene's test.
The t-distribution is well suited for testing mean differences. Recall
that a t-value indicates how many standard errors lie between the
difference in means and the assumed mean difference in H0 (which often
equals 0). Subsequently, when using a t-value the associated p-value can
be calculated and compared to the level of significance (α). Again, for
directional alternative hypotheses, the one-tailed p-value should be
calculated.
Figure 3.13 Heteroscedasticity (large differences between variances)
[Figure: distributions of group 1 and group 2.]
When using SPSS, both variants of the test (i.e., assuming equal and
unequal variances) are performed simultaneously, allowing the researcher to
see whether relevant differences occur in the p-values. If one wants to test
the null hypothesis as rigorously as possible, one should select the test
with the largest p-value.


The test for a difference in means with independent groups is statistically
correct if both groups are sufficiently large (n ≥ 30). With smaller groups
(30 > n > 4), it is assumed that the test variable in the population for
both groups is approximately symmetrical. Research shows, however, that the
test is also applicable when both distributions are asymmetrical but bear a
strong resemblance to each other. Histograms of the variables in the sample
provide insight as to the shape of the distribution in the population. If
the groups have even smaller sizes (n < 5), the variable in both populations
has to be approximately normally distributed.
We present two examples to illustrate this test. The first example is a
comparison of weekly working hours (paid employment) between men and
women. We expect that women on average have lower levels of full-time
employment than men. This seems to be the case in a sample of Dutch
respondents: women work, on average, 15.14 hours less than men
(27.02 - 42.16). The variances are just over 143 and 114.5, respectively.
These differences probably stem from the fact that many Dutch women work
part-time, thereby lowering the mean, albeit with a lot of variability. Men
are more likely to work full-time, so the mean is about 40 hours while the
variance is relatively low. Although the variances are unequal, the sample
sizes are approximately equal (318 women and 371 men). Therefore it seems
reasonable to assume equal variances to test whether the difference of
-15.14 deviates significantly from 0. The standard error associated with
this difference proves to be 0.86. As a result, the t-value is -17.6
(-15.14 / .86). Because this value is located to the far left in the
t-distribution, the (one-tailed) p-value is very small (smaller than .001).
So, we can confidently conclude that on average Dutch women work fewer
hours than Dutch men. When we assume unequal variances the outcomes are
virtually identical (standard error .87, t-value -17.4).
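The two standard errors in the working-hours example can be reproduced, at
least approximately, from the summary statistics. The sketch below uses the
usual pooled-variance and unequal-variance formulas; the function names are
introduced here. The variance for women is entered as 143.0, which is an
assumption, because its decimals cannot be recovered from the text.

```python
import math

def pooled_se(n1, var1, n2, var2):
    """Standard error of a mean difference, assuming equal variances."""
    pooled_var = ((n1 - 1) * var1 + (n2 - 1) * var2) / (n1 + n2 - 2)
    return math.sqrt(pooled_var * (1 / n1 + 1 / n2))

def unequal_se(n1, var1, n2, var2):
    """Standard error of a mean difference, not assuming equal variances."""
    return math.sqrt(var1 / n1 + var2 / n2)

# Working-hours example: women (group 1) versus men (group 2).
n_women, n_men = 318, 371
var_women = 143.0   # ASSUMPTION: decimals of this variance are not recoverable
var_men = 114.5
mean_diff = -15.14

se_equal = pooled_se(n_women, var_women, n_men, var_men)
se_unequal = unequal_se(n_women, var_women, n_men, var_men)

print(f"equal variances:   se = {se_equal:.2f}, t = {mean_diff / se_equal:.1f}")
print(f"unequal variances: se = {se_unequal:.2f}, t = {mean_diff / se_unequal:.1f}")
```

Because the two groups are of similar size, both variants give nearly the
same standard error, which is why the choice matters little in this example.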
In the second example, two unequally sized heteroscedastic groups are
compared. Body Mass Index (BMI) serves as the dependent variable. In
the first group, 29 respondents are aged between 20 and 21 and in the
second group, 870 respondents are aged 30 or older. The mean BMI in
the first group is 23.9 with a variance of just over 30, whereas the mean
BMI in the second group is 25.3 with a variance of just over 15.
Apparently, BMI increases with age (as might be expected), but the variance
decreases. The difference between the means of BMI amounts to -1.4
(23.9 - 25.3), which seems quite small. Assuming equal variances, the
standard error is .76, which results in a t-value of -1.84 (-1.4 / .76).
Assuming unequal variances, which reflects the data far better, the
standard error is 1.04 and the t-value is -1.35 (-1.4 / 1.04). The
associated p-values are .03 (t-value = -1.84) and .09 (t-value = -1.35)
respectively. When testing at the .05 significance level, and assuming
unequal variances, H0 is not rejected. Table 3.14 summarizes our two
examples.
Table 3.14 Three Two-Sample t-Tests (independent groups)
Example  Independent groups (n)            Difference in   Observed     p
                                                           difference   (one-tailed)
1        Female (318) - Male (371)         working hours   -15.14       < .001
2a       Age 20-21 (29) - Age ≥ 30 (870)   BMI             -1.40        .03
2b       idem (unequal variances)          BMI             -1.40        .09
When there are more than two independent groups, the t-test from the
previous section is not suitable and an F-test must be used instead. The
null hypothesis in this test states that all population group means are
equal and the alternative hypothesis states that not all population group
means are equal. This is a non-directional hypothesis, for it is only
hypothesized that the group means differ from each other. There are two
important factors that determine whether the null hypothesis is rejected or
not. Firstly, the spread or variance of the group means is considered: the
more they differ, the more likely it is that the means are not equal in the
population (rejecting H0). As a measure for this group mean variability,
the group means' variance around the overall mean ('grand mean') is used.
This variance is often indicated as the between variance (or MSG, which is
short for Mean Squares of Groups). The height of the between variance is
calculated based on a particular sum of the differences between the group
means and the grand mean. A graphical representation of this between
variance is given in the left panel of Figure 3.15.
Figure 3.15 Between Variance (MSG) and Within Variance (MSE)
[Figure: left panel shows the group means around the grand mean; right
panel shows the individuals around their group mean.]
The higher the between variance, the further the group means are located
away from the grand mean. Or, in other words, the higher the between
variance, the further the group means differ from each other.
The second factor that influences the test results is the amount of
variability within groups. If this variability is quite large, it is less
likely that the group means are sufficiently far apart to reject the null
hypothesis. This variability is called the within variance (or MSE: Mean
Squares of Error). This variance is a measure of the units' variability
around the respective group means (see the right panel of Figure 3.15). The
larger the within variance, the further the observations are apart from
their respective group mean, and the more difficult it is to demonstrate
that these group means differ from each other in the population.
To summarize, when the between variance is small (small variability
of the group means) and the within variance is large (a lot of variability
around the group means), there will be little indication for group mean
differences in the population. The opposite is also true: large between
variance, combined with small within variance, clearly points to different
group means in the population. To express this relationship between both
variances statistically, the between variance is divided by the within
variance. The outcome is called an F-value. Relatively small between
variance and large within variance results in small F-values (see the left
panel of Figure 3.16). Conversely, a large F-value is associated with
relatively large between variance compared to the within variance (Figure
3.16, right panel). Because variances are the key objects, this type of
analysis is labeled ANOVA, which is short for ANalysis Of VAriance.
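The ratio of between variance to within variance can be sketched as a small
function. The scores below are hypothetical and were chosen only so that the
arithmetic is easy to follow; f_value is a name introduced here.

```python
import statistics

def f_value(groups):
    """Return (between variance MSG, within variance MSE, F-value)."""
    k = len(groups)
    n_total = sum(len(g) for g in groups)
    grand_mean = sum(sum(g) for g in groups) / n_total

    # Between variance: spread of the group means around the grand mean.
    ss_between = sum(
        len(g) * (statistics.mean(g) - grand_mean) ** 2 for g in groups
    )
    msg = ss_between / (k - 1)

    # Within variance: spread of the scores around their own group mean.
    ss_within = sum(
        (x - statistics.mean(g)) ** 2 for g in groups for x in g
    )
    mse = ss_within / (n_total - k)

    return msg, mse, msg / mse

# Hypothetical scores for three groups (not taken from the text).
scores = [[1, 2, 3], [2, 3, 4], [3, 4, 5]]
msg, mse, f = f_value(scores)
print(msg, mse, f)
```

Moving the group means further apart raises MSG and therefore F, while adding
spread within the groups raises MSE and lowers F.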
Figure 3.16 Small F-value (left panel) and Large F-value (right panel)
[Figure: individual scores, group means, and the grand mean for three
groups; F is small in the left panel and large in the right panel.]
To determine whether an F-value is large enough to reject the null
hypothesis, a computer program (such as SPSS) can be used to calculate the
p-value in the F-distribution. This sampling distribution is skewed to the
right (see Figure 3.17). Only p-values to the right are of interest because
extreme F-values are always found to the far right of 0.
Figure 3.17 An F-distribution, Observed F-value, and p-value
To illustrate the analysis of variance, an example follows from a research
project regarding the relationship between child raising attitudes and
levels of education. Cognitive theories suggest that the educational level
attained influences these attitudes. To measure attitudes on child raising,
respondents were asked to respond to the following statement: 'Boys should
be raised more lenient than girls' by choosing one of five categories:
'completely agree' (code 1), 'agree' (2), 'neutral' (3), 'disagree' (4),
'completely disagree' (5). Strictly speaking, this is an ordinal variable
but it is common practice to assume equal distances between each category,
rendering it an interval variable and allowing us to calculate the mean
score. The results are shown in Figure 3.18.

Figure 3.18 The Relationship between Educational Level and Raising
Boys vs. Girls (Mean and Middle 50%)
[Figure: scores by educational level (low, average, high), with the grand
mean and the group means marked.]
Figure 3.18 suggests that, on average, respondents disagree with the
statement on raising boys and girls differently. In addition, people with
average or high levels of education tend to disagree more strongly
compared to those with low levels of education. Table 3.19 demonstrates
that the between variance (48.8) is much larger than the within variance
(.761). The ratio between these two variances (F-value) is 64.1
(48.8 / .761) and is statistically significant (at the .01 significance
level). Therefore, the null hypothesis that all three means are equal can
be rejected.
Table 3.19 Results from ANOVA: Educational Level - Raising Boys vs. Girls
Between variance   48.8
Within variance    .761
F-value            64.1
p                  < .001
A more informative theory-driven hypothesis suggests that higher levels
of education result in less traditional child-raising attitudes. In this
case, the alternative hypothesis states that the higher someone's
educational attainment, the stronger he or she will disagree with the
statement that boys should be raised more leniently than girls. Indeed,
this association is observable in Figure 3.18. The Bonferroni test is an
appropriate tool to statistically test which group means differ from each
other. This test requires three separate t-tests for differences in means
for two independent groups (see section 3.3.2) because three separate
groups are present in the data. However, the level of significance is
adapted in such a way that the total type I error does not exceed α.
Without this adaptation, the total type I error could be as large as 3 × α,
which is generally considered too large. According to our Bonferroni test,
all group means differ significantly from each other in the expected
direction, confirming the alternative hypothesis (see Table 3.20).
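The adjustment of the significance level can be illustrated numerically. In
this sketch α = .05 is an assumed value (the text does not fix α for this
example), and the independence of the three tests is likewise only an
assumption used to show how close the total error comes to 3 × α.

```python
alpha = 0.05          # ASSUMED desired total type I error
n_groups = 3
n_comparisons = n_groups * (n_groups - 1) // 2   # 3 pairwise t-tests

# Bonferroni: test every pair at alpha divided by the number of tests.
alpha_per_test = alpha / n_comparisons

# Without the adjustment the total type I error is bounded by 3 * alpha;
# if the three tests were independent it would be 1 - (1 - alpha)^3.
upper_bound = n_comparisons * alpha
if_independent = 1 - (1 - alpha) ** n_comparisons

print(n_comparisons, round(alpha_per_test, 4), round(upper_bound, 2),
      round(if_independent, 4))
```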
Table 3.20 Means and Differences in Means in Child Raising Attitudes
(the larger the mean, the less traditional)
                Educational level
            Low        Average    High
Low         3.734
Average     .316 *     4.050
High        .571 *     .255 *     4.305
Diagonal: group mean; * difference between means (p < .01)
The F-test rests on two assumptions. Firstly, the individual scores in each
group are assumed to be normally distributed in the population. However,
research has shown that even with non-normal distributions, the F-test is
appropriate in practical use when n per group > 5. Secondly, it is assumed
that the population variances in all groups are equal (i.e.,
homoscedasticity). Violation of this assumption may render the test less
useful when the ratio between the largest and smallest sample group
variance is more than 2 and the ratio between the largest and smallest
group size is more than 4. In this situation, the Bonferroni test is not
appropriate and a test that accounts for unequal group variances should be
used.
In this section, measures of association are presented that describe
relationships between nominal and/or ordinal variables. Discussed examples
include the relationship between religious affiliation and political party
preference (both nominal variables), and the relationship between
educational level and income (both ordinal variables).
Contingency tables are commonly used to describe the association or
relationship between variables with low numbers of categories (preferably
< 10). Due to limitations imposed by the number of categories, contingency
tables are generally used for nominal and ordinal variables only. The
table consists of two or more columns and rows, depending on the number of
categories. The inner cells contain the observations for each combination
of columns and rows. The outer cells are called the marginals, in which the
total number of observations for each column and each row are presented.
The sum of the row marginals (or, equivalently, of the column marginals) is
the total number of observations and is shown in the lower right of the
contingency table (see Table 3.21).
Table 3.21 Basic Structure of Contingency Tables
            Column 1          Column 2          Marginals
Row 1       Inner Cell 1      Inner Cell 2      Row total (Cell 1 + 2)
Row 2       Inner Cell 3      Inner Cell 4      Row total (Cell 3 + 4)
Marginals   Column total      Column total      Grand total
            (Cell 1 + 3)      (Cell 2 + 4)      (Cell 1 + 2 + 3 + 4)
When there is good theoretical reason to assume a causal order, it is
common practice to present the independent variable (also referred to as
the x variable) as the column variable and the dependent (y) variable as
the row variable. For example, suppose that a researcher studies the
relationship between educational level (ordinal) and income class
(ordinal). Since most people complete their education before starting their
first regular job, the causal order is indisputable. This means that the
educational level variable serves as the column variable, whereas income is
the row variable. When these variables each have two categories, the
resulting contingency table will have two rows and two columns (see Table
3.22). Table 3.22 demonstrates that of 567 respondents with Secondary
Vocational School or less, 263 earn a monthly net income of over 2,000
euros. Of the 408 respondents with O levels or higher, 287 earn over 2,000
euros. Based on the absolute difference (263 - 287), it seems that
educational level does not influence income much. However, this comparison
is incorrect because of unequal column totals (567 versus 408). Generally,
absolute differences in contingency tables are inappropriate for
determining the relationship between variables.
Table 3.22 Contingency Table with Educational Level and Income (in
cells: absolute counts)
                                  Educational Level
                            Secondary       O levels     Total
                            Vocational      or higher
                            or less
Income  2,000 at maximum    304             121          425
        more than 2,000     263             287          550
Total                       567             408          975
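The marginals of such a table follow directly from the inner cells. The
snippet below is a small sketch using the counts of Table 3.22; the list
layout (rows are income classes, columns are educational levels) is a choice
made here for illustration.

```python
# Inner cells of Table 3.22: rows = income class, columns = educational level.
inner_cells = [
    [304, 121],   # income of 2,000 euros at maximum
    [263, 287],   # income of more than 2,000 euros
]

row_totals = [sum(row) for row in inner_cells]
column_totals = [sum(col) for col in zip(*inner_cells)]
grand_total = sum(row_totals)

print(row_totals, column_totals, grand_total)
# prints [425, 550] [567, 408] 975
```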
To answer the research question as to whether higher educational levels
induce higher incomes, a comparison has to be made between the two
educational levels shown in Table 3.22. However, a comparison of the
absolute counts in the inner cells is problematic because the column totals
vary (567 and 408). In a fair comparison, this variation has to be ruled
out. Therefore, the column totals are set to be equal first. A common way
to do this is to set the column totals from Table 3.22 to 100 percent. This
is done by dividing the column totals by 5.67 (567 / 5.67 = 100) and 4.08,
respectively. As the column totals represent the sum of the inner cell
counts, these counts have to be divided by 5.67 and 4.08 as well.
Consequently, the inner cells no longer contain counts but column
percentages (see Table 3.23). For example, the percentage of respondents
with Secondary Vocational School or less and earning an income of over
2,000 euros is 46.4% (263 / 5.67 or alternatively (263 / 567) × 100). The
percentage of respondents with O level education or higher with the same
income is substantially higher: 70.3%. In other words, roughly 46 of every
100 people with Secondary Vocational School as their highest level of
educational attainment have a monthly net income of more than 2,000 euros.
Likewise, approximately 70 of every 100 respondents with O levels or higher
earn more than 2,000 euros. Thus, the difference in percentages (notation:
d%) equals 23.9% (70.3% - 46.4%). In this case, both variables have only
two categories, meaning d% can be computed by comparing the top inner cells
as well (29.7% - 53.6%). Descriptively, the answer to the research question
is that higher levels of education do appear to increase the chances of
earning a higher level of income later in life.
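The conversion to column percentages and the resulting d% can be sketched as
follows. The counts are those of Table 3.22; the dictionary layout and the
helper column_percentages are illustrative assumptions. Computed without
intermediate rounding, d% comes out a fraction above the 23.9% obtained from
the rounded percentages.

```python
# Counts from Table 3.22, stored per column (educational level).
counts = {
    "Secondary Vocational or less": {"2,000 at maximum": 304, "more than 2,000": 263},
    "O levels or higher": {"2,000 at maximum": 121, "more than 2,000": 287},
}

def column_percentages(column):
    """Rescale one column so that its total equals 100 percent."""
    total = sum(column.values())
    return {category: 100 * count / total for category, count in column.items()}

percentages = {level: column_percentages(col) for level, col in counts.items()}

# Difference in percentages (d%) for the highest income class.
d_pct = (percentages["O levels or higher"]["more than 2,000"]
         - percentages["Secondary Vocational or less"]["more than 2,000"])

for level, pct in percentages.items():
    print(level, {k: round(v, 1) for k, v in pct.items()})
print("d% =", round(d_pct, 1))
```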
Table 3.23 Contingency Table with Educational Level and Income (in
cells: absolute counts and percentages)
                                  Educational Level
                            Secondary       O levels     Total
                            Vocational      or higher
                            or less
Income  2,000 at maximum    304             121          425
                            53.6%           29.7%
        more than 2,000     263             287          550
                            46.4%           70.3%
Total                       567             408          975
                            100%            100%
Presenting percentages in contingency tables is common practice, and in
cases with low numbers of rows and columns, it provides a clear overview
of the relationship between two variables. However, the difference in
percentages (d%) is not commonly used as a measure of association due to
two important disadvantages. Firstly, d% depends on whether the row totals
or the column totals are set to 100%, which is particularly problematic
when a causal order cannot be established on theoretical grounds.
Unfortunately, clear causal ordering is more difficult to establish in the
social sciences than it is in the natural sciences. Secondly, only in
tables with two rows and two columns does a single d% exist. In cases with
more columns and/or rows, more differences in percentages can be
calculated, which can be troublesome when the aim is to present the
relationship using a single number.
Chi-square Test and Cramér's V
Cramér's V can be used to describe the relationship between two variables,
where at least one is a nominal variable. This measure is derived from the
chi-square (notation: χ², pronounced as 'Ki-square'). The height of this
chi-square indicates the difference between the observed and the expected
counts in the inner cells of a contingency table. The expected numbers are
calculated from the hypothetical situation of NO statistical relationship
between the variables. In our previous example, we found a relationship
between educational level and income (see Table 3.23). Now suppose that no
relationship exists between these variables, while the counts in the
marginals (i.e., the outer cells) are exactly equal to those in Table 3.23.
Since there is no relationship, the percentages in both columns are
identical and d% equals zero. Since the percentages in the marginals are
taken from Table 3.23, the percentages per column also equal the
percentages in the row totals (see the grey shaded cells in Table 3.24).
In this table it is completely irrelevant which of the two educational
levels is considered, as the chances of earning a higher income are exactly
equal. In other words, about 56 of every 100 respondents earn an income of
over 2,000 euros, a number that holds for both educational levels. In
statistical terms, this means that no statistical relationship exists
between educational level and income. From the inner cell percentages in
Table 3.24, we can easily calculate the expected counts for each of the
inner cells. For inner cell 1, the expected count is 247 (.436 × 567) and
for inner cell 2 this is 178 (.436 × 408). The expected counts for inner
cells 3 and 4 are 320 (567 - 247) and 230 (408 - 178), respectively (see
Table 3.24).
Table 3.24 Contingency Table with Educational Level and Income (expected
counts and percentages, condition: no relationship)
                                     Educational Level
                            Secondary Vocational   O levels     Total
                            or less                or higher
Income   2,000 at maximum   247                    178          425
                            43.6%                  43.6%        43.6%
         more than 2,000    320                    230          550
                            56.4%                  56.4%        56.4%
Total                       567                    408          975
                            100%                   100%
t nl oi i J I I t l nl : i l nt l nl l i : l |l |
The exact chi-square is calculated by taking the difference between the
observed and the expected counts in each cell. These differences are
squared, then divided by the associated expected count, and finally
summed. In Table 3.23, the chi-square equals 55.5 (calculation:
$\frac{(304 - 247)^2}{247} + \frac{(121 - 178)^2}{178} + \frac{(263 - 320)^2}{320} + \frac{(287 - 230)^2}{230}$
$= 13.1 + 18.2 + 10.1 + 14.1 = 55.5$). Thus the chi-square indicates the
level of discrepancy between the observed table (see Table 3.23) and the
table without any statistical relationship (see Table 3.24). High chi-square
values indicate a high level of relationship and vice versa.
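The calculation described above can be retraced with a short sketch. The function names are illustrative and not taken from the text; the counts are the inner cells of Table 3.23. Because the sketch keeps the unrounded expected counts, it should arrive at roughly 55.4 rather than the hand-calculated 55.5.

```python
def expected_counts(observed):
    """Expected counts under 'no relationship', derived from the margins."""
    row_totals = [sum(row) for row in observed]
    col_totals = [sum(col) for col in zip(*observed)]
    n = sum(row_totals)
    return [[r * c / n for c in col_totals] for r in row_totals]


def chi_square(observed):
    """Sum of (observed - expected)^2 / expected over all inner cells."""
    expected = expected_counts(observed)
    return sum(
        (o - e) ** 2 / e
        for obs_row, exp_row in zip(observed, expected)
        for o, e in zip(obs_row, exp_row)
    )


# Inner cells of Table 3.23 (rows: income, columns: educational level)
table_3_23 = [[304, 121], [263, 287]]

expected = expected_counts(table_3_23)
chi2 = chi_square(table_3_23)
print([[round(e, 1) for e in row] for row in expected])
print(round(chi2, 1))
```

Rounding the expected counts to whole numbers reproduces the 247, 178, 320 and 230 used in the hand calculation.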
Typically, the next (inferential) research question is: does the observed
relationship also exist in the population? This question can be answered
using the chi-square test. If a sufficiently large number of observations
is present in the inner cells, the sampling distribution associated with
this test closely resembles the χ²-distribution (shown in Figure 3.25).

[Figure: χ²-distribution with the observed χ²-value marked; the probability (p) is the grey area]
Figure 3.25 A χ²-distribution, Observed χ²-value and p
Table 3.26 shows the calculated chi-square value and the associated
probability. Note that the chi-square value in Table 3.26 slightly differs
from our own calculations because statistical software uses exact
expected counts while we used rounded numbers. The probability is smaller
than .001, suggesting that we should reject the null hypothesis stating no
relationship between the two variables. As the difference in percentages
is as expected (see Table 3.23), the alternative hypothesis that people
with O levels or higher generally earn more income than people with
Secondary Vocational School or less is supported. We would like to note that
in tables with 2 rows and 2 columns, directional hypotheses can be tested.
In such a table, the reported probability should be divided by two.
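As a rough illustration of how such a probability can be obtained for a table with 2 rows and 2 columns, the sketch below relies on a standard result that the text does not spell out: with 1 degree of freedom, a chi-square variable is a squared standard normal variable. The function name and the value 55.4 (the approximate chi-square for Table 3.23) are assumptions made for this example.

```python
import math


def chi2_p_value_df1(chi2):
    """Upper-tail probability of a chi-square value with 1 degree of freedom.

    For df = 1 the chi-square variable is a squared standard normal
    variable, so P(X > chi2) = erfc(sqrt(chi2 / 2)).
    """
    return math.erfc(math.sqrt(chi2 / 2))


chi2 = 55.4                        # approximate value for Table 3.23
p_two_sided = chi2_p_value_df1(chi2)
p_directional = p_two_sided / 2    # directional hypothesis in a 2 x 2 table

print(p_two_sided < 0.001)  # True
```

For larger tables the degrees of freedom change and a statistical package is the more practical route.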

Table 3.26 !li: .|:iu/iu /:i | | | l.. /0:u/iuu/ Level and

lncnnw: I //i \.jitcitc l . /
th|-SQu | vi t|tl l I ' ' ` | ' | j ( t wn l i l i l l d) . 00 1
l l | : i | :| . \
Testing with the chi-square distribution is appropriate only when the
number of observations is sufficiently large. Typically, this is
determined using Cochran's rule, which states that the expected number
of observations in each inner cell of the hypothetical table indicating
no relationship (e.g., Table 3.24) should be at least 1, while in 80% of
all inner cells the expected number should be at least 5. In Table 3.24,
this rule is satisfied, so using the χ²-distribution is not problematic.
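Cochran's rule as stated above is easy to check mechanically. The following is a minimal sketch; the function name and the second table of expected counts are hypothetical.

```python
def satisfies_cochran(expected):
    """Check Cochran's rule on a table of expected counts."""
    cells = [e for row in expected for e in row]
    all_at_least_1 = all(e >= 1 for e in cells)
    share_at_least_5 = sum(e >= 5 for e in cells) / len(cells)
    return all_at_least_1 and share_at_least_5 >= 0.8


table_3_24 = [[247, 178], [320, 230]]      # expected counts from Table 3.24
small_table = [[0.8, 3.2], [4.2, 16.8]]    # hypothetical expected counts

print(satisfies_cochran(table_3_24))   # True
print(satisfies_cochran(small_table))  # False
```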
When samples are small and/or when the table has many rows and/or
columns, the likelihood that Cochran's rule is not satisfied increases. One
possible way to solve this problem is to combine rows and/or columns to
increase the observations in the resulting inner cells. However, if this is
not feasible, an exact test is more appropriate. This test rests on the
numbers in the marginal cells of the contingency table. Based on these, the
correct sampling distribution is derived, again under the assumption that
no relationship is present. This is a relatively laborious procedure, similar
to repeatedly drawing new samples from a population (see section 3.1).
However, this procedure is no longer as time consuming thanks to
modern computers. The exact test is part of every established statistical
software package, including SPSS. Using the correct sampling
distribution and the observed counts, the correct p-value can be calculated and
compared with the level of significance (α) set by the researcher.
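The text does not name a specific exact test. For a table with 2 rows and 2 columns, one common choice is Fisher's exact test, which derives the sampling distribution from the fixed marginal totals. The sketch below assumes that test; the function name and the small table are hypothetical.

```python
from math import comb


def fisher_exact_2x2(table):
    """Two-sided exact p-value for a 2 x 2 table with fixed margins."""
    (a, b), (c, d) = table
    row1, row2, col1 = a + b, c + d, a + c
    n = row1 + row2

    def prob(x):
        # Hypergeometric probability of x in the upper-left cell
        return comb(row1, x) * comb(row2, col1 - x) / comb(n, col1)

    lowest = max(0, col1 - row2)
    highest = min(row1, col1)
    p_observed = prob(a)
    # Add up all tables that are at most as likely as the observed one
    return sum(
        prob(x)
        for x in range(lowest, highest + 1)
        if prob(x) <= p_observed * (1 + 1e-9)
    )


small_sample = [[3, 1], [1, 3]]  # hypothetical counts, n = 8
p = fisher_exact_2x2(small_sample)
print(round(p, 3))  # 0.486
```

With only eight observations, the apparent pattern in this table is far from significant, which is the kind of conclusion the chi-square approximation cannot safely deliver for such small counts.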
The chi-square cannot be directly used to indicate the strength of the
relationship. This is due to the fact that there is no natural limit to its
height. Larger counts in the inner cells and/or a larger number of inner cells
automatically lead to larger values for chi-square. For example, this
means that a chi-square value of 30 in small samples may indicate a
strong relationship, whereas in large samples it would indicate a weak
relationship. This incomparability problem was solved by Swedish
statistician Harald Cramér (1893-1985). He calculated the maximum possible
value for chi-square, given a certain sample size and given a certain
number of rows/columns. He then divided the observed value for chi-square
by this maximum value and took the square root. Without this square
root, a difference between the observed and expected numbers that was
twice as large would actually indicate a relationship that was four times
stronger. This is due to the squaring of the differences between the
observed and expected numbers when chi-square is calculated. In this case,
Cramér's V (as the measure is nowadays called) equals
$\sqrt{55.3 / 975} = .23$.
The merit of Cramér's V is that its values are always between 0
and 1. A value of 0 indicates no relationship (the observed numbers are
then identical to the expected numbers, so chi-square = 0). The value 1,
on the other hand, indicates a perfect relationship (see Table 3.27 for an
example).
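A minimal sketch of this calculation follows. It assumes the standard expression for the maximum chi-square, namely the sample size multiplied by the smaller of (rows - 1) and (columns - 1), which the text describes only in words; the function name is illustrative.

```python
import math


def cramers_v(chi2, n, n_rows, n_cols):
    """Cramér's V: observed chi-square relative to its maximum value."""
    max_chi2 = n * (min(n_rows, n_cols) - 1)  # largest attainable chi-square
    return math.sqrt(chi2 / max_chi2)


v = cramers_v(chi2=55.3, n=975, n_rows=2, n_cols=2)
print(round(v, 3))  # 0.238
```

The unrounded result is about .238, which the text reports as .23.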
Tahk J. 27 't i]c `c ' ! f,' , ft J I I ' II ' II It ' /: l i | : i i c u:clitucl .ti : i itti l lntti t i t
I 'i t ii i i l `\ I
                                     Educational Level
                            Secondary Vocational   O levels     Total
                            or less                or more
Income   2,000 at maximum
         more than 2,000
Although Cramér's V is always limited between 0 and 1, it is not easy to
indicate when a relationship is 'weak' or 'strong'. Contrary to research in
the natural sciences, it is virtually impossible to find values of Cramér's
V that cxcccd . in mostsocia| scicncc rcscarch. Morcovcr, i ncommou
rcscarch app|ications a va| uc oI. i sconsidcrcd cxccptiona||y high. Fo
example, the relationship between education and income will never be
perfect (Cramér's V = 1) because other factors also play a role, such as
work experience, weekly number of hours of work, type of job, and sex.
We propose the following indicators for the strength of a relationship:

< .10        very weak
.10 - .25    weak
.25 - .35    moderate
.35 - .45    strong
> .45        very strong
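These indicators can be written as a small lookup. The treatment of values that fall exactly on a boundary is an assumption of this sketch, since the ranges in the list share their end points; the function name is illustrative.

```python
def strength_label(value):
    """Verbal label for the strength of a measure of association."""
    size = abs(value)  # the sign only carries the direction
    if size < 0.10:
        return "very weak"
    if size < 0.25:
        return "weak"
    if size < 0.35:
        return "moderate"
    if size <= 0.45:
        return "strong"
    return "very strong"


print(strength_label(0.23))   # weak
print(strength_label(-0.36))  # strong
```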
As a measure of association, Cramér's V is commonly used when at least
one of the variables is nominal and both variables do not have too many
categories. Therefore, in many instances both variables will be nominal,
or one may possibly be ordinal. Note that the variables educational level
and income in our example are dichotomous and thereby have interval
characteristics (see section 1.2). This means that other measures of
association that presume a higher level of measurement apply to this example
as well and will lead to the exact same absolute value as Cramér's V.
To test whether a calculated Cramér's V-value differs from 0, the
chi-square test can be used when sufficient observations are present. If
Cochran's rule is not satisfied, an exact test should be used instead. In other
words, if the value of chi-square is significantly different from 0,
Cramér's V is significantly different as well, as the latter is
directly derived from the former.
We end this section with an example in which Cramér's V is the only
correct measure of association, that is, when the relationship
exists between two nominal variables: religious affiliation and political party
preferences (nominal) in the Netherlands (see Table 3.28). The grey
shaded cells show that non-members have a strong preference for
left-wing parties. Dutch Catholics traditionally have difficulties in choosing
between left wing (an economic interest) and Christian parties (a
cultural interest), while Protestants predominantly prefer Christian parties.
The relatively large differences regarding political party preferences are
reflected in a strong relationship between religious affiliation and
political party preferences (Cramér's V = .39, p-value < .001, and differs
significantly from 0 with all common values for α). As a side comment, we
would like to add that these data are from 2005 and they suggest that
political party preferences still are related to religious affiliation, despite the
well documented processes of secularization.
Table 3.28 Relationship between Religious Affiliation and Political Party
Preferences (counts, percentages, Cramér's V and p-value)
                                  Religious affiliation
Political party in the
Netherlands               Catholic   Protestant   None     Total
Christian parties         79         126          39       244
                          38.7%      63.6%        6.4%     24.2%
Left wing parties         84         47           409      540
                          41.2%      23.7%        67.4%    53.5%
Right wing parties        35         21           113      169
                          17.2%      10.6%        18.6%    16.7%
Liberal party             6          4            46       56
                          2.9%       2.0%         7.6%     5.6%
Total                     204        198          607      1,009
                          100%       100%         100%
Cramér's V = .39, p < .001
In the previous section we used chi-square and Cramér's V to determine
the relationship between variables of which at least one was nominal. If
both variables are ordinal, not only can the strength of the relationship
between the two be determined, but so can its direction. For example, it is
obvious to expect a positive relationship between educational level and
income: the higher the level of education attained, the more income
earned. Likewise, studies show a negative relationship between health
care and child mortality: the more a government invests in health care,
the lower child mortality will be. In both cases, Cramér's V is not
appropriate, as it fails to detect the direction of a relationship.
| | . l Ol l Hl l i r d ) l r d | | | i . i
Kendall's Rank Correlation: tau b and tau c
Maurice Kendall (1907-1983) constructed a rank correlation to express
not only the presence and strength of a relationship between two ordinal
variables, but also the direction. Kendall's correlation reaches the
maximum values of 1 or -1 in a contingency table with ordinal variables, in
case all observations are located on the main diagonal (see Table 3.29).
Table 3.29 Perfect Positive and Perfect Negative Relationship
Low    Moderate    High
Kendall's tau b = 1
Low    Moderate    High
Kendall's tau b = -1
Table 3.29 is highly hypothetical as such situations will rarely occur, if
ever. In the social sciences it is more realistic to find the off-diagonal
cells filled to some extent as well. In cases of positive relationships, units
(e.g., respondents) scoring low on the first variable tend to score low on
the second variable and units scoring high on the first variable tend to
score high on the second variable as well. However, there will also be
units where high scores on variable 1 relate to low scores on variable 2
and vice versa. These violations will cause the positive relationship to be
less than 1. Kendall constructed his rank correlation tau (in notation the
Greek letter τ often is used) by comparing the number of positively
related observations with the number of negatively related observations.
Table 3.30 is an example for educational level and income.
Table 3.30 Relationship between Education and Income, as Example 20
Respondents with Average Education and Average Income
(+ indicates positive relation, - indicates negative relation)
Education level:   Lowest    Low     Average   High    Highest
Income:
Lowest             + 20      + 5               - 3     - 4
Low                + 5       + 10              - 2     - 1
Average                              20
High               - 4                         + 15    + 10
Highest                                        + 5     + 25
In Table 3.30, the grey shaded cell illustrates the calculation of
Kendall's tau. This cell represents 20 respondents with average levels of
education and income. Typically, education and income are positively
related, thus respondents with a low or very low educational level are likely
to have an income lower than the 20 people in the grey shaded cell.
Likewise, people with a high or very high level of education will
typically have a higher level of income compared with these 20 respondents.
Indeed, this is the case for the respondents in the cells marked '+',
amounting to 95 respondents (20 + 5 + 5 + 10 + 15 + 10 + 5 + 25). This
means that for each of the 20 respondents in the grey shaded cell, 95
respondents behave according to the presumed positive relationship. Since
there are 20 respondents in the grey shaded cell, there are 95 × 20 = 1,900
combinations, which are called concordant pairs. On the other hand,
respondents in cells marked with a '-' indicate a negative association. In
total there are 18 (3 + 4 + 2 + 1 + 4 + 2 + 2) respondents who do not
behave according to the assumed positive relationship, resulting in 360 (20 × 18)
discordant pairs. Notice that the cells in the same row and column as the
grey shaded cell (called 'ties') are not used when calculating the number
of concordant and discordant pairs. Following this strategy we can
calculate the number of concordant and discordant pairs for every cell in Table
3.30. Kendall's tau is simply the difference (notation: S) between the
total number of concordant and discordant pairs, divided by a certain
number to keep tau in the range of -1 (perfect negative relationship) and +1
(perfect positive relationship).
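The arithmetic for the grey shaded cell can be retraced in a few lines. The variable names are illustrative; the counts are the ones listed in the paragraph above.

```python
# Counts in the cells marked '+' and '-' around the grey shaded cell
grey_cell = 20
plus_cells = [20, 5, 5, 10, 15, 10, 5, 25]
minus_cells = [3, 4, 2, 1, 4, 2, 2]

concordant_pairs = grey_cell * sum(plus_cells)
discordant_pairs = grey_cell * sum(minus_cells)

print(concordant_pairs)  # 1900
print(discordant_pairs)  # 360
```

Repeating this for every cell of the table, and counting each pair once, gives the totals from which S is obtained.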
Kendall used three different denominators for tau to ensure a range of
(-1, +1), resulting in the existence of three different tau measures: tau a,
tau b, and tau c. Tau b and tau c are especially important in social
science research. Tau b can reach values -1 and 1 when a contingency table
has an equal number of columns and rows ('square' tables). Tau c can
reach values -1 and 1 in 'rectangular' tables (an unequal number of rows
and columns). Thus, tau b is most suitable for square tables while tau c is
most suitable for rectangular tables. However, the use of different
denominators does not alter the interpretation. Kendall's tau is positive
when the number of concordant pairs exceeds the number of discordant
pairs and is negative when the number of discordant pairs exceeds the
number of concordant pairs. The strength of the relationship expressed by
Kendall's tau can be determined using the rules on page 83. Hand
calculating the concordant and discordant pairs in Kendall's tau b and tau c is
highly time consuming but can easily be done in most statistical
packages, including SPSS.
To statistically test whether Kendall's tau significantly differs from 0,
the normal sampling distribution can be used when the sample size equals
30 or more. The null hypothesis (H0) assumes that the total numbers
of concordant and discordant pairs are equal, so S = 0 and Kendall's
tau = 0. To test whether the observed value of tau significantly differs
from 0, a z-value and the associated p-value are calculated. When the
sample size is lower than 30, it is advisable to use an exact test instead
because the sampling distribution is probably no longer normally distributed.

To conclude this section, two examples from social science research
will be given. The first example regards the relationship between
respondents' educational level and spouse's educational level to
determine the extent of educational homogamy (see Table 3.31).
Table 3.31 Relationship between Educational Level and Spouse's
Educational Level
                      Educational level (respondent)
            Lowest    Low       Average   High      Highest   Total
Lowest      21        18        3         2         1         45
            39.6%     7.7%      1.2%      1.0%      1.2%      5.6%
Low         19        126       67        26        5         243
            35.8%     54.1%     27.8%     13.1%     6.2%      30.1%
Average     8         52        87        64        14        225
            15.1%     22.3%     36.1%     32.2%     17.3%     27.9%
High        4         33        67        81        28        213
            7.5%      14.2%     27.8%     40.7%     34.6%     26.4%
Highest     1         4         17        26        33        81
            1.9%      1.7%      7.1%      13.1%     40.7%     10.0%
Total       53        233       241       199       81        807
            100%      100%      100%      100%      100%      100%
Kendall's tau b = .45, p (one-tailed) < .001
The grey shaded cells in Table 3.31, which hold the highest percentage
per column, suggest that there is a positive relationship between the
ordinal variables: the higher the respondents' educational level, the higher
the educational level attained by their spouse. This trend is also
associated with a strong positive relationship (Kendall's tau b = .45), and
because the p-value is very small (p < .001), it is extremely unlikely that
this positive relationship does not exist in the population. Interestingly,
the data in Table 3.31 were collected in the Netherlands in 2000,
suggesting that even nowadays educational level is important when choosing
a partner (a phenomenon called educational homogamy).
7ns 1 1 p
Our second example was already discussed: the relationship
between respondents' educational level and income (see Table 3.32).
Table 3.32 Relationship between Educational Level and Income
                                 Educational level
                  Lowest    Low       Average   High      Highest   Total
Income
Less than 3,000   39        101       75        38        7         260
                  61.9%     38.3%     27.5%     15.7%     7.4%      27.7%
3,000-5,000       22        111       106       86        21        346
                  34.9%     42.0%     38.8%     35.5%     22.1%     36.9%
More than 5,000   2         52        92        118       67        331
                  3.2%      19.7%     33.7%     48.8%     70.5%     35.3%
Total             63        264       273       242       95        937
                  100%      100%      100%      100%      100%      100%
Kendall's tau c = .36, p (one-tailed) < .001
Again, cells with the highest column percentages are highlighted in Table
3.32. There appears to be a positive relationship between educational
level and income: the higher the educational level of a respondent, the
higher his or her earnings. This tendency is also reflected in Kendall's tau
c (because of the rectangular table), which indicates a strong positive and
significant relationship (Kendall's tau c = .36, p (one-tailed) < .001).
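The reported tau c can be reproduced from the counts in Table 3.32. The sketch below assumes the usual definition of tau c (often attributed to Stuart), 2mS / (n²(m - 1)) with m the smaller of the number of rows and columns; the text itself only says that a "certain number" is used as denominator. The function names are illustrative.

```python
def concordant_discordant(table):
    """Count concordant (P) and discordant (Q) pairs in an ordered table."""
    n_rows, n_cols = len(table), len(table[0])
    p = q = 0
    for i in range(n_rows):
        for j in range(n_cols):
            for k in range(i + 1, n_rows):      # only rows below cell (i, j)
                for m in range(n_cols):
                    if m > j:
                        p += table[i][j] * table[k][m]   # below and right
                    elif m < j:
                        q += table[i][j] * table[k][m]   # below and left
    return p, q


def kendall_tau_c(table):
    """Tau c: 2 * m * S / (n^2 * (m - 1)), with m = min(rows, columns)."""
    p, q = concordant_discordant(table)
    n = sum(sum(row) for row in table)
    m = min(len(table), len(table[0]))
    return 2 * m * (p - q) / (n ** 2 * (m - 1))


# Counts from Table 3.32 (rows: income low to high, columns: education)
table_3_32 = [
    [39, 101, 75, 38, 7],
    [22, 111, 106, 86, 21],
    [2, 52, 92, 118, 67],
]

print(round(kendall_tau_c(table_3_32), 2))  # 0.36
```

Pairs in the same row or column (ties) are skipped, in line with the counting rule described for Table 3.30.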
Spearman's Rank Correlation
The relationship between two ordinal variables
can also be expressed as the difference in rank order, as argued by
psychologist Charles Spearman (1863-1945). Suppose five respondents each
have a different level of education. We can assign rank scores to these
individuals that correspond to their respective ranking of education. The
respondent who has the lowest educational level is assigned the score 1,
the respondent with the second lowest educational level receives score 2,
the middle category equals score 3, the second highest educational level
equals score 4, and the respondent with the highest educational level is
ranked 5. Next, income is ranked in the same way. Now suppose that the
variables educational level and income are perfectly related. In this event
the ranking of education perfectly matches the ranking of income, and
Spearman's rank correlation (often indicated with rs) equals 1. When
there is no relationship between education and income, neither is there a
relationship between the rank order of education and income (rank
correlation = 0). Finally, when a perfectly negative relationship exists between
educational level and income, the rank orders of both variables perfectly
oppose each other (rank correlation = -1). Table 3.33 contains details for
Spearman's rank correlation.
Table 3.33 Spearman's Rank Correlation
resp.   Education   r*    Income    r     Income    r     Income    r
A       Lowest      1     Lowest    1     High      4     Highest   5
B       Low         2     Low       2     Lowest    1     High      4
C       Average     3     Average   3     Average   3     Average   3
D       High        4     High      4     Highest   5     Low       2
E       Highest     5     Highest   5     Low       2     Lowest    1
Rank correlation:         1               0               -1
* r = Ranking of Education and Income
Spearman's rank correlation is calculated using the rank scores of two
ordinal variables. To prevent the correlation from falling outside of the -1
and 1 range, rank scores are first standardized into z-scores, eliminating
the influence of variables measured in different units. For example, in
Table 3.33, the variables educational level and income are difficult to
compare because the ranking is measured in different units (levels vs.
income classes). An easy solution for this incomparability is to transform
them into z-scores (see section 2.3.3). Next, for each unit of analysis
(often respondents), the two z-scores are multiplied and summed across all
units to a total. This total sum of multiplied z-scores reaches a positive
maximum if both rank orders match perfectly. Conversely, the total sum
has a maximum negative value when both rank orders perfectly oppose
each other. However, more units result in a higher total sum. Therefore,
the total sum is divided by the total number of units (n), resulting in a
value that always falls between -1 (maximum negative association) and
+1 (maximum positive association), while 0 means no association at all.
So, the rank scores are first transformed into z-scores (a process called
'standardization') and the standard deviation becomes the unit of
measurement. Therefore, rs indicates that a change of 1 standard deviation in the
ranking of variable x is associated with a change of rs standard deviations in the
ranking of variable y (for example, when rs = .5, a rise of 1
standard deviation in the ranking of x goes together with a rise of .5
standard deviations in the ranking of y). In doing so, the rank correlation
indicates the extent to which the variables' rankings resemble one another.
Highly similar rank orders result in high rank correlations (maximum 1),
and opposing rankings result in negative rank correlations (minimum -1).
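The procedure described above (standardize the rank scores, multiply them per unit, sum, and divide by n) can be sketched as follows. The function names are illustrative, and the sketch assumes the standard deviation is computed with n in the denominator so that dividing by n yields exactly 1 for identical rankings. The rank scores are those of Table 3.33.

```python
from statistics import mean, pstdev


def z_scores(values):
    """Standardize: subtract the mean, divide by the standard deviation."""
    m, s = mean(values), pstdev(values)
    return [(v - m) / s for v in values]


def rank_correlation(ranks_x, ranks_y):
    """Mean of the products of the z-scores of two sets of rank scores."""
    zx, zy = z_scores(ranks_x), z_scores(ranks_y)
    return sum(a * b for a, b in zip(zx, zy)) / len(ranks_x)


education = [1, 2, 3, 4, 5]          # respondents A to E in Table 3.33
income_columns = {
    "same order": [1, 2, 3, 4, 5],
    "unrelated": [4, 1, 3, 5, 2],
    "reversed": [5, 4, 3, 2, 1],
}

for label, income in income_columns.items():
    print(label, round(rank_correlation(education, income), 2))
```

Up to floating-point rounding, the three columns give 1, 0 and -1, matching the bottom row of the table.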
To test Spearman's rank correlation for samples with at least 30
observations, the t-distribution can be used (see Figure 3.8). Again the
t-value indicates the relative distance between the observed rank
correlation and the correlation stated in the null hypothesis. Finally, the
associated p-value is compared with the selected level of significance (α). For
samples with less than 30 observations, it is preferable to use an exact
test, where the p-value is calculated based on the distributions of the x and y
variables, assuming that there is a zero rank correlation in the population
(= H0). Again, this probability (p) is compared to the level of significance
(α) to test whether the observed rank correlation differs from 0 in the
population (= Ha).
As with Kendall's tau, Spearman's rank correlation can be used for
variables measured at the ordinal level. Whether Kendall's tau or
Spearman's correlation should be used depends on the research question: to
compare findings from previous studies, it makes sense to choose the
same measure of association. Additionally, the advantage of Kendall's
tau is its ease of interpretation, namely the difference between the
number of concordant and discordant pairs.
The benefit of Spearman's rank correlation is that it is similar to
Pearson's correlation coefficient (see section 3.5.1), which is an important
measure of association for interval/ratio variables. However, use of the
latter measure requires an approximately linear relationship, which is not
required for Spearman's correlation (for an explanation of linear
associations, see section 3.5.2). Therefore, rs provides a better alternative to
describe a bivariate nonlinear relationship between interval variables.

Figure 3.34 provides an example of a nonlinear association between age
(measured in years) and the Body Mass Index. Between 18 and 33 years
of age, the mean BMI rises strongly and does not rise much further after
that. Thus, the BMI does not constantly increase linearly between 18 and
70 years of age. Pearson's correlation coefficient for this relationship is
.22, but this is a misrepresentation of the real nonlinear association. The
strength of the relationship between age and BMI is expressed more
appropriately by Spearman's rank correlation and is stronger: .29 (p < .001).
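The difference between the two coefficients for a relationship that rises steeply and then levels off can be illustrated with made-up numbers. The data below are hypothetical and are not the BMI data behind Figure 3.34; the function names are illustrative, and the simple ranking helper assumes there are no tied values.

```python
from statistics import mean, pstdev


def pearson_r(x, y):
    """Pearson's r as the mean product of z-scores."""
    mx, my, sx, sy = mean(x), mean(y), pstdev(x), pstdev(y)
    return sum((a - mx) / sx * (b - my) / sy for a, b in zip(x, y)) / len(x)


def ranks(values):
    """Rank scores 1..n (assumes no tied values)."""
    order = sorted(values)
    return [order.index(v) + 1 for v in values]


def spearman_rs(x, y):
    """Spearman's rank correlation: Pearson's r applied to the rank scores."""
    return pearson_r(ranks(x), ranks(y))


# Hypothetical data: a steep rise that levels off
age = [18, 23, 28, 33, 38, 43, 48, 53]
score = [20.0, 23.0, 24.5, 25.2, 25.5, 25.7, 25.8, 25.9]

r = pearson_r(age, score)
rs = spearman_rs(age, score)
print(round(r, 2), round(rs, 2))
```

Because the scores never decrease with age, the rank correlation reaches 1, while Pearson's r stays below 1 because the increase is not linear.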
[Figure: mean BMI plotted against age in years]
Figure 3.34 Nonlinear Relationship between Age and BMI (rs = .29)
: I |
This section discusses the statistical tools available for interval and ratio
variables. Firstly, Pearson's correlation coefficient can be calculated.
Secondly, regression analysis can be utilized to demonstrate, for
instance, the average weight increase for every single numerical change in age.
Finally, this analytical technique can be used to determine the influence
exerted by multiple variables simultaneously on a dependent variable (y).
Karl Pearson (1857-1936) suggested that the relationship between two
interval or ratio variables can be analyzed using a linear correlation
coefficient, now known as Pearson's correlation coefficient (notation: r).
This coefficient equals the maximum value of 1 when a one-unit increase of
the variable x is associated with a fixed increase of variable y (see Figure
3.35). This relationship is called a perfect positive linear association. The
linear association is perfectly negative (r = -1) when every increase of x
results in a fixed decrease of y.
Figure 3.35 Pearson's Correlation Coefficient
I l l f l ! l l l l l \
Table 3.36 Pearson's Correlation Coefficient for Height and Weight
           Height    p (one-tailed)
Weight     .52       p < .001
Recall that variables often have different units of measurement. For
example, the variables body weight and age are measured in kilograms and
years, respectively. In section 2.3.3, we demonstrated that this problem
can be solved by transforming the measures into z-scores. Likewise, the
original scores of both variables used to calculate Pearson's correlation
coefficient are also transformed into z-scores. Now, per unit of analysis,
these z-scores on x and y are multiplied and finally summed across all
units. This total sum is positive when the linear association is also
positive, and vice versa. Furthermore, just as in Spearman's rank correlation,
the total sum tends to be higher when the total number of observed units
(often respondents) is higher. Dividing by the total number of
observations results in a correlation that falls within the range -1 and 1. The
correlation coefficient always lies between these two extremes and equals
0 when there is no linear association. This, however, may not mean that
there is no association between the variables, as a nonlinear association
may exist (see Figure 3.34).
As was mentioned before, the scores on the original variables were
first transformed into z-scores with the standard deviation as their unit of
measurement. Therefore Pearson's correlation coefficient indicates that
when the score on one variable (x) increases by 1 standard deviation, the
score on the associated variable (y) will increase by a number of standard
deviations equal to r. In chapter 2, we graphically demonstrated a
relationship between height and weight (see Figure 2.2). Numerically, this
positive relationship can be described with a Pearson's correlation, which
amounts to .52 (see Table 3.36). Therefore, for every standard deviation
increase of height, weight increases on average by .52 standard
deviations. An interpretation of the strength of Pearson's correlation
coefficients can be found on page 83. As with Spearman's rank correlation,
Pearson's correlation is statistically tested using a t-distribution when the
sample has at least 30 observations. In smaller samples, the exact test
provides a more appropriate alternative.
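The interpretation in standard deviations can be made concrete with a small calculation. The correlation of .52 comes from the text; the two standard deviations are hypothetical values chosen only for this illustration, as is the function name.

```python
r = 0.52              # Pearson's correlation between height and weight
sd_height_cm = 10.0   # hypothetical standard deviation of height
sd_weight_kg = 12.0   # hypothetical standard deviation of weight


def expected_weight_change(height_change_cm):
    """Average change in weight (kg) implied by r for a change in height."""
    change_in_sd = height_change_cm / sd_height_cm   # height change in SDs
    return r * change_in_sd * sd_weight_kg           # back to kilograms


print(round(expected_weight_change(10.0), 2))  # 6.24
```

Under these assumed standard deviations, a height increase of one standard deviation (10 centimeters) goes together with an average weight increase of .52 standard deviations, or about 6.2 kilograms.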
The correlation coefficient r is commonly used as a measure of
association. However, this measure assumes a linear relationship, which can be
checked graphically in a scatter graph (see Figure 3.34) and numerically by
comparing r with rs, where r < rs indicates a nonlinear relationship. One
disadvantage of Pearson's correlation is its sensitivity to extreme scores
(outliers), especially when relatively few observations are present.
In our example, Pearson's correlation coefficient indicates the relative
change in weight associated with changes in height. However, another
interesting question remains unanswered: on average, how many
kilograms are added to body weight when body height increases by a
particular (absolute) value? The answer can be found in the next section.
Linear regression analysis is connected to the work of Sir Francis Galton
(1822-1911), who researched heredity (nowadays labeled 'genetics'). For
example, he studied the relationship between successive generations of
sweet peas. The size of the peas within a given generation provided a
close prediction of the size of the next generation. More generally, Galton
demonstrated that values on a dependent (y) variable can be predicted by
scores on an independent (x) variable. This technique is called regression.
In Figure 2.2 we demonstrated that taller respondents typically weigh
more. Based on Pearson's correlation coefficient this association appears
to be quite strong (see Table 3.36). In addition, it is also possible to
indicate how many kilograms someone's weight will increase on average
with a given unit increase of body height. Again, the relationship between
body height and body weight is assumed to be approximately linear for
the respondents in the sample (i.e., respondents who measure between
150 and 205 centimeters). This means that weight increases at a constant
factor. This factor is represented by a regression line that can be found
using data from a scatter plot. Three attempts to find this line were made
in Figure 3.37. Lines 1 and 3 don't seem to represent the linear increase
very well. According to line 1 the weight increases too fast (too many
observations cluster below the line), and according to line 3 the weight
rises too slowly (too many observations above the line). Line 2, however,
does seem to provide a good approximation as it roughly runs through the
middle of all observations. In fact, line 2 actually is based on
results from regression analysis. The main idea in calculating this
regression line is to minimize the vertical distances between
observations and the regression line.
[Scatter plot with lines 1, 2 and 3. x-axis: Height (in centimeters), 150 to 205; y-axis: Weight (in kilograms)]
Figure 3.37 The Relationship between Height (in centimeters) and Weight
(in kilograms) and 3 Lines Representing the Linear Tendency
[Scatter plot with regression line. x-axis: Height (in centimeters), 150 to 205; y-axis: Weight (in kilograms). Annotations: difference ('error') between observed weight and predicted weight; 15 kg; b coefficient: 15 / 20 = .75]
Figure 3.38 Regression Line Representing the Linear Relationship
between Height and Weight
To illustrate, we selected five respondents to check against line 2 from
Figure 3.37. One respondent measuring 160 centimeters has an observed
weight of 45 kilograms and a predicted weight of 60 kilograms. This
(vertical) difference (called error, residual or e) is -15 kilograms
(45 - 60). The difference for the other respondent measuring 160
centimeters but weighing 75 kilograms is 15 (75 - 60). For one respondent
the observed weight equals the predicted weight (difference: 75 - 75 = 0),
while for the remaining two respondents (measuring 200 centimeters) the
error amounts to 15 and -15 (105 - 90 and 75 - 90). The sum of all five
differences equals 0 (-15 + 15 + 0 + 15 - 15).
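The residual arithmetic for the five respondents can be retraced with a
short sketch. The variable names are illustrative assumptions; the heights,
observed weights and predicted weights (60, 75 and 90 kilograms at 160, 180
and 200 centimeters) are the ones quoted in the text.

```python
# Predicted weights (kg) read off the regression line at three heights (cm).
predicted_by_height = {160: 60, 180: 75, 200: 90}

# The five illustrative respondents: (height in cm, observed weight in kg).
respondents = [(160, 45), (160, 75), (180, 75), (200, 105), (200, 75)]

# Residual (error, e) = observed weight minus predicted weight.
residuals = [weight - predicted_by_height[height] for height, weight in respondents]

print(residuals)       # [-15, 15, 0, 15, -15]
print(sum(residuals))  # 0
```

The positive and negative errors cancel out, which is why the sum of the
differences is 0 even though four of the five respondents do not lie on
the line.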
Having calculated the regression line we can easily determine the average
linear weight increase when height increases by 1 centimeter.
Mathematically, this factor is known as the gradient, but in regression
analysis it is referred to as the b coefficient. This coefficient can be
derived from Figures 3.37 and 3.38. In Figure 3.38 respondents measuring
160 and 180 centimeters are compared. The regression line predicts that
they weigh 60 and 75 kilograms, respectively. A 20 centimeter increase
(180 - 160) therefore results in a predicted increase of 15 kilograms
(75 - 60). So, the b coefficient is .75 (15 / 20), meaning that an increase
of 1 centimeter on average results in .75 kilogram more weight. Generally,
the b coefficient is the change in units of y (change in kilograms in this
example) associated with a 1-unit change in x (1 centimeter in this example).
The intercept or constant (a) is also an important parameter in addition
to the b coefficient (b), as every straight line can be mathematically
described as y = a + bx. In our example this is: body weight = a + b * body
height. The intercept (a) is the value of y where the line crosses the
y-axis. It can be found simply by extending the regression line to the
y-axis. The y-axis originates at x = 0. When the regression line in Figure
3.38 is extended to 0 (height = 0), the weight drops from 60 kilograms
(height 160 centimeters) to -60 kilograms (60 - 160 * .75 = -60), see
Figure 3.39.
[Diagram of the regression line extended to the y-axis. Annotations: 60 kg; 160 cm; .75]
Figure 3.39 Intercept (Constant) and b Coefficient (b) [remainder of caption illegible]
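The derivation of b and a from two points on the regression line can be
written out as a small sketch. The function name line_through is a
hypothetical helper; the two points (160 cm, 60 kg) and (180 cm, 75 kg)
are the predicted values quoted in the text.

```python
def line_through(point_1, point_2):
    """Return (a, b) of the straight line y = a + b * x through two points."""
    (x1, y1), (x2, y2) = point_1, point_2
    b = (y2 - y1) / (x2 - x1)   # change in y per 1-unit change in x
    a = y1 - b * x1             # value of y where the line crosses x = 0
    return a, b

a, b = line_through((160, 60), (180, 75))
print(a, b)  # -60.0 0.75
```

The intercept of -60 kilograms is only a mathematical property of the line;
it describes a height of 0 centimeters, which no respondent has.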
Of course, the intercept has no realistic interpretation in this example
because there are no respondents with zero height in the sample, which
highlights the danger of extrapolation. This, however, does not mean
that the results are invalid for the respondents in the sample. The
regression equation is: body weight = -60 + .75 * body height, and applies
to all respondents represented in this sample (they all measure between
150 and 205 centimeters). It is unknown whether this equation is valid for
people taller or shorter than this, but we know for sure that the equation
is not valid for newborns (height about 50 centimeters).
In our example, we determined a and b roughly from Figures 3.37 and 3.38.
In statistical packages the regression estimates for a and b are
calculated using a technique called ordinary least squares (OLS). Based
on this technique, the 'real' a and b equal -50.54 and .73, respectively
(see Table 3.40).
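To make the idea of ordinary least squares concrete, the sketch below
applies the standard closed-form OLS formulas for one predictor to the five
illustrative respondents mentioned earlier. This is an assumption-laden
illustration, not the full survey data, so it reproduces the hand-drawn
line (a = -60, b = .75) rather than the estimates in Table 3.40. The
function name ols_fit is hypothetical.

```python
def ols_fit(xs, ys):
    """Ordinary least squares for y = a + b * x with a single predictor."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # b = covariation of x and y divided by the variation of x.
    s_xy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    s_xx = sum((x - mean_x) ** 2 for x in xs)
    b = s_xy / s_xx
    # The OLS line always passes through the point (mean_x, mean_y).
    a = mean_y - b * mean_x
    return a, b

heights = [160, 160, 180, 200, 200]   # centimeters
weights = [45, 75, 75, 105, 75]       # kilograms
a, b = ols_fit(heights, weights)
print(a, b)  # -60.0 0.75
```

OLS chooses a and b so that the sum of the squared vertical distances
between the observations and the line is as small as possible.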
Table 3.40 Linear Relationship between Height and Weight: Estimates
for a (constant) and b (b coefficient)

Dependent variable (y):                               Significance level (p)
Weight (in kilograms)                     Estimates   (two-tailed)
Constant (a)                                -50.54    < .001
b coefficient for Height (in cm) (b)           .73    < .001
In Table 3.40, two-tailed p-values are reported for the constant (a) and
the b coefficient (b) for height. The test for the intercept (H0: a = 0) is
not relevant as it relates to non-existing respondents (height = 0). The
p-value for the height b coefficient is used to test whether its value
differs significantly from 0 in the population. Because p is much smaller
than α we can safely reject the null hypothesis and accept the alternative
hypothesis, for it is very likely that body height and body weight are
positively and linearly related in the population. Note that the SPSS
generated p-values in Table 3.40 are two-tailed and need to be divided by 2
because the alternative hypothesis is directional. However, in this case
this does not make any difference to the hypothesis test since p is
already very low.
Regression estimates a and b allow for the calculation of predicted (or
estimated) weights for all heights between 150 and 205 centimeters. For
example, a person measuring 177 centimeters has an estimated weight of
78.7 kilograms (-50.54 + 177 * .73). There will be few, if any, respondents
who are 177 centimeters tall and weigh exactly 78.7 kilograms, because
the linear regression line only reflects the overall tendency and
observations will deviate from the line. Therefore every regression
equation takes the form: y = a + bx + e (where e stands for error or deviation).
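The prediction step can be sketched as follows. The estimates -50.54 and
.73 are those reported in Table 3.40; the function name and the explicit
range check (150 to 205 centimeters, the range observed in the sample) are
illustrative assumptions that guard against extrapolation.

```python
A_CONSTANT = -50.54   # intercept (a) from Table 3.40
B_HEIGHT = 0.73       # b coefficient for height (b) from Table 3.40

def predict_weight(height_cm):
    """Predicted weight (kg) from y = a + b * x, within the sampled range only."""
    if not 150 <= height_cm <= 205:
        raise ValueError("height outside the range observed in the sample")
    return A_CONSTANT + B_HEIGHT * height_cm

print(f"{predict_weight(177):.1f}")  # 78.7
```

The individual error e for a respondent is the observed weight minus this
predicted value.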
In the social sciences, the goal is typically to show the overall linear
tendency rather than providing exact predictions. However, if the
explanatory power of a model is also important, the explained variance is
used. In a previous section, we presented the variance of a variable as the
surface of a square (see Figure 2.20). The important question is how much
of this surface (the variance in y) can be explained using regression
analysis. The explained variance of y is the surface of the square that is
'covered' (explained) by x, divided by the total surface of the square (see
Figure 3.41). The outcome is always a number between 0 and 1. If the
covered surface is 0, then the explained variance is 0. If the whole surface
of y is covered (explained) by x, the result is 1 (or 100%). In this case, all
observed scores of y are located exactly on the regression line and the
linear relationship is perfect (and all e = 0), see Figure 3.41. In our
example, body height explains .32 (32%) of the variance in body weight.
Whether the explained variance is sufficiently high depends on the
research question. As a general assessment of body weight based on body
height, this regression model is not a bad instrument, but for medical
purposes it would be inadequate. The explained variance can be increased
by adding x variables to the model (see the section on multiple regression
analysis).
[Squares representing the variance in y and the part covered by the variance in x; panels for 0%, 50% and 100% variance explained]
Figure 3.41 Graphical Presentation of 0, 50, and 100% Variance
Explained (dark shaded area = explained variance)
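The explained variance can be computed as 1 minus the unexplained
(residual) share of the variation in y. The sketch below does this for the
five illustrative respondents and the hand-drawn line y = -60 + .75x used
earlier; because this is not the full survey data, the outcome differs from
the .32 reported for the real sample. The function name is hypothetical.

```python
def explained_variance(xs, ys, a, b):
    """Share of the variation in y that is covered by the line y = a + b * x."""
    mean_y = sum(ys) / len(ys)
    # Total variation of y around its mean (the whole 'square').
    ss_total = sum((y - mean_y) ** 2 for y in ys)
    # Variation left unexplained: squared errors around the regression line.
    ss_residual = sum((y - (a + b * x)) ** 2 for x, y in zip(xs, ys))
    return 1 - ss_residual / ss_total

heights = [160, 160, 180, 200, 200]
weights = [45, 75, 75, 105, 75]
r_squared = explained_variance(heights, weights, a=-60.0, b=0.75)
print(r_squared)  # 0.5
```

A value of 0.5 corresponds to the middle panel of the figure: half of the
surface of y is covered by x.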
Certain assumptions about the data need to be made before performing
regression analysis. We will address these in the section which discusses
multiple regression analysis. Beforehand, we stress the importance of
the assumed linear relationship between y and x in the population and
the importance of checking this in the sample data. This is easily checked
in a line graph and/or a histogram of the errors/residuals (the
distribution should be approximately normal/bell-shaped, especially with
n < 30). If the relationship is not (roughly) linear, Spearman's rank
correlation or a regression analysis using modified variables is more
appropriate to use (see our web page for further information).
Furthermore, it should be noted that the results of regression analyses are
sensitive to outliers, especially with sample sizes smaller than 200. With
samples as small as this an analysis of the errors (or residuals) is
necessary (again, see our website for more information).
3.5.3 Odds Ratio
Typically, measures of association have minimum and maximum values.
For example, values for Cramér's V always fall between 0 and 1, while
values for Kendall's tau, Spearman's rank correlation, and Pearson's
correlation always fall between -1 and 1. As such, the strength of
statistical relationships can be compared by a standard, namely, 0 (no
relationship) and 1 or -1 (perfect relationship). The values taken by these
measures are also determined by the distributions of the x and y variables
(in a contingency table these are the counts in the row and column
marginals). For example, when the y variable has a skewed distribution,
many measures of association will show low values. Additionally, changes
in these distributions will change the values of many measures of
association. This is called marginal dependency and is often desirable.
However, it can be problematic when the research question focuses on
relative comparisons between categories of the x variable. For example, in
social mobility research, the research questions often relate to
inequality: 1) do women still lag behind men in schooling in the West? If
yes, 2) has inequality declined over time? Likewise, we know that students
go on to higher education (college) at lower rates compared with lower and
intermediate levels of education. However, over time adolescents
increasingly do choose to go on to higher education, a trend that we
touched upon earlier and attributed to educational expansion. The (skewed)
distribution of higher and lower/intermediate education and changes
therein should not influence the answer to the two research questions
above. The odds ratio is a measure of association that is insensitive to
changes in distributions.
This measure is first illustrated using another example from social
mobility research, dealing with the relationship between the occupations
of fathers and sons (see Table 3.42).
Table 3.42 The Relationship between the Occupations of Father and Son
(counts, column percentages and odds ratio)

                         Occupation father
Occupation son           Non-manager    Manager    Total
Non-manager                  219           51       270
                            65.2%        37.8%     57.3%
Manager                      117           84       201
                            34.8%        62.2%     42.7%
Total                        336          135       471
                            100%         100%
Odds ratio = 3.07, p (one-tailed) < .001
Firstly, to determine the extent to which sons are nowadays (the data are
from 2000) economically mobile compared to their fathers, odds are
calculated. For sons whose fathers do not/did not occupy a managerial
position, the chance of ending up in the same occupational category is
65.2% (calculation: 219 / 336 * 100). Conversely, these sons have a 34.8%
chance of rising and getting a managerial job (see Table 3.42). The ratio
between these chances is called the odds and equals 1.87 (65.2 / 34.8).
This means that the chances of eventually ending up in a non-managerial
job are 1.87 times as high as the chances of getting a managerial job for
sons who have fathers who are non-managers. In contrast, the chance of
attaining a managerial position for sons who have fathers who are managers
is 62.2%, whereas the chance of ending up in a non-managerial position is
37.8%. The odds are .61 (37.8 / 62.2). Notice that the distributions of
the variables (presented in the margins of the table) do not play any role
when calculating the odds.
To determine the relative difference in occupational opportunities, the
ratio between both odds is calculated. In this case, the outcome is 3.07
(1.87 / 0.61). This is called the odds ratio because it is the ratio
between two odds. In our example the odds ratio indicates that the odds of
sons whose fathers do not have managerial jobs (1.87) are about 3 times as
high as the odds of sons who have fathers who work as managers (.61).
The closer the odds ratio is to 1, the smaller the differences in
occupational opportunities. In Western Europe, occupational mobility used
to be much lower ('like father, like son') and the odds ratio was
consequently higher. More generally, an odds ratio of 1 means that both
odds are equal, which means that the categories of the x variable are not
statistically related to the dependent variable (see Table 3.24). In
occupational mobility, an odds ratio of 1 indicates that the father's
occupation is not an important factor (anymore) in the occupational
chances of his son. The occupational mobility of the son is not limited by
the father's occupation.
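The odds and the odds ratio for the father and son occupation table can be
retraced directly from the four inner cell counts (219, 51, 117 and 84).
The sketch below is illustrative and the function names are assumptions.
Note that working from the raw counts gives about 3.08, whereas the text's
3.07 results from dividing the rounded odds (1.87 / .61).

```python
def odds(count_outcome, count_other_outcome):
    """Odds of one outcome relative to the other within a single column."""
    return count_outcome / count_other_outcome

# Sons of non-manager fathers: 219 non-managers versus 117 managers.
odds_non_manager_father = odds(219, 117)
# Sons of manager fathers: 51 non-managers versus 84 managers.
odds_manager_father = odds(51, 84)

# The odds ratio is the ratio between the two odds.
odds_ratio = odds_non_manager_father / odds_manager_father

print(f"{odds_non_manager_father:.2f}")  # 1.87
print(f"{odds_manager_father:.2f}")      # 0.61
print(f"{odds_ratio:.2f}")               # 3.08
```

Only the inner cells enter the calculation; the row and column totals
(the marginals) are never used, which is why the odds ratio is insensitive
to changes in the marginal distributions.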
Now back to the two research questions on the educational opportunities
of men and women and the changes therein (see Table 3.43).
Table 3.43 Relationship between Sex and Educational Level in 1979 and
2000 (counts, column percentages and odds ratios)

Year of survey: 1979     Sex
                         Female     Male      Total
Less than college          451       422        873
                          92.4%     81.5%      86.8%
College or                  37        96        133
university                 7.6%     18.5%      13.2%
Total                      488       518      1,006
                          100%      100%
Odds ratio = 2.77, p (one-tailed) < .001

Year of survey: 2000     Sex
                         Female     Male      Total
Less than college          378       336        714
                          74.7%     66.9%      70.8%
College or                 128       166        294
university                25.3%     33.1%      29.2%
Total                      506       502      1,008
                          100%      100%
Odds ratio = 1.46, p (one-tailed) = .003
The percentages in the grey shaded cells (the percentages of women and
men who went to college or university) show that inequality was present
in 1979 and in 2000: relatively more men than women went to college or
university. During this period the chances of attaining the highest
educational levels increased. In 1979, 13.2% of all students were enrolled
in higher education and in 2000 this rose to 29.2%. However, did
enrollment increase more among women than it did for men? The answer to
this question lies in the odds ratios. In 1979, the odds ratio equaled 2.77
and it dropped to 1.46 in 2000. This decline is not expressed as an
absolute difference but as a ratio, and is calculated as 1.46 / 2.77, which
is approximately .53. This number indicates that the relative educational
inequality between men and women roughly halved between 1979 and 2000.
Relative educational inequalities did decrease in that period, although a
statistical test is required for a more definitive answer (see further
below).
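The two odds ratios and their ratio can be recomputed from the counts in
the sex by educational level table. The sketch below is illustrative; the
function name and the choice to express the odds as "college versus less
than college" for men relative to women are assumptions that yield the
same odds ratios as reported (2.77 and 1.46).

```python
def odds_ratio_college(men_college, men_less, women_college, women_less):
    """Odds of attending college for men divided by the same odds for women."""
    odds_men = men_college / men_less
    odds_women = women_college / women_less
    return odds_men / odds_women

or_1979 = odds_ratio_college(96, 422, 37, 451)
or_2000 = odds_ratio_college(166, 336, 128, 378)

# Relative change in inequality: values below 1 indicate a decline.
decline = or_2000 / or_1979

print(f"{or_1979:.2f}")  # 2.77
print(f"{or_2000:.2f}")  # 1.46
print(f"{decline:.2f}")  # 0.53
```

A ratio of about .53 means that the relative advantage of men in 2000 was
roughly half of what it was in 1979.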
The advantage of the odds ratio is that relative differences are
expressed independently of the marginals. A disadvantage is that no
maximum value exists: when the relationship is negative, the odds ratio
lies between 0 and 1, and when the association is positive, the odds
ratio lies between 1 and positive infinity. As a result, the odds ratio
indicates the direction of the relationship, but not the strength of the
relationship on a fixed scale. Furthermore, the odds ratio is always
calculated based on the counts in four inner cells of a contingency table.
In a table with two rows and two columns, only one odds ratio can be
calculated. However, in larger tables more odds ratios can be calculated,
which can be troublesome when no clear distinction can be made between
more and less relevant odds ratios.
However, the odds ratio is one of the few measures of association that
is insensitive to the marginal distributions, which makes it highly
suitable to describe shifts in inequality, for example. Also, the odds
ratio is often used in epidemiological research, which deals with highly
skewed variables such as mortality rates.
To test the null hypothesis that the odds ratio equals 1 (in the
population), a chi-square test can be used, or [...] when the conditions
for this test are not met (see section 3.4.2). The p-values shown in Tables
3.42 and 3.43 were determined using a chi-square test (one-tailed, because
the alternative hypothesis is directional, stating the odds ratio to be
larger than 1). Note that in all three cases, using α = .05, the null
hypothesis is rejected.
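A chi-square test for a table with two rows and two columns can be
sketched with the standard library alone. The sketch assumes the Pearson
chi-square statistic without continuity correction (an assumption; the
text does not say which variant was used) and halves the two-tailed
p-value, which is only appropriate when the observed association is in the
hypothesized direction. With these assumptions the 2000 panel of the
education table yields a one-tailed p of about .003, in line with the
reported value. Function names are hypothetical.

```python
import math

def chi_square_2x2(a, b, c, d):
    """Pearson chi-square (no continuity correction) for the table [[a, b], [c, d]]."""
    n = a + b + c + d
    numerator = n * (a * d - b * c) ** 2
    denominator = (a + b) * (c + d) * (a + c) * (b + d)
    return numerator / denominator

def p_two_tailed(chi2):
    """Upper-tail probability of a chi-square variable with 1 degree of freedom."""
    return math.erfc(math.sqrt(chi2 / 2))

# Education by sex in 2000: rows = less than college / college, columns = female / male.
chi2_education_2000 = chi_square_2x2(378, 336, 128, 166)
p_one_education_2000 = p_two_tailed(chi2_education_2000) / 2

# Occupation of son by occupation of father.
chi2_occupation = chi_square_2x2(219, 51, 117, 84)
p_one_occupation = p_two_tailed(chi2_occupation) / 2

print(f"{p_one_education_2000:.3f}")   # 0.003
print(p_one_occupation < 0.001)        # True
```

With 1 degree of freedom the chi-square statistic is the square of a
standard normal variable, which is why the complementary error function
suffices to obtain the p-value.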
Odds ratios are often calculated and tested using logistic regression
analysis. This technique can test whether the decline in relative
educational opportunities from Table 3.43 (approximately .53) is
significantly lower than 1 (1 = no change). However, it is beyond the
scope of this book to deal with this type of analysis further. A practical
explanation of logistic regression analysis and a selection of analyses
can be found on our website.
In the previous sections we discussed several bivariate relationships
between variables. These relationships were mostly assumed to be causal.
For example, it was assumed that a higher level of education (cause)
results in a higher income (effect). This assumption seems realistic for
three reasons. First, there is a clear chronological order; generally, a
person finishes his or her education before starting to earn a regular
income. Second, a relationship between education and income was indeed
confirmed in statistical analysis (see Table [...]). Third, it seems
unlikely that the relationship between education and income is spurious
(i.e., not real) because one or more other variables causally determine
both educational level and income. Thus, stating that a higher education
causes a higher income seems to be justified.
However, in the social sciences causality is not always that clear.
Using standard cross-sectional survey data, it is often difficult to
ascertain a clear chronological order between two variables. The next best
thing is to ground the chronological order as firmly as possible in
theoretical arguments, although an empirical test for these arguments
generally is not possible. Fortunately, it is less problematic to
statistically demonstrate that an empirical relationship between two
variables exists or not. Furthermore, multivariate analysis can be utilized
to establish whether or not an initial (significant) bivariate relationship
is the work of some other (confounding) variable(s).
In multivariate analyses, the initial bivariate relationship can be tested
to see if it still exists after control variables are taken into account.
Controlling for other variables is a common and fruitful procedure in the
social sciences. However, this technique is somewhat difficult to explain,
which is probably why it is typically not reported in newspapers, let alone
discussed on television. An interesting exception to this was recent
criticism and skepticism towards research on criminality amongst asylum
seekers in the Netherlands. The bivariate relationship seemed clear:
asylum seekers in the Dutch province of Groningen committed five times the
number of crimes compared to the local population. However, a critical
follow-up study demonstrated that 'apples' had been compared to
'oranges': there were important demographic (e.g., age) and socioeconomic
differences (especially income and education). From statistical and
ethical points of view, it would have been better and fairer, respectively,
to compare asylum seekers with people from Groningen that did not differ
from asylum seekers in these important characteristics.
It is difficult, however, (if not impossible) to find a randomly selected
group of native people from Groningen who are identical in important
aspects to a group of asylum seekers. Therefore, with the exception of
experimental research, this is not a useful strategy.
Fortunately, important differences can be ruled out by statistically
controlling for relevant variables. This method will be illustrated using a
study in which a respondent's income is related to their father's
educational level (see Table 3.44). To avoid unnecessary complexity, only
one single control variable will be used (in the literature, control
variables are often indicated using the letter z).
Table 3.44 Relationship between Father's Educational Level and
Respondent's Income

                         Educational level (father)
Income                   Low        High       Total
2,000 euros max.          139         46        185
                         46.6%      35.4%      43.2%
More than 2,000           159         84        243
                         53.4%      64.6%      56.8%
Total                     298        130        428
                         100%       100%
Kendall's tau-b = .11, p (one-tailed) = .01
Table 3.44 shows a weak but positive relationship between the educational
level of fathers and the income of their sons (the respondents). For
all respondents who have fathers with low levels of education, 53.4
percent earn over 2,000 euros. In contrast, 64.6 percent of all respondents
with highly educated fathers earn over 2,000 euros (the difference in
percentages (d) is 11.2). The positive relationship is also reflected by
the significant Kendall tau-b, amounting to .11. Again, this implies that
respondents with highly educated fathers are more likely to earn an income
over 2,000 euros compared to respondents with lower educated fathers.
However, it is debatable whether this relationship is causal: do people
really make more money because their fathers are highly educated?
Probably not! Income is primarily determined by one's own education, as
employers will inquire about the educational level of the applicant, and
not about that of his or her father. In Table 3.44, the observed
relationship is probably due to the fact that father's educational level is
positively associated with both educational level and income of his
child(ren).
Now, let us assume that income is really determined by one's own
educational achievements and not by his or her father's educational level.
When this is actually true, there cannot be any relationship between the
father's educational level and his son's income amongst sons who all share
the same educational level. These respondents have performed similarly
and are rewarded with a similar income, irrespective (or independent) of
their fathers' achievements. This idea is tested in Table 3.45 for
respondents who all have a low educational level (top panel) and
respondents who all have a high educational level (bottom panel). The
differences in percentages (d) in both panels are very small and no longer
significant. This means that when taking into account respondent's
educational level, there is no longer a relationship between father's
educational level and the income of his child, thereby confirming the idea
that the relationship between father's educational level and respondent's
income is not causal.
Of course, this does not mean that the educational level of the father
- or more generally the parents - does not play a role at all. Even today it
is easier to obtain a higher educational level when your parents are also
highly educated. Therefore, it is often said that only an indirect causal
relationship exists between the educational level of the parents and the
income of their children.
Table 3.45 The Relationship between Father's Education and
Respondent's Income, Controlling for Respondent's Education

Educational level        Educational level (father)
(respondent): low
Income (respondent)      Low        High       Total
2,000 euros max.          122         28        150
                         55.2%      53.8%      54.9%
More than 2,000            99         24        123
                         44.8%      46.2%      45.1%
Total                     221         52        273
                         100%       100%
Kendall's tau-b = .01, p (one-tailed) = .43

Educational level        Educational level (father)
(respondent): high
Income (respondent)      Low        High       Total
2,000 euros max.           17         18         35
                         22.1%      23.1%      22.6%
More than 2,000            60         60        120
                         77.9%      76.9%      77.4%
Total                      77         78        155
                         100%       100%
Kendall's tau-b = -.01, p (one-tailed) = .44
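The logic of statistically controlling for a third variable can be
retraced from the counts in the tables on father's education and
respondent's income. The sketch below computes the difference in
percentages (d) for the bivariate table and within each level of the
respondent's own education. The data layout and function name are
illustrative assumptions; computed from unrounded percentages the bivariate
d is about 11.3, while the text's 11.2 is based on rounded percentages.

```python
def percentage_difference(table):
    """d = % earning more than 2,000 (high-educated father) minus the same % (low-educated father)."""
    pct = {}
    for father, (max_2000, more_than_2000) in table.items():
        pct[father] = 100 * more_than_2000 / (max_2000 + more_than_2000)
    return pct["high"] - pct["low"]

# Counts per father's educational level: (2,000 euros max., more than 2,000).
low_educated_respondents = {"low": (122, 99), "high": (28, 24)}
high_educated_respondents = {"low": (17, 60), "high": (18, 60)}

# Adding the two partial tables cell by cell gives back the bivariate table.
bivariate = {
    father: tuple(x + y for x, y in zip(low_educated_respondents[father],
                                        high_educated_respondents[father]))
    for father in ("low", "high")
}

d_bivariate = percentage_difference(bivariate)
d_low = percentage_difference(low_educated_respondents)
d_high = percentage_difference(high_educated_respondents)

print(bivariate)               # {'low': (139, 159), 'high': (46, 84)}
print(f"{d_bivariate:.1f}")    # 11.3
print(f"{d_low:.1f}")          # 1.4
print(f"{d_high:.1f}")         # -1.0
```

Within each level of the respondent's own education the difference
shrinks to about one percentage point, which is the pattern the text
describes: the bivariate relationship disappears once the control
variable is held constant.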
In multivariate analysis, different causal models can be assumed and
analyzed. Five important models will be discussed in this section, which are:
Mediation
Spuriousness
Partial mediation / partial spuriousness
Suppression
Moderation or interaction
Two predictor (x) variables will be used to illustrate these models for
ease of interpretation, but the same applies for models with three or more
predictor variables.
Generally, in a multivariate model one or more control variables
(notation: z) are taken into account. In a mediation model it is assumed
that a change in the x variable causes a change in the z variable (called
the mediator), while z in turn causes a change in the y variable.
Furthermore, it is assumed that the original (bivariate) relationship
between x and y is reduced to a relationship (notation: xy.z) that does not
significantly differ from zero. Therefore, a mediation model is also
referred to as a chain model. In fact, a mediation model was used above
when we discussed the relationship between a father's education and the
income of his offspring. It turned out that the direct causal influence
between these two variables was absent after taking into account
respondent's own educational level (see Table 3.45). Furthermore, it is
quite plausible that the educational level of the father (partly)
determines the educational level of his children, and that the educational
level of the children consequently (partly) determines their income. If
this is true, we now know how a father's educational level positively
influences his child's income: higher educated fathers on average have
their children reach higher educational levels, and higher educational
levels result in higher incomes. This implies that, for a respondent's
income, it is irrelevant how educated his or her father (or parents) is.
What does matter is the educational level of the respondent. Of course,
despite various governmental interventions, obtaining a higher level of
education is still easier with highly educated parents. Generally, a
mediation model serves as an interpretation or explanation for a
relationship between two variables which initially seem to be directly
causally related.
x -> y (relationship xy ≠ 0), x -> z -> y (relationship xy.z = ns)
(xy.z = relationship xy while taking into account z, ns = nonsignificant)
Example: Father's education -> Respondent's education -> Respondent's income
Figure 3.46 Mediation: Theoretical Model and Empirical Example
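One common way to quantify the relationship xy.z, assuming variables for
which correlations are meaningful, is the first-order partial correlation.
The text itself does not prescribe this measure, and the correlations used
below are hypothetical values chosen purely for illustration.

```python
import math

def partial_correlation(r_xy, r_xz, r_yz):
    """Correlation between x and y after taking z into account (xy.z)."""
    numerator = r_xy - r_xz * r_yz
    denominator = math.sqrt((1 - r_xz ** 2) * (1 - r_yz ** 2))
    return numerator / denominator

# Hypothetical full mediation: x and y are related only through z,
# so r_xy equals r_xz * r_yz and the partial correlation vanishes.
full_mediation = partial_correlation(r_xy=0.2, r_xz=0.5, r_yz=0.4)

# Hypothetical partial mediation: part of the xy relationship remains.
partial_mediation = partial_correlation(r_xy=0.35, r_xz=0.5, r_yz=0.4)

print(abs(full_mediation) < 1e-12)   # True
print(0 < partial_mediation < 0.35)  # True
```

In practice the partial relationship is estimated from sample data and
tested for significance; a value that does not differ significantly from
zero corresponds to "xy.z = ns" in the figure.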
Spuriousness
A statistical relationship between x and y variables may not be directly
or indirectly causal at all. When this is the case, it must be plausible
that the z variable(s) is (are) the causal factor for both x and y (see
Figure 3.47), and the relationship should become insignificant or should
change direction when controlling for (z) variables. For example, a
bivariate positive relationship exists between the variables church
attendance and body weight: on average, church attendees weigh more than
non-attendees. However, it is hard to imagine that church attendance
really makes people gain weight. It is more plausible that this
relationship is spurious and a third overlooked or omitted variable
determines both church attendance and weight. This 'lurking' or
'confounding' variable is age: older people attend church more frequently
than younger people and older people have typically put on some weight
during their life course. Indeed, if this is correct, the original
relationship between church attendance and weight will become
non-significant after controlling for age. This is indeed the case and age
is said to 'explain away' the puzzling relationship between church
attendance and weight.
An example where a third variable reverses the original relationship
(also known as Simpson's paradox) is found in medical sciences. In
hospitals there is a positive relationship between the level of expertise
and mortality rates. Fortunately, this alarming relationship is spurious,
as seriously ill patients typically receive highly professional help but
also have lower chances of survival compared to those not seriously ill.
The seriousness of a patient's illness must be taken into account (or
controlled for) when fairly comparing hospitals. After controlling for the
seriousness of the illness the original relationship is reversed: higher
levels of expertise are associated with lower mortality rates. This makes
sense - when patients receive higher standards of professional care, their
chances of survival are higher compared to patients receiving lower levels
of care. This is especially so when patients are seriously ill, an addition
that relates to moderation or interaction and is discussed further below.
x - y (relationship xy ≠ 0), x <- z -> y (relationship xy.z = ns)
(xy.z = relationship xy while taking into account z, ns = nonsignificant)
Example: Church attendance <- Age -> Weight
Figure 3.47 Spuriousness: Theoretical Model and Empirical Example
Partial Mediation and Partial Spuriousness
Rarely does full mediation or full spuriousness occur in the social sciences. After controlling for one or more z variables, the original relationship is often reduced but remains significant. Therefore, mediation or spuriousness is only partial. Depending on the assumed causal direction between x and z, the causal model is either partial mediation (when x is the causal factor for z (x -> z)) or partial spuriousness (when z -> x). For example, the relationship between a father's education and his child's education is a partial mediation (or chain) model: even when controlling for father's income, part of the original positive (+) relationship remains (see Figure 3.48, upper panel). The relationship between educational level and traditional attitudes (conservatism) is one of partial spuriousness. After controlling for birth cohort, a partial relationship between educational level and conservatism remains (see Figure 3.48, lower panel).
Partial Mediation
x -> y (relationship xy ≠ 0); x -> z -> y (relationship xy.z < xy)
(xy.z = relationship xy while taking into account z)
Example: Father's education -> Father's income -> Child's education

Partial Spuriousness
x - y (relationship xy ≠ 0); x <- z -> y (relationship xy.z < xy)
Example: Educational level <- Birth cohort -> Conservatism
Figure 3.48 Partial Mediation / Partial Spuriousness: Theoretical Models and Examples
Suppression occurs when the strength of the relationship between x and y increases after controlling for z (see Figure 3.49). An instructive example is the relationship between the variables age (x) and body weight (y): as people grow older they put on weight. Yet in a bivariate analysis the relationship between age and weight is surprisingly weak. An explanation for this weak relationship lies in the assumption that respondents are equal in all other aspects. However, one overlooked aspect is the variable body height. In Western countries, younger people are generally taller than older people. This may be due to changes in diet, living standards, and medical care during childhood. Because these changes take place over time, different birth cohorts witnessed different circumstances (i.e., younger cohorts grew up during conditions that favored growth). This interesting fact is the subject of much social science research and is called a cohort effect. So, in a bivariate analysis of the relationship between age and body weight, people who are young and tall are erroneously compared to those who are older and shorter. Due to their height, taller people are heavier than shorter people. As a consequence, the fact that young people typically weigh less than older people (an aging effect) is obscured or suppressed to a certain degree by these height differences (cohort effect). It is therefore more appropriate to compare younger and older people who are equal in height. Statistically, this comparison is possible by taking into account the (suppressor) variable body height (see Figure 3.49). After controlling for body height, the positive relationship between age and body weight is stronger.
x -> y (relationship xy or ns); x -> z -> y (relationship xy.z - or +)
+/- +
(xy.z = relationship xy while taking into account z, ns = nonsignificant)
Example:
Age -> Body Height -> Body Weight

Figure 3.49 Suppression: Theoretical Model and Empirical Example
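Suppression can also be illustrated numerically. The sketch below computes standardized regression coefficients for a model with two predictors directly from three correlations, using the standard two-predictor formula. The correlations are hypothetical values (age with weight weakly positive, age with height negative, height with weight positive) and are not taken from the book.

```python
def standardized_betas(r_x1y, r_x2y, r_x1x2):
    """Standardized coefficients for y regressed on two predictors x1 and x2."""
    denominator = 1 - r_x1x2 ** 2
    beta_1 = (r_x1y - r_x2y * r_x1x2) / denominator
    beta_2 = (r_x2y - r_x1y * r_x1x2) / denominator
    return beta_1, beta_2


# Hypothetical correlations, NOT taken from the book.
r_age_weight = 0.10     # weak bivariate relationship
r_height_weight = 0.50  # taller people are heavier
r_age_height = -0.30    # older cohorts are shorter

beta_age, beta_height = standardized_betas(r_age_weight, r_height_weight, r_age_height)
print(f"age: r = {r_age_weight:.2f}, beta with height controlled = {beta_age:.2f}")
```

Under these assumptions the effect of age is clearly larger once body height is held constant than the bivariate correlation suggests, which is what the suppression model describes.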

Moderation or Interaction
In the four causal models discussed, a causal relationship or causal effect is assumed to be equally strong for all units. For example, in the discussion of the mediation model, we assumed that the positive effect of one's education on income is equally strong for all respondents. However, studies suggest that this effect is stronger for men than for women. Also, the negative effect of expertise within a hospital on its mortality rates (see spuriousness) probably only holds for the seriously ill and matters less for those not seriously ill. If a relationship differs across specific groups or categories, it is called moderation or interaction. Interactions are relatively weak when the strength of the relationship is unequal across groups but remains in the same direction. In a stronger variant of interaction, the relationship is absent or non-significant in certain groups or categories. The strongest interaction occurs when the relationship is positive for some groups but negative for others. If the strength of a relationship goes to zero or even changes direction for some groups or categories, then serious questions can be raised about the causal order between x and y. There may be a good theoretical explanation as to why a causal relationship exists between x and y for some groups or categories but not for others. It is, however, much more difficult to explain why the causal effect is positive for some groups and negative for others.
It is important not to confuse 'interaction effects' with 'controlling for a variable'. In the case of interactions, the assumed causal effect of x on y varies across values of z (where z indicates different groups, categories, or conditions). When a variable is controlled for, it is assumed that the (remaining) causal effect of x on y is approximately the same for different values of the z variable(s). In other words, it is assumed that no moderation or interaction is present. Also, an interaction is expressed differently in a graph, because the z variable now modifies the causal relationship between x and y (see Figure 3.50).
x -> y (relationship xy = +, - or 0), with z acting on the relationship between x and y
Figure 3.50 Moderation/Interaction: Theoretical Model
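One simple way to look for moderation is to estimate the effect of x on y separately within each group and compare the slopes. The sketch below does this for the education and income example; the data points are hypothetical and deliberately lie on straight lines so that the slopes are easy to read.

```python
from statistics import mean


def ols_slope(xs, ys):
    """Slope of the least-squares line of y on x."""
    mean_x, mean_y = mean(xs), mean(ys)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    return sxy / sxx


# Hypothetical data, NOT from the book: years of education and income.
education = [8, 10, 12, 14, 16]
income_men = [20, 24, 28, 32, 36]
income_women = [18, 19, 20, 21, 22]

slope_men = ols_slope(education, income_men)
slope_women = ols_slope(education, income_women)
print("effect of education on income, men:", slope_men)
print("effect of education on income, women:", slope_women)
```

In this invented example both slopes are positive but unequal, which corresponds to the relatively weak form of interaction: the effect has the same direction in both groups and differs only in strength.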
In theory, the models discussed in the previous section can be analyzed using contingency tables (see Table 3.45), but in practice only three-way tables (i.e., a table with an x and y variable plus one z variable) are analyzed this way. However, even three-way tables have practical limits that are quickly reached with interval/ratio variables, because these normally contain a large number of categories. Therefore, multiple linear regression analysis is often used in social science research. This technique takes into account multiple independent (x) variables at interval and ratio levels. Nominal and ordinal variables can also be used after they have been transformed into dichotomous variables (interval variables by definition). Additionally, interaction models and non-linear relationships can also be analyzed. To illustrate the versatility of multiple regression analysis, an example follows in which religious beliefs are explained with several x variables.
Modeling Interval and Ratio Predictor Variables
To explain traditional religious beliefs (which is a sum of five variables, each measuring an aspect of traditional religious beliefs), two ratio variables are relevant: education and age (both measured in years). The two alternative hypotheses state that the more years of education and the younger the respondent, the weaker religious beliefs will be. These hypotheses are confirmed after calculating Pearson's correlation coefficients between education and religious beliefs, and between age and religious beliefs (r equals -.15 and .22 respectively, see Table 3.51). However, this confirmation is at the bivariate level, so the question remains as to whether both indicators persist in a multivariate model. More precisely, we will test a model in which partial mediation is suspected (z = education):
Age -> Education -> Religious Beliefs
In addition, the effect of education on religious beliefs could be (partially) spurious, because age determines both education and religious beliefs. So, we simultaneously test a second model, which is one of partial spuriousness (z = age):
Education <- Age -> Religious Beliefs
The outcomes are shown in Table 3.51.
Table 3.51 Results from Multiple Regression Analysis, y = Religious Beliefs, x1 = Education (in years) and x2 = Age (in years)
Religious beliefs (y)   Pearson's coefficient   b coefficient   standard error   beta    p (two-tailed)
Constant (a)                                    -.02            .16                      .896
Education (x1)          -.15                    -.06            .01              -.11    <.001
Age (x2)                 .22                     .02            .002              .20    <.001
The intercept in this model equals -.02. This, however, has no meaning, because it represents the average religious beliefs for people aged 0 with 0 years of education, values that are not present in the data set. The b coefficient for education is -.06: an increase of 1 year is associated with a .06 decrease in religious beliefs. In addition, religious beliefs increase by .02 for every additional year of age. To determine the extent to which this age effect is explained by education, the beta coefficient (β) is relevant. This coefficient can be interpreted in the same way as Pearson's correlation (r). Here, β indicates the change in religious beliefs in standard deviations when the score on age or education shifts 1 standard deviation. The difference between r and β is that the latter expresses the relationship (or effect) after considering one or more control variables. A comparison of r with β shows that the effect of education decreased most (from -.15 to -.11). So, one third of the original relationship between education and religious beliefs (= -.15, see Table 3.51) is explained by age. In other words, a partial spurious relationship exists between education and religious beliefs. This is because, on average, older respondents have fewer years of schooling compared to younger respondents (Pearson's correlation between age and education is negative). This correlation is not the result of an aging process, but reflects cohort differences (older respondents are from older birth cohorts who witnessed fewer educational opportunities).
The beta coefficient is measured in standard deviations because it is the result of a z-transformation (see section 2.3.3), meaning that beta can be used to measure the relative strength of the effects in the model. Because of this, it can be stated that the effect of age is about twice as strong as the effect of education (.20 / -.11). Note that the strength of the education effect, expressed as the b coefficient, is about 3 times larger than the effect of age (-.06 / .02). However, this is dependent on the measurement units of the variables education and age. When education is measured in months (instead of years), the b coefficient becomes -.005 (-.06 / 12).
Furthermore, Table 3.51 shows that the resulting b and beta coefficients differ significantly from zero at the .05 significance level. Although the hypotheses associated with both variables are directional, we can do without dividing the reported p-values by two, because these values are already very small. In other words, we can reject H0 with regard to education and age, as it is highly unlikely that the b coefficients (and consequently the beta coefficients) equal 0 in the population.
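The dependence of b on measurement units, and the independence of beta from them, can be checked with a short calculation. The sketch below uses the b coefficient for education from Table 3.51 and the usual conversion beta = b × sd(x) / sd(y). The two standard deviations are assumptions made for illustration (they are not reported in the text) and were picked so that beta lands near the tabulated -.11.

```python
def beta_from_b(b, sd_x, sd_y):
    """Standardized coefficient from an unstandardized one."""
    return b * sd_x / sd_y


b_years = -0.06           # b coefficient for education, Table 3.51
sd_education_years = 3.0  # ASSUMED standard deviation of education (years)
sd_beliefs = 1.6          # ASSUMED standard deviation of religious beliefs

# Measuring education in months makes every score 12 times larger,
# so the standard deviation grows by 12 and b shrinks by 12.
b_months = b_years / 12
sd_education_months = sd_education_years * 12

beta_years = beta_from_b(b_years, sd_education_years, sd_beliefs)
beta_months = beta_from_b(b_months, sd_education_months, sd_beliefs)

print(f"b per year = {b_years}, b per month = {b_months:.3f}")
print(f"beta (years) = {beta_years:.4f}, beta (months) = {beta_months:.4f}")
```

The b coefficient changes with the unit of measurement while beta stays the same, which is why beta rather than b is used to compare the relative strength of predictors.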
Modeling Ordinal and Nominal Predictor Variables
All predictors (x variables) in a linear regression analysis must be measured at least at the interval level. However, it is possible to include nominal and ordinal variables as well. To illustrate this, we start with a bivariate regression analysis with religious beliefs as the dependent variable and sex as the predictor. Sex is a dichotomous variable, which has interval characteristics (see section 1.2). Males and females have code 0 and 1 respectively. The average for religious beliefs is -.08 for men and .10 for women. So, on average, women have slightly stronger religious beliefs. It can be demonstrated that the regression line runs between these two averages. As was mentioned earlier, the b coefficient (b) equals the change in y associated with a 1-unit change in x. Following this logic, b indicates the change in religious beliefs when a male respondent (0) is compared to a female respondent (1). Therefore, b equals (.10 - -.08) / 1 = .18. Thus, the b coefficient equals the mean difference between men and women. Recall that an intercept (a) is the mean predicted score when all x variables equal 0. In this case, the intercept equals the mean for men (-.08), as they score 0 on sex. The meanings of a and b are summarized in Figure 3.52.
[Figure: the regression line runs from the mean for men at x = 0 (-.08 = a) to the mean for women at x = 1 (.10); (.10 - -.08) / 1 = .18 (= b); coding: 0 = male, 1 = female]
Figure 3.52 Meaning of b coefficient in dichotomous (dummy) variables
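That the intercept equals the mean of the group coded 0 and that b equals the difference between the two group means can be verified with a few lines of code. The individual scores below are hypothetical; they were chosen only so that the group means match the values in the text (-.08 for men, .10 for women).

```python
from statistics import mean


def simple_ols(xs, ys):
    """Intercept a and slope b of the least-squares line y = a + b * x."""
    mean_x, mean_y = mean(xs), mean(ys)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    b = sxy / sxx
    a = mean_y - b * mean_x
    return a, b


# Hypothetical individual scores (NOT the book's data); coding 0 = male, 1 = female.
sex = [0, 0, 0, 0, 1, 1, 1, 1]
beliefs = [-0.58, -0.18, 0.02, 0.42, -0.40, 0.00, 0.20, 0.60]

a, b = simple_ols(sex, beliefs)
mean_men = mean(y for x, y in zip(sex, beliefs) if x == 0)
mean_women = mean(y for x, y in zip(sex, beliefs) if x == 1)

print(f"a = {a:.2f} (mean men = {mean_men:.2f})")
print(f"b = {b:.2f} (mean women - mean men = {mean_women - mean_men:.2f})")
```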
Table 3.53 shows results from a regression analysis in which religious beliefs is the dependent variable and sex is the predictor (0 = males, mean score on y = -.08, and 1 = females, mean score on y = .10). The intercept (a) and the b coefficient (b) equal the values in Figure 3.52.
Table 3.53 Results from Simple Regression Analysis, y = Religious Beliefs, x = Sex (0 = Male, 1 = Female)
Religious beliefs (y)   b coefficient   standard error   beta   p (two-tailed)
Constant (a)            -.08            .03                     .022
Sex (b)                  .18            .05              .09    <.001
When additional predictors are included in the regression model, the interpretation of the intercept will change. It then refers to the mean predicted score when all predictors equal 0. The b coefficient for sex still indicates the mean difference between men and women, but this difference now takes one or more control variables into account. For instance, the variable education could be added to the model because it may explain why men and women differ in religious beliefs. On average, women obtain slightly lower educational levels than men (see Table 3.43), and a lower education is associated with stronger religious beliefs (see Table 3.51). This results in the following mediation model: Sex -> Education -> Religious Beliefs.
Table 3.54 Results from Multiple Regression, y = Religious Beliefs, x1 = Sex (0 = Male, 1 = Female) and x2 = Education (in years)
Religious beliefs (y)   b coefficient   standard error   beta    p (two-tailed)
Constant (a)             .66            .13                      <.001
Sex                      .15            .05               .08    ...
Education (in years)    -.07            .01              -.14    <.001
Table 3.54 shows that the b coefficient (and beta) for sex hardly decreases compared to those in Table 3.53. So, even after controlling for educational differences between men and women, women on average still have significantly stronger religious beliefs compared to men. Therefore, it is unlikely that women have stronger religious beliefs because they have lower levels of education than men on average. So, we still do not know why women have stronger religious beliefs compared to men. One could add other x variables to try to explain this remarkable difference.
A valid objection against the use of the variable education in years is that it measures something different from the highest completed level of education. For example, in the Netherlands, people with Secondary Vocational School and people with A levels typically went to school for the same number of years. Still, it is theoretically plausible that the latter have weaker religious beliefs compared to the former. Therefore, education measured in years may be a poor instrument. On the other hand, the variable educational level is ordinal. When we treat this variable as an interval variable, we must assume that the exact differences (or intervals) between subsequent categories (i.e., levels) are the same and known. In the data set each educational level is coded 1 point higher than the preceding (lower) level. This implies that educational level increases at a constant factor (1) from, in order, Elementary School (1), Lower Vocational School (2), Lower Secondary School (3), Secondary Vocational School (4), O Levels (5), and A Levels and more (6). The order is indisputable (every subsequent level is higher), but the constant factor (1) is not.
To include the variable educational level in the model without such assumptions, all six educational levels have to be treated as separate variables. These variables are dichotomous with scores 0 and 1 and are called dummy variables (dummy literally means 'substitute'). The dummy variable elementary school includes all individuals with elementary school as their highest level of completed education (they are coded 1). Respondents who completed higher educational levels score 0. Dummy variables are created for all other (5) educational levels in the same way.
Intuitively, one might expect that all six dummy variables should be added to the regression model. However, for mathematical reasons only 5 dummy variables can be used. To understand why, we will return to the variable sex once more. Instead of using this dichotomous variable, we could use two dummy variables: male (0 = Female, 1 = Male) and female (0 = Male, 1 = Female). However, because these are exactly opposite to each other, Pearson's correlation coefficient between them is exactly -1. Without additional measures, it is not possible to add variables that correlate -1 (or +1) to a regression model. Fortunately, this is not necessary here, because the b coefficient for sex indicates the mean difference between men and women (.18, see Table 3.53). This of course is also the difference between women and men: we just have to add a minus sign to the b coefficient. Hence, either the dummy variable Male or the variable Female can be used.
Now back to our six educational levels. Each dummy variable is perfectly correlated to the combination of the five other dummy variables. This poses no problem when five dummy variables are added, and consequently the five resulting b coefficients represent mean differences from the sixth (excluded) dummy variable. This also holds when other predictor variables are added to the model, only the mean differences are now controlled for other variables. The omitted dummy variable is called the reference category. Generally, a reference category is chosen that corresponds to the direction in the alternative hypothesis. In this case, elementary school is a good reference because we theoretically expect religious beliefs to get weaker as educational levels rise.
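The coding scheme can be sketched in a few lines. The helper below is a hypothetical illustration (the function name and the exact spelling of the labels are assumptions, not part of the book or of any statistical package): it turns one educational level into five 0/1 dummy variables and leaves out the reference category.

```python
# The six educational levels named in the text, lowest to highest.
LEVELS = [
    "Elementary School",
    "Lower Vocational School",
    "Lower Secondary School",
    "Secondary Vocational School",
    "O Levels",
    "A Levels and more",
]


def dummy_code(level, reference="Elementary School"):
    """Return 0/1 dummy variables for every level except the reference."""
    if level not in LEVELS:
        raise ValueError(f"unknown level: {level}")
    return {name: int(name == level) for name in LEVELS if name != reference}


print(dummy_code("O Levels"))
print(dummy_code("Elementary School"))  # reference category: all dummies are 0
```

A respondent in the reference category scores 0 on all five dummies, so the intercept of the regression model describes this group and each b coefficient describes the difference from it.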
Table 3.55 shows that when educational levels are included as dummy variables (instead of years of education), women are still more religious than men. The difference (.16) is comparable to the difference in Table 3.54 (.15). So, analyzing educational levels instead of years of education does not change our conclusion that men are less religious than women, even when accounting for educational differences. Table 3.55 also shows that all educational levels differ significantly (α = .05) from respondents with elementary school as their highest level of completed education. This also holds for lower vocational school: because the alternative hypothesis is directional (the higher the educational level, the weaker the religious beliefs), the associated p-value (.057) needs to be divided by 2. The intercept (.24) is the predicted mean on religious beliefs for males (score 0 on sex) who have elementary schooling (score 0 on all 5 dummy variables) as their highest level of education. This type of respondent is represented in the sample; therefore, the intercept and the associated test as to whether it differs from 0 can be interpreted meaningfully.
Table 3.55 Results from Multiple Regression Analysis, y = Religious Beliefs, x variables: Sex (0 = Male, 1 = Female) and 5 Educational Levels (Elementary School is Reference Category)
Religious beliefs (y)     b coefficient   standard error   beta   p (two-tailed)
Constant (a)               .24            .08                     ...
Sex                        .16            .05              .08    ...
Lower vocational school   -.17            .09              ...    .057
Lower secondary school    -.26            ...              ...    ...
Secondary school          -.44            ...              ...    ...
O levels                  -.75            ...              ...    ...
A levels and more         -.48            ...              ...    ...
Additionally, Table 3.55 suggests that respondents with lower secondary schooling have weaker religious beliefs than those with lower vocational schooling (mean difference: -.26 - -.17 = -.09). To test whether this difference is significant, the elementary school dummy variable is added to the regression model, while the lower vocational school dummy variable is removed and now serves as the reference category (see Table 3.56).
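Because the model itself does not change when another reference category is chosen, the new b coefficients follow from the old ones by simple subtraction. The sketch below applies this to the b coefficients reported for the model with elementary school as reference; the function name is hypothetical, and the reference category is written with an explicit b of 0.

```python
# b coefficients with Elementary School as reference category (Table 3.55).
constant = 0.24
b_levels = {
    "Elementary school": 0.0,  # reference category: implicit b of 0
    "Lower vocational school": -0.17,
    "Lower secondary school": -0.26,
    "Secondary school": -0.44,
    "O levels": -0.75,
    "A levels and more": -0.48,
}


def switch_reference(constant, b_levels, new_reference):
    """Re-express intercept and dummy coefficients against a new reference."""
    shift = b_levels[new_reference]
    new_constant = round(constant + shift, 2)
    new_levels = {name: round(b - shift, 2) for name, b in b_levels.items()}
    return new_constant, new_levels


new_constant, new_levels = switch_reference(constant, b_levels, "Lower vocational school")
print("constant:", new_constant)
for name, b in new_levels.items():
    print(name, b)
```

The recomputed values agree with the legible entries of the second analysis (for example .07 for the constant, .17 for elementary school and -.58 for O levels); what the subtraction cannot provide are the standard errors and p-values, which is why the model is estimated again.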
Table 3.56 Results from Multiple Regression Analysis, y = Religious Beliefs, x variables: Sex (0 = Male, 1 = Female) and 5 Educational Levels (Lower Vocational School is Reference Category)
Religious beliefs (y)     b coefficient   standard error   beta    p (two-tailed)
Constant (a)               .07            .05                      ...
Sex                        .16            .05               .08    .001
Elementary school          .17            .09               .05    .057
Lower secondary school    -.09            .06              -.04    .134
Secondary school          -.27            .13              -.05    .044
O levels                  -.58            .09              -.18    <.001
A levels and more         -.31            .07              -.12    .001
This additional analysis shows that the associated two-tailed p-value equals .134, and that the appropriate one-tailed p-value is .067 (.134 / 2), for the mean difference between Lower Secondary School and Lower Vocational School. Thus, at an α of .05 there is not enough statistical evidence to accept the hypothesis that respondents with lower secondary education have weaker religious beliefs than respondents with lower vocational education. Note that with a less strict but acceptable test at α = .10, the alternative hypothesis is confirmed because .067 is below .10.
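Halving a two-tailed p-value is only justified when the estimate points in the direction stated by the alternative hypothesis. The hypothetical helper below makes that condition explicit and applies it to the comparison above (b = -.09, two-tailed p = .134, expected direction negative).

```python
def one_tailed_p(p_two_tailed, estimate, expected_sign):
    """One-tailed p-value for a directional hypothesis.

    expected_sign is +1 or -1: the direction stated in the alternative hypothesis.
    """
    if estimate * expected_sign > 0:
        return p_two_tailed / 2
    # The estimate points the wrong way, so the one-tailed p-value is large.
    return 1 - p_two_tailed / 2


# Lower secondary versus lower vocational school: b = -.09, two-tailed p = .134.
p = one_tailed_p(0.134, estimate=-0.09, expected_sign=-1)
print(f"one-tailed p = {p:.3f}")
print("significant at .05:", p < 0.05)
print("significant at .10:", p < 0.10)
```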
The beta coefficients associated with each educational level in Table 3.55 and Table 3.56 are not very informative, because these are all relative to the reference category. When a different reference category is selected, the beta coefficients will change (compare Table 3.56 to Table 3.55). It is possible, however, to compute a combined beta coefficient for all dummy variables to measure the total standardized effect of educational level. This beta is known as the sheaf coefficient.
Furthermore, it is assumed that the b coefficient for sex is equal across every educational level. However, an interaction between sex and educational level may exist.
Finally, we would like to emphasize that it is generally preferable not to add large numbers of predictors to a regression model ('less is better'), as this keeps the model parsimonious. It is possible to test whether the model with the five education dummy variables 'fits' the data more closely than the parsimonious or restrained model with education in years. We elaborate upon the sheaf coefficient, interaction models and tests for restrained models on our website (http://www.ru.nl/mt/statistics/home).
Linear Regression Analysis: Assumptions
There are four important assumptions associated with regression analysis, which will be discussed in order of importance. The first assumption is that the mean of all errors (i.e., the mean of all the differences between observed and predicted y-values) is 0 for all (combinations of) x-scores in the population. This assumption is violated in the case of nonlinear relationships. It can be checked indirectly by inspecting line graphs in which the error is plotted against the x variable(s). For example, the nonlinear relationship in Figure 3.34 has a mean error that is mostly positive for young people but turns negative for those between 55 and 70 years old. This nonlinearity can be turned into a linear relationship with a transformation of the variable age (see our website for details).
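The check described here can be imitated without a graph by averaging the errors within ranges of x. In the sketch below a straight line is fitted to hypothetical data that actually follow a curve (y = x squared); the data are invented for illustration and do not correspond to Figure 3.34.

```python
from statistics import mean

# Hypothetical, deliberately nonlinear data: y = x squared.
xs = list(range(9))
ys = [x ** 2 for x in xs]

# Fit a straight line y = a + b * x by least squares.
mean_x, mean_y = mean(xs), mean(ys)
b = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys)) / sum(
    (x - mean_x) ** 2 for x in xs
)
a = mean_y - b * mean_x

# Error = observed y minus predicted y.
errors = [y - (a + b * x) for x, y in zip(xs, ys)]

# Mean error within the low, middle and high range of x.
low, middle, high = mean(errors[:3]), mean(errors[3:6]), mean(errors[6:])
print(f"mean error, low x: {low:.2f}, middle x: {middle:.2f}, high x: {high:.2f}")
```

Over all cases the errors of a least-squares line always average 0, but here the mean error is positive at both ends of the x range and negative in the middle. Such a systematic pattern is the sign that the first assumption does not hold for every x-score.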
Secondly, it is assumed that all errors are independent. This means that the value of any error does not depend upon the value of any other error. When the errors are dependent on each other, it may indicate, for example, the absence of one or more important predictors. If the variable body height is not added to a model in which age is the independent variable and body weight is the dependent variable (see Figure 3.49), the weight of younger people is systematically underestimated (overall positive error) because younger people are taller on average, while the weight of older people is systematically overestimated (overall negative error) because they are shorter on average. This means that the values of the errors are correlated.
Thirdly, all errors are assumed to be normally distributed in the population for all (combinations of) x-scores. When the distribution of errors is strongly skewed in either direction, it may indicate the absence of one or more important predictors. Histograms in which the distribution of errors is displayed are commonly used as an indirect check for this assumption.
The fourth and final assumption relates to homoscedasticity. This means that the variance of the errors is assumed to be equal for all combinations of x-scores. Violation of this assumption (heteroscedasticity) can lead to serious problems when the variances differ and the number of cases used to calculate the variance differs per category of variable x. In the case of serious heteroscedasticity, WLS (Weighted Least Squares) regression can be used.
Finally, it should be noted that the results from a regression analysis can be severely influenced by a small number of observations, especially in small samples (n < 200). These observations have a relatively low or high score on the x variable. Influential cases can be detected (see our website for some new developments).
3.7 Summary
To conclude, we present the most important statistical tools in the tables below. Based on the measurement level of the variables, one or more appropriate statistical tools are suggested.
Table 3.57 Univariate tests
Measurement level   Test
dichotomous         Test for proportion
nominal             Recode as dichotomous variables; test for proportion
ordinal             Recode as dichotomous variables (test for proportion), or assume interval level (t-test for mean)
interval/ratio      t-test for mean
Table 3.58 Bivariate tests
Independent variable (x): nominal, ordinal, or interval/ratio
Dependent variable (y)   Tests
nominal                  Cramer's V; odds ratio (for dichotomous variables); (multinomial) logistic regression analysis *
ordinal                  Cramer's V; Kendall's tau-b and c; Spearman's rank correlation (rs); ordinal regression analysis *
interval and ratio       Paired samples t-test; two samples t-test; analysis of variance; linear regression analysis (predictor as dummy variables); Spearman's rank correlation (rs); Pearson's correlation (r); linear regression analysis
Table 3.59 Multivariate tests
Independent variable (x): nominal, ordinal, or interval/ratio
Dependent variable (y)   Tests
nominal                  Cramer's V; (multinomial) logistic regression analysis *
ordinal                  Cramer's V; Kendall's tau-b and c; Spearman's rank correlation (rs); ordinal regression analysis *
interval and ratio       Multiple linear regression analysis (nominal and ordinal predictors as dummy variables)
* see http://www.ru.nl/mt/statistics/home
Concluding remarks on Statistical Tools
We ended this book with Tables 3.57 - 3.59. In these summary tables we present common statistical tests used in the social sciences. However, the discipline of statistics is very much alive, and more advanced analyses are available depending on the research question. For example, mixed model techniques, which are used for units at different levels of analysis, have gained much attention recently. Likewise, innovations are constantly being made in statistical packages such as SPSS and free software such as R (http://www.r-project.org). Finally, during the last 15 years practical applications have increasingly become the focus of statistics courses. We hope that Statistical Tools contributes to this development.