# Chapter 2: The Simple Regression Model

The simple regression model can be used to study the relationship between two variables. For reasons we will see, the simple regression model has limitations as a general tool for empirical analysis. Nevertheless, it is sometimes appropriate as an empirical tool. Learning how to interpret the simple regression model is good practice for studying multiple regression, which we'll do in subsequent chapters.

## 2.1 DEFINITION OF THE SIMPLE REGRESSION MODEL

Much of applied econometric analysis begins with the following premise: $y$ and $x$ are two variables, representing some population, and we are interested in "explaining $y$ in terms of $x$," or in "studying how $y$ varies with changes in $x$." We discussed some examples in Chapter 1, including: $y$ is soybean crop yield and $x$ is amount of fertilizer; $y$ is hourly wage and $x$ is years of education; $y$ is a community crime rate and $x$ is number of police officers.

In writing down a model that will "explain $y$ in terms of $x$," we must confront three issues. First, since there is never an exact relationship between two variables, how do we allow for other factors to affect $y$? Second, what is the functional relationship between $y$ and $x$? And third, how can we be sure we are capturing a ceteris paribus relationship between $y$ and $x$ (if that is a desired goal)?

We can resolve these ambiguities by writing down an equation relating $y$ to $x$. A simple equation is

$$y = \beta_0 + \beta_1 x + u. \tag{2.1}$$

Equation (2.1), which is assumed to hold in the population of interest, defines the simple linear regression model. It is also called the two-variable linear regression model or bivariate linear regression model because it relates the two variables $x$ and $y$. We now discuss the meaning of each of the quantities in (2.1). (Incidentally, the term "regression" has origins that are not especially important for most modern econometric applications, so we will not explain it here. See Stigler [1986] for an engaging history of regression analysis.)

When related by (2.1), the variables $y$ and $x$ have several different names used interchangeably, as follows.
$y$ is called the dependent variable, the explained variable, the response variable, the predicted variable, or the regressand. $x$ is called the independent variable, the explanatory variable, the control variable, the predictor variable, or the regressor. (The term covariate is also used for $x$.) The terms "dependent variable" and "independent variable" are frequently used in econometrics. But be aware that the label "independent" here does not refer to the statistical notion of independence between random variables (see Appendix B).

The terms "explained" and "explanatory" variables are probably the most descriptive. "Response" and "control" are used mostly in the experimental sciences, where the variable $x$ is under the experimenter's control. We will not use the terms "predicted variable" and "predictor," although you sometimes see these. Our terminology for simple regression is summarized in Table 2.1.

Table 2.1: Terminology for Simple Regression

| $y$ | $x$ |
|---|---|
| Dependent Variable | Independent Variable |
| Explained Variable | Explanatory Variable |
| Response Variable | Control Variable |
| Predicted Variable | Predictor Variable |
| Regressand | Regressor |

The variable $u$, called the error term or disturbance in the relationship, represents factors other than $x$ that affect $y$. A simple regression analysis effectively treats all factors affecting $y$ other than $x$ as being unobserved. You can usefully think of $u$ as standing for "unobserved."

Equation (2.1) also addresses the issue of the functional relationship between $y$ and $x$. If the other factors in $u$ are held fixed, so that the change in $u$ is zero, $\Delta u = 0$, then $x$ has a linear effect on $y$:

$$\Delta y = \beta_1 \Delta x \quad \text{if } \Delta u = 0. \tag{2.2}$$

Thus, the change in $y$ is simply $\beta_1$ multiplied by the change in $x$. This means that $\beta_1$ is the slope parameter in the relationship between $y$ and $x$ holding the other factors in $u$ fixed; it is of primary interest in applied economics.
The intercept parameter $\beta_0$ also has its uses, although it is rarely central to an analysis.

EXAMPLE 2.1 (Soybean Yield and Fertilizer)

Suppose that soybean yield is determined by the model

$$\mathit{yield} = \beta_0 + \beta_1 \mathit{fertilizer} + u, \tag{2.3}$$

so that $y = \mathit{yield}$ and $x = \mathit{fertilizer}$. The agricultural researcher is interested in the effect of fertilizer on yield, holding other factors fixed. This effect is given by $\beta_1$. The error term $u$ contains factors such as land quality, rainfall, and so on. The coefficient $\beta_1$ measures the effect of fertilizer on yield, holding other factors fixed: $\Delta \mathit{yield} = \beta_1 \Delta \mathit{fertilizer}$.

EXAMPLE 2.2 (A Simple Wage Equation)

A model relating a person's wage to observed education and other unobserved factors is

$$\mathit{wage} = \beta_0 + \beta_1 \mathit{educ} + u. \tag{2.4}$$

If $\mathit{wage}$ is measured in dollars per hour and $\mathit{educ}$ is years of education, then $\beta_1$ measures the change in hourly wage given another year of education, holding all other factors fixed. Some of those factors include labor force experience, innate ability, tenure with current employer, work ethics, and innumerable other things.

The linearity of (2.1) implies that a one-unit change in $x$ has the same effect on $y$, regardless of the initial value of $x$. This is unrealistic for many economic applications. For example, in the wage-education example, we might want to allow for increasing returns: the next year of education has a larger effect on wages than did the previous year. We will see how to allow for such possibilities in Section 2.4.
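The constant-effect implication of linearity can be checked with a few lines of Python. This is only a sketch: the values of `BETA0` and `BETA1` are made up for illustration and are not estimates from any data set, and the error term is held fixed at zero.

```python
# Hypothetical parameter values, chosen only for illustration (not estimates).
BETA0 = 1.0   # intercept
BETA1 = 0.5   # slope: change in hourly wage per extra year of education


def systematic_wage(educ, beta0=BETA0, beta1=BETA1):
    """Return beta0 + beta1 * educ, i.e. the wage with u held fixed at zero."""
    return beta0 + beta1 * educ


# Effect of one more year of education, starting from 8 and from 16 years.
effect_from_8 = systematic_wage(9) - systematic_wage(8)
effect_from_16 = systematic_wage(17) - systematic_wage(16)

# Under linearity both differences equal BETA1, whatever the starting point.
print(effect_from_8, effect_from_16)
```

A model with increasing returns would make the second difference larger than the first, which a linear specification cannot reproduce.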
The most difficult issue to address is whether model (2.1) really allows us to draw ceteris paribus conclusions about how $x$ affects $y$. We just saw in equation (2.2) that $\beta_1$ does measure the effect of $x$ on $y$, holding all other factors (in $u$) fixed. Is this the end of the causality issue? Unfortunately, no. How can we hope to learn in general about the ceteris paribus effect of $x$ on $y$, holding other factors fixed, when we are ignoring all those other factors?

As we will see in Section 2.5, we are only able to get reliable estimators of $\beta_0$ and $\beta_1$ from a random sample of data when we make an assumption restricting how the unobservable $u$ is related to the explanatory variable $x$. Without such a restriction, we will not be able to estimate the ceteris paribus effect, $\beta_1$. Because $u$ and $x$ are random variables, we need a concept grounded in probability.

Before we state the key assumption about how $x$ and $u$ are related, there is one assumption about $u$ that we can always make. As long as the intercept $\beta_0$ is included in the equation, nothing is lost by assuming that the average value of $u$ in the population is zero. Mathematically,

$$E(u) = 0. \tag{2.5}$$
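One way to see why nothing is lost is sketched below. The symbol $\alpha_0$ is introduced here only for illustration: it stands for a hypothetical nonzero population mean of the unobservables. Adding and subtracting $\alpha_0$ in (2.1) gives

```latex
\begin{aligned}
y &= \beta_0 + \beta_1 x + u \\
  &= (\beta_0 + \alpha_0) + \beta_1 x + (u - \alpha_0),
  \qquad \alpha_0 \equiv E(u), \\
E(u - \alpha_0) &= E(u) - \alpha_0 = 0 .
\end{aligned}
```

The redefined error $u - \alpha_0$ has mean zero and the slope $\beta_1$ is unchanged; only the intercept is relabeled as $\beta_0 + \alpha_0$.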
Importantly, assumption (2.5) says nothing about the relationship between $u$ and $x$ but simply makes a statement about the distribution of the unobservables in the population. Using the previous examples for illustration, we can see that assumption (2.5) is not very restrictive. In Example 2.1, we lose nothing by normalizing the unobserved factors affecting soybean yield, such as land quality, to have an average of zero in the population of all cultivated plots. The same is true of the unobserved factors in Example 2.2. Without loss of generality, we can assume that things such as average ability are zero in the population of all working people. If you are not convinced, you can work through Problem 2.2 to see that we can always redefine the intercept in equation (2.1) to make (2.5) true.

We now turn to the crucial assumption regarding how $u$ and $x$ are related. A natural measure of the association between two random variables is the correlation coefficient. (See Appendix B for definition and properties.) If $u$ and $x$ are uncorrelated, then, as random variables, they are not linearly related. Assuming that $u$ and $x$ are uncorrelated goes a long way toward defining the sense in which $u$ and $x$ should be unrelated in equation (2.1). But it does not go far enough, because correlation measures only linear dependence between $u$ and $x$. Correlation has a somewhat counterintuitive feature: it is possible for $u$ to be uncorrelated with $x$ while being correlated with functions of $x$, such as $x^2$. (See Section B.4 for further discussion.) This possibility is not acceptable for most regression purposes, as it causes problems for interpreting the model and for deriving statistical properties. A better assumption involves the expected value of $u$ given $x$.

Because $u$ and $x$ are random variables, we can define the conditional distribution of $u$ given any value of $x$. In particular, for any $x$, we can obtain the expected (or average) value of $u$ for that slice of the population described by the value of $x$. The crucial assumption is that the average value of $u$ does not depend on the value of $x$. We can write this as

$$E(u \mid x) = E(u) = 0, \tag{2.6}$$

where the second equality follows from (2.5). The first equality in equation (2.6) is the new assumption, called the zero conditional mean assumption. It says that, for any given value of $x$, the average of the unobservables is the same and therefore must equal the average value of $u$ in the entire population.

Let us see what (2.6) entails in the wage example. To simplify the discussion, assume that $u$ is the same as innate ability. Then (2.6) requires that the average level of ability is the same regardless of years of education. For example, if $E(\mathit{abil} \mid 8)$ denotes the average ability for the group of all people with eight years of education, and $E(\mathit{abil} \mid 16)$ denotes the average ability among people in the population with 16 years of education, then (2.6) implies that these must be the same. In fact, the average ability level must be the same for all education levels. If, for example, we think that average ability increases with years of education, then (2.6) is false. (This would happen if, on average, people with more ability choose to become more educated.) As we cannot observe innate ability, we have no way of knowing whether or not average ability is the same for all education levels. But this is an issue that we must address before applying simple regression analysis.
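The contrast between a population where (2.6) holds and one where it fails can be illustrated with a small simulation. Everything below is an assumption made for the sketch: the three education levels, the normal noise, and the `slope` parameter that ties the unobservable to education are invented, not taken from any data set.

```python
import random


def mean_u_by_educ(slope, n_per_group=20000, seed=0):
    """Average of the simulated unobservable u within each education group.

    u is drawn as slope * (educ - 12) + standard normal noise, so
    slope = 0 makes E(u | educ) the same for every group, while a
    nonzero slope makes the group averages move with educ.
    """
    rng = random.Random(seed)
    means = {}
    for educ in (8, 12, 16):
        draws = [slope * (educ - 12) + rng.gauss(0.0, 1.0)
                 for _ in range(n_per_group)]
        means[educ] = sum(draws) / n_per_group
    return means


holds = mean_u_by_educ(slope=0.0)   # ability unrelated to education
fails = mean_u_by_educ(slope=0.1)   # average ability rises with education

for label, means in (("(2.6) holds", holds), ("(2.6) fails", fails)):
    print(label, {educ: round(m, 2) for educ, m in means.items()})
```

In the second population the overall mean of `u` is still roughly zero, so (2.5) is satisfied, yet the group averages differ. That is exactly the situation in which (2.5) holds but the zero conditional mean assumption does not.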
In the fertilizer example, if fertilizer amounts are chosen independently of other features of the plots, then (2.6) will hold: the average land quality will not depend on the amount of fertilizer. However, if more fertilizer is put on the higher quality plots of land, then the expected value of $u$ changes with the level of fertilizer, and (2.6) fails.

QUESTION 2.1

Suppose that a score on a final exam, $\mathit{score}$, depends on classes attended ($\mathit{attend}$) and unobserved factors that affect exam performance (such as student ability):

$$\mathit{score} = \beta_0 + \beta_1 \mathit{attend} + u. \tag{2.7}$$

When would you expect this model to satisfy (2.6)?

Assumption (2.6) gives $\beta_1$ another interpretation that is often useful. Taking the expected value of (2.1) conditional on $x$ and using $E(u \mid x) = 0$ gives

$$E(y \mid x) = \beta_0 + \beta_1 x. \tag{2.8}$$

Equation (2.8) shows that the population regression function (PRF), $E(y \mid x)$, is a linear function of $x$. The linearity means that a one-unit increase in $x$ changes the expected value of $y$ by the amount $\beta_1$. For any given value of $x$, the distribution of $y$ is centered about $E(y \mid x)$, as illustrated in Figure 2.1.

[Figure 2.1: $E(y \mid x)$ as a linear function of $x$.]

When (2.6) is true, it is useful to break $y$ into two components. The piece $\beta_0 + \beta_1 x$ is sometimes called the systematic part of $y$, that is, the part of $y$ explained by $x$, and $u$ is called the unsystematic part, or the part of $y$ not explained by $x$. We will use assumption (2.6) in the next section for motivating estimates of $\beta_0$ and $\beta_1$. This assumption is also crucial for the statistical analysis in Section 2.5.

## 2.2 DERIVING THE ORDINARY LEAST SQUARES ESTIMATES

Now that we have discussed the basic ingredients of the simple regression model, we will address the important issue of how to estimate the parameters $\beta_0$ and $\beta_1$ in equation (2.1). To do this, we need a sample from the population. Let $\{(x_i, y_i): i = 1, \ldots, n\}$ denote a random sample of size $n$ from the population. Since these data come from (2.1), we can write

$$y_i = \beta_0 + \beta_1 x_i + u_i \tag{2.9}$$

for each $i$. Here, $u_i$ is the error term for observation $i$ since it contains all factors affecting $y_i$ other than $x_i$.

As an example, $x_i$ might be the annual income and $y_i$ the annual savings for family $i$ during a particular year. If we have collected data on 15 families, then $n = 15$. A scatter plot of such a data set is given in Figure 2.2, along with the (necessarily fictitious) population regression function.

[Figure 2.2: Scatterplot of savings and income for 15 families, and the population regression $E(\mathit{savings} \mid \mathit{income}) = \beta_0 + \beta_1 \mathit{income}$.]

We must decide how to use these data to obtain estimates of the intercept and slope in the population regression of savings on income.

There are several ways to motivate the following estimation procedure. We will use (2.5) and an important implication of assumption (2.6): in the population, $u$ has a zero mean and is uncorrelated with $x$. Therefore, we see that $u$ has zero expected value and that the covariance between $x$ and $u$ is zero:

$$E(u) = 0 \tag{2.10}$$

$$\mathrm{Cov}(x, u) = E(xu) = 0, \tag{2.11}$$

where the first equality in (2.11) follows from (2.10). (See Section B.4 for the definition and properties of covariance.) In terms of the observable variables $x$ and $y$ and the unknown parameters $\beta_0$ and $\beta_1$, equations (2.10) and (2.11) can be written as

$$E(y - \beta_0 - \beta_1 x) = 0 \tag{2.12}$$

and

$$E[x(y - \beta_0 - \beta_1 x)] = 0, \tag{2.13}$$

respectively. Equations (2.12) and (2.13) imply two restrictions on the joint probability distribution of $(x, y)$ in the population. Since there are two unknown parameters to estimate, we might hope that equations (2.12) and (2.13) can be used to obtain good estimators of $\beta_0$ and $\beta_1$. In fact, they can be. Given a sample of data, we choose estimates $\hat\beta_0$ and $\hat\beta_1$ to solve the sample counterparts of (2.12) and (2.13):

$$n^{-1} \sum_{i=1}^{n} (y_i - \hat\beta_0 - \hat\beta_1 x_i) = 0 \tag{2.14}$$

$$n^{-1} \sum_{i=1}^{n} x_i (y_i - \hat\beta_0 - \hat\beta_1 x_i) = 0. \tag{2.15}$$

This is an example of the method of moments approach to estimation. (See Section C.4 for a discussion of different estimation approaches.) These equations can be solved for $\hat\beta_0$ and $\hat\beta_1$.
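The two sample moment conditions form a pair of linear equations in the two unknowns, so they can be solved directly. The sketch below does this in plain Python and then plugs the solution back into (2.14) and (2.15). The function names and the four data points are invented for illustration, and the code assumes the $x_i$ are not all equal.

```python
def solve_moment_conditions(x, y):
    """Solve the sample moment conditions (2.14)-(2.15) for (b0, b1).

    Written out, the two conditions are the linear system
        b0           + b1 * mean(x)   = mean(y)
        b0 * mean(x) + b1 * mean(x^2) = mean(x * y)
    Assumes the x values are not all equal, so the system has a unique
    solution (otherwise the division below fails).
    """
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    mean_xx = sum(xi * xi for xi in x) / n
    mean_xy = sum(xi * yi for xi, yi in zip(x, y)) / n
    det = mean_xx - mean_x ** 2          # determinant of the 2x2 system
    b1 = (mean_xy - mean_x * mean_y) / det
    b0 = mean_y - b1 * mean_x
    return b0, b1


def sample_moments(x, y, b0, b1):
    """Left-hand sides of (2.14) and (2.15) evaluated at (b0, b1)."""
    n = len(x)
    m1 = sum(yi - b0 - b1 * xi for xi, yi in zip(x, y)) / n
    m2 = sum(xi * (yi - b0 - b1 * xi) for xi, yi in zip(x, y)) / n
    return m1, m2


# Made-up data, used only to exercise the functions.
x = [1.0, 2.0, 3.0, 4.0]
y = [2.0, 4.0, 5.0, 8.0]

b0_hat, b1_hat = solve_moment_conditions(x, y)
print(b0_hat, b1_hat)
print(sample_moments(x, y, b0_hat, b1_hat))  # both should be numerically zero
```

Evaluating `sample_moments` at any other pair of values gives at least one nonzero entry, which is what singles out the method of moments solution.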
Using the basic properties of the summation operator from Appendix A, equation (2.14) can be rewritten as

$$\bar{y} = \hat\beta_0 + \hat\beta_1 \bar{x}, \tag{2.16}$$

where $\bar{y}$ and $\bar{x}$ are the sample averages of the $y_i$ and the $x_i$. Solving for $\hat\beta_0$ gives

$$\hat\beta_0 = \bar{y} - \hat\beta_1 \bar{x}. \tag{2.17}$$

Therefore, once we have the slope estimate $\hat\beta_1$, it is straightforward to obtain the intercept estimate $\hat\beta_0$, given $\bar{y}$ and $\bar{x}$.

Dropping the $n^{-1}$ in (2.15) (since it does not affect the solution) and plugging (2.17) into (2.15) yields

$$\sum_{i=1}^{n} x_i \left( y_i - (\bar{y} - \hat\beta_1 \bar{x}) - \hat\beta_1 x_i \right) = 0,$$

which, upon rearrangement, gives

$$\sum_{i=1}^{n} x_i (y_i - \bar{y}) = \hat\beta_1 \sum_{i=1}^{n} x_i (x_i - \bar{x}).$$

From basic properties of the summation operator [see (A.7) and (A.8)],

$$\sum_{i=1}^{n} x_i (x_i - \bar{x}) = \sum_{i=1}^{n} (x_i - \bar{x})^2 \quad \text{and} \quad \sum_{i=1}^{n} x_i (y_i - \bar{y}) = \sum_{i=1}^{n} (x_i - \bar{x})(y_i - \bar{y}).$$

Therefore, provided that

$$\sum_{i=1}^{n} (x_i - \bar{x})^2 > 0, \tag{2.18}$$

the estimated slope is

$$\hat\beta_1 = \frac{\sum_{i=1}^{n} (x_i - \bar{x})(y_i - \bar{y})}{\sum_{i=1}^{n} (x_i - \bar{x})^2}. \tag{2.19}$$

Equation (2.19) is simply the sample covariance between $x$ and $y$ divided by the sample variance of $x$. (See Appendix C. Dividing both the numerator and the denominator by $n - 1$ changes nothing.) This makes sense because $\beta_1$ equals the population covariance divided by the variance of $x$ when $E(u) = 0$ and $\mathrm{Cov}(x, u) = 0$. An immediate implication is that if $x$ and $y$ are positively correlated in the sample, then $\hat\beta_1$ is positive; if $x$ and $y$ are negatively correlated, then $\hat\beta_1$ is negative.

Although the method for obtaining (2.17) and (2.19) is motivated by (2.6), the only assumption needed to compute the estimates for a particular sample is (2.18). This is hardly an assumption at all: (2.18) is true provided the $x_i$ in the sample are not all equal to the same value. If (2.18) fails, then we have either been unlucky in obtaining our sample from the population or we have not specified an interesting problem ($x$ does not vary in the population). For example, if $y = \mathit{wage}$ and $x = \mathit{educ}$, then (2.18) fails only if everyone in the sample has the same amount of education. (For example, if everyone is a high school graduate. See Figure 2.3.) If just one person has a different amount of education, then (2.18) holds, and the OLS estimates can be computed.

[Figure 2.3: A scatterplot of wage against education when $\mathit{educ}_i = 12$ for all $i$.]

The estimates given in (2.17) and (2.19) are called the ordinary least squares (OLS) estimates of $\beta_0$ and $\beta_1$. To justify this name, for any $\hat\beta_0$ and $\hat\beta_1$, define a fitted value for $y$ when $x = x_i$ such as

$$\hat{y}_i = \hat\beta_0 + \hat\beta_1 x_i, \tag{2.20}$$

for the given intercept and slope. This is the value we predict for $y$ when $x = x_i$. There is a fitted value for each observation in the sample. The residual for observation $i$ is the difference between the actual $y_i$ and its fitted value:

$$\hat{u}_i = y_i - \hat{y}_i = y_i - \hat\beta_0 - \hat\beta_1 x_i. \tag{2.21}$$

Again, there are $n$ such residuals. (These are not the same as the errors in (2.9), a point we return to in Section 2.5.) The fitted values and residuals are indicated in Figure 2.4.

[Figure 2.4: Fitted values and residuals, showing the line $\hat{y} = \hat\beta_0 + \hat\beta_1 x$ and a residual $\hat{u}_i$.]

Now, suppose we choose $\hat\beta_0$ and $\hat\beta_1$ to make the sum of squared residuals,

$$\sum_{i=1}^{n} \hat{u}_i^2 = \sum_{i=1}^{n} (y_i - \hat\beta_0 - \hat\beta_1 x_i)^2, \tag{2.22}$$

as small as possible. The appendix to this chapter shows that the conditions necessary for $(\hat\beta_0, \hat\beta_1)$ to minimize (2.22) are given exactly by equations (2.14) and (2.15), without $n^{-1}$. Equations (2.14) and (2.15) are often called the first order conditions for the OLS estimates, a term that comes from optimization using calculus (see Appendix A). From our previous calculations, we know that the solutions to the OLS first order conditions are given by (2.17) and (2.19). The name "ordinary least squares" comes from the fact that these estimates minimize the sum of squared residuals.

Once we have determined the OLS intercept and slope estimates, we form the OLS regression line:

$$\hat{y} = \hat\beta_0 + \hat\beta_1 x. \tag{2.23}$$
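The closed-form expressions (2.17) and (2.19), together with the fitted values (2.20), residuals (2.21), and the sum of squared residuals (2.22), translate almost line for line into code. The following is a minimal sketch in plain Python; the function names and the toy data are assumptions made for illustration, and the exact-zero test for condition (2.18) is a simplification that is only reliable when the $x_i$ are exactly representable numbers.

```python
def ols_simple(x, y):
    """OLS intercept and slope from (2.17) and (2.19).

    Returns (b0, b1, fitted, residuals). Raises ValueError when the
    x values do not vary, i.e. when condition (2.18) fails.
    """
    n = len(x)
    if n != len(y) or n < 2:
        raise ValueError("x and y must have the same length, at least 2")
    x_bar = sum(x) / n
    y_bar = sum(y) / n
    sxx = sum((xi - x_bar) ** 2 for xi in x)
    if sxx == 0:
        raise ValueError("condition (2.18) fails: all x values are equal")
    sxy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
    b1 = sxy / sxx                  # equation (2.19)
    b0 = y_bar - b1 * x_bar         # equation (2.17)
    fitted = [b0 + b1 * xi for xi in x]                  # equation (2.20)
    residuals = [yi - fi for yi, fi in zip(y, fitted)]   # equation (2.21)
    return b0, b1, fitted, residuals


def ssr(x, y, b0, b1):
    """Sum of squared residuals (2.22) for an arbitrary intercept and slope."""
    return sum((yi - b0 - b1 * xi) ** 2 for xi, yi in zip(x, y))


# Made-up data, used only to exercise the functions.
x = [1.0, 2.0, 3.0, 4.0]
y = [2.0, 4.0, 5.0, 8.0]

b0_hat, b1_hat, fitted, residuals = ols_simple(x, y)
print(b0_hat, b1_hat)
print(ssr(x, y, b0_hat, b1_hat))        # SSR at the OLS estimates
print(ssr(x, y, b0_hat + 0.1, b1_hat))  # larger: moving away raises the SSR
```

Calling `ols_simple` on a sample in which every `x` is identical, such as the situation drawn in Figure 2.3, raises the `ValueError` rather than dividing by zero.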
In equation (2.23), it is understood that $\hat\beta_0$ and $\hat\beta_1$ have been obtained using equations (2.17) and (2.19). The notation $\hat{y}$, read as "$y$ hat," emphasizes that the predicted values from equation (2.23) are estimates. The intercept, $\hat\beta_0$, is the predicted value of $y$ when $x = 0$, although in some cases it will not make sense to set $x = 0$. In those situations, $\hat\beta_0$ is not, in itself, very interesting. When using (2.23) to compute predicted values of $y$ for various values of $x$, we must account for the intercept in the calculations. Equation (2.23) is also called the sample regression function (SRF) because it is the estimated version of the population regression function $E(y \mid x) = \beta_0 + \beta_1 x$. It is important to remember that the PRF is something fixed, but unknown, in the population. Since the SRF is obtained for a given sample of data, a new sample will generate a different slope and intercept in equation (2.23).

In most cases the slope estimate, which we can write as

$$\hat\beta_1 = \Delta\hat{y} / \Delta x, \tag{2.24}$$

is of primary interest. It tells us the amount by which $\hat{y}$ changes when $x$ increases by one unit. Equivalently,

$$\Delta\hat{y} = \hat\beta_1 \Delta x, \tag{2.25}$$

so that given any change in $x$ (whether positive or negative), we can compute the predicted change in $y$.

We now present several examples of simple regression obtained by using real data. In other words, we find the intercept and slope estimates with equations (2.17) and (2.19). Since these examples involve many observations, the calculations were done using an econometric software package. At this point, you should be careful not to read too much into these regressions; they are not necessarily uncovering a causal relationship. We have said nothing so far about the statistical properties of OLS. In Section 2.5, we consider statistical properties after we explicitly impose assumptions on the population model equation (2.1).

EXAMPLE 2.3 (CEO Salary and Return on Equity)

For the population of chief executive officers, let $y$ be annual salary ($\mathit{salary}$) in thousands of dollars. Thus, $y = 856.3$ indicates an annual salary of \$856,300, and $y = 1452.6$ indicates a salary of \$1,452,600. Let $x$ be the average return on equity ($\mathit{roe}$) for the CEO's firm for the previous three years. (Return on equity is defined in terms of net income as a percentage of common equity.) For example, if $\mathit{roe} = 10$, then average return on equity is 10 percent.

To study the relationship between this measure of firm performance and CEO compensation, we postulate the simple model

$$\mathit{salary} = \beta_0 + \beta_1 \mathit{roe} + u.$$

The slope parameter $\beta_1$ measures the change in annual salary, in thousands of dollars, when return on equity increases by one percentage point. Because a higher $\mathit{roe}$ is good for the company, we think $\beta_1 > 0$.

The data set CEOSAL1.RAW contains information on 209 CEOs for the year 1990; these data were obtained from Business Week (5/6/91). In this sample, the average annual salary is \$1,281,120, with the smallest and largest being \$223,000 and \$14,822,000, respectively. The average return on equity for the years 1988, 1989, and 1990 is 17.18 percent, with the smallest and largest values being 0.5 and 56.3 percent, respectively.

Using the data in CEOSAL1.RAW, the OLS regression line relating $\mathit{salary}$ to $\mathit{roe}$ is

$$\widehat{\mathit{salary}} = 963.191 + 18.501\,\mathit{roe}, \tag{2.26}$$

where the intercept and slope estimates have been rounded to three decimal places; we use "salary hat" to indicate that this is an estimated equation. How do we interpret the equation? First, if the return on equity is zero, $\mathit{roe} = 0$, then the predicted salary is the intercept, 963.191, which equals \$963,191 since $\mathit{salary}$ is measured in thousands. Next, we can write the predicted change in salary as a function of the change in $\mathit{roe}$: $\Delta\widehat{\mathit{salary}} = 18.501\,(\Delta\mathit{roe})$. This means that if the return on equity increases by one percentage point, $\Delta\mathit{roe} = 1$, then salary is predicted to change by about 18.5, or \$18,500. Because (2.26) is a linear equation, this is the estimated change regardless of the initial salary.

We can easily use (2.26) to compare predicted salaries at different values of $\mathit{roe}$. Suppose $\mathit{roe} = 30$. Then $\widehat{\mathit{salary}} = 963.191 + 18.501(30) = 1518.221$, which is just over \$1.5 million. However, this does not mean that a particular CEO whose firm had an $\mathit{roe} = 30$ earns \$1,518,221. There are many other factors that affect salary. This is just our prediction from the OLS regression line (2.26). The estimated line is graphed in Figure 2.5, along with the population regression function $E(\mathit{salary} \mid \mathit{roe})$. We will never know the PRF, so we cannot tell how close the SRF is to the PRF. Another sample of data will give a different regression line, which may or may not be closer to the population regression line.

[Figure 2.5: The OLS regression line $\widehat{\mathit{salary}} = 963.191 + 18.501\,\mathit{roe}$ and the (unknown) population regression function.]

EXAMPLE 2.4 (Wage and Education)

For the population of people in the work force in 1976, let $y = \mathit{wage}$, where $\mathit{wage}$ is measured in dollars per hour. Thus, for a particular person, if $\mathit{wage} = 6.75$, the hourly wage is \$6.75. Let $x = \mathit{educ}$ denote years of schooling; for example, $\mathit{educ} = 12$ corresponds to a complete high school education.
Since the average wage in the sample is \$5.90, the consumer price index indicates that this amount is equivalent to \$16.64 in 1997 dollars.

Using the data in WAGE1.RAW where $n = 526$ individuals, we obtain the following OLS regression line (or sample regression function):

$$\widehat{\mathit{wage}} = -0.90 + 0.54\,\mathit{educ}. \tag{2.27}$$

We must interpret this equation with caution. The intercept of $-0.90$ literally means that a person with no education has a predicted hourly wage of $-90$ cents an hour. This, of course, is silly. It turns out that no one in the sample has less than eight years of education, which helps to explain the crazy prediction for a zero education value. For a person with eight years of education, the predicted wage is $\widehat{\mathit{wage}} = -0.90 + 0.54(8) = 3.42$, or \$3.42 per hour (in 1976 dollars).

The slope estimate in (2.27) implies that one more year of education increases hourly wage by 54 cents an hour. Therefore, four more years of education increase the predicted wage by $4(0.54) = 2.16$, or \$2.16 per hour. These are fairly large effects. Because of the linear nature of (2.27), another year of education increases the wage by the same amount, regardless of the initial level of education. In Section 2.4, we discuss some methods that allow for nonconstant marginal effects of our explanatory variables.

EXAMPLE 2.5 (Voting Outcomes and Campaign Expenditures)

The file VOTE1.RAW contains data on election outcomes and campaign expenditures for 173 two-party races for the U.S. House of Representatives in 1988. There are two candidates in each race, A and B. Let $\mathit{voteA}$ be the percentage of the vote received by Candidate A and $\mathit{shareA}$ be the percentage of total campaign expenditures accounted for by Candidate A. Many factors other than $\mathit{shareA}$ affect the election outcome (including the quality of the candidates and possibly the dollar amounts spent by A and B). Nevertheless, we can estimate a simple regression model to find out whether spending more relative to one's challenger implies a higher percentage of the vote.
The estimated equation using the 173 observations is

$$\widehat{\mathit{voteA}} = 40.90 + 0.306\,\mathit{shareA}. \tag{2.28}$$

This means that, if the share of Candidate A's expenditures increases by one percentage point, Candidate A receives almost one-third of a percentage point more of the total vote. Whether or not this is a causal effect is unclear, but the result is what we might expect.

In some cases, regression analysis is not used to determine causality but to simply look at whether two variables are positively or negatively related, much like a standard correlation analysis. An example of this occurs in Problem 2.12, where you are asked to use data from Biddle and Hamermesh (1990) on time spent sleeping and working to investigate the tradeoff between these two factors.

QUESTION 2.2

In Example 2.5, what is the predicted vote for Candidate A if $\mathit{shareA} = 60$ (which means 60 percent)? Does this answer seem reasonable?

### A Note on Terminology

In most cases, we will indicate the estimation of a relationship through OLS by writing an equation such as (2.26), (2.27), or (2.28). Sometimes, for the sake of brevity, it is useful to indicate that an OLS regression has been run without actually writing out the equation. We will often indicate that equation (2.23) has been obtained by OLS in saying that we run the regression of

$$y \text{ on } x, \tag{2.29}$$

or simply that we regress $y$ on $x$. The positions of $y$ and $x$ in (2.29) indicate which is the dependent variable and which is the independent variable: we always regress the dependent variable on the independent variable. For specific applications, we replace $y$ and $x$ with their names. Thus, to obtain (2.26), we regress $\mathit{salary}$ on $\mathit{roe}$, or to obtain (2.28), we regress $\mathit{voteA}$ on $\mathit{shareA}$.

When we use such terminology in (2.29), we will always mean that we plan to estimate the intercept, $\hat\beta_0$, along with the slope, $\hat\beta_1$. This case is appropriate for the vast majority of applications. Occasionally, we may want to estimate the relationship between $y$ and $x$ assuming that the intercept is zero (so that $x = 0$ implies that $\hat{y} = 0$); we cover this case briefly in Section 2.6. Unless explicitly stated otherwise, we always estimate an intercept along with a slope.

## 2.3 MECHANICS OF OLS

In this section, we cover some algebraic properties of the fitted OLS regression line. Perhaps the best way to think about these properties is to realize that they are features of OLS for a particular sample of data. They can be contrasted with the statistical properties of OLS, which require deriving features of the sampling distributions of the estimators. We will discuss statistical properties in Section 2.5.

Several of the algebraic properties we are going to derive will appear mundane. Nevertheless, having a grasp of these properties helps us to figure out what happens to the OLS estimates and related statistics when the data are manipulated in certain ways, such as when the measurement units of the dependent and independent variables change.

### Fitted Values and Residuals

We assume that the intercept and slope estimates, $\hat\beta_0$ and $\hat\beta_1$, have been obtained for the given sample of data. Given $\hat\beta_0$ and $\hat\beta_1$, we can obtain the fitted value $\hat{y}_i$ for each observation.
[This is given by equation (2.20).] By definition, each fitted value $\hat{y}_i$ is on the OLS regression line. The OLS residual associated with observation $i$, $\hat{u}_i$, is the difference between $y_i$ and its fitted value, as given in equation (2.21). If $\hat{u}_i$ is positive, the line underpredicts $y_i$; if $\hat{u}_i$ is negative, the line overpredicts $y_i$. The ideal case for observation $i$ is when $\hat{u}_i = 0$, but in most cases every residual is nonzero. In other words, none of the data points need actually lie on the OLS line.

EXAMPLE 2.6 (CEO Salary and Return on Equity)

Table 2.2 contains a listing of the first 15 observations in the CEO data set, along with the fitted values, called *salaryhat*, and the residuals, called *uhat*.

Table 2.2: Fitted Values and Residuals for the First 15 CEOs

| obsno | roe | salary | salaryhat | uhat |
|---|---|---|---|---|
| 1 | 14.1 | 1095 | 1224.058 | -129.0581 |
| 2 | 10.9 | 1001 | 1164.854 | -163.8542 |
| 3 | 23.5 | 1122 | 1397.969 | -275.9692 |
| 4 | 5.9 | 578 | 1072.348 | -494.3484 |
| 5 | 13.8 | 1368 | 1218.508 | 149.4923 |
| 6 | 20.0 | 1145 | 1333.215 | -188.2151 |
| 7 | 16.4 | 1078 | 1266.611 | -188.6108 |
| 8 | 16.3 | 1094 | 1264.761 | -170.7606 |
| 9 | 10.5 | 1237 | 1157.454 | 79.54626 |
| 10 | 26.3 | 833 | 1449.773 | -616.7726 |
| 11 | 25.9 | 567 | 1442.372 | -875.3721 |
| 12 | 26.8 | 933 | 1459.023 | -526.0231 |
| 13 | 14.8 | 1339 | 1237.009 | 101.9911 |
| 14 | 22.3 | 937 | 1375.768 | -438.7678 |
| 15 | 56.3 | 2011 | 2004.808 | 6.191895 |

The first four CEOs have lower salaries than what we predicted from the OLS regression line (2.26); in other words, given only the firm's $\mathit{roe}$, these CEOs make less than what we predicted. As can be seen from the positive *uhat*, the fifth CEO makes more than predicted from the OLS regression line.

### Algebraic Properties of OLS Statistics

There are several useful algebraic properties of OLS estimates and their associated statistics. We now cover the three most important of these.

(1) The sum, and therefore the sample average, of the OLS residuals is zero. Mathematically,

$$\sum_{i=1}^{n} \hat{u}_i = 0. \tag{2.30}$$

This property needs no proof; it follows immediately from the OLS first order condition (2.14), when we remember that the residuals are defined by $\hat{u}_i = y_i - \hat\beta_0 - \hat\beta_1 x_i$. In other words, the OLS estimates $\hat\beta_0$ and $\hat\beta_1$ are chosen to make the residuals add up to zero (for any data set). This says nothing about the residual for any particular observation $i$.

(2) The sample covariance between the regressors and the OLS residuals is zero. This follows from the first order condition (2.15), which can be written in terms of the residuals as

$$\sum_{i=1}^{n} x_i \hat{u}_i = 0. \tag{2.31}$$

The sample average of the OLS residuals is zero, so the left hand side of (2.31) is proportional to the sample covariance between $x_i$ and $\hat{u}_i$.

(3) The point $(\bar{x}, \bar{y})$ is always on the OLS regression line. In other words, if we take equation (2.23) and plug in $\bar{x}$ for $x$, then the predicted value is $\bar{y}$. This is exactly what equation (2.16) shows us.

EXAMPLE 2.7 (Wage and Education)

For the data in WAGE1.RAW, the average hourly wage in the sample is 5.90, rounded to two decimal places, and the average education is 12.56. If we plug $\mathit{educ} = 12.56$ into the OLS regression line (2.27), we get $\widehat{\mathit{wage}} = -0.90 + 0.54(12.56) = 5.8824$, which equals 5.9 when rounded to the first decimal place.
The reason these figures do not exactly agree is that we have rounded the average wage and education, as well as the intercept and slope estimates. If we did not initially round any of the values, we would get the answers to agree more closely, but this practice has little useful effect.

Writing each $y_i$ as its fitted value, plus its residual, provides another way to interpret an OLS regression. For each $i$, write

$$y_i = \hat{y}_i + \hat{u}_i. \tag{2.32}$$

From property (1) above, the average of the residuals is zero; equivalently, the sample average of the fitted values, $\hat{y}_i$, is the same as the sample average of the $y_i$, or $\bar{\hat{y}} = \bar{y}$. Further, properties (1) and (2) can be used to show that the sample covariance between $\hat{y}_i$ and $\hat{u}_i$ is zero. Thus, we can view OLS as decomposing each $y_i$ into two parts, a fitted value and a residual. The fitted values and residuals are uncorrelated in the sample.

Define the total sum of squares (SST), the explained sum of squares (SSE), and the residual sum of squares (SSR) (also known as the sum of squared residuals), as follows:

$$\text{SST} \equiv \sum_{i=1}^{n} (y_i - \bar{y})^2. \tag{2.33}$$

$$\text{SSE} \equiv \sum_{i=1}^{n} (\hat{y}_i - \bar{y})^2. \tag{2.34}$$

$$\text{SSR} \equiv \sum_{i=1}^{n} \hat{u}_i^2. \tag{2.35}$$
SST is a measure of the total sample variation in the $y_i$; that is, it measures how spread out the $y_i$ are in the sample. If we divide SST by $n - 1$, we obtain the sample variance of $y$, as discussed in Appendix C. Similarly, SSE measures the sample variation in the $\hat{y}_i$ (where we use the fact that $\bar{\hat{y}} = \bar{y}$), and SSR measures the sample variation in the $\hat{u}_i$. The total variation in $y$ can always be expressed as the sum of the explained variation SSE and the unexplained variation SSR. Thus,

$$\text{SST} = \text{SSE} + \text{SSR}. \tag{2.36}$$


Proving (2.36) is not difficult, but it requires us to use all of the properties of the summation operator covered in Appendix A. Write

$$
\begin{aligned}
\sum_{i=1}^{n} (y_i - \bar{y})^2 &= \sum_{i=1}^{n} [(y_i - \hat{y}_i) + (\hat{y}_i - \bar{y})]^2 \\
&= \sum_{i=1}^{n} [\hat{u}_i + (\hat{y}_i - \bar{y})]^2 \\
&= \sum_{i=1}^{n} \hat{u}_i^2 + 2 \sum_{i=1}^{n} \hat{u}_i (\hat{y}_i - \bar{y}) + \sum_{i=1}^{n} (\hat{y}_i - \bar{y})^2 \\
&= \text{SSR} + 2 \sum_{i=1}^{n} \hat{u}_i (\hat{y}_i - \bar{y}) + \text{SSE}.
\end{aligned}
$$

Now (2.36) holds if we show that

$$\sum_{i=1}^{n} \hat{u}_i (\hat{y}_i - \bar{y}) = 0. \tag{2.37}$$

But we have already claimed that the sample covariance between the residuals and the fitted values is zero, and this covariance is just (2.37) divided by $n - 1$. Thus, we have established (2.36).
Some words of caution about SST, SSE, and SSR are in order. There is no uniform agreement on the names or abbreviations for the three quantities defined in equations (2.33), (2.34), and (2.35). The total sum of squares is called either SST or TSS, so there is little confusion here. Unfortunately, the explained sum of squares is sometimes called the "regression sum of squares." If this term is given its natural abbreviation, it can easily be confused with the term residual sum of squares. Some regression packages refer to the explained sum of squares as the "model sum of squares."

To make matters even worse, the residual sum of squares is often called the "error sum of squares." This is especially unfortunate because, as we will see in Section 2.5, the errors and the residuals are different quantities. Thus, we will always call (2.35) the residual sum of squares or the sum of squared residuals. We prefer to use the abbreviation SSR to denote the sum of squared residuals, because it is more common in econometric packages.

## Goodness-of-Fit

So far, we have no way of measuring how well the explanatory or independent variable, $x$, explains the dependent variable, $y$. It is often useful to compute a number that summarizes how well the OLS regression line fits the data. In the following discussion, be sure to remember that we assume that an intercept is estimated along with the slope.

Assuming that the total sum of squares, SST, is not equal to zero (which is true except in the very unlikely event that all the $y_i$ equal the same value), we can divide (2.36) by SST to get $1 = \text{SSE}/\text{SST} + \text{SSR}/\text{SST}$. The R-squared of the regression, sometimes called the coefficient of determination, is defined as

$$R^2 \equiv \text{SSE}/\text{SST} = 1 - \text{SSR}/\text{SST}. \tag{2.38}$$

$R^2$ is the ratio of the explained variation compared to the total variation, and thus it is interpreted as the fraction of the sample variation in $y$ that is explained by $x$. The second equality in (2.38) provides another way for computing $R^2$.

From (2.36), the value of $R^2$ is always between zero and one, since SSE can be no greater than SST. When interpreting $R^2$, we usually multiply it by 100 to change it into a percent: $100 \cdot R^2$ is the percentage of the sample variation in $y$ that is explained by $x$.

If the data points all lie on the same line, OLS provides a perfect fit to the data. In this case, $R^2 = 1$. A value of $R^2$ that is nearly equal to zero indicates a poor fit of the OLS line: very little of the variation in the $y_i$ is captured by the variation in the $\hat{y}_i$ (which all lie on the OLS regression line). In fact, it can be shown that $R^2$ is equal to the square of the sample correlation coefficient between $y_i$ and $\hat{y}_i$. This is where the term "R-squared" came from. (The letter R was traditionally used to denote an estimate of a population correlation coefficient, and its usage has survived in regression analysis.)
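The two expressions for $R^2$ in (2.38), and the link to the squared sample correlation between $y_i$ and $\hat{y}_i$, can be compared on simulated data. This is a sketch under stated assumptions: the data are made up, and the helper names `ols`, `sums_of_squares`, and `corr` are hypothetical.

```python
import math
import random


def ols(x, y):
    """Intercept and slope from the usual simple-regression OLS formulas."""
    n = len(x)
    xbar, ybar = sum(x) / n, sum(y) / n
    sxx = sum((xi - xbar) ** 2 for xi in x)
    sxy = sum((xi - xbar) * (yi - ybar) for xi, yi in zip(x, y))
    b1 = sxy / sxx
    return ybar - b1 * xbar, b1


def sums_of_squares(x, y):
    """Return SST, SSE, SSR and the fitted values for the regression of y on x."""
    b0, b1 = ols(x, y)
    ybar = sum(y) / len(y)
    yhat = [b0 + b1 * xi for xi in x]
    sst = sum((yi - ybar) ** 2 for yi in y)
    sse = sum((fi - ybar) ** 2 for fi in yhat)
    ssr = sum((yi - fi) ** 2 for yi, fi in zip(y, yhat))
    return sst, sse, ssr, yhat


def corr(a, b):
    """Sample correlation coefficient between two equal-length lists."""
    n = len(a)
    abar, bbar = sum(a) / n, sum(b) / n
    cov = sum((ai - abar) * (bi - bbar) for ai, bi in zip(a, b))
    var_a = sum((ai - abar) ** 2 for ai in a)
    var_b = sum((bi - bbar) ** 2 for bi in b)
    return cov / math.sqrt(var_a * var_b)


random.seed(1)
x = [random.uniform(0, 20) for _ in range(100)]        # made-up regressor
y = [1.0 + 0.3 * xi + random.gauss(0, 2) for xi in x]  # made-up outcome

sst, sse, ssr, yhat = sums_of_squares(x, y)
r2_from_sse = sse / sst            # first expression in (2.38)
r2_from_ssr = 1 - ssr / sst        # second expression in (2.38)
r2_from_corr = corr(y, yhat) ** 2  # squared correlation of y and fitted values

# Perfect fit: every point lies exactly on one line, so R-squared is one.
y_line = [1.0 + 0.3 * xi for xi in x]
sst_l, sse_l, ssr_l, _ = sums_of_squares(x, y_line)
r2_perfect = 1 - ssr_l / sst_l

print(r2_from_sse, r2_from_ssr, r2_from_corr, r2_perfect)
```

The first three numbers should agree up to rounding, and the last should equal one up to rounding.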

EXAMPLE 2.8 (CEO Salary and Return on Equity)

In the CEO salary regression, we obtain the following:

$$\widehat{\textit{salary}} = 963.191 + 18.501\,\textit{roe} \tag{2.39}$$

$$n = 209, \quad R^2 = 0.0132.$$

We have reproduced the OLS regression line and the number of observations for clarity. Using the R-squared (rounded to four decimal places) reported for this equation, we can see how much of the variation in salary is actually explained by the return on equity. The answer is: not much. The firm's return on equity explains only about 1.3% of the variation in salaries for this sample of 209 CEOs. That means that 98.7% of the salary variations for these CEOs is left unexplained! This lack of explanatory power may not be too surprising since there are many other characteristics of both the firm and the individual CEO that should influence salary; these factors are necessarily included in the errors in a simple regression analysis.

In the social sciences, low R-squareds in regression equations are not uncommon, especially for cross-sectional analysis. We will discuss this issue more generally under multiple regression analysis, but it is worth emphasizing now that a seemingly low R-squared does not necessarily mean that an OLS regression equation is useless. It is still possible that (2.39) is a good estimate of the ceteris paribus relationship between salary and roe; whether or not this is true does not depend directly on the size of R-squared. Students who are first learning econometrics tend to put too much weight on the size of the R-squared in evaluating regression equations. For now, be aware that using R-squared as the main gauge of success for an econometric analysis can lead to trouble.

Sometimes the explanatory variable explains a substantial part of the sample variation in the dependent variable.

EXAMPLE 2.9 (Voting Outcomes and Campaign Expenditures)

In the voting outcome equation (2.28), $R^2 = 0.505$. Thus, the share of campaign expenditures explains just over 50 percent of the variation in the election outcomes for this sample. This is a fairly sizable portion.

## 2.4 UNITS OF MEASUREMENT AND FUNCTIONAL FORM

Two important issues in applied economics are (1) understanding how changing the units of measurement of the dependent and/or independent variables affects OLS estimates and (2) knowing how to incorporate popular functional forms used in economics into regression analysis. The mathematics needed for a full understanding of functional form issues is reviewed in Appendix A.
## The Effects of Changing Units of Measurement on OLS Statistics

In Example 2.3, we chose to measure annual salary in thousands of dollars, and the return on equity was measured as a percent (rather than as a decimal). It is crucial to know how salary and roe are measured in this example in order to make sense of the estimates in equation (2.39).

We must also know that OLS estimates change in entirely expected ways when the units of measurement of the dependent and independent variables change. In Example 2.3, suppose that, rather than measuring salary in thousands of dollars, we measure it in dollars. Let salardol be salary in dollars (salardol = 845,761 would be interpreted as \$845,761). Of course, salardol has a simple relationship to the salary measured in thousands of dollars: $\textit{salardol} = 1{,}000 \cdot \textit{salary}$. We do not need to actually run the regression of salardol on roe to know that the estimated equation is:

$$\widehat{\textit{salardol}} = 963{,}191 + 18{,}501\,\textit{roe}. \tag{2.40}$$

We obtain the intercept and slope in (2.40) simply by multiplying the intercept and the slope in (2.39) by 1,000. This gives equations (2.39) and (2.40) the same interpretation. Looking at (2.40), if roe = 0, then $\widehat{\textit{salardol}} = 963{,}191$, so the predicted salary is \$963,191 [the same value we obtained from equation (2.39)]. Furthermore, if roe increases by one, then the predicted salary increases by \$18,501; again, this is what we concluded from our earlier analysis of equation (2.39).

Generally, it is easy to figure out what happens to the intercept and slope estimates when the dependent variable changes units of measurement. If the dependent variable is multiplied by the constant $c$ (which means each value in the sample is multiplied by $c$), then the OLS intercept and slope estimates are also multiplied by $c$. (This assumes nothing has changed about the independent variable.) In the CEO salary example, $c = 1{,}000$ in moving from salary to salardol.

QUESTION 2.4
Suppose that salary is measured in hundreds of dollars, rather than in thousands of dollars, say salarhun. What will be the OLS intercept and slope estimates in the regression of salarhun on roe?

We can also use the CEO salary example to see what happens when we change the units of measurement of the independent variable. Define $\textit{roedec} = \textit{roe}/100$ to be the decimal equivalent of roe; thus, roedec = 0.23 means a return on equity of 23 percent. To focus on changing the units of measurement of the independent variable, we return to our original dependent variable, salary, which is measured in thousands of dollars. When we regress salary on roedec, we obtain

$$\widehat{\textit{salary}} = 963.191 + 1850.1\,\textit{roedec}. \tag{2.41}$$

The coefficient on roedec is 100 times the coefficient on roe in (2.39). This is as it should be. Changing roe by one percentage point is equivalent to $\Delta \textit{roedec} = 0.01$. From (2.41), if $\Delta \textit{roedec} = 0.01$, then $\Delta \widehat{\textit{salary}} = 1850.1(0.01) = 18.501$, which is what is obtained by using (2.39). Note that, in moving from (2.39) to (2.41), the independent variable was divided by 100, and so the OLS slope estimate was multiplied by 100, preserving the interpretation of the equation. Generally, if the independent variable is divided or multiplied by some nonzero constant, $c$, then the OLS slope coefficient is multiplied or divided by $c$, respectively.

The intercept has not changed in (2.41) because roedec = 0 still corresponds to a zero return on equity. In general, changing the units of measurement of only the independent variable does not affect the intercept.

In the previous section, we defined R-squared as a goodness-of-fit measure for OLS regression. We can also ask what happens to $R^2$ when the unit of measurement of either the independent or the dependent variable changes. Without doing any algebra, we should know the result: the goodness-of-fit of the model should not depend on the units of measurement of our variables. For example, the amount of variation in salary explained by the return on equity should not depend on whether salary is measured in dollars or in thousands of dollars or on whether return on equity is a percent or a decimal. This intuition can be verified mathematically: using the definition of $R^2$, it can be shown that $R^2$ is, in fact, invariant to changes in the units of $y$ or $x$.

## Incorporating Nonlinearities in Simple Regression

So far we have focused on linear relationships between the dependent and independent variables. As we mentioned in Chapter 1, linear relationships are not nearly general enough for all economic applications. Fortunately, it is rather easy to incorporate many nonlinearities into simple regression analysis by appropriately defining the dependent and independent variables. Here we will cover two possibilities that often appear in applied work.

In reading applied work in the social sciences, you will often encounter regression equations where the dependent variable appears in logarithmic form. Why is this done? Recall the wage-education example, where we regressed hourly wage on years of education. We obtained a slope estimate of 0.54 [see equation (2.27)], which means that each additional year of education is predicted to increase hourly wage by 54 cents. Because of the linear nature of (2.27), 54 cents is the increase for either the first year of education or the twentieth year; this may not be reasonable.

Suppose, instead, that the percentage increase in wage is the same given one more year of education. Model (2.27) does not imply a constant percentage increase: the percentage increase depends on the initial wage. A model that gives (approximately) a constant percentage effect is

$$\log(\textit{wage}) = \beta_0 + \beta_1 \textit{educ} + u, \tag{2.42}$$

where $\log(\cdot)$ denotes the natural logarithm. (See Appendix A for a review of logarithms.) In particular, if $\Delta u = 0$, then

$$\%\Delta \textit{wage} \approx (100 \cdot \beta_1) \Delta \textit{educ}. \tag{2.43}$$

Notice how we multiply $\beta_1$ by 100 to get the percentage change in wage given one additional year of education. Since the percentage change in wage is the same for each additional year of education, the change in wage for an extra year of education increases as education increases; in other words, (2.42) implies an increasing return to education. By exponentiating (2.42), we can write $\textit{wage} = \exp(\beta_0 + \beta_1 \textit{educ} + u)$. This equation is graphed in Figure 2.6, with $u = 0$.

Figure 2.6: $\textit{wage} = \exp(\beta_0 + \beta_1 \textit{educ})$, with $\beta_1 > 0$.

Estimating a model such as (2.42) is straightforward when using simple regression. Just define the dependent variable, $y$, to be $y = \log(\textit{wage})$. The independent variable is represented by $x = \textit{educ}$. The mechanics of OLS are the same as before: the intercept and slope estimates are given by the formulas (2.17) and (2.19). In other words, we obtain $\hat{\beta}_0$ and $\hat{\beta}_1$ from the OLS regression of log(wage) on educ.
EXAMPLE 2.10 (A Log Wage Equation)

Using the same data as in Example 2.4, but using log(wage) as the dependent variable, we obtain the following relationship:

$$\widehat{\log(\textit{wage})} = 0.584 + 0.083\,\textit{educ} \tag{2.44}$$

$$n = 526, \quad R^2 = 0.186.$$

The coefficient on educ has a percentage interpretation when it is multiplied by 100: wage increases by 8.3 percent for every additional year of education. This is what economists mean when they refer to the "return to another year of education."

It is important to remember that the main reason for using the log of wage in (2.42) is to impose a constant percentage effect of education on wage. Once equation (2.42) is obtained, the natural log of wage is rarely mentioned. In particular, it is not correct to say that another year of education increases log(wage) by 8.3%.

The intercept in (2.42) is not very meaningful, as it gives the predicted log(wage) when educ = 0. The R-squared shows that educ explains about 18.6 percent of the variation in log(wage) (not wage). Finally, equation (2.44) might not capture all of the nonlinearity in the relationship between wage and schooling. If there are "diploma effects," then the twelfth year of education (graduation from high school) could be worth much more than the eleventh year. We will learn how to allow for this kind of nonlinearity in Chapter 7.

Another important use of the natural log is in obtaining a constant elasticity model.

EXAMPLE 2.11 (CEO Salary and Firm Sales)

We can estimate a constant elasticity model relating CEO salary to firm sales. The data set is the same one used in Example 2.3, except we now relate salary to sales. Let sales be annual firm sales, measured in millions of dollars. A constant elasticity model is

$$\log(\textit{salary}) = \beta_0 + \beta_1 \log(\textit{sales}) + u, \tag{2.45}$$

where $\beta_1$ is the elasticity of salary with respect to sales. This model falls under the simple regression model by defining the dependent variable to be $y = \log(\textit{salary})$ and the independent variable to be $x = \log(\textit{sales})$. Estimating this equation by OLS gives

$$\widehat{\log(\textit{salary})} = 4.822 + 0.257 \log(\textit{sales}) \tag{2.46}$$

$$n = 209, \quad R^2 = 0.211.$$

The coefficient of log(sales) is the estimated elasticity of salary with respect to sales. It implies that a 1 percent increase in firm sales increases CEO salary by about 0.257 percent, which is the usual interpretation of an elasticity.

The two functional forms covered in this section will often arise in the remainder of this text. We have covered models containing natural logarithms here because they appear so frequently in applied work. The interpretation of such models will not be much different in the multiple regression case.

It is also useful to note what happens to the intercept and slope estimates if we change the units of measurement of the dependent variable when it appears in logarithmic form. Because the change to logarithmic form approximates a proportionate change, it makes sense that nothing happens to the slope. We can see this by writing the rescaled variable as $c_1 y_i$ for each observation $i$. The original equation is $\log(y_i) = \beta_0 + \beta_1 x_i + u_i$. If we add $\log(c_1)$ to both sides, we get $\log(c_1) + \log(y_i) = [\log(c_1) + \beta_0] + \beta_1 x_i + u_i$, or $\log(c_1 y_i) = [\log(c_1) + \beta_0] + \beta_1 x_i + u_i$. (Remember that the sum of the logs is equal to the log of their product, as shown in Appendix A.) Therefore, the slope is still $\beta_1$, but the intercept is now $\log(c_1) + \beta_0$. Similarly, if the independent variable is $\log(x)$, and we change the units of measurement of $x$ before taking the log, the slope remains the same but the intercept changes. You will be asked to verify these claims in Problem 2.9.
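The rescaling results in this section can be checked numerically, both for a dependent variable in levels and for a dependent variable in logs. The following is a minimal sketch, assuming made-up data; the helper names `ols_fit` and `close` are hypothetical, and the constants 1,000 and 100 simply mirror the CEO salary discussion.

```python
import math
import random


def ols_fit(x, y):
    """Return (intercept, slope, R-squared) for the simple regression of y on x."""
    n = len(x)
    xbar, ybar = sum(x) / n, sum(y) / n
    sxx = sum((xi - xbar) ** 2 for xi in x)
    sxy = sum((xi - xbar) * (yi - ybar) for xi, yi in zip(x, y))
    b1 = sxy / sxx
    b0 = ybar - b1 * xbar
    sst = sum((yi - ybar) ** 2 for yi in y)
    ssr = sum((yi - b0 - b1 * xi) ** 2 for xi, yi in zip(x, y))
    return b0, b1, 1 - ssr / sst


def close(a, b, rel=1e-9):
    """Relative comparison that tolerates floating-point rounding."""
    return abs(a - b) <= rel * max(1.0, abs(a), abs(b))


random.seed(2)
x = [random.uniform(1, 30) for _ in range(200)]            # made-up regressor
y = [900 + 20 * xi + random.gauss(0, 100) for xi in x]     # made-up, positive outcome

b0, b1, r2 = ols_fit(x, y)

# Dependent variable multiplied by c = 1000: intercept and slope are multiplied by c.
b0_c, b1_c, r2_c = ols_fit(x, [1000 * yi for yi in y])

# Independent variable divided by 100: slope is multiplied by 100, intercept unchanged.
b0_d, b1_d, r2_d = ols_fit([xi / 100 for xi in x], y)

# Log dependent variable rescaled by c1: slope unchanged, intercept shifts by log(c1).
c1 = 1000
l0, l1, _ = ols_fit(x, [math.log(yi) for yi in y])
l0_c, l1_c, _ = ols_fit(x, [math.log(c1 * yi) for yi in y])

checks = {
    "y * 1000 scales intercept": close(b0_c, 1000 * b0),
    "y * 1000 scales slope": close(b1_c, 1000 * b1),
    "x / 100 keeps intercept": close(b0_d, b0),
    "x / 100 scales slope by 100": close(b1_d, 100 * b1),
    "R-squared invariant": close(r2_c, r2) and close(r2_d, r2),
    "log(c1 * y) keeps slope": close(l1_c, l1),
    "log(c1 * y) shifts intercept by log(c1)": close(l0_c, l0 + math.log(c1)),
}
for name, ok in checks.items():
    print(name, ok)
```

Each check should print `True`; the comparisons use a small relative tolerance because rescaled regressions agree only up to rounding.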
We end this subsection by summarizing four combinations of functional forms available from using either the original variable or its natural log. In Table 2.3, $x$ and $y$ stand for the variables in their original form. The model with $y$ as the dependent variable and $x$ as the independent variable is called the level-level model, because each variable appears in its level form. The model with $\log(y)$ as the dependent variable and $x$ as the independent variable is called the log-level model. We will not explicitly discuss the level-log model here, because it arises less often in practice. In any case, we will see examples of this model in later chapters.

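Table 2.3
Summary of Functional Forms Involving Logarithms

| Model | Dependent Variable | Independent Variable | Interpretation of $\beta_1$ |
|---|---|---|---|
| level-level | $y$ | $x$ | $\Delta y = \beta_1 \Delta x$ |
| level-log | $y$ | $\log(x)$ | $\Delta y = (\beta_1/100)\%\Delta x$ |
| log-level | $\log(y)$ | $x$ | $\%\Delta y = (100\beta_1)\Delta x$ |
| log-log | $\log(y)$ | $\log(x)$ | $\%\Delta y = \beta_1 \%\Delta x$ |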

The last column in Table 2.3 gives the interpretation of $\beta_1$. In the log-level model, $100 \cdot \beta_1$ is sometimes called the semi-elasticity of $y$ with respect to $x$. As we mentioned in Example 2.11, in the log-log model, $\beta_1$ is the elasticity of $y$ with respect to $x$. Table 2.3 warrants careful study, as we will refer to it often in the remainder of the text.
## The Meaning of "Linear" Regression

The simple regression model that we have studied in this chapter is also called the simple linear regression model. Yet, as we have just seen, the general model also allows for certain nonlinear relationships. So what does "linear" mean here? You can see by looking at equation (2.1) that $y = \beta_0 + \beta_1 x + u$. The key is that this equation is linear in the parameters, $\beta_0$ and $\beta_1$. There are no restrictions on how $y$ and $x$ relate to the original explained and explanatory variables of interest. As we saw in Examples 2.10 and 2.11, $y$ and $x$ can be natural logs of variables, and this is quite common in applications. But we need not stop there. For example, nothing prevents us from using simple regression to estimate a model such as $\textit{cons} = \beta_0 + \beta_1 \sqrt{\textit{inc}} + u$, where cons is annual consumption and inc is annual income.

While the mechanics of simple regression do not depend on how $y$ and $x$ are defined, the interpretation of the coefficients does depend on their definitions. For successful empirical work, it is much more important to become proficient at interpreting coefficients than to become efficient at computing formulas such as (2.19). We will get much more practice with interpreting the estimates in OLS regression lines when we study multiple regression.

There are plenty of models that cannot be cast as a linear regression model because they are not linear in their parameters; an example is $\textit{cons} = 1/(\beta_0 + \beta_1 \textit{inc}) + u$. Estimation of such models takes us into the realm of the nonlinear regression model, which is beyond the scope of this text. For most applications, choosing a model that can be put into the linear regression framework is sufficient.
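To see that a model such as $\textit{cons} = \beta_0 + \beta_1 \sqrt{\textit{inc}} + u$ is handled by ordinary simple regression, it is enough to define $x = \sqrt{\textit{inc}}$ and apply the usual formulas. The sketch below assumes simulated data with hypothetical parameter values (5,000 and 150) and a hypothetical helper `ols`; it is an illustration of the idea, not an estimate from any data set in the text.

```python
import math
import random


def ols(x, y):
    """Intercept and slope from the usual simple-regression OLS formulas."""
    n = len(x)
    xbar, ybar = sum(x) / n, sum(y) / n
    sxx = sum((xi - xbar) ** 2 for xi in x)
    sxy = sum((xi - xbar) * (yi - ybar) for xi, yi in zip(x, y))
    b1 = sxy / sxx
    return ybar - b1 * xbar, b1


random.seed(4)
true_b0, true_b1 = 5000.0, 150.0   # hypothetical population parameters

inc = [random.uniform(10_000, 100_000) for _ in range(500)]  # made-up incomes
cons = [true_b0 + true_b1 * math.sqrt(i) + random.gauss(0, 2000) for i in inc]

# The model is nonlinear in inc but linear in the parameters:
# define the regressor as x = sqrt(inc) and run simple regression as usual.
x = [math.sqrt(i) for i in inc]
b0_hat, b1_hat = ols(x, cons)

print("intercept estimate:", b0_hat)
print("slope estimate on sqrt(inc):", b1_hat)
```

The slope here is the change in consumption per unit change in $\sqrt{\textit{inc}}$, not per dollar of income, which is the point about interpretation made above.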

## 2.5 EXPECTED VALUES AND VARIANCES OF THE OLS ESTIMATORS

In Section 2.1, we defined the population model $y = \beta_0 + \beta_1 x + u$, and we claimed that the key assumption for simple regression analysis to be useful is that the expected value of $u$ given any value of $x$ is zero. In Sections 2.2, 2.3, and 2.4, we discussed the algebraic properties of OLS estimation. We now return to the population model and study the statistical properties of OLS. In other words, we now view $\hat{\beta}_0$ and $\hat{\beta}_1$ as estimators for the parameters $\beta_0$ and $\beta_1$ that appear in the population model. This means that we will study properties of the distributions of $\hat{\beta}_0$ and $\hat{\beta}_1$ over different random samples from the population. (Appendix C contains definitions of estimators and reviews some of their important properties.)

## Unbiasedness of OLS

We begin by establishing the unbiasedness of OLS under a simple set of assumptions. For future reference, it is useful to number these assumptions using the prefix "SLR" for simple linear regression. The first assumption defines the population model.

ASSUMPTION SLR.1 (LINEAR IN PARAMETERS)

In the population model, the dependent variable $y$ is related to the independent variable $x$ and the error (or disturbance) $u$ as

$$y = \beta_0 + \beta_1 x + u, \tag{2.47}$$

where $\beta_0$ and $\beta_1$ are the population intercept and slope parameters, respectively.

To be realistic, $y$, $x$, and $u$ are all viewed as random variables in stating the population model. We discussed the interpretation of this model at some length in Section 2.1 and gave several examples. In the previous section, we learned that equation (2.47) is not as restrictive as it initially seems; by choosing $y$ and $x$ appropriately, we can obtain interesting nonlinear relationships (such as constant elasticity models).

We are interested in using data on $y$ and $x$ to estimate the parameters $\beta_0$ and, especially, $\beta_1$. We assume that our data were obtained as a random sample. (See Appendix C for a review of random sampling.)
ASSUMPTION SLR.2 (RANDOM SAMPLING)

We can use a random sample of size $n$, $\{(x_i, y_i): i = 1, 2, \ldots, n\}$, from the population model.
We will have to address failure of the random sampling assumption in later chapters that deal with time series analysis and sample selection problems. Not all cross-sectional samples can be viewed as outcomes of random samples, but many can be.

We can write (2.47) in terms of the random sample as

$$y_i = \beta_0 + \beta_1 x_i + u_i, \quad i = 1, 2, \ldots, n, \tag{2.48}$$

where $u_i$ is the error or disturbance for observation $i$ (for example, person $i$, firm $i$, city $i$, etc.). Thus, $u_i$ contains the unobservables for observation $i$ which affect $y_i$. The $u_i$ should not be confused with the residuals, $\hat{u}_i$, that we defined in Section 2.3. Later on, we will explore the relationship between the errors and the residuals. For interpreting $\beta_0$ and $\beta_1$ in a particular application, (2.47) is most informative, but (2.48) is also needed for some of the statistical derivations.

The relationship (2.48) can be plotted for a particular outcome of data as shown in Figure 2.7.

Figure 2.7: Graph of $y_i = \beta_0 + \beta_1 x_i + u_i$. (The plotted line is labeled $E(y|x) = \beta_0 + \beta_1 x$.)

In order to obtain unbiased estimators of $\beta_0$ and $\beta_1$, we need to impose the zero conditional mean assumption that we discussed in some detail in Section 2.1. We now explicitly add it to our list of assumptions.

ASSUMPTION SLR.3 (ZERO CONDITIONAL MEAN)

$E(u|x) = 0$.

For a random sample, this assumption implies that $E(u_i|x_i) = 0$, for all $i = 1, 2, \ldots, n$.

In addition to restricting the relationship between $u$ and $x$ in the population, the zero conditional mean assumption, coupled with the random sampling assumption, allows for a convenient technical simplification. In particular, we can derive the statistical properties of the OLS estimators as conditional on the values of the $x_i$ in our sample. Technically, in statistical derivations, conditioning on the sample values of the independent variable is the same as treating the $x_i$ as fixed in repeated samples. This process involves several steps. We first choose $n$ sample values for $x_1, x_2, \ldots, x_n$. (These can be repeated.) Given these values, we then obtain a sample on $y$ (effectively by obtaining a random sample of the $u_i$). Next another sample of $y$ is obtained, using the same values for $x_1, \ldots, x_n$. Then another sample of $y$ is obtained, again using the same $x_i$. And so on.

The fixed in repeated samples scenario is not very realistic in nonexperimental contexts. For instance, in sampling individuals for the wage-education example, it makes little sense to think of choosing the values of educ ahead of time and then sampling individuals with those particular levels of education. Random sampling, where individuals are chosen randomly and their wage and education are both recorded, is representative of how most data sets are obtained for empirical analysis in the social sciences. Once we assume that $E(u|x) = 0$, and we have random sampling, nothing is lost in derivations by treating the $x_i$ as nonrandom. The danger is that the fixed in repeated samples assumption always implies that $u_i$ and $x_i$ are independent. In deciding when simple regression analysis is going to produce unbiased estimators, it is critical to think in terms of Assumption SLR.3.

Once we have agreed to condition on the $x_i$, we need one final assumption for unbiasedness.

ASSUMPTION SLR.4 (SAMPLE VARIATION IN THE INDEPENDENT VARIABLE)

In the sample, the independent variables $x_i$, $i = 1, 2, \ldots, n$, are not all equal to the same constant. This requires some variation in $x$ in the population.

We encountered Assumption SLR.4 when we derived the formulas for the OLS estimators; it is equivalent to $\sum_{i=1}^{n} (x_i - \bar{x})^2 > 0$. Of the four assumptions made, this is the least important because it essentially never fails in interesting applications. If Assumption SLR.4 does fail, we cannot compute the OLS estimators, which means statistical analysis is irrelevant.

Using the fact that $\sum_{i=1}^{n} (x_i - \bar{x})(y_i - \bar{y}) = \sum_{i=1}^{n} (x_i - \bar{x}) y_i$ (see Appendix A), we can write the OLS slope estimator in equation (2.19) as

$$\hat{\beta}_1 = \frac{\sum_{i=1}^{n} (x_i - \bar{x}) y_i}{\sum_{i=1}^{n} (x_i - \bar{x})^2}. \tag{2.49}$$

Because we are now interested in the behavior of $\hat{\beta}_1$ across all possible samples, $\hat{\beta}_1$ is properly viewed as a random variable.

We can write $\hat{\beta}_1$ in terms of the population coefficients and errors by substituting the right hand side of (2.48) into (2.49). We have

$$\hat{\beta}_1 = \frac{\sum_{i=1}^{n} (x_i - \bar{x}) y_i}{s_x^2} = \frac{\sum_{i=1}^{n} (x_i - \bar{x})(\beta_0 + \beta_1 x_i + u_i)}{s_x^2}, \tag{2.50}$$

where we have defined the total variation in $x_i$ as $s_x^2 = \sum_{i=1}^{n} (x_i - \bar{x})^2$ in order to simplify the notation. (This is not quite the sample variance of the $x_i$ because we do not divide by $n - 1$.) Using the algebra of the summation operator, write the numerator of $\hat{\beta}_1$ as

$$
\begin{aligned}
&\sum_{i=1}^{n} (x_i - \bar{x})\beta_0 + \sum_{i=1}^{n} (x_i - \bar{x})\beta_1 x_i + \sum_{i=1}^{n} (x_i - \bar{x}) u_i \\
&\quad = \beta_0 \sum_{i=1}^{n} (x_i - \bar{x}) + \beta_1 \sum_{i=1}^{n} (x_i - \bar{x}) x_i + \sum_{i=1}^{n} (x_i - \bar{x}) u_i.
\end{aligned}
\tag{2.51}
$$

As shown in Appendix A, $\sum_{i=1}^{n} (x_i - \bar{x}) = 0$ and $\sum_{i=1}^{n} (x_i - \bar{x}) x_i = \sum_{i=1}^{n} (x_i - \bar{x})^2 = s_x^2$. Therefore, we can write the numerator of $\hat{\beta}_1$ as $\beta_1 s_x^2 + \sum_{i=1}^{n} (x_i - \bar{x}) u_i$. Writing this over the denominator gives

$$\hat{\beta}_1 = \beta_1 + \frac{\sum_{i=1}^{n} (x_i - \bar{x}) u_i}{s_x^2} = \beta_1 + (1/s_x^2) \sum_{i=1}^{n} d_i u_i, \tag{2.52}$$

where $d_i = x_i - \bar{x}$. We now see that the estimator $\hat{\beta}_1$ equals the population slope $\beta_1$, plus a term that is a linear combination in the errors $\{u_1, u_2, \ldots, u_n\}$. Conditional on the values of $x_i$, the randomness in $\hat{\beta}_1$ is due entirely to the errors in the sample. The fact that these errors are generally different from zero is what causes $\hat{\beta}_1$ to differ from $\beta_1$.

Using the representation in (2.52), we can prove the first important statistical property of OLS.

THEOREM 2.1 (UNBIASEDNESS OF OLS)

Using Assumptions SLR.1 through SLR.4,

$$E(\hat{\beta}_0) = \beta_0, \text{ and } E(\hat{\beta}_1) = \beta_1 \tag{2.53}$$

for any values of $\beta_0$ and $\beta_1$. In other words, $\hat{\beta}_0$ is unbiased for $\beta_0$, and $\hat{\beta}_1$ is unbiased for $\beta_1$.

PROOF: In this proof, the expected values are conditional on the sample values of the independent variable. Since $s_x^2$ and $d_i$ are functions only of the $x_i$, they are nonrandom in the conditioning. Therefore, from (2.52),

$$
\begin{aligned}
E(\hat{\beta}_1) &= \beta_1 + E\Big[(1/s_x^2) \sum_{i=1}^{n} d_i u_i\Big] = \beta_1 + (1/s_x^2) \sum_{i=1}^{n} E(d_i u_i) \\
&= \beta_1 + (1/s_x^2) \sum_{i=1}^{n} d_i E(u_i) = \beta_1 + (1/s_x^2) \sum_{i=1}^{n} d_i \cdot 0 = \beta_1,
\end{aligned}
$$

where we have used the fact that the expected value of each $u_i$ (conditional on $\{x_1, x_2, \ldots, x_n\}$) is zero under Assumptions SLR.2 and SLR.3.

The proof for $\hat{\beta}_0$ is now straightforward. Average (2.48) across $i$ to get $\bar{y} = \beta_0 + \beta_1 \bar{x} + \bar{u}$, and plug this into the formula for $\hat{\beta}_0$:

$$\hat{\beta}_0 = \bar{y} - \hat{\beta}_1 \bar{x} = \beta_0 + \beta_1 \bar{x} + \bar{u} - \hat{\beta}_1 \bar{x} = \beta_0 + (\beta_1 - \hat{\beta}_1)\bar{x} + \bar{u}.$$

Then, conditional on the values of the $x_i$,

$$E(\hat{\beta}_0) = \beta_0 + E[(\beta_1 - \hat{\beta}_1)\bar{x}] + E(\bar{u}) = \beta_0 + E[(\beta_1 - \hat{\beta}_1)]\bar{x},$$

since $E(\bar{u}) = 0$ by Assumptions SLR.2 and SLR.3. But, we showed that $E(\hat{\beta}_1) = \beta_1$, which implies that $E[(\hat{\beta}_1 - \beta_1)] = 0$. Thus, $E(\hat{\beta}_0) = \beta_0$. Both of these arguments are valid for any values of $\beta_0$ and $\beta_1$, and so we have established unbiasedness.
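Theorem 2.1 can be illustrated with a small Monte Carlo experiment that follows the "fixed in repeated samples" description: the $x_i$ are drawn once and held fixed, and only the errors are redrawn in each replication. This is a sketch under stated assumptions; the population parameters, error standard deviation, sample size, and number of replications are all hypothetical choices, and normal errors are used only for convenience.

```python
import random

random.seed(3)
beta0, beta1, sigma = 1.0, 0.5, 2.0   # hypothetical population values
n, reps = 30, 5000

# Choose the x values once and keep them fixed across replications.
x = [random.uniform(0, 10) for _ in range(n)]
xbar = sum(x) / n
d = [xi - xbar for xi in x]          # d_i = x_i - xbar, as in (2.52)
sxx = sum(di * di for di in d)       # total variation in x

b0_draws, b1_draws = [], []
for _ in range(reps):
    # New errors with mean zero, drawn independently of x (so SLR.3 holds).
    u = [random.gauss(0, sigma) for _ in range(n)]
    y = [beta0 + beta1 * xi + ui for xi, ui in zip(x, u)]
    ybar = sum(y) / n
    b1 = sum(di * yi for di, yi in zip(d, y)) / sxx   # slope, as in (2.49)
    b0 = ybar - b1 * xbar
    b0_draws.append(b0)
    b1_draws.append(b1)

mean_b0 = sum(b0_draws) / reps
mean_b1 = sum(b1_draws) / reps

print("average intercept estimate:", mean_b0, "(population value", beta0, ")")
print("average slope estimate:", mean_b1, "(population value", beta1, ")")
print("smallest and largest slope estimates:", min(b1_draws), max(b1_draws))
```

The averages across replications should be close to the population values, while the range of individual slope estimates shows that any single sample can still give an estimate some distance from $\beta_1$.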
Remember that unbiasedness is a feature of the sampling distributions of $\hat{\beta}_1$ and $\hat{\beta}_0$, which says nothing about the estimate that we obtain for a given sample. We hope that, if the sample we obtain is somehow "typical," then our estimate should be "near" the population value. Unfortunately, it is always possible that we could obtain an unlucky sample that would give us a point estimate far from $\beta_1$, and we can never know for sure whether this is the case. You may want to review the material on unbiased estimators in Appendix C, especially the simulation exercise in Table C.1 that illustrates the concept of unbiasedness.

Unbiasedness generally fails if any of our four assumptions fail. This means that it is important to think about the veracity of each assumption for a particular application. As we have already discussed, if Assumption SLR.4 fails, then we will not be able to obtain the OLS estimates. Assumption SLR.1 requires that $y$ and $x$ be linearly related, with an additive disturbance. This can certainly fail. But we also know that $y$ and $x$ can be chosen to yield interesting nonlinear relationships. Dealing with the failure of (2.47) requires more advanced methods that are beyond the scope of this text.
. Later,we will haveto relaxAssumptionSLR.2,therandomsamplingassumpdon,
for time seriesanalysis.But what aboutusingit for cross-sectional anaiysis'JRandom
sectionwhensamples are not representative 0f the under-
, samplingcanfail in a cross]
fying polutation;in fact,somedatasetsareconstructed by intentionally oversampling
diiiJr"niprts of the population. We wiil discussproblemsof nonrandomsamplingin
Chapters 9 and l7'

The assumption we should concentrate on for now is SLR.3. If SLR.3 holds, the OLS estimators are unbiased. Likewise, if SLR.3 fails, the OLS estimators generally will be biased. There are ways to determine the likely direction and size of the bias, which we will study in Chapter 3.

The possibility that $x$ is correlated with $u$ is almost always a concern in simple regression analysis with nonexperimental data, as we indicated with several examples in Section 2.1. Using simple regression when $u$ contains factors affecting $y$ that are also correlated with $x$ can result in spurious correlation: that is, we find a relationship between $y$ and $x$ that is really due to other unobserved factors that affect $y$ and also happen to be correlated with $x$.
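To make the failure of SLR.3 concrete, here is a minimal simulation sketch. Everything in it is an assumption made for illustration: the unobserved factor `q`, the coefficient $-2$ on `q` in the error, the true slope of 0.5, and the function names. Because `q` enters both $x$ and $u$, the two are correlated. In this particular design the OLS slope settles near $\beta_1 + \mathrm{Cov}(x,u)/\mathrm{Var}(x) = 0.5 - 2/2 = -0.5$, so even the sign of the estimated relationship is wrong.

```python
import random
import statistics


def ols_slope(x, y):
    """Slope from a simple OLS regression of y on x."""
    x_bar = statistics.fmean(x)
    s2_x = sum((xi - x_bar) ** 2 for xi in x)
    return sum((xi - x_bar) * yi for xi, yi in zip(x, y)) / s2_x


def simulate_omitted_factor(beta1=0.5, n=20000, seed=0):
    """Estimate the slope when an unobserved factor drives both x and u."""
    rng = random.Random(seed)
    x, y = [], []
    for _ in range(n):
        q = rng.gauss(0, 1)              # unobserved factor (hypothetical)
        xi = q + rng.gauss(0, 1)         # x is correlated with q
        ui = -2.0 * q + rng.gauss(0, 1)  # q sits in the error, so E(u|x) != 0
        x.append(xi)
        y.append(1.0 + beta1 * xi + ui)
    return ols_slope(x, y)


biased_slope = simulate_omitted_factor()
print(f"true slope: 0.5, OLS slope with omitted factor: {biased_slope:.3f}")
```

Increasing `n` does not repair the estimate: the problem lies in the correlation between $x$ and $u$, not in sampling noise.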

EXAMPLE 2.12
(Student Math Performance and the School Lunch Program)

Let math10 denote the percentage of tenth graders at a high school receiving a passing score on a standardized mathematics exam. Suppose we wish to estimate the effect of the federally funded school lunch program on student performance. If anything, we expect the lunch program to have a positive ceteris paribus effect on performance: all other factors being equal, if a student who is too poor to eat regular meals becomes eligible for the school lunch program, his or her performance should improve. Let lnchprg denote the percentage of students who are eligible for the lunch program. Then a simple regression model is

$$\mathit{math10} = \beta_0 + \beta_1\,\mathit{lnchprg} + u, \tag{2.54}$$

where $u$ contains school and student characteristics that affect overall school performance. Using the data in MEAP93.RAW on 408 Michigan high schools for the 1992-93 school year, we obtain

$$\widehat{\mathit{math10}} = 32.14 - 0.319\,\mathit{lnchprg}$$
$$n = 408, \quad R^2 = 0.171.$$

This equation predicts that if student eligibility in the lunch program increases by 10 percentage points, the percentage of students passing the math exam falls by about 3.2 percentage points. Do we really believe that higher participation in the lunch program actually causes worse performance? Almost certainly not. A better explanation is that the error term $u$ in equation (2.54) is correlated with lnchprg. In fact, $u$ contains factors such as the poverty rate of children attending school, which affects student performance and is highly correlated with eligibility in the lunch program. Variables such as school quality and resources are also contained in $u$, and these are likely correlated with lnchprg. It is important to remember that the estimate $-0.319$ is only for this particular sample, but its sign and magnitude make us suspect that $u$ and $x$ are correlated, so that simple regression is biased.
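The arithmetic behind the predicted effect can be checked directly from the reported coefficients. The snippet below is a small sketch: the two coefficients are the estimates quoted in the example, while the function name and the eligibility values of 30 and 40 percent are hypothetical choices used only to form a 10-point difference.

```python
# Estimates reported in the example for the MEAP93 sample (n = 408).
INTERCEPT = 32.14
SLOPE = -0.319


def predicted_math10(lnchprg):
    """Fitted pass rate (percent) at a given lunch-program eligibility percent."""
    return INTERCEPT + SLOPE * lnchprg


# Compare two hypothetical schools whose eligibility differs by 10 points.
change = predicted_math10(40.0) - predicted_math10(30.0)
print(f"predicted change in math10 for a 10-point rise in lnchprg: {change:.2f}")
```

Because the fitted line is linear, the predicted change depends only on the 10-point difference and the slope, not on the starting level of eligibility.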

There are other reasons for $x$ to be correlated with $u$ in the simple regression model. Since the same issues arise in multiple regression analysis, we will postpone a systematic treatment of the problem until then.

## Variances of the OLS Estimators

In addition to knowing that the sampling distribution of $\hat{\beta}_1$ is centered about $\beta_1$ ($\hat{\beta}_1$ is unbiased), it is important to know how far we can expect $\hat{\beta}_1$ to be away from $\beta_1$ on average. Among other things, this allows us to choose the best estimator among all, or at least a broad class of, unbiased estimators. The measure of spread in the distribution of $\hat{\beta}_1$ (and $\hat{\beta}_0$) that is easiest to work with is the variance or its square root, the standard deviation. (See Appendix C for a more detailed discussion.)

It turns out that the variance of the OLS estimators can be computed under Assumptions SLR.1 through SLR.4. However, these expressions would be somewhat complicated. Instead, we add an assumption for cross-sectional analysis. This assumption states that the variance of the unobservable, $u$, conditional on $x$, is constant. This is known as the homoskedasticity or "constant variance" assumption.

ASSUMPTION SLR.5 (HOMOSKEDASTICITY)

$$\mathrm{Var}(u|x) = \sigma^2.$$

We must emphasize that the homoskedasticity assumption is quite distinct from the zero conditional mean assumption, $E(u|x) = 0$. Assumption SLR.3 involves the expected value of $u$, while Assumption SLR.5 concerns the variance of $u$ (both conditional on $x$). Recall that we established the unbiasedness of OLS without Assumption SLR.5: the homoskedasticity assumption plays no role in showing that $\hat{\beta}_0$ and $\hat{\beta}_1$ are unbiased. We add Assumption SLR.5 because it simplifies the variance calculations for $\hat{\beta}_0$ and $\hat{\beta}_1$ and because it implies that ordinary least squares has certain efficiency properties, which we will see in Chapter 3. If we were to assume that $u$ and $x$ are independent, then the distribution of $u$ given $x$ does not depend on $x$, and so $E(u|x) = E(u) = 0$ and $\mathrm{Var}(u|x) = \sigma^2$. But independence is sometimes too strong of an assumption.

Because $\mathrm{Var}(u|x) = E(u^2|x) - [E(u|x)]^2$ and $E(u|x) = 0$, $\sigma^2 = E(u^2|x)$, which means $\sigma^2$ is also the unconditional expectation of $u^2$. Therefore, $\sigma^2 = E(u^2) = \mathrm{Var}(u)$, because $E(u) = 0$. In other words, $\sigma^2$ is the unconditional variance of $u$, and so $\sigma^2$ is often called the error variance or disturbance variance. The square root of $\sigma^2$, $\sigma$, is the standard deviation of the error. A larger $\sigma$ means that the distribution of the unobservables affecting $y$ is more spread out.

It is often useful to write Assumptions SLR.3 and SLR.5 in terms of the conditional mean and conditional variance of $y$:

$$E(y|x) = \beta_0 + \beta_1 x. \tag{2.55}$$

$$\mathrm{Var}(y|x) = \sigma^2. \tag{2.56}$$
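A short simulation can show what constant conditional variance does and does not allow. The sketch below is illustrative only: the range of $x$, the split at $x = 5$, the two standard-deviation rules, and the function name are assumptions, not part of the text. Both designs satisfy $E(u|x) = 0$; only the first satisfies Assumption SLR.5.

```python
import random
import statistics


def error_variance_by_group(sd_given_x, n=20000, seed=1):
    """Sample variance of u among low-x (x < 5) and high-x observations."""
    rng = random.Random(seed)
    low, high = [], []
    for _ in range(n):
        x = rng.uniform(1, 9)
        # Mean-zero error whose standard deviation may depend on x.
        u = rng.gauss(0, sd_given_x(x))
        (low if x < 5 else high).append(u)
    return statistics.pvariance(low), statistics.pvariance(high)


# Homoskedastic design: Var(u|x) = 4 for every x.
homo_low, homo_high = error_variance_by_group(lambda x: 2.0)
# Heteroskedastic design: the spread of u grows with x.
hetero_low, hetero_high = error_variance_by_group(lambda x: 0.5 * x)

print(f"constant sd:   low-x var = {homo_low:.2f}, high-x var = {homo_high:.2f}")
print(f"sd rises in x: low-x var = {hetero_low:.2f}, high-x var = {hetero_high:.2f}")
```

Under the first rule the two group variances are close to each other, as (2.56) requires. Under the second they differ sharply even though the conditional mean of $u$ is still zero.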