
Assessing Studies Based on Multiple Regression (SW Ch. 7)
Multiple regression has some key virtues:
- It provides an estimate of the effect on Y of arbitrary changes in X.
- It resolves the problem of omitted variable bias, if an omitted variable can be measured and included.
- It can handle nonlinear relations (effects that vary with the Xs).


Still, OLS might yield a biased estimator of the true causal effect.

A Framework for Assessing Statistical Studies: Internal and External Validity

Internal validity: the statistical inferences about causal effects are valid for the population being studied.

External validity: the statistical inferences can be generalized from the population and setting studied to other populations and settings, where the "setting" refers to the legal, policy, and physical environment and related salient features.

Threats to External Validity

How far can we generalize class size results from California school districts?
- Differences in populations
  o California in 2005?
  o Massachusetts in 2005?
  o Mexico in 2005?
- Differences in settings
  o different legal requirements concerning special education
  o different treatment of bilingual education
  o differences in teacher characteristics

Threats to Internal Validity of Multiple Regression Analysis (SW Section 7.2)

Internal validity: the statistical inferences about causal effects are valid for the population being studied.

Five threats to the internal validity of regression studies:
1. Omitted variable bias
2. Wrong functional form
3. Errors-in-variables bias
4. Sample selection bias
5. Simultaneous causality bias

All of these imply that $E(u_i \mid X_{1i}, \ldots, X_{ki}) \neq 0$.

1. Omitted variable bias

Arises if an omitted variable both (i) is a determinant of Y and (ii) is correlated with at least one included regressor.

Potential solutions to omitted variable bias
- If the variable can be measured, include it as a regressor in multiple regression;
- Possibly, use panel data in which each entity (individual) is observed more than once;
- If the variable cannot be measured, use instrumental variables regression;
- Run a randomized controlled experiment.
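The two conditions for omitted variable bias can be illustrated with a small simulation. This is only a sketch under invented assumptions: the data-generating process, the coefficient values, and the helper names `ols_slope` and `residuals` are hypothetical and do not come from the California or Massachusetts data. The omitted variable W determines Y and is correlated with X, so both conditions hold.

```python
import random

def ols_slope(x, y):
    """Slope coefficient from an OLS regression of y on x with an intercept."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    return sxy / sxx

def residuals(y, x):
    """Residuals from an OLS regression of y on x with an intercept."""
    b = ols_slope(x, y)
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    return [yi - my - b * (xi - mx) for xi, yi in zip(x, y)]

rng = random.Random(0)      # fixed seed so the run is reproducible
n = 20000
beta1, beta2 = 1.0, 2.0     # hypothetical true coefficients on X and W

w = [rng.gauss(0, 1) for _ in range(n)]          # (i) a determinant of Y
x = [0.5 * wi + rng.gauss(0, 1) for wi in w]     # (ii) correlated with X
y = [beta1 * xi + beta2 * wi + rng.gauss(0, 1) for xi, wi in zip(x, w)]

# W omitted: the slope on X absorbs part of the effect of W.
slope_omitting_w = ols_slope(x, y)

# W included: by the Frisch-Waugh theorem, the coefficient on X in the
# regression of Y on (X, W) equals the slope from regressing the
# W-residualized Y on the W-residualized X.
slope_including_w = ols_slope(residuals(x, w), residuals(y, w))

print(f"W omitted:  {slope_omitting_w:.2f}")
print(f"W included: {slope_including_w:.2f}")
```

In this design cov(X, W)/var(X) = 0.5/1.25 = 0.4, so the slope with W omitted should settle near 1 + 2(0.4) = 1.8, while including W should recover a value near the assumed 1.0.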

2. Wrong functional form

Arises if the functional form is incorrect, for example if an interaction term is incorrectly omitted; then inferences on causal effects will be biased.

Potential solutions to functional form misspecification
- Continuous dependent variable: use the "appropriate" nonlinear specifications in X (logarithms, interactions, etc.)
- Discrete (example: binary) dependent variable: need an extension of multiple regression methods ("probit" or "logit" analysis for binary dependent variables).

3. Errors-in-variables bias

So far we have assumed that X is measured without error. In reality, economic data often have measurement error:
- Data entry errors in administrative data
- Recollection errors in surveys (when did you start your current job?)
- Ambiguous questions problems (what was your income last year?)
- Intentionally false response problems with surveys (What is the current value of your financial assets? How often do you drink and drive?)

In general, measurement error in a regressor results in "errors-in-variables" bias.

Illustration: suppose

$$Y_i = \beta_0 + \beta_1 X_i + u_i$$

is "correct" in the sense that the three least squares assumptions hold (in particular $E(u_i \mid X_i) = 0$).

Let
$X_i$ = unmeasured true value of X
$\tilde{X}_i$ = imprecisely measured version of X

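Then

$$Y_i = \beta_0 + \beta_1 X_i + u_i = \beta_0 + \beta_1 \tilde{X}_i + [\beta_1 (X_i - \tilde{X}_i) + u_i]$$

or

$$Y_i = \beta_0 + \beta_1 \tilde{X}_i + \tilde{u}_i, \quad \text{where } \tilde{u}_i = \beta_1 (X_i - \tilde{X}_i) + u_i$$

If $\tilde{X}_i$ is correlated with $\tilde{u}_i$ then $\hat{\beta}_1$ will be biased:

$$\begin{aligned}
\mathrm{cov}(\tilde{X}_i, \tilde{u}_i) &= \mathrm{cov}(\tilde{X}_i, \beta_1 (X_i - \tilde{X}_i) + u_i) \\
&= \beta_1 \mathrm{cov}(\tilde{X}_i, X_i - \tilde{X}_i) + \mathrm{cov}(\tilde{X}_i, u_i) \\
&= \beta_1 [\mathrm{cov}(\tilde{X}_i, X_i) - \mathrm{var}(\tilde{X}_i)] + 0 \neq 0
\end{aligned}$$

because in general $\mathrm{cov}(\tilde{X}_i, X_i) \neq \mathrm{var}(\tilde{X}_i)$.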

$$Y_i = \beta_0 + \beta_1 \tilde{X}_i + \tilde{u}_i, \quad \text{where } \tilde{u}_i = \beta_1 (X_i - \tilde{X}_i) + u_i$$

If $X_i$ is measured with error, $\tilde{X}_i$ is in general correlated with $\tilde{u}_i$, so $\hat{\beta}_1$ is biased and inconsistent.

It is possible to derive formulas for this bias, but they require making specific mathematical assumptions about the measurement error process (for example, that $\tilde{u}_i$ and $X_i$ are uncorrelated). Those formulas are special and particular, but the observation that measurement error in X results in bias is general.
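One such special case can be checked by simulation. The sketch below assumes classical measurement error, which is an assumption added here for illustration: the measured regressor equals the true X plus noise w that is independent of both X and u. Under that assumption the OLS slope converges to $\beta_1 \sigma_X^2 / (\sigma_X^2 + \sigma_w^2)$, so it is biased toward zero. All parameter values and the helper `ols_slope` are hypothetical.

```python
import random

def ols_slope(x, y):
    """Slope coefficient from an OLS regression of y on x with an intercept."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    return sxy / sxx

rng = random.Random(0)      # fixed seed so the run is reproducible
n = 20000
beta0, beta1 = 1.0, 2.0     # hypothetical true coefficients
sigma_x, sigma_w = 1.0, 1.0 # std. dev. of true X and of the measurement error

x_true = [rng.gauss(0, sigma_x) for _ in range(n)]
# Classical measurement error (assumed): noise independent of X and u.
x_meas = [xi + rng.gauss(0, sigma_w) for xi in x_true]
y = [beta0 + beta1 * xi + rng.gauss(0, 1) for xi in x_true]

slope_true_x = ols_slope(x_true, y)   # infeasible: uses the unmeasured X
slope_meas_x = ols_slope(x_meas, y)   # feasible: uses the mismeasured X

# Limit of the feasible slope under classical measurement error.
attenuated = beta1 * sigma_x**2 / (sigma_x**2 + sigma_w**2)

print(f"slope using true X:     {slope_true_x:.2f}")
print(f"slope using measured X: {slope_meas_x:.2f}")
print(f"attenuation formula:    {attenuated:.2f}")
```

With equal variances for X and the noise, the formula gives half of the assumed $\beta_1$. Other measurement error processes would give a different bias, which is the point made above about these formulas being special.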


Potential solutions to errors-in-variables bias
- Obtain better data.
- Develop a specific model of the measurement error process. This is only possible if a lot is known about the nature of the measurement error, for example if a subsample of the data are cross-checked using administrative records and the discrepancies are analyzed and modeled. (Very specialized; we won't pursue this here.)
- Instrumental variables regression.

4. Sample selection bias

So far we have assumed simple random sampling of the population. In some cases, simple random sampling is thwarted because the sample, in effect, "selects itself."

Sample selection bias arises when a selection process (i) influences the availability of data and (ii) that process is related to the dependent variable.


Example #1: Mutual funds

Do actively managed mutual funds outperform "hold-the-market" funds?

Empirical strategy:
  o Sampling scheme: simple random sampling of mutual funds available to the public on a given date.
  o Data: returns for the preceding 10 years.
  o Estimator: average ten-year return of the sample mutual funds, minus ten-year return on S&P500.
  o Is there sample selection bias?


Sample selection bias induces correlation between a regressor and the error term.

Mutual fund example:

$$return_i = \beta_0 + \beta_1 \, managed\_fund_i + u_i$$

Being a managed fund in the sample ($managed\_fund_i = 1$) means that your return was better than failed managed funds, which are not in the sample, so $\mathrm{corr}(managed\_fund_i, u_i) \neq 0$.
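The survivorship mechanism in the mutual fund example can be sketched with simulated data. Everything here is a hypothetical illustration: returns are drawn from the same distribution for managed and index funds (so the true $\beta_1$ is zero), and the survival rule, under which a managed fund with a negative return closes and leaves the sample, is an assumption made for the sketch.

```python
import random

def ols_slope(x, y):
    """Slope coefficient from an OLS regression of y on x with an intercept."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    return sxy / sxx

rng = random.Random(1)      # fixed seed so the run is reproducible
n = 20000

# managed_fund = 1 for actively managed funds, 0 for index funds.
managed = [rng.randint(0, 1) for _ in range(n)]
# Same return distribution for both types, so the true beta_1 is 0.
returns = [rng.gauss(0.05, 0.05) for _ in range(n)]

# Regression on all funds that existed at the start of the period.
slope_all_funds = ols_slope(managed, returns)

# Hypothetical survival rule: managed funds with negative returns fail
# and are therefore missing from a sample drawn at the end of the period.
survivors = [(m, r) for m, r in zip(managed, returns)
             if not (m == 1 and r < 0.0)]
managed_s = [m for m, _ in survivors]
returns_s = [r for _, r in survivors]
slope_survivors = ols_slope(managed_s, returns_s)

print(f"all funds:      {slope_all_funds:.4f}")
print(f"survivors only: {slope_survivors:.4f}")
```

With a binary regressor the slope is the difference in mean returns between the two groups. In the survivor sample that difference is positive even though managed funds have no true advantage in this simulation.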


Example #2: returns to education

What is the return to an additional year of education?

Empirical strategy:
  o Sampling scheme: simple random sampling of workers
  o Data: earnings and years of education
  o Estimator: regress ln(earnings) on years_education
  o Ignore issues of omitted variable bias and measurement error. Is there sample selection bias?


Potential solutions to sample selection bias
- Collect the sample in a way that avoids sample selection.
  o Mutual funds example: change the sample population from those available at the end of the ten-year period, to those available at the beginning of the period (include failed funds)
  o Returns to education example: sample college graduates, not workers (include the unemployed)
- Randomized controlled experiment.
- Construct a model of the sample selection problem and estimate that model (we won't do this).

5. Simultaneous causality bias

So far we have assumed that X causes Y. What if Y causes X, too?

Example: Class size effect
- Low STR results in better test scores
- But suppose districts with low test scores are given extra resources: as a result of a political process they also have low STR
- What does this mean for a regression of TestScore on STR?

Simultaneous causality bias in equations

(a) Causal effect of X on Y: $Y_i = \beta_0 + \beta_1 X_i + u_i$
(b) Causal effect of Y on X: $X_i = \gamma_0 + \gamma_1 Y_i + v_i$

- Large $u_i$ means large $Y_i$, which implies large $X_i$ (if $\gamma_1 > 0$)
- Thus $\mathrm{corr}(X_i, u_i) \neq 0$
- Thus $\hat{\beta}_1$ is biased and inconsistent.

Ex: A district with particularly bad test scores given the STR (negative $u_i$) receives extra resources, thereby lowering its STR; so $STR_i$ and $u_i$ are correlated.
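The feedback described by equations (a) and (b) can be simulated by solving the two equations jointly for each observation. This is a sketch with hypothetical parameter values: $\beta_1 = -1$ (larger X lowers Y) and $\gamma_1 = 0.5$ (larger Y raises X), with standard normal errors u and v. For these values, working through cov(X, u)/var(X) by hand gives a bias of 0.6, so OLS should land near -0.4 instead of -1.

```python
import random

def ols_slope(x, y):
    """Slope coefficient from an OLS regression of y on x with an intercept."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    return sxy / sxx

rng = random.Random(2)        # fixed seed so the run is reproducible
n = 20000
beta0, beta1 = 0.0, -1.0      # hypothetical causal effect of X on Y
gamma0, gamma1 = 0.0, 0.5     # hypothetical causal effect of Y on X

# Substituting (a) into (b) and solving for X gives the reduced form
#   X = (gamma0 + gamma1*beta0 + gamma1*u + v) / (1 - gamma1*beta1)
denom = 1.0 - gamma1 * beta1

x, y = [], []
for _ in range(n):
    u = rng.gauss(0, 1)
    v = rng.gauss(0, 1)
    xi = (gamma0 + gamma1 * beta0 + gamma1 * u + v) / denom
    yi = beta0 + beta1 * xi + u   # equation (a)
    x.append(xi)
    y.append(yi)

slope = ols_slope(x, y)
print(f"true beta_1: {beta1:.2f}, OLS estimate: {slope:.2f}")
```

Because u enters the reduced form for X, the regressor and the error are correlated by construction, which is why a larger sample would not remove the gap between the estimate and $\beta_1$.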

Potential solutions to simultaneous causality bias
- Randomized controlled experiment. Because $X_i$ is chosen at random by the experimenter, there is no feedback from the outcome variable to $X_i$ (assuming perfect compliance).
- Develop and estimate a complete model of both directions of causality. This is the idea behind many large macro models (e.g. Federal Reserve Bank-US). This is extremely difficult in practice.
- Use instrumental variables regression to estimate the causal effect of interest (effect of X on Y, ignoring effect of Y on X).

Applying this Framework: Test Scores and Class Size (SW Section 7.3)

Objective: Assess the threats to the internal and external validity of the empirical analysis of the California test score data.
- External validity
  o Compare results for California and Massachusetts
  o Think hard…
- Internal validity
  o Go through the list of five potential threats to internal validity and think hard…

Check of external validity: compare the California study to one using Massachusetts data

The Massachusetts data set
- 220 elementary school districts
- Test: 1998 MCAS test, fourth grade total (Math + English + Science)
- Variables: STR, TestScore, PctEL, LunchPct, Income


The Massachusetts data: summary statistics


- Logarithmic v. cubic function for STR?
- Evidence of nonlinearity in TestScore-STR relation?
- Is there a significant HiEL×STR interaction?

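Predicted effects for a class size reduction of 2

Linear specification for Mass (standard errors in parentheses):

$$\begin{aligned}
\widehat{TestScore} = {} & 744.0 - 0.64\,STR - 0.437\,PctEL - 0.582\,LunchPct \\
& (21.3) \quad\; (0.27) \qquad (0.303) \qquad\;\; (0.097) \\
& - 3.07\,Income + 0.164\,Income^2 - 0.0022\,Income^3 \\
& \;\; (2.35) \qquad\quad (0.085) \qquad\quad\; (0.0010)
\end{aligned}$$

Estimated effect = $-0.64 \times (-2) = 1.28$
Standard error = $2 \times 0.27 = 0.54$
NOTE: $\mathrm{var}(aY) = a^2 \mathrm{var}(Y)$; $SE(a\hat{\beta}_1) = |a| \, SE(\hat{\beta}_1)$
95% CI = $1.28 \pm 1.96 \times 0.54 = (0.22, 2.34)$

Computing predicted effects in nonlinear models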

Use the "before" and "after" method:

$$\begin{aligned}
\widehat{TestScore} = {} & 655.5 + 12.4\,STR - 0.680\,STR^2 + 0.0115\,STR^3 \\
& - 0.434\,PctEL - 0.587\,LunchPct - 3.48\,Income + 0.174\,Income^2 - 0.0023\,Income^3
\end{aligned}$$

Estimated reduction from 20 students to 18:

$$\begin{aligned}
\Delta\widehat{TestScore} = {} & [12.4 \times 18 - 0.680 \times 18^2 + 0.0115 \times 18^3] \\
& - [12.4 \times 20 - 0.680 \times 20^2 + 0.0115 \times 20^3] = 1.98
\end{aligned}$$

- compare with estimate from linear model of 1.28
- SE of this estimated effect: use the "rearrange the regression" ("transform the regressors") method
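The arithmetic for both specifications can be reproduced in a few lines. This sketch uses only the rounded coefficients and standard error shown on the slides; the variable and function names are illustrative. For the cubic specification the predicted gain is the fitted STR terms at 18 students minus the same terms at 20, since the other regressors are held fixed and cancel.

```python
# Linear specification: coefficient on STR and its standard error.
b_str, se_str = -0.64, 0.27
delta_str = -2                          # class size reduction of 2

effect_linear = b_str * delta_str       # -0.64 * (-2)
se_linear = abs(delta_str) * se_str     # SE(a * b) = |a| * SE(b)
ci_linear = (effect_linear - 1.96 * se_linear,
             effect_linear + 1.96 * se_linear)

def str_terms_cubic(s):
    """STR part of the cubic specification; other regressors cancel."""
    return 12.4 * s - 0.680 * s**2 + 0.0115 * s**3

# "Before" is STR = 20, "after" is STR = 18.
effect_cubic = str_terms_cubic(18) - str_terms_cubic(20)

print(f"linear effect: {effect_linear:.2f}, SE: {se_linear:.2f}")
print(f"95% CI: ({ci_linear[0]:.2f}, {ci_linear[1]:.2f})")
print(f"cubic effect (rounded coefficients): {effect_cubic:.2f}")
```

With the rounded coefficients the cubic calculation comes out at about 1.95, slightly below the 1.98 reported on the slide; the gap is presumably due to the coefficients being rounded for display.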


Summary of Findings for Massachusetts
1. Coefficient on STR falls from -1.72 to -0.69 when control variables for student and district characteristics are included, an indication that the original estimate contained omitted variable bias.
2. The class size effect is statistically significant at the 1% significance level, after controlling for student and district characteristics
3. No statistical evidence on nonlinearities in the TestScore-STR relation
4. No statistical evidence of STR-PctEL interaction

Comparison of estimated class size effects: CA vs. MA


Summary: Comparison of California and Massachusetts Regression Analyses
- Class size effect falls in both CA, MA data when student and district control variables are added.
- Class size effect is statistically significant in both CA, MA data.
- Estimated effect of a 2-student reduction in STR is quantitatively similar for CA, MA.
- Neither data set shows evidence of STR-PctEL interaction.
- Some evidence of STR nonlinearities in CA data, but not in MA data.

Remaining threats to internal validity

What the CA v. MA comparison does and doesn't show

1. Omitted variable bias

This analysis controls for:
- district demographics (income)
- some student characteristics (English speaking)

What is missing?
- Additional student characteristics, for example native ability (but is this correlated with STR?)
- Access to outside learning opportunities
- Teacher quality (perhaps better teachers are attracted to schools with lower STR)

Omitted variable bias, ctd.
- We have controlled for many relevant omitted factors;
- The nature of this omitted variable bias would need to be similar in California and Massachusetts to be consistent with these results;
- In this application we will be able to compare these estimates based on observational data with estimates based on experimental data, a check of this multiple regression methodology.


2. Wrong functional form
- We have tried quite a few different functional forms, in both the California and Mass. data
- Nonlinear effects are modest
- Plausibly, this is not a major threat at this point.

3. Errors-in-variables bias
- STR is a district-wide measure
- Presumably there is some measurement error: students who take the test might not have experienced the measured STR for the district
- Ideally we would like data on individual students, by grade level.

4. Selection
- Sample is all elementary public school districts (in California; in Mass.), so there is no reason that selection should be a problem.

5. Simultaneous Causality
- School funding equalization based on test scores could cause simultaneous causality.
- This was not in place in California or Mass. during these samples, so simultaneous causality bias is arguably not important.


Summary

Framework for evaluating regression studies:
  o Internal validity
  o External validity

Five threats to internal validity:
1. Omitted variable bias
2. Wrong functional form
3. Errors-in-variables bias
4. Sample selection bias
5. Simultaneous causality bias

The rest of the course focuses on econometric methods for addressing these threats.


