Notation
α = "true" intercept; a = sample estimate of the intercept
β = "true" slope; b = sample estimate of the slope
ε = "true" error; e = sample residual
The "true" regression equation describes the total population; the estimated regression equation describes the sample.

Specification Assumptions
No measurement error (if violated: biased, inflated estimates).
No specification error:
1) a non-linear relationship must be modeled as such (if not: biased estimates)
2) relevant X's must not be excluded (if excluded: biased estimates)
3) irrelevant X's must not be included (if included: inflated standard errors)

Mean-Centering
How: subtract the mean of X from each X (same for Y), then run OLS with the new values.
Why: it might make the results easier to interpret.
Consequences: the slope (b) does not change; the intercept (a) changes.
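As a quick numeric check of the centering recipe, a sketch with invented data (numpy only; the variable names and values are made up for illustration):

```python
import numpy as np

# Invented example data
x = np.array([1.0, 2.0, 3.0, 4.0, 5.0])
y = np.array([2.1, 3.9, 6.2, 7.8, 10.1])

def ols(x, y):
    """Closed-form simple OLS; returns (intercept, slope)."""
    b = np.sum((x - x.mean()) * (y - y.mean())) / np.sum((x - x.mean()) ** 2)
    return y.mean() - b * x.mean(), b

a, b = ols(x, y)                            # original variables
a_c, b_c = ols(x - x.mean(), y - y.mean())  # mean-centered variables

# Consequences, as stated above: the slope is unchanged,
# while the intercept moves (to 0 when both variables are centered).
print(np.isclose(b, b_c), np.isclose(a_c, 0.0))  # True True
```

Any dataset would show the same pattern: centering shifts where the line crosses the axis, not its tilt.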
Criteria Used by OLS for Fitting a Line
Basic idea: choose the estimates b1 to bk that minimize the sum of the squared residuals (errors).

Error Term Assumptions
1) Homoskedasticity: the variance of the error term is constant (if not: inflated/deflated standard errors)
2) No autocorrelation: the residuals are not correlated with one another (if not: inflated/deflated standard errors)
3) The residuals average out to 0 (this is built into OLS)
4) The covariance between the residuals and the independent variables = 0 (if a relevant variable is left out: biased estimates)
5) The error terms are normally distributed

Variance (Standard Error) of the Slope Estimate
Tells you how stable the line is.
High variance (X values compacted together): bad → the line is less stable.
Low variance (X values more spread out): good → the line is more stable.
As the sample size increases, the variance usually decreases.

Problems with the Slope (Parameter) Estimate
1) Biased slope estimate: over an infinite number of samples, the estimate will not equal the true population value.
2) We want efficient estimators: the unbiased estimator with the least variance.
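A sketch of the least-squares criterion and the slope's sampling variability (invented data; numpy only):

```python
import numpy as np

# Invented data: y = 2 + 0.5*x + noise
rng = np.random.default_rng(0)
x = np.linspace(0.0, 10.0, 50)
y = 2.0 + 0.5 * x + rng.normal(0.0, 1.0, size=x.size)

# OLS chooses a and b to minimize the sum of squared residuals;
# for a single X the closed-form solution is:
b = np.sum((x - x.mean()) * (y - y.mean())) / np.sum((x - x.mean()) ** 2)
a = y.mean() - b * x.mean()

# The residuals average out to (essentially) 0: built into OLS
e = y - (a + b * x)

# Estimated standard error of the slope:
# sqrt( [sum(e_i^2) / (n - (k + 1))] / sum((x_i - xbar)^2) )
n, k = x.size, 1
se_b = np.sqrt((np.sum(e ** 2) / (n - (k + 1))) / np.sum((x - x.mean()) ** 2))

# More spread in the X values makes the denominator larger,
# shrinking se_b: a more stable line.
```

The denominator Σ(xi − x̄)² is why compacted X values make the slope unstable: it shrinks, and the standard error grows.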
Forcing the Intercept Through the Origin
Used if theory predicts that when X = 0, Y should = 0. Not a good idea:
1) it changes the slope (the strength of the relationship)
2) you can't test H0: α = 0
3) it won't work if you really have a curvilinear relationship
4) it may not make sense to talk about the line at all when X = 0
5) you may have a bad sample, and forcing the line can make it appear significant
6) if you force the line, you deny yourself the chance to see whether something is wrong with the model and whether the model actually predicts an intercept of 0
7) the costs of leaving a in are minor compared to taking it out

R²
The % of the variance in Y that is "explained" by X; measures goodness of fit.
R² = Regression Sum of Squares (RSS) / Total Sum of Squares (TSS) = Σ(ŷi − ȳ)² / Σ(yi − ȳ)²
Problems with R²:
1) it is not a measure of the magnitude of the relationship between X and Y
2) it is dependent on (vulnerable to) the standard deviations of X and Y: you can't compare it across samples, and it is biased in small samples
3) the addition of any variable will increase R², so include variables only because of theory

Standardized Estimates (Beta Weights)
The change in Y, in standard-deviation units, brought about by a one-standard-deviation-unit change in X.
The units are lost in the transformation (everything is now in std dev units), so it is hard to convey the meaning to the reader.

Functional Transformations of the Independent Variable
Used if there is a non-linear relationship between X and Y, e.g. log(X) or √X.
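The R² decomposition and the beta-weight transformation can be sketched together (invented data; in simple regression the squared beta weight equals R², which makes a handy check):

```python
import numpy as np

# Invented data
rng = np.random.default_rng(1)
x = rng.normal(0.0, 2.0, 100)
y = 1.0 + 0.8 * x + rng.normal(0.0, 1.0, 100)

b = np.sum((x - x.mean()) * (y - y.mean())) / np.sum((x - x.mean()) ** 2)
a = y.mean() - b * x.mean()
y_hat = a + b * x

rss = np.sum((y_hat - y.mean()) ** 2)  # regression sum of squares
tss = np.sum((y - y.mean()) ** 2)      # total sum of squares
r2 = rss / tss                         # share of Y's variance "explained"

# Beta weight: the slope after putting X and Y in std dev units.
# The original units of measurement are lost.
beta = b * x.std() / y.std()
```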
Hypothesis Tests on the Slope
To determine whether β ≠ 0, find the actual probability of obtaining a value of the test statistic (the p value for the t statistic) as large as or larger than the one obtained; accept or reject the hypothesis on the basis of that number (p < .05 needed).
To determine whether β > 0, a one-tailed test is needed; only one half of the area under the curve is used (1/2 of the p value).

Alternative to Using R²
Also measures the goodness of fit, but:
it is not dependent on the standard deviations of X or Y;
it is just one number for each equation (in Y units);
it can be compared across samples.

Dummy Variables
A dummy variable changes the intercept: the two groups start at different points but have the same slope.
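The dummy-variable idea (shifted intercept, common slope) as a sketch. The data and coefficients are invented, and the normal equations stand in for a regression package:

```python
import numpy as np

rng = np.random.default_rng(2)
n = 200
x = rng.normal(5.0, 2.0, n)
d = (np.arange(n) % 2).astype(float)  # dummy: 0 = group A, 1 = group B
# Invented population: common slope 1.5, group B starts 3.0 higher
y = 2.0 + 1.5 * x + 3.0 * d + rng.normal(0.0, 1.0, n)

# Multiple OLS via the normal equations: coef = (X'X)^-1 X'y
X = np.column_stack([np.ones(n), x, d])
a, b_x, b_d = np.linalg.solve(X.T @ X, X.T @ y)

# Group A line: y = a + b_x * x
# Group B line: y = (a + b_d) + b_x * x
# Different starting points (intercepts), same slope.
```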
Interactions
An interaction changes the slopes: it captures the difference in slopes between the groups.
You should calculate predicted Y values to see the impact: use the means of the variables involved in the interaction to calculate the predicted values (4 equations).
Possible interpretations (example: an interaction of gender and party on feelings toward Clinton):
Among Republicans, women like Clinton even less.
Among women, Republicans like Clinton even less.

Multicollinearity (MC)
X1 can be predicted if the values of X2 and X3 are known, so you can't tell which variable is actually having an impact.
It exists in degrees, and its magnitude determines whether or not it is a problem.
It inflates the standard errors, so the individual coefficients all look less significant.
Diagnose it using the VIF (variance inflation factor):
VIF = 1 / (1 − aux R²)
where aux R² comes from regressing X1 on the other independent variables. A high aux R² (> .75) indicates high MC: you can explain a lot of X1 with the other variables. VIF scores of 4 or 5 are the usual cutoff point; higher scores are problematic.

Standard Error of the Parameter Estimate
SE(b) = √[ (Σei² / (n − (k + 1))) / Σ(xi − x̄)² ]

Outlier Diagnosis (flowchart)
1) Is |e| > 3 standard deviations from ŷ? If no: stop.
2) If yes: are the outliers severe?
   Moderate: analyze the model with and without them.
   Severe: can they be explained as anomalies?
      Yes: delete them, and explain this in a footnote.
      No: look for a harder explanation (example: there is an additional effect of gender on feelings toward Clinton).
3) Once the outliers are explained away: stop, but include all the variables.
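The VIF diagnostic can be computed by hand from the auxiliary regression described above (invented, deliberately collinear data):

```python
import numpy as np

rng = np.random.default_rng(3)
n = 300
x2 = rng.normal(0.0, 1.0, n)
x3 = rng.normal(0.0, 1.0, n)
# X1 is largely predictable from X2 and X3: built-in multicollinearity
x1 = 0.9 * x2 + 0.9 * x3 + rng.normal(0.0, 0.5, n)

# Auxiliary regression: regress X1 on the other independent variables
Z = np.column_stack([np.ones(n), x2, x3])
coef, *_ = np.linalg.lstsq(Z, x1, rcond=None)
resid = x1 - Z @ coef
aux_r2 = 1.0 - np.sum(resid ** 2) / np.sum((x1 - x1.mean()) ** 2)

vif = 1.0 / (1.0 - aux_r2)  # aux R2 > .75 here, so vif exceeds the 4-5 cutoff
```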
Limitations of the VIF: it only checks for linear relationships among the X's; other forms of dependence go undetected.

WLS (Weighted Least Squares)
A remedy when the model induces heteroskedasticity (for example, residuals clustered in the middle, but no actual values there). WLS assumes that the best information in the data is in the observations with the least variance in their error terms.
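A minimal WLS sketch under the stated assumption: each observation is weighted by the inverse of its error variance, so the least-noisy observations carry the most information. The error-variance pattern here is invented:

```python
import numpy as np

rng = np.random.default_rng(4)
n = 200
x = np.linspace(1.0, 10.0, n)
sigma = 0.2 * x  # heteroskedastic: error spread grows with x (invented pattern)
y = 1.0 + 2.0 * x + rng.normal(0.0, sigma)

# WLS: weight each observation by 1 / (variance of its error term)
w = 1.0 / sigma ** 2
X = np.column_stack([np.ones(n), x])
# Weighted normal equations: (X'WX) coef = X'Wy
a_wls, b_wls = np.linalg.solve(X.T @ (w[:, None] * X), X.T @ (w * y))
```

In practice the error variances are unknown and must themselves be estimated, but the weighting logic is the same.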