
Introduction to Data Analysis with Stata


Sara Godoy.
Grupo Avanzado. November 2011

Nonparametric Analysis

Non-Parametric tests: Summary


NATURE OF DEPENDENT VBL.   ONE-SAMPLE                  TWO-SAMPLE                   TWO-SAMPLE                    K-SAMPLE
                                                       (RELATED/MATCHED)            (INDEPENDENT)                 (INDEPENDENT)

CATEGORICAL/NOMINAL        Binomial test               McNemar test                 Fisher's exact test           Chi-square test

ORDINAL/INTERVAL           Kolmogorov-Smirnov          Wilcoxon signed ranks test   Wilcoxon-Mann-Whitney test    Kruskal-Wallis test
                           one-sample test

Non-parametric correlation
A Spearman correlation is used when one or both of the variables are not assumed to be normally distributed and interval (but are assumed to be ordinal). The values of the variables are converted into ranks and then correlated.

Syntax: spearman [varlist] [if] [, options]

. spearman read write

 Number of obs =     200
Spearman's rho =  0.6167

Test of Ho: read and write are independent
    Prob > |t| =  0.0000

The results suggest that the relationship between read and write (rho = 0.6167, p = 0.000) is statistically significant.

P-values meaning
A p-value is a measure of how much evidence we have against the null hypothesis (H0). The p-value is the probability of obtaining a test statistic at least as extreme as the one actually observed, assuming that the null hypothesis is true.

One often "rejects the null hypothesis" when the p-value is less than the significance level:

p < 0.1 (10%)
p < 0.05 (5%)
p < 0.01 (1%)

When the null hypothesis is rejected, the result is said to be statistically significant.

Binomial probability test


Tests whether the proportion of successes on a two-level categorical dependent variable significantly differs from a hypothesized value (useful with small samples).

Syntax: bitest varname == #p

. bitest female == .5

The results indicate that there is no statistically significant difference (p = .2292). In other words, the proportion of females does not significantly differ from the hypothesized value of 50%.

+ One- and two-sample tests of proportions


prtest performs tests on the equality of proportions using large-sample statistics. Syntax:

One-sample test of proportion, tests that varname has a proportion of #p:
. prtest varname == #p

Two-sample test of proportion, tests that varname1 and varname2 have the same proportion:
. prtest varname1 == varname2

+ One- and two-sample tests of proportions


Example 1: One-sample test of proportion

Assume that we have a sample of 74 automobiles. We wish to test whether the proportion of automobiles that are foreign is different from 40%.

. prtest foreign == .4

The test indicates that we cannot reject the hypothesis that the proportion of foreign automobiles is 0.40 at the 5% significance level.

+ One- and two-sample tests of proportions


Example 2: Two-sample test of proportion

We have two headache remedies that we give to patients. Each remedy's effect is recorded as 0 for failing to relieve the headache and 1 for relieving the headache. We wish to test the equality of the proportions of people relieved by the two treatments.

. prtest cure1 == cure2

We find that the proportions are statistically different from each other at any level greater than 3.9%.

+ Kolmogorov-Smirnov one- and two-sample tests

ksmirnov performs one-sample and two-sample Kolmogorov-Smirnov tests of the equality of distributions. In the one-sample syntax, varname is the variable whose distribution is being tested, and exp must evaluate to the corresponding (theoretical) cumulative distribution. Syntax: ksmirnov varname = exp

Example: One-sample test. Let's now test whether x in the example above is distributed normally. Kolmogorov-Smirnov is not a particularly powerful test of normality, and we do not endorse such use of it; in any case, we will test against a normal distribution with the same mean and standard deviation:

. ksmirnov x = normal((x - r(mean))/r(sd))

+ Kolmogorov-Smirnov one- and two-sample tests

Example: One-sample test

1. summarize x
2. ksmirnov x = normal((x-4.571429)/3.457222)

The results indicate that the data cannot be distinguished from normally distributed data.
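Equivalently, a minimal sketch that captures the sample mean and standard deviation from summarize instead of typing the numbers by hand (same test as above):

. quietly summarize x
. local m = r(mean)
. local s = r(sd)
. ksmirnov x = normal((x - `m')/`s')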

+ Kolmogorov-Smirnov one- and two-sample tests

Example: Two-sample test

The first line tests the hypothesis that x for group 1 contains smaller values than for group 2. The largest difference between the distribution functions is 0.5. The approximate p-value for this is 0.424, which is not significant. The second line tests the hypothesis that x for group 1 contains larger values than for group 2. The largest difference between the distribution functions in this direction is 0.1667. The approximate p-value for this small difference is 0.909. Finally, the approximate p-value for the combined test is 0.785, corrected to 0.735.
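For reference, the two-sample form of the command splits the variable by a grouping variable; a minimal sketch, assuming the grouping variable in this example is called group:

. ksmirnov x, by(group)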

McNemar test
You would perform McNemar's test if you were interested in the marginal frequencies of two binary outcomes. These binary outcomes may be the same outcome variable on matched pairs (as in a case-control study) or two outcome variables from a single group.

Example: Consider two questions, Q1 and Q2, from a test taken by 200 students. Suppose 172 students answered both questions correctly, 15 answered both incorrectly, 7 answered Q1 correctly and Q2 incorrectly, and 6 answered Q2 correctly and Q1 incorrectly. These counts can be arranged in a two-way contingency table. The null hypothesis is that the two questions are answered correctly (or incorrectly) at the same rate, i.e. that the contingency table is symmetric. We can enter these counts into Stata using mcci, a command from Stata's epidemiology tables; the outcome is labeled according to case-control study conventions.
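A minimal sketch of the mcci call for these counts, assuming Q1 plays the role of the "cases" and Q2 of the "controls" (the four immediate arguments are: both correct, Q1 correct/Q2 incorrect, Q1 incorrect/Q2 correct, both incorrect):

. mcci 172 7 6 15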

McNemar test

McNemar's chi-square statistic suggests that there is not a statistically significant difference in the proportions of correct/incorrect answers to these two questions.

Wilcoxon signed ranks test


The Wilcoxon signed rank sum test is the non-parametric analog of the paired samples t-test. You use it when you do not wish to assume that the difference between the two variables is interval and normally distributed (but you do assume the difference is ordinal). We will use the same example as above, but we will not assume that the difference between read and write is interval and normally distributed.
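A minimal command sketch, assuming the same dataset's read and write variables used above:

. signrank write = read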

The results suggest that there is not a statistically significant difference between read and write.

Wilcoxon signed ranks test


If you believe the differences between read and write are not even ordinal but can merely be classified as positive and negative, then you may want to consider a sign test in lieu of the sign rank test. Again, we use the same variables and assume that the difference is not ordinal.
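A minimal command sketch for the sign test, again with the same two variables:

. signtest write = read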

This output gives both of the one-sided tests as well as the two-sided test. Assuming that we were looking for any difference, we would use the two-sided test and conclude that no statistically significant difference was found (p = .5565).

Fisher exact test


Fisher's exact test is used when you want to conduct a chi-square test but one or more of your cells has an expected frequency of five or less. Remember that the chi-square test assumes that each cell has an expected frequency of five or more; Fisher's exact test has no such assumption and can be used no matter how small the expected frequencies are. In the example below, we have cells with observed frequencies of two and one, which may indicate expected frequencies below five, so we will use Fisher's exact test with the exact option on the tabulate command.
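A minimal command sketch, assuming the dataset's race and schtyp (type of school) variables mentioned in the interpretation below:

. tabulate race schtyp, exact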

These results suggest that there is not a statistically significant relationship between race and type of school (p = 0.597). Note that the Fisher's exact test does not have a "test statistic", but computes the p-value directly.

Two independent samples t-test


An independent samples t-test is used when you want to compare the means of a normally distributed interval dependent variable for two independent groups. For example, we wish to test whether the mean of write is the same for males and females.
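A minimal command sketch, assuming write is the writing score and female the grouping variable:

. ttest write, by(female)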
The results indicate that there is a statistically significant difference between the mean writing score for males and females (t = -3.7341, p = .0002). In other words, females have a statistically significantly higher mean score on writing (54.99) than males (50.12).

Wilcoxon-Mann Whitney test


The Wilcoxon-Mann-Whitney test is a non-parametric analog of the independent samples t-test and can be used when you do not assume that the dependent variable is a normally distributed interval variable (you only assume that the variable is at least ordinal). The Stata syntax is almost identical to that of the independent samples t-test. We use the same variables as in the independent t-test example above but do not assume that write, our dependent variable, is normally distributed (see the sketch below). The results suggest that there is a statistically significant difference between the underlying distributions of the write scores of males and females (z = -3.329, p = 0.0009). You can determine which group has the higher rank by comparing the actual rank sums to the expected rank sums under the null hypothesis. The sum of the female ranks was higher while the sum of the male ranks was lower; thus the female group had the higher rank.
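The command sketch referred to above (same assumed variables, write and female):

. ranksum write, by(female)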

Chi-square test
A chi-square test is used when you want to see if there is a relationship between two categorical variables. In Stata, the chi2 option is used with the tabulate command to obtain the test statistic and its associated p-value (command sketched below). Example: let's see if there is a relationship between the type of school attended (schtyp) and students' gender (female). Remember that the chi-square test assumes the expected value of each cell is five or higher. These results indicate that there is no statistically significant relationship between the type of school attended and gender (chi-square with one degree of freedom = 0.0470, p = 0.828).
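A minimal command sketch for this example, assuming the schtyp and female variables named above:

. tabulate schtyp female, chi2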

Chi-square test

Let's look at another example, this time looking at the relationship between gender (female) and socio-economic status (ses). The point of this example is that one (or both) variables may have more than two levels, and that the variables do not have to have the same number of levels. In this example, female has two levels (male and female) and ses has three levels (low, medium and high).
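A minimal command sketch for this example:

. tabulate female ses, chi2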

Again we find that there is no statistically significant relationship between the variables (chi-square with two degrees of freedom = 4.5765, p = 0.101).

Kruskal-Wallis
The Kruskal-Wallis test is used when you have one independent variable with two or more levels and an ordinal dependent variable.

If some of the scores receive tied ranks, a correction factor is used, yielding a slightly different value of chi-squared. With or without ties, the results indicate that there is a statistically significant difference among the three types of programs (command sketched below).
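The command sketched here assumes write as the ordinal outcome and prog as the three-level program variable from the same dataset:

. kwallis write, by(prog)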

Linear Regression

Regression: A practical approach


We use regression to estimate the unknown effect of changing one variable on another. Technically, linear regression estimates how much Y changes when X changes by one unit. Before running a regression it is recommended to have a clear idea of what you are trying to estimate (i.e. which are your outcome and predictor variables).

Previous steps:
1. Examine descriptive statistics
2. Look at the relationship graphically and test correlation(s)
3. Run and interpret the regression
4. Test the regression assumptions

+ An example: SAT scores and Education Expenditures

Are SAT scores higher in states that spend more money on education, controlling for other factors?*

Outcome (Y) variable: SAT scores, variable csat in the dataset

Predictor (X) variables:
  Per pupil expenditures, primary & secondary (expense)
  % HS graduates taking SAT (percent)
  Median household income (income)
  % adults with HS diploma (high)
  % adults with college degree (college)
  Region (region)

* Source: Data and examples come from the book Statistics with Stata (updated for version 9) by Lawrence C. Hamilton (chapter 6). Use the file states.dta (educational data for the U.S.).

Regression: Check the variables


describe csat expense percent income high college region

summarize csat expense percent income high college region

+ Regression: View relationship graphically


. twoway scatter csat expense
(Figure: Relationship Between Education Expenditure and SAT Scores. Y-axis: Mean composite SAT score; X-axis: Per pupil expenditures prim&sec.)

+ Regression: View relationship graphically


. twoway (scatter csat expense) (lfit csat expense)

+ Regression: View relationship graphically


. twoway lfitci csat expense

+ Regression: Correlation test


. pwcorr csat expense, star(.05)

Regression: what to look for


Let's run the regression: SAT scores and education expenditures


. regress csat expense, robust

Here csat is the outcome variable (Y) and expense the predictor variable (X); the robust option requests robust standard errors (to control for heteroscedasticity).

The coefficient shows how a state's mean SAT changes if its expenditure increases by one unit: for each one-unit increase in expense, SAT scores decrease by 0.022.

Constant (intercept): a state's mean SAT score if its expenditure were $0.

Regression: what to look for


Significance of individual predictors: is there a statistically significant relationship between SAT scores and per pupil expenditures?

. regress csat expense, robust

The t-values test the hypothesis that each coefficient is different from 0. To reject this, you need a t-value greater than 1.96 in absolute value (for 95% confidence). You can get the t-value by dividing the coefficient by its standard error. The t-values also show the relative importance of a variable in the model.

Two-tail p-values test the hypothesis that each coefficient is different from 0. To reject this, the p-value has to be lower than 0.05 (you could also choose an alpha of 0.10). In this case, expense is statistically significant in explaining SAT scores.

Regression: what to look for


Significance of the overall equation

. regress csat expense, robust

The F-test p-value (Prob > F) is the p-value of the model. It tests whether R-squared is different from 0. Usually we need a p-value lower than 0.05 to show a statistically significant relationship between X and Y.

R-squared shows the amount of variance of Y explained by X. In this case expense explains 22% of the variance in SAT scores.

Regression: what to look for


Adding the rest of the predictor variables:

. regress csat expense percent income high college, robust

Regression: adding dummies (I)


Region is entered here as a set of dummy variables. First, generate the dummies with tab region, g(reg):
. tab region, g(reg)

Geographical |
      region |      Freq.     Percent        Cum.
-------------+-----------------------------------
        West |         13       26.00       26.00
     N. East |          9       18.00       44.00
       South |         16       32.00       76.00
     Midwest |         12       24.00      100.00
-------------+-----------------------------------
       Total |         50      100.00

Regression: adding dummies (I)


. regress csat expense percent income high college reg2 reg3 reg4, robust

Linear regression                                      Number of obs =      50
                                                       F(  8,    41) =   69.82
                                                       Prob > F      =  0.0000
                                                       R-squared     =  0.9111
                                                       Root MSE      =  21.492

------------------------------------------------------------------------------
             |               Robust
        csat |      Coef.   Std. Err.      t    P>|t|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
     expense |   -.002021   .0035883    -0.56   0.576    -.0092676    .0052256
     percent |  -3.007647   .2358047   -12.75   0.000    -3.483864    -2.53143
      income |  -.1674421   1.196409    -0.14   0.889    -2.583638    2.248754
        high |   1.814731    1.02694     1.77   0.085    -.2592168    3.888679
     college |   4.670564   1.599798     2.92   0.006     1.439705    7.901422
        reg2 |   69.45333   17.99933     3.86   0.000     33.10295    105.8037
        reg3 |   25.39701   12.52558     2.03   0.049      .101086    50.69293
        reg4 |   34.57704    9.44989     3.66   0.001      15.4926    53.66149
       _cons |   808.0206   67.86418    11.91   0.000     670.9661    945.0751
------------------------------------------------------------------------------

Regression: adding dummies (II)


Let Stata do the dirty work with the xi command:


. xi: regress csat expense percent income high college i.region, robust
i.region          _Iregion_1-4        (naturally coded; _Iregion_1 omitted)

Linear regression                                      Number of obs =      50
                                                       F(  8,    41) =   69.82
                                                       Prob > F      =  0.0000
                                                       R-squared     =  0.9111
                                                       Root MSE      =  21.492

------------------------------------------------------------------------------
             |               Robust
        csat |      Coef.   Std. Err.      t    P>|t|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
     expense |   -.002021   .0035883    -0.56   0.576    -.0092676    .0052256
     percent |  -3.007647   .2358047   -12.75   0.000    -3.483864    -2.53143
      income |  -.1674421   1.196409    -0.14   0.889    -2.583638    2.248754
        high |   1.814731    1.02694     1.77   0.085    -.2592168    3.888679
     college |   4.670564   1.599798     2.92   0.006     1.439705    7.901422
  _Iregion_2 |   69.45333   17.99933     3.86   0.000     33.10295    105.8037
  _Iregion_3 |   25.39701   12.52558     2.03   0.049      .101086    50.69293
  _Iregion_4 |   34.57704    9.44989     3.66   0.001      15.4926    53.66149
       _cons |   808.0206   67.86418    11.91   0.000     670.9661    945.0751
------------------------------------------------------------------------------

NOTE: By default xi omits the first value. To select a different reference category, before running the regression type:

. char region[omit] 4
. xi: regress csat expense percent income high college i.region, robust

This will select Midwest (4) as the reference category for the dummy variables.

Regression: correlation matrix


Below is a correlation matrix for all variables in the model. The numbers are Pearson correlation coefficients, which go from -1 to 1. Values closer to 1 in absolute value indicate stronger correlation. A negative value indicates an inverse relationship (roughly, when one goes up the other goes down).
. pwcorr csat expense percent income high college, star(0.05) sig

             |     csat  expense  percent   income     high  college
-------------+-------------------------------------------------------
        csat |   1.0000
             |
     expense |  -0.4663*  1.0000
             |   0.0006
     percent |  -0.8758*  0.6509*  1.0000
             |   0.0000   0.0000
      income |  -0.4713*  0.6784*  0.6733*  1.0000
             |   0.0005   0.0000   0.0000
        high |   0.0858   0.3133*  0.1413   0.5099*  1.0000
             |   0.5495   0.0252   0.3226   0.0001
     college |  -0.3729*  0.6400*  0.6091*  0.7234*  0.5319*  1.0000
             |   0.0070   0.0000   0.0000   0.0000   0.0001

Regression: graph matrix


The command graph matrix produces a graphical representation of the correlation matrix by presenting a series of scatterplots for all variables:

. graph matrix csat expense percent income high college, half maxis(ylabel(none) xlabel(none))

Regression: Managing all this output


Usually when we're running regressions, we'll be testing multiple models at a time, and it can be difficult to compare the results. Stata offers several user-friendly options for storing and viewing regression output from multiple models:

Store output: eststo / esttab
Output into Word/Excel: outreg2

Regression: eststo/esttab
We can store this info in Stata; just type:

. regress csat expense, robust
. eststo model1

. regress csat expense percent income high college, robust
. eststo model2

. xi: regress csat expense percent income high college i.region, robust
. eststo model3

Regression: eststo/esttab
Now Stata will hold your output in memory until you ask to recall it:

. esttab model1 model2 model3

----------------------------------------------------------------
                      (1)             (2)             (3)
                     csat            csat            csat
----------------------------------------------------------------
expense           -0.0223***      0.00335        -0.00202
                  (-6.07)         (0.70)          (-0.56)
percent                           -2.618***       -3.008***
                                  (-11.44)        (-12.75)
income                             0.106          -0.167
                                  (0.09)          (-0.14)
high                               1.631           1.815
                                  (1.73)          (1.77)
college                            2.031           4.671**
                                  (0.96)          (2.92)
_Iregion_2                                         69.45***
                                                   (3.86)
_Iregion_3                                         25.40*
                                                   (2.03)
_Iregion_4                                         34.58***
                                                   (3.66)
_cons              1060.7***       851.6***        808.0***
                  (43.55)         (14.86)         (11.91)
----------------------------------------------------------------
N                      51              51              50
----------------------------------------------------------------
t statistics in parentheses
* p<0.05, ** p<0.01, *** p<0.001

Regression: eststo/esttab
Some options (type help eststo and help esttab for more options):

. esttab model1 model2 model3, r2 ar2 se label

--------------------------------------------------------------------------------
                              (1)             (2)             (3)
                     Mean compo~e    Mean compo~e    Mean compo~e
--------------------------------------------------------------------------------
Per pupil expendit~c      -0.0223***      0.00335        -0.00202
                          (0.00367)      (0.00478)      (0.00359)
% HS graduates tak~T                      -2.618***       -3.008***
                                          (0.229)         (0.236)
Median household~000                       0.106          -0.167
                                          (1.207)         (1.196)
% adults HS diploma                        1.631           1.815
                                          (0.943)         (1.027)
% adults college d~e                       2.031           4.671**
                                          (2.114)         (1.600)
region==2                                                  69.45***
                                                          (18.00)
region==3                                                  25.40*
                                                          (12.53)
region==4                                                  34.58***
                                                          (9.450)
Constant                   1060.7***       851.6***        808.0***
                          (24.35)         (57.29)         (67.86)
--------------------------------------------------------------------------------
Observations                   51              51              50
R-squared                   0.217           0.824           0.911
Adjusted R-squared          0.201           0.805           0.894
--------------------------------------------------------------------------------
Standard errors in parentheses
* p<0.05, ** p<0.01, *** p<0.001

Regression: outreg2
Avoid human error when transferring coefficients into tables:

. regress csat expense, robust
. outreg2 using prediction.doc

. regress csat expense percent income high college, robust
. outreg2 using prediction.doc, append

. xi: regress csat expense percent income high college i.region, robust
. outreg2 using prediction.doc, append

Regression: outreg2

Getting predicted values


How good the model is will depend on how well it predicts Y, the linearity of the model, and the behavior of the residuals. Use predict immediately after running the regression:
. xi: regress csat expense percent percent2 income high college i.region, robust
. predict csat_predict
. label variable csat_predict "csat predicted"

Getting predicted values


. scatter csat csat_predict

(Figure: scatterplot of csat (Mean composite SAT score) against csat_predict (csat predicted).)

We should expect a 45-degree pattern in the data. The y-axis is the observed data and the x-axis the predicted data (Y-hat). In this case the model seems to be doing a good job of predicting csat.

Linear Regression Assumptions


Assumption 1: Normal distribution
  The dependent variable is normally distributed
  The errors of the regression equation are normally distributed

Assumption 2: Homoscedasticity
  The variance around the regression line is the same for all values of the predictor variable (X)

Assumption 3: Errors are independent
  The size of one error is not a function of the size of any previous error

Assumption 4: Relationships are linear
  AKA the relationship can be summarized with a straight line
  Keep in mind that you can use alternative forms of regression to test non-linear relationships

Testing for Normality


. predict resid, residuals
. label var resid "Residuals of pp expend and SAT"
. histogram resid, normal

Testing for Normality


The Shapiro-Wilk test of normality tests the null hypothesis that the data are normally distributed.
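A minimal sketch of the command, applied to the residuals generated above:

. swilk resid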

+ Regression: testing for homoscedasticity

Note: the rvfplot command needs to be entered after the regression equation is run; Stata uses estimates from the regression to create this plot.
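A minimal sketch, assuming the full model from the previous slides has just been fit:

. quietly regress csat expense percent income high college reg2 reg3 reg4, robust
. rvfplot, yline(0)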

+ Regression: testing for homoscedasticity

A non-graphical way to detect heteroskedasticity is the Breusch-Pagan test. The null hypothesis is that the residuals are homoskedastic. In the example below we fail to reject the null at 95% and conclude that the residuals are homogeneous. However, at 90% we reject the null and conclude that the residuals are not homogeneous.
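A minimal sketch of the test (estat hettest is run right after a plain regress fit):

. quietly regress csat expense percent income high college reg2 reg3 reg4
. estat hettest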

Logit/Probit Regression

Logit model
Use logit models whenever your dependent variable is binary (also called a dummy), taking values 0 or 1. Logit regression is a nonlinear regression model that forces the predicted values to lie between 0 and 1. Logit models estimate the probability that your dependent variable equals 1 (Y = 1), i.e. the probability that some event happens. Logit and probit models are basically the same; the difference is in the distribution used:

Logit: cumulative standard logistic distribution (F)
Probit: cumulative standard normal distribution (Φ)

Both models provide similar results.
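A minimal sketch of the basic commands; the variable names y_bin, x1 and x2 are placeholders, not variables from the datasets above:

. logit y_bin x1 x2            // coefficients on the log-odds scale
. logit y_bin x1 x2, or        // report odds ratios instead
. predict p_hat if e(sample)   // predicted probabilities Pr(y_bin = 1)
. probit y_bin x1 x2           // same specification with a probit (normal) link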

Logit model

Logit: predicted probabilities

Logit: Odds ratio

Logit: adjust

Ordinal logit
When a dependent variable has more than two categories and the values of each category have a meaningful sequential order (a value is indeed higher than the previous one), you can use ordinal logit. A typical example is an opinion item measured on an ordered scale.
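A minimal sketch, assuming a hypothetical four-category ordered outcome opinion and placeholder predictors x1 and x2:

. ologit opinion x1 x2
. predict p1 p2 p3 p4, pr      // one predicted probability per outcome category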

Ordinal logit: the setup

Ordinal logit: predicted probabilities

Ordinal logit: predicted probabilities

Predicted probabilities: using prvalue

Predicted probabilities: using prvalue

Panel Data (fixed and random effects)

Panel Data Analysis

Panel Data Analysis


Panel data allows you to control for variables you cannot observe or measure, like cultural factors or differences in business practices across companies, or variables that change over time but not across entities (i.e. national policies, federal regulations, international agreements, etc.). That is, it accounts for individual heterogeneity. With panel data you can include variables at different levels of analysis (i.e. students, schools, districts, states), suitable for multilevel or hierarchical modeling. Some drawbacks are data collection issues (i.e. sampling design, coverage), non-response in the case of micro panels, and cross-country dependency in the case of macro panels (i.e. correlation between countries).

Panel Data Analysis

In this document we focus on two techniques used to analyze panel data:

Fixed effects
Random effects

Setting panel data: xtset
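A minimal sketch of the declaration, assuming the panel identifier is country and the time variable is year:

. xtset country year
. xtdescribe                   // inspect the panel pattern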

Exploring panel data

Exploring panel data

FIXED-EFFECTS MODEL (Covariance Model, Within Estimator, Individual Dummy Variable Model, Least Squares Dummy Variable Model)

Fixed effects
Use fixed effects (FE) whenever you are only interested in analyzing the impact of variables that vary over time. FE explores the relationship between predictor and outcome variables within an entity (country, person, company, etc.). Each entity has its own individual characteristics that may or may not influence the predictor variables (for example, being male or female could influence the opinion toward a certain issue, the political system of a particular country could have some effect on trade or GDP, or the business practices of a company may influence its stock price).

Fixed effects
When using FE we assume that something within the individual may impact or bias the predictor or outcome variables and we need to control for this. This is the rationale behind the assumption of correlation between the entity's error term and the predictor variables. FE removes the effect of those time-invariant characteristics from the predictor variables so we can assess the predictors' net effect. Another important assumption of the FE model is that those time-invariant characteristics are unique to the individual and should not be correlated with other individual characteristics. Each entity is different; therefore the entity's error term and the constant (which captures individual characteristics) should not be correlated with the others. If the error terms are correlated, then FE is not suitable, since inferences may not be correct and you need to model that relationship (probably using random effects); this is the main rationale for the Hausman test (presented later in this document).
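A minimal sketch of a fixed-effects fit with xtreg (y, x1 and x2 are placeholders for the outcome and time-varying predictors):

. xtreg y x1 x2, fe
. estimates store fixed        // keep the results for a later Hausman test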

Fixed effects

Fixed effects

Fixed effects

+ Fixed effects: Heterogeneity across countries (or entities)

+ Fixed effects: Heterogeneity across years

OLS regression

Fixed Effects using least squares dummy variable model (LSDV)

Fixed effects

+ Fixed effects: n entity-specific intercepts (using xtreg)

+ Fixed effects: n entity-specific intercepts (using xtreg)

+ Another way to estimate fixed effects: n entity-specific intercepts (using areg)

+ Another way to estimate fixed effects: common intercept and n-1 binary regressors (using dummies and regress)

+ Fixed effects: comparing xtreg (with fe), regress (OLS with dummies) and areg

A note on fixed-effects
"The fixed-effects model controls for all time-invariant differences between the individuals, so the estimated coefficients of fixed-effects models cannot be biased because of omitted time-invariant characteristics [like culture, religion, gender, race, etc.]. One side effect of the features of fixed-effects models is that they cannot be used to investigate time-invariant causes of the dependent variables. Technically, time-invariant characteristics of the individuals are perfectly collinear with the person [or entity] dummies. Substantively, fixed-effects models are designed to study the causes of changes within a person [or entity]. A time-invariant characteristic cannot cause such a change, because it is constant for each person." (Underline is mine.) — Kohler, Ulrich, and Frauke Kreuter, Data Analysis Using Stata, 2nd ed., p. 245

RANDOM-EFFECTS MODEL (Random Intercept, Partial Pooling Model)

Random effects

Random effects
Random effects assumes that the entity's error term is not correlated with the predictors, which allows time-invariant variables to play a role as explanatory variables. In random effects you need to specify those individual characteristics that may or may not influence the predictor variables. The problem is that some variables may not be available, leading to omitted variable bias in the model. RE allows you to generalize the inferences beyond the sample used in the model.
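A minimal sketch of the random-effects counterpart (same placeholder variables as in the fixed-effects sketch):

. xtreg y x1 x2, re
. estimates store random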

Random effects

FIXED OR RANDOM?

Fixed or Random: Hausman test
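A minimal sketch of the test, assuming the fixed- and random-effects fits were stored as in the sketches above; the null hypothesis is that the coefficient differences are not systematic (so RE is consistent and preferred):

. hausman fixed random

If the test rejects the null (e.g. p < 0.05), use fixed effects; otherwise random effects can be used.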

OTHER TESTS/ DIAGNOSTICS

Testing for time-fixed effects

+ Testing for random effects: Breusch-Pagan Lagrange multiplier (LM)

+ Testing for cross-sectional dependence/contemporaneous correlation: using the Breusch-Pagan LM test of independence

+ Testing for cross-sectional dependence/contemporaneous correlation: using the Pesaran CD test

Source: Hoechle, Daniel, "Robust Standard Errors for Panel Regressions with Cross-Sectional Dependence", http://fmwww.bc.edu/repec/bocode/x/xtscc_paper.pdf

Testing for heteroscedasticity

NOTE: Use the robust option to control for heteroscedasticity (in both fixed and random effects).

Testing for serial correlation

Testing for unit roots/stationarity

Robust standard errors

Summary of basic models (FE/RE)
