About STATA
Basic Operations
Regression Analysis
Panel Data Analysis
About
Stata provides commands to analyze panel data (cross-sectional time-series, longitudinal, repeated-measures, and
correlated data), cross-sectional data, time-series data,
survival-time data, and cohort-study data.
Basic Operations
Entering Data
Exploring Data
Modifying Data
Managing Data
Analyzing Data
Entering Data
Example
cd u:\stata
dir
insheet using hs0.csv (if the file has variable names on the first line)
save hs
insheet gender id race ses schtyp prgtype read write math science socst using hs0_noname.csv, clear (if the file doesn't have variable names on the first line)
count
describe
compress
clear
use hs, clear (only for files in Stata format; can also be used over the internet)
memory
Exploring Data
correlate: Correlations
Example
Modifying Data
Example
use hs0
order id gender
label variable schtyp "The type of school the student attended."
label define scl 1 public 2 private
label values schtyp scl
codebook schtyp
list schtyp in 1/10
list schtyp in 1/10, nolabel
encode prgtype, gen(prog) (create a new numeric version of the string variable prgtype)
label variable prog "The type of program in which the student was enrolled."
codebook prog
list prog in 1/10
list prog in 1/10, nolabel
Example (cont)
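rename gender female (a 0/1 variable is easier to work with when its name says what 1 stands for)
codebook female
generate total = read + write + socst (total must exist before it is labeled; the formula follows the label text)
label variable total "The total of the read, write and socst."
recode race 5 = .
sum total
codebook total
save hs1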
Managing Data
cd Change directory
Example
We take the hs1 data file, make a separate folder called honors, and
store there a copy of our data containing only the students with reading
scores of 60 or higher.
pwd
dir
ls
cd honors
describe
summarize read
drop ses
describe
list in 1/20
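The steps described in this example (making the folder, keeping only the high scorers, saving the copy) might look like the following minimal sketch. It uses a small made-up data set in place of hs1 so that it runs on its own, and the file name hs_honors is hypothetical.

```stata
* Toy stand-in for hs1 (values are made up for illustration only)
clear
input id read
1 57
2 68
3 44
4 63
5 60
end
* Keep students with reading scores of 60 or higher. Stata treats missing
* values as larger than any number, so they are excluded explicitly.
keep if read >= 60 & !missing(read)
summarize read
* Store the subset in its own folder (mkdir fails if honors already exists)
mkdir honors
save honors/hs_honors
```

With the real data, the input block would be replaced by use hs1, clear.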
Analyzing Data
ttest: t-tests
regress: Linear regression
predict: Predictions after model estimation
kdensity: Kernel density estimates and graphs
pnorm: Graphs a standardized normal probability plot
qnorm: Graphs quantiles of a variable against quantiles of a normal distribution
rvfplot: Graphs a residual-versus-fitted plot
rvpplot: Graphs a residual-versus-predictor plot
xi: Creates dummy variables during model estimation
test: Tests linear hypotheses after model estimation
oneway: One-way analysis of variance
anova: Analysis of variance
logistic: Logistic regression, reporting odds ratios
logit: Logistic regression, reporting coefficients
Example
ttest write = 50 (one-sample t-test: tests whether the sample of writing scores was
drawn from a population with a mean of 50)
ttest write = read (paired t-test: tests whether the mean of write equals the mean
of read)
ttest write, by(female) (two-sample independent t-test with pooled (equal)
variances)
Example (cont)
xi: regress write read i.prog (The xi prefix is used to dummy code
categorical variables such as prog. The predictor prog has three levels and
requires two dummy-coded variables)
test _Iprog_2 _Iprog_3 (The test command is used to test the collective effect
of the two dummy-coded variables; in other words, it tests the main effect of
prog)
xi: regress write i.prog*read (create dummy variables for prog and for the
interaction of prog and read)
test _IproXread_2 _IproXread_3 (tests the overall interaction)
test _Iprog_2 _Iprog_3 (tests the main effect of prog)
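In Stata 11 and later, factor-variable notation can do the same job without the xi prefix. The sketch below is an illustration only: it uses the auto data set that ships with Stata rather than the hs data, with rep78 standing in for prog and mpg for read.

```stata
* Example data shipped with Stata (not the hs data used in the slides)
sysuse auto, clear
* i.rep78 expands the categorical variable into indicators on the fly
regress price mpg i.rep78
* Joint test of all rep78 indicators, i.e. the overall effect of rep78
testparm i.rep78
* Main effects plus the interaction of rep78 with continuous mpg
regress price i.rep78##c.mpg
* Joint test of the interaction terms
testparm i.rep78#c.mpg
```

With the hs data, the analogous commands would presumably be regress write read i.prog followed by testparm i.prog.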
gen honcomp = write >= 60 (create a dichotomous variable called honcomp
(honors composition) to use as our dependent variable)
tab honcomp
The logistic command reports odds ratios by default but displays the
coefficients when the coef option is used. Exactly the same results can be
obtained with the logit command, which reports coefficients by
default but displays odds ratios when the or option is used:
logit honcomp read female
logit honcomp read female, or
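The link between the two displays is that an odds ratio is the exponential of the corresponding coefficient. A minimal sketch, assuming the auto data set that ships with Stata in place of the hs data (foreign plays the role of the 0/1 outcome):

```stata
* Example data shipped with Stata (not the hs data used in the slides)
sysuse auto, clear
* logit reports coefficients on the log-odds scale
logit foreign mpg weight
* Exponentiating a coefficient gives the corresponding odds ratio
display exp(_b[mpg])
* logistic fits the same model and reports odds ratios directly
logistic foreign mpg weight
```

The value shown by display should match the odds ratio for mpg in the logistic output.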
Logistic Regression
Classical Regression vs Logistic Regression
With a binary dependent variable, the variance of the errors is not constant, i.e., there is no homogeneity of variance.
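Why the error variance cannot be constant can be seen from a short derivation. The notation is an assumption for illustration: Y is the 0/1 outcome and p(x) is the probability that Y = 1 at predictor value x.

```latex
\begin{align*}
E(Y \mid x) &= p(x), \\
\operatorname{Var}(Y \mid x) &= p(x)\,\bigl(1 - p(x)\bigr).
\end{align*}
% If a classical (linear) regression is used, p(x) = \beta_0 + \beta_1 x,
% so the variance changes with x: it is largest at p(x) = 0.5 and
% shrinks toward 0 as p(x) approaches 0 or 1.
```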
Logistic Regression - 2
Logit:
Use admission into a graduate program in which 70% of the males and 30% of
the females are admitted.
Let P equal the probability of being admitted and Q = 1 - P the probability of not being admitted.
Let the odds of a male being admitted be odds(M) = P/Q = P/(1-P) = .7/.3 = 2.3333
Let the odds of a female being admitted be odds(F) = P/Q = P/(1-P) = .3/.7 = .42857
The odds ratio is odds(M)/odds(F) = 2.3333/.42857 = 5.44, so the odds of being admitted
to the program are about 5.44 times as high for males as for females.
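The same example can be written on the logit (log-odds) scale, which is the scale of the coefficients that logit reports. This is a worked sketch of the arithmetic; the subscripts M and F are notation introduced here.

```latex
\begin{align*}
\operatorname{logit}(P) &= \ln\frac{P}{1-P}, \\
\operatorname{logit}(P_M) &= \ln\frac{0.7}{0.3} \approx 0.8473, \qquad
\operatorname{logit}(P_F) = \ln\frac{0.3}{0.7} \approx -0.8473, \\
\operatorname{logit}(P_M) - \operatorname{logit}(P_F) &\approx 1.6946, \qquad
e^{1.6946} \approx 5.44 .
\end{align*}
```

A logistic regression of admission on a male indicator would therefore have a coefficient of about 1.69 and an odds ratio of about 5.44.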
An odds ratio in logistic regression can be interpreted as the multiplicative effect of a
one-unit change in X on the predicted odds, with the other variables in the model
held constant.
Logistic Regression 3
Logistic Regression 4
Example 1: Categorical Independent Variable
lstat
Do file
Do-files are created with the Do-file Editor or any other text editor. Any
command that can be executed from the command line can be placed in a do-file.
To open the Do-file Editor: Window > Do-file Editor, or Ctrl + 8
set more off
use hsb2, clear
generate lang = read + write
label variable lang "language score"
tabulate lang
tabulate lang female
tabulate lang prog
tabulate lang schtyp
summarize lang, detail
table female, contents(n lang mean lang sd lang)
table prog, contents(n lang mean lang sd lang)
table ses, contents(n lang mean lang sd lang)
correlate lang math science socst
regress lang math science female
set more on
Do file cont.
To look at the commands that a do-file contains:
type hsbbatch.do
To run the do-file:
do hsbbatch
Panel Data
Create the do-file as follows:
sort group
/* Check whether the groups differ on the pretest depression score. */
ttest pre, by(group)
/* There isn't much of a difference between groups on the pretest, so let's try a
Hotelling's T2. */
hotel dep1 dep2 dep3 dep4 dep5 dep6, by(group)
/* Using Hotelling's T2 we find a significant difference between the two groups. The T2 did
not make use of any of the information concerning the pretest, but that's okay for the
moment, especially since we know that the pretest differences were not significant. */
xtgee dep pre group visit, fam(gaus) link(iden) i(subj) t(visit) corr(ind)
/* The three previous analyses provide identical incorrect results.
The common thread among them is that they all assume that the observations within
subjects are independent. This seems, on the face of it, highly unlikely. Scores on the
depression scale are not likely to be independent from one visit to the next.
Of the three, only xtgee makes the assumption concerning the correlations explicit. */
xtsum dep
Panel data 2
/* We can analyze these data using compound symmetry for the correlational
structure.
This approach can be tried using exchangeable for the correlation matrix in
xtgee. */
xtgee dep pre group visit, fam(gaus) link(iden) i(subj) t(visit) corr(exc)
xtcorr
/* Note in particular the change in the standard errors between this analysis and
the previous one.
Now let's try a different correlation structure, autoregressive with lag one. */
xtgee dep pre group visit, fam(gaus) link(iden) i(subj) t(visit) corr(ar1)
/* Back up and reconsider the group by visit interaction.
We will try a model with the interaction using the ar1 correlations. */
generate gxv = group*visit
xtgee dep pre group visit gxv, fam(gaus) link(iden) i(subj) t(visit) corr(ar1)
/* The group by visit interaction still is not significant even though this may be a
better approach for testing it.
So far we have been treating visit as a continuous variable.
Is it possible that our analysis might change if we were to treat visit as a
categorical variable, the way that the anova did?
Let's try one last analysis using xi to create dummy variables on-the-fly. */
xi: xtgee dep pre group i.visit, fam(gaus) link(iden) i(subj) corr(ar1)
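In more recent Stata releases the panel structure is usually declared once with xtset instead of repeating i() and t(), and the working correlation matrix is displayed with estat wcorrelation, which, as far as I know, superseded xtcorr. The sketch below is not the depression data from the slides: it simulates a small panel with made-up numbers so that it runs on its own, reusing the names subj, visit and dep.

```stata
* Simulated panel (made-up numbers): 20 subjects with 6 visits each
clear
set seed 12345
set obs 20
generate subj = _n
* Subject-level effect, so that visits within a subject are correlated
generate u = rnormal()
expand 6
bysort subj: generate visit = _n
generate dep = 10 + u - 0.5*visit + rnormal()
* Declare the panel and time variables once
xtset subj visit
* Exchangeable (compound symmetry) working correlation
xtgee dep visit, family(gaussian) link(identity) corr(exchangeable)
* Display the estimated within-subject correlation matrix
estat wcorrelation
* First-order autoregressive working correlation
xtgee dep visit, family(gaussian) link(identity) corr(ar1)
```

Comparing the standard errors across the two xtgee fits mirrors the comparison made in the do-file above.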
The help command can be used from the command line or from the
Help window. To use help, the command must be spelled correctly and
the full name of the command must be used. help contents will list
all commands that can be accessed using help.
help if
help anova
help regress
The search command searches for information in Stata manuals,
FAQs, and Stata Technical Bulletins (STBs). The search options
include: manual, which restricts searches to the Stata manuals; author,
for searching for an author by name; stb, which restricts searches to
STBs; and faq, which restricts searches to FAQs. The search command can
be used from either the command line or the Help window.
search if
search regression
search ttest, manual
Each copy of Stata comes with a built-in tutorial. Typing tutorial
brings up information about the tutorials. tutorial regress will bring
up the tutorial on regression.
tutorial
tutorial regress
End of Session