
DEPARTMENT OF BIOSTATISTICS

UNIVERSITY OF COPENHAGEN


Graphics in Stata

Klaus K. Holst

29 Sep 2014

[Figure: "U.S. Life Expectancy", subtitle "1900-1999". Lines for "Life expectancy, males" and "Life expectancy, females" against Year (1900 to 2000), y-axis from 30 to 90. Note: "Data 1900-1999". Caption: "Life expectancy by gender".]

Graphs in Stata
The main command is graph followed by the type of graph

graph twoway     scatter plots, line plots
graph matrix     scatterplot matrices
graph bar        bar charts
graph dot        dot charts
graph box        box-and-whisker plots
graph pie        pie charts
graph save
graph use
graph combine

plus more specialized graphs: histogram, kdensity, avplot, ...

graph twoway scatter ...
// or
twoway scatter ...
// or
scatter ...
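The save and use subcommands in the list above can be tried with a short sketch. It assumes the built-in uslifeexp example dataset and a hypothetical file name le_scatter.gph written to the current working directory.

```stata
* Draw a scatter plot, store it on disk, and redisplay it later
sysuse uslifeexp, clear
scatter le year

* le_scatter.gph is a hypothetical file name chosen for this illustration
graph save le_scatter.gph, replace

* Reload and redisplay the stored graph
graph use le_scatter.gph
```

The stored .gph file can be reopened in a later session without rerunning the plotting command.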

U.S. life expectancy data, 1900-1999

clear
sysuse uslifeexp
describe

Contains data from /Applications/Stata/ado/base/u/uslifeexp.dta
  obs:           100                          U.S. life expectancy, 1900-1999
 vars:            10                          30 Mar 2011 04:31
 size:         3,800                          (_dta has notes)
------------------------------------------------------------------------------
              storage   display    value
variable name   type    format     label      variable label
------------------------------------------------------------------------------
year            int     %9.0g                 Year
le              float   %9.0g                 life expectancy
le_male         float   %9.0g                 Life expectancy, males
le_female       float   %9.0g                 Life expectancy, females
le_w            float   %9.0g                 Life expectancy, whites
le_wmale        float   %9.0g                 Life expectancy, white males
le_wfemale      float   %9.0g                 Life expectancy, white females
le_b            float   %9.0g                 Life expectancy, blacks
le_bmale        float   %9.0g                 Life expectancy, black males
le_bfemale      float   %9.0g                 Life expectancy, black females
------------------------------------------------------------------------------
Sorted by: year

Scatter plots

twoway scatter le year

[Figure: scatter plot of life expectancy (40 to 80) against Year (1900 to 2000).]

Scatter plots

twoway spike le year

[Figure: spike plot of life expectancy (40 to 80) against Year (1900 to 2000).]

Line plots

twoway line le year

[Figure: line plot of life expectancy (40 to 80) against Year (1900 to 2000).]

Line plots

twoway line le year, connect(stairstep)

[Figure: stairstep line plot of life expectancy (40 to 80) against Year (1900 to 2000).]

Schemes

A scheme specifies the overall look of the graph.

To change the scheme for the session:

set scheme s2mono

To list the available schemes:

graph query, schemes

Available schemes are

    s2color      see help scheme_s2color
    s2mono       see help scheme_s2mono
    s2manual     see help scheme_s2manual
    s2gmanual    see help scheme_s2gmanual
    s2gcolor     see help scheme_s2gcolor
    s1color      see help scheme_s1color
    s1mono       see help scheme_s1mono
    s1rcolor     see help scheme_s1rcolor
    s1manual     see help scheme_s1manual
    sj           see help scheme_sj
    economist    see help scheme_economist
    s2color8     see help scheme_s2color8
    lean1        see help scheme_lean1
    lean2        see help scheme_lean2
    rbn1mono     see help scheme_rbn1mono
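Because set scheme changes the look of every later graph in the session, it can be convenient to restore the earlier scheme afterwards. A minimal sketch, assuming the uslifeexp example data; the macro name oldscheme and the graph name le_mono are chosen only for this illustration.

```stata
sysuse uslifeexp, clear

* Remember the scheme currently in use (c(scheme) is a built-in c-class value)
local oldscheme "`c(scheme)'"

* Change the scheme for the session and draw a graph with it
set scheme s2mono
twoway line le year, name(le_mono, replace)

* Restore the earlier scheme
set scheme `oldscheme'
```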


Schemes

twoway line le year, scheme(s1color)

[Figure: line plot of life expectancy (40 to 80) against Year (1900 to 2000), drawn with scheme s1color.]

Multiple graphs in one

Overlay multiple twoway graphs (enclose each in parentheses or separate them by ||)

twoway (line le_male year) (line le_female year)

[Figure: overlaid line plots of "Life expectancy, males" and "Life expectancy, females" (40 to 80) against Year (1900 to 2000).]
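The two overlay notations mentioned above can be compared side by side. This is a small sketch using the uslifeexp example data; the graph names ov_paren and ov_bars are hypothetical and only serve to keep both graphs in memory.

```stata
sysuse uslifeexp, clear

* Parenthesis notation: each plot is enclosed in ( )
twoway (line le_male year) (line le_female year), name(ov_paren, replace)

* || notation: plots are separated by ||, options for the whole graph follow the last ||
twoway line le_male year || line le_female year ||, name(ov_bars, replace)
```

Both commands should produce the same overlaid graph; which notation to use is a matter of taste.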


Graphics options

Stata can produce nice, publishable graphics in a few steps.

Axis limits are chosen automatically
Legends are added automatically
Axis labels are obtained automatically from variable labels

To alter the defaults we need to add options (everything after the comma).
General syntax of the scatter graphics command:

twoway scatter varlist [if] [in] [, options]

Options are divided into options for the subgraph (here scatter), e.g.

marker_options         change look of markers (colour, size, etc.)
connect_options        change look of lines or connecting method
axis_choice_options    associate plot with alternate axis

and options for the global graph command

twoway_options         by, name, titles, legends, axes, etc.

Graphics options

General syntax:

twoway (line ..., line_options) ///
       (scatter ..., scatter_options) ///
       (lfit ..., lfit_options), twoway_options

In the previous plot the y-axis title disappeared:

twoway (line le_male year) (line le_female year), ///
    ytitle(Life expectancy in years)

Some twoway options:

by(varlist, ...)       repeat for subgroups
nodraw                 suppress display of graph
name(name, ...)
scheme(schemename)     overall look
xtitle, ytitle         axis titles
xlabel, ylabel         axis label positions
legend                 legend options
title, subtitle        graph title
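As an illustration of the nodraw and name() options listed above, the following sketch builds two graphs in memory without displaying them and then shows them together with graph combine. The graph names g_male and g_female are hypothetical, and the ycommon option is used here only so that both panels share a y-axis scale.

```stata
sysuse uslifeexp, clear

* Build two graphs in memory without drawing them
twoway line le_male year, nodraw name(g_male, replace) title("Males")
twoway line le_female year, nodraw name(g_female, replace) title("Females")

* Display both in one figure with a common y-axis scale
graph combine g_male g_female, ycommon
```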

Graphics options

scatter marker options:

msymbol(symbolstylelist)        shape of marker
mcolor(colorstylelist)          colour of marker, inside and out
msize(markersizestylelist)      size of marker
mfcolor(colorstylelist)         inside or "fill" colour
mlcolor(colorstylelist)         colour of outline
mlwidth(linewidthstylelist)     thickness of outline

..., jitter options, connect options, label options, ...

line options:

connect(connectstyle)           how to connect points
sort[(varlist)]                 how to sort before connecting
cmissing(y/n)                   missing values are ignored
lpattern(linepatternstyle)      line pattern (solid, dashed, etc.)
lwidth(linewidthstyle)          thickness of line
lcolor(colorstyle)              colour of line
lstyle(linestyle)               overall style of line
...

Graphics options

twoway (line le_male year) (line le_female year), ///
    ytitle(Life expectancy in years)

[Figure: overlaid line plots of "Life expectancy, males" and "Life expectancy, females" against Year (1900 to 2000), y-axis titled "Life expectancy in years" (40 to 80).]
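A few of the marker and line options listed above can be combined in one overlay. This is a sketch on the uslifeexp example data; the particular styles (Oh, small, dash, medthick) are arbitrary choices for illustration.

```stata
sysuse uslifeexp, clear

* Hollow circle markers for males, a dashed thicker line for females
twoway (scatter le_male year, msymbol(Oh) mcolor(dknavy) msize(small)) ///
       (line le_female year, sort lpattern(dash) lwidth(medthick) lcolor(dkorange)), ///
       ytitle(Life expectancy in years)
```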

Graphics options
If the axis title is going to be reused many times we may store it in
a macro

local gopt ytitle(Life expectancy in years) ///
    xlabel(1875(25)2025) title(U.S. Life Expectancy)

where we also add a little more space to the x-axis and a title

twoway (line le_male year) (line le_female year), `gopt'

... and some different line types and colours

local maleline "line le_male year, color(dknavy) lpattern(solid) connect(stairstep)"
local femaleline "line le_female year, color(dkorange) lpattern(dash_dot) connect(stairstep)"
twoway (`maleline') (`femaleline'), `gopt'

Digression: Macros
An alias that can be dereferenced everywhere in the program(!)

local a 1
local b a b c
global b "Hello"

Local macros live within the scope where they were defined (i.e.
the do-file or program/function). A local macro is dereferenced as
`name' and a global macro as $name.

di `a'
di "$b `b'"

1
Hello a b c

To evaluate a macro expression use =

local a = `a'+1
di `a'

Digression: Macros

Meta-programming with macros

capture drop x1 x2
input x1 x2
1 3
2 4
end

local idx 1 2
foreach i in `idx' {
    list x`i' in 1/2
}

     +----+
     | x1 |
     |----|
  1. |  1 |
  2. |  2 |
     +----+

     +----+
     | x2 |
     |----|
  1. |  3 |
  2. |  4 |
     +----+

Graphics options

[Figure: "U.S. Life Expectancy". Lines for "Life expectancy, males" and "Life expectancy, females" against Year, with the x-axis labelled from 1875 to 2025 in steps of 25.]
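The same meta-programming idea can be used to assemble a graph command piece by piece. The sketch below, which assumes the uslifeexp example data, loops over four of its variables and appends one plot specification per variable to a local macro; the macro name plots is chosen only for this illustration.

```stata
sysuse uslifeexp, clear

* Start with an empty macro and append one "(line var year)" per variable
local plots
foreach v of varlist le_male le_female le_wmale le_bmale {
    local plots `plots' (line `v' year)
}

* Inspect the assembled specification, then hand it to twoway
display "`plots'"
twoway `plots', ytitle(Life expectancy in years)
```

Adding or removing a variable in the foreach list changes the graph without editing the twoway command itself.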

Symbols

palette linepalette

[Figure: line pattern palette showing solid, dash, longdash_dot, dot, longdash, dash_dot, shortdash, shortdash_dot, blank.]

palette symbolpalette

[Figure: symbol palette showing marker symbols, including Oh, oh, Dh, dh, Th, th, Sh, sh, smplus.]

Line types

twoway (`maleline') (`femaleline'), `gopt'

[Figure: stairstep lines for males (solid) and females (dash_dot), y-axis titled "Life expectancy in years" (50 to 80).]

Colours

Customize with a single-line file color-colname.style (in the ado-path):

set rgb "255 100 50"

Titles, plot region

The definition of ringposstyle and the default positioning of titles is

                      ringposstyle
title                 7
subtitle              6
t2title               2
t1title               1
plot region           0
l1title, r1title      1
l2title, r2title      2
b1title               1
b2title               2
legend                3
note                  4
caption               5

where titles are located is controlled by the scheme

Titles

help title_options

title options: Options for specifying titles

Description
Titles are the adornment around a graph that explains the graph's purpose.

Options
title(tinfo) specifies the overall title of the graph. The title usually appears centered at the top of
the graph. It is sometimes desirable to specify the span suboption when specifying the title, as in

. graph ..., ... title("Life expectancy", span)

See Spanning under Remarks and examples below.

subtitle(tinfo) specifies the subtitle of the graph. The subtitle appears near the title (usually directly
under it) and is presented in a slightly smaller font. subtitle() is used in conjunction with
title(), and subtitle() is used by itself when the title() seems too big. For instance, you
might type

. graph ..., ... title("Life expectancy") subtitle("1900-1999")

or

. graph ..., ... subtitle("Life expectancy" "1900-1999")

If subtitle() is used in conjunction with title() and you specify suboption span with
title(), remember also to specify span with subtitle().

Titles

#delimit ;
twoway (`maleline') (`femaleline'),
    title("U.S. Life Expectancy")
    subtitle("1900-1999")
    caption("Life expectancy by gender")
    note("Data 1900-1999")
    legend(col(1) ring(0) position(11))
    yscale(range(30 90))
    ylabel(30(10)90)
    name(lifeexptitles, replace);
#delimit cr

Position given as clock-position

[Figure: "U.S. Life Expectancy", subtitle "1900-1999". Legend in one column inside the plot region at the 11 o'clock position; y-axis 30 to 90, x-axis Year (1900 to 2000). Note: "Data 1900-1999". Caption: "Life expectancy by gender".]

Example 2, World Life Expectancy 1998

sysuse lifeexp, clear
describe

Contains data from /Applications/Stata/ado/base/l/lifeexp.dta
  obs:            68                          Life expectancy, 1998
 vars:             6                          26 Mar 2011 09:40
 size:         2,652                          (_dta has notes)
------------------------------------------------------------------------------
              storage   display    value
variable name   type    format     label      variable label
------------------------------------------------------------------------------
region          byte    %12.0g     region     Region
country         str28   %28s                  Country
popgrowth       float   %9.0g               * Avg. annual % growth
lexp            byte    %9.0g               * Life expectancy at birth
gnppc           float   %9.0g               * GNP per capita
safewater       byte    %9.0g               *
                                            * indicated variables have notes
------------------------------------------------------------------------------
Sorted by:

Scatter plot matrices

graph matrix gnppc popgrowth lexp safewater, ///
    half note("")

[Figure: lower-triangular scatterplot matrix of GNP per capita, Avg. annual % growth, Life expectancy at birth, and safewater.]

Scatter plots

Plotting the association between GNP (on log-scale) and life
expectancy in North America and South America with different colours

twoway (scatter lexp gnppc if region==2, mcolor(dkorange) msize(0.8)) ///
       (scatter lexp gnppc if region==3, mcolor(dknavy) msize(0.8)), ///
       xscale(log) xlabel(1000 2000 4000 8000 16000, angle(90)) ///
       legend(order(1 "North America" 2 "South America"))

[Figure: Life expectancy at birth (55 to 80) against GNP per capita on a log scale (1000 to 16000), with separate colours for North America and South America.]

Subsetting

Notice that in the overlay plot we used the if qualifier together with logical
expressions to subset the scatter plots. This applies to every graph
command!

Stratification
We can also use the by option to make different plots for different
levels of a third variable.

Scatter plots, by

twoway (scatter lexp gnppc), ///
    by(region, row(1)) xsize(10)

[Figure: "Graphs by Region". Three panels in one row (Eur & C.Asia, N.A., S.A.) of Life expectancy at birth (50 to 80) against GNP per capita (0 to 40000).]

Scatter plots
Labels can be added to the graph (here we also remove the points)

scatter lexp gnppc if region==2, ///
    mlabsize(2) mlabel(country) mlabposition(0) msymbol(i)

[Figure: Life expectancy at birth (55 to 80) against GNP per capita (0 to 30000) for region 2, with country names in place of points: Canada, United States, Jamaica, Panama, Mexico, Dominican Republic, Honduras, El Salvador, Nicaragua, Guatemala, Trinidad and Tobago, Haiti.]
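The two approaches described above, the if qualifier and the by() option, can be contrasted in a short sketch. It assumes the lifeexp example data; the graph names and the macro name regions are hypothetical, and levelsof is used here only to collect the distinct region codes.

```stata
sysuse lifeexp, clear

* by(): one panel per region inside a single graph
twoway scatter lexp gnppc, by(region, row(1)) name(by_region, replace)

* if: one separate graph per region, built in a loop over the region codes
levelsof region, local(regions)
foreach r of local regions {
    twoway scatter lexp gnppc if region == `r', ///
        title("Region `r'") name(region_`r', replace)
}
```

by() keeps the panels on comparable axes in one figure, while the loop gives separate graphs that can be styled or saved individually.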

Scatter plots, point sizes

And size of points depending on another variable

scatter lexp gnppc if region==2 [pweight=popgrowth], ///
    msymbol(Oh)

[Figure: Life expectancy at birth (55 to 80) against GNP per capita (0 to 30000) for region 2, with hollow circles sized by popgrowth.]

Curve fits

Linear regression (lfit, lfitci) or quadratic (qfit, qfitci)

twoway (lfitci lexp safewater) (scatter lexp safewater)

[Figure: Life expectancy at birth (55 to 80) against safewater (20 to 100), showing the 95% CI band, the fitted values, and the observations.]
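The quadratic variant and the link between lfit and an ordinary regression can be sketched as follows. This assumes the lifeexp example data; the variable name lexp_hat and the graph names are hypothetical.

```stata
sysuse lifeexp, clear

* Quadratic fit with confidence band, overlaid with the observations
twoway (qfitci lexp safewater) (scatter lexp safewater), name(fit_quad, replace)

* The line drawn by lfit is the prediction from a linear regression of lexp on safewater
regress lexp safewater
capture drop lexp_hat
predict lexp_hat
twoway (line lexp_hat safewater, sort) (scatter lexp safewater), name(fit_manual, replace)
```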

Bar plots

graph bar (mean) lexp (p50) lexp (mean) safewater (p50) safewater, ///
    over(region)

[Figure: bar chart by region (Eur & C.Asia, N.A., S.A.) showing mean of lexp, p 50 of lexp, mean of safewater, and p 50 of safewater, y-axis 0 to 80.]

Box plots

Box-whisker plots

Gives a quick summary of the marginal distribution of continuous
variables. Useful for getting a quick overview of skewness, potential
outliers etc. for many variables.

graph box lexp, over(region) marker(1,mlabel(country))

Box limits are the 25% and 75% quantiles, with the median
marked as a line in the box. The whiskers show the most extreme
observations (min/max) within 1.5 IQR from the box limits.
Observations beyond the whiskers are plotted individually.
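The quantities behind a box plot can be computed directly. A minimal sketch, assuming the lifeexp example data and pooling all regions; the macro names iqr, lower, and upper are chosen for this illustration only.

```stata
sysuse lifeexp, clear

* Quartiles of life expectancy from summarize, detail
quietly summarize lexp, detail
local iqr   = r(p75) - r(p25)
local lower = r(p25) - 1.5*`iqr'
local upper = r(p75) + 1.5*`iqr'

display "Q1 = " r(p25) ", median = " r(p50) ", Q3 = " r(p75)
display "1.5 IQR limits: " `lower' " to " `upper'

* Observations outside these limits would be drawn as individual points
count if !missing(lexp) & (lexp < `lower' | lexp > `upper')
```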

Box plots

graph box lexp, over(region) marker(1,mlabel(country))

[Figure: box plots of Life expectancy at birth (55 to 80) by region (Eur & C.Asia, N.A., S.A.), with labelled outside values Bolivia and Haiti.]

Histograms, density estimation

Histograms can be generated with the syntax

histogram lexp, bin(#)
// or
histogram lexp, width(#)

Selecting the width or the number of bins is potentially difficult. By
default the number of bins is selected ad hoc from the number of observations n:
k = min{sqrt(n), 10 log10(n)}

We can overlay a normal approximation (option normal) or a
non-parametric kernel density estimate (option kdensity); there is
also a separate kdensity command.

histogram lexp, normal normopt(lpattern(dot))
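The default rule for the number of bins can be evaluated by hand. The sketch below assumes the lifeexp example data and simply applies the formula above; the bin(8) in the last line is an arbitrary explicit choice for illustration, not a value taken from the slides.

```stata
sysuse lifeexp, clear

* Number of non-missing observations
quietly count if !missing(lexp)
local n = r(N)

* k = min{sqrt(n), 10 log10(n)}
local k = min(sqrt(`n'), 10*log10(`n'))
display "n = `n', k = " `k'

* Histogram with an explicit number of bins and both density overlays
histogram lexp, bin(8) normal kdensity
```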

Histograms

histogram lexp, normal normopt(lpattern(dot))

[Figure: histogram of Life expectancy at birth (55 to 80) on the density scale (0 to .1), with a dotted normal density overlaid.]

QQ-plots
Comparison with the theoretical quantiles of a normal distribution

qnorm lexp

[Figure: normal quantile plot of Life expectancy at birth against Inverse Normal.]

QQ-plots

capture drop z
gen z = rnormal()
qnorm z
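To see what a QQ-plot looks like when the normal assumption holds, the observed variable can be placed next to a simulated normal sample with the same mean and standard deviation. A sketch assuming the lifeexp example data; the seed and the graph names are arbitrary choices for this illustration.

```stata
sysuse lifeexp, clear
set seed 12345

* Simulate a normal sample matching the mean and sd of lexp
quietly summarize lexp
capture drop z
generate z = rnormal(r(mean), r(sd))

* QQ-plots for the observed and the simulated variable, side by side
qnorm lexp, name(qq_lexp, replace) title("Observed")
qnorm z, name(qq_z, replace) title("Simulated normal")
graph combine qq_lexp qq_z
```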
