
BMJ

Assessing Methods: Descriptive Statistics And Graphs


Author(s): Sheila M. Gore
Source: British Medical Journal (Clinical Research Edition), Vol. 283, No. 6289 (Aug. 15, 1981),
pp. 486-488
Published by: BMJ
Stable URL: http://www.jstor.org/stable/29503239 .
Accessed: 18/02/2014 10:08

Digitization of the British Medical Journal and its forerunners (1840-1996) was completed by the U.S. National
Library of Medicine (NLM) in partnership with The Wellcome Trust and the Joint Information Systems
Committee (JISC) in the UK. This content is also freely available on PubMed Central.


This content downloaded from 168.176.5.118 on Tue, 18 Feb 2014 10:08:00 AM


All use subject to JSTOR Terms and Conditions

486  BRITISH MEDICAL JOURNAL  VOLUME 283  15 AUGUST 1981

Statistics in Question

SHEILA M GORE

ASSESSING METHODS: DESCRIPTIVE STATISTICS AND GRAPHS
to qualify

are needed
and median

measures

central

as mean

such

COMMENT
are

Authors

data?location,
such
statistics
but
centiles,
of informative
the

FOR
THE

f?fe*^W4f&>^\

??sus?

summary

guidelines
Measures

tesfcrv?^^H

and

Mean
sum

^^^?;?%&

of

the

and

analysis."
This

is about

series

answers

papers
methods

and

Types

of problems

are

how

to

out

about

questions

in medical

when

statistical
particular
in using
them.

snags

mean

This

that

survived

for

at

survival

deal

about

statistics?mean,

descriptive

and

range?explaining
interquartile
measures
these
summary
using

the distribution

of observations

of 347 patients

Survival

time

(years)*

if a fully

It is usually
is not presented.
table or graph
helpful
some
about
how
and
reminders
to present
data,
graphs
are recom?
are given?in
this
scattergrams
particular,

to

do

9-10
10-11
11-12
12-13
13-14
14-15
15-16
16-17
17-18
18-19
19-20
At least 20

mended.

(1) From the table find the most likely (modal) survival time and estimate median survival for the 347 patients with breast cancer who were referred to the department of radiotherapy, Edinburgh, in 1956.
?the
?50%

most

likely

of patients

?the
difference
skewness
?measures

survival
survived

time

is less than one year

for at least four years

(mean-median)

of dispersion

is a crude measure

of

four

with

interquartile

range)

after
years.

breast

dissimilar

diagnosis
The

and
the

sample
reader

size
can
or

asymmetric
of breast

difference

cancer.
(mean

cancer

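Survival time (years)*   Frequency   Cumulative frequency
<1                          62           62
1-2                         45          107
2-3                         38          145
3-4                         28          173
4-5                         25          198
5-6                         10          208
6-7                         14          222
7-8                         11          233
8-9                          9          242
9-10                         8          250
10-11                        8          258
11-12                        8          266
12-13                        9          275
13-14                        5          280
14-15                        2          282
15-16                        3          285
16-17                        4          289
17-18                        7          296
18-19                        1          297
19-20                        3          300
At least 20                 47          347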

*The interval 1-2 years includes survival times of one year and up to, but not including, 2 years. All patients were followed up to the 20th anniversary of first treatment; 300 patients died before 31 December 1976.
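The modal class, median, and quartiles asked for in question 1 can be read off the table by finding the class in which the cumulative frequency first reaches the required share of the 347 patients. The following is a minimal sketch of that reading, not the author's own calculation. It assumes linear interpolation within each one-year class, which is a common convention but is not stated in the article, and the function names are illustrative only.

```python
# Deaths in each one-year class of the table: <1, 1-2, ..., 19-20 years.
FREQUENCIES = [62, 45, 38, 28, 25, 10, 14, 11, 9, 8,
               8, 8, 9, 5, 2, 3, 4, 7, 1, 3]
SURVIVORS_20_PLUS = 47   # open-ended class "At least 20"
CLASS_WIDTH = 1.0        # years


def modal_class(frequencies):
    """Return the index (lower bound in years) of the most frequent class."""
    return max(range(len(frequencies)), key=frequencies.__getitem__)


def grouped_quantile(q, frequencies, open_ended_count, width=CLASS_WIDTH):
    """Estimate the q-th quantile from a grouped table.

    Assumption: observations are spread evenly within a class (linear
    interpolation). Returns None if the quantile falls in the open-ended
    class, where no upper bound is known.
    """
    total = sum(frequencies) + open_ended_count
    target = q * total
    cumulative = 0
    for lower, freq in enumerate(frequencies):
        if freq > 0 and cumulative + freq >= target:
            return lower * width + (target - cumulative) / freq * width
        cumulative += freq
    return None


print(modal_class(FREQUENCIES))                                             # 0, the class "<1 year"
print(round(grouped_quantile(0.50, FREQUENCIES, SURVIVORS_20_PLUS), 2))     # 4.02
print(round(grouped_quantile(0.25, FREQUENCIES, SURVIVORS_20_PLUS), 2))     # 1.55
print(round(grouped_quantile(0.75, FREQUENCIES, SURVIVORS_20_PLUS), 2))     # 11.28
```

Under this assumption the median is about four years and the quartiles are roughly 18 months and 11 years. The upper quartile lies much further from the median than the lower quartile does, which is one sign of positive skewness. Quantiles above about the 86th centile cannot be estimated because they fall among the 47 patients still alive at 20 years.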

(variance,

are

years
seven

6-7

informative
to use

and mode.
median,
The
distributions.

the

<1
1-2
2-3
3-4
4-5
5-6

a great

reveals

Some

of observations
by the number
value
above
which
50% of the
the median
of the distribution.
When

and

and median

exceeded

and
median,
a little
how

even

least

Survival

that ought to be referred to a statistician are

reviews

article

variance
mode,
work
detective

their

skewed. From the table notice that 174 (50%) of the 347 patients

discussed.

also

distribution.

be reported?is
moderate
always
the
distribution
is
underlying

should

Mean

subject
statistical

of

conclusions

hidden

the

appropriate?and

part

important
set

with

integrated

of

aspects

important

divided

true mean,
lie estimates

estimated

deduce

In his 1981 presidential address to the Royal Statistical Society, Professor D R Cox said: "The setting out of conclusions in a way that is vivid, simple, accurate, and integrated with subject matter considerations is a very important part of statistical analysis."
Authors are advised to summarise observations (location, dispersion, skewness) by reporting descriptive statistics such as mean, median, mode, variance, range, percentiles, but too often the summary is presented at the expense of informative tables or graphs. The reader is left to infer from the summary statistics the shape of the underlying distribution. Some guidelines are given for doing this.

Measures of location (or centre) are mean, median, and mode. Mean and median coincide for symmetric distributions.
the

estimates

to

advised

is a crude

median)
positive?three
distribution

measure

of

of

is exaggerated
survivors.

a small

by

skewness.

allows

years?and
survival
time

For

the

is positively

but

cancer

breast
to

reader

it is

infer

that

of

long-term

skewed?mean

important


the

survival

proportion

(2) Mean survival was approximately five years for the 300 patients (see table above) who died before 31 December 1976. Why is it wrong to conclude that patients in Edinburgh with breast cancer have a mean survival time of five years?

?because
Survival

for 347patients

with

cancer-,

breast

summary

the

?five

0)

= Lower

11

quartile or

=
Upper quartile or
75th percentile

q,

most

survival
likely
there may
More
generally
is multimodal.
distribution
two

modes

Five

one

is less than
time, however,
be several maxima,
in which
A

typical

is when

(bimodality)
are
and women

of

example

anthropom?trie
Shoe
size?a

combined.

year.
case the

there

being
observations

20

better

estimator

a more

the

Scattergram

of 49

class

undergraduates

than 20
longer
these
circumstances

be

tion
[Figure: scattergram of shoe size for a class of 49 undergraduates, with separate symbols for male and female students]

plotting
Another

10 11 12 13 14

or

range,
measures

variance,

and

is

which
thus

measure

the

which
as

such

percentiles,

as the

by

25%
10th
and

the

of

the

variance

is the

diagnosis
time

of

breast

was

the figure.
How
much

less
the

had

patients

diagnosed,
three years
of

survival

survivors.
the

interval
survival
years

and

information

within
half

90th.

is

skewed

From

interval

years?is
between

being

is a great
was

for

75%

of
it

is

table

we

18months
of

are

patients
shown
in

deal
less

table

is conveyed
by the
:
if you follow
the clues
than one
of
year,
25%

of breast

four

to positive
the

considerably
the median
because
treated

care, or severity
could
have
changed.

reliable
distribu?
It

is also

data

entails

article.
if there

1956

need
more
or

of disease,

is a time
not
than

have
two

treatment,

skewness

is the

fact

and the median?


quartile
than
the
longer
corresponding
and the lower quartile.
Long-term
were
of patients
still alive
25%

that

upper

for breast

cancer.

is a misleading
in figure A

represents

?neither

of

representation

the

given

in a histogram

frequency

tumour

that

mentions

figure

size

was

not

for 50 patients

recorded

quartiles
lie?or

the

quartiles

?area

cancer having
been
mean
Because
survival
is
years.
the median
survival
the distribution
time,
so there
are
skewed
and
long-term
18 months

pointer
between

is confirmed
after

that
The

in the

time

within

than
longer
is positively
Another

time

seven

died

a later

referred

patients

patients

lifetimes.

in

referred

because

47

survival

survival

is that

Patient

?no,
figure
information

the best

also the
reporting
of
the observations

and

11 years.

than

answer
alone
? The
figure
survival
the most
likely

patients
as
pattern

survival

of

cautious

being

the

incomplete
of

then

at referral

COMMENT
Tumour size was measured clinically and rounded up to the nearest centimetre. Three patients had no measurable mass, and tumour size was not recorded for 50 out of 347 patients.
of

for

survival
later.

is

measure

best

estimate that 25% of the 347 patients had died within


cancer

and

observations?is

distribution

underlying
the median

to qualify
and
below

important
?above

root

the mean

when

dispersion
the
When

location.

square
units

same

in the

of

are

because

still
these

is a more

survival

the mean

there

for

diagnosis

Complete
reporting
and is the subject
life-tables

same

age

and

from

find.

years

an underestimate

(3) Do figures A and B give the same information about tumour size?

interquartile
location.1
An

to supplement
of
range. They
measure
of dispersion
should
be reported
appropriate
alongside
or mode,
size is needed
for inter?
mean,
median,
just as sample
or percentages.
Variance
is the
proportions
average
preting
the mean
from
deviation
and as such
it?or
standard
squared
deviation,
measured

by

years.
median

than

reason

in

the

location

skewed
to

easier

trend
o

is right-censored?
patients
survival
times
exceed
20 years.
these patients
20
with
crediting

size

are

of dispersion
are needed

Measures

survival

is

decades
Shoe

after

of

from

calculation

these
their

but

measure,
times

survival

measure

time

the

so obtaining

and

realistic

actual
In

for a

size

shoe

survival
from

(300 × 5 years + 47 × 20 years)/347 ≈ 7 years
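The arithmetic above can be checked against the survival table. The sketch below is a rough verification, not the author's method. It assumes each death occurred at the midpoint of its one-year class, which is an approximation introduced here, and it credits the 47 long-term survivors with exactly 20 years, as the formula does. The function names are illustrative.

```python
# Deaths in each one-year class of the table: <1, 1-2, ..., 19-20 years.
FREQUENCIES = [62, 45, 38, 28, 25, 10, 14, 11, 9, 8,
               8, 8, 9, 5, 2, 3, 4, 7, 1, 3]
SURVIVORS = 47          # patients still alive at the 20th anniversary
FOLLOW_UP_YEARS = 20


def years_lived_by_deceased(frequencies):
    """Total survival of the deceased, assuming death at each class midpoint."""
    return sum((lower + 0.5) * freq for lower, freq in enumerate(frequencies))


def mean_of_deceased(frequencies):
    """Mean survival among patients who died within the follow-up period."""
    return years_lived_by_deceased(frequencies) / sum(frequencies)


def lower_bound_mean(frequencies, survivors, follow_up):
    """Mean for all patients when each survivor is credited only with the
    follow-up time. Survivors lived at least that long, so the true mean
    can only be larger."""
    total_patients = sum(frequencies) + survivors
    total_years = years_lived_by_deceased(frequencies) + survivors * follow_up
    return total_years / total_patients


print(round(mean_of_deceased(FREQUENCIES), 2))                              # 4.94
print(round(lower_bound_mean(FREQUENCIES, SURVIVORS, FOLLOW_UP_YEARS), 2))  # 6.98
print(round((300 * 5 + 47 * 20) / 347, 2))                                  # 7.03
```

The midpoint calculation agrees with the article's "approximately five years" for the 300 deaths. Both versions of the second figure round to seven years, and both remain underestimates because the 47 survivors are credited with only 20 years.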


as
must

of

that
only
is derived

know

of foot

reliable

for at least
is, they survived
at
discontinued
the
20th

was

follow-up
on
information

survival

of mean
excluded

survivors?that

long-term
Since

anniversary
that is, we
years'

Shoe size (a measure of foot length) is shown below for a class of 49 undergraduates. There are two modes: shoe sizes 6 and 9 correspond to the mode for women and for men respectively.
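A single reported mode hides this kind of bimodality. The sketch below finds every local maximum of a frequency table. The counts are hypothetical, invented for illustration only. They are not the article's data and were chosen merely to total 49 with peaks at sizes 6 and 9. The function name is also illustrative.

```python
import statistics

# HYPOTHETICAL counts by shoe size (not taken from the article).
SHOE_SIZE_COUNTS = {4: 2, 5: 6, 6: 10, 7: 5, 8: 7, 9: 11, 10: 6, 11: 2}


def local_modes(counts):
    """Return every value whose count exceeds both neighbouring counts.

    Neighbours are the adjacent integer values; a missing neighbour
    counts as zero.
    """
    modes = []
    for value in sorted(counts):
        here = counts[value]
        left = counts.get(value - 1, 0)
        right = counts.get(value + 1, 0)
        if here > left and here > right:
            modes.append(value)
    return modes


# Expand the table into one observation per student.
observations = [size for size, n in SHOE_SIZE_COUNTS.items() for _ in range(n)]

print(len(observations))              # 49
print(statistics.mode(observations))  # 9: the single most frequent value
print(local_modes(SHOE_SIZE_COUNTS))  # [6, 9]: both peaks
```

With real data, small random bumps would also register as local maxima, so a plot of the distribution remains the safer way to judge whether two modes are genuine.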

is an underestimate
years
the 47 patients
because

years.

measure

crude

There

a more

is

COMMENT

diagnosis
were
the

for men

survival

20 years

25thpercentile

The

of

underestimate

q^

serious

diagnosis

median

case,

any

therefore

survivors

long-term

ignored

time from

years

of five

measure

I
1

is

survival

?in

time

calculation

years

mean

survival

not

Neither figure A nor figure B mentions these 50 missing observations. Figure A plots tumour size rounded up to the nearest centimetre, so that the 29 patients for whom a tumour size of 2 cm was recorded in figure A had tumours that measured more than 1 cm and up to 2 cm. Recorded tumour size is therefore on average ½ cm more than actual tumour size. Figure B purports to show the same data as figure A, but is misleading. The area of the bar over tumour size 10-15 cm represents a frequency of 40 patients, five times as many patients as really had a tumour that large. The bar over 15-20 cm is also in error by a factor of 5. Area, not height, represents frequency in a histogram. In figure B, moreover, it is not possible to separate the three patients who had no measurable tumour mass from the 12 patients whose tumour measured up to 1 cm.
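The factor of 5 follows directly from the rule that area, not height, represents frequency. The sketch below shows the rule as a calculation. The count of 8 patients in the 10-15 cm class is inferred here from the statement that 40 is five times the true number. It is not quoted in the article, and the function names are illustrative.

```python
def bar_heights(class_edges, frequencies):
    """Frequency density for each class: height = frequency / class width.

    Drawn at these heights, each bar's area equals its frequency.
    """
    widths = [hi - lo for lo, hi in zip(class_edges, class_edges[1:])]
    return [freq / width for freq, width in zip(frequencies, widths)]


def implied_frequencies(class_edges, heights):
    """Frequency a reader infers from each bar's area (height x width)."""
    widths = [hi - lo for lo, hi in zip(class_edges, class_edges[1:])]
    return [height * width for height, width in zip(heights, widths)]


edges = [10, 15]     # one class, 5 cm wide
true_counts = [8]    # inferred: 40 patients is five times the real number

print(bar_heights(edges, true_counts))          # [1.6] patients per cm, the correct height
print(implied_frequencies(edges, true_counts))  # [40], what plotting the raw count as height suggests
```

When all classes have the same width the two conventions differ only by a constant, which is why the error appears only when classes are merged into wider intervals.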

11

than


[Figures A and B: frequency of tumour size (cm), rounded up to the nearest cm; horizontal axis 0 to 20 cm; key: unit frequency]

Lowe et al3 studied the relation between the extent of coronary artery disease and blood viscosity in 75 men aged 30 to 55 years. Fifty men were studied before they underwent selective coronary arteriography for assessment of chest pain; the other 25 had been admitted for elective minor surgery and served as controls. In a single vivid scattergram Lowe et al showed comprehensively the association between packed cell volume and blood viscosity for the asymptomatic controls, for patients with no stenosis or stenosis of one vessel only, and for patients with stenosis of two or three major coronary vessels.

[Figure: scattergram of blood viscosity against packed cell volume. The key distinguishes controls, patients with no stenosis or stenosis of one vessel only, and patients with stenosis of two or three vessels.]
Blood viscosity and packed cell volume.
Conversion: SI to traditional units. Blood viscosity: 1 mPa s = 1 cP.

(4) What is the minimum information that should be given with a graph?
- title, labelled axes, scale, key to symbols
- distinct points should represent distinct measurements; in particular, bilateral observations (ocular tension in right and left eye) should not be represented as unrelated
- scattergrams are recommended for exploring and reporting data

COMMENT
Graphs should have a title, axes should be labelled, and scales given. Breaks in the scale should be clearly marked and attention drawn to outlying or rogue observations. Healy2 has written a good article on informal graphical methods for detecting rogue observations. Different symbols that appear in the graph should be defined in a key, or as a footnote to the graph.

Scattergrams are an excellent device for exploring and reporting data, particularly for showing the association of one variable with another. Pocock et al4 showed in this way the negative association of standardised mortality ratio and water hardness in 234 towns and also used maps to good effect to illustrate the findings of the British Regional Heart Study.

Editors often advise authors that graphs should be interpretable without referring to the text. This is a good maxim. Another important point is that repeated observations on the same patient should be identified on the graph as being related, and should certainly not be represented as though they were measurements on several different patients. If several similar plots are reported together, comparisons can be made most easily if the symbols and scale are consistent from one graph to the next. The choice of scale is important. Transformation of data is discussed in the next article.

I thank G D O Lowe et al for permission to reproduce the scattergram under question 4.

References

1 Gore SM, Jones IG, Rytter EC. Misuse of statistical methods: critical assessment of articles in BMJ from January to March 1976. Br Med J 1977;i:85-7.
2 Healy MJR. The disciplining of medical data. Br Med Bull 1968;24:210-4.
3 Lowe GDO, Drummond MM, Lorimer AR, et al. Relation between extent of coronary artery disease and blood viscosity. Br Med J 1980;280:673-4.
4 Pocock SJ, Shaper AG, Cook DG, et al. British Regional Heart Study: geographic variations in cardiovascular mortality, and the role of water quality. Br Med J 1980;280:1243-9.
Sheila M Gore, MA, is a statistician at the MRC Biostatistics Unit, Medical Research Council Centre, Hills Road, Cambridge CB2 2QH.

No reprints will be available from the author.
