# Descriptive Statistics

## Introduction and descriptive statistics

Why study statistics?
- To be able to understand and judge the scientific literature
- To be able to analyze one's own data

Role of medical statistics in medical research: statistics plays a role at all stages, from planning, design, and data collection through data processing, data analysis, data presentation, interpretation, and publication.

We want to make general statements about a wider set of subjects than our study group. How can we draw conclusions about a population on the basis of a sample? Statistical theory assumes random sampling; in practice, sampling is almost never truly random.

Population → sample: one or more variables are observed in a sample drawn from the population.

Statistics has two branches:
- Descriptive statistics (to describe): summarizing, presenting, and exploring data in a sample with tables, graphs, and characteristic numbers such as the mean, median, SD, and range
- Inferential statistics (to infer): statements about unknown population parameters (statistical modeling: parameter estimates, confidence intervals, and hypothesis testing)

Topics
- The role of statistics in medical research
- Descriptive statistics
- Data scales
- Central tendency
- Dispersion
- The normal (Gaussian) curve
- Data presentation

## The role of statistics in medical research
## The role of statistics in medical research

- To calculate the sample size
- To test the validity and reliability of a questionnaire
- To choose data presentation techniques
- To analyze data and test hypotheses (inferential statistics)

## Types of variables

The choice of methods is determined by the type of variables:
- By role: dependent or independent
- By scale: categorical or numeric

Data can be:
- Categorical / qualitative
  - Nominal (unordered categories), e.g. gender, smoker/non-smoker, blood type (O, A, B, AB), marital status
  - Ordinal (ordered categories), e.g. education, disease stages I, II, III, IV, level of knowledge
- Numeric / quantitative
  - Discrete, e.g. number of persons in a household, number of white blood cells
  - Continuous, e.g. height, age, blood pressure

## Summarizing and presenting data of a single variable

Population and sample distributions. A distribution describes how often the possible values of a variable occur in the population or in the sample. If the sample is random, the sample distribution gives an impression of the unknown population distribution.

Example (sex): the relative frequencies in the population (in practice unknown) are 0.51 males and 0.49 females. If a sample of 100 contains 55 males and 45 females, the relative frequencies of the sample distribution are 0.55 males and 0.45 females.
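A relative frequency is a category's count divided by the sample size. The minimal Python sketch below reproduces the sex example, using the 55 males and 45 females from the text. The function name `relative_frequencies` is illustrative only.

```python
from collections import Counter

def relative_frequencies(observations):
    """Return each category's share of the sample (count / n)."""
    counts = Counter(observations)
    n = sum(counts.values())
    return {category: count / n for category, count in counts.items()}

# Sample of 100 subjects: 55 males and 45 females, as in the example above.
sample = ["male"] * 55 + ["female"] * 45
freqs = relative_frequencies(sample)
print(freqs)  # {'male': 0.55, 'female': 0.45}
```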
Categorical data
- Table: frequency table, by category
- Graph: bar chart, pie chart, etc.

Numerical data
- Table: frequency table
- Graph: histogram, stem-and-leaf plot, box plot, etc.
## Summary measures

Summary measures characterize the distribution of numerical data:
- Location / central tendency: mean, median, mode
- Spread / variability: SD, interquartile range, percentiles, etc.
## The age (in years) of 30 nurses in a hospital in Jakarta

Data:

29, 30, 26, 24, 35, 26, 43, 38, 38, 37, 35, 33, 41, 31, 32, 38, 27, 31, 27, 26, 28, 26, 32, 26, 52, 37, 27, 32, 37, 30
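The summary measures in the SPSS output for these ages can be reproduced with Python's standard `statistics` module. This is a minimal sketch. `statistics.stdev` divides by n − 1, matching the sample SD used in this course. The mode is 26, which appears five times.

```python
import math
import statistics

# Ages (years) of the 30 nurses listed above.
ages = [29, 30, 26, 24, 35, 26, 43, 38, 38, 37, 35, 33, 41, 31, 32,
        38, 27, 31, 27, 26, 28, 26, 32, 26, 52, 37, 27, 32, 37, 30]

n = len(ages)
mean = statistics.mean(ages)      # arithmetic mean
median = statistics.median(ages)  # midpoint of the sorted values
mode = statistics.mode(ages)      # most frequent value
sd = statistics.stdev(ages)       # sample SD (divides by n - 1)
se = sd / math.sqrt(n)            # standard error of the mean

print(f"n={n} mean={mean:.4f} median={median} mode={mode}")
print(f"SD={sd:.5f} SE={se:.5f} min={min(ages)} max={max(ages)}")
```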

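Central tendency
- Mean: $\bar{x}$ = arithmetic mean of the sample; $\mu$ = mean of the population (the true mean)
- Median: the midpoint of the ordered values
- Mode: the most frequent value

$$\bar{x} = \frac{\sum x}{n}$$

Dispersion
- SD = standard deviation of the sample; $\sigma$ = standard deviation of the population
- SD² = variance of the sample; $\sigma^2$ = variance of the population
- $SE = SD/\sqrt{n}$ (standard error of the mean)

$$SD^2 = \frac{\sum (x - \bar{x})^2}{n - 1} = \frac{\sum x^2 - \left(\sum x\right)^2 / n}{n - 1}$$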
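The sample variance has two equivalent forms: the definitional formula (squared deviations from the mean) and the computational shortcut (sum of squares minus the squared sum divided by n). This sketch implements both directly. The function names are hypothetical. For the nurse ages, both forms should give SPSS's 39.706.

```python
import math

def sample_variance(values):
    """SD^2 = sum((x - mean)^2) / (n - 1): the definitional formula."""
    n = len(values)
    mean = sum(values) / n
    return sum((x - mean) ** 2 for x in values) / (n - 1)

def sample_variance_shortcut(values):
    """SD^2 = (sum(x^2) - (sum(x))^2 / n) / (n - 1): the computational formula."""
    n = len(values)
    total = sum(values)
    return (sum(x * x for x in values) - total * total / n) / (n - 1)

ages = [29, 30, 26, 24, 35, 26, 43, 38, 38, 37, 35, 33, 41, 31, 32,
        38, 27, 31, 27, 26, 28, 26, 32, 26, 52, 37, 27, 32, 37, 30]

v1 = sample_variance(ages)
v2 = sample_variance_shortcut(ages)
print(f"variance: {v1:.3f} (definitional), {v2:.3f} (shortcut); SD = {math.sqrt(v1):.5f}")
```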
Results of SPSS

Descriptive Statistics

| | N | Minimum | Maximum | Mean | Std. Deviation |
|---|---|---|---|---|---|
| age | 30 | 24.00 | 52.00 | 32.4667 | 6.30125 |
| Valid N (listwise) | 30 | | | | |

AGE_CAT

| Age category | Frequency | Percent | Valid Percent | Cumulative Percent |
|---|---|---|---|---|
| ≤ 30 | 13 | 43.3 | 43.3 | 43.3 |
| 31–39 | 14 | 46.7 | 46.7 | 90.0 |
| ≥ 40 | 3 | 10.0 | 10.0 | 100.0 |
| Total | 30 | 100.0 | 100.0 | |
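A frequency table with percent and cumulative percent can be built from the raw ages. This sketch reproduces the SPSS counts of 13, 14 and 3 per age group. Those counts come out only if age 30 falls in the first group, so the sketch assumes the cut-points ≤ 30, 31–39 and ≥ 40. The helper `age_category` is hypothetical.

```python
from collections import Counter

ages = [29, 30, 26, 24, 35, 26, 43, 38, 38, 37, 35, 33, 41, 31, 32,
        38, 27, 31, 27, 26, 28, 26, 32, 26, 52, 37, 27, 32, 37, 30]

def age_category(age):
    """Assumed cut-points: <= 30, 31-39, >= 40 (age 30 goes to the first group)."""
    if age <= 30:
        return "<= 30"
    if age <= 39:
        return "31-39"
    return ">= 40"

ORDER = ["<= 30", "31-39", ">= 40"]
counts = Counter(age_category(a) for a in ages)
n = len(ages)

cumulative = 0.0
for label in ORDER:
    percent = 100 * counts[label] / n   # percent of all nurses
    cumulative += percent               # running (cumulative) percent
    print(f"{label:>6} {counts[label]:3d} {percent:6.1f} {cumulative:6.1f}")
```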
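Descriptives

| age | Statistic | Std. Error |
|---|---|---|
| Mean | 32.4667 | 1.15045 |
| 95% Confidence Interval for Mean: Lower Bound | 30.1137 | |
| 95% Confidence Interval for Mean: Upper Bound | 34.8196 | |
| 5% Trimmed Mean | 31.9815 | |
| Median | 31.5000 | |
| Variance | 39.706 | |
| Std. Deviation | 6.30125 | |
| Minimum | 24.00 | |
| Maximum | 52.00 | |
| Range | 28.00 | |
| Interquartile Range | 10.0000 | |
| Skewness | 1.100 | 0.427 |
| Kurtosis | 1.628 | 0.833 |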
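Two entries in this table can be checked by hand: the interquartile range and the 95% CI for the mean. The CI bounds 30.1137 and 34.8196 equal 32.4667 ± 2.045 × 1.15045. So SPSS is using the t distribution with n − 1 = 29 degrees of freedom here, not 1.96. The sketch below hard-codes t = 2.045 from a t table, because the standard library has no t quantile function.

For the quartiles, it uses `statistics.quantiles(..., method="exclusive")`. That method places percentiles at p(n + 1), which is assumed to match SPSS's default weighted-average definition. For these ages it gives Q1 = 27 and Q3 = 37.

```python
import math
import statistics

ages = [29, 30, 26, 24, 35, 26, 43, 38, 38, 37, 35, 33, 41, 31, 32,
        38, 27, 31, 27, 26, 28, 26, 32, 26, 52, 37, 27, 32, 37, 30]
n = len(ages)
mean = statistics.mean(ages)
se = statistics.stdev(ages) / math.sqrt(n)

# Quartiles at positions p(n + 1), then IQR = Q3 - Q1.
q1, q2, q3 = statistics.quantiles(ages, n=4, method="exclusive")
iqr = q3 - q1

# Two-sided 95% critical value of t with 29 df, taken from a t table.
T_CRIT_29DF = 2.045
lower, upper = mean - T_CRIT_29DF * se, mean + T_CRIT_29DF * se

print(f"Q1={q1} median={q2} Q3={q3} IQR={iqr}")
print(f"95% CI for the mean: {lower:.2f} to {upper:.2f}")
```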
## Worked example: values 192, 197, 200, 202, 209

- Mean = (192 + 197 + 200 + 202 + 209)/5 = 1000/5 = 200
- Median = 200
- Range = 209 − 192 = 17

$$\text{Variance} = \frac{(192-200)^2 + (197-200)^2 + (200-200)^2 + (202-200)^2 + (209-200)^2}{4} = \frac{64 + 9 + 0 + 4 + 81}{4} = \frac{158}{4} = 39.5$$

- Standard deviation = $\sqrt{39.5} \approx 6.3$
- SE = $6.3/\sqrt{5} \approx 2.81$
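The same hand calculation can be written in a few lines of Python as a quick check. The unrounded SD is about 6.28; the slide rounds it to 6.3.

```python
import math
import statistics

values = [192, 197, 200, 202, 209]

mean = statistics.mean(values)           # 1000 / 5 = 200
median = statistics.median(values)       # middle value of the sorted list
value_range = max(values) - min(values)  # 209 - 192 = 17
variance = statistics.variance(values)   # 158 / (5 - 1) = 39.5
sd = math.sqrt(variance)                 # about 6.28
se = sd / math.sqrt(len(values))         # about 2.81

print(mean, median, value_range, variance, round(sd, 2), round(se, 2))
```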

## How precise are these estimates?

The precision of a sample estimate of a population parameter is characterized by its standard error (SE). Imagine repeating the study infinitely many times and computing the estimate each time: the results form the sampling distribution of the estimate. The SE is the standard deviation of that distribution, and it is the basis for the confidence interval (CI).
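## How do we know which measures of location and spread to use?

- Normally distributed data: mean and SD
- Not normally distributed data: median and percentiles, or minimum and maximum

## Is there any relation with statistical tests?

Yes. Normally distributed data are analyzed with parametric tests; data that are not normally distributed are analyzed with non-parametric tests.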

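## Sampling distribution

If (almost) infinitely many samples are drawn, the sample means and SDs vary from sample to sample and follow a sampling distribution. The sampling distribution of the mean is centred on the population mean μ, and its spread depends on the population SD σ (SE = σ/√n).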
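A small simulation makes the sampling distribution concrete. The sketch draws many samples from a hypothetical normal population (μ = 50, σ = 10, n = 25, all assumed values) and collects the sample means. Their average should sit close to μ. Their SD should sit close to σ/√n = 2. The helper name `simulate_sample_means` is illustrative.

```python
import random
import statistics

def simulate_sample_means(mu, sigma, n, repetitions, seed=1):
    """Draw `repetitions` samples of size n from N(mu, sigma) and return their means."""
    rng = random.Random(seed)
    return [statistics.fmean(rng.gauss(mu, sigma) for _ in range(n))
            for _ in range(repetitions)]

# Hypothetical population: mean 50, SD 10; samples of size 25.
means = simulate_sample_means(mu=50, sigma=10, n=25, repetitions=5000)
print(f"mean of the sample means: {statistics.fmean(means):.2f}")  # close to 50
print(f"SD of the sample means:   {statistics.stdev(means):.2f}")  # close to 10/sqrt(25) = 2
```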

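## Normal curve

- Symmetric
- Mesokurtic
- Asymptotic: the tails approach, but never touch, the horizontal axis; practically all of the area (about 99.7%) lies within ±3 standard deviations of the mean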

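## Normal distribution = Z distribution

Standardizing a value $x$ with

$$Z = \frac{x - \mu}{\sigma}$$

turns a normal distribution with mean $\mu$ into the standard normal (Z) distribution, with mean 0 and SD 1. About 68% of the area under the curve (AUC) lies between Z = −1 and +1, and 95% lies between Z = −1.96 and +1.96.

[Figure: normal curve with the x-axis marked at μ (Z = 0), ±1 and ±1.96]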
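The areas quoted for the standard normal curve can be checked with `statistics.NormalDist` from the Python standard library. In the sketch, `z_score` and `central_area` are illustrative helpers. The values x = 70, μ = 50, σ = 10 in the last line are a hypothetical example.

```python
from statistics import NormalDist

def z_score(x, mu, sigma):
    """Standardize a value: Z = (x - mu) / sigma."""
    return (x - mu) / sigma

standard_normal = NormalDist(mu=0, sigma=1)

def central_area(z):
    """Area under the standard normal curve between -z and +z."""
    return standard_normal.cdf(z) - standard_normal.cdf(-z)

for z in (1, 1.96, 3):
    print(f"area between -{z} and +{z}: {central_area(z):.4f}")

# The Z value that leaves 2.5% in the upper tail (about 1.96).
print(f"z for a two-sided 95% range: {standard_normal.inv_cdf(0.975):.2f}")

# Hypothetical example: x = 70 with mu = 50 and sigma = 10 gives Z = 2.0.
print(z_score(70, 50, 10))
```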

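## Checking for a normal distribution

Coefficient of variation: COV = (SD/mean) × 100%. A COV below 20% is taken as a sign of normality. For the five values above, 6.3/200 × 100% = 3.15%, which is below 20% and suggests a normal distribution.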
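The coefficient of variation is a one-line calculation. The sketch applies it to the five worked-example values; the 20% threshold is this course's rule of thumb, not a library constant. Because the unrounded SD is about 6.28, the result is about 3.14% rather than the 3.15% obtained from the rounded 6.3.

```python
import statistics

def coefficient_of_variation(values):
    """COV = (sample SD / mean) x 100%."""
    return statistics.stdev(values) / statistics.mean(values) * 100

values = [192, 197, 200, 202, 209]
cv = coefficient_of_variation(values)
print(f"COV = {cv:.2f}%  (below 20%: {cv < 20})")
```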

Data are considered normally distributed when:
- Skewness ratio = skewness / SE of skewness lies between −2 and +2
- Kurtosis ratio = kurtosis / SE of kurtosis lies between −2 and +2
- Histogram: symmetric
- Box plot: symmetric
- Kolmogorov–Smirnov or Shapiro–Wilk test: p > 0.05
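The skewness and kurtosis ratios can be computed without SPSS. The sketch uses the bias-adjusted sample skewness (G1) and kurtosis (G2) with their usual standard errors. Treating these as the formulas behind the SPSS output is an assumption. Still, for the 30 nurse ages they reproduce skewness 1.100 (SE 0.427) and kurtosis 1.628 (SE 0.833). The function name is hypothetical.

```python
import math

def skewness_kurtosis(values):
    """Return (skewness, SE of skewness, kurtosis, SE of kurtosis) using the
    bias-adjusted G1/G2 formulas (assumed to match the SPSS output above)."""
    n = len(values)
    mean = sum(values) / n
    m2 = sum((x - mean) ** 2 for x in values) / n  # central moments
    m3 = sum((x - mean) ** 3 for x in values) / n
    m4 = sum((x - mean) ** 4 for x in values) / n
    g1 = m3 / m2 ** 1.5
    g2 = m4 / m2 ** 2 - 3
    skew = g1 * math.sqrt(n * (n - 1)) / (n - 2)
    kurt = ((n + 1) * g2 + 6) * (n - 1) / ((n - 2) * (n - 3))
    se_skew = math.sqrt(6 * n * (n - 1) / ((n - 2) * (n + 1) * (n + 3)))
    se_kurt = 2 * se_skew * math.sqrt((n * n - 1) / ((n - 3) * (n + 5)))
    return skew, se_skew, kurt, se_kurt

ages = [29, 30, 26, 24, 35, 26, 43, 38, 38, 37, 35, 33, 41, 31, 32,
        38, 27, 31, 27, 26, 28, 26, 32, 26, 52, 37, 27, 32, 37, 30]
skew, se_skew, kurt, se_kurt = skewness_kurtosis(ages)
print(f"skewness {skew:.3f} (SE {se_skew:.3f}), ratio {skew / se_skew:.2f}")
print(f"kurtosis {kurt:.3f} (SE {se_kurt:.3f}), ratio {kurt / se_kurt:.2f}")
```

Here the skewness ratio (about 2.6) falls outside ±2, while the kurtosis ratio (about 1.95) stays inside.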

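Tests of Normality

| | Kolmogorov–Smirnov(a) Statistic | df | Sig. | Shapiro–Wilk Statistic | df | Sig. |
|---|---|---|---|---|---|---|
| age | 0.130 | 30 | 0.200(*) | 0.910 | 30 | 0.015 |

(a) Lilliefors significance correction. (*) This is a lower bound of the true significance.

For the nurse ages, the Kolmogorov–Smirnov test gives p = 0.200. However, the Shapiro–Wilk test gives p = 0.015 < 0.05, and the skewness ratio is 1.100/0.427 ≈ 2.58 > 2. The kurtosis ratio, 1.628/0.833 ≈ 1.95, is within ±2. Overall, the ages are not normally distributed.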

age Stem-and-Leaf Plot

| Frequency | Stem | Leaf |
|---|---|---|
| 1 | 2 | 4 |
| 10 | 2 | 6666677789 |
| 8 | 3 | 00112223 |
| 8 | 3 | 55777888 |
| 2 | 4 | 13 |
| 1 | Extremes | (≥ 52) |

Stem width: 10; each leaf: 1 case.
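A basic stem-and-leaf plot can be generated in a few lines. This sketch assumes integer data and a stem width of 10, and it splits each stem into leaves 0–4 and 5–9 as in the SPSS output. Unlike SPSS, it does not separate extremes, so 52 appears as its own row. The function name is hypothetical.

```python
from collections import defaultdict

def stem_and_leaf(values):
    """Return stem-and-leaf rows for integer data (stem width 10),
    splitting each stem into leaves 0-4 and 5-9; extremes are not separated."""
    rows = defaultdict(list)
    for v in sorted(values):
        stem, leaf = divmod(v, 10)
        rows[(stem, leaf >= 5)].append(str(leaf))
    return [f"{len(leaves):4d}  {stem} . {''.join(leaves)}"
            for (stem, _), leaves in sorted(rows.items())]

ages = [29, 30, 26, 24, 35, 26, 43, 38, 38, 37, 35, 33, 41, 31, 32,
        38, 27, 31, 27, 26, 28, 26, 32, 26, 52, 37, 27, 32, 37, 30]
lines = stem_and_leaf(ages)
print("\n".join(lines))
```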

[Figure: box plot of age, N = 30]

## Menopause data: women aged 40 years and over

Frequency table of Umur (age, years)

| Umur | Frequency | Percent | Valid Percent | Cumulative Percent |
|---|---|---|---|---|
| 40 | 31 | 15.5 | 15.5 | 15.5 |
| 41 | 9 | 4.5 | 4.5 | 20.0 |
| 42 | 16 | 8.0 | 8.0 | 28.0 |
| 43 | 8 | 4.0 | 4.0 | 32.0 |
| 44 | 5 | 2.5 | 2.5 | 34.5 |
| 45 | 12 | 6.0 | 6.0 | 40.5 |
| 46 | 6 | 3.0 | 3.0 | 43.5 |
| 47 | 7 | 3.5 | 3.5 | 47.0 |
| 48 | 7 | 3.5 | 3.5 | 50.5 |
| 49 | 5 | 2.5 | 2.5 | 53.0 |
| 50 | 14 | 7.0 | 7.0 | 60.0 |
| 51 | 5 | 2.5 | 2.5 | 62.5 |
| 52 | 5 | 2.5 | 2.5 | 65.0 |
| 53 | 4 | 2.0 | 2.0 | 67.0 |
| 54 | 2 | 1.0 | 1.0 | 68.0 |
| 55 | 5 | 2.5 | 2.5 | 70.5 |
| 56 | 5 | 2.5 | 2.5 | 73.0 |
| 57 | 5 | 2.5 | 2.5 | 75.5 |
| 58 | 6 | 3.0 | 3.0 | 78.5 |
| 60 | 10 | 5.0 | 5.0 | 83.5 |
| 61 | 3 | 1.5 | 1.5 | 85.0 |
| 62 | 1 | 0.5 | 0.5 | 85.5 |
| 63 | 4 | 2.0 | 2.0 | 87.5 |
| 64 | 2 | 1.0 | 1.0 | 88.5 |
| 65 | 9 | 4.5 | 4.5 | 93.0 |
| 66 | 2 | 1.0 | 1.0 | 94.0 |
| 67 | 2 | 1.0 | 1.0 | 95.0 |
| 68 | 2 | 1.0 | 1.0 | 96.0 |
| 70 | 5 | 2.5 | 2.5 | 98.5 |
| 73 | 1 | 0.5 | 0.5 | 99.0 |
| 79 | 1 | 0.5 | 0.5 | 99.5 |
| 85 | 1 | 0.5 | 0.5 | 100.0 |
| Total | 200 | 100.0 | 100.0 | |

Zarni amri

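Descriptives

| Umur | Statistic | Std. Error |
|---|---|---|
| Mean | 50.47 | 0.67 |
| 95% Confidence Interval for Mean: Lower Bound | 49.14 | |
| 95% Confidence Interval for Mean: Upper Bound | 51.79 | |
| 5% Trimmed Mean | 49.83 | |
| Median | 48.00 | |
| Variance | 89.617 | |
| Std. Deviation | 9.47 | |
| Minimum | 40 | |
| Maximum | 85 | |
| Range | 45 | |
| Interquartile Range | 15.00 | |
| Skewness | 0.836 | 0.172 |
| Kurtosis | 0.076 | 0.342 |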
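These SPSS figures can be checked by expanding the frequency table of Umur back into 200 individual ages. The dictionary below is transcribed from that table. The quartiles use `method="exclusive"`, assumed here to match the SPSS percentile definition; this gives an IQR of 57 − 42 = 15.

```python
import statistics

# Age (years): number of women, transcribed from the Umur frequency table.
FREQUENCIES = {
    40: 31, 41: 9, 42: 16, 43: 8, 44: 5, 45: 12, 46: 6, 47: 7, 48: 7, 49: 5,
    50: 14, 51: 5, 52: 5, 53: 4, 54: 2, 55: 5, 56: 5, 57: 5, 58: 6, 60: 10,
    61: 3, 62: 1, 63: 4, 64: 2, 65: 9, 66: 2, 67: 2, 68: 2, 70: 5, 73: 1,
    79: 1, 85: 1,
}

# Expand the table back into one value per woman.
ages = [age for age, count in FREQUENCIES.items() for _ in range(count)]

n = len(ages)
mean = statistics.mean(ages)
median = statistics.median(ages)
variance = statistics.variance(ages)
q1, _, q3 = statistics.quantiles(ages, n=4, method="exclusive")

print(f"n={n} mean={mean:.3f} median={median} variance={variance:.3f} IQR={q3 - q1}")
```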

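## Normal? Skewness and kurtosis ratios

A distribution is considered normal when the skewness and kurtosis ratios (statistic/SE) lie between −2 and +2. For Umur, the skewness ratio is 0.836/0.172 ≈ 4.86, outside ±2. The kurtosis ratio is 0.076/0.342 ≈ 0.22.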

[Figure: histogram of Umur (age); Mean = 50.5, Std. Dev = 9.47, N = 200]

Umur Stem-and-Leaf Plot

| Frequency | Stem | Leaf |
|---|---|---|
| 40 | 4 | 0000000000000000000000000000000111111111 |
| 24 | 4 | 222222222222222233333333 |
| 17 | 4 | 44444555555555555 |
| 13 | 4 | 6666667777777 |
| 12 | 4 | 888888899999 |
| 19 | 5 | 0000000000000011111 |
| 9 | 5 | 222223333 |
| 7 | 5 | 4455555 |
| 10 | 5 | 6666677777 |
| 6 | 5 | 888888 |
| 13 | 6 | 0000000000111 |
| 5 | 6 | 23333 |
| 11 | 6 | 44555555555 |
| 4 | 6 | 6677 |
| 2 | 6 | 88 |
| 5 | 7 | 00000 |
| 1 | 7 | 3 |
| 0 | 7 | |
| 0 | 7 | |
| 1 | 7 | 9 |
| 1 | Extremes | (≥ 85) |

Stem width: 10; each leaf: 1 case.

## Box plot and percentiles

[Figure: box plot of Umur (age), N = 200, with the 25th, 50th and 75th percentiles marked]

## Not normally distributed: report the median and min–max

## 95% Confidence Interval (1)

Example: n = 18 patients with stable angina pectoris; X = total cholesterol.

- Sample mean $\bar{x}$ = 5.81, SD = 1.20
- $SE(\bar{x}) = SD/\sqrt{n} = 1.20/\sqrt{18} = 0.28$
- 95% CI = $\bar{x} \pm 1.96 \times SE = 5.81 \pm 1.96 \times 0.28 = 5.81 \pm 0.55$ = (5.26, 6.36)
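The interval calculation can be wrapped in a small helper. The sketch follows the slide and uses the normal multiplier 1.96. The function name `mean_ci_95` is hypothetical.

```python
import math

def mean_ci_95(mean, sd, n, z=1.96):
    """95% CI for a mean with the normal multiplier: mean +/- z * SD / sqrt(n)."""
    se = sd / math.sqrt(n)
    margin = z * se
    return mean - margin, mean + margin

lower, upper = mean_ci_95(5.81, 1.20, 18)
print(f"95% CI: {lower:.2f} to {upper:.2f}")  # 95% CI: 5.26 to 6.36
```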

## 95% Confidence Interval (2)

a) A sample has mean 6. Does this sample belong to the population with μ = 5.81? Yes, because 6 lies within the 95% range around μ = 5.81, which is (5.26, 6.36). Equivalently, the 95% CI around the sample mean, 6 ± 0.55 = (5.45, 6.55), includes μ = 5.81.

b) A sample has mean 7. Does this sample belong to the population with μ = 5.81? No, because 7 lies outside (5.26, 6.36). Equivalently, the 95% CI 7 ± 0.55 = (6.45, 7.55) does not include μ = 5.81.
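The two equivalent checks in (a) and (b) can be sketched as one function. The sketch assumes the same SD (1.20) and n (18) as the angina example, so the margin is 1.96 × SE ≈ 0.55. Because both checks use the same margin, they always agree. The function name is hypothetical.

```python
import math

def compatible_with_population(sample_mean, mu, sd, n, z=1.96):
    """Return two equivalent checks:
    (1) the sample mean lies inside mu +/- z*SE;
    (2) the interval sample_mean +/- z*SE contains mu."""
    margin = z * sd / math.sqrt(n)
    mean_inside_range = abs(sample_mean - mu) <= margin
    ci_contains_mu = sample_mean - margin <= mu <= sample_mean + margin
    return mean_inside_range, ci_contains_mu

print(compatible_with_population(6, 5.81, 1.20, 18))  # (True, True)
print(compatible_with_population(7, 5.81, 1.20, 18))  # (False, False)
```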
## Data presentation

The presentation should be adjusted to the targeted audience and the message, without duplication.

TEXTUAL
- Limited substance
- Suitable for presenting qualitative descriptions
- Used to complement other presentation methods

TABULAR
- Presents the data underlying the study
- Supports statistical calculation
- The proper method for academic purposes

GRAPHICAL
- Visualization of tabular data
- Good for presenting progress and development
- The proper method for public audiences

## Criteria for a good table

- Simple
- Self-explanatory:
  - Clear title
  - Notes
  - Clear classification
  - Row and column totals
- Citation of the source
## Types of tables

- Master table (reference)
- Derived table (analysis):
  - Frequency/distribution table
  - Cross table

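Frequency tables

grup tensi N n Ht (blood-pressure group: normotensive, hypertensive)

| Category | Cumulative Percent |
|---|---|
| Normotensive | 84.1 |
| Hypertensive | 100.0 |

klasif tensi JNC 7 (blood-pressure classification)

| Category | Valid Percent |
|---|---|
| Normal | 47.6 |
| Prehypertension | 36.5 |
| Hypertension stage 1 | 12.4 |
| Hypertension stage 2 | 3.4 |
| Total | 100.0 |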


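Frequency table: Kategori umur (age category)

| Kategori umur | Frequency | Percent | Valid Percent | Cumulative Percent |
|---|---|---|---|---|
| 1 | 141 | 70.5 | 70.5 | 70.5 |
| 2 | 56 | 28.0 | 28.0 | 98.5 |
| 3 | 3 | 1.5 | 1.5 | 100.0 |
| Total | 200 | 100.0 | 100.0 | |

The same table, simplified for presentation:

| Kategori umur | Frequency | Percent |
|---|---|---|
| 1 | 141 | 70.5 |
| 2 | 56 | 28.0 |
| 3 | 3 | 1.5 |
| Total | 200 | 100.0 |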

Cross-table

pelaksanaan shift bergilir (rotating shift work) * klasif tensi JNC 7 (blood-pressure classification) crosstabulation. Each cell shows Count (% within klasif tensi JNC 7).

| Rotating shift work | Normal | Prehypertension | Hypertension stage 1 | Hypertension stage 2 | Total |
|---|---|---|---|---|---|
| No (tidak) | 34 (30.6%) | 28 (32.9%) | 13 (44.8%) | 4 (50.0%) | 79 (33.9%) |
| Yes (ya) | 77 (69.4%) | 57 (67.1%) | 16 (55.2%) | 4 (50.0%) | 154 (66.1%) |
| Total | 111 (100.0%) | 85 (100.0%) | 29 (100.0%) | 8 (100.0%) | 233 (100.0%) |
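The "% within klasif tensi JNC 7" figures are column percentages: each count divided by its blood-pressure column total. This sketch recomputes them from the counts of the shift-work × JNC 7 cross-table. The row keys "No"/"Yes" and the helper name are illustrative.

```python
# Counts from the shift-work x JNC 7 cross-table (rows: rotating shift work).
COUNTS = {
    "No":  {"Normal": 34, "Prehypertension": 28, "Stage 1": 13, "Stage 2": 4},
    "Yes": {"Normal": 77, "Prehypertension": 57, "Stage 1": 16, "Stage 2": 4},
}

def column_percentages(table):
    """Express each cell as a percent of its column total."""
    columns = list(next(iter(table.values())).keys())
    totals = {c: sum(row[c] for row in table.values()) for c in columns}
    return {r: {c: 100 * row[c] / totals[c] for c in columns}
            for r, row in table.items()}

pct = column_percentages(COUNTS)
for row, cells in pct.items():
    print(row, {c: round(p, 1) for c, p in cells.items()})
```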

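koding indeks massa tubuh (body mass index category) * grup tensi N n Ht (blood-pressure group) crosstabulation, Count

| BMI (kg/m²) | Normotensive | Hypertensive | Total |
|---|---|---|---|
| < 18.5 | 13 | 0 | 13 |
| 18.5–24.99 | 91 | 10 | 101 |
| 25–27 | 42 | 7 | 49 |
| > 27 | 41 | 17 | 58 |
| Total | 187 | 34 | 221 |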

Note!

In epidemiological studies, this cross-table can be constructed and tested for significance using only the sample from a survey or cross-sectional study. For example, one can test whether hypertension differs by shift work, or by nutritional status.
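Cross-table

Kategori umur (age category) * Apakah anda sudah menopause? (Have you reached menopause?) crosstabulation, Count

| Kategori umur | Sudah (yes) | Belum (not yet) | Total |
|---|---|---|---|
| 1 | 44 | 97 | 141 |
| 2 | 55 | 1 | 56 |
| 3 | 3 | 0 | 3 |
| Total | 102 | 98 | 200 |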

GRAPHICAL
- Line graph (arithmetic scale; semi-logarithmic scale)
- Histogram / frequency polygon
- Scatter diagram
- Bar diagram
- Pictogram
- Pie diagram
- Spot map
Pie diagram

[Figure: pie diagram of klasif tensi JNC 7 (blood-pressure classification): normal, prehypertension, hypertension stage 1, hypertension stage 2]

## Stem & Leaf

| Frequency | Stem | Leaf |
|---|---|---|
| 0 | 5 | |
| 3 | 5 | 5& |
| 20 | 6 | 0000000022 |
| 13 | 6 | 555557 |
| 64 | 7 | 0000000000000000000000000022222 |
| 32 | 7 | 555555555557777 |
| 51 | 8 | 0000000000000000000022222 |
| 15 | 8 | 5555577 |
| 22 | 9 | 0000000022& |
| 5 | 9 | 57 |
| 8 | Extremes | (≥ 100) |

Stem width: 10; each leaf: 2 cases (& denotes fractional leaves).
## Box plot & histogram

[Figure: box plot (N = 233) and histogram (Mean = 76.7, Std. Dev = 10.81, N = 233) of td diastolik rata2 (mean diastolic blood pressure)]

LINE GRAPH

[Figure: line graph of values (0–100) by quarter, Q-1 to Q-4, for rural and urban areas]
BAR DIAGRAM

[Figure: vertical and horizontal bar diagrams of the same quarterly data (Q-1 to Q-4) for rural and urban areas]
[Figure: histogram of Pendapatan (income); Mean = 793250.0, Std. Dev = 392987.4, N = 200]
Pendapatan (income) Stem-and-Leaf Plot

| Frequency | Stem | Leaf |
|---|---|---|
| 7 | 3 | 0000055 |
| 28 | 4 | 0000000000000000000000000055 |
| 37 | 5 | 0000000000000000000000000000000000555 |
| 6 | 6 | 000055 |
| 35 | 7 | 00000000000000000000000555555555555 |
| 30 | 8 | 000000000000000000000000000000 |
| 4 | 9 | 0000 |
| 16 | 10 | 0000000000000000 |
| 2 | 11 | 00 |
| 10 | 12 | 0000000000 |
| 9 | 13 | 000000000 |
| 0 | 14 | |
| 8 | 15 | 00000000 |
| 0 | 16 | |
| 1 | 17 | 0 |
| 7 | Extremes | (≥ 2000000) |

Stem width: 100000; each leaf: 1 case.
Box plot

[Figure: box plot of Pendapatan (income), N = 200, with individual high values labelled by case number]

