
Descriptive Statistics

Zarni Amri


Introduction and descriptive statistics

Why study statistics?
* To be able to understand and judge the scientific literature
* To be able to analyze one's own data

Role of medical statistics in medical research: statistics plays a role at all stages of a study, namely planning, design, data collection, data processing, data analysis, data presentation, interpretation and publication.

Data analysis

We want to make general statements about a wider set of subjects than our study group. How can we draw conclusions about a population on the basis of a sample?
* Statistical theory: random sampling is assumed
* Practice: samples are almost never truly random

A sample is drawn from the population; one or more variables are observed in the sample.

STATISTICS
DESCRIPTIVE STATISTICS (to describe)
Summarizing, presenting and exploring data in a sample (tables, graphs, and summary numbers such as mean, median, SD, range, etc.)

STATISTICS
INFERENTIAL STATISTICS (to infer)
Statements about unknown population parameters (statistical modeling: parameter estimates, confidence intervals and hypothesis testing)

Topics

* The role of statistics in medical research
* Descriptive statistics
* Data scale
* Central tendency
* Dispersion
* Normal (Gaussian) curve
* Data presentation

The role of statistics in medical research

* To calculate the sample size
* To test the validity and reliability of questionnaires
* Data presentation techniques
* To analyze data and test hypotheses (inferential statistics)

Types of variables

The choice of methods is determined by the type of variables:
* Dependent / independent
* Categorical / numeric

Data
* Categorical / qualitative
  * Nominal (unordered categories), e.g. gender, smoker/non-smoker, blood type (O, A, B, AB), marital status
  * Ordinal (ordered categories), e.g. education, disease stages I, II, III, IV, level of knowledge
* Numeric / quantitative
  * Discrete, e.g. number of persons in a household, number of white blood cells
  * Continuous, e.g. height, age, blood pressure

Summarizing and presenting data of a single variable

Population / sample distribution: a distribution describes how often the possible values of a variable occur in the population or in the sample. If the sample is random, the sample distribution gives an impression of the unknown population distribution.

E.g. sex: the relative frequencies in the population (in practice unknown) are 0.51 males and 0.49 females. If in a sample of 100 we find 55 males and 45 females, the relative frequencies of the sample distribution are 0.55 males and 0.45 females.
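The sex example can be reproduced in a few lines of Python. This is a minimal sketch: the sample is represented as a hypothetical list of 100 labels (55 "male", 45 "female"), and the helper name `relative_frequencies` is illustrative, not something from the slides. The relative frequency of a category is simply its count divided by the sample size.

```python
from collections import Counter

def relative_frequencies(values):
    """Return {category: count / n} for a list of categorical observations."""
    counts = Counter(values)
    n = len(values)
    return {category: count / n for category, count in counts.items()}

# Hypothetical sample matching the example: 55 males and 45 females (n = 100)
sample = ["male"] * 55 + ["female"] * 45
sample_dist = relative_frequencies(sample)
print(sample_dist)  # {'male': 0.55, 'female': 0.45}

# Population relative frequencies from the example (in practice unknown)
population_dist = {"male": 0.51, "female": 0.49}
for category, p in population_dist.items():
    print(f"{category}: sample {sample_dist[category]:.2f}, population {p:.2f}")
```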

How to describe the sample distribution of a variable?
* Categorical
  * Table: frequency table by category
  * Graph: bar chart, pie chart, etc.
* Numerical
  * Table: frequency table
  * Graph: histogram, stem-and-leaf plot, box plot, etc.

Summary measures to characterize the distribution of numerical data:
* Measures of location / central tendency: mean, median, mode
* Spread / variability: SD, inter-quartile range
* Percentiles, etc.

The age (in years) of 30 nurses in a hospital in Jakarta.

Data:
29, 30, 26, 24, 35, 26, 43, 38, 38, 37, 35, 33, 41, 31, 32, 38, 27, 31, 27, 26, 28, 26, 32, 26, 52, 37, 27, 32, 37, 30

Central Tendency
* Mean ($\bar{x}$) = arithmetic mean of the sample; $\mu$ = mean of the population (the true mean):
  $$\bar{x} = \frac{\sum x}{n}$$
* Median = midpoint of the ordered values
* Mode = the most frequent value

Dispersion
* SD = standard deviation of the sample; $\sigma$ = standard deviation of the population
* $SD^2$ = variance of the sample; $\sigma^2$ = variance of the population
* SE = standard error of the mean, $SE = SD/\sqrt{n}$

$$SD^2 = \frac{\sum (x - \bar{x})^2}{n - 1} = \frac{\sum x^2 - \frac{(\sum x)^2}{n}}{n - 1}$$
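As a check on the formulas above, the sketch below applies them to the ages of the 30 nurses using only Python's standard library. It computes the variance both with the definitional formula and with the computational shortcut to show that they agree; the variable names are illustrative.

```python
import math
import statistics

# Age (in years) of the 30 nurses from the example
ages = [29, 30, 26, 24, 35, 26, 43, 38, 38, 37, 35, 33, 41, 31, 32,
        38, 27, 31, 27, 26, 28, 26, 32, 26, 52, 37, 27, 32, 37, 30]

n = len(ages)
mean = sum(ages) / n                  # x-bar = sum(x) / n
median = statistics.median(ages)      # midpoint of the ordered values
mode = statistics.mode(ages)          # most frequent value

# Definitional formula: sum of squared deviations / (n - 1)
var_def = sum((x - mean) ** 2 for x in ages) / (n - 1)
# Computational formula: (sum x^2 - (sum x)^2 / n) / (n - 1)
var_comp = (sum(x * x for x in ages) - sum(ages) ** 2 / n) / (n - 1)

sd = math.sqrt(var_def)
se = sd / math.sqrt(n)                # SE = SD / sqrt(n)

print(f"n = {n}, mean = {mean:.4f}, median = {median}, mode = {mode}")
print(f"variance = {var_def:.3f} (computational formula: {var_comp:.3f})")
print(f"SD = {sd:.5f}, SE = {se:.5f}")
```

The mean, SD and SE it prints agree with the SPSS output shown next (32.4667, 6.30125 and 1.15045).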

Results of SPSS

Descriptive Statistics
| | N | Minimum | Maximum | Mean | Std. Deviation |
| age | 30 | 24.00 | 52.00 | 32.4667 | 6.30125 |
| Valid N (listwise) | 30 | | | | |

AGE_CAT
| | Frequency | Percent | Valid Percent | Cumulative Percent |
| < 30 | 13 | 43.3 | 43.3 | 43.3 |
| 30-39 | 14 | 46.7 | 46.7 | 90.0 |
| >= 40 | 3 | 10.0 | 10.0 | 100.0 |
| Total | 30 | 100.0 | 100.0 | |

Descriptives
| age | Statistic | Std. Error |
| Mean | 32.4667 | 1.15045 |
| 95% Confidence Interval for Mean: Lower Bound | 30.1137 | |
| 95% Confidence Interval for Mean: Upper Bound | 34.8196 | |
| 5% Trimmed Mean | 31.9815 | |
| Median | 31.5000 | |
| Variance | 39.706 | |
| Std. Deviation | 6.30125 | |
| Minimum | 24.00 | |
| Maximum | 52.00 | |
| Range | 28.00 | |
| Interquartile Range | 10.0000 | |
| Skewness | 1.100 | .427 |
| Kurtosis | 1.628 | .833 |
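Several rows of this table can be reproduced with the standard library. This sketch rests on two assumptions. First, it assumes that SPSS's default percentile definition corresponds to `statistics.quantiles(..., method="exclusive")`, which places the p-th percentile at position (n + 1)p. Second, it hard-codes the tabulated value t(0.975; df = 29) ≈ 2.04523, because the standard library has no t distribution. With these assumptions, the 95% CI it prints matches the table, which indicates that SPSS uses t rather than 1.96 here.

```python
import math
import statistics

ages = [29, 30, 26, 24, 35, 26, 43, 38, 38, 37, 35, 33, 41, 31, 32,
        38, 27, 31, 27, 26, 28, 26, 32, 26, 52, 37, 27, 32, 37, 30]

n = len(ages)
mean = sum(ages) / n
sd = statistics.stdev(ages)            # sample SD, n - 1 in the denominator
se = sd / math.sqrt(n)                 # standard error of the mean

value_range = max(ages) - min(ages)    # Range = maximum - minimum

# Quartiles using the (n + 1)p position rule (assumed to match SPSS's default)
q1, q2, q3 = statistics.quantiles(ages, n=4, method="exclusive")
iqr = q3 - q1                          # Interquartile Range

# Tabulated two-sided 95% t value for df = n - 1 = 29 (assumption: taken from a t table)
T_975_DF29 = 2.04523
ci_low = mean - T_975_DF29 * se
ci_high = mean + T_975_DF29 * se

print(f"Range = {value_range}, Q1 = {q1}, Q3 = {q3}, IQR = {iqr}")
print(f"Mean = {mean:.4f}, SE = {se:.5f}, 95% CI = ({ci_low:.4f}, {ci_high:.4f})")
```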

E.g. blood cholesterol level

Values: 192, 197, 200, 202, 209

Mean = (192 + 197 + 200 + 202 + 209)/5 = 1000/5 = 200
Median = 200
Range = 209 - 192 = 17

$$SD^2 = \frac{(192-200)^2 + (197-200)^2 + (200-200)^2 + (202-200)^2 + (209-200)^2}{4} = \frac{8^2 + 3^2 + 0^2 + 2^2 + 9^2}{4} = \frac{64 + 9 + 0 + 4 + 81}{4} = \frac{158}{4} = 39.5$$

Standard deviation = $\sqrt{39.5} \approx 6.28 \approx 6.3$; SE = $6.28/\sqrt{5} \approx 2.81$
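The hand calculation above can be checked step by step. This minimal sketch computes each intermediate quantity (the squared deviations, the variance with n - 1 = 4 in the denominator, the SD and the SE) for the five cholesterol values.

```python
import math
import statistics

cholesterol = [192, 197, 200, 202, 209]
n = len(cholesterol)

mean = sum(cholesterol) / n                       # 1000 / 5
median = statistics.median(cholesterol)
value_range = max(cholesterol) - min(cholesterol)

# Squared deviations from the mean: (192-200)^2, (197-200)^2, ...
squared_deviations = [(x - mean) ** 2 for x in cholesterol]
variance = sum(squared_deviations) / (n - 1)      # 158 / 4
sd = math.sqrt(variance)
se = sd / math.sqrt(n)                            # uses the unrounded SD

print("squared deviations:", squared_deviations)
print(f"mean = {mean}, median = {median}, range = {value_range}")
print(f"variance = {variance}, SD = {sd:.2f}, SE = {se:.2f}")
```

Note that the SE of 2.81 comes from the unrounded SD (6.28...); dividing the rounded value 6.3 by √5 would give 2.82.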

How precise are these estimates?

The precision of a sample estimate of a population parameter is characterized by its standard error (SE). If the study were repeated infinitely many times, and each time the estimate were computed, we would obtain the sampling distribution of the estimate. Its standard deviation is the standard error, which is used to construct the confidence interval (CI).
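The idea of "repeating the study infinitely many times" can be approximated by simulation. This sketch assumes a hypothetical normal population (mean 200, SD 6.3, borrowed from the cholesterol example purely for illustration). It draws many samples of size n = 5 and compares the SD of the sample means with the theoretical SE = σ/√n. The population parameters, the sample size, the seed and the number of repetitions are all arbitrary choices.

```python
import math
import random
import statistics

random.seed(1)                  # fixed seed so the run is reproducible

POP_MEAN, POP_SD = 200.0, 6.3   # hypothetical population parameters
N = 5                           # size of each sample
REPS = 20_000                   # number of repeated "studies"

sample_means = []
for _ in range(REPS):
    sample = [random.gauss(POP_MEAN, POP_SD) for _ in range(N)]
    sample_means.append(sum(sample) / N)

mean_of_means = statistics.mean(sample_means)   # centre of the sampling distribution
empirical_se = statistics.stdev(sample_means)   # spread of the sampling distribution
theoretical_se = POP_SD / math.sqrt(N)          # sigma / sqrt(n)

print(f"mean of sample means: {mean_of_means:.2f}")
print(f"SD of sample means:   {empirical_se:.3f}")
print(f"sigma / sqrt(n):      {theoretical_se:.3f}")
```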

Are your data normally distributed?
* How can we tell? Which measures of location and spread should be used?
  * Normally distributed: mean and SD
  * Not normally distributed: median and percentiles, or min and max
* Is this related to the choice of statistical test?
  * Yes: normally distributed data, parametric test; not normally distributed data, non-parametric test

SAMPLING DISTRIBUTION
If (almost) infinitely many samples are drawn, the sample means and SDs vary from sample to sample. The sample means follow a sampling distribution whose centre is the mean of the population and whose spread depends on the SD of the population (SE = σ/√n).

NORMAL CURVE
* Symmetric
* Mesokurtic
* Asymptotic: the tails approach the horizontal axis without touching it; in practice about 99.7% of the area lies within ±3 standard deviations

Normal distribution = Z distribution (1)

[Figure: standard normal curve; the area under the curve (AUC) is marked between Z = -1 and +1 and between Z = -1.96 and +1.96, with the population mean μ at Z = 0.]

$$Z = \frac{x - \mu}{\sigma}$$
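The Z transformation and the areas under the standard normal curve can be computed with Python's `statistics.NormalDist` (available from Python 3.8). The sketch below shows the proportion of the area within ±1, ±1.96 and ±3 standard deviations and the Z value behind "1.96". The example value x = 209 with μ = 200 and σ = 6.3 is hypothetical.

```python
from statistics import NormalDist

standard_normal = NormalDist(mu=0.0, sigma=1.0)  # the Z distribution

def z_score(x, mu, sigma):
    """Standardize a value: Z = (x - mu) / sigma."""
    return (x - mu) / sigma

# Area under the curve (AUC) between -z and +z
areas = {z: standard_normal.cdf(z) - standard_normal.cdf(-z) for z in (1.0, 1.96, 3.0)}
for z, area in areas.items():
    print(f"P({-z} < Z < {z}) = {area:.4f}")

# Z value leaving 2.5% in the upper tail (two-sided 95%)
z_975 = standard_normal.inv_cdf(0.975)
print(f"z for a two-sided 95% area: {z_975:.2f}")

# Hypothetical example: x = 209 in a population with mu = 200 and sigma = 6.3
print(f"Z = {z_score(209, 200, 6.3):.2f}")
```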

Normal distribution? Ways to check:
* Coefficient of variation: COV = (SD/mean) × 100% < 20%; e.g. 6.3/200 × 100% = 3.15%, so normal?
* Skewness ratio = skewness / SE of skewness: between -2 and +2 = normal
* Kurtosis ratio = kurtosis / SE of kurtosis: between -2 and +2 = normal
* Histogram: symmetric
* Box plot: symmetric
* Kolmogorov-Smirnov, Shapiro-Wilk: p > 0.05 = normal
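The COV and the skewness and kurtosis ratios can be computed directly. This sketch uses the bias-adjusted skewness and excess-kurtosis formulas and their standard errors that we assume SPSS reports. For the nurse ages they reproduce 1.100 (SE .427) and 1.628 (SE .833) from the table above. The 20% and ±2 cut-offs are the rules of thumb from the slide; formal tests such as Shapiro-Wilk are not reimplemented here.

```python
import math

ages = [29, 30, 26, 24, 35, 26, 43, 38, 38, 37, 35, 33, 41, 31, 32,
        38, 27, 31, 27, 26, 28, 26, 32, 26, 52, 37, 27, 32, 37, 30]

def normality_rules(data):
    """Rule-of-thumb normality indicators: COV, skewness/SE and kurtosis/SE."""
    n = len(data)
    mean = sum(data) / n
    sd = math.sqrt(sum((x - mean) ** 2 for x in data) / (n - 1))
    z = [(x - mean) / sd for x in data]

    # Adjusted skewness (G1) and adjusted excess kurtosis (G2)
    skew = n / ((n - 1) * (n - 2)) * sum(v ** 3 for v in z)
    kurt = (n * (n + 1) / ((n - 1) * (n - 2) * (n - 3)) * sum(v ** 4 for v in z)
            - 3 * (n - 1) ** 2 / ((n - 2) * (n - 3)))

    # Standard errors of skewness and kurtosis
    se_skew = math.sqrt(6 * n * (n - 1) / ((n - 2) * (n + 1) * (n + 3)))
    se_kurt = 2 * se_skew * math.sqrt((n * n - 1) / ((n - 3) * (n + 5)))

    return {
        "cov_percent": sd / mean * 100,
        "skewness": skew, "se_skewness": se_skew,
        "kurtosis": kurt, "se_kurtosis": se_kurt,
        "skew_ratio": skew / se_skew,
        "kurt_ratio": kurt / se_kurt,
    }

result = normality_rules(ages)
for key, value in result.items():
    print(f"{key}: {value:.3f}")

print("COV < 20%:", result["cov_percent"] < 20)
print("skewness ratio within -2..+2:", -2 <= result["skew_ratio"] <= 2)
print("kurtosis ratio within -2..+2:", -2 <= result["kurt_ratio"] <= 2)
```

For the nurse ages the COV is below 20% and the kurtosis ratio is within ±2, but the skewness ratio (about 1.100/0.427 ≈ 2.6) falls outside ±2. The rules of thumb therefore do not all point the same way, which is consistent with the Shapiro-Wilk p-value of .015 in the output below.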

Tests of Normality
| | Kolmogorov-Smirnov(a): Statistic | df | Sig. | Shapiro-Wilk: Statistic | df | Sig. |
| age | .130 | 30 | .200(*) | .910 | 30 | .015 |
*. This is a lower bound of the true significance.
a. Lilliefors Significance Correction

Descriptives for age: identical to the Descriptives table above (skewness 1.100, SE .427; kurtosis 1.628, SE .833).

age Stem-and-Leaf Plot

Frequency    Stem &  Leaf
 1.00        2 .  4
10.00        2 .  6666677789
 8.00        3 .  00112223
 8.00        3 .  55777888
 2.00        4 .  13
 1.00 Extremes    (>=52)

Stem width: 10.00
Each leaf: 1 case(s)

[Figure: box plot of age, N = 30; case 25 (age 52) is marked as an outlier]
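A basic stem-and-leaf display can be built by splitting each value into a stem (the tens digit) and a leaf (the units digit). This is a simplified sketch. Unlike the SPSS plot above, it uses one line per stem (SPSS splits each stem into two lines) and does not set extreme values apart. The function name `stem_and_leaf` is hypothetical.

```python
from collections import defaultdict

ages = [29, 30, 26, 24, 35, 26, 43, 38, 38, 37, 35, 33, 41, 31, 32,
        38, 27, 31, 27, 26, 28, 26, 32, 26, 52, 37, 27, 32, 37, 30]

def stem_and_leaf(values):
    """One line per stem: frequency, stem (tens digit) and sorted leaves (units digit)."""
    groups = defaultdict(list)
    for v in sorted(values):
        groups[v // 10].append(v % 10)
    lines = []
    for stem in range(min(groups), max(groups) + 1):
        leaves = groups.get(stem, [])          # empty stems are still shown
        leaf_text = "".join(str(leaf) for leaf in leaves)
        lines.append(f"{len(leaves):>3}  {stem} | {leaf_text}")
    return lines

lines = stem_and_leaf(ages)
print("\n".join(lines))
```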

Data menopause: Umur (age) of women > 40 yrs

| Umur (age) | Frequency | Percent | Valid Percent | Cumulative Percent |
| 40 | 31 | 15.5 | 15.5 | 15.5 |
| 41 | 9 | 4.5 | 4.5 | 20.0 |
| 42 | 16 | 8.0 | 8.0 | 28.0 |
| 43 | 8 | 4.0 | 4.0 | 32.0 |
| 44 | 5 | 2.5 | 2.5 | 34.5 |
| 45 | 12 | 6.0 | 6.0 | 40.5 |
| 46 | 6 | 3.0 | 3.0 | 43.5 |
| 47 | 7 | 3.5 | 3.5 | 47.0 |
| 48 | 7 | 3.5 | 3.5 | 50.5 |
| 49 | 5 | 2.5 | 2.5 | 53.0 |
| 50 | 14 | 7.0 | 7.0 | 60.0 |
| 51 | 5 | 2.5 | 2.5 | 62.5 |
| 52 | 5 | 2.5 | 2.5 | 65.0 |
| 53 | 4 | 2.0 | 2.0 | 67.0 |
| 54 | 2 | 1.0 | 1.0 | 68.0 |
| 55 | 5 | 2.5 | 2.5 | 70.5 |
| 56 | 5 | 2.5 | 2.5 | 73.0 |
| 57 | 5 | 2.5 | 2.5 | 75.5 |
| 58 | 6 | 3.0 | 3.0 | 78.5 |
| 60 | 10 | 5.0 | 5.0 | 83.5 |
| 61 | 3 | 1.5 | 1.5 | 85.0 |
| 62 | 1 | .5 | .5 | 85.5 |
| 63 | 4 | 2.0 | 2.0 | 87.5 |
| 64 | 2 | 1.0 | 1.0 | 88.5 |
| 65 | 9 | 4.5 | 4.5 | 93.0 |
| 66 | 2 | 1.0 | 1.0 | 94.0 |
| 67 | 2 | 1.0 | 1.0 | 95.0 |
| 68 | 2 | 1.0 | 1.0 | 96.0 |
| 70 | 5 | 2.5 | 2.5 | 98.5 |
| 73 | 1 | .5 | .5 | 99.0 |
| 79 | 1 | .5 | .5 | 99.5 |
| 85 | 1 | .5 | .5 | 100.0 |
| Total | 200 | 100.0 | 100.0 | |

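Descriptives
| Umur (age) | Statistic | Std. Error |
| Mean | 50.47 | .67 |
| 95% Confidence Interval for Mean: Lower Bound | 49.14 | |
| 95% Confidence Interval for Mean: Upper Bound | 51.79 | |
| 5% Trimmed Mean | 49.83 | |
| Median | 48.00 | |
| Variance | 89.617 | |
| Std. Deviation | 9.47 | |
| Minimum | 40 | |
| Maximum | 85 | |
| Range | 45 | |
| Interquartile Range | 15.00 | |
| Skewness | .836 | .172 |
| Kurtosis | .076 | .342 |

Normality rule: skewness/SE and kurtosis/SE should lie between -2 and +2. Here skewness/SE = .836/.172 ≈ 4.9 (outside the range), while kurtosis/SE = .076/.342 ≈ 0.2.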
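When the data are available only as a frequency table, the descriptives can still be computed by weighting each value by its frequency. This minimal sketch uses the frequency table of Umur above (stored in a hypothetical dictionary `freq`). It expands the table into individual values only to find the median.

```python
import math
import statistics

# Frequency table of Umur (age) for the 200 women: {age: frequency}
freq = {40: 31, 41: 9, 42: 16, 43: 8, 44: 5, 45: 12, 46: 6, 47: 7, 48: 7, 49: 5,
        50: 14, 51: 5, 52: 5, 53: 4, 54: 2, 55: 5, 56: 5, 57: 5, 58: 6, 60: 10,
        61: 3, 62: 1, 63: 4, 64: 2, 65: 9, 66: 2, 67: 2, 68: 2, 70: 5, 73: 1,
        79: 1, 85: 1}

n = sum(freq.values())
mean = sum(age * f for age, f in freq.items()) / n
# Each squared deviation counts f times; n - 1 in the denominator
variance = sum(f * (age - mean) ** 2 for age, f in freq.items()) / (n - 1)
sd = math.sqrt(variance)

# Median: expand the table into the ordered list of individual values
ordered = [age for age in sorted(freq) for _ in range(freq[age])]
median = statistics.median(ordered)

print(f"n = {n}, mean = {mean:.2f}, median = {median}")
print(f"variance = {variance:.3f}, SD = {sd:.2f}")
```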

Histogram

[Figure: histogram of Umur (age), Mean = 50.5, Std. Dev = 9.47, N = 200; x-axis from 40 to 85]

umur Stem-and-Leaf Plot

Frequency    Stem &  Leaf
40.00        4 .  0000000000000000000000000000000111111111
24.00        4 .  222222222222222233333333
17.00        4 .  44444555555555555
13.00        4 .  6666667777777
12.00        4 .  888888899999
19.00        5 .  0000000000000011111
 9.00        5 .  222223333
 7.00        5 .  4455555
10.00        5 .  6666677777
 6.00        5 .  888888
13.00        6 .  0000000000111
 5.00        6 .  23333
11.00        6 .  44555555555
 4.00        6 .  6677
 2.00        6 .  88
 5.00        7 .  00000
 1.00        7 .  3
  .00        7 .
  .00        7 .
 1.00        7 .  9
 1.00 Extremes    (>=85)

Stem width: 10
Each leaf: 1 case(s)

Box Plot percentile

[Figure: box plot of Umur (age), N = 200, annotated with percentiles]

Tests of Normality
| | Kolmogorov-Smirnov(a): Statistic | df | Sig. |
| Umur (age) | .134 | 200 | .000 |
a. Lilliefors Significance Correction

Not normally distributed: report the median and min-max.
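The Kolmogorov-Smirnov statistic in this table is the largest vertical distance between the empirical cumulative distribution and a normal distribution with the sample's own mean and SD. This sketch computes only that statistic D from the frequency table above. The Lilliefors-corrected p-value needs special tables or a library; statsmodels, for example, provides `statsmodels.stats.diagnostic.lilliefors`. That function is not used here.

```python
import math
from statistics import NormalDist

# Frequency table of Umur (age): {age: frequency}
freq = {40: 31, 41: 9, 42: 16, 43: 8, 44: 5, 45: 12, 46: 6, 47: 7, 48: 7, 49: 5,
        50: 14, 51: 5, 52: 5, 53: 4, 54: 2, 55: 5, 56: 5, 57: 5, 58: 6, 60: 10,
        61: 3, 62: 1, 63: 4, 64: 2, 65: 9, 66: 2, 67: 2, 68: 2, 70: 5, 73: 1,
        79: 1, 85: 1}

values = sorted(age for age, f in freq.items() for _ in range(f))
n = len(values)
mean = sum(values) / n
sd = math.sqrt(sum((x - mean) ** 2 for x in values) / (n - 1))
fitted = NormalDist(mean, sd)      # normal curve with the sample mean and SD

# D = max |empirical CDF - normal CDF|, checked just after and just before each jump
d_stat = 0.0
for i, x in enumerate(values, start=1):
    cdf = fitted.cdf(x)
    d_stat = max(d_stat, i / n - cdf, cdf - (i - 1) / n)

print(f"Kolmogorov-Smirnov D = {d_stat:.3f}")
```

Most of the distance comes from the 31 women aged exactly 40. The normal curve with mean 50.5 and SD 9.47 puts only about 13% of its area below 40, whereas the data start abruptly there.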

95% Confidence Interval (1)

Example: n = 18 patients with stable angina pectoris, X = total cholesterol
Sample mean $\bar{x}$ = 5.81, SD = 1.20
$SE(\bar{x}) = SD/\sqrt{n} = 1.20/\sqrt{18} = 0.28$
95% CI = $\bar{x} \pm 1.96 \times SE = 5.81 \pm 1.96 \times 0.28 = 5.81 \pm 0.55 = (5.26, 6.36)$
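The same interval can be computed in code. This sketch follows the slide in using the normal value z ≈ 1.96, obtained here from `statistics.NormalDist().inv_cdf(0.975)`. The helper name `ci95_mean` is illustrative.

```python
import math
from statistics import NormalDist

def ci95_mean(mean, sd, n):
    """Normal-approximation 95% CI for a mean: mean +/- z * SD / sqrt(n)."""
    se = sd / math.sqrt(n)
    z = NormalDist().inv_cdf(0.975)   # about 1.96
    margin = z * se
    return se, margin, (mean - margin, mean + margin)

# Cholesterol example: n = 18, mean = 5.81, SD = 1.20
se, margin, (low, high) = ci95_mean(5.81, 1.20, 18)
print(f"SE = {se:.2f}, margin = {margin:.2f}, 95% CI = ({low:.2f}, {high:.2f})")
```

With a sample as small as n = 18, statistical packages usually take the multiplier from the t distribution instead (as SPSS did for the nurse ages), which gives a somewhat wider interval.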

95% Confidence Interval (2)

a) A sample has mean 6. Does this sample belong to the population with μ = 5.81? Yes, because 6 lies within the 95% range around μ = 5.81, which is (5.26, 6.36). Equivalently, the 95% CI of $\bar{x}$ = 6 ± 0.55 = (5.45, 6.55) includes μ = 5.81.

b) A sample has mean 7. Does this sample belong to the population with μ = 5.81? No, because 7 lies outside the 95% range around μ = 5.81, which is (5.26, 6.36). Equivalently, the 95% CI of $\bar{x}$ = 7 ± 0.55 = (6.45, 7.55) does not include μ = 5.81.
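Both ways of answering a) and b) can be written as simple interval checks. This sketch reuses the margin 1.96 × SE ≈ 0.55 from the cholesterol example (SE = 0.28). Because both intervals have the same half-width, the two checks always give the same answer. The function name `within` is illustrative.

```python
def within(value, interval):
    """True if value lies inside the closed interval (low, high)."""
    low, high = interval
    return low <= value <= high

MU = 5.81       # population mean from the example
MARGIN = 0.55   # 1.96 x SE, with SE = 0.28

range_around_mu = (MU - MARGIN, MU + MARGIN)      # about (5.26, 6.36)

for sample_mean in (6.0, 7.0):
    ci_of_sample = (sample_mean - MARGIN, sample_mean + MARGIN)
    inside = within(sample_mean, range_around_mu)  # check 1: mean inside range around mu
    covers = within(MU, ci_of_sample)              # check 2: CI of the sample covers mu
    print(f"sample mean {sample_mean}: within range around mu = {inside}, CI includes mu = {covers}")
```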

Data Presentation

* Textual
* Tabular
* Graphical

The presentation should be adjusted to the target audience and the message; avoid duplication.

TEXTUAL
* Limited substance
* Suitable for presenting qualitative descriptions
* Used to complement other presentation methods
* Presenting the basic data of the study
* Supporting statistical calculations
* Proper method for academic purposes

GRAPHICAL
* Visualization of tabular data
* Good for presenting progress and development
* Proper method for public audiences

CRITERIA FOR A GOOD TABLE
* Simple
* Self-explanatory
  * Clear title
  * Notes
  * Clear classification
  * Row and column totals
* Citation of source

TYPES OF TABLES
* Master table (reference)
* Derived table (analysis)
  * Frequency/distribution table
  * Cross table

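Frequency tables

grup tensi N n Ht (blood pressure group)
| | Frequency | Percent | Valid Percent | Cumulative Percent |
| normotensi (normotensive) | 196 | 84.1 | 84.1 | 84.1 |
| hipertensi (hypertensive) | 37 | 15.9 | 15.9 | 100.0 |
| Total | 233 | 100.0 | 100.0 | |

klasif tensi JNC 7 (blood pressure classification according to JNC 7)
| | Frequency | Percent | Valid Percent | Cumulative Percent |
| normal | 111 | 47.6 | 47.6 | 47.6 |
| prahipertensi (prehypertension) | 85 | 36.5 | 36.5 | 84.1 |
| hipertensi st 1 (stage 1 hypertension) | 29 | 12.4 | 12.4 | 96.6 |
| hipertensi st 2 (stage 2 hypertension) | 8 | 3.4 | 3.4 | 100.0 |
| Total | 233 | 100.0 | 100.0 | |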
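The Percent and Cumulative Percent columns can be derived from the counts alone. This is a minimal sketch with a hypothetical helper `frequency_table`. It accumulates the counts rather than the rounded percentages, so rounding errors do not build up down the column. Without missing values, Valid Percent equals Percent, so only one percentage column is computed.

```python
def frequency_table(counts):
    """Rows of (category, frequency, percent, cumulative percent), 1 decimal place.

    `counts` is a dict {category: frequency} in the desired display order.
    """
    total = sum(counts.values())
    rows = []
    cumulative = 0
    for category, count in counts.items():
        cumulative += count                       # running total of counts
        rows.append((category, count,
                     round(100 * count / total, 1),
                     round(100 * cumulative / total, 1)))
    rows.append(("Total", total, 100.0, None))
    return rows

# Counts from the JNC 7 table
jnc7 = {"normal": 111, "prahipertensi": 85, "hipertensi st 1": 29, "hipertensi st 2": 8}
rows = frequency_table(jnc7)
for row in rows:
    print(row)
```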

Frequency table

Kategori umur (age category)
| | Frequency | Percent | Valid Percent | Cumulative Percent |
| 40-55 tahun (years) | 141 | 70.5 | 70.5 | 70.5 |
| 56-70 tahun | 56 | 28.0 | 28.0 | 98.5 |
| 71-85 tahun | 3 | 1.5 | 1.5 | 100.0 |
| Total | 200 | 100.0 | 100.0 | |

The same data as a simplified table:
| Age (yrs) | Frequency | Percent |
| 40-55 | 141 | 70.5 |
| 56-70 | 56 | 28.0 |
| 71-85 | 3 | 1.5 |
| Total | 200 | 100 |

Cross-Table

pelaksanaan shift bergilir (rotating shift work) * klasif tensi JNC 7 Crosstabulation
| Rotating shift work | | normal | prahipertensi | hipertensi st 1 | hipertensi st 2 | Total |
| tidak (no) | Count | 34 | 28 | 13 | 4 | 79 |
| | % within klasif tensi JNC 7 | 30.6% | 32.9% | 44.8% | 50.0% | 33.9% |
| ya (yes) | Count | 77 | 57 | 16 | 4 | 154 |
| | % within klasif tensi JNC 7 | 69.4% | 67.1% | 55.2% | 50.0% | 66.1% |
| Total | Count | 111 | 85 | 29 | 8 | 233 |
| | % within klasif tensi JNC 7 | 100.0% | 100.0% | 100.0% | 100.0% | 100.0% |

koding index masa tubuh (body mass index category) * grup tensi N n Ht Crosstabulation (Count)
| BMI category | normotensi | hipertensi | Total |
| < 18.5 | 13 | 0 | 13 |
| 18.5 - 24.99 | 91 | 10 | 101 |
| 25 - 27 | 42 | 7 | 49 |
| > 27 | 41 | 17 | 58 |
| Total | 187 | 34 | 221 |

Note!

In epidemiological studies, such a cross table can be constructed and tested for significance, i.e. whether there is a difference between hypertension and shift work, or between hypertension and nutritional status, using only the sample of a survey / cross-sectional study.
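The slide does not name the significance test; a common choice for a cross table of counts is Pearson's chi-square test of independence, which this sketch assumes. It computes only the statistic, the degrees of freedom and the expected counts for the shift-work table. The p-value would come from the chi-square distribution, e.g. via `scipy.stats.chi2_contingency` if SciPy is available. The sketch also reports the smallest expected count, since the chi-square approximation is usually considered doubtful when expected counts fall below 5.

```python
def chi_square(table):
    """Pearson chi-square statistic, df and expected counts for an r x c table of counts."""
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    n = sum(row_totals)
    # Expected count under independence: row total x column total / n
    expected = [[r * c / n for c in col_totals] for r in row_totals]
    stat = sum((o - e) ** 2 / e
               for obs_row, exp_row in zip(table, expected)
               for o, e in zip(obs_row, exp_row))
    df = (len(table) - 1) * (len(col_totals) - 1)
    return stat, df, expected

# Rows: rotating shift work tidak (no) / ya (yes)
# Columns: normal, prahipertensi, hipertensi st 1, hipertensi st 2
shift_by_jnc7 = [[34, 28, 13, 4],
                 [77, 57, 16, 4]]

stat, df, expected = chi_square(shift_by_jnc7)
smallest_expected = min(min(row) for row in expected)
print(f"chi-square = {stat:.2f}, df = {df}, smallest expected count = {smallest_expected:.2f}")
```

Here the "hipertensi st 2" column has expected counts below 5, so in practice categories might be merged before testing.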

Cross-table

Kategori umur (age category) * Apakah anda sudah menopause? (Have you reached menopause?) Crosstabulation (Count)
| Kategori umur | Sudah (yes) | Belum (not yet) | Total |
| 40-55 tahun | 44 | 97 | 141 |
| 56-70 tahun | 55 | 1 | 56 |
| 71-85 tahun | 3 | 0 | 3 |
| Total | 102 | 98 | 200 |

GRAPHICAL
* Line graph
  * Arithmetic scale
  * Semi-logarithmic scale
* Histogram / frequency polygon
* Scatter diagram
* Bar diagram
* Pictogram
* Pie diagram
* Spot map

Pie diagram

[Figure: pie diagram of klasif tensi JNC 7 with slices normal, prahipertensi, hipertensi st 1 and hipertensi st 2]

td diastolik rata2 (mean diastolic blood pressure) Stem-and-Leaf Plot

Frequency    Stem &  Leaf
  .00        5 .
 3.00        5 .  5&
20.00        6 .  0000000022
13.00        6 .  555557
64.00        7 .  0000000000000000000000000022222
32.00        7 .  555555555557777
51.00        8 .  0000000000000000000022222
15.00        8 .  5555577
22.00        9 .  0000000022&
 5.00        9 .  57
 8.00 Extremes    (>=100)

Stem width: 10.0
Each leaf: 2 case(s)
& denotes fractional leaves.

Box plot & histogram

[Figure: box plot (N = 233) and histogram of td diastolik rata2 (mean diastolic blood pressure); histogram: Mean = 76.7, Std. Dev = 10.81, N = 233]

LINE GRAPH

[Figure: line graph of Rural and Urban values for quarters Q-1 to Q-4; y-axis 0 to 100]

BAR DIAGRAM

[Figure: bar diagram of Rural and Urban values for quarters Q-1 to Q-4; y-axis 0 to 90]

INVERTED BAR DIAGRAM

[Figure: horizontal bar diagram of Urban and Rural values for quarters Q-1 to Q-4; x-axis 0 to 100]

[Figure: histogram of Pendapatan (income), Mean = 793250.0, Std. Dev = 392987.4, N = 200; x-axis up to 2500000]

Pendapatan (income) Stem-and-Leaf Plot

Frequency    Stem &  Leaf
 7.00        3 .  0000055
28.00        4 .  0000000000000000000000000055
37.00        5 .  0000000000000000000000000000000000555
 6.00        6 .  000055
35.00        7 .  00000000000000000000000555555555555
30.00        8 .  000000000000000000000000000000
 4.00        9 .  0000
16.00       10 .  0000000000000000
 2.00       11 .  00
10.00       12 .  0000000000
 9.00       13 .  000000000
  .00       14 .
 8.00       15 .  00000000
  .00       16 .
 1.00       17 .  0
 7.00 Extremes    (>=2000000)

Stem width: 100000
Each leaf: 1 case(s)

Box plot

[Figure: box plot of Pendapatan (income), N = 200, y-axis 0 to 3000000; high values flagged with case numbers 2, 135, 38, 39, 161 and 21]
