Biostatistics
LECTURE 1
DR. KENESH DZHUSUPOV
DEPARTMENT OF PUBLIC HEALTH, INTERNATIONAL SCHOOL OF MEDICINE
Lecture outline
• Why do we study Statistics?
• What is Statistics?
• Some history of Statistics
• Types of Statistics. Statistics elements. Types of data.
• Probability
• Distribution of data
• Presentation of data
• Variability of data
Why do we study Statistics?
[Figure: two plots with a "Dose" axis; the second panel is labeled "B"]
The term «Statistics»
Why do Medical Doctors need to know Statistics?
The probabilistic nature of medicine
◦ No identical individuals
◦ No exact solutions
◦ There are no clear concepts of the norm
The doctor must have a scientific, logical and critical approach to solving
medical problems
Correctly evaluate available information
Be aware of possible risks when making decisions
Identify decisions and conclusions that lack sufficient scientific and logical grounding
Why do Medical Doctors need to know Statistics?
Interpretation of variation - attempts to generalize certain
characteristics in a group of patients or populations, to determine
“normal”, “ideal” parameters
Diagnosis is a process of probabilistic assessment of a set of
various symptoms and biochemical parameters
Prediction of outcome for patient or population
Choosing the appropriate intervention for the patient or population
Planning and conducting medical research
Some History
• First works - 23rd century BC, China
• Ancient Rome – census, qualifications of citizens and their
property
• Petty W. (1623-1687): founder of statistical science ("Political Arithmetic")
• Conring H. (1606-1681) developed a system for describing the structure of the state
• Achenwall G. (1719-1772): lecture course called "statistics"
• Schlözer A. (1735-1809): the subject of statistics is society, not the government
Types of Statistics
• Biostatistics
• Medical statistics
• Public health statistics
• Health Statistics
• Evidence Based Medicine Statistics
• And so on
Types of Statistics
• Descriptive Statistics
• Describes data
• Inferential statistics
• Uses the available sample data and inductive logic to draw conclusions about what lies outside the sample
Terms used in Statistics
Types of Variables
• Qualitative
• Quantitative
◦ Discrete
◦ Continuous
• Scales of measurement
◦ Nominal scale data (dichotomous, polychotomous)
◦ Ordinal scale data
◦ Interval scale data
◦ Ratio scale data
• For different types of variables and scales we use different techniques of statistical analysis
Measurement of Data
Frequency Distribution
Absolute Frequency
Years    Number of cigarettes
1900     54
1910     151

Cholesterol (mg/100 ml)    Number of men
80-119                     13
Relative frequency
Cholesterol      Age of 25-34               Age of 55-64
(mg/100 ml)      Number   Frequency (%)     Number   Frequency (%)
80-119               13        1.2               5        0.4
120-159             150       14.1              48        3.9
160-199             442       41.4             265       21.6
200-239             299       28.0             458       37.3
240-279             115       10.8             281       22.9
280-319              34        3.2             128       10.4
320-359               9        0.8              35        2.9
360-399               5        0.5               7        0.6
Total             1 067      100             1 227      100
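As a minimal sketch of how the relative frequencies are obtained, the snippet below divides each class count by the column total. It uses the counts for the 55-64 age group from the table; the function name `relative_frequencies` is an illustrative choice, not part of the lecture.

```python
# Counts of men aged 55-64 in each cholesterol class (mg/100 ml), from the table.
classes = ["80-119", "120-159", "160-199", "200-239",
           "240-279", "280-319", "320-359", "360-399"]
counts_55_64 = [5, 48, 265, 458, 281, 128, 35, 7]


def relative_frequencies(counts):
    """Return each count as a percentage of the total, rounded to 1 decimal."""
    total = sum(counts)
    return [round(100 * c / total, 1) for c in counts]


rel_55_64 = relative_frequencies(counts_55_64)
for label, pct in zip(classes, rel_55_64):
    print(f"{label:>8}  {pct:5.1f}%")
```

Because each percentage is rounded separately, the rounded values may not add up to exactly 100.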
Cumulative Frequency
Cholesterol      Age of 25-34                          Age of 55-64
(mg/100 ml)      Frequency (%)   Cumulative            Frequency (%)   Cumulative
                                 frequency (%)                         frequency (%)
80-119                1.2             1.2                   0.4             0.4
120-159              14.1            15.3                   3.9             4.3
160-199              41.4            56.7                  21.6            25.9
200-239              28.0            84.7                  37.3            63.2
240-279              10.8            95.5                  22.9            86.1
280-319               3.2            98.7                  10.4            96.5
320-359               0.8            99.5                   2.9            99.4
360-399               0.5           100                     0.6           100
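The cumulative column is a running sum of the frequency column. A short sketch using the 25-34 age group percentages from the table (variable names are illustrative):

```python
from itertools import accumulate

# Relative frequencies (%) for the 25-34 age group, in class order.
freq_25_34 = [1.2, 14.1, 41.4, 28.0, 10.8, 3.2, 0.8, 0.5]

# Running total; rounding removes floating-point noise from the sums.
cum_25_34 = [round(c, 1) for c in accumulate(freq_25_34)]
print(cum_25_34)
```

The last cumulative value reaches 100%, which is a quick check that no class was missed.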
Graphical presentation of data
Bar Chart
Polygon
Cumulative Frequency Polygon (Ogive)
Cumulative Frequency Polygon & Centiles
Normal (Gaussian) Distribution
μ = Me = Mo
Distribution Types
Probability
If one of the conditions is not met, we use the empirical definition:
P(A) = n(A)/N, where
n(A) is the number of trials in which event A occurs,
N is the total number of trials,
P(A) is the probability of event A,
P(A^C) is the probability of the opposite (complementary) event
P = 0 (impossible event), P = 1 (certain event)
Probability
Mutually exclusive events
P(A and B) = 0
P(A or B) = P(A) + P(B)
Example: P(A or B) = 0.4 + 0.3 = 0.7
P(A) + P(A^C) = 1
Random events
[Figure: diagrams of events A and B]
Interdependence and incompatibility
Compatible and independent events
[Figure: diagram of events A, B, C and D]
P(A) = 10/20 = 0.5
P(B) = 10/20 = 0.5
P(C) = 10/20 = 0.5
P(D) = 10/20 = 0.5
P(A and C) = P(A) × P(C)
P(A and C) = 0.5 × 0.5 = 0.25
Dependence
What is the probability of randomly selecting a white star on the first attempt? P(A1) = 5/10 = 0.5
The probability of randomly selecting a white star on the second attempt depends on the color of the star selected on the first attempt:
P(A2) = 4/9 if the first star was white, or 5/9 if it was not
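The dependence can be made explicit with exact fractions. This sketch assumes 10 stars, 5 of them white, drawn without replacement (the slide's 5/10 and the denominators of 9 suggest this, but the composition of the other 5 stars is an assumption).

```python
from fractions import Fraction

white, other = 5, 5          # assumed composition of the 10 stars
total = white + other

p_first_white = Fraction(white, total)                   # 5/10
p_second_given_white = Fraction(white - 1, total - 1)    # 4/9
p_second_given_other = Fraction(white, total - 1)        # 5/9

# Unconditional probability of white on the second draw,
# weighting both branches by what happened on the first draw.
p_second_white = (p_first_white * p_second_given_white
                  + (1 - p_first_white) * p_second_given_other)

print(p_second_given_white, p_second_given_other, p_second_white)
```

Before the first draw is observed, the second draw is still white with probability 1/2; the 4/9 and 5/9 apply only once the first result is known.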
Probability Table
        A                A^C
B       P(A and B)       P(A^C and B)       P(B)
B^C     P(A and B^C)     P(A^C and B^C)     P(B^C)
        P(A)             P(A^C)             1
Case
10% of the population is sick. In 5% of cases the doctor fails to detect the disease in a sick person (false negative), and in 10% of cases a healthy person is taken for a sick one (false positive).
Questions:
◦ What is the probability that a randomly selected person will be recognized as sick?
◦ What is the probability that a randomly selected person will be sick or recognized as sick?
◦ What is the probability of making the correct diagnosis?
Solution
        A              A^C
B       0.81           0.005 (5%)     0.815
B^C     0.09 (10%)     0.095          0.185
        0.9            0.1            1
A - healthy
B - recognized as healthy
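One way to check the table and read off the three answers is to build the four joint probabilities from the case data. This is a sketch; the variable names are illustrative and the rates are those given in the case (10% sick, 5% false negatives among the sick, 10% false positives among the healthy).

```python
p_sick = 0.10
p_false_negative = 0.05   # P(recognized healthy | sick)
p_false_positive = 0.10   # P(recognized sick | healthy)
p_healthy = 1 - p_sick

# The four cells of the probability table.
healthy_rec_healthy = p_healthy * (1 - p_false_positive)   # 0.81
healthy_rec_sick = p_healthy * p_false_positive            # 0.09
sick_rec_healthy = p_sick * p_false_negative               # 0.005
sick_rec_sick = p_sick * (1 - p_false_negative)            # 0.095

# 1. Recognized as sick (row total for B^C).
p_recognized_sick = healthy_rec_sick + sick_rec_sick
# 2. Sick or recognized as sick: P(sick) + P(rec. sick) - P(both).
p_sick_or_recognized = p_sick + p_recognized_sick - sick_rec_sick
# 3. Correct diagnosis: healthy seen as healthy, or sick seen as sick.
p_correct = healthy_rec_healthy + sick_rec_sick

print(round(p_recognized_sick, 3),
      round(p_sick_or_recognized, 3),
      round(p_correct, 3))
```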
Conditional Probability
Conditional probability is the probability of event A given that event B has already occurred
A die is thrown once. It is known that the number rolled is greater than 3 (event B). What is the probability that the number is even (event A)?
The probability of event B: P(B) = 3/6
If event B has occurred (4, 5 or 6), only two of these outcomes satisfy the condition (4 and 6), i.e. P(A|B) = 2/3
The probability P(A∩B) = 2/6
P(A|B) = 2/3 = (2/6) / (3/6) = P(A∩B) / P(B)
Then P(A∩B) = P(A|B) × P(B)
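The die example can be verified by enumerating the six equally likely outcomes. A minimal sketch (the helper `prob` is an illustrative name):

```python
from fractions import Fraction

outcomes = set(range(1, 7))                  # faces of a fair die
B = {x for x in outcomes if x > 3}           # greater than 3
A = {x for x in outcomes if x % 2 == 0}      # even


def prob(event):
    """Classical probability: favourable outcomes / all outcomes."""
    return Fraction(len(event), len(outcomes))


p_a_and_b = prob(A & B)                      # outcomes 4 and 6
p_a_given_b = p_a_and_b / prob(B)

print(prob(B), p_a_and_b, p_a_given_b)
```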
Examples
                 Age
Sex              Young (B1)    Old (B2)    Total
Men (A1)             30           20         50
Women (A2)           40           10         50
Total                70           30        100
P(A1) = P(men) = 50/100 = 0.5, P(A2) = P(women) = 50/100 = 0.5
P(B1) = P(young) = 70/100 = 0.7, P(B2) = P(old) = 30/100 = 0.3
P(A2∩B2) = P(women and old) = 10/100 = 0.1
P(A1∪B1) = P(men or young) = P(A1) + P(B1) - P(A1∩B1) = 50/100 + 70/100 - 30/100 = 0.9
Examples
                 Age
Sex              Young (B1)    Old (B2)    Total
Men (A1)             30           20         50
Women (A2)           40           10         50
Total                70           30        100
P(B2|A2) = P(old|women) = P(B2∩A2) / P(A2) = (10/100) / (50/100) = 0.2
P(B2|A1) = P(old|men) = P(B2∩A1) / P(A1) = (20/100) / (50/100) = 0.4
P(A1∪A2) = P(A1) + P(A2) = 50/100 + 50/100 = 1
P(B1∪B2) = P(B1) + P(B2) = 70/100 + 30/100 = 1
P(A2|B2) = P(A2∩B2) / P(B2) = (10/100) / (30/100) = 0.33
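The same quantities can be computed directly from the counts in the sex-by-age table. The sketch below stores the four cells in a dictionary; the helper `p` and the string labels are illustrative choices.

```python
from fractions import Fraction

# Cell counts from the sex-by-age table.
table = {("men", "young"): 30, ("men", "old"): 20,
         ("women", "young"): 40, ("women", "old"): 10}
n = sum(table.values())


def p(sex=None, age=None):
    """Probability of the cells matching the given sex and/or age."""
    hits = sum(count for (s, a), count in table.items()
               if (sex is None or s == sex) and (age is None or a == age))
    return Fraction(hits, n)


p_men_or_young = p(sex="men") + p(age="young") - p("men", "young")
p_old_given_women = p("women", "old") / p(sex="women")
p_old_given_men = p("men", "old") / p(sex="men")
p_women_given_old = p("women", "old") / p(age="old")

print(p_men_or_young, p_old_given_women, p_old_given_men, p_women_given_old)
```

Note that P(old|women) and P(women|old) differ (1/5 versus 1/3): the condition changes the denominator.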
Case
The probability of surviving 6 months after cancer surgery is 0.2 for each of two patients
1. What is the probability that at least one of the operated patients will survive for 6 months?
2. What is the probability that both patients will survive for 6 months?
3. What is the probability that neither of the operated patients will survive for 6 months?
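A possible way to work through the case, assuming the two patients' outcomes are independent (the case does not state this explicitly):

```python
p_survive = 0.2            # per-patient 6-month survival probability

# Independence assumption: joint probabilities are products.
p_both = p_survive * p_survive
p_none = (1 - p_survive) ** 2
p_at_least_one = 1 - p_none    # complement of "nobody survives"

print(round(p_at_least_one, 2), round(p_both, 2), round(p_none, 2))
```

Question 1 is easiest through the complement: "at least one survives" is the opposite of "none survives".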
Total Probability
A group of events is called complete if at least one of them always occurs and they are pairwise incompatible (for example, whether the first person I meet is a man or a woman)
Characteristics of Distributions
•Measures of central tendency
• Mean
• Median
• Mode
•Measures of variability
• Range
• Variance
• Standard deviation
• Coefficient of variation Cv
• Interquartile range
• Percentiles
Mean
The most commonly used summary characteristic of a distribution
It is sensitive (it responds to a change in any value) and works best with a normal (or close to normal) distribution
Disadvantages:
- very sensitive to extreme values (outliers),
- poorly characterizes asymmetric distributions,
- often unsuitable for ranked (ordinal) data
Mean
Calculation:
Sample mean: $\bar{X} = \frac{\sum X}{n}$
Population mean: $\mu = \frac{\sum X}{N}$
Example:
10 patients who tested HIV-positive reported the following number of sexual contacts in 6 months:
2 4 4 6 7 8 10 12 15 93
$\bar{X} = \frac{\sum X}{n} = \frac{161}{10} = 16.1$
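The example also shows how strongly one extreme value pulls the mean. A short sketch with Python's standard `statistics` module, comparing the mean with the median and with the mean after dropping the value 93 (dropping it is only an illustration, not a recommended analysis step):

```python
import statistics

contacts = [2, 4, 4, 6, 7, 8, 10, 12, 15, 93]

mean_all = statistics.mean(contacts)              # 161 / 10
median_all = statistics.median(contacts)          # midpoint of 7 and 8
mean_without_93 = statistics.mean(contacts[:-1])  # 68 / 9

print(mean_all, median_all, round(mean_without_93, 2))
```

Nine of the ten patients reported fewer contacts than the mean, which is why the median describes this skewed sample better.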
Median and Mode
Measures of Variability
• Range
• Deviation from the mean, x: $x = X - \mu$
• Variance, $\sigma^2$: $\sigma^2 = \frac{\sum (X - \mu)^2}{N}$
• Standard deviation, $\sigma$ (S): $\sigma = \sqrt{\sigma^2}$
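A minimal sketch of these measures, computed step by step and treating the ten values from the mean example as the whole population (so the variance is divided by N, as in the formula above; this choice of data is an assumption made for illustration):

```python
import math

data = [2, 4, 4, 6, 7, 8, 10, 12, 15, 93]
N = len(data)

value_range = max(data) - min(data)
mu = sum(data) / N
deviations = [x - mu for x in data]               # deviations from the mean
variance = sum(d ** 2 for d in deviations) / N    # population variance
sd = math.sqrt(variance)                          # population SD

print(value_range, round(variance, 2), round(sd, 2))
```

The deviations always sum to zero, which is why they are squared before averaging.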
Standard Deviation
A key component of parametric statistical tests
The sigma rules of the normal distribution are used for
◦ setting reference limits
◦ estimating the representativeness (standard) error
◦ identifying outlying values
Measures of Variability
The coefficient of variation is a relative (dimensionless) value that is widely used when comparing heterogeneous laboratory tests: the lower the Cv, the more reliably the test detects a change
Cv = (SD / M) × 100%
Interquartile range
◦ IQR = Q3 - Q1 = 75th percentile - 25th percentile
The median is the 50th percentile
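Both measures can be sketched with the standard `statistics` module. Two assumptions are made here: the sample standard deviation is used for SD, and quartiles are computed with the "inclusive" interpolation method; other quartile conventions give slightly different values. The data are the ten values from the mean example.

```python
import statistics

data = [2, 4, 4, 6, 7, 8, 10, 12, 15, 93]

mean = statistics.mean(data)
sd = statistics.stdev(data)          # sample SD (assumption)
cv_percent = 100 * sd / mean         # coefficient of variation, %

# Quartiles: cut points that split the sorted data into four parts.
q1, q2, q3 = statistics.quantiles(data, n=4, method="inclusive")
iqr = q3 - q1

print(round(cv_percent, 1), q1, q2, q3, iqr)
```

Unlike the standard deviation, the interquartile range ignores the extreme value 93 entirely.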
Proportions of Data in Normal Distribution
Percentiles
Percentile    Deviation from the mean
2.5           µ-2σ
16            µ-1σ
50            µ
84            µ+1σ
97.5          µ+2σ
Z-values
Z=(X-µ)/σ
Z score Table
z Area beyond z z Area beyond z z Area beyond z z Area beyond z z Area beyond z
0.00 .5000 0.60 .2743 1.30 .0968 2.00 .0228 2.70 .0035
0.05 .4801 0.65 .2578 1.35 .0885 2.05 .0202 2.75 .0030
0.10 .4602 0.70 .2420 1.40 .0808 2.10 .0179 2.80 .0026
0.15 .4404 0.75 .2266 1.45 .0735 2.15 .0158 2.85 .0022
0.20 .4207 0.80 .2119 1.50 .0668 2.20 .0139 2.90 .0019
0.25 .4013 0.85 .1977 1.55 .0606 2.25 .0122 2.95 .0016
0.30 .3821 0.90 .1841 1.60 .0548 2.30 .0107 3.00 .0013
0.35 .3632 0.95 .1711 1.65 .0495 2.35 .0094 3.05 .0011
0.40 .3446 1.00 .1587 1.70 .0446 2.40 .0082 3.10 .0010
0.45 .3264 1.05 .1469 1.75 .0401 2.45 .0071 3.15 .0008
0.50 .3085 1.10 .1357 1.80 .0359 2.50 .0062 3.20 .0007
0.55 .2912 1.15 .1251 1.85 .0322 2.55 .0054 3.30 .0005
0.60 .2743 1.20 .1151 1.90 .0287 2.60 .0047 3.40 .0003
0.65 .2578 1.25 .1056 1.95 .0256 2.65 .0040 3.49 .0002
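Table entries of this kind can be reproduced from the standard normal distribution. A minimal sketch using the complementary error function from the `math` module; the function name `area_beyond` is illustrative.

```python
import math


def area_beyond(z):
    """Upper-tail area of the standard normal distribution beyond z."""
    return 0.5 * math.erfc(z / math.sqrt(2))


for z in (0.0, 1.0, 2.0, 3.0):
    print(f"z = {z:.2f}  area beyond z = {area_beyond(z):.4f}")
```

By symmetry, the area below -z equals the area beyond +z, so the table only needs non-negative z.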
Z-transformations
An empirical distribution can always be brought to a standard form (z-transformation)
$z = \frac{X - \mu}{\sigma}$
If the average body weight of newborns is 3000 g and SD = 1000 g, what is the probability that a newborn weighs from 2000 to 4000 g? More than 5000 g?
$z_1 = \frac{2000 - 3000}{1000} = -1$, $z_2 = \frac{4000 - 3000}{1000} = 1$, so $-1 \le z \le 1$
I.e. 68.3% of children will have a body mass between 2000 and 4000 g.
$z = \frac{5000 - 3000}{1000} = 2$, so $z \ge 2$ and $P(X \ge 5000) \approx 0.023$
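The newborn example can be checked with `statistics.NormalDist` from the Python standard library, assuming body weight is normally distributed with mean 3000 g and SD 1000 g as stated:

```python
from statistics import NormalDist

weights = NormalDist(mu=3000, sigma=1000)

# z-scores of the limits used in the example.
z_low = (2000 - weights.mean) / weights.stdev
z_high = (4000 - weights.mean) / weights.stdev

p_between = weights.cdf(4000) - weights.cdf(2000)   # P(2000 <= X <= 4000)
p_above_5000 = 1 - weights.cdf(5000)                # P(X >= 5000)

print(z_low, z_high, round(p_between, 3), round(p_above_5000, 4))
```

The second result is the same upper-tail area that the z-score table lists for z = 2.00.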