
Introduction to Biostatistics
LECTURE 1
DR. KENESH DZHUSUPOV
DEPARTMENT OF PUBLIC HEALTH, INTERNATIONAL SCHOOL OF MEDICINE
Lecture outline
• Why do we study Statistics?
• What is Statistics?
• Some history of Statistics
• Types of Statistics. Statistics elements. Types of data.
• Probability
• Distribution of data
• Presentation of data
• Variability of data
Why do we study Statistics?

• For a qualitative assessment of the real outcomes of medical measures (clinical and preventive), e.g. their effectiveness
• For assessing the efficiency of investments in medicine
[Figures: plots with axis label "Dose"; panel label B]
The term «Statistics»

• From the Latin word «status»: the status or state of a phenomenon (event) in a society or a state
• An area of knowledge about mass phenomena in the life of a society, studied in their quantitative aspect in connection with their qualitative nature
The term «Statistics»

The activity of collecting, accumulating, processing and analysing numerical data that characterize different aspects of societal life: production, distribution, exchange of products, politics, culture, health care, education, welfare of the population, etc.
Statistics is …
Statistics as a number
Statistics as a method of data analysis
◦ Methods of making assumptions
◦ Methods of describing data (Descriptive statistics)
◦ Methods of inference (Inferential statistics)
Hypothesis generator
Why do Medical Doctors need to know Statistics?
The probabilistic nature of medicine
◦ No two individuals are identical
◦ There are no exact solutions
◦ There are no clear-cut concepts of the norm
The doctor must have a scientific, logical and critical approach to solving medical problems
Correctly evaluate the available information
When making a decision, be aware of the possible risk
Identify decisions and conclusions that lack sufficient scientific and logical grounding
Why do Medical Doctors need to know Statistics?
Interpretation of variation: attempts to generalize certain characteristics in a group of patients or a population, and to determine “normal” or “ideal” parameters
Diagnosis is a process of probabilistic assessment of a set of various symptoms and biochemical parameters
Prediction of the outcome for a patient or population
Choosing the appropriate intervention for the patient or population
Planning and conducting medical research
Some History
• First works: 23rd century BC, China
• Ancient Rome: census, registration of citizens and their property
• Petty W. (1623-1687), founder of statistical science, "Political Arithmetic"
• Conring H. (1606-1681) developed a system for describing the structure of the state
• Achenwall G. (1719-1772) gave a lecture course called "statistics"
• Schlözer A. (1735-1809): the subject of statistics is society, not government
Types of Statistics

• Biostatistics
• Medical statistics
• Public health statistics
• Health Statistics
• Evidence Based Medicine Statistics
• And so on
Types of Statistics

• Descriptive statistics
◦ Describes the data
• Inferential statistics
◦ Based on the available sample data and using inductive logic, generalizes and draws conclusions about what lies outside the sample
Terms used in Statistics

• Population – the entire set of units about which the researcher draws a conclusion (Nx)
• Unit of a population or a sample – a single unit of the studied object or phenomenon (X)
• Sample – the part of the population that is actually studied
Types of Samples

• Simple random sample
• Stratified random sample
• Cluster sample
• Systematic random sample
Presentation of data – basic notations
A value characterizing a sample is a statistic
A value characterizing a population is a parameter
Measure              Parameter   Statistic
Mean                 µ           X̄
Proportion           π           p
Standard deviation   σ           s
Variance             σ²          s²
Types of Variables
• Qualitative
• Quantitative
◦ Discrete
◦ Continuous
• For different types of variables and scales we use various techniques of statistical analysis
Scales of measurement
• Nominal scale data
◦ Dichotomous
◦ Polychotomous
• Ordinal scale data
• Interval scale data
• Ratio scale data
Measurement of Data

• Often the same data can be measured on different scales
◦ Body weight (age) as ratio scale data
◦ Body weight (age) as ordinal scale data
◦ Body weight (age) as nominal scale data
• This technique is acceptable, but should be used with caution
• Because the category boundaries are chosen somewhat arbitrarily, bias and distortion of the results are possible
Frequency Distribution

For nominal and ordinal data, the frequency distribution is a set of classes or categories with the number of observations in each
For discrete or continuous data, it is a set of intervals with the number of observations in each
Absolute Frequency
Years   Number of cigarettes
1900    54
1910    151
1920    665
1930    1 485
1940    1 976
1950    3 522
1960    4 171
1970    3 985
1980    3 851
1990    2 828

Cholesterol (mg/100 ml)   Number of men
80-119                    13
120-159                   150
160-199                   442
200-239                   299
240-279                   115
280-319                   34
320-359                   9
360-399                   5
Total                     1 067
Relative frequency
Cholesterol    Age of 25-34              Age of 55-64
(mg/100 ml)    Number   Frequency (%)    Number   Frequency (%)
80-119         13       1.2              5        0.4
120-159        150      14.1             48       3.9
160-199        442      41.4             265      21.6
200-239        299      28.0             458      37.3
240-279        115      10.8             281      22.9
280-319        34       3.2              128      10.4
320-359        9        0.8              35       2.9
360-399        5        0.5              7        0.6
Total          1 067    100              1 227    100
Cumulative Frequency
Cholesterol    Age of 25-34                            Age of 55-64
(mg/100 ml)    Frequency (%)   Cumulative              Frequency (%)   Cumulative
                               frequency (%)                           frequency (%)
80-119         1.2             1.2                     0.4             0.4
120-159        14.1            15.3                    3.9             4.3
160-199        41.4            56.7                    21.6            25.9
200-239        28.0            84.7                    37.3            63.2
240-279        10.8            95.5                    22.9            86.1
280-319        3.2             98.7                    10.4            96.5
320-359        0.8             99.5                    2.9             99.4
360-399        0.5             100                     0.6             100
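The relative and cumulative frequencies can be recomputed from the absolute counts. The sketch below uses the counts for men aged 25-34 from the absolute frequency table; the variable names are illustrative only.

```python
# Counts per cholesterol class (mg/100 ml) for men aged 25-34,
# taken from the absolute frequency table.
classes = ["80-119", "120-159", "160-199", "200-239",
           "240-279", "280-319", "320-359", "360-399"]
counts = [13, 150, 442, 299, 115, 34, 9, 5]

total = sum(counts)

# Relative frequency: share of each class in the total, in percent.
relative = [100 * c / total for c in counts]

# Cumulative frequency: running sum of the relative frequencies.
cumulative = []
running = 0.0
for r in relative:
    running += r
    cumulative.append(running)

for label, c, r, cum in zip(classes, counts, relative, cumulative):
    print(f"{label:>8} {c:>5} {r:6.1f} {cum:6.1f}")
```

Rounded to one decimal place, the printed columns should agree with the relative and cumulative frequency tables.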
Graphical presentation of data
Bar Chart
Polygon
Cumulative Frequency Polygon (Ogive)
Cumulative Frequency Polygon & Centiles
Normal (Gaussian) Distribution

μ = Me = Mo

The normal distribution is a continuous distribution
Distribution Types

Probability

The probability of an event is a quantitative measure of the share of all possible outcomes in which the event occurs (p)
Probability
Classical definition: with a finite number of equally likely elementary outcomes, the probability of an event is:
P(A) = A / N, where
A is the number of outcomes of interest to us,
N is the number of all possible outcomes,
P(A) is the probability of event A

Example: the probability of heads (or of tails) when tossing a symmetrical coin is 1/2.
The probability of any particular number coming up when throwing a die is 1/6
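A minimal sketch of the classical definition, assuming all outcomes are equally likely. The helper name classical_probability is illustrative, not part of the lecture.

```python
from fractions import Fraction


def classical_probability(favourable, possible):
    """P(A) = number of favourable outcomes / number of equally likely outcomes."""
    return Fraction(len(favourable), len(possible))


coin = ["heads", "tails"]
die = [1, 2, 3, 4, 5, 6]

p_heads = classical_probability(["heads"], coin)
p_six = classical_probability([6], die)
p_even = classical_probability([n for n in die if n % 2 == 0], die)

print(p_heads, p_six, p_even)
```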
Probability
If one of these conditions is not met, we use the empirical definition:
P(A) = A / N, where
A is the number of trials in which the event occurred,
N is the number of all trials,
P(A) is the probability of the event A
P(Aᶜ) is the probability of the opposite event
P = 0 (impossible event), P = 1 (certain event)
Probability
Mutually exclusive events
P(A and B) = 0
P(A or B) = P(A) + P(B)
P(A or B) = 0.4 + 0.3 = 0.7
P(A) + P(Aᶜ) = 1
Events that are not mutually exclusive
P(A or B) = P(A) + P(B) - P(A and B)
P(A or B) = 0.4 + 0.4 - 0.1 = 0.7
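Both addition rules can be written as one function, since for mutually exclusive events P(A and B) = 0. This is a small sketch with an illustrative function name, using the numbers from the slide.

```python
def p_or(p_a, p_b, p_a_and_b=0.0):
    """General addition rule: P(A or B) = P(A) + P(B) - P(A and B)."""
    return p_a + p_b - p_a_and_b


# Mutually exclusive events: the overlap is zero.
exclusive = p_or(0.4, 0.3)

# Overlapping events: subtract the overlap so it is not counted twice.
overlapping = p_or(0.4, 0.4, 0.1)

print(round(exclusive, 2), round(overlapping, 2))
```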
Interdependence and incompatibility
[Diagrams of events A, B, C and D]
Compatible and independent
P(A) = 10/20 = 0.5
P(B) = 10/20 = 0.5
P(C) = 10/20 = 0.5
P(D) = 10/20 = 0.5
P(A and C) = P(A) × P(C)
P(A and C) = 0.5 × 0.5 = 0.25

Incompatible and therefore dependent
P(C) = 5/10 = 0.5
P(D) = 5/10 = 0.5
P(D and C) = 0, which differs from P(D) × P(C) = 0.25
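Independence can be checked with the product rule P(A and B) = P(A) × P(B). The sketch below applies it to the two situations on the slide; the function name is illustrative.

```python
import math


def are_independent(p_a, p_b, p_a_and_b):
    """Two events are independent when P(A and B) equals P(A) * P(B)."""
    return math.isclose(p_a_and_b, p_a * p_b)


# Compatible events A and C: P(A and C) = 0.25 = 0.5 * 0.5.
compatible_case = are_independent(0.5, 0.5, 0.25)

# Incompatible events C and D: P(C and D) = 0, but 0.5 * 0.5 = 0.25.
incompatible_case = are_independent(0.5, 0.5, 0.0)

print(compatible_case, incompatible_case)
```

The second check fails the product rule: two incompatible events with non-zero probabilities cannot be independent, because the occurrence of one rules out the other.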
Dependence
What is the probability of randomly selecting a white star on the first attempt? P(A1) = 5/10 = 0.5
The probability of randomly selecting a white star on the second attempt depends on the colour of the star selected on the first attempt:
P(A2) = 4/9 if the first star was white, or 5/9 if it was not
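A short sketch of the star example, assuming 5 white stars among 10 and drawing without replacement (the denominator 9 on the slide implies the first star is not returned).

```python
from fractions import Fraction

white, other = 5, 5  # 5 white stars among 10, as on the slide

p_first_white = Fraction(white, white + other)

# The second draw is made from the 9 remaining stars.
p_second_white_if_first_white = Fraction(white - 1, white + other - 1)
p_second_white_if_first_other = Fraction(white, white + other - 1)

# Averaging over both possibilities for the first draw.
p_second_white = (p_first_white * p_second_white_if_first_white
                  + (1 - p_first_white) * p_second_white_if_first_other)

print(p_second_white_if_first_white, p_second_white_if_first_other, p_second_white)
```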
Probability Table

         A              Aᶜ              Total
B        P(A and B)     P(B and Aᶜ)     P(B)
Bᶜ       P(A and Bᶜ)    P(Aᶜ and Bᶜ)    P(Bᶜ)
Total    P(A)           P(Aᶜ)           1
Case
10% of the population is sick. In 5% of sick people the doctor fails to detect the disease (false negative), and in 10% of cases a healthy person is taken for a sick one (false positive).
Questions:
◦ What is the probability that a randomly selected person will be recognized as sick?
◦ What is the probability that a randomly selected person will be sick or recognized as sick?
◦ What is the probability of making the correct diagnosis?
Solution
         A             Aᶜ            Total
B        0.81          0.005 (5%)    0.815
Bᶜ       0.09 (10%)    0.095         0.185
Total    0.9           0.1           1
A – Healthy
B – Recognized as healthy
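The probability table and the three answers can be derived step by step. This sketch treats the 5% and 10% as conditional error rates (among the sick and among the healthy respectively), which is how the table on the slide is built; the variable names are illustrative.

```python
p_sick = 0.10
p_healthy = 1 - p_sick
p_false_negative = 0.05   # P(recognized as healthy | sick)
p_false_positive = 0.10   # P(recognized as sick | healthy)

# The four cells of the probability table.
healthy_called_healthy = p_healthy * (1 - p_false_positive)
healthy_called_sick = p_healthy * p_false_positive
sick_called_healthy = p_sick * p_false_negative
sick_called_sick = p_sick * (1 - p_false_negative)

# Question 1: recognized as sick.
p_called_sick = healthy_called_sick + sick_called_sick

# Question 2: sick or recognized as sick (everything except healthy-and-called-healthy).
p_sick_or_called_sick = 1 - healthy_called_healthy

# Question 3: correct diagnosis.
p_correct = healthy_called_healthy + sick_called_sick

print(round(p_called_sick, 3), round(p_sick_or_called_sick, 3), round(p_correct, 3))
```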
Conditional Probability
Conditional probability is the probability of event A given that event B has already taken place
A die is thrown once. It is known that the number rolled is greater than 3 (event B). What is the probability that the number rolled is even (event A)?
The probability of event B: P(B) = 3/6
If event B has occurred (4, 5 or 6), only two of these outcomes satisfy the condition (4, 6), i.e. P(A|B) = 2/3
The probability P(A∩B) = 2/6
P(A|B) = 2/3 = (2/6) / (3/6) = P(A∩B) / P(B). Then P(A∩B) = P(A|B) × P(B)
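The die example can be checked by enumerating the outcomes. A minimal sketch, assuming a fair six-sided die:

```python
from fractions import Fraction

outcomes = range(1, 7)

B = {n for n in outcomes if n > 3}        # number greater than 3
A = {n for n in outcomes if n % 2 == 0}   # even number

p_b = Fraction(len(B), 6)
p_a_and_b = Fraction(len(A & B), 6)

# Conditional probability: P(A|B) = P(A and B) / P(B).
p_a_given_b = p_a_and_b / p_b

print(p_b, p_a_and_b, p_a_given_b)
```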
Examples
              Age
Sex           Young (B1)   Old (B2)   Total
Men (A1)      30           20         50
Women (A2)    40           10         50
Total         70           30         100
P(A1) = P(men) = 50/100 = 0.5, P(A2) = P(women) = 50/100 = 0.5
P(B1) = P(young) = 70/100 = 0.7, P(B2) = P(old) = 30/100 = 0.3
P(A2 ∩ B2) = P(women and old) = 10/100 = 0.1
P(A1 ∪ B1) = P(men or young) = P(A1) + P(B1) - P(A1 ∩ B1) = 50/100 + 70/100 - 30/100 = 0.9
Examples
              Age
Sex           Young (B1)   Old (B2)   Total
Men (A1)      30           20         50
Women (A2)    40           10         50
Total         70           30         100
P(B2|A2) = P(old|women) = P(B2 ∩ A2) / P(A2) = (10/100) / (50/100) = 0.2
P(B2|A1) = P(old|men) = P(B2 ∩ A1) / P(A1) = (20/100) / (50/100) = 0.4
P(A1 ∪ A2) = P(A1) + P(A2) = 50/100 + 50/100 = 1
P(B1 ∪ B2) = P(B1) + P(B2) = 70/100 + 30/100 = 1
P(A2|B2) = P(A2 ∩ B2) / P(B2) = (10/100) / (30/100) = 0.33
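The probabilities in the sex-by-age examples can all be read off the table of counts. The sketch below is one possible way to do it; the helper p and its arguments are illustrative names.

```python
from fractions import Fraction

# Counts from the sex-by-age table.
counts = {("men", "young"): 30, ("men", "old"): 20,
          ("women", "young"): 40, ("women", "old"): 10}
total = sum(counts.values())


def p(sex=None, age=None):
    """Probability that a random person matches the given sex and/or age."""
    n = sum(c for (s, a), c in counts.items()
            if sex in (None, s) and age in (None, a))
    return Fraction(n, total)


# Addition rule for events that can overlap.
p_men_or_young = p(sex="men") + p(age="young") - p("men", "young")

# Conditional probabilities: joint probability divided by the condition.
p_old_given_women = p("women", "old") / p(sex="women")
p_old_given_men = p("men", "old") / p(sex="men")
p_women_given_old = p("women", "old") / p(age="old")

print(p_men_or_young, p_old_given_women, p_old_given_men, p_women_given_old)
```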
Case
The probability of surviving 6 months after cancer surgery is 0.2 for each of two patients
1. What is the probability that at least one of the operated patients will survive 6 months?
2. What is the probability that both patients will survive 6 months?
3. What is the probability that neither of the operated patients will survive 6 months?
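One possible solution sketch. It assumes that the outcomes of the two patients are independent; the case itself does not state this, so the answers hold only under that assumption.

```python
p_survive = 0.2  # survival probability for each patient, from the case

# Assumption: the two outcomes are independent, so probabilities multiply.
p_both = p_survive * p_survive
p_none = (1 - p_survive) * (1 - p_survive)

# "At least one survives" is the opposite of "none survives".
p_at_least_one = 1 - p_none

print(round(p_at_least_one, 2), round(p_both, 2), round(p_none, 2))
```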
Total Probability
A group of events is called complete if at least one of them always happens and they are pairwise incompatible (for example, whether the first person I meet is a man or a woman)

P(A) = P(H1) × P(A|H1) + P(H2) × P(A|H2) + … + P(Hk) × P(A|Hk)

51% are men, 30% of them suffer from hypertension
49% are women, 65% of them suffer from hypertension
What is the probability P(A) that a randomly chosen person has hypertension?

P(A) = 0.51 × 0.3 + 0.49 × 0.65 = 0.4715 ≈ 0.47
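The formula can be sketched as a sum over the hypotheses. The function name and the list-of-pairs input are illustrative choices, not part of the lecture.

```python
def total_probability(hypotheses):
    """Sum of P(H_i) * P(A | H_i) over a complete group of hypotheses.

    `hypotheses` is a list of (P(H_i), P(A | H_i)) pairs.
    """
    priors = [p_h for p_h, _ in hypotheses]
    # A complete group: the hypothesis probabilities must add up to 1.
    assert abs(sum(priors) - 1.0) < 1e-9, "hypotheses must form a complete group"
    return sum(p_h * p_a_given_h for p_h, p_a_given_h in hypotheses)


# H1 = man (51%, 30% hypertensive), H2 = woman (49%, 65% hypertensive).
p_hypertension = total_probability([(0.51, 0.30), (0.49, 0.65)])

print(round(p_hypertension, 4))
```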
Characteristics of Distributions
•Measures of central tendency
• Mean
• Median
• Mode
•Measures of variability
• Range
• Variance
• Standard deviation
• Coefficient of variation Cv
• Interquartile range
• Percentiles
Mean
The most commonly used summary characteristic of a distribution
It is sensitive (responds to a change in any value) and works best with a normal (or close to normal) distribution
Disadvantages:
- very sensitive to extreme values,
- poorly characterizes asymmetric distributions,
- often unsuitable for ordinal (ranked) data
Mean
Calculation:
X̄ = ΣX / n (sample mean), µ = ΣX / N (population mean)
Example:
10 patients who tested HIV-positive reported the following number of sexual contacts in 6 months:
2 4 4 6 7 8 10 12 15 93
X̄ = ΣX / n = 161 / 10 = 16.1
Median and Mode

Median is the second most frequently used characteristic. It is the middle element: half of the values are less than or equal to it, and the other half are greater than or equal to it. With an even number of elements, Me equals half the sum of the two middle elements. It is insensitive to small numbers of extreme scores in a distribution, so it is used for highly skewed distributions
Mode is the most frequently observed value. It is entirely unaffected by small numbers of extreme scores in a distribution, so it is used with small and asymmetrically distributed data sets. For nominal scale data only the Mode is used; the Mean and Median do not apply
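To see how an extreme value affects the three measures, the sketch below applies Python's standard statistics module to the ten values from the mean example (2 ... 93).

```python
import statistics

# Number of sexual contacts reported by the 10 patients in the mean example.
contacts = [2, 4, 4, 6, 7, 8, 10, 12, 15, 93]

mean = statistics.mean(contacts)
median = statistics.median(contacts)   # half the sum of the two middle values
mode = statistics.mode(contacts)       # the most frequently observed value

# The same mean without the single extreme value 93.
mean_without_extreme = statistics.mean(contacts[:-1])

print(mean, median, mode, round(mean_without_extreme, 2))
```

The mean (16.1) is pulled far above most of the observations by the single value 93, while the median (7.5) and the mode (4) are not.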
Measures of Variability

• Range
• Deviation from the mean, x = X − µ
• Variance, σ²: σ² = Σ(X − µ)² / N
• Standard deviation, σ (S): σ = √σ²
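The measures above can be computed directly from their definitions. The data set in this sketch is hypothetical, chosen only so that the numbers come out round; the last line shows the sample version (division by n − 1) for comparison.

```python
import math
import statistics

# Hypothetical data, treated here as a whole population.
values = [2, 4, 4, 4, 5, 5, 7, 9]

value_range = max(values) - min(values)

mu = sum(values) / len(values)
deviations = [x - mu for x in values]               # x = X - mu

# Population variance: mean of the squared deviations.
variance = sum(d ** 2 for d in deviations) / len(values)
sd = math.sqrt(variance)

# Sample variance divides by n - 1 instead of N.
sample_variance = statistics.variance(values)

print(value_range, mu, variance, sd, round(sample_variance, 3))
```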
Standard Deviation
A key component of parametric statistical tests
The sigma rules of the normal distribution are used for
◦ setting boundaries
◦ estimating the representativeness (sampling) error
◦ identifying outliers
Measures of Variability
The coefficient of variation is a relative (dimensionless) value that is widely used when comparing heterogeneous laboratory tests: the lower the Cv, the smaller the change the test can reliably detect
Cv = (SD / M) × 100%
Interquartile range
◦ IQR = Q3 − Q1 = 75th percentile − 25th percentile
The median is the 50th percentile
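A small sketch of both measures using Python's standard statistics module. The readings are hypothetical, and the exact quartile values depend on the interpolation method (statistics.quantiles uses its default "exclusive" method here), so other software may report slightly different Q1 and Q3.

```python
import statistics

# Hypothetical laboratory readings, chosen only for illustration.
values = [2, 4, 4, 4, 5, 5, 7, 9]

mean = statistics.mean(values)
sd = statistics.stdev(values)      # sample standard deviation (n - 1)
cv_percent = 100 * sd / mean       # Cv = SD / M, expressed in percent

# quantiles(n=4) returns the three cut points Q1, Q2 (the median) and Q3.
q1, q2, q3 = statistics.quantiles(values, n=4)
iqr = q3 - q1

print(f"Cv = {cv_percent:.1f}%, Q1 = {q1}, median = {q2}, Q3 = {q3}, IQR = {iqr}")
```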
Proportions of Data in Normal Distribution
Percentiles
Percentile   Deviation from the mean
2.5          µ − 2σ
16           µ − 1σ
50           µ
84           µ + 1σ
97.5         µ + 2σ
Z-values

Z=(X-µ)/σ
Z score Table
z     Area beyond z   z     Area beyond z   z     Area beyond z   z     Area beyond z   z     Area beyond z
0.00  .5000           0.60  .2743           1.30  .0968           2.00  .0228           2.70  .0035
0.05  .4801           0.65  .2578           1.35  .0885           2.05  .0202           2.75  .0030
0.10  .4602           0.70  .2420           1.40  .0808           2.10  .0179           2.80  .0026
0.15  .4404           0.75  .2266           1.45  .0735           2.15  .0158           2.85  .0022
0.20  .4207           0.80  .2119           1.50  .0668           2.20  .0139           2.90  .0019
0.25  .4013           0.85  .1977           1.55  .0606           2.25  .0122           2.95  .0016
0.30  .3821           0.90  .1841           1.60  .0548           2.30  .0107           3.00  .0013
0.35  .3632           0.95  .1711           1.65  .0495           2.35  .0094           3.05  .0011
0.40  .3446           1.00  .1587           1.70  .0446           2.40  .0082           3.10  .0010
0.45  .3264           1.05  .1469           1.75  .0401           2.45  .0071           3.15  .0008
0.50  .3085           1.10  .1357           1.80  .0359           2.50  .0062           3.20  .0007
0.55  .2912           1.15  .1251           1.85  .0322           2.55  .0054           3.30  .0005
0.60  .2743           1.20  .1151           1.90  .0287           2.60  .0047           3.40  .0003
0.65  .2578           1.25  .1056           1.95  .0256           2.65  .0040           3.49  .0002
Z-transformations
An observed (approximately normal) distribution can always be brought to the standard form (z-transformation)
z = (X − µ) / σ
If the average body weight of newborns is 3000 g and SD = 1000 g, what is the probability that a child is born weighing from 2000 to 4000 g? More than 5000 g?
z = (2000 − 3000) / 1000 = −1, z = (4000 − 3000) / 1000 = 1, so −1 ≤ z ≤ 1
I.e. 68.3% of children will have a body mass between 2000 and 4000 g.
z = (5000 − 3000) / 1000 = 2, z ≥ 2. P(X ≥ 5000) = 0.0228
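The newborn example can be checked without a printed table. This sketch uses NormalDist from Python's standard statistics module (available from Python 3.8) and assumes that birth weight is exactly normal with mean 3000 g and SD 1000 g.

```python
from statistics import NormalDist

mu, sigma = 3000, 1000
weights = NormalDist(mu=mu, sigma=sigma)


def z_score(x):
    """z-transformation: distance from the mean in units of SD."""
    return (x - mu) / sigma


z_low, z_high, z_5000 = z_score(2000), z_score(4000), z_score(5000)

# P(2000 <= X <= 4000): area between z = -1 and z = 1.
p_between = weights.cdf(4000) - weights.cdf(2000)

# P(X >= 5000): area beyond z = 2.
p_above_5000 = 1 - weights.cdf(5000)

print(z_low, z_high, z_5000, round(p_between, 3), round(p_above_5000, 4))
```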
