
Author: Dr. Raghupathy Anchala, MD MPH PDCR, IIPH Hyderabad


Epidemiology made easy for beginners

Author's note: These notes use a succinct style and real-life examples that health professionals
commonly encounter, to help learners understand and appreciate the concepts behind risk, relative
risk, attributable risk, risk difference, odds, odds ratios, sensitivity, specificity, true positives,
true negatives, likelihood ratios, and prior and posterior probability (and odds). The aim is to remove
the difficulties that learners commonly face with measures of association and effect, and to make
these measures easier to recall. Epidemiology rests on these foundational measures of association
between an exposure and an outcome. The fundamental concepts (what epidemiology aims to cover, the
differences between descriptive and analytical epidemiology, the epidemiological triad, causal
factors, natural history and the steps in epidemiology) are listed first, and the measures are then
explained in detail.
**********************************************************************

Concept one - Distribution (in terms of person, place and time) and determinants (in terms of
agent, host and environment) are the two fundamental aspects that an epidemiologist uses
frequently to arrive at a hypothesis in conducting an investigational study that deals with
diseases and epidemics.

Concept two - Examining, identifying, and reporting on the frequency and distribution of disease
in a population constitutes descriptive epidemiology. Analytic Epidemiology looks at testing a
hypothesis about the cause of disease by studying how exposures relate to the disease.

Concept three - Agent, host and environment together determine the susceptibility of a
person to develop a disease. The severity of an infection depends on the host (the sufferer).
Likewise, the probability of disease depends on the immune constitution, personal traits,
behaviors and genetic predisposition of the human body (host). The agent (biological, physical or
chemical) is defined as the factor necessary for the disease to occur. The environment (external
conditions: physical, biologic or social) contributes to the disease process. Epidemics arise
when host, agent and environmental factors are not in balance (homeostasis).
A new agent, a change in an existing agent (infectivity, pathogenicity, virulence), a change in the
number of susceptible people, or environmental changes that affect the transmission or growth of
the agent can lead to the occurrence of a disease (or to an epidemic, that is, more cases than
normally expected).

Concept four - Causal factors can be necessary or sufficient. A necessary factor is one without
which the disease cannot occur: when it is removed, the disease does not develop. A sufficient cause
is a factor, or more often a combination of factors, whose presence inevitably produces the disease.
Many individual factors are neither necessary nor sufficient on their own: they contribute to some
part of the disease process, and the disease may still develop in their absence. Disease occurs when
the factors present together make up a sufficient cause, including any necessary factor.
Examples - Without HIV infection, AIDS does not develop: HIV is a necessary factor.
Development of tuberculosis requires M. tuberculosis together with other factors, such as
immunosuppression. The bacterium is necessary, but not sufficient, to cause the disease.

Concept five: Public health is an integrated discipline. Health protection, disease control, change of
risky behaviors, community development, primary health care and surveillance are the
notable fields in which the study of determinants and distribution (epidemiology) comes into
play.

Concept six: The natural history of disease is the history of a particular disease in the absence of
intervention, prevention or treatment. Epidemiology deals in primary, secondary and tertiary
prevention (both at an individual and population level) based on the natural history of the
disease.

Concept seven: The steps of an epidemic investigation or any causal study can be summarized
as below:
1. Begin with a general broad problem
2. Collect information about the problem,
3. Study the specific information collected,
4. Reassess the results and draw conclusions,
5. Re-evaluate the problem,
6. Re-formulate the question, and
7. Collect additional information which will show the relationship between the exposure
and the outcome or the event.
Applications of epidemiology

Establish patterns of endemic and epidemic diseases

Determine origin of diseases with unknown etiology

Investigate/control diseases whose cause is unknown or poorly understood

Describe the ecology/natural history of disease

Plan and monitor control programs

Assess economic impact of disease

Develop prevention programs

Determine cost and benefits of alternate treatment, prevention, control programs


************************************************************************
Measures - The following examples have been devised to introduce the importance and
emphasize the measures of association and effect that we commonly encounter.
For 2 x 2 cells

                           Heart attack (Disease)    No heart attack (No disease)    Totals
Smoking (Exposed)          (a) 80                    (b) 20                          a+b = 100 (all persons exposed to smoking)
No smoking (Not exposed)   (c) 10                    (d) 90                          c+d = 10+90 = 100 (all persons who are not smokers)
Totals                     a+c = 80+10 = 90          b+d = 20+90 = 110               200
                           (all persons with         (all persons with no
                           heart attack)             heart attack)

Number of persons who are exposed with disease = a = 80


Number of persons who are exposed but NOT having disease = b = 20
Number of persons who are NOT exposed but having disease = c = 10
Number of persons who are NOT exposed and NOT having disease = d =90
1) Risk of exposed to have disease = Risk of smokers to have heart attack
= (Disease and exposed) / (all exposed persons)
= a/(a+b) = 80/(80+20) = 80/100 = 0.8 = 0.8 x 100% = 80%
This means that the risk of heart attack among smokers is 80%: out of 100 people exposed to smoking
(smokers), 80 will have a heart attack.
2) Risk of not exposed to have disease = Risk of non-smokers to have heart attack
= (not exposed with disease) / (all non-smokers)
= c/(c+d) = 10/(10+90) = 10/100 = 0.1 = 10%
This means that the risk of heart attack among non-smokers is 10%: out of 100 non-smokers, 10 will
have a heart attack.
3) Relative risk (RR) for smokers to have heart attack
Relative means "compared to"; here we compare smokers to non-smokers.
So, the question is how much higher the risk of heart attack is for smokers as compared to non-smokers.
RR = Risk of exposed to have disease / Risk of not exposed to have disease
= Risk of smokers to have heart attack / Risk of non-smokers to have heart attack
= 0.8 / 0.1 = 8
This means that the risk of heart attack for smokers is 8 times the risk for non-smokers.
It also means that the risk for smokers is 700% higher than the risk for non-smokers, because
8 minus 1 = 7 (7 x 100 = 700%).
The null value for RR is 1 (the risk is the same in both groups). To reject the null hypothesis, the
RR must differ from 1, being either higher or lower than the null value.
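The risk and relative risk arithmetic can be checked with a few lines of Python. This is only a minimal sketch: the function name `risk` and the variable names are chosen here for illustration, and the counts are the a, b, c, d cells of the smoking example.

```python
def risk(cases, non_cases):
    """Risk = people with the disease / all people in the group."""
    return cases / (cases + non_cases)

# Cells of the 2 x 2 table (smoking and heart attack example)
a, b, c, d = 80, 20, 10, 90

risk_exposed = risk(a, b)        # a / (a + b): risk in smokers
risk_unexposed = risk(c, d)      # c / (c + d): risk in non-smokers
relative_risk = risk_exposed / risk_unexposed
percent_higher = (relative_risk - 1) * 100   # excess over the null value of 1

print(f"Risk in smokers:     {risk_exposed:.0%}")
print(f"Risk in non-smokers: {risk_unexposed:.0%}")
print(f"Relative risk:       {relative_risk:.1f}")
print(f"Percent higher:      {percent_higher:.0f}%")
```

Changing the four counts lets a learner see how the RR moves towards the null value of 1 as the two risks become similar.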
4) Attributable risk (AR) = Excess risk
AR is the risk difference (RD) between two groups: the excess risk that smokers have, as compared to
non-smokers, of developing a heart attack.
So if the question asks for the AR for smokers to develop heart attack, subtract the risk of
non-smokers to have heart attack from the risk of smokers to have heart attack.
AR = (risk of smokers to have heart attack - risk of non-smokers to have heart attack)
= 0.8 - 0.1 = 0.7 or 70%
This means that the risk difference between smokers and non-smokers is 70 percentage points: smokers
have an excess risk of heart attack of 70% as compared to non-smokers.

The difference between AR and RR should be very clear:
RR tells you how many times higher the risk is in smokers as compared to non-smokers (a ratio of risks).
AR tells you how much higher the risk of heart attack is in smokers as compared to non-smokers
(a difference of risks).
5) AR% - also called the attributable proportion
AR% = (risk difference between the two groups) / (risk in the exposed group) x 100
= (risk in exposed - risk in unexposed) / (risk in exposed) x 100
= (risk in smokers to have heart attack - risk in non-smokers to have heart attack) / (risk in smokers to have heart attack) x 100
= (0.8 - 0.1) / 0.8 x 100 = 0.875 x 100 = 87.5%
This means that 87.5% of the risk of heart attack among smokers can be attributed to smoking.
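As a check on the attributable risk and AR% formulas, here is a short Python sketch. The function names are assumptions made for this illustration; the key point it shows is that AR% divides the risk difference (0.7) by the risk in the exposed group (0.8), not by the risk difference itself.

```python
def attributable_risk(risk_exposed, risk_unexposed):
    """Risk difference (excess risk) between exposed and unexposed."""
    return risk_exposed - risk_unexposed

def attributable_risk_percent(risk_exposed, risk_unexposed):
    """Share of the risk in the exposed that is due to the exposure, in percent."""
    return attributable_risk(risk_exposed, risk_unexposed) / risk_exposed * 100

# Risks from the smoking and heart attack example
risk_smokers = 80 / (80 + 20)
risk_non_smokers = 10 / (10 + 90)

ar = attributable_risk(risk_smokers, risk_non_smokers)
ar_percent = attributable_risk_percent(risk_smokers, risk_non_smokers)

print(f"AR  = {ar:.2f}")
print(f"AR% = {ar_percent:.1f}%")
```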
6) Odds
Odds are the chance that an event occurs divided by the chance that the event does not occur.
So, if I say that the odds of the horse winning the race are 4/7, it means that for every 4 times the
horse wins, it loses 7 times.
In other words, if the horse runs in 11 (4+7) races, it is expected to win 4 races and lose 7 races.
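The link between odds and probability in the horse example can be sketched in Python. The two helper functions are hypothetical names introduced here; exact fractions are used so that 4/7 and 4/11 are shown without rounding.

```python
from fractions import Fraction

def probability_from_odds(odds):
    """Convert odds (event : non-event) into a probability."""
    return odds / (1 + odds)

def odds_from_probability(probability):
    """Convert a probability into odds."""
    return probability / (1 - probability)

horse_odds = Fraction(4, 7)                          # 4 wins for every 7 losses
win_probability = probability_from_odds(horse_odds)  # wins out of all races

print(f"Odds of winning:        {horse_odds}")
print(f"Probability of winning: {win_probability}")
print(f"Back to odds:           {odds_from_probability(win_probability)}")
```

Odds of 4/7 therefore correspond to winning 4 races out of 11, which is a probability, not an odds.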
7) Odds ratio (OR) of exposure
OR = (odds of exposed to have disease) / (odds of not exposed to have disease)
where
Odds of exposed to have disease = chance of exposed to have disease / chance of exposed to not have disease
Odds of not exposed to have disease = chance of not exposed to have disease / chance of not exposed to not have disease
Therefore,
OR = (chance of smokers to have heart attack / chance of smokers to not have heart attack)
     / (chance of non-smokers to have heart attack / chance of non-smokers to not have heart attack)

So, in the 2 x 2 table above (a = 80, b = 20, c = 10, d = 90):

Odds of exposed to have disease = odds of smokers to have heart attack
= chance of smokers to have heart attack / chance of smokers to not have heart attack
= (80/100) / (20/100) = 80/20
Odds of not exposed to have disease = odds of non-smokers to have heart attack
= chance of non-smokers to have heart attack / chance of non-smokers to not have heart attack
= (10/100) / (90/100) = 10/90

Therefore, odds ratio for exposure
= (chance of smokers to have heart attack / chance of smokers to not have heart attack)
  / (chance of non-smokers to have heart attack / chance of non-smokers to not have heart attack)
= (80/20) / (10/90)
= (80 x 90) / (20 x 10) = 36
This means that the odds of heart attack for smokers are 36 times the odds for non-smokers.
Also, note that OR = (a x d) / (b x c). This is called the cross-product ratio, because it multiplies
the diagonally opposite cells of the 2 x 2 table:

                           Heart attack (Disease)    No heart attack (No disease)
Smoking (Exposed)          (a) 80                    (b) 20
No smoking (Not exposed)   (c) 10                    (d) 90


8) Similarly, we can also calculate the odds ratio of disease
OR = (odds of diseased to have exposure) / (odds of not diseased to have exposure)
= (chance of diseased to have exposure / chance of diseased to have no exposure)
  / (chance of not diseased to have exposure / chance of not diseased to have no exposure)
Therefore, odds ratio for diseased
= (chance of people with heart attack to be smokers / chance of people with heart attack to be non-smokers)
  / (chance of people with no heart attack to be smokers / chance of people with no heart attack to be non-smokers)
= {(80/90) / (10/90)} / {(20/110) / (90/110)}
= (80/10) / (20/90) = (80/10) x (90/20) = 36
OR of disease = OR of heart attack = 36. This means that the odds of having been exposed to smoking
are 36 times higher among people with heart attack than among people with no heart attack.
Therefore, OR of disease = OR of exposure = cross-product ratio.
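That the two odds ratios and the cross-product ratio always agree can be verified with a small Python sketch. The variable names are chosen for this illustration only, and exact fractions are used to avoid rounding.

```python
from fractions import Fraction

# Cells of the 2 x 2 table (smoking and heart attack example)
a, b, c, d = 80, 20, 10, 90

# Odds ratio of exposure: odds of disease in exposed vs. not exposed
odds_disease_exposed = Fraction(a, b)      # 80/20
odds_disease_unexposed = Fraction(c, d)    # 10/90
or_exposure = odds_disease_exposed / odds_disease_unexposed

# Odds ratio of disease: odds of exposure in diseased vs. not diseased
odds_exposure_diseased = Fraction(a, c)        # 80/10
odds_exposure_not_diseased = Fraction(b, d)    # 20/90
or_disease = odds_exposure_diseased / odds_exposure_not_diseased

# Cross-product ratio
cross_product_ratio = Fraction(a * d, b * c)

print(or_exposure, or_disease, cross_product_ratio)
```

All three expressions reduce algebraically to (a x d) / (b x c), which is why they give the same number for any table with non-zero cells.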
9) Validity
Sensitivity and specificity of a new test compared with the old test (gold standard)

                      Gold standard positive    Gold standard negative    Totals
New test positive     (a) 80                    (b) 220                   a+b = 300 (all persons positive on new test)
New test negative     (c) 10                    (d) 90                    c+d = 10+90 = 100 (all persons negative on new test)
Totals                a+c = 80+10 = 90          b+d = 220+90 = 310        400
                      (all persons positive     (all persons negative
                      on gold standard test)    on gold standard)

Sensitivity of new test = (new test positive and gold standard positive) / (all persons positive on gold standard test)
= a/(a+c) = 80/90 = 0.89 = 89%
This means that the new test detects (catches) 89% of the people who are actually positive.
Specificity of the new test = (new test negative and gold standard negative) / (all persons negative on gold standard)
= d/(d+b) = 90/310 = 0.29 = 29%
This means that the new test correctly identifies 29% of the people who are actually negative.
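A minimal Python sketch of the two validity measures follows. The function names are hypothetical, and the counts are the four cells of the new test versus gold standard table (a = 80, b = 220, c = 10, d = 90); note that 80/90 rounds to 0.89.

```python
def sensitivity(true_positives, false_negatives):
    """Proportion of gold-standard positives that the new test also calls positive."""
    return true_positives / (true_positives + false_negatives)

def specificity(true_negatives, false_positives):
    """Proportion of gold-standard negatives that the new test also calls negative."""
    return true_negatives / (true_negatives + false_positives)

# a = true positives, b = false positives, c = false negatives, d = true negatives
a, b, c, d = 80, 220, 10, 90

sens = sensitivity(a, c)   # a / (a + c)
spec = specificity(d, b)   # d / (b + d)

print(f"Sensitivity = {sens:.2f}")
print(f"Specificity = {spec:.2f}")
```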

In other words, each cell of the table has a name:

                      Gold standard positive      Gold standard negative      Totals
New test positive     (a) 80 (True positives)     (b) 220 (False positives)   a+b = 300 (all positive on new test)
New test negative     (c) 10 (False negatives)    (d) 90 (True negatives)     c+d = 10+90 = 100 (all negative on new test)
Totals                a+c = 80+10 = 90            b+d = 220+90 = 310          400

Therefore,
Sensitivity = True positives / (True positives + False negatives)
Specificity = True negatives / (True negatives + False positives)

Screening test    Disease present     Disease absent
Positive          True positives      False positives
Negative          False negatives     True negatives

Validity of Screening Tests (judged against the true disease status)
Sensitivity: the probability of testing positive if the disease is truly present.
Sensitivity = a / (a + c)
Specificity: the probability of screening negative if the disease is truly absent.
Specificity = d / (b + d)

Screening Principles
Sensitivity: the ability of a test to correctly identify those who have a disease. A test with high
sensitivity will have few false negatives.
Specificity: the ability of a test to correctly identify those who do not have the disease. A test
with high specificity will have few false positives.

Validity of Screening Tests: example

Physical exam and mammography    Breast cancer present    Breast cancer absent
Positive                         (a) 132                  (b) 983
Negative                         (c) 45                   (d) 63650

Sensitivity = a / (a + c) = 132 / (132 + 45) = 74.6%
Specificity = d / (b + d) = 63650 / (983 + 63650) = 98.5%
Sensitivity and specificity alone cannot predict how the screening test will perform in the
population, because they do not take into account how common the disease is.
Thus, the indices of positive and negative predictive value are needed.

Predictive Value Positive (PV+): the proportion of people with positive screening test results who
also test positive on the diagnostic test.

Predictive Value Negative (PV-): the proportion of people with negative screening test results who
are actually free of disease.

Performance Yield (judged from the results of the screening test)
Predictive value positive (PV+): the probability that a person actually has the disease given that
he or she tests positive.
PV+ = a / (a + b)
Predictive value negative (PV-): the probability that a person is truly disease free given that he
or she tests negative.
PV- = d / (c + d)

Performance Yield: example

Results of screening test    Disease present    Disease absent
Positive                     (a) 400            (b) 995
Negative                     (c) 100            (d) 98905

Sensitivity: a / (a + c) = 400 / (400 + 100) = 80%
Specificity: d / (b + d) = 98905 / (995 + 98905) = 99%
PV+: a / (a + b) = 400 / (400 + 995) = 29%
PV-: d / (c + d) = 98905 / (100 + 98905) = 99.9%

PV+: a / (a + b) = 400 / (400 + 995) = 29%
Among persons who screen positive, 29% are found to have the disease.

PV-: d / (c + d) = 98905 / (100 + 98905) = 99.9%
Among persons who screen negative, 99.9% are found to be disease free.

Factors that influence PV+ and PV-
1. The more specific the test, the higher the PV+
2. The higher the prevalence of preclinical disease in the screened population, the higher the PV+
3. The more sensitive the test, the higher the PV-
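The effect of prevalence on the predictive values can be explored with the Python sketch below. It is an illustration under stated assumptions: the function `predictive_values` is a hypothetical helper that applies Bayes' rule to sensitivity, specificity and prevalence, and the alternative prevalences in the loop are made-up values, not data from the example.

```python
def predictive_values(sensitivity, specificity, prevalence):
    """Return (PV+, PV-) for a test applied at a given disease prevalence."""
    true_pos = sensitivity * prevalence
    false_neg = (1 - sensitivity) * prevalence
    true_neg = specificity * (1 - prevalence)
    false_pos = (1 - specificity) * (1 - prevalence)
    pv_pos = true_pos / (true_pos + false_pos)
    pv_neg = true_neg / (true_neg + false_neg)
    return pv_pos, pv_neg

# Cells of the performance yield example
a, b, c, d = 400, 995, 100, 98905
sens = a / (a + c)
spec = d / (b + d)
prev = (a + c) / (a + b + c + d)

pv_pos, pv_neg = predictive_values(sens, spec, prev)
print(f"Observed prevalence {prev:.4f}: PV+ = {pv_pos:.1%}, PV- = {pv_neg:.1%}")

# Hypothetical prevalences, same test: PV+ rises as the disease becomes more common
for assumed_prevalence in (0.01, 0.05, 0.20):
    ppv, npv = predictive_values(sens, spec, assumed_prevalence)
    print(f"Assumed prevalence {assumed_prevalence:.2f}: PV+ = {ppv:.1%}, PV- = {npv:.1%}")
```

At the observed prevalence the function reproduces a / (a + b) and d / (c + d) from the table, so it is consistent with the formulas used in the text.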

Therefore, in revision:

Screening test    Disease present    Disease absent    Totals
Positive          a                  b                 a+b
Negative          c                  d                 c+d
Totals            a+c                b+d

Sensitivity = a/(a+c) and Specificity = d/(b+d) quantify a screening test's accuracy given the known
disease status of subjects.
PPV = a/(a+b) and NPV = d/(c+d) quantify a screening test's accuracy given only the test results of
subjects.

Concept of Likelihood ratios

           Disease +    Disease -
Test +     a            b
Test -     c            d

Likelihood ratio (+) = LR+ = (T+ / all D+) / (T+ / all D-) = {a/(a+c)} / {b/(b+d)} = sensitivity / (1 - specificity)
Likelihood ratio (-) = LR- = (T- / all D+) / (T- / all D-) = {c/(a+c)} / {d/(b+d)} = (1 - sensitivity) / specificity
Pre-test odds, also called prior odds = Prob of disease / (1 - Prob of disease)
Prob of disease = (a+c) / (a+b+c+d)
Prior odds = {(a+c) / (a+b+c+d)} / [1 - {(a+c) / (a+b+c+d)}] = (a+c) / (b+d)
Post odds (+) = (LR+) * prior odds
= [{a/(a+c)} / {b/(b+d)}] * {(a+c) / (b+d)} = a/b
Post odds (-) = (LR-) * prior odds
= [{c/(a+c)} / {d/(b+d)}] * {(a+c) / (b+d)} = c/d
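The chain from prior probability to posterior probability can be sketched in Python as follows. The function names are assumptions for this illustration, and the cell counts are borrowed from the performance yield example (a = 400, b = 995, c = 100, d = 98905) so the result can be compared with the predictive values.

```python
def likelihood_ratios(a, b, c, d):
    """Return (LR+, LR-) from the four cells of the test-by-disease table."""
    sensitivity = a / (a + c)
    specificity = d / (b + d)
    lr_pos = sensitivity / (1 - specificity)
    lr_neg = (1 - sensitivity) / specificity
    return lr_pos, lr_neg

def posterior_probability(prior_probability, likelihood_ratio):
    """Prior probability -> prior odds -> posterior odds -> posterior probability."""
    prior_odds = prior_probability / (1 - prior_probability)
    posterior_odds = likelihood_ratio * prior_odds
    return posterior_odds / (1 + posterior_odds)

a, b, c, d = 400, 995, 100, 98905
lr_pos, lr_neg = likelihood_ratios(a, b, c, d)
prior = (a + c) / (a + b + c + d)            # probability of disease before testing

post_if_positive = posterior_probability(prior, lr_pos)
post_if_negative = posterior_probability(prior, lr_neg)

print(f"LR+ = {lr_pos:.1f}, LR- = {lr_neg:.2f}")
print(f"Probability of disease after a positive test: {post_if_positive:.1%}")
print(f"Probability of disease after a negative test: {post_if_negative:.2%}")
```

When the prior is the prevalence in the same table, the posterior probability after a positive test equals PV+, and after a negative test it equals 1 - PV-.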

Receiver Operating Characteristic (ROC) Curve
The ROC curve plots the false positive rate (1 - specificity) on the X axis against the true positive
rate (sensitivity) on the Y axis, for each possible cut-off value of the test.
[Figure: examples of ROC curves]
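How the points of a ROC curve arise can be sketched in Python. Everything here is an illustrative assumption: the test scores and disease labels are made-up data, `roc_points` and `auroc` are hypothetical helper names, and the area is computed with the trapezoid rule.

```python
def roc_points(scores, labels):
    """Return (false positive rate, true positive rate) pairs, one per cut-off."""
    positives = sum(labels)
    negatives = len(labels) - positives
    points = [(0.0, 0.0)]
    # Lower the cut-off step by step; a score >= cut-off counts as test positive
    for cutoff in sorted(set(scores), reverse=True):
        tp = sum(1 for s, y in zip(scores, labels) if s >= cutoff and y == 1)
        fp = sum(1 for s, y in zip(scores, labels) if s >= cutoff and y == 0)
        points.append((fp / negatives, tp / positives))
    return points

def auroc(points):
    """Area under the ROC curve by the trapezoid rule."""
    area = 0.0
    for (x0, y0), (x1, y1) in zip(points, points[1:]):
        area += (x1 - x0) * (y0 + y1) / 2
    return area

# Hypothetical test scores and true disease status (1 = diseased, 0 = not diseased)
scores = [0.9, 0.8, 0.7, 0.6, 0.55, 0.4, 0.3, 0.2]
labels = [1, 1, 0, 1, 0, 1, 0, 0]

points = roc_points(scores, labels)
area = auroc(points)
print(points)
print(f"AUROC = {area:.4f}")
```

An AUROC of 0.5 corresponds to a test that does no better than chance, and an AUROC of 1 to a test that separates the two groups perfectly.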

ROC analysis provides a useful means to assess the diagnostic accuracy of a test and to compare the
performance of more than one test for the same outcome. However, the usefulness of the test must be
considered in the light of the clinical circumstances.
For example, the ability of two continuous variables to diagnose an outcome can be compared using
their ROC curves and the areas under those curves (AUROCs).
Fig. 3 and Table 6 of Bewick et al. (Critical Care, December 2004, Vol 8 No 6) show the ROC curve and
AUROC for urea in addition to those for lactate.
The AUROC for urea is greater than that for lactate (0.730 for urea as compared to 0.64 for lactate),
suggesting that urea may provide a better predictive test for mortality.
Tests for Reliability
Example: a self-reporting depression test (new test) compared with the standard approach to
diagnosing depression (clinical exam).

                               Clinical exam: Depressed    Clinical exam: Not depressed
Self reported depressed        25                          5
Self reported not depressed    10                          60

1. Percent Agreement: Divide the number of paired observations in the agreement cells by the total
number of paired observations.

Using the data from our example:

(25+60)/100 x 100% = 85%

Advantages

Simple to use

Can be extended to discrete scores with more than two levels (for example: not depressed, mild
depression, severe depression)

Disadvantage

Values tend to be high whenever the absent-absent cell is high.

2. Percent Positive Agreement: Divide the number of positive paired observations by the average
number of positives given by the two ratings.

Using the data from our example:

25/[((25+10) + (25+5))/2] x 100% = 25/32.5 x 100% = 76.9%

This represents the number of times both ratings give a positive result out of the
average number of positives given by either rating.

3. Kappa Statistic: The fraction of the observed agreement not due to chance in relation to the
maximum non-chance agreement.

K = (P0 - Pe)/(1 - Pe)

P0 = the proportion of observed agreement

Pe = the proportion of agreement expected to occur by chance alone

From our example:

P0 = (25+60)/100 = .85

Pe is the sum of the chance agreement for each cell on the diagonal, divided by the total. The
expected count for each diagonal cell is the product of the corresponding marginals divided by the
total:

Depressed on both ratings: (25+5) x (25+10)/100 = 10.5

Not depressed on both ratings: (60+5) x (60+10)/100 = 45.5

Therefore, Pe = (10.5+45.5)/100 = .56
K = (P0 - Pe)/(1 - Pe) = (.85 - .56)/(1 - .56) = .66
Range of Kappa: -1 to 1

-1: Complete disagreement

0: Random agreement

1: Complete agreement

Suggested guidelines

< .4 Poor agreement beyond chance

.4-.75 Fair to good agreement beyond chance

> .75 Excellent agreement beyond chance

Reference : Landis and Koch (1977). The Measurement of observer agreement for categorical data.
Biometrics, 33:159-174.
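The three agreement measures for the depression example can be reproduced with the Python sketch below. The variable names are chosen for this illustration; the cell of 5 (self reported depressed, not depressed on clinical exam) is taken from the marginal sums (25+5) and (60+5) used in the formulas.

```python
# Cell counts: self-report result versus clinical exam result
both_depressed = 25        # depressed on both ratings
self_report_only = 5       # self reported depressed, exam not depressed
exam_only = 10             # self reported not depressed, exam depressed
both_not_depressed = 60    # not depressed on both ratings
total = both_depressed + self_report_only + exam_only + both_not_depressed

# 1. Percent agreement
percent_agreement = (both_depressed + both_not_depressed) / total * 100

# 2. Percent positive agreement
positives_exam = both_depressed + exam_only
positives_self = both_depressed + self_report_only
percent_positive_agreement = both_depressed / ((positives_exam + positives_self) / 2) * 100

# 3. Kappa: observed agreement corrected for agreement expected by chance
p_observed = (both_depressed + both_not_depressed) / total
expected_depressed = positives_self * positives_exam / total
expected_not_depressed = (total - positives_self) * (total - positives_exam) / total
p_expected = (expected_depressed + expected_not_depressed) / total
kappa = (p_observed - p_expected) / (1 - p_expected)

print(f"Percent agreement:          {percent_agreement:.1f}%")
print(f"Percent positive agreement: {percent_positive_agreement:.1f}%")
print(f"Kappa:                      {kappa:.2f}")
```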
4. Intraclass Correlation Coefficient: estimates the fraction of the total measurement variability caused
by variation among individuals.

This is an extension of the kappa; Same range of scores (-1 to 1)

ICC=Vb/(Vb+Ve)

Vb=Variance between individuals

Ve=Error variance

The ICC can be calculated from an ANOVA table.

More complex approaches to estimating the ICC also exist, which take into account the
random effects of subjects and raters.

5. Coefficient of Variability (CV): the standard deviation expressed as a proportion (or percentage)
of the mean value of paired observations.
For each paired set of observations, calculate the variance.
If we have a pair of scores of 25 and 35, the mean of the two observations is 30 and the variance
(dividing by n - 1 = 1) is (25-30)^2 + (35-30)^2 = 50.
The CV for the pair is the standard deviation of the paired observations divided by the mean of
the pair:
SQRT(50)/30 = .24
This is then repeated for each pair.
The overall CV is the average of the pairwise CVs.
The lower the CV, the less variation there is between the repeated measurements.
If there are no differences within pairs, the CV would be zero.
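The pairwise CV calculation can be sketched in Python with the standard `statistics` module. The helper names are hypothetical, and apart from the pair (25, 35) from the text, the pairs in the list are made-up values used only to show the averaging step.

```python
import statistics

def pair_cv(first, second):
    """CV of one pair: sample standard deviation divided by the mean of the pair."""
    pair = [first, second]
    return statistics.stdev(pair) / statistics.mean(pair)

def overall_cv(pairs):
    """Average of the pairwise CVs."""
    return statistics.mean([pair_cv(first, second) for first, second in pairs])

example_cv = pair_cv(25, 35)                # sqrt(50) / 30
pairs = [(25, 35), (30, 30), (40, 44)]      # last two pairs are hypothetical
average_cv = overall_cv(pairs)

print(f"CV of the pair (25, 35): {example_cv:.2f}")
print(f"Overall CV of all pairs: {average_cv:.2f}")
```

A pair with identical scores, such as (30, 30), contributes a CV of zero and pulls the overall CV down.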
--------------------------------------------------------------------
For validity, we could use sensitivity, specificity, PV+ and PV-.
--------------------------------------------------------------------
