ArboConFlu StudyDesign

INTRODUCTION TO THE EPIDEMIOLOGY TOOLBOX
TYPES OF EPIDEMIOLOGICAL STUDY DESIGN

Maximilian P.O. Baumann
Postgraduate Studies in International Animal Health
Epidemiologic studies are:

observational, structured comparisons of the proportions (e.g. prevalence) or rates (e.g. incidence) of disease (or of other conditions of veterinary, economic or public health interest) among populations or subsets of populations.

Epidemiologic terminology: "disease" refers to the dichotomous condition of interest, and "exposure" to the presence or absence of a risk factor under consideration.
GENERAL REQUIREMENTS
All study designs require that:
1. for each unit of observation (typically one animal, but possibly an aggregate), disease and exposure status is recorded;
2. disease and exposure status, if not directly observable, is established with reliable diagnostic tests (i.e. sensitivity and specificity are known and taken into account);
3. the diagnostic procedure is standardized and not subject to changes;
4. additional information is collected on traits with a known impact on disease and exposure status.
2x2 contingency table for cross-tabulation of disease and exposure state

                  Diseased (D+)   Non-diseased (D-)   Total
Exposed (E+)      a               b                   n1
Unexposed (E-)    c               d                   n2
Total             m1              m2                  n

Notation:
D+ = animal is diseased
D- = animal is non-diseased
E+ = animal is exposed
E- = animal is unexposed

a = number of diseased animals exposed to the risk factor
b = number of non-diseased animals exposed to the risk factor
c = number of diseased animals unexposed to the risk factor
d = number of non-diseased animals unexposed to the risk factor

n1 = total number of animals exposed to the risk factor
n2 = total number of animals unexposed to the risk factor
m1 = total number of diseased animals
m2 = total number of non-diseased animals
n = total number of animals
2x2 TABLE: PROBABILITY NOTATION

                  Diseased (D+)   Non-diseased (D-)   Total
Exposed (E+)      a               b                   n1
Unexposed (E-)    c               d                   n2
Total             m1              m2                  n

Proportion or rate of interest       Probability notation   Sample estimate
Exposed                              Pr(E+)                 (a + b)/n
Diseased                             Pr(D+)                 (a + c)/n
Diseased and exposed                 Pr(E+ and D+)          a/n
Diseased in the exposed group        Pr(D+ | E+)            a/(a + b)
Diseased in the unexposed group      Pr(D+ | E-)            c/(c + d)
Exposed in the diseased group        Pr(E+ | D+)            a/(a + c)
Exposed in the non-diseased group    Pr(E+ | D-)            b/(b + d)
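The sample estimates in the table above can be sketched in a few lines of Python. This is only an illustration: the function name `sample_estimates` and the counts in the usage lines are hypothetical and not part of the lecture, and the cell layout (a, b in the exposed row; c, d in the unexposed row) is assumed from the 2x2 table.

```python
def sample_estimates(a, b, c, d):
    """Sample estimates from a 2x2 table (rows = exposure, columns = disease)."""
    n = a + b + c + d
    return {
        "Pr(E+)": (a + b) / n,
        "Pr(D+)": (a + c) / n,
        "Pr(E+ and D+)": a / n,
        "Pr(D+ | E+)": a / (a + b),
        "Pr(D+ | E-)": c / (c + d),
        "Pr(E+ | D+)": a / (a + c),
        "Pr(E+ | D-)": b / (b + d),
    }

# Hypothetical counts, chosen only for illustration.
est = sample_estimates(a=20, b=80, c=10, d=90)
for name, value in est.items():
    print(f"{name}: {value:.3f}")
```

Whether each of these numbers is a valid estimator of the corresponding probability depends on how the table was sampled.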
SAMPLING BY ROWS (rows = exposure status; the row totals n1 and n2 are fixed by design):
a/(a+b) is a valid estimator of Pr(D+ | E+)
c/(c+d) is a valid estimator of Pr(D+ | E-)
Pr(E+) and Pr(E-) cannot be estimated (directly) using this design

SAMPLING BY COLUMNS (columns = disease status; the column totals m1 and m2 are fixed by design):
a/(a+c) is a valid estimator of Pr(E+ | D+)
b/(b+d) is a valid estimator of Pr(E+ | D-)
Pr(D+) and Pr(D-) cannot be estimated (directly) using this design
SAMPLING THE ENTIRE TABLE:

All sample estimates are valid estimators of the respective probabilities, but
- n1 may end up too small if the prevalence of exposure is low,
- m1 may end up too small if the prevalence of disease is low.
CLASSIFICATION OF STUDY TYPES:
sampling the entire table = cross-sectional study
sampling by rows = cohort study
sampling by columns = case-control study
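Why the sampling scheme matters can be illustrated with a small deterministic sketch. The population counts and the sampling fraction below are hypothetical (not from the lecture), and the "sample" is the expected table, ignoring random variation. Selecting animals by disease status, as in a case-control study, distorts the proportion diseased among the exposed but leaves the proportion exposed among the cases and the odds ratio unchanged.

```python
def odds_ratio(a, b, c, d):
    """Cross-product ratio ad/(bc) of a 2x2 table."""
    return (a * d) / (b * c)

# Hypothetical population counts (assumption, for illustration only):
# exposed: A diseased, B non-diseased; unexposed: C diseased, D non-diseased.
A, B, C, D = 200, 1800, 100, 7900

# Sampling by disease status: keep every case, but only 1 in 20 non-diseased.
keep_one_in = 20
a, c = A, C                                 # all cases
b, d = B // keep_one_in, D // keep_one_in   # expected controls: 90 and 395

risk_exposed_population = A / (A + B)       # Pr(D+ | E+) in the population
risk_exposed_sample = a / (a + b)           # distorted by the sampling scheme

exposure_in_cases_population = A / (A + C)  # Pr(E+ | D+) in the population
exposure_in_cases_sample = a / (a + c)      # unchanged by the sampling scheme

print(round(risk_exposed_population, 3), round(risk_exposed_sample, 3))
print(round(odds_ratio(A, B, C, D), 3), round(odds_ratio(a, b, c, d), 3))
```

In this sketch the apparent risk in the exposed rises from 0.10 to about 0.69 purely because of the selection, while the odds ratio is the same in both tables.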
CROSS-SECTIONAL STUDY DESIGN
[Diagram: a sample from the population is classified into four groups: diseased and exposed; non-diseased and exposed; diseased and not exposed; non-diseased and not exposed.]
Step-by-step guide
1. Identify the target population.
2. Identify the sampling frame (sampling population).
3. Establish the sample size n and the sampling method (cluster, stratified sampling, etc.).
4. Obtain diagnostic data on the disease and exposure status for all n animals.
5. Cross-tabulate the results using a 2x2 table.

Cross-sectional studies
- measure prevalence
- make it difficult to detect disease or exposure of a transient nature
- make inference about causality difficult
- are most suitable for studying permanent risk factors
PROSPECTIVE (COHORT) STUDY DESIGN

[Diagram: non-diseased animals from the population are grouped as exposed or not exposed and followed over TIME (prospective assessment of disease); each group ends up divided into diseased and non-diseased animals.]
Step-by-step guide

1. Select a cohort of n1 non-diseased animals exposed to a risk factor and a matched group of n2 animals not exposed to the risk factor.
2. Follow up the animals over a sufficient time period.
3. Report the dates at which new animals enter the study or at which animals drop out from any of the cohorts (including the reason for dropping out).
4. Establish the number of newly observed cases in the cohorts.
5. Establish the total time at risk for the cohorts.
6. Enter the data into a 2x2 table.

Cohort studies
- measure incidence rates (cumulative incidence, if the follow-up period of all animals is the same)
- are based on animal-time data
- are appropriate for diseases with high incidence
RETROSPECTIVE (CASE-CONTROL) STUDY DESIGN

[Diagram: cases (diseased animals) and controls (non-diseased animals) are selected from the population; looking back in TIME (retrospective assessment of exposure), each group is divided into exposed and not exposed animals.]
Step-by-step guide
1. Select a group of m1 diseased animals (cases) and a group of m2 non-diseased animals (controls). Matched pairs are recommended.
2. Investigate the past exposure status of all animals and establish the number of exposed cases and controls.
3. Enter the data into a 2x2 table.

Other important aspects to be considered
Case-control studies are "retrospective" because they start after the onset of disease and assess the history of postulated exposure. In a case-control study the inference is from effect to cause, not from cause to effect as it would be in a cohort study.
SUMMARY: MOST IMPORTANT FEATURES

                                 Cross-sectional study          Case-control study                      Cohort study
Measure of disease frequency     Prevalence                     Prevalence                              Incidence
Direction of investigation       Momentary / retrospective      Retrospective                           Prospective
Samples (selections) involved    1 sample from the population   1 group of cases, 1 group of controls   1 cohort of exposed, 1 cohort of unexposed
Primary measure of association   Prevalence odds ratio          Odds ratio                              Relative risk; attributable risk
Major advantages and disadvantages

Cross-sectional study
  Marginal conditions:       quick, relatively cheap
  Applicability:             permanent risk factors, quite common diseases
  Data quality:              as good as diagnosis
  Sample sizes:              large (low prevalences)
  Inferences/estimability:   no causal evidence; no incidence; prevalence of exposure; prevalence of disease

Case-control study
  Marginal conditions:       quick, relatively cheap
  Applicability:             more general, rare diseases
  Data quality:              errors in historic data
  Sample sizes:              relatively small
  Inferences/estimability:   limited causal evidence; no incidence; prevalence of exposure; no prevalence of disease

Cohort study
  Marginal conditions:       time-consuming, relatively costly
  Applicability:             more general
  Data quality:              as good as diagnosis
  Sample sizes:              large (dropout, low incidence)
  Inferences/estimability:   causal evidence; incidence; no prevalence of exposure; prevalence of disease
ANALYSIS AND INTERPRETATION OF RESULTS

The objective of quantitative epidemiologic research is to quantify the association between disease and exposure. This entails different measures of association, expressing the absolute or relative change in risk due to exposure, which are selected according to the study type.

Analysis of cross-sectional studies
In a cross-sectional study the following measures can be established:

Prevalence of disease, Pr(D+):                        pD = m1 / n = (a + c) / (a + b + c + d)
Prevalence of exposure, Pr(E+):                       pE = n1 / n = (a + b) / (a + b + c + d)
Prevalence of disease given exposure, Pr(D+ | E+):    p1 = a / n1 = a / (a + b)
Prevalence of disease given no exposure, Pr(D+ | E-): p2 = c / n2 = c / (c + d)
Prevalence ratio:                                     PR = p1 / p2 = [a / (a + b)] / [c / (c + d)]
Prevalence difference:                                PD = p1 - p2
Prevalence odds ratio:                                POR = [p1 / (1 - p1)] / [p2 / (1 - p2)] = (a d) / (b c)
The three latter measures are of direct interest for the assessment of a risk factor and have the following interpretation (if the 2x2 table is as shown in the first section).
Exposure is                    expected PR   expected PD   expected POR
not associated with disease    = 1           = 0           = 1
a risk factor                  > 1           > 0           > 1
a protective factor            < 1           < 0           < 1
Hypothetical example
In a cross-sectional study it is investigated whether Holstein-Friesian (HF) breed is a risk (or protective) factor for subclinical mastitis (SCM) compared with Brown Swiss (BS). It is estimated that HF and BS cows are equally abundant in the study area. From the study population a random sample of 300 milking cows was obtained. The results are as below.
         SCM+   SCM-   Total
HF       27     95     122
BS       32     146    178
Total    59     241    300
PR = (27 / 122) / (32 / 178) = 1.231 > 1
SCM prevalence in HF cows is 1.231 times the prevalence in BS cows. This indicates that HF could be a risk factor for SCM compared with BS.

PD = 27/122 - 32/178 = 0.0415 > 0
SCM prevalence in HF cows is 4.15 percentage points higher than the prevalence in BS cows. This indicates that HF could be a risk factor for SCM compared with BS.

POR = (27 x 146) / (95 x 32) = 1.2967 > 1
The odds of SCM in HF cows are 1.2967 times the odds in BS cows. This indicates that HF could be a risk factor for SCM compared with BS.
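The three measures for the HF/BS example can be reproduced with a short Python sketch. The variable names are illustrative only; the counts are those of the hypothetical SCM table, with HF treated as the exposed group.

```python
# Counts from the hypothetical SCM example: HF = exposed, BS = unexposed.
a, b = 27, 95     # HF cows: SCM+, SCM-
c, d = 32, 146    # BS cows: SCM+, SCM-

p1 = a / (a + b)  # prevalence of SCM in HF cows
p2 = c / (c + d)  # prevalence of SCM in BS cows

prevalence_ratio = p1 / p2
prevalence_difference = p1 - p2
prevalence_odds_ratio = (a * d) / (b * c)

print(round(prevalence_ratio, 3))       # 1.231
print(round(prevalence_difference, 4))  # 0.0415
print(round(prevalence_odds_ratio, 4))  # 1.2967
```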
INFERENCES FROM THE RESULTS

Usually, instead of testing the null hypotheses PR = 1, PD = 0 or POR = 1, one constructs a 100(1-α)% confidence interval (CI) for each of the parameters. For example, for α = 0.05, the 95% CI (POR) by Cornfield's method is 0.7337-2.2922.
The expected value under the null hypothesis (POR = 1) is included in the interval. This indicates that there is no statistically significant evidence for a breed effect. For the other parameters, one could establish CIs as well.
CASE-CONTROL STUDY
Parameters to be estimated from case-control studies include:

Prevalence of exposure, Pr(E+):                        pE = n1 / n = (a + b) / (a + b + c + d)
Prevalence of exposure given disease, Pr(E+ | D+):     p1 = a / m1 = a / (a + c)
Prevalence of exposure given no disease, Pr(E+ | D-):  p2 = b / m2 = b / (b + d)
Odds ratio for exposure:                               ORE = [p1 / (1 - p1)] / [p2 / (1 - p2)] = (a d) / (b c)
ORE is the ratio of the odds of exposure in cases and controls. It measures whether or not exposure is more common in the diseased group than in the non-diseased group. ORE is algebraically identical with the odds ratio for disease (ORD). For low prevalences (< 0.1) the ORE obtained from a case-control study is a good estimator of the relative risk of disease. The interpretation of the value of ORE (and ORD) is the same as that of the prevalence odds ratio.
Hypothetical example
A veterinarian collected blood samples from 23 cows he visited 0-7 days after abortion (AB). Additionally, on each farm, a blood sample was taken from a cow that gave birth to a healthy calf. The 46 sera were submitted to a laboratory for testing for Neospora caninum antibodies (NCA). Here are the results:

         AB+   AB-   Total
NCA+     15    7     22
NCA-     8     16    24
Total    23    23    46
RESULTS
ORE = ORD = (15 x 16) / (7 x 8) = 4.286 > 1
The odds of abortion in cows exposed to Neospora caninum (according to antibody level) are 4.3 times the odds in unexposed cows. This indicates that exposure to Neospora caninum could be a risk factor for abortion.

Inferences from the results
95% CI (OR) by Cornfield's method = 1.27-14.45. The expected value under the null hypothesis (OR = 1) is not included in the interval. Thus, we conclude that there is statistically significant evidence for an association between exposure to Neospora caninum and abortion in the cattle population covered by the vet.
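The statement that ORE and ORD are algebraically identical can be checked numerically on the abortion example. The sketch below is only an illustration with hypothetical variable names; it computes the odds ratio three ways from the same hypothetical table.

```python
# Counts from the hypothetical Neospora example.
a, b = 15, 7    # NCA+ cows: AB+, AB-
c, d = 8, 16    # NCA- cows: AB+, AB-

# Odds ratio for exposure: odds of exposure in cases vs. controls.
or_exposure = (a / c) / (b / d)

# Odds ratio for disease: odds of abortion in exposed vs. unexposed cows.
or_disease = (a / b) / (c / d)

# Cross-product form ad/(bc).
cross_product = (a * d) / (b * c)

print(round(or_exposure, 3), round(or_disease, 3), round(cross_product, 3))
# prints: 4.286 4.286 4.286
```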
COHORT STUDIES
We have two options:
(A) If the follow-up time is equal for all animals under study, we use the count data.
(B) If the follow-up time is variable, we have to consider incidence rates or densities.

A: Count data
Parameters to be estimated include:

Cumulative incidence in the exposed cohort:            p1 = a / n1 = a / (a + b)
Cumulative incidence in the unexposed cohort:          p2 = c / n2 = c / (c + d)
Relative risk (cumulative incidence ratio, risk ratio): RR = p1 / p2 = [a / (a + b)] / [c / (c + d)]

The numerical value of RR is interpreted in a similar way as OR.
Hypothetical example
1000 dogs were selected for a prospective study on the impact of high doses of vitamin E on cancer. The owners of 500 dogs received a food additive based on high-dose vitamin E for daily administration (VE). The owners of the other 500 dogs, matched for breed, sex, and age, received a food additive containing no active compound (placebo). In phase 1 of the experiment, the dogs were followed up over a period of 2 years. Each case of death in the cohorts was subjected to post-mortem (PM) inspection, where the presence or absence of any form of cancer (CAN) was recorded. The numbers of diagnosed cancers are as follows:

         CAN+   CAN-   Total
VE+      4      496    500
VE-      8      492    500
Total    12     988    1000
Results
p1 = 4/500 = 0.008
p2 = 8/500 = 0.016
RR = 0.5 < 1
According to phase 1 of the study, the observed risk of developing cancer under the VE treatment is 50% of the risk under placebo treatment. This indicates that the VE treatment could be a protective factor.

Inferences from the results
95% CI (RR) by Cornfield's method = 0.151-1.649. The expected value under the null hypothesis (RR = 1) is included in the interval. Thus, we conclude that this study provides no statistically significant evidence for an association between VE treatment and development of cancer.
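The count-data calculation for phase 1 can be sketched as follows. The helper name `relative_risk` is hypothetical; the counts are those of the hypothetical vitamin E example, with the VE cohort as the exposed group.

```python
def relative_risk(a, b, c, d):
    """Cumulative incidences and their ratio for equal follow-up time."""
    p1 = a / (a + b)   # cumulative incidence in the exposed cohort
    p2 = c / (c + d)   # cumulative incidence in the unexposed cohort
    return p1, p2, p1 / p2

# VE+: 4 cancers among 500 dogs; VE-: 8 cancers among 500 dogs.
p1, p2, rr = relative_risk(a=4, b=496, c=8, d=492)
print(round(p1, 3), round(p2, 3), round(rr, 2))
```

An RR below 1 points towards a protective effect, but the confidence interval quoted in the text shows that phase 1 alone does not support that conclusion.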
B: Animal-time data
Parameters to be estimated include:

Incidence rate (density) for exposed:     IR1 = a / T1
Incidence rate (density) for unexposed:   IR0 = c / T0
Incidence rate (density) ratio:           IRR = IR1 / IR0

where T1 and T0 denote the total animal-time at risk for the two cohorts. IRR has the same interpretation as the RR. The analysis of incidence densities is only possible if entry and exit dates for the animals of the cohorts are available.
Hypothetical example
The above experiment is continued in a phase 2 for another 10 years. Many dogs will drop out for various reasons (the most frequent probably being lack of compliance). Replacement dogs are recruited according to a strict study protocol. The resulting data after termination of the study are:

         CAN+   Total animal-years at risk
VE+      33     4865
VE-      84     3222
Total    117    8087
Results:
IR1 = 33/4865 = 0.00678
IR0 = 84/3222 = 0.026
IRR = 0.26 < 1

According to phase 2 of the study, the observed incidence rate of cancer under the VE treatment is 26% of the rate under placebo treatment. This indicates that the VE treatment could be a protective factor.

Inferences from the results
Exact 95% CI (IRR) = 0.1684-0.3935. The expected value under the null hypothesis (IRR = 1) is not included in the interval. Thus, after the extension of the study, we conclude that there is statistically significant evidence for an association between VE treatment and development of cancer.
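For animal-time data the same idea applies with animal-years in the denominators. A minimal sketch, using the hypothetical phase 2 numbers and illustrative variable names:

```python
# Hypothetical phase 2 data: cancer cases and animal-years at risk.
cases_exposed, years_exposed = 33, 4865       # VE+
cases_unexposed, years_unexposed = 84, 3222   # VE-

ir1 = cases_exposed / years_exposed       # incidence rate, exposed
ir0 = cases_unexposed / years_unexposed   # incidence rate, unexposed
irr = ir1 / ir0                           # incidence rate ratio

print(round(ir1, 5), round(ir0, 4), round(irr, 2))
# prints: 0.00678 0.0261 0.26
```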
ATTRIBUTABLE RISKS
Some disease usually also occurs in unexposed animals. Only a fraction of disease cases in exposed animals can be attributed to the exposure. The observed incidence in exposed animals can be regarded as the result of a baseline risk plus some specific (but unobserved) risk due to exposure. In order to study the excess risk due to the risk factor, we consider the difference between the incidence rates of exposed and unexposed animals.

The concept of attributable risks refers to the quantification of adverse effects in the population due to exposure to the risk factor. The excess risk of disease in the population due to a risk factor is required for prioritising and allocating public and veterinary health resources. One would use available resources to eliminate or reduce a "mild" (say RR = 1.5) factor which is abundant rather than tackling a "strong" (say RR = 8) risk factor which is extremely rare in the population.
ATTRIBUTABLE RISK (AR)

Under the assumption that the risk in the exposed group (IR1) is due to the baseline risk (IR0) plus the specific ("attributable") risk of exposure (AR), we can say that
IR1 = IR0 + AR
and therefore
AR = IR1 - IR0.
Thus, the simple risk difference can be used to define the AR.
ATTRIBUTABLE FRACTION (AF)

The attributable risk divided by the risk in the exposed animals is the attributable fraction (AF); it can be written in terms of the relative risk (RR):
AF = AR / [a / (a + b)] = (RR - 1) / RR

Thus, removing AF x 100% of the risk from the exposed animals would result in a risk equal to the baseline risk. Sometimes the risk difference (AR) is instead divided by IR0, giving the relative risk difference (RRD). RRD is not really a fraction: since we assume IR1 > IR0, it can exceed 1. Again, it is just a transformation of RR:
RRD = RR - 1
Both relative risk differences are of limited use, because they do not use the prevalence of exposure in the population.
Hypothetical example
A cohort study provided statistically significant evidence (RR = 2.5; 95% CI = 1.134-5.509) for poor housing being a risk factor for bovine salmonellosis. The cohort sizes were selected to reflect the prevalence of exposure (i.e. poor housing) in the population (PE = 60/140 = 0.4286). The population incidence rate was IR = 23/140 = 0.1643.

                salmonellosis   no salmonellosis   Total
poor housing    15              45                 60
good housing    8               72                 80
Total           23              117                140
Attributable risk (AR)

AR = IR1 - IR0 = 15/60 - 8/80 = 0.25 - 0.1 = 0.15. Thus, the incidence of disease in exposed animals which can be attributed to exposure is 0.15 (15 per 100 exposed animals).

Attributable fraction (AF)

AF = 1.5/2.5 = 0.6. Thus, taking away 60% of the risk in exposed animals results in the baseline risk of 0.1. The other relative risk difference (relative to IR0) is RRD = RR - 1 = 1.5.
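The attributable risk measures for the housing example can be recomputed with a short sketch. Variable names are illustrative; the counts are those of the hypothetical salmonellosis table, with poor housing as the exposure.

```python
# Hypothetical salmonellosis example: poor housing = exposed.
a, b = 15, 45   # poor housing: salmonellosis, no salmonellosis
c, d = 8, 72    # good housing: salmonellosis, no salmonellosis

ir1 = a / (a + b)   # risk in exposed animals
ir0 = c / (c + d)   # baseline risk in unexposed animals
rr = ir1 / ir0

ar = ir1 - ir0      # attributable risk
af = ar / ir1       # attributable fraction, equal to (RR - 1) / RR
rrd = ar / ir0      # relative risk difference, equal to RR - 1

print(round(ar, 2), round(af, 2), round(rrd, 2), round(rr, 2))
# prints: 0.15 0.6 1.5 2.5
```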
PAR, PAF
POPULATION ATTRIBUTABLE RISK (PAR)
The PAR is the incidence in the population (IR) minus the incidence in unexposed animals (IR0). Thus, the population incidence is required to estimate PAR, which is (directly) only possible in cohort studies:
PAR = IR - IR0
Note that in the literature the term PAR is often used to describe what we call PAF below.
POPULATION ATTRIBUTABLE FRACTION (PAF)

The PAF expresses the proportion of all cases that are caused by exposure. In other words, it indicates the proportion of all cases that could have been prevented if exposure had not been present. PAF is called PAR by some authors. PAF is a function of the relative risk RR and the prevalence of exposure PE:
PAF = (IR - IR0) / IR = PE (RR - 1) / [1 + PE (RR - 1)]
Population Attributable Risk (PAR)
PAR = IR - IR0 = 0.1643 - 0.1 = 0.0643. Note that we assume PE = 60/140. The incidence of disease in the population attributable to exposure is 64 per 1000 animals.

Population Attributable Fraction (PAF)
For PE = 0.4286 we obtain PAF = (0.1643 - 0.1) / 0.1643 = 0.3913.
This is a really important measure: PAF tells us that 39% of all cases in the population could be prevented if exposure were removed. Note how the prevalence of exposure matters: for a lower value of PE (say 10%), only 13% of cases could be prevented (PAF = 1.5 / (1.5 + 1/0.1) = 1.5 / 11.5 = 0.1304).
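The effect of the prevalence of exposure on PAF can be explored with a small sketch. The function name `paf_from_rr` is hypothetical; the expression PE (RR - 1) / [1 + PE (RR - 1)] is the algebraic equivalent of the worked value 1.5 / (1.5 + 1/0.1) used in the example, and of (IR - IR0) / IR when IR = PE x IR1 + (1 - PE) x IR0.

```python
def paf_from_rr(rr, pe):
    """Population attributable fraction from relative risk and exposure prevalence."""
    return pe * (rr - 1) / (1 + pe * (rr - 1))

# Hypothetical housing example.
ir = 23 / 140    # population incidence
ir0 = 8 / 80     # incidence in unexposed animals
rr = 2.5
pe = 60 / 140    # prevalence of exposure

par = ir - ir0
paf_direct = (ir - ir0) / ir
paf_formula = paf_from_rr(rr, pe)

print(round(par, 4), round(paf_direct, 4), round(paf_formula, 4))
# prints: 0.0643 0.3913 0.3913
print(round(paf_from_rr(rr, 0.10), 4))
# prints: 0.1304
```

Trying other values of `pe` shows why an abundant "mild" risk factor can account for more cases than a rare "strong" one.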
CONFIDENCE INTERVALS (CI)

A confidence interval (CI) indicates the precision of an estimated parameter. The CI gives a range of parameter values (defined by the upper and lower limits of the CI) so that we may say that the true population parameter lies within the interval with a predefined probability (commonly 95%). Various methods exist to construct confidence intervals, resulting in either an approximate or an exact CI. Exact CIs require heavy computation and are implemented only in specialised statistical/epidemiologic software products (e.g. EpiInfo, Stata, SAS).
CI: Odds Ratio (OR)

On the logarithmic scale:
lower border: ln(OR) - z x sqrt(variance of ln(OR))
upper border: ln(OR) + z x sqrt(variance of ln(OR))

By transforming back we get:
lower border: exp[ln(OR) - z x sqrt(variance of ln(OR))]
upper border: exp[ln(OR) + z x sqrt(variance of ln(OR))]

The z-values are set at 1.96 for a confidence level of 95% and at 2.58 for a confidence level of 99%. The variance of ln(OR) can be calculated from a 2x2 table as 1/a + 1/b + 1/c + 1/d.
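The logarithm-based limits can be sketched in Python as below. This assumes that z multiplies the square root of the variance of ln(OR), i.e. the standard error, which reproduces the worked numbers of the hypothetical case-control table in this section (a = 115, b = 67, c = 16, d = 64). The function name `or_confidence_interval` is hypothetical. This is the approximate method described here, not Cornfield's method or an exact interval.

```python
import math

def or_confidence_interval(a, b, c, d, z=1.96):
    """Approximate CI for the odds ratio on the log scale."""
    odds_ratio = (a * d) / (b * c)
    se_log_or = math.sqrt(1 / a + 1 / b + 1 / c + 1 / d)
    lower = math.exp(math.log(odds_ratio) - z * se_log_or)
    upper = math.exp(math.log(odds_ratio) + z * se_log_or)
    return odds_ratio, lower, upper

# Hypothetical case-control table: E+ row (115, 67), E- row (16, 64).
or_, lo95, hi95 = or_confidence_interval(115, 67, 16, 64, z=1.96)
_, lo99, hi99 = or_confidence_interval(115, 67, 16, 64, z=2.58)

print(round(or_, 3), round(lo95, 1), round(hi95, 1))  # 6.866 3.7 12.8
print(round(lo99, 1), round(hi99, 1))                 # 3.0 15.6
```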
CI: Odds Ratio (OR), Example

Hypothetical case-control study:

        D+     D-
E+      115    67
E-      16     64

OR = (115 x 64) / (16 x 67) = 6.866
The transformed OR: ln(6.866) = 1.927
The variance of ln(OR) = 1/115 + 1/67 + 1/16 + 1/64 = 0.102
The z-value for 95% confidence: z = 1.96
The upper border: 1.927 + (1.96 x sqrt(0.102)) = 2.553
The lower border: 1.927 - (1.96 x sqrt(0.102)) = 1.301
The 95% confidence interval for the OR: from e^1.30 to e^2.55 (= 3.7 to 12.8)
This confidence interval does not include 1. Thus, it can be concluded that the association found in this study is statistically significant at a level of α = 0.05. At 99% confidence the interval ranges from 3.0 to 15.6. This wider confidence interval makes intuitive sense: if one wants to be more confident, one has to pay by losing some precision.

Suggested reading
Noordhuizen, Frankena, van der Hoofd, Graat (1997). Application of quantitative methods in veterinary medicine. Wageningen Pers.
Thrusfield M.V. (1986). Veterinary Epidemiology. London, Butterworth and Co.
