Criteria for test selection
• Reliability
• Validity
• Feasibility
• Simplicity
• Cost
• Acceptability
4
Measures of inter- and intra-rater
reliability: categorical data
• Percent agreement
– limitation: value depends on prevalence; it is
higher when prevalence is very low or very high
• Kappa statistic
– takes chance agreement into account
– defined as fraction of observed agreement not
due to chance
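To see why percent agreement is inflated at extreme prevalence, here is a minimal sketch. It assumes, purely for illustration, two raters who classify independently and each call a subject positive with the same probability; the function name is hypothetical. Under that assumption the agreement expected by chance alone is p² + (1 − p)².

```python
def chance_agreement(p_positive: float) -> float:
    """Agreement expected by chance for two independent raters.

    Illustrative assumption: both raters call a subject positive
    with the same probability p_positive.
    """
    # Both say "positive" or both say "negative".
    return p_positive ** 2 + (1 - p_positive) ** 2


for p in (0.05, 0.50, 0.95):
    print(f"p(positive) = {p:.2f} -> chance agreement = {chance_agreement(p):.3f}")
```

Chance agreement is lowest when half the subjects are called positive and rises toward 1 as the proportion approaches 0 or 1, which is the effect kappa is meant to remove.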
5
Kappa statistic
$\text{Kappa} = \dfrac{p(\text{obs}) - p(\text{exp})}{1 - p(\text{exp})}$
6
Example of Computation of Kappa
Agreement between the First and the Second Readings to Identify Atherosclerosis Plaque in the Left Carotid Bifurcation by B-Mode Ultrasound Examination in the Atherosclerosis Risk in Communities (ARIC) Study

                      First reading
Second reading    Plaque    Normal    Total
Plaque               140        52      192
Normal                69       725      794
Total                209       777      986
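The computation for this table can be sketched as follows. The function and argument names are illustrative. Chance-expected agreement p(exp) is taken from the products of the marginal totals, which is the usual way to obtain it for a 2×2 table.

```python
def kappa_2x2(a: int, b: int, c: int, d: int) -> float:
    """Kappa for a 2x2 agreement table.

    a = both readings "plaque", d = both readings "normal",
    b and c = the two kinds of disagreement.
    """
    n = a + b + c + d
    p_obs = (a + d) / n
    # Chance agreement from the row and column totals.
    p_exp = ((a + b) * (a + c) + (c + d) * (b + d)) / n ** 2
    return (p_obs - p_exp) / (1 - p_exp)


# ARIC table: rows = second reading, columns = first reading.
kappa = kappa_2x2(140, 52, 69, 725)
print(f"kappa = {kappa:.2f}")
```

For these counts p(obs) is 865/986, about 0.88, p(exp) is about 0.68, and kappa comes out at about 0.62.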
7
Interpretation of kappa
• Various suggested interpretations
• Example: Landis & Koch, Fleiss
excellent: over 0.75
fair to good: 0.40 - 0.75
poor: less than 0.40
8
Validity (accuracy) of
screening/diagnostic tests
• Face validity, content validity: judgement of the
appropriateness of content of measurement
• Criterion validity
– concurrent
– predictive
9
Normal vs abnormal
• Statistical definition
– “Gaussian” or “normal” distribution
• Clinical definition
– using criterion
10
Selection of criterion
(“gold” or criterion standard)
• Concurrent
– salivary screening test for HIV
– history of cough more than 2 weeks (for TB)
• Predictive
– APACHE (acute physiology and chronic
disease evaluation) instrument for ICU patients
– blood lipid level
– maternal height
15
Sensitivity and specificity
Assess correct classification of:
• People with the disease (sensitivity)
• People without the disease (specificity)
16
                          "True" disease status
Screening test result     Present                  Absent
Positive                  A ("true positives")     B ("false positives")
Negative                  C ("false negatives")    D ("true negatives")
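In terms of the cells A to D, sensitivity is A / (A + C) and specificity is D / (B + D). A minimal sketch, with hypothetical counts chosen only to make the arithmetic easy to follow:

```python
def sensitivity(a: int, c: int) -> float:
    """Share of people with the disease who test positive: A / (A + C)."""
    return a / (a + c)


def specificity(b: int, d: int) -> float:
    """Share of people without the disease who test negative: D / (B + D)."""
    return d / (b + d)


# Hypothetical counts (not from the slides): 100 diseased, 100 non-diseased.
A, B, C, D = 80, 10, 20, 90
print(f"sensitivity = {sensitivity(A, C):.2f}")
print(f"specificity = {specificity(B, D):.2f}")
```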
17
Predictive value
• More relevant to clinicians and patients
• Affected by prevalence
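The dependence on prevalence can be sketched numerically. The example below holds sensitivity and specificity fixed at 80% and 90% (the values used in the LR example in these slides) and varies prevalence; the function names and the chosen prevalence values are illustrative.

```python
def positive_predictive_value(sens: float, spec: float, prevalence: float) -> float:
    """Probability of disease given a positive test."""
    true_pos = sens * prevalence
    false_pos = (1 - spec) * (1 - prevalence)
    return true_pos / (true_pos + false_pos)


def negative_predictive_value(sens: float, spec: float, prevalence: float) -> float:
    """Probability of no disease given a negative test."""
    true_neg = spec * (1 - prevalence)
    false_neg = (1 - sens) * prevalence
    return true_neg / (true_neg + false_neg)


for prev in (0.01, 0.25, 0.50):
    ppv = positive_predictive_value(0.80, 0.90, prev)
    npv = negative_predictive_value(0.80, 0.90, prev)
    print(f"prevalence {prev:.2f}: PPV = {ppv:.2f}, NPV = {npv:.2f}")
```

With the same test, the predictive value of a positive result is low when the disease is rare and rises as prevalence increases.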
18
Choice of cut-point
If a higher score indicates a higher probability of disease:
• Lower cut-point:
– increases sensitivity, reduces specificity
• Higher cut-point:
– reduces sensitivity, increases specificity
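The trade-off can be sketched with a small set of hypothetical scores (invented for illustration, not taken from the slides), calling a result positive when the score is at or above the cut-point:

```python
# Hypothetical test scores; higher scores suggest disease.
DISEASED = [4, 6, 7, 8, 9]
HEALTHY = [1, 2, 3, 5, 6]


def sens_spec(cut: int) -> tuple[float, float]:
    """Sensitivity and specificity when 'positive' means score >= cut."""
    sens = sum(s >= cut for s in DISEASED) / len(DISEASED)
    spec = sum(s < cut for s in HEALTHY) / len(HEALTHY)
    return sens, spec


for cut in (4, 6, 7):
    sens, spec = sens_spec(cut)
    print(f"cut-point {cut}: sensitivity = {sens:.1f}, specificity = {spec:.1f}")
```

Moving the cut-point from 4 up to 7 lowers sensitivity and raises specificity in this toy data set, as the bullets describe.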
19
Considerations in selection of
cut-point
Implications of false positive results
• burden on follow-up services
• labelling effect
Implications of false negative results
• Failure to intervene
20
Receiver operating characteristic
(ROC) curve
• Evaluates test over range of cut-points
• Plot of sensitivity against 1-specificity
• Area under curve (AUC) summarizes
performance:
– AUC of 0.5 = no better than chance
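A minimal sketch of how the curve and its area can be obtained: sweep the cut-point over every observed score, record (1 − specificity, sensitivity) at each step, and integrate with the trapezoid rule. The scores are hypothetical and the function names are illustrative.

```python
def roc_points(diseased, healthy):
    """(1 - specificity, sensitivity) pairs, one per cut-point, from strict to lenient."""
    cuts = sorted(set(diseased) | set(healthy), reverse=True)
    points = [(0.0, 0.0)]  # cut-point above every score: nobody is positive
    for cut in cuts:
        tpr = sum(s >= cut for s in diseased) / len(diseased)
        fpr = sum(s >= cut for s in healthy) / len(healthy)
        points.append((fpr, tpr))
    return points


def auc(points):
    """Area under the curve by the trapezoid rule."""
    area = 0.0
    for (x0, y0), (x1, y1) in zip(points, points[1:]):
        area += (x1 - x0) * (y0 + y1) / 2
    return area


# Hypothetical scores; higher scores suggest disease.
diseased = [4, 6, 7, 8, 9]
healthy = [1, 2, 3, 5, 6]
print(f"AUC = {auc(roc_points(diseased, healthy)):.2f}")
```

If the two groups had identical score distributions, the points would fall on the diagonal and the same calculation would give 0.5.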
21
Likelihood ratio
• Likelihood ratio (LR) = $\dfrac{\text{sensitivity}}{1 - \text{specificity}}$
• Used to compute post-test odds of disease
from pre-test odds:
post-test odds = pre-test odds × LR
• pre-test odds derived from prevalence
• post-test odds can be converted to
predictive value of positive test
23
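The two conversions named in the last two bullets can be written out explicitly. The symbol P for prevalence and the abbreviation PPV for the predictive value of a positive test are notation introduced here for convenience, not taken from the slides.

```latex
\[
\text{pre-test odds} = \frac{P}{1 - P},
\qquad
\text{PPV} = \frac{\text{post-test odds}}{1 + \text{post-test odds}}
\]
```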
Example of LR
• prevalence of disease in a population is 25%
• sensitivity is 80%
• specificity is 90%
• pre-test odds = $\dfrac{0.25}{1 - 0.25} = \dfrac{1}{3}$
• likelihood ratio = $\dfrac{0.80}{1 - 0.90} = 8$
24
Example of LR (cont)
• If prevalence of disease in a population is 25%
• pre-test odds = $\dfrac{0.25}{1 - 0.25} = \dfrac{1}{3}$
• post-test odds = 1/3 × 8 = 8/3
• predictive value of positive result = $\dfrac{8/3}{1 + 8/3} = \dfrac{8}{3 + 8} = \dfrac{8}{11} \approx 73\%$
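The worked example can be reproduced step by step. This sketch uses exact fractions so the intermediate values match the slide; the helper names are illustrative.

```python
from fractions import Fraction


def odds_from_probability(p):
    return p / (1 - p)


def probability_from_odds(odds):
    return odds / (1 + odds)


def likelihood_ratio(sens, spec):
    """LR as defined in these slides: sensitivity / (1 - specificity)."""
    return sens / (1 - spec)


prevalence = Fraction(25, 100)
sens = Fraction(80, 100)
spec = Fraction(90, 100)

pre_test_odds = odds_from_probability(prevalence)   # 1/3
lr = likelihood_ratio(sens, spec)                    # 8
post_test_odds = pre_test_odds * lr                  # 8/3
ppv = probability_from_odds(post_test_odds)          # 8/11

print(pre_test_odds, lr, post_test_odds, ppv, f"{float(ppv):.0%}")
```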
25
Diagnostic odds ratio
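• Ratio of the odds of a positive test in the diseased to
the odds of a positive test in the non-diseased:
$\text{DOR} = \dfrac{A/C}{B/D} = \dfrac{A \cdot D}{B \cdot C}$
• From previous example (per 40 people: A = 8, B = 3, C = 2, D = 27):
$\text{DOR} = \dfrac{8 \times 27}{3 \times 2} = 36$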
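As a check on the arithmetic, the diagnostic odds ratio can be computed both from the cell counts and directly from sensitivity and specificity. The counts are those implied by the previous example for 40 people (25% prevalence, 80% sensitivity, 90% specificity); the function names are illustrative.

```python
from fractions import Fraction


def dor_from_counts(a: int, b: int, c: int, d: int) -> Fraction:
    """(A x D) / (B x C) from the 2x2 table."""
    return Fraction(a * d, b * c)


def dor_from_sens_spec(sens: Fraction, spec: Fraction) -> Fraction:
    """Odds of a positive test in the diseased over the same odds in the non-diseased."""
    odds_pos_diseased = sens / (1 - sens)
    odds_pos_non_diseased = (1 - spec) / spec
    return odds_pos_diseased / odds_pos_non_diseased


dor_counts = dor_from_counts(8, 3, 2, 27)
dor_rates = dor_from_sens_spec(Fraction(4, 5), Fraction(9, 10))
print(dor_counts, dor_rates)
```

Both routes give 36, since the counts are just the sensitivity and specificity applied to a population of 40.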
26
Summary: LR and DOR
• Values:
– 1 indicates that test performs no better than
chance
– >1 indicates better than chance
– <1 indicates worse than chance
• Relationship to prevalence?
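One way to explore this question is to build the 2×2 table at several prevalences and recompute each measure. The sketch below assumes that sensitivity and specificity themselves stay fixed as prevalence changes, which may not hold in practice; the population size, prevalence values, and function names are illustrative.

```python
from fractions import Fraction


def two_by_two(prevalence, sens, spec, n=1000):
    """Expected cell counts (A, B, C, D) for n people."""
    diseased = n * prevalence
    non_diseased = n - diseased
    a = diseased * sens          # true positives
    c = diseased - a             # false negatives
    d = non_diseased * spec      # true negatives
    b = non_diseased - d         # false positives
    return a, b, c, d


def summarize(a, b, c, d):
    lr = (a / (a + c)) / (b / (b + d))   # sensitivity / (1 - specificity)
    dor = (a * d) / (b * c)
    ppv = a / (a + b)
    return lr, dor, ppv


sens, spec = Fraction(4, 5), Fraction(9, 10)
results = {}
for prev in (Fraction(1, 100), Fraction(1, 4), Fraction(1, 2)):
    results[prev] = summarize(*two_by_two(prev, sens, spec))
    lr, dor, ppv = results[prev]
    print(f"prevalence {float(prev):.2f}: LR = {float(lr):.0f}, "
          f"DOR = {float(dor):.0f}, PPV = {float(ppv):.2f}")
```

Under that assumption LR and DOR are the same at every prevalence, while the predictive value of a positive result changes.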
27
Applications of LR and DOR
• Likelihood ratio: Primarily in clinical
context, when interest is in how much the
likelihood of disease is increased by use of
a particular test
• Diagnostic odds ratio: Primarily in research,
when interest is in factors that are
associated with test performance (e.g., using
logistic regression)
28