
CLIN. CHEM.

35/2, 284-288 (1989)

Choosing Quality-Control Systems to Detect Maximum Clinically Allowable Analytical Errors


Kristian Linnet
Critical systematic and random analytical errors for 17 common clinical chemical components were estimated from published values for analytical imprecision, biological variation, and medically important changes. Appropriate quality-control systems for these analytes are discussed on the basis of power considerations. The simple rule $1_{3s}$, with one control per run, is minimally sufficient for the analytes (about one quarter of those considered here) for which the magnitude of critical error is at least 3 analytical standard deviations. The more powerful rule $1_{2s}$, with one control per run, is the minimal requirement for analytes for which critical errors are about 2 analytical standard deviations; these are about half the remaining analytes. Greater power values are achieved by using multiple rules based on several controls per run. In general, this study does not support the view put forward by some authors that the quality-control rules in use today are too restrictive.

Commonly, the design of quality-control systems in clinical chemistry is based on convention rather than on consideration of what size of analytical errors should be detected for clinical utility. The ability of various quality-control rules to reveal errors of a given magnitude with a stated probability has been considered by Westgard et al. (1, 2). It is difficult to define medically important errors objectively, and several investigators have chosen the pragmatic approach of interrogating clinicians as to what changes in laboratory results are judged as being important (3-8). In this paper, I use the medically important changes recorded by Skendzel et al. (8) as a starting point and, by taking both pre-analytical and biological variation into account, derive the maximum allowable analytical errors for commonly measured analytes. From these values, I suggest appropriate quality-control systems.

Background and Assumptions

Medically Important Differences Related to Analytical and Biological Variation

To physicians in various medical fields Skendzel et al. (8) posed 25 questions concerning common clinical problems, asking the physicians to select from several possibilities the change in laboratory result for a patient that would elicit an action, i.e., further testing or therapy. Most of the questions concerned a change in a patient being monitored for some disease; in some cases, a difference between a patient's value and a fixed limit, e.g., a reference interval limit, was considered. From 750 responses, Skendzel et al. recorded the median value of the answers to a given question and converted this to a relative value with respect to the analyte concentration. From these median differences of medical importance ($\Delta_{med}$), so-called medically useful coefficients of variation ($CV_{med}$) were derived:

$$CV_{med} = \frac{\Delta_{med}}{1.65\sqrt{2}}$$

When a patient's value was compared with a fixed limit, the relation was as follows:

$$CV_{med} = \frac{\Delta_{med}}{1.65}$$

The clinical questions were directional, so the statistical tests should be one-sided.

The $CV_{med}$ has been interpreted as a goal for analytical imprecision for the following reason. If the analytical imprecision ($s_a$) is the only source of random variation, and the goal is fulfilled, the difference of medical importance is statistically significant at the 0.05 level (one-sided test). The test results given clinicians, however, are subject to additional sources of variation: pre-analytical and biological factors. The pre-analytical imprecision ($s_p$) is that induced by the venipuncture and the pre-processing of the sample. In the present context, the biological variation ($s_b$) is the intra-individual variation about a homeostatic set point (9). To test whether an observed change in a patient, or a difference with respect to a fixed point, indicates a real difference in the biological state of the individual, one has to calculate the total standard deviation ($s_t$), which should be expressed as a coefficient of variation:

$$s_t = \sqrt{s_a^2 + s_p^2 + s_b^2}$$

An observed change ($\Delta$) in a monitored patient is statistically significant (one-sided test) if:

$$\frac{\Delta}{s_t\sqrt{2}} > 1.65$$

Table 1 presents data for the commonly used analytes examined by Skendzel et al. (8). The intra-individual biological variations (coefficients of variation, column 1) are mostly those agreed upon at conferences in 1976 and 1978 (10). The value for bilirubin originates from a report by Winkel et al. (11), and that for aspartate aminotransferase is the average of the results obtained in two studies (11, 12). For blood hemoglobin, blood leukocyte count, and plasma prothrombin time, the intra-individual variation was estimated as described in the Appendix. The analytical imprecisions listed in Table 1 represent median values of the intra-laboratory coefficients of variation recorded in surveys by The College of American Pathologists (8, 13), and so may be regarded as representative "state of the art" values. The total standard deviations are based on the biological and the analytical standard deviations; for all analytes, I have assumed the pre-analytical standard deviation to be equal to one-half the analytical standard deviation, an intermediate value of the pre-analytical standard deviations recorded for various analytes (14, 15). Thus, with $s_p = 0.5\,s_a$ (so that $s_a^2 + s_p^2 = 1.25\,s_a^2$), the formula for the total standard deviation becomes:

$$s_t = \sqrt{1.25\,s_a^2 + s_b^2}$$

The medically important differences in Table 1 represent the difference between a fixed limit and a patient's result that elicits an action. These values are equal to the $CV_{med}$ values of the study of Skendzel et al. (8), multiplied by 1.65. In those situations where Skendzel et al.

Department of Clinical Chemistry KK 4051, Rigshospitalet, Blegdamsvej 9, DK-2100 Copenhagen Ø, Denmark. Received September 29, 1988; accepted November 18, 1988.
284 CLINICAL CHEMISTRY, Vol. 35, No. 2, 1989

Table 1. Intra-Individual Biological Variation ($s_b$), Analytical Imprecision ($s_a$), Total Standard Deviation ($s_t$), Medically Important Difference ($\Delta_{med}$), and $\Delta_{med}/s_t$ for 17 Analytes

Analyte (a)                  Intra-indiv.   Anal. var.   Total st.     $\Delta_{med}$   $\Delta_{med}/s_t$
                             var. ($s_b$)   ($s_a$)      dev. ($s_t$)
Aspartate aminotransferase   0.140          0.061        0.156         0.335            2.2
Bilirubin                    0.230          0.109        0.260         0.431            1.7
Cholesterol                  0.048          0.038        0.064         0.203            3.2
Triglyceride                 0.260          0.054        0.267         0.266            1.0
Creatinine                   0.044          0.061        0.081         0.259            3.2
Urea                         0.124          0.046        0.134         0.308            2.3
Glucose                      0.044          0.035        0.059         0.224            3.8
Iron                         0.260          0.032        0.262         0.284            1.1
Phosphate                    0.058          0.035        0.070         0.257            3.7
Total protein                0.030          0.022        0.039         0.137            3.5
Thyroxin                     0.076          0.089        0.125         0.380            3.0
Hemoglobin                   0.045          0.011        0.047         0.076            1.6
Leukocyte count              0.147          0.025        0.150         0.251            1.7
Prothrombin time             0.045          0.038        0.062         0.251            4.0
Calcium                      0.018          0.024        0.032         0.097            3.0
Potassium                    0.044          0.021        0.050         0.119            2.4
Sodium                       0.008          0.012        0.016         0.036            2.3

(a) Measured in serum, plasma, or blood. All results are expressed as relative to analyte concentrations.

gave two or three values for the same analyte, corresponding to different clinical problems, I have selected the mean value. Medically important changes in a patient being monitored are obtained by multiplying these values by $\sqrt{2}$. Finally, the ratios $\Delta_{med}/s_t$ are listed; most (15/17) of these values are equal to or exceed 1.65.

The Meaning of the Requirement for Statistical Significance

How important is the requirement for statistical significance? Although clinicians usually do not carry out formal statistical tests of significance in everyday work, this concept is nevertheless of real importance. If $\Delta_{med}/s_t$ is >1.65 for an analyte, observed changes ($\Delta$) that are larger than the action limit $\Delta_{med}$ will occur in <5% of the situations in which the true state of the patient is stationary ($\Delta_{true} = 0$) (Figure 1a). Here the true state refers to the homeostatic set point of the patient. In other words, the frequency of false-positive alarms is <5%. Using the terminology of statistical hypothesis testing, we may say that the clinical type I error ($\alpha$) is <5%.

Alternatives to the stationary state should also be studied. If we presume that $\Delta_{true}$ of the patient is equal to $\Delta_{med}$ (Figure 1b), the requirement that $\Delta_{med}/s_t$ be at least 1.65 will ensure that at most 5% of the patients have observed $\Delta$ values in the opposite direction of the true difference (an obviously misleading trend). On the other hand, only 50% of the patients' observed $\Delta$ values will exceed $\Delta_{med}$. Only when $\Delta_{true}$ exceeds ($\Delta_{med} + 1.65\,s_t$) do 95% of the patients exhibit values greater than the action limit (Figure 1c). The proportion of patients (here 5%) with values under the action limit may be regarded as false-negatives; using statistical terminology, we may say that the clinical type II error ($\beta$) is 5%.

From the character of the questions posed by Skendzel et al. (8), it is not clear whether clinicians pay attention primarily to the type I or the type II error when they decide upon an appropriate action limit. Probably, they intuitively seek to reduce the type I error to a low level, say about 5%, when the consequences of false-negative outcomes are moderate to small. For example, if a false-negative outcome means only some transient discomfort to a patient, a high type II error is acceptable, and a reasonable strategy is to keep the type I error low to reduce costs (supposing that the overlap prevents low values for both). On the other hand, if a false-negative event indicates a serious outcome, for example, a critical delay of the detection of a cancer, clinicians probably primarily seek to reduce the type II error and accept a somewhat high type I error.

In Table 1, $\Delta_{med}/s_t$ is <1.65 for serum iron and serum triglyceride. The actual values of about 1 indicate that the frequency of false alarms for patients whose status is not changing will be about 15% (only the one-sided direction is considered). Whether this is a choice intended by the clinicians is hard to say. Probably, the clinicians have merely underestimated the biological variation, which is rather large for these analytes. Notice also that decreasing the analytical standard deviations for these analytes will not have much effect, because these are already very low in comparison with the biological variations.
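The relation between Table 1 and the false-alarm frequencies quoted above can be checked numerically. The following is a minimal sketch, assuming gaussian scatter and the paper's convention $s_p = 0.5\,s_a$; the function names `total_sd` and `false_alarm_rate` are illustrative and do not come from the paper.

```python
from math import sqrt
from statistics import NormalDist


def total_sd(s_a: float, s_b: float, pre_ratio: float = 0.5) -> float:
    """Total SD: sqrt(s_a^2 + s_p^2 + s_b^2), with s_p = pre_ratio * s_a."""
    s_p = pre_ratio * s_a
    return sqrt(s_a ** 2 + s_p ** 2 + s_b ** 2)


def false_alarm_rate(delta_med: float, s_t: float) -> float:
    """One-sided chance that an observed difference exceeds delta_med
    although the true difference is zero (gaussian scatter, SD s_t)."""
    return 1.0 - NormalDist().cdf(delta_med / s_t)


# (s_b, s_a, delta_med) as relative values, taken from Table 1.
TABLE_1_ROWS = {
    "Cholesterol": (0.048, 0.038, 0.203),
    "Iron": (0.260, 0.032, 0.284),
    "Triglyceride": (0.260, 0.054, 0.266),
}

for name, (s_b, s_a, delta_med) in TABLE_1_ROWS.items():
    s_t = total_sd(s_a, s_b)
    ratio = delta_med / s_t
    alarms = false_alarm_rate(delta_med, s_t)
    print(f"{name:13s} s_t={s_t:.3f}  ratio={ratio:.1f}  false alarms={alarms:.1%}")
```

For iron and triglyceride the ratio is close to 1, which gives a one-sided false-alarm frequency in the neighborhood of 15%, while a ratio of 1.65 corresponds to about 5%.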

Fig. 1. The distribution of observed patients' $\Delta$ values for (a) $\Delta_{true} = 0$; (b) $\Delta_{true} = \Delta_{med}$; (c) $\Delta_{true} = \Delta_{med} + 1.65\,s_t$. The medically important change ($\Delta_{med}$) is equal to $1.65\,s_t$. Axes: probability density versus $\Delta$.

Because alternative states for heterogeneous everyday clinical problems are difficult to specify, I have chosen to follow the pragmatic approach of Skendzel et al., focusing on the type I error problem and retaining the requirement $\Delta_{med}/s_t \geq 1.65$ as a reasonable approach in considering the maximum allowable analytical errors (see below). This condition seeks to keep the false-positive alarm frequency reasonably low and ensures that only up to 5% of the observed $\Delta$ values are in the opposite direction of the true $\Delta$ when the true $\Delta$ equals $\Delta_{med}$.
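The three situations distinguished in this section (true difference zero, equal to the action limit, and 1.65 standard deviations beyond it) can be illustrated with a short calculation. This is a sketch under the assumption of gaussian scatter with the limiting ratio $\Delta_{med}/s_t = 1.65$; the helper names are hypothetical.

```python
from statistics import NormalDist


def fraction_above(limit: float, true_delta: float, sd: float) -> float:
    """Fraction of observed differences that exceed the action limit."""
    return 1.0 - NormalDist(mu=true_delta, sigma=sd).cdf(limit)


def fraction_opposite(true_delta: float, sd: float) -> float:
    """Fraction of observed differences below zero, i.e. pointing in the
    opposite direction of a positive true difference."""
    return NormalDist(mu=true_delta, sigma=sd).cdf(0.0)


sd = 1.0                  # work in units of the total standard deviation
delta_med = 1.65 * sd     # limiting case: delta_med / s_t = 1.65

scenarios = {
    "true difference = 0": 0.0,
    "true difference = delta_med": delta_med,
    "true difference = delta_med + 1.65 sd": delta_med + 1.65 * sd,
}
for label, true_delta in scenarios.items():
    above = fraction_above(delta_med, true_delta, sd)
    print(f"{label:38s} above limit: {above:.2f}")

print(f"opposite direction when true = delta_med: "
      f"{fraction_opposite(delta_med, sd):.2f}")
```

The fractions above the limit come out near 5%, 50%, and 95%, matching the type I and type II error levels discussed in the text.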

Results

Critical Analytical Errors Derived from Medically Important Differences

An analytical method ordinarily produces results that are scattered about a target value, $\mu$, with a standard deviation, $s_a$ (Figure 2a). This in-control state may be disturbed by errors. A systematic error (inaccuracy) is a constant bias ($\Delta SE$) that shifts the target value from $\mu$ to $\mu + \Delta SE$ (Figure 2b). A random error represents an increase of the baseline scatter about the target value (Figure 2c). Analytical methods for which $\Delta_{med}/s_t$ is >1.65 for the in-control state may still, in an out-of-control state, fulfill the basic requirement of <5% false-positive alarms in the clinically stationary state. Figure 3 illustrates the maximum systematic error ($\Delta SE_c$) at which this condition is just fulfilled. $\Delta SE_c$ is defined by the following relation:

$$\Delta SE_c = \Delta_{med} - 1.65\,s_t$$

It is convenient to express the critical systematic error in analytical standard deviation units. The standardized critical systematic error is:

$$\Delta SE_{c(st)} = \frac{\Delta SE_c}{s_a} = \frac{\Delta_{med} - 1.65\,s_t}{s_a}$$

The above-mentioned approach is valid for all analytes in Table 1 for which $\Delta_{med}/s_t$ substantially exceeds 1.65, i.e., for 12 of 17 analytes (Table 2). Two of the 12 analytes have $\Delta SE_{c(st)}$ values of about 1, half exhibit values of about 2, and a third have values equal to or greater than 3 (maximum 4.0).

The standardized critical random error ($\Delta RE_{c(st)}$) is determined as follows. An increase of the analytical standard deviation from $s_a$ to $s_a'$ changes the total standard deviation $s_t$ to $s_t'$. To avoid having >5% of observed $\Delta$ values exceed $\Delta_{med}$ (in the same direction), when the true $\Delta$ is zero, the following condition should be fulfilled (Figure 4):

$$\frac{\Delta_{med}}{s_t'} = 1.65$$

Having determined $s_t'$, $s_a'$ is obtained as follows (supposing the pre-analytical error ($s_p = 0.5\,s_a$) to be constant):

$$s_t' = \sqrt{s_a'^2 + (0.5\,s_a)^2 + s_b^2}$$

which rearranges to

$$s_a' = \sqrt{s_t'^2 - (0.5\,s_a)^2 - s_b^2}$$

The critical random error is finally standardized:

$$\Delta RE_{c(st)} = \frac{s_a'}{s_a}$$

About half of the 12 analytes considered have $\Delta RE_{c(st)}$ values between 2 and 3 (Table 2). Sodium has the minimum value (1.6) and phosphate the maximum (4.1).

Fig. 2. Distribution of observed analytical results about the target value $\mu$ (a) for the in-control state; (b) in the presence of a systematic error, $\Delta SE$; and (c) when the standard deviation is increased from $s_a$ to $s_a'$. Axes: probability density versus test result.

Fig. 3. Distribution of observed patients' results about $\mu + \Delta SE_c$ with a standard deviation $s_t$, given a homeostatic set point $\mu$ and a systematic analytical error $\Delta SE_c$. 5% of the values exceed $\mu + \Delta_{med}$.

Fig. 4. Distribution of observed patients' results about the homeostatic set point $\mu$ with an increased standard deviation ($s_t'$) in the presence of a random analytical error.

Quality-Control Designs Appropriate for Detection of Critical Errors

Having estimated the critical errors $\Delta SE_{c(st)}$ and $\Delta RE_{c(st)}$, one can select a quality-control system that is able to detect these errors with a reasonable probability (power). The notation $A_L$ for quality-control rules is used, where $A$ is the number of control observations that must exceed the limit $L$ to indicate an out-of-control signal. Table 3 presents powers for some common control rules: two simple rules, $1_{2s}$ for n = 1 control per run and $1_{3s}$ (n = 1), and a multi-rule, $1_{3s}/2_{2s}/R_{4s}$ (n = 2, 4, or 6) (here, $s$ refers to the analytical standard deviation). The multi-rule gives a reject signal if one control exceeds a 3s limit, two consecutive controls exceed the same 2s limit, or the maximum and the minimum values of the controls deviate by more than 4s. The $2_{2s}$ component is sensitive towards systematic errors, whereas $R_{4s}$ in particular detects random errors. The powers have been read from the power graphs presented by Westgard et al. (1, 2).

Combining Tables 2 and 3 guides us to select an appropriate quality-control system for a given analyte. Several of the analytes (potassium, calcium, urea, thyroxin, and creatinine) have $\Delta SE_{c(st)}$ values of about 2 and $\Delta RE_{c(st)}$ values from 2.3 to 3. For these analytes the $1_{2s}$ (n = 1) rule detects both error types with the power from 0.40 to 0.50. Higher power values (0.65 to 0.85) are achieved by using the $1_{3s}/2_{2s}/R_{4s}$ (n = 4) rule. The $1_{3s}$ (n = 1) rule has low power here and is insufficient. One should generally strive for high levels of power when errors are frequent; on the other hand, one can accept lower levels when errors are rare (16). The error frequency is seldom known, but past experience may give some indication. For example, electrochemical determination of electrolytes is an analysis subject to frequent errors. For potassium assayed by this principle, the multi-rule with n = 4 thus may be preferable to the simple $1_{2s}$ (n = 1) rule.

For the few analytes with $\Delta SE_{c(st)}$ about 1 (sodium, aspartate aminotransferase), a low level of power has to be accepted, even with n = 6 controls and the multi-rule. Using a multi-rule over several runs may increase somewhat the probability for detecting persisting errors (17).

A $\Delta SE_{c(st)}$ value of about 3 is detected with powers 0.50 and 0.84 with use of the simple rules $1_{3s}$ and $1_{2s}$ (n = 1), respectively. The ability to reveal a critical random error of about 3 is not quite as high for these rules, 0.30 and 0.50, respectively. Very high power levels for both error types are achieved by using the multi-rule (n = 4). The analytes cholesterol, total protein, and glucose have $\Delta SE_{c(st)}$ and $\Delta RE_{c(st)}$ values of about 3.

For analytes with $\Delta_{med}/s_t$ <1.65, we cannot calculate $\Delta SE_{c(st)}$ and $\Delta RE_{c(st)}$ along the principles considered here. Instead, we may arbitrarily consider some error level, say $\Delta SE_{c(st)}$ and $\Delta RE_{c(st)}$ equal to 2, and select an appropriate quality-control system.
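The standardized critical errors used above to choose control rules can be recomputed from the Table 1 inputs. The following is a minimal sketch of the two definitions, assuming $s_p = 0.5\,s_a$ as in the paper; the function names are illustrative, and because Table 1 lists rounded values the results agree with Table 2 only to within about 0.1.

```python
from math import sqrt

Z_ONE_SIDED_5PCT = 1.65  # one-sided 5% gaussian deviate used throughout the paper


def critical_systematic_error(delta_med: float, s_a: float, s_b: float) -> float:
    """Standardized critical systematic error: (delta_med - 1.65 * s_t) / s_a."""
    s_t = sqrt(1.25 * s_a ** 2 + s_b ** 2)
    return (delta_med - Z_ONE_SIDED_5PCT * s_t) / s_a


def critical_random_error(delta_med: float, s_a: float, s_b: float) -> float:
    """Standardized critical random error: s_a' / s_a, where s_a' is the
    analytical SD at which delta_med / s_t' drops to exactly 1.65."""
    s_t_crit = delta_med / Z_ONE_SIDED_5PCT
    var_a_crit = s_t_crit ** 2 - (0.5 * s_a) ** 2 - s_b ** 2
    if var_a_crit <= 0:
        # Biological and pre-analytical variation alone already exceed the limit.
        raise ValueError("no critical random error exists for these inputs")
    return sqrt(var_a_crit) / s_a


# (s_b, s_a, delta_med) from Table 1 for the two extremes of Table 2.
for name, (s_b, s_a, delta_med) in {
    "Sodium": (0.008, 0.012, 0.036),
    "Phosphate": (0.058, 0.035, 0.257),
}.items():
    se = critical_systematic_error(delta_med, s_a, s_b)
    re = critical_random_error(delta_med, s_a, s_b)
    print(f"{name:10s} critical SE = {se:.2f} s_a, critical RE = {re:.2f} s_a")
```

The guard in `critical_random_error` reflects the restriction stated above: for analytes whose $\Delta_{med}/s_t$ does not sufficiently exceed 1.65, the calculation is not meaningful.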

Table 2. Critical Systematic ($\Delta SE_{c(st)}$) and Random ($\Delta RE_{c(st)}$) Errors for 12 Analytes

Analyte                      $\Delta SE_{c(st)}$   $\Delta RE_{c(st)}$
Sodium                       0.8                   1.6
Aspartate aminotransferase   1.3                   2.4
Potassium                    1.8                   2.8
Calcium                      1.8                   2.3
Urea                         1.9                   [not legible]
Thyroxin                     2.0                   [not legible]
Creatinine                   2.1                   2.4
Cholesterol                  2.6                   2.9
Total protein                3.3                   3.5
Glucose                      3.6                   3.6
Prothrombin time             3.9                   3.8
Phosphate                    4.0                   4.1

Table 3. Frequency of False Alarms (Type I Error) and Probability of Error Detection (Power) for Some Quality-Control Rules (a)

                              $1_{2s}$   $1_{3s}$   $1_{3s}/2_{2s}/R_{4s}$
                              n = 1      n = 1      n = 2    n = 4    n = 6
Type I error                  0.05       0.003      0.01     0.03     0.06
Power
  $\Delta SE_{c(st)}$ = 1     0.16       0.02       0.05     0.15     0.22
  $\Delta SE_{c(st)}$ = 2     0.50       0.16       0.42     0.68     0.85
  $\Delta SE_{c(st)}$ = 3     0.84       0.50       0.85     0.98     0.99
  $\Delta SE_{c(st)}$ = 4     0.98       0.84       1.0      1.0      1.0
  $\Delta RE_{c(st)}$ = 1.5   0.18       0.05       0.12     0.28     0.42
  $\Delta RE_{c(st)}$ = 2     0.32       0.13       0.30     0.57     0.75
  $\Delta RE_{c(st)}$ = 3     0.50       0.32       0.55     0.87     0.97
  $\Delta RE_{c(st)}$ = 4     0.62       0.45       0.74     0.95     0.99

(a) The type I error and powers for $\Delta RE_{c(st)}$ = 4 have been determined here theoretically or by simulations. The other data are from Westgard et al. (1, 2).

Discussion

The aim of the present study was to propose suitable quality-control designs for commonly measured components in serum or blood. The several investigators (3-8) who have attempted to define analytical goals from clinicians' opinions have used various study designs and have arrived at somewhat different results. The study by Skendzel et al. (8), which is the most recent and the largest investigation, gave on average larger values for medically important differences than did the others and, accordingly, has been criticized for setting too-liberal limits for analytical imprecision (18). As shown here, one should also take into account the pre-analytical and biological components of variation when making judgments on the analytical quality required. Allowing for these additional sources of random variation, the medically important differences recorded by Skendzel et al. do not seem unreasonable. On the contrary, the smaller values attained in previous studies result in rather large clinical type I errors in many cases. Clinicians are probably inclined to underestimate the biological component of variation (19).

An alternative way to define goals for analytical imprecision consists of focusing on the biological variation. The limiting requirement for the analytical coefficient of variation is half the intra-individual biological variation, expressed as a coefficient of variation (10). It seems reasonable to apply this goal for the in-control analytical imprecision and establish maximum allowable analytical errors on the basis of the clinicians' opinions on what changes are important.

Strictly, we should consider the consequences of analytical errors on both clinical type I and type II errors. Here, maximum allowable errors have been defined in such a way that an upper bound has been set on the type I error (5%). Type II errors and their consequences are difficult to evaluate for the everyday clinical situation, so establishing error limits on this ground is not easy. One should recognize, however, that errors below the critical limits also increase the type I and II errors and, preferably, should be avoided. In simple screening situations, where plus or minus detection of disease depends solely upon a laboratory result, a formal decision theoretical analysis may be applied to study the effect of analytical errors on both type I and II errors and their costs (20, 21).

With the reservations given above concerning the significance of the computed $\Delta SE_{c(st)}$ and $\Delta RE_{c(st)}$ values in mind, we are able to judge whether commonly used quality-control systems are sufficient or not. Given the many analytes for which $\Delta SE_{c(st)}$ and $\Delta RE_{c(st)}$ exceed 2, the simple system $1_{2s}$ (n = 1) provides a moderate power, which for a persisting error leads to a reject signal in the course of, on average, two or three analytical runs.^2 This will often be satisfactory. The rule $1_{3s}$ (n = 1), however, is clearly insufficient and should be reserved for analytes in which $\Delta SE_{c(st)}$ and $\Delta RE_{c(st)}$ exceed 3. The advantage of the $1_{3s}$ (n = 1) rule is the very low frequency of false rejections (0.0027). The multi-rule offers a greater efficiency for a given number of controls, but it is also more complicated to apply. Notice that the powers of Table 3 have been obtained under the assumption of no between-run analytical component of variation. If this component is of about the same size as the within-run variation, the actual powers are somewhat smaller (22).

Some authors (13) have argued that quality-control limits being used today are generally too narrow, being based on the analytical standard deviation, which has declined in recent years. Claiming that a quality-control system is too rigid is the same as saying that the power is unnecessarily high for detection of medically important errors, i.e., 0.99 or greater. If this is the case, widening of the control limits would reduce the frequency of false rejections (and so the costs), while maintaining a sufficient degree of power (a reduction from 0.99 to 0.90 would be fully acceptable). According to the present analysis, suitable quality-control systems can be selected from the set of traditional rules being based on the analytical standard deviation. For none of the analytes is the power of the most liberal control rule [$1_{3s}$ (n = 1)] unnecessarily high, justifying a widening of the control limits to four or more analytical standard deviations. Thus, the criticism (13) of the traditional control-rule principle, founded on the analytical standard deviation, is not tenable. Nor is the approach adopted by Schoen et al. (23), which consists of using "clinically useful" control limits. A quality-control rule is a statistical test of the null hypothesis ("the analysis is in control") against the alternative ("the analysis is out of control"). The design should be based on a consideration of the frequency of false rejections (type I error) and the power. Only when the power for a medically important error turns out to be far too large should the control limits be widened.

^2 Average run length = 1/power.
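For the single-control rules, the type I error and power follow directly from the gaussian distribution, which makes the effect of widening the limits easy to examine. The following is a sketch under the paper's stated assumption of no between-run variation; it covers only rules of the form $1_{ks}$ with n = 1 (the multi-rule powers were read from published graphs and are not reproduced here), and the function names are illustrative.

```python
from statistics import NormalDist

_STD_NORMAL = NormalDist()


def power_single_rule(limit: float, shift: float = 0.0, sd_factor: float = 1.0) -> float:
    """Probability that one control falls outside +/- limit.

    limit and shift are in units of the in-control analytical SD;
    sd_factor is the ratio of the out-of-control SD to the in-control SD.
    With shift=0 and sd_factor=1 this is the type I error of the rule.
    """
    above = 1.0 - _STD_NORMAL.cdf((limit - shift) / sd_factor)
    below = _STD_NORMAL.cdf((-limit - shift) / sd_factor)
    return above + below


def average_run_length(power: float) -> float:
    """Average number of runs until rejection for a persisting error."""
    return 1.0 / power


for limit in (2.0, 3.0, 4.0):
    type_i = power_single_rule(limit)
    power_se3 = power_single_rule(limit, shift=3.0)
    power_re3 = power_single_rule(limit, sd_factor=3.0)
    print(f"1_{limit:.0f}s: type I = {type_i:.4f}, "
          f"power(SE = 3) = {power_se3:.2f}, power(RE = 3) = {power_re3:.2f}, "
          f"ARL(SE = 3) = {average_run_length(power_se3):.1f} runs")
```

Under these assumptions, moving from 3s to 4s limits lowers the power for a systematic error of 3 standard deviations well below 0.5, which is consistent with the argument that the $1_{3s}$ rule is not unnecessarily powerful for the analytes considered.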

Appendix

Concerning biological variation, the focus has been on serum constituents. To derive estimates for hemoglobin, leukocyte count, and plasma prothrombin time, the group 0.95-reference intervals presented in ref. 24 were used as a starting point. Assuming gaussian distributions, the range of the 0.95-reference interval corresponds to four total standard deviations, where the total standard deviation is:

$$s_t = \sqrt{s_a^2 + s_p^2 + s_{b(intra)}^2 + s_{b(inter)}^2}$$

Notice that here we have both an intra- and an interindividual biological component of variation (9). By arbitrarily setting $s_{b(intra)} = s_{b(inter)}$ and $s_p = 0.5\,s_a$, $s_{b(intra)}$ can be computed and converted to a relative value with respect to the mean of the reference interval. We did so, obtaining the following values:

Analyte            $s_{b(intra)}$
Hemoglobin         0.045
Leukocyte count    0.147
Prothrombin time   0.045

References

1. Westgard JO, Groth T. Power functions for statistical control rules. Clin Chem 1979;25:863-9.
2. Westgard JO, Barry PL. Cost-effective quality control: managing the quality and productivity of analytical processes. Washington, DC: Am Assoc for Clin Chem, 1986.
3. Barnett RN. Medical significance of laboratory results. Am J Clin Pathol 1968;50:671-6.
4. Campbell DG, Owen JA. The physician's view of laboratory performance. Aust Ann Med 1969;18:4-6.
5. Skendzel LP. How physicians use laboratory tests. J Am Med Assoc 1978;239:1077-80.
6. Barrett AE, Cameron SJ, Fraser CG, Penberthy LA, Shand KL. A clinical view of analytical goals in clinical biochemistry. J Clin Pathol 1979;32:893-6.
7. Elion-Gerritzen WE. Analytic precision in clinical chemistry and medical decisions. Am J Clin Pathol 1980;73:183-95.
8. Skendzel LP, Barnett RN, Platt R. Medically useful criteria for analytic performance of laboratory tests. Am J Clin Pathol 1985;83:200-5.
9. Harris EK. Statistical principles underlying analytic goal-setting in clinical chemistry. Am J Clin Pathol 1979;72:374-82.
10. Subcommittee on Analytical Goals in Clinical Chemistry, WASP, CIBA Foundation, London, England, 1978, Proceedings. Analytical goals in clinical chemistry: their relationship to medical care. Am J Clin Pathol 1979;71:624-30.
11. Winkel P, Statland BE, Bokelund H. Factors contributing to intra-individual variation of serum constituents: 5. Short-term day-to-day and within-hour variation of serum constituents in healthy subjects. Clin Chem 1974;20:1520-7.
12. Young DS, Harris EK, Cotlove E. Biological and analytic components of variation in long-term studies of serum constituents in normal subjects. 4. Results of a study designed to eliminate long-term analytic deviations. Clin Chem 1971;17:403-10.
13. Ross JW, Fraser MD. Clinical laboratory precision. The state of the art and medical usefulness based internal quality control. Am J Clin Pathol 1982;78(Suppl):578-86.
14. Bokelund H, Winkel P, Statland BE. Factors contributing to intra-individual variation of serum constituents: 3. Use of randomized duplicate serum specimens to evaluate sources of analytical error. Clin Chem 1974;20:1507-12.
15. Winkel P, Statland BE, Nielsen MK. Biological and analytic components of variation of concentration values of selected serum proteins. Scand J Clin Lab Invest 1976;36:531-7.
16. Westgard JO, Groth T. The predictive value theory of quality control. Am J Clin Pathol 1983;80:49-56.
17. Westgard JO, Barry PL, Hunt MR, Groth T. A multi-rule Shewhart chart for quality control in clinical chemistry [Proposed Selected Method]. Clin Chem 1981;27:493-501.
18. Fraser CG. Better criteria for desirable laboratory performance exist [Letter]. Am J Clin Pathol 1986;85:251.
19. Shephard MDS, Penberthy LA, Fraser CG. Analytical goals for quantitative urine analysis: a clinical view [Letter]. Clin Chem 1981;27:1939-40.
20. Groth T, Ljunghall S, De Verdier C-H. Optimal screening for patients with hyperparathyroidism with use of serum calcium observations. A decision-theoretical analysis. Scand J Clin Lab Invest 1983;43:699-707.
21. Petersen PH, Rosleff F, Rasmussen J, Hobolth N. Studies on the required analytical quality of TSH measurements in screening for congenital hypothyroidism. Scand J Clin Lab Invest 1980;40(Suppl 155):85-93.
22. Westgard JO, Falk H, Groth T. Influence of a between-run component of variation, choice of control limits, and shape of error distribution on the performance characteristics of rules for internal quality control. Clin Chem 1979;25:394-400.
23. Schoen I, Custer E, Graham G, Bandi Z, Surovik MH. Quality control log with CUSUM and clinically useful limits criteria. Arch Pathol Lab Med 1985;109:333-9.
24. Tietz NW, ed. Textbook of clinical chemistry. Philadelphia, PA: WB Saunders & Co., 1986:1829-44.
