Journal of Chromatography A, 1158 (2007) 111–125
Analysis of recent pharmaceutical regulatory documents on analytical method validation
Eric Rozet a, Attilio Ceccato b, Cédric Hubert a, Eric Ziemons a, Radu Oprean c, Serge Rudaz d, Bruno Boulanger b, Philippe Hubert a,∗
a Laboratory of Analytical Chemistry, Bioanalytical Chemistry Research Unit, Institute of Pharmacy, University of Liège, CHU, B36, B-4000 Liège, Belgium
b Lilly Development Centre, rue Granbompré 11, B-1348 Mont-Saint-Guibert, Belgium
c Analytical Chemistry Department, Faculty of Pharmacy, University of Medicine and Pharmacy “Iuliu Hatieganu”, 13 Emil Isac Street, RO-3400 Cluj-Napoca, Romania
d Laboratory of Pharmaceutical Analytical Chemistry, School of Pharmacy, University of Geneva, 20 Bd. d’Yvoy, 1211 Geneva 4, Switzerland
Available online 1 April 2007
Abstract
All analysts face the same situation: method validation is the process of proving that an analytical method is acceptable for its intended purpose. To resolve this problem, the analyst refers to regulatory or guidance documents, and the validity of analytical methods therefore depends on the guidance, terminology and methodology proposed in these documents. It is thus of prime importance to have clear definitions of the different validation criteria used to assess this validity. It is also necessary to have methodologies in accordance with these definitions, and consequently to use statistical methods that are consistent with these definitions, with the objective of the validation and with the objective of the analytical method. The main purpose of this paper is to outline the inconsistencies between some definitions of the criteria and the experimental procedures proposed to evaluate those criteria in recent documents dedicated to the validation of analytical methods in the pharmaceutical field, together with the risks and problems encountered when trying to cope with contradictory, and sometimes scientifically irrelevant, requirements and definitions.
© 2007 Elsevier B.V. All rights reserved.
Keywords: Validation; Guidelines; Terminology; Methodology; Accuracy proﬁle
1. Introduction
∗ Corresponding author. Tel.: +32 4 366 43 16; fax: +32 4 366 43 17. E-mail address: Ph.hubert@ulg.ac.be (P. Hubert).
0021-9673/$ – see front matter © 2007 Elsevier B.V. All rights reserved.
doi:10.1016/j.chroma.2007.03.111

The demonstration of the ability of an analytical method to quantify is of great importance to ensure the quality, safety and efficacy of pharmaceuticals. Consequently, before an analytical method can be implemented for routine use, it must first be validated to demonstrate that it is suitable for its intended purpose. While the need to validate methods is obvious, the procedures for performing a rigorous validation program are generally not defined. Even if regulatory documents allow selecting the validation parameters that should be established, three main questions remain: (a) How should the regulatory definitions of the parameters be interpreted? (b) What specific procedure should be followed to evaluate a particular parameter? (c) What is the appropriate acceptance criterion for a given parameter? Furthermore, method validation is not specific to the pharmaceutical industry; it concerns most industrial fields involving either biology or chemistry. Even though each field of work has
its own characteristics and issues, the main criteria to fulfil are, or should be, similar, since the validation of an analytical method is independent of the industrial sector, the matrix of the samples and the analytical technology employed. A harmonized validation terminology should be adopted to allow discussion and comparison of validation issues between scientists of different fields. This consensus on terminology is not yet available, even though an attempt has been made [1,2]. However, while harmonization between the different fields interested in analytical validation is desirable, it is interesting to note that, even within the pharmaceutical field, not all laboratories use the same terminology, although they should use similar definitions to describe validation criteria. The terminology used between different official documents such as the Food and Drug Administration (FDA)
guide on validation of bioanalytical methods [3], ICH Q2R1 [4], ISO [5,6], IUPAC [7] and AOAC [8] differs. Furthermore, in some cases inconsistent terminology can be found within the same document, depending on the section in which it is mentioned. Knowledge and understanding of these significant differences in terminology and definitions are therefore essential, since the methodologies proposed to fulfil the defined criteria can otherwise lead to confusion when preparing the validation protocol and the experimental design. Moreover, the subsequent statistical interpretation of the results obtained, and the final decision about the validity of the analytical procedure, depend on a consistent and adequate definition of the criteria assessed. The consequences are highly critical, since the validated analytical method will be used daily in routine analysis (batch release, stability assessment, establishment of shelf life, pharmacokinetic or bioequivalence studies, etc.) to make decisions of the utmost business and public health importance. Therefore, the main objective of this review is to reveal the inconsistencies between the definitions of the validation criteria, the experimental procedures proposed to assess those criteria, and the statistical tools needed to support the decision about the validity of the analytical procedure. The main points discussed in this review are: (a) the distinction that can be made between specificity and selectivity; (b) the clarification of the linearity concept and its difference from the response function; (c) the definitions of precision, trueness and accuracy; (d) the decision rules to adopt from a statistical point of view; (e) the definition of the dosing range in which the analytical method may be used and, last but not least, (f) the determination of the limit of quantification (LOQ). Finally, the risks and problems encountered when trying to cope with inconsistent, and sometimes scientifically irrelevant, requirements and definitions are highlighted.
2. Specificity or selectivity
The first criterion for an analyst when evaluating an analytical method is its capability of delivering signals or responses that are free from interferences and give true results. This ability to discriminate the analyte from interfering components has for many years been expressed confusingly as the “selectivity” or “specificity” of a method, depending on the authors' area of expertise. The terms “selectivity” and “specificity” are often used interchangeably, although their meanings are different. This concept was extensively discussed by Vessman in different papers [9–13]. He particularly pointed out that organizations such as IUPAC, WELAC and ICH define specificity and/or selectivity in different manners (Table 1). However, a clear distinction should be made, as proposed by Christian [14]: “A specific reaction or test is one that occurs only with the substance of interest, while a selective reaction or test is one that can occur with other substances but exhibits a degree of preference for the substance of interest. Few reactions are specific, but many exhibit selectivity”. This is consistent with the concept that selectivity is something that can be graded while specificity is an absolute characteristic. Some attempts to quantify selectivity can be found in the literature [15–19]. For many analytical chemists, it is commonly accepted that specificity is exceptional since there are, in fact, few methods that respond
Table 1
Definitions of selectivity and specificity in different international organizations

IUPAC [14] – Selectivity (in analysis): 1. (qualitative): the extent to which other substances interfere with the determination of a substance according to a given procedure. 2. (quantitative): a term used in conjunction with another substantive (e.g. constant, coefficient, index, factor, number) for the quantitative characterization of interferences. Specific (in analysis): a term which expresses qualitatively the extent to which other substances interfere with the determination of a substance according to a given procedure. Specific is considered to be the ultimate of selective, meaning that no interferences are supposed to occur.

WELAC [20] – The term specificity is not mentioned. Selectivity of a method refers to the extent to which it can determine particular analyte(s) in a complex mixture without interference from other components in the mixture. A method which is perfectly selective for an analyte or group of analytes is said to be specific.

ISO – Not defined.

ICH [4] – Specificity is the ability to assess unequivocally the analyte in the presence of components which may be expected to be present. Typically these might include impurities, degradants, matrix, etc. Lack of specificity of an individual analytical procedure may be compensated for by other supporting analytical procedure(s).

AOAC [8] – Test for interferences (specificity): (a) test the effect of impurities, ubiquitous contaminants, flavours, additives, and other components expected to be present and at unusual concentrations; (b) test non-specific effects of matrices; (c) test the effects of transformation products, if the method is to indicate stability, and of metabolic products, if tissue residues are involved.

IUPAC: International Union of Pure and Applied Chemistry; WELAC: Western European Laboratory Accreditation Cooperation; ISO: International Organization for Standardization; ICH: International Conference on Harmonization; AOAC: Association of Official Analytical Chemists.
to only one analyte. Considering these elements, the IUPAC definition stating that specificity can be considered the ultimate selectivity seems rational with regard to the situation in the pharmaceutical industry [9]. It must be noted that WELAC probably provides the clearest definition of selectivity by stating that a method which is perfectly selective for an analyte is said to be specific [20]. As recommended by IUPAC and WELAC, the term selectivity should be promoted in analytical chemistry, particularly for separation techniques, and the term specificity should be discouraged.
3. Response function and linearity
The response function of an analytical procedure is the existing relationship, within a specified range, between the response (signal, e.g. area under the curve, peak height, absorption) and the concentration (quantity) of the analyte in the sample. The calibration curve should preferably be described by a simple monotonic (i.e. strictly increasing or decreasing) response function that gives reliable measurements, i.e. accurate results, as discussed hereafter. The response function – or standard curve – is widely and frequently confused with the linearity criterion. The linearity criterion refers to the relationship between the quantity introduced and the quantity back-calculated from the calibration curve, while the response function refers to the relationship between the instrumental response and the concentration. Because of this confusion, it is very common to
see laboratory analysts trying to demonstrate that the response function is linear in the classical sense, i.e. that a conventional least-squares linear model fits. As demonstrated by several authors, systematically forcing a linear function is not required, often irrelevant and may lead to large errors in measured results (e.g. for bioanalytical methods using LC–MS/MS or ligand-binding assays), where the linear range can be different from the working or dosing range [21,22]. A significant source of bias and imprecision in analytical measurements can be caused by an inadequate choice of the statistical model for the calibration curve. The confusion is even contained and maintained in the ICH document. In the terminology part of Q2R1 (formerly Q2A), linearity is correctly defined as the “ability (within a given range) to obtain test results which are directly proportional to the concentration (amount) of analyte in the sample.” But later, in the methodology section (formerly Q2B), it is mentioned that “Linearity should be evaluated by visual inspection of a plot of signals as a function of analyte concentration or content.” The text indicates clearly that here it is the signal, and no longer the result, that matters for linearity. The document clearly confounds, on the one hand, linearity and the calibration curve and, on the other hand, test results and signal. The continuation of the text is self-explicit: “If there is a linear relationship, test results should be evaluated by appropriate statistical methods, for example, by calculation of a regression line by the method of least squares.” For an analyst, the “test results” are, without ambiguity, the back-calculated measurements evaluated by the “regression line”, which is in fact the calibration curve, established using appropriate statistical methodologies. Last but not least, the fact that no linearity is needed between the quantity and the
signal is – paradoxically – contained in the last sentence of the section devoted to linearity: “In some cases, to obtain linearity between assays and sample concentrations, the test data may have to be subjected to a mathematical transformation prior to the regression analysis.” Indeed, if any kind of mathematical transformation can be applied to the concentration and/or the signal to make their relationship look like a “straight line”, what is the very purpose of requiring linearity? Clearly, the intent of that section was, confusedly, to suggest that in order to use the classical least-squares linear function it is sometimes convenient to apply transformations to the data when the “visual plot” of signal versus concentration does not look straight. It is indeed a useful trick, widely diffused for establishing the standard curve, but it should not be interpreted as a scientific necessity to have a “linear” relationship between the concentration and the signal. Fortunately, understanding has evolved since 1995, so that the FDA guidance on Bioanalytical Method Validation issued in May 2001 [3] no longer contains the word “linearity” but only “calibration/standard curve”, without particular restriction except that “The simplest model that adequately describes the concentration–response relationship should be used.” Nevertheless, the same confusion in concept and wording between response function and linearity of results can still be found in the recent book by Ermer and Miller [23]. While those authors indicate that “some analytical procedures have intrinsic non-linear response function, such as quantitative TLC”, they continue to use the linearity terminology to express the calibration curve. In the same context, HPLC methods coupled to spectrophotometric detection (UV) are usually linear according to the Lambert–Beer law, while immunoassays are typically non-linear. However, even for HPLC–UV methods covering a large dynamic range, advanced models, such as quadratic or log–log models, may be necessary. Indeed, it is important to model properly the whole procedure, including all the handling and preparation of samples, which obviously does not remain linear over a large range of concentrations even if the detector response is. It has to be noted that the complete analytical procedure should be modeled by an overall appropriate response function. As long as the model remains monotonic and allows accurate measurement, that is all that is required. Another very important aspect, largely neglected and ignored in the analytical literature, is the fit-for-purpose principle [21]. The central idea is very logical: the purpose of an analytical procedure is to give accurate measurements in the future; so a standard curve must be evaluated on its ability to provide accurate measurements. The statistical criteria such as R², lack-of-fit or any
other statistical test to demonstrate the quality of fit of a model are only informative and barely relevant to the objective of the assay [21,24–27]. To that end, several authors [1,2,28] have introduced the use of the accuracy profile, based on tolerance intervals (or prediction intervals), to decide whether a calibration model will give quality results. The models should be retained
Fig. 1. Accuracy profiles of the LC–MS/MS assay for the determination of loperamide in plasma (concentration in pg/ml) using (A) a linear regression model, (B) a weighted linear regression model with a weight equal to 1/X², (C) a linear regression model after logarithm transformation, (D) quadratic regression. The dotted lines represent the acceptance limits (−15%, 15%); the dashed lines represent the connected 95% tolerance intervals. When the tolerance intervals are included within the acceptance limits, the assay is able to quantify accurately; otherwise it is not. The continuous line represents the estimated relative bias.
or rejected based on the accuracy of the back-calculated results, regardless of their statistical properties. This approach has already been used by several authors, such as Streel et al. for the validation of an LC–MS/MS assay for the quantitative determination of loperamide in plasma [29]. As can be seen from Fig. 1 and indicated by the authors, the weighted linear regression provides the best accuracy profile for this procedure, obtained by joining the extremes of the 95% tolerance intervals, i.e. the intervals that will contain 95% of the future individual results. Conversely, the simple linear model, the quadratic model and even a model with a log–log transformation are not suitable because they do not contribute as well to the ultimate goal of the assay, i.e. providing accurate results in the future. Indeed, the tolerance intervals for those three models are not included in the defined acceptance limits, as they are with the selected model. Nevertheless, as can be seen in Fig. 2, when looking at the quality of fit as usually practiced, the four models all exhibit R² > 0.999 for all series. This figure, representing in a way the quality of the “linear” fit [4], does not show any difference between the models. This contrasts with the accuracy profile figure, where a major difference exists in the quality of the results depending on the model selected as the standard curve. Another example illustrating the difference between response function, linearity and fit-for-purpose accuracy profiles can be obtained with a high-performance thin-layer chromatographic assay (HPTLC; Fig. 3) and an enzyme-linked immunosorbent assay (ELISA) published in [30] (Fig. 4). Indeed, as can be seen, using a quadratic response function for the HPTLC assay or a non-linear standard curve such as the weighted four-parameter logistic model for the ELISA, the plot of the signal (Figs. 3.a.1 and 4.a.1) does not look linear, while the results as a function of the concentration are linear (Figs. 3.a.2 and 4.a.2). The same applies to the accuracy profiles (Figs. 3.a.3 and 4.a.3), which clearly show that, when using these standard curves, the assays are able to quantify over a large range. This property is not fulfilled with the other, linear models for both types of assay. In both cases, the fit of the model is acceptable, but neither of these two “linear” models shows acceptable linearity or accuracy according to the ICH definition. Selection of the standard curve model based on the obtained accuracy of the results was difficult to envisage a few years ago
Fig. 2. Response functions for the LC–MS/MS assay for the determination of loperamide in plasma (concentration in pg/ml), for series 2 only, using (A) a linear regression model (R² = 0.9991), (B) a weighted linear regression model with a weight equal to 1/X² (R² = 0.9991), (C) a linear regression model after logarithm transformation (R² = 0.9997), (D) quadratic regression (R² = 0.9991).
because it requires a lot of computing and is a post-data-acquisition scenario, e.g. evaluation of all the candidate calibration models before making a choice. Nowadays, computational power is no longer a limitation and the selection of a model can be perfectly aligned with the objective of the method. Having stressed the difference between response function and linearity, the concept of linearity can be applied not only to relative but also to absolute analytical methods, such as titration, for which the results are not obtained by back-calculation from a calibration curve. Attempts to provide a response function are in that case of no use and impracticable, as there is no signal or response, whereas the linearity of the results can still be assessed. Statistical models for calibration curves can be either linear or non-linear in their parameter(s), as opposed to linear in shape: a quadratic model Y = α + βX + γX² is linear in its parameters because it is a linear combination of terms in X, even if its graphical representation may look “curved” on an X–Y plot. The choice between these two families of models will depend on the type of method and/or the range of concentrations of interest. When a narrow range is considered, an unweighted linear model is usually adequate, while a larger range may require a more complex or weighted model.
Weighting may be important because a common feature of most analytical methods is that the variance of the signal is a function of the level or quantity to be measured. In the case of heterogeneous variances of the signal across the concentration range – which is frequent – weighting significantly improves the accuracy of the results, particularly at low concentration levels. When observations are not weighted, an observation more distant from the curve than the others has more influence on the curve fit. As a consequence, the fitted curve, and therefore the back-calculated results, may be poor where the variances are smaller. Regardless of the model type, it is assumed that all observations fitted to a model are completely independent. In reality, replicates are often not independent for many analytical procedures because of the steps followed in the preparation and analysis of samples. In such cases, replicates should not be used as separate observations. Models are typically applied on either a linear scale or a log scale of the assay signal and/or the calibrator concentrations. The linear scale is used in the case of homogeneous variance across the concentration range, while the log scale is usually recommended when the variance increases with increasing response, because this suggests that the response is log-normally distributed. The most commonly used types of
Fig. 3. Standard curves (top, 1), linearity profiles (middle, 2) and accuracy profiles (bottom, 3) obtained with a high-performance thin-layer chromatography assay using (left, a) a quadratic regression model and (right, b) a linear regression model. For the linearity and accuracy profiles, the dotted lines represent the acceptance limits (−10%, 10%) and the dashed lines represent the connected 95% tolerance intervals. When the tolerance intervals are included within the acceptance limits, the assay is able to quantify accurately; otherwise it is not. For the linearity profile, the continuous line represents the identity line (result = concentration), while for the accuracy profile the continuous line represents the estimated bias.
polynomial models include simple linear regression (with or without an intercept) and quadratic regression models. The model parameters are estimated using the restricted maximum likelihood method, which is equivalent to the ordinary least-squares method when the data are normally distributed.
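To make the response-function versus linearity distinction concrete, the sketch below (all concentrations and responses are hypothetical, chosen only for illustration) fits a straight-line standard curve with 1/X² weighting and then judges linearity on the back-calculated results rather than on the raw signal:

```python
import numpy as np

# Hypothetical calibration data: concentrations (x) and instrument responses (y).
x = np.array([1.0, 1.0, 5.0, 5.0, 25.0, 25.0, 125.0, 125.0])
y = np.array([2.1, 1.9, 10.3, 9.8, 50.9, 49.2, 251.0, 248.0])

# Weighted linear fit with weights 1/x^2, a common choice when the signal's
# standard deviation grows roughly proportionally to concentration.
# np.polyfit expects 1/sigma-style weights, hence the square root.
w = 1.0 / x**2
slope, intercept = np.polyfit(x, y, deg=1, w=np.sqrt(w))

# Back-calculate the concentrations from the fitted standard curve (inverse
# prediction), then express each result as a relative bias in percent: this,
# not the shape of the signal, is what the linearity criterion addresses.
x_back = (y - intercept) / slope
relative_bias = 100.0 * (x_back - x) / x
print(np.round(relative_bias, 1))
```

The weighted fit keeps the relative bias of the back-calculated results small across the whole range, including at the low end, which is precisely the behaviour an accuracy-based assessment rewards.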
That being said, and because of the fitting techniques, the experimental design, i.e. the way the concentration values are spread over the range, may significantly impact the precision of the results, or inverse predictions, that the response function will provide. As shown by François et al. [31], depending on the model
Fig. 4. Standard curves (top, 1), linearity profiles (middle, 2) and accuracy profiles (bottom, 3) obtained with an immunoassay using (left, a) a weighted 4-parameter logistic model and (right, b) a linear regression on the most “linear” part of the response. For the linearity and accuracy profiles, the dotted lines represent the acceptance limits (−30%, 30%) and the dashed lines represent the connected 95% tolerance intervals. When the tolerance intervals are included within the acceptance limits, the assay is able to quantify accurately; otherwise it is not. For the linearity profile, the continuous line represents the identity line (result = concentration), while for the accuracy profile the continuous line represents the estimated bias.
that will be used for the response function, some designs give more precise measurements than others. As a good general “rule of thumb” for choosing the concentration values optimally, they show that, for most models used in assays, from linear to four-parameter logistic models, placing standard points at the extremes of the range and spreading the replicated standard points equally over the range in between gives excellent results in general, particularly when the model has not yet been clearly
identified, which is the case during the validation phase. They also stress the importance of having replicates, particularly at the extremes, because of the “leverage” of those points on the fitting. Once the model has been identified, optimal designs can be envisaged to further improve, slightly, the precision of the measurements.
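The leverage argument can be illustrated numerically (the design points below are illustrative, not those of François et al.): for a straight-line model with unit error variance, the average inverse-prediction variance factor x0ᵀ(XᵀX)⁻¹x0 over the range is smaller for a design that replicates standards, including at the extremes, than for the same number of standards spread singly over the range:

```python
import numpy as np

def prediction_variance(conc):
    """Average variance factor x0'(X'X)^(-1)x0 over the range for the
    straight-line model y = a + b*conc, assuming unit error variance."""
    X = np.column_stack([np.ones_like(conc), conc])
    XtX_inv = np.linalg.inv(X.T @ X)
    grid = np.linspace(conc.min(), conc.max(), 50)
    X0 = np.column_stack([np.ones_like(grid), grid])
    # Row-wise quadratic form x0' * XtX_inv * x0, averaged over the grid.
    return float(np.mean(np.sum(X0 @ XtX_inv * X0, axis=1)))

# Same number of standards (n = 8) spread two ways over the 1-100 range:
even = np.array([1, 15, 29, 43, 57, 71, 85, 100], dtype=float)
replicated_extremes = np.array([1, 1, 34, 34, 67, 67, 100, 100], dtype=float)

v_even = prediction_variance(even)
v_rep = prediction_variance(replicated_extremes)
print(round(v_even, 4), round(v_rep, 4))
```

The replicated design spreads more mass toward the extremes, increasing the leverage term Σ(x − x̄)² and hence tightening the predictions over the whole range.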
4. Accuracy, trueness and precision
4.1. Trueness
As can be seen from the definitions of trueness found in the ISO documents [5,6], the International Vocabulary of Basic and General Terms in Metrology (VIM) [32] and the Eurachem document on the Fitness for Purpose of Analytical Methods [33], trueness is a concept related to systematic errors. The ISO 5725-Part 1 (General Principles and Definitions) definition of trueness (Section 3.7) is: “The closeness of agreement between the average value obtained from a large series of test results and an accepted reference value. The measure of trueness is usually expressed in terms of bias. Trueness has been referred to as ‘accuracy of the mean’. This usage is not recommended.” Indeed, it is expressed as the distance between the average value of a series of measurements (x̄_i) and a reference value μ_T. This concept is measured by a bias, relative bias or recovery:
Bias = x̄_i − μ_T

Relative bias (%) = 100 × (x̄_i − μ_T)/μ_T

Recovery (%) = 100 × x̄_i/μ_T = 100 + Relative bias (%)
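Estimating these criteria from validation data is straightforward; the sketch below uses hypothetical replicate measurements of a single validation standard whose accepted true value is 100 (arbitrary units):

```python
import numpy as np

# Hypothetical validation standard: six independent back-calculated results
# for an accepted true concentration of 100 (arbitrary units).
mu_T = 100.0
x = np.array([98.2, 101.5, 99.7, 100.9, 97.8, 100.3])

x_bar = x.mean()
bias = x_bar - mu_T
relative_bias = 100.0 * (x_bar - mu_T) / mu_T
recovery = 100.0 * x_bar / mu_T      # equals 100 + relative bias (%)

print(round(bias, 2), round(relative_bias, 2), round(recovery, 2))
```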
The ISO 5725 documents unambiguously affirm what trueness is and how to measure it. Applying this concept to validation experiments, measuring several independent validation standards, for instance i standards for which the true value of the analyte concentration or amount (μ_T) is known, allows their predicted concentrations or amounts x_i to be computed. It is therefore possible to compute the mean value of these predicted results (x̄_i) and consequently to estimate the bias, relative bias or recovery. These values are well established, as they are routinely computed during the validation step of an analytical procedure. Trueness is related to the systematic errors of the analytical procedure [2,5,6,34]. Trueness thus refers to a characteristic, or quality, of the analytical procedure and not to a result generated by this procedure. This nuance is fundamental, as we will see hereafter. However, when looking for trueness in the regulatory documents for the validation of pharmaceutical analytical procedures, this concept is not defined per se. On conscientiously reading both the ICH Q2R1 [4] and the FDA Bioanalytical Method Validation [3] documents, references to this concept are nonetheless found. In ICH Q2R1 – part I, use of trueness is made as follows: “The accuracy of an analytical procedure expresses the closeness of agreement between the value which is accepted either as a conventional true value or an accepted
reference value and the value found. This is sometimes termed trueness.” In the FDA Bioanalytical Method Validation document this reference is made in the Glossary: “The degree of closeness of the determined value to the nominal or known true value under prescribed conditions. This is sometimes termed trueness.” Here a mix can be seen between trueness and, by extension, accuracy of the mean (as opposed to accuracy of the results). The ISO documents also specify that this use of the term accuracy should be avoided and replaced by trueness. When comparing these two quotations concerning trueness in the ICH and FDA documents with the definition in the ISO documents, the main difference is that both documents talk about the distance between the true value and the value found, or the determined value, whereas the ISO definition of trueness considers the distance between the average value and the true value. It is essential to distinguish between a result and an average value. The results of an analytical procedure are its very objective. When a quality control sample is examined, the result impacts the decision to release a batch. When unknown samples are determined, the results give information about the therapeutic effect of a drug or about the pathological or physiological state of a patient, and so on. What matters is to ensure that each unknown or known sample will be determined adequately. The average value only gives the central location of the distribution of results for the same true content, not the position of each individual result. By extension, the bias, relative bias or recovery locates the distribution of the results produced by the analytical procedure relative to the accepted true value.
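The gap between the location of the mean and the location of individual results can be shown with a small simulation (the 2% relative bias and 4% RSD below are hypothetical values, chosen for illustration): the mean sits close to the true value, yet a sizeable fraction of individual results falls outside ±5% of it:

```python
import numpy as np

rng = np.random.default_rng(1)

# Hypothetical procedure: 2% relative bias and 4% intermediate precision (RSD).
mu_T = 100.0
results = rng.normal(loc=mu_T * 1.02, scale=mu_T * 0.04, size=10000)

# The mean bias is small, but individual results spread far around the mean.
mean_bias = 100.0 * (results.mean() - mu_T) / mu_T
within_5pct = float(np.mean(np.abs(results - mu_T) <= 0.05 * mu_T))
print(round(mean_bias, 1), round(within_5pct, 2))
```

Roughly a quarter of the individual results here miss the true value by more than 5%, even though the mean bias is only about 2%: trueness alone says nothing about the accuracy of each future result.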
This inconsistency of definition is found not only from one document to another but also between different sections of a single document, especially the ICH Q2R1 document. In part II, Section 4.3 “Recommended data”, relative to accuracy: “accuracy should be reported as percent recovery by the assay of known added amount of analyte in the sample or as the difference between the mean and the accepted true value together with the confidence intervals.” This is coherent with the definition of trueness from the ISO documents and not with the corresponding definition in part I of the ICH document. In other documents, confusion between trueness and accuracy is also observed [7,35]. When assessing the acceptability of the bias, relative bias or recovery, the methodology most often used is to apply the following Student t-test:
H_0: x̄_i − μ_T = 0

H_1: x̄_i − μ_T ≠ 0
for which a significance level α is set, generally at 0.05 in the pharmaceutical field. This means that it is accepted that the null hypothesis H_0 will wrongly be rejected 5 times out of 100, i.e. we accept to erroneously consider the bias different from 0 in 5 cases out of 100. When the computed Student statistic is higher than the corresponding theoretical quantile, or equivalently when the p-value is smaller than α, the null hypothesis is rejected. There is then high confidence that the bias is different from 0, as the significance level is fixed by the analyst. Another way to interpret this test is to check whether the 0% relative
bias or 100% recovery is included in the 1 − α confidence interval of the relative bias or recovery, respectively. If these values lie outside their corresponding confidence interval, the null hypothesis is rejected. However, when the null hypothesis is not rejected, the only possible conclusion is not that the bias, relative bias or recovery is equal to 0, 0% or 100%, but that the test could not demonstrate that it is different from 0% or 100%. As clearly demonstrated in numerous publications [27,36–38], the β risk, i.e. the probability of wrongly accepting the null hypothesis, is not fixed by the user in this situation. Furthermore, this approach can conclude that the bias is significantly different from 0 even when it is analytically acceptable [27,36–38], and it will systematically consider the bias not different from 0 when the variability of the procedure is relatively high. In fact, the Student t-test used this way is a difference test which answers the question: "Is the bias of my analytical procedure different from 0?" However, the question the analyst wishes to answer during the validation step of the analytical procedure is: "Is the bias of my analytical procedure acceptable?" The test that answers this last question is an equivalence or interval hypothesis test [27,36–38]. In this type of test, the analyst has to select acceptance limits for the bias, relative bias or recovery, that is, limits within which, if the true bias, relative bias or recovery of the analytical procedure lies, the trueness of this procedure is acceptable. Different authors have recommended the use of this type of test to assess the acceptability of a bias [27,38]. Indeed, a perfectly unbiased procedure is utopian, and the bias obtained during the validation experiment is only an estimate of the true, unknown bias of the analytical procedure.
Nevertheless, this interval hypothesis test, while statistically correct, does not answer the real analytical question: the very purpose of validation is to validate the results a method will produce, not the method itself. We will come back to this objective in Section 5 and explain in more detail the connections between "good" results and "good" methods.
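The contrast between the two tests can be sketched in a few lines of Python; the relative-bias data, the ±5% acceptance limit and the tabulated t quantiles below are hypothetical illustrative values, not data from this study. Note that with these data the difference test declares a significant bias even though the bias is well within the acceptance limits, exactly the pitfall described above.

```python
import math
import statistics

# Hypothetical relative-bias results (%) from one validation level; ±5% is
# an assumed a-priori acceptance limit for the relative bias.
relative_bias = [1.8, 2.4, 0.9, 1.5, 2.1, 1.2]
lam = 5.0

n = len(relative_bias)
mean = statistics.mean(relative_bias)
se = statistics.stdev(relative_bias) / math.sqrt(n)

# --- Difference test (Student t-test), H0: bias = 0 ----------------------
t_stat = mean / se
t_95_two_sided = 2.571                 # t(0.975, df = 5), from tables
bias_differs_from_zero = abs(t_stat) > t_95_two_sided

# --- Equivalence (interval hypothesis) test ------------------------------
# Two one-sided tests at alpha = 0.05: the trueness is acceptable when the
# 90% confidence interval of the bias lies entirely inside [-lam, +lam].
t_90_two_sided = 2.015                 # t(0.95, df = 5), from tables
ci_low, ci_high = mean - t_90_two_sided * se, mean + t_90_two_sided * se
bias_acceptable = (-lam < ci_low) and (ci_high < lam)

print(f"estimated bias = {mean:.2f}%, t = {t_stat:.2f}")
print(f"difference test rejects H0 (bias = 0): {bias_differs_from_zero}")
print(f"90% CI = ({ci_low:.2f}, {ci_high:.2f}); bias acceptable: {bias_acceptable}")
```

With these numbers the difference test rejects H0 (the bias is "significant") while the equivalence test accepts the bias as analytically irrelevant, which is the distinction the text makes.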
4.2. Precision
Contrary to trueness, homogeneous definitions of precision can be found in the regulatory documentation. For instance, the ICH Q2R1 Part 1 definition of precision is: "The precision of an analytical procedure expresses the closeness of agreement (degree of scatter) between a series of measurements obtained from multiple sampling of the same homogeneous sample under the prescribed conditions." This definition of precision is consistent with its definition in the FDA Bioanalytical Method Validation, ISO, Eurachem, IUPAC, FAO and AMC documents. As stated in all documents, precision is expressed as a standard deviation (s), variance (s²), relative standard deviation (RSD) or coefficient of variation (CV). It measures the random error linked to the analytical procedure, i.e. the dispersion of the results around their average value. The estimate of precision is independent of the true or specified value and of the mean or trueness estimate. Each document refers to different precision levels. For the ICH Q2R1 and ISO documents, three levels can be assessed:
(1) Repeatability, which "expresses the precision under the same operating conditions over a short interval of time. Repeatability is also termed intra-assay precision."
(2) Intermediate precision, which "expresses within-laboratories variations: different days, different analysts, different equipment, etc."
(3) Reproducibility, which "expresses the precision between laboratories (collaborative studies, usually applied to standardization of methodology)."
The repeatability conditions involve the re-execution of the entire procedure, including the selection and preparation of the test portion from the laboratory sample, and not only the replicate instrumental determinations on a single prepared test sample. The latter is the instrumental precision, which does not include the repetition of the whole analytical procedure. The FDA document also distinguishes "within-run, intra-batch precision or repeatability, which assesses precision during a single analytical run", and "between-run, inter-batch precision or repeatability, which measures precision with time, and may involve different analysts, equipment, reagents, and laboratories". As can be seen, the same word, namely repeatability, is used for both components of variability, which is certainly a source of confusion for the analyst. Furthermore, this document places on the same level the variability within a single laboratory and between different laboratories. The validation of an analytical procedure is performed by a single laboratory, as it has to demonstrate that the procedure is suitable for its intended purpose. The evaluation of laboratory-to-laboratory method adequacy is usually performed with the objective of standardizing the procedure or of evaluating the performance of several laboratories in a "proficiency test", also called a "ring test", and is regulated by specific documents and rules. In order to correctly evaluate the two components of variability of an analytical procedure during the validation phase, an analysis of variance (ANOVA) per investigated concentration level is recommended. As long as the design is balanced, i.e. the same number of replicates per series for a concentration level, the least-squares estimates of the variance components can be used. When this condition is not met, the maximum likelihood estimates of those components should be preferred [2,30].
From the ANOVA table, the repeatability or within-run precision and the between-run precision are obtained as follows:

MSM_j = (1/(p − 1)) Σ_{i=1}^{p} n (x̄_{ij,calc} − μ̂_j)²

where x̄_{ij,calc} is the average of the calculated concentrations of the jth concentration level in the ith series, p the number of series, n the number of replicates per series, and

μ̂_j = (1/(pn)) Σ_{i=1}^{p} Σ_{k=1}^{n} x_{ijk,calc}
with x_{ijk,calc} being the calculated concentration obtained from the selected response function.
MSE_j = (1/(pn − p)) Σ_{i=1}^{p} Σ_{k=1}^{n} (x_{ijk,calc} − x̄_{ij,calc})²

If MSE_j < MSM_j, then:

σ̂²_{W,j} = MSE_j,  σ̂²_{B,j} = (MSM_j − MSE_j)/n

Else:

σ̂²_{W,j} = (1/(pn − 1)) Σ_{i=1}^{p} Σ_{k=1}^{n} (x_{ijk,calc} − x̄_{j,calc})²,  σ̂²_{B,j} = 0

The intermediate precision is computed as follows:

σ̂²_{IP,j} = σ̂²_{W,j} + σ̂²_{B,j}

where σ̂²_{W,j} is the within-run or repeatability variance and σ̂²_{B,j} is the between-run variance. It is important to note that misapplications of known variance formulae are still widely used and can lead to dramatic overestimation of the variance components [36,39].

As can be seen in the regulatory documents, what makes the difference between repeatability and intermediate precision is the concept of series or runs. These series or runs comprise at least different days, possibly with different operators and/or different equipment. A run or series is a period during which analyses are executed under constant repeatability conditions. The rationale for selecting the factors that will compose the runs/series is to mimic the conditions that will be encountered during the routine use of the analytical procedure. The procedure will obviously not be used on a single day, so including its day-to-day variability is mandatory. Then, during routine use, will the procedure be operated by only one analyst, and/or on only one instrument? Depending on the answers to these questions, different factors representative of routine practice are introduced in the validation protocol, leading to a representative estimate of the variability of the analytical procedure. Once the appropriate factors are selected, an experimental design can be built to optimize the number of runs or series so as to account for the main effects of these factors with cost-effective analysis time. For example, if the selected factors are days, operators and equipment, each at two levels, a fractional factorial design allows four runs or series to be executed in only 2 days. The design is shown in Table 2.

Table 2
Experimental design of four runs taking into account days, operators and equipment as sources of variability

Run 1: Day 1, Operator 1, Equipment 2
Run 2: Day 1, Operator 2, Equipment 1
Run 3: Day 2, Operator 1, Equipment 1
Run 4: Day 2, Operator 2, Equipment 2

Having computed the variance components, one interesting parameter to observe is the ratio R_j, with

R_j = σ̂²_{B,j} / σ̂²_{W,j}

Indeed, this parameter shows how large the series-to-series (or run-to-run) variance is compared with the repeatability variance. High values of R_j, e.g. greater than 4, may indicate either a problem with the variability of the analytical procedure, whose results vary from one run to another and thus call for redevelopment of the method, or an insufficient number of series (runs) in the validation design to obtain a reliable estimate of the between-series variance σ̂²_{B,j}. In this last situation, all the results within a run are highly correlated with each other, providing little effective information about run-to-run variability. The effective sample size of the validation is consequently smaller than the actual sample size used in the design of the validation experiments. The term 'effective sample size' indicates that, when results within a run are correlated, there is in fact less information with which to judge the quality of results than when all results are fully independent of the run they belong to. This is an important feature to take into account when defining the degrees of freedom used to compute a confidence interval or a tolerance interval. Indeed, if the results of repeated experiments are correlated, computing the degrees of freedom from the total number of experiments performed will artificially narrow the confidence or tolerance interval. The Satterthwaite degrees of freedom [40] capture this concept of effective sample size by bounding the degrees of freedom between a minimum value, the number of series, and a maximum value, the total number of experiments (series × replicates) [41].

Precision is commonly expressed as the percent relative standard deviation (RSD). The classical formula is:

RSD (%) = 100 × √(σ̂²) / x̄

where σ̂² is the estimated variance and x̄ the estimated average value. When an RSD is reported, the corresponding variance is used, e.g. repeatability or intermediate precision. The computed RSD is therefore the ratio of two random variables, giving a new parameter with high uncertainty. However, in the case of validation of an analytical procedure, because the true or reference value is known, the denominator should be replaced by the corresponding true value μ_T. The RSD computed this way depends only on the estimated precision (estimated variances), regardless of the estimated trueness. This being said, the use of a relative estimate is convenient for direct reading but nevertheless triggers a series of questions: what matters most for the results, the (absolute) variance or the relative standard deviation? Imagine
that a bioanalytical method is used to support a pharmacokinetic study. In that case, the results are used to fit the nonlinear PK model, and what matters is only the variance of the results, or the variance of the logarithms of the results, but not the RSD at all. Remember that a procedure is validated for its intended use. So what is the relevance of making a decision on the acceptance of a method based on the RSD when, with regard to its intended use, only the variance of its results matters? This distinction becomes particularly important when dealing with the LOQ. Indeed, since the RSD is the SD divided by the true concentration value, the RSD becomes large at the lower end of the range simply because the SD is divided by a small number, not because the method becomes less precise. A good example can be seen by comparing the same information in Fig. 3.a.2, on an absolute scale, and in Fig. 3.a.3, on a relative scale. In Fig. 3.a.2 the distance between the two dashed lines represents a multiple of the intermediate precision in absolute value, while in Fig. 3.a.3 it is the same value expressed in relative terms. While in the latter figure (a.3) the relative intermediate precision (RSD) seems to explode at the smallest concentration, leading to the conclusion that results are not precise enough at that level, it is also clear that, in this example, the absolute intermediate precision improves at the smallest concentration because the intermediate precision SD is smaller. The contradiction comes from the fact that the SD has been divided by a small number, not because the measurements are less precise; on the contrary. This raises questions on the meaning and the definition of the LOQ. Indeed, why ignore or discard the results at those low levels when they are obtained with a variance much smaller than that of results at high concentrations? Once again, the answer lies in the intended use of the results: for supporting stability or pharmacokinetic studies, not only is there no reason to discard those very precise measurements at the small concentrations, but they are also very useful, for example, in accurately estimating the half-life or the pharmacokinetics of metabolites. Only the variance or the SD matters, not the RSD. So, while common practice evaluates a method with respect to the relative expression of precision, scientists in the laboratories should carefully consider the absolute, fundamental variance before discarding data, and question whether doing so serves the objectives of the study.
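The ANOVA-based variance-component computation described in Section 4.2 can be sketched end-to-end for one concentration level; the balanced 3-runs × 4-replicates design and the back-calculated concentrations below are hypothetical values chosen for the example.

```python
import statistics

# Hypothetical balanced design at one concentration level j:
# p = 3 series (runs), n = 4 replicates per series (back-calculated
# concentrations, % of a 100% nominal level).
runs = [
    [99.0, 100.1, 99.5, 100.4],    # run 1
    [101.2, 102.0, 101.5, 101.9],  # run 2
    [98.5, 99.2, 98.9, 99.6],      # run 3
]
p, n = len(runs), len(runs[0])
mu_true = 100.0

run_means = [statistics.mean(r) for r in runs]
grand_mean = statistics.mean(run_means)

# Mean squares of the one-way ANOVA (between-series and within-series)
msm = n * sum((m - grand_mean) ** 2 for m in run_means) / (p - 1)
mse = sum((x - m) ** 2 for r, m in zip(runs, run_means) for x in r) / (p * n - p)

if mse < msm:
    var_within = mse                      # repeatability variance
    var_between = (msm - mse) / n         # between-run variance
else:
    # Negative between-run estimate is truncated to 0
    flat = [x for r in runs for x in r]
    var_within = sum((x - grand_mean) ** 2 for x in flat) / (p * n - 1)
    var_between = 0.0

var_ip = var_within + var_between         # intermediate precision variance
ratio_r = var_between / var_within        # ratio R_j
# RSD computed against the known true value, as advocated in the text
rsd_ip = 100.0 * var_ip ** 0.5 / mu_true

print(f"within-run var = {var_within:.3f}, between-run var = {var_between:.3f}")
print(f"intermediate-precision RSD = {rsd_ip:.2f}%, R_j = {ratio_r:.2f}")
```

Here R_j exceeds 4, which, following the discussion above, would prompt either a look at run-to-run variability or the use of more series in the design.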
4.3. Accuracy
In the ICH Q2R1 part 1 document [4], accuracy is defined as "the closeness of agreement between the value which is accepted either as a conventional true value or an accepted reference value and the value found." This definition corresponds to that of the ISO documents [5,6] and of the VIM [32], which states that accuracy is "the closeness of agreement between a test result and the accepted reference value." Furthermore, the ISO definition adds a note specifying that accuracy is the combination of random error and systematic error or bias. From this, and as specified by the Analytical Methods Committee (AMC) [34], it is easily understood that accuracy rigorously applies to results, not to analytical methods, laboratories or operators. The AMC also points out that accuracy should be used that way in formal writing. Therefore, accuracy denotes the absence of error in a result. Similar definitions of accuracy are found in the Eurachem document [33].
The total measurement error of the results obtained from an analytical procedure relates to the closeness of agreement between the value found, i.e. the result, and the value accepted either as a conventional true value or an accepted reference value. The closeness of agreement observed is based on the sum of the systematic and random errors, namely the total error linked to the result. Consequently, the measurement error is the sum of trueness (or bias) and precision (or standard deviation), i.e. the total error. As shown below, each measurement X has three components: the true sample value μ_T, the bias of the method (estimated by the mean of several results) and the precision (estimated by the standard deviation or, in most cases, the intermediate precision). Equivalently, the difference between an observation X and the true value is the sum of the systematic and random errors, i.e. the total error or measurement error.
X = μ_T + bias + precision
⇔ X − μ_T = bias + precision
⇔ X − μ_T = total error
⇔ X − μ_T = measurement error
⇔ X − μ_T = accuracy
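This decomposition can be checked numerically with a small simulation; the bias and standard deviation values below are arbitrary illustrative choices, not performance figures from the paper.

```python
import random
import statistics

random.seed(7)
mu_true = 100.0   # true value of the sample
bias = 3.0        # hypothetical systematic error of the procedure
sigma = 2.0       # hypothetical random-error standard deviation

# Each simulated result X = mu_true + bias + random error
results = [mu_true + bias + random.gauss(0.0, sigma) for _ in range(10_000)]
total_errors = [x - mu_true for x in results]

# The mean total error recovers the bias; its spread recovers the precision
mean_err = statistics.mean(total_errors)
sd_err = statistics.stdev(total_errors)
print(f"mean total error = {mean_err:.2f} (bias = {bias})")
print(f"sd of total error = {sd_err:.2f} (sigma = {sigma})")
```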
However, looking at the section on accuracy in part 2 of the ICH Q2R1 document, the recommended data to document accuracy are presented as: "accuracy should be reported as percent recovery by the assay of known added amount of analyte in the sample or as the difference between the mean and the accepted true value together with the confidence intervals." This no longer refers to accuracy but instead to the trueness definition of the ISO 5725 document, because it is the average value of several results, as opposed to a single result as for accuracy, that is compared to the true value, as already stated previously. This section consequently refers to systematic errors, whereas accuracy as defined in ICH Q2R1 part 1 and ISO 5725 part 1 corresponds to the evaluation of the total measurement error. In the FDA Bioanalytical Method Validation document [3], accuracy is defined as "the closeness of mean test results obtained by the method to the true value (concentration) of the analyte. (…) The mean value should be within 15% of the actual value except at LLOQ, where it should not deviate by more than 20%. The deviation of the mean from the true value serves as the measure of accuracy." As already mentioned in the previous sections, this definition corresponds to the analytical method trueness. For bioanalytical methods, earlier reviews have already stressed this confusion between accuracy and trueness [1,2,27,38]. For most uses it does not matter whether a deviation from the true value is due to random error (lack of precision) or to
systematic error (lack of trueness), as long as the total quantity of error remains acceptable. Thus, the concept of total analytical error, or accuracy as a function of random and systematic error, is essential. Furthermore, every analyst wants to ensure that the total amount of error of the method will not affect the interpretation of the test result and compromise the subsequent decision [1,2,21,42–46]. Decisions based on the separate evaluation of the trueness and precision criteria cannot achieve this. Only evaluation of the accuracy of the results, which takes into account the total error concept, gives guarantees to both laboratories and regulatory bodies on the ability of the method to achieve its purpose.
5. Decision rule
Most of the regulatory documents do not make any recommendation on acceptance limits to help the analyst decide when an analytical procedure is acceptable. They insist, with the confusions already mentioned, on the criteria that need to be examined, estimated and reported, but only few rules are proposed about the way to decide. It is a laboratory responsibility to justify the decision of accepting and using an analytical method [3]. The only exception found concerns the FDA document on bioanalytical methods, which clearly indicates in the prestudy validation part: "The mean value should be within ±15% of the theoretical value, except at LLOQ, where it should not deviate by more than ±20%. The precision around the mean value should not exceed 15% of the CV, except for LLOQ, where it should not exceed 20% of the CV." Later, when referring to in-study validation, the same document indicates: "Acceptance criteria: At least 67% (4 out of 6) of quality control (QC) samples should be within 15% of their respective nominal value, 33% of the QC samples (not all replicates at the same concentration) may be outside 15% of nominal value. In certain situations, wider acceptance criteria may be justified." However, these two sections relating to prestudy and to in-study acceptance criteria summarize very well the deep confusion that exists and that triggers many debates in conferences on validation. The proposed objective is that, for bioanalytical methods, measurements must be sufficiently close to their true value, i.e. within less than 15%, as clearly indicated here: "QC samples should be within 15% of their respective nominal value". As suggested in the section on accuracy, this objective is not at all aligned with the previous rule for (prestudy) validation, which imposes limits on the method's performance, not on the results, namely a mean and a precision that must be better than 15% (20% at the LLOQ). The objective of a quantitative analytical method is to quantify as accurately as possible each of the unknown quantities that the laboratory will have to determine. In other terms, what all analysts expect from an analytical procedure is that the difference between the measurement or observation (X) and the unknown "true value" μ_T of the test sample be small, i.e. inferior to an acceptance limit λ defined a priori:
−λ < X − μ_T < λ ⇔ |X − μ_T| < λ
The acceptance limit λ can differ depending on the requirements of the analyst and the objective of the analytical procedure. The objective is linked to the requirements usually admitted by practice (e.g. 1% or 2% on bulk, 5% on pharmaceutical specialties, 15% for biological samples, or whatever limits are predefined according to the intended use of the results). Therefore, the aim of the validation phase is to generate enough information to guarantee that the analytical method will provide, in routine, measurements close to the true value without being affected by other elements present in the sample, assuming everything else remains reasonably similar. In other words, the validation phase should demonstrate that this will be fulfilled for a large proportion of the results. As already mentioned, the difference between a measurement X and its true value is composed of a systematic error (bias or trueness) and a random error (variance or precision). The true values of these performance parameters are unknown, but they can be estimated from the (prestudy) validation experiments, and the reliability of these estimates depends on the adequacy of those experiments (design, size). Consequently, the objective of the validation phase is to evaluate whether, given, or conditionally on, the estimates of the bias (μ̂_M) and standard deviation (σ̂_M), the expected proportion of measurements that will fall within the acceptance limits, later in routine, is greater than a predefined proportion, say β, i.e.:
E_{μ̂,σ̂}{P[|X − μ_T| < λ] | μ̂_M, σ̂_M} ≥ β
However, there exists no exact solution for estimating this expected proportion. An easy way to circumvent this and make a reliable decision, as already proposed by other authors [1,28,47–49], is to compute the β-expectation tolerance interval [50]:
E_{μ̂_M,σ̂_M}{P_X[μ̂_M − kσ̂_M < X < μ̂_M + kσ̂_M | μ̂_M, σ̂_M]} = β
where the factor k is determined so that the expected proportion of the population falling within the interval is equal to β. If the β-expectation tolerance interval obtained in that way is totally included within the acceptance limits [−λ, +λ] (e.g. [−15%, 15%] for bioanalytical methods or [−5%, 5%] for analytical methods used for batch release), then the expected proportion of measurements within these acceptance limits is greater than or equal to β. Most of the time, an analytical procedure is intended to quantify over a range of quantities or concentrations. Consequently, during the validation phase, samples are prepared to adequately cover this range, and a β-expectation tolerance interval is calculated at each level. The accuracy error profile is simply obtained by connecting the lower limits and connecting the upper limits, as can be seen in Fig. 1 or at the bottom of Fig. 3. The inclusion of the measurement error profile within the acceptance limits [−λ, λ] at key levels must be examined before declaring that the procedure is valid over a specific range of values. β will usually be chosen above 80%, and, as shown by Boulanger et al. [43,44], choosing 80% for β during prestudy validation guarantees that 90% of the runs will later be accepted in routine when the 4–6–λ (e.g. 4–6–15) rule is used.
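For the simplest case of independent results at a single concentration level (the intermediate-precision case with a between-run component requires the Satterthwaite degrees of freedom discussed in Section 4.2), the decision rule can be sketched as follows; the data, β and the tabulated t quantile are illustrative assumptions.

```python
import math
import statistics

# Hypothetical validation results (% of nominal) at one concentration level
results = [97.5, 101.0, 99.2, 100.4, 98.1, 99.8]
lam = 15.0     # acceptance limit (±%), bioanalytical case
beta = 0.80    # minimum expected proportion of results inside the limits

n = len(results)
mean = statistics.mean(results)
sd = statistics.stdev(results)

# i.i.d. case: the beta-expectation tolerance interval is the prediction-type
# interval mean ± t((1 + beta)/2, n − 1) · sd · sqrt(1 + 1/n)
t_beta = 1.476                    # t(0.90, df = 5), hard-coded from tables
k = t_beta * math.sqrt(1.0 + 1.0 / n)
tol_low, tol_high = mean - k * sd, mean + k * sd

# Accept the level when the tolerance interval lies inside [100 − λ, 100 + λ]
valid = (100.0 - lam) < tol_low and tol_high < (100.0 + lam)
print(f"beta-expectation tolerance interval: ({tol_low:.1f}, {tol_high:.1f})")
print("level accepted" if valid else "level rejected")
```

Repeating this computation at each validated concentration level and connecting the interval bounds yields the accuracy profile described in the text.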
That way, the prestudy validation decision and the (in-study) routine decision rule for acceptance of runs become aligned with their respective risks, which is not the case in the FDA guide [3]. Indeed, having a mean (trueness) smaller than 15% and a precision (CV%) smaller than 15% does not guarantee at all that most future results will be within [−15%, 15%]. Two statistical errors lie behind this classical assumption, which could be summarized as "good methods give good results". First, as pointed out in the section on accuracy, the difference between a result and its true value is composed of the systematic error (trueness) plus the random error (precision). So if, for example, a method shows an estimated mean bias of 14% and an estimated precision of 14% as well, it is obvious that most results will likely fall outside the acceptance limits and most runs will be rejected. Second, predicting what will happen later in routine depends largely on the quality of the estimates of the mean and precision, i.e. primarily on the number of observations collected and the conditions of the experiments. If the mean and the precision are estimated from too few measurements during prestudy validation, or under conditions (operator, days, …) not representative of routine use, there is poor confidence that the true bias or the true precision are in fact not greater than the acceptance criteria. The tolerance interval approach, which is a prediction interval, avoids those two pitfalls and correctly estimates the expected proportion of good results depending on the performance criteria and the quality (size, design) of the performed experiments. While the tolerance interval approach can prevent decisions based on poor data, it remains the responsibility of the analyst to ensure that the experimental conditions used during the (prestudy) validation reflect what will be used and practised in routine. The subtle difference between the method and the procedure should be stressed here: in the validation experiments, the various operational aspects or potential sources of variance must be included to anticipate what could happen later in routine. The most classical factors are the operators, the column lot, different setups, independent preparation of samples, etc., in order to simulate as closely as possible the daily practice and set of procedures around the use of the method. As already indicated, it is the whole procedure or practice that must be validated, not only the method in its most restrictive sense.

6. Dosing range

For any quantitative method, it is necessary to determine the range of analyte concentrations or property values over which the method may be applied. ICH Q2R1 part 1 defines the range of an analytical procedure as "the interval between the upper and lower concentration (amounts) of analyte in the sample (including these concentrations) for which it has been demonstrated that the analytical procedure has a suitable level of precision, accuracy and linearity". The FDA Bioanalytical Method Validation definition of the quantification range is "the range of concentration, including ULOQ and LLOQ, that can be reliably and reproducibly quantified with accuracy and precision through the use of a concentration–response relationship", where LLOQ is the lower limit of quantitation and ULOQ is the upper limit of quantitation. The two definitions are thus quite similar: for both, the range is tied to linearity and accuracy (trueness + precision). Moreover, both documents specify that the range depends on the specific application of the procedure. ICH Q2R1 part 2 states that the specified range is "established by confirming that the analytical procedure provides an acceptable degree of linearity, accuracy and precision when applied to samples containing amounts of analyte within or at the extremes of the specified range of the analytical procedure". IUPAC defines the range as a "set of values of measurands for which the error of a measuring instrument is intended to lie within specified limits".

The range should be anticipated at an early stage of method development, and its selection is based on prior information about the sample in a particular study. The chosen range determines the number of standards used in constructing a calibration curve. ICH Q2R1 part 2 recommends the following minimum specified ranges for different studies:

(i) for the assay of a drug substance or a finished (drug) product: normally from 80 to 120% of the test concentration;
(ii) for content uniformity: covering a minimum of 70–130% of the test concentration, unless a wider, more appropriate range, based on the nature of the dosage form (e.g. metered dose inhalers), is justified;
(iii) for dissolution testing: ±20% over the specified range;
(iv) for the determination of an impurity: from the reporting level of the impurity to 120% of the specification.

Therefore, the dosing range is the concentration or amount interval over which the total error of measurement, or accuracy, is acceptable. It is essential to demonstrate the accuracy of the results over the entire range. Consequently, and in order to fulfil these definitions, the ICH proposal to make only six measurements at the 100% level of the test concentration to assess the precision of the analytical method should be used with caution if it is to be in accordance with the definition of the range. Accuracy, and therefore trueness and precision, should be evaluated experimentally and be acceptable over the whole range targeted for the application of the analytical procedure.
7. Limit of quantitation
ICH considers that the "quantitation limit is a parameter of quantitative assays for low levels of compounds in sample matrices, and is used particularly for the determination of impurities and/or degradation products". ICH Q2R1 part 1 defines the quantitation limit of an individual analytical procedure as "the lowest amount of analyte in a sample which can be quantitatively determined with suitable precision and accuracy". The limit of quantitation (or quantitation limit) is often called LOQ; both terms are used in regulatory documents with exactly the same meaning. The ICH document defines only one limit of quantitation, yet the quantification range of the analytical procedure has two limits: LLOQ and ULOQ. In the definition of quantitation limit(s)
excerpted from IUPAC, the Eurachem document lets us understand that there is more than one limit of quantification: "quantification limits are performance characteristics that mark the ability of a chemical measurement process to adequately quantify an analyte". However, in the Eurachem document, only the LLOQ, called the "quantification limit", is discussed. The FDA Bioanalytical Method Validation document distinguishes between the two limits and defines the lower limit of quantification (the lowest amount of an analyte in a sample that can be quantitatively determined with suitable precision and accuracy) and the upper limit of quantification (the highest amount of an analyte in a sample that can be quantitatively determined with precision and accuracy). As can be seen in this document, the only difference is the substitution of the word "lowest" with "highest". ICH Q2R1 part 2 proposes exactly the same approaches to
estimate the (lower) quantiﬁcation limit as for the detection limit. A ﬁrst approach is based on the well known signaltonoise (s/n) ratio approach. A 10:1 s/n is considered by ICH document to be sufﬁcient to discriminate the analyte from the background noise. The main problem appears when the measured signal is not the signal used to quantify the analyte. For example, in chromatography with spectral detection, the measured signal represents the absorption units, i.e. the signal height but for the quantitation the areas are generally used. Therefore, the quan titation limit is not expressing the lowest level of the analyte,
but lowest quantiﬁed absorbance. The problem becomes more
complicated in electrophoresis, where the signal is usually con sidered as the ratio between the peak area and the migration time. The other approaches proposed by ICH Q2R1 part 2 docu ment are based on the “Standard Deviation of the Response and the Slope” and it is similar to the approach used for detection limit computation. The computation ways for detection (DL) and quantitation limit (QL) are similar, the only difference being the multiplier of the standard deviation of the response:
DL = 3.3σ/S

QL = 10σ/S
where σ is the standard deviation of the response and S is the slope of the calibration curve. The same problems explained previously arise for detection in chromatography or electrophoresis. On the one hand, the ICH Q2(R1) part 2 document assumes that the calibration is linear, which is not always true. On the other hand, two ways of measuring σ are proposed: "based on Standard Deviation of the Blank" and "based on the Calibration Curve". Neither alternative offers an adequate solution: the former assumes that the blank signal is measured in the same units as the calibration response, while the latter assumes that the LOQ range is already known.
Other problems with these methods of estimating limits of quantitation are that they assume a measurable noise, which is not always the case. Furthermore, even when applicable, these approaches depend on the manner in which the noise is measured, which varies from one instrument to another and with internal operational settings such as the signal data acquisition rate or the detector time constant. The LOQ estimated using the signal-to-noise ratio is therefore extremely subjective [51,52] and equipment dependent. The approaches using the standard deviation of the intercept should be used carefully, as the estimation of this intercept depends on the range of the calibration curve: the intercept is only well estimated if the concentrations used are sufficiently small. Furthermore, each of these approaches provides a different value of the lower limit of quantitation [51,52]. This is highly problematic, as it does not allow the LOQs obtained by different laboratories using the same analytical procedure to be compared.
Another approach to estimate the lower limit of quantitation is proposed by Eurachem, based on a target RSD [33]. The RSDs obtained at concentration levels close to the expected LOQ are plotted versus their concentrations, and a curve is fitted to this plot. The concentration at which this curve crosses the target RSD is the LOQ. This approach alleviates most of the problems raised by the previous approaches, as it is no longer equipment and operator dependent. Still, however, none of these approaches fulfils the definition of the LOQ: even with this last approach, only the precision of the analytical procedure has been assessed, without estimating trueness and the whole accuracy (trueness + precision) as required. In our opinion, the best way to compute both quantitation limits (LLOQ and ULOQ) is to use the accuracy profile approach [1,2,29,30,43–45,47–49], which fulfils the LOQ requirement by demonstrating that the total error of the results is known and acceptable at these concentration levels, i.e. both an acceptable level of systematic error and an acceptable level of random error.
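The three families of LOQ estimates discussed above can be contrasted in a short numerical sketch. This is purely illustrative and not from the paper: all numbers are hypothetical, and the total-error check uses a simple coverage factor k rather than a rigorous β-expectation tolerance-interval factor such as Mee's [50].

```python
# Illustrative sketch (hypothetical numbers): three LOQ-estimation strategies.

def limits_from_calibration(residual_sd: float, slope: float) -> tuple:
    """ICH Q2(R1) formulas: DL = 3.3*sigma/S and QL = 10*sigma/S,
    where sigma is the standard deviation of the response and S the slope."""
    dl = 3.3 * residual_sd / slope
    ql = 10.0 * residual_sd / slope
    return dl, ql

def lloq_from_target_rsd(concs, rsds, target_rsd):
    """Eurachem-style estimate: linearly interpolate the RSD-versus-
    concentration curve to find where it crosses the target RSD.
    Assumes RSD decreases as concentration increases."""
    pairs = sorted(zip(concs, rsds))
    for (c1, r1), (c2, r2) in zip(pairs, pairs[1:]):
        if r1 >= target_rsd >= r2:  # crossing lies in [c1, c2]
            frac = (r1 - target_rsd) / (r1 - r2)
            return c1 + frac * (c2 - c1)
    return None  # target RSD never reached in the studied range

def accuracy_profile_ok(rel_bias, rsd_ip, k, acceptance):
    """Crude total-error check at one concentration level: the interval
    bias +/- k*RSD must lie within +/-acceptance (%). A rigorous accuracy
    profile would use a beta-expectation tolerance-interval factor."""
    return abs(rel_bias) + k * rsd_ip <= acceptance

# ICH standard-deviation-of-the-response approach:
dl, ql = limits_from_calibration(residual_sd=0.8, slope=2.0)
print(f"DL = {dl:.2f}, QL = {ql:.2f}")  # DL = 1.32, QL = 4.00

# Eurachem target-RSD approach, replicate RSDs (%) at low levels:
lloq = lloq_from_target_rsd([1, 2, 5, 10], [25.0, 15.0, 8.0, 4.0], 10.0)
print(f"LLOQ at 10% target RSD ~ {lloq:.2f}")  # ~ 4.14

# Total-error check at a candidate LLOQ level (bias -3%, RSD 2%, k=5):
print(accuracy_profile_ok(-3.0, 2.0, 5.0, acceptance=15.0))  # True
```

Note how the first two functions assess only precision (and noise), whereas the last combines systematic and random error, which is what the LOQ definition actually requires.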
8. Conclusion
For analysts, method validation is the process of proving that an analytical method is acceptable for its intended purpose. In order to resolve this very important issue, analysts refer to regulatory or guidance documents, which can differ on several points. Therefore, the validity of an analytical method is partially dependent on the chosen guidance, terminology and methodology. It is thus essential to have clear definitions of the validation criteria used to assess this validity, to have methodologies in accordance with these definitions, and consequently to use statistical methods that are consistent with these definitions, the objective of the validation and the objective of the analytical method. Revising the definitions and methodologies of regulatory documents to eliminate contradictory, and sometimes scientifically irrelevant, requirements and definitions should be recommended and rapidly implemented.
Acknowledgements
Thanks are due to the Walloon Region and the European Social Fund for a research grant to E.R. (First Europe Objective 3 project No. 215269).
References
[1] Ph. Hubert, J.J. Nguyen-Huu, B. Boulanger, E. Chapuzet, P. Chiap, N. Cohen, P.A. Compagnon, W. Dewe, M. Feinberg, M. Lallier, M. Laurentie, N. Mercier, G. Muzard, C. Nivet, L. Valat, J. Pharm. Biomed. Anal. 36 (2004) 579.
[2] Ph. Hubert, J.J. Nguyen-Huu, B. Boulanger, E. Chapuzet, P. Chiap, N. Cohen, P.A. Compagnon, W. Dewe, M. Feinberg, M. Lallier, M. Laurentie, N. Mercier, G. Muzard, C. Nivet, L. Valat, STP Pharma Pratiques 13 (2003) 101.
[3] Guidance for Industry: Bioanalytical Method Validation, US Department of Health and Human Services, Food and Drug Administration, Center for Drug Evaluation and Research (CDER), Center for Biologics Evaluation and Research (CBER), Rockville, MD, May 2001.
[4] International Conference on Harmonization (ICH) of Technical Requirements for Registration of Pharmaceuticals for Human Use, Topic Q2(R1): Validation of Analytical Procedures: Text and Methodology, Geneva, 2005.
[5] ISO 5725, Application of the Statistics – Accuracy (Trueness and Precision) of the Results and Methods of Measurement – Parts 1 to 6, International Organization for Standardization (ISO), Geneva, 1994.
[6] ISO 3534-1, Statistics – Vocabulary and Symbols, International Organization for Standardization (ISO), Geneva, 2006.
[7] M. Thompson, S.L.R. Ellison, R. Wood, Pure Appl. Chem. 74 (2002) 835.
[8] Association of Official Analytical Chemists (AOAC), Official Methods of Analysis, 15th ed., vol. 1, AOAC, Arlington, VA, 1990, p. 673.
[9] J. Vessman, J. Pharm. Biomed. Anal. 14 (1996) 867.
[10] J. Vessman, Accred. Qual. Assur. 6 (2001) 522.
[11] B.A. Persson, J. Vessman, Trends Anal. Chem. 17 (1998) 117.
[12] B.A. Persson, J. Vessman, Trends Anal. Chem. 20 (2001) 526.
[13] J. Vessman, R.I. Stefan, J.F. Van Staden, K. Danzer, W. Lindner, D.T. Burns, A. Fajgelj, H. Müller, Pure Appl. Chem. 73 (2001) 1381.
[14] A.D. McNaught, A. Wilkinson, IUPAC Compendium of Chemical Terminology, second ed., Blackwell, Oxford, 1997.
[15] H. Kaiser, Fresenius Z. Anal. Chem. 260 (1972) 252.
[16] D.L. Massart, B.G. Vandeginste, S.N. Deming, Y. Michotte, L. Kaufman, Chemometrics, Elsevier, Amsterdam, 1988, p. 115.
[17] A. Lorber, K. Faber, B.R. Kowalski, Anal. Chem. 69 (1997) 69.
[18] K. Faber, A. Lorber, B.R. Kowalski, J. Chemometrics 11 (1997) 419.
[19] K. Danzer, Fresenius J. Anal. Chem. 369 (1997) 397.
[20] WELAC Guidance Document WG D2, Eurachem/Western European Laboratory Accreditation Cooperation (WELAC) Chemistry, Teddington, UK, first ed., 1993.
[21] J.W. Lee, V. Devanarayan, Y.C. Barrett, R. Weiner, J. Allinson, S. Fountain, S. Keller, I. Weinryb, M. Green, L. Duan, J.A. Rogers, R. Millham, P.J. O'Brien, Pharm. Res. 23 (2006) 312.
[22] UK Department of Trade and Industry, Manager's Guide to VAM, Valid Analytical Measurement Programme, Laboratory of the Government Chemist (LGC), Teddington, UK, 1998; http://www.vam.org.uk.
[23] J. Ermer, J.H.M. Miller, Practical Method Validation in Pharmaceutical Analysis, Wiley-VCH, Weinheim, 2005.
[24] Analytical Methods Committee, Analyst 113 (1988) 1469.
[25] S.V.C. de Souza, R.G. Junqueira, Anal. Chim. Acta 552 (2005) 25.
[26] J. Ermer, H.-J. Ploss, J. Pharm. Biomed. Anal. 37 (2005) 859.
[27] C. Hartmann, J. Smeyers-Verbeke, D.L. Massart, R.D. McDowall, J. Pharm. Biomed. Anal. 17 (1998) 193.
[28] D. Hoffman, R. Kringle, J. Biopharm. Stat. 15 (2005) 283.
[29] B. Streel, A. Ceccato, R. Klinkenberg, Ph. Hubert, J. Chromatogr. B 814 (2005) 263.
[30] Ph. Hubert, J.J. Nguyen-Huu, B. Boulanger, E. Chapuzet, P. Chiap, N. Cohen, P.A. Compagnon, W. Dewe, M. Feinberg, M. Lallier, M. Laurentie, N. Mercier, G. Muzard, C. Nivet, L. Valat, STP Pharma Pratiques 16 (2006) 87.
[31] N. François, B. Govaerts, B. Boulanger, Chemometr. Intell. Lab. Syst. 74 (2004) 283.
[32] ISO VIM, DGUIDE 99999.2, International Vocabulary of Basic and General Terms in Metrology (VIM), third ed., ISO, Geneva, 2006 (under approval).
[33] The Fitness for Purpose of Analytical Methods, Eurachem, Teddington, UK, 1998.
[34] Analytical Methods Committee, AMC Technical Brief 13, Royal Society of Chemistry, London, 2003; http://www.rsc.org/Membership/Networking/InterestGroups/Analytical/AMC/TechnicalBriefs.asp.
[35] Food and Agriculture Organization of the United Nations (FAO), Codex Alimentarius Commission, Procedural Manual, 15th ed., Rome, 2005.
[36] H. Rosing, W.Y. Man, E. Doyle, A. Bult, J.H. Beijnen, J. Liq. Chromatogr. Rel. Technol. 23 (2000) 329.
[37] J. Ermer, J. Pharm. Biomed. Anal. 24 (2001) 755.
[38] C. Hartmann, D.L. Massart, R.D. McDowall, J. Pharm. Biomed. Anal. 12 (1994) 1337.
[39] C.R. Jensen, Qual. Eng. 14 (2002) 645.
[40] F. Satterthwaite, Psychometrika 6 (1941) 309.
[41] B. Boulanger, P. Chiap, W. Dewe, J. Crommen, Ph. Hubert, J. Pharm. Biomed. Anal. 32 (2003) 753.
[42] B. DeSilva, W. Smith, R. Weiner, M. Kelley, J. Smolec, B. Lee, M. Khan, R. Tacey, H. Hill, A. Celniker, Pharm. Res. 20 (2003) 1885.
[43] B. Boulanger, W. Dewe, Ph. Hubert, B. Govaerts, C. Hammer, F. Moonen, Accuracy and Precision (total error vs. 4/6/30), AAPS Third Bioanalytical Workshop: Quantitative Bioanalytical Methods Validation and Implementation – Best Practices for Chromatographic and Ligand Binding Assays, Arlington, VA, 1–3 May 2006; http://www.aapspharmaceutica.com/meetings/meeting.asp?id=64.
[44] B. Boulanger, W. Dewe, A. Gilbert, B. Govaerts, M. Maumy-Bertrand, Chemometr. Intell. Lab. Syst. 86 (2007) 198.
[45] J.W.A. Findlay, W.C. Smith, J.W. Lee, G.D. Nordblom, I. Das, B.S. DeSilva, M.N. Khan, R.R. Bowsher, J. Pharm. Biomed. Anal. 21 (2000) 1249.
[46] H.T. Karnes, G. Shiu, V.P. Shah, Pharm. Res. 8 (1991) 421.
[47] Ph. Hubert, J.J. Nguyen-Huu, B. Boulanger, E. Chapuzet, P. Chiap, N. Cohen, P.A. Compagnon, W. Dewe, M. Feinberg, M. Lallier, M. Laurentie, N. Mercier, G. Muzard, C. Nivet, L. Valat, STP Pharma Pratiques 16 (2006) 28.
[48] M. Feinberg, B. Boulanger, W. Dewe, Ph. Hubert, Anal. Bioanal. Chem. 380 (2004) 502.
[49] A.G. Gonzalez, M.A. Herrador, Talanta 70 (2006) 896.
[50] R.W. Mee, Technometrics 26 (1984) 251.
[51] J. Vial, A. Jardy, Anal. Chem. 71 (1999) 2672.
[52] J. Vial, K. Le Mapihan, A. Jardy, Chromatographia 57 (2003) S303.