
Journal of Chromatography A, 1158 (2007) 111–125
Analysis of recent pharmaceutical regulatory documents on analytical method validation

Eric Rozet a, Attilio Ceccato b, Cédric Hubert a, Eric Ziemons a, Radu Oprean c, Serge Rudaz d, Bruno Boulanger b, Philippe Hubert a,∗

a Laboratory of Analytical Chemistry, Bioanalytical Chemistry Research Unit, Institute of Pharmacy, University of Liège, CHU, B36, B-4000 Liège, Belgium
b Lilly Development Centre, rue Granbompré 11, B-1348 Mont-Saint-Guibert, Belgium
c Analytical Chemistry Department, Faculty of Pharmacy, University of Medicine and Pharmacy “Iuliu Hatieganu”, 13 Emil Isac Street, RO-3400 Cluj-Napoca, Romania
d Laboratory of Pharmaceutical Analytical Chemistry, School of Pharmacy, University of Geneva, 20 Bd. d’Yvoy, 1211 Geneva 4, Switzerland

∗ Corresponding author. Tel.: +32 4 366 43 16; fax: +32 4 366 43 17. E-mail address: Ph.hubert@ulg.ac.be (P. Hubert).

Available online 1 April 2007

Abstract

All analysts face the same situation: method validation is the process of proving that an analytical method is acceptable for its intended purpose. To resolve this problem, the analyst refers to regulatory or guidance documents, and the validity of analytical methods therefore depends on the guidance, terminology and methodology proposed in these documents. It is thus of prime importance to have clear definitions of the different validation criteria used to assess this validity. It is also necessary to have methodologies in accordance with these definitions and, consequently, to use statistical methods that are consistent with these definitions, with the objective of the validation and with the objective of the analytical method. The main purpose of this paper is to outline the inconsistencies between some definitions of the criteria and the experimental procedures proposed to evaluate those criteria in recent documents dedicated to the validation of analytical methods in the pharmaceutical field, together with the risks and problems encountered when trying to cope with contradictory, and sometimes scientifically irrelevant, requirements and definitions. © 2007 Elsevier B.V. All rights reserved.

Keywords: Validation; Guidelines; Terminology; Methodology; Accuracy profile

1. Introduction

The demonstration of the ability of an analytical method to quantify is of great importance to ensure the quality, safety and efficacy of pharmaceuticals. Consequently, before an analytical method can be implemented for routine use, it must first be validated to demonstrate that it is suitable for its intended purpose. While the need to validate methods is obvious, the procedures for performing a rigorous validation program are generally not defined. Even if regulatory documents allow selecting the validation parameters that should be established, three main questions remain: (a) How should the regulatory definitions of the parameters be interpreted? (b) What specific procedure should be followed to evaluate a particular parameter? (c) What is the appropriate acceptance criterion for a given parameter? Furthermore, method validation is not specific to the pharmaceutical industry but concerns most industrial fields involving either biology or chemistry. Even though each field of work has

its own characteristics and issues, the main criteria to fulfil are, or should be, similar since the validation of an analytical method is independent of the industrial sector, the matrix of the samples or the analytical technology employed. A harmonized validation terminology should be adopted to allow discussions and comparisons of validation issues between scientists of different fields. This consensus on terminology is not yet available, even if an attempt was made [1,2]. However, while it is desirable to have harmonization between the different fields interested in analytical validation, it is interesting to note that, even within the pharmaceutical field, not all laboratories use the same terminology, although they should use similar definitions to describe the validation criteria. The terminology used in different official documents such as the Food and Drug Administration (FDA)


guide on validation of bioanalytical methods [3], ICH Q2R1 [4], ISO [5,6], IUPAC [7] and AOAC [8] differs. Furthermore, in some cases inhomogeneous terminology can be found within the same document, depending on the section where it is mentioned. Therefore, knowledge and understanding of these significant differences in terminology and definitions are essential, since the methodologies proposed to fulfil the defined criteria can lead to confusion when preparing the validation protocol and the experimental design. Furthermore, the subsequent statistical interpretation of the results obtained and the final decision about the validity of the analytical procedure depend on the consistent and adequate definition of the criteria assessed. This has highly critical consequences since the validated analytical method will be used daily in routine analysis (batch release, stability assessment, establishment of shelf life, pharmacokinetic or bioequivalence studies, etc.) to make decisions of the utmost business and public health consequence. Therefore, the main objective of this review is to reveal the inconsistencies between the definitions of the validation criteria, the experimental procedures proposed to evaluate those criteria, and the statistical tools needed to support the decision about the validity of the analytical procedure. The main points discussed in this review are: (a) the distinction that can be made between specificity and selectivity; (b) the clarification of the linearity concept and its difference with the response function; (c) the definition of precision, trueness and accuracy; (d) the discussion of the decision rules to adopt from a statistical point of view; (e) the definition of the dosing range in which the analytical method may be used and, last but not least, (f) the determination of the limit of quantification (LOQ). Finally,

the risks and problems when trying to cope with inconsistent, sometimes scientifically irrelevant, requirements and definitions are highlighted.

2. Specificity or selectivity

The first criterion for an analyst when evaluating an analytical method is its capability of delivering signals or responses that are free from interferences and give true results. This ability to discriminate the analyte from interfering components has been loosely expressed for many years as the "selectivity" or "specificity" of a method, depending on the area of expertise of the authors. The terms "selectivity" and "specificity" are often used interchangeably although their meanings are different. This concept was extensively discussed by Vessman in different papers [9–13]. He particularly pointed out that organizations such as IUPAC, WELAC or ICH define specificity and/or selectivity in different manners (Table 1). However, a clear distinction should be made, as proposed by Christian [14]: "A specific reaction or test is one that occurs only with the substance of interest, while a selective reaction or test is one that can occur with other substances but exhibits a degree of preference for the substance of interest. Few reactions are specific, but many exhibit selectivity." This is consistent with the concept that selectivity is something that can be graded while specificity is an absolute characteristic. Some attempts to quantify selectivity can be found in the literature [15–19]. For many analytical chemists, it is commonly accepted that specificity is something exceptional since there are, in fact, few methods that respond to only one analyte.

Table 1
Definitions of selectivity and specificity in different international organizations

IUPAC [14]
Selectivity (in analysis): 1. (qualitative) The extent to which other substances interfere with the determination of a substance according to a given procedure. 2. (quantitative) A term used in conjunction with another substantive (e.g. constant, coefficient, index, factor, number) for the quantitative characterization of interferences.
Specific (in analysis): A term which expresses qualitatively the extent to which other substances interfere with the determination of a substance according to a given procedure. Specific is considered to be the ultimate of selective, meaning that no interferences are supposed to occur. The term specificity is not mentioned.

WELAC [20]
Selectivity of a method refers to the extent to which it can determine particular analyte(s) in a complex mixture without interference from other components in the mixture. A method which is perfectly selective for an analyte or group of analytes is said to be specific.

ISO
Not defined.

ICH [4]
Specificity is the ability to assess unequivocally the analyte in the presence of components which may be expected to be present. Typically these might include impurities, degradants, matrix, etc. Lack of specificity of an individual analytical procedure may be compensated for by other supporting analytical procedure(s).

AOAC [8]
Test for interferences (specificity): (a) Test effect of impurities, ubiquitous contaminants, flavours, additives, and other components expected to be present and at unusual concentrations. (b) Test nonspecific effects of matrices. (c) Test effects of transformation products, if the method is to indicate stability, and metabolic products, if tissue residues are involved.


Considering these elements, the IUPAC definition stating that specificity can be considered as the ultimate selectivity seems rational regarding the situation in the pharmaceutical industry [9]. It must be noted that WELAC probably provides the clearest definition of selectivity by saying that a method which is perfectly selective for an analyte is said to be specific [20]. As recommended by IUPAC and WELAC, the term selectivity should be promoted in analytical chemistry, particularly in separation techniques, while the term specificity should be discouraged.

3. Response function and linearity

The response function for an analytical procedure is the existing relationship, within a specified range, between the response (signal, e.g. area under the curve, peak height, absorption) and the concentration (quantity) of the analyte in the sample. The calibration curve should preferably be described by a simple monotonic (i.e. strictly increasing or decreasing) response function that gives reliable measurements, i.e. accurate results as discussed thereafter. The response function – or standard curve – is widely and frequently confounded with the linearity criterion. The linearity criterion refers to the relationship between the quantity introduced and the quantity back-calculated from the calibration curve, while the response function refers to the relationship between the instrumental response and the concentration.

Because of this confusion, it is very common to see laboratory analysts trying to demonstrate that the response function is linear in the classical sense, i.e. that a conventional least-squares linear model is adequate. As demonstrated by several authors, systematically forcing a linear function is not required, is often irrelevant and may lead to large errors in the measured results (e.g. for bioanalytical methods using LC–MS/MS or ligand binding assays), where the linear range can be different from the working or dosing range [21,22]. A significant source of bias and imprecision in analytical measurements can be caused by an inadequate choice of the statistical model for the calibration curve. The confusion is even contained and maintained in the ICH document. In the terminology part of Q2R1 (formerly Q2A), linearity is correctly defined as the "… ability (within a given range) to obtain test results which are directly proportional to the concentration (amount) of analyte in the sample." But later, in the methodology section (formerly Q2B), it is mentioned that "Linearity should be evaluated by visual inspection of a plot of signals as a function of analyte concentration or content." The text indicates clearly that it is the signal, and no longer the result, that matters in the linearity. The document clearly confounds, on the one hand, linearity and the calibration curve and, on the other hand, test results and signal. The continuation of the text is self-explicit: "If there is a linear relationship, test results should be evaluated by appropriate statistical methods, for example, by calculation of a regression line by the method of least squares." For an analyst, the "test results" are, without ambiguity, the back-calculated measurements evaluated by the "regression line", which is in fact the calibration curve established using appropriate statistical methodologies.

Last but not least, the fact that no linearity is needed between the quantity and the signal is – paradoxically – contained in the last sentence of that section devoted to linearity: "In some cases, to obtain linearity between assays and sample concentrations, the test data may have to be subjected to a mathematical transformation prior to the regression analysis." Indeed, if any kind of mathematical transformation can be applied to the concentration and/or the signal to make their relationship look like a "straight line", what is the very purpose of requiring linearity? Clearly, the intent of that section was, confusedly, to suggest that in order to use the classical least-squares linear function it is sometimes convenient to apply transformations to the data when the "visual plot" of signal versus concentration does not look straight. It is indeed a good trick, widely used to establish the standard curve, but that trick should not be interpreted as a scientific necessity to have a "linear" relationship between the concentration and the signal. Fortunately, understanding has evolved since 1995, so that the FDA guidance on Bioanalytical Method Validation issued in May 2001 [3] no longer contains the word "linearity" but only "calibration/standard curve", without particular restriction except that "The simplest model that adequately describes the concentration–response relationship should be used." Nevertheless, the same confusion in concept and wording between the response function and the linearity of results can still be found in the recent book by Ermer and Miller [23]. While those authors indicate that "some analytical procedures have intrinsic non-linear response function, such as quantitative TLC", they continue to use the linearity terminology to refer to the calibration curve. In the same context, HPLC methods coupled to spectrophotometric detection (UV) are usually linear according to the Lambert–Beer law, while immunoassays are typically non-linear. However, even for HPLC–UV methods covering a large dynamic range, more advanced models, such as quadratic or log–log models, may be necessary. Indeed, it is important to model properly the whole procedure, including all the handling and preparation of samples, which do not necessarily remain linear over a large range of concentrations even if the detector response is, according to the Lambert–Beer law. It has to be noted that the complete analytical procedure should be modeled by an overall appropriate response function. As long as the model remains monotonic and allows accurate measurements, that is all that is required. Another aspect that is very important, and that has been largely neglected and ignored in the analytical literature, is the fit-for-purpose principle [21]. The central idea is very logical: the purpose of an analytical procedure is to give accurate measurements in the future, so a standard curve must be evaluated on its ability to provide accurate measurements. A significant source of bias and imprecision in analytical measurements can be caused by an inadequate choice of the statistical model for the calibration curve.

Statistical criteria such as R², lack-of-fit or any other statistical test demonstrating the quality of fit of a model are only informative and barely relevant for the objective of the assay [21,24–27]. To that end, several authors [1,2,28] have introduced the use of the accuracy profile, based on tolerance intervals (or prediction intervals), to decide whether a calibration model will give quality results.


The models should be retained or rejected based on the accuracy of the back-calculated results, regardless of their statistical properties. This approach has already been used by several authors, such as Streel et al. [29] for the validation of an LC–MS/MS assay for the quantitative determination of loperamide in plasma. As can be seen from Fig. 1 and indicated by the authors, the weighted linear regression provides the best accuracy profile for the procedure, obtained by joining the extremes of the 95% tolerance intervals, i.e. the intervals that will contain 95% of the future individual results. Conversely, the simple linear model, the quadratic model and even a model with a log–log transformation are not suitable because they contribute less to the ultimate goal of the assay, i.e. providing accurate results in the future. Indeed, the tolerance intervals of those three models are not included in the defined acceptance limits to the same extent as with the selected model. Nevertheless, as can be seen in Fig. 2, when looking at the quality of fit as usually practiced, the four models exhibited an R² > 0.999 for all series. This figure, representing in a way the quality of the "linear" fit [4], does not show any difference from one model to the other. This contrasts with the accuracy profile figure, where a major difference exists in the quality of the results depending on the model selected as the standard curve.

Fig. 1. Accuracy profiles of the LC–MS/MS assay for the determination of loperamide in plasma (concentration in pg/ml) using (A) a linear regression model, (B) a weighted linear regression model with a weight equal to 1/X², (C) a linear regression model after logarithm transformation, (D) a quadratic regression. The dotted lines represent the acceptance limits (−15%, 15%); the dashed lines represent the connected 95% tolerance intervals. When the tolerance intervals are included in the acceptance limits, the assay is able to quantify accurately; otherwise it is not. The continuous line represents the estimated relative bias.

Fig. 2. Response functions of the LC–MS/MS assay for the determination of loperamide in plasma (concentration in pg/ml) for series 2 only, using (A) a linear regression model (R² = 0.9991), (B) a weighted linear regression model with a weight equal to 1/X² (R² = 0.9991), (C) a linear regression model after logarithm transformation (R² = 0.9997), (D) a quadratic regression (R² = 0.9991).

Another example illustrating the difference between response function, linearity and the fit-for-purpose accuracy profile can be obtained with a high-performance thin-layer chromatographic assay (HPTLC; Fig. 3) and an enzyme-linked immunosorbent assay (ELISA) published in [30] (Fig. 4). Indeed, as can be seen, using a quadratic response function for the HPTLC assay or a non-linear standard curve such as the weighted four-parameter logistic model for the ELISA, the plot of the signal (Figs. 3.a.1 and 4.a.1) does not look linear, while the results as a function of the concentration are linear (Figs. 3.a.2 and 4.a.2). The same applies to the accuracy profiles (Figs. 3.a.3 and 4.a.3), which clearly show that, when using these standard curves, the assays are able to quantify over a large range. This property is not fulfilled with the other, linear models for both types of assay. In both cases the fit of the model is acceptable, but neither of these two "linear" models shows acceptable linearity or accuracy according to the ICH definition. Selecting the standard curve model on the basis of the obtained accuracy of the results was difficult to envisage a few years ago because it requires a lot of computing and is a post-data-acquisition scenario, e.g. evaluation of all the putative calibration models before making a choice. Currently, computational power is no longer a limitation and the selection of a model is perfectly aligned with the objective of the method. Having stressed the difference between response function and linearity, the concept of linearity can be applied not only to relative but also to absolute analytical methods, such as titration, for which the results are not obtained by back-calculation from a calibration curve. Attempts to provide a response function are therefore of no use and impracticable, as there is no signal or response, whereas the linearity of the results can still be assessed. Statistical models for calibration curves can be either linear or non-linear in their parameter(s) – as opposed to linear in shape; indeed, a quadratic model Y = α + βX + γX² is linear in its parameters, even if its graphical representation may look "curved" on an XY plot. The choice between these two families of models will depend on the type of method and/or the range of concentrations of interest. When a narrow range is considered, an unweighted linear model is usually adequate, while a larger range may require a more complex or weighted model.

Weighting may be important because a common feature of most analytical methods is that the variance of the signal is a function of the level or quantity to be measured. In the case of heterogeneous variances of the signal across the concentration range – which is frequent – it is natural to observe that weighting significantly improves the accuracy of the results, particularly at low concentration levels. When observations are not weighted, an observation more distant from the curve than the others has more influence on the curve fit. As a consequence, the curve fit, and thus the back-calculated results, may not be good where the variances are smaller. Regardless of the model type, it is assumed that all observations fitted to a model are completely independent. In reality, replicates are often not independent for many analytical procedures because of the steps followed in the preparation and analysis of the samples. In such cases, replicates should not be used as separate observations. Models are typically applied on either a linear scale or a log scale of the assay signal and/or of the calibrator concentrations. The linear scale is used in the case of homogeneous variance across the concentration range, and the log scale is usually recommended when the variance increases with increasing response, because this suggests that the response is log-normally distributed.


The most commonly used types of polynomial models include simple linear regression (with or without an intercept) and quadratic regression models. The model parameters are estimated using the restricted maximum likelihood method, which is equivalent to the ordinary least squares method when the data are normally distributed.

Fig. 3. Standard curves (top, 1), linearity profiles (middle, 2) and accuracy profiles (bottom, 3) obtained for a high-performance thin-layer chromatography assay using (left, a) a quadratic regression model and (right, b) a linear regression model. For the linearity and accuracy profiles, the dotted lines represent the acceptance limits (−10%, 10%); the dashed lines represent the connected 95% tolerance intervals. When the tolerance intervals are included in the acceptance limits, the assay is able to quantify accurately; otherwise it is not. For the linearity profile, the continuous line represents the identity line (result = concentration), while for the accuracy profile the continuous line represents the estimated bias.
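To make the distinction between the response function and linearity concrete, the short Python sketch below (our own illustration with hypothetical calibration data, not code or data from the paper) fits an unweighted straight line, a 1/x²-weighted straight line and a quadratic response function to the same signals, then back-calculates the calibration standards – the back-calculated error being the quantity that the linearity criterion and the accuracy profile actually examine.

import numpy as np

# Hypothetical calibration standards: concentration and instrumental signal
conc = np.array([1, 2, 5, 10, 20, 50, 100.0])
signal = np.array([0.9, 2.1, 5.4, 11.2, 23.5, 61.0, 128.0])

models = {
    # unweighted straight line
    "unweighted line": np.polyfit(conc, signal, 1),
    # 1/x^2-weighted line (np.polyfit squares the weights internally, so pass 1/x)
    "1/x^2-weighted line": np.polyfit(conc, signal, 1, w=1 / conc),
    # quadratic response function
    "quadratic": np.polyfit(conc, signal, 2),
}

grid = np.linspace(conc.min(), conc.max(), 20_000)
for name, coef in models.items():
    fitted_grid = np.polyval(coef, grid)          # response function, monotonic over this range
    back = np.interp(signal, fitted_grid, grid)   # back-calculated concentrations of the standards
    rel_err = 100 * (back - conc) / conc          # what the "linearity of results" looks at
    ss_res = np.sum((signal - np.polyval(coef, conc)) ** 2)
    ss_tot = np.sum((signal - signal.mean()) ** 2)
    r2 = 1 - ss_res / ss_tot                      # classical quality-of-fit statistic
    print(f"{name:20s} R^2 = {r2:.4f}   worst back-calculated error = {np.max(np.abs(rel_err)):.1f}%")

In this toy example the unweighted line shows an excellent R² yet back-calculates the lowest standard poorly, while the weighted and quadratic response functions behave much better; in a real validation such back-calculated errors would be summarized per concentration level and per series into the tolerance intervals of the accuracy profile.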

This being said, and because of the fitting techniques, the experimental design, i.e. the way the concentration values are spread over the range, may significantly impact the precision of the results, or inverse predictions, that the response function will provide.


As shown by François et al. [31], depending on the model that will be used for the response function, some designs give more precise measurements than others. As a general "rule of thumb" for optimally choosing the concentration values, they show that, for most models used in assays, from linear to four-parameter logistic models, having standard points at the extremes of the range and equally spreading the replicated standard points over the range in between gives excellent results in general, particularly when the model has not yet been clearly identified, which is the case during the validation phase. They also stress the importance of having replicates, particularly at the extremes, because of the "leverage" of those points on the fitting. When the model has been identified, optimal designs can then be envisaged to further improve the precision of the measurements slightly.

Fig. 4. Standard curves (top, 1), linearity profiles (middle, 2) and accuracy profiles (bottom, 3) obtained for an immunoassay using (left, a) a weighted four-parameter logistic model and (right, b) a linear regression on the most "linear" part of the response. For the linearity and accuracy profiles, the dotted lines represent the acceptance limits (−30%, 30%); the dashed lines represent the connected 95% tolerance intervals. When the tolerance intervals are included in the acceptance limits, the assay is able to quantify accurately; otherwise it is not. For the linearity profile, the continuous line represents the identity line (result = concentration), while for the accuracy profile the continuous line represents the estimated bias.
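A minimal sketch of that rule of thumb (our own illustration, not taken from [31]; the function name and numbers are arbitrary): replicated standards anchored at both extremes of the range, with the remaining replicated levels spread evenly in between.

import numpy as np

def calibration_design(low, high, n_levels=5, n_replicates=3):
    """Evenly spaced calibration levels over [low, high], each replicated,
    with the extremes of the range always included."""
    levels = np.linspace(low, high, n_levels)
    return np.repeat(levels, n_replicates)

# Example: 5 levels x 3 replicates over a 1-100 concentration range
print(calibration_design(1.0, 100.0))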

4. Accuracy, trueness and precision

4.1. Trueness

As can be seen from the following definition of trueness found in the ISO documents [5,6], the International vocabulary of basic and general terms in metrology (VIM) [32] or the Eurachem document on the Fitness for Purpose of Analytical Methods [33], trueness is a concept that is related to systematic errors. The ISO 5725-Part 1 (General Principles and Definitions) definition of trueness (Section 3.7) is: "The closeness of agreement between the average value obtained from a large series of test results and an accepted reference value. The measure of trueness is usually expressed in terms of bias. Trueness has been referred to as 'accuracy of the mean'. This usage is not recommended." Indeed, it is expressed as the distance between the average value of a series of measurements (x̄_i) and a reference value μ_T. This concept is measured by a bias, relative bias or recovery:

Bias = x̄_i − μ_T

Relative bias (%) = 100 × (x̄_i − μ_T) / μ_T

Recovery (%) = 100 × x̄_i / μ_T = 100 + relative bias (%)
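As a simple illustration (hypothetical back-calculated results for one validation standard of known content μ_T; variable names are ours), these three estimates can be computed directly:

import numpy as np

mu_T = 50.0                                     # known true concentration of the validation standard
x = np.array([48.9, 49.7, 51.2, 50.4, 49.1])    # back-calculated results (hypothetical)

bias = x.mean() - mu_T
relative_bias = 100 * bias / mu_T
recovery = 100 * x.mean() / mu_T                # equals 100 + relative_bias

print(f"bias = {bias:.2f}, relative bias = {relative_bias:.2f}%, recovery = {recovery:.2f}%")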

The ISO 5725 documents unambiguously state what trueness is and how to measure it. Applying this concept to the validation experiments, measuring several times independent validation standards, for instance i standards for which the true value of the analyte concentration or amount (μ_T) is known, allows their predicted concentrations or amounts x_i to be computed. It is therefore possible to compute the mean value of these predicted results (x̄_i) and consequently to estimate the bias, relative bias or recovery. These values are readily obtained, as they are routinely computed during the validation step of an analytical procedure. Trueness is related to the systematic errors of the analytical procedure [2,5,6,34]. Trueness thus refers to a characteristic or quality of the analytical procedure and not to a result generated by this procedure. This nuance is fundamental, as we will see below. However, when looking for trueness in the regulatory documents for the validation of pharmaceutical analytical procedures, this concept is not defined per se. Reading both the ICH Q2R1 [4] and the FDA Bioanalytical Method Validation [3] documents carefully, references to this concept are nonetheless made.

In ICH Q2R1 – part 1, use of trueness is made as follows: "The accuracy of an analytical procedure expresses the closeness of agreement between the value which is accepted either as a conventional true value or an accepted reference value and the value found. This is sometimes termed trueness." In the FDA Bioanalytical Method Validation document, this reference is made in the Glossary: "The degree of closeness of the determined value to the nominal or known true value under prescribed conditions. This is sometimes termed trueness." Here a mix can be seen between trueness and, by extension, accuracy of the mean (as opposed to accuracy of the results). The ISO documents also specify that this use of the term accuracy should be avoided and replaced by trueness. When comparing these two quotations of trueness in the ICH or FDA documents with the definition of the ISO documents, the main difference is that both documents talk about the distance between the true value and the value found or the determined value, whereas the trueness definition of ISO looks at the distance between the average value and the true value. It is essential to distinguish between a result and an average value. The results of an analytical procedure are its very objective. When examining a quality control sample, the result impacts the decision to release a batch. When unknown samples are determined, the results give information about the therapeutic effect of a drug or about the pathological or physiological state of a patient, and so on. What matters is to ensure that each unknown or known sample will be determined adequately. The average value only gives the central location of the distribution of results having the same true content, not the position of each individual result. By extension, the bias, relative bias or recovery locate the distribution of the results produced by the analytical procedure relative to the accepted true value.

This incoherence of definition is found not only from one document to another but also between different sections of a single document, especially the ICH Q2R1 document. In part II, Section 4.3 "Recommended data", relative to accuracy: "accuracy should be reported as percent recovery by the assay of known added amount of analyte in the sample or as the difference between the mean and the accepted true value together with the confidence intervals." This is coherent with the definition of trueness from the ISO documents and not with the corresponding definition in part I of the ICH document. In other documents, confusion between trueness and accuracy is also observed [7,35]. When assessing the acceptability of the bias, relative bias or recovery, the methodology mostly used is to apply the following Student t-test:

H₀: x̄_i − μ_T = 0

H₁: x̄_i − μ_T ≠ 0

for which a significance level α is set, generally at 0.05 in the pharmaceutical field. This means that it is accepted that the null hypothesis H₀ will be wrongly rejected 5 times out of 100; that is, we accept erroneously considering the bias different from 0 in 5 cases out of 100. When the computed Student statistic is higher than the corresponding theoretical quantile, or equivalently when the p-value is smaller than α, the null hypothesis is rejected. Therefore, there is high confidence that the bias is different from 0, as the significance level is fixed by the analyst.


Another way to interpret this test is to check whether the 0% relative bias or the 100% recovery is included in the 1 − α confidence interval of the relative bias or recovery, respectively. If these values are outside their corresponding confidence interval, the null hypothesis is rejected. However, when the null hypothesis is not rejected, the only conclusion that can be made is not that the bias, relative bias or recovery is equal to 0, 0% or 100%, but that the test could not demonstrate that the bias, relative bias or recovery is different from 0, 0% or 100%. As clearly demonstrated in numerous publications [27,36–38], the β risk, which is the probability of wrongly accepting the null hypothesis, is not fixed by the user in this situation. Furthermore, this approach can conclude that the bias is significantly different from 0 whereas it could be analytically acceptable [27,36–38]. It will also always consider that the bias is not different from 0 when the variability of the procedure is relatively high. In fact, the Student t-test used this way is a difference test which answers the question: "Is the bias of my analytical procedure different from 0?" However, the question the analyst wishes to answer during the validation step of the analytical procedure is: "Is the bias of my analytical procedure acceptable?" The test that answers this last question is an equivalence or interval hypothesis test [27,36–38]. In this type of test, the analyst has to select an acceptance limit for the bias, relative bias or recovery, that is, limits within which, if the true bias, relative bias or recovery of the analytical procedure lies, the trueness of this procedure is acceptable. Different authors have recommended the use of this type of test to assess the acceptability of a bias [27,38]. Indeed, a perfectly unbiased procedure is utopian. Furthermore, the bias obtained during the validation experiment is only an estimate of the true, unknown bias of the analytical procedure. Nevertheless, this latter interval hypothesis test, while statistically correct, does not answer the real analytical question: the very purpose of validation is to validate the results a method will produce, not the method itself. We will come back to this objective and explain in more detail, in Section 5, the connections existing between "good" results and "good" methods.
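The contrast between the two questions can be made concrete with the short sketch below (hypothetical data and acceptance limit): a classical t-test asks whether the bias differs from zero, whereas an interval (equivalence) test, in the spirit of the two one-sided tests procedure, asks whether the confidence interval of the relative bias falls entirely within the pre-set acceptance limits.

import numpy as np
from scipy import stats

mu_T = 100.0                                                # accepted true value
x = np.array([101.8, 102.5, 101.1, 103.0, 102.2, 101.6])    # hypothetical results
limit = 5.0                                                 # acceptance limit for the relative bias, in %

n = x.size
rel_bias = 100 * (x.mean() - mu_T) / mu_T
se = 100 * x.std(ddof=1) / (np.sqrt(n) * mu_T)

# Difference test: is the bias different from 0?
t_stat = rel_bias / se
p_diff = 2 * stats.t.sf(abs(t_stat), df=n - 1)

# Interval (equivalence) test: is the 90% CI of the relative bias entirely inside +/- limit?
t90 = stats.t.ppf(0.95, df=n - 1)
ci_low, ci_high = rel_bias - t90 * se, rel_bias + t90 * se
acceptable = (ci_low > -limit) and (ci_high < limit)

print(f"relative bias = {rel_bias:.2f}%   p(different from 0) = {p_diff:.4f}")
print(f"90% CI = [{ci_low:.2f}%, {ci_high:.2f}%]   trueness acceptable (|bias| < {limit}%): {acceptable}")

With such data the t-test flags a bias that is statistically different from zero even though the interval test shows it lies comfortably within analytically acceptable limits – exactly the situation described above.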

4.2. Precision

Contrary to trueness, homogeneous definitions of precision can be found in the regulatory documentation. For instance, the ICH Q2R1 Part 1 definition of precision is: "The precision of an analytical procedure expresses the closeness of agreement (degree of scatter) between a series of measurements obtained from multiple sampling of the same homogeneous sample under the prescribed conditions." This definition of precision is consistent with its definition in the FDA Bioanalytical Method Validation, ISO, Eurachem, IUPAC, FAO and AMC documents. As stated in all documents, precision is expressed as a standard deviation (s), a variance (s²), a relative standard deviation (RSD) or a coefficient of variation (CV). It measures the random error linked to the analytical procedure, i.e. the dispersion of the results around their average value. The estimate of precision is independent of the true or specified value and of the mean or trueness estimate. Each document makes reference to different precision levels. For the ICH Q2R1 and ISO documents, three levels can be assessed:

(1) Repeatability, which "expresses the precision under the same operating conditions over a short interval of time." Repeatability is also termed intra-assay precision.
(2) Intermediate precision, which "expresses within-laboratories variations: different days, different analysts, different equipment, etc."
(3) Reproducibility, which "expresses the precision between laboratories (collaborative studies, usually applied to standardization of methodology)."

The repeatability conditions involve the re-execution of the entire procedure, from the selection and preparation of the test portion in the laboratory sample, and not only the replicate instrumental determinations on a single prepared test sample. The latter is the instrumental precision, which does not include the repetition of the whole analytical procedure. The FDA document also distinguishes "within-run, intra-batch precision or repeatability, which assesses precision during a single analytical run", and "between-run, inter-batch precision or repeatability, which measures precision with time, and may involve different analysts, equipment, reagents, and laboratories". As can be seen in this document, the same word, namely repeatability, is used for both components of variability, which is certainly not free of confusion for the analyst. Furthermore, this document considers at the same level the variability within a single laboratory and between different laboratories. The validation of an analytical procedure is performed by a single laboratory, as it has to demonstrate that the analytical procedure is suitable for its intended purpose. The evaluation of laboratory-to-laboratory method adequacy is usually performed with the objective of standardizing the procedure or of evaluating the performance of several laboratories in a "proficiency test", also called a "ring test", and is regulated by specific documents and rules.

In order to evaluate correctly the two components of variability of an analytical procedure during the validation phase, analysis of variance (ANOVA) by investigated concentration level is recommended. As long as the design is balanced, i.e. the same number of replicates per series for a concentration level, the least squares estimates of the variance components can be used. However, when this condition is not met, the maximum likelihood estimates of those components should be preferred [2,30].

From the ANOVA table, the repeatability or within-run precision and the between-run precision are obtained as follows:

MSM_j = [1/(p − 1)] Σ_{i=1..p} n (x̄_ij,calc − μ̂_j)²

where x̄_ij,calc is the average of the calculated concentrations at the jth concentration level for the ith series, p is the number of series, n is the number of replicates per series and

μ̂_j = [1/(pn)] Σ_{i=1..p} Σ_{k=1..n} x_ijk,calc

with x_ijk,calc being the calculated concentration obtained from the selected response function. The error mean square is

MSE_j = [1/(pn − p)] Σ_{i=1..p} Σ_{k=1..n} (x_ijk,calc − x̄_ij,calc)²

If MSE_j < MSM_j, then:

σ̂²_W,j = MSE_j and σ̂²_B,j = (MSM_j − MSE_j)/n

Else:

σ̂²_W,j = [1/(pn − 1)] Σ_{i=1..p} Σ_{k=1..n} (x_ijk,calc − x̄_j,calc)² and σ̂²_B,j = 0

The intermediate precision is computed as follows:

σ̂²_IP,j = σ̂²_W,j + σ̂²_B,j

where σ̂²_W,j is the within-run or repeatability variance and σ̂²_B,j is the between-run variance. It is important to note that misapplications of known variance formulas are still widely used and can lead to dramatic overestimation of the variance components [36,39].
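For illustration, the following Python sketch (hypothetical back-calculated concentrations, not code from the paper) applies the formulas above to one concentration level of a balanced design with p = 3 series and n = 4 replicates, and also expresses the resulting RSDs against the known true value of the standard, as discussed later in this section.

import numpy as np

# Back-calculated concentrations for one concentration level (hypothetical data):
# p = 3 series (runs), n = 4 replicates per series.
x = np.array([
    [98.5, 101.2, 99.8, 100.4],    # run 1
    [102.1, 103.0, 101.5, 102.6],  # run 2
    [97.9, 99.1, 98.4, 99.6],      # run 3
])
mu_T = 100.0                       # known true value of the validation standard
p, n = x.shape

run_means = x.mean(axis=1)         # x̄_ij,calc for each run
grand_mean = x.mean()              # μ̂_j

# ANOVA mean squares for a balanced design
MSM = n * np.sum((run_means - grand_mean) ** 2) / (p - 1)
MSE = np.sum((x - run_means[:, None]) ** 2) / (p * n - p)

# Variance components
if MSE < MSM:
    var_within = MSE
    var_between = (MSM - MSE) / n
else:
    var_within = np.sum((x - grand_mean) ** 2) / (p * n - 1)
    var_between = 0.0

var_IP = var_within + var_between                       # intermediate precision variance
ratio = var_between / var_within if var_within > 0 else float("inf")  # between-run / within-run variance

# RSDs expressed against the known true value of the standard
rsd_repeatability = 100 * np.sqrt(var_within) / mu_T
rsd_intermediate = 100 * np.sqrt(var_IP) / mu_T
print(f"repeatability RSD = {rsd_repeatability:.2f}%, "
      f"intermediate precision RSD = {rsd_intermediate:.2f}%, variance ratio = {ratio:.2f}")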

As can be seen in the regulatory documents, what makes the difference between repeatability and intermediate precision is the concept of series or runs. These series or runs are composed at least of different days, with possibly different operators and/or different equipment. A run or series is a period during which analyses are executed under repeatability conditions that remain constant. The rationale for selecting the different factors which will compose the runs/series is to mimic the conditions that will be encountered during the routine use of the analytical procedure. It is evident that the analytical procedure will not be used on only one day, so including the day-to-day variability of the analytical procedure is mandatory. Then, during its routine use, will the analytical procedure be used by only one operator, and/or on only one piece of equipment? Depending on the answers to these questions, different factors representing the way the procedure will be used during routinely performed analyses will be introduced in the validation protocol, leading to a representative estimation of the variability of the analytical procedure. Once the appropriate factors have been selected, an experimental design can be built in order to optimize the number of runs or series so as to account for the main effects of these factors with a cost-effective analysis time. For example, if the factors selected are days, operators and equipment, each of them at two levels, then a fractional factorial design allows four runs or series to be executed in only 2 days. The design is shown in Table 2.

Table 2
Experimental design of four runs taking into account days, operators and equipment as sources of variability

Run 1: Day 1, Operator 1, Equipment 2
Run 2: Day 1, Operator 2, Equipment 1
Run 3: Day 2, Operator 1, Equipment 1
Run 4: Day 2, Operator 2, Equipment 2

Having computed the variance components, one interesting parameter to observe is the ratio R_j, with

R_j = σ̂²_B,j / σ̂²_W,j

Indeed, this parameter shows how important the series-to-series (or run-to-run) variance is in comparison to the repeatability variance. High values of R_j, e.g. greater than 4, could suggest either a problem with the variability of the analytical procedure, whose results may vary from one run to the other (leading to redevelopment of the method), or a lack of a sufficient number of series (runs) used during the validation process to obtain a reliable estimate of the between-series variance σ̂²_B,j. In this last situation, all the results within a run are highly correlated with each other, providing little effective information with regard to the run-to-run results. The effective sample size of the validation is consequently smaller than the real sample size used for the design of the validation experiments. The term 'effective sample size' is used to indicate that when the results within a run are correlated, there is in fact less information to judge the quality of the results than in a situation where all results are fully independent and do not depend on the run they belong to. This is an important feature to take into account for the definition of the degrees of freedom used for the computation of a confidence interval or of a tolerance interval. Indeed, if the results of repeated experiments are correlated, computing the degrees of freedom with the total number of experiments performed will artificially reduce the confidence interval or the tolerance interval. The Satterthwaite degrees of freedom [40] include this concept of effective sample size by bounding the degrees of freedom between a minimum value – the number of series – and a maximum value – the total number of experiments (series × replicates) [41].

Usually, precision is expressed as the percent relative standard deviation (RSD). The classical formula is:

RSD (%) = 100 × √σ̂² / x̄

where σ̂² is the estimated variance and x̄ is the estimated average value. When an RSD precision is expressed, the corresponding variance is used, e.g. repeatability or intermediate precision. The computed RSD is therefore the ratio of two random variables, giving a new parameter with high uncertainty. However, in the case of the validation of an analytical procedure, because the true or reference value is known, the denominator should be replaced by the corresponding true value μ_T. The RSD computed in this way depends only on the estimated precision (estimated variances), regardless of the estimated trueness.

This being said, the use of relative estimates is convenient from a direct reading point of view but nevertheless triggers a series of queries: what matters the most for the results, the (absolute) variance or the relative standard deviation? Imagine that a bioanalytical method is used to support a pharmacokinetic study. In that case, the results are used for fitting the non-linear PK model, and what matters is only either the variance of the results or the variance of the logarithms of the results, not the RSD at all. Remember that a procedure is validated for its intended use. So what is the relevance of making a decision on the acceptance of a method based on the RSD when only the variance of its results is important with regard to its intended use? This distinction becomes particularly important when dealing with the LOQ. Indeed, since the RSD is the SD divided by the true concentration value, the RSD becomes large at the lower end of the range simply because the SD is divided by a small number, not because the method becomes less precise. A good example can be seen by comparing the same information in Fig. 3.a.2 on an absolute scale and in Fig. 3.a.3 on a relative scale. In Fig. 3.a.2 the distance between the two dashed lines represents a multiple of the intermediate precision in absolute value, while in Fig. 3.a.3 it is the same value but expressed in relative terms. While it appears in the latter figure (a.3) that the relative intermediate precision (RSD) explodes at the smallest concentration, leading to the conclusion that results are not precise enough at that level, it is also clear that, for this example, the absolute intermediate precision improves at the smallest concentration because the intermediate precision SD is smaller. The contradiction here comes from the fact that the SD has been divided by a small number, not because the measurements are less precise – on the contrary. This raises questions about the meaning and the definition of the LOQ. Indeed, why ignore or discard the results at those low levels when they are obtained with a variance much smaller than that of results at high concentrations? Once again, the answer to this question lies in the intended use of the results: for supporting stability or pharmacokinetic studies, not only is it not relevant to discard those very precise measurements at the small concentrations, but they are also very useful, for example, in estimating accurately the half-life or the pharmacokinetics of metabolites. Only the variance or the SD matters, not the RSD. So, while common practice evaluates a method with respect to the relative expression of the precision, scientists in the laboratories should carefully consider the absolute, fundamental variance before discarding data, and should question whether it serves the objectives of the study or not.
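A minimal numerical illustration of this point (hypothetical standard deviations, not values from the paper): the absolute SD can be smallest at the low end of the range while the RSD is largest there, simply because the divisor is small.

# Hypothetical intermediate-precision SDs at three concentration levels (same units as the concentration)
levels = [(5.0, 0.6), (50.0, 2.0), (500.0, 12.0)]   # (true concentration, SD)

for conc, sd in levels:
    rsd = 100 * sd / conc
    print(f"conc = {conc:6.1f}   SD = {sd:5.2f}   RSD = {rsd:5.1f}%")
# The absolute SD is smallest at the lowest level, yet its RSD is the largest:
# the increase comes from dividing by a small true value, not from poorer precision.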

4.3. Accuracy

In the ICH Q2R1 part 1 document [4], accuracy is defined as "… the closeness of agreement between the value which is accepted either as a conventional true value or an accepted reference value and the value found." This definition corresponds to that of the ISO documents [5,6] or the VIM [32], which state that accuracy is "the closeness of agreement between a test result and the accepted reference value." Furthermore, in the ISO definition a note is added specifying that accuracy is the combination of random error and systematic error or bias. From this, and as specified by the Analytical Methods Committee (AMC) [34], it is easily understood that accuracy rigorously applies to results and not to analytical methods, laboratories or operators. The AMC also outlines that accuracy should be used in that way in formal writing. Therefore, accuracy denotes the absence of error of a result. Similar definitions of accuracy are found in the Eurachem document [33].

The total measurement error of the results obtained from an analytical procedure is related to the closeness of agreement between the value found, i.e. the result, and the value that is accepted either as a conventional true value or an accepted reference value. The closeness of agreement observed is based on the sum of the systematic and random errors, namely the total error linked to the result. Consequently, the measurement error is the expression of the sum of trueness (or bias) and precision (or standard deviation), i.e. the total error. As shown below, each measurement X has three components: the true sample value μ_T, the bias of the method (estimated by the mean of several results) and the precision (estimated by the standard deviation or, in most cases, the intermediate precision). Equivalently, the difference between an observation X and the true value is the sum of the systematic and random errors, i.e. the total error or measurement error:

X = μ_T + bias + precision

X − μ_T = bias + precision

X − μ_T = total error

X − μ_T = measurement error

X − μ_T = accuracy

However, when looking at the section corresponding to accuracy in part 2 of the ICH Q2R1 document, the recommended data to document accuracy are presented as: "accuracy should be reported as percent recovery by the assay of known added amount of analyte in the sample or as the difference between the mean and the accepted true value together with the confidence intervals." This no longer refers to accuracy but instead to the trueness definition of the ISO 5725 document, because it is the average value of several results – as opposed to a single result as for accuracy – that is compared to the true value, as already stated previously. This section consequently refers to systematic errors, whereas accuracy as defined in ICH Q2R1 part 1 and ISO 5725 part 1 corresponds to the evaluation of the total measurement error. In the FDA Bioanalytical Method Validation document [3], accuracy is defined as "… the closeness of mean test results obtained by the method to the true value (concentration) of the analyte. (…) The mean value should be within 15% of the actual value except at LLOQ, where it should not deviate by more than 20%. The deviation of the mean from the true value serves as the measure of accuracy." As already mentioned in the previous sections, this definition corresponds to the trueness of the analytical method. For bioanalytical methods, earlier reviews have already stressed the problem of this difference between the definition of accuracy and trueness [1,2,27,38]. For most uses it does not matter whether a deviation from the true value is due to random error (lack of precision) or to systematic error (lack of trueness), as long as the total quantity of error remains acceptable. Thus, the concept of total analytical error, or accuracy, as a function of random and systematic error is essential. Furthermore, every analyst wants to ensure that the total amount of error of the method will not affect the interpretation of the test result and compromise the subsequent decision [1,2,21,42–46]. Decisions based on the separate evaluation of the trueness and precision criteria cannot achieve this. Only the evaluation of the accuracy of the results, which takes into account the total error concept, gives guarantees to both laboratories and regulatory bodies on the ability of the method to achieve its purpose.
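The consequence of ignoring total error can be illustrated with a small simulation (illustrative numbers only; the 14%/14% scenario is the one discussed in Section 5): a method whose bias and RSD each look acceptable against a 15% criterion can still produce a large fraction of individual results outside the ±15% acceptance limits.

import numpy as np

rng = np.random.default_rng(0)
mu_T = 100.0
rel_bias, rsd = 0.14, 0.14          # 14% bias and 14% RSD: each "acceptable" in isolation
results = rng.normal(loc=mu_T * (1 + rel_bias), scale=mu_T * rsd, size=100_000)

outside = np.mean(np.abs(results - mu_T) / mu_T > 0.15)
print(f"fraction of individual results outside +/-15%: {outside:.1%}")
# Roughly half of the simulated results miss the acceptance limits even though
# trueness and precision each look fine when evaluated separately.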

5. Decision rule

Most of the regulatory documents do not make any recommendation on acceptance limits to help the analyst decide when an analytical procedure is acceptable. They insist, with the confusions already mentioned, on the criteria that need to be examined, estimated and reported, but only few rules are proposed about the way to decide. It is a laboratory competence to justify the decision of accepting and using an analytical method [3]. The only exception found concerns the FDA document on bioanalytical methods, which clearly indicates in the pre-study validation part: "The mean value should be within ±15% of the theoretical value, except at LLOQ, where it should not deviate by more than ±20%. The precision around the mean value should not exceed 15% of the CV, except for LLOQ, where it should not exceed 20% of the CV." Later, when referring to in-study validation, the same document indicates: "Acceptance criteria: At least 67% (4 out of 6) of quality control (QC) samples should be within 15% of their respective nominal value, 33% of the QC samples (not all replicates at the same concentration) may be outside 15% of nominal value. In certain situations, wider acceptance criteria may be justified." However, these two sections relating to pre-study and to in-study acceptance criteria summarize very well the in-depth confusion that exists and that triggers many debates in conferences on validation. The proposed objective is that, for bioanalytical methods, measurements must be sufficiently close to their true value – less than 15% away. That is clearly indicated here: "QC samples should be within 15% of their respective nominal value". As suggested in the section on accuracy, this objective is not aligned at all with the previous rule for (pre-study) validation, which imposes limits on the method's performance – not on the results – such as the mean and the precision, which must be better than 15% (20% at the LLOQ). The objective of a quantitative analytical method is to be able to quantify as accurately as possible each of the unknown quantities that the laboratory will have to determine. In other words, what all analysts expect from an analytical procedure is that the difference between the measurement or observation (X) and the unknown "true value" μ_T of the test sample be smaller than an acceptance limit λ defined a priori:

−λ < X − μ_T < λ  ⇔  |X − μ_T| < λ

The acceptance limit λ can be different depending on the requirements of the analyst and the objective of the analytical procedure. The objective is linked to the requirements usually admitted in practice (e.g. 1% or 2% on bulk, 5% on pharmaceutical specialties, 15% for biological samples, or whatever limits are predefined according to the intended use of the results). Therefore, the aim of the validation phase is to generate enough information to guarantee that the analytical method will provide, in routine use, measurements close to the true value without being affected by other elements present in the sample, assuming everything else remains reasonably similar. In other words, the validation phase should demonstrate that this will be fulfilled for a large proportion of the results. As already mentioned, the difference between a measurement X and its true value is composed of a systematic error (bias or trueness) and a random error (variance or precision). The true values of these performance parameters are unknown, but they can be estimated from the (pre-study) validation experiments, and the reliability of these estimates depends on the adequacy of these experiments (design, size). Consequently, the objective of the validation phase is to evaluate whether, conditionally on the estimates of the bias (μ̂_M) and standard deviation (σ̂_M), the expected proportion of measurements that will fall within the acceptance limits later in routine use is greater than a predefined proportion, say β, i.e.:

E_{μ̂, σ̂} { P[ |X − μ_T| < λ ] | μ̂_M, σ̂_M } ≥ β

However, there exists no exact solution to estimate this expected proportion. An easy solution to circumvent this aspect and make a reliable decision, as already proposed by other authors [1,28,47–49], is to compute the β-expectation tolerance intervals [50]:

E_{μ̂_M, σ̂_M} { P_X[ μ̂_M − k σ̂_M < X < μ̂_M + k σ̂_M | μ̂_M, σ̂_M ] } = β

where the factor k is determined so that the expected proportion of the population falling within the interval is equal to β. If the β-expectation tolerance interval obtained in this way is totally included within the acceptance limits [−λ, +λ] (e.g. [−15%, 15%] for bioanalytical methods or [−5%, 5%] for analytical methods used for batch release), then the expected proportion of measurements within those acceptance limits is greater than or equal to β. Most of the time, an analytical procedure is intended to quantify over a range of quantities or concentrations. Consequently, during the validation phase, samples are prepared to adequately cover this range, and a β-expectation tolerance interval is calculated at each level. The accuracy profile is simply obtained by connecting the lower limits and by connecting the upper limits, as can be seen in Fig. 1 or at the bottom of Fig. 3. The inclusion of the measurement error profile within the acceptance limits [−λ, λ] at the key levels must be examined before declaring that the procedure is valid over a specific range of values. β will usually be chosen above 80% and, as shown by Boulanger et al. [43,44], choosing 80% for β during pre-study validation guarantees that 90% of the runs will later be accepted in routine use when the 4–6–λ (e.g. 4–6–15) rule is applied in routine.
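As a simplified sketch of this decision rule (one concentration level, ignoring the between-run component for brevity; data, β and λ are hypothetical), the β-expectation tolerance interval can be computed as a prediction-type interval and compared with the acceptance limits:

import numpy as np
from scipy import stats

mu_T = 100.0                  # known concentration of the validation standard
lam = 0.15                    # acceptance limit λ (±15%, bioanalytical case)
beta = 0.80                   # expected proportion of results required inside ±λ

# Hypothetical back-calculated results for this level (pooled over runs)
x = np.array([97.1, 99.4, 102.3, 95.8, 101.0, 98.7, 100.9, 96.5, 99.9, 103.2])
n = x.size
mean, sd = x.mean(), x.std(ddof=1)

# β-expectation tolerance interval = prediction-type interval for one future result
k = stats.t.ppf((1 + beta) / 2, df=n - 1) * np.sqrt(1 + 1 / n)
lower, upper = mean - k * sd, mean + k * sd

# Express the interval as relative errors and compare with the acceptance limits
rel_low, rel_up = (lower - mu_T) / mu_T, (upper - mu_T) / mu_T
accepted = (rel_low > -lam) and (rel_up < lam)
print(f"tolerance interval: [{rel_low:+.1%}, {rel_up:+.1%}] -> "
      f"{'within' if accepted else 'outside'} ±{lam:.0%}")

Repeating this at each concentration level and connecting the lower and upper limits yields the accuracy profile; a full treatment would use the intermediate-precision variance and the corresponding (e.g. Satterthwaite) degrees of freedom rather than the pooled standard deviation used here.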


That way, the pre-study validation decision and the (in-study) routine decision rule for acceptance of runs become aligned with their respective risks, which is not the case with the rules proposed in the FDA guide [3]. Indeed, having a mean (trueness) smaller than 15% and a precision (CV%) smaller than 15% does not guarantee at all that most future results will be within [−15%, +15%]. There are two statistical errors behind this classical assumption, which could be summarized as "good methods give good results". First, as pointed out in the section on accuracy, the difference between a result and its true value is composed of the systematic error (trueness) plus the random error (precision). So if, for example, a method shows an estimated mean bias of 14% and an estimated precision (CV) of 14% as well, it is easy to see that a large proportion of the results (roughly half) will fall outside the acceptance limits and that most runs will consequently be rejected. Second, predicting what will happen later in routine use depends largely on the quality of the estimates of the mean and the precision, i.e. primarily on the number of observations collected and on the conditions of the experiments. If the mean and the precision are estimated from too few measurements during pre-study validation, or under conditions (operator, days, ...) that are not representative of routine use, there is poor confidence that the true bias or the true precision are in fact not greater than the acceptance criteria.

The tolerance interval approach, which is a prediction interval, avoids those two pitfalls and correctly estimates the expected proportion of good results, depending on the performance criteria and on the quality (size, design) of the performed experiments. While the tolerance interval approach can prevent decisions from being made on poor data, it remains the responsibility of the analyst to ensure that the experimental conditions used during the (pre-study) validation reflect what will be practised in routine. The subtle difference between the method and the procedure should be stressed here: in the validation experiments, the various operational aspects or potential sources of variance must be included in order to anticipate what could happen later in routine use. The most classical factors are the operators, the column lot, different set-ups, independent preparations of samples, etc., so as to simulate or mimic as closely as possible the daily practice and the set of procedures around the use of the method. As already indicated, it is the whole procedure or practice that must be validated, not only the method in its most restrictive sense.
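Returning to the first of the two points above, a short numerical sketch (Python, with hypothetical figures) shows what can be expected from a method whose bias and CV are both just inside the 15% pre-study criteria:

```python
# Sketch: a method with 14% bias and 14% CV passes the classical pre-study
# criteria (both < 15%), yet only about half of its individual results can be
# expected to fall within the +/-15% acceptance limits.
from scipy import stats

bias, cv, limit = 14.0, 14.0, 15.0  # hypothetical values, all in %

# The relative error of a single result is assumed to follow N(bias, cv^2).
p_inside = (stats.norm.cdf(limit, loc=bias, scale=cv)
            - stats.norm.cdf(-limit, loc=bias, scale=cv))
print(f"Expected proportion of results within +/-{limit:.0f}%: {p_inside:.1%}")  # about 51%
```

Roughly half of the individual results of such a method would therefore fall outside the acceptance limits, even though the method formally meets the 15%/15% pre-study criteria.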

6. Dosing range

For any quantitative method, it is necessary to determine the range of analyte concentrations or property values over which the method may be applied. The ICH Q2R1 part 1 document defines the range of an analytical procedure as "the interval between the upper and lower concentration (amounts) of analyte in the sample (including these concentrations) for which it has been demonstrated that the analytical procedure has a suitable level of precision, accuracy and linearity". The FDA Bioanalytical Method Validation definition of the quantification range is "the range of concentration, including ULOQ and LLOQ, that can be reliably and reproducibly quantified with accuracy and precision through the use of a concentration–response relationship", where LLOQ is the lower limit of quantitation and ULOQ is the upper limit of quantitation. Thus, the above-mentioned definitions are quite similar because, for both of them, the range is correlated with the linearity and the accuracy (trueness + precision). Moreover, both documents specify that the range depends on the specific application of the procedure. ICH Q2R1 part 2 states that the specified range is "established by confirming that the analytical procedure provides an acceptable degree of linearity, accuracy and precision when applied to samples containing amounts of analyte within or at the extremes of the specified range of the analytical procedure". IUPAC defines the range as a "set of values of measurands for which the error of a measuring instrument is intended to lie within specified limits".

The range should be anticipated at an early stage of method development, and its selection is based on previous information about the sample in the particular study. The chosen range determines the number of standards used in constructing a calibration curve. ICH Q2R1 part 2 recommends the following minimum specified ranges for different studies:

(i) for the assay of a drug substance or a finished (drug) product: normally from 80 to 120% of the test concentration;
(ii) for content uniformity: covering a minimum of 70–130% of the test concentration, unless a wider, more appropriate range, based on the nature of the dosage form (e.g. metered dose inhalers), is justified;
(iii) for dissolution testing: ±20% over the specified range;
(iv) for the determination of an impurity: from the reporting level of the impurity to 120% of the specification.

Therefore, the dosing range is the concentration or amount interval over which the total error of measurement – or accuracy – is acceptable. It is essential to demonstrate the accuracy of the results over the entire range. Consequently, and in order to fulfil these definitions, the ICH proposal to perform only six measurements at the 100% level of the test concentration to assess the precision of the analytical method should be applied with caution in order to remain in accordance with the definition of the range. Accuracy, and therefore trueness and precision, should be evaluated experimentally and found acceptable over the whole range targeted for the application of the analytical procedure.

7. Limit of quantitation

ICH considers that the "quantitation limit is a parameter of quantitative assays for low levels of compounds in sample matrices, and is used particularly for the determination of impurities and/or degradation products". ICH Q2R1 part 1 defines the quantitation limit of an individual analytical procedure as "the lowest amount of analyte in a sample which can be quantitatively determined with suitable precision and accuracy". The limit of quantitation (or quantitation limit) is often called the LOQ. Both terms are used in regulatory documents, and their meaning is exactly the same. The ICH document defines only one limit of quantitation, but the quantification range of the analytical procedure has two limits: the LLOQ and the ULOQ.


The definition of the quantitation limit(s) that Eurachem excerpts from IUPAC suggests that there is more than one limit of quantification: "quantification limits are performance characteristics that mark the ability of a chemical measurement process to adequately quantify an analyte". However, in the Eurachem document only the LLOQ, called the "quantification limit", is discussed. The FDA Bioanalytical Method Validation document distinguishes between the two limits and defines the lower limit of quantification (the lowest amount of an analyte in a sample that can be quantitatively determined with suitable precision and accuracy) and the upper limit of quantification (the highest amount of an analyte in a sample that can be quantitatively determined with precision and accuracy). As can be seen, the only difference between these two definitions is the substitution of the word "lowest" by "highest".

ICH Q2R1 part 2 proposes exactly the same approaches to estimate the (lower) quantification limit as for the detection limit. A first approach is based on the well-known signal-to-noise (s/n) ratio: an s/n of 10:1 is considered by the ICH document to be sufficient to discriminate the analyte from the background noise. The main problem appears when the measured signal is not the signal used to quantify the analyte. For example, in chromatography with spectral detection the measured signal is expressed in absorption units, i.e. the signal height, whereas peak areas are generally used for quantitation. The quantitation limit then expresses not the lowest quantifiable level of the analyte but the lowest quantified absorbance. The problem becomes even more complicated in electrophoresis, where the signal is usually considered as the ratio between the peak area and the migration time.

The other approaches proposed by the ICH Q2R1 part 2 document are based on the "Standard Deviation of the Response and the Slope" and are similar to the approach used for the computation of the detection limit. The formulae for the detection limit (DL) and the quantitation limit (QL) are similar, the only difference being the multiplier of the standard deviation of the response:

$\mathrm{DL} = \dfrac{3.3\,\sigma}{S} \qquad\qquad \mathrm{QL} = \dfrac{10\,\sigma}{S}$

where σ is the standard deviation of the response and S is the slope of the calibration curve.
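As a purely illustrative sketch of this "standard deviation of the response and the slope" calculation (with hypothetical calibration data, an assumed linear calibration, and σ taken as the residual standard deviation of the regression, one of the options mentioned by ICH):

```python
# Sketch of the ICH "standard deviation of the response and the slope" approach,
# using the residual standard deviation of a linear calibration as sigma.
import numpy as np

conc = np.array([0.5, 1.0, 2.0, 4.0, 8.0])                 # hypothetical concentrations
response = np.array([0.052, 0.101, 0.198, 0.405, 0.802])   # hypothetical responses

slope, intercept = np.polyfit(conc, response, 1)
residuals = response - (slope * conc + intercept)
sigma = np.sqrt(np.sum(residuals ** 2) / (conc.size - 2))  # residual standard deviation

dl = 3.3 * sigma / slope    # detection limit
ql = 10.0 * sigma / slope   # quantitation limit
print(f"DL = {dl:.3f}, QL = {ql:.3f} (in the same units as the concentrations)")
```

As discussed below, the value obtained in this way depends heavily on the calibration design and does not, by itself, demonstrate acceptable accuracy at that level.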

The same problems as explained previously for the detection limit arise in chromatography or electrophoresis. On the one hand, the ICH Q2R1 part 2 document assumes that the calibration is linear, which is not always true. On the other hand, two ways of measuring σ are proposed: those "based on Standard Deviation of the Blank" and those "based on the Calibration Curve". Neither of these alternatives offers an adequate solution: the former because of the assumption that the signal units are the same as the units measured for the calibration, and the latter because of the assumption that the LOQ range is already known.

Other problems with these methods of estimating the limits of quantitation are that they assume the existence of a measurable noise, which is not always the case. Furthermore, when they are applicable, these approaches depend on the manner in which the noise is measured and therefore differ from one instrument to another or with the internal operational set-up, such as the signal data acquisition rate or the detector time constant. The LOQ estimated using the signal-to-noise ratio is thus extremely subjective [51,52] and equipment dependent. The approaches using the standard deviation of the intercept should be used carefully, as the estimation of this intercept depends on the range of the calibration curve: the intercept is only well estimated if the concentrations used are sufficiently small. Furthermore, each of these approaches provides a different value of the lower limit of quantitation [51,52]. This is highly problematic, as it does not allow the LOQs obtained by different laboratories using the same analytical procedure to be compared.

Another approach to estimate the lower limit of quantitation is proposed by Eurachem, based on a target RSD [33]. The RSDs at concentration levels close to the expected LOQ are plotted versus their concentration, and a curve is fitted to this plot. The concentration at which the curve crosses the target RSD is the LOQ. This approach alleviates most of the problems raised for the previous approaches, as it is no longer equipment and operator dependent. Still, however, none of these approaches fulfils the definition of the LOQ: even with this last approach, only the precision of the analytical procedure has been assessed, without any estimation of the trueness and of the whole accuracy (trueness + precision) as required.

In our opinion, the best way to compute both quantitation limits (LLOQ and ULOQ) is to use the accuracy profile approach [1,2,29,30,43–45,47–49], which fulfils the LOQ criteria by demonstrating that the total error of the results is known and acceptable at these concentration levels, i.e. that both the systematic and the random errors are acceptable.
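To illustrate how both limits can be read from an accuracy profile, the following sketch uses hypothetical, precomputed β-expectation tolerance intervals (in practice obtained from the validation experiments as described earlier) and simply looks for the extreme concentration levels at which the interval is still contained within the acceptance limits:

```python
# Sketch: LLOQ and ULOQ read from an accuracy profile as the extreme levels
# whose beta-expectation tolerance interval stays within [-lambda, +lambda].
acceptance_limit = 15.0  # lambda, in %

# Hypothetical per-level results: (concentration, TI lower bound %, TI upper bound %)
profile = [
    (0.1, -28.0, 22.0),
    (0.5, -14.2, 12.8),
    (1.0,  -9.5,  8.1),
    (5.0,  -6.3,  7.4),
    (10.0, -11.9, 13.6),
    (20.0, -19.4, 16.8),
]

valid_levels = [c for c, lo, hi in profile
                if lo > -acceptance_limit and hi < acceptance_limit]

if valid_levels:  # assumes the accepted levels form a contiguous block
    lloq, uloq = min(valid_levels), max(valid_levels)
    print(f"LLOQ = {lloq}, ULOQ = {uloq}; validated dosing range: [{lloq}, {uloq}]")
else:
    print("No concentration level meets the acceptance limits.")
```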

8. Conclusion

For analysts, method validation is the process of proving that an analytical method is acceptable for its intended purpose. In order to resolve this very important issue, analysts refer to regulatory or guidance documents, which can differ on several points. Therefore, the validity of the analytical method is partially dependent on the chosen guidance, terminology and methodology. It is consequently essential to have clear definitions of the validation criteria used to assess this validity, to have methodologies in accordance with these definitions and, accordingly, to use statistical methods that are consistent with these definitions, with the objective of the validation and with the objective of the analytical method. Repositioning the definitions and the methodologies during the revision processes of regulatory documents, in order to eliminate contradictory and sometimes scientifically irrelevant requirements and definitions, is recommended and should be implemented rapidly.

Acknowledgements

Thanks are due to the Walloon Region and the European Social Fund for a research grant to E.R. (First Europe Objective 3 project No. 215269).

References

[1] Ph. Hubert, J.-J. Nguyen-Huu, B. Boulanger, E. Chapuzet, P. Chiap, N. Cohen, P.-A. Compagnon, W. Dewe, M. Feinberg, M. Lallier, M. Laurentie, N. Mercier, G. Muzard, C. Nivet, L. Valat, J. Pharm. Biomed. Anal. 36 (2004) 579.
[2] Ph. Hubert, J.-J. Nguyen-Huu, B. Boulanger, E. Chapuzet, P. Chiap, N. Cohen, P.-A. Compagnon, W. Dewe, M. Feinberg, M. Lallier, M. Laurentie, N. Mercier, G. Muzard, C. Nivet, L. Valat, STP Pharma Pratiques 13 (2003) 101.
[3] Guidance for Industry: Bioanalytical Method Validation, US Department of Health and Human Services, Food and Drug Administration, Center for Drug Evaluation and Research (CDER), Center for Biologics Evaluation and Research (CBER), Rockville, May 2001.
[4] International Conference on Harmonization (ICH) of Technical Requirements for Registration of Pharmaceuticals for Human Use, Topic Q2 (R1): Validation of Analytical Procedures: Text and Methodology, Geneva, 2005.
[5] ISO 5725, Application of the Statistics-Accuracy (Trueness and Precision) of the Results and Methods of Measurement—Parts 1 to 6, International Organization for Standardization (ISO), Geneva, 1994.
[6] ISO 3534-1: Statistics—Vocabulary and Symbols, International Organization for Standardization (ISO), Geneva, 2006.
[7] M. Thompson, S.L.R. Ellison, R. Wood, Pure Appl. Chem. 74 (2002) 835.
[8] Association of Official Analytical Chemists (AOAC), Official Methods of Analysis, vol. 1, 15th ed., AOAC, Arlington, VA, 1990, p. 673.
[9] J. Vessman, J. Pharm. Biomed. Anal. 14 (1996) 867.
[10] J. Vessman, Accred. Qual. Assur. 6 (2001) 522.
[11] B.-A. Persson, J. Vessman, Trends Anal. Chem. 17 (1998) 117.
[12] B.-A. Persson, J. Vessman, Trends Anal. Chem. 20 (2001) 526.
[13] J. Vessman, R.I. Stefan, J.F. Van Staden, K. Danzer, W. Lindner, D.T. Burns, A. Fajgelj, H. Müller, Pure Appl. Chem. 73 (2001) 1381.
[14] A.D. McNaught, A. Wilkinson, IUPAC Compendium of Chemical Terminology, second ed., Blackwell, Oxford, 1997.
[15] H. Kaiser, Fresenius Z. Anal. Chem. 260 (1972) 252.
[16] D.L. Massart, B.G. Vandeginste, S.N. Deming, Y. Michotte, L. Kaufman, Chemometrics, Elsevier, Amsterdam, 1988, p. 115.
[17] A. Lorber, K. Faber, B.R. Kowalski, Anal. Chem. 69 (1997) 69.
[18] K. Faber, A. Lorber, B.R. Kowalski, J. Chemometrics 11 (1997) 419.
[19] K. Danzer, Fresenius J. Anal. Chem. 369 (1997) 397.
[20] WELAC Guidance Documents WG D2, Eurachem/Western European Laboratory Accreditation Cooperation (WELAC) Chemistry, Teddington, UK, first ed., 1993.
[21] J.W. Lee, V. Devanarayan, Y.C. Barrett, R. Weiner, J. Allinson, S. Fountain, S. Keller, I. Weinryb, M. Green, L. Duan, J.A. Rogers, R. Millham, P.J. O'Brian, J. Sailstad, M. Khan, C. Ray, J.A. Wagner, Pharm. Res. 23 (2006) 312.
[22] UK Department of Trade and Industry, Manager's Guide to VAM, Valid Analytical Measurement Programme, Laboratory of the Government Chemist (LGC), Teddington, UK, 1998; http://www.vam.org.uk.
[23] J. Ermer, J.H.M. Miller, Practical Method Validation in Pharmaceutical Analysis, Wiley-VCH, Weinheim, 2005.
[24] Analytical Methods Committee, Analyst 113 (1988) 1469.
[25] S.V.C. de Souza, R.G. Junqueira, Anal. Chim. Acta 552 (2005) 25.
[26] J. Ermer, H.-J. Ploss, J. Pharm. Biomed. Anal. 37 (2005) 859.
[27] C. Hartmann, J. Smeyers-Verbeke, D.L. Massart, R.D. McDowall, J. Pharm. Biomed. Anal. 17 (1998) 193.
[28] D. Hoffman, R. Kringle, J. Biopharm. Stat. 15 (2005) 283.
[29] B. Streel, A. Ceccato, R. Klinkenberg, Ph. Hubert, J. Chromatogr. B 814 (2005) 263.
[30] Ph. Hubert, J.-J. Nguyen-Huu, B. Boulanger, E. Chapuzet, P. Chiap, N. Cohen, P.-A. Compagnon, W. Dewe, M. Feinberg, M. Lallier, M. Laurentie, N. Mercier, G. Muzard, C. Nivet, L. Valat, STP Pharma Pratiques 16 (2006) 87.
[31] N. François, B. Govaerts, B. Boulanger, Chemometr. Intell. Lab. Syst. 74 (2004) 283.
[32] ISO VIM, DGUIDE 99999.2, International Vocabulary of Basic and General Terms in Metrology (VIM), third ed., ISO, Geneva, 2006 (under approval).
[33] The Fitness for Purpose of Analytical Methods, Eurachem, Teddington, 1998.
[34] Analytical Methods Committee, AMC Technical Brief 13, Royal Society of Chemistry, London, 2003; http://www.rsc.org/Membership/Networking/InterestGroups/Analytical/AMC/TechnicalBriefs.asp.
[35] Food and Agriculture Organization of the United Nations (FAO), Codex Alimentarius Commission, Procedural Manual, 15th ed., Rome, 2005.
[36] H. Rosing, W.Y. Man, E. Doyle, A. Bult, J.H. Beijnen, J. Liq. Chrom. Rel. Technol. 23 (2000) 329.
[37] J. Ermer, J. Pharm. Biomed. Anal. 24 (2001) 755.
[38] C. Hartmann, D.L. Massart, R.D. McDowall, J. Pharm. Biomed. Anal. 12 (1994) 1337.
[39] C.R. Jensen, Qual. Eng. 14 (2002) 645.
[40] F. Satterthwaite, Psychometrika 6 (1941) 309.
[41] B. Boulanger, P. Chiap, W. Dewe, J. Crommen, Ph. Hubert, J. Pharm. Biomed. Anal. 32 (2003) 753.
[42] B. DeSilva, W. Smith, R. Weiner, M. Kelley, J. Smolec, B. Lee, M. Khan, R. Tacey, H. Hill, A. Celniker, Pharm. Res. 20 (2003) 1885.
[43] B. Boulanger, W. Dewe, Ph. Hubert, B. Govaerts, C. Hammer, F. Moonen, Accuracy and Precision (total error vs. 4/6/30), AAPS Third Bioanalytical Workshop: Quantitative Bioanalytical Methods Validation and Implementation—Best Practices for Chromatographic and Ligand Binding Assays, Arlington, VA, 1–3 May 2006; http://www.aapspharmaceutica.com/meetings/meeting.asp?id=64.
[44] B. Boulanger, W. Dewe, A. Gilbert, B. Govaerts, M. Maumy-Bertrand, Chemometr. Intell. Lab. Syst. 86 (2007) 198.
[45] J.W.A. Findlay, W.C. Smith, J.W. Lee, G.D. Nordblom, I. Das, B.S. DeSilva, M.N. Khan, R.R. Bowsher, J. Pharm. Biomed. Anal. 21 (2000) 1249.
[46] H.T. Karnes, G. Shiu, V.P. Shah, Pharm. Res. 8 (1991) 421.
[47] Ph. Hubert, J.-J. Nguyen-Huu, B. Boulanger, E. Chapuzet, P. Chiap, N. Cohen, P.-A. Compagnon, W. Dewe, M. Feinberg, M. Lallier, M. Laurentie, N. Mercier, G. Muzard, C. Nivet, L. Valat, STP Pharma Pratiques 16 (2006) 28.
[48] M. Feinberg, B. Boulanger, W. Dewe, Ph. Hubert, Anal. Bioanal. Chem. 380 (2004) 502.
[49] A.G. Gonzalez, M.A. Herrador, Talanta 70 (2006) 896.
[50] R.W. Mee, Technometrics 26 (1984) 251.
[51] J. Vial, A. Jardy, Anal. Chem. 71 (1999) 2672.
[52] J. Vial, K. Le Mapihan, A. Jardy, Chromatographia 57 (2003) S-303.