Predictive Influence in The Accelerated Failure Time Model

Biostatistics (2002), 3, 3, pp. 331–346
Printed in Great Britain
EDWARD J. BEDRICK
Department of Mathematics and Statistics, University of New Mexico, Albuquerque, NM 87131, USA
bedrick@stat.unm.edu
ALEX EXUZIDES
Exponent Inc., Menlo Park, CA USA
WESLEY O. JOHNSON
Department of Statistics, University of California, Davis, CA 95616, USA
MARK C. THURMOND
Department of Medicine and Epidemiology, University of California, Davis, CA, 95616, USA
SUMMARY
We develop case deletion diagnostics for prediction of future observations in the accelerated failure
time model. We view prediction to be an important inferential goal in a survival analysis and thus it is
important to identify whether particular observations may be influencing the quality of predictions. We
use the Kullback–Leibler divergence as a measure of the discrepancy between the estimated probability
distributions for the full and the case-deleted samples. In particular, we focus on the effect of case deletion
on estimated survival curves, where we regard the survival curve estimate as a vehicle for prediction.
We also develop a diagnostic for assessing the effect of case deletion on inferences for the median time to
failure. The estimated median can be used with both predictive and estimative purposes in mind. We also
discuss the relationship between our suggested measures and the corresponding Cook distance measure,
which was designed with the goal of assessing estimative influence. Several applications of the proposed
diagnostics are presented.
Keywords: Case deletion; Cook's distance; Kullback–Leibler divergence; Survival analysis.
1. INTRODUCTION
Our goal in this paper is to develop statistical methods for the detection of influential observations in
the parametric accelerated failure time (AFT) model. There has been a tremendous interest in developing
such methods in linear and nonlinear regression models, generalized linear models and, more recently,
in failure time or survival models. A standard approach considers the effect that deleting single cases
or subsets of cases has on the estimated regression coefficients. This focus has led to several well known
influence measures, for example Cook's (1977) distance and Belsley et al.'s (1980) DFBETAs. Alternative
approaches, based on the notion of local influence (Cook, 1986; Lawrance, 1991), consider the impact that
perturbing cases, rather than deleting them, has on parameter estimates and inferences.
To whom correspondence should be addressed
© Oxford University Press (2002)
Johnson & Geisser (1983) and Johnson (1985) argued that prediction is the ultimate goal in many
statistical analyses, and thus diagnostics are needed to assess the impact that subsets of the data have on
prediction. We agree with this view, and note that cases that have a large impact on regression coefficients
may have little effect on predictions, and conversely, influential cases for prediction might have little
impact on estimation (see Christensen et al. (1992) for examples in spatial data).
In this paper, we develop case deletion diagnostic measures for identifying influential cases for
prediction of future observations in the AFT model. Having identified such cases, the practical import of
case deletion is assessed by viewing estimated survival curves corresponding to covariate combinations of
interest and comparing them with the corresponding curves based on data with the candidate case having
been removed. If some of the estimated curves of interest change dramatically upon removal of a case,
prognoses based on these curves would correspondingly be affected. Such differences in prognoses should
be of interest to physicians who are informing the corresponding patients about their survival prospects.
The AFT model is often viewed as a competitor to the proportional hazards (PH) model (Cox, 1972)
when the PH model fails to fit. Although the AFT model with an unspecified error distribution is a natural
analog to the PH model, there are no widely accepted methods for implementing this approach. Miller
(1976), Buckley & James (1979), Koul et al. (1981) and Christensen & Johnson (1988) developed methods
for a semi-parametric AFT model, but difficulties inherent in the model made their approaches somewhat
unpalatable. Kuo & Mallick (1997) and Walker & Mallick (1999) present novel Bayesian approaches but
it will take time to determine their long-term viability. We will focus our attention on the parametric AFT
model, which Aalen (2000) noted is underused in medical research and deserving of more attention.
Earlier work on diagnostics for the AFT model, for example Weissfeld & Schneider (1990a,b) and
Escobar & Meeker (1992), considered the effect of single cases and subsets of the data on parameter
estimates and functions of the parameter estimates, such as the median lifetime. We use the
Kullback–Leibler (KL) divergence as a diagnostic measure of the discrepancy between estimated probability
distributions based on full and case-deleted samples. A large Kullback–Leibler number for a specific
case is an indication that substantially different predictive inferences might result if that case were
deleted. There is considerable literature in which the KL divergence is used for purposes similar to ours:
McCulloch (1989), Carlin & Polson (1991), Geisser (1993) and Soo (1994), to name a few. Although we
focus on prediction, we recognize that inference on regression coefficients is an important consideration
in many studies. Consequently, we make detailed comparisons of our diagnostics with Cook's distance,
and illustrate similarities and differences between approaches throughout the paper.
The remainder of the paper is organized as follows. Section 2 discusses the AFT model and defines the
KL divergence. Section 2 also proposes a diagnostic that measures the effect that individual cases have on
the estimated percentiles. Section 3 explores properties of the KL divergence for standard accelerated
failure models, such as the log-normal, the log-logistic, and the Weibull, and relates the divergence
to Cook's distance. Section 4 illustrates the methods using two examples. Section 5 gives concluding
remarks. Technical results are summarized in an Appendix.
2. NOTATION
The AFT model is specified by

$$\log(T_i) = x_i'\beta + \sigma U_i, \quad i = 1, \ldots, n, \qquad U_i \overset{\text{iid}}{\sim} S_0(\cdot), \tag{1}$$

where the $T_i$ are actual, sometimes unobserved, survival times, the $x_i$ are fixed $p \times 1$ vectors of covariates,
$\beta$ is the vector of regression coefficients, $\sigma$ is a scale parameter, and $S_0(\cdot)$ is a known baseline survivor
function. For example, we obtain the usual log-normal survival model if the baseline is standard normal,
the log-logistic model if $S_0$ is logistic, and the Weibull if $S_0$ is the extreme-value distribution. These
baseline survival functions $S_0(u)$ along with their densities $f_0(u)$ are given in Table 1. Using the
change-of-variable formula, the density of $T_i$ is $f_0(z_i)/(\sigma t_i)$, where $z_i = \{\log(t_i) - x_i'\beta\}/\sigma$.

Table 1. Baseline distributions
Baseline distribution    $f_0(u)$                          $S_0(u)$
Normal                   $\frac{1}{\sqrt{2\pi}} e^{-0.5u^2}$    $1 - \Phi(u)$
Logistic                 $e^{u}/(1 + e^{u})^{2}$           $(1 + e^{u})^{-1}$
Extreme value            $e^{(u - e^{u})}$                 $e^{-e^{u}}$

Exponential regression results from choosing the extreme value baseline and setting $\sigma = 1$.
Kalbfleisch & Prentice (1980) discuss the log F baseline which generalizes these distributions and others.
Our approach could be applied to that more general family but, for illustrative purposes, we restrict
ourselves to the three distributions in Table 1.
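As an illustration of model (1), the following Python sketch codes the three baseline distributions of Table 1 and the implied density and survivor function of $T$. The function and dictionary names are hypothetical choices made for this example only, and the code assumes moderate values of the standardized residual so that the exponentials do not overflow.

```python
import math

# Baseline density f0(u) and survivor function S0(u) for each row of Table 1.
def f0_normal(u):
    return math.exp(-0.5 * u * u) / math.sqrt(2.0 * math.pi)

def s0_normal(u):
    return 0.5 * math.erfc(u / math.sqrt(2.0))   # equals 1 - Phi(u)

def f0_logistic(u):
    e = math.exp(-abs(u))                         # the logistic density is symmetric in u
    return e / (1.0 + e) ** 2

def s0_logistic(u):
    return 1.0 / (1.0 + math.exp(u))

def f0_extreme(u):
    return math.exp(u - math.exp(u))

def s0_extreme(u):
    return math.exp(-math.exp(u))

# Hypothetical lookup table: baseline name -> (density, survivor function).
BASELINES = {
    "normal": (f0_normal, s0_normal),
    "logistic": (f0_logistic, s0_logistic),
    "extreme_value": (f0_extreme, s0_extreme),
}

def standardize(t, x, beta, sigma):
    """Standardized residual z = {log(t) - x'beta} / sigma."""
    return (math.log(t) - sum(xj * bj for xj, bj in zip(x, beta))) / sigma

def aft_density(t, x, beta, sigma, baseline):
    """Density of T: f0(z) / (sigma * t)."""
    f0, _ = BASELINES[baseline]
    return f0(standardize(t, x, beta, sigma)) / (sigma * t)

def aft_survival(t, x, beta, sigma, baseline):
    """Survivor function of T: S0(z)."""
    _, s0 = BASELINES[baseline]
    return s0(standardize(t, x, beta, sigma))
```

With the extreme value baseline and $\sigma = 1$, the sketch reproduces exponential regression: for an intercept of $\log 4$, `aft_survival(2.0, [1.0], [math.log(4.0)], 1.0, "extreme_value")` agrees with the exponential survivor function $e^{-2/4}$.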
Consider censored survival data $(t_i, \delta_i, x_i)$, $i = 1, \ldots, n$, where $t_i$ is the random observed time (either
survival or censoring) and $\delta_i$ is the random indicator of noncensoring, that is $\delta_i = 0$ if the $i$th observation
is censored, and 1 otherwise. We assume that the censoring mechanism is independent of the survival
times. The log-likelihood function for $\theta = (\beta', \sigma)'$ is $l(\theta) = \sum_{i=1}^{n} l_i(\theta)$, where

$$l_i(\theta) \equiv l_i(\theta \mid t_i, \delta_i) = \delta_i \log\left\{\frac{1}{\sigma t_i} f_0(z_i)\right\} + (1 - \delta_i)\log S_0(z_i)$$

is the log-likelihood based on the $i$th observation. Let $\hat\theta = (\hat\beta', \hat\sigma)'$ be the maximum likelihood estimate
(MLE) of $\theta$, and define

$$\hat f(t) = \frac{1}{\hat\sigma t} f_0\left(\frac{\log(t) - x'\hat\beta}{\hat\sigma}\right)$$

to be the MLE of the density of $T$ when the covariate vector is $x$.
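The censored log-likelihood can be sketched in a few lines of Python. The example below assumes the extreme value (Weibull) baseline, represents each case as a hypothetical tuple `(t, delta, x)`, and uses function names invented for this illustration.

```python
import math

def log_f0(u):
    """Log density of the extreme value baseline: u - e^u."""
    return u - math.exp(u)

def log_s0(u):
    """Log survivor function of the extreme value baseline: -e^u."""
    return -math.exp(u)

def loglik_case(t, delta, x, beta, sigma):
    """Contribution l_i(theta) of one case with observed time t and indicator delta."""
    z = (math.log(t) - sum(xj * bj for xj, bj in zip(x, beta))) / sigma
    if delta == 1:
        # Uncensored case: log{ f0(z) / (sigma * t) }
        return log_f0(z) - math.log(sigma * t)
    # Censored case: log S0(z)
    return log_s0(z)

def loglik(data, beta, sigma):
    """Full log-likelihood l(theta): the sum of the case contributions."""
    return sum(loglik_case(t, d, x, beta, sigma) for t, d, x in data)
```

For instance, with an intercept of $\log 4$, $\sigma = 1$, one death at $t = 2$ and one censored time at $t = 3$, the two contributions are $\log(0.25) - 0.5$ and $-0.75$.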
2.1 The Kullback–Leibler divergence
Our primary goal is the assessment of the influence that individual cases have on the prediction of future
survival times, and survival probabilities associated with future subjects. For this purpose, we consider
the symmetric KL divergence (Johnson, 1985), which measures the discrepancy between two probability
distributions with densities $g_1$ and $g_2$ as follows: $J(g_1, g_2) = I(g_1, g_2) + I(g_2, g_1)$, where

$$I(g_i, g_j) = \int g_i(t) \log\left\{\frac{g_i(t)}{g_j(t)}\right\} dt.$$

Note that $J(g_1, g_2) \ge 0$ with $J(g_1, g_2) = 0$ if and only if the densities are identical. In particular, we
propose the symmetric divergence $J(\hat f, \hat f_{(i)})$ to measure the discrepancy between the full data estimative
density $\hat f$ and the estimative density

$$\hat f_{(i)}(t) = \frac{1}{\hat\sigma_{(i)} t} f_0\left(\frac{\log(t) - x'\hat\beta_{(i)}}{\hat\sigma_{(i)}}\right),$$

where $\hat\theta_{(i)} = (\hat\beta_{(i)}', \hat\sigma_{(i)})'$ is the MLE when the $i$th case in the sample is held out. More generally, the
collective effect that case $i$ has on predicting the survival times of $m$ future observations with covariate
vectors $x_1, x_2, \ldots, x_m$ is given by

$$\hat D_i = \sum_{j=1}^{m} J(\hat f_j, \hat f_{j(i)}),$$

where $\hat f_j$ and $\hat f_{j(i)}$ are the estimated densities when $x = x_j$. A possible choice for the $x_j$ is the covariate
values $x_1, x_2, \ldots, x_n$ for the $n$ cases in the sample. A large value of $J(\hat f_j, \hat f_{j(i)})$ or $\hat D_i$ indicates that
deletion of case $i$ results in different predicted survival probabilities than if it were retained, possibly
resulting in different inferences or decisions. Johnson (1985) proposed a similar diagnostic for logistic
regression.
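When no closed form is available, the symmetric divergence can be evaluated numerically. The sketch below is one possible approach, not necessarily the integration scheme used for the results in this paper: it works on the scale of $Y = \log T$, which leaves the divergence unchanged because the KL divergence is invariant under one-to-one transformations, and applies the trapezoid rule to $\int (g_1 - g_2)\log(g_1/g_2)\,dy$. The function names, the integration width and the grid size are assumptions of this example.

```python
import math

def f0_normal(u):
    return math.exp(-0.5 * u * u) / math.sqrt(2.0 * math.pi)

def f0_logistic(u):
    e = math.exp(-abs(u))            # symmetric density, stable for large |u|
    return e / (1.0 + e) ** 2

def symmetric_kl(f0, mu1, s1, mu2, s2, width=40.0, n=8001):
    """Trapezoid-rule estimate of J(g1, g2), where g_k(y) = f0((y - mu_k) / s_k) / s_k
    is the density of log T under fit k (mu_k = x'beta_k, s_k = sigma_k)."""
    lo = min(mu1, mu2) - width * max(s1, s2)
    hi = max(mu1, mu2) + width * max(s1, s2)
    h = (hi - lo) / (n - 1)
    tiny = 1e-300                    # skip points where a density has underflowed
    total = 0.0
    for k in range(n):
        y = lo + k * h
        g1 = f0((y - mu1) / s1) / s1
        g2 = f0((y - mu2) / s2) / s2
        if g1 > tiny and g2 > tiny:
            weight = 0.5 if k in (0, n - 1) else 1.0
            total += weight * (g1 - g2) * math.log(g1 / g2)
    return total * h
```

For the normal baseline the result can be checked against the standard closed form for two normal distributions, and the same routine applies unchanged to the logistic baseline, for which no analytic expression is available.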
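To better understand the results, consider the log-normal model for which

$$J(\hat f, \hat f_{(i)}) = I(\hat f, \hat f_{(i)}) + I(\hat f_{(i)}, \hat f) = 0.5\left\{\frac{\hat\sigma^2}{\hat\sigma_{(i)}^2} + \frac{\hat\sigma_{(i)}^2}{\hat\sigma^2} - 2 + (\hat\beta - \hat\beta_{(i)})' x x' (\hat\beta - \hat\beta_{(i)})\left(\frac{1}{\hat\sigma_{(i)}^2} + \frac{1}{\hat\sigma^2}\right)\right\}.$$

A second-order Taylor series expansion of the first part of $J(\hat f, \hat f_{(i)})$ gives

$$J(\hat f, \hat f_{(i)}) \doteq \frac{2(\hat\sigma - \hat\sigma_{(i)})^2}{\hat\sigma^2} + 0.5\,(\hat\beta - \hat\beta_{(i)})' x x' (\hat\beta - \hat\beta_{(i)})\left(\frac{1}{\hat\sigma_{(i)}^2} + \frac{1}{\hat\sigma^2}\right).$$

If we aggregate the KL divergences over the $n$ covariate values $x_1, x_2, \ldots, x_n$ in the sample we get

$$\hat D_i \doteq \frac{2n(\hat\sigma - \hat\sigma_{(i)})^2}{\hat\sigma^2} + 0.5\,(\hat\beta - \hat\beta_{(i)})' X'X (\hat\beta - \hat\beta_{(i)})\left(\frac{1}{\hat\sigma_{(i)}^2} + \frac{1}{\hat\sigma^2}\right), \tag{2}$$

where $X$ is the $n \times p$ design matrix with $i$th row $x_i'$. The diagnostic is the sum of two components
that separately measure the effect of the $i$th case on the estimated scale and regression coefficients.
Alternatively, if we are interested in the prediction of the survival times for $m$ future observations with
covariate values $x_1, x_2, \ldots, x_m$, then (2) holds with $n = m$ and $X = X_f$, where $X_f$ is the $m \times p$ matrix
with $i$th row $x_i'$.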
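The log-normal expressions are simple enough to code directly. The following Python sketch evaluates the exact divergence, its second-order approximation, and the aggregate over a set of covariate vectors; the function names and the numerical values in the usage note are hypothetical and serve only to illustrate the formulas.

```python
def _shift(x, beta, beta_del):
    """x'(beta_hat - beta_hat_(i)) for one covariate vector x."""
    return sum(xj * (b - bd) for xj, b, bd in zip(x, beta, beta_del))

def j_lognormal(x, beta, sigma, beta_del, sigma_del):
    """Exact symmetric KL divergence between full and case-deleted log-normal fits."""
    inv = 1.0 / sigma_del ** 2 + 1.0 / sigma ** 2
    scale_part = sigma ** 2 / sigma_del ** 2 + sigma_del ** 2 / sigma ** 2 - 2.0
    return 0.5 * (scale_part + _shift(x, beta, beta_del) ** 2 * inv)

def j_lognormal_taylor(x, beta, sigma, beta_del, sigma_del):
    """Same divergence with the scale part replaced by its second-order expansion."""
    inv = 1.0 / sigma_del ** 2 + 1.0 / sigma ** 2
    scale_part = 2.0 * (sigma - sigma_del) ** 2 / sigma ** 2
    return scale_part + 0.5 * _shift(x, beta, beta_del) ** 2 * inv

def d_aggregate(X, beta, sigma, beta_del, sigma_del, j=j_lognormal):
    """Aggregate diagnostic: sum of J over the rows of X (observed or future covariates)."""
    return sum(j(x, beta, sigma, beta_del, sigma_del) for x in X)
```

For example, with three covariate rows, $\hat\beta = (1, 0.5)'$, $\hat\sigma = 1$, $\hat\beta_{(i)} = (1.1, 0.45)'$ and $\hat\sigma_{(i)} = 1.02$, the exact and approximate aggregates differ by less than $10^{-4}$, and passing `j=j_lognormal_taylor` reproduces the two-component form of equation (2).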
2.2 Influence on estimated percentiles
The KL divergence $\hat D_i$ measures the influence of individual cases on the survival distribution. The
influence on specific percentiles, say the median survival time, might also be of interest both in terms of
making summary prognoses for specific individuals or for estimating median survival for populations of
individuals with specific covariate combinations. To address this issue, note that the estimated $\alpha$th percentile
for the survival time $T$ of a future observation with covariate vector $x$ is $\hat T_\alpha = \exp(x'\hat\beta + u_\alpha\hat\sigma) =
\exp(x_\alpha'\hat\theta)$, where $u_\alpha$ is the $\alpha$th percentile of the baseline distribution and $x_\alpha = (x', u_\alpha)'$. The estimated
variance of $\hat T_\alpha$ is $\hat T_\alpha^2\, x_\alpha' I(\hat\theta)^{-1} x_\alpha$, where $I(\hat\theta) = -\ddot l(\hat\theta)$ is the observed information matrix evaluated at $\hat\theta$.
An expression for $\ddot l(\hat\theta)$ is given in the Appendix.

The influence that the $i$th case has on the estimated percentile can be measured by

$$\Delta_{\alpha,(i)}(x) = \frac{(\hat T_\alpha - \hat T_{\alpha,(i)})^2}{\hat T_\alpha^2\, x_\alpha' I(\hat\theta)^{-1} x_\alpha},$$

which is a scaled version of the relative change $|\hat T_\alpha - \hat T_{\alpha,(i)}|/\hat T_\alpha$ in the estimated percentile obtained
by holding out the $i$th case. Escobar & Meeker (1992) proposed an analogous measure based on local
influence.
The overall effect that individual observations have on the estimated percentiles for $m$ future
observations with covariate values $x_1, x_2, \ldots, x_m$ is given by

$$\Delta_{\alpha,(i)} = \sum_{j=1}^{m} \Delta_{\alpha,(i)}(x_j).$$

As with $\hat D_i$, a possible choice for the $x_j$ is the covariate values for the $n$ cases in the sample.
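The percentile diagnostic is the squared change in the estimated percentile divided by its estimated variance, which the following Python sketch computes. It assumes the inverse observed information matrix is supplied as a nested list `cov`, ordered as $(\beta', \sigma)'$; the function names and the constant `MEDIAN_U` are hypothetical choices for this example.

```python
import math

# Baseline medians u_0.5: zero for the symmetric baselines, log(log 2) for the
# extreme value baseline, since exp(-exp(u)) = 0.5 gives exp(u) = log 2.
MEDIAN_U = {
    "normal": 0.0,
    "logistic": 0.0,
    "extreme_value": math.log(math.log(2.0)),
}

def percentile_influence(x, u_alpha, theta_full, theta_del, cov):
    """Scaled squared change in the estimated percentile when one case is held out.

    theta_full and theta_del are (beta..., sigma) for the full and case-deleted fits;
    cov is the inverse observed information matrix from the full fit."""
    x_alpha = list(x) + [u_alpha]
    t_full = math.exp(sum(a * b for a, b in zip(x_alpha, theta_full)))
    t_del = math.exp(sum(a * b for a, b in zip(x_alpha, theta_del)))
    k = len(x_alpha)
    quad = sum(x_alpha[r] * cov[r][c] * x_alpha[c] for r in range(k) for c in range(k))
    return (t_full - t_del) ** 2 / (t_full ** 2 * quad)

def aggregate_percentile_influence(X, u_alpha, theta_full, theta_del, cov):
    """Sum of the diagnostic over a set of covariate vectors."""
    return sum(percentile_influence(x, u_alpha, theta_full, theta_del, cov) for x in X)
```

As a small check, if the estimated median moves from 100 to 110 and $x_\alpha' I(\hat\theta)^{-1} x_\alpha = 0.04$, the diagnostic equals $100/(100^2 \times 0.04) = 0.25$.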
3. EXPLORING THE KULLBACK–LEIBLER DIVERGENCE
The KL divergence $J(\hat f, \hat f_{(i)})$ does not have an analytic form for certain AFT models, such as the
log-logistic. We consider two approximations to the KL divergence for such models, one based on numerical
integration, and the other based on a Taylor series expansion.
To develop the Taylor series approximation, consider $I(\hat f, \hat f_{(i)})$. Let $u = \{\log(t) - x'\hat\beta\}/\hat\sigma$, and define
$a = \hat\sigma/\hat\sigma_{(i)}$ and $b = x'(\hat\beta - \hat\beta_{(i)})/\hat\sigma_{(i)}$. Note that $\{\log(t) - x'\hat\beta_{(i)}\}/\hat\sigma_{(i)} = au + b$. Given this notation, a
change of variables gives

$$I(\hat f, \hat f_{(i)}) = -\log(a) + \int f_0(u) \log\left\{\frac{f_0(u)}{f_0(au + b)}\right\} du = -\log(a) + E\{\log f_0(U) - \log f_0(aU + b)\},$$

where $U$ has density $f_0(u)$, and where $a$ and $b$ are constants with respect to the integration. Let
$H(u) \equiv \partial \log f_0(u)/\partial u$ and $G(u) \equiv \partial^2 \log f_0(u)/\partial u^2$. We show in the Appendix that a second-order Taylor series
expansion of $I(\hat f, \hat f_{(i)})$ about the point $(a, b) = (1, 0)$ gives the following approximation:

$$I_A(\hat f, \hat f_{(i)}) = -\log(a) - E[\{(a - 1)U + b\}H(U)] - 0.5\,E[\{(a - 1)U + b\}^2 G(U)].$$

Table 2 gives expressions for $I(\hat f, \hat f_{(i)})$ and $I_A(\hat f, \hat f_{(i)})$ for the three standard AFT models. Analogous
approximations apply to $I(\hat f_{(i)}, \hat f)$, $J(\hat f, \hat f_{(i)})$, and $\hat D_i$. The approximation to $\hat D_i$ will be denoted $\hat D_{A,i}$. It
is our experience that the approximations are accurate and that the approximation error has little impact on
identifying the influential cases. For example, the average relative error in the approximation to $J(\hat f, \hat f_{(i)})$
is estimated to be 2% and 6% for the log-logistic and the Weibull models, respectively, when $(a, b)$ is
uniformly distributed on the rectangle with $\log(2/3) \le \log(a) \le \log(3/2)$ and $-1 \le b \le 1$. The
approximations are exact for the log-normal model.
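To give a sense of how close the second-order approximation is, the following Python sketch codes the exact and approximate expressions of Table 2 for the extreme value (Weibull) baseline, where the exact form involves the gamma function. The function names are hypothetical; the constants are Euler's constant $\gamma$ and $\pi^2/6 + \gamma^2 - 2\gamma$.

```python
import math

EULER_GAMMA = 0.5772156649015329

def kl_exact_extreme(a, b):
    """Exact I for the extreme value baseline (Table 2):
    -log(a) + gamma (a - 1) - b - 1 + e^b Gamma(1 + a)."""
    return (-math.log(a) + EULER_GAMMA * (a - 1.0) - b - 1.0
            + math.exp(b) * math.gamma(1.0 + a))

def kl_taylor_extreme(a, b):
    """Second-order approximation I_A for the extreme value baseline (Table 2)."""
    c = math.pi ** 2 / 6.0 + EULER_GAMMA ** 2 - 2.0 * EULER_GAMMA
    return (a - math.log(a) - 1.0 + 0.5 * c * (a - 1.0) ** 2
            + 0.5 * b * b + (1.0 - EULER_GAMMA) * (a - 1.0) * b)
```

Both expressions vanish at $(a, b) = (1, 0)$, and at $(a, b) = (1.05, 0.1)$ they are roughly 0.0097 and 0.0094, so the discrepancy is a third-order quantity that grows as $(a, b)$ moves away from $(1, 0)$.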
The KL divergence is a function of the MLEs, which must be computed iteratively. A one-step
approximation to the MLEs for the case-deleted samples

$$\hat\theta_{(i)} \doteq \hat\theta + \{\ddot l(\hat\theta) - \ddot l_i(\hat\theta)\}^{-1}\, \dot l_i(\hat\theta)$$

is recommended with the KL divergences (and other diagnostics) to simplify the calculations. We
explore the accuracy of the one-step approximation in Section 4. A general expression for the one-step
approximation is given in the Appendix.
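The one-step formula is a single Newton step on the case-deleted log-likelihood, started at the full-data MLE. As a hedged illustration, the Python sketch below applies it to the simplest special case, intercept-only exponential regression (extreme value baseline with $\sigma = 1$), chosen here because both the full and the case-deleted MLEs have closed forms, $\exp(\hat\beta) = \sum_j t_j / \sum_j \delta_j$, against which the approximation can be compared. The data in the usage note are hypothetical and the function names are invented for this example.

```python
import math

def exponential_mle(times, deltas):
    """Closed-form MLE of the intercept: exp(beta_hat) = sum(t) / sum(delta)."""
    return math.log(sum(times) / sum(deltas))

def one_step_deleted(times, deltas, i):
    """One-step approximation to the MLE with case i held out."""
    beta_hat = exponential_mle(times, deltas)
    w = [t * math.exp(-beta_hat) for t in times]   # e^{z_j} evaluated at the MLE
    score_i = w[i] - deltas[i]                      # first derivative of l_i
    hess_full = -sum(w)                             # second derivative of l
    hess_i = -w[i]                                  # second derivative of l_i
    return beta_hat + score_i / (hess_full - hess_i)

def exact_deleted(times, deltas, i):
    """Exact case-deleted MLE, available in closed form for this model."""
    return exponential_mle(times[:i] + times[i + 1:], deltas[:i] + deltas[i + 1:])
```

With hypothetical times `[2.0, 5.0, 1.0, 8.0, 3.0]` and indicators `[1, 0, 1, 1, 0]`, holding out the fourth case gives a one-step estimate of about 1.694 against an exact value of $\log(5.5) \approx 1.705$; the step moves in the right direction for every case, with the largest errors for the cases whose deletion changes the estimate most.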
3.1 A Comparison of the divergence and Cook's distance
The effect that individual cases have on the parameter estimates can be measured by Cook's distance
$C_i = (\hat\theta - \hat\theta_{(i)})' I(\hat\theta)(\hat\theta - \hat\theta_{(i)})$. Cook's distances for subsets of $\theta$, for example the individual regression
coefficients $\beta_j$, are defined analogously.
Table 2. Exact and approximate KL divergences: $I = I(\hat f, \hat f_{(i)})$, $I_A = I_A(\hat f, \hat f_{(i)})$, where $a = \hat\sigma/\hat\sigma_{(i)}$
and $b = x'(\hat\beta - \hat\beta_{(i)})/\hat\sigma_{(i)}$; $\gamma = 0.577215$ is Euler's constant
Baseline dist.    Divergence
Normal            $I = 0.5(a^2 - 2\log(a) - 1 + b^2)$
                  $I_A = 0.5(a^2 - 2\log(a) - 1 + b^2)$
Logistic          $I = -\log(a) - b - 2 + 2\int_0^{+\infty} \log(1 + e^{b}k^{a})/(1 + k)^2\, dk$
                  $I_A = a - \log(a) - 1 + 0.5(\pi^2/9 - 2/3)(a - 1)^2 + b^2/6$
Extreme value     $I = -\log(a) + \gamma(a - 1) - b - 1 + e^{b}\,\Gamma(1 + a)$
                  $I_A = a - \log(a) - 1 + 0.5(a - 1)^2(\pi^2/6 + \gamma^2 - 2\gamma) + 0.5b^2 + (1 - \gamma)(a - 1)b$
A primary difference between $C_i$ and $\hat D_i$ is that $C_i$ gives different weights to censored and uncensored
cases. The KL divergence, which is used to assess the influence of the observed data on predictions of
future observations, does not distinguish between censored and uncensored values.
To contrast the two diagnostics, we first consider the log-normal model with known variance, for
which $C_i = (\hat\beta - \hat\beta_{(i)})' X'\hat W X (\hat\beta - \hat\beta_{(i)})$, where $\hat W$ is a diagonal matrix with $j$th diagonal element
$\hat w_j = \{\delta_j + (1 - \delta_j)\lambda_0'(\hat z_j)\}/\sigma^2$. Here $\hat z_j = \{\log(t_j) - x_j'\hat\beta\}/\sigma$ is the $j$th standardized residual and $\lambda_0'(\cdot)$
is the derivative of the baseline hazard $\lambda_0(\cdot) = f_0(\cdot)/S_0(\cdot)$.
Using the one-step approximation yields a simple and easily interpretable expression for $C_i$ that
illustrates the distinction between deleting censored and uncensored cases. Let $\hat h_j = \hat w_j x_j'(X'\hat W X)^{-1} x_j$
and define

$$R_i^2 = \left\{\frac{\hat z_i}{\sqrt{1 - \hat h_i}}\right\}^2 \frac{\hat h_i}{1 - \hat h_i} \quad\text{and}\quad S_i^2 = \left\{\frac{\lambda_0(\hat z_i)}{\sqrt{\lambda_0'(\hat z_i)(1 - \hat h_i)}}\right\}^2 \frac{\hat h_i}{1 - \hat h_i}.$$

Then $C_i \doteq \delta_i R_i^2 + (1 - \delta_i)S_i^2$, which reduces to the standard Cook (1977) statistic when there is no
censoring. An uncensored case with $\hat z_i = 0$ is not influential for $\beta$ whereas a censored case can be.
Censored cases with large negative residuals $\hat z_i$ have little influence because $S_i^2 \approx 0$, but censored cases
with large positive residuals have the same effect as an uncensored case because $\{\lambda_0^2(\hat z_i)/\lambda_0'(\hat z_i)\}/\hat z_i^2 \to 1$
as $\hat z_i \to \infty$.
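The limiting behaviour of the censored and uncensored components can be checked numerically. The Python sketch below evaluates the one-step expression for the standard normal baseline, using the identity $\lambda_0'(z) = \lambda_0(z)\{\lambda_0(z) - z\}$, which follows from differentiating $f_0/S_0$. The function names are hypothetical, and the code is intended for moderate residuals only (for very large negative $z$ the density underflows).

```python
import math

def normal_hazard(z):
    """Baseline hazard lambda_0(z) = f_0(z) / S_0(z) for the standard normal."""
    f = math.exp(-0.5 * z * z) / math.sqrt(2.0 * math.pi)
    s = 0.5 * math.erfc(z / math.sqrt(2.0))
    return f / s

def normal_hazard_deriv(z):
    """Derivative lambda_0'(z) = lambda_0(z) * {lambda_0(z) - z}."""
    lam = normal_hazard(z)
    return lam * (lam - z)

def cook_one_step(z, h, delta):
    """One-step Cook's distance: R_i^2 if uncensored (delta = 1), S_i^2 if censored."""
    leverage = h / (1.0 - h) ** 2
    if delta == 1:
        return z * z * leverage
    return normal_hazard(z) ** 2 / normal_hazard_deriv(z) * leverage
```

With $\hat h_i = 0.2$, an uncensored case with zero residual has distance 0 while a censored case with zero residual has distance $0.2/0.8^2 = 0.3125$; at $\hat z_i = -5$ the censored value is negligible, and at $\hat z_i = 8$ the censored and uncensored values agree to within about 5%.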
In comparison, the KL divergence (aggregated over the observed covariate values) is

$$\hat D_i = \frac{1}{\sigma^2}(\hat\beta - \hat\beta_{(i)})' X'X (\hat\beta - \hat\beta_{(i)}) = C_i + \sum_j \left(\frac{1}{\sigma^2} - \hat w_j\right)(\hat Y_j - \hat Y_{j(i)})^2,$$

where $\hat Y_j = x_j'\hat\beta$ and $\hat Y_{j(i)} = x_j'\hat\beta_{(i)}$ are the predicted values (on a log scale). An uncensored case has
$\hat w_j = 1/\sigma^2$ so only censored cases contribute to the second term in $\hat D_i$, which simplifies to give

$$\hat D_i = C_i + \frac{1}{\sigma^2}\sum_{\delta_j = 0} \{1 - \lambda_0'(\hat z_j)\}(\hat Y_j - \hat Y_{j(i)})^2 \ge C_i.$$

The inequality follows because $\lambda_0'(\cdot)$ is increasing and bounded between 0 and 1. If there is no censoring
then $\hat D_i = C_i$. However, the second term in $\hat D_i$ could be large if a substantial percentage of cases are
censored, and if holding out the $i$th case has a noticeable effect on the predicted values for the censored
cases, especially if the censored cases have large negative residuals. Under these conditions, $\hat D_i$ and $C_i$
may highlight different cases as influential.
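The decomposition of the aggregated divergence into Cook's distance plus a nonnegative censoring term can be illustrated with a short Python sketch for the log-normal model with known $\sigma$. The inputs (standardized residuals, censoring indicators, and predicted values from the full and case-deleted fits) and the function names are hypothetical and are assumed to be supplied by the user.

```python
import math

def normal_hazard_deriv(z):
    """lambda_0'(z) = lambda_0(z) * {lambda_0(z) - z} for the standard normal baseline."""
    f = math.exp(-0.5 * z * z) / math.sqrt(2.0 * math.pi)
    lam = f / (0.5 * math.erfc(z / math.sqrt(2.0)))
    return lam * (lam - z)

def cook_and_kl(z, delta, y_full, y_del, sigma):
    """Return (C_i, D_i) for known sigma from residuals z_j, indicators delta_j and
    predicted log-scale values from the full (y_full) and case-deleted (y_del) fits."""
    c = 0.0
    d = 0.0
    for zj, dj, yf, yd in zip(z, delta, y_full, y_del):
        # Weight w_j: 1/sigma^2 if uncensored, lambda_0'(z_j)/sigma^2 if censored.
        w = (1.0 if dj == 1 else normal_hazard_deriv(zj)) / sigma ** 2
        change = (yf - yd) ** 2
        c += w * change
        d += change / sigma ** 2
    return c, d
```

Because the censored weights lie strictly between 0 and $1/\sigma^2$, the second value returned is never smaller than the first, and the two coincide when every case is uncensored.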
If $\sigma$ is unknown and there is no censoring then the information matrix is block diagonal under the
log-normal model (i.e. $\hat\beta$ and $\hat\sigma$ are independent) and

$$C_i = \frac{1}{\hat\sigma^2}\{2n(\hat\sigma - \hat\sigma_{(i)})^2 + (\hat\beta - \hat\beta_{(i)})' X'X (\hat\beta - \hat\beta_{(i)})\} \doteq \hat D_i.$$

In this situation $C_i$ and $\hat D_i$ should identify the same cases as potentially influential. If there is censoring,
$C_i$ includes a cross-product term in $(\hat\sigma - \hat\sigma_{(i)})(\hat\beta - \hat\beta_{(i)})$, which makes a comparison with $\hat D_i$ difficult,
regardless of the error distribution.
3.2 Two illustrations of differences between the diagnostics
It is difficult to characterize when $C_i$ and $\hat D_i$ will disagree. We speculate that the potential for differences
is tied to the degree of censoring. We present two brief analyses which show clear differences between
$\hat D_i$ (aggregated over the observed covariate values) and $C_i$ in data sets with heavy censoring. In Section 4,
$\hat D_i$ is tailored to examine the influence on predictions within a subset of the design space, and in such
settings, discrepancies with $C_i$ should not be surprising, regardless of the degree of censoring.
As a first example, consider the following hypothetical sample of 19 lifetimes: 1.5^0, 24, 42,
43^0, 77, 89^0, 105, 194, 270, 290^0, 309^0, 325^0, 446, 503^0, 561^0, 643^0, 1457^0, 2060^0, 2879,
where censored lifetimes are marked with a superscript 0. Figure 1 gives index plots of the exact $\hat D_i$ and $C_i$ for a
regression model with an intercept and scale. Plots are given for the three standard models. The diagnostics
were normalized to have a maximum of one within each series. Observations 2 and 18 are highlighted in
the plots. Observation 18 has the second highest lifetime, and the highest lifetime among the censored
cases, whereas observation 2 has the minimum lifetime among the uncensored cases.
Focusing on the log-normal model, observation 18 has a much larger impact on $\hat D_i$ than it has on
$C_i$, probably because of the extensive censoring and the marked decrease in the estimated mean (the
predicted value for each case) and scale that is obtained by holding this case out. Cook's distance suggests
that observation 2 is much more influential. Similar results were found for the log-logistic model. For the
Weibull model, observation 18 has the largest $\hat D_i$ and $C_i$, but observation 2, which has nearly the same
Cook's distance as observation 18, has little effect as measured by the KL divergence.
As a second example, we consider data from a study of risk factors associated with coronary heart
disease; see Roseman et al. (1975) and Selvin (1995, p. 436). The study involves 35 adult white males
with high cholesterol levels (above 340 mg per 100 ml). The response is the time in days until a coronary
event. Only eight of the responses are not censored.
We fitted a log-normal model to the data, using body weight, cholesterol level and cigarette consumption
(cigarettes per day) as predictors. Cook's distance and the KL divergence identify observations 15,
19, and 35 as the three most influential cases, but order the observations differently; see Table 3. Each
of these cases is uncensored. Observation 35, which corresponds to an individual who had an extremely
short response time given his relatively low cholesterol level and cigarette consumption, has the greatest
impact on prediction. Observation 19, which corresponds to an individual with a cholesterol level of 645
(the second highest level is 400), has the greatest influence on the parameter estimates. Our analysis in
Section 3.1 suggests that this discrepancy between $C_i$ and $\hat D_i$ might be tied to the differential effect that
observations 19 and 35 have on the predicted values for the censored cases. Indeed, these predicted values
change noticeably when case 35 (or case 15) is held out, but change little when case 19 is removed.
Fig. 1. Index plots for test data. The solid line is the KL divergence. The dots are the Cook distances. (a) Log normal;
(b) Log logistic; (c) Weibull. Each panel plots the normalized diagnostics against case number.
Table 3. Influential cases in the coronary study
Observation    $C_i$     $\hat D_i$
15             1.05      27.38
19             11.14     13.54
35             7.57      193.85
4. EXAMPLES
4.1 Time Until Abortion in Dairy Cattle
We consider data from a study designed by Dr Mark Thurmond and Dr Sharon Hietala of the University
of California, Davis to examine factors that might affect the time to natural abortion in dairy cattle. The
data set includes the time in days from conception to abortion for 45 aborted cows, and information on
two covariates: IS, an indicator of infection status (IS = 1 corresponds to cows infected with Neospora
caninum whereas IS = 0 corresponds to non-infected cows), and days open (DO), that is, the number of
days between the most recent previous birth and conception. There are no censored observations in the
data set.
The scientists believed that infection status would be the more important variable in terms of predicting
time to abortion, with abortion occurring later in infected animals. They also expected that increasing DO
would slightly increase the time to abortion. The primary goal of this study is to characterize and quantify
the effect of N. caninum on the time to abortion. This is part of an ongoing investigation of N. caninum and
other infectious abortifacients with the ultimate goal of reducing fetal wastage, see Thurmond & Hietala
(1997).
Even though it is known that Neospora infection is a causal agent for abortion, it is of interest to
quantify the distribution of times to abortion among infected and non-infected animals. The ability to
predict the time of abortion for particular animals is of interest since management policy can then be
adapted to take advantage of this knowledge. In particular, knowing that an animal is or is not infected,
and having the corresponding estimated survival curve, the dairy farmer can take preventive measures at
appropriate times before the abortion is expected. Since infected animals are known to abort later than non-
infected animals, the timing of intervention strategies will differ based on knowledge of type of animal.
Thus, if removing a particular case has a large impact on the resulting estimated survival curve, the end
result is a potentially harmful effect on the dairy farmer's prognostic ability, which ultimately affects the
efficacy of his/her intervention strategy.
We considered a log-logistic model for the time to abortion. This model provided a much better fit
(based on minus twice the log-likelihood) than the Weibull model, and a marginally better fit than the
log-normal model. The MLEs for the intercept, scale, and the regression coefficients for IS and DO are
3.971, 0.257, 0.371, and 0.0030, with standard errors of 0.150, 0.031, 0.131, and 0.0012, respectively.
At the 5% level, the regression coefficients for the two predictors are significantly different from zero,
with IS having the larger z-statistic (2.82 for IS versus 2.41 for DO). These results are consistent with our
expectations.
Figures 2(a)–(d) give index plots of $C_i$, $\hat D_i$, $\hat D_{A,i}$, and $\Delta_{0.5,(i)}$, respectively, for the 45 observations.
Each figure gives a plot based on the exact MLEs for $\hat\theta_{(i)}$ and a plot based on the one-step approximation
to $\hat\theta_{(i)}$. The diagnostics $\hat D_i$, $\hat D_{A,i}$ and $\Delta_{0.5,(i)}$ were aggregated over the observed covariate values.
$\hat D_i$ was evaluated using numerical integration. The diagnostics that use the one-step approximation to
$\hat\theta_{(i)}$ are accurate, but tend to overstate the influence of the most extreme cases. Furthermore, the difference
between $C_i$ and $\hat D_i$ is small across cases, and the second-order approximation to $\hat D_i$ is extremely accurate,
regardless of which version of $\hat\theta_{(i)}$ is considered.
Each of these diagnostics identifies observations 10, 26 and 43 as the most influential cases. These
three observations have the first, second, and fourth longest times to abortion (259, 254, and 149 days),
respectively, but correspond to uninfected cows with low numbers of days open (46, 61, and 54 days).
Observation 10 has little influence on predictive ability. Consideration of Figure 3 shows that the estimated
survival curves for cows with DO = 220 and IS = 0 based on the full and case-10-deleted data are
virtually indistinguishable. The corresponding effect on the curve with IS = 1 is probably not of practical
importance in terms of dairy management strategy (the difference in estimated medians was six days).
The parameter estimates changed by less than one-half a standard error when this case was removed, and
the standard errors changed only slightly, so inferences about $\theta$ are also not sensitive to whether case 10
is included in the analysis. Similar conclusions were reached when observation 26 was held out.
A particular interest in the study is to characterize the effect that infection has upon the time to abortion
for cows that are initially unable to conceive for an extended period of time. Figure 3(a) is particularly
relevant to this purpose, and comparison with Figure 3(b) establishes that deleting case 10 is not a cause
for concern. Pursuing this particular interest further, Figures 2(e) and (f) give index plots of $\hat D_i$ and $\Delta_{0.5,(i)}$
assuming that we are interested in predicting the survival time for two cows left open for 220 days, one
of which is infected, and the other of which is not. Observation 22 is the most influential case here,
followed by observation 21. Observation 22 is an uninfected cow with 219 days open that aborted after
70 days. Figure 3(c) gives the estimated survival curves after holding out observation 22. The change in
survival curves and medians is more noticeable than when observation 10 was omitted. Indeed, the index
plot for $\Delta_{0.5,(i)}$ suggests that observation 10 should have relatively little effect on the estimated median
survival times at these two locations. This analysis emphasizes the point that the most influential cases
for prediction potentially depend on the prediction region and are not necessarily the most influential for
estimation. The estimated survival probabilities are slightly higher after observation 22 is deleted, and the
increase in estimated median times to abortion is marginally large enough to be of practical importance.

Fig. 2. Index plots for the cow abortion data. The solid line is based on exact MLEs. The dots are based on one-step
MLEs. (a) Cook's distance $C_i$; (b) prediction diagnostic $\hat D_i$; (c) second-order approximation to $\hat D_i$; (d) diagnostic for
median survival; (e) $\hat D_i$ for prediction at two locations; (f) diagnostic for median survival at two locations.
Each panel plots the diagnostic against observation number.
Fig. 3. Estimated fetal survival curves in cow abortion data with DO = 220. (a) Full data; (b) holding out observation
10; (c) holding out observation 22. Each panel plots probability against fetal age in days for IS = 0 and IS = 1; the
values annotated in the panels are 102 and 148 in (a), 103 and 154 in (b), and 112 and 158 in (c).
There are only three cows in the sample that had not conceived by 200 days. It is not too surprising then
that one or more of these observations might be influential for estimating the survival curves at 220 days
open. Because a goal of the study is to estimate the survival distributions at such extreme covariate values,
we should consider collecting more data on cows with extended open periods as a means of minimizing
the potential impact that any one observation might have on the predictions.
4.2 Ovarian cancer data
Edmunson et al. (1979) designed a study to assess the effectiveness of various chemotherapy treatments
for women with ovarian cancer. The trial involved 26 women who had minimal residual disease after
having undergone surgery to excise all tumors greater than 2 cm in diameter. The response variable was
the survival time in days following randomization to one of the two chemotherapy treatments that either
used cyclophosphamide alone (Treatment = 1) or used cyclophosphamide combined with adriamycin
(Treatment = 2). In addition to the treatment, possible prognostic factors associated with the survival
times are a patient's age, whether the residual disease was completely or partially excised, and a patient's
performance status at the start of the trial (good or poor). Clearly one of the important goals of such a
study ought to be to develop survival curve estimates that can be used as prognostic tools for guiding
the treatment of individual patients, and furthermore, to help an individual patient to make life decisions
based on their knowledge of the estimated curve that corresponds to them. Thus, it should be of great
interest to know if removal of a case appreciably alters these estimated curves.

Fig. 4. Index plots for the ovarian cancer data. The solid line is based on exact MLEs. The dots are based on
one-step MLEs. (a) Cook's distance $C_i$; (b) Cook's distance $C_i$ for treatment coefficient; (c) Cook's distance $C_i$ for age
coefficient; (d) prediction diagnostic $\hat D_i$. Each panel plots the diagnostic against observation number.
Collett (1994) fitted a Weibull model to these data and concluded that a patient's age and treatment
were the only significant predictors of survival time. The MLEs for the intercept, scale, and the regression
coefficients for treatment and age are 10.425, 0.549, 0.561, and −0.079, with standard errors of 1.434,
0.129, 0.340, and 0.020, respectively. Figures 4(a)–(d) give index plots of $C_i$, $C_i$(Trt), $C_i$(Age) and
$\hat D_i$ for the 26 observations ordered by increasing survival time. $C_i$(Trt) and $C_i$(Age) are Cook's distances for
the treatment and age regression coefficients, respectively. $\hat D_i$ was aggregated over the observed covariate
values. As in the analysis of times to abortion, the diagnostics that use the one-step approximation identify
the most influential cases but tend to overstate their influence. In contrast to the previous analysis, $C_i$ and
$\hat D_i$ differ noticeably, which, to a certain degree, reflects the extensive censoring (46%) in the sample.
Although not given here, we note that the second-order approximation to $\hat D_i$ was accurate.
Figure 4(d) shows that observations 26, 5, and 4 are the most influential cases for prediction. The
same three cases are highlighted by the case-deletion diagnostic for the median (subscript $0.5,(i)$),
except that observation 5 had a slightly greater influence on the median survival times than the other
two cases. Observation 26 corresponds to a 59 year old woman given cyclophosphamide combined with
adriamycin (Treatment = 2) who was censored at 1227 days. Observation 4 corresponds to a 74 year old
woman given cyclophosphamide alone (Treatment = 1) who survived 268 days. These two cases have the
longest survival time and highest age, respectively, in the data set. Observation 5 corresponds to a
43 year old woman given Treatment 1 who survived 329 days.

[Figure 5: four panels (a)-(d), each plotting survival probability against days (0 to 1500) for TRT=1 and
TRT=2. Estimated median survival times in days marked in each panel (TRT=2, TRT=1): (a) 1015, 579;
(b) 1060, 551; (c) 1035, 716; (d) 808, 578.]
Fig. 5. Estimated survival curves in ovarian cancer data when age = 56. (a) Full data; (b) holding out
observation 4; (c) holding out observation 5; (d) holding out observation 26.

Some interesting trends emerge from examining the divergences $J(\hat f, \hat f_{(i)})$ for predicting
individual survival times. We found that observation 26 is the most influential case for predicting the
survival time of future patients who are given the combined treatment, regardless of their age.
Observations 5 and 4 are the most influential cases for predicting the survival time of patients treated
with cyclophosphamide for whom Age $\le 62$ and Age $> 62$, respectively. Figure 5, which gives estimated
survival curves for a 56 year old woman, illustrates most of these trends. In particular, holding out
observation 26 shifts the survival curve for Treatment 2 considerably to the left, reducing the estimated
median survival time from 1015 to 808. Deleting observation 5 shifts the survival curve for Treatment 1
considerably to the right. Holding out observation 4 has little effect on the survival curves for 56 year olds.
Observations 26 and 4 have the largest influence on $\hat\theta$. The relative impact of these cases on the
regression coefficients is isolated to a single predictor, with observations 26 and 4 having a large potential
impact on the treatment effect and age effect, respectively. Using the full data estimates, the median
survival time for patients given the combined treatment is 75% higher ($\exp(0.561) = 1.75$) than the
median survival time for patients given cyclophosphamide alone, regardless of their age. If observation 26
is held out, the relative increase in median survival time is reduced to 40% ($\exp(0.336) = 1.40$). The age
coefficient is $-0.096$ when observation 4 is held out, so this case has a smaller impact on the age effect
than observation 26 has on the treatment effect.
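
The median comparison above can be checked directly from the quoted estimates. The following is a minimal sketch, assuming the Weibull AFT form $\log T = \beta_0 + \beta_{\mathrm{Trt}}\,\mathrm{Trt} + \beta_{\mathrm{Age}}\,\mathrm{Age} + \sigma U$ with $U$ standard extreme value, so that the median of $U$ is $\log(\log 2)$. The function and variable names are illustrative and do not come from the original analysis.

```python
import math

# Full-data MLEs quoted in the text (Weibull AFT model on the log-time scale).
INTERCEPT, SCALE = 10.425, 0.549
B_TRT, B_AGE = 0.561, -0.079  # negative age effect: older patients fail sooner


def median_survival(trt, age, intercept=INTERCEPT, scale=SCALE,
                    b_trt=B_TRT, b_age=B_AGE):
    """Median of T when log T = location + scale * U, U standard extreme value.

    The median of U solves exp(-exp(u)) = 0.5, that is u = log(log 2).
    """
    location = intercept + b_trt * trt + b_age * age
    return math.exp(location + scale * math.log(math.log(2.0)))


m1 = median_survival(trt=1, age=56)  # cyclophosphamide alone
m2 = median_survival(trt=2, age=56)  # combined treatment
ratio = m2 / m1                      # equals exp(b_trt) for every age

print(f"median TRT=1: {m1:.0f} days, median TRT=2: {m2:.0f} days, ratio: {ratio:.2f}")
```

Because the coefficients are rounded to three decimals, the computed medians agree with the values marked in Figure 5(a) only up to rounding. The ratio does not depend on age, since the age term cancels.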
5. CONCLUDING REMARKS
We proposed the KL divergence as a case deletion diagnostic for prediction of future observations
in the accelerated failure time model. We developed simple approximations to the divergence, and
showed that the approximations were accurate in two examples. A concern with plug-in methods, such
as the prediction diagnostics considered here, is that they fail to account for uncertainty in the estimated
densities. A Bayesian analysis using predictive densities is a natural way to account for the uncertainty
in $\hat\theta$ and $\hat f$. However, given that the predictive density for the survival time of a future
observation is reasonably approximated by the estimative density $\hat f(t)$, we expect that our diagnostic
will identify the same cases that would be identified as potentially influential were this uncertainty
taken into account.
ACKNOWLEDGEMENTS
The work of Wes Johnson and Mark Thurmond was supported in part by USDA NRI grant no.
98-2517.
APPENDIX
A1. Derivation of the approximation to the KL divergence
Write $I(\hat f, \hat f_{(i)}) = -\log(a) + g(a, b)$, where
$$g(a, b) = \int_{-\infty}^{+\infty} f_0(u)\{\log f_0(u) - \log f_0(au + b)\}\,du.$$
If $k$ and $l$ are either 0 or 1 with $k + l = 1$, then
$$\partial^{k+l} g(a, b)/\partial a^k \partial b^l = -\int_{-\infty}^{+\infty} u^k f_0(u) H(au + b)\,du = -E\{U^k H(aU + b)\},$$
whereas if $k$ and $l$ are either 0, 1, or 2 with $k + l = 2$, then
$$\partial^{k+l} g(a, b)/\partial a^k \partial b^l = -\int_{-\infty}^{+\infty} u^k f_0(u) G(au + b)\,du = -E\{U^k G(aU + b)\}.$$
A second-order Taylor series expansion of $g(a, b)$ about the point $(a, b) = (1, 0)$ gives
$$g(a, b) \doteq g(1, 0) - (a - 1)E\{UH(U)\} - bE\{H(U)\}
 - 0.5[(a - 1)^2 E\{U^2 G(U)\} + 2(a - 1)b\,E\{UG(U)\} + b^2 E\{G(U)\}].$$
Noting that $g(1, 0) = 0$ we get the approximation given in Section 3:
$$I(\hat f, \hat f_{(i)}) \doteq I_A(\hat f, \hat f_{(i)})
 = -\log(a) - E[\{(a - 1)U + b\}H(U)] - 0.5E[\{(a - 1)U + b\}^2 G(U)].$$
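
The approximation $I_A$ can be evaluated numerically for any baseline density. The sketch below does this with a simple trapezoid rule, assuming (consistently with the Weibull expressions in A2) that $H(u) = d\log f_0(u)/du$ and $G(u) = dH(u)/du$. It uses the standard normal baseline as a check: there $\log f_0$ is quadratic, so the second-order expansion is exact and both quantities equal $-\log(a) + (a^2 + b^2 - 1)/2$. The function names, integration limits and step count are illustrative choices, not part of the original derivation.

```python
import math


def expectation(func, log_f0, lo=-12.0, hi=12.0, steps=24000):
    """Trapezoid-rule approximation of E{func(U)} for U with density exp(log_f0)."""
    h = (hi - lo) / steps
    total = 0.0
    for k in range(steps + 1):
        u = lo + k * h
        weight = 0.5 if k in (0, steps) else 1.0
        total += weight * func(u) * math.exp(log_f0(u))
    return total * h


def kl_second_order(a, b, log_f0, H, G):
    """I_A = -log(a) - E[{(a-1)U + b} H(U)] - 0.5 E[{(a-1)U + b}^2 G(U)]."""
    lin = lambda u: ((a - 1.0) * u + b) * H(u)
    quad = lambda u: ((a - 1.0) * u + b) ** 2 * G(u)
    return -math.log(a) - expectation(lin, log_f0) - 0.5 * expectation(quad, log_f0)


def kl_exact(a, b, log_f0):
    """I = -log(a) + E{log f0(U) - log f0(aU + b)}, by the same quadrature."""
    diff = lambda u: log_f0(u) - log_f0(a * u + b)
    return -math.log(a) + expectation(diff, log_f0)


# Standard normal baseline (log-normal model): H(u) = -u and G(u) = -1.
log_phi = lambda u: -0.5 * u * u - 0.5 * math.log(2.0 * math.pi)
a, b = 1.2, -0.3
approx = kl_second_order(a, b, log_phi, H=lambda u: -u, G=lambda u: -1.0)
exact = kl_exact(a, b, log_phi)
closed_form = -math.log(a) + 0.5 * (a * a + b * b - 1.0)

print(approx, exact, closed_form)
```

For baselines whose log density is not quadratic, such as the Weibull case in A2, the two functions differ by terms of third order in $(a - 1, b)$.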
A2. Evaluation of the KL divergence for the Weibull model
We sketch the derivations of the exact and approximate KL divergences for the Weibull model. The results
given in Table 2 for the log-normal and log-logistic follow from similar calculations.
For the Weibull, $\log f_0(u) = u - e^u$, so $H(u) = 1 - e^u$ and $G(u) = -e^u$. Also, with $\gamma$ identifying
Euler's constant, $E(U) = -\gamma \doteq -0.577215$, $E(e^U) = 1$, $E(Ue^U) = 1 - \gamma$, and
$E(U^2 e^U) = \pi^2/6 + \gamma^2 - 2\gamma$, see Lawless (1982). A short calculation then gives
$$I_A(\hat f, \hat f_{(i)}) = (a - \log(a) - 1) + (\pi^2/6 + \gamma^2 - 2\gamma)(a - 1)^2/2 + 0.5b^2 + (1 - \gamma)(a - 1)b.$$
To get the exact KL divergence, note that
$$\log f_0(u) - \log f_0(au + b) = -(a - 1)u - b - e^u + e^{au+b}$$
and $E(e^{aU}) = \Gamma(1 + a)$, and thus
$$I(\hat f, \hat f_{(i)}) = -\log(a) + (a - 1)\gamma - b - 1 + e^b\,\Gamma(1 + a).$$
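
The two Weibull expressions can be compared numerically. This is a minimal sketch using only the closed forms above; the function names and the chosen values of $(a, b)$ are illustrative.

```python
import math

EULER_GAMMA = 0.5772156649015329


def kl_weibull_exact(a, b):
    """Exact divergence: -log(a) + (a-1)*gamma - b - 1 + exp(b)*Gamma(1+a)."""
    return (-math.log(a) + (a - 1.0) * EULER_GAMMA - b - 1.0
            + math.exp(b) * math.gamma(1.0 + a))


def kl_weibull_approx(a, b):
    """Second-order approximation I_A for the Weibull baseline."""
    m2 = math.pi ** 2 / 6.0 + EULER_GAMMA ** 2 - 2.0 * EULER_GAMMA  # E(U^2 e^U)
    return ((a - math.log(a) - 1.0) + m2 * (a - 1.0) ** 2 / 2.0
            + 0.5 * b ** 2 + (1.0 - EULER_GAMMA) * (a - 1.0) * b)


for a, b in [(1.0, 0.0), (1.02, -0.03), (1.1, 0.1), (1.3, 0.3)]:
    exact = kl_weibull_exact(a, b)
    approx = kl_weibull_approx(a, b)
    print(f"a={a:.2f} b={b:+.2f} exact={exact:.6f} approx={approx:.6f}")
```

Both expressions vanish at $(a, b) = (1, 0)$, and their difference grows as $(a, b)$ moves away from that point, as expected for a second-order expansion.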
A3. Observed information matrix and one-step approximations to MLEs
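Let $z_j = \{\log(t_j) - x_j'\beta\}/\sigma$, $c_j = (z_j a_j - \delta_j)/\sigma$ and $v_j = a_j/\sigma$, where
$$a_j = -\delta_j H(z_j) + (1 - \delta_j)\lambda_0(z_j),$$
and $\lambda_0(z_j) = f_0(z_j)/S_0(z_j)$ is the baseline hazard function. Also, define
$w_j = (\partial a_j/\partial z_j)/\sigma^2$, $d_j = a_j/\sigma^2 + z_j w_j$, and
$e_j = (2z_j a_j - \delta_j)/\sigma^2 + z_j^2 w_j$. Then the score function and observed information
matrix are given by
$$\dot l(\theta) = \sum_{j=1}^{n} \dot l_j(\theta)
 = \begin{pmatrix} \partial l/\partial\beta \\ \partial l/\partial\sigma \end{pmatrix}
 = \begin{pmatrix} \sum_{j=1}^{n} v_j x_j \\ \sum_{j=1}^{n} c_j \end{pmatrix}$$
and $-\ddot l(\theta)$, respectively, where
$$-\ddot l(\theta) = -\sum_{j=1}^{n} \ddot l_j(\theta)
 = -\begin{pmatrix} \partial^2 l/\partial\beta\,\partial\beta' & \partial^2 l/\partial\beta\,\partial\sigma \\
                    \partial^2 l/\partial\sigma\,\partial\beta' & \partial^2 l/\partial\sigma^2 \end{pmatrix}
 = \begin{pmatrix} \sum_{j=1}^{n} w_j x_j x_j' & \sum_{j=1}^{n} d_j x_j \\
                   \sum_{j=1}^{n} d_j x_j' & \sum_{j=1}^{n} e_j \end{pmatrix},$$
see Kalbfleisch & Prentice (1980, page 55).
A one-step approximation to $\hat\theta_{(i)}$ is given by
$\hat\theta_{(i)} \doteq \hat\theta + \{\ddot l(\hat\theta) - \ddot l_i(\hat\theta)\}^{-1}\,\dot l_i(\hat\theta)$,
which reduces to
$$\begin{pmatrix} \hat\beta_{(i)} \\ \hat\sigma_{(i)} \end{pmatrix}
 \doteq \begin{pmatrix} \hat\beta \\ \hat\sigma \end{pmatrix}
 - \begin{pmatrix} \sum_{j \ne i} \hat w_j x_j x_j' & \sum_{j \ne i} \hat d_j x_j \\
                   \sum_{j \ne i} \hat d_j x_j' & \sum_{j \ne i} \hat e_j \end{pmatrix}^{-1}
   \begin{pmatrix} \hat v_i x_i \\ \hat c_i \end{pmatrix},$$
where, for example, $\hat w_j = w_j(\hat\theta)$.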
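
The one-step formula can be illustrated in the simplest special case. The sketch below assumes a Weibull baseline with an intercept only ($x_j = 1$), so that $a_j = e^{z_j} - \delta_j$, $\partial a_j/\partial z_j = e^{z_j}$ and the information matrix is $2 \times 2$. The data are synthetic, and the exact MLE is found by bisection on the profile equation for $1/\sigma$; these are illustrative choices and not the computations used in the paper.

```python
import math


def case_terms(y, delta, beta, sigma):
    """Per-case v, c, w, d, e for the Weibull baseline with x_j = 1, y = log t."""
    z = (y - beta) / sigma
    ez = math.exp(z)
    a = ez - delta            # -delta*H(z) + (1 - delta)*lambda_0(z), H(z) = 1 - e^z
    w = ez / sigma ** 2       # (da/dz) / sigma^2
    v = a / sigma
    c = (z * a - delta) / sigma
    d = a / sigma ** 2 + z * w
    e = (2.0 * z * a - delta) / sigma ** 2 + z ** 2 * w
    return v, c, w, d, e


def fit_mle(ys, deltas):
    """Exact MLE (beta, sigma) via bisection on the profile equation in k = 1/sigma."""
    r = sum(deltas)
    ybar = sum(y * dl for y, dl in zip(ys, deltas)) / r
    ymax = max(ys)

    def h(k):  # increasing in k, with a single root at k = 1/sigma_hat
        wts = [math.exp(k * (y - ymax)) for y in ys]
        return sum(y * q for y, q in zip(ys, wts)) / sum(wts) - 1.0 / k - ybar

    lo, hi = 1e-3, 1e3
    for _ in range(200):
        mid = 0.5 * (lo + hi)
        if h(mid) < 0.0:
            lo = mid
        else:
            hi = mid
    k = 0.5 * (lo + hi)
    s = sum(math.exp(k * (y - ymax)) for y in ys)
    return ymax + (math.log(s) - math.log(r)) / k, 1.0 / k


def one_step_deleted(ys, deltas, i, beta, sigma):
    """One-step approximation to the MLE with case i deleted."""
    terms = [case_terms(y, dl, beta, sigma) for y, dl in zip(ys, deltas)]
    W = sum(t[2] for j, t in enumerate(terms) if j != i)
    D = sum(t[3] for j, t in enumerate(terms) if j != i)
    E = sum(t[4] for j, t in enumerate(terms) if j != i)
    v_i, c_i = terms[i][0], terms[i][1]
    det = W * E - D * D
    # Solve [[W, D], [D, E]] step = (v_i, c_i) and subtract the step.
    return beta - (E * v_i - D * c_i) / det, sigma - (W * c_i - D * v_i) / det


# Synthetic log-times: extreme value quantiles, right-censored at y = 6.3.
n = 40
ys, deltas = [], []
for j in range(n):
    y = 6.0 + 0.5 * math.log(-math.log(1.0 - (j + 0.5) / n))
    ys.append(min(y, 6.3))
    deltas.append(1 if y <= 6.3 else 0)

beta_hat, sigma_hat = fit_mle(ys, deltas)
i = 0  # delete the earliest failure
approx = one_step_deleted(ys, deltas, i, beta_hat, sigma_hat)
exact = fit_mle(ys[:i] + ys[i + 1:], deltas[:i] + deltas[i + 1:])
print("full:", (beta_hat, sigma_hat), "one-step:", approx, "exact:", exact)
```

With covariates, the same update applies with $x_j$ in place of 1 and a general linear solve in place of the explicit $2 \times 2$ inverse.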
REFERENCES
AALEN, O. O. (2000). Medical statistics - no time for complacency. Statistical Methods in Medical Research 9,
31–40.
BELSLEY, D. A., KUH, E. AND WELSCH, R. E. (1980). Regression Diagnostics: Identifying Influential Data and
Sources of Collinearity. New York: Wiley.
BUCKLEY, J. AND JAMES, I. (1979). Linear regression with censored data. Biometrika 66, 429–436.
CARLIN, B. P. AND POLSON, N. G. (1991). An expected utility approach to influence diagnostics. Journal of the
American Statistical Association 86, 1013–1021.
CHRISTENSEN, R. AND JOHNSON, W. (1988). Modelling accelerated failure time with a Dirichlet process.
Biometrika 75, 693–704.
CHRISTENSEN, R., JOHNSON, W. O. AND PEARSON, L. M. (1992). Predictive influence measures for spatial linear
models. Biometrika 79, 583–591.
COLLETT, D. (1994). Modelling Survival Data in Medical Research. London: Chapman and Hall.
COOK, R. D. (1977). Detection of influential observations in linear regression. Technometrics 19, 15–18.
COOK, R. D. (1986). Assessment of local influence (with discussion). Journal of the Royal Statistical Society, Series
B 48, 133–169.
COX, D. R. (1972). Regression models and life tables (with discussion). Journal of the Royal Statistical Society,
Series B 34, 187–220.
EDMUNSON, J. H., FLEMING, T. R., DECKER, D. G., MALKASIAN, G. D., JORGENSON, E. O., JEFFRIES, J.
A., WEBB, M. J. AND KVOLS, L. K. (1979). Different chemotherapeutic sensitivities and host factors affecting
prognosis in advanced ovarian carcinoma versus minimal residual disease. Cancer Treatment Reports 63, 241–247.
ESCOBAR, L. A. AND MEEKER, W. Q. (1992). Assessing influence in regression analysis with censored data.
Biometrics 48, 507–528.
GEISSER, S. (1993). Predictive Inference: An Introduction. New York: Chapman and Hall.
JOHNSON, W. O. AND GEISSER, S. (1983). A predictive view of the detection and characterization of influential
observations in regression analysis. Journal of the American Statistical Association 78, 137–144.
JOHNSON, W. O. (1985). Influence measures for logistic regression: another point of view. Biometrika 72, 59–65.
KALBFLEISCH, J. D. AND PRENTICE, R. L. (1980). The Statistical Analysis of Failure Time Data. New York: Wiley.
KOUL, H., SUSARLA, V. AND VAN RYZIN, J. (1981). Regression analysis with randomly right censored data. Annals
of Statistics 9, 1276–1288.
KUO, L. AND MALLICK, B. (1997). Bayesian semiparametric inference for the accelerated failure-time model.
Canadian Journal of Statistics 25, 457–472.
LAWLESS, J. F. (1982). Statistical Models and Methods for Lifetime Data. New York: Wiley.
LAWRANCE, A. J. (1991). Local and deletion influence. In Stahel, W. and Weisberg, S. (eds), Directions in Robust
Statistics and Diagnostics, New York: Springer.
MCCULLOCH, R. E. (1989). Local model influence. Journal of the American Statistical Association 84, 473–478.
MILLER, R. G. (1976). Least squares regression with censored data. Biometrika 63, 449–464.
ROSEMAN, R. H., BRAND, R. J. AND JENKINS, C. C. (1975). Coronary heart disease in the Western Collaborative
Group Study. Journal of the American Medical Association 223, 872–877.
SELVIN, S. (1995). Practical Biostatistical Methods. New York: Duxbury.
SOOFI, E. (1994). Capturing the intangible concept of information. Journal of the American Statistical Association
89, 1243–1254.
THURMOND, M. C. AND HIETALA, S. K. (1997). Effect of congenitally acquired Neospora caninum infection on
risk of abortion and subsequent abortions in dairy cattle. American Journal of Veterinary Research 58, 1381–1385.
WALKER, S. AND MALLICK, B. K. (1999). A Bayesian semiparametric accelerated failure time model. Biometrics
55, 477–483.
WEISSFELD, L. A. AND SCHNEIDER, H. (1990a). Influence diagnostics for the normal linear model with censored
data. Australian Journal of Statistics 32, 11–20.
WEISSFELD, L. A. AND SCHNEIDER, H. (1990b). Influence diagnostics for the Weibull model fit to censored data.
Statistics and Probability Letters 9, 67–73.
[Received July 28, 2000; revised August 9, 2001; accepted for publication November 14, 2001]
