Lifetime Data Anal

DOI 10.1007/s10985-010-9181-x

Bayesian analysis for monotone hazard ratio

Yongdai Kim · Jin Kyung Park · Gwangsu Kim

Received: 9 April 2009 / Accepted: 2 July 2010
© Springer Science+Business Media, LLC 2010

Y. Kim (B) · G. Kim
Seoul National University, Seoul, Korea
e-mail: ydkim0903@gmail.com

J. K. Park
International Vaccine Institute, Seoul, Korea

Abstract We propose a Bayesian approach for estimating the hazard functions under the constraint of a monotone hazard ratio. We construct a model for the monotone hazard ratio using the Cox proportional hazards model with a monotone time-dependent coefficient. To reduce computational complexity, we use a signed gamma process prior for the time-dependent coefficient and the Bayesian bootstrap prior for the baseline hazard function. We develop an efficient MCMC algorithm and illustrate the proposed method on simulated and real data sets.

Keywords Bayesian bootstrap · Censoring · Monotone hazard ratio · Order restriction · Proportional hazards model

1 Introduction

Estimation and inference of two survival functions S1 and S2 under certain order restric-
tions have received much attention in survival analysis. The most popular order restric-
tion is the stochastic ordering, which assumes that S1 (t) ≥ S2 (t) for all t ∈ [0, ∞).
The nonparametric estimators of the survival functions under the stochastic ordering
were found by Brunk et al. (1966) for complete observations and by Dykstra (1982)
for right-censored data, and asymptotic properties were studied by Paestgaard and
Huang (1996). Bayesian approaches for stochastic ordering have been proposed by
Arjas and Gasbarra (1996) and Gelfand and Kottas (2001). Also, uniform stochastic ordering, which assumes that S1(t)/S2(t) is nonincreasing/nondecreasing in t, has been considered by Dykstra et al. (1991) and Mukerjee (1996).
Statistical inference under order restrictions on hazard functions has also been considered in the context of assessing the validity of the proportional hazards assumption. Gill and Schumacher (1987) and Deshpande and Sengupta (1995) proposed test statistics for assessing the hypothesis of proportional hazards against the monotone hazard ratio alternative, and Sengupta et al. (1998) developed a testing procedure for the increasing cumulative hazard ratio alternative. However, these methods do not provide an estimate of the hazard ratio under the order restriction.
In this paper, we propose a Bayesian approach for estimating the hazard functions
under the monotone hazard ratio constraint. We construct a model for the monotone
hazard ratio using the Cox proportional hazards model with a time-dependent coef-
ficient that is monotone. An advantage of this model is that we can simultaneously
estimate the monotone hazard ratio and assess the validity of the proportional hazards
assumption against the monotone hazard ratio alternative.
We utilize a signed gamma process prior for the monotone hazard ratio. For the
prior of the baseline hazard function, we could use gamma process (Kalbfleisch 1978;
Kim and Lee 2003a) and beta process (Laud et al. 1998; Kim and Lee 2003a) pri-
ors. Such priors, however, require extensive computation for obtaining the posterior
because there are two nonparametric priors: one for the monotone hazard ratio and the
other for the baseline hazard function. To reduce the computational burden, we utilize
the Bayesian bootstrap (BB) prior proposed by Kim and Lee (2003b). The BB prior
makes the problem conceptually parametric and yields a much simpler MCMC algorithm, while still retaining the flexibility of nonparametric priors. Also, Kim and Lee (2003b) showed that the posterior obtained with the BB prior closely approximates the full Bayesian posterior with gamma or beta process priors.
The paper is organized as follows. In Sect. 2, the model and prior are presented. In
Sect. 3, we first review the BB approach for the proportional hazards model and then
develop an efficient MCMC algorithm for calculating the BB posterior numerically. In
Sect. 4, we illustrate the proposed method on various data sets. In Sect. 5, we present
concluding remarks.

2 Model and prior

Let (xsi, δsi), s = 1, 2, i = 1, . . . , ns be observations of pairs of right-censored times and censoring indicators. That is, xsi = min{tsi, csi} and δsi = I(tsi ≤ csi), where tsi and csi are the survival and censoring times, respectively.
To model the monotone hazard ratio assumption, we propose the following propor-
tional hazards model with a time-dependent coefficient: the hazard functions for the
groups s = 1, 2 are given as

λ1 (t) = λ(t)

and


λ2 (t) = exp (γ0 + γ1 H (t)) λ(t)

where γ0 ∈ (−∞, ∞), γ1 ∈ {−1, 0, 1} and H(·) is a nondecreasing nonnegative function with H(0) = 0. Note that the hazard ratio is monotonically increasing, constant, or monotonically decreasing when γ1 is 1, 0, or −1, respectively. Hence, we can assess the validity of the proportional hazards assumption using the posterior probability of γ1 being 0. Also, we can estimate the monotone hazard ratio by estimating γ0, γ1, and H, as the hazard ratio is given as

λ2 (t)/λ1 (t) = exp (γ0 + γ1 H (t)) .

Note that the hazard ratio is modeled nonparametrically, as H is completely unspecified.

Remark Another advantage of the proposed model is that we could easily incorporate
other covariates z, if they exist in the model, by setting

λ1(t|z) = exp(z′β)λ(t)

and

λ2(t|z) = exp(z′β + γ0 + γ1 H(t))λ(t).

This is useful if we want to know whether the risk of one group decreases faster than
that of the other group after adjusting for other risk factors such as age, gender, etc.
For the priors, we use standard parametric priors for γ0 and γ1 and a nonparametric prior for H. A priori, we let γ0 ∼ N(0, σ0²) and Pr(γ1 = k) = 1/3 for k = −1, 0, 1. For H,
a priori, we let H be a gamma process with mean H0 and precision parameter c > 0.
That is, H is a nondecreasing stochastic process on [0, ∞) with independent incre-
ments such that H (0) = 0 and H (t) − H (s), s ≤ t follows a gamma distribution with
mean H0 (t) − H0 (s) and variance (H0 (t) − H0 (s))/c. See Lo (1982) and Kalbfleisch
(1978) for details of gamma processes. To reduce computational complexity, we use
the BB prior for λ, which is explained in detail in Sect. 3.
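As a concrete illustration of the gamma process prior on H (a minimal Python sketch, not part of the original paper; the time grid and the specific values of H0 and c are assumptions), a discretized sample path of H can be drawn by simulating its independent gamma increments:

```python
import numpy as np

def sample_gamma_process(grid, H0, c, rng):
    """Draw H on a grid from a gamma process prior with mean function H0 and
    precision c: the increments H(t_k) - H(t_{k-1}) are independent Gamma with
    mean H0(t_k) - H0(t_{k-1}) and variance (H0(t_k) - H0(t_{k-1})) / c,
    i.e., shape c * (H0(t_k) - H0(t_{k-1})) and rate c."""
    dH0 = np.diff(np.concatenate(([0.0], H0(grid))))   # prior mean increments, H0(0) = 0
    increments = rng.gamma(shape=c * dH0, scale=1.0 / c)
    return np.cumsum(increments)                       # H evaluated at the grid points

rng = np.random.default_rng(0)
grid = np.linspace(0.5, 50.0, 100)          # assumed time grid
H0 = lambda t: np.log(1.0 + t)              # the prior mean used in Sect. 4
H_path = sample_gamma_process(grid, H0, c=1.0, rng=rng)
```

Smaller c makes the sampled paths vary more widely around H0, which is relevant for the prior sensitivity study in Sect. 4.3.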

3 Posterior: Bayesian bootstrap approach

In this section, we develop an efficient MCMC algorithm to calculate the BB posterior distribution. We first review the BB approach proposed by Kim and Lee (2003b) and present the corresponding MCMC algorithm.

3.1 Bayesian bootstrap for the proportional hazards model: review

The main idea of the BB approach for the proportional hazards model is to approximate the full Bayesian posterior by the BB posterior, which is proportional to the product of the empirical likelihood and prior. Let (x1, δ1, z1(·)), . . . , (xn, δn, zn(·)) be observations, where xi are right-censored times (i.e., the minimum of survival and censoring times), δi are censoring indicators, and zi(·) are (time-dependent) covariates. Under the proportional hazards model given as


λ(t|z) = exp(z(t)′β)λ(t)

where λ(t|z) is the hazard function of the survival time with covariate z, the likelihood
function of θ = (β, λ(·)) is
$$
L(\theta) = \prod_{i=1}^{n} \left\{ \exp(z_i(x_i)'\beta)\,\lambda(x_i) \right\}^{\delta_i} \exp\left( -\int_0^{x_i} \exp(z_i(s)'\beta)\,\lambda(s)\,ds \right)
= \prod_{i=1}^{n} \left\{ \exp(z_i(x_i)'\beta)\,d\Lambda(x_i) \right\}^{\delta_i} \exp\left( -\int_0^{x_i} \exp(z_i(s)'\beta)\,d\Lambda(s) \right), \qquad (1)
$$

where Λ(t) = ∫_0^t λ(s) ds is the cumulative hazard function. Let q be the number of distinct, uncensored observations, and let 0 < t1 < · · · < tq be the corresponding ordered, uncensored observations. Then, the empirical likelihood is obtained by assuming that Λ is a step function having jumps only at t1, . . . , tq and replacing dΛ(t) by ΔΛ(t) = Λ(t) − Λ(t−) in (1), which results in

$$
L^E(\theta) = \prod_{i=1}^{n} \left\{ \exp(z_i(x_i)'\beta)\,\Delta\Lambda(x_i) \right\}^{\delta_i} \exp\Bigg( -\sum_{k:\, t_k \le x_i} \exp(z_i(t_k)'\beta)\,\Delta\Lambda(t_k) \Bigg). \qquad (2)
$$

For details of the empirical likelihood (2), see Andersen et al. (1993). Finally, the BB
posterior of θ is defined to be proportional to the product of the empirical likelihood
and prior.
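To make (2) concrete, the following minimal Python sketch (an illustration under assumed data structures, not the authors' code) evaluates log L^E(θ) for time-constant covariates, with Λ a step function whose jumps dLam sit at the distinct uncensored times t_jump:

```python
import numpy as np

def log_empirical_likelihood(beta, dLam, t_jump, x, delta, z):
    """Log of the empirical likelihood (2) for the proportional hazards model with
    time-constant covariates z (n x p); Lambda is a step function with jumps
    dLam[k] at the distinct uncensored times t_jump[k] (sorted increasingly)."""
    eta = z @ beta                              # linear predictors z_i' beta
    loglik = 0.0
    for i in range(len(x)):
        if delta[i] == 1:
            k = np.searchsorted(t_jump, x[i])   # index of the jump at x_i
            loglik += eta[i] + np.log(dLam[k])
        at_risk = t_jump <= x[i]                # jump times up to x_i
        loglik -= np.exp(eta[i]) * dLam[at_risk].sum()
    return loglik
```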

Remark There is an alternative empirical likelihood called the binomial form empir-
ical likelihood. See Kim and Lee (2003b) for details. An advantage of the binomial
form is that the resulting BB posterior can be obtained as a limit of full Bayesian
posteriors. However, the computation is more difficult, and the BB posterior may not
be proper. Therefore, we do not consider the binomial form empirical likelihood in
this paper.
An advantage of the BB approach is that the dimension of the parameter θ is finite because we discretize Λ to a step function with finitely many jumps. That is, the parameters in the empirical likelihood are β and {ΔΛ(tk), k = 1, . . . , q}, and hence, the posterior distribution can be obtained easily using Bayes theorem.
A technical difficulty in the BB approach is the choice of the prior for {ΔΛ(tk), k = 1, . . . , q}. For this, Kim and Lee (2003b) proposed the following improper prior (BB prior):



$$
\pi(\Delta\Lambda) \propto \prod_{k=1}^{q} \frac{1}{\Delta\Lambda(t_k)}, \qquad (3)
$$

and showed that the resulting posterior is always proper, approximates the full Bayesian posterior well, and has desirable large-sample properties. It is interesting to note that the marginal BB posterior of β with the prior (3) turns out to be proportional to Cox's partial likelihood times the prior of β.
Remark The BB approach does not require prior information on Λ, which may be a disadvantage when we do have prior information. However, we could incorporate prior information into the BB posterior by choosing the prior of ΔΛ accordingly. Suppose that a priori Λ follows a gamma process with mean Λ0 and precision parameter cλ > 0. Given that we could think of ΔΛ(tk) as an approximation of Λ(tk) − Λ(tk−1), we could incorporate the prior information into the BB posterior by choosing the BB prior as

$$
\pi(\Delta\Lambda) \propto \prod_{k=1}^{q} \left(\Delta\Lambda(t_k)\right)^{c_\lambda\{\Lambda_0(t_k) - \Lambda_0(t_{k-1})\} - 1} \exp\left(-c_\lambda\,\Delta\Lambda(t_k)\right). \qquad (4)
$$

Remark A similar approach to the BB is to assume a piecewise constant hazard function. That is, λ(t) is given by

$$
\lambda(t) = \sum_{k=1}^{m} \lambda_k\, I(s_{k-1} < t \le s_k)
$$

for some sequence 0 = s0 < s1 < s2 < · · · < sm. See, for example, Arjas and Gasbarra (1996) and Ibrahim et al. (2001). Nonetheless, we use the BB approach because it has a sounder theoretical foundation (at least asymptotically) and provides a simpler MCMC algorithm. In contrast, it is not easy to choose the break points s1, . . . , sm in the piecewise constant hazard model, and the computation of the posterior would be more difficult.

3.2 Bayesian bootstrap posterior

The parameter in the model is θ = (γ0, γ1, H, Λ). The likelihood of the proposed model is

$$
L(\theta) = \prod_{s=1}^{2} \prod_{i=1}^{n_s} \left[ \exp\{\gamma_0 + \gamma_1 H(x_{si})\}^{I(s=2)}\, d\Lambda(x_{si}) \right]^{\delta_{si}}
\exp\left[ -\int_0^{x_{si}} \exp\{\gamma_0 + \gamma_1 H(u)\}^{I(s=2)}\, d\Lambda(u) \right].
$$

The full Bayesian computation is extremely hard, as the likelihood involves terms like

$$
\int_0^{t} \exp\{\gamma_1 H(s)\}\, d\Lambda(s),
$$

which require knowledge of the sample paths of both H(t) and Λ(t). To resolve this problem, we employ the BB approach as follows. Let 0 < t1 < t2 < · · · < tq be the ordered distinct uncensored survival times in the pooled sample, and let R(t) = {(s, i) : xsi ≥ t} and D(t) = {(s, i) : xsi = t, δsi = 1}. Let ΔΛ(tk) = Λ(tk) − Λ(tk−) = λk, and assume that Λ(t) = Σ_{tk ≤ t} λk. Then, the empirical likelihood of the proposed model becomes
$$
L^E(\theta) = \prod_{k=1}^{q} \left[ \lambda_k^{d(t_k)} \exp\Bigg( \sum_{(2,i)\in D(t_k)} \big(\gamma_0 + \gamma_1 H(t_k)\big) \Bigg)
\exp\Bigg\{ -\lambda_k \sum_{(s,i)\in R(t_k)} \exp\big\{ (\gamma_0 + \gamma_1 H(t_k))\, I(s=2) \big\} \Bigg\} \right]
$$

where d(t) is the cardinality of D(t). For the prior of the λk's, we use the BB prior

$$
\pi(\lambda) = \prod_{k=1}^{q} \frac{1}{\lambda_k},
$$

as in (3), where λ = (λ1, . . . , λq). Then, the BB posterior of θ is given by

$$
\pi^{BB}(\theta\,|\,\mathrm{Data}) \propto L^E(\theta)\,\pi(\theta),
$$

where π(θ ) = π(γ0 )π(γ1 )π(H )π(λ).
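The quantities that enter L^E and the conditional posteriors below are simple event and risk-set counts. A minimal sketch of how they could be computed (illustrative Python; the array and function names are assumptions, not the authors' code) is:

```python
import numpy as np

def sufficient_stats(x1, d1, x2, d2):
    """Distinct uncensored times t_k of the pooled sample, together with the event
    counts d(t_k), the group-2 event counts (the sums over (2,i) in D(t_k)),
    and the per-group numbers at risk at t_k (the sums over R(t_k))."""
    x = np.concatenate([x1, x2])
    d = np.concatenate([d1, d2])
    grp = np.concatenate([np.full(len(x1), 1), np.full(len(x2), 2)])
    t_jump = np.unique(x[d == 1])
    d_k  = np.array([np.sum((x == t) & (d == 1)) for t in t_jump])
    d2_k = np.array([np.sum((x == t) & (d == 1) & (grp == 2)) for t in t_jump])
    r1_k = np.array([np.sum(x1 >= t) for t in t_jump])   # group 1 at risk at t_k
    r2_k = np.array([np.sum(x2 >= t) for t in t_jump])   # group 2 at risk at t_k
    return t_jump, d_k, d2_k, r1_k, r2_k
```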

3.3 MCMC algorithm

We use a Gibbs sampler algorithm in which the parameters γ0 , γ1 , λ and H are gen-
erated sequentially from the conditional BB posteriors. We can easily generate γ0 and
γ1 using the Metropolis-Hastings (MH) algorithm with the following conditional BB
posterior distributions:
$$
\pi(\gamma_0\,|\,\gamma_1, \lambda, H, \mathrm{Data}) \propto \exp\Bigg( \gamma_0 \sum_{k=1}^{q} \sum_{(2,i)\in D(t_k)} 1 \Bigg)
\exp\Bigg[ -\exp(\gamma_0) \sum_{k=1}^{q} \lambda_k \exp\{\gamma_1 H(t_k)\} \sum_{(2,i)\in R(t_k)} 1 \Bigg] \pi(\gamma_0), \qquad (5)
$$

$$
\pi(\gamma_1\,|\,\gamma_0, \lambda, H, \mathrm{Data}) \propto \exp\Bigg( \gamma_1 \sum_{k=1}^{q} H(t_k) \sum_{(2,i)\in D(t_k)} 1 \Bigg)
\exp\Bigg[ -\exp(\gamma_0) \sum_{k=1}^{q} \lambda_k \exp\{\gamma_1 H(t_k)\} \sum_{(2,i)\in R(t_k)} 1 \Bigg] \pi(\gamma_1). \qquad (6)
$$

Also, the conditional BB posterior distribution of λk given γ0, γ1, H and the data is a gamma distribution with mean αk/βk and variance αk/βk², where αk = d(tk) and

$$
\beta_k = \sum_{(s,i)\in R(t_k)} \exp\big\{ (\gamma_0 + \gamma_1 H(t_k))\, I(s=2) \big\}. \qquad (7)
$$

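A minimal sketch of this conditional update (illustrative Python with assumed array names; r1_k and r2_k denote the numbers at risk at tk in groups 1 and 2, as in the sketch in Sect. 3.2) is:

```python
import numpy as np

def update_lambda(rng, d_k, r1_k, r2_k, gamma0, gamma1, H_k):
    """Draw every lambda_k from its conditional Gamma posterior with
    alpha_k = d(t_k) and beta_k as in (7); H_k holds H(t_k) at the jump times."""
    beta_k = r1_k + r2_k * np.exp(gamma0 + gamma1 * H_k)   # the sum over R(t_k) in (7)
    return rng.gamma(shape=d_k, scale=1.0 / beta_k)        # mean alpha_k / beta_k
```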
The difficult part is to generate H from the conditional BB posterior. To generate H, we use the Gibbs sampler algorithm with the acceptance-rejection (AR) sampling technique (Ripley 2006). Note that the empirical likelihood depends on
H through H (t1 ), . . . , H (tq ), and so it suffices to generate W = (W1 , . . . , Wq )
from the conditional posterior where Wk = H (tk ) − H (tk−1 ) and H (t0 ) = 0.
In applying the Gibbs sampler algorithm to generate W, we need to generate Wk
from its conditional distribution given γ0 , γ1 , λ, W (−k) and data where W (−k) =
(W1 , . . . , Wk−1 , Wk+1 , . . . , Wq ).
Identifiability issues arise. First, γ0 and W1 are not identifiable in the empirical likelihood, whereas γ0 + W1 is identifiable. The other unidentifiable quantities in the empirical likelihood are the Wk for k > p, where p is the index for which

t_p = min{ max{x1i : δ1i = 1}, max{x2i : δ2i = 1} }.

Note that the Wk for k > p are not used in the empirical likelihood when t_p = max{x2i : δ2i = 1}, whereas they affect the empirical likelihood only through γ0 + γ1 Wk + log λk when t_p = max{x1i : δ1i = 1}, in which case Wk and λk are not separately identifiable from the empirical likelihood. To avoid these identifiability issues, we let W1 = 0 and Wk = 0 for k > p, which is equivalent to using H0∗ instead of H0 as the prior mean parameter of the gamma process, where H0∗(t) = 0 for t < t1, H0∗(t) = H0(t) − H0(t1) for t1 ≤ t ≤ t_p, and H0∗(t) = H0(t_p) for t > t_p.
We now explain how to generate Wl from its conditional posterior distribution. Let Hk^(−l) = H(tk) − Wl. Then, the conditional posterior distribution of Wl for 2 ≤ l ≤ p given others = (γ0, γ1, λ, W^(−l), Data) is given as

$$
\pi(W_l\,|\,\mathrm{others}) \propto \exp\Bigg( W_l\, \gamma_1 \sum_{k=l}^{q} \sum_{(2,i)\in D(t_k)} 1 \Bigg)
\times \exp\Bigg\{ -\exp(\gamma_1 W_l) \Bigg[ \sum_{k=l}^{q} \lambda_k \exp\big(\gamma_0 + \gamma_1 H_k^{(-l)}\big) \sum_{(2,i)\in R(t_k)} 1 \Bigg] \Bigg\}
\times W_l^{v_l - 1} \exp(-c W_l)\, I(W_l \ge 0),
$$

where vl = c(H0∗(tl) − H0∗(tl−1)). Let


$$
\alpha_l = \sum_{k=l}^{q} \sum_{(2,i)\in D(t_k)} 1
\qquad \text{and} \qquad
\beta_l = \sum_{k=l}^{q} \lambda_k \exp\big(\gamma_0 + \gamma_1 H_k^{(-l)}\big) \sum_{(2,i)\in R(t_k)} 1 .
$$

Then, the conditional posterior distribution of Wl is simplified as

$$
\pi(W_l\,|\,\mathrm{others}) \propto h_l\big(\exp(\gamma_1 W_l)\big)\, W_l^{v_l - 1} \exp(-c W_l)\, I(W_l \ge 0), \qquad (8)
$$

where

$$
h_l(y) = y^{\alpha_l} \exp(-\beta_l y). \qquad (9)
$$

Note that the maximum of hl(exp(γ1 Wl)), say hl∗, over Wl ∈ (0, ∞) can be easily calculated, and we can easily generate a random number from the gamma distribution. Hence, we can use the AR sampling technique for generating Wl from (8) as follows:
1. Generate W ∼ Gamma(vl, c), where Gamma(a, b) is the gamma distribution with mean a/b and variance a/b².
2. Generate U ∼ Uniform(0, 1).
3. Let Wl = W if hl(exp(γ1 W))/hl∗ ≥ U. Otherwise, go to 1.
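A minimal sketch of this AR step (illustrative Python; the function and variable names are assumptions, and the computation is done on the log scale to avoid overflow) is given below. It uses the fact that hl(y) = y^αl exp(−βl y) is maximized at y = αl/βl, and that y = exp(γ1 Wl) is restricted to (0, 1] or [1, ∞) depending on the sign of γ1:

```python
import numpy as np

def sample_W_l(rng, v_l, c, gamma1, alpha_l, beta_l, max_tries=10_000):
    """Acceptance-rejection draw of W_l from (8).

    Proposal: W ~ Gamma(v_l, c) (shape v_l, rate c), the prior factor in (8).
    Acceptance uses h_l(y) = y**alpha_l * exp(-beta_l * y) at y = exp(gamma1 * W).
    Note alpha_l >= 1 for l <= p, so the maximizer y = alpha_l / beta_l is positive."""
    def log_h(y):
        return alpha_l * np.log(y) - beta_l * y

    if gamma1 == 0:                        # h_l(exp(0)) is constant: always accept
        return rng.gamma(shape=v_l, scale=1.0 / c)

    y_peak = alpha_l / beta_l              # unconstrained maximizer of h_l
    if gamma1 == 1:                        # y = exp(W_l) ranges over [1, infinity)
        log_h_star = log_h(max(1.0, y_peak))
    else:                                  # gamma1 == -1: y ranges over (0, 1]
        log_h_star = log_h(min(1.0, y_peak))

    for _ in range(max_tries):
        W = rng.gamma(shape=v_l, scale=1.0 / c)
        U = rng.uniform()
        if log_h(np.exp(gamma1 * W)) - log_h_star >= np.log(U):
            return W
    raise RuntimeError("AR step did not accept within max_tries proposals")
```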
The MCMC algorithm for the BB posterior can be summarized as follows:
• Sampling γ0 given γ1, λ, H and the data: We use the random-walk MH algorithm. Let γ0∗ be a candidate value generated from a random-walk kernel q(γ0, γ0∗). Then, the acceptance probability is the minimum of 1 and

$$
\frac{\pi(\gamma_0^* \,|\, \gamma_1, \lambda, H, \mathrm{Data})\, q(\gamma_0^*, \gamma_0)}{\pi(\gamma_0 \,|\, \gamma_1, \lambda, H, \mathrm{Data})\, q(\gamma_0, \gamma_0^*)}
$$

where π(γ0 |γ1 , λ, H, Data) is in (5).


• Sampling γ1 given γ0, λ, H and the data: We generate γ1 = h for h ∈ {−1, 0, 1} with probability ph, where

$$
p_h = \frac{\pi(h \,|\, \gamma_0, \lambda, H, \mathrm{Data})}{\sum_{l\in\{-1,0,1\}} \pi(l \,|\, \gamma_0, \lambda, H, \mathrm{Data})}
$$

and π(h|γ0, λ, H, Data) is in (6).


• Sampling λ given γ0, γ1, H and the data: For k = 1, . . . , q, generate λk from Gamma(αk, βk), where αk = d(tk) and βk is in (7).
• Sampling H given γ0, γ1, λ and the data:
  − Let Wk = H(tk) − H(tk−1). Let W1 = 0 and Wk = 0 for k > p.
  − For l = 2, . . . , p:
    1. Generate W ∼ Gamma(vl, c).
    2. Generate U ∼ Uniform(0, 1).
    3. Let Wl = W if hl(exp(γ1 W))/hl∗ ≥ U, where hl is in (9). Otherwise, go to 1.
  − Let H(t) = Σ_{k: tk ≤ t} Wk.

4 Numerical experiments

In this section, we illustrate the proposed model on various data sets. For the prior parameters, we let σ0² = 10, H0(t) = log(1 + t) and c = 1.

4.1 Simulation 1

We let n 1 = n 2 = 50 and generated survival times of the first group from the expo-
nential distribution with mean 20, and those of the second group from the exponential
distribution with mean 30. Censoring times are generated from the exponential dis-
tribution such that the censoring probability is 0.3. Note that the model used for the
simulation satisfies the proportional hazards assumption. We obtained the posterior
distributions of θ using the proposed MCMC algorithm. We iterated the MCMC algo-
rithm 100,000 times after a burn-in period of 10,000 iterations. Then, we collected
2,000 samples at every 50th iteration after the burn-in for further analysis. We used
a relatively extreme thinning (every 50th iteration) to make the samples almost inde-
pendent, making further analysis easier.
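A sketch of this data-generating step (illustrative Python, not the authors' code; calibrating the exponential censoring rate so that P(censored) equals the target is one convenient choice and is our assumption) is:

```python
import numpy as np

def simulate_group(rng, n, mean_survival, p_censor):
    """Exponential survival times with the given mean, censored by an independent
    exponential so that P(censored) = p_censor.  For T ~ Exp(rate a) and
    C ~ Exp(rate b), P(C < T) = b / (a + b), so take b = a * p_censor / (1 - p_censor)."""
    rate_t = 1.0 / mean_survival
    rate_c = rate_t * p_censor / (1.0 - p_censor)
    t = rng.exponential(scale=1.0 / rate_t, size=n)
    c = rng.exponential(scale=1.0 / rate_c, size=n)
    x = np.minimum(t, c)                 # observed time x_si = min(t_si, c_si)
    delta = (t <= c).astype(int)         # censoring indicator delta_si = I(t_si <= c_si)
    return x, delta

rng = np.random.default_rng(1)
x1, d1 = simulate_group(rng, n=50, mean_survival=20.0, p_censor=0.3)   # group 1
x2, d2 = simulate_group(rng, n=50, mean_survival=30.0, p_censor=0.3)   # group 2
```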
Figure 1 gives the traceplots and histograms of γ0 and of H(t) and Λ(t) at t = 20 (the
mean survival time of the first group) generated from the MCMC algorithm. The pro-
posed MCMC algorithm converges well, and the posterior densities have nice shapes
(at least, they are unimodal). Figure 2a shows how the empirical probability of γ1 = 0,
calculated based on the generated samples from the MCMC algorithm, converges. The
two dashed lines in the figure represent the 95% confidence interval obtained from the
samples, assuming that they are independent. With the exception of the early stage of
the iteration, the empirical probabilities lie inside the confidence limits, which implies
that the MCMC algorithm converges well to its stationary distribution for γ1 , too.
Figure 2b displays the posterior probabilities of γ1 , which supports the proportional
hazards model because it has the largest value when γ1 = 0.
Figure 3 shows the acceptance probability of Wk for k = 2, . . . , p in the AR
sampling step inside the MCMC algorithm. The smallest acceptance probability is
around 30%, which implies that the AR sampling step does not significantly hamper
the overall computing time of the MCMC algorithm.
Fig. 1 Panel a shows the traceplots of γ0, H(20), and Λ(20), and panel b shows the corresponding histograms

Table 1 compares the Bayes estimator and 90% (equal-tail) posterior probability interval of γ0 with those obtained from the proportional hazards model (i.e., γ1 = 0)
and corresponding frequentist counterpart. The posterior interval based on the pro-
posed model is much wider than the other two intervals. This is because there is
additional uncertainty in estimating H for the proposed model. However, all intervals
contain the true value −0.4055.
We conducted additional simulations to investigate the effect of the censoring prob-
ability and sample sizes on the posterior distribution. Table 2 presents the posterior
distributions of γ1 for various values of the censoring probability and sample sizes.
The results are stable and consistently support the proportional hazards model.

Fig. 2 Panel a shows the traceplot of the empirical posterior probability of γ1 = 0 (solid) with the 95% confidence limits (dashed), and panel b presents the posterior probabilities of γ1

Fig. 3 Acceptance probabilities of Wl for l = 2, . . . , p in the AR algorithm

Table 1 Bayes estimator and 90% posterior probability interval of γ0 for the proposed model (MHR: monotone hazard ratio), together with those obtained from the proportional hazards (PH) model and the corresponding frequentist results

Method                   Point estimate    90% Interval
BB with the MHR model    −0.6438           (−1.1583, −0.0898)
BB with the PH model     −0.6638           (−1.0805, −0.2382)
MLE with the PH model    −0.6600           (−1.0716, −0.2484)

Table 2 The posterior probabilities of γ1 = −1, 0 and 1 for various values of the censoring probability and sample sizes in Simulation 1

(n1, n2)     30% censoring               50% censoring
(50, 50)     (0.0610, 0.8955, 0.0435)    (0.0570, 0.8120, 0.1310)
(100, 50)    (0.2215, 0.7595, 0.0190)    (0.0555, 0.7980, 0.1465)
(50, 100)    (0.0615, 0.6580, 0.3255)    (0.0575, 0.7460, 0.1965)
(100, 100)   (0.0010, 0.7590, 0.2400)    (0.0315, 0.8855, 0.0830)


Fig. 4 Panel a draws the posterior probabilities of γ1, and panels b and c present the Bayes estimators of the log hazard ratio and of Λ with the pointwise 90% probability bands (PB) and the true functions, respectively

Table 3 The posterior probabilities of γ1 = −1, 0 and 1 for various values of the censoring probability
and sample sizes in Simulation 2

(n1, n2)     30% censoring               50% censoring
(50, 50)     (0.9550, 0.0445, 0.0005)    (0.9505, 0.0465, 0.0030)
(100, 50)    (1.0000, 0.0000, 0.0000)    (0.9975, 0.0025, 0.0000)
(50, 100)    (0.9915, 0.0085, 0.0000)    (0.8630, 0.1345, 0.0025)
(100, 100)   (1.0000, 0.0000, 0.0000)    (0.9995, 0.0005, 0.0000)

4.2 Simulation 2

We let λ2(t) = α t^(α−1) β λ1(t). The hazard ratio is monotonically increasing when α > 1 and decreasing when α < 1. We set α = 0.5 and λ1(t) = 1/20 to have a monotonically decreasing hazard ratio, and β = 20/√10 to make the mean survival time of the second group equal to 20. The other set-ups, such as the sample sizes, censoring probability, and the number of iterations of the MCMC algorithm, are the same as those for the simulated data set 1.
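Under this specification the cumulative hazard of the second group is Λ2(t) = βλ1 t^α, so its survival times can be generated by the inverse transform. A minimal sketch (illustrative Python, not the authors' code) is:

```python
import numpy as np

def simulate_group2(rng, n, alpha=0.5, beta=20.0 / np.sqrt(10.0), lam1=1.0 / 20.0):
    """Survival times with hazard lambda2(t) = alpha * t**(alpha - 1) * beta * lam1,
    i.e., cumulative hazard Lambda2(t) = beta * lam1 * t**alpha (a Weibull law).
    Inverse transform: T = (-log(U) / (beta * lam1)) ** (1 / alpha)."""
    u = rng.uniform(size=n)
    return (-np.log(u) / (beta * lam1)) ** (1.0 / alpha)

rng = np.random.default_rng(2)
t2 = simulate_group2(rng, n=50)      # sample mean close to 20 by the choice of beta
```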
The posterior probability of γ1 is given in Fig. 4a, which strongly supports the
true model, monotonically decreasing hazard ratio. Figure 4b and c present the Bayes
estimator and corresponding pointwise 90% posterior probability bands of the log
hazard ratio (γ0 + γ1 H(t)) and the cumulative baseline hazard function Λ(t) with the true
ones, respectively. Note that the true functions lie inside the probability bands, imply-
ing that the proposed method estimates the monotone hazard ratio and cumulative
baseline hazard function well.
As is done for Simulation 1, Table 3 presents the posterior probabilities of γ1 for
various values of the censoring probability and sample sizes. All of the results strongly
indicate that the hazard ratio is decreasing.

4.3 Prior sensitivity

Priors need to be specified for the three parameters γ0, γ1 and H. Since γ1 takes a value in {−1, 0, 1}, the uniform prior is a natural one. For γ0, unless the prior variance is very small, the posterior is not seriously affected by the choice of prior. However, the specification of the prior of H needs special consideration.

Table 4 The posterior probabilities of γ1 = −1, 0 and 1 for various specifications of H0 and c

H0(t)        c     Simulation 1                Simulation 2
log(1 + t)   0.1   (0.2778, 0.4789, 0.2433)    (1.0000, 0.0000, 0.0000)
             1.0   (0.0498, 0.9433, 0.0069)    (1.0000, 0.0000, 0.0000)
             10    (0.0000, 1.0000, 0.0000)    (1.0000, 0.0000, 0.0000)
√t           0.1   (0.1370, 0.7318, 0.1312)    (1.0000, 0.0000, 0.0000)
             1.0   (0.0000, 1.0000, 0.0000)    (1.0000, 0.0000, 0.0000)
             10    (0.0000, 1.0000, 0.0000)    (0.0000, 1.0000, 0.0000)
There are two prior parameters, prior mean H0 and precision parameter c. In the
previous two subsections, we let H0 (t) = log(1 + t), which is a concave function.
The concavity of H0 represents the prior belief that the degree of monotonicity of the
hazard ratio decreases as time increases, and we have seen that this choice of prior
works well in various cases. In this view, an alternative choice of H0(t) would be t^a for a ∈ (0, 1). Table 4 compares the posterior probabilities of γ1 when H0(t) is either log(1 + t) or √t, with various values of c. We set (n1, n2) = (100, 100) and the censoring probability to 30%.
When c = 0.1, the probability of γ1 = 0 is relatively smaller than in the other cases for Simulation 1. This is because the model is poorly identifiable when H is very small, if the true model has a constant hazard ratio. In the extreme case, γ1 is not identifiable when H(t) ≡ 0. When c is small, the prior variance of H becomes large, and hence γ1 becomes poorly identifiable. When c = 10 and H0(t) = √t, the result is completely misleading for Simulation 2. Note that a priori H(t) is a gamma random variable with mean H0(t) and variance H0(t)/c. Hence, the prior mass is concentrated too much around the prior mean when H0(t) and c are large, and so when the true hazard ratio is different from the prior mean, the posterior would support the constant hazard ratio model rather than the prior mean H0. Based on the simulation results, we suggest selecting the prior parameters so that H0 does not increase too fast and c is not too large or too small.

4.4 Data analysis

In this section, we analyze two real data sets. The first data set, “Leukemia,” is the leukemia patients’ remission time data analyzed by Laud et al. (1998). The data consist of 42 patients, divided into control and treatment groups of 21 patients each. The treatment and control groups are given 6-mercaptopurine (6-MP) and placebo, respectively. Of the 42 observations, 12 are censored. The second data set, “Ovarian,” is the time from treatment to progression of disease for 35 patients with stage II (n1 = 15) or IIA (n2 = 20) ovarian cancer. Of the 35 observations, 13 are censored. This data set was analyzed by Gill and Schumacher (1987).

Fig. 5 Panels a and b show the traceplots of the empirical posterior probability of γ1 = 0 and γ1 = −1 (solid) with the 95% confidence limits (dashed) for the Leukemia and Ovarian data sets, respectively

As is done in Simulation 1, Fig. 5 presents the traceplots of the empirical posterior probability of γ1, and Figs. 6 and 7 present the traceplots and corresponding histograms of γ0, H(10) and Λ(10) for the Leukemia and Ovarian data sets. It seems that there is no problem with the convergence of the MCMC algorithms.
Figure 8 presents the posterior probabilities of γ1 for the two data sets, and Table 5 gives the p-values of the three frequentist test statistics for the proportional hazards model against the monotone hazard ratio alternative, as well as the DIC (deviance information criterion; Spiegelhalter et al. 2002) values and the effective numbers of parameters (pD) of the proposed model with γ1 = −1, 0 and 1, respectively.
The GS1 and GS2 in Table 5 denote the test statistics proposed by Gill and Schumacher (1987), with the Gehan versus log-rank weights and the Prentice versus log-rank weights, respectively, and DS is the test statistic proposed by Deshpande and Sengupta (1995). The DIC is calculated based on the marginal likelihood obtained by integrating out the baseline hazard function with respect to the prior. Because we used the BB prior, the resulting marginal likelihood becomes the partial likelihood. Note that the DIC is an extension of the AIC (Akaike information criterion), and the AIC works well with the partial likelihood (Hjort and Claeskens 2006). Hence, it is reasonable to calculate the DIC with the marginal likelihood. The five methods (the posterior probability, the three p-values, and the DIC) indicate that the proportional hazards assumption is valid for the Leukemia data set, but not for the Ovarian data set.
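For reference, a minimal sketch of the DIC computation from MCMC output (illustrative Python; the log-likelihood function and the form of the parameter samples are assumptions, with the partial/marginal log-likelihood described above playing the role of log L) is:

```python
import numpy as np

def dic(log_lik, theta_samples):
    """DIC = Dbar + pD (Spiegelhalter et al. 2002), with D(theta) = -2 log L(theta),
    Dbar the posterior mean deviance, and pD = Dbar - D(theta_bar),
    where theta_bar is the posterior mean of the sampled parameters."""
    deviances = np.array([-2.0 * log_lik(th) for th in theta_samples])
    d_bar = deviances.mean()
    theta_bar = np.mean(theta_samples, axis=0)
    p_d = d_bar - (-2.0 * log_lik(theta_bar))
    return d_bar + p_d, p_d
```

Plugging in a posterior mean is natural when γ1 is held fixed; when γ1 is random the plug-in is less clear-cut, which is related to the instability discussed in the second Remark below.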

Remark When we are interested in the validity of the monotone hazard ratio assumption itself, the frequentist tests are not adequate, because rejection by a frequentist test does not necessarily mean that the monotone hazard ratio assumption is valid. In contrast, the Bayesian results (the posterior probability of γ1 and the DIC values) directly indicate whether the assumption of the monotone hazard ratio is valid.

Remark Along with the DIC values for γ1 = −1, 0 and 1, we calculated the DIC
value of the model where γ1 is random. The DIC value with random γ1 would be
expected to be smaller than that with γ1 = 0 when the proportional hazards assump-
tion is valid. The DIC and p D values with random γ1 for the Leukemia and Ovarian
data sets are 175.23, 1.20, and 128.41, 2.16 respectively, which do not confirm our
conjecture. We find, however, that the DIC values are unstable, particularly when the proportional hazards assumption is valid. Note that the difference of the DIC values between γ1 = 0 and γ1 = 1 for the Leukemia data set is very small, whereas the posterior probabilities are quite different. We think that the DIC may not be appropriate for our model because our model is semiparametric (i.e., the hazard ratio is completely unspecified), and the DIC was developed mainly for parametric models where the maximum likelihood estimator is asymptotically Gaussian. We leave this problem as future work.

Fig. 6 The Leukemia data set results: panel a presents the traceplots of γ0, H(10), and Λ(10), and panel b shows the corresponding histograms

Fig. 7 The Ovarian data set results: panel a presents the traceplots of γ0, H(10), and Λ(10), and panel b shows the corresponding histograms

Fig. 8 Panels a and b present the posterior probabilities of γ1 for the Leukemia and Ovarian data sets, respectively

Table 5 P-values of the three frequentist test statistics for the proportional hazards assumption against the monotone hazard ratio alternative, and the DIC and pD values for the proposed model with γ1 = −1, 0, 1, respectively

           p-values                      DIC and pD
           GS1      GS2      DS          γ1 = −1         γ1 = 0          γ1 = 1
Leukemia   0.6897   0.6807   0.1660      176.73, 1.39    174.64, 0.92    174.72, 1.28
Ovarian    0.0571   0.0507   0.0298      127.52, 2.03    130.73, 1.05    133.46, 1.67

For the Ovarian data set, in which the proportional hazards assumption is rejected against the monotone hazard ratio alternative, we draw the Bayes estimator of the hazard ratio with the pointwise 90% probability bands in Fig. 9a. The figure suggests that the hazard ratio of the second group (stage IIA) over the first group (stage II) decreases steadily. We draw the Bayes estimators of the two cumulative hazard functions Λ1 and Λ2 with their pointwise 90% probability bands and the empirical cumulative hazard (ECH) functions in Fig. 9b and c, respectively. The Bayes estimators and ECH functions are close and are located inside the probability bands.

Fig. 9 Panel a draws the Bayes estimator of H(t) with its pointwise 90% probability band, panel b for Λ1, and panel c for Λ2

5 Concluding remarks

We proposed a Bayesian approach for estimating the two hazard functions under the
monotone hazard ratio constraint and developed an efficient MCMC algorithm. We
demonstrated with simulated and real data sets that the MCMC algorithm, based on
the BB approach, converges well and provides reliable results.
In this paper, we modeled the monotone hazard ratio nonparametrically. An alterna-
tive model is a piecewise constant monotone hazard ratio, which provides information about when the hazard ratio changes. The proposed BB approach can be easily modified for this model, with significant savings in computational cost.
The proposed model can be extended to a case where there are more than two haz-
ard functions. Suppose there are three hazard functions λ1 , λ2 and λ3 with λ2 /λ1 and
λ3 /λ2 increasing monotonically. We can model λ2 and λ3 by
λ2(t) = exp( γ0^(2) + H^(2)(t) ) λ1(t)

and

λ3(t) = exp( γ0^(3) + H^(2)(t) + H^(3)(t) ) λ1(t)

where H (2) and H (3) are two independent gamma processes a priori. The proposed
MCMC algorithm can be easily modified for this model as well.
Studying asymptotic properties of the posterior distribution is worth pursuing. Without H, Kim and Lee (2003b) and Kim (2006) proved that the convergence rate of the BB and full Bayesian posteriors is 1/√n. We think, however, that the convergence rate of the posterior of H to the true hazard ratio would be slower than 1/√n, as the optimal convergence rate for the hazard function is typically slower than 1/√n. This conjecture would partly explain the wider probability interval of γ0 for the proposed model compared to the results for the proportional hazards model in Table 1, and the wider probability band for Λ2 in Fig. 9c compared to that of Λ1 in Fig. 9b.

Acknowledgment This work was supported by the Korea Science and Engineering Foundation (KOSEF)
grant funded by the Korea government (MEST) (R01-2007-000-20045-0(2008)).

References

Andersen PK, Borgan O, Gill RD, Keiding N (1993) Statistical methods based on counting processes.
Springer, New York
Arjas E, Gasbarra D (1996) Bayesian inference of survival probabilities under stochastic ordering con-
straints. J Am Stat Assoc 91:1101–1109
Brunk HD, Franck WE, Hanson DL, Hogg RV (1966) Maximum likelihood estimation of the distribution
of two stochastically ordered random variables. J Am Stat Assoc 61:1067–1080
Deshpande JV, Sengupta D (1995) Testing for the hypothesis of proportional hazards in two populations. Biometrika 82:251–261
Dykstra RL (1982) Maximum likelihood estimation of the survival functions of stochastically ordered
random variables. J Am Stat Assoc 77:621–628
Dykstra RL, Kochar S, Robertson T (1991) Statistical inference for uniform stochastic ordering in several populations. Ann Stat 19:870–888
Gelfand AE, Kottas A (2001) Nonparametric Bayesian modeling for stochastic order. Ann Stat 53:865–876
Gill R, Schumacher M (1987) A simple test of the proportional hazards assumption. Biometrika 74:289–300
Hjort NL, Claeskens G (2006) Focussed information criteria and model averaging for Cox’s hazard regres-
sion model. J Am Stat Assoc 101:1449–1464
Ibrahim JG, Chen MH, Sinha D (2001) Bayesian survival analysis. Springer-Verlag, New York
Kalbfleisch JD (1978) Nonparametric Bayesian analysis of survival time data. J R Stat Soc Ser B 40:214–
221
Kim Y, Lee J (2003a) Bayesian analysis of proportional hazard models. Ann Stat 31:493–511
Kim Y, Lee J (2003b) Bayesian bootstrap for proportional hazards models. Ann Stat 31:1905–1922


Kim Y (2006) The Bernstein-von Mises theorem for the proportional hazard model. Ann Stat 34:1678–1700
Laud PW, Damien P, Smith AFM (1998) Bayesian nonparametric and covariate analysis of failure time data.
In: Practical nonparametric and semiparametric Bayesian statistics. Springer, New York, pp 213–225
Lo AY (1982) Bayesian nonparametric statistical inference for Poisson point processes. Z Wahrsch Verw
Gebiete 59:55–66
Mukerjee H (1996) Estimation of survival functions under uniform stochastic ordering. J Am Stat Assoc
91:1684–1689
Paestgaard JT, Huang J (1996) Asymptotic theory for nonparametric estimation of survival curves under
order restriction. Ann Stat 24:1679–1716
Ripley BD (2006) Stochastic simulation. Wiley, New York
Sengupta D, Bhattacharjee A, Rajeev V (1998) Testing for the proportionality of hazards in two samples
against the increasing cumulative hazard ratio alternative. Scand J Stat 25:637–647
Spiegelhalter DJ, Best N, Carlin B, Linde A (2002) Bayesian measures of model complexity and fit (with
discussion). J R Stat Soc Ser B 64:583–639
