
An Intensive Course in Stochastic Processes and
Stochastic Differential Equations in
Mathematical Biology
Part I
Discrete-Time Markov Chains
Linda J. S. Allen
Texas Tech University
Lubbock, Texas U.S.A.
National Center for Theoretical Sciences
National Tsing Hua University
August 2008
L. J. S. Allen Texas Tech University
Acknowledgement
I thank Professor Sze Bi Hsu and Professor Jing Yu for the invitation
to present lectures at the National Center for Theoretical Sciences at
the National Tsing Hua University.
COURSE OUTLINE
Part I: Discrete-Time Markov Chains - DTMC
Theory
Applications to Random Walks, Populations, and Epidemics
Part II: Branching Processes
Theory
Applications to Cellular Processes, Network Theory, and
Populations
Part III: Continuous-Time Markov Chains - CTMC
Theory
Applications to Populations and Epidemics
Part IV: Stochastic Differential Equations - SDE
Comparisons to Other Stochastic Processes, DTMC and CTMC
Applications to Populations and Epidemics
Some Basic References for this Course
[1] Allen, LJS. 2003. An Introduction to Stochastic Processes with Applications to Biology. Prentice Hall, Upper Saddle River, NJ.
[2] Allen, LJS. 2008. Chapter 3: An Introduction to Stochastic Epidemic Models. In: Mathematical Epidemiology, Lecture Notes in Mathematics, Vol. 1945, pp. 81-130. F. Brauer, P. van den Driessche, and J. Wu (Eds.) Springer.
[3] Karlin and Taylor. 1975. A First Course in Stochastic Processes. 2nd Ed. Academic Press, NY.
[4] Kimmel and Axelrod. 2002. Branching Processes in Biology. Springer-Verlag, NY.
Other references will be noted.
Background:
What is a Stochastic Model?
A stochastic model is formulated in terms of a stochastic process.
A stochastic process is a collection of random variables
{X(t; s) | t ∈ T, s ∈ S},
where T is the index set and S is a common sample space. The index
set often represents time, such as
T = {0, 1, 2, . . .} or T = [0, ∞).
Time can be discrete or continuous.
The study of stochastic processes is based on probability theory.
How do Stochastic Epidemic Models Differ from
Deterministic Models?
A deterministic model is formulated in terms of fixed, not random,
variables whose dynamics are solutions of differential or difference
equations.
A stochastic model is formulated in terms of random variables
whose probabilistic dynamics depend on solutions to differential or
difference equations.
A solution of a deterministic model is a function of time or space
and is dependent on the initial data.
A solution of a stochastic model is a probability distribution or
density function which is a function of time or space and is dependent
on the initial distribution or density. One sample path over time or
space is one realization from this distribution.
Stochastic models are used to model the variability inherent in
the process due to demography or the environment. Stochastic models
are particularly important when the variability is large relative to the
mean, e.g., small population size may lead to population extinction.
Whether the Random Variables Associated with The
Stochastic Process are Discrete or Continuous
Distinguishes Some Types of Stochastic Models.
A random variable X(t; s) of a stochastic process assigns a real
value to each outcome A ∈ S in the sample space and a probability
(or probability density),
Prob{X(t; s) ∈ A} ∈ [0, 1].
The values of the random variable constitute the state space, X(t; s).
For example, the number of cases associated with a disease may have
the following discrete or continuous set of values for its state space:
{0, 1, 2, . . .} or [0, N].
The state space can be discrete or continuous and correspondingly,
the random variable is discrete or continuous. For simplicity, the
sample space notation is suppressed and X(t) is used to denote
a random variable indexed by time t. The stochastic process is
completely defined when the set of random variables {X(t)} are
related by a set of rules.
The Choice of Discrete or Continuous Random Variables
with a Discrete or Continuous Index Set Defines the Type
of Stochastic Model.
Discrete-Time Markov Chain (DTMC): t ∈ {0, Δt, 2Δt, . . .}, X(t) is a discrete
random variable. The term chain implies that the random variable is discrete.
X(t) ∈ {0, 1, 2, . . . , N}
Continuous-Time Markov Chain (CTMC): t ∈ [0, ∞), X(t) is a discrete random
variable.
X(t) ∈ {0, 1, 2, . . . , N}
Diffusion Process, Stochastic Differential Equation (SDE): t ∈ [0, ∞), X(t) is a
continuous random variable.
X(t) ∈ [0, N]
Note: These are three major types of stochastic processes that will be discussed but
are not the only types of stochastic processes.
The Following Graphs Illustrate the Solution of a
Differential Equation versus Sample Paths of a Stochastic
Epidemic Model
[Figure: time steps (horizontal axis) versus number of infectives, I(t) (vertical axis)]

Figure 1: Solution of the number of infectious individuals for a differential equation
of an SIR epidemic model versus three sample paths of a discrete-time Markov
chain.
Part I:
Discrete-Time Markov Chains
Definitions, Theorems, and Applications

Let X_n = a discrete random variable defined on a finite state space {1, 2, . . . , N} or
a countably infinite state space {1, 2, . . .}. The index set {0, 1, 2, . . .} often
represents the progression of time. The variable n is used instead of t.

Definition 1. A discrete-time stochastic process {X_n}, n = 0, 1, 2, . . . , is said to have the Markov
property if

Prob{X_n = i_n | X_0 = i_0, . . . , X_{n−1} = i_{n−1}} = Prob{X_n = i_n | X_{n−1} = i_{n−1}},

where the values i_k ∈ {1, 2, . . .} for k = 0, 1, 2, . . . , n. The stochastic process
is then called a Markov chain. A Markov stochastic process is a stochastic process
in which the future behavior of the system depends only on the present and not on
its past history.

Definition 2. The probability mass function associated with the random variable X_n
is denoted {p_i(n)}, i = 0, 1, 2, . . . , where

p_i(n) = Prob{X_n = i}. (1)

Reference for Part I: [1] Chapters 2 and 3.
One-Step Transition Probabilities

Definition 3. The one-step transition probability p_{ji}(n) is the probability that the
process is in state j at time n + 1 given that the process was in state i at the
previous time n, for i, j = 1, 2, . . . ; that is,

p_{ji}(n) = Prob{X_{n+1} = j | X_n = i}.

Definition 4. If the transition probabilities p_{ji}(n) do not depend on time n, they
are said to be stationary or homogeneous. In this case, p_{ji}(n) ≡ p_{ji}. If the
transition probabilities are time-dependent, p_{ji}(n), they are said to be nonstationary
or nonhomogeneous. The transition matrix is

P = [ p_11  p_12  p_13  · · ·
      p_21  p_22  p_23  · · ·
      p_31  p_32  p_33  · · ·
       ⋮     ⋮     ⋮        ].

The column elements sum to one, Σ_j p_{ji} = 1. A matrix with this property is called
a stochastic matrix; P and P^n are stochastic matrices.
N-Step Transition Probabilities

Definition 5. The n-step transition probability, denoted p_{ji}^{(n)}, is the probability of
moving or transferring from state i to state j in n time steps,

p_{ji}^{(n)} = Prob{X_n = j | X_0 = i}.

The n-step transition matrix is P^{(n)} = (p_{ji}^{(n)}).

Then P^{(1)} = P, P^{(0)} = I, the identity matrix, and, in general, P^{(n)} = P^n. Let

p(n) = (p_1(n), p_2(n), . . .)^T

be the probability mass vector, Σ_{i=1}^∞ p_i(n) = 1. Then p(n + 1) = P p(n). In general,

p(n + m) = P^{n+m} p(0) = P^n (P^m p(0)) = P^n p(m).
Classification of States

Definition 6. The state j can be reached from the state i (or state j is accessible
from state i) if there is a nonzero probability, p_{ji}^{(n)} > 0, for some n ≥ 0, denoted
i → j. If j → i, and if i → j, then i and j are said to communicate, or to be in
the same class, denoted i ↔ j.

Definition 7. A set of states C is closed if it is impossible to reach any state outside
of C from any state inside C by one-step transitions, i.e., p_{ji} = 0 if i ∈ C and
j ∉ C.

The relation i → j can be represented in graph theory as a directed edge.
The relation i ↔ j is an equivalence relation. The equivalence relation on the
states defines a set of equivalence classes. These equivalence classes are known as
the communication classes of the Markov chain.

Definition 8. If there is only one communication class, then the Markov chain is
said to be irreducible, but if there is more than one communication class, then the
Markov chain is said to be reducible.

A sufficient condition that shows that a Markov chain is irreducible is the
existence of a positive integer n such that p_{ji}^{(n)} > 0 for all i and j; that is,
P^n > 0 for some positive integer n. For a finite Markov chain, irreducibility can be
checked from the directed graph. A finite Markov chain with states {1, 2, . . . , N}
is irreducible if there is a directed path from i to j for every i, j ∈ {1, 2, . . . , N}.
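The sufficient condition P^n > 0 can be tested by examining successive powers of P. A sketch in pure Python; the bound on how many powers to try ((N − 1)² + 1, Wielandt's exponent bound for primitive matrices) is standard matrix theory, not from these slides:

```python
def mat_mul(A, B):
    n = len(A)
    return [[sum(A[i][k] * B[k][j] for k in range(n)) for j in range(n)]
            for i in range(n)]

def is_primitive(P):
    """Check the slides' sufficient condition: P^n > 0 for some n.
    Wielandt's theorem bounds the exponent by (N-1)^2 + 1, so the search is finite."""
    n = len(P)
    M = [row[:] for row in P]
    for _ in range((n - 1) ** 2 + 1):
        if all(M[i][j] > 0 for i in range(n) for j in range(n)):
            return True
        M = mat_mul(M, P)
    return False

# A 3-state chain with a self-loop (irreducible and aperiodic): P^2 > 0.
P_example = [[0.0, 0.25, 0.0],
             [1.0, 0.50, 1.0],
             [0.0, 0.25, 0.0]]
# A 2-state cyclic chain (period 2): no power is ever strictly positive.
P_cycle = [[0.0, 1.0],
           [1.0, 0.0]]
```

Note the condition is sufficient but not necessary: the cyclic chain is irreducible, yet the check returns False because the chain is periodic.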
Gambler's Ruin Problem or Random Walk

Example 1. The states {0, 1, 2, . . . , N} represent the amount of money of the
gambler. The gambler bets $1 per game and either wins or loses each game. The
gambler is ruined if he/she reaches state 0. The probability of winning (moving to the
right) is p > 0 and the probability of losing (moving to the left) is q > 0, p + q = 1.
This model can also be considered a random walk on a grid with N + 1 points.
The one-step transition probabilities are p_{i+1,i} = p and p_{i−1,i} = q for
i = 1, 2, . . . , N − 1, with p_00 = 1 and p_NN = 1; all other elements
are zero. There are three communication classes: {0}, {1, 2, . . . , N − 1}, and
{N}. The Markov chain is reducible. The sets {0} and {N} are closed, but the
set {1, 2, . . . , N − 1} is not closed. Also, states 0 and N are absorbing; the
remaining states are transient.

[Diagram: states 0, 1, 2, . . . , N on a line]

The transition matrix for the gambler's ruin problem is

P = [ 1  q  0  · · ·  0  0
      0  0  q  · · ·  0  0
      0  p  0  · · ·  0  0
      0  0  p  · · ·  0  0
      ⋮  ⋮  ⋮   ⋱    ⋮  ⋮
      0  0  0  · · ·  q  0
      0  0  0  · · ·  0  0
      0  0  0  · · ·  p  1 ].
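A sketch that builds this (N + 1) × (N + 1) column-stochastic matrix in Python (the helper name and the values N = 5, p = 0.4 are chosen only for illustration):

```python
def gamblers_ruin_matrix(N, p):
    """Column-stochastic transition matrix for states 0, 1, ..., N:
    p_{i+1,i} = p, p_{i-1,i} = q = 1 - p, with absorbing states 0 and N."""
    q = 1.0 - p
    P = [[0.0] * (N + 1) for _ in range(N + 1)]
    P[0][0] = 1.0       # state 0 (ruin) is absorbing
    P[N][N] = 1.0       # state N (win) is absorbing
    for i in range(1, N):
        P[i + 1][i] = p  # win: move right
        P[i - 1][i] = q  # lose: move left
    return P

P = gamblers_ruin_matrix(5, 0.4)
```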
Periodic Chains

Example 2. Suppose the states are {1, 2, . . . , N} with transition matrix

P = [ 0  0  · · ·  0  1
      1  0  · · ·  0  0
      0  1  · · ·  0  0
      ⋮  ⋮   ⋱    ⋮  ⋮
      0  0  · · ·  1  0 ].

Beginning in state i, it takes exactly N time steps to return to state i, P^N = I.
The chain is periodic with period equal to N.

[Diagram: states 1 → 2 → 3 → · · · → N → 1]

Definition 9. The period of state i, d(i), is the greatest common divisor of all
integers n ≥ 1 for which p_{ii}^{(n)} > 0; that is, d(i) = g.c.d.{n | p_{ii}^{(n)} > 0 and n ≥ 1}.
If a state i has period d(i) > 1, it is said to be periodic of period d(i). If the
period of a state equals one, it is said to be aperiodic.

Periodicity is a class property: i ↔ j implies d(i) = d(j). Thus, we speak of
a periodic class or chain or an aperiodic class or chain.

In the gambler's ruin problem or random walk model with absorbing boundaries,
Example 1, the classes {0} and {N} are aperiodic. The class {1, 2, . . . , N − 1}
has period 2.
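Definition 9 can be applied numerically: compute p_{ii}^{(n)} from powers of P and take the gcd of the step counts with positive return probability. A sketch (the search is truncated at an assumed cutoff max_n, which is enough for small examples like the cyclic chain above):

```python
from math import gcd

def mat_mul(A, B):
    n = len(A)
    return [[sum(A[i][k] * B[k][j] for k in range(n)) for j in range(n)]
            for i in range(n)]

def period(P, i, max_n=50):
    """d(i) = g.c.d.{ n >= 1 : p_ii^(n) > 0 }, searched up to max_n steps."""
    M = [row[:] for row in P]   # M holds P^n
    d = 0
    for n in range(1, max_n + 1):
        if M[i][i] > 0:
            d = gcd(d, n)       # gcd(0, n) = n for the first hit
        M = mat_mul(M, P)
    return d

# Example 2 with N = 4: the cyclic chain 1 -> 2 -> 3 -> 4 -> 1.
P_cycle4 = [[0, 0, 0, 1],
            [1, 0, 0, 0],
            [0, 1, 0, 0],
            [0, 0, 1, 0]]
```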
Transient and Recurrent States

Definition 10. Let f_{ii}^{(n)} denote the probability that, starting from state i, X_0 = i,
the first return to state i is at the nth time step, n ≥ 1; that is,

f_{ii}^{(n)} = Prob{X_n = i, X_m ≠ i, m = 1, 2, . . . , n − 1 | X_0 = i}.

The probabilities f_{ii}^{(n)} are known as first return probabilities. Define f_{ii}^{(0)} = 0.

Definition 11. State i is said to be transient if Σ_{n=1}^∞ f_{ii}^{(n)} < 1. State i is said to be
recurrent if Σ_{n=1}^∞ f_{ii}^{(n)} = 1.

Definition 12. The mean recurrence time for state i is

μ_{ii} = Σ_{n=1}^∞ n f_{ii}^{(n)}.

Definition 13. If a recurrent state i satisfies μ_{ii} < ∞, then it is said to be positive
recurrent, and if it satisfies μ_{ii} = ∞, then it is said to be null recurrent.

An example of a positive recurrent state is an absorbing state. The mean
recurrence time of an absorbing state is μ_{ii} = 1.
First Passage Time and Recurrent Chains

The probability f_{ji}^{(n)} for j ≠ i is defined similarly.

Definition 14. Let f_{ji}^{(n)} denote the probability that, starting from state i, X_0 = i,
the first passage to state j, j ≠ i, is at the nth time step, n ≥ 1,

f_{ji}^{(n)} = Prob{X_n = j, X_m ≠ j, m = 1, 2, . . . , n − 1 | X_0 = i},  j ≠ i.

The probabilities f_{ji}^{(n)} are known as first passage time probabilities. Define f_{ji}^{(0)} = 0.

Definition 15. If X_0 = i, then the mean first passage time to state j is denoted as
μ_{ji} = E(T_{ji}) and defined as

μ_{ji} = Σ_{n=1}^∞ n f_{ji}^{(n)},  j ≠ i.

We use these definitions to verify alternative definitions for transient and recurrent
states and recurrent and transient communication classes and chains.

Theorem 1. A state i is recurrent (transient) if and only if Σ_{n=0}^∞ p_{ii}^{(n)} diverges
(converges); that is,

Σ_{n=0}^∞ p_{ii}^{(n)} = ∞ (< ∞).

Recurrence and transience are class properties; if i ↔ j, state i is recurrent
(transient) iff state j is recurrent (transient).
The 1-D, Unrestricted Random Walk is
Transient unless it is Symmetric.

Example 3. Consider the 1-D, unrestricted random walk. The chain is irreducible and
periodic of period 2.

[Diagram: states . . . , −2, −1, 0, 1, 2, . . . on the integer lattice]

Let p be the probability of moving to the right and q be the probability of moving
left, p + q = 1. We verify that the state 0, the origin, is recurrent iff p = 1/2 = q.
However, if the origin is recurrent, then all states are recurrent because the chain is
irreducible. Starting from the origin, it is impossible to return in an odd number of
steps,

p_00^{(2n+1)} = 0 for n = 0, 1, 2, . . . .

In 2n steps, there are a total of n steps to the right and a total of n steps to the
left, and the n steps to the left must be the reverse of those steps taken to the right
in order to return to the origin. There are

C(2n, n) = (2n)! / (n! n!)

different paths (combinations) that begin and end at the origin. The probability of
occurrence of each one of these paths is p^n q^n. Thus,

Σ_{n=0}^∞ p_00^{(n)} = Σ_{n=0}^∞ p_00^{(2n)} = Σ_{n=0}^∞ C(2n, n) p^n q^n.
We need an asymptotic formula for n!, known as Stirling's formula, to verify
recurrence:

n! ∼ n^n e^{−n} √(2πn).

Stirling's formula gives the following approximation:

p_00^{(2n)} = [(2n)! / (n! n!)] p^n q^n ∼ [√(4πn) (2n)^{2n} e^{−2n} / (2π n^{2n+1} e^{−2n})] p^n q^n = (4pq)^n / √(πn). (2)

There exists a positive integer N such that for n ≥ N,

(4pq)^n / (2√(πn)) < p_00^{(2n)} < 2(4pq)^n / √(πn).

Considered as a function of p, the expression 4pq = 4p(1 − p) has a maximum at
p = 1/2. If p = 1/2, then 4pq = 1, and if p ≠ 1/2, then 4pq < 1.
When p ≠ 1/2 (so that 4pq < 1),

Σ_{n=0}^∞ p_00^{(2n)} < N + Σ_{n=N}^∞ 2(4pq)^n / √(πn) < ∞.

The latter series converges by the ratio test.

When p = 1/2 = q (symmetric, so that 4pq = 1),

Σ_{n=0}^∞ p_00^{(2n)} > Σ_{n=N}^∞ (4pq)^n / (2√(πn)) = (1/(2√π)) Σ_{n=N}^∞ 1/√n = ∞.

The latter series diverges because it is just a multiple of a divergent p-series.

Therefore, by Theorem 1, state 0 is recurrent iff p = 1/2 = q; but the chain
is irreducible, so all states are recurrent. The chain is transient iff p ≠ q; in that case there is
a positive probability that an object starting from the origin will never return to the
origin. The object tends to +∞ or to −∞.
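The dichotomy can also be checked numerically from the exact return probabilities p_00^{(2n)} = C(2n, n) p^n q^n, before any Stirling approximation. A sketch: partial sums keep growing in the symmetric case, but in the asymmetric case they level off near 1/√(1 − 4pq) (which is 5 for p = 0.6, from the standard generating-function identity Σ C(2n, n) x^n = 1/√(1 − 4x), not derived in these slides):

```python
from math import comb

def p00_2n(n, p):
    """Exact 2n-step return probability: C(2n, n) p^n q^n."""
    return comb(2 * n, n) * (p * (1.0 - p)) ** n

def partial_sum(p, terms):
    return sum(p00_2n(n, p) for n in range(terms))

sym_100, sym_400 = partial_sum(0.5, 100), partial_sum(0.5, 400)    # keeps growing
asym_100, asym_400 = partial_sum(0.6, 100), partial_sum(0.6, 400)  # levels off
```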
Summary of Classification Schemes
Markov chains or classes can be classified as
Periodic or Aperiodic
Then further classified as
Transient or Recurrent
Then recurrent MC can be classified as
Null recurrent or Positive recurrent.
The term ergodic refers to a MC that is aperiodic, irreducible,
and recurrent; strongly ergodic if it is positive recurrent and weakly
ergodic if it is null recurrent.
Basic Theorems for Markov Chains (MC)

Theorem 2 (Basic Limit Theorem for Aperiodic MC). An ergodic MC has the property

lim_{n→∞} p_{ij}^{(n)} = 1/μ_{ii},

where μ_{ii} is the mean recurrence time for state i; i and j are any states of the
chain. [If μ_{ii} = ∞, then lim_{n→∞} p_{ij}^{(n)} = 0.]

Theorem 3 (Basic Limit Theorem for Periodic MC). A recurrent, irreducible, and d-periodic
MC has the property

lim_{n→∞} p_{ii}^{(nd)} = d/μ_{ii}

and p_{ii}^{(m)} = 0 if m is not a multiple of d, where μ_{ii} is the mean recurrence time for
state i. [If μ_{ii} = ∞, then lim_{n→∞} p_{ii}^{(nd)} = 0.]

Theorem 4. If j is a transient state of a MC, and i is any other state, then

lim_{n→∞} p_{ji}^{(n)} = 0.

The first two proofs apply discrete renewal theory (Karlin and Taylor, 1975). These
theorems also apply to classes in a MC.
The 1-D Unrestricted Symmetric Random Walk
is Null Recurrent.

Example 4. The unrestricted random walk model is irreducible and
periodic with period 2. The chain is recurrent iff it is a symmetric
random walk, p = 1/2 = q (Example 3). Recall that the 2n-step
transition probability satisfies

p_00^{(2n)} ∼ 1/√(πn)

and hence, lim_{n→∞} p_00^{(2n)} = 0. The Basic Limit Theorem for Periodic
Markov chains states that d/μ_00 = 0. Thus, μ_00 = ∞. When
p = 1/2 = q, the chain is null recurrent.

It can be shown that in a 2-D symmetric lattice random walk (probability
1/4 of moving in each of 4 directions), the chain is null recurrent. But in
a 3-D symmetric lattice random walk (probability 1/6 of moving in each
of 6 directions), the chain is transient.
Stationary Probability Distribution

Definition 16. A stationary probability distribution of a MC is a probability vector
π = (π_1, π_2, . . .)^T, Σ_{i=1}^∞ π_i = 1, that satisfies

P π = π.

Example 5. Let

P = [ a_1  0    0    · · ·
      a_2  a_1  0    · · ·
      a_3  a_2  a_1  · · ·
       ⋮    ⋮    ⋮       ],

where a_i > 0 and Σ_{i=1}^∞ a_i = 1. There exists no stationary probability distribution π,
because P π = π implies π = 0, the zero vector. It is impossible for the sum of the
elements of π to equal one.

According to the Basic Limit Theorem for MC, every strongly ergodic MC
converges to a stationary probability distribution. For a periodic MC, this is not the
case.
A Strongly Ergodic MC Converges to a
Stationary Distribution.

Theorem 5. A strongly ergodic MC with states {1, 2, . . .} and transition matrix
P has a unique positive stationary probability distribution π = (π_1, π_2, . . .)^T,
P π = π, such that

lim_{n→∞} P^n p(0) = π.

Example 6. The following transition matrix is based on a strongly ergodic MC:

P = [ 0  1/4  0
      1  1/2  1
      0  1/4  0 ].

The stationary probability distribution is π = (1/6, 2/3, 1/6)^T, with mean recurrence
times μ_11 = 6, μ_22 = 3/2, and μ_33 = 6. The columns of P^n approach the stationary
probability distribution,

lim_{n→∞} P^n = [ 1/6  1/6  1/6
                  2/3  2/3  2/3
                  1/6  1/6  1/6 ].
The Basic Theorems Simplify for Finite MC

Facts: In finite MC, there are NO null recurrent states, and not all states are
transient. In addition, in finite MC, a stationary probability distribution π is an
eigenvector of P corresponding to an eigenvalue of one.

Theorem 6. An irreducible finite MC is positive recurrent. In addition, an irreducible,
aperiodic finite MC has a unique positive stationary distribution π such that

lim_{n→∞} P^n p(0) = π.

Example 7. The transition matrix for an irreducible finite MC is

P = [ 1/2  1/3
      1/2  2/3 ].

The stationary probability distribution satisfies P π = π,

π = (2/5, 3/5)^T.

Mean recurrence times are μ_11 = 5/2 and μ_22 = 5/3.
Biological Applications of DTMC Processes
We will apply these denitions and theorems to some biological
examples:
(1) Gambler's Ruin Problem or Random Walk
(2) Birth and Death Process
(3) Logistic Growth Process
(4) SIS Epidemic Process
(5) SIR Epidemic Process
(6) Chain Binomial Epidemic Process
(1) Gambler's Ruin Problem

The gambler's ruin problem is a classical problem in DTMC theory.
The model can also be considered a random walk on a spatial grid with
N + 1 grid points. If N → ∞, the spatial grid is semi-infinite.

Probability of Absorption

Let a_k be the probability of absorption into state 0 (ruin) beginning
with a capital of k, 1 ≤ k ≤ N − 1, and let b_k be the probability of
absorption into state N (win and game stops) beginning with a capital
of k. If there is only one absorbing state at 0, as in a population model,
then the probability of absorption is the probability of extinction. If a_{kn} and
b_{kn} represent absorption into the two states, 0 and N, respectively,
after the nth game or step, then

a_k = Σ_{n=0}^∞ a_{kn} and b_k = Σ_{n=0}^∞ b_{kn}.
Probability of Absorption

We solve a boundary value problem (bvp) for a_k (a difference equation) and use
the fact that a_k + b_k = 1 to solve for b_k. The bvp is

a_k = p a_{k+1} + q a_{k−1},  a_0 = 1,  a_N = 0.

This is a homogeneous linear difference equation. Assume a_k = λ^k ≠ 0 to obtain
the characteristic equation pλ² − λ + q = 0. Then

a_k = [(q/p)^N − (q/p)^k] / [(q/p)^N − 1],  p ≠ q,
b_k = [(q/p)^k − 1] / [(q/p)^N − 1],  p ≠ q.

If p = 1/2 = q, then

a_k = (N − k)/N and b_k = k/N.
The Probability of Absorption for N = 100 and
k = 50.

p = probability of winning (moving to right)
q = 1 − p = probability of losing (moving to left)

   q         a_50        b_50
   0.50      0.5         0.5
   0.51      0.880825    0.119175
   0.55      0.999956    0.000044
   0.60      1.00000     0.00000

Table 1: Gambler's ruin problem with a beginning capital of k = 50
and a total capital of N = 100.
Expected Time Until Absorption

In terms of the gambler's ruin problem, we will determine the mean
time until absorption. In terms of population models, if there is only
one absorbing boundary at 0, this represents population extinction and
the mean time until absorption is the mean time until extinction.

Let τ_k = the expected time until absorption in the gambler's ruin
problem. We solve the following bvp:

τ_k = p(1 + τ_{k+1}) + q(1 + τ_{k−1}) = 1 + p τ_{k+1} + q τ_{k−1},
τ_0 = 0 = τ_N.

This is a nonhomogeneous, linear difference equation. To solve the
homogeneous equation, let τ_k = λ^k ≠ 0 to obtain the characteristic
equation pλ² − λ + q = 0. A particular solution is τ_k = k/(q − p),
q ≠ p.
The Expected Time to Absorption for N = 100
and k = 50.

τ_k = k/(q − p) − [N/(q − p)] · [1 − (q/p)^k] / [1 − (q/p)^N],  q ≠ p,
τ_k = k(N − k),  q = p.

   q         a_50        b_50        τ_50
   0.50      0.5         0.5         2500
   0.51      0.880825    0.119175    1904
   0.55      0.999956    0.000044    500
   0.60      1.00000     0.00000     250

Table 2: Gambler's ruin problem with a beginning capital of k = 50
and a total capital of N = 100.
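The same kind of check works for τ_k; a sketch reproducing the 2500 and 1904 entries of Table 2:

```python
def expected_duration(k, N, q):
    """tau_k for the gambler's ruin problem (q = probability of losing)."""
    p = 1.0 - q
    if abs(p - q) < 1e-12:
        return k * (N - k)                 # fair-game case
    r = q / p
    return k / (q - p) - (N / (q - p)) * (1.0 - r ** k) / (1.0 - r ** N)

t_fair = expected_duration(50, 100, 0.50)  # 2500
t_51   = expected_duration(50, 100, 0.51)  # about 1904
```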
Expected Time Until Absorption as a Function
of Initial Capital k for N = 100 and q = 0.55.

[Figure: initial capital (horizontal axis) vs. expected duration (vertical axis)]

Figure 2: Expected duration of the games, τ_k, for k = 0, 1, 2, . . . , 100,
when q = 0.55 and N = 100.
Random Walk on a Semi-Infinite Domain,
N → ∞

Probability of Extinction

a_k = { 1,         p < q,
        (q/p)^k,   p ≥ q.

Expected Time Until Extinction

τ_k = { k/(q − p),  p < q,
        ∞,          p ≥ q.
(2) Birth and Death Process

A birth and death process is related to the gambler's ruin problem, but the
probabilities of a birth (winning) or a death (losing) are not constant; they depend on
the size of the population, and the size N is not absorbing. Let X_n, n = 0, 1, 2, . . . ,
denote the size of the population. The birth and death probabilities are b_i and d_i,
with b_0 = 0 = d_0, b_N = 0, b_i > 0 and d_i > 0 for i = 1, 2, . . . . During the time
interval n → n + 1, at most one event occurs, either a birth or a death. Assume

p_{ji} = Prob{X_{n+1} = j | X_n = i}
       = { b_i,              if j = i + 1,
           d_i,              if j = i − 1,
           1 − (b_i + d_i),  if j = i,
           0,                if j ≠ i − 1, i, i + 1,

for i = 1, 2, . . . , with p_00 = 1, p_{j0} = 0 for j ≠ 0, and p_{N+1,N} = b_N = 0.
The Transition Matrix for a Birth and Death
Process

The transition matrix P has the following form:

P = [ 1  d_1              0                · · ·  0                        0
      0  1 − (b_1 + d_1)  d_2              · · ·  0                        0
      0  b_1              1 − (b_2 + d_2)  · · ·  0                        0
      0  0                b_2              · · ·  0                        0
      ⋮  ⋮                ⋮                 ⋱     ⋮                        ⋮
      0  0                0                · · ·  1 − (b_{N−1} + d_{N−1})  d_N
      0  0                0                · · ·  b_{N−1}                  1 − d_N ].

To ensure that P is a stochastic matrix, assume

sup_{i ∈ {1,2,...}} {b_i + d_i} ≤ 1.

During each time interval, n to n + 1, either the population size
increases by one, decreases by one, or stays the same size. This is a
reasonable assumption if the time interval is sufficiently small.
Eventual Extinction Occurs with Probability One.

There are two communication classes, {0} and {1, . . . , N}. The
first one is positive recurrent and the second one is transient.
There exists a unique stationary probability distribution π, P π = π,
where π_0 = 1 and π_i = 0 for i = 1, 2, . . . , N. Eventually, population
extinction occurs from any initial state (Theorem 4):

lim_{n→∞} P^n p(0) = π.

However, the expected time to extinction may be very long!
Expected Time to Extinction in a Birth and
Death Process.

Let τ_k = the expected time until extinction for a population with initial size k. Then

τ_k = b_k(1 + τ_{k+1}) + d_k(1 + τ_{k−1}) + (1 − (b_k + d_k))(1 + τ_k)
    = 1 + b_k τ_{k+1} + d_k τ_{k−1} + (1 − b_k − d_k) τ_k

and τ_N = 1 + d_N τ_{N−1} + (1 − d_N) τ_N. This can be expressed in matrix form:
D τ = c, where τ = (τ_0, τ_1, . . . , τ_N)^T, c = (0, 1, . . . , 1)^T, and D is

D = [ 1     |  0          0          0     · · ·  0     0
      −d_1  |  b_1 + d_1  −b_1       0     · · ·  0     0
      0     |  −d_2       b_2 + d_2  −b_2  · · ·  0     0
      ⋮     |  ⋮          ⋮          ⋮      ⋱     ⋮     ⋮
      0     |  0          0          0     · · ·  −d_N  d_N ]
  = [ 1    0
      D_1  D_N ].

Definition 17. An N × N matrix A = (a_ij) is diagonally dominant if

|a_ii| ≥ Σ_{j=1, j≠i}^N |a_ij| for all i.

Matrix A is irreducibly diagonally dominant if A is irreducible and diagonally dominant with strict
inequality for at least one i.
Expected Time until Extinction in a Birth and
Death Process.

Matrix D_N is irreducibly diagonally dominant, so that det(D_N) ≠ 0 and det(D) =
det(D_N) ≠ 0. Thus, D is nonsingular and the solution for the expected time until
extinction is

τ = D^{−1} c.

Because matrix D is tridiagonal, simple recursion relations can be applied to obtain
explicit formulas for the τ_k, k = 1, 2, . . . , N.

Theorem 7. Suppose {X_n}, n = 0, 1, 2, . . . , is a general birth and death
process with X_0 = m ≥ 1 satisfying b_0 = 0 = d_0, b_i > 0 for i = 1, 2, . . . , N − 1,
and d_i > 0 for i = 1, 2, . . . , N. The expected time until population extinction
satisfies

τ_m = 1/d_1 + Σ_{i=2}^N (b_1 · · · b_{i−1}) / (d_1 · · · d_i),  m = 1,

τ_m = τ_1 + Σ_{s=1}^{m−1} [ (d_1 · · · d_s)/(b_1 · · · b_s) · Σ_{i=s+1}^N (b_1 · · · b_{i−1})/(d_1 · · · d_i) ],  m = 2, . . . , N.

(Strictly diagonally dominant and irreducibly diagonally dominant matrices are nonsingular.)
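Theorem 7 translates directly into code. A sketch evaluating τ_m for the linear rates of Example 8 (b_i = 0.03i, d_i = 0.02i, N = 20); the function name and list layout are my own:

```python
def extinction_times(b, d, N):
    """tau_m from Theorem 7; b[i], d[i] hold the birth/death probabilities
    for i = 1..N (index 0 unused), with d[i] > 0 on 1..N."""
    def ratio(i):
        """(b_1 ... b_{i-1}) / (d_1 ... d_i); equals 1/d_1 when i = 1."""
        num = 1.0
        for j in range(1, i):
            num *= b[j]
        den = 1.0
        for j in range(1, i + 1):
            den *= d[j]
        return num / den

    tau1 = sum(ratio(i) for i in range(1, N + 1))
    taus = {1: tau1}
    for m in range(2, N + 1):
        extra = 0.0
        for s in range(1, m):
            prod = 1.0
            for j in range(1, s + 1):
                prod *= d[j] / b[j]
            extra += prod * sum(ratio(i) for i in range(s + 1, N + 1))
        taus[m] = tau1 + extra
    return taus

# Example 8: b_i = 0.03 i (i <= 19), d_i = 0.02 i, N = 20.
N = 20
b = [0.0] + [0.03 * i for i in range(1, N + 1)]
d = [0.0] + [0.02 * i for i in range(1, N + 1)]
taus = extinction_times(b, d, N)
```

The values are on the order of 10^4 time steps, consistent with Figure 3, and increase with the initial population size m.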
An Example of a Simple Birth and Death
Process with N = 20.

Example 8. Suppose the maximal population size is N = 20, where the birth
and death probabilities are linear: b_i ≡ bi = 0.03i for i = 1, 2, . . . , 19, and
d_i ≡ di = 0.02i for i = 1, 2, . . . , 20, a simple birth and death process. Since
b > d, there is population growth.

[Figure: initial population size (horizontal axis) vs. expected duration (vertical axis, ×10^4); b > d]

Figure 3: Expected time until population extinction when the maximal
population size is N = 20, b_i = 0.03i, and d_i = 0.02i.
(3) Logistic Growth Process

Assume b_i − d_i = ri(1 − i/K), where r = intrinsic growth
rate and K = carrying capacity.

Two cases:

(a) b_i = r(i − i²/(2K)) and d_i = r i²/(2K), i = 0, 1, 2, . . . , 2K

(b) b_i = ri for i = 0, 1, 2, . . . , N − 1 and b_i = 0 for i ≥ N, with
d_i = r i²/K, i = 0, 1, . . . , N
We Plot the Expected Time to Extinction for
Two Cases.

Example 9. Let r = 0.015, K = 10, and N = 20. The population persists much
longer in case (a).

[Figure: two panels, (a) and (b); initial population size (horizontal axis) vs. expected
time to extinction (vertical axis, ×10^6 in (a) and ×10^5 in (b))]

Figure 4: Expected time until population extinction when the birth and death
rates satisfy (a) and (b) and the parameters are r = 0.015, K = 10, and
N = 20.
Quasistationary Probability Distribution

When the expected time to extinction is very long, it is reasonable to examine
the dynamics of the process prior to extinction. Define the probability conditioned
on nonextinction:

q_i(n) = Prob{X_n = i | X_j ≠ 0, j = 0, 1, 2, . . . , n} = p_i(n) / (1 − p_0(n))

for i = 1, 2, . . . , N. Note that q(n) = (q_1(n), q_2(n), . . . , q_N(n))^T defines a
probability distribution, because

Σ_{i=1}^N q_i(n) = [Σ_{i=1}^N p_i(n)] / (1 − p_0(n)) = (1 − p_0(n)) / (1 − p_0(n)) = 1.

Let Q_n = the random variable for the population size at time n conditional on
nonextinction; q_i(n) = Prob{Q_n = i}. This quasistationary process is a finite
irreducible MC. The stationary probability distribution for this process is denoted as
q*; q* is referred to as the quasistationary probability distribution.
Quasistationary Probability Distribution

Difference equations for the q_i(n) can be derived based on those for the p_i(n) [i.e.,
p(n + 1) = P p(n)]. From these difference equations the quasistationary probability
distribution q* can be determined. It will be seen that q* cannot be calculated by a
direct method but by an indirect method, an iterative scheme.

An approximation to the process {Q_n} yields a strongly ergodic MC, {Q̃_n}, with
associated probability distribution q̃(n). For this new process, a transition matrix
P̃ and the limiting positive stationary probability distribution q̃* can be defined.
The stationary probability distribution q̃* is an approximation to the quasistationary
probability distribution q*.
Quasistationary Probability Distribution

Difference equations for q_i(n + 1) are derived from the identity p(n + 1) = P p(n):

q_i(n + 1) = p_i(n + 1) / (1 − p_0(n + 1))
           = [p_i(n + 1) / (1 − p_0(n))] · [(1 − p_0(n)) / (1 − p_0(n + 1))]
           = [p_i(n + 1) / (1 − p_0(n))] · [(1 − p_0(n)) / (1 − p_0(n) − d_1 p_1(n))],

or

q_i(n + 1)(1 − d_1 q_1(n)) = p_i(n + 1) / (1 − p_0(n)).

Using the identity for p_i(n + 1), the following relation is obtained:

q_i(n + 1)[1 − d_1 q_1(n)] = b_{i−1} q_{i−1}(n) + (1 − b_i − d_i) q_i(n) + d_{i+1} q_{i+1}(n)

for i = 1, 2, . . . , N, with b_0 = 0 and q_i(n) = 0 for i ∉ {1, 2, . . . , N}. It is
similar to the difference equation satisfied by p_i(n) except for an additional factor
multiplying q_i(n + 1). An analytical solution for q* cannot be found directly from
these equations since the coefficients depend on n, but q* can be found by an
iterative method.
Approximate Quasistationary Probability
Distribution

To approximate the quasistationary probability distribution q*, let d_1 = 0. That is,
when the population size equals one, the probability of dying is zero. Then

q̃_i(n + 1) = b_{i−1} q̃_{i−1}(n) + (1 − b_i − d_i) q̃_i(n) + d_{i+1} q̃_{i+1}(n)

for i = 2, . . . , N − 1, with q̃_1(n + 1) = (1 − b_1) q̃_1(n) + d_2 q̃_2(n) and q̃_N(n + 1) =
b_{N−1} q̃_{N−1}(n) + (1 − d_N) q̃_N(n). The new transition matrix corresponding to this
approximation satisfies

P̃ = [ 1 − b_1  d_2              · · ·  0                        0
       b_1      1 − (b_2 + d_2)  · · ·  0                        0
       0        b_2              · · ·  0                        0
       ⋮        ⋮                 ⋱     ⋮                        ⋮
       0        0                · · ·  1 − (b_{N−1} + d_{N−1})  d_N
       0        0                · · ·  b_{N−1}                  1 − d_N ].

Note that P̃ is a submatrix of the original transition matrix P, where the first
column and first row of P are deleted and d_1 = 0. The MC q̃(n + 1) = P̃ q̃(n)
is strongly ergodic, and thus q̃(n) converges to a unique stationary probability
distribution q̃*, where

q̃*_{i+1} = [(b_i · · · b_1) / (d_{i+1} · · · d_2)] q̃*_1 and Σ_{i=1}^N q̃*_i = 1.
Approximate Quasistationary Probability
Distribution

Example 10. The approximate quasistationary probability distribution q̃* is compared
to the quasistationary probability distribution q* when r = 0.015, K = 10, and
N = 20 in cases (a) and (b). Both distributions are in good agreement for N = 20,
but when N = 10 and K = 5, the two distributions differ, especially for values
near zero.

[Figure: three panels, (a), (b), and (c); population size (horizontal axis) vs. probability (vertical axis)]

Figure 5: Quasistationary probability distribution, q* (solid curve), and the
approximate quasistationary probability distribution, q̃* (diamond marks),
when r = 0.015, K = 10, and N = 20 in cases (a) and (b). In (c),
r = 0.015, K = 5, N = 10, where b_i = ri and d_i = ri²/K.
Probability Distribution Associated with Logistic Growth when N = 100, K = 50, and X_0 = 5.

p(n) = (p_1(n), ..., p_N(n))^T, n = 0, 1, ..., 2000.

Figure 6: The stochastic logistic probability distribution, p(n), plotted against state and time, for r = 0.004, K = 50, N = 100, X_0 = 5.
(4) To Understand the Stochastic SIS Epidemic Model, We Review the Dynamics of the Deterministic SIS Epidemic Model.

Deterministic SIS:  S → I → S

dS/dt = -(β/N) S I + (b + γ) I
dI/dt = (β/N) S I - (b + γ) I

where β > 0, γ > 0, N > 0, b ≥ 0, and S(t) + I(t) = N.
The Dynamics of the Deterministic SIS Epidemic Model Depend on the Basic Reproduction Number.

β = transmission rate
b = birth rate = death rate
γ = recovery rate
N = total population size = constant.

Basic Reproduction Number:

R_0 = β/(b + γ)

If R_0 ≤ 1, then lim_{t→∞} I(t) = 0.
If R_0 > 1, then lim_{t→∞} I(t) = N(1 - 1/R_0).
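The threshold result can be illustrated numerically. A forward-Euler sketch (not from the lectures), assuming the parameter values of Figure 7, where R_0 = 2 so that I(t) should approach N(1 - 1/R_0) = 50:

```python
# Forward-Euler integration of dI/dt = (beta/N)(N - I)I - (b + gamma)I,
# using the Figure 7 values, to illustrate I(t) -> N(1 - 1/R0) for R0 > 1.
beta, b, gamma, N = 0.01, 0.0025, 0.0025, 100
R0 = beta / (b + gamma)                  # = 2

I, dt = 2.0, 0.5
for _ in range(int(3000 / dt)):
    I += dt * ((beta / N) * (N - I) * I - (b + gamma) * I)

print(round(I, 2))   # ≈ 50.0 = N(1 - 1/R0)
```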
Formulation of the SIS Stochastic MC Epidemic Model

Let I_n denote the discrete random variable for the number of infected (and infectious) individuals, with associated probability function

p_i(n) = Prob{I_n = i},

where i = 0, 1, 2, ..., N is the total number infected at time n. The probability distribution is

p(n) = (p_0(n), p_1(n), ..., p_N(n))^T

for n = 0, 1, 2, .... Now we relate the random variables {I_n} indexed by time n by defining the probability of a transition from state i to state j, i → j, at time n + 1 as

p_ji(n) = Prob{I_{n+1} = j | I_n = i}.
For the Stochastic Model, Assume that the Time Interval is Sufficiently Small, Such that the Number of Infectives Changes by at Most One.

That is,

i → i + 1, i → i - 1, or i → i.

Either there is a new infection, birth, death, or a recovery. Therefore, the transition probabilities are

p_ji(n) =
  βi(N - i)/N = b_i,                                  j = i + 1,
  (b + γ)i = d_i,                                     j = i - 1,
  1 - [βi(N - i)/N + (b + γ)i] = 1 - [b_i + d_i],     j = i,
  0,                                                  j ≠ i + 1, i, i - 1.

Then the SIS epidemic process is similar to a birth and death process.
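A sample path of this DTMC can be generated directly from the transition probabilities above. A minimal sketch, assuming the parameter values of Figure 7:

```python
import random

# One DTMC SIS sample path: at each time step the number of infectives
# increases by 1 w.p. b_i, decreases by 1 w.p. d_i, and otherwise stays.
beta, b, gamma, N = 0.01, 0.0025, 0.0025, 100

def sis_path(i0, steps, rng):
    i, path = i0, [i0]
    for _ in range(steps):
        bi = beta * i * (N - i) / N       # b_i = infection probability
        di = (b + gamma) * i              # d_i = recovery/death probability
        u = rng.random()
        if u < bi:
            i += 1
        elif u < bi + di:
            i -= 1
        path.append(i)
    return path

path = sis_path(2, 2000, random.Random(0))
```

Running this several times with different seeds reproduces the qualitative behavior of Figure 7: some paths die out quickly, others fluctuate about the endemic level.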
Three Sample Paths of the DTMC SIS Model are Compared to the Solution of the Deterministic Model.

Figure 7: R_0 = 2, β = 0.01, b = 0.0025 = γ, N = 100, S_0 = 98, and I_0 = 2.
Even Though R_0 > 1, the DTMC SIS Epidemic Model Predicts the Epidemic Ends.

When R_0 > 1, the deterministic SIS epidemic model predicts that an endemic equilibrium is reached. This is not the case for the stochastic SIS epidemic model:

lim_{n→∞} p_0(n) = 1.

As mentioned earlier, this absorption at zero may take an exponential amount of time. But when N is large and I_0 = i is small, for large time n,

0 < p_0(n) ≈ P_0 = constant < 1.
An Estimate for P_0 Can be Obtained From the Gambler's Ruin Problem on a Semi-Infinite Domain.

When N is large and i is small:

Probability of movement right = p = (β/N) i(N - i) ≈ βi
Probability of movement left = q = (b + γ)i

Based on a random walk model on a semi-infinite domain, an estimate for the probability of no epidemic (probability of ruin) with initial capital k is a_k = (q/p)^k. Suppose I_0 = k. Then

Probability of no epidemic = P_0 ≈ ((b + γ)/β)^k = (1/R_0)^k.
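The estimate can be checked by Monte Carlo. A sketch (not from the lectures), assuming the linearized SIS model is the embedded jump chain of the random walk, with a ceiling of 100 standing in for the semi-infinite domain:

```python
import random

# Monte Carlo check of the gambler's-ruin estimate P0 ≈ (1/R0)^k.
# The embedded jump chain moves right w.p. beta/(beta + b + gamma) and
# left otherwise, so q/p = (b + gamma)/beta = 1/R0.
beta, b, gamma = 0.01, 0.0025, 0.0025
R0 = beta / (b + gamma)                  # = 2
p_up = beta / (beta + b + gamma)         # = 2/3
k, ceiling, trials = 2, 100, 10000

rng = random.Random(5)
ruined = 0
for _ in range(trials):
    i = k
    while 0 < i < ceiling:
        i += 1 if rng.random() < p_up else -1
    ruined += (i == 0)

est = ruined / trials
print(est)   # ≈ (1/R0)^k = 0.25
```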
Graphs of the Probability Distribution are Bimodal, Showing the Probability of No Epidemic and the Quasistationary Distribution.

R_0 = 2, I(0) = 3, P_0 ≈ 1/8.

[Surface plot of Prob{I(t) = i} against state i and time steps.]
(5) We Review the Dynamics of the Deterministic SIR Epidemic Model.

Deterministic SIR: Basic Reproduction Number R_0 = β/(b + γ)

S → I → R

dS/dt = -(β/N) S I + b(I + R)
dI/dt = (β/N) S I - (b + γ) I
dR/dt = γI - bR

If R_0 > 1 and b > 0, then lim_{t→∞} I(t) = Ī > 0.
If R_0 > 1 and b = 0, then lim_{t→∞} I(t) = 0. An epidemic occurs if R_0 S(0)/N > 1.
If R_0 ≤ 1, then lim_{t→∞} I(t) = 0.
Formulation of a DTMC SIR Epidemic Model Results in a Bivariate Process.

S_n + I_n + R_n = N = maximum population size.

Let S_n and I_n denote discrete random variables for the number of susceptible and infected individuals, respectively. These two variables have a joint probability function

p_(s,i)(n) = Prob{S_n = s, I_n = i},

where R_n = N - S_n - I_n. For this stochastic process, we define transition probabilities as follows:

p_{(s+k,i+j),(s,i)} = Prob{(ΔS, ΔI) = (k, j) | (S_n, I_n) = (s, i)}
  = βis/N,                           (k, j) = (-1, 1)
    γi,                              (k, j) = (0, -1)
    bi,                              (k, j) = (1, -1)
    b(N - s - i),                    (k, j) = (1, 0)
    1 - [βis/N + γi + b(N - s)],     (k, j) = (0, 0)
    0,                               otherwise.

In multivariate processes the transition matrix is often too large and complex to write down.
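Even when the transition matrix is too large to write down, sample paths are easy to generate. A minimal sketch of the bivariate chain, assuming the parameter values of Figure 8 (β = 0.01, b = 0, γ = 0.005, N = 100):

```python
import random

# One sample path of the bivariate DTMC SIR chain; the four events match
# the transition probabilities above. With b = 0, the two birth/death
# events never occur.
beta, b, gamma, N = 0.01, 0.0, 0.005, 100

def sir_path(s0, i0, steps, rng):
    s, i = s0, i0
    hist = [(s, i)]
    for _ in range(steps):
        infect = beta * i * s / N        # (k, j) = (-1, 1)
        recover = gamma * i              # (k, j) = (0, -1)
        die_i = b * i                    # (k, j) = (1, -1)
        die_r = b * (N - s - i)          # (k, j) = (1, 0)
        u = rng.random()
        if u < infect:
            s, i = s - 1, i + 1
        elif u < infect + recover:
            i -= 1
        elif u < infect + recover + die_i:
            s, i = s + 1, i - 1
        elif u < infect + recover + die_i + die_r:
            s += 1
        hist.append((s, i))
    return hist

hist = sir_path(98, 2, 2000, random.Random(1))
```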
Three Sample Paths of the DTMC SIR Epidemic Model are Compared to the Solution of the Deterministic Model.

Figure 8: R_0 = 2, R_0 S_0/N = 1.96, β = 0.01, b = 0, γ = 0.005, N = 100, S_0 = 98, and I_0 = 2.
(6) Chain Binomial Epidemic Models

There are two basic models, known as the Greenwood and Reed-Frost models, originally developed in 1928 and 1931, respectively. These models apply to small epidemics or to outbreaks within a household.

Both models are DTMC models that depend on the two random variables S_t and I_t, a bivariate MC. The latent period is the time from t to t + 1, and the infectious period is contracted to a point. Therefore, at time t + 1, there are only newly infected individuals from the previous time t:

S_{t+1} + I_{t+1} = S_t.

Given there are i infectives, let p_i = probability that a susceptible individual does not become infected during the time period t to t + 1.
Greenwood Model: p_i = p

The transition probability for (s_t, i_t) → (s_{t+1}, i_{t+1}) depends only on p, s_t, and s_{t+1}, and is based on the binomial probability distribution:

p_{s_{t+1}, s_t} = C(s_t, s_{t+1}) p^{s_{t+1}} (1 - p)^{s_t - s_{t+1}}.

Sample paths are denoted {s_0, s_1, ..., s_{t-1}, s_t}. The epidemic stops when s_t = s_{t-1}.

E(S_{t+1} | S_t = s_t) = p s_t
E(I_{t+1} | S_t = s_t) = (1 - p) s_t.
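Sample paths like those of Figure 9 can be generated directly from the binomial transition probability. A minimal sketch, where the escape probability p = 0.5 is an assumed illustrative value:

```python
import random

# Greenwood chain-binomial sample path: while new infectives remain, each
# susceptible independently escapes infection with probability p; the
# epidemic stops when s_{t+1} = s_t (no new infectives).
def greenwood_path(s0, p, rng):
    s = s0
    path = [s]
    while True:
        s_next = sum(rng.random() < p for _ in range(s))  # Binomial(s, p)
        path.append(s_next)
        if s_next == s:
            return path
        s = s_next

path = greenwood_path(6, 0.5, random.Random(3))   # s0 = 6, i0 = 1 as in Figure 9
```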
Sample Paths of the Greenwood Chain-Binomial Model.

Figure 9: Four sample paths for the Greenwood chain binomial model when s_0 = 6 and i_0 = 1: {6, 6}, {6, 5, 5}, {6, 4, 3, 2, 1, 1}, and {6, 2, 1, 0, 0}.
Reed-Frost Model: p_i = p^i

The transition probability for (s_t, i_t) → (s_{t+1}, i_{t+1}) is again based on the binomial probability distribution, but depends on p, s_t, s_{t+1}, and i_t:

p_{(s,i)_{t+1}, (s,i)_t} = C(s_t, s_{t+1}) (p^{i_t})^{s_{t+1}} (1 - p^{i_t})^{s_t - s_{t+1}}

E(S_{t+1} | S_t = s_t, I_t = i_t) = s_t p^{i_t}
E(I_{t+1} | S_t = s_t, I_t = i_t) = s_t (1 - p^{i_t}).
The Duration and Size of the Epidemic Can be Calculated.

Sample Path {s_0, ..., s_t}   Duration T   Size W   Greenwood        Reed-Frost
{3, 3}                        1            0        p^3              p^3
{3, 2, 2}                     2            1        3(1-p)p^4        3(1-p)p^4
{3, 2, 1, 1}                  3            2        6(1-p)^2 p^4     6(1-p)^2 p^4
{3, 1, 1}                     2            2        3(1-p)^2 p^2     3(1-p)^2 p^3
{3, 2, 1, 0, 0}               4            3        6(1-p)^3 p^3     6(1-p)^3 p^3
{3, 2, 0, 0}                  3            3        3(1-p)^3 p^2     3(1-p)^3 p^2
{3, 1, 0, 0}                  3            3        3(1-p)^3 p       3(1-p)^3 p(1+p)
{3, 0, 0}                     2            3        (1-p)^3          (1-p)^3

Table 3: All of the sample paths, their duration, and size are computed for the Greenwood and Reed-Frost models when s_0 = 3 and i_0 = 1.
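The Greenwood column of Table 3 can be verified by enumerating all sample paths recursively. A sketch (the numeric value p = 0.6 is an assumed test value):

```python
from math import comb

# Enumerate all Greenwood sample paths from s0 = 3, i0 = 1 and check that
# their probabilities sum to one. A path ends when no new infectives
# appear, i.e., s_{t+1} = s_t.
def greenwood_paths(s, prob, path, p, out):
    for s_next in range(s + 1):
        pr = comb(s, s_next) * p**s_next * (1 - p)**(s - s_next)
        if s_next == s:                    # epidemic stops
            out[tuple(path + [s_next])] = prob * pr
        else:
            greenwood_paths(s_next, prob * pr, path + [s_next], p, out)

p = 0.6
out = {}
greenwood_paths(3, 1.0, [3], p, out)
total = sum(out.values())
print(round(total, 12))                                   # → 1.0
print(abs(out[(3, 1, 1)] - 3 * (1 - p)**2 * p**2) < 1e-12)  # → True
```

The path {3, 1, 1} carries probability C(3,1)p(1-p)² · p = 3(1-p)²p², matching the Greenwood entry in Table 3.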
This Concludes Part I on DTMC.
An Intensive Course in Stochastic Processes and Stochastic Differential Equations in Mathematical Biology

Part II: Branching Processes

Linda J. S. Allen
Texas Tech University
Lubbock, Texas U.S.A.

National Center for Theoretical Sciences
National Tsing Hua University
August 2008
Part II: Branching Processes

The subject of branching processes began in 1845 with Bienaymé and was advanced in the 1870s with the work of the Reverend Henry William Watson, a clergyman and mathematician, and the biometrician Francis Galton.

Galton in 1873 posed a problem and two questions whose solutions were not resolved until 1930:

Suppose adult males (N in number) in a population each have different surnames. Suppose in each generation, a_0 percent of the adult males have no male children who survive to adulthood; a_1 have one such child; a_2 have two, and so on up to a_5, who have five.

1. Find what proportion of the surnames become extinct after r generations.
2. Find how many instances there are of the same surname being held by m persons.

Reference for Part II: [1], Chapter 4.
We will Discuss Single-Type and Multi-Type Branching Processes (BP)

A. Single-Type Galton-Watson BP: Each generation, keep track of only one type of individual, cell, etc.
Applications:
(1) Family Names
(2) Cell Division
(3) Network Theory

B. Multi-Type Galton-Watson BP: Each generation, keep track of k types of individuals, cells, etc.
Application:
(1) k Different Age Groups in a Population.
A. Single-Type Galton-Watson Branching Processes

The type of problem studied by Galton and Watson is appropriately named a Galton-Watson branching process. Discrete-time branching processes are DTMC. Branching processes are frequently studied separately from Markov chains because

(a) there is a wide variety of applications of branching processes: electron multipliers, neutron chain reactions, population growth, survival of mutant genes, changes in DNA and chromosomes, the cell cycle, cancer cells, chemotherapy, and network theory;
(b) techniques other than transition matrices are used to study their behavior: probability generating functions.
Assumptions in the Galton-Watson Branching Process

Let X_0 = total population size at the zeroth generation and let X_n = total population size at the nth generation. The process {X_n}_{n=0}^∞ has state space {0, 1, 2, ...} and will be referred to as a branching process (bp).

Each individual in generation n gives birth to Y offspring in the next generation of the same type (single-type bp), where Y is a random variable that takes values in {0, 1, 2, ...}. The offspring distribution is

Prob{Y = k} = p_k, k = 0, 1, 2, ....

Each individual gives birth independently of other individuals.
An Illustration of a BP: A Stochastic Realization or Sample Path

Let X_0 = 1 (a married couple), where the family history is followed over time.

Figure 1: A sample path or stochastic realization of a branching process {X_n}_{n=0}^∞. In the first generation, four individuals are born, X_1 = 4. The four individuals give birth to three, zero, four, and one individuals, respectively, making a total of eight individuals in generation 2, X_2 = 8.
We Digress Here to Talk about Generating Functions

Assume X is a discrete random variable with state space {0, 1, 2, ...}. Let the probability mass function of X equal

Prob{X = j} = p_j, j = 0, 1, 2, ..., where Σ_{j=0}^∞ p_j = 1.

Mean or First Moment: μ_X = E(X) = Σ_{j=0}^∞ j p_j

Variance or Second Moment about the Mean:

σ_X² = E[(X - μ_X)²] = E(X²) - μ_X² = Σ_{j=0}^∞ j² p_j - μ_X².

nth Moment: E(X^n) = Σ_{j=0}^∞ j^n p_j
Definition of the Probability Generating Function

Definition 1. The probability generating function (pgf) of X is defined on a subset of the reals:

P_X(t) = E(t^X) = Σ_{j=0}^∞ p_j t^j, some t ∈ R.

Because Σ_{j=0}^∞ p_j = 1, the sum converges absolutely for |t| ≤ 1, implying P_X(t) is well defined for |t| ≤ 1 and infinitely differentiable for |t| < 1. As the name implies, the pgf generates the probabilities associated with the distribution:

P_X(0) = p_0, P'_X(0) = p_1, P''_X(0) = 2! p_2.

In general, the kth derivative of the pgf of X satisfies

P_X^(k)(0) = k! p_k.
The PGF can be Used to Calculate the Mean and Variance.

Note that P'_X(t) = Σ_{j=1}^∞ j p_j t^{j-1} for -1 < t < 1.

The mean is

P'_X(1) = Σ_{j=1}^∞ j p_j = E(X) = μ_X.

Also, P''_X(t) = Σ_{j=2}^∞ j(j - 1) p_j t^{j-2} implies

P''_X(1) = Σ_{j=2}^∞ j(j - 1) p_j = E(X²) - E(X).

The variance is

σ_X² = E(X²) - E(X) + E(X) - [E(X)]² = P''_X(1) + P'_X(1) - [P'_X(1)]².
Other Generating Functions

Definition 2. The moment generating function (mgf) is

M_X(t) = E(e^{tX}) = Σ_{j=0}^∞ p_j e^{jt}, some t ∈ R.

The cumulant generating function (cgf) is the natural logarithm of the moment generating function,

K_X(t) = ln[M_X(t)].

Note M_X(t) is always well defined for t ≤ 0. The mgf generates the moments:

M_X(0) = 1, M'_X(0) = μ_X = E(X), M''_X(0) = E(X²), M_X^(k)(0) = E(X^k).
An Example Applying the PGF and MGF

Poisson: p_j = λ^j e^{-λ}/j! for j = 0, 1, 2, ..., and p_j = 0 otherwise, λ > 0.

The pgf of the Poisson distribution is

P_X(t) = e^{λ(t-1)},

which can be easily verified:

P_X(t) = Σ_{j=0}^∞ t^j λ^j e^{-λ}/j! = e^{-λ} e^{λt} Σ_{j=0}^∞ (λt)^j e^{-λt}/j! = e^{λ(t-1)}.

The mgf satisfies M_X(t) = P_X(e^t), so that M_X(t) = e^{λ(e^t - 1)}. Hence, the mean and variance of the Poisson distribution are

μ_X = P'_X(1) = λ
σ_X² = P''_X(1) + P'_X(1) - [P'_X(1)]² = λ² + λ - λ² = λ.
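The identities μ_X = P'_X(1) and σ_X² = P''_X(1) + P'_X(1) - [P'_X(1)]² can be checked numerically for the Poisson example. A sketch, with λ = 2.5 as an assumed test value:

```python
from math import exp, factorial

# Numeric check of the pgf identities for the Poisson example:
# mean = P'(1) = lambda, variance = P''(1) + P'(1) - [P'(1)]^2 = lambda.
lam = 2.5
p = [lam ** j * exp(-lam) / factorial(j) for j in range(60)]  # tail is negligible

mean = sum(j * pj for j, pj in enumerate(p))              # P'(1)
second = sum(j * (j - 1) * pj for j, pj in enumerate(p))  # P''(1)
var = second + mean - mean ** 2
print(round(mean, 6), round(var, 6))   # → 2.5 2.5
```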
We Return to the Galton-Watson Branching Process

We will calculate the pgf of X_n, the total population size in generation n. Let X_0 = 1. Denote the pgf of X_n as h_n and the pgf of X_0 as h_0:

h_0(t) = t.

Recall Y is the offspring random variable; subscripts on Y index the individual offspring variables. In the next generation, each individual gives birth to k individuals with probability p_k. The pgf of X_1 is

h_1(t) = Σ_{k=0}^∞ p_k t^k = f(t).

Then X_2 = Y_1 + ··· + Y_{X_1}, because each of the X_1 individuals gives birth to Y individuals and the sum of all these births is X_2. Note for a fixed sum of m iid random variables Y_i, Σ_{i=1}^m Y_i, the pgf is

P_{ΣY_i}(t) = E(t^{Y_1} ··· t^{Y_m}) = E(t^{Y_1}) ··· E(t^{Y_m}) = [f(t)]^m.
The PGF of X_2 is a Composition.

But X_2 is a sum of a random number X_1 of iid Y_i. The pgf of X_2 is

h_2(t) = E( t^{Σ_{i=1}^{X_1} Y_i} )
       = Σ_{j=0}^∞ t^j Prob{ Σ_{i=1}^{X_1} Y_i = j }
       = Σ_{j=0}^∞ t^j Σ_{m=0}^∞ Prob{ Σ_{i=1}^{X_1} Y_i = j | X_1 = m } Prob{X_1 = m}
       = Σ_{j=0}^∞ t^j Σ_{m=0}^∞ p_m Prob{ Σ_{i=1}^{X_1} Y_i = j | X_1 = m }
       = Σ_{m=0}^∞ p_m Σ_{j=0}^∞ Prob{ Σ_{i=1}^{m} Y_i = j } t^j
       = Σ_{m=0}^∞ p_m [f(t)]^m = f(f(t)),

so h_2(t) = f(f(t)). In general, the pgf of X_n is the n-fold composition:

h_n(t) = f(f_{n-1}(t)) = f(f(···(f(t))···)) = f_n(t).
PGF of X_n if X_0 = N.

The derivation of the generating function h_n is based on the fact that X_0 = 1. If X_0 = N, then h_0(t) = t^N and the pgf for the offspring is f(t). The process begins with N independent branches. Then the pgf of X_n is

h_n(t) = [f_n(t)]^N, when X_0 = N.

Figure 2: A sample path or stochastic realization of a branching process {X_n}_{n=0}^∞, where X_0 = 3, X_1 = 5, and X_2 = 9.
An Example of a Single-Type Branching Process.

Example 1. Suppose a bp with X_0 = 1 has an offspring distribution with at most one birth: p_0 = 1/4, p_1 = 3/4, and p_k = 0 for k = 2, 3, .... The pgf for X_1 is

f(t) = 1/4 + (3/4)t.

The pgf for X_2 is f_2(t) = f(f(t)) = (1/4)(1 + 3/4) + (3/4)² t. The pgf for X_n is

f_n(t) = (1/4)(1 + 3/4 + ··· + (3/4)^{n-1}) + (3/4)^n t = 1 - (3/4)^n + (3/4)^n t.

If p_k(n) is the probability that the population size is k in generation n, then

p_0(n) = 1 - (3/4)^n and p_1(n) = (3/4)^n.

If X_0 = N, then the pgf of X_n is just a binomial expansion of f_n(t):

[f_n(t)]^N = C(N,0)(p_0(n))^N + C(N,1)(p_0(n))^{N-1} p_1(n) t + ··· + C(N,N)(p_1(n))^N t^N.
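The closed form for f_n in Example 1 can be confirmed by composing f numerically. A quick sketch, where n = 10 and t = 0.3 are assumed test values:

```python
# Check Example 1's closed form f_n(t) = 1 - (3/4)^n + (3/4)^n * t
# by composing f(t) = 1/4 + (3/4)t directly.
def f(t):
    return 0.25 + 0.75 * t

n, t = 10, 0.3
val = t
for _ in range(n):
    val = f(val)

closed = 1 - 0.75 ** n + 0.75 ** n * t
print(abs(val - closed) < 1e-12)   # → True
```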
Galton's Questions Can be Addressed for this Example.

(1) After r generations, the probability that all surnames have gone extinct is [p_0(r)]^N, the probability that N - 1 surnames have gone extinct is N [p_0(r)]^{N-1} p_1(r), and so on. The probability that no surnames have gone extinct is [p_1(r)]^N. In Example 1, p_0(r) = 1 - (3/4)^r, so that

lim_{r→∞} [p_0(r)]^N = 1.

(2) We do not address Galton's question (2) in general, but note that in a single branching process, X_0 = 1, the probability there are exactly m surnames the same in generation r is p_m(r). In Example 1, the probability that there are two or more equal surnames in any generation is zero.
We Study Population Extinction.

Our goal is to state one of the main theorems for branching processes to determine the asymptotic probability of population extinction:

lim_{n→∞} Prob{X_n = 0} = lim_{n→∞} p_0(n).

The pgf for the offspring distribution is

f(t) = Σ_{k=0}^∞ p_k t^k.  (1)

Denote the mean number of births as m,

m = f'(1) = lim_{t→1⁻} f'(t) = Σ_{k=1}^∞ k p_k.
We Make 5 Reasonable Assumptions Regarding the PGF.

Assume the pgf for the offspring distribution has the following five properties:

(1) f(0) = p_0 > 0, 0 < p_0 + p_1 < 1, and f(1) = 1.
(2) f(t) is continuous for t ∈ [0, 1].
(3) f(t) is infinitely differentiable for t ∈ [0, 1).
(4) f'(t) = Σ_{k=1}^∞ k p_k t^{k-1} > 0 for t ∈ (0, 1], where f'(1) = lim_{t→1⁻} f'(t).
(5) f''(t) = Σ_{k=2}^∞ k(k - 1) p_k t^{k-2} > 0 for t ∈ (0, 1).
Main Theorem for Branching Processes.

Theorem 1. Assume the offspring distribution {p_k} and the pgf f(t) satisfy properties (1)-(5). In addition, assume X_0 = 1. If m ≤ 1, then

lim_{n→∞} Prob{X_n = 0} = lim_{n→∞} p_0(n) = 1,

and if m > 1, then there exists q < 1 such that f(q) = q and

lim_{n→∞} Prob{X_n = 0} = lim_{n→∞} p_0(n) = q.

If m ≤ 1, then Theorem 1 states that the probability of ultimate extinction is one. If m > 1, then there is a positive probability 1 - q that the bp does not become extinct (e.g., a family name does not become extinct, a mutant gene becomes established, a population does not die out).
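The extinction probability q can be computed by iterating p_0(n+1) = f(p_0(n)) from p_0(1) = f(0). A sketch for an assumed supercritical offspring distribution p_0 = 1/4, p_2 = 3/4 (so f(t) = 1/4 + (3/4)t², m = 3/2 > 1, and f(q) = q gives q = 1/3):

```python
# Fixed-point iteration p0(n+1) = f(p0(n)) illustrating Theorem 1 for the
# assumed distribution p0 = 1/4, p2 = 3/4: f(t) = 1/4 + (3/4)t^2.
def f(t):
    return 0.25 + 0.75 * t * t

q = 0.0
for _ in range(200):
    q = f(q)

print(round(q, 6))   # → 0.333333, the extinction probability q = 1/3
```

The iterates p_0(n) increase monotonically to the smaller fixed point of f, exactly as in the indication of proof that follows.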
Indication of Proof

(1) The sequence {p_0(n)}_n is monotone increasing and bounded above:

lim_{n→∞} p_0(n) = q.

(2) The limit is a fixed point of f:

q = lim_{n→∞} p_0(n) = lim_{n→∞} f(p_0(n - 1)) = f(q).

(3) m ≤ 1 iff f'(t) < 1 for t ∈ [0, 1).

Figure 3: Two different probability generating functions y = f(t) intersect y = t in either one or two points on [0, 1].
All States are Transient in the Galton-Watson BP, Except State Zero

Note that the zero state is absorbing. In terms of MC theory, the one-step transition probability

p_00(n) = 1.

Theorem 2. Assume the offspring distribution {p_k} and the pgf f(t) satisfy properties (1)-(5). In addition, assume X_0 = 1. Then the states 1, 2, ... are transient. In addition, if the mean m > 1, then

lim_{n→∞} Prob{X_n = 0} = q = 1 - Prob{ lim_{n→∞} X_n = ∞ },

where 0 < q < 1 is the unique fixed point of the pgf, f(q) = q.
A Corollary to the BP Theorem when X_0 = N.

Corollary 1. Assume the offspring distribution {p_k} and the pgf f(t) satisfy properties (1)-(5). In addition, assume X_0 = N. If m ≤ 1, then

lim_{n→∞} Prob{X_n = 0} = lim_{n→∞} [p_0(n)]^N = 1.

If m > 1, then

lim_{n→∞} Prob{X_n = 0} = lim_{n→∞} [p_0(n)]^N = q^N < 1.
Applications of Discrete-Time Branching Processes.

Single-Type Process: {X_n}
(1) Family Names
(2) Cell Cycle
(3) Network Theory
(1) An Example of a BP due to Lotka

Example 2. Lotka assumed the number of sons a male has in his lifetime has the following geometric probability distribution:

p_0 = 1/2 and p_k = (3/5)^{k-1} (1/5) for k = 1, 2, ....

Note that Σ_{k=1}^∞ p_k = 1/2 and

f(t) = 1/2 + (1/5) Σ_{k=1}^∞ (3/5)^{k-1} t^k = 1/2 + (1/5) · t/(1 - 3t/5),

m = f'(1) = (1/5)/(1 - 3/5)² = 5/4 > 1.

The fixed points of f(t) are found by solving

1/2 + t/(5 - 3t) = t, or 6t² - 11t + 5 = 0,

so that the fixed point is q = 5/6. A male has a probability of 5/6 that his line of descent becomes extinct and a probability of 1/6 that his descendants will continue forever.
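Lotka's value q = 5/6 can also be recovered by direct simulation of the branching process. A Monte Carlo sketch (not from the lectures); treating a line that reaches 500 males as surviving forever is an assumption, with error of order (5/6)^500:

```python
import random

# Monte Carlo sketch of Lotka's example: a male has no sons w.p. 1/2 and
# k sons w.p. (3/5)^(k-1) * (1/5); Theorem 1 gives extinction prob 5/6.
def sons(rng):
    if rng.random() < 0.5:
        return 0
    k = 1
    while rng.random() < 0.6:   # continue the geometric tail
        k += 1
    return k

rng = random.Random(7)
trials, extinct = 4000, 0
for _ in range(trials):
    x = 1
    while 0 < x < 500:          # >= 500 treated as "survives forever"
        x = sum(sons(rng) for _ in range(x))
    extinct += (x == 0)

print(extinct / trials)          # ≈ q = 5/6 ≈ 0.833
```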
(2) An Application to the Cell Cycle

Each cell, after completing its life cycle, doubles in size, then divides into two progeny cells of equal size (Kimmel and Axelrod, 2002). After cell division, some cells die, some remain inactive or quiesce, and some keep dividing or proliferating. After cell division:

(1) Cell proliferation, probability p_2
(2) Cell death, probability p_0
(3) Cell quiescence, probability p_1, where p_0 + p_1 + p_2 = 1.

[Diagram: a proliferating cell divides into two daughter cells; each daughter proliferates, dies, or becomes quiescent.]
The Cell Cycle is a Galton-Watson Process

Let X_n be the number of proliferating cells at time n. The pgf is

f(t) = (p_0 + p_1)² + 2 p_2 (p_0 + p_1) t + p_2² t² = (p_2 t + p_0 + p_1)².

The mean number of proliferating cells is

m = f'(1) = 2 p_2.

Reference: Kimmel and Axelrod (2002)
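Since f(t) = (p_2 t + p_0 + p_1)², the extinction probability of the proliferating population follows from Theorem 1. A quick sketch, with p_2 = 0.6 an assumed illustrative value (so m = 2p_2 = 1.2 > 1 and f(q) = q has the root q = 4/9):

```python
# Extinction probability for the cell-cycle bp, f(t) = (p2*t + p0 + p1)^2,
# with the assumed illustrative value p2 = 0.6.
p2 = 0.6

def f(t):
    return (p2 * t + 1 - p2) ** 2

q = 0.0
for _ in range(400):
    q = f(q)

print(round(q, 6))   # ≈ 4/9 ≈ 0.444444
```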
(3) An Application of BP to Network Theory

In disease networks, individuals are referred to as vertices or nodes in a network, and their connectedness (by an edge) to other individuals in the network is described by a degree distribution.

Let p_k = probability that a node or vertex in the network is connected by an edge to k other vertices. Then {p_k}_{k=0}^∞ is known as the degree distribution for the network. In disease transmission, it is important to determine the distribution of degrees for nodes reached from a random node. For a randomly chosen node, the probability of reaching a node of degree k is proportional to k p_k (because there are k ways to reach this node). But for calculating the spread of disease, we do not count the edge on which the disease entered; hence the probability associated with spread has degree k - 1, called the excess degree, q_{k-1} ∝ k p_k. Thus,

q_{k-1} = k p_k / Σ_{k=1}^∞ k p_k, k = 1, 2, ....

References: Newman (2002), Brauer (2008)
PGFs for the Degree Distribution and Excess Degree Distribution

f_0(t) = Σ_{k=0}^∞ p_k t^k and f_1(t) = Σ_{k=0}^∞ q_k t^k = (Σ_{k=1}^∞ k p_k t^{k-1}) / m_0,

where m_0 = Σ_{k=0}^∞ k p_k is the mean of the degree distribution. Not every connection in the network leads to disease transmission. Thus, we define the mean transmissibility of the disease as T, 0 < T < 1, the probability that the disease is transmitted along an edge. The binomial distribution can be applied to a node of degree k to determine the probability of m ≤ k transmissions:

C(k, m) T^m (1 - T)^{k-m}.
The PGF as a Function of Transmissibility

The probability r_m that there are m transmissions is

r_m = Σ_{k=m}^∞ p_k C(k, m) T^m (1 - T)^{k-m}.

Thus, the pgf associated with {r_m}_{m=0}^∞ is

f_0(t, T) = Σ_{m=0}^∞ r_m t^m
          = Σ_{m=0}^∞ [ Σ_{k=m}^∞ p_k C(k, m) T^m (1 - T)^{k-m} ] t^m
          = Σ_{k=0}^∞ p_k Σ_{m=0}^k C(k, m) (tT)^m (1 - T)^{k-m}
          = Σ_{k=0}^∞ p_k (1 - T + tT)^k = f_0(1 - T + tT).

Also, f_1(t, T) = f_1(1 - T + tT).
The Basic Reproduction Number is Defined for a Network

The mean of the excess degree distribution is defined as the basic reproduction number of the disease, R_0:

R_0 = d f_1(1 - T + tT)/dt |_{t=1} = T f_1'(1).

It follows from the identities

f_0(t) = Σ_{k=0}^∞ p_k t^k and f_1(t) = Σ_{k=0}^∞ q_k t^k = (Σ_{k=1}^∞ k p_k t^{k-1}) / m_0

that

f_1(t) = f_0'(t)/m_0.  (2)

Thus,

R_0 = T f_1'(1) = T f_0''(1)/m_0.  (3)
An Example where the Network has a Poisson Degree Distribution

Example 3. Assume that the degree distribution has a Poisson distribution with parameter λ. This type of network is known as a Poisson random graph. The generating function for a Poisson distribution is f_0(t) = e^{λ(t-1)}, where m_0 = λ. Applying the identities (2) and (3): f_1(t) = f_0(t) and

R_0 = T m_0 = Tλ.

Applying Theorem 1, the probability the disease dies out (q) is the fixed point of f_1(t, T) = f_0(t, T) = e^{λ([1-T+tT]-1)} = e^{R_0(t-1)}. That is,

q = e^{R_0(q-1)}.

For example, if R_0 = 2 and initially one infectious individual is introduced into the population, then the probability the disease dies out is q ≈ 0.203. The probability the disease becomes endemic is 1 - q ≈ 0.797.
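The value q ≈ 0.203 is easily reproduced by fixed-point iteration, as in Theorem 1:

```python
from math import exp

# Fixed-point iteration q = exp(R0*(q - 1)) for the Poisson random graph
# of Example 3, with R0 = 2.
R0, q = 2.0, 0.0
for _ in range(500):
    q = exp(R0 * (q - 1))

print(round(q, 3))   # → 0.203
```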
An Example where the Graph is Complete.

Example 4. Suppose the disease network is a complete graph with N ≥ 2 nodes, i.e., each node has exactly N - 1 edges or connections, for a total of N(N - 1)/2 edges. The degree distribution is p_{N-1} = 1 and p_k = 0 for k ≠ N - 1. The generating functions are f_0(t) = t^{N-1} and f_1(t) = t^{N-2}. Thus, the basic reproduction number is

R_0 = T(N - 2).
An Example Comparing 2 Connections to N - 1 Connections.

Example 5. Suppose in the disease network everyone is connected to only 2 individuals:

p_2 = 1, f_0(t) = t², f_1(t) = t, R_0 = T.

Suppose there is one individual with N - 1 connections (a "small world" network):

p_2 = (N - 1)/N, p_{N-1} = 1/N,
f_0(t) = ((N - 1)/N) t² + (1/N) t^{N-1}, f_1(t) = (2/3) t + (1/3) t^{N-2},

and R_0 = T(2/3 + (N - 2)/3).

Comparing the values of R_0:

2 connections < small world < complete graph
T < T(2/3 + (N - 2)/3) < T(N - 2).
B. Multitype Galton-Watson Branching Processes

In a multitype bp, each individual may give birth to different types or classifications of individuals in the population: k types. When k = 1, the bp is a single-type bp. There is an offspring distribution corresponding to each of these different types of individuals. For example, the population may be divided according to age or size, and in each generation individuals may age or grow to another age or size class. In addition, in each generation, individuals give birth to new individuals in the youngest age or smallest size class.

Denote the multitype bp as {X(n)}_{n=0}^∞, a vector of random variables,

X(n) = (X_1(n), X_2(n), ..., X_k(n))^T,

with k different types of individuals. Each random variable X_i(n) has k associated random variables, {Y_ji}_{j=1}^k, where Y_ji is the random variable for the offspring distribution for an individual of type i to give birth to an individual of type j = 1, 2, ..., k.
We Extend the PGF to Multitype BP.

Let p_i(s_1, s_2, ..., s_k) denote the probability that an individual of type i gives birth to s_1 individuals of type 1, s_2 individuals of type 2, ..., and s_k individuals of type k; that is,

p_i(s_1, s_2, ..., s_k) = Prob{Y_1i = s_1, Y_2i = s_2, ..., Y_ki = s_k}.

Define the pgf for X_i, f_i: [0,1]^k → [0,1], as follows:

f_i(t_1, t_2, ..., t_k) = Σ_{s_k=0}^∞ ··· Σ_{s_2=0}^∞ Σ_{s_1=0}^∞ p_i(s_1, s_2, ..., s_k) t_1^{s_1} t_2^{s_2} ··· t_k^{s_k},

for i = 1, 2, ..., k.
The PGF for a Multitype BP when X(0) = e_i.

Let e_i denote a k-vector with the ith component one and the remaining components zero,

e_i = (δ_1i, δ_2i, ..., δ_ki)^T,

where δ_ij is the Kronecker delta symbol. Then X(0) = e_i means there is initially one individual of type i in the population. The pgf for X(0) = e_i is f_i^0(t_1, t_2, ..., t_k) = t_i, and the pgf for X(n) given X(0) = e_i is f_i^n(t_1, t_2, ..., t_k):

Σ_{s_k=0}^∞ ··· Σ_{s_1=0}^∞ Prob{X_1(n) = s_1, ..., X_k(n) = s_k | X(0) = e_i} t_1^{s_1} ··· t_k^{s_k},

with f_i^1 = f_i. Let F ≡ F(t_1, ..., t_k) = (f_1, ..., f_k) denote the vector of pgfs, F: [0,1]^k → [0,1]^k. The function F has a fixed point at (1, 1, ..., 1) since f_i(1, 1, ..., 1) = 1. Ultimate extinction of the population depends on whether F has another fixed point in [0,1]^k, which depends on the mean.
We Compute the Mean for a Multitype BP in Terms of the PGF.

Let m_ji denote the expected number of births of a type j individual by a type i individual; that is,

m_ji = E(X_j(1) | X(0) = e_i) for i, j = 1, 2, ..., k.

The means can be defined in terms of the pgf:

m_ji = ∂f_i(t_1, ..., t_k)/∂t_j |_{t_1=1, ..., t_k=1}.

Define the k × k expectation matrix,

      ( m_11  m_12  ···  m_1k )
M  =  ( m_21  m_22  ···  m_2k )
      ( ...                   )
      ( m_k1  m_k2  ···  m_kk ).

If matrix M is regular (i.e., M^p > 0 for some p > 0), then M has a simple eigenvalue of maximum modulus, which we denote as λ.
Multitype Branching Process Theorem

Theorem 3. Assume each of the component functions f_i of the pgf F(t_1, ..., t_k) = (f_1(t_1, ..., t_k), ..., f_k(t_1, ..., t_k)) is a nonlinear function of the variables t_1, ..., t_k and the expectation matrix M is regular. If the dominant eigenvalue of M satisfies λ ≤ 1, then

lim_{n→∞} Prob{X(n) = 0 | X(0) = e_i} = 1,

i = 1, 2, ..., k. If the dominant eigenvalue of M satisfies λ > 1, then there exists a vector q = (q_1, q_2, ..., q_k)^T, q_i ∈ [0, 1), i = 1, 2, ..., k, the unique nonnegative solution to F(t_1, t_2, ..., t_k) = (t_1, t_2, ..., t_k), such that

lim_{n→∞} Prob{X(n) = 0 | X(0) = e_i} = q_i,

i = 1, 2, ..., k.
Corollary to the Multitype Branching Process Theorem when X(0) = (r_1, ..., r_k)^T.

Corollary 2. Suppose the hypotheses of Theorem 3 hold and X(0) = (r_1, ..., r_k)^T. Then if the dominant eigenvalue of matrix M satisfies λ > 1,

lim_{n→∞} Prob{X(n) = 0 | X(0) = (r_1, r_2, ..., r_k)^T} = q_1^{r_1} q_2^{r_2} ··· q_k^{r_k}.
(1) Application of Multitype BP to Age-Structured Populations.

Suppose there are k age classes. An individual of type i either survives to become a type i + 1 individual with probability p_{i+1,i} > 0 or dies with probability 1 - p_{i+1,i}, i = 1, 2, ..., k - 1. Probability p_{k+1,k} = 0. An individual of type i gives birth to r individuals of type 1 with probability b_{i,r}. The offspring distribution for an individual of type i satisfies

b_{i,r} ≥ 0 and Σ_{r=0}^∞ b_{i,r} = 1, i = 1, 2, ..., k.

The mean of the offspring distribution is

b_i = Σ_{r=1}^∞ r b_{i,r}.
The Expectation Matrix has the Form of a Leslie Matrix Model.

The expectation matrix can be calculated from the pgfs f_i:

f_i(t_1, t_2, ..., t_k) = [p_{i+1,i} t_{i+1} + (1 - p_{i+1,i})] Σ_{r=0}^∞ b_{i,r} t_1^r, i = 1, ..., k.

e.g., f_1 = [p_21 t_2 + (1 - p_21)] Σ_r b_{1,r} t_1^r,

m_11 = ∂f_1/∂t_1 |_{t_i=1} = Σ_r r b_{1,r} = b_1,  m_21 = ∂f_1/∂t_2 |_{t_i=1} = p_21.

      ( b_1   b_2   ···  b_{k-1}    b_k )
      ( p_21  0     ···  0          0   )
M  =  ( 0     p_32  ···  0          0   )
      ( ...                             )
      ( 0     0     ···  p_{k,k-1}  0   ).

The form of matrix M is known as a Leslie matrix. Assume matrix M is regular and that the pgfs are nonlinear. Then Theorem 3 can be applied.
An Example of a Stochastic Age-Structured Branching Process

Example 6. Suppose there are two age classes with expectation matrix

  M = ( b_1     b_2 ) = ( 3/4  1 )
      ( p_{21}  0   )   ( 1/2  0 ).

The characteristic equation of M is λ^2 - (3/4)λ - 1/2 = 0, so that the dominant eigenvalue is λ = (3 + √41)/8 ≈ 1.175 > 1. Suppose the birth probabilities are

  b_{1,r} = { 1/2, r = 0             b_{2,r} = { 1/4, r = 0, 2
            { 1/4, r = 1, 2                    { 1/2, r = 1
            { 0,   r ≠ 0, 1, 2,                { 0,   r ≠ 0, 1, 2.

The mean number of births for each age class is

  b_1 = 3/4 = Σ_{r=1} r b_{1,r}  and  b_2 = 1 = Σ_{r=1} r b_{2,r}

(the values in the first row of M).
To Find the Probability of Extinction, we Find the Fixed Points of the PGF

The pgfs for the two age classes are

f_1(t_1, t_2) = [(1/2)t_2 + 1/2][1/2 + (1/4)t_1 + (1/4)t_1^2]
f_2(t_1, t_2) = 1/4 + (1/2)t_1 + (1/4)t_1^2.

Since λ > 1, the preceding system F = (f_1, f_2) has a unique fixed point on [0, 1) × [0, 1). The fixed point (q_1, q_2) is found by solving f_1(q_1, q_2) = q_1 and f_2(q_1, q_2) = q_2. The solution is

(q_1, q_2) ≈ (0.6285, 0.6631).

Thus, if there are initially five individuals of age 1 and three individuals of age 2, then the probability of ultimate extinction of the total population is approximately

(0.6285)^5 (0.6631)^3 ≈ 0.0286.
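The fixed point in Example 6 is easy to check numerically. A minimal sketch (the function names are mine): iterating the system F = (f_1, f_2) from (0, 0) increases monotonically to the minimal fixed point on [0, 1)^2, which gives the extinction probabilities.

```python
def f1(t1, t2):
    # pgf for an age-1 individual: survival factor times birth pgf
    return (0.5 * t2 + 0.5) * (0.5 + 0.25 * t1 + 0.25 * t1 ** 2)

def f2(t1, t2):
    # pgf for an age-2 individual (no survival factor, since p_32 = 0)
    return 0.25 + 0.5 * t1 + 0.25 * t1 ** 2

q1, q2 = 0.0, 0.0
for _ in range(200):                 # monotone iteration to the minimal fixed point
    q1, q2 = f1(q1, q2), f2(q1, q2)

print(q1, q2)                        # approx (0.6285, 0.6631)
print(q1 ** 5 * q2 ** 3)             # extinction probability, approx 0.0286
```

Starting the iteration from the origin is what guarantees convergence to the minimal fixed point rather than the trivial fixed point (1, 1).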
There are Many Applications of Branching Processes in Biology

Several good references devoted to Branching Processes, in addition to [1], Chapter 4:

1. Harris, TE. 1963. The Theory of Branching Processes. Prentice Hall, NJ.
2. Jagers, P. 1975. Branching Processes with Biological Applications. Wiley, Chichester.
3. Kimmel, M and D Axelrod. 2002. Branching Processes in Biology. Springer-Verlag, NY.
4. Mode, CJ. 1971. Multitype Branching Processes: Theory and Applications. Elsevier, NY.
This Concludes Part II on Branching Processes.
Part I: Discrete-Time Markov Chains - DTMC
Theory
Applications to Random Walks, Populations, and Epidemics
Part II: Branching Processes
Theory
Applications to Cellular Processes, Network Theory, and Populations
Part III: Continuous-Time Markov Chains - CTMC
Theory
Applications to Populations and Epidemics
Part IV: Stochastic Differential Equations - SDE
Comparisons to Other Stochastic Processes, DTMC and CTMC
Applications to Populations and Epidemics
An Intensive Course in Stochastic Processes and
Stochastic Differential Equations in
Mathematical Biology
Part III
Continuous-Time Markov Chains
Linda J. S. Allen
Texas Tech University
Lubbock, Texas U.S.A.
National Center for Theoretical Sciences
National Tsing Hua University
August 2008
Acknowledgement
I thank Professor Sze Bi Hsu and Professor Jing Yu for the invitation
to present lectures at the National Center for Theoretical Sciences at
the National Tsing Hua University.
COURSE OUTLINE
Part I: Discrete-Time Markov Chains - DTMC
Theory
Applications to Random Walks, Populations, and Epidemics
Part II: Branching Processes
Theory
Applications to Cellular Processes, Network Theory, and
Populations
Part III: Continuous-Time Markov Chains - CTMC
Theory
Applications to Populations and Epidemics
Part IV: Stochastic Differential Equations - SDE
Comparisons to Other Stochastic Processes, DTMC and CTMC
Applications to Populations and Epidemics
Basic Reference for Part III of this Course
[1 ] Allen, LJS. 2003. An Introduction to Stochastic Processes with
Applications to Biology. Prentice Hall, Upper Saddle River, NJ.
Chapters 5, 6, 7
[2 ] Other references will be noted.
Part III: Continuous-Time Markov Chains - CTMC

Some Basic Definitions and Notation

Definition 1. Let {X(t)}, t ∈ [0, ∞), be a collection of discrete random variables with values in {0, 1, 2, . . .}. Then the stochastic process {X(t)} is called a continuous time Markov chain if it satisfies the following condition:

For any sequence of real numbers 0 ≤ t_0 < t_1 < · · · < t_n < t_{n+1},

Prob{X(t_{n+1}) = i_{n+1} | X(t_0) = i_0, X(t_1) = i_1, . . . , X(t_n) = i_n}
    = Prob{X(t_{n+1}) = i_{n+1} | X(t_n) = i_n}.

The probability distribution {p_i(t)}_{i=0}^∞ associated with X(t) is

p_i(t) = Prob{X(t) = i},

with probability vector p(t) = (p_0(t), p_1(t), . . .)^T.
The Transition Matrix for the CTMC has Properties Similar to DTMC

Transition probabilities:

p_{ji}(t, s) = Prob{X(t) = j | X(s) = i},   s < t,

for i, j = 0, 1, 2, . . . . If the transition probabilities depend only on the length of the time step t - s, they are called stationary or homogeneous transition probabilities; otherwise they are called nonstationary or nonhomogeneous. We shall assume the transition probabilities are stationary, p_{ji}(t - s), t > s.

Generally, the transition matrix P(t) = (p_{ji}(t)), t > 0, is a stochastic matrix,

Σ_{j=0}^∞ p_{ji}(t) = 1,

unless the process is explosive (blow-up in finite time). If the process is nonexplosive, then P(t) is stochastic for all time and satisfies

P(s)P(t) = P(s + t)

for all s, t ∈ [0, ∞).
Waiting Times Between Jumps

The distinction between discrete and continuous time Markov chains is that in a DTMC there is a jump to a new state at times 1, 2, . . . , but in a CTMC the jump to a new state may occur at any time t ≥ 0. The collection of random variables {W_i} denotes the jump times or waiting times, and the times T_i = W_{i+1} - W_i are referred to as the interevent times.

Figure 1: One sample path of a CTMC, illustrating waiting times W_1, W_2, W_3, W_4 and interevent times T_0, T_1, T_2, T_3. The process is continuous from the right.
An Example of an Explosive Process

If the waiting times approach a positive constant, W = sup_i{W_i} < ∞, while the values of the states approach infinity,

lim_{i→∞} X(W_i) = ∞,

then the process is explosive. We will assume the process is nonexplosive, unless noted otherwise. Sample paths are continuous from the right, but for ease in sketching sample paths, they are often drawn as connected rectilinear curves.

Figure 2: One sample path of a CTMC that is explosive.
The Poisson Process

The Poisson process {X(t)}, t ∈ [0, ∞), is a CTMC with the following properties:

(1) X(0) = 0.
(2) p_{i+1,i}(Δt) = Prob{X(t + Δt) = i + 1 | X(t) = i} = λΔt + o(Δt)
    p_{ii}(Δt) = Prob{X(t + Δt) = i | X(t) = i} = 1 - λΔt + o(Δt)
    p_{ji}(Δt) = Prob{X(t + Δt) = j | X(t) = i} = o(Δt),   j ≥ i + 2
    p_{ji}(Δt) = 0,   j < i.

These are known as infinitesimal transition probabilities,

lim_{Δt→0} [p_{i+1,i}(Δt) - λΔt]/Δt = 0.

The transition probabilities are independent of i and j and depend only on the length of time Δt.
The Transition Matrix for the Poisson Process

        ( p_{00}(Δt)  p_{01}(Δt)  p_{02}(Δt)  · · · )
        ( p_{10}(Δt)  p_{11}(Δt)  p_{12}(Δt)  · · · )
P(Δt) = ( p_{20}(Δt)  p_{21}(Δt)  p_{22}(Δt)  · · · )
        ( p_{30}(Δt)  p_{31}(Δt)  p_{32}(Δt)  · · · )
        (     :           :           :             )

        ( 1 - λΔt   0         0         · · · )
        ( λΔt       1 - λΔt   0         · · · )
      = ( 0         λΔt       1 - λΔt   · · · )  + o(Δt).
        ( 0         0         λΔt       · · · )
        (   :         :         :             )

Note the column sums of the matrix are one.
Assumptions (1) and (2) are Used to Derive a System of Differential Equations for the Poisson Process

Because X(0) = 0, it follows that p_{i0}(t) = p_i(t). Then

p_0(t + Δt) = p_0(t)[1 - λΔt + o(Δt)].

Subtracting p_0(t), dividing by Δt, and letting Δt → 0,

dp_0(t)/dt = -λ p_0(t),   p_0(0) = 1.

The solution is

p_0(t) = e^{-λt}.
The Poisson Probabilities are Derived

Similarly,

p_i(t + Δt) = p_i(t)[1 - λΔt + o(Δt)] + p_{i-1}(t)[λΔt + o(Δt)] + o(Δt),

which leads to

dp_i(t)/dt = -λ p_i(t) + λ p_{i-1}(t),   p_i(0) = 0,   i ≥ 1,

a system of differential-difference equations. The system can be solved sequentially beginning with p_0(t) = e^{-λt} to show

p_1(t) = λt e^{-λt},   p_2(t) = (λt)^2 e^{-λt} / 2!,

and in general, a Poisson probability distribution with parameter λt:

p_i(t) = (λt)^i e^{-λt} / i!,   i = 0, 1, 2, . . . ,

with mean and variance

m(t) = λt = σ^2(t).
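The differential-difference system is easy to check numerically. A sketch (truncation level K and step size dt are my choices): integrate the truncated forward equations dp_0/dt = -λp_0, dp_i/dt = -λp_i + λp_{i-1} by Euler's method and compare with the Poisson probabilities (λt)^i e^{-λt}/i!.

```python
from math import exp, factorial

lam, t_end, dt, K = 1.0, 2.0, 1e-4, 30    # K-state truncation (my choice)
p = [1.0] + [0.0] * K                     # p_i(0) = delta_{i0}
for _ in range(int(t_end / dt)):
    # explicit Euler step of dp_i/dt = -lam*p_i + lam*p_{i-1}
    p = [p[i] + dt * (-lam * p[i] + (lam * p[i - 1] if i > 0 else 0.0))
         for i in range(K + 1)]

exact = [(lam * t_end) ** i * exp(-lam * t_end) / factorial(i) for i in range(K + 1)]
err = max(abs(a - b) for a, b in zip(p, exact))
mean = sum(i * pi for i, pi in enumerate(p))
print(err)    # small Euler error
print(mean)   # approx lam * t_end = 2
```

Because the Poisson process never jumps down, the truncated system for states 0, . . . , K agrees exactly with the full system, so the only discrepancy is the Euler discretization error.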
The Interevent Time is Exponentially Distributed

Let W_1 be the random variable for the time until the process reaches state 1, the holding time until the first jump. Then

Prob{W_1 > t} = p_0(t) = e^{-λt}   or   Prob{W_1 ≤ t} = 1 - e^{-λt};

W_1 is an exponential random variable with parameter λ. In general, it can be shown that the interevent time has an exponential distribution. We will show that this is true in general for Markov processes.

Figure 3: Sample path for a Poisson process with λ = 1.
Derivation of the Differential Equations by Applying the Transition Matrix Leads to a New Matrix Known as the Generator Matrix

Writing the probabilities in terms of the transition matrix P(Δt):

p(t + Δt) = P(Δt) p(t)

lim_{Δt→0} [p(t + Δt) - p(t)]/Δt = lim_{Δt→0} ([P(Δt) - I]/Δt) p(t),

where I is the identity matrix. Thus,

dp/dt = Q p(t),

where matrix Q is known as the infinitesimal generator matrix,

Q = lim_{Δt→0} [P(Δt) - I]/Δt.
The Generator Matrix has some Nice Properties

The generator matrix is

    ( q_{00}  q_{01}  q_{02}  · · · )
Q = ( q_{10}  q_{11}  q_{12}  · · · )
    ( q_{20}  q_{21}  q_{22}  · · · )
    (   :       :       :          )

    ( -Σ_{i=1}^∞ q_{i0}   q_{01}                  q_{02}                  · · · )
  = ( q_{10}              -Σ_{i=0,i≠1}^∞ q_{i1}   q_{12}                  · · · )
    ( q_{20}              q_{21}                  -Σ_{i=0,i≠2}^∞ q_{i2}   · · · )
    (    :                   :                       :                          )

(1) Each column sum is zero.
(2) The diagonal elements are negative and the off-diagonal elements are nonnegative.
The Generator Matrix for the Poisson Process

The generator matrix for the Poisson process is

    ( -λ   0    0   · · · )
Q = (  λ  -λ    0   · · · )
    (  0   λ   -λ   · · · )
    (  :   :    :         )

The probability distribution p(t) = (p_0(t), p_1(t), . . . , p_i(t), . . .)^T for the Poisson process is a solution of

dp(t)/dt = Q p(t),   p_i(0) = δ_{i0}.
Embedded Markov Chain

The waiting times are denoted W_i, i = 0, 1, 2, . . . , and the interevent times as T_i = W_{i+1} - W_i, i = 0, 1, 2, . . . .

Definition 2. Let Y_n denote the random variable for the state of a continuous time Markov chain {X(t)}, t ∈ [0, ∞), at the nth jump,

Y_n = X(W_n),   n = 0, 1, 2, . . . .

The set of random variables {Y_n}_{n=0}^∞ is known as the embedded Markov chain of the CTMC {X(t)}, t ≥ 0.

The embedded Markov chain is a DTMC. It is useful for classifying states (transient, recurrent, etc.) in the associated CTMC.

Example 1. Consider the Poisson process, where X(0) = X(W_0) = 0 and X(W_n) = n for n = 1, 2, . . . . The embedded Markov chain {Y_n} satisfies Y_n = n, n = 0, 1, 2, . . . . The transition from state n to n + 1 occurs with probability 1. The transition matrix for the embedded Markov chain {Y_n} is

    ( 0  0  0  · · · )
T = ( 1  0  0  · · · )
    ( 0  1  0  · · · )
    ( 0  0  1  · · · )
    ( :  :  :        )
The Transition Matrix of the Embedded MC

Definition 3. The transition matrix T of the embedded Markov chain {Y_n}_{n=0}^∞ associated with the generator matrix Q = (q_{ij}), where q_{ii} ≠ 0, i = 0, 1, 2, . . . , is T = (t_{ij}) or

    ( 0               -q_{01}/q_{11}   -q_{02}/q_{22}   · · · )
T = ( -q_{10}/q_{00}  0                -q_{12}/q_{22}   · · · )
    ( -q_{20}/q_{00}  -q_{21}/q_{11}   0                · · · )
    (      :               :                :                 )

If any q_{ii} = 0, the (i, i) element of T is one and the remaining elements in that column are zero.

Matrix T is a stochastic matrix; the column sums equal one. The transition probabilities are homogeneous (i.e., independent of n). In addition, T^n = (t^{(n)}_{ji}), where

t^{(n)}_{ji} = Prob{Y_n = j | Y_0 = i}.
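Definition 3 translates directly into code. A small sketch (the helper name embedded_chain is mine), using the column convention of these notes, where the (j, i) entry of Q is the rate from state i to state j:

```python
def embedded_chain(Q):
    # t_ji = -q_ji/q_ii for j != i; an absorbing state (q_ii = 0)
    # gets a one on the diagonal of its column.
    n = len(Q)
    T = [[0.0] * n for _ in range(n)]
    for i in range(n):              # column i holds the jumps out of state i
        if Q[i][i] == 0:
            T[i][i] = 1.0
        else:
            for j in range(n):
                if j != i:
                    T[j][i] = -Q[j][i] / Q[i][i]
    return T

# The generator of Example 2; its embedded chain is the cyclic permutation matrix.
Q = [[-1, 0, 0, 1],
     [1, -1, 0, 0],
     [0, 1, -1, 0],
     [0, 0, 1, -1]]
T = embedded_chain(Q)
for row in T:
    print(row)
```

Each column of the returned matrix sums to one, so T is column-stochastic, as Definition 3 requires.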
The Transition Matrix of the Embedded Markov Chain Computed from the Generator Matrix

Example 2. Suppose a finite CTMC has a generator matrix given by

    ( -1   0   0   1 )
Q = (  1  -1   0   0 )        (1)
    (  0   1  -1   0 )
    (  0   0   1  -1 )

The transition matrix of the corresponding embedded Markov chain satisfies

    ( 0  0  0  1 )
T = ( 1  0  0  0 )            (2)
    ( 0  1  0  0 )
    ( 0  0  1  0 )

From the embedded Markov chain, we can see that the states communicate in the following manner: 1 → 2 → 3 → 4 → 1; the embedded chain is periodic, but the CTMC is not periodic.
Classification Schemes for CTMC

The classification schemes for states in a CTMC are the same as for a DTMC. The transition probabilities P(t) = (p_{ji}(t)) and the transition matrix for the embedded Markov chain T = (t_{ji}) are used to define these classification schemes.

Definition 4. State j can be reached from state i, i → j, if p_{ji}(t) > 0 for some t ≥ 0. State i communicates with state j, i ↔ j, if i → j and j → i. The set of states that communicate is called a communication class. If every state can be reached from every other state, the Markov chain is irreducible; otherwise, it is said to be reducible. A set of states C is closed if it is impossible to reach any state outside of C from a state inside C, i.e., p_{ji}(t) = 0 for t ≥ 0 if i ∈ C and j ∉ C.

If p_{ji}(Δt) = δ_{ji} + q_{ji}Δt + o(Δt), then p_{ji}(Δt) > 0 iff q_{ji} > 0, for j ≠ i and Δt sufficiently small.

Therefore, i → j in the CTMC iff i → j in the embedded Markov chain.

The generator matrix Q of the CTMC is irreducible (reducible) iff the transition matrix T of the embedded Markov chain is irreducible (reducible).
Definitions for Recurrent and Transient States

Let T_{ii} be the first time the chain is in state i after leaving state i,

T_{ii} = inf{t > W_1 : X(t) = i | X(0) = i}.

The random variable T_{ii} is known as the first return time. The first return can occur at any time t > 0; T_{ii} is a continuous random variable.

Definition 5. State i is recurrent (transient) in a CTMC {X(t)}, t ≥ 0, if the first return time is finite (infinite),

Prob{T_{ii} < ∞ | X(0) = i} = 1 (< 1).        (3)

Recall that in the DTMC setting, state i is said to be recurrent (transient) in a DTMC {Y_n}, with Y_0 = i, if

Σ_{n=0}^∞ f^{(n)}_{ii} = 1 (< 1),

where f^{(n)}_{ii} is the probability that the first return to state i is at step n.

Theorem 1. State i in a CTMC {X(t)}, t ≥ 0, is recurrent (transient) iff state i in the corresponding embedded Markov chain {Y_n}, n = 0, 1, 2, . . . , is recurrent (transient).
Recurrent and Transient CTMC

Recurrence or transience in a CTMC can be determined from the properties of the DTMC, the embedded Markov chain, and the properties of its transition matrix T.

Corollary 1. A state i in a CTMC {X(t)}, t ≥ 0, is recurrent (transient) iff

Σ_{n=0}^∞ t^{(n)}_{ii} = ∞ (< ∞),

where t^{(n)}_{ii} is the (i, i) element in the transition matrix T^n of the embedded Markov chain {Y_n}.

Corollary 2. In a finite CTMC, all states cannot be transient; in addition, if the finite CTMC is irreducible, the chain is recurrent.

Example 3. Note that the transition matrix of the embedded Markov chain for the Poisson process satisfies lim_{n→∞} T^n = 0 (lower triangular). For sufficiently large n and all i, t^{(n)}_{ii} = 0, which implies Σ_{n=0}^∞ t^{(n)}_{ii} < ∞, i = 1, 2, . . . . Therefore, every state is transient in the Poisson process. This is an obvious result, since each state X(W_i) = i can only advance to state i + 1, X(W_{i+1}) = i + 1; a return to state i is impossible.
Null Recurrence and Positive Recurrence

Unfortunately, the concepts of null recurrence and positive recurrence for a CTMC cannot be defined in terms of the embedded Markov chain. Positive recurrence depends on the waiting times {W_i}, so that the embedded Markov chain alone is not sufficient to define positive recurrence.

Definition 6. State i is positive recurrent (null recurrent) in the CTMC {X(t)}, t ≥ 0, if the mean recurrence time is finite (infinite); that is,

μ_{ii} = E(T_{ii} | X(0) = i) < ∞ (= ∞).

There is no concept of aperiodic and periodic in CTMC because the interevent time is random. Therefore, the basic limit theorem for CTMC is simpler.
The Basic Limit Theorem for CTMC

Theorem 2. [Basic limit theorem for CTMC] If the generator matrix Q of a continuous time, nonexplosive Markov chain {X(t)}, t ≥ 0, is irreducible and positive recurrent, then

lim_{t→∞} p_{ij}(t) = 1/(-q_{ii} μ_{ii}),        (4)

where μ_{ii} is the mean recurrence time in the CTMC {X(t)}. In particular, if the state space is finite and Q is irreducible, then the process is nonexplosive, and the limit (4) exists and is positive.

Corollary 3. A finite CTMC with irreducible generator matrix Q is positive recurrent.

Proofs can be found in Norris (1999).

The result (4) differs from DTMC due to the term -q_{ii} in the limit.
A Positive Recurrent CTMC

Example 4. Consider Example 2:

    ( -1   0   0   1 )           ( 0  0  0  1 )
Q = (  1  -1   0   0 )  and  T = ( 1  0  0  0 )
    (  0   1  -1   0 )           ( 0  1  0  0 )
    (  0   0   1  -1 )           ( 0  0  1  0 )

The matrices are irreducible. All states are positive recurrent. The embedded Markov chain is periodic with period 4, but the CTMC is not periodic; periodicity is not defined for CTMC.
Forward Kolmogorov Differential Equations

dp_{ji}(t)/dt = Σ_{k=0}^∞ q_{jk} p_{ki}(t),   i, j = 0, 1, . . . .

Definition 7. The forward Kolmogorov differential equations expressed in matrix form are

dP(t)/dt = Q P(t),

where P(t) = (p_{ji}(t)) is the matrix of transition probabilities and Q = (q_{ji}) is the generator matrix.

In the case that the initial distribution of the process satisfies X(0) = k (p_i(0) = δ_{ik}), the transition probability p_{ik}(t) is the same as the state probability p_i(t) = Prob{X(t) = i | X(0) = k}. In this case,

dp(t)/dt = Q p(t),

where p(t) = (p_0(t), p_1(t), . . .)^T.
Backward Kolmogorov Differential Equations

dp_{ji}(t)/dt = Σ_{k=0}^∞ p_{jk}(t) q_{ki},   i, j = 0, 1, . . . .

Definition 8. The backward Kolmogorov differential equations expressed in matrix form are

dP(t)/dt = P(t) Q,

where P(t) = (p_{ji}(t)) is the matrix of transition probabilities and Q = (q_{ji}) is the generator matrix.

These differential equations depend on the existence of the generator matrix Q. For finite Markov chains, Q always exists.

The solution P(t) can be found via the forward or backward equations.

In birth and death chains and other applications, the transition matrix P(t) is defined such that the forward and backward Kolmogorov differential equations can be derived.
Stationary Probability Distribution

Definition 9. Let {X(t)}, t ≥ 0, be a CTMC with generator matrix Q. Suppose π = (π_0, π_1, . . .)^T is nonnegative and satisfies

Q π = 0  and  Σ_{i=0}^∞ π_i = 1.

Then π is called a stationary probability distribution.

A stationary probability distribution can be defined in terms of the transition matrix P(t) as well. A constant solution π is called a stationary probability distribution if

P(t) π = π  for t ≥ 0,   Σ_{i=0}^∞ π_i = 1,  and  π_i ≥ 0

for i = 0, 1, 2, . . . . This latter definition can be applied if the transition matrix P(t) is known and the process is nonexplosive.
Finite Markov Chains

Assume the state space of a finite Markov chain is {0, 1, 2, . . . , N} and the infinitesimal transition probabilities satisfy

p_{ji}(Δt) = δ_{ji} + q_{ji}Δt + o(Δt).

The forward and backward Kolmogorov differential equations are dP/dt = QP and dP/dt = PQ, with P(0) = I, respectively. The unique solution is

P(t) = e^{Qt} P(0) = e^{Qt},

where e^{Qt} is the matrix exponential,

e^{Qt} = I + Qt + Q^2 t^2/2! + Q^3 t^3/3! + · · · = Σ_{k=0}^∞ Q^k t^k / k!.

Methods for calculating e^{Qt} can sometimes be used, depending on the size N, whether Q is diagonalizable, etc. For small N, computer algebra systems can be applied, and for larger N, numerical methods (provided Q has numerical values), etc.
Two Methods to Calculate e^{Qt}

(1) Suppose Q is an n × n diagonalizable matrix with eigenvalues λ_i, i = 1, 2, . . . , n. Then Q^k = H Λ^k H^{-1}, where Λ = diag(λ_1, λ_2, . . . , λ_n) and the columns of H are the right eigenvectors of Q:

P(t) = e^{Qt} = H ( Σ_{k=0}^∞ Λ^k t^k / k! ) H^{-1} = H diag(e^{λ_1 t}, e^{λ_2 t}, . . . , e^{λ_n t}) H^{-1}.

(2) Suppose Q is an n × n matrix with characteristic equation

det(λI - Q) = λ^n + a_{n-1} λ^{n-1} + · · · + a_0 = 0,

which is also the characteristic equation of the differential equation

x^{(n)}(t) + a_{n-1} x^{(n-1)}(t) + · · · + a_0 x(t) = 0.

To find e^{Qt}, find n linearly independent solutions x_1(t), x_2(t), . . . , x_n(t), with initial conditions

x_1(0) = 1, x_1'(0) = 0, . . . , x_1^{(n-1)}(0) = 0;
x_2(0) = 0, x_2'(0) = 1, . . . , x_2^{(n-1)}(0) = 0;
. . . ;
x_n(0) = 0, x_n'(0) = 0, . . . , x_n^{(n-1)}(0) = 1.

Then

P(t) = e^{Qt} = x_1(t) I + x_2(t) Q + · · · + x_n(t) Q^{n-1}.

Reference for Method (2): Leonard (1996).
An Example of a Finite CTMC

Example 5. Suppose the generator matrix of a finite CTMC with two states is

Q = ( -a   b )
    (  a  -b ),

where a > 0 and b > 0.

Method (1): It is easy to compute Q^n = [-(a + b)]^{n-1} Q. Then

P(t) = e^{Qt} = I - (Q/(a + b)) Σ_{n=1}^∞ [-(a + b)t]^n / n! = I - (Q/(a + b)) [e^{-(a+b)t} - 1].

Method (2): The eigenvalues of Q are λ_{1,2} = 0, -(a + b). The general solution of the second-order differential equation x''(t) + (a + b)x'(t) = 0 is x(t) = c_1 + c_2 e^{-(a+b)t}. Applying the initial conditions, x_1(t) = 1 and x_2(t) = (1 - e^{-(a+b)t})/(a + b), respectively. Then

P(t) = e^{Qt} = x_1(t) I + x_2(t) Q = (1/(a + b)) ( b + a e^{-(a+b)t}   b - b e^{-(a+b)t} )
                                                  ( a - a e^{-(a+b)t}   a + b e^{-(a+b)t} )
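The closed form of Example 5 can be checked against a direct truncated power-series evaluation of e^{Qt}. A sketch, with sample rates a, b and a time t of my choosing:

```python
from math import exp

a, b, t = 0.3, 0.7, 1.5            # sample rates and time (my choice)
Q = [[-a, b], [a, -b]]

def mat_mul(A, B):
    return [[sum(A[i][k] * B[k][j] for k in range(2)) for j in range(2)]
            for i in range(2)]

# e^{Qt} = sum_k (Qt)^k / k!, truncated at 30 terms
Qt = [[Q[i][j] * t for j in range(2)] for i in range(2)]
term = [[1.0, 0.0], [0.0, 1.0]]    # running term (Qt)^k / k!
expQt = [[1.0, 0.0], [0.0, 1.0]]
for k in range(1, 30):
    term = [[term[i][j] / k for j in range(2)] for i in range(2)]
    term = mat_mul(term, Qt)
    expQt = [[expQt[i][j] + term[i][j] for j in range(2)] for i in range(2)]

# Closed form from Method (1): P(t) = I - (Q/(a+b)) [e^{-(a+b)t} - 1]
c = (exp(-(a + b) * t) - 1.0) / (a + b)
closed = [[1.0 - c * Q[0][0], -c * Q[0][1]],
          [-c * Q[1][0], 1.0 - c * Q[1][1]]]
err = max(abs(expQt[i][j] - closed[i][j]) for i in range(2) for j in range(2))
print(err)   # tiny
```

The truncated series also has column sums one, as a transition matrix must.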
A Unique Stationary Distribution Exists for this Example

lim_{t→∞} P(t) = ( b/(a + b)   b/(a + b) )
                 ( a/(a + b)   a/(a + b) )

Transition matrix of the embedded Markov chain:

T = ( 0  1 )
    ( 1  0 ).

Matrices T and Q are irreducible. Therefore, according to Theorem 2, the limit satisfies b/(a + b) = 1/(-q_{11} μ_{11}) and a/(a + b) = 1/(-q_{22} μ_{22}). The mean recurrence time is

μ_{ii} = (a + b)/(ab),   i = 1, 2.

It is interesting to note that the mean recurrence time for the corresponding embedded Markov chain is μ_{ii} = 2, i = 1, 2.
Another Example of a Finite CTMC

Example 6. In Example 2 the generator matrix Q and the transition matrix T of the embedded Markov chain are

    ( -1   0   0   1 )           ( 0  0  0  1 )
Q = (  1  -1   0   0 )  and  T = ( 1  0  0  0 )
    (  0   1  -1   0 )           ( 0  1  0  0 )
    (  0   0   1  -1 )           ( 0  0  1  0 )

Matrix

P(t) = e^{Qt}
     = (1/4) E
       + (1/2) e^{-t} (  cos t   -sin t   -cos t    sin t )
                      (  sin t    cos t   -sin t   -cos t )
                      ( -cos t    sin t    cos t   -sin t )
                      ( -sin t   -cos t    sin t    cos t )
       + (1/4) e^{-2t} (  1  -1   1  -1 )
                       ( -1   1  -1   1 )
                       (  1  -1   1  -1 )
                       ( -1   1  -1   1 )

where E is a 4 × 4 matrix of ones. Thus,

lim_{t→∞} P(t) = (1/4) E.

Matrices T and Q are irreducible. Thus, according to Theorem 2, the mean recurrence time of the CTMC satisfies μ_{ii} = 4, i = 1, 2, 3, 4. The stationary distribution and mean recurrence time agree with the embedded Markov chain. But in the embedded Markov chain this stationary distribution is NOT the limiting distribution.
Basic Limit Theorem for Finite CTMC

Theorem 3. Suppose the generator matrix Q of a finite CTMC {X(t)}, t ≥ 0, with state space {0, 1, 2, . . . , N} is irreducible. Then the limit

lim_{t→∞} p_{ij}(t) = 1/(-q_{ii} μ_{ii})

is a stationary probability distribution π = (π_0, π_1, . . . , π_N)^T, where

π_i = 1/(-q_{ii} μ_{ii}).
We Show for CTMC the Interevent Time has an Exponential Distribution

The exponential waiting time between events characterizes CTMC because the exponential distribution has a memoryless property. The interevent time is T_i = W_{i+1} - W_i, where W_i is the time of the ith jump. The event may be a birth, death, immigration, or any other event that changes the value of the state variable. The interevent time T_i ∈ [0, ∞) is a continuous random variable.

Figure 4: A sample path or single realization X(t) of a CTMC, t ∈ [0, ∞), illustrating the jump times {W_i} and the interevent times {T_i}; X(0) = 2, X(W_1) = 3, X(W_2) = 4, X(W_3) = 3.
Interevent Time Theorem

Theorem 4. Let {X(t)}, t ≥ 0, be a CTMC such that

Σ_{j=0, j≠n}^∞ p_{jn}(Δt) = α(n)Δt + o(Δt)

and

p_{nn}(Δt) = 1 - α(n)Δt + o(Δt)

for Δt sufficiently small. Then the interevent time T_i = W_{i+1} - W_i, given X(W_i) = n, is an exponential random variable with parameter α(n). The cumulative distribution function for T_i is F_i(t) = 1 - e^{-α(n)t}. The mean and variance of T_i are

E(T_i) = 1/α(n)  and  Var(T_i) = 1/[α(n)]^2.
Indication of the Proof for Interevent Time

Let X(W_i) = n. The probability that the process moves to a state different from n in time Δt is

Σ_{j=0, j≠n}^∞ p_{jn}(Δt) = α(n)Δt + o(Δt).

The probability of no change in state is p_{nn}(Δt) = 1 - α(n)Δt + o(Δt).

Let G_i(t) be the probability that the process remains in state n until time t, G_i(t) = Prob{T_i > t}. If state n is not absorbing, then G_i(0) = 1. Due to the memoryless property of MC,

G_i(t + Δt) = G_i(t) p_{nn}(Δt) = G_i(t)(1 - α(n)Δt + o(Δt)).

Subtract G_i(t) from both sides, divide by Δt, and take the limit as Δt → 0:

dG_i(t)/dt = -α(n) G_i(t),   G_i(0) = 1.

The solution is G_i(t) = e^{-α(n)t}. Thus

Prob{T_i ≤ t} = 1 - G_i(t) = 1 - e^{-α(n)t} = F_i(t),   t ∈ [0, ∞).
Stochastic Realizations

Theorem 5. Let U be a uniform random variable defined on [0, 1] and T be a continuous random variable defined on [0, ∞) with Prob{T ≤ t} = F(t). Then T = F^{-1}(U), where F is the cumulative distribution of the random variable T.

Proof. We want to show that Prob{F^{-1}(U) ≤ t} = F(t). The function F is strictly increasing, so that F^{-1} exists. In addition, for t ∈ [0, ∞),

Prob{F^{-1}(U) ≤ t} = Prob{F(F^{-1}(U)) ≤ F(t)} = Prob{U ≤ F(t)}.

Because U is a uniform random variable, Prob{U ≤ y} = y for y ∈ [0, 1]. Thus, Prob{U ≤ F(t)} = F(t).

For F(t) = 1 - e^{-α(n)t},

T = F^{-1}(U) = -ln(1 - U)/α(n) = -ln(U)/α(n).
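Theorem 5 is the basis for simulating CTMC interevent times. A quick check (parameter values and sample size are mine): sample T = -ln(U)/α by the inverse-transform method and compare the sample mean and variance with the exponential values 1/α and 1/α².

```python
import random
from math import log

random.seed(1)
alpha = 2.0
n = 200_000
# 1 - U avoids log(0); 1 - U is itself uniform on (0, 1]
samples = [-log(1.0 - random.random()) / alpha for _ in range(n)]
mean = sum(samples) / n
var = sum((x - mean) ** 2 for x in samples) / n
print(mean)   # approx 1/alpha = 0.5
print(var)    # approx 1/alpha^2 = 0.25
```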
A Simple Birth and Death Process

Example 7 (Simple birth and death process). In the simple birth and death process, an event is either a birth or a death, i → i + 1 or i → i - 1:

p_{i+j,i}(Δt) = Prob{X(t + Δt) = i + j | X(t) = i}
             = { d i Δt + o(Δt),            j = -1
               { b i Δt + o(Δt),            j = 1
               { 1 - (b + d) i Δt + o(Δt),  j = 0
               { o(Δt),                     j ≠ -1, 0, 1.

Given X(W_i) = n, α(n) = (b + d)n, which means the interevent time T_i is

T_i = -ln(U)/[(b + d)n].

A birth occurs with probability b/(b + d) and a death with probability d/(b + d). The deterministic analogue is the differential equation dn/dt = (b - d)n, n(0) = n_0, with solution

n(t) = n_0 e^{(b-d)t}.

[Figure: sample paths of the simple birth and death process with b = 2, d = 1, compared with the deterministic solution n(t) = 2e^t.]
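Example 7 gives a complete simulation recipe. A sketch (parameter values and run count are my choices): draw each interevent time from Exp((b + d)n), choose a birth with probability b/(b + d), and average many sample paths against the deterministic solution n(t) = n_0 e^{(b-d)t}, which is also the exact mean of the stochastic process.

```python
import random
from math import exp, log

random.seed(2)
b, d, n0, t_end, runs = 2.0, 1.0, 2, 2.0, 20_000

def simulate():
    # one sample path of the simple birth and death process up to t_end
    n, t = n0, 0.0
    while n > 0:
        t += -log(1.0 - random.random()) / ((b + d) * n)   # Exp((b+d)n) interevent time
        if t > t_end:
            break
        n += 1 if random.random() < b / (b + d) else -1    # birth w.p. b/(b+d)
    return n

mean = sum(simulate() for _ in range(runs)) / runs
print(mean)                        # sample mean at t_end
print(n0 * exp((b - d) * t_end))   # deterministic value, 2e^2 approx 14.78
```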
Applications to Birth and Death Processes and
Population Processes.
(1) General Birth and Death Process
(2) SIR Epidemic Process
(3) Competition Process
(4) Predation Process
(1) General Birth and Death Process

p_{i+j,i}(Δt) = Prob{X(t + Δt) = i + j | X(t) = i}
             = { λ_i Δt + o(Δt),             j = 1
               { μ_i Δt + o(Δt),             j = -1
               { 1 - (λ_i + μ_i)Δt + o(Δt),  j = 0
               { o(Δt),                      j ≠ -1, 0, 1,

where λ_i ≥ 0, μ_i ≥ 0 for i = 0, 1, 2, . . . and μ_0 = 0. If λ_0 > 0, then there is immigration.

The forward Kolmogorov differential equations are dP/dt = QP, where Q has the following form for an infinite state space:

    ( -λ_0   μ_1            0              0              · · · )
    (  λ_0   -(λ_1 + μ_1)   μ_2            0              · · · )
Q = (  0      λ_1           -(λ_2 + μ_2)   μ_3            · · · )
    (  0      0              λ_2           -(λ_3 + μ_3)   · · · )
    (   :      :              :              :                  )

For a finite state space {0, 1, . . . , N}, Q is an (N + 1) × (N + 1) matrix with λ_N = 0.
Transition Matrix for the Embedded Markov Chain

The transition matrix of the embedded Markov chain with an infinite state space is

    ( 0   μ_1/(λ_1 + μ_1)   0                 0                 · · · )
    ( 1   0                 μ_2/(λ_2 + μ_2)   0                 · · · )
T = ( 0   λ_1/(λ_1 + μ_1)   0                 μ_3/(λ_3 + μ_3)   · · · )
    ( 0   0                 λ_2/(λ_2 + μ_2)   0                 · · · )
    ( :    :                 :                 :                      )

For a finite state space {0, 1, . . . , N}, T is an (N + 1) × (N + 1) matrix with λ_N = 0.

The embedded Markov chain can be thought of as a generalized random walk model with a reflecting boundary at zero (and at N in the finite case). The probability of moving right (a birth) is t_{i+1,i} = λ_i/(λ_i + μ_i) and the probability of moving left (a death) is t_{i-1,i} = μ_i/(λ_i + μ_i). The transition matrix T is irreducible iff λ_i > 0 and μ_{i+1} > 0 for i = 0, 1, 2, . . . .
Stationary Probability Distribution

Theorem 6. Let {X(t)}, t ≥ 0, be a general birth and death chain. If the state space is infinite, {0, 1, 2, . . .}, a unique positive stationary probability distribution π exists iff μ_i > 0 and λ_{i-1} > 0 for i = 1, 2, . . . , and

Σ_{i=1}^∞ (λ_0 λ_1 · · · λ_{i-1})/(μ_1 μ_2 · · · μ_i) < ∞.

The stationary probability distribution is given by

π_i = ((λ_0 λ_1 · · · λ_{i-1})/(μ_1 μ_2 · · · μ_i)) π_0,   i = 1, 2, . . .        (5)

and

π_0 = 1 / (1 + Σ_{i=1}^∞ (λ_0 λ_1 · · · λ_{i-1})/(μ_1 μ_2 · · · μ_i)).        (6)

If the state space is finite, {0, 1, 2, . . . , N}, then a unique positive stationary probability distribution exists iff μ_i > 0 and λ_{i-1} > 0 for i = 1, 2, . . . , N. The stationary probability distribution is given by (5) and (6), where the summation on i extends from 1 to N.

If λ_0 = 0 and μ_i > 0 for i ≥ 1, then π = (1, 0, 0, 0, . . .)^T.
Example 8. Suppose λ_i = b and μ_i = i d for i = 0, 1, 2, . . . . Then a unique stationary probability distribution exists because

(λ_0 λ_1 · · · λ_{i-1})/(μ_1 μ_2 · · · μ_i) = b^i/(d^i i!) = (b/d)^i / i!  and  1 + Σ_{i=1}^∞ (b/d)^i / i! = e^{b/d}.

The stationary probability distribution is a Poisson distribution,

π_0 = e^{-b/d}  and  π_i = ((b/d)^i / i!) e^{-b/d}.

Example 9. Suppose μ_i = q > 0, i = 1, 2, . . . , and λ_i = p > 0, i = 0, 1, 2, . . . , where p + q = 1. The embedded Markov chain is a semi-infinite random walk model with a reflecting boundary condition at zero. The chain has a unique stationary probability distribution iff

Σ_{j=1}^∞ (p/q)^j < ∞  iff  p < q.

The stationary probability distribution is a geometric probability distribution,

π_0 = (q - p)/q  and  π_i = (p/q)^i (1 - p/q).
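Theorem 6's product formula is straightforward to evaluate numerically. A sketch for Example 8's rates λ_i = b, μ_i = i d (the truncation level K is my choice), which should reproduce the Poisson distribution with parameter b/d:

```python
from math import exp, factorial

b, d, K = 3.0, 2.0, 60               # truncation K chosen so the tail is negligible

# weights[i] = lam_0...lam_{i-1} / (mu_1...mu_i); the i = 0 term is 1
weights = [1.0]
for i in range(1, K + 1):
    weights.append(weights[-1] * b / (i * d))
total = sum(weights)
pi = [w / total for w in weights]    # normalized stationary distribution

poisson = [(b / d) ** i * exp(-b / d) / factorial(i) for i in range(K + 1)]
err = max(abs(p - q) for p, q in zip(pi, poisson))
print(err)   # tiny
```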
Simple Birth and Death Process

λ_i = λi and μ_i = μi, i = 1, 2, . . . ,

with λ_0 = 0 and μ_0 = 0; zero is an absorbing state and π = (1, 0, 0, . . .)^T is the unique stationary probability distribution.

An explicit solution to P(t) is not possible, but we can determine the moments of this distribution by using a technique known as the generating function technique. First-order partial differential equations can be derived for the pgf and mgf.

Generating Function Technique: Let P(z, t) = Σ_{i=0}^∞ p_i(t) z^i denote the pgf and M(θ, t) = P(e^θ, t) the mgf. Multiply the differential equation dp/dt = Qp by z^i and sum from 0 to ∞:

Σ_{i=0}^∞ z^i dp_i(t)/dt = Σ_{i=0}^∞ z^i [λ(i - 1) p_{i-1}(t) + μ(i + 1) p_{i+1}(t) - (λ + μ) i p_i(t)]

for i = 1, 2, . . . , with initial conditions p_i(0) = δ_{iN}. Interchange differentiation and summation to obtain a partial differential equation for the pgf:

∂P/∂t = [μ(1 - z) + λz(z - 1)] ∂P/∂z,   P(z, 0) = z^N.

The change of variable z = e^θ leads to the mgf equation:

∂M/∂t = [λ(e^θ - 1) + μ(e^{-θ} - 1)] ∂M/∂θ,   M(θ, 0) = e^{θN}.
Application of the Method of Characteristics Solves for the MGF

Application of the method of characteristics to the mgf equation leads to

dt/dτ = 1,   dθ/[λ(e^θ - 1) + μ(e^{-θ} - 1)] = -dτ,   dM/dτ = 0,

with t(s, 0) = 0, θ(s, 0) = s, and M(s, 0) = e^{sN}.

The mgf is

M(θ, t) = { [ (μ(e^θ - 1)e^{(λ-μ)t} - (λe^θ - μ)) / (λ(e^θ - 1)e^{(λ-μ)t} - (λe^θ - μ)) ]^N,   if λ ≠ μ
          { [ (1 - (λt - 1)(e^θ - 1)) / (1 - λt(e^θ - 1)) ]^N,                                 if λ = μ.

Making the change of variable θ = ln z, the pgf is

P(z, t) = { [ (μ(z - 1)e^{(λ-μ)t} - (λz - μ)) / (λ(z - 1)e^{(λ-μ)t} - (λz - μ)) ]^N,   if λ ≠ μ
          { [ (1 - (λt - 1)(z - 1)) / (1 - λt(z - 1)) ]^N,                             if λ = μ.
Probabilities and Moments Associated with the Process can be Obtained Directly from the GF

Recall

P(z, t) = Σ_{i=0}^∞ p_i(t) z^i  and  p_i(t) = (1/i!) ∂^i P/∂z^i |_{z=0}.

The first term in the expansion of P(z, t) is p_0(t) = P(0, t):

p_0(t) = { [ (μ e^{(λ-μ)t} - μ) / (λ e^{(λ-μ)t} - μ) ]^N,   if λ ≠ μ
         { [ λt / (1 + λt) ]^N,                              if λ = μ.

The mean and variance of the simple birth and death process can be derived from either the pgf or mgf. For λ ≠ μ,

m(t) = N e^{(λ-μ)t}  and  σ^2(t) = N ((λ + μ)/(λ - μ)) e^{(λ-μ)t} (e^{(λ-μ)t} - 1).

The mean corresponds to an exponential growth model if λ > μ and an exponential decay model if λ < μ. For λ = μ, the mean and variance are

m(t) = N  and  σ^2(t) = 2Nλt.
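The closed forms for p_0(t) and m(t) can be checked against a direct numerical integration of the forward equations of the simple birth and death process. A sketch (rates, truncation K, and step size dt are my choices; μ > λ keeps the population small so the truncation is harmless):

```python
from math import exp

lam, mu, N, t_end, dt, K = 1.0, 2.0, 3, 1.0, 1e-4, 40   # my parameter choices
p = [0.0] * (K + 1)
p[N] = 1.0                                              # X(0) = N
for _ in range(int(t_end / dt)):
    # Euler step of dp_i/dt = lam(i-1)p_{i-1} + mu(i+1)p_{i+1} - (lam+mu)i p_i
    new = []
    for i in range(K + 1):
        gain = (lam * (i - 1) * p[i - 1] if i > 0 else 0.0) \
             + (mu * (i + 1) * p[i + 1] if i < K else 0.0)
        new.append(p[i] + dt * (gain - (lam + mu) * i * p[i]))
    p = new

r = exp((lam - mu) * t_end)
p0_exact = ((mu * r - mu) / (lam * r - mu)) ** N        # pgf formula, lam != mu
m_exact = N * r                                         # mean N e^{(lam-mu)t}
err0 = abs(p[0] - p0_exact)
errm = abs(sum(i * q for i, q in enumerate(p)) - m_exact)
print(err0, errm)   # both small
```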
Probability of Extinction in the Simple Birth and Death Process

The probability of extinction, p_0(t), has a simple expression when t → ∞. Taking the limit,

p_0(∞) = lim_{t→∞} p_0(t) = { 1,         if λ ≤ μ
                             { (μ/λ)^N,  if λ > μ.

This latter result is reminiscent of a semi-infinite random walk with an absorbing barrier at x = 0, that is, the gambler's ruin problem, where the probability of losing a game is μ/(λ + μ) and the probability of winning a game is λ/(λ + μ). When the probability of losing (death) is greater than or equal to the probability of winning (birth), then, in the long run (t → ∞), the probability of losing all of the initial capital N (probability of absorption) approaches 1. However, if the probability of winning is greater than the probability of losing, then, in the long run, the probability of losing all of the initial capital is (μ/λ)^N.
Probability of Extinction in the General Birth
and Death Process
Theorem 7. Let λ_0 = 0 in a general birth and death chain with X(0) = m ≥ 1.
(i) Suppose λ_i > 0 and μ_i > 0 for i = 1, 2, . . .. If

  Σ_{i=1}^∞ (μ_1 μ_2 ··· μ_i)/(λ_1 λ_2 ··· λ_i) = ∞,

then lim_{t→∞} p_0(t) = 1, and if

  Σ_{i=1}^∞ (μ_1 μ_2 ··· μ_i)/(λ_1 λ_2 ··· λ_i) < ∞,

then

  lim_{t→∞} p_0(t) = [ Σ_{i=m}^∞ (μ_1 μ_2 ··· μ_i)/(λ_1 λ_2 ··· λ_i) ] / [ 1 + Σ_{i=1}^∞ (μ_1 μ_2 ··· μ_i)/(λ_1 λ_2 ··· λ_i) ].

(ii) Suppose μ_i > 0 for i = 1, 2, . . ., λ_i > 0 for i = 1, 2, . . . , N − 1 and
λ_i = 0 for i = N, N + 1, N + 2, . . .. Then lim_{t→∞} p_0(t) = 1.
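For the linear rates λ_i = λi and μ_i = μi, each product in Theorem 7 reduces to (μ/λ)^i, and the formula in (i) collapses to the (μ/λ)^m obtained above for the simple process. A quick numerical check of that reduction, truncating the infinite series (λ = 1.5, μ = 1, m = 3 are illustrative values):

```python
# Theorem 7(i) for linear rates lambda_i = lam*i, mu_i = mu*i:
# each product (mu_1...mu_i)/(lambda_1...lambda_i) reduces to (mu/lam)**i.
lam, mu, m = 1.5, 1.0, 3
r = mu / lam
tail = sum(r**i for i in range(m, 400))   # ~ sum_{i=m}^infinity r^i
full = sum(r**i for i in range(1, 400))   # ~ sum_{i=1}^infinity r^i
p_ext = tail / (1 + full)
print(p_ext, (mu / lam) ** m)             # both equal (mu/lam)^m
```

Both geometric series are truncated at 400 terms, which is far beyond machine precision here.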
An Explosive Birth Process
Theorem 8. A birth process [λ_i > 0, μ_i = 0, i = 0, 1, . . .] is explosive iff

  Σ_{i=1}^∞ 1/λ_i < ∞.

Example 10. Suppose a birth process satisfies λ_i = b i^k > 0, i = 1, 2, . . ., where
k > 1. Note that

  Σ_{i=1}^∞ 1/(b i^k) = (1/b) Σ_{i=1}^∞ 1/i^k

is a multiple of a convergent p-series. The birth process is explosive. A deterministic
analogue of this model is the differential equation

  dn/dt = b n^k.

Integration with initial condition n(0) = N leads to the solution

  n(t) = [ N^{1−k} − (k−1)bt ]^{−1/(k−1)}.

As t → N^{1−k}/[b(k−1)], then n(t) → ∞. The deterministic solution explodes
at a finite time.
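In the stochastic process, the explosion time is a sum of independent exponential holding times with means 1/λ_i, so its expectation is Σ 1/(b i^k). A small simulation can compare this with the deterministic blow-up time N^{1−k}/[b(k−1)] (a sketch; b = 1, k = 2, N = 1 are illustrative, and the infinite sums are truncated):

```python
import random

b, k, N = 1.0, 2.0, 1     # illustrative values (Example 10 with k = 2)
rng = random.Random(2)

# expected explosion time = sum of mean holding times 1/(b i^k), truncated
expected_T = sum(1.0 / (b * i**k) for i in range(N, 100000))  # ~ pi^2/6 here

def explosion_time():
    """Total (truncated) time spent in states N, N+1, ... before blow-up."""
    return sum(rng.expovariate(b * i**k) for i in range(N, 1500))

mean_T = sum(explosion_time() for _ in range(1000)) / 1000
det_blowup = N**(1 - k) / (b * (k - 1))   # deterministic blow-up time = 1 here
print(mean_T, expected_T, det_blowup)
```

Note that the mean stochastic explosion time (≈ π²/6 for these values) exceeds the deterministic blow-up time.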
(2) SIR Epidemic Process.
Let S(t), I(t), and R(t) denote random variables for the number of susceptible,
infected (and infectious), and immune individuals, respectively, where S(t) + I(t) +
R(t) = N. This is a bivariate process since there are two independent random
variables, {(S(t), I(t))}. The transition probabilities are

  Prob{ΔS(t) = i, ΔI(t) = j | (S(t), I(t))}
    = { (β/N) S(t)I(t) Δt + o(Δt),                  (i, j) = (−1, 1)
      { γ I(t) Δt + o(Δt),                          (i, j) = (0, −1)
      { 1 − [(β/N) S(t)I(t) + γ I(t)] Δt + o(Δt),   (i, j) = (0, 0)
      { o(Δt),                                      otherwise.

Initially (S(0), I(0)) = (s_0, i_0), where s_0 + i_0 = N, s_0 ≥ 0 and i_0 > 0.
Let p_(i,j)(t) = Prob{S(t) = i, I(t) = j} be the joint probability. The set
{(i, 0) | i = 0, 1, 2, . . . , N} is closed and absorbing. The epidemic stops when
there are no infectious individuals:

  lim_{t→∞} Σ_{i=0}^{N−1} p_(i,0)(t) = 1.
Final Size of an Epidemic.
In the embedded Markov chain of the SIR epidemic process, there
are transitions from state (i, j) to either state (i−1, j+1), representing
a susceptible that becomes infected, or to (i, j−1), representing a
recovery of an infected individual. The probability of recovery is

  p_i = γj / (γj + (β/N)ij) = γ / (γ + (β/N)i).

The probability that a susceptible becomes infected is

  1 − p_i = (β/N)ij / (γj + (β/N)ij) = (β/N)i / (γ + (β/N)i).
Final Size of an Epidemic.
Example 11. A stochastic SIR epidemic model with population size N = 4 has 15
states. Group the states into five sets corresponding to
I: (4, 0), (3, 0), (2, 0), (1, 0), (0, 0)
II: (3, 1), (2, 1), (1, 1), (0, 1)
III: (2, 2), (1, 2), (0, 2)
IV: (1, 3), (0, 3)
V: (0, 4).

Figure 5: Directed graph of the embedded Markov chain of the SIR epidemic model
with N = 4. The maximum path length beginning from state (3, 1) is indicated by
the thick arrows.
The Transition Matrix of the Embedded MC can
be used to Find Final Size.
The transition matrix of the embedded MC has the following block form:

      ( I   A_1   0    0    0  )
      ( 0   0    A_2   0    0  )
  T = ( 0   B_1   0   A_3   0  )
      ( 0   0    B_2   0   A_4 )
      ( 0   0     0   B_3   0  ),

corresponding to the five sets. The joint probability vector is p = (p_(i,j)) =
(p_I, p_II, p_III, p_IV, p_V)^T. Each of the block matrices in T differs in dimensions and
represents different transitions. Matrix I is a 5 × 5 identity matrix. Matrix A_j
represents recovery, transitions from j infected individuals to j − 1, and matrix B_j
represents infection, transitions from j infected individuals to j + 1, e.g.,

        ( 0    0    0    0   )          ( 1−p_3   0      0     0 )
        ( p_3  0    0    0   )    B_1 = ( 0      1−p_2   0     0 )
  A_1 = ( 0    p_2  0    0   ),         ( 0       0     1−p_1  0 ).
        ( 0    0    p_1  0   )
        ( 0    0    0    p_0 )
Final Size of an Epidemic.
In general, suppose S(0) = N − 1, I(0) = 1, and R(0) = 0. The probabilities
associated with the final size of the epidemic {p^f_{N−i}} can be determined from the
absorption probabilities:

  lim_{t→∞} p_(i,0)(t) = p^f_{N−i},  i = 0, 1, 2, . . . , N − 1,

which can be found using the transition matrix T of the embedded Markov chain:

  lim_{t→∞} p(t) = p(2N − 1) = T^{2N−1} p(0).

Then p_I(2N − 1) gives the final size distribution.
Final Size of an Epidemic.

Figure 6: Final size of a stochastic SIR epidemic model when I(0) = 1,
S(0) = N − 1, γ = 1, and β = 0.5, 2, and 5. In (a), N = 20 and in (b),
N = 100.

   β      N = 20      N = 100
  0.5     1.9/1.8     2/1.9
  1       5.7/3.3     13.5/6.1
  2       16.3/8.1    80/38.3
  5       20/15.7     99.3/79.3
  10      20/18       100/90

Table 1: Final size and mean final size for a deterministic/stochastic SIR epidemic
with γ = 1, S(0) = N − 1, and I(0) = 1.
(3) Competition Models
In Lotka-Volterra competition, two species compete for the same resource. The
deterministic model has the following form:

  dx_1/dt = x_1(a_10 − a_11 x_1 − a_12 x_2)
  dx_2/dt = x_2(a_20 − a_21 x_1 − a_22 x_2),

where x_i(0) > 0, a_ij > 0 for i = 1, 2 and j = 0, 1, 2. There are four different
outcomes depending on the parameters and the initial conditions:

I. If a_20/a_22 > a_10/a_12 and a_20/a_21 > a_10/a_11, then
   lim_{t→∞} (x_1(t), x_2(t)) = (0, a_20/a_22).

II. If a_20/a_22 < a_10/a_12 and a_20/a_21 < a_10/a_11, then
   lim_{t→∞} (x_1(t), x_2(t)) = (a_10/a_11, 0).

III. If a_20/a_22 > a_10/a_12 and a_20/a_21 < a_10/a_11, then
   lim_{t→∞} (x_1(t), x_2(t)) = (a_10/a_11, 0) or lim_{t→∞} (x_1(t), x_2(t)) = (0, a_20/a_22),
   depending on the initial conditions.

IV. If a_20/a_22 < a_10/a_12 and a_20/a_21 > a_10/a_11, then
   lim_{t→∞} (x_1(t), x_2(t)) = (x_1*, x_2*), a stable positive equilibrium.
Stochastic Competition Process.
Let X_1(t) and X_2(t) be random variables for the population sizes of two
competing species, X_1, X_2 ∈ {0, 1, 2, . . .} and t ∈ [0, ∞). Let p_(i,j)(t) =
Prob{X_1(t) = i, X_2(t) = j}. The competition model is a birth and death process
for two species in which births and deaths depend on the population sizes of one
or both of the species. There are numerous ways to formulate a stochastic model
corresponding to the one deterministic model.
Suppose for two competing species, the stochastic birth rates are denoted
λ_i(X_1, X_2) and death rates μ_i(X_1, X_2), so that the deterministic model is of the
form dx_i/dt = λ_i(x_1, x_2) − μ_i(x_1, x_2), i = 1, 2. One example is

  λ_i(X_1, X_2) = a_i0 X_i  and  μ_i(X_1, X_2) = X_i(a_i1 X_1 + a_i2 X_2).

The probability distributions resulting from various birth and death rate assumptions
can differ markedly. Here, we only consider this simple case.
Assume the joint transition probabilities are

  Prob{ΔX_1(t) = i, ΔX_2(t) = j | (X_1(t), X_2(t))}
    = { a_10 X_1(t) Δt + o(Δt),                          (i, j) = (1, 0)
      { a_20 X_2(t) Δt + o(Δt),                          (i, j) = (0, 1)
      { X_1(t)[a_11 X_1(t) + a_12 X_2(t)] Δt + o(Δt),    (i, j) = (−1, 0)
      { X_2(t)[a_21 X_1(t) + a_22 X_2(t)] Δt + o(Δt),    (i, j) = (0, −1)
      { 1 − [a_10 X_1(t) + a_20 X_2(t)] Δt
          − X_1(t)[a_11 X_1(t) + a_12 X_2(t)] Δt
          − X_2(t)[a_21 X_1(t) + a_22 X_2(t)] Δt + o(Δt), (i, j) = (0, 0)
      { o(Δt),                                           otherwise.
Stochastic Competition Process.
It follows from the forward Kolmogorov differential equations that

  dp_(i,j)/dt = λ_1(i−1, j) p_(i−1,j) + λ_2(i, j−1) p_(i,j−1)
              + μ_1(i+1, j) p_(i+1,j) + μ_2(i, j+1) p_(i,j+1)
              − [λ_1(i, j) + λ_2(i, j) + μ_1(i, j) + μ_2(i, j)] p_(i,j).

Applying the generating function technique, the partial differential equation for the
mgf

  M(θ, φ, t) = Σ_{i,j} p_(i,j)(t) e^{iθ} e^{jφ}

is

  ∂M/∂t = a_10 (e^θ − 1) ∂M/∂θ + a_20 (e^φ − 1) ∂M/∂φ
        + (e^{−θ} − 1) [ a_11 ∂²M/∂θ² + a_12 ∂²M/∂θ∂φ ]
        + (e^{−φ} − 1) [ a_21 ∂²M/∂θ∂φ + a_22 ∂²M/∂φ² ],

where M(θ, φ, 0) = e^{N_1 θ + N_2 φ}, X_1(0) = N_1 and X_2(0) = N_2.
Differential Equations for the Moments can be
Obtained from the Partial Differential Equation
for the MGF.
Differentiating the partial differential equation of the mgf M(θ, φ, t) =
Σ_{i,j} p_(i,j)(t) e^{iθ} e^{jφ} with respect to θ or φ and evaluating at θ = 0 = φ gives
differential equations for the means or higher order moments, that is,

  m_kl(t) = ∂^{k+l} M(θ, φ, t)/∂θ^k ∂φ^l |_{θ=0=φ} = E(X_1^k(t) X_2^l(t)).

For example, the differential equations for the means are

  dm_10(t)/dt = a_10 m_10(t) − a_11 m_20(t) − a_12 m_11(t)
  dm_01(t)/dt = a_20 m_01(t) − a_21 m_11(t) − a_22 m_02(t).

The two differential equations for the means depend on five unknown variables,
m_ij(t), and cannot be solved explicitly. However, note that the form of these
equations is similar to the deterministic differential equations. Sometimes specific
assumptions about E(X_1^k(t) X_2^l(t)) are required to approximate the higher-order
moments of the distribution, known as moment closure techniques.
An Example of a Competition Process.
Example 12. Let a_10 = 2, a_20 = 1.5, a_11 = 0.03, a_12 = 0.02, a_21 = 0.01, and
a_22 = 0.04. Case IV: a stable positive equilibrium exists, (x_1*, x_2*) = (50, 25).
At t = 5, the means and standard deviations are estimated from 1000 sample paths,

  m_X1(5) = m_10(5) = 49.9,  m_X2(5) = m_01(5) = 23.2,
  σ_X1(5) = 9.4,  and  σ_X2(5) = 6.8.
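Estimates of this kind come from applying the Gillespie algorithm to the four transition rates. A minimal sketch (300 sample paths rather than 1000, to keep it quick; the random seed is arbitrary):

```python
import random

# rates of Example 12
a10, a20 = 2.0, 1.5
a11, a12, a21, a22 = 0.03, 0.02, 0.01, 0.04
rng = random.Random(3)

def sample_path(x1, x2, t_end):
    t = 0.0
    while True:
        rates = [a10 * x1,                     # birth, species 1
                 a20 * x2,                     # birth, species 2
                 x1 * (a11 * x1 + a12 * x2),   # death, species 1
                 x2 * (a21 * x1 + a22 * x2)]   # death, species 2
        total = sum(rates)
        if total == 0.0:                       # both species extinct
            return x1, x2
        t += rng.expovariate(total)
        if t > t_end:
            return x1, x2
        u, acc = rng.random() * total, 0.0
        for event, rate in enumerate(rates):   # choose the next event
            acc += rate
            if u < acc:
                break
        if event == 0:   x1 += 1
        elif event == 1: x2 += 1
        elif event == 2: x1 -= 1
        else:            x2 -= 1

paths = [sample_path(50, 25, 5.0) for _ in range(300)]
m10 = sum(p[0] for p in paths) / len(paths)
m01 = sum(p[1] for p in paths) / len(paths)
print(m10, m01)   # compare with the estimates 49.9 and 23.2
```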
Figure 7: A sample path of the Lotka-Volterra competition model graphed as
a function of time and in the phase plane, with the birth and death rates given
above, a_10 = 2, a_20 = 1.5, a_11 = 0.03, a_12 = 0.02, a_21 = 0.01, a_22 = 0.04,
X_1(0) = 50, and X_2(0) = 25. The dotted lines indicate the equilibrium values.
(4) Predation Process.
The Lotka-Volterra predator-prey model has the form

  dx/dt = x(a_10 − a_12 y)
  dy/dt = y(a_21 x − a_20),

where a_ij > 0. The equilibrium (a_20/a_21, a_10/a_12) is neutrally stable.
Let X(t) and Y(t) denote random variables for the size of the prey and predator
populations, respectively, in a stochastic Lotka-Volterra model. Assume the transition
probabilities satisfy

  Prob{ΔX(t) = i, ΔY(t) = j | (X(t), Y(t))}
    = { a_10 X(t) Δt + o(Δt),                  (i, j) = (1, 0)
      { a_21 X(t)Y(t) Δt + o(Δt),              (i, j) = (0, 1)
      { a_12 X(t)Y(t) Δt + o(Δt),              (i, j) = (−1, 0)
      { a_20 Y(t) Δt + o(Δt),                  (i, j) = (0, −1)
      { 1 − X(t)[a_10 + a_12 Y(t)] Δt
          − Y(t)[a_20 + a_21 X(t)] Δt + o(Δt), (i, j) = (0, 0)
      { o(Δt),                                 otherwise.
An Example of the Lotka-Volterra Predation
Process.

Figure 8: A sample path of the Lotka-Volterra predator-prey model is
compared to the solution of the deterministic model. Solutions are
graphed over time and in the phase plane. The parameter values and
initial conditions satisfy a_10 = 1, a_20 = 1, a_12 = 0.02, a_21 = 0.01,
X(0) = 120, and Y(0) = 40. Solutions with the smaller amplitude
represent the predator.
For other biological applications of CTMC models, please consult
the references.
This Concludes Part III on CTMC.
An Intensive Course in Stochastic Processes and
Stochastic Differential Equations in
Mathematical Biology
Part IV
Stochastic Differential Equations
Linda J. S. Allen
Texas Tech University
Lubbock, Texas U.S.A.
National Center for Theoretical Sciences
National Tsing Hua University
August 2008
Basic References for Part IV of this Course
[1 ] Allen, LJS. 2003. An Introduction to Stochastic Processes with Applications
to Biology. Prentice Hall, Upper Saddle River, NJ. Chapter 8.
[2 ] Allen, LJS. 2008. Chapter 3: An Introduction to Stochastic Epidemic Models.
Mathematical Epidemiology, Lecture Notes in Mathematics. Vol. 1945. pp.
81-130, F. Brauer, P. van den Driessche, and J. Wu (Eds.) Springer.
[3 ] Allen, LJS and EJ Allen. 2003. A comparison of three different stochastic
population models with regard to persistence time. Theor. Pop. Biol. 64:
439-449.
[4 ] Allen, LJS and P van den Driessche. 2006. Stochastic epidemic models with a
backward bifurcation. Math. Biosci. Eng. 3(3): 445-458.
[5 ] Other references will be noted.
Part IV:
Stochastic Differential Equations - SDE
Review of Definition and Notation.
Let {X(t)} be a collection of continuous random variables defined on
a probability space, a stochastic process that is continuous in time,
t ∈ [t_0, ∞), and in state,

  X(t) ∈ (−∞, ∞) or [0, ∞) or [0, M].

The associated probability density function (pdf) is denoted p(x, t),

  Prob{X(t) ∈ [a, b]} = ∫_a^b p(x, t) dx.

Definition 1. Assume {X(t)} is a stochastic process continuous in
time and in state. Then {X(t)} is a Markov process if, given any
sequence of times, t_0 < t_1 < ··· < t_{n−1} < t_n,

  Prob{X(t_n) ≤ y | X(t_0) = x_0, X(t_1) = x_1, . . . , X(t_{n−1}) = x_{n−1}}
    = Prob{X(t_n) ≤ y | X(t_{n−1}) = x_{n−1}}.
Definition 2. The transition pdf for a continuous time and state
Markov process is the density function for a transition from state x at
time t to state y at time s, t < s. The transition pdf is denoted as
p(y, s; x, t).
The transition pdf is said to be homogeneous or time homogeneous
if

  p(y, s + Δt; x, t + Δt) = p(y, s; x, t),

where t_0 ≤ t < s and Δt > 0. In this case, the transitions only
depend on the length of time between states, s − t, and the transition
pdf is denoted as

  p(y, x, s − t) = p(y, s; x, t).
Discrete Random Walk and the Diffusion
Equation
Consider a random walk on the set {0, ±Δx, ±2Δx, . . .}. Let p = probability
of moving to the right and q = probability of moving to the left, p + q = 1. Let
{X(t)} be the DTMC for this random walk, where t ∈ {0, Δt, 2Δt, . . .}, X(t) ∈
{0, ±Δx, ±2Δx, . . .}, and p_x(t) = Prob{X(t) = x}. Define u(x, t) = p_x(t).
It follows that

  u(x, t + Δt) = p u(x − Δx, t) + q u(x + Δx, t).

Expanding the right-hand side of the preceding equation using Taylor's formula about
the point (x, t),

  u(x, t + Δt) = p [ u(x, t) − ∂u(x,t)/∂x Δx + ∂²u(x,t)/∂x² (Δx)²/2 + O((Δx)³) ]
               + q [ u(x, t) + ∂u(x,t)/∂x Δx + ∂²u(x,t)/∂x² (Δx)²/2 + O((Δx)³) ]
               = u(x, t) + (q − p) ∂u(x,t)/∂x Δx + ∂²u(x,t)/∂x² (Δx)²/2 + O((Δx)³).

Subtracting u(x, t) and dividing by Δt,

  [u(x, t + Δt) − u(x, t)]/Δt = (q − p) ∂u(x,t)/∂x (Δx/Δt)
    + (1/2) ∂²u(x,t)/∂x² ((Δx)²/Δt) + O((Δx)³/Δt).
We make the assumptions

  lim_{Δt,Δx→0} (p − q) Δx/Δt = c,
  lim_{Δt,Δx→0} (Δx)²/Δt = D,
  lim_{Δt,Δx→0} (Δx)³/Δt = 0.

Letting Δt and Δx approach zero, u(x, t) represents the pdf of the continuous
time and state process X(t) that satisfies the following partial differential equation:

  ∂u/∂t = −c ∂u/∂x + (D/2) ∂²u/∂x²,  x ∈ (−∞, ∞),

the diffusion equation with drift, where c = drift coefficient and D = diffusion
coefficient. The equation is also known as the forward Kolmogorov differential
equation.
When p = 1/2 = q, so that movement is unbiased or symmetric, the limiting
stochastic process is known as Brownian motion, where c = 0, so that

  ∂u/∂t = (D/2) ∂²u/∂x²,  x ∈ (−∞, ∞).

Standard Brownian motion with X(0) = 0 is also known as a Wiener process.
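The scaling assumptions can be checked numerically: a walk with step Δx = √(DΔt) and bias p − q = cΔt/Δx should be approximately N(ct, Dt) at time t. A minimal sketch (c = 0.5, D = 1, t = 1 are illustrative values):

```python
import random

rng = random.Random(4)
c, D, t_end = 0.5, 1.0, 1.0        # target drift and diffusion coefficients
dt = 1e-3
dx = (D * dt) ** 0.5               # so that (dx)^2 / dt = D
p = 0.5 + c * dt / (2 * dx)        # so that (p - q) dx/dt = c, with q = 1 - p
steps = int(t_end / dt)

def walk():
    x = 0.0
    for _ in range(steps):
        x += dx if rng.random() < p else -dx
    return x

xs = [walk() for _ in range(2000)]
mean = sum(xs) / len(xs)
var = sum((x - mean) ** 2 for x in xs) / len(xs)
print(mean, var)    # approach c*t = 0.5 and D*t = 1.0
```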
Diffusion Process
The assumptions on the limits in the random walk model were necessary to
obtain the diffusion equation with drift. These assumptions are very important in
the derivation of the Kolmogorov differential equations; they are related to the
infinitesimal mean and variance of the process.
Definition 3. Let {X(t)}, t ≥ t_0, be a Markov process with state space (−∞, ∞),
having continuous sample paths and transition pdf given by p(y, s; x, t), t < s.
Then {X(t)} is a diffusion process if its pdf satisfies the following three assumptions
for any ε > 0 and x ∈ (−∞, ∞):

(i) lim_{Δt→0+} (1/Δt) ∫_{|y−x|>ε} p(y, t + Δt; x, t) dy = 0.

(ii) lim_{Δt→0+} (1/Δt) ∫_{|y−x|≤ε} (y − x) p(y, t + Δt; x, t) dy = a(x, t).

(iii) lim_{Δt→0+} (1/Δt) ∫_{|y−x|≤ε} (y − x)² p(y, t + Δt; x, t) dy = b(x, t).

a(x, t) is the drift coefficient and b(x, t) is the diffusion coefficient.
Similar but slightly stronger conditions that lead to the conditions
above are expressed in terms of the expectation:

(i′) lim_{Δt→0+} (1/Δt) E(|ΔX(t)|^δ | X(t) = x) = 0,  δ > 2.

(ii′) lim_{Δt→0+} (1/Δt) E(ΔX(t) | X(t) = x) = a(x, t).

(iii′) lim_{Δt→0+} (1/Δt) E([ΔX(t)]² | X(t) = x) = b(x, t),

where ΔX(t) = X(t + Δt) − X(t) = y − x.
a(x, t) is the drift coefficient and b(x, t) is the diffusion coefficient.
The Forward and Backward Kolmogorov
Differential Equations Follow From these
Assumptions.
The backward Kolmogorov differential equation for a time-homogeneous process is

  ∂p(y, x, t)/∂t = a(x) ∂p(y, x, t)/∂x + (1/2) b(x) ∂²p(y, x, t)/∂x².

The forward Kolmogorov differential equation for a time-homogeneous process is

  ∂p(y, x, t)/∂t = −∂[a(y) p(y, x, t)]/∂y + (1/2) ∂²[b(y) p(y, x, t)]/∂y².

The pdf p(x, t) with p(x, 0) = δ(x − x_0) is a solution of the forward Kolmogorov
differential equation and therefore we can replace p(y, x, t) with p(x, t).
It follows from the forward Kolmogorov differential equation that

  ∂p(x, t)/∂t = −∂[a(x) p(x, t)]/∂x + (1/2) ∂²[b(x) p(x, t)]/∂x².
A Solution of a Stochastic Differential Equation
is a Sample Path of a Diffusion Process
If the pdf p(x, t) satisfies

  ∂p(x, t)/∂t = −∂[a(x, t) p(x, t)]/∂x + (1/2) ∂²[b(x, t) p(x, t)]/∂x²,

then a sample path of the process {X(t)} is a solution of the Ito SDE of the form

  dX(t) = a(X(t), t) dt + √(b(X(t), t)) dW(t),

where W(t) is a Wiener process. The differential equation is equivalent to the Ito
stochastic integral equation

  X(t) = X(0) + ∫_0^t a(X(τ), τ) dτ + ∫_0^t √(b(X(τ), τ)) dW(τ),

where the first integral is a Riemann integral and the second is an Ito stochastic
integral.
See EJ Allen (2007) and Kurtz (1970, 1971).
Wiener Process
Let the drift and diffusion coefficients be a(X(t), t) = 0 and b(X(t), t) = 1. Then
we obtain the diffusion equation:

  ∂p/∂t = (1/2) ∂²p/∂x²,  x ∈ (−∞, ∞).

For p(x, 0) = δ(x), the solution is the pdf of the Wiener process W(t):

  p(x, t) = (1/√(2πt)) exp(−x²/(2t)),  x ∈ (−∞, ∞),

which is also the pdf of a normal distribution with mean zero and variance t:

  W(t) ~ N(0, t).

The corresponding SDE is dX(t) = dW(t). Sample paths of the Wiener process
are continuous but nowhere differentiable.

(Figure: sample paths of the Wiener process W(t) for t ∈ [0, 1].)
Properties of Ito Stochastic Integral Equations.
Let f(t, X(t)) ≡ f(t) be a function of the continuous random variable X(t),
where t ∈ [a, b] and {X(t)}, t ≥ 0, is a stochastic process. Assume

  ∫_a^b E(f²(t)) dt < ∞.                                              (1)

Theorem 1. Assume f(t) and g(t) are random functions satisfying the inequality (1)
and α, a, b, and c are constants satisfying a < c < b. Then

(i) ∫_a^b α f(t) dW(t) = α ∫_a^b f(t) dW(t)

(ii) ∫_a^b (f(t) + g(t)) dW(t) = ∫_a^b f(t) dW(t) + ∫_a^b g(t) dW(t)

(iii) ∫_a^b f(t) dW(t) = ∫_a^c f(t) dW(t) + ∫_c^b f(t) dW(t)

(iv) E[ ∫_a^b f(t) dW(t) ] = 0 and

(v) E[ ( ∫_a^b f(t) dW(t) )² ] = ∫_a^b E(f²(t)) dt  (Ito isometry).
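Properties (iv) and (v) can be illustrated numerically with f(t) = W(t) on [0, 1]: the Ito isometry gives E[(∫_0^1 W dW)²] = ∫_0^1 E(W²(t)) dt = ∫_0^1 t dt = 1/2. A minimal Monte Carlo sketch using left-endpoint (Ito) sums:

```python
import random

rng = random.Random(5)
n = 1000
dt = 1.0 / n     # partition of [0, 1]

def ito_integral():
    """Left-endpoint (Ito) approximation of int_0^1 W(t) dW(t)."""
    w = 0.0
    s = 0.0
    for _ in range(n):
        dw = rng.gauss(0.0, dt ** 0.5)
        s += w * dw            # integrand evaluated at the left endpoint
        w += dw
    return s

vals = [ito_integral() for _ in range(4000)]
mean = sum(vals) / len(vals)
second = sum(v * v for v in vals) / len(vals)
print(mean, second)   # (iv): mean ~ 0;  (v): E[(.)^2] ~ int_0^1 t dt = 1/2
```

Evaluating the integrand at the left endpoint is essential: a midpoint rule would approximate the Stratonovich integral instead and shift the mean away from zero.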
Ito's Formula is like a Chain Rule.
Theorem 2 (Ito's Formula). Suppose X(t) is a solution of the following Ito SDE:

  dX(t) = μ(X(t), t) dt + σ(X(t), t) dW(t).

If F(x, t) is a real-valued function defined for x ∈ R and t ∈ [a, b], with continuous
partial derivatives ∂F/∂t, ∂F/∂x, and ∂²F/∂x², then

  dF(X(t), t) = f(X(t), t) dt + g(X(t), t) dW(t),

where

  f(x, t) = ∂F(x, t)/∂t + μ(x, t) ∂F(x, t)/∂x + (1/2) σ²(x, t) ∂²F(x, t)/∂x²
  g(x, t) = σ(x, t) ∂F(x, t)/∂x.

There is a multidimensional Ito's formula for multivariate processes.
For theory, properties, and examples of Ito SDEs, see EJ Allen (2007),
Modeling with Ito SDEs, and Lectures I, II, III.
Some Examples Applying Ito's Formula and the
Properties
(1) dX = −αX dt + σ dW, X(0) = X_0, referred to as the Langevin equation;
the solution is known as an Ornstein-Uhlenbeck process.
Solution: X(t) = X_0 e^{−αt} + σ e^{−αt} ∫_0^t e^{αs} dW(s)
  E(X) = X_0 e^{−αt},  Var(X) = (σ²/(2α)) (1 − e^{−2αt})

(2) dX = rX dt + σX dW, X(0) = X_0
Solution: X(t) = X_0 e^{(r−σ²/2)t + σW(t)}
  E(X) = X_0 e^{rt},  Var(X) = X_0² e^{2rt} (e^{σ²t} − 1)

(3) dX = αX dt + √(βX) dW, X(0) = X_0.
No explicit solution.
  E(X) = X_0 e^{αt},  Var(X) = (β/α) X_0 e^{αt} (e^{αt} − 1)
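Example (2) is geometric Brownian motion, whose explicit solution follows from applying Ito's formula to F(x) = ln x. The moment formulas can be checked with an Euler-Maruyama simulation (a sketch; r = 0.5, σ = 0.3, X_0 = 1 are illustrative values):

```python
import math
import random

rng = random.Random(6)
r, sigma, X0 = 0.5, 0.3, 1.0       # illustrative coefficients
T, n = 1.0, 500
dt = T / n

def em_path():
    """Euler-Maruyama for dX = r X dt + sigma X dW."""
    x = X0
    for _ in range(n):
        dw = rng.gauss(0.0, math.sqrt(dt))
        x += r * x * dt + sigma * x * dw
    return x

xs = [em_path() for _ in range(4000)]
mean = sum(xs) / len(xs)
var = sum((x - mean) ** 2 for x in xs) / len(xs)
exact_mean = X0 * math.exp(r * T)                                      # X0 e^{rT}
exact_var = X0**2 * math.exp(2 * r * T) * (math.exp(sigma**2 * T) - 1)
print(mean, exact_mean, var, exact_var)
```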
Comparison to the Birth and Death Process of
CTMC
In the birth and death process for the CTMC, we derived formulas for
the mean and variance of the process, where λ is the birth rate and μ
is the death rate.
If we write the SDE (3) as follows:

  (3) dX = (λ − μ)X dt + √((λ + μ)X) dW,  X(0) = X_0,

then the mean and variance are

  E(X) = X_0 e^{(λ−μ)t},  Var(X) = X_0 (λ + μ)/(λ − μ) e^{(λ−μ)t} (e^{(λ−μ)t} − 1),

which are the same mean and variance as for the CTMC.
Biological Applications of Ito SDEs.
(1) Population Genetics Process: Diusion Process
(2) Stochastic Logistic Growth Models: Compare
DTMC, CTMC and SDE.
(3) SIR Epidemic Model with an Imperfect Vaccine:
ODE and SDE
(1) Population Genetics Process.
We shall derive the Kolmogorov differential equations for the gene
frequencies of a population assuming random mating and no selection
or mutation, known as random drift. Assume that the population is
diploid; each individual has two copies of the chromosomes. Assume
that the gene is determined by a single locus with only two alleles, A
and a and three possible genotypes,
AA, Aa, and aa.
In addition, assume the total population size is N. The total number
of alleles equals 2N. Let Y (t) denote the number of A alleles in
the population in generation t and X(t) denote the proportion of
A alleles in the population, X(t) = Y (t)/(2N). Suppose individuals
mate randomly and generations are nonoverlapping. The number of
genes in generation t +1 is derived by sampling with replacement from
the genes in generation t. Given X(t) = x, then Y (t + 1) has a
binomial distribution, b(2N, x); that is,

  Y(t + 1) ~ b(2N, x).
Based on these assumptions, the drift and diffusion coefficients a(x) and b(x)
can be derived.
Let ΔY(t) = Y(t + 1) − Y(t) and ΔX(t) = X(t + 1) − X(t). Then given
X(t) = x, Y(t) = 2Nx, and applying the formula for the mean of a binomial
distribution:

  E(Y(t + 1) | X(t) = x) = 2Nx
  E(ΔY(t) | X(t) = x) = 2Nx − 2Nx = 0.

Then E([ΔY(t)]² | X(t) = x) equals E(Y²(t + 1) − 2Y(t + 1)Y(t) +
Y²(t) | X(t) = x), which can be simplified to

  E(Y²(t + 1) | X(t) = x) − 2(2Nx)(2Nx) + 4N²x²
    = E(Y²(t + 1) | X(t) = x) − 4N²x².

But this is just the variance of Y(t + 1), which, applying the formula for the variance
of the binomial distribution, gives

  E([ΔY(t)]² | X(t) = x) = Var(Y(t + 1) | X(t) = x) = 2Nx(1 − x).
Because X(t) = Y(t)/(2N),

  E(ΔX(t) | X(t) = x) = (1/(2N)) E(ΔY(t) | X(t) = x) = 0 = a(x)

  E([ΔX(t)]² | X(t) = x) = (1/(2N)²) E([ΔY(t)]² | X(t) = x)
                         = 2Nx(1 − x)/(2N)² = x(1 − x)/(2N) = b(x).

Hence, the pdf p(x, t) for the population genetics process satisfies

  ∂p/∂t = (1/(4N)) ∂²(x(1 − x)p)/∂x²,  0 < x < 1,                      (2)

where p(x, 0) = δ(x − x_0). Note that the forward Kolmogorov differential equation
is singular at the boundaries, x = 0 and x = 1. Both boundaries are exit boundaries.
At the states zero or one, there is fixation of either the allele a or A, respectively. The
solution p(x, t) was derived by Kimura (1955), a complicated expression depending
on the hypergeometric function. We shall examine the solution behavior of p(x, t)
through the corresponding Ito SDE for this process.
The Ito SDE for Random Genetic Drift.
The Ito SDE for random genetic drift has the form

  dX(t) = √( X(t)(1 − X(t))/(2N) ) dW(t),  X(t) ∈ [0, 1],

where X(0) = x_0, 0 < x_0 < 1. The boundaries 0 and 1 are absorbing [e.g.,
if X(t) = 0 (or 1), then X(t + τ) = 0 (or 1) for τ > 0]. The Euler-Maruyama
method is used to numerically solve this SDE. Ten thousand sample paths were used
to approximate the pdf at times t = 10, t = 50, and t = 200.
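A minimal Euler-Maruyama sketch of this SDE (fewer paths than the ten thousand above, and the time step and seed are illustrative). Since the process is a martingale, the sample mean should stay near X(0) = 1/2:

```python
import math
import random

rng = random.Random(7)
N, X0 = 100, 0.5
dt, steps = 0.1, 100          # simulate to t = 10

def drift_path():
    """Euler-Maruyama for dX = sqrt(X(1-X)/(2N)) dW, absorbing at 0 and 1."""
    x = X0
    for _ in range(steps):
        if x <= 0.0 or x >= 1.0:      # fixation: the path stays put
            break
        x += math.sqrt(x * (1.0 - x) / (2 * N)) * rng.gauss(0.0, math.sqrt(dt))
        x = min(max(x, 0.0), 1.0)     # clamp at the absorbing boundaries
    return x

xs = [drift_path() for _ in range(4000)]
mean = sum(xs) / len(xs)
print(mean)   # E(X(t)) = X(0) = 0.5: allele proportions constant in the mean
```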
Figure 1: X(0) = 1/2 and N = 100. Approximate probability histograms at
t = 10, 50, and 200.
The Mean Proportion of the Alleles is Constant.
It was shown by Kimura (1955, 1994), for large t, that

  p(x, t) ≈ C e^{−t/(2N)},  0 < x < 1.

The pdf is approximately constant and very small when 0 < x < 1 and t is large.
The probability of fixation at either x = 0 or x = 1 approaches one as t → ∞,
i.e., p(x, t) tends to infinity at x = 0 and x = 1 and to zero for 0 < x < 1 as
t approaches infinity. When X(0) = 1/2, fixation at 0 or 1 is equally likely; the
probability distribution is symmetric about x = 1/2. It follows from the properties
of the Wiener process that the mean equals the initial proportion:

  E(X(t)) = E[ X(0) + ∫_0^t √( X(s)(1 − X(s))/(2N) ) dW(s) ] = X(0).

In deterministic population genetics models, this is referred to as a Hardy-Weinberg
equilibrium. A Hardy-Weinberg equilibrium exists when there is random mating, no
mutation, and no selection; the proportion of alleles stays constant in the population.
(2) Stochastic Logistic Growth Models
Deterministic Model:

  dy/dt = b(y) − d(y),  y(0) = y_0,  0 < y_0 < N.

(i) b(0) = 0 = d(0) and b(y) = 0 for y ≥ N.
(ii) b(y) > 0 for y ∈ (0, N) and d(y) > 0 for y ∈ (0, N].
(iii) b(y) > d(y) for y ∈ (0, K).
(iv) b(y) < d(y) for y ∈ (K, N).

  lim_{t→∞} y(t) = K.

Logistic growth satisfies (i)-(iv): b(y) − d(y) = ry(1 − y/K).
Reference: Allen, LJS and EJ Allen. 2003. TPB.
DTMC
DTMC {Y(t)}, where Y(t) is a discrete random variable, Y(t) ∈
{0, 1, . . . , N}, t ∈ {0, Δt, 2Δt, . . .}.
Probability: p_y(t) = Prob{Y(t) = y}; probability vector: p(t) =
(p_0(t), . . . , p_N(t))^T.
Transition probabilities p_{yx}(Δt) = Prob{Y(t + Δt) = y | Y(t) = x}:

  p_{yx}(Δt) = { b(x)Δt,                x = y − 1
              { d(x)Δt,                x = y + 1
              { 1 − [b(x) + d(x)]Δt,   x = y
              { 0,                     otherwise.

Forward difference equations, where t + Δt = n + 1, t = n and
P = (p_{yx}(Δt)) is the transition matrix:

  p(n + 1) = P p(n),  p_{y_0}(0) = 1  [P^{n+1} = P P^n].
CTMC
CTMC {Y(t)}, where Y(t) is a discrete random variable, Y(t) ∈
{0, 1, . . . , N}, t ∈ [0, ∞).
Infinitesimal transition probabilities:

  p_{yx}(Δt) = { b(x)Δt + o(Δt),                x = y − 1
              { d(x)Δt + o(Δt),                x = y + 1
              { 1 − [b(x) + d(x)]Δt + o(Δt),   x = y
              { o(Δt),                         otherwise.

Forward Kolmogorov differential equations:

  dp/dt = Qp,  p_{y_0}(0) = 1.

The generator matrix Q satisfies

  Q = lim_{Δt→0} (P − I)/Δt.
SDE
Stochastic process {Y(t)}, where Y(t) is a continuous random variable,
Y(t) ∈ [0, N], t ∈ [0, ∞), with pdf p(y, t). Forward Kolmogorov
differential equation:

  ∂p(y, t)/∂t = −∂([b(y) − d(y)] p(y, t))/∂y + (1/2) ∂²([b(y) + d(y)] p(y, t))/∂y²,

p(y, 0) = δ(y − y_0), with an absorbing boundary at y = 0 and a reflecting
boundary at y = N.
SDE:

  dY(t) = [b(Y(t)) − d(Y(t))] dt + √( b(Y(t)) + d(Y(t)) ) dW(t),  Y(0) = y_0,

where E(ΔY) ≈ [b(Y) − d(Y)]Δt and E([ΔY]²) ≈ [b(Y) + d(Y)]Δt.
Mean Persistence Time
For these stochastic models we will derive equations to determine the mean
persistence time, the mean time until population extinction, and compare these
equations. We will use the backward Kolmogorov equations, rather than the forward
Kolmogorov equations, to derive them.
Let T_y be the continuous random variable for the time until population extinction,
given an initial population size of y. Let p_{T_y}(t) be the probability function or pdf
associated with T_y. Let

  τ_y = E(T_y),  τ^r_y = E(T^r_y)

denote the first and rth (r > 1) moments of the persistence time, and the vector moments

  τ = (τ_1, . . . , τ_N)^tr,  τ^r = (τ^r_1, . . . , τ^r_N)^tr,

where tr denotes transpose and τ^1 = τ.
We will derive equations satisfied by τ, the first vector moment, for the DTMC,
CTMC, and SDE models, and for the rth vector moment for the CTMC and SDE
models. Note that if the initial population size is zero,

  τ_0 = 0 = τ^r_0.
DTMC Mean Persistence Time
The mean persistence time satisfies

  τ_y = b(y)Δt(τ_{y+1} + Δt) + d(y)Δt(τ_{y−1} + Δt) + (1 − [b(y) + d(y)]Δt)(τ_y + Δt)

or

  d(y)τ_{y−1} − [b(y) + d(y)]τ_y + b(y)τ_{y+1} = −1,

for y = 1, . . . , N, where b(N) = 0 and τ_0 = 0. Expressed in matrix form, Dτ =
1, where 1 = (1, 1, . . . , 1)^tr and

  D = ( b(1)+d(1)   −b(1)        0      ···     0    )
      ( −d(2)       b(2)+d(2)   −b(2)   ···     0    )
      (   ···          ···        ···   ···    ···   )
      (   0             0          0   −d(N)   d(N)  ).

Since D is irreducibly diagonally dominant, D is nonsingular, and

  τ = D^{−1} 1.
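Since D is tridiagonal, the system τ = D^{−1}1 can be solved in O(N) operations. A pure-Python sketch using the Thomas algorithm, with the logistic rates of Example 1 below (b(y) = 2y − y²/50, d(y) = y + y²/50, N = 100), cross-checked against the closed-form expression for τ_1 given on a later slide:

```python
# logistic rates of Example 1: b(y) = 2y - y^2/50, d(y) = y + y^2/50
N = 100
b = lambda y: 0.0 if y >= N else 2.0 * y - y**2 / 50.0   # b(N) = 0
d = lambda y: 1.0 * y + y**2 / 50.0

# solve the tridiagonal system D tau = 1 (Thomas algorithm)
lower = [-d(y) for y in range(2, N + 1)]           # sub-diagonal
diag  = [b(y) + d(y) for y in range(1, N + 1)]     # main diagonal
upper = [-b(y) for y in range(1, N)]               # super-diagonal
rhs = [1.0] * N
cp, dp = [0.0] * N, [0.0] * N
cp[0], dp[0] = upper[0] / diag[0], rhs[0] / diag[0]
for i in range(1, N):
    denom = diag[i] - lower[i - 1] * cp[i - 1]
    cp[i] = upper[i] / denom if i < N - 1 else 0.0
    dp[i] = (rhs[i] - lower[i - 1] * dp[i - 1]) / denom
tau = [0.0] * N
tau[-1] = dp[-1]
for i in range(N - 2, -1, -1):
    tau[i] = dp[i] - cp[i] * tau[i + 1]

# cross-check tau_1 against the closed-form expression
cf = prod = 1.0 / d(1)
for i in range(2, N + 1):
    prod *= b(i - 1) / d(i)
    cf += prod
print(tau[0], cf)   # tau_1 from the linear solve vs. the closed form
```

Since D is irreducibly diagonally dominant, the Thomas sweep is stable and the two values agree to machine precision.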
CTMC Mean Persistence Time
Let

  R(y, t) = Prob{T_y > t}.

Then the pdf of T_y is p_{T_y}(t) = −dR(y, t)/dt. The function R(y, t) is sometimes
called the reliability function in engineering applications. R(y, t) satisfies the
backward Kolmogorov differential equation, dP/dt = PQ, that is,

  dR/dt = R^tr Q

or

  dR(y, t)/dt = b(y)R(y + 1, t) − (b(y) + d(y))R(y, t) + d(y)R(y − 1, t),  y = 1, . . . , N.

But the mean persistence time is

  τ_y = ∫_0^∞ t p_{T_y}(t) dt = −∫_0^∞ t (dR(y, t)/dt) dt.

Applying integration by parts and the condition lim_{t→∞} tR(y, t) = 0 gives

  τ_y = ∫_0^∞ R(y, t) dt.
Next, integrate the backward Kolmogorov differential equation from t = 0 to ∞
and use the conditions R(y, 0) = 1 and lim_{t→∞} R(y, t) = 0 to obtain a difference
equation for τ_y:

  −1 = b(y)τ_{y+1} − [b(y) + d(y)]τ_y + d(y)τ_{y−1}

for y = 1, . . . , N, where b(N) = 0. This is the same difference equation as for
the DTMC model. In matrix form Dτ = 1, where 1 = (1, 1, . . . , 1)^tr and

  D = ( b(1)+d(1)   −b(1)        0      ···     0    )
      ( −d(2)       b(2)+d(2)   −b(2)   ···     0    )
      (   ···          ···        ···   ···    ···   )
      (   0             0          0   −d(N)   d(N)  ).

Since D is irreducibly diagonally dominant, D is nonsingular, and

  τ = D^{−1} 1.
CTMC rth Moment of the Persistence Time
The rth moment of the persistence time, τ^r_y, can be expressed in terms of τ^{r−1}_y.
Similar to the mean, it can be shown the rth moment is

  τ^r_y = r ∫_0^∞ t^{r−1} R(y, t) dt.

Multiply the backward Kolmogorov equation by r t^{r−1}, integrate from t = 0 to ∞,
and apply integration by parts to obtain the following difference equation:

  −r τ^{r−1}_y = d(y)τ^r_{y−1} − [b(y) + d(y)]τ^r_y + b(y)τ^r_{y+1}.

Expressed in matrix form, Dτ^r = rτ^{r−1}. Thus, the solution for the rth moment is

  τ^r = r D^{−1} τ^{r−1}.
Because the matrix D is tridiagonal, closed form expressions exist
for τ_y and τ^r_y, where Y(0) = y:

  τ_1 = 1/d(1) + Σ_{i=2}^N [b(1) ··· b(i−1)] / [d(1) ··· d(i)]

  τ_y = τ_1 + Σ_{s=1}^{y−1} [ (d(1) ··· d(s))/(b(1) ··· b(s)) Σ_{i=s+1}^N (b(1) ··· b(i−1))/(d(1) ··· d(i)) ],
        y = 2, . . . , N.

  τ^r_1 = r [ τ^{r−1}_1/d(1) + Σ_{i=2}^N (b(1) ··· b(i−1)) τ^{r−1}_i / (d(1) ··· d(i)) ]

  τ^r_y = τ^r_1 + r Σ_{s=1}^{y−1} [ (d(1) ··· d(s))/(b(1) ··· b(s)) Σ_{i=s+1}^N (b(1) ··· b(i−1)) τ^{r−1}_i / (d(1) ··· d(i)) ],
        y = 2, . . . , N.

Note τ^1_y = τ_y.
SDE Mean Persistence Time
As in the case of the CTMC, the reliability function R(y, t) = Prob{T_y > t}
satisfies the backward Kolmogorov differential equation:

  ∂R/∂t = [b(y) − d(y)] ∂R/∂y + ([b(y) + d(y)]/2) ∂²R/∂y².

As shown for the CTMC:

  τ_y = ∫_0^∞ R(y, t) dt,  τ^r_y = r ∫_0^∞ t^{r−1} R(y, t) dt.

To obtain a differential equation for τ^r_y, multiply by r t^{r−1}, integrate the backward
Kolmogorov equation from t = 0 to ∞, and apply the boundary conditions τ^r_0 = 0
and dτ^r_y/dy |_{y=N} = 0:

  −1 = [b(y) − d(y)] dτ_y/dy + ([b(y) + d(y)]/2) d²τ_y/dy²

  −r τ^{r−1}_y = [b(y) − d(y)] dτ^r_y/dy + ([b(y) + d(y)]/2) d²τ^r_y/dy².
It is interesting to note that application of a central finite difference scheme

  dτ^r_y/dy ≈ (τ^r_{y+1} − τ^r_{y−1})/2,  d²τ^r_y/dy² ≈ τ^r_{y+1} − 2τ^r_y + τ^r_{y−1}

to the backward Kolmogorov equation and the differential equations for the moments
in the SDE model:

  KDE: ∂R/∂t = [b(y) − d(y)] ∂R/∂y + ([b(y) + d(y)]/2) ∂²R/∂y²
  −1 = [b(y) − d(y)] dτ_y/dy + ([b(y) + d(y)]/2) d²τ_y/dy²
  −r τ^{r−1}_y = [b(y) − d(y)] dτ^r_y/dy + ([b(y) + d(y)]/2) d²τ^r_y/dy²

leads to the backward Kolmogorov equations and the difference equations for the
moments in the CTMC:

  KDE: dR(y, t)/dt = b(y)R(y + 1, t) − (b(y) + d(y))R(y, t) + d(y)R(y − 1, t)
  −1 = d(y)τ_{y−1} − [b(y) + d(y)]τ_y + b(y)τ_{y+1}
  −r τ^{r−1}_y = d(y)τ^r_{y−1} − [b(y) + d(y)]τ^r_y + b(y)τ^r_{y+1}.
III. Numerical Examples.

Example 1: Stochastic birth and death rates:

$$b(y) = 2y - \frac{y^2}{50} \quad \text{and} \quad d(y) = y + \frac{y^2}{50}, \qquad N = 100.$$

Deterministic model:

$$\frac{dy}{dt} = y\left(1 - \frac{y}{25}\right),$$

where r = 1 and K = 25.
[Figure: three panels of population size versus time; panel (a) plots against time steps $\Delta t$.]

Figure 2: Three sample paths for each of the stochastic logistic models when $y_0 = 3$. (a) DTMC, (b) CTMC, (c) SDE. The smooth curve is the deterministic solution.
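Sample paths like those in panel (b) can be generated with the Gillespie algorithm. The Python sketch below is an illustration only (the programs at the end of these slides use MATLAB); it uses the Example 1 rates and $y_0 = 3$.

```python
import random

def b(y): return 2*y - y**2/50   # birth rate, Example 1
def d(y): return y + y**2/50     # death rate, Example 1

random.seed(1)
t, y, t_end = 0.0, 3, 10.0
path = [(t, y)]
while y > 0 and t < t_end:
    rate = b(y) + d(y)                 # total event rate
    t += random.expovariate(rate)      # exponential waiting time to next event
    if random.random() < b(y) / rate:
        y += 1                         # birth
    else:
        y -= 1                         # death
    path.append((t, y))
```

Because $b(100) = 0$, the population never exceeds $N = 100$, and the path is absorbed if it reaches 0.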
In a Second Example, We Compute the Persistence Time.

Example 2: Stochastic birth and death rates:

(a) $b(y) = 1.35y - \dfrac{y^2}{20}$, $d(y) = 0.35y + \dfrac{y^2}{20}$, $N = 27$;

(b) $b(y) = 2y - \dfrac{y^2}{20}$, $d(y) = y + \dfrac{y^2}{20}$, $N = 40$.

Deterministic model:

$$\frac{dy}{dt} = y\left(1 - \frac{y}{10}\right),$$

where r = 1 and K = 10.
The Mean and Standard Deviation Show Good Agreement among the Three Stochastic Models.

Mean

[Figure: mean persistence time versus initial population size for Example 2(a) and Example 2(b).]

Figure 3: DTMC and CTMC models (curve with dots) and SDE model.

Standard Deviation

[Figure: standard deviation of the persistence time versus initial population size for Example 2(a) and Example 2(b).]

Figure 4: CTMC model (curve with dots) and the SDE model.
Generalizations to Models with Environmental Variability.

Assumptions: The environment produces random changes in the per capita birth and per capita death rates of the population that are independent of the random variation in the births and deaths of the birth-death process.

Y(t) = random variable for the total population size at time t.
B(t) = random variable for the per capita birth rate at time t.
D(t) = random variable for the per capita death rate at time t.

In the CTMC and DTMC models:

Y(t) ∈ {0, 1, 2, ..., N}
B(t) ∈ {b_e, b_e ± Δ₁, b_e ± 2Δ₁, ...}
D(t) ∈ {d_e, d_e ± Δ₂, d_e ± 2Δ₂, ...}

Probability: p_{y,b,d}(t) = Prob{Y(t) = y, B(t) = b, D(t) = d}.

In the SDE model:

Y(t) ∈ [0, N], B(t) ∈ ℝ, D(t) ∈ ℝ

pdf: p(y, b, d, t)
The SDE Model with Environmental Variability is a System of Three Itô SDEs.

$$\frac{dY(t)}{dt} = \big(\tilde{b}(Y, B) - \tilde{d}(Y, D)\big) + \sqrt{\tilde{b}(Y, B) + \tilde{d}(Y, D)}\, \frac{dW_1}{dt}$$

$$\frac{dB(t)}{dt} = \alpha_1 (b_e - B) + \sigma_b \frac{dW_2}{dt}$$

$$\frac{dD(t)}{dt} = \alpha_2 (d_e - D) + \sigma_d \frac{dW_3}{dt}$$

Similar analyses of the persistence time can be carried out for the CTMC, DTMC, and SDE models with environmental variability in the births and deaths.

Reference: E Allen, L Allen, H Schurz (2005)
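A minimal Euler-Maruyama sketch of such a coupled system is below. It is illustrative only: the mean-reversion speeds, noise intensities, and the forms $\tilde{b}(Y, B) = \max(B, 0)\,Y$ and $\tilde{d}(Y, D) = \max(D, 0)\,Y$ are all assumptions for this sketch, not values from the 2005 reference.

```python
import numpy as np

rng = np.random.default_rng(0)
b_e, d_e = 2.0, 1.0       # environmental mean per capita rates (assumed)
a1, a2 = 5.0, 5.0         # mean-reversion speeds (assumed)
s_b, s_d = 0.5, 0.5       # environmental noise intensities (assumed)
N_max, dt, steps = 100.0, 1e-3, 5000

Y, B, D = 10.0, b_e, d_e
for _ in range(steps):
    bt = max(B, 0.0) * Y                  # btilde(Y, B), assumed form
    dtl = max(D, 0.0) * Y                 # dtilde(Y, D), assumed form
    dW = rng.normal(0.0, np.sqrt(dt), 3)  # three independent Wiener increments
    Y += (bt - dtl) * dt + np.sqrt(bt + dtl) * dW[0]
    B += a1 * (b_e - B) * dt + s_b * dW[1]
    D += a2 * (d_e - D) * dt + s_d * dW[2]
    Y = min(max(Y, 0.0), N_max)           # keep Y in [0, N]
```

The two rate processes are mean-reverting (Ornstein-Uhlenbeck) around the environmental means, while the demographic noise on Y scales with the total event rate.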
(3) SIR Epidemic Model with Vaccination.

Background:

Pertussis (whooping cough) remains a significant health threat, in particular to infants and young children. Following the introduction of immunization in the mid-1940s, pertussis incidence declined more than 99 percent by 1970 and to an all-time low of 1,010 cases by 1976 in the USA. However, since then, a relevant increase in disease incidence has been documented, with nearly 26,000 cases reported in the USA in 2004.

Proposed Reasons for the Observed Increase:

1. Increasing Notification Rate.
2. Reduction of Natural Boosting Due to High Vaccination Coverage.
3. Antigenic Shift of Circulating Pertussis Strain (Current Vaccine Not as Effective Against New Strain).
[Figure: Reported pertussis cases by year (in thousands), United States, 1922-2000, with the start of routine pertussis immunization marked in the mid-1940s. Source: CDC. Pertussis - United States, 1997-2000. MMWR 2002;51:73-76.]
[Figure: Reported cases of pertussis (in thousands), United States, 1980-2004 (2004 data preliminary). Labeled counts include 7,796; 6,586; 4,570; 11,647; 9,771; and 18,957 (preliminary). Sources: CDC. MMWR 1997;46(54):71-80; Murphy T., data on file, personal communication, 2001; MMWR 2000;50:1175; MMWR 2001;50(33):725; MMWR 2002;51:723; MMWR 2003;52:747; Bacterial Vaccine Preventable Disease Branch, National Immunization Program, 2004.]
A Compartmental Diagram of the SIR Epidemic Model with a Vaccinated Group V.

[Diagram: compartments S, I, R, and V. Births dN enter S; infection βSI/N moves S to I; recovery γI moves I to R; loss of immunity ηR moves R back to S; vaccination φS moves S to V; vaccine waning θV moves V back to S; infection of vaccinated individuals σβVI/N moves V to I; natural deaths dS, dI, dR, dV leave each compartment.]

When σ = 1 the vaccine is totally useless (no protection against infection).
When σ = 0 the vaccine is perfect (vaccination is 100% effective).
The Deterministic SIRV Epidemic Model.

$$\frac{dS}{dt} = d(N - S) - \frac{\beta SI}{N} - \phi S + \theta V + \eta R,$$

$$\frac{dI}{dt} = \frac{\beta SI}{N} + \frac{\sigma\beta VI}{N} - (d + \gamma)I,$$

$$\frac{dR}{dt} = \gamma I - (d + \eta)R,$$

$$\frac{dV}{dt} = \phi S - (d + \theta)V - \frac{\sigma\beta VI}{N},$$

where $S(t) + I(t) + R(t) + V(t) = N$ is the constant population size. We only consider the variables I, R, and V.

References: Arino, McCluskey, van den Driessche (2003); Allen, van den Driessche (2006)
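A simple Euler integration of this system can be sketched in Python (illustration only). Per-year rates follow Table 3 later in the slides; $\beta = 0.4 \times 365$ per year is an assumption carried over from Table 1, since β is not listed in Table 3.

```python
# Euler integration of the deterministic SIRV model; rates per year.
beta = 0.4 * 365          # assumed from Table 1 (0.4/day)
gamma, d = 365 / 21, 1 / 75
eta, phi, theta = 365 / 32, 365 / 20, 1 / 5
sigma, N = 0.10, 1000.0   # sigma in the bistable range

def step(I, R, V, dt):
    S = N - I - R - V
    dI = beta * S * I / N + sigma * beta * V * I / N - (d + gamma) * I
    dR = gamma * I - (d + eta) * R
    dV = phi * S - (d + theta) * V - sigma * beta * V * I / N
    return I + dI * dt, R + dR * dt, V + dV * dt

I, R, V = 5.0, 0.0, 0.0   # Case 1 initial condition from later in the slides
dt = 1e-4
for _ in range(int(5 / dt)):   # integrate over 5 years
    I, R, V = step(I, R, V, dt)
# (I, R, V) should end near the stable endemic equilibrium (193, 285, 447)
```

Starting from a small number of infectives with no prior vaccination, the solution settles near the endemic equilibrium used in the numerical comparisons below.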
Parameter Values Applicable to Pertussis That Result in a Backward Bifurcation are Given in the Following Table.

Parameter   Typical Value   Meaning
β*          0.4/day         Transmission coefficient
γ           1/(21 days)     Average infectious period 21 days
d           1/(75 years)    Average lifespan 75 years
η*          1/(31 days)     Average period of immunity 31 days
φ           0.05/day        Vaccination rate constant
θ           1/(5 years)     Average vaccine waning time 5 years

Table 1: Basic parameter values.

For σ ∈ [0.08866, 0.10884] the model exhibits a backward bifurcation, a region of bistability. In this region, the vaccine is 89.116% to 91.134% effective.

*Parameters chosen to emphasize the backward bifurcation effect.
The Bifurcation Diagram Illustrates the Backward Bifurcation and the Region of Bistability.

[Figure: number of infectives versus σ for σ ∈ [0.08, 0.12].]

Figure 5: Stable equilibrium (solid curve) and unstable equilibrium (dashed curve) for the infective state Ī in terms of the bifurcation parameter σ, N = 200. A saddle-node bifurcation occurs at σ_c = 0.0887 and a transcritical bifurcation at σ = 0.109.
The Vaccination Reproduction Number Less Than One is Not Sufficient to Ensure Disease Eradication.

The vaccination reproduction number is

$$\mathcal{R}_{vac} = \mathcal{R}_0\, \frac{d + \theta + \sigma\phi}{d + \theta + \phi},$$

where the basic reproduction number is

$$\mathcal{R}_0 = \frac{\beta}{d + \gamma}.$$

For the basic parameter values, $\mathcal{R}_0 = 8.4$:

$$\mathcal{R}_{vac} \begin{cases} < 1, & \sigma < 0.109, \\ > 1, & \sigma > 0.109. \end{cases}$$

To ensure disease eradication (global stability of the DFE) it is not sufficient that $\mathcal{R}_{vac} < 1$; it must also be the case that $\sigma < \sigma_c$.

The usual estimated proportion to vaccinate, $1 - 1/\mathcal{R}_0 \approx 1 - 1/8.4 \approx 0.88$, may be an underestimate if there is a backward bifurcation.
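These threshold numbers can be reproduced directly from the Table 1 values (a Python sketch; computations only, no new model content):

```python
# Reproduce R0 and the transcritical threshold sigma* where Rvac = 1,
# using the daily parameter values of Table 1.
beta, gamma = 0.4, 1 / 21        # per day
d = 1 / (75 * 365)               # per day
phi, theta = 0.05, 1 / (5 * 365)

R0 = beta / (d + gamma)

def Rvac(sigma):
    return R0 * (d + theta + sigma * phi) / (d + theta + phi)

# Solving Rvac(sigma) = 1 for sigma:
sigma_star = ((d + theta + phi) / R0 - d - theta) / phi
```

With these values, R0 is approximately 8.39 and sigma_star is approximately 0.1088, matching the slides' R0 = 8.4 and the transcritical bifurcation near σ = 0.109.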
Mathematical Results for the Deterministic SIRV Model.

Let $\mathcal{R}_c = \mathcal{R}_{vac}(\sigma_c)$.

Theorem 3. For the SIRV model with a backward bifurcation:

(i) If $\mathcal{R}_{vac} < \mathcal{R}_c$, there is no endemic equilibrium. All solutions approach the DFE.

(ii) If $\mathcal{R}_c < \mathcal{R}_{vac} < 1$, there are two distinct endemic equilibria. Depending on the initial values, the disease either dies out or approaches a constant endemic value.

(iii) If $\mathcal{R}_{vac} > 1$, there is a unique endemic equilibrium. All solutions approach the unique endemic equilibrium.
Based on the Changes that Occur in the Epidemic Model, Stochastic Models can be Easily Formulated.

Possible Change                  Probability
(ΔX)₁ = [1, 0, 0]^T              p₁ = (βX₀X₁/N)Δt
(ΔX)₂ = [-1, 0, 0]^T             p₂ = dX₁Δt
(ΔX)₃ = [-1, 1, 0]^T             p₃ = γX₁Δt
(ΔX)₄ = [0, -1, 0]^T             p₄ = (d + η)X₂Δt
(ΔX)₅ = [1, 0, -1]^T             p₅ = (σβX₁X₃/N)Δt
(ΔX)₆ = [0, 0, -1]^T             p₆ = (d + θ)X₃Δt
(ΔX)₇ = [0, 0, 1]^T              p₇ = φX₀Δt

Table 2: Possible changes in the process for the epidemic model when Δt is small. X₀ = S, X₁ = I, X₂ = R, X₃ = V.
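Table 2 translates directly into a DTMC update rule: draw one uniform number and select a change by cumulative probability. A Python sketch (illustration only; daily rates from Table 1, with σ = 0.10 assumed):

```python
import random

beta, gamma, d = 0.4, 1 / 21, 1 / (75 * 365)   # per day (Table 1)
eta, phi, theta = 1 / 31, 0.05, 1 / (5 * 365)
sigma, N = 0.10, 1000

# The seven possible changes of (X1, X2, X3) = (I, R, V) from Table 2.
changes = [(1, 0, 0), (-1, 0, 0), (-1, 1, 0), (0, -1, 0),
           (1, 0, -1), (0, 0, -1), (0, 0, 1)]

def probs(X1, X2, X3, dt):
    S = N - X1 - X2 - X3                        # X0 = S
    return [beta * S * X1 / N * dt, d * X1 * dt, gamma * X1 * dt,
            (d + eta) * X2 * dt, sigma * beta * X1 * X3 / N * dt,
            (d + theta) * X3 * dt, phi * S * dt]

def dtmc_step(state, dt):
    p = probs(*state, dt)
    u, cum = random.random(), 0.0
    for dx, pk in zip(changes, p):
        cum += pk
        if u < cum:
            return tuple(s + e for s, e in zip(state, dx))
    return state     # no event occurs in this time step
```

The step size Δt must be small enough that the seven probabilities sum to less than one.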
We Formulate Three Systems of SDEs that have the Same Forward Kolmogorov Differential Equation and the Same Sample Paths.

SDE (1):

$$d\vec{X}(t) = \vec{f}(t, \vec{X}(t))\, dt + B(t, \vec{X}(t))\, d\vec{W}^*(t), \qquad \vec{X}(0) = [X_1(0), X_2(0), X_3(0)]^T,$$

where $\vec{W}^*(t) = [W_1^*(t), W_2^*(t), W_3^*(t)]^T$ is a vector of three independent Wiener processes. The drift vector $\vec{f}$ has the form

$$\vec{f}(t, \vec{X}(t)) = \begin{pmatrix} \dfrac{\beta X_0 X_1}{N} + \dfrac{\sigma\beta X_1 X_3}{N} - (d + \gamma)X_1 \\[1.5ex] \gamma X_1 - (d + \eta)X_2 \\[1.5ex] \phi X_0 - (d + \theta)X_3 - \dfrac{\sigma\beta X_1 X_3}{N} \end{pmatrix}$$

and the $3 \times 3$ matrix $B = V^{1/2}$. Matrix $V$ is

$$V = \begin{pmatrix} \dfrac{\beta X_0 X_1}{N} + \dfrac{\sigma\beta X_1 X_3}{N} + (d + \gamma)X_1 & -\gamma X_1 & -\dfrac{\sigma\beta X_1 X_3}{N} \\[1.5ex] -\gamma X_1 & \gamma X_1 + (d + \eta)X_2 & 0 \\[1.5ex] -\dfrac{\sigma\beta X_1 X_3}{N} & 0 & \phi X_0 + (d + \theta)X_3 + \dfrac{\sigma\beta X_1 X_3}{N} \end{pmatrix}.$$
SDE (2):

$$d\vec{X}(t) = \vec{f}(t, \vec{X}(t))\, dt + G(t, \vec{X}(t))\, d\vec{W}(t), \qquad \vec{X}(0) = [X_1(0), X_2(0), X_3(0)]^T,$$

where $\vec{W}(t) = [W_1(t), W_2(t), \ldots, W_7(t)]^T$ is a vector of 7 independent Wiener processes and the diffusion matrix $G$, with $GG^T = V$, is the $3 \times 7$ matrix

$$G = \begin{pmatrix} \sqrt{\dfrac{\beta X_0 X_1}{N}} & -\sqrt{dX_1} & -\sqrt{\gamma X_1} & \sqrt{\dfrac{\sigma\beta X_1 X_3}{N}} & 0 & 0 & 0 \\[1.5ex] 0 & 0 & \sqrt{\gamma X_1} & 0 & -\sqrt{(d + \eta)X_2} & 0 & 0 \\[1.5ex] 0 & 0 & 0 & -\sqrt{\dfrac{\sigma\beta X_1 X_3}{N}} & 0 & -\sqrt{(d + \theta)X_3} & \sqrt{\phi X_0} \end{pmatrix}$$

Reference: E Allen, L Allen, A Arciniega, P Greenwood (2008).
SDE (3):

$$d\vec{X}(t) = \vec{f}(t, \vec{X}(t))\, dt + H(t, \vec{X}(t))\, d\vec{W}^+(t), \qquad \vec{X}(0) = [X_1(0), X_2(0), X_3(0)]^T,$$

where $\vec{W}^+(t) = [W_1^+(t), W_2^+(t), W_3^+(t), W_4^+(t), W_5^+(t)]^T$ is a vector of 5 independent Wiener processes and $H$ is the $3 \times 5$ matrix, $HH^T = V$:

$$H = \begin{pmatrix} \sqrt{\dfrac{\beta X_0 X_1}{N} + dX_1} & -\sqrt{\gamma X_1} & \sqrt{\dfrac{\sigma\beta X_1 X_3}{N}} & 0 & 0 \\[1.5ex] 0 & \sqrt{\gamma X_1} & 0 & -\sqrt{(d + \eta)X_2} & 0 \\[1.5ex] 0 & 0 & -\sqrt{\dfrac{\sigma\beta X_1 X_3}{N}} & 0 & \sqrt{(d + \theta)X_3 + \phi X_0} \end{pmatrix}.$$
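Both diffusion matrices can be checked numerically to be square roots of V, i.e. $GG^T = V = HH^T$. The Python check below is a sketch; the state values and daily parameters are arbitrary test inputs, not values from the slides' simulations.

```python
import numpy as np

beta, gamma, d = 0.4, 1 / 21, 1 / (75 * 365)
eta, phi, theta = 1 / 31, 0.05, 1 / (5 * 365)
sigma, N = 0.10, 1000.0
X0, X1, X2, X3 = 300.0, 190.0, 280.0, 230.0   # test state: S, I, R, V

inf_s = beta * X0 * X1 / N           # infection of susceptibles
inf_v = sigma * beta * X1 * X3 / N   # infection of vaccinated individuals
r = np.sqrt

V = np.array([
    [inf_s + inf_v + (d + gamma) * X1, -gamma * X1, -inf_v],
    [-gamma * X1, gamma * X1 + (d + eta) * X2, 0.0],
    [-inf_v, 0.0, phi * X0 + (d + theta) * X3 + inf_v]])

G = np.array([   # 3 x 7, one column per change in Table 2
    [r(inf_s), -r(d * X1), -r(gamma * X1), r(inf_v), 0, 0, 0],
    [0, 0, r(gamma * X1), 0, -r((d + eta) * X2), 0, 0],
    [0, 0, 0, -r(inf_v), 0, -r((d + theta) * X3), r(phi * X0)]])

H = np.array([   # 3 x 5, with some independent columns combined
    [r(inf_s + d * X1), -r(gamma * X1), r(inf_v), 0, 0],
    [0, r(gamma * X1), 0, -r((d + eta) * X2), 0],
    [0, 0, -r(inf_v), 0, r((d + theta) * X3 + phi * X0)]])
```

Since all three diffusion matrices produce the same V, the three SDE systems share the same forward Kolmogorov equation.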
The Expectation Agrees with the Deterministic Solution when I = 0.

Let $X_1 = I = 0$, $X_2 = R$, $X_3 = V$. Then

$$\frac{dR}{dt} = -(d + \eta)R + \sqrt{(d + \eta)R}\, \frac{dW_1}{dt}$$

$$\frac{dV}{dt} = \phi S - (d + \theta)V + \sqrt{\phi S + (d + \theta)V}\, \frac{dW_2}{dt},$$

where $S = N - R - V$. Taking expectations of the preceding system of Itô SDEs leads to the following system of ordinary differential equations:

$$\frac{dE(R)}{dt} = -(d + \eta)E(R)$$

$$\frac{dE(V)}{dt} = \phi E(S) - (d + \theta)E(V)$$
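The first expectation equation can be checked by Monte Carlo: simulate the scalar SDE for R (with I = 0) and compare the sample mean to the exponential-decay solution. A Python sketch follows (per-year rates as in Table 3; the time horizon and path count are arbitrary choices for this illustration):

```python
import numpy as np

rng = np.random.default_rng(42)
d, eta = 1 / 75, 365 / 32     # per year (Table 3)
mu = d + eta
R0, t_end, dt, npaths = 285.0, 0.1, 1e-3, 20000

# Euler-Maruyama for dR = -mu R dt + sqrt(mu R) dW over many paths.
R = np.full(npaths, R0)
for _ in range(int(round(t_end / dt))):
    dW = rng.normal(0.0, np.sqrt(dt), npaths)
    R = R - mu * R * dt + np.sqrt(mu * np.maximum(R, 0.0)) * dW

exact_mean = R0 * np.exp(-mu * t_end)   # solution of dE(R)/dt = -(d+eta)E(R)
```

The clipping inside the square root guards against small negative Euler excursions; with these values the paths stay well away from zero.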
Numerical Comparisons of the SDE Models (1), (2), (3).

The mean and standard deviation for each random variable and for each model are computed at t = 5 years using the Euler-Maruyama method with 10,000 sample paths. The parameter values are chosen in the region of bistability, but the initial conditions are such that almost all sample paths are close to the stable endemic equilibrium (Ī, R̄, V̄) = (193, 285, 447), N = 1000. X₁ = I, X₂ = R, X₃ = V.

Model   Variable   E(Xᵢ)   σ(Xᵢ)
(1)     X₁         188.7   24.0
        X₂         278.4   28.8
        X₃         459.0   46.1
(2)     X₁         188.5   24.4
        X₂         278.3   29.4
        X₃         459.7   47.3
(3)     X₁         189.2   23.5
        X₂         279.1   27.6
        X₃         457.7   44.0

Table 3: Mean and standard deviation for the three SDE epidemic models at t = 5 years. The units of the parameter values are per year: γ = 365/21, d = 1/75, η = 365/32, φ = 365/20, and θ = 1/5. Vaccine efficacy is σ = 0.10. Initial conditions are X₁(0) = 5, X₂(0) = 0 = X₃(0).
The Probability Distributions Associated with Each of the Random Variables are Approximately Normal at t = 5.

(Ī, R̄, V̄) = (193, 285, 447), N = 1000.

[Figure: histograms of (a) Prob{X₁(t) = i}, (b) Prob{X₂(t) = r}, and (c) Prob{X₃(t) = v}.]

Figure 6: (a) X₁ = I, (b) X₂ = R, and (c) X₃ = V, 10,000 sample paths.

The stochastic model, which includes the variability in the disease dynamics, can be useful in estimating the distribution of, and the range in, the number of cases under various scenarios.
Next, We Compare the Probability Distribution Associated with the Infective Population in the Region of Bistability.

After 5 years, the probability distribution for the infective population is bimodal, with the two modes corresponding to the two locally stable states: the disease-free equilibrium and the stable endemic equilibrium.

Case 1: N = 1000,

Prob{I(0) = 5} = 1, Prob{R(0) = 0} = 1, Prob{V(0) = 0} = 1.
[Figure: histograms of Prob{I(T) = i} after 5 years for four values of σ.]

Figure 7: Case 1: (a) σ = 0.09, (b) σ = 0.095, (c) σ = 0.10, (d) σ = 0.105.
Case 2: N = 1000,

Prob{I(0) = 5} = 1, Prob{R(0) = 0} = 1, Prob{V(0) = 500} = 1.
[Figure: histograms of Prob{I(T) = i} after 5 years for four values of σ.]

Figure 8: Case 2: (a) σ = 0.09, (b) σ = 0.095, (c) σ = 0.10, (d) σ = 0.105.
This Concludes Part IV on SDE.
Part I: Discrete-Time Markov Chains - DTMC
Theory
Applications to Random Walks, Populations, and Epidemics
Part II: Branching Processes
Theory
Applications to Cellular Processes, Network Theory, and Populations
Part III: Continuous-Time Markov Chains - CTMC
Theory
Applications to Populations and Epidemics
Part IV: Stochastic Differential Equations - SDE
Comparison to Other Stochastic Processes: DTMC and CTMC
Applications to Populations and Epidemics
Acknowledgement
Thank you to Professor Sze Bi Hsu and Professor Jing Yu for the invitation
to present lectures at the National Center for Theoretical Sciences at the National
Tsing Hua University.
Thank you to Professor Lih-Ing Roeger for the wonderful tour of Taiwan.
Thank you to all of the participants, faculty, and staff at the National Center for
Theoretical Sciences for your kind hospitality.
Matlab Programs to Simulate Three Sample Paths for the Stochastic SIS Epidemic Models: DTMC, CTMC and SDE

S = susceptible, I = infective, N = S + I, $\mathcal{R}_0 = \dfrac{\beta}{b + \gamma}$

$$\frac{dS}{dt} = -\frac{\beta SI}{N} + (b + \gamma)I, \qquad \frac{dI}{dt} = \frac{\beta SI}{N} - (b + \gamma)I$$
% Matlab Program # 1
% DTMC SIS Epidemic Model
% Three Sample Paths and the Deterministic Solution
clear
set(0,'DefaultAxesFontSize',18)
beta=1;
g=0.25;
b=0.25;
R0=beta/(b+g)
N=100;
init=2;
dt=0.01;
time=25;
sim=3;
for j=1:sim
  i(1)=init;
  for t=1:time/dt
    r=rand; % uniform random number
    birth=beta*i(t)*(N-i(t))/N*dt;
    death=(b+g)*i(t)*dt;
    if r<=birth
      i(t+1)=i(t)+1;
    elseif r>birth & r<=birth+death
      i(t+1)=i(t)-1;
    else
      i(t+1)=i(t);
    end
  end
  % Sample paths in different colors
  if j==1
    stairs([0:dt:time],i,'r-','LineWidth',2);
    hold on
  elseif j==2
    stairs([0:dt:time],i,'g-','LineWidth',2);
  else
    stairs([0:dt:time],i,'b-','LineWidth',2);
  end
end
% Euler's Method for the Deterministic SIS Model
y(1)=init;
for k=1:time/dt
  y(k+1)=y(k)+dt*(beta*(N-y(k))*y(k)/N-(b+g)*y(k));
end
plot([0:dt:time],y,'k--','LineWidth',2);
hold off
axis([0,25,0,80]);
xlabel('Time');
ylabel('Number of Infectives');
% Matlab Program # 2
% CTMC SIS Epidemic Model
% Three Sample Paths and the Deterministic Solution
clear
set(0,'DefaultAxesFontSize',18);
set(gca,'fontsize',18);
beta=1;
b=0.25;
g=0.25;
R0=beta/(b+g)
N=100;
init=2;
time=25;
sim=3;
for k=1:sim
  clear t s i
  t(1)=0;
  i(1)=init;
  j=1;
  while i(j)>0 & t(j)<time
    u1=rand; % uniform random number
    u2=rand; % uniform random number
    tot=(beta/N)*i(j)*(N-i(j))+(b+g)*i(j);
    birth=(beta*(N-i(j))/N)/(beta*(N-i(j))/N+b+g);
    t(j+1)=t(j)-log(u1)/tot; % exponential interevent time
    if u2 <= birth
      i(j+1)=i(j)+1;
    else
      i(j+1)=i(j)-1;
    end
    j=j+1;
  end
  if k==1
    stairs(t,i,'r-','LineWidth',2)
  elseif k==2
    stairs(t,i,'b-','LineWidth',2)
  else
    stairs(t,i,'g-','LineWidth',2)
  end
  hold on
end
% Euler's Method Applied to the Deterministic SIS Epidemic Model
dt=0.01;
y(1)=init;
for k=1:time/dt
  y(k+1)=y(k)+dt*(beta*(N-y(k))*y(k)/N-(b+g)*y(k));
end
plot([0:dt:time],y,'k--','LineWidth',2);
axis([0,time,0,80]);
xlabel('Time');
ylabel('Number of Infectives');
hold off
% Matlab Program # 3
% SDE SIS Epidemic Model
% Three Sample Paths and the Deterministic Solution
clear
beta=1;
b=0.25;
g=0.25;
R0=beta/(b+g)
N=100;
init=2;
dt=0.01;
time=25;
sim=3;
for k=1:sim
  clear i t
  j=1;
  i(j)=init;
  t(j)=dt;
  % Euler-Maruyama Method
  while i(j)>0 & t(j)<25
    mu=beta*i(j)*(N-i(j))/N-(b+g)*i(j);
    sigma=sqrt(beta*i(j)*(N-i(j))/N+(b+g)*i(j));
    rn=randn; % standard normal random number
    i(j+1)=i(j)+mu*dt+sigma*sqrt(dt)*rn;
    t(j+1)=t(j)+dt;
    j=j+1;
  end
  if k==1
    plot(t,i,'r-','LineWidth',2);
  elseif k==2
    plot(t,i,'b-','LineWidth',2);
  else
    plot(t,i,'g-','LineWidth',2);
  end
  hold on
end
% Euler's method applied to the deterministic SIS model.
y(1)=init;
for k=1:time/dt
  y(k+1)=y(k)+dt*(beta*(N-y(k))*y(k)/N-(b+g)*y(k));
end
plot([0:dt:time],y,'k--','LineWidth',2);
axis([0,time,0,80]);
xlabel('Time');
ylabel('Number of Infectives');
hold off
[Figure: output of the three programs; number of infectives versus time (0 to 25) for three sample paths and the deterministic solution. Panels: DTMC, CTMC, SDE.]