

When we use this recursive relationship, the solution procedure moves backward
stage by stage, each time finding the optimal policy for that stage, until it
finds the optimal policy starting at the initial stage.
This backward movement was demonstrated by the stagecoach problem, where
the optimal policy was found successively beginning in each state at stages 4, 3, 2,
and 1, respectively.† For all dynamic programming problems, a table such as the
following one would be obtained for each stage ($n = N, N - 1, \ldots, 1$).

$s_n$      $f_n(s_n, x_n)$ for each $x_n$      $f_n^*(s_n)$      $x_n^*$

When this table is finally obtained for the initial stage ($n = 1$), the problem of interest
is solved. Because the initial state is known, the initial decision is specified by $x_1^*$ in
this table. The optimal value of the other decision variables is then specified by the
other tables in turn according to the state of the system that results from the preceding
decisions.
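The forward read-off of the optimal decisions from the per-stage tables can be sketched in Python. This is only an illustration under stated assumptions: the function name `trace_policy`, the three-stage tables in `example_x_star`, and the transition rule `s - x` are hypothetical and are not taken from the stagecoach problem.

```python
def trace_policy(x_star, initial_state, transition):
    """Read the optimal decisions off the per-stage tables, stage 1 first.

    x_star[n][s] is the optimal decision at stage n in state s.
    transition(n, s, x) returns the state at stage n + 1.
    """
    policy = []
    state = initial_state
    for n in sorted(x_star):
        decision = x_star[n][state]      # look up the table for stage n
        policy.append(decision)
        state = transition(n, state, decision)
    return policy


# Hypothetical tables for a three-stage problem (illustration only).
example_x_star = {
    1: {3: 1},
    2: {0: 0, 1: 1, 2: 0, 3: 2},
    3: {0: 0, 1: 1, 2: 2, 3: 3},
}
example_policy = trace_policy(example_x_star, 3, lambda n, s, x: s - x)
print(example_policy)
```

Only the stage 1 table is consulted at the known initial state; each later table is consulted at whatever state the earlier decisions produce.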

11.3 Deterministic Dynamic Programming

This section further elaborates upon the dynamic programming approach to deterministic
problems, where the state at the next stage is completely determined by the state
and policy decision at the current stage. The probabilistic case, where there is a
probability distribution for what the next state will be, is discussed in the next section.
Deterministic dynamic programming can be described diagrammatically as
shown in Fig. 11.2. Thus at stage $n$ the process will be in some state $s_n$. Making
policy decision $x_n$ then moves the process to some state $s_{n+1}$ at stage ($n + 1$). The
contribution thereafter to the objective function under an optimal policy has been
previously calculated to be $f_{n+1}^*(s_{n+1})$. The policy decision $x_n$ also makes some
contribution to the objective function. Combining these two quantities in an appropriate
way provides $f_n(s_n, x_n)$, the contribution of stages $n$ onward to the objective
function. Optimizing with respect to $x_n$ then gives $f_n^*(s_n) = f_n(s_n, x_n^*)$. After finding
$x_n^*$ and $f_n^*(s_n)$ for each possible value of $s_n$, the solution procedure is ready to move
back one stage.
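The structure just described can be sketched as a generic backward pass in Python. This is a minimal sketch, not a definitive implementation: the function `backward_induction`, its parameter names, and the two-stage toy problem used to exercise it are all assumptions introduced for illustration. The `combine` argument stands for the "appropriate way" of combining the two quantities (addition by default).

```python
def backward_induction(num_stages, states, decisions, transition,
                       contribution, combine=lambda c, f: c + f,
                       maximize=False, terminal_value=0):
    """Solve a deterministic dynamic program by moving backward stage by stage.

    Returns (f_star, x_star), where f_star[n][s] is the optimal value of
    stages n onward from state s and x_star[n][s] is an optimal decision.
    """
    f_star, x_star = {}, {}
    for n in range(num_stages, 0, -1):          # stages N, N - 1, ..., 1
        f_star[n], x_star[n] = {}, {}
        for s in states(n):
            best_value, best_x = None, None
            for x in decisions(n, s):
                s_next = transition(n, s, x)
                # Value of the remaining stages under an optimal policy.
                if n == num_stages:
                    future = terminal_value
                else:
                    future = f_star[n + 1][s_next]
                value = combine(contribution(n, s, x), future)
                is_better = (best_value is None
                             or (value > best_value if maximize
                                 else value < best_value))
                if is_better:
                    best_value, best_x = value, x
            f_star[n][s] = best_value
            x_star[n][s] = best_x
    return f_star, x_star


# Hypothetical toy problem: 2 stages, up to 2 units of a resource;
# using x units at stage n contributes n * x; maximize the sum.
f_star, x_star = backward_induction(
    num_stages=2,
    states=lambda n: range(3),
    decisions=lambda n, s: range(s + 1),
    transition=lambda n, s, x: s - x,
    contribution=lambda n, s, x: n * x,
    maximize=True,
)
print(f_star[1][2], x_star[1][2])
```

Passing a different `combine` (for example, multiplication) or switching `maximize` changes the form of the objective without changing the loop.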

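Figure 11.2 The basic structure for deterministic dynamic programming. (Diagram: at
stage $n$, state $s_n$ with value $f_n(s_n, x_n)$ is linked by the contribution of $x_n$ to state
$s_{n+1}$ at stage $n + 1$ with value $f_{n+1}^*(s_{n+1})$.)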

† Actually, for this problem the solution procedure can move either backward or forward. However, for
many problems (especially when the stages correspond to time periods), the solution procedure must move
backward.

One way of categorizing deterministic dynamic programming problems is by
the form of the objective function. For example, the objective might be to minimize
the sum of the contributions from the individual stages (as for the stagecoach problem),
or to maximize such a sum, or to minimize a product of such terms, and so on.
Another categorization is in terms of the nature of the set of states for the respective
stages. In particular, the states $s_n$ might be representable by a discrete state variable
(as for the stagecoach problem), or by a continuous state variable, or perhaps a state
vector (more than one variable) is required.
Several examples are presented to illustrate these various possibilities. More
important, they illustrate that these apparently major differences are actually quite
inconsequential (except in terms of computational difficulty) because the underlying
basic structure shown in Fig. 11.2 always remains the same.
The first new example arises in a much different context from the stagecoach
problem, but it has the same mathematical formulation except that the objective is to
maximize rather than minimize a sum.

Example 2: Distributing Medical Teams to Countries

The WORLD HEALTH COUNCIL is devoted to improving health care in the underdeveloped
countries of the world. It now has five medical teams available to allocate
among three such countries to improve their medical care, health education, and
training programs. Therefore, the council needs to determine how many teams (if
any) to allocate to each of these countries to maximize the total effectiveness of the
five teams. The teams must be kept intact, so the number allocated to each country
must be an integer.
The measure of performance being used is additional person-years of life. (For
a particular country, this measure equals the country's increased life expectancy in
years times its population.) Table 11.1 gives the estimated additional person-years of
life (in multiples of 1,000) for each country for each possible allocation of medical
teams. Which allocation maximizes the measure of performance?

Formulation: This problem requires making three interrelated decisions, namely,
how many medical teams to allocate to each of the three countries. Therefore, even
though there is no fixed sequence, these three countries can be considered as the three
stages in a dynamic programming formulation. The decision variables $x_n$ ($n = 1, 2, 3$)
would be the number of teams to allocate to stage (country) $n$.

Table 11.1 Data for the World Health Council problem

                     Thousands of additional
                      person-years of life
Number of                   Country
medical teams         1        2        3
      0               0        0        0
      1              45       20       50
      2              70       45       70
      3              90       75       80
      4             105      110      100
      5             120      150      130

The identification of the states may not be readily apparent. To determine the
states, we ask questions such as the following. What is it that changes from one stage
to the next? Given that the decisions have been made at the previous stages, how can
the status of the situation at the current stage be described? What information about
the current state of affairs is necessary to determine the optimal policy hereafter? On
these bases, an appropriate choice for the "state of the system" is

$s_n$ = number of medical teams still available for allocation to the
remaining countries ($n, \ldots, 3$).

Thus, at stage 1 (country 1), where all three countries remain under consideration for
allocations, $s_1 = 5$. However, at stage 2 or 3 (country 2 or 3), $s_n$ is just 5 minus the
number of teams allocated at preceding stages. With the dynamic programming procedure
of solving backward stage by stage, when we are solving at stage 2 or 3, we
shall not yet have solved for the allocations at the preceding stages. Therefore, we
shall consider every possible state we could be in at stage 2 or 3, namely, $s_n = 0,
1, 2, 3, 4,$ or 5.
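A short Python sketch can confirm that every value from 0 to 5 really is a possible state at stages 2 and 3. The helper name `reachable_states` is an assumption made for this illustration; the only facts used are the five available teams and the rule that allocating $x_n$ teams leaves $s_n - x_n$ for the later stages.

```python
TOTAL_TEAMS = 5  # teams available at stage 1, from the problem statement


def reachable_states(stage, total=TOTAL_TEAMS):
    """States s_n that some sequence of earlier allocations can produce."""
    states = {total}
    for _ in range(stage - 1):
        # Allocating x teams (0 <= x <= s) leaves s - x for later stages.
        states = {s - x for s in states for x in range(s + 1)}
    return sorted(states)


for n in (1, 2, 3):
    print(n, reachable_states(n))
```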
Let $p_i(x_i)$ be the measure of performance from allocating $x_i$ medical teams to
country $i$, as given in Table 11.1. Thus the objective is to choose $x_1, x_2, x_3$ so as to

$$\text{Maximize } \sum_{i=1}^{3} p_i(x_i),$$

subject to

$$\sum_{i=1}^{3} x_i = 5,$$

and the $x_i$ are nonnegative integers.
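Because the problem is small, the formulation above can be checked by brute-force enumeration before any dynamic programming is applied. The sketch below is an illustration only; the names `P`, `total_performance`, and `feasible` are assumptions. The Country 1 entries for 2, 3, and 4 teams are not used in any calculation shown in this section, so treat those three values as assumed.

```python
from itertools import product

TOTAL_TEAMS = 5
# P[i][x]: thousands of additional person-years of life from giving x teams
# to country i. The Country 1 entries for x = 2, 3, 4 are assumed values.
P = {
    1: [0, 45, 70, 90, 105, 120],
    2: [0, 20, 45, 75, 110, 150],
    3: [0, 50, 70, 80, 100, 130],
}


def total_performance(allocation):
    """Sum of p_i(x_i) over the three countries for (x_1, x_2, x_3)."""
    return sum(P[i][x] for i, x in zip((1, 2, 3), allocation))


# Every (x_1, x_2, x_3) of nonnegative integers that sums to 5.
feasible = [a for a in product(range(TOTAL_TEAMS + 1), repeat=3)
            if sum(a) == TOTAL_TEAMS]
best_allocation = max(feasible, key=total_performance)
print(len(feasible), best_allocation, total_performance(best_allocation))
```

Enumeration is practical here only because there are few feasible allocations; the dynamic programming formulation avoids this growth as the number of teams or countries increases.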

Using the notation presented in Sec. 11.2, $f_n(s_n, x_n)$ is then

$$f_n(s_n, x_n) = p_n(x_n) + \max \sum_{i=n+1}^{3} p_i(x_i),$$

where the maximum is taken over $x_{n+1}, \ldots, x_3$ such that

$$\sum_{i=n}^{3} x_i = s_n$$

and the $x_i$ are nonnegative integers, for $n = 1, 2, 3$. In addition,

$$f_n^*(s_n) = \max_{x_n = 0, 1, \ldots, s_n} f_n(s_n, x_n).$$

Therefore,

$$f_n(s_n, x_n) = p_n(x_n) + f_{n+1}^*(s_n - x_n)$$

(with $f_4^*$ defined to be zero). These basic relationships are summarized in Fig. 11.3.
Consequently, the recursive relationship relating the functions $f_1^*$, $f_2^*$, and $f_3^*$
for this problem is

$$f_n^*(s_n) = \max_{x_n = 0, 1, \ldots, s_n} \{p_n(x_n) + f_{n+1}^*(s_n - x_n)\}, \quad \text{for } n = 1, 2.$$

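Figure 11.3 The basic structure for the World Health Council problem. (Diagram: at
stage $n$, state $s_n$ with value $f_n(s_n, x_n) = p_n(x_n) + f_{n+1}^*(s_n - x_n)$ is linked by the
contribution $p_n(x_n)$ to state $s_n - x_n$ at stage $n + 1$ with value $f_{n+1}^*(s_n - x_n)$.)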

For the last stage ($n = 3$),

$$f_3^*(s_3) = \max_{x_3 = 0, 1, \ldots, s_3} p_3(x_3).$$
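The recursive relationship translates almost directly into a memoized Python function. This is a sketch under stated assumptions rather than the book's procedure: the names `f_star` and `P` are illustrative, the recursion runs top-down with caching instead of filling tables stage by stage, and the Country 1 entries for 2, 3, and 4 teams are assumed values.

```python
from functools import lru_cache

LAST_STAGE = 3
# P[n][x] = p_n(x) in thousands of additional person-years of life.
# The Country 1 entries for x = 2, 3, 4 are assumed values.
P = {
    1: [0, 45, 70, 90, 105, 120],
    2: [0, 20, 45, 75, 110, 150],
    3: [0, 50, 70, 80, 100, 130],
}


@lru_cache(maxsize=None)
def f_star(n, s):
    """Optimal value f_n^*(s) of stages n..3 with s teams still available."""
    if n > LAST_STAGE:
        return 0  # f_4^* is defined to be zero
    return max(P[n][x] + f_star(n + 1, s - x) for x in range(s + 1))


print(f_star(3, 2), f_star(2, 4), f_star(1, 5))
```

For $n = 3$ the recursion reduces to the maximum of $p_3(x_3)$ over $x_3 = 0, 1, \ldots, s_3$, matching the last-stage formula.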

The resulting dynamic programming calculations are given next.

Solution Procedure: Beginning with the last stage ($n = 3$), we note that the
values of $p_3(x_3)$ are given in the last column of Table 11.1, and that these values keep
increasing as we move down the column. Therefore, with $s_3$ medical teams still
available for allocation to country 3, the maximum of $p_3(x_3)$ is automatically achieved
by allocating all $s_3$ teams, so $x_3^* = s_3$ and $f_3^*(s_3) = p_3(s_3)$, as shown in the following
table.

n = 3:     $s_3$     $f_3^*(s_3)$     $x_3^*$
             0            0             0
             1           50             1
             2           70             2
             3           80             3
             4          100             4
             5          130             5
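The stage 3 table can be reproduced with a few lines of Python. The sketch below is illustrative: the names `P3` and `stage3` are assumptions, and the code compares every feasible $x_3$ rather than relying on the observation that $p_3$ is increasing, so it also serves as a check on that shortcut.

```python
P3 = [0, 50, 70, 80, 100, 130]  # p_3(x_3) for x_3 = 0..5, in thousands

stage3 = {}
for s3 in range(6):
    # Compare every feasible x_3 instead of assuming the maximum is at s_3.
    best_x3 = max(range(s3 + 1), key=lambda x3: P3[x3])
    stage3[s3] = (P3[best_x3], best_x3)   # (f_3^*(s_3), x_3^*)

for s3, (value, x3) in stage3.items():
    print(s3, value, x3)
```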

We now move backward to start from the next-to-last stage ($n = 2$). Here,
finding $x_2^*$ requires calculating and comparing $f_2(s_2, x_2)$ for the alternative values of
$x_2$, namely, $x_2 = 0, 1, \ldots, s_2$. To illustrate, we depict this situation when $s_2 = 2$.

This diagram corresponds to Fig. 11.3 except that all three possible states at stage
3 are shown. Thus, if $x_2 = 0$, the resulting state at stage 3 will be $s_2 - x_2 = 2 -
0 = 2$, whereas $x_2 = 1$ leads to state 1 and $x_2 = 2$ leads to state 0. The corresponding
values of $p_2(x_2)$ from the country 2 column of Table 11.1 are shown along the links,
and the values of $f_3^*(s_2 - x_2)$ from the $n = 3$ table are given next to the stage 3
nodes. The required calculations for this case of $s_2 = 2$ are summarized below.

$x_2 = 0$: $f_2(2, 0) = p_2(0) + f_3^*(2) = 0 + 70 = 70$.
$x_2 = 1$: $f_2(2, 1) = p_2(1) + f_3^*(1) = 20 + 50 = 70$.
$x_2 = 2$: $f_2(2, 2) = p_2(2) + f_3^*(0) = 45 + 0 = 45$.

Because the objective is maximization, $x_2^* = 0$ or 1 with $f_2^*(2) = 70$.
Proceeding in a similar way with the other possible values of $s_2$ (try it) yields
the following table.

                $f_2(s_2, x_2) = p_2(x_2) + f_3^*(s_2 - x_2)$

n = 2:   $s_2$    $x_2$ = 0     1     2     3     4     5    $f_2^*(s_2)$    $x_2^*$
           0            0                                       0           0
           1           50    20                                50           0
           2           70    70    45                          70         0 or 1
           3           80    90    95    75                    95           2
           4          100   100   115   125   110             125           3
           5          130   120   125   145   160   150       160           4
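For readers who want to check the $n = 2$ table, the rows can be regenerated in Python from the country 2 column of Table 11.1 and the $n = 3$ results. The names `P2`, `F3_STAR`, and `stage2` are assumptions introduced for this sketch; ties are reported as a list so that the "0 or 1" entry is visible.

```python
P2 = [0, 20, 45, 75, 110, 150]       # p_2(x_2), country 2 column of Table 11.1
F3_STAR = [0, 50, 70, 80, 100, 130]  # f_3^*(s_3) from the n = 3 table

stage2 = {}
for s2 in range(6):
    # f_2(s_2, x_2) = p_2(x_2) + f_3^*(s_2 - x_2) for x_2 = 0, ..., s_2
    row = [P2[x2] + F3_STAR[s2 - x2] for x2 in range(s2 + 1)]
    best = max(row)
    argmax = [x2 for x2, value in enumerate(row) if value == best]
    stage2[s2] = (row, best, argmax)
    print(s2, row, best, argmax)
```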

We now are ready to move backward to solve the original problem where we
are starting from stage 1 ($n = 1$). In this case, the only state to be considered is the
starting state of $s_1 = 5$, as depicted below.

Since allocating $x_1$ medical teams to country 1 leads to a state of ($5 - x_1$) at stage
2, a choice of $x_1 = 0$ leads to the bottom node on the right, $x_1 = 1$ leads to the next
node up, and so forth up to the top node with $x_1 = 5$. The corresponding $p_1(x_1)$ values
from Table 11.1 are shown next to the links. The numbers next to the nodes are
obtained from the $f_2^*(s_2)$ column of the $n = 2$ table. As with $n = 2$, the calculation
needed for each alternative value of the decision variable involves adding the corresponding
link value and node value, as summarized below.

$x_1 = 0$: $f_1(5, 0) = p_1(0) + f_2^*(5) = 0 + 160 = 160$.
$x_1 = 1$: $f_1(5, 1) = p_1(1) + f_2^*(4) = 45 + 125 = 170$.
$\vdots$
$x_1 = 5$: $f_1(5, 5) = p_1(5) + f_2^*(0) = 120 + 0 = 120$.
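The stage 1 calculations follow the same link-plus-node pattern and can be sketched in Python as follows. The names `P1`, `F2_STAR`, and `stage1` are assumptions for this illustration. The values for $x_1 = 0$, 1, and 5 match the calculations written out above; the Country 1 entries for 2, 3, and 4 teams are assumed, so the corresponding results are only as reliable as those entries.

```python
P1 = [0, 45, 70, 90, 105, 120]       # p_1(x_1); entries for x_1 = 2, 3, 4 are assumed
F2_STAR = [0, 50, 70, 95, 125, 160]  # f_2^*(s_2) from the n = 2 table
S1 = 5                               # the only state to consider at stage 1

# f_1(5, x_1) = p_1(x_1) + f_2^*(5 - x_1): link value plus node value.
stage1 = {x1: P1[x1] + F2_STAR[S1 - x1] for x1 in range(S1 + 1)}
for x1, value in stage1.items():
    print(x1, value)
```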