
Optimal Control Theory

In this chapter we take up the problem of optimization over time. Such problems are common in economics. For example, in the theory of investment, firms are assumed to choose the time path of investment expenditures to maximize the (discounted) sum of profits over time. In the theory of savings, individuals are assumed to choose the time path of consumption and saving that maximizes the (discounted) sum of lifetime utility. These are examples of dynamic optimization problems. In this chapter, we study a new technique, optimal control theory, which is used to solve dynamic optimization problems.

It is fundamental in economics to assume optimizing behavior by economic agents such as firms or consumers. Techniques for solving static optimization problems have already been covered in chapters 6, 12, and 13. Why do we need to learn a new mathematical theory (optimal control theory) for handling dynamic optimization problems? To demonstrate the need, we consider the following economic example.

Static versus Dynamic Optimization: An Investment Example

Suppose that a firm's output depends only on the amount of capital it employs. Let

Q = q(K)
where Q is the firm's output level, q is the production function, and K is the amount of capital employed. Assume that there is a well-functioning rental market for the kind of capital the firm uses and that the firm is able to rent as much capital as it wants at the price R per unit, which it takes as given. To make this example more concrete, imagine that the firm is a fishing company that rents fully equipped units of fishing capital on a daily basis. (A unit of fishing capital would include boat, nets, fuel, crew, etc.) Q is the number of fish caught per day and K is the number of units of fishing capital employed per day. If p is the price of fish, then current profit depends on the amount of fish caught, which in turn depends on the amount of K used, and is given by the function π(K):

π(K) = pq(K) − RK


If the firm's objective is to choose K to maximize current profit, the optimal amount of K is given implicitly by the usual first-order condition:

π′(K) = pq′(K) − R = 0

But why should the firm care only about current profit? Why would it not take a longer-term view and also care about future profits? A more realistic assumption is that the firm's objective is to maximize the discounted sum of profits over an interval of time running from the present time (t = 0) to a given time horizon, T. This is given by the functional J[K(t)]:

max J[K(t)] = ∫₀ᵀ e^{−ρt} π[K(t)] dt

where ρ is the firm's discount rate and e^{−ρt} is the continuous-time discounting factor. J[K(t)] is called a functional to distinguish it from a function. A function maps a single value for a variable like K (or a finite number of values if K is a vector of different types of capital) into a single number, like the amount of current profit. A functional maps a function like K(t), or a finite number of functions if there is more than one type of capital, into a single number, like the discounted sum of profits.

It appears we now have a dynamic optimization problem. The difference between this and the static optimization problem is that we now have to choose a path of K values; in other words, we have to choose a function of time, K(t), to maximize J, rather than having to choose a single value for K to maximize π(K). This is the main reason that we require a new mathematical theory. Calculus helps us find the value of K that maximizes a function π(K) because we can differentiate π(K) with respect to K to find the maximum. However, calculus is not, in general, suited to helping us find the function of time K(t) that maximizes the functional J[K(t)] because we cannot differentiate a functional J[K(t)] with respect to a function K(t).

It turns out, however, that we do not have a truly dynamic optimization problem in this example. As a result, calculus works well in solving this particular problem. The reason is that the amount of K rented in any period t affects only profits in that period and not in any other period. Thus it is fairly obvious that the maximum of the discounted sum of profits occurs by maximizing profits at each point in time. As a result, this dynamic problem is really just a sequence of static optimization problems, and the solution is just a sequence of solutions to those static problems. Indeed, this is the justification for spending as much time as we do in economics on static optimization problems.

An optimization problem becomes truly dynamic only when the economic choices made in the current period affect not only current payoffs (profits) but also payoffs (profits) at a later date. The intuition is straightforward: if current output affects only current profit, then in choosing current output, we need only be concerned with its effect on current profit. Hence we choose current output to maximize current profit. But if current output affects current profit and profit at a later date, then in choosing current output, we need to be concerned about its effect on current and future profit. This is a dynamic problem.

To turn our fishing firm example into a truly dynamic optimization problem, let us drop the assumption that a rental market for fishing capital exists. Instead, we suppose that the firm must purchase its own capital. Once purchased, the capital lasts for a long time. Let I(t) be the amount of capital purchased (investment) at time t, and assume that capital depreciates at the rate δ. The amount (stock) of capital owned by the firm at time t is K(t) and changes according to the differential equation

K̇(t) = I(t) − δK(t)


which says that, at each point in time, the firm's capital stock increases by the amount of investment and decreases by the amount of depreciation. Let c[I(t)] be a function that gives the cost of purchasing (investing) the amount I(t) of capital at time t; then profit at time t is

π[K(t), I(t)] = pq[K(t)] − c[I(t)]

The problem facing the fishing firm at each point in time is to decide how much capital to purchase. This is a truly dynamic problem because current investment affects current profit, since it is a current expense, and also affects future profits, since it affects the amount of capital available for future production. If the firm's objective is to maximize the discounted sum of profits from zero to T, it maximizes


J[I(t)] = ∫₀ᵀ e^{−ρt} π[K(t), I(t)] dt

subject to

K̇ = I(t) − δK
K(0) = K₀ (given)

Once a path for I(t) is chosen, the path of K(t) is completely determined because the initial condition for the capital stock is given at K₀. Thus the functional J depends on the particular path chosen for I(t). There is an infinite number of paths, I(t), from which to choose. A few examples of feasible paths are as follows:

(i) I(t) = δK₀. This is a constant amount of investment, just enough to cover depreciation so that the capital stock remains intact at its initial level.


(ii) I(t) = 0. This is the path of no investment.

(iii) I(t) = Ae^{at}. This is a path of investment that starts with I(0) = A and then increases over time at the rate a, if a > 0, or decreases at the rate a, if a < 0.

These are just a few arbitrary paths that we mention for illustration. In fact, any function of t is a feasible path. The problem is to choose the path that maximizes J[I(t)]. Since we know absolutely nothing about what this function of time might look like, choosing the right path would seem to be a formidable task.

It turns out that in the special case in which T = ∞ and the function π[K(t), I(t)] takes the quadratic form

π[K(t), I(t)] = K − aK² − I²

the solution to the above problem is

I*(t) = (r₁ + δ)(K₀ − K̄)e^{r₁t} + δK̄

where r₁ is the negative root of the characteristic equation of the differential equation system that, as we shall see, results from solving this dynamic optimization problem, and K̄ is the steady-state level of the capital stock that the firm desires, given by

K̄ = 1 / (2[δ(ρ + δ) + a])

Figure 25.1 Optimal path of investment over time

Figure 25.1 displays the optimal path of investment for the case in which K₀ < K̄. Along the optimal path, investment declines. In the limit as t → ∞, investment converges to a constant amount equal to δK̄ (since r₁ < 0), so that in the long run the firm's investment is just replacement of depreciation. How did we find this path? We found it using optimal control theory, which is the topic we turn to now.
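To make the idea of a functional concrete, the sketch below numerically evaluates J[I(t)] for feasible paths (i) and (ii) and for the optimal path just displayed. This is a minimal illustration: the parameter values a, δ, ρ, and K₀ are our own assumptions, a long finite horizon stands in for T = ∞, and the negative root r₁ is computed from the characteristic equation of the discounted system (consistent with the K̄ formula above).

```python
import numpy as np

# Illustrative parameter values; these are assumptions, not values from the text
a, delta, rho, K0 = 0.1, 0.05, 0.03, 2.0
T, dt = 200.0, 0.01                  # a long horizon approximates T = infinity
t = np.arange(0.0, T, dt)

# Steady-state capital stock from the text; r1 is the negative characteristic root
K_bar = 1.0 / (2.0 * (delta * (rho + delta) + a))
r1 = (rho - np.sqrt(rho**2 + 4.0 * (delta * (rho + delta) + a))) / 2.0

def J(I_path):
    """Discounted sum of profits for a given investment path."""
    K = np.empty_like(t)
    K[0] = K0
    for i in range(len(t) - 1):      # Euler step for K' = I - delta*K
        K[i + 1] = K[i] + dt * (I_path[i] - delta * K[i])
    profit = K - a * K**2 - I_path**2
    return np.sum(np.exp(-rho * t) * profit) * dt

I_replacement = np.full_like(t, delta * K0)   # feasible path (i)
I_zero = np.zeros_like(t)                     # feasible path (ii)
I_optimal = (r1 + delta) * (K0 - K_bar) * np.exp(r1 * t) + delta * K_bar

for name, path in [("replacement", I_replacement),
                   ("zero", I_zero),
                   ("optimal", I_optimal)]:
    print(name, J(path))             # the optimal path gives the largest J
```

Running this confirms that each candidate path maps to a single number, and that the displayed optimal path dominates the two arbitrary feasible paths.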

25.1 The Maximum Principle


Optimal control theory relies heavily on the maximum principle, which amounts to a set of necessary conditions that hold only on optimal paths. Once you know how to apply these necessary conditions, then a knowledge of basic calculus and differential equations is all that is required to solve dynamic optimization problems like the one outlined above. In this section we provide a statement of the necessary conditions of the maximum principle and then provide a justification. In addition, we provide examples to illustrate the use of the maximum principle.


We begin with a definition of the general form of the dynamic optimization problem that we shall study in this section.

Definition 25.1

The general form of the dynamic optimization problem with a finite time horizon and a free endpoint in continuous-time models is

max J = ∫₀ᵀ f[x(t), y(t), t] dt          (25.1)

subject to

ẋ = g[x(t), y(t), t]
x(0) = x₀ > 0 (given)

The term free endpoint means that x(T) is unrestricted, and hence is free to be chosen optimally. The significance of this is explored in more detail below. In this general formulation, J is the value of the functional which is to be maximized, x(t) is referred to as the state variable, and y(t) is referred to as the control variable. As the name suggests, the control variable is the one directly chosen or controlled. Since the control and state variables are linked by a differential equation that is given, the state variable is indirectly influenced by the choice of the control variable.

In the fishing firm example posed above, the state variable is the amount of capital held by the firm; the control variable is investment. The example was a free-endpoint problem because there was no constraint placed on the final amount of the capital stock. As well, the integrand function, f[x(t), y(t), t], was equal to π[K(t), I(t)]e^{−ρt}, and the differential equation for the state variable, g[x(t), y(t), t], was simply equal to I(t) − δK(t).

We will examine a number of important variations of this general specification in later sections. In section 25.3 we examine the fixed-endpoint version of this problem. This means that x(T), the final value of the state variable, is specified as an equality constraint to be satisfied. In section 25.4 we consider the case in which T is infinity. Finally, in section 25.6 we consider the case in which the time horizon, T, is also a free variable to be chosen optimally.

Suppose that a unique solution to the dynamic optimization problem in definition 25.1 exists. The solution is a path for the control variable, y(t). Once this is specified, the path for the state variable is automatically determined through the differential equation for the state variable, combined with its given initial condition. We assume that the control variable is a continuous function of time (we relax this assumption in section 25.5), as is the state variable. The necessary conditions that constitute the maximum principle are stated in terms of a Hamiltonian function, which is akin to the Lagrangean function used to solve constrained optimization problems. We begin by defining this function:

Definition 25.2

The Hamiltonian function, H, for the dynamic optimization problem in definition 25.1 is

H[x(t), y(t), λ(t), t] = f[x(t), y(t), t] + λ(t)g[x(t), y(t), t]

where λ(t), referred to as the costate variable, is akin to the Lagrange multiplier in constrained optimization problems.

Forming the Hamiltonian function is straightforward: take the integrand (the function under the integral sign), and add to it the right-hand side of the equation for ẋ multiplied by an, as yet, unspecified function of time, λ(t). We can now state the necessary conditions.

Theorem 25.1

The optimal solution path for the control variable, y(t), for the dynamic optimization problem in definition 25.1 must satisfy the following necessary conditions:

(i) The control variable is chosen to maximize H at each point in time: y(t) maximizes H[x(t), y(t), λ(t), t]. That is,

∂H/∂y = 0

(ii) The paths of x(t) and λ(t) (state and costate variables) are given by the solution to the following system of differential equations:

λ̇ = −∂H/∂x
ẋ = g[x(t), y(t), t]

(iii) The two boundary conditions used to solve the system of differential equations are given by

x(0) = x₀,    λ(T) = 0

In writing the first necessary condition, we have assumed that the Hamiltonian function is strictly concave in y. This assumption implies that the maximum of H with respect to y will occur as an interior solution, so it can be found by setting the derivative of H with respect to y equal to zero at each point in time. In section 25.5 we relax this assumption.
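Conditions (i) and (ii) are mechanical once the Hamiltonian is formed, and it can help to see them derived symbolically. Below is a minimal sketch using Python's sympy, applied to the integrand f = x − y² and state equation ẋ = y of example 25.1 further on; the variable names are our own choices, not notation from the text.

```python
import sympy as sp

t = sp.symbols('t')
x = sp.Function('x')(t)           # state variable
y = sp.Function('y')(t)           # control variable
lam = sp.Function('lambda')(t)    # costate variable

f = x - y**2                      # integrand (example 25.1)
g = y                             # state equation x' = g

H = f + lam * g                   # Hamiltonian of definition 25.2

cond_i = sp.Eq(sp.diff(H, y), 0)               # (i)  dH/dy = 0
cond_ii = sp.Eq(lam.diff(t), -sp.diff(H, x))   # (ii) lambda' = -dH/dx
state = sp.Eq(x.diff(t), g)                    # (ii) x' = g

print(cond_i)     # lambda(t) - 2*y(t) = 0
print(cond_ii)    # d(lambda)/dt = -1
print(state)      # dx/dt = y(t)
```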

The second set of necessary conditions is a system of differential equations. The first one is obtained by taking the derivative of the Hamiltonian function with respect to the state variable, x, and setting λ̇ equal to the negative of this derivative. The second is just the differential equation for the state variable that is given as part of the optimization problem. Necessary conditions (i) and (ii) comprise the maximum principle. The necessary conditions in (iii) are typically referred to as boundary conditions. In free-endpoint problems, one boundary condition is given, x(0), and the other is provided by a transversality condition, λ(T) = 0. A justification for this transversality condition is provided later in the chapter; for now, we will just say that this is a necessary condition for determining the optimal value of x(T), when x(T) is free to be chosen optimally.

The maximum principle provides the first-order conditions. What are the second-order conditions in optimal control theory? In other words, when are the necessary conditions also sufficient to ensure the solution path maximizes the objective functional in equation (25.1)? Although it is beyond our scope to prove it, we state the answer as

Theorem 25.2

The necessary conditions stated in theorem 25.1 are also sufficient for the maximization of J in equation (25.1) if the following conditions are satisfied:

(i) f(x, y, t) is differentiable and jointly concave in x and y.

(ii) One of the following is true:

g(x, y, t) is linear in (x, y);
g(x, y, t) is concave in (x, y) and λ(t) ≥ 0 for t ∈ [0, T];
g(x, y, t) is convex in (x, y) and λ(t) ≤ 0 for t ∈ [0, T].

The sufficiency conditions are satisfied for all of the problems examined in this chapter. As a result, we need look no further than the necessary conditions to solve the dynamic maximization problems.

Example 25.1

Solve the following problem:

max ∫₀¹ (x − y²) dt

subject to

ẋ = y
x(0) = 2


Solution

Step 1 Form the Hamiltonian function:

H = x − y² + λy

Step 2 Apply the maximum principle. Since the Hamiltonian is strictly concave in the control variable y and there are no constraints on the choice of y, we can find the maximum of H with respect to y by applying the first-order condition:

∂H/∂y = −2y + λ = 0

This gives

y(t) = λ(t)/2          (25.2)

Step 3 The differential equation for λ(t) is

λ̇ = −∂H/∂x = −1

We now have a system of two differential equations which, after using equation (25.2), is

λ̇ = −1          (25.3)
ẋ = λ/2          (25.4)

Step 4 We obtain the boundary conditions. This is a free-endpoint problem because the value for x(1) is not specified in the problem. Therefore the boundary conditions are

x(0) = 2,    λ(1) = 0

Step 5 Solve or analyze the system of differential equations. In this example we have a system of linear differential equations, so we proceed by obtaining explicit solutions. Because the first differential equation, (25.3), does not depend on x, we can solve it directly and then substitute the solution into the second equation.


Solving equation (25.3) gives λ(t) = C₁ − t, where C₁ is an arbitrary constant of integration, the value of which is determined by using the boundary condition λ(1) = 0. This gives 0 = C₁ − 1, for which the solution is C₁ = 1. Therefore we have λ(t) = 1 − t. Substituting this solution into equation (25.4) gives

ẋ = (1 − t)/2

to which the solution is

x(t) = t/2 − t²/4 + C₂

where C₂ is an arbitrary constant of integration. Its value is determined from the boundary condition x(0) = 2. This gives 2 = C₂. The solution then becomes

x(t) = t/2 − t²/4 + 2

To complete the solution to this maximization problem, we substitute the solutions to the differential equations back into equation (25.2). Doing this gives

y(t) = (1 − t)/2

as the solution path for the control variable. At t = 0, y(0) = 1/2. It then declines over time and finishes at t = 1 with y(1) = 0.
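As a check on this solution, one can evaluate J = ∫₀¹ (x − y²) dt numerically along the optimal control and along perturbed controls; the optimal path should give the largest value. This is a minimal sketch of that check, with the grid size an arbitrary choice.

```python
import numpy as np

t = np.linspace(0.0, 1.0, 10001)
dt = t[1] - t[0]

def J(y):
    """Evaluate J = integral of (x - y^2) dt with x' = y and x(0) = 2."""
    x = 2.0 + np.concatenate(([0.0], np.cumsum(y[:-1]) * dt))  # Euler integration of x' = y
    return np.sum(x - y**2) * dt

y_star = (1.0 - t) / 2.0      # the optimal control found above
print(J(y_star))              # about 2.0833 (= 2 + 1/12)
print(J(y_star + 0.1))        # strictly smaller
print(J(y_star - 0.1))        # strictly smaller
```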

An Investment Problem

Suppose that a firm's only factor of production is its capital stock, K, and that its production function is given by the relation

Q = K − aK²,    a > 0

where Q is the quantity of output produced. Assuming that capital depreciates at the rate δ > 0, the change in the capital stock is equal to the firm's investment, I, less depreciation, δK:

K̇ = I − δK


If the price of the firm's output is a constant $1, and the cost of investment is I² dollars, then the firm's profit at a point in time is

π = K − aK² − I²

The optimization problem we now consider is to maximize the integral sum of profits over a given interval of time (0, T). A more realistic objective would be to maximize the present-valued integral sum of profits, but we postpone treatment of this problem to the next section.

max ∫₀ᵀ (K − aK² − I²) dt

subject to

K̇ = I − δK (given)
K(0) = K₀

To solve this, we take the following steps:

Step 1 Form the Hamiltonian:

H = K − aK² − I² + λ(I − δK)

Step 2 Apply the maximum principle: since the Hamiltonian is strictly concave in the control variable I, we look for the I that maximizes the Hamiltonian by using the first-order condition

∂H/∂I = −2I + λ = 0          (25.5)

Since ∂²H/∂I² = −2 is negative, this gives a maximum. The solution is

I(t) = λ(t)/2          (25.6)

Step 3 Form the system of differential equations. λ must obey the differential equation

λ̇ = −∂H/∂K = −(1 − 2aK − λδ)

Using equation (25.6) to substitute for I(t), the system is

λ̇ = λδ + 2aK − 1          (25.7)
K̇ = λ/2 − δK          (25.8)

Step 4 Obtain the boundary conditions. The boundary condition for K(t) is given by the initial condition K(0) = K₀. The boundary condition for λ(t) is λ(T) = 0.

Step 5 Solve or analyze the system of differential equations. If the system is linear, as it is in this example, use the techniques of chapter 24 to obtain an explicit solution. We do this next. If the system is nonlinear, it is probably not possible to obtain an explicit solution. In that case, use the techniques of chapter 24 to undertake a qualitative analysis, preferably with the aid of a phase diagram. In either case, keep in mind that the system of differential equations obtained from employing optimal control theory provides the solution to the optimization problem.

An explicit solution to the system of differential equations (25.7) and (25.8) is obtained using the techniques shown in chapter 24. The homogeneous form of this system, written in matrix form, is

⎡λ̇⎤   ⎡ δ    2a⎤ ⎡λ⎤
⎣K̇⎦ = ⎣1/2   −δ⎦ ⎣K⎦          (25.9)

The determinant of the coefficient matrix of the homogeneous system is (−δ² − a), which is negative. We therefore know immediately that the steady-state equilibrium is a saddle point. By theorem 24.2, the solutions to the system of differential equations in (25.7) and (25.8) are

λ(t) = C₁e^{r₁t} + C₂e^{r₂t} + λ̄          (25.10)

K(t) = [(r₁ − δ)/(2a)]C₁e^{r₁t} + [(r₂ − δ)/(2a)]C₂e^{r₂t} + K̄          (25.11)

where r₁ and r₂ are the eigenvalues or roots of the coefficient matrix in equation (25.9), C₁ and C₂ are arbitrary constants of integration, and λ̄ and K̄ are the steady-state values of the system, which serve as particular solutions in finding the complete solutions.


If A denotes the coefficient matrix in equation (25.9), its characteristic roots (eigenvalues) are given by the equation

r₁, r₂ = tr(A)/2 ± (1/2)√(tr(A)² − 4|A|)

where tr(A) denotes the trace of A (the sum of the diagonal elements). Since tr(A) = δ − δ = 0, the roots of equation (25.9) are

r₁, r₂ = ∓√(δ² + a)

The steady-state values of λ and K are found by setting λ̇ = 0 and K̇ = 0. Doing this and simplifying yields

λ̄ = (1 − 2aK̄)/δ,    K̄ = λ̄/(2δ)

Solving these for λ̄ and K̄ gives the steady-state values

λ̄ = δ/(δ² + a),    K̄ = 1/(2(δ² + a))
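These formulas are easy to verify numerically. The sketch below uses arbitrary illustrative values for a and δ (our assumptions, not values from the text) to confirm the saddle-point roots and the steady state.

```python
import numpy as np

a, delta = 0.1, 0.05                 # illustrative values (assumptions)

A = np.array([[delta, 2.0 * a],
              [0.5,  -delta]])       # coefficient matrix of (25.9)

print(np.linalg.eigvals(A))          # +/- sqrt(delta^2 + a): one positive,
                                     # one negative root, hence a saddle point

lam_bar = delta / (delta**2 + a)
K_bar = 1.0 / (2.0 * (delta**2 + a))

# the steady state should make both lambda' and K' vanish
lam_dot = delta * lam_bar + 2.0 * a * K_bar - 1.0
K_dot = lam_bar / 2.0 - delta * K_bar
print(lam_dot, K_dot)                # both approximately 0
```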

Because the steady state is a saddle point, it can be reached only along the saddle path, and only if the exogenously specified time horizon, T, is large enough to permit it to be reached. This leaves only the values of the arbitrary constants of integration to be determined. As usual, they are determined using the boundary conditions K(0) = K₀ and λ(T) = 0. First, requiring the solution for K(t) to satisfy its initial condition gives

K₀ = [(r₁ − δ)/(2a)]C₁ + [(r₂ − δ)/(2a)]C₂ + K̄

After simplifying, this gives

C₁ = [2a(K₀ − K̄) − (r₂ − δ)C₂] / (r₁ − δ)

Next, requiring the solution for λ(t) to satisfy its terminal condition gives

0 = C₁e^{r₁T} + C₂e^{r₂T} + λ̄


from which we get an equation for C₂ in terms of C₁:

C₂ = −(λ̄ + C₁e^{r₁T})e^{−r₂T}

Substituting this into the expression for C₁ and simplifying gives the solution for C₁:

C₁ = [2a(K₀ − K̄) + λ̄(r₂ − δ)e^{−r₂T}] / [r₁ − δ − (r₂ − δ)e^{(r₁−r₂)T}]

Substituting this solution into the equation for C₂ and simplifying gives the explicit solution for C₂:

C₂ = [−2a(K₀ − K̄)e^{(r₁−r₂)T} − λ̄(r₁ − δ)e^{−r₂T}] / [r₁ − δ − (r₂ − δ)e^{(r₁−r₂)T}]

This completes the solution. The optimal path of investment is obtained using equation (25.6). If we denote the solution for λ(t) in equation (25.10) as λ*(t), then the solution for investment, denoted I*(t), is

I*(t) = λ*(t)/2

Figure 25.2 Solution path I₁(t) for investment when K₀ < K̄; solution path I₂(t) for investment when K₀ > K̄

This solution gives the path of investment that maximizes total profits over the planning horizon. Figure 25.2 shows two possible solution paths. When K₀ < K̄, the solution is a path like I₁(t) that starts high and declines monotonically to 0 at time T. When K₀ > K̄, the solution is a path of disinvestment like I₂(t) that stays negative from zero to T.
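A short numerical sketch can confirm these expressions. With arbitrary illustrative parameters (our assumptions), the constants C₁ and C₂ computed from the formulas above should produce a λ(t) path satisfying the transversality condition λ(T) = 0, so that I*(T) = 0, while equation (25.11) returns the initial capital stock at t = 0.

```python
import numpy as np

# Illustrative parameters (assumptions, not from the text)
a, delta, T, K0 = 0.1, 0.05, 10.0, 2.0

r1, r2 = -np.sqrt(delta**2 + a), np.sqrt(delta**2 + a)
lam_bar = delta / (delta**2 + a)
K_bar = 1.0 / (2.0 * (delta**2 + a))

D = (r1 - delta) - (r2 - delta) * np.exp((r1 - r2) * T)
C1 = (2.0 * a * (K0 - K_bar) + lam_bar * (r2 - delta) * np.exp(-r2 * T)) / D
C2 = (-2.0 * a * (K0 - K_bar) * np.exp((r1 - r2) * T)
      - lam_bar * (r1 - delta) * np.exp(-r2 * T)) / D

# lambda(T) should vanish, so I*(T) = lambda(T)/2 = 0,
# and equation (25.11) should recover K(0) = K0
lam_T = C1 * np.exp(r1 * T) + C2 * np.exp(r2 * T) + lam_bar
K_at_0 = (r1 - delta) / (2 * a) * C1 + (r2 - delta) / (2 * a) * C2 + K_bar
print(lam_T, K_at_0)     # approximately 0 and 2.0
```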

An Economic Interpretation of λ and the Hamiltonian

We introduced λ(t) as a sequence or path of Lagrange multipliers. It turns out that there is a natural economic interpretation of this costate variable. Intuitively, λ(t) can be interpreted as the marginal (imputed) value or shadow price of the state variable x(t). This interpretation follows informally from the Lagrange multiplier analogy. But it also follows more formally from a result that is proved in the appendix to the chapter. There it is shown that λ(0) is the amount by which J* (the maximum value function) would increase if x(0) (the initial value of the state variable) were to increase by a small amount. Therefore λ(0) is the value of a marginal increase in the state variable at time t = 0 and therefore can be


interpreted as the most we would be willing to pay (the shadow price) to acquire a bit more of it at time t = 0. By extension, λ(t) can be interpreted as the shadow price or imputed value of the state variable at any time t. In the investment problem just examined, λ(t) gives the marginal (imputed) value or shadow price of the firm's capital stock at time t. Armed with this interpretation, the first-order condition (25.5) makes economic sense: it says that at each moment of time, the firm should carry out the amount of investment that satisfies the following equality:

2I(t) = λ(t)

The left-hand side is the marginal cost of investment; the right-hand side is the marginal (imputed) value of capital and, as such, gives the marginal benefit of investment. Thus the first-order condition of the maximum principle leads to a very simple investment rule: invest up to the point that marginal cost equals marginal benefit.

The Hamiltonian function too can be given an economic interpretation. In general, H measures the instantaneous total economic contribution made by the control variable toward the integral objective function. In the context of the investment problem, H is the sum of total profits earned at a point in time and the accrual of capital that occurs at that point in time, valued at its shadow price. Therefore H is the instantaneous total contribution made by the control variable to the integral of profits, J. It makes sense then to choose the control variable so as to maximize H at each point in time. This, of course, is what the maximum principle requires.
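The shadow-price interpretation of λ(0) can be checked numerically on example 25.1, where λ(t) = 1 − t, so λ(0) = 1. The sketch below (an illustrative check, not from the text) perturbs the initial state x(0) and confirms that the maximized value J* rises at the rate λ(0).

```python
import numpy as np

def J_star(x0):
    """Maximized value of example 25.1's functional, given initial state x0."""
    t = np.linspace(0.0, 1.0, 10001)
    dt = t[1] - t[0]
    y = (1.0 - t) / 2.0              # optimal control; it does not depend on x0
    x = x0 + t / 2.0 - t**2 / 4.0    # state path under the optimal control
    return np.sum(x - y**2) * dt

eps = 1e-4
print((J_star(2.0 + eps) - J_star(2.0)) / eps)   # approx. 1.0 = lambda(0)
```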

EXERCISES

1. Solve

max ∫₀ᵀ −(ay + by²) dt

subject to

ẋ = x − y
x(0) = x₀

where a, b are positive constants.


2. Solve

subject to

ẋ = y
x(0) = x₀

3. Solve

max ∫₀ᵀ −(ay + by² + cx) dt

subject to

ẋ = αx + βy
x(0) = x₀

4. Solve

subject to

ẋ = x + y
x(0) = x₀

5. Solve

subject to

ẋ = x + y
x(0) = x₀

6. In equations (25.7) and (25.8) the differential equation system was written in terms of λ and K. For the same model, transform the differential equation system into a system in I and K. Solve this system of equations for I(t) and K(t).


7. Assume that price is a constant, p, and the cost of investment is bI², where b is a positive constant. Then solve the following:

max ∫₀ᵀ [p(K − aK²) − bI²] dt

subject to

K̇ = I − δK
K(0) = K₀

25.2 Optimization Problems Involving Discounting

Discounting is a fundamental feature of dynamic optimization problems in economics. In the remainder of this chapter, we assume that ρ is the going rate of return in the economy, that there is no uncertainty about this rate of return, and that it is constant over time. Recall from chapter 3 that y⁰ = y(t)e^{−ρt} is the discounted value (or present value) of y(t). In all of the subsequent models and examples examined in the chapter, firms and consumers will be assumed to maximize the discounted value (present value) of future streams of revenues or benefits net of costs.
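As a quick numerical illustration of continuous discounting (with an assumed rate and profit stream, not values from the text), the present value of a flat stream π(t) = 100 over 30 years at ρ = 0.05 is ∫₀³⁰ 100e^{−ρt} dt = (100/ρ)(1 − e^{−30ρ}) ≈ 1553.7:

```python
import numpy as np

rho = 0.05                              # assumed going rate of return
t = np.linspace(0.0, 30.0, 30001)
dt = t[1] - t[0]

profit = np.full_like(t, 100.0)         # flat profit stream: 100 per year

pv = np.sum(np.exp(-rho * t) * profit) * dt
print(pv)                               # approx. 1553.7 = (100/rho)*(1 - exp(-1.5))
```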

The General Form of Autonomous Optimization Problems

Most dynamic optimization problems in economics involve discounting. As a result, time enters the objective function explicitly through the term e^{−ρt}. However, if this is the only way the variable t explicitly enters the dynamic optimization problem, the system of differential equations can be made autonomous. The importance of this fact is that autonomous differential equations (ones in which t is not an explicit variable) are much easier to solve than nonautonomous differential equations. We specified the general form of the integrand function in definition 25.1 as f(x, y, t). If this reduces to some function of just x and y multiplied by the term e^{−ρt}, say F(x, y)e^{−ρt}, and if the differential equation given for the state variable does not depend explicitly on t (is autonomous), so that g(x, y, t) specializes to G(x, y), then we may state
