Real-time optimization
Real-time optimization problems rely on decision rules that specify how decisions should
maximize future benefit, given the current state of a system. State dependence provides a
convenient way to deal with uncertainty. Some examples:
Reservoir releases: the decision rule specifies how the current release should depend on the current storage. Irrigation decisions may similarly depend on current soil moisture and temperature. In both cases the primary uncertainties are future meteorological variables.
Real-time optimization can be viewed as a feedback control process:
[Figure: feedback control loop. A decision rule ut(xt) generates the control ut; the system combines ut, the current state xt, and the input It to produce the next state via xt+1 = g(xt, ut, It); the time loop then advances from t to t+1.]
State variables: xt
Control variables: ut
Input variables: It
Decision rule: ut(xt)
State equation: xt+1 = g(xt, ut, It), with initial condition x0
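The loop can be sketched in a few lines of code. This is a minimal illustration only; the storage-style state equation and the myopic "release half the storage" rule below are invented placeholders, not part of the notes:

```python
def simulate(x0, decision_rule, g, inputs):
    """Run the feedback loop: at each t, apply u_t = rule(x_t), then step the state."""
    x, xs, us = x0, [x0], []
    for I_t in inputs:
        u_t = decision_rule(x)   # decision rule u_t(x_t)
        x = g(x, u_t, I_t)       # state equation x_{t+1} = g(x_t, u_t, I_t)
        us.append(u_t)
        xs.append(x)
    return xs, us

# Hypothetical example: mass-balance state equation, myopic rule.
g = lambda x, u, I: x - u + I
rule = lambda x: x // 2
xs, us = simulate(x0=4, decision_rule=rule, g=g, inputs=[1, 0, 1])
# xs traces the state trajectory, us the applied controls.
```

Note that the rule is evaluated on the *current* state each period, which is exactly how state dependence absorbs uncertainty in the inputs.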
Dynamic Programming
Dynamic programming provides a general framework for deriving decision rules. Most dynamic
programming problems are divided into stages (e.g. time periods, spatial intervals, etc.):
[Figure: stage diagram. Stages 1, 2, ..., t, ..., T link the states x1, x2, ..., xt, ..., xT-1, xT. At each stage the control ut-1 and input It-1 carry the state forward, and each stage contributes a benefit such as f1(u1, x1, x2); the terminal state xT carries the terminal benefit VT(xT).]
The benefit-to-go at t is the terminal benefit (salvage value) VT(xT) plus the sum of the benefits for stages t through T-1:

Ft(xt, ..., xT, ut, ..., uT-1) = SUM[i = t to T-1] fi(ui, xi, xi+1) + VT(xT)

(the sum is the benefit from the remaining stages; VT(xT) is the terminal benefit)
subject to the state equation:

xi+1 = gi(xi, ui, Ii) ; i = t, ..., T-1

and other constraints on the decision variables:

{xt, ut} in Ωt for t = 0, ..., T-1 and xT in ΩT (the decision variables lie within some feasible set Ωt at each t = 0, ..., T).
The objective may be rewritten if we repeatedly apply the state equation to write all xi (i > t) as functions of xt, ut, ..., uT-1, It, ..., IT-1:

Ft(xt, ..., xT, ut, ..., uT-1) = Ft(xt, ut, ..., uT-1, It, ..., IT-1)
The decision rule ut(xt) at each t is obtained by finding the sequence of controls ut, ..., uT-1 that maximizes Ft(xt, ut, ..., uT-1, It, ..., IT-1) for a given state xt and a given set of specified inputs It, ..., IT-1.
Vt(xt) = Max{ut} [ ft(ut, xt, xt+1) + Max{ut+1, ..., uT-1} Ft+1(xt+1, ut+1, ..., uT-1, It+1, ..., IT-1) ]

       = Max{ut} [ ft(ut, xt, xt+1) + Vt+1(xt+1) ]
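The key fact behind this recursion is that maximizing over the whole control sequence at once gives the same answer as the nested stage-by-stage maximization. That can be checked numerically on a toy instance; everything below (the mod-2 transition, the benefit function, the terminal values) is invented for illustration:

```python
from itertools import product

# Tiny fixed instance: T = 3 stages, binary state and control levels.
T, levels = 3, [0, 1]
g = lambda x, u: (x + u) % 2          # toy state equation (inputs folded in)
f = lambda t, u, x: (t + 1) * u + x   # toy stage benefit f_t(u_t, x_t)
V_T = {0: 0, 1: 2}                    # toy terminal benefit V_T(x_T)

def brute_force(x0):
    """Maximize F_0 directly over all control sequences u_0, ..., u_{T-1}."""
    best = float("-inf")
    for us in product(levels, repeat=T):
        x, total = x0, 0
        for t, u in enumerate(us):
            total += f(t, u, x)
            x = g(x, u)
        best = max(best, total + V_T[x])
    return best

def backward(x0):
    """Bellman recursion: V_t(x) = max_u [ f_t(u, x) + V_{t+1}(g(x, u)) ]."""
    V = dict(V_T)
    for t in reversed(range(T)):
        V = {x: max(f(t, u, x) + V[g(x, u)] for u in levels) for x in levels}
    return V[x0]

# The two approaches agree for every initial state.
assert all(brute_force(x) == backward(x) for x in levels)
```

The brute-force search grows exponentially with T, while the backward pass does a fixed amount of work per stage; the Computational Effort section at the end quantifies this gap.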
[Figure: simple reservoir illustration over t = 0, 1, 2 with inflow and outflow, storages x0, x1, x2, releases u0, u1, release benefits f0(u0), f1(u1), and terminal benefit V2(x2) for 0 <= x2 <= 1.]

V1(x1) = Max{u1} [ f1(u1) + V2(x1 - u1) ]

       = Max{u1} [ u1 + (x1 - u1)(1 - x1 + u1) ]

V0(x0) = Max{u0} [ u0 + (x0 - u0) ]
The optimization problem at each stage then reduces to an exhaustive search through all feasible levels ut^j of the discretized control to find the one that maximizes:

ft[ut^j, xt^k, gt(xt^k, ut^j, It^l)] + Vt+1[gt(xt^k, ut^j, It^l)]

where xt^k is the current discretized state level and It^l is the specified input.
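That single-stage search is short to write down. The function and argument names below are illustrative; infeasible transitions are signalled here by having the transition function return None:

```python
def stage_optimum(x, f_t, g_t, V_next, u_levels, I_t):
    """Exhaustive search over feasible control levels u_t^j at state level x_t^k."""
    best_u, best_val = None, float("-inf")
    for u in u_levels:
        x_next = g_t(x, u, I_t)          # candidate transition g_t(x, u, I_t)
        if x_next is None:               # skip infeasible transitions
            continue
        val = f_t(u) + V_next[x_next]    # f_t(...) + V_{t+1}[g_t(...)]
        if val > best_val:
            best_u, best_val = u, val
    return best_u, best_val

# Sample call using, for instance, the final-stage benefits f2 = (0, 1, 3)
# of the reservoir example, with V3 = 0 everywhere and I2 = 1:
g = lambda x, u, I: x - u + I if 0 <= x - u + I <= 2 else None
u_star, V = stage_optimum(1, lambda u: [0, 1, 3][u], g, {0: 0, 1: 0, 2: 0}, [0, 1, 2], 1)
```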
Example: a reservoir operated over three stages (t = 0, 1, 2), with inflow It, storage xt, and release ut. Storage and release are discretized to the levels 0, 1, 2.

State equation: xt+1 = xt - ut + It

Inputs: I0 = 1, I1 = 0, I2 = 1

Release benefits ft(ut) for ut = 0, 1, 2:

f0(u0): 0, 3, 2
f1(u1): 0, 4, 5
f2(u2): 0, 1, 3

Terminal (outflow) benefits: V3(x3) = 0 for all x3 values
The possible state transitions are derived from the state equation, the inputs, and the permissible variable values. They can be shown with a diagram where each feasible state level xt^k is a circle and each feasible control level ut^j is a line connecting circles, with the benefit shown in parentheses after each feasible control value:
[Figure: transition diagram for the example. Three columns labeled Stage 1, Stage 2, Stage 3; circles mark the feasible state levels 0, 1, 2 against a vertical return axis; each line is a feasible control ut labeled ut(benefit), e.g. 2(5) is a release of 2 with benefit 5. The best path from x0 = 2 yields a total return of 10.]
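The feasible transitions in such a diagram can be enumerated mechanically from the state equation and the level bounds. A sketch, assuming (as in the example) that storage and release share the levels 0-2:

```python
def feasible_transitions(levels, I_t):
    """List (x_t, u_t, x_{t+1}) triples allowed by x_{t+1} = x_t - u_t + I_t."""
    return [(x, u, x - u + I_t)
            for x in levels
            for u in levels
            if x - u + I_t in levels]   # next state must be a permissible level

# With I2 = 1 (the final stage of the example):
trans = feasible_transitions(range(3), 1)
```

Each triple corresponds to one line in the diagram; attaching f_t(u_t) to each triple reproduces the parenthesized benefits.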
Stage 3 (u2): identify the optimum u2(x2) value for each possible x2, with V3(x3) specified as an input:

x2 = 0:  u2 = 0: f2 + V3 = 0 + 0 = 0
         u2 = 1: f2 + V3 = 1 + 0 = 1   <- optimum, V2(0) = 1

x2 = 1:  u2 = 0: f2 + V3 = 0 + 0 = 0
         u2 = 1: f2 + V3 = 1 + 0 = 1
         u2 = 2: f2 + V3 = 3 + 0 = 3   <- optimum, V2(1) = 3

x2 = 2:  u2 = 1: f2 + V3 = 1 + 0 = 1
         u2 = 2: f2 + V3 = 3 + 0 = 3   <- optimum, V2(2) = 3
Stage 2 (u1): identify the optimum u1(x1) value for each possible x1, obtaining V2(x2) from the u2 table above (here x2 = x1 - u1 since I1 = 0):

x1 = 0:  u1 = 0: f1 + V2 = 0 + 1 = 1   <- optimum, V1(0) = 1

x1 = 1:  u1 = 0: f1 + V2 = 0 + 3 = 3
         u1 = 1: f1 + V2 = 4 + 1 = 5   <- optimum, V1(1) = 5

x1 = 2:  u1 = 0: f1 + V2 = 0 + 3 = 3
         u1 = 1: f1 + V2 = 4 + 3 = 7   <- optimum, V1(2) = 7
         u1 = 2: f1 + V2 = 5 + 1 = 6
Stage 1 (u0): identify the optimum u0(x0) value for each possible x0, obtaining V1(x1) from the u1 table above (here x1 = x0 - u0 + 1 since I0 = 1):

x0 = 0:  u0 = 0: f0 + V1 = 0 + 5 = 5   <- optimum, V0(0) = 5
         u0 = 1: f0 + V1 = 3 + 1 = 4

x0 = 1:  u0 = 0: f0 + V1 = 0 + 7 = 7
         u0 = 1: f0 + V1 = 3 + 5 = 8   <- optimum, V0(1) = 8
         u0 = 2: f0 + V1 = 2 + 1 = 3

x0 = 2:  u0 = 1: f0 + V1 = 3 + 7 = 10  <- optimum, V0(2) = 10
         u0 = 2: f0 + V1 = 2 + 5 = 7
The optimum ut(xt) decision rules for t = 0, 1, 2 define a complete optimum decision strategy:
State value:  0   1   2
u0(x0):       0   1   1
u1(x1):       0   1   1
u2(x2):       1   2   2

The maximum total return, obtained from initial storage x0 = 2, is V0(2) = 10.
Note that there is a path leaving every state value. The optimum paths give a strategy for
maximizing benefit-to-go from t onward, for any value of state xt.
Optimal benefit for each possible initial storage is V0(x0).
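The entire backward pass above can be reproduced in a few lines. The inflows and stage-benefit tables below are as reconstructed from the example's stage tables, and the variable names are illustrative:

```python
levels = (0, 1, 2)                       # permissible storage and release levels
I = (1, 0, 1)                            # inflows I0, I1, I2
f = ((0, 3, 2), (0, 4, 5), (0, 1, 3))    # f_t(u_t) for u_t = 0, 1, 2

V = {x: 0 for x in levels}               # terminal benefit V3(x3) = 0
policy = {}
for t in (2, 1, 0):                      # backward pass over stages
    V_new, rule = {}, {}
    for x in levels:
        # Feasible releases keep the next storage x - u + I_t within the levels;
        # each candidate scores f_t(u) + V_{t+1}(next storage).
        best = max((f[t][u] + V[x - u + I[t]], u)
                   for u in levels if x - u + I[t] in levels)
        V_new[x], rule[x] = best
    V, policy[t] = V_new, rule

# V now holds V0(x0); policy[t][x] is the optimum decision rule u_t(x_t).
```

Running this recovers the tables above: V0 = (5, 8, 10) for x0 = 0, 1, 2, and the decision rules u0 = (0, 1, 1), u1 = (0, 1, 1), u2 = (1, 2, 2).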
Computational Effort
The solution to the discretized optimization problem can be found by exhaustive enumeration
(by comparing benefit-to-go for all possible ut ( xt ) combinations).
Dynamic programming is much more efficient than enumeration since it divides the original T
stage optimization problem into T smaller problems, one for each stage.
To compare computational effort of enumeration and dynamic programming assume:
State dimension = M, Stages = T, Levels = L
Equal number of levels for ut and xt at every stage
All possible state transitions are permissible (i.e. L^(2M) transitions at each stage)

Then the total number of Vt evaluations required is:

Exhaustive enumeration: L^(M(T+1))
Dynamic programming: T L^(2M)

For M = 1, L = 10, T = 10 the number of Vt evaluations required is:

Exhaustive enumeration: 10^11
Dynamic programming: 10 x 10^2 = 1,000
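These counts are easy to check numerically (using the evaluation-count formulas above, not a timing experiment):

```python
# Vt evaluation counts for exhaustive enumeration vs. dynamic programming.
M, L, T = 1, 10, 10                 # state dimension, levels, stages
enumeration = L ** (M * (T + 1))    # L^(M(T+1))
dp = T * L ** (2 * M)               # T * L^(2M)
ratio = enumeration // dp           # how many times more work enumeration needs
```

Note that dynamic programming still scales exponentially in the state dimension M (the "curse of dimensionality"), just not in the number of stages T.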