
Linear-Quadratic-Gaussian (LQG) Controllers and

Kalman Filters
Emo Todorov
Applied Mathematics and Computer Science & Engineering
University of Washington
Winter 2012
Emo Todorov (UW) AMATH/CSE 579, Winter 2012 Lecture 4 1 / 12
LQG in continuous time
Recall that for problems with dynamics and cost

    dx = (a(x) + B(x) u) \, dt + C(x) \, d\omega
    \ell(x, u) = q(x) + \tfrac{1}{2} u^T R(x) u

the optimal control law is u^*(x) = -R^{-1} B^T v_x and the HJB equation is

    -v_t = q + a^T v_x + \tfrac{1}{2} \mathrm{tr}\left( C C^T v_{xx} \right) - \tfrac{1}{2} v_x^T B R^{-1} B^T v_x

We now impose further restrictions (LQG system):

    dx = (A x + B u) \, dt + C \, d\omega
    \ell(x, u) = \tfrac{1}{2} x^T Q x + \tfrac{1}{2} u^T R u
    q_T(x) = \tfrac{1}{2} x^T Q_T x
Continuous-time Riccati equations
Substituting the LQG dynamics and cost in the HJB equation yields

    -v_t = \tfrac{1}{2} x^T Q x + x^T A^T v_x + \tfrac{1}{2} \mathrm{tr}\left( C C^T v_{xx} \right) - \tfrac{1}{2} v_x^T B R^{-1} B^T v_x

We can now show that v is quadratic:

    v(x, t) = \tfrac{1}{2} x^T V(t) x + \alpha(t)

At the final time this holds with \alpha(T) = 0 and V(T) = Q_T. Then

    -\tfrac{1}{2} x^T \dot{V} x - \dot{\alpha} = \tfrac{1}{2} x^T Q x + x^T A^T V x + \tfrac{1}{2} \mathrm{tr}\left( C C^T V \right) - \tfrac{1}{2} x^T V B R^{-1} B^T V x

Using the fact that x^T A^T V x = x^T V A x and matching powers of x yields

Theorem (Riccati equation)

    -\dot{V} = Q + A^T V + V A - V B R^{-1} B^T V
    -\dot{\alpha} = \tfrac{1}{2} \mathrm{tr}\left( C C^T V \right)
Linear feedback control law
When v(x, t) = \tfrac{1}{2} x^T V(t) x + \alpha(t), the optimal control u^* = -R^{-1} B^T v_x is

    u^*(x, t) = -L(t) x
    L(t) \triangleq R^{-1} B^T V(t)

The Hessian V(t) and the matrix of feedback gains L(t) are independent of the noise amplitude C. Thus the optimal control law u^*(x, t) is the same for stochastic and deterministic systems (the latter is called LQR).

Example:

    dx = u \, dt + 0.2 \, d\omega
    \ell(x, u) = 0.5 u^2
    q_T(x) = 2.5 x^2
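As a sanity check, the scalar Riccati ODE for this example can be integrated backward from the final condition and compared against its closed-form solution. The horizon T = 1 is an assumption made for this sketch (the slide does not specify one); with A = 0, B = 1, Q = 0, R = 1 the Riccati equation reduces to dV/dt = V^2.

```python
# Scalar example: dx = u dt + 0.2 dω, so A = 0, B = 1, Q = 0, R = 1,
# and q_T(x) = 2.5 x^2 gives the final condition V(T) = 5.
# Horizon T = 1 is an assumption made for this illustration.
T, N = 1.0, 100_000
dt = T / N
V = 5.0                        # V(T) = 5
for _ in range(N):
    # -dV/dt = Q + A'V + VA - V B R^{-1} B' V = -V^2, i.e. dV/dt = V^2;
    # Euler step backward in time: V(t - dt) ≈ V(t) - dt * V(t)^2
    V -= dt * V ** 2
# Analytic solution of dV/dt = V^2 with V(T) = 5: V(t) = 1 / (T - t + 0.2)
print(V, 1.0 / (T + 0.2))      # both ≈ 0.8333
```

Since L(t) = R^{-1} B^T V(t) = V(t) here, the feedback gain at t = 0 is about 0.83 and grows to 5 as t approaches T, regardless of the noise amplitude 0.2.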
LQG in discrete time
Consider an optimal control problem with dynamics and cost

    x_{k+1} = A x_k + B u_k
    \ell(x, u) = \tfrac{1}{2} x^T Q x + \tfrac{1}{2} u^T R u

Substituting in the Bellman equation v_k(x) = \min_u \left\{ \ell(x, u) + v_{k+1}(x') \right\} and making the ansatz v_k(x) = \tfrac{1}{2} x^T V_k x yields

    \tfrac{1}{2} x^T V_k x = \min_u \left\{ \tfrac{1}{2} x^T Q x + \tfrac{1}{2} u^T R u + \tfrac{1}{2} (A x + B u)^T V_{k+1} (A x + B u) \right\}

The minimum is u^*_k(x) = -L_k x where L_k \triangleq \left( R + B^T V_{k+1} B \right)^{-1} B^T V_{k+1} A.

Theorem (Riccati equation)

    V_k = Q + A^T V_{k+1} (A - B L_k)
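The backward recursion above translates directly into code. The sketch below assumes time-invariant matrices and uses a hypothetical discretized double integrator for illustration:

```python
import numpy as np

def lqr_backward(A, B, Q, R, QT, N):
    """Backward Riccati recursion V_k = Q + A'V_{k+1}(A - B L_k)
    with gains L_k = (R + B'V_{k+1}B)^{-1} B'V_{k+1} A.
    Matrices are assumed time-invariant here for simplicity."""
    V = QT
    gains = []
    for _ in range(N):
        L = np.linalg.solve(R + B.T @ V @ B, B.T @ V @ A)
        V = Q + A.T @ V @ (A - B @ L)
        gains.append(L)
    gains.reverse()                 # gains[k] is L_k for time step k
    return V, gains

# Hypothetical double-integrator example (illustrative numbers)
A = np.array([[1.0, 0.1], [0.0, 1.0]])
B = np.array([[0.0], [0.1]])
Q = np.eye(2); R = np.array([[1.0]]); QT = 10 * np.eye(2)
V0, gains = lqr_backward(A, B, Q, R, QT, N=50)
```

Far from the final time the gains approach a stationary value, which is the average-cost solution summarized on the next slide.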
Summary of Riccati equations
Finite horizon

Continuous time:

    -\dot{V} = Q + A^T V + V A - V B R^{-1} B^T V

Discrete time:

    V_k = Q + A^T V_{k+1} A - A^T V_{k+1} B \left( R + B^T V_{k+1} B \right)^{-1} B^T V_{k+1} A

Average cost

Continuous time (care in Matlab):

    0 = Q + A^T V + V A - V B R^{-1} B^T V

Discrete time (dare in Matlab):

    V = Q + A^T V A - A^T V B \left( R + B^T V B \right)^{-1} B^T V A

Discounted cost is similar; first exit does not yield Riccati equations.
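The Matlab functions care and dare have SciPy counterparts. A minimal sketch, using a hypothetical double integrator, solves both algebraic Riccati equations and verifies them by substituting the solutions back in:

```python
import numpy as np
from scipy.linalg import solve_continuous_are, solve_discrete_are

Q = np.eye(2); R = np.array([[1.0]])

# Continuous: 0 = Q + A'V + VA - V B R^{-1} B' V   (Matlab: care)
A = np.array([[0.0, 1.0], [0.0, 0.0]])   # hypothetical double integrator
B = np.array([[0.0], [1.0]])
Vc = solve_continuous_are(A, B, Q, R)
resid_c = Q + A.T @ Vc + Vc @ A - Vc @ B @ np.linalg.solve(R, B.T @ Vc)

# Discrete: V = Q + A'VA - A'VB (R + B'VB)^{-1} B'VA   (Matlab: dare)
Ad = np.array([[1.0, 0.1], [0.0, 1.0]])
Bd = np.array([[0.0], [0.1]])
Vd = solve_discrete_are(Ad, Bd, Q, R)
resid_d = (Q + Ad.T @ Vd @ Ad
           - Ad.T @ Vd @ Bd @ np.linalg.solve(R + Bd.T @ Vd @ Bd,
                                              Bd.T @ Vd @ Ad) - Vd)
print(np.max(np.abs(resid_c)), np.max(np.abs(resid_d)))  # both ≈ 0
```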
Relation between continuous and discrete time
The continuous-time system

    \dot{x} = A x + B u
    \ell(x, u) = \tfrac{1}{2} x^T Q x + \tfrac{1}{2} u^T R u

can be represented in discrete time with time-step \Delta as

    x_{k+1} = (I + \Delta A) x_k + \Delta B u_k
    \ell(x, u) = \tfrac{\Delta}{2} x^T Q x + \tfrac{\Delta}{2} u^T R u

In the limit \Delta \to 0 the discrete Riccati equation reduces to the continuous one:

    V = \Delta Q + (I + \Delta A)^T V (I + \Delta A) - \Delta (I + \Delta A)^T V B \left( R + \Delta B^T V B \right)^{-1} B^T V (I + \Delta A)

    V = \Delta Q + V + \Delta A^T V + \Delta V A - \Delta V B \left( R + \Delta B^T V B \right)^{-1} B^T V + o\left( \Delta^2 \right)

    0 = Q + A^T V + V A - V B \left( R + \Delta B^T V B \right)^{-1} B^T V + \tfrac{1}{\Delta} o\left( \Delta^2 \right)
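This limit can be checked numerically: with the Euler discretization and per-step costs \Delta Q, \Delta R, the discrete average-cost solution should approach the continuous one as \Delta shrinks. A sketch (hypothetical system, Euler discretization only):

```python
import numpy as np
from scipy.linalg import solve_continuous_are, solve_discrete_are

# Hypothetical double integrator
A = np.array([[0.0, 1.0], [0.0, 0.0]])
B = np.array([[0.0], [1.0]])
Q = np.eye(2); R = np.array([[1.0]])
Vc = solve_continuous_are(A, B, Q, R)

for dt in (0.1, 0.01, 0.001):
    # Euler discretization with the cost scaled by the time step
    Ad, Bd = np.eye(2) + dt * A, dt * B
    Vd = solve_discrete_are(Ad, Bd, dt * Q, dt * R)
    print(dt, np.max(np.abs(Vd - Vc)))   # gap shrinks roughly like O(Δ)
```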
Maximum principle for LQG systems
For systems in the form

    \dot{x} = a(x) + B u
    \ell(x, u) = q(x) + \tfrac{1}{2} u^T R u

the optimal trajectory satisfies

    \dot{x} = a(x) - B R^{-1} B^T \lambda
    -\dot{\lambda} = q_x(x) + a_x(x)^T \lambda

Substituting a(x) = A x and q(x) = \tfrac{1}{2} x^T Q x yields

    \dot{x} = A x - B R^{-1} B^T \lambda
    -\dot{\lambda} = Q x + A^T \lambda

x(0) is given, \lambda(T) = Q_T x(T).

We can write this linear ODE as \dot{y} = M y, where y = [x; \lambda] and

    M = \begin{bmatrix} A & -B R^{-1} B^T \\ -Q & -A^T \end{bmatrix}

The solution is y(t) = \exp(M t) y(0). Now we can solve

    \begin{bmatrix} x(T) \\ Q_T x(T) \end{bmatrix} = \exp(M T) \begin{bmatrix} x(0) \\ \lambda(0) \end{bmatrix}

for \lambda(0) and x(T).
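The boundary-value solve can be sketched with the matrix exponential from SciPy. The scalar problem below (A = 0, B = R = Q = 1, Q_T = 5, T = 1) is a hypothetical choice made for illustration:

```python
import numpy as np
from scipy.linalg import expm

# Hypothetical scalar problem
A = np.array([[0.0]]); B = np.array([[1.0]])
Q = np.array([[1.0]]); R = np.array([[1.0]]); QT = np.array([[5.0]])
T = 1.0; n = 1

M = np.block([[A, -B @ np.linalg.solve(R, B.T)],
              [-Q, -A.T]])
E = expm(M * T)                      # y(T) = exp(MT) y(0)
E11, E12 = E[:n, :n], E[:n, n:]
E21, E22 = E[n:, :n], E[n:, n:]

# Boundary condition λ(T) = Q_T x(T) with x(T) = E11 x0 + E12 λ0 and
# λ(T) = E21 x0 + E22 λ0 gives (E22 - QT E12) λ0 = (QT E11 - E21) x0.
x0 = np.array([[1.0]])
lam0 = np.linalg.solve(E22 - QT @ E12, (QT @ E11 - E21) @ x0)
print(lam0)   # λ(0) = V(0) x(0) ≈ 1.1983 for this problem
```

Since \lambda(0) = V(0) x(0), the result matches what backward integration of the Riccati equation would give at t = 0.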
Encoding targets as quadratic costs
The matrices A, B, Q, R can be time-varying, which is useful for specifying reference trajectories x^*_k and for approximating non-LQG problems.

The cost \left\| x_k - x^*_k \right\|^2 can be represented in the LQG framework by augmenting the state vector as

    \tilde{x} = \begin{bmatrix} x \\ 1 \end{bmatrix}, \quad \tilde{A} = \begin{bmatrix} A & 0 \\ 0 & 1 \end{bmatrix}, \text{ etc.}

and writing the state cost as

    \tfrac{1}{2} \tilde{x}^T \tilde{Q}_k \tilde{x} = \tfrac{1}{2} \tilde{x}^T \left( D_k^T D_k \right) \tilde{x}

where D_k = \left[ I, -x^*_k \right] and so D_k \tilde{x}_k = x_k - x^*_k.

If the target x^* is stationary we can instead include it in the state and use D = [I, -I]. This has the advantage that the resulting control law is independent of x^* and therefore can be used for all targets.
Optimal estimation in linear-Gaussian systems
Consider the partially-observed system

    x_{k+1} = A x_k + C \omega_k
    y_k = H x_k + D \nu_k

with hidden state x_k, measurement y_k, and noise \omega_k, \nu_k \sim \mathcal{N}(0, I).

Given a Gaussian prior x_0 \sim \mathcal{N}(\hat{x}_0, \Sigma_0) and a sequence of measurements y_0, y_1, \ldots, y_k, we want to compute the posterior p_{k+1}(x_{k+1}).

We can show by induction that the posterior is Gaussian at all times.

Let p_k(x_k) be \mathcal{N}(\hat{x}_k, \Sigma_k). This will act as a prior for estimating x_{k+1}. Now x_{k+1} and y_k are jointly Gaussian, with mean and covariance

    E \begin{bmatrix} x_{k+1} \\ y_k \end{bmatrix} = \begin{bmatrix} A \hat{x}_k \\ H \hat{x}_k \end{bmatrix}, \quad \mathrm{Cov} \begin{bmatrix} x_{k+1} \\ y_k \end{bmatrix} = \begin{bmatrix} C C^T + A \Sigma_k A^T & A \Sigma_k H^T \\ H \Sigma_k A^T & D D^T + H \Sigma_k H^T \end{bmatrix}
Kalman filter

Lemma

If u, v are jointly Gaussian with means \bar{u}, \bar{v} and covariances \Sigma_{uu}, \Sigma_{vv}, \Sigma_{uv} = \Sigma_{vu}^T, then u given v is Gaussian with mean and covariance

    E[u | v] = \bar{u} + \Sigma_{uv} \Sigma_{vv}^{-1} (v - \bar{v})
    \mathrm{Cov}[u | v] = \Sigma_{uu} - \Sigma_{uv} \Sigma_{vv}^{-1} \Sigma_{vu}

Applying this to our problem with u = x_{k+1} and v = y_k yields

Theorem (Kalman filter)

The mean \hat{x} and covariance \Sigma of the Gaussian posterior satisfy

    \hat{x}_{k+1} = A \hat{x}_k + K_k \left( y_k - H \hat{x}_k \right)
    \Sigma_{k+1} = C C^T + (A - K_k H) \Sigma_k A^T
    K_k \triangleq A \Sigma_k H^T \left( D D^T + H \Sigma_k H^T \right)^{-1}
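The theorem translates directly into code; the sketch below runs the filter on a hypothetical scalar random walk (all numbers illustrative):

```python
import numpy as np

def kalman_step(xhat, Sigma, y, A, H, C, D):
    """One step of the filter for x_{k+1} = A x_k + C ω_k, y_k = H x_k + D ν_k."""
    S = D @ D.T + H @ Sigma @ H.T                 # innovation covariance
    K = A @ Sigma @ H.T @ np.linalg.inv(S)        # K_k = AΣH'(DD' + HΣH')^{-1}
    xhat_next = A @ xhat + K @ (y - H @ xhat)     # mean update
    Sigma_next = C @ C.T + (A - K @ H) @ Sigma @ A.T   # covariance update
    return xhat_next, Sigma_next

# Hypothetical 1-D tracking example
rng = np.random.default_rng(0)
A = np.array([[1.0]]); H = np.array([[1.0]])
C = np.array([[0.1]]); D = np.array([[0.5]])
x = np.array([0.0]); xhat = np.array([0.0]); Sigma = np.eye(1)
for _ in range(100):
    x = A @ x + C @ rng.standard_normal(1)        # simulate hidden state
    y = H @ x + D @ rng.standard_normal(1)        # noisy measurement
    xhat, Sigma = kalman_step(xhat, Sigma, y, A, H, C, D)
print(Sigma)   # ≈ 0.055: the stationary covariance for these parameters
```

Note that the covariance recursion does not depend on the data y_k, so \Sigma_k can be precomputed, mirroring the way V_k and L_k are precomputed in the control problem.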
Duality of LQG control and Kalman filtering

LQG controller

State dynamics:

    x_{k+1} = (A - B L_k) x_k + C \omega_k

Gain matrix:

    L_k = \left( R + B^T V_{k+1} B \right)^{-1} B^T V_{k+1} A

Backward Riccati equation:

    V_k = Q + A^T V_{k+1} (A - B L_k)

Kalman filter

Estimated state dynamics:

    \hat{x}_{k+1} = (A - K_k H) \hat{x}_k + K_k y_k

Gain matrix:

    K_k = A \Sigma_k H^T \left( D D^T + H \Sigma_k H^T \right)^{-1}

Forward Riccati equation:

    \Sigma_{k+1} = C C^T + (A - K_k H) \Sigma_k A^T

This form of duality does not generalize to non-LQG systems. However there is a different duality which does generalize (see later). It involves an information filter, computing \Sigma^{-1} instead of \Sigma.
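The duality can be checked numerically: applying the control Riccati update with the substitutions A \leftrightarrow A^T, B \leftrightarrow H^T, Q \leftrightarrow C C^T, R \leftrightarrow D D^T (and time reversed) reproduces one step of the filter covariance update. A sketch with hypothetical matrices:

```python
import numpy as np

# Hypothetical system matrices
A = np.array([[1.0, 0.1], [0.0, 0.9]])
H = np.array([[1.0, 0.0]])
C = 0.2 * np.eye(2)
D = np.array([[0.5]])

def control_riccati(V, A, B, Q, R):
    # V_k = Q + A'V_{k+1}A - A'V_{k+1}B (R + B'V_{k+1}B)^{-1} B'V_{k+1}A
    L = np.linalg.solve(R + B.T @ V @ B, B.T @ V @ A)
    return Q + A.T @ V @ (A - B @ L)

def filter_riccati(S, A, H, C, D):
    # Σ_{k+1} = CC' + (A - K_k H) Σ_k A',  K_k = AΣH'(DD' + HΣH')^{-1}
    K = A @ S @ H.T @ np.linalg.inv(D @ D.T + H @ S @ H.T)
    return C @ C.T + (A - K @ H) @ S @ A.T

S = np.eye(2)
S_filter = filter_riccati(S, A, H, C, D)
# Dual substitution: A → A', B → H', Q → CC', R → DD', V ↔ Σ
S_dual = control_riccati(S, A.T, H.T, C @ C.T, D @ D.T)
print(np.max(np.abs(S_filter - S_dual)))   # ≈ 0
```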