Bruce Francis
1 Introduction 3
1.1 History . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 3
1.2 What We Study . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 4
1.3 Theorems and Proofs . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 4
1.4 Problems . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 5
2 Linear Algebra 9
2.1 Brief Review . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 9
2.2 Eigenvalues and Eigenvectors . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 10
2.3 The Jordan Form . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 12
2.4 The Transition Matrix . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 16
2.5 Stability . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 18
2.6 Subspaces . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 20
2.7 Linear Transformations . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 22
2.8 Matrix Equations . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 27
2.9 Invariant Subspaces . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 28
2.10 Problems . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 29
3 Calculus 36
3.1 Jacobians . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 36
3.2 Optimization over an Open Set . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 38
3.3 Optimizing a Quadratic Function with Equality Constraints . . . . . . . . . . . . . . 41
3.4 Optimization with Equality Constraints . . . . . . . . . . . . . . . . . . . . . . . . . 46
3.5 Application: Sensor Placement . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 48
3.6 Problems . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 54
I Classical Theories 57
4 Calculus of Variations 58
4.1 The Brachistochrone Problem . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 58
4.2 The General Problem . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 61
4.3 The Euler-Lagrange Equation . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 61
4.4 Problems . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 65
6 Dynamic Programming 72
6.1 Examples . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 72
6.2 The Hamilton-Jacobi-Bellman Equation . . . . . . . . . . . . . . . . . . . . . . . . . 74
6.3 Problems . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 77
8 H2 Optimal Control 97
8.1 Overview . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 97
8.2 Lyapunov Equation . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 109
8.3 Spectral Subspaces . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 110
8.4 Riccati Equation . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 111
8.5 The LQR Problem . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 115
8.6 Solution of the H2 Problem . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 117
8.7 Problems . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 121
Chapter 1
Introduction
1.1 History
Optimal control is the subject within control theory where the control signals, or the controllers
that generate them, optimize (minimize or maximize) some performance criterion. Let’s look at
some key developments, the first few being optimization problems that came before optimal control
but influenced its development.
1. The isoperimetric problem is to find a closed curve of fixed length and maximum enclosed
area: The solution is of course the circle. The study of this problem goes back to the ancient
Greeks. The problem has the form
max_u J(u) such that P (u) = l,
where u is a closed curve, J(u) denotes the enclosed area, P (u) denotes the perimeter of u,
and l is the given length. Thus J and P are functions that map curves to R.
2. The brachistochrone problem, formulated by Bernoulli in 1696, is to find the shape of
the curve down which a bead sliding from rest and accelerated by gravity will slip (without
friction) from one point to another in the least time. This has the form
min_u J(u),
where u denotes the curve and J(u) is the time it takes for the bead to slide from the start
point to the end point.
3. In the 1940s Wiener developed and solved an optimal filtering problem. A signal s(t) is
corrupted by additive noise n(t) to produce the measured signal y(t) = s(t) + n(t). In the
simplest formulation the signals are zero-mean, stationary random processes and the goal is
to find a filter with input y(t) and output ŝ(t), an estimate of s(t), such that the variance
of the error ŝ(t) − s(t) is minimized. There is an equivalent deterministic problem known as
H2 -filtering.
4. Also in the 1940s, R. S. Phillips and colleagues extended Wiener’s filtering problem to a control
problem. The variance of a tracking error was the object to be minimized. This work initiated
the development of H2 -optimal control, the linear-quadratic regulator (LQR) problem being
a special case.
5. The term dynamic programming was originally used in the 1940s by Richard Bellman
to describe a process of solving sequential decision problems; a typical problem is to find a
minimum-cost path through a graph. Bellman wrote an influential book, Dynamic Program-
ming, published in 1957. The procedure is widely used, for example in Viterbi decoding.
6. In 1962 the very influential book The Mathematical Theory of Optimal Processes, by L. S.
Pontryagin et al., appeared. The approach to constrained optimization problems is called the
maximum principle.
7. In the late 1970s Zames posed the question, are classical frequency-domain feedback design
methods (e.g., lead/lag compensation) optimal for some appropriate criterion? From this
came H∞ -optimal control.
This set (a subspace) equals the span of the columns of A. So the problem is to find the vector
in V that is closest to b. A more interesting problem, the Nehari problem, is to find the stable
LTI system that is closest to a given unstable LTI system.
so that there can be no doubt about what they mean. The following (real) email correspondence
between a physicist and me emphasizes this point.
Me: On skimming through [your paper], I don’t see any formal theorem statements, with proofs.
So may I ask, are there mathematical results? That would help an outsider like me (of course, you
didn’t write the paper for outsiders).
Physicist: The paper is not written in the formal language of “theorems” and “proofs.” Certainly,
though, there are mathematical results reported that describe the physical properties of complex
modes. ... In general, though, in the physics literature papers are not written as mathematical
papers in the form of theorems, lemmas, and proofs.
Me: I wonder how physicists check each other’s work without explicit assumptions, mathematical
statements, and proofs. For example, your paper talks about “casual suggestions in the literature.”
Had they instead been rigorous statements, one could have checked them to be true or not. ...Your
model in the appendices involves a limit (homogeneous limit), so I guess something converges and
can be proved to converge in a precise way. Or is it that one doesn’t actually prove convergence
but instead verifies by experiment?
Physicist: This is a philosophical question really; for example one cannot prove Maxwell’s equations
mathematically. Mathematics is a tool, it cannot describe physical reality. So in the narrow area
of Electromagnetics, whether something is true or not boils down to whether it satisfies Maxwell’s
equations.
1.4 Problems
1. Give an example of a subset of R2 that is the graph of a function f : R −→ R. Give an
example of a subset of R2 that is not the graph of any function f : R −→ R.
2. Let U, V be two sets and let S be a subset of U × V . Write in logic notation the condition for
S to be the graph of a function from U to V .
In words, for every ε > 0 there exists δ > 0 such that if the state starts in the δ-ball, it will
remain forever in the ε-ball. Write in logic notation the definition that the origin is not stable.
Say this in words. Using your definition, prove that for ẋ = x the origin is not stable.
5. Consider the linear equation Ax = b, where A is a matrix, not assumed to be square, and
b, x are vectors. A necessary and sufficient condition for the equation to be solvable (for x to
exist) is
rank [A b] = rank A.
That is (necessity)
(∃x)Ax = b =⇒ rank [A b] = rank A
and (sufficiency)
rank [A b] = rank A =⇒ (∃x)Ax = b.
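The rank test in this problem is easy to try numerically. A minimal sketch in Python, assuming NumPy; the helper name `solvable` is my own:

```python
import numpy as np

def solvable(A, b):
    # Ax = b has a solution iff rank [A b] = rank A
    Ab = np.column_stack([A, b])
    return np.linalg.matrix_rank(Ab) == np.linalg.matrix_rank(A)

A = np.array([[1.0, 2.0],
              [2.0, 4.0]])                    # rank 1
assert solvable(A, np.array([1.0, 2.0]))      # b lies in the column span
assert not solvable(A, np.array([1.0, 0.0]))  # b does not
```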
6. Write a logic statement that there is a unique solution to the equation Ax = b, i.e., there
exists a solution and it is unique.
7. Consider the polynomial p(s) = s3 + a2 s2 + a1 s + a0 , with real coefficients. Consider the two
conditions: 1) The roots of p(s) have negative real parts; 2) The coefficients ai are all positive.
Which condition is necessary for the other? Is it sufficient?
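One way to experiment with this problem numerically (the helper `hurwitz` and the two sample cubics are my own choices):

```python
import numpy as np

def hurwitz(coeffs):
    # True iff every root of the polynomial has negative real part
    return bool(np.all(np.real(np.roots(coeffs)) < 0))

assert hurwitz([1, 6, 11, 6])     # (s+1)(s+2)(s+3): roots in LHP, all a_i > 0
assert not hurwitz([1, 1, 1, 3])  # all a_i > 0, yet a complex pair lies in the RHP
```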
8. The set of integers, 0, ±1, ±2, ±3, . . . , is denoted Z. One way to say an integer is even is that
it is a multiple of 2. Thus, if x is an integer, the following says it is even:
(∃y ∈ Z) x = 2y.
Write the following statements in logic notation, using only the set Z:
9. Prove rigorously that, if m and n are coprime integers (their greatest common divisor equals
1), then every integer that is a multiple of m and a multiple of n is also a multiple of mn.
10. If a single-input, single-output plant has a transfer function with a right half-plane zero, there’s
a maximum achievable gain margin. Make this into a theorem statement.
Alice can pass ECE1655 and either she’s not brilliant or she doesn’t work hard.
(g) The statement
Alice can pass ECE1655 only if she’s brilliant and she works hard
is equivalent to
if either Alice is not brilliant or she doesn’t work hard, then she can’t pass
ECE1655.
(h) A necessary condition for the origin of ẋ = Ax to be stable is that all eigenvalues of A
satisfy Re λ ≤ 0.
(i) A sufficient condition for the origin of ẋ = Ax to be stable is that all eigenvalues of A
satisfy Re λ ≤ 0.
(j) A necessary condition for the origin of ẋ = Ax to be asymptotically stable is that all
eigenvalues of A satisfy Re λ < 0.
(k) A sufficient condition for the origin of ẋ = Ax to be asymptotically stable is that all
eigenvalues of A satisfy Re λ < 0.
(l) Every bounded input to the system with transfer function (s − 1)/(s2 − 1) results in a bounded output.
(m) Every nonzero bounded input to the system with transfer function 1/(s − 1) results in an unbounded output.
12. Many results in optimal control are in the form of providing a necessary condition for optimality, for example the maximum principle and the Hamilton-Jacobi-Bellman equation. This problem urges you to understand what necessary condition means.
(∃x)P (x)
is true means there exists an optimal solution. Discuss the meanings of the following
statements. Is any the right one, i.e., the one we want in a theorem statement about
optimality?
(b) Now consider the function f (u) = −u2 . Suppose P (x) means that f is maximized at the
point x and Q(x) means that the derivative of f equals zero at x. Which if any of the
three statements is true?
(c) Repeat for f (u) = −u3 .
13. The physicist’s final email is contradictory: Mathematics can’t describe physical reality, yet
Maxwell’s equations, which are mathematics, do. I believe physicists sometimes forget that
all of physics is based on models. The models of electromagnetics—the concepts of electric
and magnetic fields, charged particles and the forces on them, the differential equations and
boundary conditions, and so on—are abstract concepts and relationships that we use to help
us understand how we perceive phenomena. In fact all these concepts are mathematical: A
force is a vector-valued function of space and time, etc. We have used these models to build
technology, and in that sense the models have been wildly successful. But one, and especially
a control engineer, must not forget that models are not real.
Imagine a real battery connected to a small real DC motor sitting on some real lab bench
somewhere on the planet Earth. Write a brief essay that distinguishes between a model and
reality. In particular, answer this question: Does there exist a sequence of models, of ever
increasing complexity, that converges to reality?
Chapter 2
Linear Algebra
This is a background chapter on linear algebra: subspaces, matrix representations, linear matrix equations, and invariant subspaces. The material is from ECE557. If you took ECE410, then some of this material will be new.
To check whether a set of vectors {v1 , . . . , vk } is linearly independent, set up the equation
c1 v1 + · · · + ck vk = 0
and then try to solve for the ci ’s. The set is linearly independent iff the only solution is ci = 0 for every i.
The span of {v1 , . . . , vk }, denoted Span{v1 , . . . , vk }, is the set of all linear combinations of these
vectors.
A subspace V of Rn is a subset of Rn that is also a vector space in its own right. This is true
iff these two conditions hold: If x, y are in V, then so is x + y; if x is in V and c is a scalar, then
cx is in V. Thus V is closed under the operations of addition and scalar multiplication. In R3 the
subspaces are the lines through the origin, the planes through the origin, the whole of R3 , and the
set consisting of only the zero vector.
A basis for a subspace is a set of linearly independent vectors whose span equals the subspace.
The number of elements in a basis is the dimension of the subspace.
The rank of a matrix is the dimension of the span of its columns. This can be proved to equal
the dimension of the span of its rows.
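This fact is easy to spot-check numerically; the sample matrix is my own:

```python
import numpy as np

A = np.array([[1, 2, 3],
              [2, 4, 6],   # twice the first row
              [1, 0, 1]])
# column rank (rank of A) equals row rank (rank of A transpose)
assert np.linalg.matrix_rank(A) == np.linalg.matrix_rank(A.T) == 2
```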
The equation Ax = b has a solution iff b belongs to the span of the columns of A, equivalently
rank A = rank [A b].
When a solution exists, it is unique iff the columns of A are linearly independent, that is, the rank
of A equals its number of columns.
The inverse of a square matrix A is a matrix B such that BA = I. If this is true, then AB = I.
The inverse is unique and we write A−1 . A square matrix A is invertible iff its rank equals its
dimension (we say “A has full rank”); equivalently, its determinant is nonzero. The inverse equals
the adjoint divided by the determinant.
Let A be a real n × n matrix and consider the equation
Av = λv.
Here λ is a real or complex number and v is a nonzero real or complex vector; λ is an eigenvalue
and v a corresponding eigenvector. The eigenvalues of A are unique but the eigenvectors are not:
If v is an eigenvector, so is cv for any real number c ≠ 0. The spectrum of A, denoted σ(A), is its
set of eigenvalues. The spectrum consists of n numbers, in general complex, and they are equal to
the zeros of the characteristic polynomial det(sI − A).
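These definitions can be checked with NumPy's eigensolver; a sketch using the 2 × 2 matrix that appears in the worked examples later in this chapter:

```python
import numpy as np

A = np.array([[-1.0, 1.0],
              [2.0, -2.0]])
evals, evecs = np.linalg.eig(A)
# spectrum = zeros of det(sI - A) = s^2 + 3s = s(s + 3)
assert np.allclose(sorted(evals.real), [-3.0, 0.0])
for i in range(2):
    # each column of evecs satisfies A v = lambda v
    assert np.allclose(A @ evecs[:, i], evals[i] * evecs[:, i])
```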
[Figure: two masses M1 and M2 with positions x1 , x2 , coupled by a damper D.]
Take D = 1, M1 = 1, M2 = 1/2, x3 = ẋ1 , x4 = ẋ2 . You can derive that the model is ẋ = Ax, where

A = [ 0   0   1   0 ]
    [ 0   0   0   1 ]
    [ 0   0  −1   1 ]
    [ 0   0   2  −2 ] .
The equation Av = λv says that the action of A on an eigenvector is very simple—just multi-
plication by the eigenvalue. Likewise, the motion of x(t) starting at an eigenvector is very simple.
Lemma 2.1 If x(0) is an eigenvector v of A and λ the corresponding eigenvalue, then x(t) = eλt v.
Thus x(t) is an eigenvector too for every t.
Proof The initial-value problem
ẋ = Ax, x(0) = v
has a unique solution—this is from differential equation theory. So all we have to do is show that eλt v satisfies both the initial condition and the differential equation, for then eλt v must be the solution x(t). The initial condition is easy:
eλt v evaluated at t = 0 equals e0 v = v.
As for the differential equation,
d/dt (eλt v) = λeλt v = eλt (λv) = eλt Av = A(eλt v).
The result of the lemma extends to more than one eigenvalue. Let λ1 , . . . , λn be the eigenvalues
of A and let v1 , . . . , vn be corresponding eigenvectors. Suppose the initial state x(0) can be written
as a linear combination of the eigenvectors:
x(0) = c1 v1 + · · · + cn vn .
This is certainly possible for every x(0) if the eigenvectors are linearly independent. Then the solution satisfies
x(t) = c1 eλ1 t v1 + · · · + cn eλn t vn .
Example
A = [ −1   1 ]
    [  2  −2 ] ,   λ1 = 0, λ2 = −3, v1 = (1, 1), v2 = (−1, 2)
x(0) = c1 v1 + c2 v2
is equivalent to
x(0) = V c,
where V is the 2 × 2 matrix with columns v1 , v2 and c is the vector (c1 , c2 ). Solving gives c1 = c2 = 1/3. So
x(t) = (1/3) v1 + (1/3) e−3t v2 .
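The closed-form solution in this example can be checked against the matrix exponential. A sketch assuming SciPy's `expm`:

```python
import numpy as np
from scipy.linalg import expm

A = np.array([[-1.0, 1.0], [2.0, -2.0]])
v1 = np.array([1.0, 1.0])    # eigenvector for lambda1 = 0
v2 = np.array([-1.0, 2.0])   # eigenvector for lambda2 = -3
x0 = (v1 + v2) / 3           # the initial state with c1 = c2 = 1/3
for t in [0.0, 0.5, 2.0]:
    x_formula = v1 / 3 + np.exp(-3.0 * t) * v2 / 3
    assert np.allclose(expm(A * t) @ x0, x_formula)
```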
The case of complex eigenvalues is only a little complicated. If λ1 is a complex eigenvalue, some
other, say λ2 , is its complex conjugate: λ2 = λ1 . The two eigenvectors, v1 and v2 , can be taken to
be complex conjugates too (easy proof). Then if x(0) is real and we solve
x(0) = c1 v1 + c2 v2 ,
we’ll find that c1 , c2 are complex conjugates as well. Thus the equation will look like
x(t) = c1 eλ1 t v1 + c̄1 eλ̄1 t v̄1 ,
which is real, being the sum of a complex quantity and its conjugate.
Example
A = [ 0  −1 ]
    [ 1   0 ] ,   λ1 = j, λ2 = −j, v1 = (1, −j), v2 = (1, j)
Av1 = λ1 v1 , Av2 = λ2 v2 .
These two equations combine into the single matrix equation AV = V AJF , where V is the matrix with columns v1 , v2 and AJF is the diagonal matrix with λ1 , λ2 on the diagonal. The latter matrix is the Jordan form of A. It is unique up to reordering of the eigenvalues. The mapping A ↦ AJF = V −1 AV is called a similarity transformation. Example:
A = [ −1   1 ]
    [  2  −2 ] ,   V = [ 1  −1 ]
                       [ 1   2 ] ,   AJF = [ 0   0 ]
                                           [ 0  −3 ] .
Corresponding to the eigenvalue λ1 = 0 is the eigenvector v1 = (1, 1), the first column of V . All
other eigenvectors corresponding to λ1 have the form cv1 , c 6= 0. We call the subspace spanned by
v1 the eigenspace corresponding to λ1 . Likewise, λ2 = −3 has a one-dimensional eigenspace.
These results extend from n = 2 to general n. Note that in the preceding result we didn’t
actually need distinctness of the eigenvalues — only linear independence of the eigenvectors.
Theorem 2.1 The Jordan form of A is diagonal, i.e., A is diagonalizable by similarity transforma-
tion, iff A has n linearly independent eigenvectors. A sufficient condition is n distinct eigenvalues.
The great thing about diagonalization is that the equation ẋ = Ax can be transformed via
w = V −1 x into ẇ = AJF w, that is, n decoupled equations:
ẇi = λi wi , i = 1, . . . , n.
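The decoupling can be demonstrated numerically for the running 2 × 2 example; V's columns are its eigenvectors, and SciPy's `expm` supplies the exact solution for comparison:

```python
import numpy as np
from scipy.linalg import expm

A = np.array([[-1.0, 1.0], [2.0, -2.0]])
V = np.array([[1.0, -1.0], [1.0, 2.0]])   # columns are eigenvectors
AJF = np.linalg.inv(V) @ A @ V
assert np.allclose(AJF, np.diag([0.0, -3.0]))

# w = V^{-1} x obeys n decoupled scalar equations w_i' = lambda_i w_i
x0 = np.array([0.0, 1.0])
t = 1.3
w_t = np.exp(np.diag(AJF) * t) * np.linalg.solve(V, x0)
assert np.allclose(V @ w_t, expm(A * t) @ x0)
```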
Now we look at how to construct the Jordan form when there are not n linearly independent eigenvectors. We start with the case where A has only 0 as an eigenvalue.
Nilpotent matrices
Consider
[ 0  1  0 ]      [ 0  1  0 ]
[ 0  0  0 ] ,    [ 0  0  1 ] .      (2.1)
[ 0  0  0 ]      [ 0  0  0 ]
For both of these matrices, σ(A) = {0, 0, 0}. For the first matrix, the eigenspace Ker A is two-
dimensional and for the second matrix, one-dimensional. These are examples of nilpotent matrices:
A is nilpotent if Ak = 0 for some k ≥ 1. The following statements are equivalent:
1. A is nilpotent.
2. Every eigenvalue of A equals 0, i.e., σ(A) = {0, . . . , 0}.
3. An = 0, n being the dimension of A.
4. It is similar to a matrix of the form (2.1), where all elements are 0’s, except 0’s or 1’s on the first diagonal above the main one. This is called the Jordan form of the nilpotent matrix.
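A direct numerical test uses the standard fact that an n × n matrix is nilpotent iff An = 0; applied here to the two matrices in (2.1), with a helper name of my own:

```python
import numpy as np

def is_nilpotent(A):
    # standard fact: an n x n matrix A is nilpotent iff A^n = 0
    n = A.shape[0]
    return np.allclose(np.linalg.matrix_power(A, n), 0)

N1 = np.array([[0, 1, 0], [0, 0, 0], [0, 0, 0]])   # first matrix in (2.1)
N2 = np.array([[0, 1, 0], [0, 0, 1], [0, 0, 0]])   # second matrix in (2.1)
assert is_nilpotent(N1) and is_nilpotent(N2)
assert not is_nilpotent(np.eye(3))
```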
The rank of A is 3 and hence the kernel has dimension 2. We can compute that

A2 = [ 0  0  0  1  0 ]      A3 = [ 0  0  0   1   1 ]
     [ 0  0  0  0  1 ]           [ 0  0  0  −1  −1 ]
     [ 0  0  0  0  0 ]           [ 0  0  0   0   0 ]
     [ 0  0  0  0  0 ]           [ 0  0  0   0   0 ]
     [ 0  0  0  0  0 ] ,         [ 0  0  0   0   0 ] ,      A4 = 0.
v5 = (0, 0, 0, 0, 1).
Then take
v4 = Av5 , v3 = Av4 , v2 = Av3 ,
and complete a basis for Ker A with
v1 = (0, 0, 1, 0, 0).
In the basis {v1 , . . . , v5 } the matrix of A, its Jordan form, is

[ 0  0  0  0  0 ]
[ 0  0  1  0  0 ]
[ 0  0  0  1  0 ]
[ 0  0  0  0  1 ]
[ 0  0  0  0  0 ] .
In general, the Jordan form of a nilpotent matrix has 0 in each entry except possibly in the first
diagonal above the main diagonal which may have some 1s.
A nilpotent matrix has only the eigenvalue 0. Now consider a matrix A that has only one
eigenvalue, λ, i.e.,
det(sI − A) = (s − λ)n .
Take n = 3 to illustrate. The characteristic polynomial of A − λI is
det[rI − (A − λI)] = det[(r + λ)I − A] = (r + λ − λ)3 = r3 ,
i.e., A − λI has only the zero eigenvalue, and hence A − λI =: N , a nilpotent matrix. So the Jordan form of N must look like

[ 0  ?  0 ]
[ 0  0  ? ]
[ 0  0  0 ] ,

where each ? is 0 or 1.
To recap, if A has just one eigenvalue, λ, then its Jordan form is λI + N , where N is a nilpotent
matrix in Jordan form.
An extension of this analysis results in the Jordan form in general. Suppose A is n × n and
λ1 , . . . , λp are the distinct eigenvalues of A and m1 , . . . , mp are their multiplicities; that is, the
characteristic polynomial is
det(sI − A) = (s − λ1 )m1 (s − λ2 )m2 · · · (s − λp )mp .
Then A is similar to the block-diagonal matrix
AJF = diag(A1 , . . . , Ap ),
where Ai is mi × mi and it has only the eigenvalue λi . Thus Ai has the form λi I + Ni , where Ni is
a nilpotent matrix in Jordan form. Example:
A = [ 0   0   1   0 ]
    [ 0   0   0   1 ]
    [ 0   0  −1   1 ]
    [ 0   0   2  −2 ] .
As we saw, the spectrum is σ(A) = {0, 0, 0, −3}. Thus the Jordan form must be of the form
AJF = [ 0  ?  0   0 ]
      [ 0  0  ?   0 ]
      [ 0  0  0   0 ]
      [ 0  0  0  −3 ] .
Since A has rank 2, so does AJF . Thus only one of the stars is 1. Either is possible, for example,
AJF = [ 0  0  0   0 ]
      [ 0  0  1   0 ]
      [ 0  0  0   0 ]
      [ 0  0  0  −3 ] .
This has two “Jordan blocks”:
AJF = [ A1  0  ]
      [ 0   A2 ] ,   A1 = [ 0  0  0 ]
                          [ 0  0  1 ]
                          [ 0  0  0 ] ,   A2 = −3.
AV = V AJF
implies
A2 V = AV AJF = V A2JF .
By induction,
Ak V = V AkJF ,
and then
eAt V = V eAJF t ,
so finally
eAt = V eAJF t V −1 .
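This identity is easy to verify numerically. A sketch with an assumed 2 × 2 Jordan block J = 2I + N and an arbitrary invertible V of my choosing:

```python
import numpy as np
from scipy.linalg import expm

J = np.array([[2.0, 1.0], [0.0, 2.0]])   # Jordan block: 2I + N, N nilpotent
V = np.array([[1.0, 1.0], [1.0, 2.0]])   # any invertible matrix
A = V @ J @ np.linalg.inv(V)             # A is similar to J

t = 0.7
eJt = np.exp(2.0 * t) * np.array([[1.0, t], [0.0, 1.0]])  # e^{Jt} = e^{2t}(I + tN)
assert np.allclose(expm(J * t), eJt)
assert np.allclose(expm(A * t), V @ eJt @ np.linalg.inv(V))
```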
The matrix exponential eAJF t is easy to write down. For example, suppose there’s just one eigen-
value, so AJF = λI + N , N nilpotent, n × n. Then, since λI and N commute,
eAJF t = eλt eN t = eλt (I + tN + · · · + tn−1 N n−1 /(n − 1)!),
the series terminating because N n = 0.
Another way to get eAt is via Laplace transforms. Transforming
ẋ = Ax, x(0) = x0
gives
sX(s) − x0 = AX(s).
This yields
X(s) = (sI − A)−1 x0 .
Comparing with
x(t) = etA x0
shows that etA , (sI − A)−1 are Laplace transform pairs. So one can get etA by finding the matrix (sI − A)−1 and then taking the inverse Laplace transform of each element.
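For a small nilpotent example the pairing can be spot-checked: with the A below, etA = I + tA, whose entrywise Laplace transform is [[1/s, 1/s2 ], [0, 1/s]]. The sample values of s are arbitrary choices of mine:

```python
import numpy as np

A = np.array([[0.0, 1.0],
              [0.0, 0.0]])
# e^{tA} = I + tA = [[1, t], [0, 1]]; its entrywise Laplace transform
# [[1/s, 1/s^2], [0, 1/s]] should equal the resolvent (sI - A)^{-1}
for s in [0.5, 1.0, 3.0]:
    R = np.linalg.inv(s * np.eye(2) - A)
    assert np.allclose(R, [[1 / s, 1 / s**2], [0.0, 1 / s]])
```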
2.5 Stability
The concept of stability is fundamental in control engineering. Here we look at the scenario where
the system has no input, but its state has been perturbed and we want to know if the system will
recover.
The maglev example is a good one to illustrate this point. Suppose a feedback controller has
been designed to balance the ball’s position at 1 cm below the magnet. Suppose if the ball is placed
at precisely 1 cm it will stay there; that is, the 1 cm location is a closed-loop equilibrium point.
Finally, suppose there is a temporary wind gust that moves the ball away from the 1 cm position.
The stability questions are, will the ball move back to the 1 cm location; if not, will it at least stay
near that location?
So consider
ẋ = Ax.
Obviously if x(0) = 0, then x(t) = 0 for all t. We say the origin is an equilibrium point—if you
start there, you stay there. Equilibrium points can be stable or not. While there are more elaborate
and formal definitions of stability for the above homogeneous system, we choose the following two:
The origin is asymptotically stable if x(t) −→ 0 as t −→ ∞ for all x(0). The origin is stable
if x(t) remains bounded as t −→ ∞ for all x(0). Since x(t) = eAt x(0), the origin is asymptotically
stable iff every element of the matrix eAt converges to zero, and is stable iff every element of the
matrix eAt remains bounded as t −→ ∞. Of course, asymptotic stability implies stability.
Asymptotic stability is relatively easy to characterize. Using the Jordan form, one can prove this very important result, where Re denotes “real part”:
Theorem 2.2 The origin is asymptotically stable iff the eigenvalues of A all satisfy Re λ < 0.
Let’s say the matrix A is stable if its eigenvalues satisfy Re λ < 0. Then the origin is asymptotically stable iff A is stable.
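Theorem 2.2 translates into a one-line numerical test; the helper name and sample matrices are my own:

```python
import numpy as np

def asymptotically_stable(A):
    # Theorem 2.2: all eigenvalues of A must satisfy Re(lambda) < 0
    return bool(np.all(np.linalg.eigvals(A).real < 0))

assert asymptotically_stable(np.array([[-1.0, 1.0], [0.0, -2.0]]))
# the running example has eigenvalues 0 and -3: not asymptotically stable
assert not asymptotically_stable(np.array([[-1.0, 1.0], [2.0, -2.0]]))
```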
Now we turn to the more subtle property of stability. We’ll do some examples, and we may as
well have A in Jordan form.
Consider the nilpotent matrix
A = N = [ 0  0 ]
        [ 0  0 ] .
Obviously, x(t) = x(0) for all t and so the origin is stable. By contrast, consider
A = N = [ 0  1 ]
        [ 0  0 ] .
Then
eN t = I + tN,
which is unbounded and so the origin is not stable. This example extends to the n × n case: If A
is nilpotent, the origin is stable iff A = 0.
Here’s the test for stability in general in terms of the Jordan form of A:
AJF = diag(A1 , . . . , Ap ).
Recall that each Ai has just one eigenvalue, λi , and that Ai = λi I + Ni , where Ni is a nilpotent
matrix in Jordan form.
Theorem 2.3 The origin is stable iff the eigenvalues of A all satisfy Re λ ≤ 0 and, for every eigenvalue with Re λi = 0, the nilpotent matrix Ni is zero, i.e., Ai is diagonal.
The origin is stable since there are two 1 × 1 Jordan blocks. Now consider
A = [ 0  −1  1   0 ]
    [ 1   0  0   1 ]
    [ 0   0  0  −1 ]
    [ 0   0  1   0 ] .
The eigenvalues are j, j, −j, −j and so the Jordan form must look like
AJF = [ j  ?   0   0 ]
      [ 0  j   0   0 ]
      [ 0  0  −j   ? ]
      [ 0  0   0  −j ] .
Since the rank of A − jI equals 3, the upper star is 1; since the rank of A + jI equals 3, the lower
star is 1. Thus
AJF = [ j  1   0   0 ]
      [ 0  j   0   0 ]
      [ 0  0  −j   1 ]
      [ 0  0   0  −j ] .
Since the Jordan blocks are not diagonal, the origin is not stable.
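The instability can also be seen numerically: for this A the norm of eAt grows without bound (roughly linearly in t), even though every eigenvalue has zero real part. A sketch assuming SciPy:

```python
import numpy as np
from scipy.linalg import expm

A = np.array([[0.0, -1.0, 1.0,  0.0],
              [1.0,  0.0, 0.0,  1.0],
              [0.0,  0.0, 0.0, -1.0],
              [0.0,  0.0, 1.0,  0.0]])
# eigenvalues are +-j (each twice) but the Jordan blocks are not diagonal,
# so the norm of e^{At} grows with t: the origin is not stable
norms = [np.linalg.norm(expm(A * t)) for t in (1.0, 10.0, 100.0)]
assert norms[0] < norms[1] < norms[2]
```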
[Figure: mass M attached to a spring K and damper D.]
The equation is
M ÿ + Dẏ + Ky = 0.
Example Two points move on the line R. The positions of the points are x1 , x2 . They move toward
each other according to the control laws
ẋ1 = x2 − x1 , ẋ2 = x1 − x2 .
The eigenvalues are λ1 = 0, λ2 = −2, so the origin is stable but not asymptotically stable. Obviously,
the two points tend toward each other; that is, the state x(t) tends toward the subspace
V = {x : x1 = x2 }.
This is the eigenspace for the zero eigenvalue. To see this convergence, write the initial condition
as a linear combination of eigenvectors:
x(0) = c1 v1 + c2 v2 , v1 = (1, 1), v2 = (−1, 1).
Then
x(t) = c1 v1 + c2 e−2t v2 −→ c1 v1 ∈ V as t −→ ∞.
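This rendezvous behaviour is easy to simulate; the initial condition is an arbitrary choice of mine:

```python
import numpy as np
from scipy.linalg import expm

A = np.array([[-1.0, 1.0],
              [1.0, -1.0]])          # x1' = x2 - x1, x2' = x1 - x2
x0 = np.array([3.0, -1.0])
x_late = expm(A * 50.0) @ x0
# the state converges to the eigenspace {x : x1 = x2}; the common value
# is the average of the initial positions (the c1 v1 term survives)
assert np.allclose(x_late, [1.0, 1.0], atol=1e-6)
```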
2.6 Subspaces
Let X = Rn and let V, W be subspaces of X . Then V + W denotes the set
{v + w : v ∈ V, w ∈ W},
and it is a subspace of X . The set union V ∪ W is not a subspace in general unless one is contained
in the other. The intersection V ∩ W is however a subspace. As an example:
X = R3 , V a line, W a plane.
There is a useful dimension equation:
dim(V + W) = dim V + dim W − dim(V ∩ W).
For example, think of V, W as two planes in R3 that intersect in a line. Then the dimension equation evaluates to
3 = 2 + 2 − 1.
Two subspaces V, W are independent if V ∩ W = 0. This is not the same as being orthogonal. For example, two lines in R2 are independent iff they are not collinear (i.e., the angle between them is not 0), while they are orthogonal iff the angle is 90◦ .
Every vector x in V + W can be written as
x = v + w, v ∈ V, w ∈ W.
If V, W are independent, then v, w are unique. Think of v as the component of x in V and w as its
component in W. Let’s prove uniqueness. Suppose
x = v + w = v 1 + w1 .
Then
v − v1 = w1 − w.
The left-hand side is in V and the right-hand side in W. Since the intersection of these two subspaces
is zero, both sides equal 0.
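The unique decomposition can be computed by solving a linear system whose columns are bases of V and W. A sketch with subspaces of my own choosing:

```python
import numpy as np

# V = span{(1,0,0), (0,1,0)} (a plane), W = span{(1,1,1)} (a line not in V):
# independent subspaces, so every x = v + w with v, w unique
B = np.array([[1.0, 0.0, 1.0],
              [0.0, 1.0, 1.0],
              [0.0, 0.0, 1.0]])      # columns: basis of V, then basis of W
x = np.array([2.0, 5.0, 4.0])
c = np.linalg.solve(B, x)            # unique since the columns are independent
v = B[:, :2] @ c[:2]                 # component of x in V
w = c[2] * B[:, 2]                   # component of x in W
assert np.allclose(v + w, x)
assert np.allclose(w, [4.0, 4.0, 4.0])
```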
Clearly, V, W are independent iff the only way to write 0 = v + w with v ∈ V, w ∈ W is v = w = 0. The concept extends to more than two subspaces: U, V, W are independent if every x in U + V + W has a unique representation
x = u + v + w, u ∈ U, v ∈ V, w ∈ W.
If V, W are independent subspaces, we write their sum as V ⊕ W. This is called a direct sum.
Likewise for more than two.
Let’s finish this section with a handy fact: Every subspace has an independent complement, i.e.,
V⊂X =⇒ (∃W ⊂ X ) X = V ⊕ W.
Think of X as R3 and V as a plane. Then W can be any line not in the plane.
(Footnote: In this chapter when we speak of lines we mean lines through 0. Similarly for planes.)
A linear transformation (LT) A : X → Y is a function satisfying
A(x + y) = Ax + Ay, x, y ∈ X ,
A(ax) = aAx, a ∈ R, x ∈ X .
It is an important fact that an LT is uniquely determined by its action on a basis. That is, if
A : X → Y is an LT and if {e1 , . . . , en } is a basis for X , then if we know the vectors Aei , we can
compute Ax for every x ∈ X , by linearity.
Example For us, the most important example is an LT generated by a matrix. Let A ∈ Rm×n .
For each vector x in Rn , Ax is a vector in Rm . The mapping x 7→ Ax is an LT A : Rn → Rm .
Linearity is easy to check.
Example Take a vector in the plane and rotate it counterclockwise by 90◦ . This defines an LT
A : R2 → R2 . Note that A is not given as a matrix; it’s given by its domain, its co-domain, and its
action on vectors. If we take a vector to be represented by its Cartesian coordinates, x = (x1 , x2 ),
then we’ve chosen a basis for R2 . In that case A maps x = (x1 , x2 ) to Ax = (−x2 , x1 ), and so
there’s an associated rotation matrix
[ 0  −1 ]
[ 1   0 ] .
We’ll return to matrix representation later.
Example Let X = Rn and let {e1 , . . . , en } be a basis. Every vector x in X has a unique expansion
x = a1 e1 + · · · + an en , ai ∈ R.
Let a denote the vector (a1 , . . . , an ), the n-tuple of coordinates of x with respect to the basis.
The function x 7−→ a defines an LT Q : X → Rn . The equation
x = a1 e1 + · · · + an en
can be written compactly as x = Ea, where E is the matrix with columns e1 , . . . , en and a is the
vector with components a1 , . . . , an . Therefore a = E −1 x and so Qx = E −1 x, that is, the action of
Q is to multiply by the matrix E −1 .
For example, let X = R2 . Take the natural basis
e1 = (1, 0), e2 = (0, 1).
In this case E = I and Qx = x. If the basis instead is
e1 = (1, 1), e2 = (−1, 1),
then
E = [ 1  −1 ]
    [ 1   1 ]
and Qx = E −1 x.
Every LT on finite-dimensional vector spaces has a matrix representation. Let’s do this very
important construction carefully. Let A be an LT X → Y, let {e1 , . . . , en } be a basis for X and {f1 , . . . , fp } a basis for Y, and let
Q : X → Rn , R : Y → Rp
be the corresponding coordinate LTs:

  X --A--> Y
  |        |
  Q        R
  v        v
  Rn       Rp

The left downward arrow gives us the n-tuple, say a, that represents a vector x in the basis {e1 , . . . , en }. The right downward arrow gives us the p-tuple, say b, that represents a vector y in the basis {f1 , . . . , fp }. It’s possible to add a fourth LT to complete the square:
  X --A--> Y
  |        |
  Q        R
  v        v
  Rn -M--> Rp
This is called a commutative diagram. The object M in the diagram is the matrix representation
of A with respect to these two bases. Notice that the bottom arrow represents the LT generated
by the matrix M ; we write M in the diagram for simplicity, but you should understand that really
the object is an LT. The matrix M is the p × n matrix that makes the diagram commute, that is, for every x ∈ X
M Qx = RAx.
Writing a = Qx and b = RAx, this says M a = b.
In particular, take x = ei , the ith basis vector in X . Then a is the n-vector with 1 in the ith entry
and 0 otherwise. So M a equals the ith column of the matrix M . Thus, we have the following recipe
for constructing the matrix M :
1. Take the 1st basis vector e1 of X .
2. Apply A to it to get Ae1 .
3. Express Ae1 in the basis {f1 , . . . , fp }; the vector of coefficients is the 1st column of M .
4. Repeat for e2 , . . . , en to get the remaining columns.
Recall that Q is the LT generated by E −1 , where the columns of E are the basis in the domain of
A. Likewise, R is the LT generated by F −1 , where the columns of F are the basis in the co-domain
of A. Thus the equation M a = b reads
M E −1 x = F −1 Ax. (2.3)
Example Let A : R2 → R2 be the LT that rotates a vector counterclockwise by 90◦ . Let’s first
take the standard bases: e1 = (1, 0), e2 = (0, 1) for the domain and f1 = (1, 0), f2 = (0, 1) for the
co-domain. Following the steps we first apply A to e1 , that is, we rotate e1 counterclockwise by
90◦ ; the result is Ae1 = (0, 1). Then we express this vector in the basis {f1 , f2 }:
Ae1 = 0 × f1 + 1 × f2 .
Thus the first column of M is (0, 1), the vector of coefficients. Now for the second column, rotate
e2 to get (−1, 0) and represent this in the basis {f1 , f2 }:
Ae2 = −1 × f1 + 0 × f2 .
Thus the second column of M is (−1, 0), and M is the rotation matrix found before. Now take different bases: e1 = (1, 1), e2 = (−1, 2) for the domain and f1 = (1, 2), f2 = (1, 0) for the co-domain. Apply the recipe again. Get Ae1 = (−1, 1). Expand it in the basis {f1 , f2 }:
(−1, 1) = (1/2) f1 − (3/2) f2 .
Get Ae2 = (−2, −1). Expand it in the basis {f1 , f2 }:
(−2, −1) = −(1/2) f1 − (3/2) f2 .
Thus
M = [  1/2  −1/2 ]
    [ −3/2  −3/2 ] .
Example Let A ∈ Rm×n and let A : Rn −→ Rm be the generated LT. It is easy to check that
A itself is then the matrix representation of A with respect to the standard bases. Let’s do it.
Let {e1 , . . . , en } be the standard basis on Rn and {f1 , . . . , fm } the standard basis on Rm . Then
Ae1 = Ae1 equals the first column, (a11 , a21 , . . . , am1 ), of A. This column can be written as
a11 f1 + · · · + am1 fm ,
and hence (a11 , a21 , . . . , am1 ) is the first column of the matrix representation of A.
Suppose instead that we have general bases, {e1 , . . . , en } on Rn and {f1 , . . . , fm } on Rm . Form
the matrices E and F from these basis vectors. From (2.3) we get that the matrix representation
M with respect to these bases satisfies
M E −1 = F −1 A,
or equivalently
AE = F M.
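The formula M = F −1 AE can be exercised on the rotation LT. The standard bases give back A itself; the nonstandard bases below are an assumed choice of mine, picked to be consistent with the numbers in the worked example:

```python
import numpy as np

A = np.array([[0.0, -1.0],
              [1.0,  0.0]])          # rotation by 90 degrees

# standard bases: E = F = I, so M = A itself
assert np.allclose(np.linalg.inv(np.eye(2)) @ A @ np.eye(2), A)

# nonstandard bases (assumed, consistent with the worked example's numbers)
E = np.array([[1.0, -1.0], [1.0, 2.0]])   # domain basis vectors as columns
F = np.array([[1.0, 1.0], [2.0, 0.0]])    # co-domain basis vectors as columns
M = np.linalg.inv(F) @ A @ E
assert np.allclose(M, [[0.5, -0.5], [-1.5, -1.5]])
```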
A very interesting special case of this is where A is square and the same basis {e1 , . . . , en } is
taken for both the domain and co-domain. Then
AE = EM, i.e., M = E −1 AE,
a similarity transformation.
An LT has two important associated subspaces. Let A : X → Y be an LT. The kernel (or
nullspace) of A is the subspace of X on which A is zero:
Ker A := {x : Ax = 0}.
The image (or range) of A is the subspace of Y of vectors of the form Ax:
Im A := {y : (∃x ∈ X ) y = Ax}.
Example Let A : R3 −→ R3 map a vector to its projection on the horizontal plane. Then the kernel
equals the vertical axis, the image equals the horizontal plane, A is neither onto nor one-to-one,
and its matrix with respect to the standard basis is
[ 1 0 0 ]
[ 0 1 0 ]
[ 0 0 0 ] .
We could modify the co-domain to have A : R3 −→ R2 , again mapping a vector to its projection
on the horizontal plane. Then the kernel equals the vertical axis, the image equals the horizontal
plane, A is onto but not one-to-one, and its matrix with respect to the standard basis is
[ 1 0 0 ]
[ 0 1 0 ] .
Take a basis {e1 , . . . , ek } for a subspace V and extend it to a basis {e1 , . . . , ek , . . . , en } for X .
Form the matrix V whose columns are e1 , . . . , ek . Clearly, rank V = k.
Example Let X be 3-dimensional space, V a plane (2-dimensional subspace), and W a line not in
V. Then V, W are independent subspaces and
X = V ⊕ W.
Let P : X → X denote the projection onto V along W. Then Im P = V, Ker P = W.
Suppose {e1 , e2 } is a basis for V, {e3 } a basis for W. The induced matrix representation is

P = [ 1 0 0 ]
    [ 0 1 0 ]
    [ 0 0 0 ] .
{e1 , . . . , ek , . . . , en } for X .
Then
{Ae1 , . . . , Aek }
Ax = b, A ∈ Rn×m , x ∈ Rm , b ∈ Rn .
The equation is another way of saying b is a linear combination of the columns of A. Thus the
equation has a solution iff b ∈ column span of A, i.e., b ∈ ImA. Then the solution is unique iff rank
A = m, i.e., Ker A = 0.
These results extend to the matrix equation AX = B. In this section we study this and similar
equations. We could work with LTs but we'll use matrices instead.

The first equation is AX = I. Such an X is called a right-inverse of A.
Lemma 2.2 A ∈ Rn×m has a right-inverse iff it's onto, i.e., the rank of A equals n.

Proof If AX = I, then for every y,

AXy = y,

so A is onto. Conversely, suppose A is onto. Then for each standard basis vector fi of Rn there
exists xi such that Axi = fi . Now define X to be the matrix whose ith column is xi , i.e., via
Xfi = xi . Then AXfi = fi for every i. This implies AX = I.
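The construction in the proof can be carried out concretely. A small Python sketch with an assumed full-row-rank example matrix A (not from the text):

```python
# Right-inverse of an onto matrix A in R^{2x3} (rank 2 = number of rows).
# Per the lemma's construction, column i of X is any solution of A x = f_i.

A = [[1, 0, 1],
     [0, 1, 1]]

# A x = f1 = (1, 0) is solved by x1 = (1, 0, 0);
# A x = f2 = (0, 1) is solved by x2 = (0, 1, 0).
# (Solutions are not unique: A has a nontrivial kernel.)
X = [[1, 0],
     [0, 1],
     [0, 0]]

def matmul(P, Q):
    rows, inner, cols = len(P), len(Q), len(Q[0])
    return [[sum(P[i][k] * Q[k][j] for k in range(inner)) for j in range(cols)]
            for i in range(rows)]

AX = matmul(A, X)   # equals the 2x2 identity, so X is a right-inverse
```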
Lemma 2.3 A ∈ Rn×m has a left-inverse iff it’s one-to-one, i.e., A has rank m.
2. There exists X such that XA = B iff Ker A ⊂ Ker B, that is,

rank A = rank [ A ]
              [ B ] ,

where the matrix on the right is A stacked on top of B.
and let A : R2 → R2 be the generated LT. Clearly, Ker A is the 1-dimensional subspace spanned
by (1, −1). Also,
x ∈ Ker A ⇒ Ax = 0 ∈ Ker A,
or equivalently,
AKer A ⊂ Ker A.
Take a basis {e1 , . . . , ek } for V and extend it to a basis {e1 , . . . , ek , . . . , en } for X . Notice that,
in the induced matrix representation, the lower-left block of A equals zero; this is because V is
A-invariant.
Example Let X = R3 , let V be the (x1 , x2 )-plane, and let A : X → X be the LT that rotates a
vector 90◦ about the x3 -axis using the right-hand rule. Thus V is A-invariant. Let us take the bases
e1 = (1, 0, 0), e2 = (0, 1, 0) for V,

e1 , e2 , e3 = (1, 1, 1) for X .

The matrix representation of A with respect to the latter basis is

A = [ 0 −1 −2 ]
    [ 1  0  0 ]
    [ 0  0  1 ] .
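The claimed representation can be verified without inverting E by checking the defining relation AE = EM; the standard-coordinate matrix of this rotation is [[0, −1, 0], [1, 0, 0], [0, 0, 1]]. A Python sketch:

```python
# Check the representation M of the 90-degree rotation about the x3-axis
# with respect to the basis e1, e2, e3 of the example, via A E = E M.

A = [[0, -1, 0],
     [1,  0, 0],
     [0,  0, 1]]          # rotation in standard coordinates

E = [[1, 0, 1],
     [0, 1, 1],
     [0, 0, 1]]           # columns are e1, e2, e3

M = [[0, -1, -2],
     [1,  0,  0],
     [0,  0,  1]]         # claimed matrix representation

def matmul(P, Q):
    n, m = len(P), len(Q[0])
    return [[sum(P[i][k] * Q[k][j] for k in range(len(Q))) for j in range(m)]
            for i in range(n)]

lhs = matmul(A, E)        # A E
rhs = matmul(E, M)        # E M; equality confirms M represents A
```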
Lemma 2.5 The subspace Im V is A-invariant iff the linear equation AV = V A1 has a solution
A1 .
2.10 Problems
1. Are the following vectors linearly independent?
2. Continuing with the same vectors, find a basis for Span {v1 , v2 , v3 }.
4. Show that e^{A+B} = e^A e^B does not imply that A and B commute, but e^{(A+B)t} = e^{At} e^{Bt} for all t does.
Assume
6. Take

A = [ 0 0  1  0 ]
    [ 0 0  0  1 ]
    [ 0 0 −1  1 ]
    [ 0 0  2 −2 ] .
(The general construction of the basis for the Jordan form is along these lines.)
7. Let

A = [  0 1 0 0 ]
    [  0 0 1 0 ]
    [  0 0 0 1 ]
    [ −2 1 0 2 ] .
8. Consider

A = [  σ ω ]
    [ −ω σ ] ,

where σ and ω ≠ 0 are real. Find the Jordan form and the transition matrix.
its transition matrix is easy to write down. This problem demonstrates that a matrix with
distinct complex eigenvalues can be transformed into the above form using a nonsingular
transformation. Let
A = [ −1 −4 ]
    [  1 −1 ] .
Determine the eigenvalues and eigenvectors of A, noting that they form complex conjugate
pairs. Let the first eigenvalue be written as a + jb with the corresponding eigenvector v1 + jv2 .
Take v1 and v2 as the columns of a matrix V . Find V −1 AV .
11. Show that the origin is asymptotically stable for ẋ = Ax iff all poles of every element of
(sI − A)−1 are in the open left half-plane. Show that the origin is stable iff all poles of every
element of (sI − A)−1 are in the closed left half-plane and those on the imaginary axis have
multiplicity 1.
13. (a) Suppose that σ(A) = {−1, −3, −3, −1 + j2, −1 − j2} and the rank of (A − λI)λ=−3 is 4.
Determine AJF .
(b) Suppose that σ(A) = {−1, −2, −2, −2} and the rank of (A − λI)λ=−2 is 3. Determine
AJF .
(c) Suppose that σ(A) = {−1, −2, −2, −2, −3} and the rank of (A − λI)λ=−2 is 3. Determine
AJF .
15. Summarize all the ways to find exp(At). Then find exp(At) for
A = [ 1 1 0 ]
    [ 0 1 1 ]
    [ 0 0 2 ] .
ẋ1 = −x2
ẋ2 = x1 − 3x2
Do a phase portrait using Scilab or MATLAB. Interpret the phase portrait in terms of the
modal decomposition of the system. Do lots more examples of this type.
18. Prove the following facts about subspaces:
(a) V + V = V
Hint: You have to show V +V ⊂ V and V ⊂ V +V. Similarly for other subspace equalities.
(b) If V ⊂ W, then V + W = W.
(c) If V ⊂ W, then W ∩ (V + T ) = V + W ∩ T .
19. Show that W ∩(V +T ) = W ∩V +W ∩T is false in general by giving an explicit counterexample.
20. Let A be the identity LT on R2 . Take

{(1, 1), (1, −1)} = basis for domain, {(2, 0), (−1, 3)} = basis for co-domain.

Find the matrix A.
21. Let A denote the LT R4 → R5 with the action

(x1 , x2 , x3 , x4 ) ↦ (x4 , 0, 2x4 , x2 + x3 + 2x4 , x2 + x3 ).

Find bases for R4 and R5 so that the matrix representation is

A = [ I 0 ]
    [ 0 0 ] .
22. Let A be an LT. Show that if {Ae1 , . . . , Aen } is linearly independent, so is {e1 , . . . , en }. Give
an example where the converse is false.
For simplicity, take RC = 1. From circuit theory, we know that y(t) belongs to X too. (This
is steady-state analysis; transient response is neglected.) So the mapping from x(t) to y(t)
defines a linear transformation A : X −→ X . Find the matrix representation of A with
respect to the given basis.
25. Consider the vector space R3 . Let x1 , x2 , and x3 denote the components of a vector x in R3 .
Now let V denote the subspace of R3 of all vectors x where
x1 + x2 − x3 = 0,
2x1 − 3x3 = 0.
28. For a square matrix X, let diagX denote the vector formed from the elements on the diagonal
of X.
Let A : Rn×n −→ Rn be the LT defined by
A : X 7→ diagX.
For each matrix, find its rank, a basis for its image, and a basis for its kernel.
32. You are given the n eigenvalues of a matrix in Rn×n . Can you determine the rank of the
matrix? If not, can you give bounds on the rank?
33. Suppose that A ∈ Rm×n and B ∈ Rn×m with m ≤ n and rank A = rank B = m. Find a
necessary and sufficient condition that AB be invertible.
34. Let A be an LT from X to X , a finite-dimensional vector space. Fix a basis for X and let A
denote the matrix representation of A with respect to this basis. Show that A2 is the matrix
representation of A2 .
A^T Ax = A^T y,

x = (A^T A)^{-1} A^T y.
36. Let L denote the line in the plane that passes through the origin and makes an angle +π/6
radians with the positive x-axis. Let A : R2 → R2 be the LT that maps a vector to its
reflection about L.
37. Fix a vector v ≠ 0 in R3 and consider the LT A : R3 → R3 that maps x to the cross product
v × x.
38. Preamble. This problem requires some notation. Suppose f (x, y) is a function of two real
variables, say, f (x, y) = x2 + xy. Then f (·, y) denotes the function x 7→ f (x, y) where y is
temporarily held constant; that is, x2 +xy considered as a function of x alone, with y constant.
So for each y, f (·, y) is a function, indicated by the dot acting as a placemarker. But then
y 7→ f (·, y) is another function, namely, it maps y to the function of x given by x2 + xy.
Therefore, f (·, y) is a function that maps a real number to a real number, whereas y 7→ f (·, y)
is a function that maps a real number to a function.
Another example where this situation comes up is a cart of mass M , input force u, and
output position y. Then y is a function of M and u; let’s write y = G(M, u). Then G(M, ·)
is the input-output map for a given M , and M 7→ G(M, ·) is the map from the mass to the
input-output system.
The problem Fix the n × n matrix A and consider the equation
ẋ = Ax, x(0) = x0 .
As you know, the state at time t starting from x0 at time 0 is x(t, x0 ) = eAt x0 .
(a) What are the domain and co-domain of the map (t, x0 ) ↦ x(t, x0 )?

(b) What are the domain and co-domain of x(·, x0 )? Of x(t, ·)?
(c) Is the map x0 7→ x(·, x0 ) a linear transformation? Prove true or give a counterexample.
(d) Is the map t 7→ x(t, ·) a linear transformation? Prove true or give a counterexample.
Chapter 3
Calculus
In this chapter we review some calculus, including the method of Lagrange multipliers.
3.1 Jacobians
Suppose f : R −→ R is a function of class C 2 , twice continuously differentiable. The Taylor series
expansion of f at x is
f (x + ε) = f (x) + (df /dx)(x) ε + (1/2!)(d^2 f /dx^2 )(x) ε^2 + (1/3!)(d^3 f /dx^3 )(x) ε^3 + · · · .
This extends to a function f : Rn −→ R. Thus, in the expression f (x), x is a vector with n
components and f (x) is a scalar. The Jacobian of f , denoted fx , is the 1 × n matrix (row vector)
whose j th element is ∂f /∂xj . We shall write the transpose of fx as ∇f , the gradient of f . Thus
∇f is a column vector.
Another way to think of the Jacobian is via the directional derivative. Let x and h be vectors
and ε a scalar. Consider f (x + εh) as a function of ε and think of its Taylor series at 0:
f (x + εh) = f (x) + ε (d/dε) f (x + εh)|_{ε=0} + (ε^2 /2) (d^2 /dε^2 ) f (x + εh)|_{ε=0} + · · · .

Thus

(d/dε) f (x + εh)|_{ε=0} = fx (x)h.
CHAPTER 3. CALCULUS 37
Now
(d^2 /dε^2 ) f (x + εh) = (d/dε) [fx (x + εh)h].

In more manageable terms, the right-hand side is (with the argument dropped and by use of the
chain rule again)

(∂/∂x) [ (∂f /∂x1 )h1 + · · · + (∂f /∂xn )hn ] h.

This in turn equals

h^T fxx (x + εh) h,

where fxx is the matrix whose ij th element is

∂^2 f / ∂xi ∂xj .

Therefore

f (x + εh) = f (x) + ε fx (x)h + (ε^2 /2) h^T fxx (x)h + · · · .
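This expansion is easy to sanity-check numerically. A Python sketch with an assumed sample function f (x1 , x2 ) = x1^3 + x1 x2^2 (not from the text), whose Jacobian and Hessian are entered by hand:

```python
# Numerical check of the second-order expansion
#   f(x + eps*h) ~ f(x) + eps*fx(x)h + (eps^2/2) h^T fxx(x) h
# for an assumed sample function f(x1, x2) = x1^3 + x1*x2^2.

def f(x1, x2):
    return x1**3 + x1 * x2**2

def fx(x1, x2):
    # Jacobian of f (a row vector)
    return [3 * x1**2 + x2**2, 2 * x1 * x2]

def fxx(x1, x2):
    # Hessian of f
    return [[6 * x1, 2 * x2],
            [2 * x2, 2 * x1]]

x, h, eps = (1.0, 2.0), (1.0, 1.0), 1e-2

exact = f(x[0] + eps * h[0], x[1] + eps * h[1])
J = fx(*x)
H = fxx(*x)
quad = sum(h[i] * H[i][j] * h[j] for i in range(2) for j in range(2))
taylor2 = f(*x) + eps * (J[0] * h[0] + J[1] * h[1]) + 0.5 * eps**2 * quad
# Since this f is cubic, the discrepancy is exactly the third-order
# term, here 2*eps^3.
```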
This generalizes to a function f : Rn −→ Rm . Thus, in the expression f (x), x is a vector with
n components and f (x) is a vector with m components. So we can write
x = (x1 , . . . , xn ), f = (f1 , . . . , fm ).
The Jacobian of f , still denoted fx , is the m × n matrix whose ij th element is ∂fi /∂xj . The
derivative of f at the point x in the direction of the vector h is defined to be
(d/dε) f (x + εh)|_{ε=0} .
This turns out to be a linear function of the vector h, and it must therefore equal M h for some
matrix M . In fact, M equals the Jacobian of f at x.
Example

m = 1, n = 2, f (x) = c1 x1 + c2 x2 , fx (x) = [ c1 c2 ].
More generally, if f (x) = c^T x, then fx (x) = c^T . This can be derived like this:

f (x + εh) = c^T (x + εh) = c^T x + ε c^T h,

(d/dε) f (x + εh) = c^T h,

(d/dε) f (x + εh)|_{ε=0} = c^T h,

fx (x) = c^T .
Example If f (x) = x^T x, then fx (x) = 2x^T . More generally, consider f (x) = x^T Qx, where Q is
a symmetric matrix. You can derive that fx (x) = 2x^T Q. If Q is not symmetric, then

fx (x) = x^T (Q + Q^T ).
3.2 Optimization over an Open Set

Given a function f : Rn −→ R and an open set V in Rn , the problem is to maximize f (x) subject
to x ∈ V. Of course, minimizing f is the same as maximizing −f , so we're solving that problem
too.
We say xo is a global maximizer if f (x) ≤ f (xo ) for all x ∈ V. Minimizer has the obvious
definition, and optimizer refers to a point that's either a maximizer or minimizer. Finally, we say
f is of class C^r , or f is C^r , if all partial derivatives of f of order up to r exist and are continuous.
First, the necessary condition: if xo is a local or global maximizer, then fx (xo ) = 0. To see this,
expand

f (xo + εh) = f (xo ) + ε fx (xo )h + o(ε),

where o(ε)/ε converges to 0 as ε converges to 0. That's what little o means. Since xo is a local or
global maximizer and V is open, for every h

fx (xo )h ≤ 0.

Replacing h by −h gives fx (xo )h ≥ 0 as well, and hence fx (xo ) = 0.
Example
Fermat’s principle (1662) is that the path between two points taken by a beam of light is the one
that is traversed in the least time. Snell’s law of refraction follows directly from this statement.
To prove this, consider a light ray from a fixed point p1 to a fixed point p2 . The two points are in
different media, where the speeds of light in the media are respectively c/n1 , c/n2 .
[Figure: a light ray travels from p1 in the medium with index n1 , crosses the interface at the point p, making angles θ1 and θ2 with the normal, and continues to p2 in the medium with index n2 .]
Let p denote the point where the ray from p1 to p2 passes through the interface of the media. We
want to find p according to Fermat’s principle. Orient an (x, y) coordinate system as shown by the
axes. The time for a ray to pass from p1 to p2 is
J = (n1 /c) ‖p1 − p‖ + (n2 /c) ‖p − p2 ‖.
Minimizing J over the position of p along the interface and setting the derivative to zero gives

n1 sin θ1 − n2 sin θ2 = 0.
Thus

n1 / n2 = sin θ2 / sin θ1 ,

which is Snell's law.
For the sufficient condition, we need the concept of positive definite matrix. Let Q be a real,
square matrix. It is positive semi-definite (written Q ≥ 0) if x^T Qx ≥ 0 for all x. It is positive
definite (written Q > 0) if x^T Qx > 0 for all x ≠ 0. Negative definite and semi-definite are defined
in the obvious way. If Q is symmetric, then it is positive semi-definite iff all its eigenvalues are ≥ 0,
and positive definite iff they're all positive.
Let H(x) denote the Hessian of f .
The first term inside the brackets is negative, while the second term goes to zero as ε → 0. Thus
for ε sufficiently small
Example
f (x) = (1/2) x^T Ax + b^T x + c, A symmetric.
We have fx (x) = x^T A + b^T . The stationarity condition is that

x^T A + b^T = 0

has a solution; that is, b belongs to the span of the columns of A. Then, if xo is a solution of this
equation and if A is negative definite, then xo is a local maximizer.
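As a concrete instance, here is a Python sketch with an assumed negative definite A and a vector b in its column span; the stationary point solves Ax + b = 0 (A symmetric) and beats nearby points:

```python
# Stationarity for f(x) = 0.5 x^T A x + b^T x + c:  A x + b = 0 (A symmetric).
# With A negative definite, the stationary point is a maximizer.
# A and b below are assumed example data.

A = [[-2.0,  0.0],
     [ 0.0, -4.0]]       # symmetric, negative definite
b = [2.0, 4.0]

# Solve A x + b = 0 (A is diagonal, so solve componentwise).
xo = [-b[0] / A[0][0], -b[1] / A[1][1]]   # the stationary point (1, 1)

def f(x):
    quad = sum(x[i] * A[i][j] * x[j] for i in range(2) for j in range(2))
    return 0.5 * quad + b[0] * x[0] + b[1] * x[1]

best = f(xo)
nearby = [f([xo[0] + dx, xo[1] + dy])
          for dx in (-0.1, 0.1) for dy in (-0.1, 0.1)]
# Every nearby value is strictly smaller than best.
```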
3.3 Optimizing a Quadratic Function with Equality Constraints

Example In the plane, find the point on a given line that is closest to a given point:
[Figure: a given point, a given line, and the optimal point, the foot of the perpendicular from the point to the line.]
This is a distance problem. Obviously, you can get the closest point by drawing the perpendicular
from the given point to the given line.
Before we solve this problem, let's clarify some notation. The norm of x = (x1 , x2 ) is

‖x‖ = (x1^2 + x2^2 )^{1/2} ,

and this can also be written ‖x‖ = (x^T x)^{1/2} , that is,

x^T x = [ x1 x2 ] [ x1 ; x2 ] = x1^2 + x2^2 .
To develop a solution method, suppose the given point is v = (1, 2) and the equation of the
given line is
x2 = 0.5x1 + 0.2.
c^T = [ −0.5 1 ], b = 0.2.
Then x is on the line iff c^T x = b. Also, the distance from v to x is ‖v − x‖. Note that ‖v − x‖
is minimum iff ‖v − x‖^2 is minimum. Thus we have arrived at the following equivalent problem:
minimize the quadratic function ‖v − x‖^2 of x subject to the equality constraint c^T x = b. Notice
that

‖v − x‖^2 = (v − x)^T (v − x) = v^T v − v^T x − x^T v + x^T x.

The right-hand side is a quadratic function of x. Since x^T v = v^T x (dot product of real vectors is
symmetric), we have

‖v − x‖^2 = v^T v − 2v^T x + x^T x.
The problem is then

min_{x : c^T x = b} v^T v − 2v^T x + x^T x.

One direct way to solve it is to substitute x2 = 0.5x1 + 0.2 into

(1 − x1 )^2 + (2 − x2 )^2

to get a function f (x1 ). Set the derivative of f to zero, solve for x1 , then get x2 . The answers are
x1 = 1.52, x2 = 0.96.
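The substitution calculation can be reproduced in a few lines of Python:

```python
# Closest point on the line x2 = 0.5 x1 + 0.2 to the point v = (1, 2),
# by substituting the constraint and setting the derivative to zero.

def g(x1):
    """Squared distance from (1, 2) to the point on the line at x1."""
    x2 = 0.5 * x1 + 0.2
    return (1 - x1) ** 2 + (2 - x2) ** 2

# d/dx1 [(1 - x1)^2 + (1.8 - 0.5 x1)^2] = 0
#   -2(1 - x1) - (1.8 - 0.5 x1) = 0  =>  2.5 x1 = 3.8
x1 = 3.8 / 2.5            # 1.52
x2 = 0.5 * x1 + 0.2       # 0.96
```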
Lagrange Multipliers
Now we return to the first example in this section. It had the form

min_{x : c^T x = b} v^T v − 2v^T x + x^T x.
We are going to use the method of Lagrange multipliers. The idea is to absorb the constraint
cT x = b, or equivalently cT x − b = 0, into the function being minimized, leaving an unconstrained
problem. Define the Lagrangian

L(x, λ) = v^T v − 2v^T x + x^T x + λ(c^T x − b).

Here λ is an unknown that multiplies the constraint equation. It turns out a necessary condition
for optimality of x is that L should be stationary with respect to both x and λ, that is,

Lx = 0, Lλ = 0.

Here these conditions read

fx + λc^T = 0, c^T x − b = 0,

or equivalently (transposing the first and using fx = −2v^T + 2x^T ),

2x + λc = 2v, c^T x = b.
The x is the optimal x, the closest point, and the λ can be discarded—it was introduced only to
solve the problem.
Let’s look at a somewhat more general problem by the Lagrange multiplier method.
minimize_x ‖c − Ax‖
subject to the constraint Bx = d. Here x, c, d are vectors and A, B matrices. Assume A has full
column rank and B has full row rank.
Define
J(x) = ‖c − Ax‖^2 = (c − Ax)^T (c − Ax)
     = c^T c − c^T Ax − x^T A^T c + x^T A^T Ax
     = c^T c − 2c^T Ax + x^T A^T Ax,
and the Lagrangian

L(x, λ) = J(x) + λ^T (Bx − d).

Here the Lagrange multiplier λ has to be a vector. Differentiating with respect to x then λ, we get
−2c^T A + 2x^T A^T A + λ^T B = 0, Bx − d = 0.

Transposing the first gives

−2A^T c + 2A^T Ax + B^T λ = 0, Bx − d = 0.
Collect as one equation:

[ 2A^T A  B^T ] [ x ]   [ 2A^T c ]
[   B      0  ] [ λ ] = [   d    ] .

If it can be proved that the matrix on the left is invertible, then the optimal x is

x = [ I 0 ] [ 2A^T A  B^T ]^{-1} [ 2A^T c ]
            [   B      0  ]      [   d    ] .
So let's see that the matrix

[ 2A^T A  B^T ]
[   B      0  ]

is invertible. It suffices to prove that the only solution to the homogeneous equation

[ 2A^T A  B^T ] [ x ]
[   B      0  ] [ λ ] = 0

is the trivial solution. So start with this equation. Thus

2A^T Ax + B^T λ = 0, Bx = 0.

Since A has full column rank, the matrix A^T A is positive definite, hence invertible. Thus

x + (2A^T A)^{-1} B^T λ = 0, Bx = 0.

Multiply the first equation by B and use the second:

B(2A^T A)^{-1} B^T λ = 0.

Pre-multiply by λ^T :

λ^T B(2A^T A)^{-1} B^T λ = 0.

Since (2A^T A)^{-1} is positive definite, it follows that B^T λ = 0. Then, since B^T has full column rank,
λ = 0. Finally, from the equation

x + (2A^T A)^{-1} B^T λ = 0,

we get that x = 0. Thus x = 0, λ = 0 is the only solution of the homogeneous equation, and the
matrix is invertible.
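Numerically, one just assembles the block matrix and solves. A Python sketch with small assumed data (A = I in R^{2×2}, B = [1 1], c = (1, 2), d = 1, i.e., projecting c onto the line x1 + x2 = 1), using a tiny Gaussian-elimination solver:

```python
# Solve  min ||c - A x||^2  s.t.  B x = d  via the block (KKT) system
#   [2 A^T A  B^T] [x  ]   [2 A^T c]
#   [  B       0 ] [lam] = [   d   ].
# Assumed example: A = I (2x2), B = [1, 1], c = (1, 2), d = 1.

def solve(M, rhs):
    """Gaussian elimination with partial pivoting (fine for small n)."""
    n = len(M)
    aug = [row[:] + [rhs[i]] for i, row in enumerate(M)]
    for col in range(n):
        piv = max(range(col, n), key=lambda r: abs(aug[r][col]))
        aug[col], aug[piv] = aug[piv], aug[col]
        for r in range(col + 1, n):
            f = aug[r][col] / aug[col][col]
            for k in range(col, n + 1):
                aug[r][k] -= f * aug[col][k]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        x[r] = (aug[r][n] - sum(aug[r][k] * x[k]
                                for k in range(r + 1, n))) / aug[r][r]
    return x

# KKT matrix for A = I, B = [1, 1]: the (1,1) block 2 A^T A is 2I.
K = [[2.0, 0.0, 1.0],
     [0.0, 2.0, 1.0],
     [1.0, 1.0, 0.0]]
rhs = [2.0, 4.0, 1.0]     # (2 A^T c, d) with c = (1, 2), d = 1

sol = solve(K, rhs)        # (x1, x2, lambda); the projection is x = (0, 1)
```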
f, g : R2 −→ R.
The set of all x satisfying the constraint g(x) = 0 typically is a curve. For a given constant c, the
set of all x satisfying f (x) = c is called a level set of f . Now assume xo is a locally optimal point
for the problem min_{g(x)=0} f (x); that is, if x is near xo and g(x) = 0, then f (x) ≥ f (xo ).
[Figure: level sets of f , with f decreasing across them in the direction of −∇f ; the constraint curve g = 0 with normal ∇g; at the optimal point x∗ the gradients ∇f and ∇g are parallel.]
Thus there is a scalar λo such that ∇f (xo ) + λo ∇g(xo ) = 0. This implies that the gradient of the
function

f (x) + λo g(x)

equals zero at xo . Defining the Lagrangian L(x, λ) := f (x) + λg(x), the two conditions
∇f (xo ) + λo ∇g(xo ) = 0 and g(xo ) = 0 say precisely that

Lx (xo , λo ) = 0, Lλ (xo , λo ) = 0.
In conclusion, a necessary condition for a point xo to be a local optimum for the problem ming(x)=0 f (x)
is that there exist a point λo such that the derivative of the Lagrangian L(x, λ) equals zero at xo , λo .
3.4 Optimization with Equality Constraints

The general problem is now max_{g(x)=0} f (x), where

f : Rn −→ R, g : Rn −→ Rm .
We begin with some definitions. A subset A of Rn is closed if it contains the limit of every
convergent sequence of points in A; that is, if {xk } is a sequence in A that converges to a point in
Rn , then that limit is actually in A. A subset A of Rn is bounded if there exists r > 0 such that
kxk ≤ r for every x ∈ A. A closed and bounded set is said to be compact. (This is not actually
the definition of a compact set, but in finite dimensional space it’s equivalent.)
Now we look at the constraint set C := {x : g(x) = 0}. If g is continuous, C is closed. This is
pretty immediate: If {xk } is a sequence such that g(xk ) = 0 for all k and if the sequence converges
to, say, x, then g(x) = 0 by continuity.
Now it’s a fact from analysis that a continuous function on a compact set achieves its maximum.
Thus if f and g in the given problem are continuous and if C is bounded (and therefore compact),
then the problem
max f (x)
x∈C
is solvable—there is a maximizer. The trouble is that frequently C isn’t bounded; of course, this
doesn’t imply there isn’t a maximizer.
Now we see the mathematical justification of the method of Lagrange multipliers. For the
problem at hand, define the Lagrangian

L(x, λ) := f (x) + λ^T g(x).
Theorem 3.1 Suppose f, g are C 2 , the problem maxC f (x) has a local solution xo , and gx (xo ) is
surjective. Then there exists a vector λo such that
Lx (xo , λo ) = 0, Lλ (xo , λo ) = 0.
Proof We’ll do only the simpler case where g is linear, g(x) = Ax. Then gx (x) = A and the
hypothesis is that A has rank m. Suppose without loss of generality that

A = [ B C ] ,

with B ∈ Rm×m invertible, and write x = (y, z), y ∈ Rm . Then

g(y, z) = 0 ⇐⇒ By + Cz = 0 ⇐⇒ y = −B^{-1} Cz.
max_{g(x)=0} f (x).

Define

h(z) := f (−B^{-1} Cz, z).

Since xo is an optimizer, so is

zo = [ 0 I ] xo ,

and therefore

hz (zo ) = 0,

that is,

−fy (yo , zo )B^{-1} C + fz (yo , zo ) = 0,

or, in matrix form,

fx (xo ) [ −B^{-1} C ] = 0.    (3.1)
         [     I     ]

Define

λo^T := fx (xo ) [ −B^{-1} ]
                 [    0    ] .

Then

Lx (xo , λo ) = fx (xo ) + λo^T A
             = fx (xo ) + fx (xo ) [ −B^{-1} ] [ B C ]
                                   [    0    ]
             = fx (xo ) + fx (xo ) [ −I  −B^{-1} C ]
                                   [  0      0     ]
             = fx (xo ) [ 0  −B^{-1} C ]
                        [ 0      I     ] ,

which is zero by (3.1). Also

Lλ = g(x),

which vanishes at xo since g(xo ) = 0.
3.5 Application: Sensor Placement
Lloyd’s algorithm in 1D
This algorithm was originally developed for the problem of quantizing data. Let r be a real number
that could take any value in the interval [0, 1]. We want to partition [0, 1] into a finite number,
n, of subintervals, {Vi }i=1,...,n , and then, for each i, designate one point pi in Vi as the codeword.
Then the quantization function would be to map r to pi if r ∈ Vi . The partition and code book are
optimal in a certain sense.
There’s a minor but annoying difficulty with the boundaries of {Vi }i=1,...,n . Strictly speaking
these intervals should not overlap; that is, every point should be in one and only one Vi . But this
complicates the derivation to the point where it obscures the ideas. So we’ll take the subintervals
to be closed and ignore the case where a point lies on the boundaries of two subintervals.
The algorithm is illustrated by an example.
Example (n = 3) Let p1 < p2 < p3 be three arbitrary points in [0, 1]. Construct a partition
{V1 , V2 , V3 } as shown here:
[Figure: the interval [0, 1] divided into V1 , V2 , V3 , with p1 , p2 , p3 inside.]
So V1 is from 0 to the midpoint between p1 and p2 , (p1 + p2 )/2; V2 is from (p1 + p2 )/2 to (p2 + p3 )/2;
and V3 is from (p2 + p3 )/2 to 1. This is called the Voronoi partition¹ V generated by {pi }; the
intervals are uniquely defined by this property: Vi is the set of all points q whose distance from pi
is less than or equal to the distances from all other pj :

Vi = {q ∈ [0, 1] : |q − pi | ≤ |q − pj | for all j}.
Continuing with the algorithm, update pi to be the centre ci of Vi :
[Figures: the partition and points before and after one update, each pi moved to the centre of its interval Vi .]
¹Named after the Ukrainian mathematician Georgy Voronoi (1868–1908).
And so on. Does this procedure converge? Let p be the vector (p1 , p2 , p3 ). Then the update law is
p(k + 1) = Ap(k) + b,
[Figure: the limiting configuration, equal-width intervals with each pi at the centre of its interval.]
Thus the intervals have equal width and the points are their centres, just what you’d like for a
quantizer.
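The 1D iteration is only a few lines. A Python sketch for n = 3 on [0, 1]:

```python
# Lloyd's algorithm on [0, 1] with n = 3 codewords:
# alternately (i) form the Voronoi partition generated by the points and
# (ii) move each point to the centre of its interval.

def lloyd_step(p):
    # Voronoi interval boundaries: 0, the midpoints between neighbours, 1.
    bounds = [0.0] + [(p[i] + p[i + 1]) / 2 for i in range(len(p) - 1)] + [1.0]
    # New point i = centre of the interval [bounds[i], bounds[i+1]].
    return [(bounds[i] + bounds[i + 1]) / 2 for i in range(len(p))]

p = [0.1, 0.2, 0.3]          # arbitrary sorted initial codewords
for _ in range(200):
    p = lloyd_step(p)
# p converges to (1/6, 1/2, 5/6): equal-width intervals with the points
# at their centres, just as described above.
```

The update is affine, p(k + 1) = Ap(k) + b, and for this example the eigenvalues of A are 0, 1/4, 3/4, so the iteration converges linearly.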
Continuous time
There’s a natural continuous-time version of the algorithm.
Example (continued) Think now of pi , ci , and Vi as evolving in continuous time (ci is the centre
of Vi ), with each point moving continuously toward the centre of its cell:

ṗi = ci − pi .

This leads to

ṗ = Ap + b,
Lloyd’s algorithm in 2D
Consider a convex polytope W in R2 of area AW . Its centroid (point of balance) cW satisfies

∫_W (q − cW ) dq = 0,

and therefore

cW = (1/AW ) ∫_W q dq.
[Figure: a convex polytope W with its centroid cW marked.]
Fixed partition
Let’s extend to n sensors. Consider a convex polytope Q in R2 . Suppose W = {Wi }i=1,...,n is a
given partition:
[Figure: a convex polytope Q partitioned into cells W1 , W2 , W3 .]
Now suppose there are n sensors that are to be placed at locations {pi }, one in each cell: pi ∈ Wi :
[Figure: the same partition with a sensor at position pi in each cell Wi .]
The cost function for cell i is H(pi , Wi ) and the total cost function is

H(p, W) = Σ_{i=1}^{n} H(pi , Wi ),

where p = (p1 , . . . , pn ) denotes the vector of sensor positions. Since W1 , . . . , Wn are all disjoint,
[Figure: the Voronoi partition V1 , V2 , V3 of Q generated by p1 , p2 , p3 .]
In mathematical terms,

Vi = {q ∈ Q : ‖q − pi ‖ ≤ ‖q − pj ‖ for all j}.

Each Vi is the intersection of half planes. The picture just shown is called a Voronoi diagram, and
the partition is uniquely determined by p.
Lemma 3.4 For a given p, the unique partition that minimizes H(p, W) is the Voronoi partition,
W = V.
Proof Let's do the case n = 2 for simplicity of explanation. Here's the picture:

[Figure: Q split by the solid Voronoi bisector into V1 , V2 , and by a dashed line into another partition W1 , W2 , with p1 and p2 on opposite sides.]

The solid line through Q defines the Voronoi partition; it bisects the line joining p1 and p2 . Let W
be any other partition, shown by the dashed line. We'll show that H(p, V) ≤ H(p, W), that is,

∫_{V1} ‖q − p1 ‖^2 dq + ∫_{V2} ‖q − p2 ‖^2 dq ≤ ∫_{W1} ‖q − p1 ‖^2 dq + ∫_{W2} ‖q − p2 ‖^2 dq.  (3.2)
The procedure converges asymptotically to a Voronoi partition with pi being the centroid of Vi .
However, the limit may be only a local optimum for the function H. For example:
If the algorithm is initialized in either of the two ways shown, it terminates immediately. But the
right-hand value of H is larger than the left-hand value.
References
1. J. Cortes, S. Martinez, T. Karatas, and F. Bullo, “Coverage control for mobile sensing networks,” IEEE Transactions on Robotics and Automation, 20(2): 243–255, 2004.
3.6 Problems
1. Let f : R −→ R be defined by

f (x) = x^2 sin(1/x) for x ≠ 0, f (0) = 0.
2. Define f : R2 −→ R by

f (x1 , x2 ) = x1 x2 (x1^2 − x2^2 )/(x1^2 + x2^2 ) for x ≠ 0, f (0, 0) = 0.
3. Consider the discrete-time system x(k + 1) = Ax(k) + Bu(k).

Suppose, given x(0), we want to drive the state x(k) to the origin. There are at least two
ways to do this. The feedback method is to choose F so that A + BF is nilpotent, if possible,
and then set u = F x. We'll look at the other method, open-loop control. We have

x(1) = Ax(0) + Bu(0), x(2) = A^2 x(0) + ABu(0) + Bu(1),

and so on until

x(n) = A^n x(0) + W ũ,

where

W = [ B AB · · · A^{n−1} B ], ũ = (u(n − 1), . . . , u(0)).

We assume (A, B) is controllable. Therefore W is surjective and there exists ũ such that
x(n) = 0. We propose to choose ũ such that x(n) = 0 and ‖ũ‖^2 is minimum. The idea
is to drive the state to the origin using minimum energy. This is precisely our constrained
optimization problem with
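For a concrete instance (n = 2, with assumed A, B, x(0) not from the text), the control ũ and the resulting trajectory can be computed directly; here W happens to be square and invertible, so ũ = W^{-1}(−A^2 x(0)) is the unique choice, and hence automatically the minimum-energy one:

```python
# Open-loop transfer to the origin for x(k+1) = A x(k) + B u(k), n = 2:
#   x(2) = A^2 x(0) + B u(1) + A B u(0) = A^2 x(0) + W utilde,
# with W = [B  AB], utilde = (u(1), u(0)).  A, B, x0 are assumed data.

def apply(M, v):
    # 2x2 matrix times 2-vector
    return [M[0][0] * v[0] + M[0][1] * v[1],
            M[1][0] * v[0] + M[1][1] * v[1]]

A = [[1, 1],
     [0, 1]]
B = [0, 1]
x0 = [1, 0]

AB = apply(A, B)                 # (1, 1)
W = [[B[0], AB[0]],
     [B[1], AB[1]]]              # W = [B  AB] = [[0, 1], [1, 1]]

t = [-c for c in apply(A, apply(A, x0))]   # -A^2 x0

# Invert the 2x2 W by the cofactor formula: utilde = W^{-1} t.
det = W[0][0] * W[1][1] - W[0][1] * W[1][0]
u1 = ( W[1][1] * t[0] - W[0][1] * t[1]) / det   # u(1)
u0 = (-W[1][0] * t[0] + W[0][0] * t[1]) / det   # u(0)

# Simulate x(k+1) = A x(k) + B u(k) and confirm x(2) = 0.
x = x0
x = [apply(A, x)[0] + B[0] * u0, apply(A, x)[1] + B[1] * u0]
x = [apply(A, x)[0] + B[0] * u1, apply(A, x)[1] + B[1] * u1]
```

When W is wide rather than square, the minimum-norm choice would instead be ũ = W^T (W W^T )^{-1} (−A^n x(0)), which is exactly the constrained optimization problem above.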
ẋ = −x(‖x‖^2 − 1),
where x is a vector. Find the equilibrium points. Linearize the equation at every equilibrium
and see if you can conclude anything about local stability.
7. Let f : Rn −→ Rn be given by

f (x) = x/‖x‖.

The norm is the Euclidean norm, ‖x‖ = (x^T x)^{1/2} . The definition of f makes sense as long as
x ≠ 0; for completeness you can set f (0) = 0 (the value is irrelevant). Compute fx (x), the
Jacobian of f at x ≠ 0.
8. Solve the problem

min_x ‖c − Ax‖.

Show that there always exists a solution. When is the solution unique?
9. Find the vector in A Ker B that is closest to c, where

A = [ 1  1 ]
    [ 2 −2 ] ,  B = [ 1 1 ],  c = (1, 1, 1).
    [ 3  0 ]
11. This is a problem on sensor placement. Suppose we want to place a sensor to detect, say,
temperature.
(a) Suppose the workspace is the unit interval [0, 1]. We want to place a sensor at a location
p ∈ [0, 1] to get “optimal coverage.” To make this precise, we have to define a measure
of coverage error, denoted H(p). Here are some options:
If q is another point, how well the sensor can measure the temperature at location q
depends on the distance |q − p|. Suppose the temperature error is in fact proportional
to |q − p|. Then the average error is proportional to
H1 (p) = ∫_0^1 |q − p| dq.
Suppose the temperature error is in fact proportional to |q − p|2 . Then the average error
is proportional to
H2 (p) = ∫_0^1 |q − p|^2 dq.
Find the optimal p (the one that minimizes H) in each of the three cases. Is the optimal
p the centre of the interval in all three cases?
(b) Now consider the extension to a convex polygon region in R2 . As in the course notes,
for H2 (p), the optimal p is the centroid of the region. Give an example where this isn’t
true for H1 (p) or H∞ (p).
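For part (a), the costs have simple closed forms that a short Python scan confirms; H∞ is assumed here to be the worst-case distance max_q |q − p| (the text's definition of the third case is not shown above):

```python
# Coverage costs on [0, 1] for a single sensor at p.  The closed forms
# below come from evaluating the integrals by hand; H_infinity (the
# worst-case distance) is an assumed reading of the third case.

def H1(p):    # average of |q - p| over [0, 1]
    return (p**2 + (1 - p)**2) / 2

def H2(p):    # average of |q - p|^2 over [0, 1]
    return p**2 - p + 1/3

def Hinf(p):  # worst-case distance
    return max(p, 1 - p)

grid = [i / 1000 for i in range(1001)]
argmins = [min(grid, key=H) for H in (H1, H2, Hinf)]
# On the interval, all three costs are minimized at the centre p = 1/2.
```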
Part I
Classical Theories
Chapter 4
Calculus of Variations
“300 Years of Optimal Control: From the Brachistochrone to the Maximum Principle,”
H.J. Sussman and J.C. Willems, IEEE Control Systems Magazine, 1997.
Here we study the brachistochrone problem and in a later chapter, the maximum principle. Brachis-
tochrone means “shortest time.”
A tiny spherical wooden bead with a hole drilled through it slides from rest without friction
along a rigid wire:
The starting point A is higher in elevation than the end point B, and the curve of the wire lies in
a vertical plane like this:
CHAPTER 4. CALCULUS OF VARIATIONS 59
[Figure: A at the upper left with the x-axis horizontal, the y-axis pointing down, and B at the lower right.]
We want the bead to slide under the force of gravity from A to B. The two points A and B are fixed
in space, but the wire curve is free for us to design. For what curve does the bead slide from A to
B in minimum time? It’s not the straight line from A to B.
This is the brachistochrone problem. It was worked on by Newton, the Bernoulli brothers,
and other great scientists. The problem is harder than a simple calculus problem because we’re
looking for an optimal curve instead of an optimal number or vector. The space of curves is infinite
dimensional. Let a candidate curve be y(x), 0 ≤ x ≤ x1 .
Claim v^2 = 2gy, where v = ṡ is the speed of the bead.

This follows from the conservation of energy:

(1/2) mv^2 = mgy.
But here’s another derivation:
Proof of claim The force vertically down on the bead is mg and therefore the force tangent to
the curve is

mg (dy/ds).

Newton's second law gives

mg (dy/ds) = m s̈,

i.e.,

g (dy/ds) = s̈.
Multiply by 2ṡ:

2g ẏ = 2ṡ s̈.

Integrate:

∫_0^t 2g ẏ dτ = ∫_0^t 2ṡ s̈ dτ.

Thus

2gy + c = ṡ^2 = v^2 .

Since the bead starts from rest at y = 0, the constant c is zero.
Next, we have

ds^2 = dx^2 + dy^2 ,

and therefore

ṡ^2 = (1 + y'^2 ) ẋ^2 ,

where prime denotes derivative with respect to x. On the left replace ṡ^2 by v^2 = 2gy:

2gy = (1 + y'^2 ) ẋ^2 .

Thus

dt = √[ (1 + y'^2 ) / (2gy) ] dx.

Integrate:

t1 = ∫_0^{x1} √[ (1 + y'(x)^2 ) / (2gy(x)) ] dx.
Changing notation, we have arrived at the problem of finding a curve x(t) to minimize

J(x) = ∫_{t1}^{t2} [ (1 + ẋ^2 ) / (2gx) ]^{1/2} dt.

More generally, the basic problem of the calculus of variations is to minimize

J(x) = ∫_{t1}^{t2} f (t, x(t), ẋ(t)) dt,

where x(t1 ) = x1 , x(t2 ) = x2 are fixed. The function f maps R × Rn × Rn to R and is assumed to
be of class C^2 .
Let us denote by X the vector space of C 1 functions x : R −→ Rn and by Xa the subset such
that x(t1 ) = x1 and x(t2 ) = x2 . This latter is the set of admissible curves. The problem is to find
x ∈ Xa to minimize J(x).
Thus
J : X −→ R.
A function like this whose domain is a function space and whose co-domain is the reals is usually
called a functional. Other interesting examples are the length of a curve, the area surrounded by
a closed curve, etc.
Thus

fx = −(1/(2x)) [ (1 + ẋ^2 ) / (2gx) ]^{1/2} ,

fẋ = (1/√(2gx)) · ẋ / (1 + ẋ^2 )^{1/2} ,

(d/dt) fẋ = −(ẋ/(2x)) · (1/√(2gx)) · ẋ / (1 + ẋ^2 )^{1/2} + (1/√(2gx)) · ẍ / (1 + ẋ^2 )^{3/2} .
Thus the Euler-Lagrange equation is

−(1/(2x)) [ (1 + ẋ^2 ) / (2gx) ]^{1/2} = −(ẋ/(2x)) · (1/√(2gx)) · ẋ / (1 + ẋ^2 )^{1/2} + (1/√(2gx)) · ẍ / (1 + ẋ^2 )^{3/2} .

This simplifies to

2xẍ + ẋ^2 + 1 = 0.

In the original coordinates, with y a function of x, this reads

2yy'' + y'^2 + 1 = 0. (4.1)
Instead of solving this equation, it’s easier to propose a solution and then verify it. The path is a
cycloid, that is, a curve generated by a fixed point moving on a rolling wheel:

[Figure: a wheel of radius r rolling along the x-axis; a marked point on the rim, at angle θ, traces the cycloid, the contact point at distance rθ from the origin.]

The wheel rolls along the x-axis as shown. The black dot traces out a cycloid. Thus r is constant,
θ(t) is a function of time, θ(0) = 0, and

x = rθ − r sin θ, y = r − r cos θ.

Then y' = sin θ/(1 − cos θ) and

y'' = (1/ẋ) (d/dt) y' = (1/(r θ̇ (1 − cos θ))) (d/dt) [ sin θ/(1 − cos θ) ] = −1 / (r (1 − cos θ)^2 ).

Substitute these into (4.1).
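The substitution can be checked numerically. A Python sketch evaluating the residual of (4.1) along the cycloid:

```python
# Verify numerically that the cycloid
#   x = r(theta - sin theta),  y = r(1 - cos theta)
# satisfies the Euler-Lagrange equation 2 y y'' + (y')^2 + 1 = 0,
# using y' = sin(theta)/(1 - cos(theta)) and
# y'' = -1/(r (1 - cos(theta))^2) from the text.

import math

r = 0.7   # any radius works
residuals = []
for theta in (0.5, 1.0, 2.0, 3.0):
    y = r * (1 - math.cos(theta))
    yp = math.sin(theta) / (1 - math.cos(theta))
    ypp = -1 / (r * (1 - math.cos(theta)) ** 2)
    residuals.append(2 * y * ypp + yp ** 2 + 1)
# Each residual is zero up to rounding.
```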
For the terminal point (2, −1), the graph is this: [Figure: the minimizing cycloid from the origin to (2, −1).]
The proof of Theorem 4.1 requires a lemma. Recall the spaces X and Xa . Define X0 to be the
subspace of X of functions h(t) that equal zero at the two times t1 , t2 .
Let c denote the average value of y over [t1 , t2 ],

c = (1/(t2 − t1 )) ∫_{t1}^{t2} y(τ ) dτ,

and let

h(t) = ∫_{t1}^t [y(τ ) − c] dτ.

Then h ∈ X0 and
∫_{t1}^{t2} ‖y(t) − c‖^2 dt = ∫_{t1}^{t2} [y(t) − c]^T [y(t) − c] dt
                          = ∫_{t1}^{t2} [y(t) − c]^T ḣ(t) dt
                          = ∫_{t1}^{t2} y(t)^T ḣ(t) dt − c^T [h(t2 ) − h(t1 )]
                          = 0.
Thus y(t) = c.
Proof of Theorem 4.1 Let h ∈ X0 . Then for every ε, xo + εh is in Xa and so J(xo + εh) has a
minimum at ε = 0. Thus

(d/dε) J(xo + εh)|_{ε=0} = 0.

We have

(d/dε) J(xo + εh)|_{ε=0} = ∫_{t1}^{t2} (d/dε) f [t, xo + εh, ẋo + εḣ]|_{ε=0} dt
                         = ∫_{t1}^{t2} [ fx (t, xo , ẋo )h + fẋ (t, xo , ẋo )ḣ ] dt.

Therefore we have

∫_{t1}^{t2} [ fx (t, xo , ẋo )h + fẋ (t, xo , ẋo )ḣ ] dt = 0.

Define g(t) := ∫_{t1}^t fx (τ, xo , ẋo ) dτ . Integrating the first term by parts and using h(t1 ) = h(t2 ) = 0,
we get

(d/dε) J(xo + εh)|_{ε=0} = ∫_{t1}^{t2} [ −g + fẋ (t, xo , ẋo ) ] ḣ dt = 0.
Example

ẋ = Ax + u.

Given x(0), find the minimum energy u such that x(1) = 0. To set this up, we have

∫_0^1 ‖u(t)‖^2 dt = ∫_0^1 ‖ẋ(t) − Ax(t)‖^2 dt.

So we define

f (t, x, ẋ) = ‖ẋ − Ax‖^2 .

Then

fx = −2ẋ^T A + 2x^T A^T A, fẋ = 2ẋ^T − 2x^T A^T .

The Euler-Lagrange equation reduces to

ẍ + (A^T − A)ẋ − A^T Ax = 0.
This is a two-point boundary-value problem. (See one of the exercises.) The corresponding optimal
control is uo = ẋo − Axo .
4.4 Problems
1. In the (t, x)-plane, the problem is to find the curve x(t) from (t1 , x1 ) to (t2 , x2 ) of minimum
length (a straight line). Formulate the problem by writing the length of a curve as an integral
with respect to t, and solve by the calculus of variations.
2. Consider (x, y, t)-space and consider a curve x(t) in the (t, x)-plane from (t1 , x1 ) to (t2 , x2 ).
Rotate this curve about the t-axis. The surface area is
2π ∫_{t1}^{t2} x √(1 + ẋ^2 ) dt.
[Figure: the curve from (t1 , x1 ) to (t2 , x2 ) and the surface obtained by rotating it about the t-axis.]
where x(t) ∈ R. The values t1 and t2 are fixed. Also, x(t1 ) is fixed, but (unlike in Section 3.2)
x(t2 ) is unconstrained. Let h(t) be an arbitrary C 1 function such that h(t1 ) = 0. It can be
derived (don’t you do it) that
(d/dε) J(x + εh)|_{ε=0} = ∫_{t1}^{t2} [ fx − (d/dt) fẋ ] h dt + fẋ (t2 , x(t2 ), ẋ(t2 ))h(t2 ).
Assume xo is locally optimal. Then the first term on the right-hand side equals zero, by
Theorem 4.1. It follows that

fẋ (t2 , xo (t2 ), ẋo (t2 ))h(t2 ) = 0.

This must hold for every h(t2 ). It then follows that fẋ (t2 , xo (t2 ), ẋo (t2 )) must equal zero.
The Problem Using the theory just presented, solve the brachistochrone problem (minimum-
time path) where A is at the origin but the point B can be any point on the vertical line shown:
[Figure: A at the origin, the x-axis horizontal, the y-axis pointing down, and B free to lie anywhere on the vertical line x = 1.]
5. For the brachistochrone problem, derive the formula t1 = θ1 /√g. (I think the formula is
correct.)
6. Solve the problem of moving a point p(t) in the plane from p(0) = (1, 1) to p(2) = (0, 0) while
minimizing the average velocity squared
(1/2) ∫_0^2 ‖ṗ(t)‖^2 dt.
Chapter 5

The Maximum Principle
The Mathematical Theory of Optimal Processes, L.S. Pontryagin, V.G. Boltyanskii, R.V.
Gamkrelidze, and E.F. Mischenko, Wiley, New York, 1962
We’ll do only a very brief introduction to this approach. You should see Athans and Falb for
the wealth of types of problems that can be formulated. Here we do just two special problems to
illustrate. Proofs are omitted.
M ÿ = u.
We say this system is a double integrator, because y is proportional to the double integral of u. In
fact, we might as well take M = 1 (by redefining u to be u/M ). The problem is to drive the cart
from any (y(0), ẏ(0)) to (y = 0, ẏ = 0) in minimum time. This makes sense only if u is bounded,
say |u(t)| ≤ 1.
Why can’t we use the calculus of variations on this problem?
We shall see that the optimal control signal u° is always either +1 or −1. That is, for every t
we have either u°(t) = 1 or u°(t) = −1. Such a control is said to be a bang-bang control law—it
jumps from one extreme value to the other. Let's use that knowledge and continue.
Take the natural state model x = (y, ẏ):
ẋ = Ax + Bu,  A = [0 1; 0 0],  B = [0; 1].
For u = +1, ÿ = 1 and ẏ is therefore increasing. This graph shows the vector field and two
trajectories—one starting at the origin and the other starting at (1, −1):
[Figure: the vector field in the (y, ẏ)-plane for u = +1, with two parabolic trajectories.]
This leads to the switching curve: The part in the second quadrant is the trajectory backward in
time from the origin with u = −1; the part in the fourth quadrant is the trajectory backward in
time from the origin with u = 1:
[Figure: the switching curve in the (y, ẏ)-plane.]
Every optimal trajectory has at most one switch, to get onto this switching curve. Here's the
optimal trajectory starting in the first quadrant:
[Figure: an optimal trajectory from the first quadrant, reaching the switching curve and following it to the origin.]
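The switching-curve logic can be implemented as a state-feedback law and checked numerically. A minimal sketch: the two parabolic branches through the origin combine into the formula y = −ẏ|ẏ|/2 for the switching curve (our derivation from the trajectories above); function names and tolerances are ours.

```python
import math

def bang_bang(y, ydot):
    """Time-optimal control for the double integrator ÿ = u, |u| <= 1.
    The switching curve is y = -ydot*|ydot|/2; for states on one side
    use u = -1, on the other u = +1."""
    s = y + ydot * abs(ydot) / 2.0
    return -1.0 if s > 0 else 1.0

def simulate(y0, ydot0, dt=1e-3, t_max=10.0):
    """Euler-integrate the closed loop; return the closest approach to the origin."""
    y, ydot = y0, ydot0
    best = math.hypot(y, ydot)
    t = 0.0
    while t < t_max:
        u = bang_bang(y, ydot)
        y += dt * ydot
        ydot += dt * u
        t += dt
        best = min(best, math.hypot(y, ydot))
    return best

print(simulate(1.0, 1.0))  # closest approach to the origin (small)
```

Starting from (1, 1) in the first quadrant, the simulated trajectory decelerates with u = −1, meets the switching curve, and rides it to a small neighborhood of the origin.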
ẋ = f (x, u).
Theorem 5.1 Assume a local optimum uo exists and let xo be the corresponding state. Then the
following conditions hold:
1. There exists λo such that xo , λo satisfy
The form of the theorem is actually a minimum principle. This is because the problem studied
is one of minimization. The name “Maximum Principle” is used nevertheless. Notice that the
theorem gives a necessary condition for a local optimum.
ẋ = u.
This represents a point, in the plane, under velocity control. We want to steer the state from
x(0) = 0 to x(1) = v, a given target vector, while minimizing the control energy
J(u) = ∫₀¹ ‖u(t)‖² dt,
and subject to the constraint ku(t)k ≤ 1. We expect the optimal state trajectory to be a straight
line.
The Hamiltonian is
H(x, λ, u) = ‖u‖² + λᵀu.
Thus λ° is a constant. The third condition in the theorem leads to 2u°(t) + λ°(t) = 0, and hence
the optimal u is a constant vector too: u°(t) = c₁. Thus the problem is solvable only if ‖c₁‖ ≤ 1.
Assuming this inequality, we have from
ẋ = Ax + Bu.
Controllability of (A, B) is assumed. The goal is to drive the state from x(0) = x0 to x(t1 ) = 0 in
minimum time t1 . The control signal u is required to be piecewise continuous and to satisfy the
constraint u ∈ Ω, defined as the unit cube, i.e.,
u ∈ Ω iff (∀i)|ui | ≤ 1.
Theorem 5.2 Assume a local optimum u° and t₁° exist and let x° be the corresponding state. Then
the following conditions hold:
This implies that uoi (t), the ith component of the optimal control, equals −1 if the ith component
of B T λo (t) is positive and equals +1 if the ith component of B T λo (t) is negative. Thus the optimal
control is confined to the set of vertices of Ω. This is called bang-bang control.
λ̇°₁ = 0,  λ̇°₂ = −λ°₁.
The solution is
λ°₁(t) = c₁,  λ°₂(t) = c₂ − c₁t
for constants c₁, c₂. Since Bᵀλ = λ₂, we get that u°(t) equals −1 if λ°₂(t) is positive and equals +1 if λ°₂(t) is negative.
We conclude that the optimal control signal equals ±1 and switches at most once. That’s all the
theorem gives us. More details, such as the equation of the switching curve, have to be derived as
in the first section of the chapter.
5.3 Problems
1. Think of the bang-bang optimal control signal for the double integrator. Discuss how it
could be implemented. What sensors would be required? Is it a feedback controller? Is this
controller robust to sensor noise and modeling errors?
Chapter 6
Dynamic Programming
Dynamic programming (DP) is a clever approach to certain types of optimization problems. It was
developed by Richard Bellman and made popular in his book.
6.1 Examples
Example Let {x₁, . . . , xₙ} be a finite sequence of real numbers and consider the problem minᵢ xᵢ
of finding the minimum. If asked to write a program to solve this, you would undoubtedly write
this to compute the minimum, a:
a = x1 ;
for i = 2 : n
a = min(a, xi );
end
The DP method is exactly the same except in reverse order. Define the value function
V(i) = min {xᵢ, . . . , xₙ},
that is, V(i) is the minimum “cost-to-go” starting at xᵢ. The value V(1) is what we seek.
Of course, V(n) = xₙ. Suppose we know V(i) for some i, 1 < i ≤ n. Then
V(i − 1) = min {xᵢ₋₁, . . . , xₙ} = min {xᵢ₋₁, V(i)}.
The recursion is therefore
V(n) = xₙ,  V(i − 1) = min {xᵢ₋₁, V(i)}.
Thus the minimization problem is a recursion of small minimization problems over just pairs of
numbers. There are n − 1 compare operations, and so the complexity is linear.
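The forward loop and the backward value-function recursion compute the same number; a quick sketch (function names ours):

```python
def forward_min(x):
    """The obvious forward pass: a = x_1, then a = min(a, x_i)."""
    a = x[0]
    for xi in x[1:]:
        a = min(a, xi)
    return a

def dp_min(x):
    """Backward recursion on the value function:
    V(n) = x_n, V(i-1) = min(x_{i-1}, V(i)); V(1) is the answer."""
    V = x[-1]
    for xi in reversed(x[:-1]):
        V = min(xi, V)
    return V

data = [3.0, -1.5, 4.0, 0.25, 2.0]
print(forward_min(data), dp_min(data))  # both give -1.5
```

Both versions use n − 1 comparisons, so the complexity is linear either way.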
[Figure: a staged graph; among the node labels are n₁₂, n₁₃ at stage 1 and n₂₂, n₂₃ at stage 2.]
The nodes are labeled nij , where i is interpreted as the stage and j as the node number at that
stage. Thus there’s one node at stage 0, three nodes at stage 1, etc. One wants to travel from the
start node n01 to the end node n31 with minimum cost. Each link has a cost, labeled like this (not
all are shown):
[Figure: some link costs shown, e.g. c⁰₁₁, c⁰₁₂, c⁰₁₃ and c¹₁₁, c¹₁₂.]
Thus, cᵏᵢⱼ is the cost of the link from node i at stage k to node j at stage k + 1. The cost of a path is defined
to be the sum of the costs of the links.
We define the value function, a real-valued function of the nodes, as follows: V (nij ) is the
minimum cost to go from node nij to the end node. The value function at stage 3 is obviously 0.
Thus V(n₃₁) = 0. The value function at stage 2 is obviously just the cost of the last link:
V(n₂ⱼ) = c²ⱼ₁,  j = 1, 2, 3.
Now to the value function at stage 1. We will invoke the so-called principle of optimality:
Consider an optimal path from n01 to n31 ; if this path goes through node n1j at stage 1, then the
subpath from node n1j to n31 is optimal too. That is, for every optimal path, the cost-to-go is
minimum at each point along the path. Note that we’re not saying the initial subpath is optimal,
but rather the cost-to-go is. Thus at node n₁₁, since there are just three links out, we have
V(n₁₁) = min {c¹₁₁ + V(n₂₁), c¹₁₂ + V(n₂₂), c¹₁₃ + V(n₂₃)}.
After the other values are computed at stage 1, one computes V(n₀₁), which equals the minimum
cost path from start to end. After the value function is computed at every node, it’s easy to find
optimal paths by moving left to right.
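The backward recursion for a staged graph can be sketched as follows; the graph data here is a small made-up example, not the one in the figure, and the data layout is ours:

```python
def shortest_path_value(costs):
    """Backward DP over a staged graph.

    costs[k][i][j] = cost of the link from node i at stage k to node j
    at stage k+1 (0-based indices).  Returns V, a list of dicts with
    V[k][i] = minimum cost-to-go from node i at stage k to the end.
    """
    n_stages = len(costs)
    V = [dict() for _ in range(n_stages + 1)]
    # Value at the final stage is zero.
    for links in costs[-1].values():
        for j in links:
            V[n_stages][j] = 0.0
    # Backward recursion: V[k][i] = min_j (costs[k][i][j] + V[k+1][j]).
    for k in reversed(range(n_stages)):
        for i, links in costs[k].items():
            V[k][i] = min(c + V[k + 1][j] for j, c in links.items())
    return V

# A small illustrative graph: 1 start node, 2 middle nodes, 1 end node.
costs = [
    {0: {0: 1.0, 1: 4.0}},          # stage 0 -> stage 1
    {0: {0: 5.0}, 1: {0: 1.0}},     # stage 1 -> stage 2 (end)
]
V = shortest_path_value(costs)
print(V[0][0])  # minimum cost start-to-end: min(1+5, 4+1) = 5.0
```

Once V is known at every node, an optimal path is recovered left to right by always moving to a node achieving the minimum.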
ẋ = Ax + Bu,  x(0) = x₀,  J(x₀, u) = ∫₀^∞ (xᵀQx + uᵀRu) dt.
The arguments of J are the initial state x₀ and the input signal u. Implicitly, u is such that J is
finite. This problem is a special case of minimizing ∫₀^∞ L(x, u) dt subject to ẋ = f(x, u); for the
general problem define the value function
V(τ, ξ) = min_u ∫_τ^∞ L(x, u) dt,  x(τ) = ξ.
The argument t in the integrand has been dropped for convenience. Because A, B, Q, R are constant
matrices and the upper limit on the integral is ∞, you can check that V (τ, ξ) is independent of τ ,
that is, V is a function of only ξ in this instance. Nevertheless, we’ll keep the two arguments in
order to get the general HJB equation.
For any δτ > 0 we have
∫_τ^∞ L(x, u) dt = ∫_τ^{τ+δτ} L(x, u) dt + ∫_{τ+δτ}^∞ L(x, u) dt.
Let u_τ denote the piece of u defined over (τ, τ + δτ) and u^τ the piece of u defined over (τ + δτ, ∞).
Then the term ∫_τ^{τ+δτ} L(x, u) dt is a function of x(τ) and u_τ, while ∫_{τ+δτ}^∞ L(x, u) dt is a function of
x(τ + δτ) and u^τ; but x(τ + δτ) is a function of x(τ) = ξ and u_τ. Minimizing over u we get
V(τ, ξ) = min_u ∫_τ^∞ L(x, u) dt
 = min_{u_τ} min_{u^τ} [ ∫_τ^{τ+δτ} L(x, u) dt + ∫_{τ+δτ}^∞ L(x, u) dt ]
 = min_{u_τ} [ ∫_τ^{τ+δτ} L(x, u) dt + min_{u^τ} ∫_{τ+δτ}^∞ L(x, u) dt ]
 = min_{u_τ} [ ∫_τ^{τ+δτ} L(x, u) dt + V(τ + δτ, x(τ + δτ)) ].
and therefore
V(τ + δτ, x(τ + δτ)) = V(τ, ξ) + δτ (∂V/∂τ)(τ, ξ) + δτ (∂V/∂x)(τ, ξ) f(ξ, u(τ)).
Also,
∫_τ^{τ+δτ} L(x, u) dt = δτ L[ξ, u(τ)].
Thus we have
V(τ, ξ) = min_{u(τ)} { δτ L[ξ, u(τ)] + V(τ, ξ) + δτ (∂V/∂τ)(τ, ξ) + δτ (∂V/∂x)(τ, ξ) f(ξ, u(τ)) }
and hence
0 = min_{u(τ)} { L[ξ, u(τ)] + (∂V/∂τ)(τ, ξ) + (∂V/∂x)(τ, ξ) f(ξ, u(τ)) }.
In this equation, ξ and u(τ ) are dummy variables. Let’s replace them by x ∈ Rn and u ∈ Rm :
min_u { L(x, u) + (∂V/∂τ)(τ, x) + (∂V/∂x)(τ, x) f(x, u) } = 0.
As mentioned at the start of the derivation, V (τ, x) is a function only of x: V (x). So in this
time-invariant case the HJB equation is
min_u { L(x, u) + (dV/dx)(x) f(x, u) } = 0.
The derivation of this equation wasn’t rigorous. What we have is an equation satisfied by an
optimal control law, u as a function of x, and the value function, V (x), under certain conditions, if
an optimal control exists. What to do with the equation? Solve the minimization problem on the
left-hand side for u as a function of x and dV /dx; then equate the left-hand side to zero.
As an example, take the scalar system and cost given by
f(x, u) = x + u,  L(x, u) = x² + u².
The HJB equation is
min_u { x² + u² + Vₓ(x + u) } = 0.
To do the minimization, differentiate with respect to u and set the derivative to zero:
2u + Vₓ = 0, that is, u = −Vₓ/2. Substituting this u into
x² + u² + Vₓ(x + u) = 0
gives x² + Vₓx − Vₓ²/4 = 0, and hence Vₓ = 2(1 ± √2)x. Thus
u = −(1/2)Vₓ = −(1 ± √2)x.
Only the solution
u = −(1 + √2)x
yields a finite J: with it the closed-loop system is ẋ = −√2 x, which is stable, whereas the other
solution gives the unstable closed-loop system ẋ = √2 x.
For the general LQR problem, L(x, u) = xᵀQx + uᵀRu and f(x, u) = Ax + Bu, and setting to zero
the derivative with respect to u of the quantity being minimized in the HJB equation gives
2uᵀR + VₓB = 0,
and so
u = −(1/2) R⁻¹ Bᵀ Vₓᵀ.
Substituting into
xᵀQx + uᵀRu + Vₓ(Ax + Bu) = 0
gives
xᵀQx + VₓAx − (1/4) Vₓ B R⁻¹ Bᵀ Vₓᵀ = 0.
Now we somehow have to find a V(x) satisfying this equation and such that u makes J finite. It
turns out that a quadratic function will work: V(x) = xᵀPx with P symmetric. Substituting
Vₓ = 2xᵀP into the equation gives
xᵀQx + 2xᵀPAx − xᵀP B R⁻¹ Bᵀ Px = 0,
or equivalently, to get a symmetric matrix,
xᵀQx + xᵀPAx + xᵀAᵀPx − xᵀP B R⁻¹ Bᵀ Px = 0.
This can be written as
xT (Q + P A + AT P − P BR−1 B T P )x = 0
and this leads to the Riccati equation:
Q + P A + AT P − P BR−1 B T P = 0.
It remains to study when this equation has a solution P such that J is finite for
u = −R−1 B T P x.
We’ll do this in a later chapter.
Let’s summarize what we’ve done. We assumed an optimal control law exists and we derived a
formula for it, but we don’t know when it is valid, that is, we don’t know if the Riccati equation
has a solution, and, if it does, we don’t know if J is finite. This is typical of DP. It provides an
existence condition but you still have to do a lot of work.
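As a sanity check, for the scalar example above (A = B = Q = R = 1) the Riccati equation reduces to 1 + 2p − p² = 0, and its positive root reproduces the control law found from the HJB equation:

```python
import math

# Scalar Riccati equation Q + PA + A^T P - P B R^{-1} B^T P = 0
# for the example f(x,u) = x + u, L(x,u) = x^2 + u^2 (A = B = Q = R = 1):
#     1 + 2p - p^2 = 0  =>  p = 1 + sqrt(2)  (positive root).
p = 1 + math.sqrt(2)
residual = 1 + 2 * p - p ** 2
print(abs(residual))  # ~0 up to rounding

# Feedback u = -R^{-1} B^T P x = -(1 + sqrt(2)) x, so the closed loop
# is xdot = x + u = -sqrt(2) x, which is stable.
print(-p)  # -(1 + sqrt(2)) ≈ -2.4142
```

Picking the positive root of the Riccati equation here corresponds exactly to the choice u = −(1 + √2)x made earlier on stability grounds.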
6.3 Problems
1. Find the minimum cost path.
[Figure: a staged graph from start to finish with link costs labeled.]
2. Discrete-time LQR:
Preamble to Part II
In the next three chapters we formulate optimal control problems in terms of signal norms. This
leads to function spaces and operators on them. This is a branch of mathematics called functional
analysis. Here we introduce and motivate the main ideas.
The performance of a system should be measured by norms, for example, how large a tracking
error is, or how large a control signal is. So let us begin with some familiar terms for deterministic
signals. For a sinusoidal signal x(t) = A cos(ωt + φ), the zero-to-peak value is |A|. We write this as
‖x‖∞ and call it the infinity norm: ‖x‖∞ = maxₜ |x(t)|. In an electric circuit, the power dissipated
in a resistor at time t is i(t)²R and the energy dissipated is the integral of this over time. Extending
this, we shall think of the energy of a signal x(t) as ∫ x(t)² dt and we shall write this as ‖x‖₂²,
the square of the 2-norm. Thus for deterministic signals we have two norms to measure signal size:
‖x‖∞, the zero-to-peak value, and ‖x‖₂, the square root of the energy.
Then there are random signals. Let x be a zero-mean random variable. Its root-mean-square
(rms) value qualifies as a norm: ‖x‖ = (E x²)^{1/2}. This extends to a random vector: The norm is
‖x‖ = (E xᵀx)^{1/2}, which can also be written ‖x‖ = [Tr(E xxᵀ)]^{1/2}, where Tr denotes trace. This
extends to zero-mean stationary random signals.
Now we turn to norms of systems. To get a glimpse of this concept, consider the equation
y = Au, where u, y are vectors and A is a matrix. We shall think of this equation as defining a
system—input u, output y. We are going to define two norms for this system: The first is for a
specific input, and the second is what is called an induced norm.
For the first system norm, suppose u is a zero-mean white random vector, that is, its covariance
matrix equals the identity matrix: E uuᵀ = I. The term “white” refers to the fact that two different
components of u are uncorrelated. The covariance matrix of y equals AAᵀ and therefore the norm
of y equals (Tr AAᵀ)^{1/2}, or equivalently, (Tr AᵀA)^{1/2}. This motivates introducing the Frobenius
norm of a matrix:
‖A‖_F = (Tr AᵀA)^{1/2}.
You can check that this equals (Σᵢ,ⱼ aᵢⱼ²)^{1/2}. Thus, the Frobenius norm of A equals the rms output
when the input is the standard white vector.
The second system norm is defined by saying that its square equals the maximum output energy
when the input energy equals 1:
‖A‖ = max_{‖u‖=1} ‖Au‖.
It is a fact that this induced norm equals σ_max(A), the largest singular value of A.
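Both system norms are easy to check numerically; a sketch with NumPy (the random search only approaches the induced norm from below):

```python
import numpy as np

rng = np.random.default_rng(0)
A = rng.standard_normal((3, 4))

# Frobenius norm: (Tr A^T A)^{1/2}, which equals (sum of squared entries)^{1/2}.
fro1 = np.sqrt(np.trace(A.T @ A))
fro2 = np.sqrt((A ** 2).sum())
assert np.isclose(fro1, fro2)

# Induced 2-norm: max ||Au|| over ||u|| = 1, which equals the largest
# singular value of A.
sigma_max = np.linalg.svd(A, compute_uv=False)[0]
best = max(
    np.linalg.norm(A @ (u / np.linalg.norm(u)))
    for u in rng.standard_normal((2000, 4))
)
print(fro1, sigma_max, best)  # best approaches sigma_max from below
assert best <= sigma_max + 1e-9
```

The random search never exceeds σ_max, illustrating that the induced norm is the worst-case gain over all unit-energy inputs.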
Summary of Spaces

Chapter 7
Introduction to Function Spaces
A norm ‖·‖ on a vector space must satisfy ‖x‖ > 0 for all x ≠ 0,
‖cx‖ = |c| ‖x‖,
‖x + y‖ ≤ ‖x‖ + ‖y‖.
A vector space with a norm is a normed space.
Inner products
The space Rⁿ has the inner product ⟨x, y⟩, also written as a dot product. If x, y are regarded as
column vectors, then the inner product can also be written xᵀy. Finally, the Euclidean norm can
be defined in terms of the inner product: ‖x‖ = ⟨x, x⟩^{1/2}.
In a general vector space X, an inner product ⟨x, y⟩ must have three properties: ⟨x, y⟩ = ⟨y, x⟩;
for every y the map x ↦ ⟨x, y⟩ is linear; and ⟨x, x⟩ is positive for all nonzero x. A vector space with
an inner product is an inner product space. The space C[t₁, t₂] with the norm ‖x‖₂ is an inner
product space, the inner product being
⟨x, y⟩ = ∫_{t₁}^{t₂} x(t) y(t) dt.
Completeness
Consider the set R and a sequence {aₙ} in it. It is called a Cauchy sequence if
(∀ε > 0)(∃N)(∀n, m > N) |aₙ − aₘ| < ε.
Evidently this sequence is “trying to converge” in the sense that the elements in the sequence are
getting closer and closer together. That it does converge in R is a feature of that set, a feature
called completeness: Every Cauchy sequence in R has a limit in R. For example, the interval
(0, 1] is not complete, because 1/n is a Cauchy sequence that converges to 0 6∈ (0, 1].
In a normed space, every convergent sequence is a Cauchy sequence (a good exercise). If,
conversely, every Cauchy sequence converges in the space, the space is said to be complete. A
complete normed space is called a Banach space; a complete inner-product space is called a
Hilbert space. The advantage of completeness is that in principle one doesn’t have to know a
limit to test if a sequence converges.
The space C[t1 , t2 ] with the norm kxk∞ is a Banach space, while C[t1 , t2 ] with the norm kxk2 is
not. To see this latter fact, note that the sequence
[Figure: continuous functions x₁, x₂, x₃, . . . on [t₁, t₂], steepening toward a step.]
is a Cauchy sequence in the norm ‖x‖₂, but it converges to a step function, which is not continuous
and therefore not in C[t₁, t₂]. It is not a Cauchy sequence in the norm ‖x‖∞.
Consider again C[t1 , t2 ] with the norm kxk∞ . The set of polynomial functions of t is a subset,
P[t1 , t2 ], of C[t1 , t2 ]. It is a subspace, that is, it is a vector space itself, but it is not complete. For
example, the function x(t) = sin t belongs to C[t₁, t₂]; if xₙ(t) denotes the truncation at the nth
term of the Taylor series of x(t), then xₙ ∈ P[t₁, t₂]; also, xₙ converges to x in the sense that
lim_{n→∞} ‖x − xₙ‖∞ = 0.
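This uniform convergence is easy to observe numerically; a sketch (the interval and truncation orders are arbitrary choices of ours):

```python
import math
import numpy as np

def taylor_sin(t, n_terms):
    """Truncated Taylor series of sin t with n_terms nonzero terms."""
    return sum((-1) ** k * t ** (2 * k + 1) / math.factorial(2 * k + 1)
               for k in range(n_terms))

t = np.linspace(0.0, 2 * np.pi, 1001)   # the interval [t1, t2] = [0, 2*pi]
sup_errors = [np.max(np.abs(np.sin(t) - taylor_sin(t, n)))
              for n in (2, 4, 6, 8, 10)]
print(sup_errors)  # decreasing toward 0: convergence in the sup norm
```

The printed sup-norm errors shrink rapidly as terms are added, which is exactly convergence in the norm ‖x‖∞.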
The closure of P[t₁, t₂], denoted P̄[t₁, t₂], is defined to be the set of limits of all sequences in P[t₁, t₂]
that converge in C[t₁, t₂]. The Weierstrass approximation theorem says that
P̄[t₁, t₂] = C[t₁, t₂],
that is, any continuous function on a closed interval can be approximated uniformly by a polynomial.
Now let's enlarge C[t₁, t₂]. There are certainly discontinuous functions that are bounded and
therefore for which the norm ‖x‖∞ is finite. We are going to call this class of functions L∞[t₁, t₂].¹
With appropriate consideration, L∞[t₁, t₂] is a Banach space.
On the other hand, consider C[t1 , t2 ] with the norm kxk2 . As we saw, it’s not complete. But
it can be embedded in a complete normed space, which is denoted L2 [t1 , t2 ]. The construction of
this completion is somewhat involved, hence we omit it. For us, it’s good enough to accept that
L2 [t1 , t2 ] contains functions x for which there is a sequence {xn } of continuous functions, or even
polynomials, such that
lim_{n→∞} ‖x − xₙ‖₂ = 0.
Consider the subset c_dn of ℓ² consisting of signals of duration n, that is, x(k) = 0 for k > n. It's routine to
prove that c_dn is a subspace of the vector space ℓ², and that furthermore it is a closed set within
ℓ². Being closed, c_dn is complete, and therefore is a Hilbert space. Now consider the subset c_fd of
signals of finite duration, that is, signals x for which x(k) = 0 beyond some finite time.
Again, c_fd is a subspace of the vector space ℓ². However c_fd is not a closed set within ℓ². To see
this, note that the sequence x_i,
x_i(k) = 1/2ᵏ for k ≤ i,  x_i(k) = 0 for k > i,
converges in ℓ² to the signal x(k) = 1/2ᵏ, which is not of finite duration.
Vector-valued functions
Now we turn to vector-valued signals. Let x(t) denote a function where t is a real number and x(t)
is an n-dimensional real vector. The L2 -norm of x is defined to be
‖x‖₂ = ( ∫_{−∞}^{∞} ‖x(t)‖² dt )^{1/2}.
¹Actually, we're glossing over a subtle point. Let x(t) be the function defined on the interval [0, 1] by saying that
x(t) = n for t = 1/n, n ≥ 1, and x(t) = 0 otherwise. Sketch the graph of this function. It is unbounded but is zero
except at a countable number of points. We say that x equals zero almost everywhere and we set ‖x‖∞ = 0.
The norm kx(t)k is the Euclidean norm of the vector x(t). Usually it is irrelevant what the dimension
is so we continue to write the space as L2 (R).
For a bounded signal the norm is
‖x‖∞ = sup_t ‖x(t)‖.
Again, the right-hand norm is the Euclidean norm. We write L∞(R) for the class of such functions.
We have just seen the definition of Hilbert space: a complete inner product space. Thus Hilbert
space is an abstract concept for which there are many instances, L2 [t1 , t2 ] being one. This idea of
defining an abstract concept is very common in mathematics because one can get a general result
that applies in many instances.
In a classification of spaces, Hilbert and Banach spaces sit here:
[Diagram: topological space ⊃ normed space ⊃ inner product space; complete normed spaces are Banach spaces, complete inner product spaces are Hilbert spaces.]
Lemma (Cauchy–Schwarz inequality) In an inner product space, |⟨x, y⟩| ≤ ‖x‖ ‖y‖.
Proof If y = 0 then both sides of the inequality equal 0. So now assume y ≠ 0. Define
c = ⟨x, y⟩ / ‖y‖².
Then
0 ≤ ‖x − cy‖² = ⟨x − cy, x − cy⟩ = ‖x‖² − c⟨y, x⟩ − c⟨x, y⟩ + c²‖y‖².
Since c²‖y‖² = c⟨x, y⟩, this gives
c⟨x, y⟩ ≤ ‖x‖²,
that is, ⟨x, y⟩² ≤ ‖x‖² ‖y‖², and so |⟨x, y⟩| ≤ ‖x‖ ‖y‖.
The proof is left as an exercise. The picture going with the lemma is this:
[Figure: a parallelogram with sides x, y and diagonals x + y, x − y.]
In an inner product space, we say x, y are orthogonal and write x ⊥ y if ⟨x, y⟩ = 0. For
example, eʲⁿᵗ ⊥ eʲᵐᵗ in L2[0, 2π] for n ≠ m. Here, the field of scalars is C and the inner product is
⟨x, y⟩ = ∫₀^{2π} x(t) ȳ(t) dt.
A set V in a vector space is convex if, whenever x, y ∈ V, all points on the line from x to y are
in V, i.e.,
λx + (1 − λ)y ∈ V for all 0 ≤ λ ≤ 1.
Theorem 7.1 Let X be an inner product space, V a complete convex subset, and x ∈ X . There is
a unique vector in V that is closest to x.
Proof Define
δ = inf_{v∈V} ‖x − v‖.
Then there is a sequence {vn } in V such that kx − vn k converges to δ. If we can show that {vn } is
a Cauchy sequence, then, because V is complete, v o = limn vn exists, belongs to V, and is closest to
x.
To prove the sequence is a Cauchy sequence, by the parallelogram equality we have
‖v_n − v_m‖² = 2‖v_n − x‖² + 2‖v_m − x‖² − 4‖(v_n + v_m)/2 − x‖².
By convexity (v_n + v_m)/2 belongs to V, and therefore
‖(v_n + v_m)/2 − x‖ ≥ δ.
Therefore
‖v_n − v_m‖² ≤ 2‖v_n − x‖² + 2‖v_m − x‖² − 4δ².
The right-hand side is arbitrarily small for n, m sufficiently large. This proves the Cauchy property.
Finally, uniqueness is proved like this. Suppose
‖x − v°‖ = ‖x − v‖ = δ.
By the parallelogram equality again, ‖v° − v‖² = 4δ² − 4‖(v° + v)/2 − x‖² ≤ 0. Thus v° = v.
The preceding result is not true in general in a normed space, as shown in an exercise.
A subspace V in a Hilbert space X may not be closed, as we saw. Its orthogonal complement,
denoted V ⊥ , is the set of vectors orthogonal to every vector in V. The set V ⊥ is a subspace and it
is closed. If V is closed, then
X = V ⊕ V⊥,
that is, every x in X can be written uniquely as x = v + w with v ∈ V and w ∈ V⊥.
Theorem 7.2 Let X be a Hilbert space and V a closed subspace. Let x ∈ X and let v o be the vector
in V that is closest to x. Then x − v o ⊥ V.
Proof Suppose not. Then there is a vector v in V of unit norm and such that
⟨x − v°, v⟩ = c ≠ 0.
Then ‖x − (v° + cv)‖² = ‖x − v°‖² − c² < ‖x − v°‖², contradicting the fact that v° is the vector in
V closest to x.
As an example, consider the system
θ̈ + θ̇ = u.
The state is x = (θ, θ̇). Suppose the control objective is to drive the state from x(0) = (0, 0) to
x(1) = (1, 0) using minimum energy, that is, minimizing
‖u‖² = ∫₀¹ u(t)² dt.
The two endpoint constraints can be written, via the variation-of-parameters formula, as inner-product
constraints
⟨v₁, u⟩ = 1,  ⟨v₂, u⟩ = 0
for suitable functions v₁, v₂ in L2[0, 1]. The vectors v₁, v₂ are linearly independent. Let V denote their span. Since V is finite-dimensional,
it is closed. Let W denote the set of control signals driving the state to the desired point:
W = {u : ⟨v₁, u⟩ = 1, ⟨v₂, u⟩ = 0}.
If u ∈ W, then every vector of the form u + p, p ⊥ V, belongs to W. Thus the picture looks like
this:
[Figure: W drawn as an affine set parallel to V⊥.]
It follows that the optimal u lies at the intersection of V and W. So write u = c₁v₁ + c₂v₂. Substitute
this into the constraint equations and solve for c₁, c₂; the result is
u°(t) = (1/(3 − e)) (1 + e − 2eᵗ).
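One can verify this u° by simulating the plant and checking the endpoint; a sketch (Euler integration; the step count is an arbitrary choice of ours):

```python
import math

def simulate(u, T=1.0, n=100_000):
    """Integrate thetaddot + thetadot = u(t) from rest by Euler's method;
    return (theta(T), thetadot(T))."""
    dt = T / n
    theta, omega = 0.0, 0.0
    for k in range(n):
        t = k * dt
        theta += dt * omega
        omega += dt * (u(t) - omega)
    return theta, omega

e = math.e

def u_opt(t):
    # The minimum-energy control derived in the text.
    return (1 + e - 2 * math.exp(t)) / (3 - e)

theta1, omega1 = simulate(u_opt)
print(theta1, omega1)  # close to (1, 0)
```

Up to discretization error, the state indeed arrives at (θ, θ̇) = (1, 0) at time 1.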
7.3 Operators
Let X, Y be normed spaces and let T : X → Y be a linear function; function, mapping, transformation
are synonymous. We say T is bounded if
(∃b)(∀x) ‖Tx‖ ≤ b‖x‖.
The least bound b for which this inequality holds is called the norm of T , denoted kT k. It is a fact
that boundedness and continuity are equivalent.
A good example is a BIBO stable system. For example, consider the LTI system with transfer
function
G(s) = 1/(s + 1)
and let T denote the time-domain mapping from input to output. If we take X, Y both to be the
space of bounded continuous functions on the time interval [0, ∞), with the norm ‖x‖ = sup_{t≥0} |x(t)|,
then boundedness of T is precisely BIBO stability.
As a second example, consider
ẋ = Ax + Bu,  x(0) = 0.
Suppose dim x = n, dim u = m. Fix a time, say t = 1, and consider the map T from u to x(1). Let
us take the domain of T , denoted U, to be the Hilbert space L2 [0, 1], that is, m-dimensional vectors
u(t) each of whose components lives in L2 [0, 1]. The inner product on U is
⟨u, v⟩ = ∫₀¹ u(t)ᵀ v(t) dt.
The co-domain of T is Rn . It’s not hard to show that T is bounded. That it is bounded, even
though A may not be stable, is because the time interval is finite. Nothing very bad can happen in
finite time. We’ll see later what the norm of T is.
The adjoint of T, denoted T*, is the map from the co-domain to the domain defined by
⟨Tx, y⟩ = ⟨x, T*y⟩.
For the example just described, in the equation
⟨Tu, x⟩ = ⟨u, T*x⟩
the left-hand inner product is in Rⁿ and the right-hand one is in L2[0, 1]. We have
⟨Tu, x⟩ = ( ∫₀¹ e^{A(1−t)} B u(t) dt )ᵀ x = ∫₀¹ u(t)ᵀ Bᵀ e^{Aᵀ(1−t)} x dt.
Denoting T*x by v, we have
⟨u, T*x⟩ = ∫₀¹ u(t)ᵀ v(t) dt.
Comparing the two expressions, v(t) = Bᵀ e^{Aᵀ(1−t)} x; this is the adjoint.
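The adjoint formula can be checked numerically by discretizing both inner products; a sketch for the double integrator, for which e^{At} = I + At since A is nilpotent (the grid size and random seed are ours):

```python
import numpy as np

# The double integrator: e^{At} = [[1, t], [0, 1]] since A^2 = 0.
B = np.array([[0.0], [1.0]])

def expA(t):
    return np.array([[1.0, t], [0.0, 1.0]])

n = 4000
ts = (np.arange(n) + 0.5) / n      # midpoint rule on [0, 1]
dt = 1.0 / n
rng = np.random.default_rng(1)
u = rng.standard_normal(n)          # a random scalar input signal
x = rng.standard_normal(2)          # a random vector in R^2

# T u = integral of e^{A(1-t)} B u(t) dt  (a vector in R^2)
Tu = sum(expA(1 - t) @ B * u_k * dt for t, u_k in zip(ts, u)).ravel()

# (T* x)(t) = B^T e^{A^T (1-t)} x  (a function of t)
Tstar_x = np.array([(B.T @ expA(1 - t).T @ x).item() for t in ts])

lhs = Tu @ x                        # <Tu, x> in R^2
rhs = np.sum(u * Tstar_x) * dt      # <u, T*x> in L2[0, 1]
print(lhs, rhs)                     # equal up to quadrature error
```

The two inner products agree, as the defining property of the adjoint requires.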
The image of T , denoted Im T , is the set of all vectors T x as x ranges over all X . The image
is a subspace of Y, though it may not be closed. The kernel of T , denoted Ker T , is the set of all
vectors x such that T x = 0. The kernel is a closed subspace of X .
Theorem 7.3 The vector x° minimizes ‖Tx − y‖ over x ∈ X iff T*Tx° = T*y.
Proof (Necessity) Suppose x° is optimal:
(∀x) ‖Tx° − y‖ ≤ ‖Tx − y‖.
Define y° = Tx° ∈ Im T.
It is claimed that y o − y ⊥ Im T . To prove this, following the proof of the projection theorem
suppose to the contrary that there exists y1 ∈ Im T such that
ky1 k = 1, hy − y o , y1 i = c 6= 0.
Then
‖y − (y° + cy₁)‖² = ‖(y − y°) − cy₁‖² = ‖y − y°‖² − c² < ‖y − y°‖².
Since y° + cy₁ ∈ Im T, there exists x such that Tx = y° + cy₁. Thus
‖y − Tx‖ < ‖y − Tx°‖,
contradicting the optimality of x°. This proves the claim, that is,
y° − y ∈ (Im T)⊥ = Ker T*.
Hence T*y° = T*y, i.e., T*Tx° = T*y.
(Sufficiency) Assume T*Tx° = T*y, that is,
Tx° − y ∈ Ker T* = (Im T)⊥.
Then for every x, the vector Tx − Tx° lies in Im T and so is orthogonal to Tx° − y; by Pythagoras
‖Tx − y‖² = ‖Tx − Tx°‖² + ‖Tx° − y‖² ≥ ‖Tx° − y‖².
Thus x° is optimal.
Continuing with the same setup, suppose the equation T x = y is solvable. If it has more than
one solution, then it has infinitely many. Suppose we’d like a solution x of minimum norm.
Theorem 7.4 Assume the equation Tx = y has a solution and that T* has closed image. The
vector x° minimizes ‖x‖ subject to Tx = y iff x° = T*z where z is any vector such that TT*z = y.
Proof Fix one solution x̄ of Tx = y. Then any other solution has the form x̄ − x̃ where x̃ ∈ Ker T.
Thus the problem
min_{Tx=y} ‖x‖
is equivalent to min_{x̃ ∈ Ker T} ‖x̄ − x̃‖.
By the projection theorem, and since Ker T is closed, the latter minimum exists and is unique: Let
it be achieved by x̃o . Define xo = x̄ − x̃o . Also by the projection theorem, xo belongs to (Ker T )⊥ ,
and thus to Im T ∗ , since it is closed. Thus xo = T ∗ z for some z. Multiplying this equation by T
gives y = T T ∗ z.
ẋ = Ax + Bu, x(0) = 0.
As before, let T denote the mapping from u to x(1). The domain of T , denoted U, is L2 [0, 1] and
the co-domain of T is Rn . The adjoint T ∗ is the mapping Rn −→ U given by
(T*x)(t) = Bᵀ e^{Aᵀ(1−t)} x.
Let us pose the problem of finding the minimum norm u such that x(1) equals a prescribed target
vector. You’re asked to solve this in an exercise.
7.5 Problems
1. Prove that in a normed space every convergent sequence is a Cauchy sequence.
Show that the norm kxk∞ in C[0, 1] does not satisfy the parallelogram equality, and therefore
C[0, 1] with this norm is not an inner product space.
5. Consider the system with transfer function G(s) = 1/(s + 1). If the input is in L2 [0, ∞), does
that mean the output tends to zero as t −→ ∞?
6. Consider C[0, 1] with the norm kxk∞ . Define V to be the set of functions satisfying
∫₀^{1/2} x(t) dt − ∫_{1/2}^{1} x(t) dt = 1.
Show that V is closed (hence complete) and convex, but it does not have an element of
minimum norm, that is, a vector closest to 0.
ẋ = Ax + Bu, x(0) = 0.
Assume (A, B) is controllable. Find u in L2 [0, 1] of minimum norm such that x(1) = v, a
given vector.
8. Let p(t) be a polynomial with real coefficients and of degree at most 1; that is, it has the form
p(t) = c0 + c1 t with c0 , c1 real numbers. The task is to approximate the quadratic t2 over the
range 0 ≤ t ≤ 1. This is expressed by saying that
∫₀¹ [p(t) − t²]² dt
should be made as small as possible.
Then the problem is to find p in V that is closest to t2 . Solve this problem via the projection
theorem.
9. Consider the space Rn×m of n × m matrices. There is a natural inner product, namely,
hX, Y i = trace X T Y.
The trace of a square matrix is the sum of its diagonal elements. This inner product reduces
to the usual one for vectors when m = 1. Consider the optimization problem
minimize_X ‖A − BXC‖.
This can be written as
minimize_X ‖A − T(X)‖,
where T is the linear transformation given by T (X) = BXC. Derive the adjoint of T and
write down the equation for an optimal X.
10. (a) Let H2 denote the space of rational transfer functions that have real coefficients and are
strictly proper. (It’s not a Hilbert space because it is not complete, but it’s a perfectly
good inner-product space.) Examples are
1/(s + 1),  s²/(s³ + 3s² + s + 1)
but not
1,  1/(s − 1),  1/(s + j),  sin s.
Show that this is an inner product space with inner product
⟨F, G⟩ = (1/2π) ∫_{−∞}^{∞} F(−jω) G(jω) dω.
What is the corresponding norm of f (t), the inverse Laplace transform of F (s)?
(c) If you haven’t had a course in complex variables, you’ll have to look up the residue
theorem. For F, G ∈ H2 , show that hF, Gi equals the sum of the residues of F (−s)G(s)
at its poles in the left half-plane. For example, if F(s) = 1/(s + 1), then ‖F‖₂² equals the
residue of
F(−s)F(s) = 1/((−s + 1)(s + 1))
at its left half-plane pole, s = −1.
(g) Let T be the operator H2 −→ H2 that maps V to U V , U as in the preceding part. Find
the adjoint operator T ∗ . What are T T ∗ and T ∗ T ? Be careful: The adjoint of T is not the
mapping V (s) 7→ U (−s)V (s); this is because U (−s) has a pole in the right half-plane,
and therefore U (−s)V (s) is not in H2 in general, even though V is in H2 .
11. Let H∞ denote the set of functions G(s) that are analytic and bounded in the open right
half-plane. For example, if G(s) is rational, then it has no poles in the closed right half-plane
and it is proper (its numerator degree is not greater than its denominator degree). On this
space define the norm
‖G‖∞ = sup {|G(s)| : Re s > 0}.
Now let
F(s) = (s − 1)/(s + 1).
Prove that F H∞ , the set of all products F G, where G ranges over H∞ , is closed in H∞ .
minimize_x ‖c − Ax‖.
Take
A = [1 1; 2 0; 3 1],  c = (1, 1, −1).
⟨X, Y⟩ = Trace XᵀY.
Let S denote the subspace of symmetric matrices. Find S ⊥ . Given A, find B, C such that
A = B + C, B ∈ S, C ∈ S ⊥.
14. Find the distance in the Frobenius norm from a square matrix to the nearest lower triangular
matrix.
15. Einstein's summation convention is that an expression like Σₖ aᵢₖbₖⱼ is abbreviated to aᵢₖbₖⱼ.
That is, in any product expression, a repeated index (k in this case) means summation. Thus
if A and B are matrices such that the product C = AB is defined, then cᵢⱼ = aᵢₖbₖⱼ. Using
this convention, prove that the trace of AB equals the trace of BA (assuming AB and BA
are square).
16. Consider the vector space Rⁿˣⁿ with the inner product ⟨X, Y⟩ = trace XᵀY. The set of
symmetric matrices is a subspace, S. So the problem
min_{B∈S} ‖A − B‖
18. Consider an imaginary one-dimensional straight road going off to infinity in both directions.
Model the road as R. Consider a countably infinite number of cars on the road; assume each
car has been labeled with an integer number. At any particular time, the cars could be at
any particular points on the road. Let xk (t) denote the location on the road of car k at time
t and let x(t) be the infinite vector
x(t) = (. . . , x₋₁(t), x₀(t), x₁(t), . . .).
Let's fix t and drop the argument t in x(t). Thus x is a vector with an infinite number of
components, each of which could be any real number. Finally, let ℓ denote the set of all such
vectors x.
The set ℓ is a vector space. But it is not a normed space, and hence it cannot be a Hilbert
or Banach space. Make ℓ a topological vector space (you'll have to look up the definition of
this). Modify the classification-of-spaces diagram given earlier and show where ℓ fits.
Chapter 8
H2 Optimal Control
The symbol H2 stands for the space of all stable, strictly proper transfer functions, such as
1/(s + 1),  (2s − 1)/(s² + 5s + 2)
but not
1/s,  s/(s + 1).
There’s a natural inner product (dot product) and norm:
Z ∞ Z ∞ 1/2
1 1 2
< P, Q >= P (jω)Q(jω) dω, kP k2 = |P (jω)| dω .
2π −∞ 2π −∞
Here the bar denotes complex conjugate. The space extends to matrices, as we’ll see.
In this chapter we study optimal control design in this space.
8.1 Overview
This section gives an overview of the standard H2 problem. An example illustrates how to set
problems up.
Let L(R, Rn ) denote the space of all signals from R to Rn . We review the L2 -norm of a signal
u in L(R, Rn ). For each time t, u(t) is a vector in Rn ; denote its Euclidean norm by ku(t)k. The
L2-norm of u is then defined to be
‖u‖₂ = ( ∫_{−∞}^{∞} ‖u(t)‖² dt )^{1/2}.
The space L2 (R, Rn ), or just L2 (R) if convenient, consists of all signals for which this norm is finite.
For example, the norm is finite if u(t) converges to 0 exponentially as t → ∞. (Caution: kuk2 < ∞
does not imply that u(t) → 0 as t → ∞—think of a counterexample.)
Before defining a norm for a transfer matrix, we have to deal with norms for complex matrices. Let
R be a p × m complex matrix, that is, R ∈ Cp×m . There are many possible definitions for kRk; we
need one. Let R∗ denote the complex-conjugate transpose of R. The matrix R∗ R is Hermitian and
positive semidefinite. Recall that the trace of a square matrix is the sum of the entries on the main
diagonal. It is a fact that the trace also equals the sum of the eigenvalues.
The first definition for ‖R‖ (there will be another in the next chapter) is [trace(R*R)]^{1/2}.
Example
R = [2+j  j; 1−j  3−2j]
R*R = [2−j  1+j; −j  3+2j] [2+j  j; 1−j  3−2j] = [7  6+3j; 6−3j  14]
‖R‖ = (7 + 14)^{1/2} = √21
Observe in this example that if rij denotes the ijth entry in R, then
‖R‖ = ( Σᵢ,ⱼ |rᵢⱼ|² )^{1/2}.
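A quick numerical check of this example (variable names ours):

```python
import numpy as np

R = np.array([[2 + 1j, 1j],
              [1 - 1j, 3 - 2j]])

# (trace R*R)^{1/2}, with R* the conjugate transpose
n1 = np.sqrt(np.trace(R.conj().T @ R).real)
# (sum of |r_ij|^2)^{1/2}
n2 = np.sqrt((np.abs(R) ** 2).sum())
print(n1, n2, np.sqrt(21))  # all three agree
```

Both formulas give √21, confirming the trace and entrywise definitions coincide.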
H2-Norm
‖G‖₂ = ( (1/2π) ∫_{−∞}^{∞} trace [G(jω)* G(jω)] dω )^{1/2}
Note that the integrand equals the square of the first-definition norm of G(jω).
There is an important input-output fact concerning this definition. Let G be a stable, causal, LTI
system with input u of dimension m and output y of dimension p. Let ei , i = 1, . . . , m, denote
the standard basis vectors in Rm . Thus, δei is an impulse applied to the ith input; Gδei is the
corresponding output. Then the H2 -norm of the transfer matrix G is related to the average L2 -
norm of the output when impulses are applied at the input channels.
Theorem 8.1 ‖G‖₂² = Σᵢ₌₁ᵐ ‖G δeᵢ‖₂²
with A stable, that is, all eigenvalues with negative real part. This matrix notation stands for the
transfer matrix:
[A B; C D] := C(sI − A)⁻¹ B + D.
Then \(\|G\|_2 = \infty\) unless D = 0, in which case the following procedure does the job: solve the Lyapunov equation
\[ AL + LA^T + BB^T = 0 \]
for L (the controllability gramian); then \(\|G\|_2 = \left[ \operatorname{trace}(C L C^T) \right]^{1/2}\).
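A scalar illustration of this procedure (an assumed toy example, not from the notes): take G(s) = 1/(s+1), so A = −1, B = C = 1, D = 0.

```python
import math

# G(s) = 1/(s+1): A = -1, B = 1, C = 1, D = 0 (assumed example data).
A, B, C = -1.0, 1.0, 1.0

# Solve A L + L A^T + B B^T = 0 for the scalar gramian L: 2*A*L + B^2 = 0.
L = -B * B / (2 * A)                 # L = 1/2

h2_state_space = math.sqrt(C * L * C)

# Cross-check in the time domain: the impulse response is g(t) = e^{-t},
# so ||G||_2^2 = int_0^inf g(t)^2 dt = 1/2.
print(h2_state_space)                # sqrt(1/2), about 0.7071
```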
[Figure: the standard setup — generalized plant G with exogenous input w, control input u, regulated output z, and measured output y; the controller K is in feedback from y to u.]
We must define the concept of internal stability for this setup. Start with a minimal realization of
G:
\[ G(s) = \begin{bmatrix} A & B \\ C & D \end{bmatrix}. \]
We shall assume that D22 = 0, that is, the transfer matrix from u to y is strictly proper. This is
a condition to guarantee existence of closed-loop transfer matrices. Thus the realization for G has
the form
\[ G(s) = \begin{bmatrix} A & B_1 & B_2 \\ C_1 & D_{11} & D_{12} \\ C_2 & D_{21} & 0 \end{bmatrix}. \]
Now set w = 0 and write the state equations describing the controlled system:
\begin{align*}
\dot{x} &= Ax + B_2 u \\
y &= C_2 x \\
\dot{x}_K &= A_K x_K + B_K y \\
u &= C_K x_K + D_K y.
\end{align*}
Eliminate u and y:
\[ \begin{bmatrix} \dot{x} \\ \dot{x}_K \end{bmatrix} = \begin{bmatrix} A + B_2 D_K C_2 & B_2 C_K \\ B_K C_2 & A_K \end{bmatrix} \begin{bmatrix} x \\ x_K \end{bmatrix}. \]
We call this latter matrix the closed-loop A-matrix. It can be checked that its eigenvalues do not
depend on the particular minimal realizations chosen for G and K. The closed-loop system is said
to be internally stable if this closed-loop A-matrix is stable, that is, all its eigenvalues have negative
real part. It can be proved that, given G, an internally stabilizing K exists iff (A, B2 ) is stabilizable
and (C2 , A) is detectable.
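The stability check on the closed-loop A-matrix can be carried out numerically. Here is a small Python sketch with assumed data (not from the notes): the plant P(s) = 1/s, so a = 0, b2 = c2 = 1, and a stabilizing first-order controller with state matrices aK = −1, bK = 1, cK = −1, dK = −1.

```python
import cmath

# Assumed toy data: integrator plant and a first-order controller.
a, b2, c2 = 0.0, 1.0, 1.0
aK, bK, cK, dK = -1.0, 1.0, -1.0, -1.0

# The closed-loop A-matrix [[a + b2*dK*c2, b2*cK], [bK*c2, aK]] from the text.
Acl = [[a + b2 * dK * c2, b2 * cK],
       [bK * c2, aK]]

# Eigenvalues of a 2x2 matrix from its characteristic polynomial
# s^2 - (trace)s + det.
tr = Acl[0][0] + Acl[1][1]
det = Acl[0][0] * Acl[1][1] - Acl[0][1] * Acl[1][0]
disc = cmath.sqrt(tr * tr - 4 * det)
eigs = [(tr + disc) / 2, (tr - disc) / 2]

internally_stable = all(e.real < 0 for e in eigs)
print(eigs, internally_stable)   # eigenvalues -1 +/- j, so stable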
Let Tzw denote the system from w to z, with transfer matrix Tzw (s). The H2 -optimal control
problem is to compute an internally stabilizing controller K that minimizes kTzw k2 . The following
conditions guarantee the existence of an optimal K:
(A1) (A, B2) is stabilizable and (C2, A) is detectable;
(A2) the matrices D12 and D21 have full column and row rank, respectively;
(A3) the matrices
\[ \begin{bmatrix} A - j\omega I & B_2 \\ C_1 & D_{12} \end{bmatrix}, \qquad \begin{bmatrix} A - j\omega I & B_1 \\ C_2 & D_{21} \end{bmatrix} \]
have full column rank and full row rank, respectively, for every real ω;
(A4) D11 = 0.
The first assumption is, as mentioned above, necessary and sufficient for existence of an internally
stabilizing controller. In (A2) full column rank of D12 means that the control signal u is fully
weighted in the output z. This is a sensible assumption, for if, say, some component of u is not
weighted, there is no a priori reason for the optimal controller not to try to make this component
unbounded. Dually, full row rank of D21 means that the exogenous signal w fully corrupts the mea-
sured signal y; it’s like assuming noise for each sensor. Again, this is sensible, because otherwise the
optimal controller may try to differentiate y, that is, the controller may be improper. Assumption
(A3) is merely technical—an optimal controller may exist without it. In words, the assumption says
there are no imaginary axis zeros in the cross systems from u to z and from w to y. Finally, (A4)
guarantees that kTzw k2 is finite for every internally stabilizing and strictly proper controller (recall
that Tzw must be strictly proper).
The problem is said to be regular if assumptions (A1) to (A4) are satisfied. Sometimes when
we formulate a problem they are not initially satisfied; for example, we may initially not explicitly
model sensor noise. Then we must modify the problem so that the assumptions are satisfied. This
process is called regularization.
Under these assumptions, the MATLAB commands h2syn and h2lqg compute the optimal con-
troller. These functions are part of the Robust Control Toolbox of MATLAB. The following example
illustrates the H2 design technique.
[Figure: teleoperation setup — the master Gm, driven by fh and fm, produces velocity vm; the slave Gs, driven by fe and fs, produces velocity vs; the controller K closes the loop.]
Two robots, a master, Gm , and a slave, Gs , are controlled by one controller, K. A human provides
a force command, fh , to the master, while the environment applies a force, fe , to the slave. The
controller measures the two velocities, vm and vs , together with fe via a force sensor. In turn it
provides two force commands, fm and fs , to the master and slave. Ideally, we want motion following
(vs = vm ), a desired master compliance (vm a desired function of fh ), and force reflection (fm = fe ).
For simplicity of computation we shall take Gm and Gs to be SISO with transfer functions
\[ G_m(s) = \frac{1}{s}, \qquad G_s(s) = \frac{1}{10s}. \]
We shall design K for two test inputs, namely, fe (t) is the finite-width pulse
\[ f_e(t) = \begin{cases} 10, & 0 \le t \le 0.2 \\ 0, & t > 0.2, \end{cases} \tag{8.1} \]
indicating an abrupt encounter between the slave and a stiff environment, and fh (t) is the triangular
pulse
\[ f_h(t) = \begin{cases} 2t, & 0 \le t \le 1 \\ -2t + 4, & 1 \le t \le 2 \\ 0, & t > 2. \end{cases} \tag{8.2} \]
The Laplace transforms of fe and fh are not rational:
\[ F_e(s) = \frac{10}{s}\left( 1 - e^{-0.2s} \right), \qquad F_h(s) = \frac{2}{s^2}\left( 1 - e^{-s} \right)^2. \]
To get a tractable problem, we shall use second- and third-order Padé approximations,
\[ e^{-Ts} \approx \left( 1 - \frac{Ts}{2} + \frac{(Ts)^2}{12} \right) \Big/ \left( 1 + \frac{Ts}{2} + \frac{(Ts)^2}{12} \right) \]
and
\[ e^{-Ts} \approx \left( 1 - \frac{Ts}{2} + \frac{(Ts)^2}{10} - \frac{(Ts)^3}{120} \right) \Big/ \left( 1 + \frac{Ts}{2} + \frac{(Ts)^2}{10} + \frac{(Ts)^3}{120} \right). \]
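A small Python sketch (not part of the notes) evaluating both Padé approximants on the imaginary axis, where they are used here, shows how quickly the accuracy improves with order:

```python
import cmath

def pade2(z):
    """Second-order diagonal Pade approximant of e^{-z}."""
    return (1 - z / 2 + z ** 2 / 12) / (1 + z / 2 + z ** 2 / 12)

def pade3(z):
    """Third-order diagonal Pade approximant of e^{-z}."""
    num = 1 - z / 2 + z ** 2 / 10 - z ** 3 / 120
    den = 1 + z / 2 + z ** 2 / 10 + z ** 3 / 120
    return num / den

# Compare against the exact e^{-jw} at a few sample frequencies.
for w in (0.1, 0.5, 1.0):
    exact = cmath.exp(-1j * w)
    print(w, abs(pade2(1j * w) - exact), abs(pade3(1j * w) - exact))
```

On the imaginary axis both approximants have modulus exactly one (they are allpass), so the error is purely a phase error.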
Using the third-order approximation for Fe(s) and the second-order one for Fh(s), we get
\[ F_e(s) \approx 20\left( \frac{0.2}{2} + \frac{0.2^3 s^2}{120} \right) \Big/ \left( 1 + \frac{0.2s}{2} + \frac{(0.2s)^2}{10} + \frac{(0.2s)^3}{120} \right) =: G_e(s), \]
\[ F_h(s) \approx 2 \Big/ \left( 1 + \frac{s}{2} + \frac{s^2}{12} \right)^2 =: G_h(s). \]
Incorporating these two prefilters into the preceding block diagram leads to this:
[Figure: the same setup with the prefilters incorporated — wh drives Gh to produce fh at the master, we drives Ge to produce fe at the slave; K closes the loop through fm and fs.]
The two exogenous inputs wh and we are unit impulses. The vector of exogenous inputs is therefore
\[ w = \begin{bmatrix} w_h \\ w_e \end{bmatrix}. \]
This figure compares fh(t) (dash) with the impulse response of Gh (solid):
And this figure is for fe (t) (fe (t) dash and the impulse response of Ge solid):
The error in the second plot is larger because fe(t) is not continuous. The control system is shown here in the standard setup:
[Figure: generalized plant G with exogenous input w and control input u, regulated output z and measured output y, with controller K in feedback.]
\[
\left[ \begin{array}{cccc|cccc}
A_m & 0 & 0 & B_m C_h & 0 & 0 & -B_m & 0 \\
0 & A_s & B_s C_e & 0 & 0 & 0 & 0 & -B_s \\
0 & 0 & A_e & 0 & 0 & B_e & 0 & 0 \\
0 & 0 & 0 & A_h & B_h & 0 & 0 & 0 \\
\hline
\alpha_v C_m & -\alpha_v C_s & 0 & 0 & 0 & 0 & 0 & 0 \\
-\alpha_c C_m & 0 & 0 & \alpha_c C_h & 0 & 0 & 0 & 0 \\
0 & 0 & -\alpha_f C_e & 0 & 0 & 0 & \alpha_f I & 0 \\
0 & 0 & 0 & 0 & 0 & 0 & 0 & \alpha_s I \\
\hline
0 & 0 & C_e & 0 & 0 & 0 & 0 & 0 \\
0 & C_s & 0 & 0 & 0 & 0 & 0 & 0 \\
C_m & 0 & 0 & 0 & 0 & 0 & 0 & 0
\end{array} \right]. \tag{8.3}
\]
For the data at hand, D21 = 0, so (A2) fails. Evidently, the condition D21 = 0 reflects the fact that no sensor noise was modelled, that is, perfect measurements of vm, vs, fe were assumed. Let us add sensor noises, say of magnitude ε. Then w is augmented to a 5-vector and the state matrices of G change appropriately so that the realization becomes
\[ \begin{bmatrix} A & 0 & B_1 & B_2 \\ C_1 & 0 & 0 & D_{12} \\ C_2 & \varepsilon I & 0 & 0 \end{bmatrix}. \]
Some trial-and-error is required to get suitable values for the weights; the values used in the program below (αv = 10, αc = 5, αf = 10, αs = 0.01) give reasonable responses.
The MATLAB function h2syn can be used to compute the optimal controller. The next figure
shows plots of vs (t) and vm (t) when the system is commanded by fh (t) (also shown) (vs (solid), vm
(dash), and fh (dot)):
%
% Program by Dan Davison
%
% Program summary:
%
% (1) - find state-space model of G and regularize it
% (2) - use H2SYN (in mu-tools) to find optimal K
% - controller is stored in AK,BK,CK,DK
% (3) - simulate response to two types of inputs
clear
%
% (1) - SETUP STATE-SPACE MODEL FOR G
%
numG_m = [1];
denG_m = [1 0];
[A_m,B_m,C_m,D_m] = tf2ss(numG_m,denG_m);
numG_s = [1];
denG_s = [10 0];
[A_s,B_s,C_s,D_s] = tf2ss(numG_s,denG_s);
numG_h = [2];
temp = [1/12 1/2 1];
denG_h = conv(temp,temp);
[A_h,B_h,C_h,D_h] = tf2ss(numG_h,denG_h);
[n_m,m_m]=size(B_m);
[n_s,m_s]=size(B_s);
[n_e,m_e]=size(B_e);
[n_h,m_h]=size(B_h);
[p_m,n_m]=size(C_m);
[p_s,n_s]=size(C_s);
[p_e,n_e]=size(C_e);
[p_h,n_h]=size(C_h);
B2 = [ -B_m zeros(n_m,m_s)
zeros(n_s,m_m) -B_s
zeros(n_e,m_m) zeros(n_e,m_s)
zeros(n_h,m_m) zeros(n_h,m_s)];
B = [B1 B2];
% weights on, resp, v_m - v_s, f_h - v_m, f_m - f_e, f_s
w_v = 10;
w_z = 5;
w_f = 10;
w_s = .01;
weight = diag([w_v w_z w_f w_s]);
C1 = weight*C1;
C = [C1;C2];
D11 = zeros(4,5);
D12 = [0 0
0 0
1 0
0 1];
D12 = weight*D12;
D21= [epsilon 0 0 0 0
0 epsilon 0 0 0
0 0 epsilon 0 0];
D22= zeros(3,2);
% run h2syn
plant=pck(A,B,C,D);
[kk,gg,kfi,gfi,hamx,hamy]=h2syn(plant,3,2,2);
[AK,BK,CK,DK]=unpck(kk);
[nk,mk]=size(BK);
[pk,nk]=size(CK);
CK1 = CK(1,1:nk);
CK2 = CK(2,1:nk);
BK1 = BK(1:nk,1);
BK2 = BK(1:nk,2);
BK3 = BK(1:nk,3);
%
DM = [0 0; 0 0; 1 0; 0 0];
Tmax = 10;
delT = .01;
T1=0:delT:Tmax;
fe = zeros(length(T1),1);
fh = zeros(length(T1),1);
for i=1:length(T1)*1/Tmax
t = (i-1)*Tmax/length(T1);
fh(i) = 2*t;
end
for i=length(T1)*1/Tmax+1:length(T1)*2/Tmax
t = (i-1)*Tmax/length(T1);
fh(i) = -2*t+4;
end
Tmax = 10;
delT = .01;
T2=0:delT:Tmax;
fe = zeros(length(T2),1);
fh = zeros(length(T2),1);
for i=1:length(T2)*.2/Tmax
t = (i-1)*Tmax/length(T2);
fe(i) = 10;
end
After that tutorial on the use of H2 optimal control, we return to the theory.
The equation
\[ A^T X + XA + M = 0 \]
is called a Lyapunov equation. Here A, M, X are all square matrices, say n × n, with M symmetric.
One situation is where A and M are given and the equation is to be solved for X. Existence and uniqueness are easy to establish in principle. Define the linear map
\[ L(X) := A^T X + XA \]
on n × n matrices, so that the Lyapunov equation reads L(X) = −M. Then the Lyapunov equation has a solution X iff M ∈ Im L; if this condition holds, the solution is unique iff L is one-to-one, hence invertible. Let σ(·) denote the set of eigenvalues—the spectrum—of a matrix or linear transformation. It can be shown that
\[ \sigma(L) = \{ \lambda_i + \lambda_j : \lambda_i, \lambda_j \in \sigma(A) \}. \]
So the Lyapunov equation has a unique solution iff A has the property that no two of its eigenvalues
add to zero. For example, if A is stable, the unique solution is
\[ X = \int_0^\infty e^{A^T t} M e^{At} \, dt. \]
This can be proved as follows. Let \( P(t) = e^{A^T t} M e^{At} \). Then
\[ \dot{P}(t) = A^T P(t) + P(t) A. \]
Integrate from t = 0 to ∞: since A is stable, P(t) → 0 as t → ∞, so the left side integrates to −M and we get −M = A^T X + XA, that is, X satisfies the Lyapunov equation.
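The integral formula is easy to verify numerically. A scalar Python sketch with assumed data (A = −1, M = 2, neither from the notes):

```python
import math

# Assumed scalar data: A = -1 (stable), M = 2.
A, M = -1.0, 2.0

# X = int_0^inf e^{A^T t} M e^{A t} dt, by a crude left Riemann sum.
dt, T = 1e-4, 30.0
X = sum(math.exp(A * t) * M * math.exp(A * t) * dt
        for t in (k * dt for k in range(int(T / dt))))

residual = A * X + X * A + M   # should vanish: A^T X + X A + M = 0
print(X, residual)             # X near 1, residual near 0
```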
We’ll be more interested in another situation—where we want to infer stability of A.
Theorem 8.2 Suppose A, M , X satisfy the Lyapunov equation, (M, A) is detectable, and M and
X are positive semi-definite. Then A is stable.
Proof For a proof by contradiction, suppose A has some eigenvalue λ with Re λ ≥ 0. Let x be
a corresponding eigenvector. Pre-multiply the Lyapunov equation by x∗ , the complex-conjugate
transpose, and post-multiply by x to get
\[ (2\operatorname{Re}\lambda)\, x^* X x + x^* M x = 0. \]
Both terms on the left are ≥ 0. Hence \( x^* M x = 0 \), which implies that Mx = 0 since M ≥ 0. Thus
\[ \begin{bmatrix} A - \lambda I \\ M \end{bmatrix} x = 0. \]
By detectability we must have x = 0, a contradiction.
\[ J^{-1} H J = -JHJ = -H^T. \]
Notice that X is then uniquely determined by H, i.e. H 7→ X is a function. We shall denote this
function by Ric and write X = Ric(H).
To recap, Ric is a (nonlinear) function \( \mathbb{R}^{2n \times 2n} \to \mathbb{R}^{n \times n} \) which maps H to X, where
\[ \mathcal{X}_-(H) = \operatorname{Im} \begin{bmatrix} I \\ X \end{bmatrix}. \]
The domain of Ric, denoted dom Ric, consists of Hamiltonian matrices H with two properties,
namely, H has no eigenvalues on the imaginary axis and the two subspaces
\[ \mathcal{X}_-(H), \qquad \operatorname{Im} \begin{bmatrix} 0 \\ I \end{bmatrix} \]
are complementary.
Some properties of X are given below.
(i) X is symmetric;
(ii) X satisfies the algebraic Riccati equation
\[ A^T X + XA - XPX + Q = 0; \]
(iii) A − PX is stable.
To prove this, note that there exists a stable matrix \( H_- \) in \( \mathbb{R}^{n \times n} \) such that
\[ H \begin{bmatrix} X_1 \\ X_2 \end{bmatrix} = \begin{bmatrix} X_1 \\ X_2 \end{bmatrix} H_-. \]
Pre-multiply by \( \begin{bmatrix} X_1 \\ X_2 \end{bmatrix}^T J \) to get
\[ \begin{bmatrix} X_1 \\ X_2 \end{bmatrix}^T J H \begin{bmatrix} X_1 \\ X_2 \end{bmatrix} = \begin{bmatrix} X_1 \\ X_2 \end{bmatrix}^T J \begin{bmatrix} X_1 \\ X_2 \end{bmatrix} H_-. \tag{8.5} \]
Now JH is symmetric; hence so is the left-hand side of (8.5); hence so is the right:
\[ (-X_1^T X_2 + X_2^T X_1) H_- = H_-^T (-X_1^T X_2 + X_2^T X_1)^T = -H_-^T (-X_1^T X_2 + X_2^T X_1). \]
This is a Lyapunov equation in \( -X_1^T X_2 + X_2^T X_1 \); since \( H_- \) is stable, its unique solution is zero:
\[ -X_1^T X_2 + X_2^T X_1 = 0. \]
Post-multiply by \( X_1^{-1} \):
\[ H \begin{bmatrix} I \\ X \end{bmatrix} = \begin{bmatrix} I \\ X \end{bmatrix} X_1 H_- X_1^{-1}. \tag{8.6} \]
Now pre-multiply by \( \begin{bmatrix} X & -I \end{bmatrix} \):
\[ \begin{bmatrix} X & -I \end{bmatrix} H \begin{bmatrix} I \\ X \end{bmatrix} = 0. \]
\[ A - PX = X_1 H_- X_1^{-1}. \]
The following result gives verifiable conditions under which H belongs to dom Ric.
Theorem 8.3 Suppose
\[ H = \begin{bmatrix} A & -BR^{-1}B^T \\ -Q & -A^T \end{bmatrix} \]
with Q ≥ 0, R > 0, (A, B) stabilizable, and (Q, A) detectable. Then H ∈ dom Ric. Let X = Ric(H) and \( F = -R^{-1}B^T X \). Then X ≥ 0 and A + BF is stable. Finally, if (Q, A) is observable, then X > 0.
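Before the proof, here is a scalar Python sketch of Ric in action (assumed data A = B = Q = R = 1, not from the theorem statement): X is read off from the stable eigenvector of H.

```python
import math

# Assumed scalar data: A = 1, B = 1, Q = 1, R = 1, so H = [[1, -1], [-1, -1]].
A, B, Q, R = 1.0, 1.0, 1.0, 1.0
H = [[A, -B * B / R], [-Q, -A]]

# Eigenvalues of H from s^2 - tr*s + det; here tr = 0, det = -2, s = +/- sqrt(2).
tr = H[0][0] + H[1][1]
det = H[0][0] * H[1][1] - H[0][1] * H[1][0]
lam = (tr - math.sqrt(tr * tr - 4 * det)) / 2    # stable eigenvalue, -sqrt(2)

# Stable eigenvector [x1, x2] with x1 = 1; row 1 of (H - lam I) v = 0 gives x2.
x1 = 1.0
x2 = (H[0][0] - lam) * x1 / -H[0][1]
X = x2 / x1                                      # X = Ric(H) = X2 X1^{-1}

F = -B * X / R
print(X, A + B * F)    # X = 1 + sqrt(2); A + BF = -sqrt(2), stable
```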
Re-arrange and take inner products; it follows that \( B^T z = 0 \) and \( Qx = 0 \). Thus
\[ z^* \begin{bmatrix} A - j\omega I & B \end{bmatrix} = 0 \]
and
\[ \begin{bmatrix} A - j\omega I \\ Q \end{bmatrix} x = 0. \]
By stabilizability and detectability it follows that x = z = 0, a contradiction.
Next, we’ll show that
\[ \mathcal{X}_-(H), \qquad \operatorname{Im} \begin{bmatrix} 0 \\ I \end{bmatrix} \]
are complementary. This requires a preliminary step. As in the proof of Lemma 8.1 bring in \( X_1, X_2, H_- \) so that
\[ \mathcal{X}_-(H) = \operatorname{Im} \begin{bmatrix} X_1 \\ X_2 \end{bmatrix}, \]
\[ H \begin{bmatrix} X_1 \\ X_2 \end{bmatrix} = \begin{bmatrix} X_1 \\ X_2 \end{bmatrix} H_-. \tag{8.11} \]
We want to show that \( X_1 \) is nonsingular, i.e. Ker \( X_1 \) = 0. First, it is claimed that Ker \( X_1 \) is \( H_- \)-invariant. To prove this, let \( x \in \operatorname{Ker} X_1 \). Pre-multiply (8.11) by \( \begin{bmatrix} I & 0 \end{bmatrix} \) to get
\[ A X_1 - B R^{-1} B^T X_2 = X_1 H_-. \tag{8.12} \]
Pre-multiply by \( x^T X_2^T \), post-multiply by x, and use the fact that \( X_2^T X_1 \) is symmetric (see (8.4)) to get
\[ x^T X_2^T B R^{-1} B^T X_2 x = 0. \]
Hence \( B^T X_2 x = 0 \), and then (8.12) gives \( X_1 H_- x = 0 \), that is, \( H_- x \in \operatorname{Ker} X_1 \); this proves the claim. Now suppose Ker \( X_1 \neq 0 \). Then \( H_- \) restricted to Ker \( X_1 \) has an eigenvalue λ and a corresponding eigenvector x:
\[ H_- x = \lambda x, \tag{8.13} \]
\[ \operatorname{Re} \lambda < 0, \quad 0 \neq x \in \operatorname{Ker} X_1. \]
Pre-multiply (8.11) by \( \begin{bmatrix} 0 & I \end{bmatrix} \):
\[ -Q X_1 - A^T X_2 = X_2 H_-. \tag{8.14} \]
Apply this to x and use (8.13):
\[ (A^T + \lambda I) X_2 x = 0. \]
Together with \( B^T X_2 x = 0 \) this gives
\[ x^* X_2^T \begin{bmatrix} A + \lambda I & B \end{bmatrix} = 0. \]
Since Re(−λ) > 0 and (A, B) is stabilizable, it follows that \( X_2 x = 0 \). But then x ≠ 0 lies in the kernel of both \( X_1 \) and \( X_2 \), contradicting the fact that \( [X_1;\, X_2] \) has full column rank. Therefore \( X_1 \) is nonsingular.
\[ A^T X + XA - XBR^{-1}B^T X + Q = 0, \]
or equivalently
\[ (A + BF)^T X + X(A + BF) + XBR^{-1}B^T X + Q = 0. \]
Thus
\[ X = \int_0^\infty e^{(A+BF)^T t} \left( XBR^{-1}B^T X + Q \right) e^{(A+BF)t} \, dt. \tag{8.15} \]
\[ Q e^{(A+BF)t} x = 0, \quad \forall t \ge 0. \]
But this implies that x belongs to the unobservable subspace of (Q, A) and so x = 0.
We turn now to the LQR problem: consider the plant
\[ \dot{x} = Ax + Bu, \quad x(0) = x_0, \]
with cost
\[ J = \int_0^\infty \left[ x(t)^T Q x(t) + u(t)^T R u(t) \right] dt, \]
where Q ≥ 0, R > 0, (A, B) is stabilizable, and (Q, A) is detectable. Define
\[ H = \begin{bmatrix} A & -BR^{-1}B^T \\ -Q & -A^T \end{bmatrix}, \qquad X = \operatorname{Ric}(H), \qquad F = -R^{-1}B^T X. \]
By Theorem 8.3, X is well-defined, X ≥ 0, and A + BF is stable. The associated Riccati equation is
\[ A^T X + XA - XBR^{-1}B^T X + Q = 0. \]
Theorem 8.4 The control signal that minimizes J is u = Fx; it is the unique optimal control, and for this control signal \( J = x_0^T X x_0 \).
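The theorem is easy to test numerically in the scalar case. A Python sketch with the assumed data A = B = Q = R = 1 (so X = 1 + √2 and A + BF = −√2) simulates the closed loop and accumulates the cost:

```python
import math

# Assumed scalar data: A = B = Q = R = 1, hence X = 1 + sqrt(2), F = -X.
X = 1 + math.sqrt(2)
F = -X
x0 = 3.0

# With u = F x the closed loop is xdot = (1 + F) x = -sqrt(2) x, so
# x(t) = x0 e^{-sqrt(2) t}; accumulate J = int (x^2 + u^2) dt by Riemann sum.
dt, T = 1e-4, 20.0
J = 0.0
for k in range(int(T / dt)):
    x = x0 * math.exp((1 + F) * k * dt)
    J += (x * x + (F * x) ** 2) * dt

print(J, X * x0 ** 2)   # the two numbers agree: J = x0^T X x0
```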
The proof needs a lemma. Let us denote by L2 the class of signals that are square-integrable on the time interval [0, ∞).
Lemma If J < ∞, then x(t) → 0 as t → ∞.
Proof Assume J < ∞. Then u ∈ L2 because R > 0. Let \( C := Q^{1/2} \) and y := Cx. Then y ∈ L2
too. By detectability, there exists K such that A + KC is stable. A standard observer to estimate
x is
\[ \dot{\hat{x}} = A\hat{x} + Bu + K(C\hat{x} - y) = (A + KC)\hat{x} + Bu - Ky. \]
Since A + KC is stable and u, y ∈ L2, it follows that x̂(t) → 0. By observer theory, x̂(t) − x(t) → 0. Thus x(t) → 0.
Proof of Theorem The proof is a trick using the completion of a square. Let u be an arbitrary
control input for which J is finite. We shall differentiate the quadratic form x(t)T Xx(t) along the
solution of the plant equation. To simplify notation, we suppress dependence on t. We have
\begin{align*}
\frac{d}{dt}\left( x^T X x \right) &= \dot{x}^T X x + x^T X \dot{x} \\
&= (Ax + Bu)^T X x + x^T X (Ax + Bu) \\
&= x^T (A^T X + XA) x + 2u^T B^T X x \\
&= x^T (XBR^{-1}B^T X - Q) x + 2u^T B^T X x \quad \text{(from the Riccati equation)} \\
&= -x^T Q x + x^T XBR^{-1}B^T X x + 2u^T B^T X x + (u^T R u - u^T R u) \quad \text{(the completion-of-squares trick)} \\
&= -x^T Q x - u^T R u + \| R^{-1/2} B^T X x + R^{1/2} u \|^2.
\end{align*}
Rearranging terms we have
\[ x^T Q x + u^T R u = -\frac{d}{dt}\left( x^T X x \right) + \| R^{-1/2} B^T X x + R^{1/2} u \|^2. \]
Now integrate from t = 0 to t = ∞ and use the lemma:
\[ J = x_0^T X x_0 + \int_0^\infty \| R^{-1/2} B^T X x + R^{1/2} u \|^2 \, dt. \]
Thus J is minimized iff \( R^{-1/2} B^T X x + R^{1/2} u \equiv 0 \), i.e., u = Fx. The other conclusion follows.
The LQR solution provides a very convenient way to stabilize an LTI plant. Given A, B, select
Q, R with Q ≥ 0, (Q, A) detectable, and R > 0. Then the optimal F stabilizes A + BF . This is the
preferred method over pole assignment.
The LQR solution is rarely implementable as it stands, because it requires that there be a sensor
for each state variable; that is, x must be fully sensed. We look next at the generalization of the
LQR problem to the more general case where x is not fully sensed.
Moreover, the minimum cost is
\[ \|G_c B_1\|_2^2 + \|F_2 G_f\|_2^2. \]
The first term in the minimum cost, \( \|G_c B_1\|_2^2 \), is associated with optimal control with state feedback and the second, \( \|F_2 G_f\|_2^2 \), with optimal filtering. These two norms can easily be computed
as follows:
u = F2 x̂.
The matrix F2 is the optimal feedback gain were x directly measured; L2 is the optimal filter gain;
x̂ is the optimal estimate of x.
The proof involves optimality in an inner-product space; so projection theory applies. Let H2
denote the (Hardy) space of transfer matrices P (s) that are stable and strictly proper. This has a
natural inner product,
\[ \langle P, Q \rangle = \frac{1}{2\pi} \int_{-\infty}^{\infty} \operatorname{trace}\left[ P(j\omega)^* Q(j\omega) \right] d\omega, \]
which is consistent with our norm definition: \( \|P\|_2^2 = \langle P, P \rangle \). Likewise, let \( H_2^\perp \) denote the space of transfer matrices that are antistable (all poles in Re s > 0) and strictly proper. Same inner product, same norm. Then \( H_2 \) and \( H_2^\perp \) are orthogonal spaces:
\[ P \in H_2, \; Q \in H_2^\perp \implies \langle P, Q \rangle = 0. \]
The sum \( H_2 \oplus H_2^\perp \) consists of all transfer matrices that are strictly proper and have no poles on the imaginary axis.
Finally, let us introduce the notation \( P^\sim(s) := P(-s)^T \). A stable matrix P(s) is said to be allpass if \( P^\sim P = I \). For example,
\[ \frac{s-1}{s+1} \]
is an allpass function.
Proof of Theorem 8.5 Let K be any proper, stabilizing controller. Start with the system
equations
\begin{align*}
\dot{x} &= Ax + B_1 w + B_2 u \\
z &= C_1 x + D_{12} u
\end{align*}
and define a new control variable, \( v := u - F_2 x \). The equations become
\begin{align*}
\dot{x} &= A_{F_2} x + B_1 w + B_2 v \\
z &= C_{1F_2} x + D_{12} v
\end{align*}
or in the frequency domain
\[ Z = G_c B_1 W + U V. \]
Hence
\[ T_{zw} = G_c B_1 + U T_{vw}. \]
You will prove in Problem 5 the following fact: U is allpass and \( U^\sim G_c \) belongs to \( H_2^\perp \). This implies that \( G_c B_1 \) and \( U T_{vw} \) are orthogonal in \( H_2 \) (\( T_{vw} \) belongs to \( H_2 \) by internal stability). So from the previous equation
\[ \|T_{zw}\|_2^2 = \|G_c B_1\|_2^2 + \|T_{vw}\|_2^2. \]
[Figure: the modified setup with v as the control input and K in feedback.]
Note that K stabilizes G iff K stabilizes the above system (the two closed-loop systems have identical
A-matrices). So
and therefore the theorem will be proved once we show the following: For the setup in the previous
block diagram, the unique optimal controller is
\[ \begin{bmatrix} A + B_2 F_2 + L_2 C_2 & -L_2 \\ F_2 & 0 \end{bmatrix} \]
and the minimum value of kTvw k2 equals kF2 Gf k2 . Notice in this setup that A + B2 F2 is stable.
By the assignment C1 ← −F2 , the previous statement becomes this: For
\[ G(s) = \begin{bmatrix} A & B_1 & B_2 \\ C_1 & 0 & I \\ C_2 & D_{21} & 0 \end{bmatrix}. \]
\begin{align*}
\dot{x} &= Ax + B_1 w + B_2 u \\
z &= C_1 x + D_{12} u \\
y &= C_2 x + w \\
\dot{\hat{x}} &= (A + B_2 F_2 - B_1 C_2)\hat{x} + B_1 y \\
u &= F_2 \hat{x},
\end{align*}
so, with \( e := \hat{x} - x \),
\[ \dot{e} = (A - B_1 C_2) e. \]
It’s now easy to infer internal stability from stability of A + B2 F2 and A − B1 C2 . For zero initial
conditions on x, x̂, we have e(t) ≡ 0. Hence
\[ u = F_2 \hat{x} = F_2 x. \tag{8.16} \]
Recall from the orthogonality argument that, for every internally stabilizing controller,
\[ \|T_{zw}\|_2 \ge \|G_c B_1\|_2. \]
But for the present controller, (8.16) implies that v ≡ 0, i.e., Tvw = 0. Thus the present controller
is optimal and the minimum cost is kGc B1 k2 . Finally, for uniqueness it can be shown (an exercise)
that the unique solution of Tvw = 0 is the controller above.
8.7 Problems
1. Take G(s) = 1/(s + 1) and compute the H2-norm \( \|G\|_2 \) by the three methods: time-domain, state-space, and the residue theorem.
3. Consider
ẋ = Ax + Bu, x(0) = x0
with A stable. True or false: For every u in L2 [0, ∞), x(t) tends to 0 as t tends to ∞.
4. Suppose u and y are scalar-valued signals and the transfer function from u to y is 1/s2 . For
the standard canonical realization (A, B, C) consider the optimization problem
\[ \min_{u = Fx} \int_0^\infty \left[ \rho\, y(t)^2 + u(t)^2 \right] dt, \]
7. You know that right half-plane zeros place definite performance limitations on the control of
a system. This exercise illustrates this fact in the present context.
Consider the system
\[ \dot{x} = Ax + Bu, \quad x(0) = x_0, \qquad z = Cx. \]
Then
If A is stable, we might like to see how small we can make kZk2 by suitable choice of stable
U (s). In particular, we might like to know if kZk2 can be made arbitrarily small.
Let
\[ C(sI - A)^{-1} B = \frac{s-1}{(s+2)(s+3)}. \]
Compute
8. This problem concerns optimization in the space R2 with respect to three norms:
9. [Figure: unity feedback loop — fh enters a summing junction, the plant P(s) produces the output v, and K(s) is in the feedback path.]
Both P(s) and K(s) are SISO transfer functions. The plant is P(s) = 1/s and the human force input fh is as follows:
[Figure: fh(t) is a triangular pulse that rises to 2 and returns to zero; the time axis is marked at t = 1, 2, 3.]
The output is a velocity v. It is desired to design a proper transfer function K(s) to achieve
internal stability of the feedback system and minimize the compliance error kfh − vk2 . Set
this up as a problem in H2 optimal control.
10. This problem relates to the LQR problem and whether or not J < ∞ implies u ∈ L2 . Show
that it is true if R is positive definite. Hint: You have to show that if R1/2 u is in L2 , then u
is too.
11. Consider
\[ A = \begin{bmatrix} -1 & 1 \\ 1 & -1 \end{bmatrix}. \]
Regard A as the linear map \( \mathbb{R}^2 \to \mathbb{R}^2 \) defined by \( x \mapsto Ax \). Let \( \mathcal{V} \) be one of the two 1-dimensional invariant subspaces and let V be the linear map \( \mathcal{V} \to \mathbb{R}^2 \) given by \( x \mapsto x \). Find the linear map \( A_1 : \mathcal{V} \to \mathcal{V} \) that satisfies the equation \( VA_1 = AV \). This map is called the restriction of A to \( \mathcal{V} \).
12. Take the LQR problem with A = B = Q = R = 1. Form the Hamiltonian matrix H given in Theorem 8.3. Find its invariant subspaces. Are there any of the form
\[ \operatorname{Im} \begin{bmatrix} I \\ P \end{bmatrix}? \]
(We saw that the LQR problem reduces to looking for an invariant subspace of this form.)
Let λ1 , λ2 denote the eigenvalues of A + BF for the optimal F . Of course, the two eigenvalues
have to be in the left half-plane and be complex conjugates if not real. Are they otherwise
freely assignable by choice of q1 , q2 , r?
Chapter 9
H∞ Optimal Control
The symbol H∞ stands for the space of all stable, proper transfer functions, such as
\[ \frac{1}{s+1}, \qquad \frac{2s-1}{s^2+5s+2}, \qquad \frac{s}{s+1}, \]
but not
\[ \frac{1}{s}. \]
There’s a natural norm, namely,
\[ \|G\|_\infty = \sup_\omega |G(j\omega)|, \]
but no inner product. Thus H∞ is a Banach space. The space extends to matrices, as we’ll see.
In this chapter we study optimal control design in this space. This chapter begins with a tutorial
overview, followed by some of the underlying theory.
9.1 Overview
Let R be a complex p × m matrix. The singular values of R are defined as the square roots of
the eigenvalues of R∗ R. The maximum singular value of R, denoted σmax (R), has the properties
required of a norm and is our second definition for kRk.
Example
The singular values of
\[ R = \begin{bmatrix} 2+j & j \\ 1-j & 3-2j \end{bmatrix} \]
equal 4.2505 and 1.7128. These are computed via the function svd in MATLAB. Thus \( \|R\| = 4.2505 \).
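For a 2×2 matrix the singular values can also be found by hand, as the square roots of the eigenvalues of R*R. A Python sketch (the notes use MATLAB's svd; this hand-rolled version is only for checking):

```python
import math

# The example matrix again.
R = [[2 + 1j, 1j],
     [1 - 1j, 3 - 2j]]

# S = R* R (conjugate transpose times R): Hermitian, positive semidefinite.
Rs = [[R[0][0].conjugate(), R[1][0].conjugate()],
      [R[0][1].conjugate(), R[1][1].conjugate()]]
S = [[sum(Rs[i][k] * R[k][j] for k in range(2)) for j in range(2)]
     for i in range(2)]

# Eigenvalues of the 2x2 Hermitian S from its (real) trace and determinant.
tr = (S[0][0] + S[1][1]).real
det = (S[0][0] * S[1][1] - S[0][1] * S[1][0]).real
disc = math.sqrt(tr * tr - 4 * det)
sigmas = sorted(math.sqrt(l) for l in ((tr + disc) / 2, (tr - disc) / 2))

print(sigmas)   # approximately [1.7128, 4.2505]
```

Note that the squares of the singular values sum to trace(R*R) = 21, tying this norm to the first definition.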
The importance of this second definition of matrix norm is derived from the following fact. Let
u ∈ Cm and let y = Ru, so y ∈ Cp . The fact is that
\[ \sigma_{\max}(R) = \max\{ \|y\| : \|u\| = 1 \}. \]
This has the interpretation that if we think of R as a system with input u and output y, then
σmax (R) equals the system’s gain, that is, maximum output norm over all inputs of unit norm.
Now we can define the H∞-norm of a stable p × m transfer matrix G(s):
\[ \|G\|_\infty = \sup_\omega \sigma_{\max}\left[ G(j\omega) \right]. \]
So here we used the second-definition norm of G(jω). If G(s) is scalar-valued, its norm equals the peak magnitude on the Bode plot.
Concerning this definition is an important input-output fact. Let G be a stable, causal, LTI system with input u of dimension m and output y of dimension p. The H∞-norm of the transfer matrix G is related to the maximum L2-norm of the output over all inputs of unit norm.
Thus the major distinction between kGk2 and kGk∞ is that the former is an average system gain
for known inputs, while the latter is a worst-case system gain for unknown inputs.
It is useful to be able to compute kGk∞ by state-space methods. Let
\[ G(s) = \begin{bmatrix} A & B \\ C & D \end{bmatrix}, \]
with A stable, that is, all eigenvalues with negative real part. The computation of \( \|G\|_\infty \) using state-space methods involves a Hamiltonian matrix H built from A, B, C, D and a positive number γ. The matrices \( \gamma^2 I - DD^T \) and \( \gamma^2 I - D^T D \) that enter H are invertible provided they are positive definite, equivalently, γ² is greater than the largest eigenvalue of \( DD^T \) (or \( D^T D \)), equivalently, γ > σmax(D).
Theorem 9.2 Let γmax denote the maximum γ such that H has an eigenvalue on the imaginary
axis. Then kGk∞ = max{σmax (D), γmax }.
The theorem suggests the following procedure: Plot, versus γ, the distance from the imaginary
axis to the nearest eigenvalue of H; then γmax equals the maximum γ for which the distance equals
zero; then kGk∞ = max{σmax (D), γmax }. A more efficient procedure is to compute γmax by a
bisection search.
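The bisection idea can be sketched in Python. The particular Hamiltonian below is an assumption (the text's H is not reproduced here); for D = 0 a commonly used form is H(γ) = [[A, BB^T/γ], [−C^TC/γ, −A^T]], and the assumed example G(s) = 1/(s+1) has ‖G‖∞ = 1:

```python
import cmath

# Assumed scalar data: G(s) = 1/(s+1), so A = -1, B = C = 1, D = 0,
# and the Bode plot peaks at 1 (at w = 0).
A, B, C = -1.0, 1.0, 1.0

def has_imag_axis_eig(gamma, tol=1e-9):
    """Does [[A, BB'/g], [-C'C/g, -A']] have an imaginary-axis eigenvalue?"""
    H = [[A, B * B / gamma], [-C * C / gamma, -A]]
    tr = H[0][0] + H[1][1]
    det = H[0][0] * H[1][1] - H[0][1] * H[1][0]
    d = cmath.sqrt(tr * tr - 4 * det)
    return any(abs(((tr + s * d) / 2).real) < tol for s in (1, -1))

# Bisect for gamma_max, the largest gamma giving an imaginary-axis eigenvalue.
lo, hi = 0.1, 10.0       # assumes ||G||_inf lies in this bracket
for _ in range(60):
    mid = (lo + hi) / 2
    if has_imag_axis_eig(mid):
        lo = mid
    else:
        hi = mid

print(lo)    # converges to 1.0 = ||G||_inf
```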
The H∞ -optimal control problem is to compute an internally stabilizing controller K that mini-
mizes kTzw k∞ for the standard setup. This problem is much harder than the H2 problem. Instead of
seeking a controller that actually minimizes kTzw k∞ , a simpler problem is to search for a controller
that gives kTzw k∞ < γ, where γ is a pre-specified parameter. If γ is too small, a controller will not
exist, so we need a test for existence. With this, the following procedure leads to a controller that
is close to optimal:
1. Start with a large enough γ so that a controller exists.
2. Test existence for smaller and smaller values of γ until eventually γ is close to the minimum
γ for existence.
Example
The next figure shows a single-loop analog feedback system.
[Figure: single-loop analog feedback system — w1 enters a summing junction and the loop contains the antialiasing filter F, the controller K, and the plant P; w2 is a second exogenous input injected into the loop; z1 is the tracking error e weighted by W, and z2 is the control signal weighted by ε2.]
The plant is P and the controller K; F is an antialiasing filter for future digital implementation of
the controller (it is a good idea to include F at the start of the analog design so that there are no
surprises later due to additional phase lag). The basic control specification is to get good tracking
over a certain frequency range, say [0, ω1 ]; that is, to make the magnitude of the transfer function
from w1 to e small over this frequency range. The weighted tracking error is z1 in the figure, where
the weight W is selected to be a lowpass filter with bandwidth ω1 . We could attempt to minimize
the H∞ -norm from w1 to z1 , but this problem is not regular. To regularize it, another input, w2 ,
is added and another signal, z2, is penalized. The two weights ε1 and ε2 are small positive scalars.
The design problem is to minimize the H∞-norm from
\[ w = \begin{bmatrix} w_1 \\ w_2 \end{bmatrix} \quad \text{to} \quad z = \begin{bmatrix} z_1 \\ z_2 \end{bmatrix}. \]
The preceding figure can then be converted to the standard block diagram by stacking the states
of P , F , and W to form the state of G.
The plant transfer function is taken to be
\[ P(s) = \frac{20 - s}{(s + 0.01)(20 + s)}. \]
This can be regarded as an approximation of the time-delay system \( \frac{1}{s}e^{-0.1s} \), an integrator cascaded with a time delay of 0.1 time units: the allpass factor (20 − s)/(20 + s) is the first-order Padé approximation of \( e^{-0.1s} \). With a view toward subsequent digital control with sampling
period h = 0.5, the filter F is taken to have bandwidth π/0.5, the Nyquist frequency ωN :
\[ F(s) = \frac{1}{(0.5/\pi)s + 1}. \]
The weight W is then taken to have bandwidth one-fifth the Nyquist frequency:
\[ W(s) = \left( \frac{1}{(2.5/\pi)s + 1} \right)^2. \]
[Figure: Bode magnitude plots on log-log axes, magnitudes from 10^{-4} to 10^{1} against frequencies from 10^{-3} to 10^{2} rad/s.]
The solid curve is the Bode magnitude plot of the sensitivity function, that is, the transfer function
from w1 to e, namely, 1/(1 + P KF ). Also shown are the magnitude plots for W (dash) and F (dot).
Evidently, the design has achieved some tracking error attenuation over the bandwidth of W. A greater degree of attenuation could be achieved by tuning the weights W, ε1, and ε2.
% input data
clear
% parameters
h=0.021;
z=20;
[AP,BP,CP,DP]=tf2ss([-1 z],conv([1 .01],[1 z]));
[AF,BF,CF,DF]=tf2ss(1,[0.5/pi 1]);
numW=1;
eps1=0.01;
eps2=0.01;
% build G
% design
[nK,mK]=size(BK);
AS=[AP BP*CK zeros(nP,nF);zeros(nK,nP) AK BK*CF;-BF*CP zeros(nF,nK) AF];
BS=[0*BP;0*BK;BF];
CS=[-CP 0*CK 0*CF];
DS=1;
% discretize K
[AK,BK]=c2d(AK,BK,h);
% stability check
Btmp=[0*BP;BF];
Ctmp=[0*CP CF];
[Atmp,Btmp]=c2d(Atmp,Btmp,h);
Abar=[Atmp Btmp*CK;-BK*Ctmp AK];
max(abs(eig(Abar)));
% analysis
w=logspace(-3,2,200);
j=sqrt(-1);
p=freqrc(AP,BP,CP,DP,w);
f=freqrc(AF,BF,CF,DF,w);
k=dfreqrc(AK,BK,CK,DK,w,h);
r=(1-exp(-j*h*w))./(j*h*w);
tmp=ones(1,length(w))./(1+p.*r.*k.*f);
magS2=abs(tmp);
[magS1,ph]=bode(AS,BS,CS,DS,1,w);
[magW,ph]=bode(AW,BW,CW,DW,1,w);
[magF,ph]=bode(AF,BF,CF,DF,1,w);
loglog(w,magS1,w,magS2)
%loglog(w,magS1,w,magW,w,magF)
We want to design a controller C(s) so that the feedback loop is stable and also has some measure
of stability robustness. A good way to do this is to require the Nyquist plot of P C to stay outside
the circle centred at −1 and radius, say, 0.2. Let S denote the transfer function from r to e (known
as the sensitivity function). It turns out that \( \|S\|_\infty^{-1} \) equals the distance in the complex plane from the critical point −1 to the closest point on the Nyquist plot of PC. (Reference: ECE356 course notes.) Thus stability robustness is equivalent to the inequality
\[ |S(j\omega)| \le 5, \quad \forall \omega. \]
Suppose that, in addition to stability robustness, we want to design the controller C(s) so that
the system tracks signals r(t) up to, say, 1 rad/s. Thus we want, say,
\[ |S(j\omega)| \le 0.1, \quad \forall \omega \le 1. \]
This allows at most 10 percent tracking error for sinusoidal reference signals. Therefore we want
the magnitude Bode plot of S to lie under the dashed line:
[Figure: the spec boundary for |S(jω)| — the dashed line sits at 0.1 for ω up to 1 rad/s and at 5 beyond.]
To handle these two specs together it is convenient to construct a weighting function W (s) such
that |W (jω)| ≈ 10 over the frequency range [0, 1] and |W (jω)| ≈ 0.2 over the frequency range
[1, ∞). Then the two specs become one: kW Sk∞ ≤ 1. For computational reasons, we want W (s) to
be rational. So its magnitude can’t be discontinuous, and we need some transition from magnitude
10 to magnitude 0.2. To keep things simple, let’s try the weighting function
\[ W(s) = \gamma\, \frac{\alpha s + 1}{\beta s + 1}. \]
To recap, we have arrived at the problem of designing a controller C(s) that stabilizes P (s) and
achieves the inequality kW Sk∞ ≤ 1, where
\[ W(s) = 10\, \frac{0.574 s + 1}{28.7 s + 1}. \]
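A quick Python check (not from the notes) confirms that this first-order weight matches the two target levels asymptotically: gain 10 at low frequency and 10 × 0.574/28.7 = 0.2 at high frequency, with a gradual transition between the corner frequencies in between.

```python
# W(s) = 10(0.574s + 1)/(28.7s + 1), evaluated on the imaginary axis.
def W(s):
    return 10 * (0.574 * s + 1) / (28.7 * s + 1)

print(abs(W(0j)))      # low-frequency gain: 10
print(abs(W(1e6j)))    # high-frequency gain: 10 * 0.574/28.7 = 0.2
print(abs(W(1j)))      # partway through the transition at w = 1 rad/s
```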
The problem may not be solvable. That is, there may not be any stabilizing controller such that
kW Sk∞ ≤ 1. If so, we have to compromise somehow; relax either the tracking error or the stability
margin or both. This is a great feature of this way of doing control design: We can sensibly make
tradeoffs.
The obvious problem at hand is to minimize kW Sk∞ over all C(s) that stabilize P (s). If this
minimum is less than or equal to 1, our specifications are feasible. However,
W
WS =
1 + PC
is a nonlinear function of C(s), and moreover C(s) is constrained to stabilize. We need to change
the optimization parameter. Notice that the P (s) in our example is strictly proper and belongs to
H∞ .
Lemma 9.1 A proper rational controller C(s) stabilizes a strictly proper P ∈ H∞ iff it has the
form
\[ C = \frac{Q}{1 - PQ}, \qquad Q \in H_\infty. \]
Proof (Necessity) Suppose C stabilizes. Let Q equal the transfer function from r to u:
\[ Q = \frac{C}{1 + PC}. \]
Solve for C to get C = Q/(1 − P Q).
(Sufficiency) Suppose C is given by the formula in the lemma. Then all closed-loop transfer
functions belong to H∞ . For example, the transfer function from r to y equals
\[ \frac{PC}{1 + PC} = PQ. \]
Also, the sensitivity function S equals 1−P Q. And so on for all other closed-loop transfer functions.
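The algebra of the lemma is easy to verify numerically. A Python sketch with assumed data (P(s) = 1/(s+1), which is strictly proper and stable, and the constant Q(s) = 0.5 in H∞) checks that the loop sensitivity 1/(1 + PC) indeed equals 1 − PQ pointwise:

```python
# Assumed example: P(s) = 1/(s+1), Q(s) = 0.5 (both in H_inf).
def P(s):
    return 1 / (s + 1)

def Q(s):
    return 0.5

def C(s):
    """The controller C = Q/(1 - PQ) from the lemma."""
    return Q(s) / (1 - P(s) * Q(s))

# The sensitivity S = 1/(1 + PC) should equal 1 - PQ at every point.
for s in (1j, 0.3 + 2j, 5j):
    print(abs(1 / (1 + P(s) * C(s)) - (1 - P(s) * Q(s))))   # essentially 0
```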
The lemma changes the problem of minimizing kW Sk∞ over C to the problem
\[ \min_{Q \in H_\infty} \| W(1 - PQ) \|_\infty. \]
Let L∞ (jR) denote the space of proper transfer functions that have no poles on the imaginary axis.
Then F := W P1−1 belongs to L∞ (jR), while W P2 Q belongs to H∞ . Thus the minimum of kW Sk∞
over all stabilizing controllers seems to be very close to the distance from F to H∞ . The gap arises
from the fact that the set
{W P2 Q : Q ∈ H∞ }
is a proper subset of H∞ because W P2 is strictly proper. That is, if X is the function in H∞ that
is closest to F and if we get Q from W P2 Q = X, then Q may not be proper. This can be rectified
by a high-frequency correction.
That is, we want to find an X in H∞ that is closest to R in the infinity norm. There are two
ways to view this problem: 1) R is unstable and X has to be stable, while both are causal; 2) R is
noncausal and X has to be causal, while both are stable.
The second way turns out to be more useful. To view the problem in this way, we have to
suppose R and X are two-sided Laplace transforms, e.g.,
\[ R(s) = \int_{-\infty}^{\infty} r(t) e^{-st} \, dt. \]
The region of convergence is taken to include the imaginary axis, so that the underlying system is
stable. Thus for R(s) the ROC must be Re s < 1. Therefore the time-domain equation giving rise
to R(s) must be
\[ y(t) = \int_{-\infty}^{\infty} r(t - \tau) u(\tau) \, d\tau. \]
\[ \Lambda_r : u \mapsto r * u, \quad L_2(\mathbb{R}) \longrightarrow L_2(\mathbb{R}), \]
called the Laurent operator derived from r, is equivalent to the frequency-domain operator
\[ U \mapsto RU : L_2(j\mathbb{R}) \longrightarrow L_2(j\mathbb{R}) \]
in the sense that they have equal induced norms, since the Fourier transform is norm-preserving, by Theorem ??. The norm of the latter operator equals \( \|R\|_\infty = 1 \).
Likewise for X: For X(s) the ROC must include the imaginary axis. The time-domain equation
giving rise to X(s) must be
\[ y(t) = \int_{-\infty}^{\infty} x(t - \tau) u(\tau) \, d\tau, \]
and the Laurent operator
\[ \Lambda_x : u \mapsto x * u, \quad L_2(\mathbb{R}) \longrightarrow L_2(\mathbb{R}), \]
is equivalent to
\[ U \mapsto XU : L_2(j\mathbb{R}) \longrightarrow L_2(j\mathbb{R}). \]
Now take an input u supported on [0, ∞), apply Λr, and keep only the part of the output supported on (−∞, 0]. This operator, L2[0, ∞) → L2(−∞, 0], is called the Hankel operator derived from r, denoted Γr. It maps the future into the past. On the other hand, since the system with transfer function X is causal, its Hankel operator Γx equals 0.
Notice that a Hankel operator is a piece of a Laurent operator. Thus \( \|\Lambda_r\| \ge \|\Gamma_r\| \). Notice also that
\[ \|R - X\|_\infty = \|\Lambda_r - \Lambda_x\|. \]
Our original problem was to minimize \( \|R - X\|_\infty \). We’ve seen that a lower bound for this norm is \( \|\Gamma_r\| \). However, Nehari’s theorem says the lower bound is tight:
Theorem 9.3 The distance from R in L∞ (jR) to H∞ equals kΓr k. Moreover, the distance is
achieved (there is an optimal X).
So it remains to compute the norm kΓr k and then to compute the optimal X.
Suppose
\[ R(s) = C(sI - A)^{-1} B, \]
where A is antistable (all eigenvalues in Re s > 0) and A is n × n. Such R belongs to L∞(jR). The inverse two-sided Laplace transform of R(s) is
\[ r(t) = \begin{cases} -C e^{At} B, & t < 0 \\ 0, & t \ge 0. \end{cases} \]
The Hankel operator Γr maps a function u in L2[0, ∞) to the function y in L2(−∞, 0] defined by
\[ y(t) = \int_0^\infty r(t - \tau) u(\tau) \, d\tau, \quad t < 0, \]
that is,
\[ y(t) = -C e^{At} \int_0^\infty e^{-A\tau} B u(\tau) \, d\tau, \quad t < 0. \]
Define \( \Psi_c : L_2[0, \infty) \to \mathbb{C}^n \) and \( \Psi_o : \mathbb{C}^n \to L_2(-\infty, 0] \) by
\[ \Psi_c u = \int_0^\infty e^{-A\tau} B u(\tau) \, d\tau, \qquad (\Psi_o x)(t) = -C e^{At} x, \quad t < 0. \]
Then
\[ \Gamma_r = \Psi_o \Psi_c. \]
[Diagram: the Hankel operator Γr : L2[0, ∞) → L2(−∞, 0] factors through \( \mathbb{C}^n \), via Ψc followed by Ψo.]
Since kΓr k = kΨo Ψc k, it remains to compute the latter norm.
The self-adjoint operators Ψc Ψc* and Ψo* Ψo map C^n to itself. Thus they have matrix representations with respect to the standard basis on C^n. Define the controllability and observability gramians

Lc := ∫_0^∞ e^{−At} B B^T e^{−A^T t} dt    (9.1)

Lo := ∫_0^∞ e^{−A^T t} C^T C e^{−At} dt    (9.2)
It is routine to show that Lc and Lo are the unique solutions of the Lyapunov equations

A Lc + Lc A^T = B B^T    (9.3)

A^T Lo + Lo A = C^T C    (9.4)
We state without proof this fact: the norm of Ψo Ψc equals the square root of the norm of (Ψo Ψc)*(Ψo Ψc), and this latter norm in turn equals the largest eigenvalue of the matrix Lc Lo. Thus ‖Γr‖ = √(λmax(Lc Lo)).
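This computation is easy to carry out numerically. The sketch below (assuming only NumPy; solving the Lyapunov equations by vectorization with Kronecker products is one standard approach) uses the scalar data A = 2, B = 1, C = −1.47 from the example in this chapter:

```python
import numpy as np

def lyap(A, Q):
    # Solve A X + X A^T = Q via vectorization:
    # vec(A X + X A^T) = (I kron A + A kron I) vec(X), column-stacked vec.
    n = A.shape[0]
    M = np.kron(np.eye(n), A) + np.kron(A, np.eye(n))
    return np.linalg.solve(M, Q.flatten('F')).reshape(n, n, order='F')

def hankel_norm(A, B, C):
    Lc = lyap(A, B @ B.T)        # controllability gramian, eq. (9.3)
    Lo = lyap(A.T, C.T @ C)      # observability gramian, eq. (9.4)
    return np.sqrt(np.linalg.eigvals(Lc @ Lo).real.max())

# Scalar data from the example in this chapter: A = 2, B = 1, C = -1.47.
A = np.array([[2.0]]); B = np.array([[1.0]]); C = np.array([[-1.47]])
g = hankel_norm(A, B, C)
```

For these data the solver returns Lc = 1/4 and Lo ≈ 0.540, so g ≈ 0.368; the same function works unchanged for matrix-valued antistable A.
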
Example Let’s complete the example from the first section. We have

F(s) = 10 (0.574s + 1)/(28.7s + 1) · (1 + 0.5s)/(1 − 0.5s) = (a function in H∞) + 0.736/(1 − 0.5s),
and so F equals a function in H∞ plus R, where

R(s) = −1.47/(s − 2).
A state model for R(s) is

A = 2, B = 1, C = −1.47.

From the Lyapunov equations (9.3) and (9.4),

Lc = 1/4, Lo = 0.541.
Thus

‖Γr‖ = √(Lc Lo) = 0.368.
Thus the distance from F to H∞ equals 0.368. Our design specs are therefore easily feasible.
We omit the construction of X and a controller that meets the specs. MATLAB has tools to
design controllers based on the approach in this chapter.
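One way to corroborate this number is to discretize Γr directly and take the largest singular value of the resulting matrix. A sketch (the grid spacing h and truncation length T are arbitrary choices):

```python
import numpy as np

# R(s) = -1.47/(s - 2): A = 2, B = 1, C = -1.47, so r(t) = 1.47 e^{2t} for t < 0.
h, T = 0.01, 8.0
tau = np.arange(0.0, T, h) + h / 2        # input grid on [0, T)   (the future)
t = -tau                                  # output grid on (-T, 0] (the past)

# y(t_i) ~ sum_j r(t_i - tau_j) u(tau_j) h, so the discretized operator is:
K = 1.47 * np.exp(2.0 * (t[:, None] - tau[None, :])) * h

gamma = np.linalg.svd(K, compute_uv=False)[0]
```

Because the kernel factors as e^{2t} · e^{−2τ}, the operator is rank one, and the computed largest singular value is 1.47/4 ≈ 0.3675, agreeing with √(Lc Lo) up to rounding.
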
9.5 Problems
1. Let U(s) = s/(s + 1). Suppose G ∈ H∞ and we want to approximate it by UV for some V ∈ H∞, that is, we want to minimize ‖G − UV‖∞. In general we can’t take V = G/U because G/U has a pole at s = 0 unless it’s cancelled by a zero of G, so G/U is not in H∞. Thus in general the error norm ‖G − UV‖∞ can be made only arbitrarily small, and not zero, by suitable choice of V.
(a) Write in proper logic notation (using ∀ and ∃ where appropriate) the mathematical statement of this: “Let G belong to H∞. Then the norm ‖G − UV‖∞ can be made arbitrarily small by suitable choice of V in H∞.” In this statement U is given and fixed and should not be quantified.
(b) Write in proper logic notation the negation of your logic statement in part (a).
(c) Convert the preceding logic statement into a natural sounding sentence or sentences in
words.
(d) Write in proper logic notation the mathematical statement of this: “In general, ‖G − UV‖∞ cannot be made equal to zero.”
2. In the scalar-valued case prove that RL2 equals the set of all real-rational functions that are
strictly proper and have no poles on the imaginary axis.
3. Show that Ψc is surjective if (A, B) is controllable and that Ψo is injective if (C, A) is observable.
4. Show that the adjoints of Ψc and Ψo are given by

Ψc* : C^n → L2[0, ∞),  (Ψc* x)(t) = −B^T e^{−A^T t} x,  t ≥ 0,

Ψo* : L2(−∞, 0] → C^n,  Ψo* y = ∫_{−∞}^0 e^{A^T t} C^T y(t) dt.
5. Prove that the matrix representations of Ψc Ψ∗c and Ψ∗o Ψo are Lc and Lo respectively.
Epilogue
So where do we stand now in 2010? Let’s review and try to draw some conclusions.
1. The three classical topics (the calculus of variations, the maximum principle, and dynamic programming) were included for historical interest.
The brachistochrone problem is beautiful, isn’t it? Find a curve that optimizes a scalar
quantity, the time to slide down.
The maximum principle is very general and can accommodate many kinds of constraints. However, for many problems the necessary condition it provides does not, I think, yield a practical solution. Try a problem harder than time-optimal control of the double integrator. For example, try a cart-pendulum system with the problem of swinging up the pendulum in minimum time. The state space is R^4, and so the switching set is a 3D hypersurface. It’s hard to compute this hypersurface, and then how are you going to implement the controller?
Dynamic programming is indeed very powerful, and the HJB equation has played, and continues to play, an important role in optimal control.
2. I love the function space method. The reason is that it seems perfectly suited to systems
theory. A system is a function that maps an input to an output, that is, a system is a
mapping from one set to another. So right from the get-go one is into block diagrams, spaces
of signals, and operators. This is, in my view, the cleanest and clearest way to formulate a
problem. The subsystems may have differential equation models, but those are just special
ways of modeling maps.