Underactuated Robotics
http://people.csail.mit.edu/russt/underactuated/underactuated.html
Russ Tedrake, 2014
How to cite these notes
Note: These are working notes that will be updated throughout the Fall 2014 semester.
Chapter 12
Trajectory Optimization
I've argued that optimal control is a powerful framework for specifying complex behaviors
with simple objective functions, letting the dynamics and constraints on the system shape the
resulting feedback controller (and vice versa!). But the computational tools that we've
provided so far have been limited in some important ways. The numerical approaches to
dynamic programming which involve putting a mesh over the state space do not scale well to
systems with state dimension more than four or five. Linearization around a nominal
operating point (or trajectory) allowed us to solve for locally optimal control policies (e.g.
using LQR) for even very high-dimensional systems, but the effectiveness of the resulting
controllers is limited to the region of state space where the linearization is a good
approximation of the nonlinear dynamics. The computational tools for Lyapunov analysis
from the last chapter can provide, among other things, an effective way to compute estimates
of those regions. But we have not yet provided any real computational tools for approximate
optimal control that work for high-dimensional systems beyond the linearization around a
goal. That is precisely the goal for this chapter.
The big change that is going to allow us to scale to high-dimensional systems is that we are
going to give up the goal of solving for the optimal feedback controller for the entire state
space, and instead attempt to find an optimal control solution that is valid from only a single
initial condition. Instead of representing this as a feedback control function, we can represent
this solution as a trajectory, $x(t), u(t)$, typically defined over a finite interval. In our graph-search dynamic programming algorithms, we discretized the dynamics of the system on a
mesh spread across the state space. This does not scale to high-dimensional systems, and it
was difficult to bound the errors due to the discretization. If we instead restrict ourselves to
optimizing only a single initial condition, then a different discretization scheme emerges: we
can discretize the state and input trajectories over time.
12.1 Problem Formulation
Some trajectory optimization problems may also include additional constraints, such as collision avoidance ($x(t)$ must not put the robot inside an obstacle) or input limits (e.g. $u_{min} \le u(t) \le u_{max}$), which can be defined for all time or some subset of the trajectory.
As written, the optimization above is an optimization over continuous trajectories. In order to
formulate this as a numerical optimization, we must parameterize it with a finite set of
numbers. Perhaps not surprisingly, there are many different ways to write down this
parameterization, with a variety of different properties in terms of speed, robustness, and
accuracy of the results. We will outline just a few of the most popular below. I would
recommend [78] for additional details.
12.2 Computational Tools for Nonlinear Optimization
Before we dive in, we need to take a moment to understand the optimization tools that we
will be using. In the graph-search dynamic programming algorithm, we were magically able to provide an iterative algorithm that was known to converge to the optimal cost-to-go function.
With LQR we were able to reduce the problem to a matrix Riccati equation, for which we
have scalable numerical methods to solve. In the Lyapunov analysis chapter, we were able to
formulate a very specific kind of optimization problem--a semi-definite program (or
SDP)--which is a subset of convex optimization, and relied on custom solvers like SeDuMi
or Mosek to solve the problems for us. Convex optimization is a hugely important subset of
nonlinear optimization, in which we can guarantee that the optimization has no "local minima". In this chapter we won't be so lucky: the optimizations that we formulate may have local minima, and the solution techniques will, at best, only guarantee that they give a locally optimal solution.
The generic formulation of a nonlinear optimization problem is
$$\min_{z} f(z), \quad \text{subject to} \quad g(z) \le 0, \quad h(z) = 0,$$
where $z$ is a vector of decision variables, $f$ is a scalar objective function, and $g$ and $h$ collect the inequality and equality constraints.
12.3 Trajectory Optimization as a Nonlinear Program
Then perhaps the simplest mapping of the trajectory optimization problem onto a nonlinear program is to fix the break points at even intervals, $t_n = n\,h$, and use Euler integration to write
$$x[n+1] = x[n] + h\, f(x[n], u[n]).$$
Nothing compares to running it yourself, and poking around in the code. But you can also
click here to watch the result. I hope that you recognize the parabolic trajectory from the
initial condition up to the switching surface, and then the second parabolic trajectory down
to the origin. You should also notice that the transition between the two extreme control values is imperfect, due to the discretization error. As an exercise, try increasing the number of knot points (the variable N in the code) to see if you can get a sharper response.
If you take a moment to think about what the direct transcription algorithm is doing, you will
see that by satisfying the dynamic constraints, the optimization is effectively solving the
(Euler approximation of the) differential equation. But instead of marching forward through
time, it is minimizing the inconsistency at each of the knot points simultaneously. While it's
easy enough to generalize the constraints to use higher-order integration schemes, paying
attention to the trade-off between the number of times the constraint must be evaluated vs the
density of the knot points in time, if the goal is to obtain smooth trajectory solutions then
another approach quickly emerges: the approach taken by the so-called collocation methods.
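To make the structure of this nonlinear program concrete, here is a minimal sketch of direct transcription for a double-integrator problem, written in Python with scipy rather than the Drake/MATLAB tooling used in these notes. The horizon N, timestep h, and the minimum-effort objective are illustrative choices, not the minimum-time problem from the example above.

```python
# Direct transcription sketch: states and inputs at the knot points are
# all decision variables; Euler dynamics become equality constraints.
import numpy as np
from scipy.optimize import minimize

N, h = 20, 0.1                               # knot intervals and timestep
x0, xf = np.array([1.0, 0.0]), np.array([0.0, 0.0])

def unpack(z):
    x = z[:2 * (N + 1)].reshape(N + 1, 2)    # q, qdot at each knot point
    u = z[2 * (N + 1):]                      # one input per interval
    return x, u

def cost(z):                                 # minimum-effort objective
    _, u = unpack(z)
    return h * np.sum(u ** 2)

def defects(z):                              # x[n+1] - x[n] - h*f(x[n],u[n])
    x, u = unpack(z)
    f = np.column_stack([x[:-1, 1], u])      # double integrator: qddot = u
    return (x[1:] - x[:-1] - h * f).ravel()

constraints = [
    {'type': 'eq', 'fun': defects},
    {'type': 'eq', 'fun': lambda z: unpack(z)[0][0] - x0},
    {'type': 'eq', 'fun': lambda z: unpack(z)[0][-1] - xf},
]
z0 = np.zeros(2 * (N + 1) + N)
z0[:2 * (N + 1)] = np.linspace(x0, xf, N + 1).ravel()  # straight-line guess
sol = minimize(cost, z0, constraints=constraints, method='SLSQP')
xs, us = unpack(sol.x)
```

Because every knot state is a decision variable, an initial guess for the state trajectory (here a straight-line interpolation) can be supplied directly to the solver.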
12.3.2 Direct Collocation
In direct collocation (c.f., [80]), both the input trajectory and the state trajectory are
represented explicitly as piecewise polynomial functions. In particular, the sweet spot for this
algorithm is taking $u(t)$ to be a first-order polynomial -- allowing it to be completely defined
Figure 12.2 - Cubic spline parameters used in the direct collocation method.
Once again, direct collocation effectively integrates the equations of motion by satisfying the
constraints of the optimization -- this time producing a third-order approximation of the
dynamics with effectively two evaluations of the plant dynamics per segment. [81] claims,
without proof, that as the break points are brought closer together, the trajectory will
converge to a true solution of the differential equation. Once again it is very natural to add
additional terms to the cost function or additional input/state constraints, and very easy to
calculate the gradients of the objective and constraints. I personally find it very nice to
explicitly account for the parametric encoding of the trajectory in the solution technique.
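To make the collocation constraint concrete, here is a sketch (in Python, with illustrative names) of the midpoint defect for a single cubic segment: the cubic is defined by the knot values and the dynamics evaluated there, and the constraint forces the derivative of the cubic at the midpoint to match the dynamics.

```python
# Hermite-Simpson-style collocation defect for one segment; `f(x, u)`
# is the continuous dynamics xdot = f(x, u), and h is the segment duration.
import numpy as np

def collocation_defect(f, x0, u0, x1, u1, h):
    f0, f1 = f(x0, u0), f(x1, u1)
    # the interpolating cubic, evaluated at the segment midpoint
    xc = 0.5 * (x0 + x1) + (h / 8.0) * (f0 - f1)
    # derivative of that cubic at the midpoint
    xdot_c = -1.5 * (x0 - x1) / h - 0.25 * (f0 + f1)
    uc = 0.5 * (u0 + u1)              # first-order (linear) input
    return xdot_c - f(xc, uc)         # zero when dynamics are satisfied
```

For the double integrator with a linearly varying input, the true state trajectory is cubic, so this defect vanishes exactly; for nonlinear dynamics it is the residual that the optimizer drives to zero.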
Example 12.2 (Direct Collocation for the Double Integrator)
Direct collocation also easily solves the minimum-time problem for the double integrator.
The performance is similar to direct transcription, but the convergence is visibly different.
Try it for yourself:
% note: requires Drake ver >= 0.9.7
cd(fullfile(getDrakePath,'examples'));
DoubleIntegrator.runDircol;
% make sure you take a look at the code!
edit('DoubleIntegrator.runDircol')
12.3.3 Shooting Methods
If we reduce the decision variables to $u[0], \ldots, u[N-1]$ only -- computing $x[1], \ldots, x[N]$ ourselves from $x[0]$, $u[\cdot]$, and the dynamics -- then a different formulation emerges. This is exactly the approach taken in the shooting methods.
Computing gradients
Again, providing gradients of the objectives and constraints to the solver is not strictly required -- most solvers will obtain them from finite differences if they are not provided -- but I feel strongly that the solvers are faster and more robust when exact gradients are provided. Now that we have removed the state decision variables from the program, we have to take a little extra care to compute the gradients. This is still easily accomplished using the chain rule. To be concise (and slightly more general), let us define
$$x[n+1] = f_d(x[n], u[n])$$
as the discrete-time approximation of the continuous dynamics; for example, the forward Euler integration scheme used above would give $f_d(x[n], u[n]) = x[n] + h\, f(x[n], u[n])$.
Then we have
$$\frac{\partial J}{\partial u[k]} = \sum_{n=0}^{N-1} \left[ \frac{\partial \ell(x[n], u[n])}{\partial x[n]} \frac{\partial x[n]}{\partial u[k]} + \frac{\partial \ell(x[n], u[n])}{\partial u[n]} \frac{\partial u[n]}{\partial u[k]} \right],$$
where the gradient of the state with respect to the inputs can be computed during the "forward simulation",
$$\frac{\partial x[n+1]}{\partial u[k]} = \frac{\partial f_d(x[n], u[n])}{\partial x[n]} \frac{\partial x[n]}{\partial u[k]} + \frac{\partial f_d(x[n], u[n])}{\partial u[n]} \frac{\partial u[n]}{\partial u[k]}.$$
These simulation gradients can also be used in the chain rule to provide the gradients of any constraints. Note that there are a lot of terms to keep around here, on the order of (state dim) $\times$ (control dim) $\times$ (number of timesteps). Ouch. Note also that many of these terms are zero; for instance, with the Euler integration scheme above, $\frac{\partial x[n]}{\partial u[k]} = 0$ if $k \ge n$.
By solving for $x[\cdot]$ ourselves, we've removed a large number of constraints from the optimization. If no additional state constraints are present, and the only gradients we need to compute are the gradients of the objective, then a surprisingly efficient algorithm emerges. I'll give the steps here without derivation, but will derive it in the Pontryagin section below:
1. Simulate forward: $x[n+1] = f_d(x[n], u[n])$, from $x[0]$.
2. Calculate backwards: $\lambda[n] = \frac{\partial \ell(x[n], u[n])}{\partial x[n]}^T + \frac{\partial f_d(x[n], u[n])}{\partial x[n]}^T \lambda[n+1]$, from $\lambda[N] = 0$.
3. Extract the gradients: $\frac{\partial J}{\partial u[n]} = \frac{\partial \ell(x[n], u[n])}{\partial u[n]} + \lambda[n+1]^T \frac{\partial f_d(x[n], u[n])}{\partial u[n]}$.
Here $\lambda[n]$ is a vector the same size as $x[n]$.
The equation governing $\lambda$ is known as the adjoint equation, and it represents a dramatic efficiency improvement over calculating the huge number of simulation gradients described
above. In case you are interested, the adjoint equation is known as the backpropagation
algorithm in the neural networks literature and it is one of the primary reasons that training
neural networks became so popular in the 1980's; super fast GPU implementations of this
algorithm are also one of the reasons that deep learning is incredibly popular right now (the
availability of massive training databases is perhaps the other main reason).
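The three-step recipe above is easy to check numerically. Here is a sketch (Python, with an assumed quadratic running cost $\ell(x,u) = x^Tx + u^2$ on the Euler-discretized double integrator, all illustrative choices) that computes the adjoint gradient, which can be compared against finite differences:

```python
# Adjoint ("backpropagation") gradient of J = sum_n ell(x[n], u[n])
# through the Euler-discretized double integrator x[n+1] = A x[n] + B u[n].
import numpy as np

h, N = 0.1, 10
A = np.array([[1.0, h], [0.0, 1.0]])     # df_d/dx
B = np.array([0.0, h])                   # df_d/du (single input)

def total_cost(u_traj, x0):
    x, J = x0.copy(), 0.0
    for u in u_traj:                     # step 1: simulate forward
        J += x @ x + u ** 2              # ell(x, u) = x'x + u^2
        x = A @ x + B * u
    return J

def adjoint_gradient(u_traj, x0):
    xs = [x0]
    for u in u_traj:
        xs.append(A @ xs[-1] + B * u)
    lam = np.zeros(2)                    # lambda[N] = 0 (no terminal cost)
    grad = np.zeros(N)
    for n in reversed(range(N)):         # step 2: calculate backwards
        grad[n] = 2.0 * u_traj[n] + lam @ B   # step 3: extract gradients
        lam = 2.0 * xs[n] + A.T @ lam
    return grad
```

One forward pass and one backward pass produce the entire gradient, instead of the (state dim) $\times$ (control dim) $\times$ (timesteps) bookkeeping described above.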
To take advantage of this efficiency, advocates of the shooting methods often use penalty
methods instead of enforcing hard state constraints; instead of telling the solver about the
constraint explicitly, you simply add an additional term to the cost function which gives a
large penalty commensurate with the amount by which the constraint is violated. These are not
quite as accurate and can be harder to tune (you'd like the cost to be high compared to other
costs, but making it too high can lead to numerical conditioning issues), but they can work.
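As a sketch of the penalty idea (in Python; the quadratic form and the weight are illustrative choices), a hard inequality constraint $c(z) \le 0$ can be folded into the objective like this:

```python
# Replace a hard constraint c(z) <= 0 with a quadratic penalty on
# the amount of violation, added to the original cost.
import numpy as np

def penalized_cost(cost, constraint, weight):
    def J(z):
        violation = np.maximum(constraint(z), 0.0)   # zero when satisfied
        return cost(z) + weight * violation @ violation
    return J
```

Feasible points pay nothing; infeasible points pay quadratically, with the trade-off controlled by `weight` exactly as described above.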
12.3.4 Discussion
Although the decision about which algorithm is best may depend on the situation, we have come to favor the direct collocation method (and occasionally direct transcription) for most of our work. There are a number of arguments for and against each approach; I will try to discuss a few of them here.
Solver performance
A good initial guess for the input trajectory can help the solver avoid local minima; direct transcription and collocation can take an initial guess in $x$, too.
Implicit dynamics
Another potential advantage of the direct transcription and collocation methods is that the
dynamics constraints can be written in implicit form.
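For example (a sketch using the manipulator-equation notation from the earlier chapters; the sign conventions here are illustrative), an Euler step of a mechanical system can be imposed without ever inverting the mass matrix:
$$M(q[n])\,\frac{\dot{q}[n+1] - \dot{q}[n]}{h} + C(q[n], \dot{q}[n])\,\dot{q}[n] + G(q[n]) = B\,u[n].$$
A shooting method, by contrast, must solve for the accelerations explicitly at every step in order to simulate forward.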
Variations in the problem formulation
There are a number of useful variations to the problem formulations I've presented above. By far the most common is the addition of a terminal cost, e.g.:
$$J = \ell_f(x[N]) + \sum_{n=0}^{N-1} \ell(x[n], u[n]).$$
These terms are easily added to the cost function in any of the methods, and the adjoint equations of the shooting method simply require the modified terminal condition $\lambda[N] = \frac{\partial \ell_f(x[N])}{\partial x[N]}^T.$
One potential complaint about the direct transcription and collocation algorithms is that we tend to use simplistic numerical integration methods and often fix the integration timestep (e.g. by choosing Euler integration and selecting a fixed $h$); it is difficult to bound the resulting integration errors in the solution. One tantalizing possibility in the shooting methods is that the forward integration could be accomplished by more sophisticated methods, like variable-step integration. But I must say that I have not had much success with this approach, because while the numerical accuracy of any one function evaluation might be improved, these integration schemes do not necessarily give smooth outputs as you make incremental changes to the initial conditions and control (changing $u$ by $\epsilon$ could result in taking a different number of steps in the integration scheme). This lack of smoothness can wreak havoc on the convergence of the optimization. If numerical accuracy is at a premium, then I think you will have more success by imposing consistency constraints (e.g. as in the Runge-Kutta 4th order simulation with 5th order error checking method) as additional constraints on the time-steps; shooting methods do not have any particular advantage here.
12.4 Pontryagin's Minimum Principle
The tools that we've been developing for numerical trajectory optimization are closely tied to
theorems from (analytical) optimal control. Let us take one section to appreciate those
connections.
What precisely does it mean for a trajectory, $x(\cdot), u(\cdot)$, to be locally optimal? It means that if I were to perturb that trajectory in any way (e.g. change $u[k]$ by $\epsilon$), then I would either incur
higher cost in my objective function or violate a constraint. For an unconstrained
optimization, a necessary condition for local optimality is that the gradient of the objective at
the solution be exactly zero. Of course the gradient can also vanish at local maxima or saddle
points, but it certainly must vanish at local minima. We can generalize this argument to
constrained optimization using Lagrange multipliers.
12.4.1 Constrained optimization with Lagrange multipliers
Given the equality-constrained optimization problem
$$\min_{z} f(z), \quad \text{subject to} \quad \phi(z) = 0,$$
we define a vector of Lagrange multipliers, $\lambda$, the same size as $\phi$, and the Lagrangian
$$L(z, \lambda) = f(z) + \lambda^T \phi(z).$$
A necessary condition for $z^*$ to be a local minimum is that the gradients of the Lagrangian vanish there: $\frac{\partial L}{\partial z} = 0$ and $\frac{\partial L}{\partial \lambda} = 0$.
Note that $\frac{\partial L}{\partial \lambda} = \phi(z)^T$, so requiring this gradient to vanish simply requires that the constraint be satisfied.
Given the two solutions which satisfy the necessary conditions, the negative solution is
clearly the minimizer of the objective.
and set the derivatives to zero to obtain the adjoint equation method described for the shooting algorithm above:
$$\lambda[n] = \frac{\partial \ell(x[n], u[n])}{\partial x[n]}^T + \frac{\partial f_d(x[n], u[n])}{\partial x[n]}^T \lambda[n+1], \qquad \frac{\partial J}{\partial u[n]} = \frac{\partial \ell(x[n], u[n])}{\partial u[n]} + \lambda[n+1]^T \frac{\partial f_d(x[n], u[n])}{\partial u[n]}.$$
In fact, the statement can be generalized even beyond this to the case where $u$ has constraints. The result is known as Pontryagin's minimum principle -- giving necessary conditions for a trajectory to be optimal.
Theorem 12.1 - Pontryagin's Minimum Principle
Adapted from [82]. Given the initial conditions, $x_0$, a continuous dynamics, $\dot{x} = f(x, u)$, and an instantaneous cost, $\ell(x, u)$, defined over $t \in [t_0, t_f]$: if a trajectory $x^*(t), u^*(t)$ is optimal, then there exists an adjoint trajectory $\lambda^*(t)$ satisfying
$$\dot{\lambda}^* = -\frac{\partial \ell(x^*, u^*)}{\partial x}^T - \frac{\partial f(x^*, u^*)}{\partial x}^T \lambda^*,$$
and, for all $t$,
$$u^*(t) = \arg\min_{u} \left[ \ell(x^*(t), u) + \lambda^{*T}(t)\, f(x^*(t), u) \right].$$
Note that the terms which are minimized in the final line of the theorem are commonly referred to as the Hamiltonian of the optimal control problem,
$$H(x, u, \lambda) = \ell(x, u) + \lambda^T f(x, u).$$
It is distinct from, but inspired by, the Hamiltonian of classical mechanics. Remembering that $\lambda$ has an interpretation as $\frac{\partial J}{\partial x}^T$, the gradient of the cost-to-go, the connection to the Hamilton-Jacobi-Bellman equation should not be surprising.
12.5 Trajectory Optimization as a Convex Optimization
12.6 Local Trajectory Feedback Design
Once we have obtained a locally optimal trajectory from trajectory optimization, there is still
work to do...
12.6.1 Model-predictive control
12.6.2 Time-varying LQR
Take $\bar{x}(t) = x(t) - x_0(t)$ and $\bar{u}(t) = u(t) - u_0(t)$, linearize the dynamics about the nominal trajectory, and stabilize the trajectory using the time-varying LQR results (see the LQR chapter).
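In those coordinates (a sketch, consistent with the time-varying LQR formulation referenced above), the dynamics are approximated by the time-varying linearization
$$\dot{\bar{x}}(t) \approx A(t)\,\bar{x}(t) + B(t)\,\bar{u}(t), \qquad A(t) = \frac{\partial f}{\partial x}\bigg|_{x_0(t), u_0(t)}, \quad B(t) = \frac{\partial f}{\partial u}\bigg|_{x_0(t), u_0(t)},$$
and the resulting time-varying gain $K(t)$ yields the trajectory-stabilizing controller $u(t) = u_0(t) - K(t)\,\bar{x}(t)$.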
12.7 Iterative LQR and Differential Dynamic Programming
12.8 Case Study: A Glider that Can Land on a Perch like a Bird