Optimistic Search [Thayer and Ruml, 2008] takes advantage of the fact that weighted A* tends to return solutions much better than the bound suggests. It does so by running weighted A* with a bound w_opt that is twice as suboptimal as the desired bound, w_opt = (w − 1) · 2 + 1, and then expanding some additional nodes after a solution is found to ensure that the desired bound was met. Specifically, it repeatedly expands best_f, the node with the smallest f(n) = g(n) + h(n) of all nodes on the open list, until w · f(best_f) ≥ g(sol), where sol is the solution returned by the weighted A* search. Because f(best_f) is a lower bound on the cost of an optimal solution, this proves that the incumbent is within the desired suboptimality bound. Because the underlying search order ignores the length of solutions, so does optimistic search.

Skeptical Search [Thayer and Ruml, 2011] replaces the aggressive weighting of optimistic search with a weighted search on an inadmissible heuristic, as in f̂′(n) = g(n) + w · ĥ(n), where ĥ is some potentially inadmissible estimate of the cost-to-go from n to a goal. Then, as with optimistic search, best_f is repeatedly expanded to prove that the incumbent solution is within the desired bound. Because skeptical search relies only on cost-to-go estimates for guidance, it cannot prefer shorter solutions within the bound.

A*ε [Pearl and Kim, 1982] uses a distance-to-go estimate to determine search order. A*ε sorts its open list on f(n). It maintains the subset of all open nodes that it believes can lead to w-admissible solutions, called the focal list. These are the nodes where f(n) ≤ w · f(best_f). From focal, it selects the node that appears to be closest to a goal, the node with minimum d̂(n), for expansion. d̂ is a potentially inadmissible estimate of the number of actions along a minimum-cost path from n to a goal. While this algorithm does follow the objectives of suboptimal search, it performs poorly in practice [Thayer et al., 2009]. This is a result of using f(best_f) to determine which nodes form focal. When we use a lower bound to determine membership in the focal list, it is often the case that the children of an expanded node will not qualify for inclusion in focal, because f(n) tends to rise along any path. As a result of f's tendency to rise, best_f tends to have low depth and high d̂. This results in an unfortunate and inefficient thrashing behavior at low to moderate suboptimality bounds, where focal is repeatedly emptied until best_f is finally expanded, raising f(best_f) and refilling focal.

best_d̂ = argmin_{n ∈ open ∧ f̂(n) ≤ w · f̂(best_f̂)} d̂(n)

Note that best_d̂ is chosen from a focal list based on best_f̂: because f̂(best_f̂) is our current estimate of the cost of an optimal solution, nodes with f̂(n) ≤ w · f̂(best_f̂) are those nodes that we suspect lead to solutions within the desired suboptimality bound. At every expansion, EES chooses from among these three nodes using the function:

selectNode
1. if f̂(best_d̂) ≤ w · f(best_f) then best_d̂
2. else if f̂(best_f̂) ≤ w · f(best_f) then best_f̂
3. else best_f

We first consider best_d̂, as pursuing nearer goals should lead to a goal fastest, satisfying the second objective of bounded suboptimal search. best_d̂ is selected if its expected solution cost can be shown to be within the suboptimality bound. If best_d̂ is unsuitable, best_f̂ is examined. We suspect that this node lies along a path to an optimal solution. Expanding this node can also expand the set of candidates for best_d̂ by raising the estimated cost of best_f̂. We only expand best_f̂ if it is estimated to lead to a solution within the bound. If neither best_f̂ nor best_d̂ is thought to be within the bound, we return best_f. Expanding it can raise our lower bound f(best_f), allowing us to consider best_d̂ or best_f̂ in the next iteration.

Theorem 1 If ĥ(n) ≥ h(n) and g(opt) represents the cost of an optimal solution, then for every node n expanded by EES, it is true that f(n) ≤ w · g(opt), and thus EES returns w-admissible solutions.

Proof: selectNode will always return one of best_d̂, best_f̂, or best_f. No matter which node n is selected, we will show that f(n) ≤ w · f(best_f). This is trivial when best_f is chosen. When best_d̂ is selected:

f̂(best_d̂) ≤ w · f(best_f)                 by selectNode
g(best_d̂) + ĥ(best_d̂) ≤ w · f(best_f)     by def. of f̂
g(best_d̂) + h(best_d̂) ≤ w · f(best_f)     by h ≤ ĥ
f(best_d̂) ≤ w · f(best_f)                 by def. of f
f(best_d̂) ≤ w · g(opt)                    by admissible h
When best_d̂ is a solution, h(best_d̂) = 0 and f(best_d̂) = g(best_d̂), thus the cost of the solution represented by best_d̂ is within a bounded factor w of the cost of an optimal solution. The best_f̂ case is analogous. □

3.1 The Behavior of EES
Explicit Estimation Search takes both the distance-to-go estimate and the cost-to-go estimate into consideration when deciding which node to expand next. This allows it to prefer finding solutions within the bound as quickly as possible. It does this by operating in the same spirit as A*ε, using both an open and a focal list. However, unlike A*ε, EES consults an unbiased estimate of the cost-to-go for nodes in order to form its focal list. f̂(n) is less likely to rise along a path than f(n), considering that f(n) can only rise or stay the same by the admissibility of h(n). Relying on an unbiased estimate tempers the tendency for estimated costs to always rise and avoids the thrashing behavior of A*ε.

Like optimistic and skeptical search, EES uses a cleanup list, a list of all open nodes sorted on f(n), to prove its suboptimality bound. Rather than doing all of these expansions after having found a solution, EES interleaves cleanup expansions with those directed towards finding a solution. We can see this in the definition of selectNode, where EES may either raise the lower bound on optimal solution cost by selecting best_f, or it may instead pursue a solution by expanding either of the other two nodes. It can never run into the problem of having an incumbent solution that falls outside of the desired suboptimality bound, because every node selected for expansion is within the bound.

Much like A*ε or weighted A*, EES becomes faster as the bound is loosened. Like A*ε, EES becomes a greedy search on d̂. This contrasts with weighted A*, skeptical search, and optimistic search, which focus exclusively on cost estimates and thus do not minimize search time.

3.2 Alternative Node Selection Strategies
Although the formulation of selectNode is directly motivated by the stated goals of bounded suboptimal search, it is natural to wonder whether other strategies might have better performance. We consider two alternatives, one conservative and one optimistic. First, a more conservative approach that prefers to do the bound-proving expansions, those on best_f, as early as possible, and so it considers best_f first:

selectNode_con
1. if f̂(best_f̂) > w · f(best_f) then best_f
2. else if f̂(best_d̂) ≤ w · f(best_f) then best_d̂
3. else best_f̂

If best_f isn't selected for expansion, it then considers best_d̂ and best_f̂ as before. This expansion order produces a solution within the desired bound by the same argument as that for selectNode. While this alternate selection rule seems quite different on the surface, it turns out to be identical to the original rules: if we strengthen the rules for selecting each of the three nodes by including the negation of the previous rules, we see that the two functions are identical.

Next, consider an optimistic approach, analogous to that used by optimistic search and skeptical search:

selectNode_opt
1. if no incumbent then best_d̂
2. else if f̂(best_d̂) ≤ w · f(best_f) then best_d̂
3. else if f̂(best_f̂) ≤ w · f(best_f) then best_f̂
4. else best_f

In its initial phase, EES_opt finds the shortest solution it believes to be within the bound. Then, after finding an incumbent solution, it behaves exactly like EES. The search ends either when w · f(best_f) is larger than the cost of the incumbent solution or when a new goal is expanded, because in either case EES_opt has produced a w-admissible solution. We will examine both EES and EES_opt in our evaluation below.

3.3 Implementation
EES is structured like a classic best-first search: insert the initial node into open, and at each step select the next node for expansion using selectNode. To efficiently access best_f̂, best_d̂, and best_f, EES maintains three queues: the open list, the focal list, and the cleanup list, used to access best_f̂, best_d̂, and best_f respectively. The open list contains all generated but unexpanded nodes sorted on f̂(n); the node at its front is best_f̂. focal is a prefix of the open list ordered on d̂; it contains all of those nodes where f̂(n) ≤ w · f̂(best_f̂), and the node at its front is best_d̂. cleanup contains all nodes from open, but is ordered on f(n) instead of f̂(n); the node at its front is best_f.

We need to be able to quickly select a node at the front of one of these queues, remove it from all relevant data structures, and reinsert its children efficiently. To accomplish this, we implement cleanup as a binary heap, open as a red-black tree, and focal as a heap synchronized with a left prefix of open. This lets us perform most insertions and removals in logarithmic time, except for transferring nodes from open onto focal as best_f̂ changes. This requires us to visit a small range of the red-black tree in order to put the correct nodes into focal.

3.4 Simplified Approaches
With three queues to manage, EES has substantial overhead when compared to other bounded suboptimal search algorithms. One way to simplify EES would be to remove some of the node orderings. It is clear that we cannot ignore the cleanup list, or we would lose bounded suboptimality. If we were to eliminate the open list and synchronize focal with cleanup instead, we would obtain A*ε. If we were to ignore focal, and instead only expand either best_f or best_f̂, we would be left with a novel interleaved implementation of skeptical search, but this would not be able to prefer the shorter of two solutions within the bound. We can conclude that EES is only as complicated as it needs to be. In the next section, we will see that its overhead is worthwhile.

4 Empirical Evaluation
From the description of the algorithm, we can see that EES was designed to work well in domains where solution cost
and length can differ, and it should do particularly well on problems with computationally expensive node expansion functions and heuristics. We do not know how much of an advantage using distance estimates will provide, nor do we know how much the overhead of EES will degrade performance in domains where expansion and heuristic computation are cheap. To gain a better understanding of these aspects of the algorithm, we performed an empirical evaluation across six benchmark domains. All algorithms were implemented in Objective Caml and compiled to native code on 64-bit Intel Linux systems with 3.16 GHz Core 2 Duo processors and 8 GB of RAM. Algorithms were run until they solved the problem or until ten minutes had passed.

We examined the following algorithms:

Weighted A* (wA*) [Pohl, 1970] For domains with many duplicates and consistent heuristics, it ignores duplicate states, as this has no impact on the suboptimality bound and typically improves performance [Likhachev et al., 2003].

A*ε (A* eps) [Pearl and Kim, 1982] using the base distance-to-go heuristic d to sort its focal list.

Optimistic search as described by Thayer and Ruml [2008].

Skeptical search with ĥ and d̂ estimated from the base h and d using the on-line single-step heuristic corrections presented in Thayer and Ruml [2011]. The technique calculates the mean one-step error in both h and d along the current search path from the root to the current node. This measurement of heuristic error is then used to produce a more accurate, but potentially inadmissible, heuristic.

EES using the same estimates of ĥ and d̂ as skeptical search.

EES Opt is EES using selectNode_opt.

We evaluated these algorithms on the following domains:

Dock Robot We implemented a dock robot domain inspired by Ghallab et al. [2004] and the depots domain from the International Planning Competition. Here, a robot must move containers to their desired locations. Containers are stacked at a location using a crane, and only the topmost container on a pile may be accessed at any time. The robot may drive between locations and load or unload itself using the crane at the location. We tested on 150 randomly configured problems having three locations laid out on a unit square and ten containers with random start and goal configurations. Driving between the depots has a cost of the distance between them, loading and unloading the robot costs 0.1, and the cost of using the crane is 0.05 times the height of the stack of containers at the depot. h was computed as the cost of driving between all depots with containers that did not belong to them in the goal configuration, plus the cost of moving the deepest out-of-place container in the stack to the robot. d was computed similarly, but with 1 used rather than the actual costs.

We show results for this domain in the leftmost plot of Figure 1. All plots are laid out similarly, with the x-axis representing the user-supplied suboptimality bound and the y-axis representing the mean CPU time taken to find a solution (often on a log10 scale). We present 95% confidence intervals on the mean for all plots. In dock robots, we show results for suboptimality bounds of 1.2 and above; below this, no algorithm could reliably solve the instances within memory. Here we see that the techniques that rely on both inadmissible heuristics and estimates of solution length outperform weighted A* and optimistic search. A*ε performs particularly poorly in this domain, with run times that are hundreds of times larger than those of the other search algorithms, a direct result of the thrashing behavior described previously. Further, we see that both variants of EES are significantly faster than skeptical search, with EES_opt being slightly faster than EES for most suboptimality bounds. Both variants take about half the time of skeptical search to solve problems at the same bound. Not only are both EES variants consistently faster than skeptical search, their performance is more consistent, as shown by the tighter confidence intervals around their means.

Vacuum World In this domain, which follows the first state space presented in Russell and Norvig, page 36 [2010], a robot is charged with cleaning up a grid world. Movement is in the cardinal directions, and when the robot is on top of a pile of dirt, it may clean it up. We consider two variants of this problem: a unit-cost implementation and another, more realistic, variant where the cost of taking an action is one plus the number of dirt piles the robot has cleaned up (the weight from the dirt drains the battery faster). We used 50 instances that are 200 by 200 cells, each cell having a 35% probability of being blocked. We place ten piles of dirt and the robot randomly in unblocked cells and ensure that the problem can be solved. For the unit-cost domain, we use the minimum spanning tree of the dirt piles and the robot for h, and for d we estimate the cost of a greedy traversal of the dirt piles. That is, we make a freespace assumption on the grid and calculate the number of actions required to send the robot to the nearest pile of dirt, then the nearest after that, and so on. For h on the heavy vacuum problems, we compute the minimum spanning tree as before, order the edges by greatest length first, and then multiply the edge weights by the current weight of the robot plus the number of edges already considered. d is unchanged.

The center panel of Figure 1 shows the relative performance of the algorithms on the unit-cost vacuum problem. We see that there is very little difference between EES and EES Opt, which both outperform the other search algorithms by about an order of magnitude, solving the problems in tenths of seconds instead of requiring several seconds. Again, what these two algorithms have in common that differs from other approaches is their ability to rely on inadmissible cost-to-go estimates and estimates of solution length for search guidance. The performance gap between EES and skeptical search is not as large here as it is in domains with actions of differing costs. In unit-cost domains like this, searches that become greedy on cost-to-go behave identically to those that become greedy on distance-to-go. Removing the distinction between solution cost and solution length removes one of the advantages that EES holds over skeptical search.

The rightmost panel of Figure 1 shows the performance of the algorithms for the heavy vacuum robot domain. Of the domains presented here, this has the best set of properties for use with EES: the heuristic is relatively expensive to compute, and there is a difference between solution cost and solution length. For very tight bounds, the algorithms all perform similarly. As the bound is relaxed, both versions of EES clearly dominate the other approaches, being between one and two orders of magnitude faster. Both vari-
[Figure 1 appears here: three panels (Dock Robot, Vacuum World, Heavy Vacuum World) plotting log10 mean CPU time against the suboptimality bound for wA*, A* eps, Optimistic, Skeptical, EES, and EES Opt.]
Figure 1: Mean CPU time required to find solutions within a given suboptimality bound.
ants solve the problems in fractions of a second instead of tens of seconds. A*ε makes a strong showing for high suboptimality bounds in this domain precisely because it can take advantage of the difference between solution cost and solution length. Of the algorithms that take advantage of distance estimates, it makes the weakest showing, as it begins to fail to find solutions for many problems at suboptimality bounds as large as 4, whereas the EES algorithms solve all instances down to a suboptimality bound of 1.5.

Dynamic Robot Navigation Following Likhachev et al. [2003], the goal is to find the fastest path from the initial state of the robot to some goal location and heading, taking momentum into account. We use worlds that are 200 by 200 cells in size. We scatter 25 lines, up to 70 cells in length, with random orientations across the domain and present results averaged over 100 instances. We precompute the shortest path from the goal to all states, ignoring dynamics. To compute h, we take the length of the shortest path from a node to a goal and divide it by the maximum velocity of the robot. For d, we report the number of actions along that path.

The leftmost panel of Figure 2 shows the performance of the algorithms in the dynamic robot navigation domain. As in heavy vacuums and dock robots, there is a substantial difference between the number of actions in a plan and the cost of that plan; however, here computing the heuristic is cheap. For tight bounds, EES is faster than A*ε, but for more generous bounds A*ε pulls ahead. When we evaluate these algorithms in terms of nodes expanded (omitted for space), we see that their performance is similar. The better times of A*ε in this domain can be attributed to its reduced overhead.

Sliding Tiles Puzzles We examined the 100 instances of the 15-puzzle presented by Korf [1985]. The center panel of Figure 2 shows the relative performance of the algorithms on the unit-cost sliding tiles puzzle. We use Manhattan distance plus linear conflicts for both h and d. This is exactly the wrong kind of domain for EES: node generation and heuristic evaluation are incredibly fast, and there is no difference between the number of actions in a solution and the cost of that solution. Such a domain lays bare the overhead of EES and prevents it from taking advantage of its ability to distinguish between cost and length of solutions. We see that, in terms of time to solutions, EES is almost indistinguishable from weighted A* and skeptical search, and worse, by about an order of magnitude, than optimistic search. If we examine these results in terms of nodes generated (omitted for space), we see that EES, skeptical, and optimistic search examine a similar number of nodes for many suboptimality bounds; the time difference is due to the differing overheads of the algorithms.

The right panel of Figure 2 shows the performance of the algorithms on inverse tiles problems, where the cost of moving a tile is the inverse of its face value, 1/face. This separates the cost and length of a solution without altering other properties of the domain, such as its connectivity and branching factor. This simple change to the cost function makes these problems remarkably difficult to solve, and reveals a hitherto unappreciated brittleness in previously proposed algorithms. We use a weighted Manhattan distance for h and the unit Manhattan distance for d. Explicit estimation search and skeptical search are the only algorithms shown for this domain, as all of the other algorithms fail to solve at least half of the problems within a 600-second timeout across all suboptimality bounds shown in the plot. This failure can be attributed in part to their inability to correct the admissible heuristic for the problem into a stronger inadmissible heuristic during search, as this is the fundamental difference between skeptical search, which can solve the problem, and optimistic search, which cannot. The ability to solve instances is not entirely due to reliance on d(n), as A*ε is unable to solve many of the instances in this domain. EES is significantly faster than skeptical search in this domain, about 5 to 6 times as fast for moderate suboptimality bounds and a little more than a full order of magnitude for tight bounds, because it can incorporate distance-to-go estimates directly for guidance. While we show data out to a suboptimality bound of 50, this trend holds out to at least 100,000.

Summary In our benchmark domains, we saw that explicit estimation search was consistently faster than other approaches for bounded suboptimal search in domains where actions had varying costs. Although A*ε was faster for some bounds in one domain, its behavior is so brittle that it is of little practical use. EES' advantage increased in those do-
[Figure 2 appears here: three panels (Dynamic Robot Navigation, Korf's 100 15 Puzzles, 100 Inverse 15 Puzzles) plotting log10 mean CPU time against the suboptimality bound for wA*, A* eps, Optimistic, Skeptical, EES, and EES Opt.]
Figure 2: Mean CPU time required to find solutions within a given suboptimality bound.
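For concreteness, the selectNode rule evaluated throughout this paper can be sketched in a few lines. The following is a minimal Python illustration, not the authors' Objective Caml implementation: the three synchronized queues of Section 3.3 are replaced by linear scans over a plain list, nodes are plain dictionaries, and the field names (g, h, h_hat, d_hat) are invented for this sketch.

```python
# Toy sketch of EES's selectNode. Each node carries precomputed values:
# g (cost so far), h (admissible cost-to-go), h_hat (potentially
# inadmissible cost-to-go estimate, assumed >= h as in Theorem 1),
# and d_hat (distance-to-go estimate). f = g + h and f_hat = g + h_hat.

def f(n):
    return n["g"] + n["h"]

def f_hat(n):
    return n["g"] + n["h_hat"]

def select_node(open_list, w):
    best_f = min(open_list, key=f)
    best_f_hat = min(open_list, key=f_hat)
    # focal: open nodes whose estimated total cost is within the bound
    # relative to f_hat(best_f_hat).
    focal = [n for n in open_list if f_hat(n) <= w * f_hat(best_f_hat)]
    best_d_hat = min(focal, key=lambda n: n["d_hat"])
    # Rule 1: the nearest-looking node, if its estimated cost is in bound.
    if f_hat(best_d_hat) <= w * f(best_f):
        return best_d_hat
    # Rule 2: the node believed to be on a cheapest path, if in bound.
    if f_hat(best_f_hat) <= w * f(best_f):
        return best_f_hat
    # Rule 3: otherwise expand best_f to raise the lower bound f(best_f).
    return best_f
```

Provided ĥ ≥ h and w ≥ 1, every node this function returns satisfies f(n) ≤ w · f(best_f), the invariant used in the proof of Theorem 1. A real implementation would replace the linear scans with the heap, red-black tree, and synchronized focal prefix described in Section 3.3.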