Changes of Problem Representation

Studies in Fuzziness and Soft Computing


Editor-in-chief
Prof. Janusz Kacprzyk
Systems Research Institute
Polish Academy of Sciences
ul. Newelska 6
01-447 Warsaw, Poland
E-mail: kacprzyk@ibspan.waw.pl
http://www.springer.de/cgi-bin/search_book.pl?series=2941

Further volumes of this series can be found at our homepage.

Vol. 90. B. Bouchon-Meunier, J. Gutiérrez-Ríos, L. Magdalena and R. R. Yager (Eds.)
Technologies for Constructing Intelligent Systems 2, 2002
ISBN 3-7908-1455-5

Vol. 91. J. J. Buckley, E. Eslami and T. Feuring
Fuzzy Mathematics in Economics and Engineering, 2002
ISBN 3-7908-1456-3

Vol. 92. P. P. Angelov
Evolving Rule-Based Models, 2002
ISBN 3-7908-1457-1

Vol. 93. V. V. Cross and T. A. Sudkamp
Similarity and Compatibility in Fuzzy Set Theory, 2002
ISBN 3-7908-1458-X

Vol. 94. M. MacCrimmon and P. Tillers (Eds.)
The Dynamics of Judicial Proof, 2002
ISBN 3-7908-1459-8

Vol. 95. T. Y. Lin, Y. Y. Yao and L. A. Zadeh (Eds.)
Data Mining, Rough Sets and Granular Computing, 2002
ISBN 3-7908-1461-X

Vol. 96. M. Schmitt, H.-N. Teodorescu, A. Jain, A. Jain, S. Jain and L. C. Jain (Eds.)
Computational Intelligence Processing in Medical Diagnosis, 2002
ISBN 3-7908-1463-6

Vol. 97. T. Calvo, G. Mayor and R. Mesiar (Eds.)
Aggregation Operators, 2002
ISBN 3-7908-1468-7

Vol. 98. L. C. Jain, Z. Chen and N. Ichalkaranje (Eds.)
Intelligent Agents and Their Applications, 2002
ISBN 3-7908-1469-5

Vol. 99. C. Huang and Y. Shi
Towards Efficient Fuzzy Information Processing, 2002
ISBN 3-7908-1475-X

Vol. 100. S.-H. Chen (Ed.)
Evolutionary Computation in Economics and Finance, 2002
ISBN 3-7908-1476-8

Vol. 101. S. J. Ovaska and L. M. Sztandera (Eds.)
Soft Computing in Industrial Electronics, 2002
ISBN 3-7908-1477-6

Vol. 102. B. Liu
Theory and Practice of Uncertain Programming, 2002
ISBN 3-7908-1490-3

Vol. 103. N. Barnes and Z.-Q. Liu
Knowledge-Based Vision-Guided Robots, 2002
ISBN 3-7908-1494-6

Vol. 104. F. Rothlauf
Representations for Genetic and Evolutionary Algorithms, 2002
ISBN 3-7908-1496-2

Vol. 105. J. Segovia, P. S. Szczepaniak and M. Niedzwiedzinski (Eds.)
E-Commerce and Intelligent Methods, 2002
ISBN 3-7908-1499-7

Vol. 106. P. Matsakis and L. M. Sztandera (Eds.)
Applying Soft Computing in Defining Spatial Relations, 2002
ISBN 3-7908-1504-7

Vol. 107. V. Dimitrov and B. Hodge
Social Fuzziology, 2002
ISBN 3-7908-1506-3

Vol. 108. L. M. Sztandera and C. Pastore (Eds.)
Soft Computing in Textile Sciences, 2003
ISBN 3-7908-1512-8

Vol. 109. R. J. Duro, J. Santos and M. Graña (Eds.)
Biologically Inspired Robot Behavior Engineering, 2003
ISBN 3-7908-1513-6
Eugene Fink

Changes of
Problem Representation
Theory and Experiments

With 271 Figures and 40 Tables

Springer-Verlag Berlin Heidelberg GmbH


Professor Eugene Fink
University of South Florida
Computer Science and Engineering
4202 East Fowler Av., ENB-118
33620-5399 Tampa, Florida
USA
eugene@csee.usf.edu

ISSN 1434-9922
ISBN 978-3-7908-2518-3 ISBN 978-3-7908-1774-4 (eBook)
DOI 10.1007/978-3-7908-1774-4
Library of Congress Cataloging-in-Publication Data applied for
Die Deutsche Bibliothek – CIP-Einheitsaufnahme
Fink, Eugene: Changes of problem representation: theory and experiments; with 40 tables / Eugene Fink. –
Heidelberg; New York: Physica-Verl., 2003
(Studies in fuzziness and soft computing; Vol. 110)

This work is subject to copyright. All rights are reserved, whether the whole or part of the material is con-
cerned, specifically the rights of translation, reprinting, reuse of illustrations, recitation, broadcasting, repro-
duction on microfilm or in any other way, and storage in data banks. Duplication of this publication or parts
thereof is permitted only under the provisions of the German Copyright Law of September 9, 1965, in its
current version, and permission for use must always be obtained from Physica-Verlag. Violations are liable
for prosecution under the German Copyright Law.

© Springer-Verlag Berlin Heidelberg 2002


Originally published by Physica-Verlag Heidelberg in 2002
Softcover reprint of the hardcover 1st edition 2002
The use of general descriptive names, registered names, trademarks, etc. in this publication does not imply,
even in the absence of a specific statement, that such names are exempt from the relevant protective laws
and regulations and therefore free for general use.
Preface

The purpose of our research is to enhance the efficiency of AI problem solvers
by automating representation changes. We have developed a system that
improves the description of input problems and selects an appropriate search
algorithm for each given problem.
Motivation. Researchers have accumulated much evidence on the impor-
tance of appropriate representations for the efficiency of AI systems. The
same problem may be easy or difficult, depending on the way we describe
it and on the search algorithm we use. Previous work on the automatic im-
provement of problem descriptions has mostly been limited to the design of
individual learning algorithms. The user has traditionally been responsible
for the choice of algorithms appropriate for a given problem.
We present a system that integrates multiple description-changing and
problem-solving algorithms. The purpose of the reported work is to formalize
the concept of representation and to confirm the following hypothesis:
An effective representation-changing system can be built from three parts:
• a library of problem-solving algorithms;
• a library of algorithms that improve problem descriptions;
• a control module that selects algorithms for each given problem.
Representation-changing system. We have supported this hypothesis by
building a system that improves representations in the PRODIGY problem-
solving architecture. The library of problem solvers consists of several search
engines available in PRODIGY. The library of description changers contains
novel algorithms for selecting primary effects, generating abstractions, and
discarding irrelevant elements of a problem encoding. The control module
chooses and applies appropriate description changers, reuses available de-
scriptions, and selects problem solvers.
Improving problem description. The implemented system includes seven
algorithms for improving the description of a given problem. First, we for-
malize the notion of primary effects of operators and give two algorithms
for identifying primary effects. Second, we extend the theory of abstraction
search to the PRODIGY domain language and describe two techniques for
abstracting preconditions and effects of operators. Third, we present auxil-
iary algorithms that enhance the power of abstraction by identifying relevant
features of a problem and generating partial instantiations of operators.
Top-level control. We define a space of possible representations of a given
problem and view the task of changing representation as a search in this space.
The top-level control mechanism guides the search, using statistical analysis
of previous results, control rules, and general heuristics. First, we formalize
the statistical problem involved in finding an effective representation and
derive a solution to this problem. Then, we describe control rules for selecting
representations and present a mechanism for the synergetic use of statistical
techniques, control rules, and heuristics.

Acknowledgments

The reported work is the result of my good fortune to be a graduate student at
Carnegie Mellon University. I gratefully acknowledge the help of my advisors,
co-workers, and friends, who greatly contributed to my work and supported
me during six long years of graduate studies.
My greatest debt is to Herbert Alexander Simon, Jaime Guillermo Car-
bonell, and Maria Manuela Magalhaes de Albuquerque Veloso. They provided
stimulating ideas, guidance, and advice in all aspects of my work, from its
strategic course to minute details of implementation and writing.
I am grateful to Derick Wood, Qiang Yang, and Jo Ebergen, who guided
my research before I entered Carnegie Mellon. Derick Wood and Qiang Yang
also supervised my work during the three summers that I spent away from
Carnegie Mellon after entering the Ph.D. program. They taught me research
and writing skills, which proved invaluable for my work.
I thank my undergraduate teachers of math and computer science, espe-
cially my advisor Robert Rosebrugh, who encouraged me to pursue a graduate
degree. I also thank my first teachers of science, Maria Yurievna Filina, Niko-
lai Moiseevich Kuksa, and Alexander Sergeevich Golovanov. Back in Russia,
they introduced me to the wonderful world of mathematics and developed
my taste for learning and research.
I am thankful to Richard Korf for his valuable comments on my research
and writing. I have also received thorough feedback from Karen Haigh, Josh
Johnson, Catherine Kaidanova, Savvas Nikiforou, Mary Parrish, Henry Row-
ley, and Yury Smirnov¹.
I did this work in the stimulating research environment of the PRODIGY
group. I fondly remember my discussions with members of the group, includ-
ing Jim Blythe, Daniel Borrajo, Michael Cox, Rujith DeSilva, Rob Driskill,
Karen Haigh, Vera Kettnaker, Craig Knoblock, Erica Melis, Steven Minton,
Alicia Perez, Paola Rizzo, Yury Smirnov, Peter Stone, and Mei Wang¹. My
special thanks are to Jim Blythe, who helped me understand PRODIGY code
and adapt it for my system.
I have received support and encouragement from fellow graduate students,
Claudson Bornstein, Tammy Carter, Nevin Heintze, Bob Monroe, Henry
Rowley, and Po-Jen Yang¹.
Dmitry Goldgof and Lawrence Hall have encouraged me to publish this
work and helped with preparation of the final manuscript. Svetlana Vainer,
a graduate student in mathematics, aided me in constructing the statistical
model used in my system. Evgenia Nayberg, a fine-arts student, assisted me
in designing illustrations. Savvas Nikiforou helped me with LaTeX formatting.
I am grateful to my fiancée, Lena Mukomel, for her love and encour-
agement. I am also grateful to my parents, who provided help and support
through my studies.
Finally, I thank my friends outside computer science. Natalie Gurevich,
Alex Gurevich, and Lala Matievsky played an especially important role in my
life. They helped me to immigrate from Russia and establish my priorities
and objectives. I have received much support from Catherine Kaidanova,
Natalia Kamneva, Alissa Kaplunova, Alexander Lakher, Irina Martynov, Alex
Mikhailov, Evgenia Nayberg, Michael Ratner, Alexander Rudin, and Viktoria
Suponiskaya¹. I am also thankful to my Canadian friends, Elverne Bauman,
Louis Choiniere, Margie Roxborough, Marty Sulek, Alison Syme, and Linda
Wood¹, who helped me to learn the culture of Canada and the United States.
The work was sponsored by the Defense Advanced Research Projects
Agency (DARPA) via the Navy, under grant F33615-93-1-1330, and the Air
Force Research Laboratory, under grant F30602-97-1-0215.

¹ The names are in alphabetical order.


Contents

Part I. Introduction

1. Motivation
   1.1 Representations in problem solving
       1.1.1 Informal examples
       1.1.2 Alternative definitions of representation
       1.1.3 Representations in the SHAPER system
       1.1.4 The role of representation
   1.2 Examples of representation changes
       1.2.1 Tower-of-Hanoi Domain
       1.2.2 Constructing an abstraction hierarchy
       1.2.3 Selecting primary effects
       1.2.4 Partially instantiating operators
       1.2.5 Choosing a problem solver
   1.3 Related work
       1.3.1 Psychological evidence
       1.3.2 Automatic representation changes
       1.3.3 Integrated systems
       1.3.4 Theoretical results
   1.4 Overview of the approach
       1.4.1 Architecture of the system
       1.4.2 Specifications of description changers
       1.4.3 Search in the space of representations
   1.5 Extended abstract

2. Prodigy search
   2.1 PRODIGY system
       2.1.1 History
       2.1.2 Advantages and drawbacks
   2.2 Search engine
       2.2.1 Encoding of problems
       2.2.2 Incomplete solutions
       2.2.3 Simulating execution
       2.2.4 Backward chaining
       2.2.5 Main versions
   2.3 Extended domain language
       2.3.1 Extended operators
       2.3.2 Inference rules
       2.3.3 Complex types
   2.4 Search control
       2.4.1 Avoiding redundant search
       2.4.2 Knob values
       2.4.3 Control rules
   2.5 Completeness
       2.5.1 Limitation of means-ends analysis
       2.5.2 Clobbers among if-effects
       2.5.3 Other violations of completeness
       2.5.4 Completeness proof
       2.5.5 Performance of the extended solver
       2.5.6 Summary of completeness results

Part II. Description changers

3. Primary effects
   3.1 Search with primary effects
       3.1.1 Motivating examples
       3.1.2 Main definitions
       3.1.3 Search algorithm
   3.2 Completeness of primary effects
       3.2.1 Completeness and solution costs
       3.2.2 Condition for completeness
   3.3 Analysis of search reduction
   3.4 Automatically selecting primary effects
       3.4.1 Selection heuristics
       3.4.2 Instantiating the operators
   3.5 Learning additional primary effects
       3.5.1 Inductive learning algorithm
       3.5.2 Selection heuristics
       3.5.3 Sample complexity
   3.6 ABTWEAK experiments
       3.6.1 Controlled experiments
       3.6.2 Robot world and machine shop
   3.7 PRODIGY experiments
       3.7.1 Domains from ABTWEAK
       3.7.2 Sokoban puzzle and STRIPS world
       3.7.3 Summary of experimental results

4. Abstraction
   4.1 Abstraction in problem solving
       4.1.1 History of abstraction
       4.1.2 Hierarchical problem solving
       4.1.3 Efficiency and possible problems
       4.1.4 Avoiding the problems
       4.1.5 Ordered monotonicity
   4.2 Hierarchies for the PRODIGY domain language
       4.2.1 Additional constraints
       4.2.2 Abstraction graph
   4.3 Partial instantiation of predicates
       4.3.1 Improving the granularity
       4.3.2 Instantiation graph
       4.3.3 Basic operations
       4.3.4 Construction of a hierarchy
       4.3.5 Level of a given literal
   4.4 Performance of the abstraction search

5. Summary and extensions
   5.1 Abstracting the effects of operators
   5.2 Identifying the relevant literals
   5.3 Summary of work on description changers
       5.3.1 Library of description changers
       5.3.2 Unexplored description changes
       5.3.3 Toward a theory of description changes

Part III. Top-level control

6. Multiple representations
   6.1 Solvers and changers
       6.1.1 Domain descriptions
       6.1.2 Problem solvers
       6.1.3 Description changers
       6.1.4 Representations
   6.2 Description and representation spaces
       6.2.1 Descriptions, solvers, and changers
       6.2.2 Description space
       6.2.3 Representation space
   6.3 Utility functions
       6.3.1 Gain function
       6.3.2 Additional constraints
       6.3.3 Representation quality
       6.3.4 Use of multiple representations
       6.3.5 Summing gains
   6.4 Simplifying assumptions

7. Statistical selection
   7.1 Selection task
       7.1.1 Previous and new results
       7.1.2 Example and general problem
   7.2 Statistical foundations
   7.3 Computation of the gain estimates
   7.4 Selection of a representation and time bound
       7.4.1 Candidate bounds
       7.4.2 Setting a time bound
       7.4.3 Selecting a representation
       7.4.4 Selection without past data
   7.5 Empirical examples
       7.5.1 Extended transportation domain
       7.5.2 Phone-call domain
   7.6 Artificial tests

8. Statistical extensions
   8.1 Problem-specific gain functions
   8.2 Problem sizes
       8.2.1 Dependency of time on size
       8.2.2 Scaling of running times
       8.2.3 Artificial tests
   8.3 Similarity among problems
       8.3.1 Similarity hierarchy
       8.3.2 Choice of a group
       8.3.3 Empirical examples

9. Summary and extensions
   9.1 Preference rules
       9.1.1 Preferences
       9.1.2 Preference graphs
       9.1.3 Use of preferences
       9.1.4 Delaying representation changes
   9.2 Summary of work on the top-level control

Part IV. Empirical results

10. Machining Domain
   10.1 Selecting a description
   10.2 Selecting a solver
   10.3 Different time bounds

11. Sokoban Domain
   11.1 Three representations
   11.2 Nine representations
   11.3 Different time bounds

12. Extended Strips Domain
   12.1 Small-scale selection
   12.2 Large-scale selection
   12.3 Different time bounds

13. Logistics Domain
   13.1 Selecting a description and solver
   13.2 Twelve representations
   13.3 Different time bounds

Concluding remarks

References
Part I. Introduction
1. Motivation

Could you restate the problem? Could you restate it still differently?
- George Polya [1957], How to Solve It.

The performance of all reasoning systems crucially depends on problem rep-
resentation: the same problem may be easy or difficult, depending on the way
we describe it. Researchers in psychology, cognitive science, and artificial in-
telligence have accumulated much evidence on the importance of appropriate
representations for human problem solvers and AI systems.
In particular, psychologists have found out that human subjects often sim-
plify difficult problems by changing their representation. The ability to find
an appropriate problem reformulation is a crucial skill for mathematicians,
physicists, economists, and experts in many other areas. AI researchers have
shown the impact of changes in problem description on the performance of
search systems and pointed out the need to automate problem reformulation.
Although researchers have long realized the importance of effective repre-
sentations, they have done little investigation in this area, and the notion of
"good" representations has remained at an informal level. The user has tradi-
tionally been responsible for providing appropriate problem descriptions, as
well as for selecting search algorithms that effectively use these descriptions.
The purpose of our work is to automate the process of revising problem
representation in AI systems. We formalize the concept of representation,
explore its role in problem solving, and develop a system that evaluates and
improves representations in the PRODIGY problem-solving architecture.
The work on the system for changing representations has consisted of
two main stages, described in Parts II and III. First, we outline a frame-
work for the development of algorithms that improve problem descriptions
and apply it to designing several novel algorithms. Second, we construct an
integrated AI system, which utilizes the available description improvers and
problem solvers. The system is named SHAPER for its ability to change the
shape of problems and their search spaces. We did not plan this name to
be an acronym; however, it may be retroactively deciphered as Synergy of
Hierarchical Abstraction, Primary Effects, and other Representations. The
central component of the SHAPER system is a control module that selects
appropriate algorithms for each given problem.


We begin by explaining the concept of representations in problem solving
(Section 1.1), illustrating their impact on problem complexity (Section 1.2),
and reviewing the previous work on representation changes (Section 1.3). We
then outline our approach to the automation of representation improvements
(Section 1.4) and give a summary of the main results (Section 1.5).

1.1 Representations in problem solving

Informally, a problem representation is a certain view of a problem and an
approach to solving it. Scientists have considered various formalizations of
this concept; its exact meaning varies across research contexts. The repre-
sentation of a problem in an AI system may include the initial encoding
of the problem, data structures for storing relevant information, production
rules for drawing inferences about the problem, and heuristics that guide the
search for a solution.
We explain the meaning of representation in our research and introduce
related terminology. First, we give informal examples that illustrate this no-
tion (Section 1.1.1). Second, we review several alternative formalizations (Sec-
tion 1.1.2) and define the main notions used in the work on the SHAPER sys-
tem (Section 1.1.3). Third, we discuss the role of representation in problem
solving (Section 1.1.4).

1.1.1 Informal examples

We consider two examples that illustrate the use of multiple representations.
In Section 1.2, we will give a more technical example, which involves repre-
sentation changes in the PRODIGY architecture.
Representations in geometry. Mathematicians have long mastered the
art of constructing and fine-tuning sophisticated representations, which is
one of their main tools for addressing complex research tasks [Polya, 1957].
For example, when a scientist works on a difficult geometry problem, she
usually tries multiple approaches: pictorial reasoning, analytical techniques,
trigonometric derivations, and computer simulations.
These approaches differ not only in the problem encoding, but also in the
related mental structures and strategies [Qin and Simon, 1992]. For example,
the mental techniques for analyzing geometric sketches are different from the
methods for solving trigonometric equations.
A mathematician may attempt many alternative representations of a
given problem, moving back and forth among promising approaches [Kaplan
and Simon, 1990]. For instance, she may consider several pictorial representa-
tions, then try analytical techniques, and then abandon her analytical model
and go back to one of the pictures.
If several different representations provide useful information, the math-
ematician may use them in parallel and combine the resulting inferences.
This synergetic use of alternative representations is a standard mathematical
technique. In particular, proofs of geometric results often include equations
along with pictorial arguments.
The search for an appropriate representation is based on two main pro-
cesses: the retrieval or construction of candidate representations and the eval-
uation of their utility. The first process may involve look-up of a matching
representation in a library of available strategies, modification of an "almost"
matching representation, or development of a completely new approach. For
example, the mathematician may reuse an old sketch, draw a new one, or
even devise a new framework for solving this type of problem.
After constructing a new representation, the mathematician estimates its
usefulness for solving the problem. If it does not look promising, she may
prune it right away or store it as a back-up alternative; for example, she
may discard the sketches that clearly do not help. Otherwise, she uses the
representation and evaluates the usefulness of the resulting inferences.
To summarize, different representations of a given problem support dif-
ferent inference techniques, and the choice among them determines the ef-
fectiveness of the problem-solving process. Construction of an appropriate
representation can be a difficult task, and it can require a search in a certain
space of alternative representations.
Driving directions. We next give an example of representation changes in
everyday life and show that the choice of representation can be important
even for simple tasks. In this example, we consider the use of directions for
driving to an unfamiliar place.
Most drivers employ several standard techniques for describing a route,
such as a sketch of the streets that form the route, pencil marks on a city map,
and verbal directions for reaching the destination. When a driver chooses one
of these techniques, she commits to certain mental structures and strategies.
For instance, if the driver uses a map, she has to process pictorial information
and match it to the real world. On the other hand, the execution of verbal
instructions requires discipline in following the described steps and attention
to the relevant landmarks.
When the driver selects a representation, she should consider her goals,
the effectiveness of alternative representations for achieving these goals, and
the related trade-offs. For instance, she may describe the destination by its
address, which is a convenient way for recording it in a notebook or quickly
communicating to others; however, the address alone may not be sufficient
for finding the place without a map. The use of accurate verbal directions
is probably the most convenient way for driving to the destination. On the
other hand, a map may help to identify gas stations or restaurants close to
the route; moreover, it can be a valuable tool if the driver gets lost.
A representation ...
• includes a machine language for the description of reasoning tasks and a specific
encoding of a given problem in this language [Amarel, 1968].
• is the space expanded by a solver algorithm during its search for a solution
[Newell and Simon, 1972].
• is the state space of a given problem, formed by all legal states of the simulated
world and transitions between them [Korf, 1980].
• "consists of both data structures and programs operating on them to make new
inferences" [Larkin and Simon, 1987].
• determines a mapping from the behavior of an AI system on a certain set of
inputs to the behavior of another system, which performs the same task on a
similar set of inputs [Holte, 1988].

Fig. 1.1. Definitions of representation in artificial intelligence and cognitive science.
These definitions are not equivalent, and thus they lead to different formal models.

If an appropriate representation is not available, the driver may construct
it from other representations. For instance, if she has the destination address,
she may find a route on a map and write down directions. When people con-
sider these representation changes, they often weigh the expected simplifi-
cation of the task against the cost of performing the changes. For instance,
even if the driver believes that written directions facilitate the trip, she may
decide that they are not worth the writing effort.
To summarize, this example shows that people employ multiple repre-
sentations not only for complex problems, but also for routine tasks. When
people repeatedly perform a certain task, they develop standard representa-
tions and techniques for constructing them. Moreover, the familiarity with
the task facilitates the selection among available representations.

1.1.2 Alternative definitions of representation

Although AI researchers agree on their intuitive understanding of representa-
tion, they have not yet developed a standard formalization of this notion. We
review several formal models and discuss their similarities and differences; in
Figure 1.1, we summarize the main definitions.
Problem formulation. Amarel [1961; 1965; 1968] was the first to point out the
impact of representation on the efficiency of search algorithms. He considered
some problems of reasoning about actions in a simulated world and discussed
their alternative formulations in the input language of a search algorithm.
The discussion included two types of representation changes: modifying the
encoding of a problem and translating it to different languages.
In particular, he demonstrated that a specific formulation of a problem
determines its state space, that is, the space of possible states of the simulated
world and transitions between them. Amarel pointed out that the efficiency
of problem solving depends on the size of the state space, as well as on the
allowed transitions, and that change of a description language may help to
reveal hidden properties of the simulated world.
Van Baalen [1989] adopted a similar view in his doctoral work on a theory
of representation design. He defined a representation as a mapping from con-
cepts to their syntactic description in a formal language and implemented a
program that automatically improved descriptions of simple reasoning tasks.
Problem space. Newell and Simon [1972] investigated the role of repre-
sentation in human problem solving. In particular, they observed that the
human subject always encodes a given task in a problem space, that is,
"some kind of space that represents the initial situation presented to him,
the desired goal situation, various intermediate states, imagined or experi-
enced, as well as any concepts he uses to describe these situations to himself"
[Newell and Simon, 1972].
They defined a representation as the subject's problem space, which de-
termines partial solutions considered by the human solver during his search
for a complete solution. This definition is also applicable to AI systems since
all problem-solving algorithms are based on the same principle of searching
among partial solutions.
Observe that the problem space may differ from the state space of the
simulated world. In particular, the subject may disregard some of the allowed
transitions and, on the other hand, consider impossible world states. For
instance, when people work on difficult versions of the Tower of Hanoi, they
sometimes attempt illegal moves [Simon et al., 1985]. Moreover, the problem
solver may abstract from the search among world states and use an alternative
view of partial solutions. In particular, the search algorithms in the PRODIGY
architecture explore the space of transition sequences (see Section 2.2), which
is different from the space of world states.
State space. Korf [1980] described a formal framework for changing rep-
resentations and used it in designing a system for automatic improvement
of the initial representation. He developed a language for describing search
problems and defined a representation as a specific encoding of a given prob-
lem in this language. The encoding includes the initial state of the simulated
world and operations for transforming the state; hence, it defines the state
space of the problem.
Korf pointed out the correspondence between the problem encoding and
the resulting state space, which allowed him to view a representation as a
space of named states and transitions between them. This view underlay his
techniques for changing representation. In particular, he defined a represen-
tation change as a transformation of the state space and considered two main
types of transformations: isomorphism and homomorphism. An isomorphic
representation change was renaming of the states without changing the struc-
ture of the space. On the other hand, a homomorphic transformation was a
reduction of the space by abstracting some states and transitions.
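
To make the two kinds of transformations concrete, here is a minimal sketch
(in Python, with an invented three-state space; the names are ours, not
Korf's). An isomorphism merely renames the states, whereas a homomorphism
merges several states into one abstract state, yielding a smaller space:

# Tiny state space: each state maps to its set of successor states.
space = {'A': {'B'}, 'B': {'A', 'C'}, 'C': {'B'}}

def transform(space, f):
    """Map every state and every transition through the function f."""
    new = {}
    for state, successors in space.items():
        new.setdefault(f(state), set()).update(f(s) for s in successors)
    return new

# Isomorphism: a one-to-one renaming preserves the structure.
iso = transform(space, {'A': 'x', 'B': 'y', 'C': 'z'}.get)
# {'x': {'y'}, 'y': {'x', 'z'}, 'z': {'y'}}

# Homomorphism: merging A and B abstracts the space to two states.
homo = transform(space, {'A': 'ab', 'B': 'ab', 'C': 'c'}.get)
# {'ab': {'ab', 'c'}, 'c': {'ab'}}

The homomorphic image has fewer states and transitions, which is why such
abstracting transformations can reduce search.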
Observe that Korf's notion of representation did not include the behavior
of a problem-solving algorithm. Since performance depended not only on the
state space but also on the search strategies, a representation in his model
did not uniquely determine the efficiency of problem solving.
Data and programs. Simon suggested a general definition of representa-
tion as "data structures and programs operating on them," and used it in the
analysis of reasoning with pictorial representations [Larkin and Simon, 1987].
When describing the behavior of human solvers, he viewed their initial encod-
ing of a given problem as a "data structure," and the available productions
for modifying it as "programs." Since the problem encoding and rules for
changing it determined the subject's search space, this view was similar to
the earlier definition by Newell and Simon [1972].
If we apply Simon's definition in other research contexts, the notions
of data structures and programs may take different meanings. The general
concept of "data structures" encompasses any form of a system's input and
internal representation of related information. Similarly, the term "programs"
may refer to any strategies and procedures for processing a given problem.
In particular, when considering an AI architecture with several search
engines, we may view the available engines as "programs" and the informa-
tion passed among them as "data structures." We will use this approach to
formalize representation changes in the PRODIGY system.
System's behavior. Holte [1988] developed a framework for the analysis
and comparison of learning systems, which included rigorous mathematical
definitions of task domains and their representations. He considered repre-
sentations of domains rather than specific problems, which distinguished his
view from the earlier definitions.
A domain in Holte's framework includes a set of elementary entities, a col-
lection of primitive functions that describe the relationships among entities,
and legal compositions of primitive functions. For example, we may view the
world states as elementary objects and transitions between them as primi-
tive functions. A domain specification may include not only a description of
reasoning tasks, but also a behavior of an AI system on these tasks.
A representation is a mapping between two domains that encode the same
reasoning tasks. This mapping may describe a system's behavior on two dif-
ferent encodings of a problem. Alternatively, it may show the correspondence
between the behavior of two different systems that perform the same task.

1.1.3 Representations in the SHAPER system

The previous definitions of representation have been aimed at the analysis of
its role in problem solving, but researchers have not applied theoretical results
to automation of representation changes. Korf utilized his formal model in the
development of a general-purpose system for improving representations, but
A problem solver is an algorithm that performs some type of reasoning task.
When we invoke this algorithm, it inputs a given problem and searches for a solu-
tion; it may solve the problem or report a failure.
A problem description is an input to a problem solver. In most systems, it in-
cludes allowed operations, available objects, the initial state of the world, a logical
statement describing the goals, and possibly some heuristics for guiding the search.
A domain description is the part of the problem description that is common for
a certain class of problems. It usually does not include specific objects, an initial
state, or a goal.
A representation is a domain description with a problem solver that uses this de-
scription. A representation change may involve improving a description, selecting
a new solver, or both.
A description changer is an algorithm for improving domain descriptions. When
we invoke the changer, it inputs a given domain and modifies its description.
A system for changing representations is an AI system that automatically
improves domain descriptions and matches them with appropriate problem solvers.

Fig. 1.2. Definitions of the main objects in the SHAPER system. These notions
underlie our formal model of automatic representation changes.

he encountered several practical shortcomings of the model, which prevented
complete automation of search for appropriate representations.
Since the main purpose of our work is the construction of a fully auto-
mated system, we develop a different formal model, which facilitates the work
on SHAPER. We follow Simon's view of representation as "data structures and
programs operating on them;" however, the notion of data structures and
programs in SHAPER differs from their definition in the research on human
problem solving [Newell and Simon, 1972]. We summarize our terminology in
Figure 1.2.
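
Before describing the system itself, we restate this terminology schematically
(a sketch in Python; all type and function names below are our own
illustration, not part of SHAPER or PRODIGY):

from dataclasses import dataclass
from typing import Any, Callable, Dict, List, Optional

Description = Dict[str, Any]   # domain description: operators, types, and so on
Problem = Dict[str, Any]       # problem instance: initial state and goal
Solution = List[str]           # sequence of instantiated operators
Solver = Callable[[Description, Problem], Optional[Solution]]  # None is a failure
Changer = Callable[[Description], Description]                 # improves a description

@dataclass
class Representation:
    """A representation pairs a domain description with a solver that uses it."""
    description: Description
    solver: Solver

def change_representation(rep: Representation, changer: Changer,
                          new_solver: Optional[Solver] = None) -> Representation:
    """A representation change: improve the description, switch solvers, or both."""
    return Representation(changer(rep.description), new_solver or rep.solver)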
The SHAPER system uses PRODIGY search algorithms, which play the
role of "programs" in Simon's definition. We illustrate the functions of a
solver algorithm in Figure 1.3(a): given a problem, the algorithm searches
for its solution and either finds some solution or terminates with a failure.
In Chapter 6, we will discuss two types of failures: exhausting the available
search space and reaching a time limit.
A problem description is an input to the solver algorithm, which encodes
a certain reasoning task. This notion is analogous to Amarel's "problem for-
mulation," which is a part of his definition of representation. The solver's
input must satisfy certain syntactic and semantic rules, which form the input
language of the algorithm.
When the initial description of a problem does not obey these rules, we
have to translate it into the input language before applying the solver [Paige
and Simon, 1966; Hayes and Simon, 1974]. If a description satisfies the lan-
guage rules but causes a long search, we may modify it for efficiency reasons.
[Figure 1.3 contains three data-flow diagrams:
(a) Use of a problem-solving algorithm: a description of the problem goes to
the problem solver, which outputs a solution or a failure.
(b) Changing the problem description before application of a problem solver:
the initial description of the problem goes through the description changer,
which passes a new description of the problem to the problem solver.
(c) Changing the domain description: the initial description of the domain goes
through the description changer; the new description of the domain, together
with a problem instance, goes to the problem solver.]

Fig. 1.3. Description changes in problem solving. A changer algorithm generates
a new description, and then a solver algorithm uses it to search for a solution.

A description-changing algorithm is a procedure for converting the initial
problem description into an input to a problem solver, as illustrated in Fig-
ure 1.3(b). The conversion may serve two goals: (1) translating the problem
into the input language and (2) improving performance of the solver.
The SHAPER system performs only the second type of description changes.
In Figure 1.4, we show the three main categories of these changes: decompos-
ing the initial problem into smaller subproblems, enhancing the description
by adding relevant information, and replacing the original problem encoding
with a more appropriate encoding.
Note that there are no clear-cut boundaries between these categories.
For example, suppose that we apply some abstraction procedure (see Sec-
tion 1.2) to determine the relative importance of different problem features
and then use important features in constructing an outline of a solution. We
may view it as enhancement of the initial description with the estimates of
importance. Alternatively, we may classify abstraction as decomposition of
the original problem into two subproblems: constructing a solution outline
and then turning it into a complete solution.
A problem description in PRODIGY consists of two main parts, a domain
description and a problem instance. The first part includes the properties of
a simulated world, which is called the problem domain. For example, if we
apply PRODIGY to solve the Tower-of-Hanoi puzzle (see Section 1.2.1), the
domain description specifies the legal moves in this puzzle. The second part
encodes a particular reasoning task, which includes an initial state of the
simulated world and a goal specification. For example, a problem instance in
the Tower-of-Hanoi Domain consists of the initial positions of all disks and
their desired final positions.

[Figure 1.4 contains three diagrams of description changers, explained in the
panel captions below.]

(a) Decomposing a problem into subproblems. We may simplify a reasoning
task by breaking it into smaller subtasks. For example, a driver may subdivide the
search for an unfamiliar place into two stages: getting to the appropriate highway
exit and finding her way from the exit. In Section 1.2.2, we will give a technical
example of problem decomposition based on an abstraction hierarchy.

(b) Enhancing a problem description. If some important information is not ex-
plicit in the initial description, we may deduce it and add it to the description. If the
addition of new information affects the problem-solving behavior, we view it as a
description change. For instance, a mathematician may enhance a geometry sketch
by an auxiliary construction. As another example, we may improve performance of
PRODIGY by adding control rules.

(c) Replacing a problem description. If the initial description contains un-
necessary data, then improvements may include not only additional relevant in-
formation, but also the deletion of irrelevant data. For example, a mathematician
may simplify her sketch by erasing some lines. In Section 5.2, we will describe a
technique for detecting irrelevant features of PRODIGY problems.

Fig. 1.4. Main categories of description changes. We may (a) subdivide a problem
into smaller tasks, (b) extend the initial description with additional knowledge,
and (c) replace a given problem encoding with a more effective encoding.
The PRODIGY system first parses the domain description and converts
it into internal structures that encode the simulated world. Then, PRODIGY
uses this internal encoding in processing specific problem instances. Observe
that the input description determines the system's internal encoding of the
domain. Thus, the role of domain descriptions in our model is similar to that
of "data structures" in the general definition.
When using a description changer, we usually apply it to the domain
encoding and utilize the resulting new encoding for solving multiple problem
instances (Figure 1.3c). This strategy allows us to amortize the running time
of the changer algorithm over several problems.
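
For instance (with invented numbers), if a changer runs for sixty seconds and
the improved encoding saves five seconds on every subsequent problem, its cost
is recovered after 60 / 5 = 12 problem instances, and every further problem
yields a net gain.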
A representation in SHAPER consists of a domain description and a
problem-solving algorithm that operates on this description. Observe that,
if the algorithm does not make random choices, the representation uniquely
defines the search space for every problem instance. This observation relates
our definition to Newell and Simon's view of representation as a search space.
We use this definition in the work on a representation-changing system,
which automates the two main tasks involved in improving representations.
First, it analyzes and modifies the initial domain description with the purpose
of improving the search efficiency. Second, it selects an appropriate solver
algorithm for the modified domain description.

1.1.4 The role of representation

Researchers have used several different frameworks for defining the concept
of representation. Despite these differences, most investigators have reached
consensus on their main qualitative conclusions:
• The choice of a representation affects the complexity of a given problem;
both human subjects and AI systems are sensitive to changes in the prob-
lem representation.
• Finding the right approach to a given problem is often a difficult task, which
may require a heuristic search in a space of alternative representations.
• Human experts employ advanced techniques for construction and evalu-
ation of new representations, whereas amateurs often try to utilize the
original problem description.
Alternative representations differ in explicit information about properties
of the problem domain. Every representation hides some features of the do-
main and highlights other features [Newell, 1965; Van Baalen, 1989; Peterson,
1994]. For example, when a mathematician describes a geometric object by
a set of equations, she hides visual features of the object and highlights some
of its analytical properties.
Explicit representation of important information enhances the perfor-
mance. For instance, if a student of mathematics cannot solve a problem,
the teacher may help her by pointing out the relevant features of the task
[Polya, 1957]. As another example, we may improve efficiency of an AI sys-
tem by encoding useful information in control rules [Minton, 1988], macro
operators [Fikes et al., 1972], or an abstraction hierarchy [Sacerdoti, 1974].
On the other hand, the explicit representation of irrelevant data may
have a negative effect. In particular, when a mathematician tries to utilize
some seemingly relevant properties of a problem, she may attempt a wrong
approach. If we provide irrelevant information to an AI system and do not
mark this information as unimportant, then the system attempts to use it,
which takes extra computation and often leads to exploring useless branches
of the search space. For example, if we allow the use of unnecessary extra
operations, the branching factor of search increases, resulting in a larger
search time [Stone and Veloso, 1994].
Since problem-solving algorithms differ in their use of available informa-
tion, they perform efficiently with different domain descriptions. Moreover,
the utility of explicit knowledge about the domain may depend on a specific
problem instance. We usually cannot find a "universal" description, which
works well for all solver algorithms and problem instances. The task of con-
structing good descriptions has traditionally been left to the user.
The relative performance of solver algorithms also depends on specific
problems. Most analytical and experimental studies have shown that different
search techniques are effective for different classes of problems, and no solver
algorithm can consistently outperform all its competitors [Minton et al., 1994;
Stone et al., 1994; Knoblock and Yang, 1994; Knoblock and Yang, 1995;
Smirnov, 1997). To ensure efficiency, the user has to make an appropriate
selection among the available algorithms.
To address the representation problem, researchers have designed a num-
ber of algorithms that deduce hidden properties of a given domain and im-
prove the domain description. For example, they constructed systems for
learning control rules [Mitchell et al., 1983; Minton, 1988], replacing opera-
tors with macros [Fikes et al., 1972; Korf, 1985a], abstracting unimportant
features of a domain [Sacerdoti, 1974; Knoblock, 1993], and reusing past
problem-solving episodes [Hall, 1989; Veloso, 1994].
These algorithms are themselves sensitive to changes in problem encoding,
and their ability to learn useful information depends on the initial descrip-
tion. For instance, most systems for learning control rules require a certain
generality of predicates in the domain encoding, and they become ineffective
if predicates are either too specific or too general [Etzioni and Minton, 1992;
Veloso and Borrajo, 1994].
As another example, abstraction algorithms are very sensitive to the de-
scription of available operators [Knoblock, 1993]. If the operator encoding is
too general, or the domain includes unnecessary operations, these algorithms
fail to construct an abstraction hierarchy. In Section 1.2, we will illustrate
such failures and discuss related description improvements.
To ensure the effectiveness of learning algorithms, the user has to decide
which algorithms are appropriate for a given domain. She may also need to
adjust the domain description for the selected algorithms. An important next
step in AI research is to develop a system that automatically accomplishes
these tasks.
1.2 Examples of representation changes

All AI systems are sensitive to the description of input problems. If we use
an inappropriate domain encoding, then even simple problems may become
difficult or unsolvable. Researchers have noticed that novices often construct
ineffective domain descriptions because intuitively appealing encodings are
often inappropriate for AI problem solving.
On the other hand, expert users prove proficient in finding good descrip-
tions; however, the construction of a proper domain encoding is often a dif-
ficult task, which requires creativity and experimentation. The user usually
begins with a semi-effective description and tunes it, based on the results of
problem solving. If the user does not provide a good domain encoding, then
automatic improvements are essential for efficiency.
To illustrate the need for description changes, we present a puzzle domain,
whose standard encoding is inappropriate for PRODIGY (Section 1.2.1). We
then show modifications to the initial encoding that drastically improve effi-
ciency (Sections 1.2.1-1.2.4), and discuss the choice of an appropriate problem
solver (Section 1.2.5). The SHAPER system is able to perform these improve-
ments automatically.

1.2.1 Tower-of-Hanoi Domain

We consider the Tower-of-Hanoi puzzle, shown in Figure 1.5, which has
proved difficult for most problem-solving algorithms, as well as for human
subjects. It once served as a standard test for AI systems, but it gradually
acquired the negative connotation of a "toy" domain. We utilize this puzzle
to illustrate basic description changes in SHAPER; however, we will use larger
domains for the empirical evaluation of the system.
The puzzle consists of three vertical pegs and several disks of different
sizes. Every disk has a hole in the middle, and several disks may be stacked
on a peg (Figure 1.5a). The rules allow moving disks from peg to peg, one disk
at a time; however, the rules do not allow placing any disk above a smaller
one. In Figure 1.7, we show the state space of the three-disk puzzle.
When using a classical AI system, we have to specify predicates for encod-
ing of world states. If the Tower-of-Hanoi puzzle has three disks, we may de-
scribe its states with three predicates, which denote the positions of the disks:
(small-on <peg>), (medium-on <peg>), and (large-on <peg>), where <peg> is a
variable that denotes an arbitrary peg. We obtain literals describing a specific
state by substituting the appropriate constants in place of variables. For in-
stance, the literal (small-on peg-1) means that the small disk is on the first peg.
The legal moves are encoded by production rules for modifying the world
state, which are called operators. The description of an operator consists
of precondition predicates, which must hold before its execution, and effects,
which specify the predicates added to the world state or deleted from the state
upon the execution. In Figure 1.5(b), we give an encoding of all allowed moves
(a) Tower-of-Hanoi puzzle. [Drawing of three pegs and three disks.]

(b) Encoding of operations in the three-disk puzzle:

move-small(<from>, <to>)
Pre: (small-on <from>)
Eff: del (small-on <from>)
     add (small-on <to>)

move-medium(<from>, <to>)
Pre: (medium-on <from>)
     not (small-on <from>)
     not (small-on <to>)
Eff: del (medium-on <from>)
     add (medium-on <to>)

move-large(<from>, <to>)
Pre: (large-on <from>)
     not (small-on <from>)
     not (medium-on <from>)
     not (small-on <to>)
     not (medium-on <to>)
Eff: del (large-on <from>)
     add (large-on <to>)

(c) Example of executing an instantiated operator:

Initial State: (small-on peg-1), (medium-on peg-1), (large-on peg-1)
  --move-small(peg-1, peg-2)-->
New State: (small-on peg-2), (medium-on peg-1), (large-on peg-1)

Fig. 1.5. Tower-of-Hanoi Domain and its encoding in PRODIGY. The player may
move disks from peg to peg, one at a time, without ever placing a disk on top of
a smaller one. The traditional task is to move all disks from the left-hand peg to
the right-hand peg (Figure 1.6).

in the three-disk puzzle. This encoding is based on the PRODIGY domain


language, described in Sections 2.2 and 2.3; however, we slightly deviate from
the exact PRODIGY syntax in order to improve readability.
The <from> and <to> variables in the operator description denote arbi-
trary pegs. When a problem-solving algorithm uses an operator, it instanti-
ates the variables with specific constants. For example, if the algorithm needs
to move the small disk from peg-1 to peg-2, then it can execute the operator
move-small(peg-1, peg-2) (Figure 1.5c). The precondition of this operator is
(small-on peg-1); that is, the small disk must initially be on peg-1. The exe-
cution results in deleting (small-on peg-1) from the current state and adding
(small-on peg-2).
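To make the add-and-delete semantics concrete, the following minimal sketch
(ours, in Python; the actual PRODIGY implementation is in Lisp and is not
shown here) applies the instantiated operator move-small(peg-1, peg-2) to a
state represented as a set of literals:

    # A state is a set of literals, each encoded as a tuple,
    # for example ("small-on", "peg-1").

    def move_small(state, frm, to):
        """Apply the instantiated operator move-small(frm, to); return
        None if the precondition (small-on frm) does not hold."""
        if ("small-on", frm) not in state:
            return None
        new_state = set(state)
        new_state.discard(("small-on", frm))  # delete effect
        new_state.add(("small-on", to))       # add effect
        return new_state

    initial = {("small-on", "peg-1"), ("medium-on", "peg-1"),
               ("large-on", "peg-1")}
    print(move_small(initial, "peg-1", "peg-2"))
    # new state: small disk on peg-2, medium and large still on peg-1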
In Figure 1.6, we show the encoding of a classic problem in the Tower-of-
Hanoi Domain, which requires moving all three disks from peg-1 to peg-3, and
give the shortest solution to this problem. The initial state of the problem
corresponds to the left corner of the state-space triangle in Figure 1.7, whereas
the goal state is the right corner.

1.2.2 Constructing an abstraction hierarchy

Most AI systems solve a given problem by exploring the space of partial solu-
tions rather than expanding the problem's state space. That is, the nodes in
their search space represent incomplete solutions, which may not correspond
to paths in the state space.
This strategy allows efficient reasoning in large-scale domains, which have
intractable state spaces; however, it causes a major inefficiency in the Tower-
of-Hanoi Domain. For example, if we apply PRODIGY to the problem in Fig-
ure 1.6, the system considers more than one hundred thousand partial solu-
tions, and the search takes ten minutes on a Sun Sparc 5 computer.
We may significantly improve the performance by using an abstraction
hierarchy [Sacerdoti, 1974], which enables the system to subdivide problems
into simpler subproblems. To construct a hierarchy, we assign different levels
of importance to predicates in the domain encoding. In Figure 1.8(a), we give
the standard hierarchy for the three-disk Tower of Hanoi.
The system first constructs an abstract solution at level 2 of the hier-
archy, ignoring the positions of the small and medium disks. We show the
state space of the abstracted puzzle in Figure 1.8(b) and its solution in Fig-
ure 1.8(c). Then, PRODIGY steps down to the next lower level and inserts
operators for moving the medium disk. At this level, the system cannot add
new move-large operators, which limits its state space. We give the level-1
space in Figure 1.8(d) and the corresponding solution in Figure 1.8(e). Fi-
nally, the system shifts to the lowest level and inserts move-small operators,
thus constructing the complete solution (Figures 1.8f and 1.8g).
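The essence of this process is that each abstraction level simply ignores the
literals below it. The following Python sketch (our illustration, not Knoblock's
ALPINE code) filters a state down to a given level of the hierarchy in
Figure 1.8(a):

    # Importance levels of the predicates, as in Figure 1.8(a).
    LEVEL = {"large-on": 2, "medium-on": 1, "small-on": 0}

    def abstract_state(state, level):
        """Drop every literal whose predicate is less important
        than the given abstraction level."""
        return {lit for lit in state if LEVEL[lit[0]] >= level}

    full = {("small-on", "peg-1"), ("medium-on", "peg-1"),
            ("large-on", "peg-1")}
    print(abstract_state(full, 2))  # {('large-on', 'peg-1')}: one-disk puzzle
    print(abstract_state(full, 1))  # medium and large disks only

When PRODIGY refines the level-2 plan at level 1, it may insert move-medium
steps but no new move-large steps, which is what keeps the level-1 space in
Figure 1.8(d) small.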
In Table 1.1, we give the running times for solving six Tower-of-Hanoi
problems, without and with abstraction. We have obtained these results using

Initial State          Goal State
(small-on peg-1)       (small-on peg-3)
(medium-on peg-1)      (medium-on peg-3)
(large-on peg-1)       (large-on peg-3)

(a) Encoding of the problem.

move-small(peg-1, peg-3); move-medium(peg-1, peg-2); move-small(peg-3, peg-2);
move-large(peg-1, peg-3); move-small(peg-2, peg-1); move-medium(peg-2, peg-3);
move-small(peg-1, peg-3)

(b) Shortest solution.

Fig. 1.6. Example of a problem in the Tower-of-Hanoi Domain. We need to move
all three disks from peg-1 to peg-3. The shortest solution contains seven steps.

Fig. 1.7. State space of the three-disk Tower of Hanoi. We illustrate all possible
configurations of the puzzle (circles) and legal transitions between them (arrows).
The initial state of the problem in Figure 1.6 is the left corner of the triangle, and
the goal is the right corner.

level 2 (more important):  (large-on <peg>)
level 1:                   (medium-on <peg>)
level 0 (less important):  (small-on <peg>)

(a) Abstraction hierarchy of predicates.

(b) State space at level 2: the three configurations of the large disk alone.
(c) Solution at level 2: move-large(peg-1, peg-3).

(d) State space at level 1: the nine configurations of the medium and large disks.
(e) Solution at level 1: move-medium(peg-1, peg-2); move-large(peg-1, peg-3);
    move-medium(peg-2, peg-3).

(f) State space at level 0: all twenty-seven configurations, as in Figure 1.7.
(g) Solution at level 0: move-small(peg-1, peg-3); move-medium(peg-1, peg-2);
    move-small(peg-3, peg-2); move-large(peg-1, peg-3); move-small(peg-2, peg-1);
    move-medium(peg-2, peg-3); move-small(peg-1, peg-3).

Fig. 1.8. Abstraction problem solving in the Tower-of-Hanoi Domain with a three-
level hierarchy (a). First, the system disregards the small and medium disks, thus
solving the simplified one-disk puzzle (b, c). Then, it inserts the missing movements
of the medium disk (d, e). Finally, it steps down to the lowest level of abstraction
and adds move-small operators (f, g).

Table 1.1. PRODIGY performance on six Tower-of-Hanoi problems. We give running
times in seconds for problem solving without and with the abstraction hierarchy.

                        problem:    1     2      3      4      5      6   mean time
without abstraction               2.0  34.1  275.4  346.3  522.4  597.4       296.3
using abstraction                 0.5   0.4    1.9    0.3    0.5    2.3         1.0

Table 1.2. PRODIGY performance in the Extended Tower-of-Hanoi Domain, which
allows two-disk moves. We give running times in seconds for three different domain
descriptions: without primary effects, with primary effects, and using primary ef-
fects along with abstraction. The results show that primary effects not only reduce
the search, but also allow the construction of an effective abstraction hierarchy.

                        problem:     1    2      3        4     5      6   mean time
without prim. effects             85.3  1.1  505.0  >1800.0  31.0  172.4      >432.4
using prim. effects                0.5  1.2   16.6    144.8  77.5  362.5       100.5
prims and abstraction              0.5  0.3    2.5      0.2   0.2    3.1         1.1

a Lisp implementation of PRODIGY on a Sun Sparc 5 machine; they show that


abstraction drastically reduces search.
Knoblock [1993] has investigated abstraction problem solving in PRODIGY
and developed the ALPINE system, which automatically assigns importance
levels to predicates. We have extended Knoblock's technique and imple-
mented the Abstractor algorithm, which serves as one of the description
changers in SHAPER.

1.2.3 Selecting primary effects

Suppose that we deviate from the standard rules of the Tower-of-Hanoi puzzle
and allow moving two disks together (Figure 1.9a). The new operators enable
us to construct shorter solutions for most problems. For example, we can move
all three disks from peg-1 to peg-3 in three steps (Figure 1.9b).
This change in the rules simplifies the puzzle for humans, but it makes
most problems harder for PRODIGY. The extra operations lead to a higher
branching factor, thus increasing the search space. Moreover, Abstractor fails
to generate a hierarchy for the domain with two-disk moves. In the first row
of Table 1.2, we give the results of using the extended set of operators to solve
the six sample problems. For every problem, we set an 1800-second limit, and
the system ran out of time on problem 4.
To reduce the branching factor, we may select primary effects of some
operators and force the system to use these operators only for achieving
their primary effects. For example, we may indicate that the main effect of
move-sml-mdm is the new position of the medium disk. That is, if the
system's only goal is moving the small disk, then it must not consider this
operator. Note that an inappropriate choice of primary effects may compro-

move-sml-mdm(<from>, <to>)
  Pre: (small-on <from>)
       (medium-on <from>)
  Eff: del (small-on <from>)
       add (small-on <to>)
       del (medium-on <from>)
       add (medium-on <to>)

move-sml-lrg(<from>, <to>)
  Pre: (small-on <from>)
       (large-on <from>)
       not (medium-on <from>)
       not (medium-on <to>)
  Eff: del (small-on <from>)
       add (small-on <to>)
       del (large-on <from>)
       add (large-on <to>)

move-mdm-lrg(<from>, <to>)
  Pre: (medium-on <from>)
       (large-on <from>)
       not (small-on <from>)
       not (small-on <to>)
  Eff: del (medium-on <from>)
       add (medium-on <to>)
       del (large-on <from>)
       add (large-on <to>)

(a) Encoding of two-disk moves.

move-small(peg-1, peg-2); move-mdm-lrg(peg-1, peg-3); move-small(peg-2, peg-3)

(b) Solution to the example problem.

operators                    primary effects
move-sml-mdm(<from>,<to>)    del (medium-on <from>), add (medium-on <to>)
move-sml-lrg(<from>,<to>)    del (large-on <from>),  add (large-on <to>)
move-mdm-lrg(<from>,<to>)    del (large-on <from>),  add (large-on <to>)

(c) Selection of primary effects.

Fig. 1.9. Extension to the Tower-of-Hanoi Domain, which includes operators for
moving two disks at a time, and primary effects of these additional operators.

mise completeness, that is, make some problems unsolvable. We will describe
techniques for ensuring completeness in Section 3.2.
In Figure 1.9(c), we list the primary effects of the two-disk moves. The use
of the selected primary effects improves the system's performance on most
sample problems (see the middle row of Table 1.2). Furthermore, it enables
Abstractor to build the three-level hierarchy, which reduces search by two
orders of magnitude (see the last row).
The SHAPER system includes an algorithm for selecting primary effects,
called Margie,¹ which automatically performs this description change. The
Margie algorithm is integrated with Abstractor: it chooses primary effects
with the purpose of improving abstraction.
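To illustrate the pruning, the sketch below (in Python, with a table that
mirrors Figure 1.9(c); treating each one-disk move as primary for its own
disk is our assumption) returns the operators that backward chaining may
consider for a given goal predicate:

    PRIMARY = {
        "move-small":   {"small-on"},
        "move-medium":  {"medium-on"},
        "move-large":   {"large-on"},
        "move-sml-mdm": {"medium-on"},  # for two-disk moves, only the new
        "move-sml-lrg": {"large-on"},   # position of the lower disk is primary
        "move-mdm-lrg": {"large-on"},
    }

    def relevant_operators(goal_predicate):
        """Operators that may be used to achieve a literal of this predicate."""
        return sorted(op for op, prims in PRIMARY.items()
                      if goal_predicate in prims)

    print(relevant_operators("small-on"))  # ['move-small']: two-disk moves pruned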

1.2.4 Partially instantiating operators

The main drawback of the Abstractor algorithm is its sensitivity to syntactic


features of a domain encoding. In particular, if predicates in the domain
description are too general, Abstractor may fail to construct a hierarchy.
For instance, suppose that the user replaces the predicates (small-on
<peg>), (medium-on <peg>), and (large-on <peg>) with a more general pred-
icate (on <disk> <peg>), which allows greater flexibility in the encoding of
operators and problems (Figure 1.10a). In particular, it enables the user to
utilize universal quantifications (Figure 1.10b).
Since the resulting description contains only one predicate, the abstrac-
tion algorithm cannot generate a hierarchy. To remedy this problem, we can
construct partial instantiations of (on <disk> <peg>) and apply Abstractor to
build a hierarchy of these instantiations (Figure 1.10c). We have implemented
a description-changing algorithm, called Refiner, that generates partially in-
stantiated predicates for improving the effectiveness of Abstractor.
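The transformation itself is simple; the following sketch (illustrative only;
Refiner's actual procedure appears in Section 4.3) splits the general predicate
into one partially instantiated predicate per disk, and together the new
predicates describe the same set of literals:

    def partially_instantiate(predicate, values):
        """Instantiate the first variable of a general predicate,
        producing several more specific predicates."""
        return ["({0} {1} <peg>)".format(predicate, v) for v in values]

    print(partially_instantiate("on", ["small", "medium", "large"]))
    # ['(on small <peg>)', '(on medium <peg>)', '(on large <peg>)']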

1.2.5 Choosing a problem solver

The efficiency of search depends not only on the domain description, but also
on the choice of a solver algorithm. To illustrate the importance of a solver,
we consider the application of two different search strategies, called SAVTA
and SABA, to problems in the Extended Tower-of-Hanoi Domain.
Veloso and Stone [1995] developed these strategies for guiding PRODIGY
search (see Section 2.2.5). Experiments have shown that the relative perfor-
mance of SAVTA and SABA varies across domains, and the choice between
them may be essential for efficient problem solving. We consider two modes
¹ The Margie procedure (pronounced mär'gē) is named after my friend, Margie
Roxborough. Margie and her husband John invited me to stay at their place
during the Ninth CSCSI Conference in Vancouver, where I gave a talk on related
results. This algorithm is not related to the MARGIE system (mär'jē) for parsing
and paraphrasing simple stories in English, implemented by Schank et al. [1975].

(small-on <peg>)    -->   (on small <peg>)
(medium-on <peg>)   -->   (on medium <peg>)
(large-on <peg>)    -->   (on large <peg>)

(a) Replacing the predicates with a more general one.

move-large(<from>, <to>)
  Pre: (on large <from>)
       (forall <disk> other than large:
          not (on <disk> <from>)
          not (on <disk> <to>))
  Eff: del (on large <from>)
       add (on large <to>)

Goal State:
  (forall <disk>
    (on <disk> peg-3))

(b) Examples of using the general predicate to encode operators and problems.

level 2:  (on large <peg>)
level 1:  (on medium <peg>)
level 0:  (on small <peg>)

(c) Hierarchy of partially instantiated predicates.

Fig. 1.10. General predicate (on <disk> <peg>) in the encoding of the Tower of
Hanoi. This predicate enables the user to utilize quantifications in describing oper-
ators and goals, but it causes a failure of the Abstractor algorithm. The system has
to generate partial instantiations of (on <disk> <peg>) before invoking Abstractor.

for using each strategy: with a bound on the search depth and without lim-
iting the depth. A depth bound helps to prevent an extensive exploration of
inappropriate branches in the search space; however, it also results in pruning
some solutions from the search space. Moreover, if a bound is too tight, it may
lead to pruning all solutions, thus compromising the system's completeness.
In Table 1.3, we give the results of applying SAVTA and SABA to six Tower-
of-Hanoi problems. These results suggest that SAVTA without a depth bound
is more effective than the other three techniques; however, the evidence is
not statistically significant. In Section 7.4, we will describe a method for
estimating the probability that a selected problem-solving technique is the
best among the available techniques. We can apply this method to determine
the chances that SAVTA without a depth bound is indeed the most effective
among the four techniques; it gives the probability estimate of 0.47.

Table 1.3. Performance of SAVTA and SABA in the Extended Tower-of-Hanoi Do-
main with primary effects and abstraction. We give running times in seconds for
search with and without a depth bound. The data suggest that SAVTA without a
depth bound is the most efficient among the four search techniques, but this con-
clusion is not statistically significant.

                          problem:     1     2         3     4     5         6   mean time
SAVTA, with depth bound             0.51  0.27      2.47  0.23  0.21      3.07        1.13
SAVTA, w/o depth bound              0.37  0.28      0.61  0.22  0.21      0.39        0.35
SABA, with depth bound              5.37  0.26  >1800.00  2.75  0.19  >1800.00     >601.43
SABA, w/o depth bound               0.45  0.31      1.22  0.34  0.23      0.51        0.51

If we apply SHAPER to many Tower-of-Hanoi problems, it can accumulate


more data on the performance of the candidate strategies before adopting one
of them. The system's control module combines exploitation of the past per-
formance information with collecting additional data. First, SHAPER applies
heuristics for rejecting inappropriate search techniques; then, the system ex-
periments with promising search algorithms until it accumulates enough data
for identifying the most effective algorithm.
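The statistical procedure itself is the subject of Section 7.4. As a rough
illustration of the kind of estimate involved, the sketch below bootstraps
over the six measured running times to approximate the probability that a
strategy has the smallest mean time; this is a stand-in technique of our own,
so its output need not match the 0.47 obtained with the book's method:

    import random

    # Running times from Table 1.3, in seconds; ">1800" entries capped at 1800.
    times = {
        "SAVTA, bound":    [0.51, 0.27, 2.47, 0.23, 0.21, 3.07],
        "SAVTA, no bound": [0.37, 0.28, 0.61, 0.22, 0.21, 0.39],
        "SABA, bound":     [5.37, 0.26, 1800.0, 2.75, 0.19, 1800.0],
        "SABA, no bound":  [0.45, 0.31, 1.22, 0.34, 0.23, 0.51],
    }

    def prob_best(strategy, trials=10000):
        """Fraction of bootstrap resamples in which the given strategy
        has the smallest mean running time."""
        wins = 0
        for _ in range(trials):
            means = {s: sum(random.choices(ts, k=len(ts))) / len(ts)
                     for s, ts in times.items()}
            wins += (min(means, key=means.get) == strategy)
        return wins / trials

    print(prob_best("SAVTA, no bound"))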
Note that we have not accounted for solution quality in evaluating
PRODIGY performance in the Tower-of-Hanoi Domain. If the user is inter-
ested in near-optimal solutions, then SHAPER has to analyze the trade-off
between running time and solution quality, which may result in selecting dif-
ferent domain descriptions and search strategies. For instance, a depth bound
reduces the length of generated solutions, which may be a fair price for the
increase in search time. As another example, search without an abstraction
hierarchy usually yields better solutions than abstraction problem solving. In
Chapter 7, we will describe a statistical technique for evaluating trade-offs
between speed and quality.

1.3 Related work

We review past results on representation changes, including psychological ex-


periments (Section 1.3.1), AI techniques for reasoning with multiple represen-
tations (Sections 1.3.2 and 1.3.3), and theoretical frameworks (Section 1.3.4).
We will later describe some other directions of past research, related to
specific aspects of the reported work. In particular, we will give the history
of the PRODIGY architecture in Section 2.1.1, outline the previous research
on abstraction problem solving in Section 4.1.1, and review the work on
automatic evaluation of problem solvers in Section 7.1.1.

1.3.1 Psychological evidence

The choice of an appropriate representation is one of the main themes of


Polya's famous book How to Solve It. Polya showed that the selection of an
effective approach to a problem is a crucial skill for a student of mathemat-
ics. Gestalt psychologists also paid particular attention to reformulation of
problems [Duncker, 1945; Ohlsson, 1984].
Explorations in cognitive science have yielded much evidence that con-
firms Polya's pioneering insight. Researchers have shown that description
changes affect the problem difficulty and that the performance of human ex-
perts depends on their ability to find a representation that fits a given task
[Gentner and Stevens, 1983; Simon, 1989].
Newell and Simon [1972] studied the role of representation during their
investigation of human problem solving. They observed that human subjects
always construct some representation of a given problem: "Initially, when a
problem is first presented, it must be recognized and understood. Then, a
problem space must be constructed or, if one already exists in LTM, it must
be evoked. Problem spaces can be changed or modified during the course of
problem solving" [Newell and Simon, 1972).
Simon [1979; 1989) continued the investigation of representations in hu-
man problem solving and studied their role in a variety of cognitive tasks. In
particular, he tested the utility of different mental models in the Tower-of-
Hanoi puzzle. Simon [1975] noticed that most subjects gradually improved
their mental representations in the process of solving the puzzle.
Hayes and Simon [1974; 1976; 1977] investigated the effect of isomorphic
changes in task description. Specifically, they analyzed difficult isomorphs of
the Tower of Hanoi [Simon et al., 1985] and found out that "changing the
written problem instructions, without disturbing the isomorphism between
problem forms, can affect by a factor of two the times required by subjects
to solve a problem" [Simon, 1979).
Larkin and Simon [1981; 1987) explored the role of multiple mental models
in solving physics and math problems, with a particular emphasis on pictorial
representations. They observed that mental models of skilled scientists differ
from those of novices, and expertly constructed models are crucial for solving
scientific problems. Qin and Simon [1992] came to a similar conclusion during
their experiments on the role of imagery in understanding special relativity.
Kook and Novak [1991] also studied alternative representations in physics.
They implemented the APEX program, which used multiple representations of
physics problems, handled incompletely specified tasks, and performed trans-
formations among several types of representations. Their conclusions about
the significance of appropriate representation agreed with Simon's results.
Kaplan and Simon [1990] explored representation changes in solving the
Mutilated-Checkerboard problem. They noticed that the participants of their
experiments first tried to utilize the initial representation and then searched

for a more effective approach. Kaplan [1989] implemented a production sys-


tem that simulated the representation shift of successful human subjects.
Tabachneck [1992] studied the utility of pictorial reasoning in economics.
She implemented the CaMeRa production system, which modeled human
reasoning with multiple representations [Tabachneck-Schijf et al., 1997].
Boehm-Davis et al. [1989], Novak [1995], and Jones and Schkade [1995]
showed the importance of representations in software development. Their
results confirmed that people are sensitive to changes in problem description
and that improvement of the initial description is often a difficult task.
The reader may find a detailed review of past results in Peterson's [1996]
collection of articles on reasoning with multiple representations. It includes
several different views of representation, as well as evidence on the importance
of representation in physics, mathematics, economics, and other areas.

1.3.2 Automatic representation changes

AI researchers recognized the significance of representation in the very begin-


ning of their work on automated reasoning. In particular, Amarel [1961; 1968;
1971] discussed the effects of representation on the behavior of search algo-
rithms, using the Missionaries-and-Cannibals problem to illustrate his main
points. Newell [1965; 1966] showed that the complexity of some games and
puzzles strongly depends on the representation and emphasized that "hard
problems are solved by finding new viewpoints; i.e., new problem spaces"
[Newell, 1966].
Later, Newell with several other researchers implemented the Soar system
[Laird et al., 1987; Tambe et al., 1990; Newell, 1992], which utilized multiple
descriptions to facilitate search and learning; however, their system did not
generate new representations. The user was still responsible for constructing
domain descriptions and providing guidelines for their effective use.
Larkin et al. [1988] took a similar approach in their work on the FERMI
expert system, which accessed several representations of task-related knowl-
edge and used them in parallel. This system required the user to provide
appropriate representations of the input knowledge. The authors of FERMI
encoded "different kinds of knowledge at different levels of granularity" and
demonstrated that "the principled decomposition of knowledge according to
type and level of specificity yields both power and cross-domain generality"
[Larkin et al., 1988].
Research on automatic change of domain descriptions has mostly been
limited to the design of separate learning algorithms that perform specific
types of improvements. Examples of these special-purpose algorithms in-
clude systems for replacing operators with macros [Korf, 1985; Mooney, 1988;
Cheng and Carbonell, 1986; Shell and Carbonell, 1989], changing the search
space by learning heuristics [Newell et al., 1960; Langley, 1983] and control
rules [Minton et al., 1989; Etzioni, 1993; Veloso and Borrajo, 1994; Perez,

1995], generating abstraction hierarchies [Sacerdoti, 1974; Knoblock, 1993],


and replacing a given problem with a simplified problem [Hibler, 1994].
The authors of these systems have observed that utility of most learning
techniques varies across domains, and their blind application may worsen
efficiency in some domains; however, researchers have not automated selection
among available learning systems and left it as the user's responsibility.

1.3.3 Integrated systems

The Soar architecture includes a variety of general-purpose and specialized


search algorithms, but it does not have a top-level mechanism for selecting
an algorithm that fits a given task. The classical problem-solving systems,
such as SIPE [Wilkins, 1988], PRODIGY [Carbonell et al., 1990; Veloso et al.,
1995], and uCPOP [Penberthy and Weld, 1992; Weld, 1994], have the same
limitation: they allow alternative search strategies, but they do not include
a central mechanism for selecting among them.
Wilkins and Myers [1995; 1998] have addressed this problem and con-
structed the Multiagent Planning Architecture, which supports integration
of multiple planning and scheduling algorithms. The available search algo-
rithms in their architecture are arranged into groups, called planning cells.
Every group has a control procedure, called a cell manager, which analyzes an
input problem and selects an algorithm for solving it. Some cell managers are
able to break a given task into subtasks and distribute them among several
algorithms.
The architecture includes advanced software tools for incorporating di-
verse search algorithms with different domain languages; thus, it allows a
synergetic use of previously implemented AI systems. Wilkins and Myers
have demonstrated the effectiveness of their architecture in constructing cen-
tralized problem-solving systems. In particular, they have developed several
large-scale systems for air campaign planning.
Since the Multiagent Planning Architecture allows the use of diverse algo-
rithms and domain descriptions, it provides an excellent testbed for the study
of search with multiple representations; however, its current capabilities for
automated representation changes are very limited.
First, the system has no general-purpose control mechanisms, and the
user has to design a specialized manager algorithm for every planning cell.
Wilkins and Myers have used fixed selection strategies in the implemented
cell managers and have not augmented them with learning capabilities.
Second, the system has no tools for the inclusion of algorithms that im-
prove domain descriptions, and the user must either hand-code all necessary
descriptions or incorporate a mechanism for changing descriptions into a cell
manager. The authors of the architecture have implemented several special-
ized techniques for decomposing a problem into subproblems, but have not
considered other types of description changes.

Minton [1993a; 1993b; 1996] has investigated the integration of constraint-


satisfaction programs and designed the MULTI-TAC system, which combines
a number of generic heuristics and search procedures. The system's control
module explores the properties of a given domain, selects appropriate search
strategies, and combines them into an algorithm for this domain.
The main component of MULTI-TAC's control module is an inductive learn-
ing algorithm, which tests the available heuristics on a collection of problems.
It guides a beam search in the space of the allowed combinations of heuris-
tics. The system synthesizes efficient constraint-satisfaction algorithms, which
perform on par with manually configured programs. The major drawback is
significant learning time; when the control module searches for an efficient
algorithm, it tests candidate procedures on hundreds of sample problems.
Yang et al. [1998] have begun development of an architecture for integra-
tion of AI planning techniques. Their architecture, called PLAN++, includes
tools for implementing, modifying, and reusing the main elements of planning
systems. The purpose is to modularize typical planning algorithms, construct
a large library of search modules, and use them as building blocks for new
algorithms.
Since the effectiveness of planning techniques varies across domains, the
authors of PLAN++ intend to design software tools that enable the user to
select appropriate modules and configure them for specific domains. The au-
tomation of these tasks is a major open problem, which is closely related to
work on representation improvements.

1.3.4 Theoretical results

Researchers have developed theoretical frameworks for some special cases


of description changes, including abstraction, replacement of operators with
macros, and learning of control rules; however, they have done little study of
the common principles underlying different changer algorithms. The results in
developing a formal model of representation are also limited, and the general
notion of useful representation changes has remained at an informal level.
Most theoretical models are based on the analysis of a search space. When
investigating some description change, researchers usually identify its effects
on the search space and estimate the resulting reduction in the search time.
In particular, Korf [1985a; 1985b; 1987] investigated the effects of macro
operators on the state space and demonstrated that well-chosen macros may
exponentially reduce the search time. Etzioni [1992] analyzed the search space
of backward-chaining algorithms and showed that an inappropriate choice
of macro operators or control rules may worsen the performance. He then
derived a condition under which macros reduce the search and compared the
utility of macro operators with that of control rules.
Cohen [1992] developed a mathematical framework for the analysis of
macro-operator learning, explanation-based generation of control rules, and

chunking. He analyzed mechanisms for reusing solution paths and described


a series of learning algorithms that provably improve performance.
Knoblock [1993] explored the benefits and limitations of abstraction, iden-
tified conditions that ensure search reduction, and used them in developing
an algorithm for automatic abstraction. Bacchus and Yang [1992; 1994] re-
moved some of the assumptions underlying Knoblock's analysis and presented
a more general evaluation of abstraction search. They studied the effects of
backtracking across abstraction levels, demonstrated that it may impair effi-
ciency, and described a technique for avoiding it.
Giunchiglia and Walsh [1992] proposed a general model of reasoning with
abstraction, which captured most of the previous frameworks. They defined
an abstraction as a mapping between axiomatic formal systems, studied the
properties of this mapping, and classified the main abstraction techniques.
A generalized model for improving representation was suggested by
Korf [1980], who formalized representation changes based on the notions of
isomorphism and homomorphism of state spaces (see Section 1.1.3). Korf
utilized the resulting formalism in his work on automatic representation im-
provements; however, his model did not address "a method for evaluating
the efficiency of a representation relative to a particular problem solver and
heuristics to guide the search for an efficient representation" [Korf, 1980].

1.4 Overview of the approach

The review of previous work has shown that the results are still limited. The
main open problems are (1) design of AI systems capable of performing a
wide range of representation changes and (2) development of a unified theory
of reasoning with multiple representations.
The work on SHAPER is a step toward addressing these two problems. We
have developed a framework for evaluating representations and formalized
the task of finding an appropriate representation in the space of alternative
domain descriptions and solver algorithms. We have applied this framework
to design a system that automatically performs several types of representation
improvements.
The system includes a collection of problem-solving and description-
changing algorithms, as well as a control module that selects and invokes
appropriate algorithms (Figure 1.11). The most important result is the con-
trol mechanism for intelligent selection among available algorithms and rep-
resentations. We use it to combine multiple learning and search algorithms
into an integrated AI system.
We now explain the main architectural decisions that underlie SHAPER
(Section 1.4.1), outline our approach to development of changer algorithms
(Section 1.4.2), and briefly describe search in a space of representations (Sec-
tion 1.4.3).


Fig. 1.11. Integration of solver and changer algorithms. SHAPER includes a control
module that analyzes a given problem and selects appropriate algorithms.

1.4.1 Architecture of the system

According to our definition, a system for changing representations has to


perform two main functions: (1) improvement of a problem description and
(2) selection of an algorithm for solving the problem. The key architectural de-
cision underlying SHAPER is the distribution of the first task among multiple
changer algorithms. For instance, the description improvement in Section 1.2
has involved three algorithms: Refiner, Margie, and Abstractor.
The central use of separate algorithms differentiates SHAPER from Korf's
mechanism for improving representations. It also differs from description-
improving cell managers in the Multiagent Planning Architecture.
Every changer algorithm explores a certain space of modified descriptions
until finding a new description that improves the system's performance. For
example, Margie searches among alternative selections of primary effects,
whereas Refiner explores a space of different partial instantiations.
The control module coordinates the application of description-changing
algorithms. It explores a more general space of domain descriptions, using
changer algorithms as operators for expanding nodes in this space. The sys-
tem thus combines the low-level search by changer algorithms with the cen-
tralized high-level search. This two-level search prevents a combinatorial ex-
plosion in the number of candidate representations, described by Korf [1980].
The other major function of the control module is the selection of problem-
solving algorithms for the constructed domain descriptions. To summarize,
SHAPER consists of three main parts, illustrated in Figure 1.11:

Library of problem solvers. The system uses search algorithms of


the PRODIGY architecture. We have composed the solver library from
several different configurations of PRODIGY's search mechanism.
Library of description changers. We have implemented seven al-
gorithms that compose SHAPER'S library of changers. They include
procedures for selecting primary effects, building abstraction hier-
archies, generating partial and full instantiations of operators, and
identifying relevant features of the domain.

Efficiency. The primary criterion for evaluating representations in the SHAPER
system is the efficiency of problem solving, that is, the average search time.
Near-completeness. A change of representation must not cause a significant vio-
lation of completeness; that is, most solvable problems should remain solvable after
the change. We measure the completeness violation by the percentage of problems
that become unsolvable.
Solution quality. A representation change should not result in significant decline
of solution quality. We usually define the quality as the total cost of operators in a
solution and evaluate the average increase in solution costs.

Fig. 1.12. Main factors that determine the quality of a representation. We have
developed a general utility model that unifies these factors and enables the system
to make trade-off decisions.

Control module. The functions of the control mechanism include


selection of description changers and problem solvers, evaluation of
new representations, and reuse of the previously generated represen-
tations. The control module contains statistical procedures for an-
alyzing past performance, heuristics for choosing algorithms in the
absence of past data, and tools for manual control.

The control mechanism does not rely on specific properties of solver and
changer algorithms, and the user may readily add new algorithms; however,
all solvers and changers must use the PRODIGY domain language and access
relevant data in the central data structures of the PRODIGY architecture.
We evaluate the utility of representations along three dimensions, summa-
rized in Figure 1.12: the efficiency of search, the number of solved problems,
and the quality of the resulting solutions [Cohen, 1995]. The work on changer
algorithms involves decisions on trade-offs among these factors. We allow a
moderate loss of completeness and solution quality in order to improve effi-
ciency. In Section 6.3, we will describe a general utility function that unifies
the three evaluation factors.
When evaluating the utility of a new representation, we have to account
for the overall time for improving a domain description, selecting a problem
solver, and using the resulting representation to solve given problems. The
system is effective only if this overall time is smaller than the time for solving
the problems with the initial description and some fixed solver algorithm.
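The utility model appears in Section 6.3; the sketch below merely shows how
the three factors of Figure 1.12 might be folded into a single score, with
illustrative weights of our own choosing rather than the book's values:

    def representation_utility(mean_time, unsolvable_fraction, cost_increase,
                               time_weight=1.0, solvability_penalty=100.0,
                               quality_penalty=10.0):
        """Combine the three factors of Figure 1.12 into one score (lower
        is better): search time, completeness violation, quality loss."""
        return (time_weight * mean_time
                + solvability_penalty * unsolvable_fraction
                + quality_penalty * cost_increase)

    # Abstraction (Table 1.1): fast and complete, slightly costlier solutions.
    print(representation_utility(1.0, 0.0, 0.1))    # 2.0
    print(representation_utility(296.3, 0.0, 0.0))  # 296.3: no abstraction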

1.4.2 Specifications of description changers

The current version of SHAPER includes seven description changers. We have


already mentioned three of them: an extended version of the ALPINE abstrac-
tion generator, called Abstractor; the Margie algorithm, which selects primary
effects; and the Refiner procedure, which generates partial instantiations of
operators.

Selecting primary effects of operators: Choosing "important" effects and us-
ing operators only for achieving their important effects (Sections 3.4.1 and 3.5).
Generating an abstraction hierarchy: Decomposing the set of predicates in a
domain into subsets, according to their "importance" (Sections 4.1, 4.2, and 5.1).
Generating more specific operators: Replacing an operator with several more
specific operators, which together describe the same actions. We generate specific
operators by instantiating variables in an operator description (Section 3.4.2).
Generating more specific predicates: Replacing a predicate with more specific
predicates, which together describe the same set of literals (Section 4.3).
Identifying relevant features (literals): Determining which features of a do-
main are relevant to the current task and ignoring the other features (Section 5.2).

Fig. 1.13. Types of description changes in the current version of SHAPER.

Every changer algorithm in SHAPER performs a specific type of description


change and serves a certain purpose. For example, Margie selects primary
effects of operators, with the purpose of increasing the number of levels in
the abstraction hierarchy.
When implementing a changer algorithm, we must decide on the type and
purpose of the description changes performed by the algorithm. The choice
of a type determines the space of alternative descriptions explored by the
algorithm, whereas the purpose specification helps to develop techniques for
search in this space. We also need to analyze interactions of the new algorithm
with other changer and solver algorithms. Finally, we have to identify the
parts of the domain description that compose the algorithm's input.
These decisions form a high-level specification of a changer algorithm. We
use specifications to summarize the main properties of description changers,
thus separating them from implementation techniques. These summaries of
main decisions have proved a useful development tool. In Part II, we will give
specifications for all seven description changers.
When making the high-level decisions, we must ensure that they define
a useful class of description changes and lead to an efficient algorithm. We
compose a specification from the following five parts.
Type of description change. When designing a new algorithm, we first
have to decide on the type of description change. For instance, the Abstractor
algorithm improves the domain description by generating an abstraction hi-
erarchy, whereas Margie is based on selecting primary effects of operators. In
Figure 1.13, we summarize the types of description changes used in SHAPER.
Note that this list is only a small sample from the space of description im-
provements. We will discuss some other improvements in Section 5.3.2.
Purpose of description change. Every changer algorithm in SHAPER
serves a specific purpose, such as reducing the branching factor of search,
constructing an abstraction hierarchy with certain properties, or improving

the effectiveness of other description changers. We express this purpose by


a heuristic function for evaluating the quality of a new description. For ex-
ample, we may evaluate the performance of Margie by the number of lev-
els in the resulting hierarchy: the more levels, the better. We also specify
certain constraints that describe necessary properties of newly generated de-
scriptions. For example, Margie must preserve the completeness of problem
solving, which limits its freedom in selecting primary effects.
To summarize, we view the purpose of a description improvement as max-
imizing a specific evaluation function, while satisfying certain constraints.
This specification of the purpose shows exactly in which way we improve
description and helps to evaluate the results of applying the changer. The
effectiveness of our approach depends on the choice of appropriate evaluation
functions, which must correlate with the resulting efficiency improvements.
Use of other algorithms. A description-changing algorithm may use
subroutine calls to some problem solvers or other description changers. For
example, Margie calls the Abstractor algorithm to construct hierarchies for
the chosen primary effects. It repeatedly invokes Abstractor for alternative
selections of primary effects until finding a satisfactory selection.
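Schematically, this interaction is a loop of the following shape (a sketch under
our own naming; generate_selections, abstractor, and preserves_completeness
stand for components described in Chapters 3 through 5, not for SHAPER's
actual code):

    def margie(operators, generate_selections, abstractor,
               preserves_completeness):
        """Keep the completeness-preserving selection of primary effects
        that yields the tallest abstraction hierarchy."""
        best_selection, best_levels = None, 0
        for selection in generate_selections(operators):
            if not preserves_completeness(operators, selection):
                continue  # constraint: problem solving must remain complete
            levels = len(abstractor(operators, selection))
            if levels > best_levels:  # evaluation: more levels is better
                best_selection, best_levels = selection, levels
        return best_selection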
Required input. We identify the elements of domain description that
must be a part of the changer's input. For example, Margie has to access the
description of all operators in the domain.
Optional input. Finally, we specify the additional information about
the domain that may be used in changing description. If this information is
available, the changer algorithm utilizes it to generate a better description;
otherwise, the algorithm uses some default assumptions. The optional input
may include restrictions on the allowed problem instances, useful knowledge
about domain properties, and advice from the user.
For example, we may specify constraints on the allowed problems as an
input to Margie. Then, Margie passes these constraints to Abstractor, which
utilizes them in building hierarchies (see Section 5.2). As another example, the
user may pre-select some primary effects of operators. Then, Margie preserves
this pre-selection and chooses additional primary effects (see Section 5.1).
We summarize the specification of the Margie algorithm in Figure 1.14; how-
ever, this specification does not account for advanced features of the PRODIGY
domain language. In Section 5.1, we will extend it and describe the imple-
mentation of Margie in the PRODIGY architecture.

1.4.3 Search in the space of representations

When SHAPER inputs a new problem, the control module has to determine
a strategy for solving it, which involves several decisions (Figure 1.15):

1. Can SHAPER solve the problem with the initial domain description? Al-
ternatively, can the system reuse one of the old descriptions?

Type of description change: Selecting primary effects of operators.


Purpose of description change: Maximizing the number of levels in Abstractor's
hierarchy, while ensuring completeness of problem solving.
Use of other algorithms: The Abstractor algorithm, which constructs hierarchies for
the selected primary effects.
Required input: Description of all operators in the domain.
Optional input: Restrictions on goal literals; pre-selected primary and side effects.

Fig. 1.14. Simplified specification of the Margie algorithm. We use specifications


to summarize the main properties of description changers, thus abstracting them
from the details of implementation.

[Figure 1.15, decision flowchart: construct a new description or use an old one?
 - construct a description -> which changer to apply? -> apply the selected
   description changer
 - use an available description -> which solver to apply? -> apply the selected
   problem solver]

Fig. 1.15. Top-level decisions in SHAPER. The control module applies changer
algorithms to improve the description and then chooses an appropriate solver.

[Figure 1.16, chain of representation changes: applying Refiner -> applying
Margie -> applying Abstractor -> selecting a solver: SAVTA with primary
effects and abstraction]

Fig. 1.16. Representation changes in the Tower of Hanoi (see Section 1.2). The
system applies three description changers and identifies the most effective solver.

2. If not, what are the necessary improvements to the initial description?
   Which changer algorithms can make these improvements?
3. Which of the available search algorithms are efficient for the problem?

These decisions guide the construction of a new representation, which


consists of an improved description and matching solver. For example, if we
encode the Tower-of-Hanoi Domain using the general predicate (on <disk>
<peg>) and allow the two-disk moves, the control procedure will apply three
description changers, Refiner, Margie, and Abstractor, and then select an
effective problem solver (Figure 1.16).

[Figure 1.17, search for an effective representation: two computationally
expensive operations, construction of new representations and evaluation of
the available representations, linked by heuristic selection of problem solvers
for the generated descriptions.]

Fig. 1.17. Main operations for exploring the space of representations. SHAPER
interleaves generation of new representations with testing of their performance.

When SHAPER searches for an effective representation, it performs two


main tasks: generating new representations and evaluating their utility (Fig-
ure 1.17). The first task involves improving the available descriptions and
pairing them with solvers. The system applies changer algorithms, using
them as basic steps for expanding a space of descriptions (Figure 1.18a).
After generating a new description, it chooses solvers for this description. If
the system identifies several matching solvers, it pairs each of them with the
new description, thus constructing several representations (Figure 1.18b).
The system evaluates the available representations in two steps (Fig-
ure 1.17). First, it applies heuristics for estimating their relative utility and
eliminates ineffective representations. Second, the system collects experimen-
tal data on the performance of the remaining representations, and applies
statistical analysis to select the most effective domain description and solver.
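In schematic form, one round of this expand-and-evaluate cycle looks as
follows (our rendering of Figures 1.17 and 1.18; all helper names are
hypothetical):

    def expand_and_evaluate(initial_desc, changers, solvers,
                            heuristic_ok, measure, problems):
        # expand the description space, using changers as operators on nodes
        descriptions = [initial_desc]
        for changer in changers:
            new_desc = changer(initial_desc)
            if new_desc is not None:
                descriptions.append(new_desc)
        # pair descriptions with matching solvers to obtain representations,
        # prune by heuristics, then compare survivors on sample problems
        candidates = [(d, s) for d in descriptions for s in solvers
                      if heuristic_ok(d, s)]
        return min(candidates, key=lambda rep: measure(rep, problems))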

1.5 Extended abstract

The main results of the work on SHAPER include development of several


description changers and a general-purpose control module. The presentation
of these results is organized in four parts, as shown in Figure 1.19. Part I
includes the motivation and description of the PRODIGY search. In Part II,
we present algorithms for using primary effects and abstraction. In Part III,
we describe the control mechanism, which explores a space of alternative
representations. In Part IV, we give the results of testing SHAPER. We now
give an overview of the results and summarize the material of every chapter.

[Figure 1.18, expanding the representation space: the control module applies
changer-1, changer-2, and changer-3 to produce descriptions desc-1, desc-2,
and desc-3 from desc-0, and pairs solvers with descriptions (solver-1 and
solver-2 with desc-0, solver-3 with desc-2, solver-1 and solver-4 with desc-3),
turning the space of descriptions into a space of representations.]

Fig. 1.18. Expanding the representation space. The control module uses changer al-
gorithms to generate new descriptions and combines solvers with these descriptions.

Fig. 1.19. Dependencies among different chapters. The large rectangles show the
four main parts of the presentation, whereas the small rectangles are chapters.

Part I: Introduction

The purpose of the introduction is to explain the problem of representation


changes and give the background results. We have emphasized the impor-
tance of appropriate representations and summarized the goals of the re-
ported work. In Chapter 2, we describe the PRODIGY architecture, which
serves as a testbed for the development and evaluation of SHAPER.
Chapter 1: Motivation. We have explained the concept of representation
and argued the need for an AI system that generates multiple representations.
We have also reviewed previous work on representation changes, identified
the main research problems, and outlined our approach to addressing some
of them. To illustrate the role of representation, we have given an example
of representation changes in the Tower-of-Hanoi puzzle.
Chapter 2: Prodigy search. The PRODIGY system is based on a com-
bination of goal-directed reasoning with simulated execution. Researchers
have implemented a series of search algorithms that utilize this technique;
however, they have provided few formal results on the common principles
underlying the developed algorithms. We formalize the PRODIGY search and
show how different strategies for controlling search complexity give rise to
different versions of the system. In particular, we demonstrate that PRODIGY
is not complete and discuss advantages and drawbacks of its incompleteness.
We then develop a complete algorithm, which is almost as fast as PRODIGY
and solves a wider range of problems.

Part II: Description changers

We investigate two techniques for reducing the complexity of goal-directed


search: identifying primary effects of operators and generating abstraction
hierarchies. These techniques enable us to develop a collection of efficiency-
improving algorithms, which compose SHAPER's library of changers.
Chapter 3: Primary effects. The use of primary effects of operators allows
us to improve search efficiency and solution quality. We formalize this tech-
nique and evaluate its effectiveness. First, we present a criterion for choosing
primary effects, which guarantees efficiency and completeness, and describe
algorithms for automatic selection of primary effects. Second, we experimen-
tally show their effectiveness in two backward-chaining systems, PRODIGY
and ABTWEAK.
Chapter 4: Abstraction. We describe abstraction for PRODIGY, present
algorithms for generation of abstraction hierarchies, and give empirical con-
firmation of their effectiveness. First, we review Knoblock's ALPINE system,
which constructs hierarchies for a limited domain language, and extend it
for the advanced language of PRODIGY. Second, we give an algorithm that
improves effectiveness of the abstraction generator by partially instantiating
predicates in the domain encoding.

Chapter 5: Summary and extensions. We present two techniques for


enhancing the utility of primary effects and abstraction. First, we describe a
synergy of the abstraction generator with a procedure for selecting primary
effects. Second, we give an algorithm for adjusting the domain description
to a specific problem. In conclusion, we review the results of the work on
description changers, summarize the interactions among the implemented
algorithms, and outline some directions for future research.

Part III: Top-level control

We develop a system for the automatic generation and use of multiple rep-
resentations. When the system faces a new problem, it first improves the
problem description and then selects an appropriate solver. The system's
central part is a control module, which chooses appropriate solvers and chang-
ers, reuses descriptions, and accumulates performance data. We describe the
structure of the control module and techniques for automatic selection among
the available solvers, changers, and domain descriptions.
Chapter 6: Multiple representations. We lay a theoretical groundwork
for a synergy of multiple solvers and changers. First, we discuss the task of
improving domain descriptions and selecting appropriate solvers, formalize it
as search in a space of representations, and define the main elements of this
space. Second, we develop a utility model for evaluating representations.
Chapter 7: Statistical selection. We consider the task of choosing among
the available representations and formalize the statistical problem involved
in evaluating their performance. We then present a learning algorithm that
gathers performance data, evaluates representations, and chooses a represen-
tation for each given problem. It also selects a time limit for search with the
chosen representation and interrupts the solver upon reaching this limit.
Chapter 8: Statistical extensions. We extend the statistical procedure to
account for properties of specific problems. The extended learning algorithm
uses problem-specific utility functions, adjusts the data to the estimated prob-
lem sizes, and utilizes information about similarity among problems.
Chapter 9: Summary and extensions. We present some extensions to
the control module, which include heuristics for identifying an effective repre-
sentation, rules for selecting among changers, and tools for optional user par-
ticipation in the top-level control. We then summarize the main results, dis-
cuss limitations of SHAPER, and point out related directions for future work.

Part IV: Empirical results

We give the results of applying SHAPER to four domains: a model of a ma-


chine shop (Chapter 10), Sokoban puzzle (Chapter 11), Extended STRIPS

world (Chapter 12), and PRODIGY Logistics Domain (Chapter 13). The ex-
periments have confirmed that the system's control module almost always
selects the right solvers and changers, and that its performance is not sensi-
tive to features of specific domains.
2. Prodigy search

Newell and Simon [1961; 1972] invented means-ends analysis during their
work on the General Problem Solver (GPS), back in the early days of ar-
tificial intelligence. Their technique combined goal-directed reasoning with
forward chaining from the initial state. The authors of later systems [Fikes
and Nilsson, 1971; Warren, 1974; Tate, 1977] gradually abandoned forward
search and began to rely exclusively on backward chaining.
Researchers studied several types of backward chainers [Minton et al.,
1994] and discovered that least commitment improves the efficiency of goal-
directed reasoning, which gave rise to TWEAK [Chapman, 1987], ABTWEAK
[Yang et al., 1996], SNLP [McAllester and Rosenblitt, 1991], uCPOP [Pen-
berthy and Weld, 1992; Weld, 1994], and other least-commitment solvers.
Meanwhile, PRODIGY researchers extended means-ends analysis and de-
signed a family of problem solvers based on the combination of goal-directed
backward chaining with simulation of operator execution. The underlying
strategy is a special case of bidirectional search [Pohl, 1971]. It has given rise
to several versions of the PRODIGY system, including PRODIGY1, PRODIGY2,
NOLIMIT, PRODIGY4, and FLECS.
The developed algorithms keep track of the world state, which results from
executing parts of the constructed solution, and use the state to guide the
goal-directed reasoning. Least commitment proved ineffective for this search
technique, and Veloso developed an alternative strategy based on instantiat-
ing all variables as early as possible.
Experiments have shown that PRODIGY search is an efficient procedure, a
fair match to least-commitment systems and other successful problem solvers.
Moreover, the PRODIGY architecture has proved a valuable tool for the de-
velopment of speed-up learning techniques.
We review the past work on the PRODIGY system (Section 2.1) and de-
scribe the foundations of the developed techniques (Section 2.2), as well as
the main extensions to the basic search engine (Sections 2.3 and 2.4). We then
report the results of a joint investigation with Blythe on the completeness of
PRODIGY (Section 2.5).


Table 2.1. Main versions of the PRODIGY architecture. The work on PRODIGY
continued for over ten years and gave rise to a series of novel search strategies.

version     year   authors
PRODIGY1    1986   Minton and Carbonell
PRODIGY2    1989   Carbonell, Minton, Knoblock, and Kuokka
NOLIMIT     1990   Veloso and Borrajo
PRODIGY4    1992   Blythe, Wang, Veloso, Kahn, Perez, and Gil
FLECS       1994   Veloso and Stone

2.1 PRODIGY system

The PRODIGY system went through several stages of development and grad-
ually evolved into an advanced architecture that supports a variety of search
and learning techniques. We give a brief history of its development (Sec-
tion 2.1.1) and summarize the main features of its search engines (Sec-
tion 2.1.2).

2.1.1 History

The history of PRODIGY (Table 2.1) began circa 1986, when Steven Minton
and Jaime Carbonell implemented PRODIGY1, which became a testbed for
their work on control rules [Minton, 1988; Minton et al., 1989a].
Steven Minton, Jaime Carbonell, Craig Knoblock, and Dan Kuokka used
PRODIGY1 as a prototype in their work on PRODIGY2 [Carbonell et al., 1991],
which supported an advanced language for describing domains [Minton et al.,
1989b]. They showed the system's effectiveness in scheduling machine-shop
operations [Gil, 1991; Gil and Perez, 1994], planning a robot's actions in
an extended STRIPS world [Minton, 1988], and solving a variety of smaller
problems.
Manuela Veloso [1989] and Daniel Borrajo developed the next version,
called NOLIMIT, which significantly differed from its predecessors. They added
new branching points and introduced object types for specifying possible val-
ues of variables. Veloso showed the effectiveness of NOLIMIT on the previously
designed PRODIGY domains, as well as on transportation problems.
Jim Blythe, Mei Wang, Manuela Veloso, Dan Kahn, Alicia Perez, and
Yolanda Gil developed a collection of techniques for enhancing the search en-
gine and built PRODIGy4 [Carbonell et al., 1992]. In particular, they provided
an efficient technique for instantiating operators [Wang, 1992], extended the
use of inference rules, and designed advanced data structures for the low-level
implementation. They also designed a user interface and tools for adding new
learning mechanisms.
Manuela Veloso and Peter Stone [1995] implemented the FLECS algorithm,
an extension to PRODIGY4 that included an additional decision point and new
search strategies, and showed that their strategies improved the efficiency.

The PRODIGY architecture provided ample opportunities for speed-up
learning, and researchers used it to develop a variety of techniques for
automated efficiency improvement. Minton [1988] designed the first learning
module for PRODIGY, which automatically generated control rules. He demon-
strated the effectiveness of integrating PRODIGY search with learning, which
stimulated work on other speed-up techniques.
In particular, researchers designed modules for explanation-based learning
[Etzioni, 1990; Etzioni, 1993; Perez and Etzioni, 1992], inductive generation
of control rules [Veloso and Borrajo, 1994; Borrajo and Veloso, 1996], abstrac-
tion search [Knoblock, 1993], and analogical reuse of problem-solving episodes
[Carbonell, 1983; Veloso and Carbonell, 1990; Veloso and Carbonell, 1993a;
Veloso and Carbonell, 1993b; Veloso, 1994]. They also investigated techniques
for improving the quality of solutions [Perez and Carbonell, 1993; Perez,
1995], learning unknown properties of a domain [Gil, 1992; Carbonell and
Gil, 1990; Wang, 1994; Wang, 1996], and collaborating with the user [Joseph,
1992; Stone and Veloso, 1996; Cox and Veloso, 1997a; Cox and Veloso, 1997b;
Veloso et al., 1997].
The reader may find a summary of PRODIGY learning techniques in the
review papers by Carbonell et al. [1990] and Veloso et al. [1995]. These
results have been major contributions to machine learning; however, they
have left two notable gaps. First, PRODIGY researchers tested each learning
module separately, without exploring the synergetic use of multiple modules.
Although preliminary attempts to integrate learning with abstraction gave
positive results [Knoblock et al., 1991a], the researchers have not pursued
this direction. Second, there have been no automated techniques for deciding
when to invoke specific learning modules. The user has traditionally been
responsible for the choice among available learning systems.

2.1.2 Advantages and drawbacks

The PRODIGY architecture is based on two major design decisions. First, it
combines backward chaining with simulated execution of relevant operators.
Second, it fully instantiates operators in early stages of search, whereas most
classical systems delay the instantiation.
The backward-chaining procedure selects operators relevant to the goal,
instantiates them, and arranges them into a partial-order solution. The for-
ward chainer simulates the execution of these operators and gradually con-
structs a total-order sequence of operators. The system keeps track of the
simulated world state that would result from executing this sequence. It uti-
lizes the simulated state in selecting operators and their instantiations, which
improves the effectiveness of goal-directed reasoning. In addition, learning
modules use the state to identify reasons for successes and failures.
Since PRODIGY uses fully instantiated operators, it efficiently handles a
powerful domain language. In particular, it supports disjunctive and quanti-
fied preconditions, conditional effects, and arbitrary constraints on the values

of operator variables [Carbonell et al., 1992]. The solver utilizes the world
state in choosing appropriate instantiations.
On the flip side, full instantiation leads to a large branching factor, which
results in gross inefficiency of a breadth-first search. The solver uses depth-
first search and relies on heuristics for selecting appropriate branches of the
search space, which usually leads to suboptimal solutions. If the heuristics
are misleading, the solver may fail to find a solution.
A formal comparison of PRODIGY and other search systems is still an open
problem; however, multiple experimental studies confirmed that PRODIGY
search is an efficient strategy [Stone et al., 1994]. Experiments also revealed
that PRODIGY and backward chainers perform well in different domains. Some
tasks are more suitable for execution simulation, whereas others require back-
ward chaining. Veloso and Blythe [1994] identified some domain properties
that determine which of the two strategies is more effective.
Kambhampati and Srivastava [1996a; 1996b] investigated common prin-
ciples underlying PRODIGY and least-commitment search. They developed a
framework that generalizes these two types of goal-directed reasoning and
combined them with direct forward search. They implemented the Universal
Classical Planner, which can use all these strategies; however, the resulting
algorithm has many branching points, which give rise to an impractically
large search space. The main open problem is the development of heuristics
that would effectively use the flexibility of the Universal Classical Planner.
Blum and Furst [1997] constructed GRAPHPLAN, which uses the world
state in a different way. It utilizes propagation of constraints from the ini-
tial state to identify operators with unsatisfiable preconditions. The system
discards these operators and uses backward chaining with the remaining op-
erators. GRAPHPLAN performs forward constraint propagation prior to its
search for a solution. Unlike PRODIGY, it does not use forward search from
the initial state.
The relative performance of PRODIGY and GRAPHPLAN varies across do-
mains. The GRAPHPLAN algorithm has to generate all possible instantiations
of all operators before searching for a solution, which often causes a combi-
natorial explosion in large-scale domains. On the other hand, GRAPHPLAN is
faster in small-scale domains that require extensive search.
Researchers applied PRODIGY to robot navigation and discovered that its
execution simulation is useful for interleaving search with real execution. In
particular, Blythe and Reilly [1993a; 1993b] explored techniques for plan-
ning the routes of a household robot in a simulated environment. Stone and
Veloso [1996] constructed a mechanism for user-guided interleaving of prob-
lem solving and execution.
Haigh and Veloso [1996; 1997; 1998a; 1998b] built a system for navigating
XAVIER, a real robot at Carnegie Mellon University. Haigh [1998] integrated
this system with XAVIER's low-level control and demonstrated its effectiveness
in guiding the robot's actions. It begins the real-world execution before com-

pleting the search for a solution, which involves the risk of guiding the robot
into an inescapable trap. To avoid such traps, Haigh and Veloso restricted
their system to domains with reversible actions.

2.2 Search engine


We now describe the basics of PRODIGY search, including the results of joint
work with Veloso on the principles underlying PRODIGY [Fink and Veloso,
1996]. We present the PRODIGY domain language (Section 2.2.1), the encod-
ing of intermediate incomplete solutions (Section 2.2.2), and the algorithm
that combines backward chaining with execution simulation (Sections 2.2.3
and 2.2.4). Then, we discuss differences among the main versions of PRODIGY
(Section 2.2.5).

2.2.1 Encoding of problems

We define a problem domain by a set of object types and a library of
operators. The PRODIGY language for describing operators is based on the STRIPS
domain language [Fikes and Nilsson, 1971], extended to express conditional
effects, disjunctive preconditions, and quantifications.
An operator is defined by its preconditions and effects. The preconditions
of an operator are the conditions that must be satisfied before its execution.
They are encoded by a logical expression with negations, conjunctions, dis-
junctions, universal quantifiers, and existential quantifiers. The effects are
encoded as a list of predicates added to the current state or deleted from the
state upon the execution.
We can specify conditional effects, also called if-effects, with outcome
depending on the state. An if-effect is defined by its conditions and actions.
If the conditions hold, the effect changes the state according to its actions.
Otherwise, it does not affect the state.
The effect conditions are encoded by a logical expression in the same
way as operator preconditions; however, their meaning is different. If the
preconditions of an operator do not hold in the state, the operator cannot
be executed. On the other hand, if the conditions of an if-effect do not hold,
we can execute the operator, but the if-effect does not change the state. The
actions of an if-effect are predicates, to be added to the state or deleted from
the state; that is, their encoding is identical to unconditional effects.
In Figure 2.1, we give an example of a simple domain; its syntax differs
slightly from the PRODIGY language [Carbonell et al., 1992] for the purpose
of better readability. The domain includes two types of objects, Package and
Place, and the Place type has two subtypes, Town and Village. We use types to
limit the allowed values of variables in the operator encoding.
A truck carries packages between towns and villages. The truck's fuel tank
is sufficient for only one ride. Towns have gas stations, so the truck can refuel
44 2. Prodigy search

leave-town(<from>, <to>)
  <from>: type Town
  <to>: type Place
  Pre: (truck-at <from>)
  Eff: del (truck-at <from>)
       add (truck-at <to>)

leave-village(<from>, <to>)
  <from>: type Village
  <to>: type Place
  Pre: (truck-at <from>)
       (extra-fuel)
  Eff: del (truck-at <from>)
       add (truck-at <to>)

fuel(<place>)
  <place>: type Town
  Pre: (truck-at <place>)
  Eff: add (extra-fuel)

load(<pack>, <place>)
  <pack>: type Package
  <place>: type Place
  Pre: (at <pack> <place>)
       (truck-at <place>)
  Eff: del (at <pack> <place>)
       add (in-truck <pack>)
       (if (fragile <pack>)
           add (broken <pack>))

unload(<pack>, <place>)
  <pack>: type Package
  <place>: type Place
  Pre: (in-truck <pack>)
       (truck-at <place>)
  Eff: del (in-truck <pack>)
       add (at <pack> <place>)

cushion(<pack>)
  <pack>: type Package
  Eff: del (fragile <pack>)

Type Hierarchy: the root type comprises Package and Place; Place has two
subtypes, Town and Village.

Fig. 2.1. Simple trucking world in the PRODIGY domain language. The Trucking
Domain is defined by a hierarchy of object types and a library of six operators.

before leaving a town. On the other hand, villages do not have gas stations;
if the truck comes to a village without a supply of extra fuel, it cannot leave.
To avoid this problem, the truck can get extra fuel in any town. If a package
is fragile, it always gets broken during loading. We can cushion a package
with soft material, which removes the fragility and prevents breakage.
A problem is defined by a list of objects, an initial state, and a goal
statement. The initial state is a set of literals, whereas the goal statement is
a condition that must hold after executing a solution. A complete solution
is a sequence of instantiated operators that can be executed from the initial
state to achieve the goal. We give an example of a problem in Figure 2.2;
the task is to deliver two packages from town-1 to ville-1. We may solve it as
follows: "load(pack-1,town-1), load(pack-2,town-1), leave-town(town-1,ville-1),
unload(pack-1,ville-1), unload(pack-2,ville-1)."
The initial state may include literals that cannot be added or deleted by
operators, called static literals. For example, if the domain did not include the
fuel operator, then (extra-fuel) would be a static literal. If all instantiations of
a predicate are static literals, we say that the predicate itself is static. Since no
operator can affect these literals, the goal statement should be consistent with
the static elements of the initial state. Otherwise, the problem is unsolvable,
and the system reports failure without search.

2.2.2 Incomplete solutions

Given a problem, most problem-solving systems begin with the empty set
of operators and modify it until a solution is found. Examples of modifi-

Set of Objects:
  pack-1, pack-2: type Package
  town-1: type Town
  ville-1: type Village

Initial State:
  (at pack-1 town-1)
  (at pack-2 town-1)
  (truck-at town-1)

Goal Statement:
  (at pack-1 ville-1)
  (at pack-2 ville-1)

Fig. 2.2. Encoding of a problem in the Trucking Domain, which includes a set
of objects, an initial world state, and a goal statement. The task is to deliver two
packages from town-1 to ville-1.

cations include adding an operator, instantiating or constraining a variable
in an operator, and imposing an ordering constraint. The intermediate sets
of operators are called incomplete solutions, and we view them as nodes in
the search space. Each modification of a current incomplete solution gives
rise to a new node, and the number of possible modifications determines the
branching factor of search.
Researchers have explored a variety of structures for encoding an incom-
plete solution. For instance, it may be a sequence of operators [Fikes and
Nilsson, 1971] or a partially ordered set [Tate, 1977]. Some problem solvers
fully instantiate the operators, whereas other solvers use the unification of
operator effects with goals [Chapman, 1987]. Some systems mark relations
among operators by causal links [McAllester and Rosenblitt, 1991], and oth-
ers do not explicitly maintain these relations.
In PRODIGY, an incomplete solution consists of two parts: a total-order
head and tree-structured tail (Figure 2.3). The root of the tail's tree is the
goal statement G, the other nodes are fully instantiated operators, and the
edges are ordering constraints.
The tail is built by a backward chainer, which starts from the goal and
adds operators, one by one, to achieve goal literals and preconditions of pre-
viously added operators. When the algorithm adds an operator to the tail,
it instantiates the operator; that is, it replaces all variables with specific
objects. The preconditions of an instantiated operator are a conjunction of
literals; every literal in this conjunction is an instantiated predicate.
The head is a sequence of instantiated operators that can be executed
from the initial state. It is generated by an execution-simulating algorithm,
described in Section 2.2.3. The simulated execution of the head transforms
the initial state I into a new state C, called the current state. In Figure 2.4,
we illustrate an incomplete solution for the example problem.

Fig. 2.3. Encoding of an incomplete solution. It consists of a total-order head,
which can be executed from the initial state, and a tree-structured tail constructed
by a backward chainer. The current state C is the result of applying the head
operators to the initial state I.

Fig. 2.4. Example of an incomplete solution. The head consists of a single operator,
load; the tail includes two unload operators, linked to the goal literals.

Since the head is a total-order sequence of operators that do not contain
variables, the current state C is uniquely defined. If the tail operators cannot
be executed from the current state C, then there is a "gap" between the
head and tail, and the purpose of problem solving is to bridge this gap. For
example, we can bridge the gap in Figure 2.4 by a sequence of two operators:
"load(pack-2,town-1), leave-town(town-1,ville-1)."

2.2.3 Simulating execution

Given an initial state I and a goal statement G, PRODIGY begins with the
empty head and tail, and modifies them until it finds a complete solution.
The initial incomplete solution has no operators, and its current state is the
same as the initial state, C = I.
At each step, PRODIGY can modify the current incomplete solution in one
of two ways (Figure 2.5). First, it can add an operator to the tail (operator t
in Figure 2.5) in order to achieve a goal literal or a precondition of another
operator. This tail modification is a job of the backward-chaining algorithm,
described in Section 2.2.4. Second, PRODIGY can move some operator from
the tail to the head (operator x in Figure 2.5), if its preconditions are satisfied
in the current state C. This operator becomes the last in the head, and the
current state is updated to account for its effects.
Intuitively, we may imagine that the system executes the head opera-
tors in the real world, and it has already changed the world from the initial

Fig. 2.5. Modifying an incomplete solution. PRODIGY either adds a new operator to
the tail tree (left) or moves one of the previously added operators to the head (right).

state I to the current state C. If the tail contains an operator with precondi-
tions satisfied in C, PRODIGY applies this operator and further changes the
state. Because of this analogy with real-world changes, moving an operator to
the head is called the application of the operator; however, this term refers
to simulating the application. Even if the execution of the head operators
is disastrous, the world does not suffer: PRODIGY backtracks and tries an
alternative execution sequence.
When the system applies an operator to the current state, it begins with
the deletion effects and removes the corresponding literals from the state.
Then, it performs the addition of new literals. Thus, if the operator adds
and deletes the same literal, the net result is adding it to the state. For
example, suppose that the current state includes the literal (truck-at town-1),
and PRODIGY applies leave-town(town-1,town-1) with effects "del (truck-at
town-1)" and "add (truck-at town-1)." The system first removes this literal
from the state, and then adds it back. If PRODIGY processed the effects in
the opposite order, it would permanently remove the truck's location, thus
getting an inconsistent state.
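A minimal sketch of this application step, assuming that a state is a list of
literals and that the hypothetical accessors op-del-effects and op-add-effects
return an operator's unconditional effects:

  ;; Apply an instantiated operator to a state: remove the deletion
  ;; effects first, then add the addition effects, so a literal that is
  ;; both deleted and added ends up present in the new state.
  (defun apply-operator (state op)
    (let ((after-deletes (set-difference state (op-del-effects op)
                                         :test #'equal)))
      (union after-deletes (op-add-effects op) :test #'equal)))

Processing deletions before additions reproduces the leave-town(town-1,town-1)
behavior described above.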
An operator application is the only way of updating the head. The system
never inserts a new operator directly into the head, which means that it uses
only goal-relevant operators in the forward chaining. The search terminates
when the goal statement G is satisfied in C. If the tail is not empty at that
point, it is dropped.

2.2.4 Backward chaining

We next describe the backward-chaining procedure that constructs the
tree-structured tail of an incomplete solution. When PRODIGY invokes this pro-
cedure, it adds a new operator to the tail for achieving either a goal literal
or a precondition of another tail operator. Then, it establishes a link from
the newly added operator to the achieved literal and adds the corresponding
ordering constraint. For example, if the incomplete solution is as shown in
Figure 2.4, the procedure may add load(pack-2,town-1) to achieve the precon-
dition (in-truck pack-2) of unload (Figure 2.6). If the backward chainer uses
Fig. 2.6. Example of the tail in an incomplete solution. First, the backward chainer
adds the two unload operators to achieve the goal literals. Then, it inserts load
to achieve the precondition (in-truck pack-2) of unload(pack-2,ville-1).
Fig. 2.7. Instantiating a newly added operator. If the set of objects is as shown
in Figure 2.2, PRODIGY can generate two alternative versions of load for achieving
the subgoal (in-truck pack-2).

an if-effect to achieve a literal, then the effect's conditions are added to the
preconditions of the instantiated operator.
PRODIGY tries to achieve a literal only if it is not true in the current
state C and has not been linked to any tail operator. Unsatisfied goal literals
and preconditions are called subgoals. For example, the tail in Figure 2.6 has
two identical subgoals, marked by italics.
Before inserting an operator into the tail, the solver fully instantiates
the operator; that is, it substitutes all variables with specific objects. Since
PRODIGY allows disjunctive and quantified preconditions, instantiating an
operator may be a difficult problem. The system uses a constraint-based
matching procedure that generates all possible instantiations [Wang, 1992].
For example, suppose the backward chainer uses load(<pack>,<place>)
to achieve the subgoal (in-truck pack-2), as shown in Figure 2.7. PRODIGY in-
stantiates the variable <pack> with the object pack-2 from the subgoal literal,
and then it has to instantiate the other variable, <place>. Since the domain
has two places, town-1 and ville-1, the variable has two possible instantiations,
which give rise to different branches in the search space (Figure 2.7).
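As a simplified sketch of this branching (Wang's constraint-based matcher
handles constraints and heuristics that we omit; object-type, variable-type,
and substitute-variable are hypothetical helpers):

  ;; Return one search branch for each object whose type matches the
  ;; free variable; each branch is a copy of the operator with the
  ;; variable replaced by that object.
  (defun instantiation-branches (op free-var objects)
    (loop for obj in objects
          when (eq (object-type obj) (variable-type free-var))
            collect (substitute-variable op free-var obj)))

For load(<pack>,<place>) with <pack> already bound to pack-2, such a call
would return the two instantiations shown in Figure 2.7.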
In Figure 2.8, we summarize the search algorithm that explores the space
of incomplete solutions. The Operator-Application procedure builds the head

and maintains the current state, whereas Backward-Chainer constructs the
tail. The algorithm includes five decision points, which give rise to different
branches of the search space. It can backtrack over the decision to apply
an operator (Line 2a) and over the choice of an "applicable" tail operator
(Line 1b). It also backtracks over the choice of a subgoal (Line 1c), an operator
that achieves it (Line 2c), and the operator's instantiation (Line 4c). We
summarize the decision points in Figure 2.9.
The first two choices (Lines 2a and 1b) enable PRODIGY to consider dif-
ferent orderings of head operators. These choices are essential for solving
problems with interacting subgoals; they are analogous to the choice of or-
dering constraints in least-commitment algorithms.

2.2.5 Main versions

The algorithm in Figure 2.8 has five decision points, which allow flexible
selection of operators, their instantiations, and the order of their execution;
however, these decisions give rise to a large branching factor. The use of
built-in heuristics, which eliminate some choices, may reduce the search space
and improve the efficiency. On the negative side, such heuristics prune some
solutions, and they may direct the search to a suboptimal solution or even
prevent finding any solution. Setting appropriate restrictions on the solver's
choices is a major research problem.
Although the described algorithm underlies all PRODIGY versions, from
PRODIGY1 to FLECS, the versions differ in their use of decision points and
built-in heuristics. Researchers investigated different trade-offs between flex-
ibility and reduction of branching. They gradually increased the number of
available decision points from two in PRODIGY1 to all five in FLECS.
The versions also differ in some features of the domain language, in the
use of learning modules, and in the low-level implementation of search mech-
anisms. We do not discuss these differences; the reader may learn about them
from the review article by Veloso et al. [1995].
PRODIGY1 and PRODIGY2. The early versions of PRODIGY had only two
backtracking points: the choice of an operator (Line 2c in Figure 2.8) and
the instantiation of the selected operator (Line 4c). The other three decisions
were based on fixed heuristics, which did not give rise to multiple search
branches. The algorithm preferred operator application to adding new oper-
ators (Line 2a), applied the tail operator that had been added last (Line 1b),
and achieved the first unsatisfied precondition of the last added operator
(Line 1c). This algorithm generated suboptimal solutions and sometimes
failed to find any solution.
For example, consider the PRODIGY2 search for the problem in Fig-
ure 2.2. The solver adds unload(pack-1,ville-1) to achieve (at pack-1 ville-1),
and load(pack-1,town-1) to achieve the precondition (in-truck pack-1) of unload
(Figure 2.10a). Then, it applies load and adds leave-town(town-1,ville-1) to

Base-PRODIGY
1a. If the goal statement G is satisfied in the current state C, then return the head.
2a. Either
    (i) Backward-Chainer adds an operator to the tail, or
    (ii) Operator-Application moves an operator from the tail to the head.
    Decision point: Choose between (i) and (ii).
3a. Recursively call Base-PRODIGY on the resulting incomplete solution.

Operator-Application
1b. Pick an operator op in the tail, such that
    (i) there is no operator in the tail ordered before op, and
    (ii) the preconditions of op are satisfied in the current state C.
    Decision point: Choose one of such operators.
2b. Move op to the end of the head and update the current state C.

Backward-Chainer
1c. Pick a literal l among the current subgoals.
    Decision point: Choose one of the subgoal literals.
2c. Pick an operator op that achieves l.
    Decision point: Choose one of such operators.
3c. Add op to the tail and establish a link from op to l.
4c. Instantiate the free variables of op.
    Decision point: Choose an instantiation.
5c. If the effect that achieves l has conditions,
    then add them to the operator preconditions.

Fig. 2.8. Foundations of the PRODIGY search algorithm. The Operator-Application
procedure simulates execution of operators, whereas Backward-Chainer selects op-
erators relevant to the goal.

Base-PRODIGY:
  2a. Decide whether to apply an operator or add a new operator to the tail.

Operator-Application:
  1b. Choose an operator to apply.

Backward-Chainer:
  1c. Choose an unachieved literal.
  2c. Choose an operator that achieves this literal.
  4c. Choose an instantiation for the variables of the operator.

Fig. 2.9. Decision points of the PRODIGY algorithm, given in Figure 2.8. Every
decision point allows backtracking, thus giving rise to multiple search branches.

Fig. 2.10. Incompleteness of PRODIGY2. The system fails to solve the problem in
Figure 2.2. Since PRODIGY2 always prefers the operator application to adding new
operators, it cannot load both packages before driving the truck to its destination.

achieve the precondition (truck-at ville-1) of unload (Figure 2.10b). Finally,
PRODIGY applies leave-town and unload (Figure 2.10c), thus bringing only
one package to the village.
Since the algorithm uses only two backtracking points, it does not consider
loading two packages before the ride or getting extra fuel before leaving the
town; thus, it fails to solve the problem. This example demonstrates that
PRODIGY2 is incomplete; that is, it may fail on a problem that has a solution.
The user may improve the situation by providing domain-specific control rules
that enforce different choices of subgoals in Line 1c. Note that PRODIGY2 does
not backtrack over these choices, and an inappropriate control rule may cause
a failure. This approach often improves the performance; however, it requires
the user to assume the responsibility for completeness and solution quality.
NOLIMIT and PRODIGY4. During the work on NOLIMIT, Veloso added two
more backtracking points, delaying the application of tail operators (Line 2a)
and choosing a subgoal (Line 1c); later, PRODIGY4 inherited these points. On

Fig. 2.11. Example of inefficiency in PRODIGY4. The boldface numbers, in the
upper right corners of operators, mark the order of adding operators to the tail.
Since PRODIGY4 always applies the last added operator, it tries to apply leave-town
before one of the load operators, which leads to a deadend and requires backtracking.

the other hand, PRODIGY4 makes no decision in Line 1b; it always applies
the last added operator. The absence of this decision point does not rule out
any solutions, but it may negatively affect the search time.
For instance, if we use PRODIGY4 to solve the problem in Figure 2.2, it
may generate the tail shown in Figure 2.11, where the numbers show the order
of adding operators. We could now solve the problem by applying the two
load operators, the leave-town operator, and then both unload operators;
however, the solver cannot use this order. It applies leave-town before one
of the load operators, which leads to a deadend. It then has to backtrack and
construct a new tail that allows the right order of applying the operators.
FLECS. The FLECS algorithm has all five decision points, but it does not
backtrack over the choice of a subgoal (Line 1c), which means that only four
points give rise to multiple search branches. Since backtracking over these
points may produce an impractically large space, Veloso and Stone [1995]
implemented general heuristics that further limit the space.
They experimented with two versions of the FLECS algorithm, SAVTA and
SABA, which differ in their choice between adding an operator to the tail and
applying an operator (Line 2a). SAVTA prefers to apply tail operators before
adding new ones, whereas SABA tries to delay their application.
Experiments have shown that the greater flexibility of PRODIGY4 and
FLECS usually has an advantage over PRODIGY2, despite the larger branching
factor. The relative effectiveness of PRODIGY4, SAVTA, and SABA depends on a
specific domain, and the choice among them is often essential for performance
[Stone et al., 1994].

cushion(<pack>, <place>)
  <pack>: type Package
  <place>: type Place
  Pre: (truck-at <place>)
       (or (at <pack> <place>)
           (in-truck <pack>))
  Eff: del (fragile <pack>)

(a) Disjunction.

Goal Statement:
  (exists <pack> of type Package
    (at <pack> ville-1))

(b) Existential quantification.

Goal Statement:
  (forall <pack> of type Package
    (at <pack> ville-1))

(c) Universal quantification.

Fig. 2.12. Examples of disjunction and quantification in PRODIGY. The user may
utilize them in precondition expressions and goal statements.

2.3 Extended domain language

The PRODIGY domain language is an extension of the STRIPS language
[Fikes and Nilsson, 1971]. The STRIPS system used a limited encoding of op-
erators, and PRODIGY researchers added several advanced features, including
complex preconditions and goal expressions (Section 2.3.1), inference rules
(Section 2.3.2), and flexible use of object types (Section 2.3.3).

2.3.1 Extended operators

The PRODIGY language allows complex logical expressions in operator pre-
conditions, if-effect conditions, and goal statements. They may include not
only negations and conjunctions, but also disjunctions and quantifications.
The language also enables the user to specify the costs of operators, which
serve as a measure of solution quality.
Disjunctive preconditions. To illustrate disjunction, we consider a vari-
ation of the cushion operator, given in Figure 2.12(a). In this example, we
can cushion a package when it is inside or near the truck.
When PRODIGY instantiates an operator that has disjunctive precondi-
tions, it generates an instantiation for one element of the disjunction and
discards all other elements. For example, if the solver has to cushion pack-1,
it may choose the instantiation (at pack-1 town-1), which matches (at <pack>
<place>), and discard the other element, (in-truck <pack>).
If the initial choice does not lead to a solution, the solver backtracks and
considers the instantiation of another element. For instance, if the selected
version of cushion has proved inadequate, PRODIGY may discard the first
element of the disjunction, (at <pack> <place>), and choose the instantiation
(in-truck pack-1) of the other element.
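A sketch of this treatment of disjunctions, assuming preconditions are nested
lists whose disjunctions start with the symbol or; the search engine explores
the returned alternatives as separate branches and backtracks among them:

  ;; Return the alternative precondition expressions arising from a
  ;; disjunction: one branch per disjunct, or a single branch if the
  ;; expression is not a disjunction.
  (defun disjunction-branches (precond)
    (if (and (consp precond) (eq (first precond) 'or))
        (rest precond)
        (list precond)))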

Quantified preconditions. We illustrate the use of quantifiers in Fig-
ures 2.12(b) and 2.12(c). In the first example, the solver has to transport
any package to ville-1. In the second example, it has to deliver all packages.
When the problem solver instantiates an existential quantifier, it selects
one object of the specified type. For example, it may decide to deliver pack-1
to ville-1, thus replacing the goal in Figure 2.12(b) by (at pack-1 ville-1). If this
choice does not lead to a solution, PRODIGY backtracks and tries another object.
When instantiating a universally quantified expression, the solver treats it as
a conjunction over all matching objects.
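A sketch of this expansion, assuming a quantified expression of the form
(forall <var> type expr) and a hypothetical objects-of-type lookup:

  ;; Expand a universally quantified expression into a conjunction over
  ;; all objects of the given type, substituting each object for the
  ;; quantified variable.
  (defun expand-forall (var type expr)
    (cons 'and
          (loop for obj in (objects-of-type type)
                collect (subst obj var expr))))

With the objects of Figure 2.2, expanding the goal in Figure 2.12(c) would
yield (and (at pack-1 ville-1) (at pack-2 ville-1)).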
Instantiated operators. The PRODIGY language allows arbitrary logical
expressions, which may contain multiple levels of negations, conjunctions, dis-
junctions, and quantifications. When adding an operator to the tail, PRODIGY
generates all possible instantiations of its preconditions and chooses one of
them. If the solver backtracks, it chooses an alternative instantiation. Every
instantiation is a conjunction of literals, some of which may be negated; it
has no disjunctions, quantifications, or negated conjunctions.
Wang [1992] designed an advanced algorithm for generating possible in-
stantiations of operators and goal statements. In particular, she developed a
mechanism for pruning inconsistent choices of objects and provided heuristics
for selecting the most promising instantiations.
Costs. The use of costs allows measuring the solution quality. We assign
nonnegative costs to operators and define a solution cost as the sum of its
operator costs. The lower the cost, the better the solution.
The authors of the original PRODIGY architecture did not provide support
for costs, and they usually measured the solution quality by the number of
operators. Perez [1995] implemented costs during her exploration of control
knowledge for improving the solution quality; however, she did not incor-
porate costs into the main version, and we later re-implemented the cost
mechanism.
In Figure 2.13, we give an example of cost encoding. For every opera-
tor, the user specifies a Lisp function, the arguments of which are operator
variables. Given specific objects, the function returns the corresponding cost,
which must be a nonnegative real number. If the cost is the same for all
instantiations of an operator, it may be specified by a number rather than a
function. If the user does not encode a cost, then by default it is 1.
The example in Figure 2.13(a) includes two cost functions, called leave-
cost and load-cost. We give pseudocode for these functions (Figure 2.13b) and
their real encoding in PRODIGY (Figure 2.13c). The cost of driving between
two locations is linear in the distance, determined by the miles function. The
user may specify distances by a matrix or by a list of initial-state literals,
and she should provide the appropriate look-up procedure. The loading cost
depends on the location type; it is larger in villages. Finally, the cost of the
cushion operator is constant.

leave-town(<from>, <to>)
  <from>: type Town
  <to>: type Place
  Cost: leave-cost(<from>, <to>)

load(<pack>, <place>)
  <pack>: type Package
  <place>: type Place
  Cost: load-cost(<place>)

cushion(<pack>)
  <pack>: type Package
  Cost: 5

(a) Use of cost functions and constant costs.

leave-cost(<from>, <to>)
  Return 0.2 * miles(<from>, <to>) + 5.

load-cost(<place>)
  If <place> is of type Village,
  then return 4; else, return 3.

(b) Pseudocode for the cost functions.

(defun leave-cost (<from> <to>)
  (+ (* 0.2 (miles <from> <to>)) 5))

(defun load-cost (<place>)
  (if (eq (type-name (obj-type <place>)) 'Village)
      4 3))

(c) Actual Lisp functions.

Fig. 2.13. Encoding of operator costs. The user may specify a constant cost or, al-
ternatively, a Lisp function that inputs operator variables and returns a nonnegative
cost. If an operator does not include a cost, PRODIGY assumes that it is 1.

When PRODIGY instantiates an operator, it calls the corresponding func-
tion to determine the cost of the resulting instantiation. If the returned value
is negative, the system signals an error. Since incomplete solutions consist
of fully instantiated operators, the solver can determine the cost of every
intermediate solution.
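A minimal sketch of this computation, assuming a hypothetical op-cost
accessor that stores the value returned by the operator's cost function:

  ;; The cost of a solution is the sum of the costs of its fully
  ;; instantiated operators; an empty solution costs 0.
  (defun solution-cost (operators)
    (reduce #'+ operators :key #'op-cost :initial-value 0))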

2.3.2 Inference rules

The PRODIGY language supports two mechanisms for changing the current
state: operators and inference rules. They have identical syntax but differ in
semantics. Operators encode actions that change the world, whereas rules
point out implicit properties of the world.
Example. In Figure 2.14, we show three inference rules for the Trucking
Domain. In this example, we have made two modifications to the original
domain (Figure 2.1). First, the cushion operator adds (cushioned <pack>)
instead of deleting (fragile <pack>), and the add-fragile rule indicates that
uncushioned packages are fragile.
Second, the domain includes the type County and the predicate (within
<place> <county>). We use the add-truck-in rule to infer the county of the
truck's current location. For example, if the truck is at town-1, and town-1 is
within county-1, then the rule adds (truck-in county-1). Similarly, we use add-in
to infer the current county of each package.
Use of inferences. The encoding of inference rules is the same as that of
operators, which may include disjunctions, quantifiers, and if-effects; however,
the rules have no costs, and they do not affect the overall solution cost.

Type Hierarchy: Package, Place, and County; Place has two subtypes, Town
and Village.

cushion(<pack>)
  <pack>: type Package
  Eff: add (cushioned <pack>)

Inf-Rule add-fragile(<pack>)
  <pack>: type Package
  Pre: not (cushioned <pack>)
  Eff: add (fragile <pack>)

Inf-Rule add-truck-in(<place>, <county>)
  <place>: type Place
  <county>: type County
  Pre: (truck-at <place>)
       (within <place> <county>)
  Eff: add (truck-in <county>)

Inf-Rule add-in(<pack>, <place>, <county>)
  <pack>: type Package
  <place>: type Place
  <county>: type County
  Pre: (at <pack> <place>)
       (within <place> <county>)
  Eff: add (in <pack> <county>)

Fig. 2.14. Encoding of inference rules in PRODIGY. These rules point out indirect
results of changing the world state; their syntax is identical to that of operators.

Fig. 2.15. Use of an inference rule in backward chaining. PRODIGY links the
add-truck-in rule to the goal literal and then adds leave-town to achieve the
rule's precondition (truck-at town-2).

The use of inference rules is also similar to that of operators: PRODIGY
adds an instantiated rule to the tail, for achieving the selected subgoal, and
applies the rule when its preconditions hold in the current state. We illustrate
it in Figure 2.15, where the solver uses the add-truck-in rule to achieve the
goal, and then adds leave-town to achieve the rule's precondition.
If the system applies an inference rule and later adds an operator that
invalidates the rule's preconditions, then it removes the rule's effects from the
state. For example, the inference rule in Figure 2.16(a) adds (truck-in county-2)
to the state. If the system then applies leave-town (Figure 2.16b), it negates
the preconditions of add-truck-in and cancels its effects. This semantics
differs from that of operator effects, which remain in the state unless deleted
by opposite effects of other operators.
Eager and lazy rules. The Backward-Chainer algorithm selects rules at
its discretion and may disregard unwanted rules. On the other hand, if some
rule has an undesirable effect, it should be applied regardless of the solver's

Fig. 2.16. Cancelling the effects of an inference rule upon the negation of its
preconditions. When PRODIGY applies leave-town(town-2,town-3), it negates the
precondition (truck-at town-2) of the add-truck-in rule; hence, it removes the
effects of this rule from the current state.

choice. For example, if pack-1 is not cushioned, the system should immediately
add (fragile pack-1).
When the user encodes a domain, she has to mark all rules that have
unwanted effects. When the preconditions of a marked rule hold in the current
state, the system applies it at once, even if it is not in the tail. The marked
rules are called eager rules, whereas the others are lazy rules. Note that
Backward-Chainer may use both eager and lazy rules, and the only special
property of eager rules is their forced application in the matching states.
Truth maintenance. When PRODIGY applies an operator or inference rule,
it updates the current state and then identifies the previously applied rules
whose preconditions no longer hold. If the system finds such rules, it modifies
the state by removing their effects. Next, PRODIGY looks for an eager rule
whose conditions hold in the resulting state. If the system finds such a rule,
it applies the rule and further changes the state.
If inference rules interact with each other, this process may involve a chain
of rule applications and cancellations. It terminates when the system gets to
a state that does not require applying a new eager rule or removing the effects
of old rules. This chain of state modifications does not involve search; it is
similar to the firing of productions in the Soar system [Laird et al., 1986;
Golding et al., 1987]. Blythe designed a truth-maintenance procedure that
keeps track of all applicable rules and controls this forward chaining.
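A sketch of this stabilization loop, with hypothetical helpers preconds-hold-p,
remove-effects, and apply-rule; it assumes, as the text requires, that the rule
set causes no infinite chains of forced applications:

  ;; Repeat until the state is stable: cancel an applied rule whose
  ;; preconditions no longer hold, or fire an eager rule whose
  ;; preconditions hold and which has not been applied yet.
  (defun maintain-truth (state applied-rules eager-rules)
    (loop
      (let ((stale (find-if-not (lambda (r) (preconds-hold-p r state))
                                applied-rules))
            (ready (find-if (lambda (r)
                              (and (preconds-hold-p r state)
                                   (not (member r applied-rules))))
                            eager-rules)))
        (cond (stale (setf state (remove-effects stale state))
                     (setf applied-rules (remove stale applied-rules)))
              (ready (setf state (apply-rule ready state))
                     (setf applied-rules (cons ready applied-rules)))
              (t (return state))))))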

leave-town(<from>, <to>)
  <from>: type (or Town City)
  <to>: type Place
  Pre: (truck-at <from>)
  Eff: del (truck-at <from>)
       add (truck-at <to>)

fuel(<place>)
  <place>: type (or Town City)
  Pre: (truck-at <place>)
  Eff: add (extra-fuel)

Fig. 2.17. Disjunctive type. The <from> and <place> variables are declared as
(or Town City). They may be instantiated with objects of two types, Town or City.

If the user provides inference rules, she has to ensure that the inferences
are consistent. In particular, a rule must not negate its own preconditions.
If two rules may be applied in the same state, they must not have opposite
effects. If a domain includes several eager rules, they should not cause an
infinite cyclic chain of forced applications. PRODIGY does not check for such
inconsistencies, and inappropriate rules may cause unpredictable results.

2.3.3 Complex types

We have already described a type hierarchy (Figure 2.1), which defines object
classes and enables the user to limit the allowed values of variables. For
example, the possible values of the <from> variable in leave-town include
all towns, but not villages. The early versions of PRODIGY did not support a
type hierarchy. Veloso designed a typed domain language during her work on
NOLIMIT, and the authors of PRODIGy4 further developed this language.
A type hierarchy is a tree; its nodes are called simple types. For instance,
the hierarchy in Figure 2.1 has five simple types: Package, Town, Village, Place,
and the root type that includes all objects. We have illustrated the use of
simple types; however, they often do not provide sufficient flexibility. For
example, consider the hierarchy in Figure 2.17 and suppose that the truck
may get extra fuel in a town or city, but not in a village. We cannot encode
this constraint with simple types unless we define an additional type. The
PRODIGY language includes a mechanism for specifying complex constraints
through disjunctive and functional types.
Disjunctive types. In Figure 2.17, we illustrate a disjunctive type that
determines the possible values of the variables <from> and <place>. The user
specifies a disjunctive type as a set of simple types; in the example, it includes
Town and City. When PRODIGY instantiates the corresponding variable, it uses
an object that belongs to any of these types. For instance, the system may
use leave-town for departing from a town or city.
Functional types. In Figure 2.18, we give an example of a functional type
that limits the values of the <to> variable. A functional type consists of two
parts: a simple or disjunctive type and a boolean test function. The system
first identifies all objects of the specified simple or disjunctive type, and then

leave-town(<from>, <to>)
  <from>: type (or Town City)
  <to>: type Place
       connected(<from>, <to>)
  Pre: (truck-at <from>)
  Eff: del (truck-at <from>)
       add (truck-at <to>)

(a) Use of a functional type.

connected(<from>, <to>)
  If <from> = <to>, then return False;
  else, return True.

(b) Pseudocode for the function.

(defun connected (<from> <to>)
  (not (eq <from> <to>)))

(c) Actual Lisp function.

Fig. 2.18. Functional type. When PRODIGY instantiates leave-town, it ensures
that <from> and <to> are connected by a road. The user has to implement a
boolean Lisp function for testing the connectivity. We give an example function
that defines the fully connected graph of roads.

eliminates the objects that do not satisfy the test function. The remaining
objects are the valid values of the declared variable. In the example, the valid
values of <to> include all places that have road connections with <from>. For
instance, if every place is connected with every other place except itself, then
we use the test function given in Figure 2.18(b). The domain encoding must
include a Lisp implementation of this function, as shown in Figure 2.18(c).
The boolean function is a Lisp procedure, and its arguments are operator
variables. The function must input the variable described by the functional
type. In addition, it may input variables declared before this functional type;
however, the function cannot input variables declared after it. For example,
we use the <from> variable in limiting the values of <to>; however, we cannot
use the <to> variable as an input to a test function for <from> because of
the declaration order.
Use of test functions. When the system instantiates a variable with a
functional type, it identifies all objects of the specified simple or disjunctive
type, prunes the objects that do not satisfy the test function, and then selects
an object from the remaining set. If the user specifies not only functional types
but also control rules, which further limit suitable instantiations, then the
generation of instantiated operators becomes a complex matching problem.
Wang [1992] investigated it and developed an efficient matching algorithm.
Test functions may use any information about the current incomplete
solution, other nodes in the search space, and the global state of the system,
which allows unlimited flexibility in constraining operator instantiations. In
particular, they enable the user to encode functional effects, that is, operator
effects that depend on the current state.
Generator functions. The system also supports generator functions in the
specification of variable types. These functions generate and return a set
of allowed values, instead of testing the available values. The user has to
specify a simple or disjunctive type along with a generator function. When

load(<pack>, <place>, <old-space>, <new-space>)
  <pack>: type Package
  <place>: type Place
  <old-space>: type Trunk-Space
       positive(<old-space>)
  <new-space>: type Trunk-Space
       decrement(<old-space>)
  Pre: (at <pack> <place>)
       (truck-at <place>)
       (space-left <old-space>)
  Eff: del (at <pack> <place>)
       add (in-truck <pack>)
       del (space-left <old-space>)
       add (space-left <new-space>)
       (if (fragile <pack>)
           add (broken <pack>))

Test Function:
  positive(<old-space>)
    If <old-space> > 0, then return True; else, return False.

Generator Function:
  decrement(<old-space>)
    If <old-space> = 0, then signal an error;
    else, return { <old-space> - 1 }.

Actual Lisp Functions:
  (defun positive (<old-space>)
    (> <old-space> 0))
  (defun decrement (<old-space>)
    (assert (not (= <old-space> 0)))
    (list (- <old-space> 1)))

Fig. 2.19. Generator function. The user provides a Lisp function that generates
instantiations of <new-space>; they must belong to the specified type, Trunk-Space.

the system uses the function, it checks whether all returned objects belong
to the specified type and prunes the extraneous objects.
In Figure 2.19, we give an example that involves both a test function,
called positive, and a generator function, decrement. In this example, the
system keeps track of the available space in the trunk. If there is no space,
it cannot load packages. We use the generator function to decrement the
available space after loading a package.
When the user specifies a simple or disjunctive type for a generator func-
tion, she may define a numeric type that includes infinitely many values.
For instance, the Trunk-Space type in Figure 2.19 may include all natural
numbers. On the other hand, the generator function always returns a finite
set. The PRODIGY manual [Carbonell et al., 1992] contains a more detailed
explanation of infinite types.
Fig. 2.20. Goal loop in the tail. The precondition (at pack-1 ville-1) of load is the
same as the goal; hence, the solver has to backtrack and choose another operator.

2.4 Search control


The efficiency of problem solving depends on the search space and the order
of expanding nodes of the space. The nondeterministic PRODIGY algorithm
in Figure 2.8 defines the search space, but it does not specify the exploration
order. It has several decision points (Figure 2.9), which require heuristics for
selecting appropriate branches of the space.
The PRODIGY system includes a variety of search-control mechanisms,
which combine general heuristics, domain-specific experience, and advice by
the user. Some basic mechanisms are hard-coded into the system; however,
most heuristics are optional, and the user can enable or disable them.
We outline some control techniques, including heuristics for avoiding re-
dundant search (Section 2.4.1), main knobs for adjusting the search strategy
(Section 2.4.2), and control rules (Section 2.4.3). The reader may find an
overview of other control techniques in the article by Blythe and Veloso [1992],
which explains dependency-directed backtracking, limited look-ahead, and
heuristics for choosing appropriate subgoals and instantiations.

2.4.1 Avoiding redundant search

We describe three basic techniques for eliminating redundant branches of
the search space, which are hard-coded into the search algorithm.
Goal loops. We first present a mechanism that prevents PRODIGY from
running in simple circles. To illustrate it, consider the problem of delivering
pack-1 from town-1 to ville-1 (Figure 2.20). The solver adds unload(pack-1,ville-1)
and then tries to achieve its precondition (in-truck pack-1) by load(pack-1,ville-1);
however, the precondition (at pack-1 ville-1) of load is identical to
the goal, and achieving it is as difficult as solving the original problem.
We call it a goal loop, which arises when a precondition of a newly added
operator is identical to the literal of some link on the path from this operator
to the goal statement. We illustrate it in Figure 2.21, where thick links mark
the path from a new operator z to the goal. The precondition l of z makes a
loop with an identical precondition of x, achieved by y.
When PRODIGY adds an operator to the tail, it compares the opera-
tor's preconditions with the links between this operator and the goal. If the
solver detects a goal loop, it backtracks and tries an alternative operator

Fig. 2.21. Detection of goal loops. The backward chainer compares the precondi-
tion literals of a newly added operator z with the links between z and the goal. If
some precondition l is identical to one of the link literals, the solver backtracks.

Fig. 2.22. State loops in the head of an incomplete solution. If the current state C is
the same as one of the previous states, the problem solver backtracks. For example,
if PRODIGY applies unload(pack-2,town-1) immediately after load(pack-2,town-1),
it creates a state loop.

that achieves the same subgoal. For example, the solver may generate a new
instantiation of the load operator, load(pack-1,town-1).
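A sketch of the goal-loop test, assuming the preconditions of the new operator
form a list of literals and a hypothetical link-literal accessor extracts the
literal of each link on the operator's path to the goal:

  ;; A newly added tail operator creates a goal loop if one of its
  ;; precondition literals already labels a link on the path from the
  ;; operator to the goal statement.
  (defun goal-loop-p (new-op path-links)
    (intersection (op-preconditions new-op)
                  (mapcar #'link-literal path-links)
                  :test #'equal))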
State loops. The solver also watches for loops in the head of an incomplete
solution, called state loops. Specifically, it verifies that the current state differs
from all previous states. If the current state is identical to some earlier state
(Figure 2.22a), the solver backtracks.
We illustrate a state loop in Figure 2.22, where the application of two
opposite operators, load and unload, leads to a repetition of an interme-
diate state. The solver would detect this redundancy and either delay the
application of unload or use a different instantiation.
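A sketch of the state-loop test, comparing states as sets of literals;
previous-states is assumed to hold the states produced along the head:

  ;; Two states are equal if they contain the same literals; the head
  ;; has a state loop if the current state repeats an earlier one.
  (defun state-equal-p (s1 s2)
    (and (subsetp s1 s2 :test #'equal)
         (subsetp s2 s1 :test #'equal)))

  (defun state-loop-p (current-state previous-states)
    (member current-state previous-states :test #'state-equal-p))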
Satisfied links. Next, we describe the detection of redundant tail operators,
illustrated in Figure 2.23. In this example, PRODIGY is solving the problem
in Figure 2.2, and it has constructed the tail in Figure 2.23(a). The literal
(truck-at ville-1) is a precondition of two different operators, unload(pack-1,ville-1)
and unload(pack-2,ville-1). Thus, the tail includes two identical subgoals,
and the solver adds two copies of leave-town(town-1,ville-1) to achieve them.
Such situations arise because PRODIGY connects each tail operator to
only one subgoal, which simplifies the maintenance of links. When the solver
[Figure 2.23: (a) a tail in which (truck-at ville-1) is a precondition of both
unload operators, each supported by a copy of leave-town(town-1,ville-1);
(b) the same solution after applying load(pack-1,town-1), load(pack-2,town-1),
and leave-town(town-1,ville-1).]

Fig. 2.23. Satisfied link. After the solver has applied three operators, it notices
that all preconditions of unload(pack-2,ville-1) hold in the state; hence, it omits
the tail operator leave-town, which is linked to a satisfied precondition of unload.

applies an operator, it skips redundant parts of the tail. For example, suppose
that it has applied the two load operators and one leave-town, as
shown in Figure 2.23(b). The precondition (truck-at ville-1) of the tail operator
unload now holds in the current state, and the solver skips the tail operator
leave-town linked to this precondition.
When a tail operator achieves a precondition that holds in the current
state, we call the corresponding link satisfied. We show this situation in Fig-
ure 2.24(a), where the precondition l of x is satisfied, which makes the dashed
operators redundant.
The solver keeps track of satisfied links and updates their list after each
modification of the current state. When the solver selects a tail operator to
apply (line 1b in Figure 2.8) or a subgoal to achieve (line 1c), it ignores the
tail branches that support satisfied links. Thus, it would not consider the
dashed operators in Figure 2.24 and their preconditions.
If PRODIGY applies the operator x, it discards the dashed branch that
supports a precondition of x (Figure 2.24b). Note that it discards the dashed
branch only after applying x. If it decides to apply some other operator
before x, it may delete l, in which case dashed operators become useful again.
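The following Python sketch illustrates the bookkeeping; representing links
as (literal, supporting operator) pairs is our own simplification.

def satisfied_links(links, state):
    # A link is satisfied when its literal holds in the current state.
    return {(literal, op) for (literal, op) in links if literal in state}

def candidate_tail_operators(tail_ops, satisfied):
    # Ignore tail branches that only support satisfied links.
    supporting = {op for (_, op) in satisfied}
    return [op for op in tail_ops if op not in supporting]

state = {("truck-at", "ville-1")}
links = {(("truck-at", "ville-1"), "leave-town(town-1,ville-1)")}
sat = satisfied_links(links, state)
print(candidate_tail_operators(["leave-town(town-1,ville-1)"], sat))  # []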

[Figure 2.24: (a) a tail in which the precondition l of x is satisfied in the
current state, making the dashed operators redundant; (b) the tail after
applying x, with the dashed branch discarded.]
Fig. 2.24. Identification of satisfied links. PRODIGY keeps track of all link literals
satisfied in the state, and disregards the tail operators that support these literals.

2.4.2 Knob values

The PRODIGY architecture includes several knob variables, which allow the
user to adjust the search strategy.
Depth limit. The user usually limits the search depth, which results in
backtracking upon reaching the pre-set limit. Note that the number of op-
erators in a solution is proportional to the search depth; hence, limiting the
depth is equivalent to limiting the solution length.
After adding operator costs to the PRODIGY language, we provided a knob
for limiting the solution cost. If the system constructs a partial solution with
cost greater than the limit, it backtracks and considers an alternative branch.
If the user bounds both search depth and solution cost, the solver backtracks
upon reaching either limit.
The effect of these bounds varies across domains. In some domains, they
improve efficiency by preventing a long descent into a branch that has no
solutions. In other domains, they cause an extensive search instead of a fast
generation of a suboptimal solution. If the search space has no solution within
the specified bounds, the system fails to solve the problem; thus, a depth limit
may cause a failure on a solvable problem.
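As an illustration, the following Python sketch shows one way such bounds
might be checked during the search; the list of (name, cost) pairs as the
encoding of a partial solution is our own assumption.

def within_bounds(partial_solution, depth_limit=None, cost_limit=None):
    # The solver backtracks once the partial solution exceeds either
    # the pre-set depth limit or the pre-set cost limit.
    if depth_limit is not None and len(partial_solution) > depth_limit:
        return False
    if cost_limit is not None:
        if sum(cost for _, cost in partial_solution) > cost_limit:
            return False
    return True

plan = [("load(pack-1,town-1)", 1), ("leave-town(town-1,ville-1)", 2)]
print(within_bounds(plan, depth_limit=5, cost_limit=2))  # False: cost 3 > 2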
Time limit. By default, the solver runs until it either finds a solution or
exhausts the search space. If it takes too long, the user may enter a key-
board interrupt to terminate the execution. Alternatively, she may pre-set a
time limit; then, the system interrupts the search upon reaching this limit.
The user may also bound the number of expanded nodes, which causes an
interrupt upon reaching the specified node number.
Search strategies. The system normally uses a depth-first search and ter-
minates upon finding any complete solution. The user has two options for
changing this default behavior. First, PRODIGY allows breadth-first explo-
ration; however, it is usually much less efficient than the default strategy.
Second, the user may request all solutions to a given problem. Then, the
solver explores the entire search space and outputs all available solutions,
until it exhausts the space, gets a keyboard interrupt, or reaches a time
limit. The system also allows search for an optimal solution. This strategy is
similar to the search for all solutions; however, when finding a new solution,

(a) Select Rule

If (truck-at <to>) is the current subgoal
and leave-town(<from>,<to>) is used to achieve it
and (truck-at <place>) holds in the current state
and <place> is of type Town
Then select instantiating <from> with <place>

(b) Reject Rule

If (truck-at <place>) is a subgoal
and (in-truck <pack>) is a subgoal
Then reject the subgoal (truck-at <place>)

Fig. 2.25. Examples of control rules, which encode domain-specific heuristics. The
user may provide rules based on her knowledge about the domain. Moreover, the
system includes mechanisms for the automatic construction of control heuristics.

the system reduces the cost bound and then looks only for better solutions. If
the solver gets an interrupt, it outputs the best solution found by that time.

2.4.3 Control rules

The efficiency of a depth-first search depends on the heuristics for selecting
appropriate branches of the search space, as well as on the order of exploring
these branches. The PRODIGY architecture provides a general mechanism for
specifying search heuristics in the form of control rules.
A control rule is an if-then rule that specifies branching decisions, which
may depend on the current state, subgoals, and other features of the current
incomplete solution, as well as on the global state of the search space. The
PRODIGY language provides a mechanism for hand-coding control rules. In
addition, the architecture includes several learning mechanisms for automatic
generation of domain-specific rules. The work on these mechanisms has been
one of the main goals of the PRODIGY project.
The system uses three rule types, called select, reject, and prefer rules.
A select rule points out appropriate branches of the search space. When
its applicability conditions match the current incomplete solution, the rule
generates one or more promising choices. For example, consider the control
rule in Figure 2.25(a). When the solver uses the leave-town operator for
moving the truck to some destination, the rule indicates that the truck should
go there directly from its current location.
A reject rule identifies wrong choices and prunes them from the search
space. For instance, the rule in Figure 2.25(b) indicates that, if PRODIGY
has to load the truck and drive it to a certain place, then it should delay
driving until after loading. Finally, a prefer rule specifies the order of ex-
ploring branches without pruning any of them. For example, we may replace
the select rule in Figure 2.25(a) with an identical prefer rule, which would
mean that the system should first try going directly from the truck's current

location to the destination, and it should keep the other options open for
later consideration. For some problems, this rule is more appropriate than
the more restrictive select rule.
At every decision point, the system identifies all applicable rules and uses
them to make appropriate choices. First, it uses an applicable select rule to
choose candidate branches of the search space. If the current incomplete solu-
tion matches several select rules, the system arbitrarily chooses one of them.
If no select rules are applicable, all available branches become candidates.
Then, PRODIGY applies all reject rules that match the current solution, and
prunes every candidate branch indicated by at least one of these rules. Note
that select and reject rules sometimes prune branches that lead to a solution;
hence, they may prevent the system from solving some problems.
After pruning inappropriate branches, PRODIGY applies prefer rules to
determine the order of exploring the remaining branches. If the system has
no applicable prefer rules, or the rules contradict each other, then it relies on
general heuristics for the exploration order.
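To make the three-stage procedure concrete, here is a minimal Python sketch
of one decision point; representing rules as pairs of condition and action
functions is our own illustrative assumption, not PRODIGY's actual rule
matcher, and we arbitrarily take the first applicable select and prefer rule.

def choose_branches(candidates, select_rules, reject_rules, prefer_rules,
                    solution):
    # One decision point: a select rule proposes candidates, reject rules
    # prune them, and a prefer rule orders the survivors.
    for applicable, select in select_rules:
        if applicable(solution):
            candidates = select(solution)
            break
    for applicable, rejected in reject_rules:
        if applicable(solution):
            bad = rejected(solution)
            candidates = [c for c in candidates if c not in bad]
    for applicable, preference_key in prefer_rules:
        if applicable(solution):
            candidates = sorted(candidates, key=preference_key)
            break
    return candidates

always = lambda solution: True
print(choose_branches(["a", "b", "c"], [],
                      [(always, lambda s: {"b"})],
                      [(always, lambda c: c)], solution=None))  # ['a', 'c']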
If the system uses numerous control rules, matching of their conditions at
every decision point takes significant time, which may defeat the benefits of
the right selection [Minton, 1990]. Wang [1992] designed several techniques
that improve the matching efficiency; however, the study of the trade-off
between matching time and search reduction remains an open problem.

2.5 Completeness
A search algorithm is complete if it finds a solution for every solvable problem.
This notion does not involve a time limit, which means that an algorithm may
be complete even if it takes an impractically long time. Although researchers
used PRODIGY in multiple studies of learning and search, the question of
its completeness had remained unanswered for several years, until Veloso
demonstrated the incompleteness of PRODIGY4 in 1995.
We have further investigated completeness issues, in collaboration with
Blythe, and found that all PRODIGY algorithms had been incomplete. Then,
Blythe has implemented a complete solver by extending PRODIGY search.
We have compared it with the incomplete system and demonstrated that the
extended algorithm is almost as efficient as PRODIGY and solves a wider range
of problems [Fink and Blythe, 1998].
We have already shown that PRODIGY1 and PRODIGY2 do not interleave
subgoals and sometimes fail to solve simple problems (Figure 2.10). NOLIMIT,
PRODIGY4, and FLECS use a more flexible strategy, and their incompleteness
arises less frequently. Veloso and Stone [1995] proved the completeness of
FLECS using simplifying assumptions, but their assumptions hold only for a
limited class of domains.
The incompleteness of PRODIGY is not a major handicap. Since the search
space of most problems is very large, its complete exploration is not feasible,

which makes any solver "practically" incomplete. If incompleteness comes up
only in a fraction of problems, it is a fair payment for efficiency.
If we achieve completeness without compromising efficiency, we get two
bonuses. First, we ensure that the system solves every problem whose search
space is sufficiently small for complete exploration. Second, incompleteness
may occasionally rule out a simple solution to a large-scale problem, causing
an extensive search instead of an easy win. If a solver is complete, it does not
rule out any solutions and finds this simple solution.
The incompleteness of means-ends analysis in PRODIGY comes from two
sources. First, the solver does not add operators for achieving preconditions
that are true in the current state. Intuitively, it ignores potential troubles
until they actually arise. Sometimes, it is too late, and the solver fails because
it did not take measures earlier. Second, PRODIGY ignores the conditions of
if-effects that do not establish any subgoals. Sometimes, such effects negate
preconditions of other operators, which may cause a failure.
We achieve completeness by adding new branches to the search space. The
main challenge is to minimize the number of new branches in order to preserve
efficiency. We describe a method for identifying the crucial branches, based
on the information learned in failed old branches, and present an extended
search algorithm (Sections 2.5.1 and 2.5.2). We believe that this method may
prove useful for developing complete versions of other search algorithms.
The full domain language of PRODIGY has two features that aggravate
the completeness problem (Section 2.5.3), which are not addressed in the
extended algorithm. First, eager inference rules may mislead the solver and
cause a failure. Second, functional types allow the reduction of every compu-
tational task to a PRODIGY problem, and some tasks are undecidable.
We prove that the extended algorithm is complete for domains that have
no eager inference rules and functional types (Section 2.5.4). Then, we give ex-
perimental results on the performance of the extended solver (Section 2.5.5).
We conclude with a summary of the main results (Section 2.5.6).

2.5.1 Limitation of means-ends analysis

GPS, PRODIGY1, and PRODIGY2 were not complete because they did not ex-
plore all branches in their search space. The incompleteness of later algo-
rithms has a deeper reason: they do not try to achieve tail preconditions that
hold in the current state.
For example, suppose that the truck is in town-1, pack-1 is in ville-1, and
the goal is to get pack-1 to town-1. The only operator that achieves the goal
is unload(pack-1,town-1), so PRODIGY begins by adding it to the tail (Figure
2.26a). The precondition (truck-at town-1) of unload is true in the initial
state. The solver may achieve the other precondition, (in-truck pack-1), by
adding load(pack-1,ville-1). The precondition (at pack-1 ville-1) of load is true in
the initial state, and the other precondition is achieved by leave-town(town-
1,ville-1), as shown in Figure 2.26(a).
[Figure 2.26: (a) the initial state, the tail, and the goal statement for delivering
pack-1 to town-1; (b) the current state after applying leave-town(town-1,ville-1),
with the truck stranded in ville-1.]

Fig. 2.26. Incompleteness of means-ends analysis in PRODIGY. The solver does not
consider fueling the truck before applying leave-town(town-1,ville-1). Since the
truck cannot leave ville-1 without extra fuel, PRODIGY fails to find a solution.

Now all preconditions are satisfied, and the solver's only choice is to apply
leave-town (Figure 2.26b). The application leads into an inescapable trap,
where the truck is stranded in ville-1 without fuel. The solver may backtrack
and consider different instantiations of load, but they will eventually lead to
the same trap.
To avoid such traps, a solver must sometimes add operators for achieving
literals that are true in the current state and have not been linked with any
tail operators. Such literals are called anycase subgoals. The challenge is to
identify anycase subgoals among the preconditions of tail operators.
A simple method is to view all preconditions as anycase subgoals. Veloso
and Stone [1995] considered this approach in building a complete version of
FLECS; however, it caused an explosion in the number of subgoals, leading to
gross inefficiency.
Kambhampati and Srivastava [1996b] used a similar approach to ensure
the completeness of the Universal Classical Planner. Their system may add
operators for achieving preconditions that are true in the current state, if
these preconditions are not explicitly linked to the corresponding literals of
the state. Although this approach is more efficient than viewing all precon-
ditions as anycase subgoals, it considerably increases branching and often
makes the search impractically slow.
A more effective solution is to use information learned in failed branches
of the search space. Let us look again at Figure 2.26. The solver fails because
it does not add any operator to achieve the precondition (truck-at town-1) of
unload, which is true in the initial state. The solver tries to achieve this pre-
condition only when the application of leave-town has negated it; however,
after the application, the precondition can no longer be achieved.
We see that means-ends analysis may fail when some precondition is true
in the current state, but it is later negated by an operator application. We use

[Figure 2.27: search nodes (a)-(d) for a new operator with precondition l,
with the branches "l is not anycase" and "l is anycase".]

Fig. 2.27. Identifying an anycase subgoal. When PRODIGY adds a new operator
x (a), the preconditions of x are not anycase subgoals (b). If some application
negates a precondition l of x (c), the solver marks l as anycase and expands the
corresponding new branch of the search space (d).

this observation to identify anycase subgoals: a precondition or a goal literal
is an anycase subgoal if, at some point of search, an application negates it.
We illustrate a technique for identifying anycase subgoals in Figure 2.27.
Suppose that the solver adds an operator x, with a precondition l, to the
tail (node a in Figure 2.27). The solver creates the branch where l is not an
anycase subgoal (node b). If an application of some operator negates l and if
it was true before the application, then l is marked as anycase (node c). If
the solver fails to find a solution in this branch, it eventually backtracks to
node a. If l is marked as anycase, PRODIGY creates a new branch, where l is
an anycase subgoal (node d).
If several preconditions of x are marked as anycase, the solver creates the
branch where they all are anycase subgoals. During the exploration of this
new branch, the algorithm may mark other preconditions of x as anycase.
If it again backtracks to node a, then it creates a branch where the newly
marked preconditions are also anycase subgoals.
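The following Python sketch illustrates the marking step under simplifying
assumptions: literals are tuples, an operator's effects are given as flat add
and delete lists, and conditional effects are ignored.

def apply_and_mark_anycase(effects, state, pending_preconds, anycase):
    # If the application negates a precondition that was true in the
    # state, mark that precondition as an anycase subgoal; after
    # backtracking, the solver will achieve it explicitly.
    for literal in effects["del"]:
        if literal in state and literal in pending_preconds:
            anycase.add(literal)
        state.discard(literal)
    state.update(effects["add"])
    return anycase

state = {("truck-at", "town-1"), ("at", "pack-1", "ville-1")}
effects = {"del": [("truck-at", "town-1")], "add": [("truck-at", "ville-1")]}
pending = {("truck-at", "town-1")}  # precondition of unload, currently true
print(apply_and_mark_anycase(effects, state, pending, set()))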
We now show how this mechanism works for the example problem.
The solver first assumes that the preconditions of unload(pack-1,town-1)
are not anycase subgoals. It builds the tail shown in Figure 2.26 and applies
leave-town, negating the precondition (truck-at town-1) of unload. The
solver then marks this precondition as anycase.
Eventually, the algorithm backtracks, creates the branch where (truck-at
town-1) is an anycase subgoal, and uses leave-village(ville-1,town-1) to achieve
it (Figure 2.28). Then, it constructs the tail shown in Figure 2.28, which leads
to the solution "fuel(town-1), leave-town(town-1,ville-1), load(pack-1,ville-1),
leave-village(ville-1,town-1), unload(pack-1,town-1)."
When the solver identifies the set of all satisfied links (Section 2.4.1),
it does not include anycase links; hence, it never ignores the tail operators
that support anycase links. For example, consider the tail in Figure 2.28; the

[Figure 2.28: the tail for the example problem; solid rectangles show
load(pack-1,ville-1), leave-town(town-1,ville-1), and unload(pack-1,town-1),
and dashed rectangles show fuel(town-1) and leave-village(ville-1,town-1),
which achieve the anycase subgoal (truck-at town-1).]
Fig. 2.28. Achieving the subgoal (truck-at town-1), which is satisfied in the current
state. First, the solver constructs a three-operator tail, shown by solid rectangles.
Then, it applies leave-town and marks the precondition (truck-at town-1)
of unload as an anycase subgoal. Finally, it backtracks and adds the two dashed
operators to achieve this subgoal.

anycase precondition (truck-at town-1) of unload holds in the state, but the
solver does not ignore the operators that support it.
We also have to modify the detection of goal loops, described in Sec-
tion 2.4.1. For instance, consider again Figure 2.28; the precondition (truck-at
town-1) of fuel makes a loop with the identical precondition of unload, but
the solver should not backtrack. Since this precondition of unload is an any-
case subgoal, it must not cause goal-loop backtracking. We use Figure 2.21
to generalize this rule: if the precondition l of x is an anycase subgoal, then
the identical precondition of z does not make a goal loop.

2.5.2 Clobbers among if-effects

We illustrate the other source of incompleteness, the use of if-effects, in Figure
2.29. The goal is to load fragile pack-1 without breaking it. The problem
solver adds load(pack-1,town-1) to achieve (in-truck pack-1). The preconditions
of load and the goal "not (broken pack-1)" hold in the current state (Figure
2.29a), and the solver's only choice is to apply load. The application
causes the breakage of pack-1 (Figure 2.29b), and no further search improves
the situation. The solver may try other instantiations of load, but they also
break the package.
The problem arises because an effect of load negates the goal "not (broken
pack-1)"; we call it a clobber effect. The application reveals the clobber, and
PRODIGY backtracks and tries to find an operator that does not cause
clobbering. If the clobber effect has no conditions, backtracking is the only way
to remedy the situation.
If the clobber is an if-effect, we can try to negate its conditions [Pednault,
1988a; Pednault, 1988b]. It may or may not be a good choice; perhaps, it
is better to apply the clobber and then re-achieve the negated subgoal. For
example, if we had a means for repairing a broken package, we could use it
instead of cushioning. We thus need to add a new decision point, where the
algorithm determines whether it should negate a clobber's conditions.
[Figure 2.29: (a) the initial state, with pack-1 at town-1, fragile, and unbroken,
and the goal statement; (b) the current state after applying load, with pack-1
broken.]
Fig. 2.29. Failure because of a clobber effect. The application of the load operator
results in breaking the package, and no further actions can undo this damage.

[Figure 2.30: search nodes (a)-(d) for an operator x with if-effect e, with the
branches "do not negate e's conditions" and "negate e's conditions".]

Fig. 2.30. Identifying a clobber effect. When the solver adds a new operator x (a),
it does not try to negate the conditions of x's if-effects (b). When applying x,
PRODIGY checks whether if-effects negate any subgoals that were true before the
application (c). If an if-effect e of x negates some subgoal, the solver marks e as a
clobber and adds the corresponding branch to the search space (d).

Introducing this new decision point for every if-effect will ensure com-
pleteness, but it may considerably increase branching. We avoid this problem
by identifying potential clobbers among if-effects. We detect them in the
same way as anycase subgoals. An if-effect is marked as a potential clobber
if it actually deletes some subgoal in one of the failed branches. The deleted
subgoal may be a goal literal, an operator precondition, or a condition of
an if-effect that achieves another subgoal. Thus, we again use information
learned in failed branches.

[Figure 2.31: the two branches "do not worry about (broken pack-1)" and
"delete the conditions of (broken pack-1)"; in the second branch, the solver
adds cushion(pack-1) before load(pack-1,town-1).]

Fig. 2.31. Negating a clobber. When the load operator is applied, its if-effect
deletes one of the goal literals. The solver backtracks and adds cushion, which
negates the conditions of this if-effect.

We illustrate the mechanism for identifying clobbers in Figure 2.30. Suppose
that the solver adds an operator x with an if-effect e to the tail, and
that this operator is added for the sake of its other effects (node a in
Figure 2.30); that is, e is not linked to a subgoal. Initially, PRODIGY does not
try to negate e's conditions (node b). If x is applied and its effect e negates
a subgoal that was true before the application, then the solver marks e as a
potential clobber (node c). If the solver fails to find a solution in this branch,
it backtracks to node a. If e is now marked as a clobber, the solver adds the
negation of e's conditions to the operator's preconditions (node d). If
an operator has several if-effects, the solver uses a separate decision point for
each of them.
In the example of Figure 2.29, the application of load(pack-1,town-1)
negates the goal "not (broken pack-1)," and PRODIGY marks the if-effect of
load as a potential clobber (Figure 2.31). Upon backtracking, the solver adds
the negation of the clobber's condition (fragile pack-1) to the preconditions of
load. It uses cushion to achieve this new precondition and generates the
solution "cushion(pack-1), load(pack-1,town-1)."

RASPUTIN-Backward-Chainer
1c. Pick a literal l among the current subgoals.
    Decision point: Choose one of the subgoal literals.
2c. Pick an operator or inference rule step that achieves l.
    Decision point: Choose one of such operators and rules.
3c. Add step to the tail and establish a link from step to l.
4c. Instantiate the free variables of step.
    Decision point: Choose an instantiation.
5c. If the effect that achieves l has conditions,
    then add them to step's preconditions.
6c. Use data from the failed descendants to identify anycase preconditions of step.
    Decision point: Choose anycase subgoals among the preconditions.
7c. If step has if-effects not linked to l, then:
    use data from the failed branches to identify clobber effects;
    add the negations of their conditions to the preconditions.
    Decision point(s): For every clobber, decide whether to negate its conditions.

Fig. 2.32. Backward-chaining procedure of RASPUTIN. It includes new decision
points (lines 6c and 7c), which ensure completeness of PRODIGY means-ends analysis.

We have implemented the extended search algorithm, called RASPUTIN,¹
which achieves anycase subgoals and negates the conditions of clobbers. Its
main difference from PRODIGY4 is the backward-chaining procedure, summarized
in Figure 2.32. We show its decision points in Figure 2.33, where thick
lines mark the points absent in PRODIGY.

2.5.3 Other violations of completeness

The PRODIGY system has two other sources of incompleteness, which arise
from the advanced features of the domain language. We have not addressed
them, and their use may violate the completeness of the extended algorithm.
Eager inference rules. If a domain includes eager rules, the system may
fail to solve simple problems. For example, consider the rules in Figure 2.14
and the problem in Figure 2.34(a), and suppose that add-truck-in is an
eager rule. The truck is initially in town-1, within county-1, and the goal is to
leave this county.
Since the preconditions of add-truck-in hold in the initial state, the system
applies it at once and adds (truck-in county-1), as shown in Figure 2.34(b).
If the solver applied leave-town(town-1,town-2), it would negate the rule's
preconditions and delete (truck-in county-1). In other words, the application of
leave-town would immediately solve the problem.
The system does not find this solution because it inserts new operators
only when they achieve some subgoal, whereas the effects of leave-town do
1 The Russian mystic Grigori Rasputin used the biblical parable of the Prodigal
Son to justify his debauchery. He tried to make the story of the Prodigal Son as
complete as possible, which is similar to our goal. Furthermore, his name comes
from the Russian word rasputie, which means decision point.

1c. Choose an unachieved literal
2c. Choose an operator or inference rule that achieves this literal
6c. Choose anycase subgoals among its preconditions
7c. For every clobber effect, decide whether to negate its conditions
Fig. 2.33. Decision points of RASPUTIN's backward chainer, given in Figure 2.32.
Thick lines show the points that differentiate it from PRODIGY (Figure 2.9).

(a) Example problem: The truck has to leave county-1.
Set of Objects: town-1, town-2: type Town; county-1, county-2: type County
Initial State: (truck-at town-1), (within town-1 county-1), (within town-2 county-2)
Goal Statement: not (truck-in county-1)

(b) Application of an inference rule: since the preconditions of
add-truck-in(town-1,county-1) hold in the initial state, the rule fires and adds
(truck-in county-1) to the current state.

Fig. 2.34. Failure because of an eager inference rule. PRODIGY does not negate the
preconditions of the rule add-truck-in(town-1,county-1), which clobbers the goal.

Type Hierarchy: Root-Type
Set of Objects: obj: type Root-Type
op(<var>)
  <var>: type Root-Type, test(<var>)
  Eff: add (pred <var>)
Goal Statement: (pred obj)
Fig. 2.35. Reducing an arbitrary decision task to a PRODIGY problem. The test
function is a Lisp procedure that encodes the decision task. Since functional types
allow encoding of undecidable problems, they cause incompleteness.

not match the goal statement. Since the domain has no operators with a
matching effect, the solver terminates with failure. To summarize, the solver
sometimes has to negate the preconditions of eager rules that have clobber
effects, but it does not consider this option.
Functional types. We next show that functional types enable the user to
encode every computational decision task as a PRODIGY problem. Consider
the artificial domain and problem in Figure 2.35. If the object obj satisfies
the test function, the solver uses the operator op(obj) to achieve the goal;
otherwise, the problem has no solution.
The test function may be any Lisp program, and it has access to all data
in the PRODIGY architecture. This flexibility allows the user to encode any
decision problem, including undecidable and semi-decidable tasks, such as the
halting problem. Thus, we may use functional types to specify undecidable
PRODIGY problems, and the corresponding completeness issues are beyond
the classical search.

2.5.4 Completeness proof

If a domain has no eager inference rules or functional types, the extended
solver is complete. To prove it, we show that, for every solvable problem,
some sequence of choices in the solver's decision points leads to a solution.
Suppose that a problem has a solution "step1, step2, step3, ..., stepn,"
where every step is either an operator or a lazy inference rule, and that no
other solution has fewer steps. We begin by defining clobber effects, subgoals,
and justified effects in the complete solution.
A clobber is an if-effect such that (1) its conditions do not hold in the
solution and (2) if we applied its actions anyway, they would make the
solution incorrect.
A subgoal is a goal literal or precondition such that either (1) it does
not hold in the initial state or (2) it is negated by some prior operator or
inference rule. For example, a precondition of step3 is a subgoal if either it
does not hold in the initial state or it is negated by step1. Every subgoal in a
correct solution is achieved by some operator or inference rule; for example,
if step1 negates a precondition of step3, then step2 must achieve it.

Fig. 2.36. Converting a shortest complete solution into a tail. We link every subgoal
to the last operator or inference rule that achieves it, and use these links to construct
a tree-structured tail.

A justified effect is the last effect that achieves a subgoal or negates a
clobber's conditions. For example, if step1, step2, and step3 all achieve some
subgoal precondition of step4, then the corresponding effect of step3 is justified
since it is the last among the three.
If a condition literal in a justified if-effect does not hold in the initial state,
or if it is negated by some prior step, we consider it a subgoal. Note that the
definition of such a subgoal is recursive; we define it through a justified effect,
and a justified effect is defined in terms of a subgoal in some step that comes
after it.
Since we consider a shortest solution, each step has at least one justified
effect. If we link each subgoal and each clobber's negation to the correspond-
ing justified effect, we may use the resulting links to convert the solution
into a tree-structured tail, as illustrated in Figure 2.36. If a step is linked to
several subgoals, we use any one of these links in the tail.
We now show that the extended algorithm can construct this tail. If no
subgoal holds in the initial state and the solution has no clobber effects, the
tail construction is straightforward. The nondeterministic algorithm creates
the desired tail by always calling Backward-Chainer rather than applying op-
erators, choosing subgoals that correspond to the links of the desired solution
(line 1c in Figure 2.32), selecting the appropriate operators and inference
rules (line 2c), and generating the right instantiations (line 4c).
If some subgoal literal holds in the initial state, the solver first builds
a tail that has no operator linked to this subgoal. Then, the application of
some step negates the literal, and the solver marks it as an anycase subgoal.
The algorithm can then backtrack to the point before the first application
and choose the right operator or inference rule for achieving the subgoal.
Similarly, if the solution has a clobber effect, the algorithm can detect it by
applying operators and inference rules. It can then backtrack to the point
before the applications and add the right step for negating the clobber's
conditions. Note that, even if the solver always makes the right choice, it
may have to backtrack for every subgoal that holds in the initial state and
for every clobber effect.
Eventually, the algorithm constructs the desired tail and no head. It can
then produce the complete solution by always applying tail steps, rather than
adding new operators, and selecting applicable steps in the right order.

[Figure 2.37: two log-log scatter plots, (a) Logistics Domain and (b) Trucking
Domain; both axes range from 0.1 to 10 CPU seconds, with PRODIGY (CPU sec)
on the horizontal axis and RASPUTIN (CPU sec) on the vertical axis.]

Fig. 2.37. Comparison of RASPUTIN and PRODIGY in the (a) Logistics Domain
and (b) Trucking Domain. The horizontal axes give the search times of PRODIGY,
whereas the vertical axes show the efficiency of RASPUTIN on the same problems.
Pluses (+) and asterisks (*) mark the problems solved by both algorithms, whereas
circles (o) are the ones solved only by RASPUTIN.

2.5.5 Performance of the extended solver

We have tested RASPUTIN in three domains and compared it with PRODIGY4.
We present data on the relative efficiency of the two solvers and show that
RASPUTIN solves more problems than PRODIGY4.
We first give results for the PRODIGY Logistics Domain [Veloso, 1994]; the
task is to construct plans for transporting packages by vans and airplanes.
The domain consists of several cities, each of which has an airport and postal
offices. We use airplanes for carrying packages between airports, and vans for
delivery within cities. This domain has no if-effects and does not give rise to
situations that require achieving anycase subgoals; thus, PRODIGY4 performs
better than the complete algorithm.
We have run both solvers on fifty problems, which differ in the number
of cities, vans, airplanes, and packages. We have randomly generated initial
locations of packages, vans, and airplanes, as well as destinations of packages.
The results are summarized in Figure 2.37(a), where each plus (+) denotes a
problem. The horizontal axis shows PRODIGY's running times, and the vertical
axis gives RASPUTIN's times for the same problems. Since PRODIGY is faster
on all problems, all pluses are above the diagonal. The ratio of RASPUTIN's
to PRODIGY's time varies from 1.20 to 1.97; its mean is 1.45.
We have run similar tests in the PRODIGY Process-Planning Domain
[Gil, 1991], which also does not require negating if-effect conditions or achieving
anycase subgoals. The task is to construct plans for making mechanical
parts with specified properties, using available machining equipment. The
ratio of RASPUTIN's to PRODIGY's time in this domain is between 1.22 and
1.89, with the mean at 1.39.

We next show results in an extended version of the Trucking Domain. We
now use multiple trucks and connect towns and villages by roads. A truck can
go from one place to another only if there is a road between them. We have
experimented with different numbers of towns, villages, trucks, and packages.
We have randomly generated road connections, initial locations of trucks and
packages, and destinations of packages.
In Figure 2.37(b), we show the performance of PRODIGY and RASPUTIN on
fifty problems. The twenty-two problems denoted by pluses (+) do not require
the clobber negation or anycase subgoals. PRODIGY outperforms RASPUTIN
on these problems, with a mean ratio of 1.27.
The fourteen problems denoted by asterisks (*) require the use of anycase
subgoals or the negation of clobbers' conditions for finding an efficient solu-
tion, but they can be solved inefficiently without it. RASPUTIN wins on twelve
of these problems and loses on two. The ratio of PRODIGY's to RASPUTIN's
time varies from 0.90 to 9.71, with the mean at 3.69. This ratio grows with
the number of required anycase subgoals.
Finally, the circles (o) show the sixteen problems that cannot be solved
without anycase subgoals and the negation of clobbers. PRODIGY hits the
10-second time limit on some problems and terminates with failure on the
others, whereas RASPUTIN solves all of them.

2.5.6 Summary of completeness results

We have extended PRODIGY search in collaboration with Blythe. The new
algorithm is complete for a subset of the PRODIGY domain language. The full
language includes two features that may violate completeness: eager inference
rules and functional types. To our knowledge, the extended solver is the
first complete algorithm among general-purpose systems that use means-ends
analysis. It is about 1.5 times slower than PRODIGY4 on the problems that
do not require negating clobbers' conditions and achieving anycase subgoals;
however, it solves problems that PRODIGY cannot solve.
We have built the extended solver in three steps. First, we have identified
the specific reasons for incompleteness of previous systems. Second, we have
added new decision points to eliminate these reasons, without significant
increase of the search space. Third, we have implemented an algorithm that
begins by exploring the branches of the old search space; it extends the space
only after failing to find a solution in the old space. We conjecture that
this three-step approach may prove useful for enhancing other incomplete
algorithms.
3. Primary effects

The use of primary effects in goal-directed search is an effective approach to
reducing the search time. The underlying idea is to identify important effects
of every operator and to apply operators only for achieving their important
effects. For example, the main effect of lighting a fireplace is to heat the
house; we call it a primary effect. If we have lamps in the house, illumination
is not an important result of using the fireplace. We view it as a side effect,
which means that we would not light the fireplace just for illumination.
Researchers have long recognized the advantages of this approach and
incorporated it into a number of AI systems. For example, Fikes and Nils-
son [1971; 1993] used primary effects to improve the quality of solutions
generated by STRIPS. Wilkins [1984] distinguished between main and side ef-
fects in SIPE and used this distinction to simplify conflict resolution. Yang et
al. [1996] provided a mechanism for specifying primary effects in ABTWEAK.
Researchers have also used primary effects to improve the effectiveness of
abstraction search. In particular, Yang and Tenenberg [1990] employed a com-
bination of abstraction and primary effects in ABTWEAK, and Knoblock [1993]
utilized primary effects in the automatic generation of abstraction hierarchies.
Despite the importance of primary effects, this notion long remained at an
informal level, and the user was responsible for identifying important effects.
We have studied the use of primary effects in collaboration with Yang;
this work has revealed that primary effects may exponentially improve
the efficiency, but choosing them appropriately is often a difficult task
[Fink and Yang, 1997]. An improper selection of primary effects can cause
three major problems. First, it may compromise completeness; that is, it
may cause a failure on a solvable problem. For example, if the fireplace is the
only source of light, but illumination is not its primary effect, then we cannot
solve the problem of illuminating the room. Second, primary effects may lead
to costly solutions; for instance, electric lamps may be more expensive than
firewood. Third, the use of primary effects may increase the search depth,
which sometimes leads to an exponential increase of the search time.
The work with Yang has led to an inductive learning algorithm that au-
tomatically selects primary effects. The formal analysis has shown that the
resulting selection exponentially reduces the search and ensures a high prob-
ability of completeness. We have used ABTWEAK to test this technique and


[Figure 3.1: (a) Example of a connected world. (b) Operators go and break.]

Fig. 3.1. Simple robot world. The robot may go through doorways and break
through walls. We view change of the robot's position as a side effect of break.

confirmed analytical predictions with empirical results. Later, we have implemented
search with primary effects in PRODIGY. The main difference from
the work with Yang is the use of primary effects with depth-first search, as
opposed to ABTWEAK's breadth-first strategy.
We report the results of the joint work with Yang and their extension
for PRODIGY. First, we explain the use of primary effects in goal-directed
reasoning (Section 3.1), derive a condition for preserving completeness (Sec-
tion 3.2), and analyze the resulting search reduction (Section 3.3). Then, we
describe the automatic selection of primary effects. A fast heuristic proce-
dure produces an initial selection of primary effects (Section 3.4), and then
a learning algorithm revises the initial selection (Section 3.5). Finally, we
describe experiments with the resulting primary effects. The first series of
experiments is with ABTWEAK, which uses breadth-first search (Section 3.6).
The second series is with the PRODIGY depth-first search (Section 3.7).

3.1 Search with primary effects

We describe the use of primary effects and discuss the related trade-off be-
tween efficiency and solution quality. We measure the quality of a solution by
the total cost of its operators. We give motivating examples (Section 3.1.1),
formalize the notion of primary effects (Section 3.1.2), and explain their role
in backward chaining (Section 3.1.3). The described technique works for most
goal-directed solvers, but it is not applicable to forward-chaining algorithms.

3.1.1 Motivating examples

We give informal examples that illustrate the two main uses of primary effects:
reducing the search and improving the solution quality.
Robot world. First, we describe a simple version of the STRIPS world
[Fikes and Nilsson, 1971], which includes a robot and several rooms (Fig-
ure 3.1a). The robot can go between two rooms connected by a door, as well
as break through a wall, thus creating a new doorway (Figure 3.1b).
Suppose that the robot world is connected; that is, it has no regions
completely surrounded by walls. Then, we can view the location change as

a side effect of the break operator, which means that we use this operator
only for making new doorways. If the only goal is to move the robot, the
solver disregards break and uses go operators. This restriction reduces the
branching factor and improves the efficiency of search (see Section 3.3).
Machine shop. Next, consider a machine shop that allows cutting, drilling,
polishing, painting, and other machining operations. A solver has to generate
plans for producing mechanical parts of different quality. The production of
higher-quality parts requires more expensive operations. The solver can use
expensive operations to make low-quality parts, which simplifies the search
but leads to suboptimal solutions. For example, it may choose high-precision
drilling instead of normal drilling, which would result in a costly solution.
The use of primary effects enables the system to avoid such situations.
For example, we may view making a hole as a side effect of high-quality
drilling, and the precise position of the hole as its primary effect. Then,
the solver chooses high-quality drilling only when precision is important.
In Section 3.6.2, we give a formal description of this domain and present
experiments on the use of primary effects to select machining operations.

3.1.2 Main definitions

We extend the robot example and use it to illustrate the main notions related
to primary effects. The extended domain includes a robot, a ball, and four
rooms (Figure 3.2a). To describe its current state, we have to specify the
location of the robot and the ball, and list the pairs of rooms connected by
doors. We use three predicates, (robot-in <room>), (ball-in <room>), and (door
<from> <to>), as shown in Figures 3.2(b) and 3.2(c). For example, the literal
(robot-in room-1) means that the robot is in room-1, (ball-in room-4) means that
the ball is in room-4, and (door room-1 room-2) means that room-1 and room-2
are connected by a doorway.
The robot can go between two rooms connected by a door, carry the ball,
throw it through a door into an adjacent room, and break through a wall. We
encode these actions by the four operators in Figure 3.2(d). In addition, the
domain includes the inference rule add-door (Figure 3.2e), which ensures
that every door provides a two-way connection; that is, (door <to> <from>)
implies (door <from> <to>).
The operators in the robot domain have constant costs (Figure 3.2d),
which is a special case of cost functions (see Section 2.3.1). We measure the
quality of a solution by the total cost of its operators; the smaller the cost,
the better the solution. An optimal solution to a given problem is a com-
plete solution that has the lowest cost. For instance, suppose that the solver
needs to move the robot from room-1 to room-4, and it uses three go
operators, "go(room-1,room-2), go(room-2,room-3), go(room-3,room-4)," with the
total cost of 2 + 2 + 2 = 6. This solution is not optimal because the same goal
can be achieved by the operator break(room-1,room-4) with a cost of 4.

(a) Map of the robot world.

(b) Objects and their types:
room-1, room-2, room-3, room-4: type Room

(c) State description:
(robot-in room-1)      (ball-in room-4)
(door room-1 room-2)   (door room-2 room-1)
(door room-2 room-3)   (door room-3 room-2)
(door room-3 room-4)   (door room-4 room-3)

(d) Operators:

go(<from>,<to>)
  <from>, <to>: type Room
  Pre: (robot-in <from>), (door <from> <to>)
  Eff: del (robot-in <from>), add (robot-in <to>)
  Cost: 2

carry(<from>,<to>)
  <from>, <to>: type Room
  Pre: (robot-in <from>), (ball-in <from>), (door <from> <to>)
  Eff: del (robot-in <from>), add (robot-in <to>),
       del (ball-in <from>), add (ball-in <to>)
  Cost: 3

throw(<from>,<to>)
  <from>, <to>: type Room
  Pre: (robot-in <from>), (ball-in <from>), (door <from> <to>)
  Eff: del (ball-in <from>), add (ball-in <to>)
  Cost: 2

break(<from>,<to>)
  <from>, <to>: type Room
  Pre: (robot-in <from>)
  Eff: del (robot-in <from>), add (robot-in <to>), add (door <from> <to>)
  Cost: 4

(e) Inference rule:

Inf-Rule add-door(<from>,<to>)
  <from>, <to>: type Room
  Pre: (door <to> <from>)
  Eff: add (door <from> <to>)

Fig. 3.2. Robot Domain. We give an example of a world state (a), the encoding
of this state (b, c), and the list of operators and inference rules (d, e). The robot
can go between rooms, carry and throw a ball, and break through walls.
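For concreteness, here is a small Python sketch of the cost measure applied to
the two solutions discussed above; the dictionary of operator costs simply
restates Figure 3.2(d), and the tuple encoding of operator instances is our own
assumption.

OPERATOR_COSTS = {"go": 2, "throw": 2, "carry": 3, "break": 4}

def solution_cost(solution):
    # Quality measure of Section 3.1.2: the total cost of the operators.
    return sum(OPERATOR_COSTS[name] for name, *_ in solution)

three_gos = [("go", "room-1", "room-2"), ("go", "room-2", "room-3"),
             ("go", "room-3", "room-4")]
one_break = [("break", "room-1", "room-4")]
print(solution_cost(three_gos), solution_cost(one_break))  # 6 4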

If an operator has several effects, we may choose certain important effects
among them and use the operator only for achieving these important effects.
The chosen effects of the operator are its primary effects, and the others are
side effects. For example, we can view (door <from> <to>) as a primary effect
of break, and (robot-in <to>) as its side effect. We choose primary effects
among both unconditional and conditional effects. If a conditional effect has
several actions, we can divide them into primary and side actions.
Inference rules may also have primary and side effects. For example, if the
effect of add-door is a side effect, PRODIGY never adds it to the tail. Note that
primary effects do not affect the forced application of eager inference rules.
If an eager rule has no primary effects, then Backward-Chainer disregards it,
but the system applies it in the forced forward chaining.
We now define the notion of primary-effect justification, which character-
izes solutions constructed with primary effects; it is similar to the definition of

Fig. 3.3. Definition of a justified effect. If a literal l is a precondition of y and
an effect of x, and no operator or inference rule between x and y has an identical
effect, then l is a justified effect of x.

[Figure 3.4: (a) Using the break operator to get into room-4. (b) Using a
sequence of go operators.]

Fig. 3.4. Two solutions for moving the ball into room-3. To reach the ball, the
robot can either break a wall (a) or go around through room-2 and room-3 (b).
Arrows show the achievement of the goal literal and operator preconditions.

justification by Knoblock et al. [1991]. This notion is independent of specific
solvers; it helps to identify general properties of search with primary effects.
We have already defined a justified effect of an operator or inference rule in
a complete solution (see Section 2.5.4). Recall that an effect literal is justified
if it is the last effect that achieves a subgoal or negates the conditions of
a clobber. We illustrate this notion in Figure 3.3, where l is a precondition
of y and a justified effect of x. For example, consider the task of bringing the
ball to room-3 and its solution in Figure 3.4(a). The effect (robot-in room-4) of
break and the effect (ball-in room-3) of throw are justified, whereas the other
effects are not.
Note that an effect literal may be justified even if it holds before the
execution of the operator. To repeat the related example from Section 2.5.4,
consider some solution step1, step2, ..., stepn and suppose that step1, step2,
and step3 all achieve some precondition of step4. Then, the corresponding
effect literal of step3 is justified since it is the last among them, whereas the
identical effects of step1 and step2 are not justified.
A complete solution is primary-effect justified if every operator and every
lazy rule has a justified primary effect. Informally, it means that no operator
or inference rule is used for its side effects. For example, suppose that the
primary effects are as shown in Table 3.1(a); in particular, (robot-in <to>)
is a primary effect of go and a side effect of break. Then, the solution in

Table 3.1. Selections of primary effects for the Robot Domain in Figure 3.2.
Observe that the first selection does not allow PRODIGY to achieve deletion goals,
such as "not (robot-in room-1)."

(a)
operators              primary effects
go(<from>,<to>)        add (robot-in <to>)
carry(<from>,<to>)     add (ball-in <to>)
throw(<from>,<to>)     add (ball-in <to>)
break(<from>,<to>)     add (door <from> <to>)
add-door(<from>,<to>)  (none)

(b)
operators              primary effects
go(<from>,<to>)        del (robot-in <from>), add (robot-in <to>)
carry(<from>,<to>)     add (ball-in <to>)
throw(<from>,<to>)     del (ball-in <from>), add (ball-in <to>)
break(<from>,<to>)     add (door <from> <to>)
add-door(<from>,<to>)  (none)

Figure 3.4(a) is not primary-effect justified because it does not utilize the
primary effect of break. On the other hand, the solution with go operators
in Figure 3.4(b) is justified.
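The following Python sketch checks primary-effect justification for the
two-step solution of Figure 3.4(a) under strong simplifications: clobbers,
conditional effects, and inference rules are ignored, and the required literals,
with the positions where they are needed, are supplied by hand.

def last_achievers(solution, requirements):
    # For each required literal, the last step whose effect achieves it
    # provides the justified effect (cf. Section 2.5.4).
    justified = {}
    for literal, needed_at in requirements.items():
        for i in range(needed_at - 1, -1, -1):
            if literal in solution[i][1]:
                justified[literal] = i
                break
    return justified

def primary_effect_justified(solution, requirements, primary):
    # Every step must have at least one justified primary effect.
    justified = last_achievers(solution, requirements)
    has_justified_primary = set()
    for literal, i in justified.items():
        if literal in primary.get(solution[i][0], set()):
            has_justified_primary.add(i)
    return has_justified_primary == set(range(len(solution)))

# Figure 3.4(a): break's justified effect (robot-in room-4) is only a
# side effect under Table 3.1(a), so the solution is not justified.
plan = [("break", {("robot-in", "room-4"), ("door", "room-1", "room-4")}),
        ("throw", {("ball-in", "room-3")})]
reqs = {("robot-in", "room-4"): 1,   # precondition of throw (step 1)
        ("ball-in", "room-3"): 2}    # goal literal, needed after the last step
primary = {"break": {("door", "room-1", "room-4")},
           "throw": {("ball-in", "room-3")}}
print(primary_effect_justified(plan, reqs, primary))  # False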

3.1.3 Search algorithm

We now explain the role of primary effects in goal-directed reasoning. When
adding a new operator to an incomplete solution, a solver selects it among the
operators with primary effects matching the current subgoal. After inserting
the selected operator into the solution, the solver may utilize its side effects to
achieve other subgoals. If the solver uses inference rules in backward chaining,
the same restrictions apply to the choice of matching rules. This general
principle underlies the ABTWEAK and PRODIGY implementations of search
with primary effects.
In Figure 3.5, we give the corresponding modification of the PRODIGY
backward-chainer, which differs from the original procedure only in Step 2c.
We do not modify the other two procedures, Base-PRODIGY and Operator-
Application (Figure 2.8). For example, suppose that the primary effects are
as shown in Table 3.1(a) and consider a problem with two goals, (door room-1
room-4) and (ball-in room-3), as shown in Figure 3.6(a). The original procedure
would consider two alternatives for achieving the first goal: the break op-
erator and the add-door rule. On the other hand, Prim-Back-Chainer uses
break because add-door has no primary effects.
When PRODIGY applies an operator, it uses both primary and side effects
in updating the current state, and it may later utilize useful side effects. For

Prim-Back-Chainer
1c. Pick a literal l among the current subgoals.
    Decision point: Choose one of the subgoal literals.
2c. Pick an operator or inference rule step
    that achieves l as a primary effect.
    Decision point: Choose one of such operators and rules.
3c. Add step to the tail and establish a link from step to l.
4c. Instantiate the free variables of step.
    Decision point: Choose an instantiation.
5c. If the effect achieving l has conditions,
    then add them to step's preconditions.

Fig. 3.5. Backward-chaining procedure that uses primary effects. When adding a
new step to the tail, Prim-Back-Chainer chooses among the operators and inference
rules with primary effects matching the current subgoal (line 2c).
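As an illustration of line 2c, here is a minimal Python sketch of the candidate
filter; indexing primary effects by predicate name is our own simplification of
the actual matching.

def matching_steps(subgoal_predicate, steps, primary_effects):
    # Line 2c of Prim-Back-Chainer: candidates are only those operators
    # and inference rules that achieve the subgoal as a primary effect.
    return [step for step in steps
            if subgoal_predicate in primary_effects.get(step, set())]

primary = {"break": {"door"}, "add-door": set(), "throw": {"ball-in"}}
# For the goal (door room-1 room-4), break matches, whereas add-door
# does not, since add-door has no primary effects.
print(matching_steps("door", ["break", "add-door"], primary))  # ['break']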

[Figure 3.6: (a) Achieving the first goal. (b) Achieving the second goal.
(c) Complete solution.]

Fig. 3.6. PRODIGY search with primary effects. Note that break is the only operator
that achieves the first goal as a primary effect, and throw is the only match for
the second goal.

[Figure 3.7: the partial-order solution with break and throw, in which a side
effect of break is unified with a matching precondition of throw.]

Fig. 3.7. ABTWEAK search with primary effects.

example, suppose that it applies break and then adds throw(room-4,room-3)
to achieve (ball-in room-3), as shown in Figure 3.6(b). Since break achieves the
precondition (robot-in room-4) of throw, PRODIGY can apply throw. Thus, it
uses two effects of break (Figure 3.6c), one of which is a side effect.
ABTWEAK also chooses operators with primary effects matching the cur-
rent subgoal, but it has a different mechanism for utilizing side effects. After
inserting an operator into an incomplete solution, the system unifies the op-
erator's effects with matching subgoals. For example, it may add break and
throw for achieving the two goals, and then unify a side effect of break with
the corresponding precondition of throw (Figure 3.7). The reader may find a
description of this algorithm in the articles on ABTWEAK [Yang and Murray,
1994; Yang et al., 1996; Fink and Yang, 1997].
Observe that, if all effects of all operators and inference rules are primary,
their use is identical to the search without primary effects. This rule holds
for all solvers that use primary effects, including ABTWEAK and PRODIGY.

3.2 Completeness of primary effects

We discuss possible problems of search with primary effects and ways to avoid
them (Section 3.2.1). We then derive a condition for completeness of search
(Section 3.2.2), which underlies the algorithm for learning primary effects.

3.2.1 Completeness and solution costs


Primary effects reduce the search space and usually lead to pruning some
solution branches. An inappropriate selection of primary effects may result
in pruning all solutions, thus causing a failure on a solvable problem. For ex-
ample, consider the robot world with the primary effects in Table 3.1(a) and
suppose that the robot has to vacate room-1; that is, the goal is "not (robot-in
room-1)." The robot can achieve it by going to room-2 or by breaking into
room-3 or room-4. If a solver uses primary effects, it will not find either solu-
tion because "del (robot-in <room>)" is not a primary effect of any operator.
To preserve completeness, we have to select additional primary effects (Ta-
ble 3.1b). We discuss two factors that determine completeness: the existence
of a primary-effect justified solution and the solver's ability to find it.
Complete selections. If every solvable problem in a domain has a primary-
effect justified solution, we say that the selection of primary effects is com-
plete. This definition of a complete selection is independent of a specific
solver. For example, the choice of primary effects in Table 3.1(a) is in-
complete, which causes a failure on some problems, whereas the selection
in Table 3.1(b) is complete. Observe that, if a selection includes all effects of
all operators and inference rules, then it is trivially complete, which implies
that every domain has at least one complete selection.
Primary-effect complete search. If a solver uses primary effects and can
solve every problem that has a primary-effect justified solution, it is called
primary-effect complete. The PRODIGY search does not satisfy this condition.
On the other hand, the extended algorithm of Section 2.5 is primary-effect
complete, if the domain does not include eager inference rules and functional
types. The proof is similar to the completeness proof in Section 2.5.4.
Note that, if all effects are primary, then a solver with this completeness
property finds a solution for every solvable problem; hence, primary-effect
completeness implies standard completeness. The reverse does not hold; that
is, a complete solver may not be primary-effect complete. For example, con-
sider the extended algorithm and suppose that we disable the backtracking
over the choice of a subgoal in Line Ic (Figure 2.32). This modification pre-
serves standard completeness but compromises primary-effect completeness.
To illustrate the incompleteness, we consider the primary effects in Ta-
ble 3.1(a) and the problem with two goals, "not (robot-in room-1)" and "(robot-
in room-2)." It has a simple primary-effect justified solution (Figure 3.8), but
the modified algorithm may fail. If the algorithm begins by choosing the first
goal, it does not find a matching primary effect and terminates with failure.
Cost increase. Even if primary effects do not compromise completeness,
they may lead to finding a costly solution. For example, suppose that the
robot is initially in room-4, and the goal is (door room-1 room-4). The optimal
solution utilizes the inference rule: "break(room-4,room-1), add-door(room-
1,room-4)." If the solver uses add-door in backward chaining, it can construct
the tail in Figure 3.9(a), which leads to the optimal solution.

[Figure: the initial state and the goal statement "not (robot-in room-1)"
and "(robot-in room-2)"; go(room-1,room-2) achieves both goals.]

Fig. 3.8. Solution that requires backtracking over the choice of a subgoal. The
thick arrow marks the justified primary effect.

(a) Backward chaining without primary effects: a tail with
add-door(room-1,room-4) and break(room-4,room-1).

(b) Backward chaining using primary effects: a tail with a chain
of go operators and break(room-1,room-4).

Fig. 3.9. Tails that lead to (a) the optimal solution and (b) the best primary-effect
justified solution.

Now suppose we use the primary effects in Table 3.1(b). Since add-door
has no primary effects, the solver chooses break(room-1,room-4) for achieving
the goal, builds the tail in Figure 3.9(b), and produces the solution "go(room-
4,room-3), go(room-3,room-2), go(room-2,room-1), break(room-1,room-4)."
In this example, the minimal cost of a primary-effect justified solution is
2 + 2 + 2 + 4 = 10, whereas the cost of the optimal solution is 4. The ratio of
these costs, 10/4 = 2.5, is called the cost increase for the given problem; it
measures the deterioration of solution quality due to primary effects.

3.2.2 Condition for completeness

To ensure completeness, we have to provide a complete selection and a
primary-effect complete solver. The first of the two requirements is the main
criterion for selecting primary effects. We now derive a condition for satisfying
this requirement.
Replacing sequence. Consider some fully instantiated operator or infer-
ence rule, denoted step, an initial state I that satisfies the preconditions
of step, and the goal of achieving all side effects of step and preserving all fea-
tures of the initial state that are not affected by step. Formally, this goal is the
conjunction of the side effects of step and the initial-state literals unaffected
by step. We give an algorithm for computing this conjunction in Figure 3.10.
Since the resulting goal is a function of step and I, we denote it by G(step, I).
Generate-Goal(step, I)
The input includes an instantiated operator or inference rule, step,
and an initial state I that satisfies the preconditions of step.
Create a list of literals, which is initially empty.
For every side-effect literal side of step:
If step adds this literal, then add side to the list of literals.
If step deletes this literal, then add "not side" to the list of literals.
For every literal init of the initial state I:
If step does not add or delete this literal, then add init to the list of literals.
Return the conjunction of all literals in the list.

Fig. 3.10. Generating the goal G(step, I) of a replacing sequence. When the algo-
rithm processes the effect literals of step, it considers all unconditional effects and
the if-effects whose conditions hold in the state I.
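
A direct Python rendering of this procedure may be helpful; the representation
of steps and literals below is an assumption of the sketch, not the book's
implementation.

    def generate_goal(step, initial_state):
        # Conjunction G(step, I): every side effect of step, plus every
        # initial-state literal that step neither adds nor deletes.
        goal = [lit for lit in step.side_adds]              # added side effects
        goal += [('not', lit) for lit in step.side_dels]    # deleted side effects
        affected = set(step.adds) | set(step.dels)          # all effects of step
        goal += [lit for lit in initial_state if lit not in affected]
        return goal                                         # read as a conjunction

As in Figure 3.10, the adds and dels fields are understood to contain all
unconditional effects and the if-effects whose conditions hold in I.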

If a solver applies step to the initial state, it achieves all goal literals;
however, this solution is not primary-effect justified. A replacing sequence
of operators and inference rules is a primary-effect justified solution that
achieves the goal G(step, I) from the same initial state I; that is, it achieves all
side effects of step and leaves unchanged the other literals of I. For example,
consider the operator break(room-1,room-4), with side effects "del (robot-in
room-4)" and "add (robot-in room-1)," and the state in Figure 3.2. For this
operator and initial state, we can construct the replacing sequence "go(room-
1,room-2), go(room-2,room-3), go(room-3,room-4)."
The cost of the cheapest replacing sequence is called the replacing cost
of step in the state I. For example, the cheapest replacing sequence for
break(room-1,room-4) consists of three go operators; hence, its replacing cost
is 6. Note that the replacing cost may be smaller than the cost of step. For
example, suppose that carry has two primary effects, "del (robot-in <from>)"
and "add (robot-in <to>)," and consider its instantiation carry(room-1,room-
2). The one-operator replacing sequence "go(room-1,room-2)" has a cost of 2,
which is smaller than carry's cost of 3.
Completeness condition and its proof. We state a completeness condi-
tion in terms of replacing sequences:

Completeness: Suppose that, for every fully instantiated operator
and inference rule, and every initial state that satisfies its precon-
ditions, there is a primary-effect justified replacing sequence. Then,
the selection of primary effects is complete.
Limited cost increase: Suppose that there is a positive real
value C such that, for every instantiated operator op and every ini-
tial state, the replacing cost is at most C · cost(op). Suppose further
that, for every inference rule and every state, the replacing cost is 0.
Then, the cost increases of all problems are at most max(1, C).

The proof is based on the observation that, given a problem and its op-
timal solution, we can substitute all operators and inference rules with re-
placing sequences. We thus obtain a primary-effect justified solution with
cost at most C times larger than the cost of the optimal solution. To for-
malize this observation, we consider a problem with an optimal solution
"step_1, step_2, ..., step_n," and we construct its primary-effect justified solution.
Suppose that the completeness condition holds; that is, we can find a
replacing sequence for every operator and inference rule. If step_n is not
primary-effect justified, we substitute it with the cheapest replacing sequence.
If step_{n-1} is not primary-effect justified in the resulting solution, we also
substitute it with the cheapest replacing sequence. We repeat it for all other
operators and inference rules, considering them in the reverse order, from
step_{n-2} to step_1. Observe that, when we replace some step_i, all steps after
it remain primary-effect justified, which implies that the replacement of all
steps in the reverse order leads to a primary-effect justified solution.
Now suppose that the condition for limited cost increase also holds. Then,
for every replaced step_i, the cost of the replacing sequence is at most
C · cost(step_i), which implies that the total cost of the primary-effect justified
solution is at most max(1, C) · (cost(step_1) + cost(step_2) + ... + cost(step_n)).
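
The substitution argument can be restated procedurally; in the sketch below,
is_justified and cheapest_replacement stand for the test and the search
described in the text, and are assumptions of this illustration.

    def make_justified(solution):
        # Replace non-justified steps in reverse order, from step_n to step_1;
        # each replacement leaves all later steps primary-effect justified.
        for i in reversed(range(len(solution))):
            if not is_justified(solution[i], solution):
                # splice in the cheapest replacing sequence (a list of steps)
                solution[i:i+1] = cheapest_replacement(solution[i])
        return solution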
Use of the completeness condition. The completeness condition suggests
a technique for selecting primary effects: we should consider instantiated op-
erators, along with states that satisfy their preconditions, and ensure that
we can always find a replacing sequence. Search through all instantiated op-
erators and initial states is usually intractable; however, we can guarantee
a high probability of completeness by considering a small random sample
of operator instantiations and states. We use this probabilistic approach to
design a learning algorithm that selects primary effects (see Section 3.5).

3.3 Analysis of search reduction

We present an analytical comparison of search with and without primary ef-
fects, and identify the factors that determine the utility of primary effects. We
show that primary effects reduce the branching factor but increase the search
depth, and efficiency depends on the trade-off between these two factors. The
comparison is an approximation based on simplifying assumptions; it is sim-
ilar to the analysis of abstraction search by Korf [1987] and Knoblock [1991].
Although real domains usually do not satisfy these assumptions, experiments
have confirmed the analytical predictions (see Sections 3.6 and 3.7).
Simplifying assumptions. When a solver works on a problem, it expands
a tree-structured search space; the nodes of this space correspond to inter-
mediate incomplete solutions. The solver's running time is proportional to
the size of the expanded space [Minton et al., 1991]. We estimate the space

size for search with and without primary effects, and use these estimates to
analyze the utility of primary effects.
For simplicity, we assume that the domain has no inference rules. When
adding a new operator to achieve some subgoal, the solver may use any
operator with a matching primary effect. To estimate the number of match-
ing operators, we denote the number of nonstatic predicates in the domain
by N, and the total number of primary effects in all operators by PE, which
stands for "primary effects." That is, we count the number of primary effects,
#(Prim(op)), of each operator op and define PE as the sum of these numbers:

    PE = Σ_op #(Prim(op)).

A nonstatic predicate gives rise to two types of subgoals: adding and delet-
ing its instantiations. Thus, the average number of operators that match a
subgoal is PE/(2·N). If we assume that the number of matching operators is
the same for all subgoals, then this number is exactly PE/(2·N).
After selecting an operator, the solver picks the operator's instantiation
and chooses the next subgoal. In addition, it may apply some of the tail op-
erators. These choices lead to new branches of the search space; we denote
their number by BF, which stands for "branching factor." The full decision
cycle of PRODIGY includes choosing an operator for the current subgoal, in-
stantiating it, applying some tail operators, and choosing the next subgoal
(Figure 2.9); thus, it gives rise to BF · PE/(2·N) new branches.
To estimate the branching factor without primary effects, we define E as
the total number of all effects in all operators:

    E = Σ_op #(Prim(op)) + Σ_op #(Side(op)).

Then, the branching of the PRODIGY decision cycle is BF · E/(2·N). Finally, we
assume that all operators have the same cost. We summarize the main as-
sumptions in Figure 3.11, and the related notation in Figure 3.12.
Exponential efficiency improvement. First, we consider search without
primary effects and assume that the resulting solution contains n operators.
Since the branching of the decision cycle is BF · E/(2·N), we conclude that the
number of nodes with one-operator incomplete solutions is BF · E/(2·N), the
number of nodes with two-operator solutions is (BF · E/(2·N))^2, and so on.
The total number of nodes is

    1 + BF · E/(2·N) + (BF · E/(2·N))^2 + ... + (BF · E/(2·N))^n
      = ((BF · E/(2·N))^(n+1) − 1) / (BF · E/(2·N) − 1).            (3.1)
Now suppose that we use primary effects, and the cost increase is C; that
is, the resulting solution has C · n operators, which translates into the pro-
portional increase of the search depth. The branching of adding an operator
with a matching primary effect is BF · PE/(2·N); hence, the number of nodes is

1. The domain has no inference rules.
2. The number of matching operators is the same for all subgoals.
3. The branching factor BF is constant throughout the search.
4. All operators have the same cost.

Fig. 3.11. Simplifying assumptions.

N    number of nonstatic predicates in the problem domain
PE   total number of primary effects in all operators
E    total number of all effects in all operators
BF   overall branching factor of instantiating a new operator,
     applying operators, and choosing the next subgoal
C    cost increase of problem solving with primary effects
n    number of operators in the solution found without primary effects

Fig. 3.12. Notation in the search-reduction analysis.

    ((BF · PE/(2·N))^(C·n+1) − 1) / (BF · PE/(2·N) − 1).            (3.2)
We next determine the ratio of running times with and without primary
effects. Since the running time is proportional to the number of nodes in the
search space, the time ratio is determined by the ratio of search-space sizes:

    [((BF · PE/(2·N))^(C·n+1) − 1) / (BF · PE/(2·N) − 1)]
    / [((BF · E/(2·N))^(n+1) − 1) / (BF · E/(2·N) − 1)]
    ≈ (BF · PE/(2·N))^(C·n) / (BF · E/(2·N))^n
    = ((BF · PE/(2·N))^C / (BF · E/(2·N)))^n.                       (3.3)
We denote the base of this exponent by r:

    r = (BF · PE/(2·N))^C / (BF · E/(2·N))
      = (2·N)/(E·BF) · (PE·BF/(2·N))^C.                             (3.4)

Next, we rewrite Expression 3.3 for the time ratio as r^n, which shows that
the savings in running time grow exponentially with the solution length n.
Conditions for efficiency improvement. Primary effects improve the ef-
ficiency only if r < 1, which means that (2·N)/(E·BF) · (PE·BF/(2·N))^C < 1.
We solve this inequality with respect to C and conclude that primary effects
improve the performance when

    C < (lg E + lg BF − lg(2·N)) / (lg PE + lg BF − lg(2·N)).       (3.5)

Observe that E > PE, which implies that the right-hand side of Inequality 3.5
is larger than 1. Therefore, if primary effects do not cause a cost increase, that
is, C = 1, then their use reduces the running time. This observation, however,
does not imply that we should minimize the cost increase; primary effects
with a significant cost increase sometimes give a greater search reduction.
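
For concreteness, the following sketch evaluates Equations 3.4 and 3.5
numerically; the input values in the comment are made up for illustration.

    from math import log

    def time_ratio_base(N, PE, E, BF, C):
        # Base r of the exponent in Equation 3.4.
        return (2.0 * N) / (E * BF) * (PE * BF / (2.0 * N)) ** C

    def max_useful_cost_increase(N, PE, E, BF):
        # Right-hand side of Inequality 3.5: the largest cost increase
        # for which primary effects still reduce the running time.
        return ((log(E) + log(BF) - log(2 * N))
                / (log(PE) + log(BF) - log(2 * N)))

    # For example, with N = 4, E = 10, PE = 6, BF = 3, and C = 1.5,
    # r is 0.9 and the bound of Inequality 3.5 is about 1.63.

Since the time ratio is r^n, even an r slightly below 1 yields large savings
on long solutions.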

We can draw some other conclusions from the expression for r (Equa-
tion 3.4). The base of the exponent, PE·BF/(2·N), is the branching factor of the
search with primary effects; hence, it is larger than 1. We conclude that r
grows with an increase of C, which implies that the time savings increase
with a reduction of C. Also, r grows with an increase of PE; therefore, the
time savings increase with a reduction of the number of primary effects PE.
The efficiency depends on the trade-off between the number of primary ef-
fects PE and the cost increase C; as we select more primary effects, PE
increases, whereas C decreases. To minimize r, we have to strike the right
balance between PE and C.
Discussion of the assumptions. Although the derivation is based on sev-
eral strong assumptions, the main conclusions are usually valid even when the
search space does not have the assumed properties. In particular, the removal
of Assumptions 1 and 4 in Figure 3.11 does not change the quantitative result
(Expression 3.3), but it requires a more complex definition of BF.
The described estimation of the time ratio works not only for PRODIGY
but also for most backward-chaining solvers. We have applied it to ABTWEAK
[Fink and Yang, 1995] and obtained the identical expression for the time ratio
(Expression 3.3), but BF in that derivation had a different meaning. Since
ABTWEAK creates a new node by inserting a new operator or imposing a
constraint on the order of old operators, we have defined BF as the number
of different ways to impose constraints after inserting an operator.
We have estimated the total number of nodes up to the depth of the
solution node. If a solver uses breadth-first search, it visits all of them; how-
ever, PRODIGY uses a depth-first strategy and may explore a smaller space.
Its effective branching factor is the average number of alternatives consid-
ered in every decision point, which depends on the frequency of backtracking
[Nilsson, 1971]. This factor is usually proportional to the overall branching
factor of the search space, which leads to the same time-ratio estimate, with
BF adjusted to account for the smaller effective branching.
The key assumption is that the effective branching factor of the decision
cycle, BF · PE/(2·N), is proportional to the number of primary effects, PE. If a
domain does not satisfy it, we cannot use Expression 3.3. For example, if a
depth-first solver always finds a solution without backtracking, then primary
effects do not reduce its search.
To summarize, the analysis has shown that an appropriate choice of pri-
mary effects significantly reduces the search, but poorly selected effects may
increase the search. Experiments support this conclusion for most domains,
even when the search space does not satisfy the simplifying assumptions.

3.4 Automatically selecting primary effects


We describe an algorithm that chooses primary effects of operators and in-
ference rules. It uses simple heuristics that usually lead to a near-complete

Type of description change: Selecting primary effects of operators and inference
rules.
Purpose of description change: Minimizing the number of primary effects, while
ensuring near-completeness and a limited cost increase.
Use of other algorithms: None.
Required input: Description of the operators and inference rules.
Optional input: Cost-increase limit C; pre-selected primary and side effects.

Fig. 3.13. Specification of the Chooser algorithm.

selection; however, it does not guarantee completeness. First, we present the
Chooser algorithm, which combines several heuristics for selecting primary ef-
fects (Section 3.4.1). Second, we give the Matcher algorithm, which improves
the effectiveness of Chooser (Section 3.4.2). If problem solving with the re-
sulting selection reveals its incompleteness, the system chooses additional
primary effects (Section 3.5).

3.4.1 Selection heuristics

We present an algorithm, called Chooser, that processes a list of operators and
generates a selection of primary effects. The user has two options for affecting
the choice of primary effects. First, she may specify a desirable limit C on
the cost increase. Second, she may pre-select some primary and side effects
before applying Chooser. If an effect is not pre-selected as primary or side,
it is called a candidate effect. In particular, if the user does not provide any
pre-selection, all effects of operators and inference rules are candidate effects.
The algorithm preserves the initial pre-selection and chooses additional
primary effects among the candidate effects. If some candidate effects do not
become primary, PRODIGY treats them as side effects; that is, it does not
distinguish between candidate and side effects.
We summarize the specification of Chooser in Figure 3.13 and give the
algorithm in Figure 3.14. We do not show pseudocode for two procedures,
Choose-Del-Operator and Choose-Del-Inf-Rule, because they are similar to
Choose-Add-Operator and Choose-Add-Inf-Rule. The algorithm consists of
two parts: Choose-Initial and Choose-Extra. The first part generates an initial
selection of primary effects. The second part is an optional procedure for
choosing additional primary effects; it may be disabled by the user.
Generating initial selection. Choose-Initial ensures that every nonstatic
predicate is a primary effect of some operator or inference rule, which is
necessary for completeness. For every nonstatic predicate pred that is not
a primary effect in the user's pre-selection, the algorithm looks for some
operator with a candidate effect pred and makes pred a primary effect of
this operator. If the predicate is not a candidate effect of any operator, the
algorithm makes it a primary effect of some inference rule.

Chooser(C)
The algorithm optionally inputs a cost-increase limit C, whose default value is infinity.
It accesses the operators, inference rules, and pre-selected primary and side effects.
Call Choose-Initial(C) to generate an initial selection of primary effects.
Optionally call Choose-Extra to select additional primary effects.

Choose-Initial(C) - Ensure that every nonstatic predicate is a primary effect.
For every predicate pred in the domain description:
If some operator adds pred,
then call Choose-Add-Operator(pred, C).
If some operator or inference rule adds pred,
and no operator adds it as a primary effect,
then call Choose-Add-Inf-Rule(pred).
If some operator deletes pred,
then call Choose-Del-Operator(pred, C).
If some operator or inference rule deletes pred,
and no operator deletes it as a primary effect,
then call Choose-Del-Inf-Rule(pred).
Choose-Add-Operator(pred, C) - Make pred a primary effect of some operator.
Determine the cheapest operator that adds pred; let its cost be min-cost.
If some operators, with cost at most C · min-cost, add pred as a primary effect,
then terminate (do not select a new primary effect).
If some operators, with cost at most C · min-cost, add pred as a candidate effect,
then select one of these operators, make pred its primary effect, and terminate.
If some operators, with cost larger than C · min-cost, add pred as a primary effect,
then terminate (do not select a new primary effect).
If some operators, with cost larger than C · min-cost, add pred as a candidate effect,
then select one of these operators and make pred its primary effect.
Choose-Add-Inf-Rule(pred) - Make pred a primary effect of some inference rule.
If some inference rule adds pred as a primary effect,
then terminate (do not select a new primary effect).
If there are inference rules that add pred as a candidate effect,
then select one of these rules and make pred its primary effect.

Choose-Extra - Ensure that every operator and inference rule has a primary effect.
For every operator and inference rule in the domain description:
If it has candidate effects but no primary effects,
then select one of its candidate effects as a new primary effect.

Fig. 3.14. Heuristic selection of primary effects. Choose-Initial generates an ini-
tial selection of primary effects, and then Choose-Extra selects additional primary
effects. We do not show the Choose-Del-Operator and Choose-Del-Inf-Rule proce-
dures, which are analogous to Choose-Add-Operator and Choose-Add-Inf-Rule.
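
A compact Python sketch of Choose-Add-Operator may clarify the cost-bounded
selection; the operator fields used here (cost, adds, candidate, primary) are
assumptions of this illustration.

    def choose_add_operator(pred, C, operators):
        # Make pred a primary 'add' effect of some operator, preferring
        # operators whose cost is at most C times the minimal cost.
        adders = [op for op in operators if pred in op.adds]
        if not adders:
            return
        min_cost = min(op.cost for op in adders)
        cheap = [op for op in adders if op.cost <= C * min_cost]
        for group in (cheap, adders):                  # cheap operators first
            if any(pred in op.primary for op in group):
                return                                 # already a primary effect
            candidates = [op for op in group if pred in op.candidate]
            if candidates:
                candidates[0].primary.add(pred)        # promote a candidate effect
                return

The second pass over all adders corresponds to the last two cases in Figure 3.14,
which consider operators costlier than C · min-cost only when no cheap operator
qualifies; the choice among several candidates is arbitrary here, whereas the
actual algorithm applies the optional heuristics described below.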

Table 3.2. Steps of selecting primary effects.

operators               primary effects

(a) Pre-selection by the user.
carry(<from>,<to>)      add (ball-in <to>)

(b) Choose-Initial: Initial selection of primary effects.
go(<from>,<to>)         del (robot-in <from>)
                        add (robot-in <to>)
throw(<from>,<to>)      del (ball-in <from>)
break(<from>,<to>)      add (door <from> <to>)

(c) Choose-Extra: Heuristic selection of extra effects.
add-door(<from>,<to>)   add (door <from> <to>)

(d) Completer: Learned primary effects.
throw(<from>,<to>)      add (ball-in <to>)

If the user has specified a limit C on the cost increase, the algorithm
takes it into account when choosing an operator for achieving pred. First, the
algorithm finds the cheapest operator that achieves pred; its cost is denoted
by "min-cost" in Figure 3.14. Then, the algorithm identifies the operators
with costs within C · min-cost, and chooses one of them for achieving pred.
For example, consider the use of Choose-Initial for the Robot Domain in
Figure 3.2. We assume that the desired cost increase is C = 1.5, and the user
has pre-selected "add (ball-in <to>)" as a primary effect of carry (Table 3.2a).
Suppose that the algorithm first selects an operator for "add (robot-in
<to>)." The cheapest operator that has this effect is go with a cost of 2.
Thus, the algorithm looks for an operator, with a candidate effect "add robot-
in," the cost of which is at most 1.5 · 2 = 3. The domain includes two such
operators, go and carry. If the user does not provide a heuristic for selecting
among available operators, Choose-Initial picks the cheaper operator; thus, it
selects go for adding "robot-in" as a primary effect. Similarly, it selects go for
"del robot-in," throw for "del ball-in," and break for "add door" (Table 3.2b).
Optional heuristics. When Choose-Initial picks an operator for achieving
pred, it may have to select among several operators. For example, when it
chooses an operator for "add robot-in," with cost at most 3, it must select
between go and carry. We have used three heuristics for selecting among
several operators.
Choosing an operator without primary effects: If some operator,
with a candidate effect pred, has no primary effects, the algorithm selects this
operator. This heuristic ensures that most operators have primary effects; it
is effective for domains that do not include unnecessary operators.
Preferring an operator with weak preconditions: Choose-Initial
may look for the operator with the weakest preconditions that achieves pred.

This strategy helps to reduce the search, but it may negatively affect the
solution quality. For example, suppose that we employ it in the Robot Do-
main with the cost increase C = 2. Then, the algorithm chooses break for
adding "robot-in" because the preconditions of break are weaker than the
preconditions of go and carry. Then, a solver uses break for moving the
robot between rooms, which reduces the search but leads to costly solutions.
Improving the abstraction hierarchy: If the system utilizes primary
effects in abstraction problem solving, then the abstraction hierarchy may
depend on the selected effects. We have implemented a selection heuristic
that improves the resulting hierarchy. In Section 5.1, we will describe this
heuristic and use it to combine Chooser with an abstraction algorithm.
Choosing additional effects. The optional Choose-Extra procedure en-
sures that every operator and inference rule has a primary effect (Figure 3.14).
If the user believes that the domain has no unnecessary operators, she should
enable this procedure. For example, if we apply Choose-Extra to the Robot
Domain with the initial selection in Table 3.2(a,b), it will select the effect
"add door" of the add-door rule as a new primary effect (Table 3.2c).
Choose-Extra includes two heuristics for selecting a primary effect among
several candidate effects. First, it chooses "add" effects rather than "del"
effects. Second, it prefers predicates achieved by the smallest number of other
operators. If we use Chooser in conjunction with an abstraction generator,
then we disable these two heuristics and instead employ a selection technique
that improves the resulting abstraction (see Section 5.1).
Running time. The time complexity of Choose-Initial and Choose-Extra
depends on the total number of effects in all operators and inference rules,
and on the number of nonstatic predicates. We denote the total number of
effects by E, and the number of nonstatic predicates by N.
For each predicate, Choose-Initial invokes four procedures. First, it calls
Choose-Add-Operator and Choose-DeI-Operator; the running time of both
procedures is proportional to the total number of effects in all operators.
Then, it applies Choose-Add-Inf-Rule and Choose-Del-Inf-Rule; the running
time of these two procedures is proportional to the number of inference-rule
effects. Thus, the complexity of executing all four procedures is O(E), and
the overall time of processing all nonstatic predicates is O(E· N). Finally, the
complexity of Choose-Extra is O(E), which implies that the overall running
time of Chooser is O(E . N).
We have implemented Chooser in Common Lisp and tested it on a Sun
Sparc 5. Choose-Initial takes about (E · N) · 10^-4 seconds, and Choose-Extra
takes about E · 6 · 10^-4 seconds.

3.4.2 Instantiating the operators

If the predicates in a domain description are too general, Chooser may pro-
duce an insufficient selection of primary effects. To illustrate this problem,

[Figure: the room layout, with robot and ball as objects.]

(robot-in <room>)  →  (in robot <room>)
(ball-in <room>)   →  (in ball <room>)

(a) Replacing two predicates with the more general predicate (in <thing> <room>).

operators               primary effects

Pre-selection by the user.
carry(<from>,<to>)      add (in ball <to>)

Choose-Initial: Initial selection of primary effects.
go(<from>,<to>)         del (in robot <from>)
break(<from>,<to>)      add (door <from> <to>)

(b) Resulting incomplete selection.


Fig. 3.15. Generating an incomplete selection due to the use of general predicates.

we consider the Robot Domain and replace the predicates (robot-in <room>)
and (ball-in <room>) with a general predicate (in <thing> <room>), where
<thing> is either robot or ball (Figure 3.15a). If the user has pre-selected "add
(in <thing> <to>)" as a primary effect of carry, then Chooser constructs
the selection in Figure 3.15(b). Since we use a general predicate for (in robot
<room>) and (in ball <room>), the algorithm does not distinguish between
these two effects, and it does not select primary effects for adding (in robot
<room>) and deleting (in ball <room>).
To avoid this problem, we may construct all instantiations of operators
and inference rules, and apply Chooser to select primary effects of these
instantiations. We illustrate the use of this technique, discuss its advantages
and drawbacks, and describe an instantiation algorithm.
After Chooser has processed the instantiations, we may optionally convert
the resulting selection into the corresponding selection for uninstantiated op-
erators. In Figure 3.16, we give a conversion procedure, which loops through
operators and inference rules. We denote the currently considered operator
or inference rule by stepu; the subscript "u" stands for "uninstantiated."
The procedure identifies the effects of stepu that are primary in at least one
instantiation and marks them as primary effects of stepu.

Generalize-Selection
For every uninstantiated operator and inference rule stepu:
For every candidate effect of stepu:
If this effect is primary in some instantiation,
then make it a primary effect of stepu.

Fig. 3.16. Converting an instantiated selection of primary effects into the cor-
responding uninstantiated selection. The procedure marks an effect of stepu as
primary if it has been chosen as primary in at least one instantiation of stepu.
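
The conversion is a one-pass scan; in this Python sketch, each instantiation
is assumed to record the indices of its primary effects, aligned with the effect
list of the uninstantiated stepu.

    def generalize_selection(step_u, instantiations):
        # An effect of step_u becomes primary if it is primary
        # in at least one instantiation of step_u.
        for i, effect in enumerate(step_u.effects):
            if any(i in inst.primary_indices for inst in instantiations):
                step_u.primary_indices.add(i)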

Example of instantiating the operators. Consider a simplified version
of the Robot Domain, which does not include break and add-door, with
the room layout shown in Figure 3.17. We give static predicates that encode
this layout in Figure 3.17 and list all instantiated operators in Figure 3.18.
The list does not include infeasible instantiations, such as go(room-1,room-4),
and the instantiated preconditions do not include static literals that always
hold for the given layout.
Suppose that the user has marked "add (in ball <to>)" as a primary effect
of carry. If Chooser processes the instantiated operators, it generates the
selection in Table 3.3, and then Generalize-Selection constructs the selection
in Table 3.4. The result is similar to the selection in Table 3.2 despite the use
of a more general predicate.
Advantages and drawbacks. The main advantage of processing the in-
stantiated operators is the reduction of Chooser's sensitivity to a specific
domain encoding. In addition, the system can generate instantiations for a
particular class of problems, thus constructing a problem-specific selection.
On the negative side, the generation of all instantiations may lead to
a combinatorial explosion. Furthermore, the user must supply a list of the
allowed objects, and the resulting selection may be inappropriate for problems
with other objects.
Generating feasible instantiations. We next present the Matcher algo-
rithm, which instantiates operators and inference rules. If the user specifies
static literals that always hold, Matcher utilizes them for pruning infeasible
instantiations. We say that these static literals have known truth values.
We show the specification of Matcher in Figure 3.19 and give pseudocode
in Figure 3.20. When Matcher processes an operator or inference rule stepu,
it generates all instantiations of stepu that match the given static literals. In
Figure 3.22, we illustrate the processing of the go operator in the simplified
Robot Domain. Recall that the simplified domain has no break operator;
hence, the predicate (door <from> <to» is static.
First, the algorithm simplifies the precondition expression of stepu by
removing all predicates with unknown truth values (Line 1 in Figure 3.20).
Similarly, it simplifies the condition expressions of all if-effects (Line 2). For
instance, when the algorithm processes go, it prunes the precondition (in robot
<room>), as shown in Figure 3.22(b).

[Figure: the four-room layout.]

Set of Objects
room-1, room-2, room-3, room-4: type Room
robot, ball: type Thing

Static Literals
(door room-1 room-2) (door room-2 room-1)
(door room-2 room-3) (door room-3 room-2)
(door room-3 room-4) (door room-4 room-3)

Fig. 3.17. Encoding of the room layout in the simplified Robot Domain.

go(room-1, room-2)            go(room-2, room-3)            go(room-3, room-4)
Pre: (in robot room-1)        Pre: (in robot room-2)        Pre: (in robot room-3)
Eff: del (in robot room-1)    Eff: del (in robot room-2)    Eff: del (in robot room-3)
     add (in robot room-2)         add (in robot room-3)         add (in robot room-4)

go(room-2, room-1)            go(room-3, room-2)            go(room-4, room-3)
Pre: (in robot room-2)        Pre: (in robot room-3)        Pre: (in robot room-4)
Eff: del (in robot room-2)    Eff: del (in robot room-3)    Eff: del (in robot room-4)
     add (in robot room-1)         add (in robot room-2)         add (in robot room-3)

throw(room-1, room-2)         throw(room-2, room-3)         throw(room-3, room-4)
Pre: (in robot room-1)        Pre: (in robot room-2)        Pre: (in robot room-3)
     (in ball room-1)              (in ball room-2)              (in ball room-3)
Eff: del (in ball room-1)     Eff: del (in ball room-2)     Eff: del (in ball room-3)
     add (in ball room-2)          add (in ball room-3)          add (in ball room-4)

throw(room-2, room-1)         throw(room-3, room-2)         throw(room-4, room-3)
Pre: (in robot room-2)        Pre: (in robot room-3)        Pre: (in robot room-4)
     (in ball room-2)              (in ball room-3)              (in ball room-4)
Eff: del (in ball room-2)     Eff: del (in ball room-3)     Eff: del (in ball room-4)
     add (in ball room-1)          add (in ball room-2)          add (in ball room-3)

carry(room-1, room-2)         carry(room-2, room-3)         carry(room-3, room-4)
Pre: (in robot room-1)        Pre: (in robot room-2)        Pre: (in robot room-3)
     (in ball room-1)              (in ball room-2)              (in ball room-3)
Eff: del (in robot room-1)    Eff: del (in robot room-2)    Eff: del (in robot room-3)
     add (in robot room-2)         add (in robot room-3)         add (in robot room-4)
     del (in ball room-1)          del (in ball room-2)          del (in ball room-3)
     add (in ball room-2)          add (in ball room-3)          add (in ball room-4)

carry(room-2, room-1)         carry(room-3, room-2)         carry(room-4, room-3)
Pre: (in robot room-2)        Pre: (in robot room-3)        Pre: (in robot room-4)
     (in ball room-2)              (in ball room-3)              (in ball room-4)
Eff: del (in robot room-2)    Eff: del (in robot room-3)    Eff: del (in robot room-4)
     add (in robot room-1)         add (in robot room-2)         add (in robot room-3)
     del (in ball room-2)          del (in ball room-3)          del (in ball room-4)
     add (in ball room-1)          add (in ball room-2)          add (in ball room-3)

Fig. 3.18. Fully instantiated operators in the simplified Robot Domain, which
has no break operator and no add-door inference rule. We show only feasible
instantiations, which can be executed in the given room layout (Figure 3.17).

Table 3.3. Primary effects of the fully instantiated operators.

operators              primary effects

Pre-selection by the user.
carry(room-1,room-2)   add (in ball room-2)
carry(room-2,room-3)   add (in ball room-3)
carry(room-3,room-4)   add (in ball room-4)
carry(room-2,room-1)   add (in ball room-1)
carry(room-3,room-2)   add (in ball room-2)
carry(room-4,room-3)   add (in ball room-3)

Choose-Initial: Initial selection of primary effects.
go(room-1,room-2)      del (in robot room-1)
                       add (in robot room-2)
go(room-2,room-3)      del (in robot room-2)
                       add (in robot room-3)
go(room-3,room-4)      del (in robot room-3)
                       add (in robot room-4)
go(room-2,room-1)      add (in robot room-1)
go(room-4,room-3)      del (in robot room-4)
throw(room-1,room-2)   del (in ball room-1)
throw(room-2,room-3)   del (in ball room-2)
throw(room-3,room-4)   del (in ball room-3)
throw(room-4,room-3)   del (in ball room-4)

Choose-Extra: Heuristic selection of extra effects.
go(room-3,room-2)      add (in robot room-2)

Table 3.4. Primary effects of uninstantiated operators in the simplified Robot
Domain, which correspond to the instantiated selection in Table 3.3.

operators              primary effects
carry(<from>,<to>)     add (in ball <to>)
go(<from>,<to>)        del (in robot <from>)
                       add (in robot <to>)
throw(<from>,<to>)     del (in ball <from>)

We give detailed pseudocode for the Remove-Unknown procedure (Fig-
ure 3.21), which deletes predicates with unknown truth values from a boolean
expression. It recursively parses the expression, identifies predicates with un-
known values, and replaces them by boolean constants. If a predicate is inside
an odd number of negations, Remove- Unknown replaces it by false; otherwise,
the predicate is replaced by true. This procedure preserves all feasible instanti-
ations of the original expression; that is, if an instantiation satisfies the initial
expression in at least one state, it also satisfies the simplified expression.

Type of description change: Generating fully instantiated operators and inference
rules.
Purpose of description change: Producing all feasible instantiations, while avoiding
the instantiations that can never be executed.
Use of other algorithms: None.
Required input: Description of the operators and inference rules; all possible values
of the variables in the domain description.
Optional input: Static literals that hold in the initial states of all problems.

Fig. 3.19. Specification of the Matcher algorithm.

Matcher
The algorithm inputs a set of available objects
and a list of static literals that hold in all problems.
It also accesses the operators and inference rules.
For every uninstantiated operator and inference rule stepu:
1. Apply Remove-Unknown (Figure 3.21) to the preconditions of stepu.
(It simplifies the preconditions by pruning the predicates with unknown values.)
2. For every if-effect of stepu,
apply Remove-Unknown to the if-effect conditions.
3. Generate all instantiations of the resulting simplified version of stepu.
4. Convert them into the corresponding instantiations of the original stepu.
5. For every instantiation of stepu,
delete literals with known values from the preconditions.

Fig. 3.20. Generating all feasible instantiations of operators and inference rules.

Matcher generates all possible instantiations of the simplified version of
stepu (Line 3). It uses a standard instantiation procedure, which recursively
parses the preconditions and if-effect conditions, and outputs all instantia-
tions of stepu that match the given static literals. In Figure 3.22(c), we give
a table of all variable instantiations for the go operator. Then, Matcher con-
structs the corresponding instantiations of the full version of stepu (Line 4),
as illustrated in Figure 3.22(d). Finally, it removes the static literals with
known truth values from the resulting instantiations (Line 5); for example,
it removes "door" from the instantiated go operators (Figure 3.22e).
Running time. We now determine the time of generating the instantiations
and selecting their primary effects. It depends on the total number of effects
in all instantiations, and on the number of instantiated nonstatic literals.
We denote the number of all instantiated effects by Ei and the number of
nonstatic literals by Ni, where "i" stands for "instantiated."
The running time of Matcher is proportional to the number of instanti-
ated effects. The per-effect time depends on the complexity of precondition
expressions; in the experiments, it has ranged from 1 to 5 milliseconds. Thus,
Matcher's running time is within Ei · 5 · 10^-3 seconds.

Remove-Unknown(bool-exp, negated)
The input includes a boolean expression, bool-exp, and a boolean value negated,
which is 'true' if bool-exp is inside an odd number of negations.
The algorithm accesses the list of static literals with known truth values.
Check whether bool-exp is a predicate, conjunction, disjunction, negation, or quantification.
Call the appropriate subroutine below and return the resulting expression.

Remove-Predicate(pred, negated)
The input is a predicate, denoted pred.
If the truth values of pred are unknown, then return ¬negated (i.e. 'true' or 'false').
Else, return pred (that is, the unchanged predicate).

Remove-From-Conjunction(bool-exp, negated)
The input is a conjunctive expression, bool-exp, of the form '(and sub-exp sub-exp ...).'
New-Exps := ∅.
For every term sub-exp of the conjunction bool-exp:
    new-exp := Remove-Unknown(sub-exp, negated).
    If new-exp is false, then return false (do not process the remaining terms).
    If new-exp is not true or false (that is, it is an expression with variables),
    then New-Exps := New-Exps ∪ {new-exp}.
If New-Exps is ∅, then return true.
Else, return the conjunction of all terms in New-Exps.

Remove-From-Disjunction(bool-exp, negated)
The input is a disjunctive expression, bool-exp, of the form '(or sub-exp sub-exp ...).'
New-Exps := ∅.
For every term sub-exp of the disjunction bool-exp:
    new-exp := Remove-Unknown(sub-exp, negated).
    If new-exp is true, then return true (do not process the remaining terms).
    If new-exp is not true or false (that is, it is an expression with variables),
    then New-Exps := New-Exps ∪ {new-exp}.
If New-Exps is ∅, then return false.
Else, return the disjunction of all terms in New-Exps.

Remove-From-Negation(bool-exp, negated)
The input is a negated expression, bool-exp, of the form '(not sub-exp).'
Let sub-exp be the expression inside the negation.
new-exp := Remove-Unknown(sub-exp, ¬negated).
If new-exp is true or false, then return ¬new-exp (that is, 'false' or 'true').
Else, return the negation of new-exp (that is, an expression with variables).

Remove-From-Quantification(bool-exp, negated)
The input is a boolean expression, bool-exp, with universal or existential quantification.
Let sub-exp be the expression inside the quantification.
new-exp := Remove-Unknown(sub-exp, negated).
If new-exp is true or false, then return new-exp.
Else, return the quantification of new-exp.

Fig. 3.21. Deletion of the predicates with unknown truth values from a boolean
expression. The Remove-Unknown procedure inputs an expression and recursively
processes its subexpressions; it may return true, false, or an expression with vari-
ables. The auxiliary boolean parameter, denoted negated, indicates whether the
current subexpression is inside an odd number of negations.
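
A compact Python version of Remove-Unknown is sketched below; the expression
format (nested tuples headed by 'and', 'or', or 'not', with any other tuple
treated as a predicate) is an assumption of this sketch, and quantification is
omitted for brevity.

    def remove_unknown(exp, negated, is_known):
        # Predicates with unknown truth values become boolean constants:
        # False under an odd number of negations, True otherwise.
        if exp[0] == 'not':
            sub = remove_unknown(exp[1], not negated, is_known)
            return (not sub) if isinstance(sub, bool) else ('not', sub)
        if exp[0] in ('and', 'or'):
            identity = (exp[0] == 'and')      # true for 'and', false for 'or'
            kept = []
            for term in exp[1:]:
                new = remove_unknown(term, negated, is_known)
                if isinstance(new, bool):
                    if new != identity:
                        return new            # false in 'and' / true in 'or'
                else:
                    kept.append(new)
            return identity if not kept else (exp[0],) + tuple(kept)
        return exp if is_known(exp) else (not negated)   # predicate case

Like the pseudocode, the sketch preserves all feasible instantiations: an
unknown predicate is replaced by the constant that cannot eliminate a
satisfying instantiation.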

Set of Objects
room-1, room-2, room-3, room-4: type Room
robot, ball: type Thing

Static Literals
(door room-1 room-2) (door room-2 room-1)
(door room-2 room-3) (door room-3 room-2)
(door room-3 room-4) (door room-4 room-3)

go(<from>, <to>)
Pre: (in robot <from>)
     (door <from> <to>)
Eff: del (in robot <from>)
     add (in robot <to>)

(a) Input of the Matcher algorithm: Available objects,
static literals with known truth values, and an operator.

go(<from>, <to>)
Pre: (door <from> <to>)
Eff: del (in robot <from>)
     add (in robot <to>)

(b) Removal of the predicates with unknown
truth values from the precondition expression.

<from>   <to>
room-1   room-2
room-2   room-3
room-3   room-4
room-2   room-1
room-3   room-2
room-4   room-3

(c) Generation of all matching instantiations of variables.

go(room-1, room-2)            go(room-2, room-3)            go(room-3, room-4)
Pre: (in robot room-1)        Pre: (in robot room-2)        Pre: (in robot room-3)
     (door room-1 room-2)          (door room-2 room-3)          (door room-3 room-4)
Eff: del (in robot room-1)    Eff: del (in robot room-2)    Eff: del (in robot room-3)
     add (in robot room-2)         add (in robot room-3)         add (in robot room-4)

go(room-2, room-1)            go(room-3, room-2)            go(room-4, room-3)
Pre: (in robot room-2)        Pre: (in robot room-3)        Pre: (in robot room-4)
     (door room-2 room-1)          (door room-3 room-2)          (door room-4 room-3)
Eff: del (in robot room-2)    Eff: del (in robot room-3)    Eff: del (in robot room-4)
     add (in robot room-1)         add (in robot room-2)         add (in robot room-3)

(d) Construction of all operator instantiations.

go(room-1, room-2)            go(room-2, room-3)            go(room-3, room-4)
Pre: (in robot room-1)        Pre: (in robot room-2)        Pre: (in robot room-3)
Eff: del (in robot room-1)    Eff: del (in robot room-2)    Eff: del (in robot room-3)
     add (in robot room-2)         add (in robot room-3)         add (in robot room-4)

go(room-2, room-1)            go(room-3, room-2)            go(room-4, room-3)
Pre: (in robot room-2)        Pre: (in robot room-3)        Pre: (in robot room-4)
Eff: del (in robot room-2)    Eff: del (in robot room-3)    Eff: del (in robot room-4)
     add (in robot room-1)         add (in robot room-2)         add (in robot room-3)

(e) Deletion of the precondition literals with known truth values.

Fig. 3.22. Constructing the instantiations of go in the simplified Robot Domain.



Type of description change: Selecting primary effects of operators and inference
rules.
Purpose of description change: Minimizing the number of primary effects, while
ensuring that the cost increase is no larger than the user-specified limit C.
Use of other algorithms: A solver that constructs replacing sequences; a generator
of initial states.
Required input: Description of the operators and inference rules; limit C on the
allowed cost increase.
Optional input: Pre-selected primary and side effects.

Fig. 3.23. Specification of the Completer algorithm.

The complexity of applying Chooser to the instantiated domain is O(Ei ·
Ni). Finally, an efficient implementation of Generalize-Selection takes O(Ei)
time. Thus, the total time of generating all instantiations, choosing their pri-
mary effects, and constructing the corresponding selection for uninstantiated
operators is O(Ei · Ni); in practice, it is about (Ei + 60) · Ni · 10^-4 seconds.

3.5 Learning additional primary effects

Chooser usually improves the efficiency of search, but the selected primary
effects may cause two problems. First, the selection is not immune to incom-
pleteness; second, it may lead to a large cost increase. For example, suppose
that the robot and the ball are in room-1, and the goal is to move the ball
to room-2 and keep the robot in room-1. We may achieve it by throw(room-
1,room-2); however, if the solver uses primary effects selected by Chooser, it
will find a costlier solution: "carry(room-1,room-2), go(room-2,room-1)."
To address this problem, we have developed a learning algorithm, called
Completer, that chooses additional primary effects. Completer inputs a pre-
liminary selection of primary effects and a limit C on the cost increase. It
checks the completeness of the input selection by solving simple problems. If
the selection is incomplete, it selects new primary effects.
The completeness test is based on the condition given in Section 3.2.2.
For every operator and inference rule, the algorithm generates several initial
states that satisfy its preconditions, and tries to construct primary-effect
justified replacing sequences. Note that it has to use some solver to search
for replacing sequences, and it also needs a procedure for generating random
initial states. We summarize the specification of Completer in Figure 3.23.
We use the theory of probably approximately correct learning to develop an
algorithm that satisfies this specification. We give the learning algorithm (Sec-
tion 3.5.1), discuss heuristics for improving its effectiveness (Section 3.5.2),
and determine the required number of learning examples (Section 3.5.3).

Completer(C, ε, δ)
The algorithm inputs a cost increase, C, and success-probability parameters, ε and δ.
It accesses the operators, inference rules, and pre-selected primary and side effects.
For every uninstantiated operator and inference rule stepu:
1a. Determine the required number m of learning examples,
    which depends on stepu, ε, and δ (see Section 3.5.3).
2a. Repeat m times:
    If stepu has no candidate effects,
    then terminate the processing of stepu and go to the next operator.
    Else, call Learn-Example(stepu, C) to process a learning example.

Learn-Example(stepu, C)
1b. Call the state generator to produce state I satisfying the preconditions of stepu.
2b. Generate an instantiation of stepu, denoted stepi, that matches the state I.
3b. Call Generate-Goal(stepi, I) to produce goal G(stepi, I) of a replacing sequence.
4b. Call the solver to search for a replacing sequence with cost at most C · cost(stepu).
5b. If the solver fails, select some candidate effect of stepu as a new primary effect.

Fig. 3.24. Inductive learning of additional primary effects.
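
One iteration of the learning loop can be sketched in Python as follows; every
helper here (state generator, instantiation, Generate-Goal, and solver) is an
assumption supplied by the surrounding system, and the solver is assumed to
return None on failure.

    def learn_example(step_u, C, gen_state, instantiate, generate_goal, solve):
        state = gen_state(step_u)              # 1b: random state matching preconditions
        step_i = instantiate(step_u, state)    # 2b: matching instantiation
        goal = generate_goal(step_i, state)    # 3b: goal G(step_i, I)
        limit = C * step_i.cost                # cost bound for the replacement
        if solve(state, goal, limit) is None:  # 4b: no replacing sequence found
            # 5b: promote a candidate effect (an arbitrary one in this sketch;
            # Section 3.5.2 describes a smarter choice)
            effect = step_u.candidate_effects.pop()
            step_u.primary_effects.add(effect)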

3.5.1 Inductive learning algorithm


We describe a technique for testing the completeness of the initial selection
and learning additional primary effects.
Success probability. The input of Completer includes two probability val-
ues, ε and δ. These values are standard parameters of probably approx-
imately correct learning [Valiant, 1984], which determine the probability re-
quirements for the success of inductive learning.
The ε value determines the required probability of completeness. The
learning algorithm must ensure that a randomly selected solvable problem
has a primary-effect justified solution, within the cost increase C, with a
probability of at least 1 − ε. In other words, ε of all solvable problems may
become unsolvable due to the use of primary effects. The δ value determines
the likelihood of successful learning. The probability that at most ε of all
solvable problems may become unsolvable must be at least 1 − δ.
Learning loop. In Figure 3.24, we give an algorithm that loops through op-
erators and selects new primary effects, based on the completeness condition
in Section 3.2.2. When the algorithm processes stepu, it generates learning
examples and uses them to identify missing primary effects of stepu. A learn-
ing example consists of an initial state I, satisfying the preconditions of stepu,
and a matching instantiation of stepu, denoted stepi.
The Learn-Example procedure constructs and processes a learning ex-
ample (Figure 3.24). First, it generates a random state that satisfies stepu's
preconditions (Step 1b) and a matching operator instantiation (Step 2b).
If multiple instantiations of stepu match the state, the procedure randomly
chooses one of them.

The algorithm uses the resulting example to test the completeness of
the current selection (Steps 3b and 4b). It calls the Generate-Goal proce-
dure (Figure 3.10) to construct the corresponding goal G(stepi, I), and then
invokes a solver to search for a primary-effect justified replacing sequence
within the cost limit C · cost(stepi). If the solver does not find a replacing
sequence, then the current selection is incomplete, and the algorithm chooses
some candidate effect of stepu as a new primary effect (Step 5b).
Generating initial states. We have implemented a procedure that uses
past solutions to build a library of states, and then uses this library as a
source of random states. For every available solution, it determines and stores
all intermediate states. If there are no past solutions, the user has to hand-
code a generator of initial states. We have implemented state generators for
several domains and used them in the experiments.
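
A minimal sketch of this library construction, assuming an apply_step
function that computes the successor state of an operator application:

    def build_state_library(past_cases, apply_step):
        # Store the initial state and every intermediate state of each
        # past solution; the library is later sampled as a source of
        # random initial states.
        library = []
        for initial_state, solution in past_cases:
            state = initial_state
            library.append(state)
            for step in solution:
                state = apply_step(state, step)
                library.append(state)
        return library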
Example. We consider the application of Completer to the Robot Domain
with the initial selection produced by Chooser (Table 3.2a-c). We assume
that the cost-increase limit is C = 2, and Completer first considers the throw
operator with candidate effect "add ball-in."
Suppose that Completer generates the initial state shown in the left of
Figure 3.25(a) and the matching instantiation throw(room-1,room-2). It pro-
duces the goal given in the right of Figure 3.25(a), which includes moving
the ball to room-2, leaving the robot in room-1, and preserving all doorways.
Then, Completer calls the solver to search for a primary-effect justified solu-
tion with a cost at most C · cost(throw) = 2 · 2 = 4. The solver fails to find
a solution within this cost limit, and Completer chooses the candidate effect
"add ball-in" as a new primary effect.
Now suppose that Completer considers the break operator, chooses an
initial state with the robot in room-4, and generates the matching instanti-
ation break(room-4,room-1), as shown in Figure 3.25(b). The candidate ef-
fects of break are "add robot-in" and "del robot-in;" thus, Completer considers
the task of moving the robot from room-4 to room-1, within the cost limit
C · cost(break) = 2 · 4 = 8. The solver finds a replacing sequence "go(room-
4,room-3), go(room-3,room-2), go(room-2,room-1)," and Completer does not
choose a new primary effect.
We summarize the steps of choosing primary effects for the Robot Domain
in Table 3.2 and show the resulting selection in Table 3.5.
[Fig. 3.25. Learning examples: (a) example for throw: the initial state has the ball and the robot in room-1, and the goal statement requires (ball-in room-2) and (robot-in room-1) while preserving all six door literals; (b) example for break: the initial state has the robot in room-4, and the goal statement requires (robot-in room-1) and not (robot-in room-4) while preserving all door literals.]

Table 3.5. Automatically selected primary effects in the Robot Domain.

operators               primary effects
go(<from>, <to>)        del (robot-in <from>), add (robot-in <to>)
carry(<from>, <to>)     add (ball-in <to>)
throw(<from>, <to>)     del (ball-in <from>), add (ball-in <to>)
break(<from>, <to>)     add (door <from> <to>)
add-door(<from>, <to>)  add (door <from> <to>)
3.5.2 Selection heuristics

We next discuss two heuristics used in Completer.


Choice among candidate effects. When Completer cannot find a replac-
ing sequence, it often has to select a new primary effect among several can-
didates. We use a heuristic that prunes inappropriate choices by analyzing
partial solutions to the replacing problem. When the solver fails to find a re-
placing sequence for a learning example (step_i, I), it tries to construct incomplete solutions that achieve some candidate effects of step_i. Then, Completer
selects a new primary effect among unachieved candidate effects.
For example, suppose that Completer processes the break operator with
the only primary effect "add door," and the cost-increase limit is C = 1.
Suppose further that the algorithm constructs the example in Figure 3.25(b)
and calls the solver to search for a replacing sequence with cost limit 4. The
solver fails to find a replacement; however, it can construct an incomplete
solution "go(room-4,room-3)," which achieves one of break's candidate effects,
"not (robot-in room-4)." Then, Completer selects the other candidate effect,
"add robot-in," as a new primary effect.
Order of processing operators. The choice of primary effects may depend
on the order of processing operators and inference rules. We process them in
increasing order of their costs; since inference rules have no cost, we process
them before operators. This heuristic is based on the observation that the
solver usually uses cheap operators in replacing sequences. If we use some operator or inference rule step_1 in a primary-effect justified replacement of step_2, then Completer should process step_1 before step_2, in order to utilize the newly selected primary effects of step_1. Since the algorithm uses cheap operators in replacing sequences for more expensive operators, it should begin by processing cheap operators. If two operators have the same cost, we process them in increasing order of the number of candidate effects. That is, if step_1 has fewer candidate effects than step_2, then step_1 is processed before step_2.
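The resulting order can be expressed as a single sort key; this is a minimal sketch, and the attribute names are hypothetical:

    def processing_order(steps):
        """Inference rules (treated as zero cost) come first, then operators
        in increasing order of cost; ties are broken by the number of
        candidate effects."""
        return sorted(steps,
                      key=lambda s: (0 if s.is_inference_rule else s.cost,
                                     len(s.candidate_effects)))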

3.5.3 Sample complexity

When Completer learns primary effects of an operator or inference rule step_u, it considers m states that satisfy the preconditions of step_u. To determine the appropriate m value, we use the theory of probably approximately correct learning, abbreviated as PAC learning.
PAC learning. Suppose that we apply the learning algorithm to select primary effects of some operator or inference rule step_u with j candidate effects. The algorithm randomly chooses m states that satisfy the preconditions of step_u, generates corresponding instantiations of step_u, and calls a solver to find replacing sequences.
The selected states and instantiations of step_u are learning examples. Formally, every example is a pair (step_i, I), where step_i is an instantiation of
m:      number of learning examples for step_u
j:      number of candidate effects of step_u before learning
ε_s:    error of the PAC learning for step_u's selection of primary effects
δ_s:    probability that the error of step_u's selection is larger than ε_s
ε:      allowed probability of failure on a randomly chosen problem
δ:      allowed probability of generating an inappropriate selection
s:      number of operators and inference rules in the domain
n:      length of an optimal solution to a randomly chosen problem
n_max:  maximal possible length of an optimal solution

Fig. 3.26. Summary of the notation in the sample-complexity analysis.

step_u, and I is a state satisfying the preconditions of step_i. The algorithm selects learning examples from the set of all possible examples, using a certain probability distribution over this set.
We assume that the chance of choosing an instantiation step_i and state I during the learning is equal to the chance of generating the same instantiation and applying it to the same state during the problem solving. This assumption is called the stationarity assumption of PAC learning [Valiant, 1984].
When the solver cannot find a replacing sequence, Completer selects one of the candidate effects of step_u as a new primary effect. The final selection of primary effects depends on learning examples. Since step_u has j candidate effects, it allows 2^j different selections. Every selection is a hypothesis of the PAC learning, and the set of 2^j possible selections is the hypothesis space.
If the solver can find a replacing sequence for an instantiated operator step_i and state I, within the cost limit C · cost(step_i), then the current selection is consistent with the example (step_i, I). The final selection is consistent with all m examples chosen by the algorithm, but it may not be consistent with other possible examples.
The error of the PAC learning is the probability that the learned selection of step_u's primary effects is not consistent with a randomly chosen example. The selection is approximately correct if this error is no larger than a certain small value ε_s, where "s" stands for step_u.
In Figure 3.26, we summarize the notation used in the analysis of Completer. We have defined the first three symbols, and we will introduce the other notation in the following derivation.
Probability of approximate correctness. The algorithm randomly picks
m learning examples. Since the final selection is consistent with these m exam-
ples, we intuitively expect that it is also consistent with most other examples.
If m is large, the learned selection is likely to be approximately correct.
Blumer et al. [1987] formalized this intuition and derived an upper bound for the probability that the results of learning are not approximately correct. For a space of 2^j hypotheses, the probability of learning an incorrect hypothesis is at most

    2^j · e^(-ε_s · m).

The learning is probably approximately correct if this probability is no larger than a certain small value δ_s:

    2^j · e^(-ε_s · m) ≤ δ_s.    (3.6)

This expression is the classical inequality of PAC learning, which allows determining the required number m of examples. The following condition ensures the satisfaction of Inequality 3.6 [Blumer et al., 1987]:

    m ≥ (1/ε_s) · ln(2^j/δ_s) = (1/ε_s) · (ln(1/δ_s) + j · ln 2).    (3.7)
Number of learning examples. We now relate m to the user-specified parameters ε and δ. Recall that ε is the allowed probability of failure to solve a randomly selected problem, and δ is the allowed probability of generating an inappropriate selection.
First, assume that the learning algorithm has found an approximately correct selection of primary effects for every operator and inference rule. We express the probability of failing to solve a random problem through ε_s and determine the dependency between ε_s and ε.
Suppose that the ε_s value is the same for all operators and inference rules. Suppose further that we use the learned primary effects to solve a randomly chosen problem, and an optimal solution to this problem consists of n steps. If every operator and inference rule step_i in the optimal solution has a replacing sequence, with a cost no larger than C · cost(step_i), then the problem has a primary-effect justified solution within the cost increase C.
The probability that every step has a replacement is at least (1 - ε_s)^n; hence, the probability that the problem has no primary-effect justified solution is at most

    1 - (1 - ε_s)^n ≤ n · ε_s.

This expression estimates the chance of failure due to primary effects, which must be no larger than the user-specified parameter ε:

    n · ε_s ≤ ε.    (3.8)

Next, we estimate the probability of learning an inappropriate selection of primary effects and derive the dependency between δ_s and δ. We suppose that the δ_s value is the same for all operators and inference rules, and denote the total number of operators and rules by s.
If the learning algorithm has not found an approximately correct selection for some operator or inference rule, the overall selection may not satisfy the completeness requirement. The probability that some operator or rule has inappropriate primary effects is at most

    s · δ_s.

This result is an upper bound for the chance of generating an inappropriate selection, which must be no larger than δ:

    s · δ_s ≤ δ.    (3.9)
We rewrite Inequality 3.8 as 1/ε_s ≥ n/ε and Inequality 3.9 as 1/δ_s ≥ s/δ, and substitute these lower bounds for 1/ε_s and 1/δ_s into Inequality 3.7:

    m ≥ (n/ε) · (ln(s/δ) + j · ln 2) = (n/ε) · (ln(1/δ) + ln s + j · ln 2).    (3.10)
The computation of m in Completer is based on Inequality 3.10. Since the optimal-solution length n depends on a specific problem, we have to use its estimated maximal value. If it is at most n_max, the following value of m satisfies Inequality 3.10:

    m = ⌈(n_max/ε) · (ln(1/δ) + ln s + j · ln 2)⌉.    (3.11)

Thus, the number of examples for step_u depends on the user-specified parameters ε and δ, as well as on the number j of step_u's candidate effects, the total number s of operators and inference rules, and the estimated limit n_max on solution lengths. This dependency of m on the success-probability parameters, ε and δ, is called the sample complexity of the learning algorithm.
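Expression 3.11 translates directly into code; the sketch below is one way to write it. For instance, with the hypothetical values n_max = 5, ε = δ = 0.2, s = 5, and j = 2, it returns 116 examples.

    import math

    def sample_size(n_max, epsilon, delta, s, j):
        """Number m of learning examples for one operator or inference rule,
        per Expression 3.11."""
        return math.ceil((n_max / epsilon)
                         * (math.log(1 / delta) + math.log(s) + j * math.log(2)))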
Learning time. The learning time is proportional to the total number of
examples for all operators and inference rules. If we use Expression 3.11 to
determine m for each operator and each rule, the total number of examples
is roughly proportional to the total number of candidate effects:

    Σ_op m_op + Σ_inf m_inf
      = Σ_op ⌈(n_max/ε) · (ln(1/δ) + ln s + j_op · ln 2)⌉
        + Σ_inf ⌈(n_max/ε) · (ln(1/δ) + ln s + j_inf · ln 2)⌉
      = s · (n_max/ε) · (ln(1/δ) + ln s) + (n_max/ε) · ln 2 · (Σ_op j_op + Σ_inf j_inf).

For instance, if we apply Chooser to the Robot Domain, the resulting selection includes six primary effects and six candidate effects (Table 3.2a-c). If we then apply Completer with ε = δ = 0.2 and n_max = 5, the overall number of learning examples is 185.
When Completer processes a learning example, it invokes a solver to search
for a replacing sequence. The processing time depends on the domain and
cost-increase limit C, as well as on the specific example. For the Lisp imple-
mentation of PRODIGY on a Sun Sparc 5, the processing time varies from 0.01
to 0.1 seconds, which means that the overall learning time is usually one or
two orders of magnitude larger than Chooser's running time.
3.6 ABTWEAK experiments

We present experiments on the use of primary effects in ABTWEAK, developed by Yang et al. [1996], which explores the search space in breadth-first order.
We first describe experiments with artificial domains (Section 3.6.1), and then
give the results for a robot world and manufacturing domain (Section 3.6.2).

3.6.1 Controlled experiments

We have experimented with artificial domains, similar to the domains developed by Barrett and Weld [1994] for evaluation of least-commitment systems.
Artificial domains. We define a problem by s + 1 initial-state literals, init_0, init_1, ..., init_s, and a conjunctive goal statement that includes s literals, goal_0, goal_1, ..., goal_{s-1}. The domain contains s operators, op_0, op_1, ..., op_{s-1}; every operator op_i has the single precondition init_{i+1} and k + 1 effects, which include deleting the initial-state literal init_i and adding the goal literals goal_i, goal_{i+1}, ..., goal_{i+k-1}. The goal literals are enumerated modulo s; that is, a rigorous notation for the effects of op_i is goal_{i mod s}, ..., goal_{(i+k-1) mod s}.
For example, suppose that s = 6, which means that the goal statement includes six literals. If k = 1, every operator adds one goal literal, and the solution contains six operators: "op_0, op_1, op_2, op_3, op_4, op_5." If k = 3, every operator adds three literals, and the solution includes two steps: "op_0, op_3."
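The construction of these domains is mechanical; the following sketch builds the operator set described above (the dictionary encoding is our own illustration, not the ABTWEAK format):

    def artificial_domain(s, k):
        """Operators of the artificial domain: op_i requires init_{i+1},
        deletes init_i, and adds goal_{i mod s}, ..., goal_{(i+k-1) mod s}."""
        return [{"name": "op_%d" % i,
                 "precondition": "init_%d" % (i + 1),
                 "delete": ["init_%d" % i],
                 "add": ["goal_%d" % ((i + m) % s) for m in range(k)]}
                for i in range(s)]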
Controlled features. We can vary the following problem features:

• Goal size: The goal size is the number of goal literals, s. The length of the
optimal solution is proportional to the number of goals.
• Effect overlap: The effect overlap, k, is the average number of operators
achieving the same literal.
• Cost variation: The cost variation is the statistical coefficient of variation
of the operator costs, that is, the ratio of the standard deviation of costs
to their mean. Intuitively, it shows the relative difference among costs.
Varying solution lengths. First, we describe experiments that show the
dependency of the search reduction on the number of operators in the optimal solution. We have varied the goal size s from 1 to 20 and constructed conjunctive goal statements by randomly permuting the literals goal_0, goal_1, ..., goal_{s-1}. We have experimented with two values of the effect overlap, 3 and 5; we have not used an overlap of 1 because all effects would then have to be primary. In addition, we have considered cost assignments with three different variations: 0, 0.4, and 2.4.
We have applied Chooser and Completer, and used the resulting primary effects. We have tested two cost-increase limits, C = 2 and C = 5, and two values of the success-probability parameters, ε = δ = 0.2 and ε = δ = 0.4.
In Figures 3.27 and 3.28, we give the running times of ABTWEAK without primary effects ("w/o prim") and with the learned primary effects ("with prim"). The graphs show the dependency of the search time on the optimal-solution length; note that the time scale is logarithmic. Every point in each graph is the mean time for fifty different problems, and the vertical bars show the 95% confidence intervals.

[Fig. 3.27. Dependency of the running time on an optimal-solution length, for the search without and with primary effects in the artificial domains. The six panels give results for effect overlaps 3 and 5 and cost variations 0, 0.4, and 2.4, with cost increase C = 5 and ε = δ = 0.2.]

[Fig. 3.28. Dependency of the running time on an optimal-solution length (continued). The four panels use effect overlaps 3 and 5 with cost variation 2.4: two panels with cost increase C = 2 and ε = δ = 0.2, and two with C = 5 and ε = δ = 0.4.]

The search with primary effects has yielded solutions to all problems, and the time savings have grown exponentially with a solution length. These results confirm the r^n estimate of the search-reduction ratio, where n is an optimal-solution length (see Section 3.3). The r value in the artificial experiments varies from 0.3 to 0.6.
Varying effect overlap. Recall that the effect overlap is the average number of operators achieving the same literal. In Figure 3.29, we show the running times of ABTWEAK without primary effects ("w/o prim") and with primary effects ("with prim") for different effect overlaps and cost variations. The primary effects improve the performance for all effect overlaps, although the time savings are smaller for large overlaps.
[Fig. 3.29. Dependency of the running time on the effect overlap. Three panels: all operators with the same cost (cost variation 0), and operators with different costs (cost variations 0.4 and 2.4).]

3.6.2 Robot world and machine shop

We show the effectiveness of primary effects for planning a robot's actions and choosing appropriate operations in a machine shop.
Extended Robot Domain. We consider the robot world in Figure 3.30(a);
the robot can move among different locations within a room, go through a
door to another room, open and close doors, and climb tables. The domain
includes two operators for opening and closing doors, four operators for mov-
ing among locations and rooms, and four operators for climbing up and down
a table. In Figure 3.30(b), we give the encoding of two table-climbing oper-
ators; climb-up is for getting onto a table without a box, and carry-up is
for climbing with a box.
We have used Chooser and Completer with C = 5 and ε = δ = 0.1. In
Figure 3.30(c), we show the selected primary effects of all operators, as well
as their side effects and costs. We have applied ABTWEAK to several problems
of different complexity; we have used the initial state in Figure 3.30(a), with
both doors being closed, and randomly generated goal statements. The results
are given in Table 3.6, which includes the running time and branching factor
of search without and with primary effects.
[Fig. 3.30. Extended Robot Domain: (a) map of the robot world; (b) encoding of the table-climbing operators climb-up and carry-up; (c) effects and costs of all operators.]

Table 3.6. Performance of ABTWEAK in the Extended Robot Domain.

                                Optimal Solution   Run Time (sec)   Branching Factor
#  Goal Statement               length   cost      w/o     with     w/o     with
                                                   prim    prim     prim    prim
1  (robot-on table-2)              1       1       0.03    0.02     1.5     1.0
2  (open door-a)                   2       2       0.09    0.09     2.0     1.4
3  (robot-in room-1)               3       4       0.15    0.13     1.7     1.3
4  (on box-1 table-1)              5       9       0.55    0.36     2.3     1.3
5  (robot-on table-3)              5       6       0.57    0.37     2.0     1.3
6  (and (robot-on table-1)         6       7       2.25    1.16     2.1     1.4
     (closed door-a))
7  (and (on box-2 table-1)         7      11       4.65    2.52     2.3     1.4
     (open door-b))
8  (at box-2 table-2)              7      13       6.20    2.76     2.1     1.3
9  (on box-2 table-2)              8      17      15.09    4.41     2.1     1.4
The selected primary effects have improved the efficiency of ABTWEAK.
Moreover, the search with primary effects has yielded optimal solutions to
all problems despite the high cost-increase limit. The r value in these experiments is about 0.9; that is, the ratio of search times with and without primary effects is about 0.9^n.
Machining Domain. We next give results for a machining domain, similar
to the domain used by Smith and Peot [1992]. It includes a simplified version
of cutting, drilling, polishing, and painting operations from the PRODIGY
Process-Planning Domain [Gil, 1991]. We use eight operators that encode
low-quality and high-quality operations (Figure 3.31); the production of high-
quality parts incurs higher costs. The solver has to choose between low and
high quality, and find the right ordering of operations.
We have run Completer with C = 1 and ε = δ = 0.1, and it has selected
the primary effects given in Figure 3.31. Then, we have applied ABTWEAK
to one hundred randomly generated problems. In Figure 3.32, we show the
running times with and without primary effects, and the 95% confidence
intervals. The search with primary effects has yielded optimal solutions to all
problems and given significant time savings. The r value is about 0.8; that is, the search-reduction ratio is 0.8^n.

3.7 PRODIGY experiments

We next describe results of using primary effects in PRODIGY, which performs a depth-first search and usually finds suboptimal solutions. We may force the
search for a near-optimal solution by setting a cost bound (see Section 2.4.2).
We describe experiments with two domains from ABTWEAK (Section 3.7.1)
and with two traditional PRODIGY domains (Section 3.7.2), and then conclude
with a discussion of the results (Section 3.7.3).

3.7.1 Domains from ABTWEAK

We have tested PRODIGY on the two domains described in Section 3.6.2.


Extended Robot Domain. We have applied PRODIGY to the nine problems
in Table 3.6, without and with primary effects. For every problem, we have
run PRODIGY without a cost bound and then with two different bounds.
The first bound is twice the cost of the optimal solution; we call it a loose
bound. The second bound, called tight, is exactly equal to the optimal cost.
We have manually determined the optimal cost for each problem and set
the appropriate bounds. If the system does not find a solution within 1800
seconds, we interrupt the search. We give the results in Table 3.7, which
shows that primary effects drastically reduce the search. In most cases, the
saving factor is greater than 500; the r value varies from 0.1 to 0.4.
[Fig. 3.31. Effects and costs of operators in the Machining Domain: eight operators encode rough and fine cutting, drilling, polishing, and painting; rough operations cost 1 and fine operations cost 2, and each operation adds its quality literal as a primary effect while deleting the literals of conflicting operations.]

[Fig. 3.32. Performance of ABTWEAK in the Machining Domain: running time (CPU sec, logarithmic scale) versus the length of an optimal solution, without and with primary effects.]

Table 3.7. Performance of PRODIGY in the Extended Robot Domain. We give running times in seconds for search without and with primary effects.

#   No Cost Bound           Loose Bound             Tight Bound
    w/o prim    with prim   w/o prim    with prim   w/o prim    with prim
1   62.71       0.04        0.06        0.04        0.05        0.04
2   142.26      0.06        3.55        0.06        0.23        0.06
3   >1800.00    0.08        219.43      0.09        1.14        0.09
4   >1800.00    0.16        >1800.00    0.17        41.32       0.16
5   >1800.00    0.15        >1800.00    0.15        51.98       0.14
6   >1800.00    6.54        >1800.00    6.62        173.83      0.39
7   >1800.00    2.67        >1800.00    2.49        >1800.00    3.43
8   >1800.00    >1800.00    >1800.00    >1800.00    >1800.00    >1800.00
9   >1800.00    >1800.00    >1800.00    >1800.00    >1800.00    >1800.00

Machining Domain. We have applied PRODIGY to one hundred problems in the Machining Domain, with a 600-second limit for each problem.
In Figure 3.33, we present the results of search without a cost bound.
We show the running time and solution quality, for search without and with
primary effects; the vertical bars are 95% confidence intervals. PRODIGY has
solved all problems within the time limit. The primary effects have not im-
proved the efficiency, but they have enabled PRODIGY to find better solutions.
The cost-reduction factor ranges from 1.2 to 1.6, with the mean at 1.37.
In Figure 3.34, we give the results of search with loose cost bounds. These
cost bounds have forced PRODIGY to produce better solutions; however, they
have increased the search complexity, and the system has failed to solve
some problems within the 600-second limit. We give the running times in
Figure 3.34(a); for most problems, the primary effects reduce the search by
two orders of magnitude, and the r value is about 0.6. In Figure 3.34(b), we
show the mean solution costs for the problems solved within the time limit.
The primary effects reduce the cost by a factor of 1.1 to 1.3, with the mean
at 1.22. In Figure 3.34(c), we give the percentage of unsolved problems.
We have also applied PRODIGY with tight cost bounds to the same prob-
lems, thus forcing search for optimal solutions; however, it has not found
optimal solutions to any problems within the 600-second limit.

3.7.2 Sokoban puzzle and STRIPS world

We now give results for two other domains, the Sokoban puzzle and Extended
STRIPS world, which have long served as benchmarks for the PRODIGY search.

Sokoban Domain. The Sokoban puzzle is a one-player game, apparently invented in 1982 by Hiroyuki Imabayashi, the president of Thinking Rabbit Inc. (Japan). Junghanns and Schaefer [1998; 1999a; 1999b] explored search techniques for Sokoban problems and designed a collection of related heuristics. Since PRODIGY is not fine-tuned for Sokoban, it is much less effective than the specialized algorithms by Junghanns and Schaefer.
[Fig. 3.33. PRODIGY performance in the Machining Domain without a cost bound: (a) efficiency of problem solving and (b) quality of the resulting solutions, plotted against the length of an optimal solution. Results are shown without primary effects (solid lines) and with primary effects (dashed lines), as well as the mean cost of the optimal solutions (dotted line).]

[Fig. 3.34. PRODIGY performance in the Machining Domain with loose cost bounds, equal to the doubled cost of the optimal solution: (a) efficiency of problem solving, (b) quality of the resulting solutions, and (c) percentage of unsolved problems, without primary effects (solid lines) and with primary effects (dashed lines).]
[Fig. 3.35. Sokoban puzzle and its encoding in PRODIGY: (a) example of a world state; (b) types of objects; (c) list of the eight operators; (d) full encoding of drive-down, which moves the bulldozer one square down at cost 1; (e) full encoding of move-right, which pushes a rock one square right at cost 2; (f) the test function no-rock and the generator functions decrement and increment.]

The PRODIGY version of Sokoban consists of a rectangular grid, obstacles that block some squares, a bulldozer that drives among the other squares, and pieces of rock (Figure 3.35a). The bulldozer occupies one square, and it can drive to any of the adjacent empty squares (Figure 3.35d); however, it cannot enter a square with an obstacle or rock. If the bulldozer is next to a rock, and the square on the other side of the rock is empty, then the bulldozer can push the rock to this empty square (Figure 3.35e). The goal is to deliver the rocks to certain squares.
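The two movement rules can be stated in a few lines; the sketch below is our own illustration, assuming a state with sets of blocked squares and rock positions represented as (x, y) pairs (all names hypothetical):

    def adjacent(a, b):
        """Squares that differ by one step along exactly one axis."""
        return abs(a[0] - b[0]) + abs(a[1] - b[1]) == 1

    def can_drive(state, dozer, square):
        """The bulldozer may enter an adjacent square that holds
        neither an obstacle nor a rock."""
        return (adjacent(dozer, square)
                and square not in state.blocked
                and square not in state.rocks)

    def can_push(state, dozer, rock):
        """The bulldozer, standing next to a rock, may push it to the
        empty square on the other side of the rock."""
        behind = (2 * rock[0] - dozer[0], 2 * rock[1] - dozer[1])
        return (adjacent(dozer, rock)
                and behind not in state.blocked
                and behind not in state.rocks)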
The PRODIGY Sokoban Domain includes two object types (Figure 3.35b)
and eight operators (Figure 3.35c). The first four operators are for chang-
ing the bulldozer's position, and the other four are for moving rocks. We
give the full encoding of two operators, drive-down and move-right, in
Figures 3.35(d) and 3.35(e). We use two generator functions, decrement and
increment, for determining the coordinates of adjacent squares (Figure 3.35f).
In Figure 3.36, we show the automatically selected primary effects. To test
their effectiveness, we have applied PRODIGY to 320 problems with four grid
sizes: 4x4, 6x6, 8x8, and 10x10. In Figure 3.37, we give the results without
and with primary effects. The graphs show the percentage of problems solved
by different time limits, from 1 to 60 seconds. For example, if we use a 20-
second limit in solving 4x4 problems without primary effects, then PRODIGY
solves 8% of them (Figure 3.37a).
The results confirm that the primary effects improve the efficiency; when
PRODIGY utilizes primary effects, it solves five times more problems, and the
time-reduction factor varies from 1 to more than 100.
Extended STRIPS Domain. The STRIPS Domain is a large-scale robot
world, designed by Fikes and Nilsson [1971; 1993] during their work on the
STRIPS system. Later, PRODIGY researchers used an extended version of this
domain for the evaluation of learning techniques [Minton, 1988; Knoblock,
1993; Veloso, 1994].
Although the STRIPS world and the ABTWEAK Robot Domain are based
on similar settings, the encoding of the Extended STRIPS Domain differs from
the ABTWEAK domain, which causes different behavior of search algorithms.
We give the object types, predicates, and operators of the STRIPS world in
Figure 3.38, show an example state in Figure 3.39, and list effects and costs of
all operators in Figures 3.40 and 3.41. The domain includes ten object types,
twelve predicates, and twenty-three operators. The robot can move among
rooms, open and close doors, lock and unlock doors using appropriate keys,
pick up and put down small things, and push large things.
We have experimented with one hundred problems, composed from hand-
coded initial states and randomly generated goals. For every problem, we
have run PRODIGY three times: without a cost bound, with the loose bound,
and with the tight bound. Recall that the loose cost bound is twice the cost
of the optimal solution, whereas the tight bound is equal to the optimal cost.
First, we have applied PRODIGY without primary effects, using a 600-
second time limit. It has solved only two problems with the tight cost bounds,
and no problems at all without the tight bounds. Then, we have tested
PRODIGY with the automatically selected primary effects. We show these
effects in Figures 3.40 and 3.41, and give the results in Figure 3.42. PRODIGY
has solved seventy-eight problems without cost bounds, seventy-five problems
with the loose bounds, and forty-one problems with the tight bounds. More-
over, most problems have taken less than a second; thus, the primary effects
have improved the efficiency at least by three orders of magnitude.
[Fig. 3.36. Effects and costs of operators in the Sokoban Domain: for the four drive operators (cost 1), both effects on the dozer-at literal are primary; for the four move operators (cost 2), the effects on the rock's at literals are primary, and the accompanying changes of dozer-at are side effects.]

[Fig. 3.37. PRODIGY performance in the Sokoban Domain, without primary effects (solid lines) and with primary effects (dashed lines): percentage of problems solved by different time limits, from 1 to 60 seconds, on (a) 4x4, (b) 6x6, (c) 8x8, and (d) 10x10 grids.]
[Fig. 3.38. Extended STRIPS Domain: (a) types of objects; (b) twelve predicates; (c) twenty-three operators for opening, closing, locking, and unlocking doors, moving within and between rooms, picking up and putting down small things, and pushing large things.]
[Fig. 3.39. Example of an initial state in the Extended STRIPS Domain: (a) map of the world with six rooms, six doors, six windows, four boxes, two cans, and three keys; (b) set of objects; (c) encoding of the world state.]


[Fig. 3.40. Effects in the Extended STRIPS Domain (operators for doors, movement, and small things); also see Figure 3.41.]
[Fig. 3.41. Effects in the Extended STRIPS Domain (operators for pushing large things); also see Figure 3.40.]

[Fig. 3.42. PRODIGY performance in the Extended STRIPS Domain with primary effects: percentage of problems solved by different time limits, for search without cost bounds (solid line), with loose bounds (dashed line), and with tight bounds (dotted line).]
3.7.3 Summary of experimental results

The results confirm that primary effects can improve the performance, and
that Chooser and Completer accurately identify the important effects; how-
ever, they reduce the search only if some operators have unimportant effects.
If all effects are important, the developed algorithms do not improve the
efficiency. For example, consider the PRODIGY Logistics Domain, constructed
by Veloso [1994] (Figure 3.43). PRODIGY has to construct plans for trans-
porting packages among post offices and airports located in different cities.
It uses vans for transportation within cities, and airplanes for carrying pack-
ages between airports. The operators in this domain do not have unimportant
effects, and Completer selects all effects as primary.
We summarize the results in Table 3.8; Chooser and Completer improve the efficiency in the four domains marked by the upward arrows (⇑). The time-reduction factor varies from 1 to 200 in the Machining Domain and exceeds 500 in the other three domains.

[Fig. 3.43. Effects of operators in the Logistics Domain: fly-plane, load-plane, and unload-plane move airplanes and packages between airports; drive-van, load-van, and unload-van move vans and packages among places within a city.]

Table 3.8. Results of testing PRODIGY with primary effects. The automatically selected effects improve the performance in the first four domains, marked by the upward arrow (⇑), and do not affect the search in the Logistics Domain. We show the time-reduction factor, that is, the ratio of the search time without primary effects to that with primary effects.

Domain            Overall Result   Search-Time Reduction
Extended Robot    ⇑                > 500
Machining         ⇑                1-200
Sokoban           ⇑                > 500
Extended STRIPS   ⇑                > 500
Logistics         -                -
4. Abstraction

Abstraction search dates back to the early days of artificial intelligence. The
central idea is to identify important aspects of a given problem, construct
a solution outline that ignores less significant aspects, and use it to guide
the search for a complete solution. The applications of this approach take
different shapes, and their utility varies widely across systems and domains.
The related analytical models give disparate efficiency predictions for specific
abstraction techniques; however, researchers agree that a proper abstraction
can enhance almost all search systems.
In the early seventies, Sacerdoti experimented with abstraction in back-
ward chaining. He hand-coded the relative importance of operator precon-
ditions and automated the construction of solution outlines. His approach
proved effective, and researchers applied it in later systems. In particular,
Knoblock adapted it for an early version of PRODIGY and then developed
an algorithm for automatic assignment of importance levels; however, his
algorithm was sensitive to syntactic features of a domain encoding.
We have extended Knoblock's technique to the richer domain language of PRODIGY4 and made it less sensitive to the domain encoding. We explain the
use of abstraction in backward chaining (Section 4.1), describe abstraction
algorithms for PRODIGY (Sections 4.2 and 4.3), and present experiments on
their performance (Section 4.4).

4.1 Abstraction in problem solving

We overview the history of abstraction (Section 4.1.1), explain its use in problem solving (Section 4.1.2), discuss its advantages and drawbacks (Sections 4.1.3 and 4.1.4), and outline Knoblock's technique for abstracting operator preconditions (Section 4.1.5).

4.1.1 History of abstraction

Researchers have developed multiple abstraction techniques and used them in a number of systems. We review the work on abstraction in classical problem solving, which is closely related to our results. The reader may find a more extensive review in the works of Knoblock [1993], Giunchiglia and Walsh [1992], and Yang [1997], as well as in the textbook by Russell and Norvig [1995].
Newell and Simon [1961; 1972] introduced abstraction in their work on
GPS, the first AI problem solver. They applied it to automated construction
of propositional proofs and showed that abstraction reduced the search.
Sacerdoti [1974] extended their technique and developed ABSTRIPS, which
combined abstraction with the STRIPS system [Fikes and Nilsson, 1971]. Given
a problem, ABSTRIPS constructed an abstract solution for achieving "impor-
tant" subgoals, and then refined the solution by inserting operators for less
important subgoals. Sacerdoti partially automated the assignment of impor-
tance levels to operator preconditions; however, his technique often produced
inappropriate assignments [Knoblock, 1992].
Following the lead of GPS and ABSTRIPS, researchers used abstraction
in other systems, such as NONLIN [Tate, 1976; Tate, 1977], NOAH [Sacer-
doti, 1977], MOLGEN [Stefik, 1981], and SIPE [Wilkins, 1984; Wilkins, 1988;
Wilkins, et al., 1995]. All these systems required the user to provide an ap-
propriate abstraction.
Goldstein designed a procedure that automatically generated abstractions
for GPS; he showed its utility for several puzzles, including Fool's Disk, Tower
of Hanoi, and Monkey and Bananas [Goldstein, 1978; Ernst and Goldstein,
1982]. His procedure constructed a table of transitions among main types of
world states and subdivided the state space into several levels.
Christensen [1990] developed a more general technique for automatic ab-
straction. His algorithm, called PABLO, determined the length of operator
sequences for achieving potential subgoals and assigned greater importance
to the subgoals that required longer sequences. Unruh and Rosenbloom [1989;
1990] devised a different method of generating abstractions and used it with
the look-ahead search in the Soar architecture [Laird et al., 1987].
Knoblock [1993] added abstraction search to PRODIGY2. He then developed the ALPINE algorithm for automatically abstracting preconditions of operators. Blythe implemented a similar technique in PRODIGY4, but he did not extend ALPINE to the richer domain language of PRODIGY4.
Yang et al. [1990; 1996] developed the ABTWEAK system, an abstraction
version of TWEAK [Chapman, 1987], and integrated it with ALPINE. Then,
Bacchus and Yang [1991; 1994] implemented the HIGHPOINT algorithm, an
extension to ALPINE that produced abstractions with stronger properties.
Holte et al. [1994; 1996a; 1996b] considered a different method for gen-
erating abstractions. Their system expanded the full state space for a given
domain and used it to construct an abstract space. The analysis of the explicit
space gave an advantage over other systems and led to better abstractions;
however, it worked only for small spaces.
Several researchers have investigated theoretical properties of abstraction [Korf, 1980; Giunchiglia and Walsh, 1992]. The proposed formal models differ in underlying assumptions, and the efficiency estimates vary from very optimistic [Korf, 1987; Knoblock, 1991] to relatively pessimistic [Backstrom and Jonsson, 1995]; however, the researchers agree on two qualitative conclusions. First, an appropriate abstraction almost always reduces the search. Second, a poor choice of abstraction may drastically impair the efficiency.

4.1.2 Hierarchical problem solving

We can define abstraction by assigning importance levels to literals in the domain description. To illustrate it, we describe abstraction for the Drilling Domain in Figure 4.1, which is a simplified version of the PRODIGY Process-Planning Domain [Gil, 1991; Gil and Perez, 1994].
The drill press uses two types of drill bits, a spot drill and a twist drill. A spot drill makes a small spot on the surface of a part; this spot guides a twist drill, which makes a deeper hole. When painting a part, we have to remove it from the drill press. If a part has been painted before drilling, the drill destroys the paint. If we make a surface spot and then paint it, the spot disappears; however, painting does not remove deeper holes. Thus, we should paint after the completion of drilling.
We use natural numbers to encode the importance of literals, and we
partition literals into groups by their importance, as shown in Figure 4.2.
This partition is an abstraction hierarchy; each group of equally important
literals is a level of the hierarchy. The topmost level usually contains the
static literals, which encode the unchangeable features; recall that a literal is
static if no operator adds or deletes it.
An abstraction solver begins by finding a solution at the highest nonstatic level, ignoring the subgoals below it. For example, consider the problem in Figure 4.3, which involves drilling and painting a given part. The solver first constructs the abstract solution in Figure 4.4(a); it achieves all subgoals at level 2 and ignores lower-level subgoals.
After constructing an abstract solution, the solver refines it at the next lower level, that is, inserts operators for achieving lower-level subgoals (Figure 4.4b). It continues to move down the hierarchy, refining the solution at lower and lower levels, until it achieves all subgoals at level 0 (Figure 4.4c).
The refinement process preserves the operators of the abstract solution
and their relative order. It may involve significant search since the inserted
operators introduce new subgoals. If the solver fails to find a refinement, it
backtracks to the previous level and constructs a different abstract solution.
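In outline, hierarchical solving is a loop over levels; the sketch below is a minimal illustration, assuming a hypothetical refine procedure that searches for a solution at a given level while preserving the operators and ordering of the outline it receives, and it omits the backtracking to alternative abstract solutions:

    def solve_hierarchically(problem, top_level, refine):
        """Solve at the highest nonstatic level, then refine downward."""
        outline = []                            # empty outline above the top level
        for level in range(top_level, -1, -1):  # top_level, ..., 1, 0
            outline = refine(problem, level, outline)
            if outline is None:                 # no refinement at this level; a full
                return None                     # solver would backtrack and retry
        return outline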

4.1.3 Efficiency and possible problems

Korf (1987) has described a simple model of search that shows why abstraction
improves the efficiency. His analysis implies that the improvement is expo-
nential in the number of levels. Knoblock (1993) has developed an alternative
136 4. Abstraction

(a) Drill press. [Drawing not reproduced.]

(b) Types of objects: the Type Hierarchy contains the types Part and Drill-Bit.

(c) Library of operators:

    put-part(<part>)
        <part>: type Part
        Pre:  (no-part)
        Prim: add (holds-part <part>)
        Side: del (no-part)

    remove-part(<part>)
        <part>: type Part
        Pre:  (holds-part <part>)
        Prim: add (no-part)
        Side: del (holds-part <part>)

    put-drill(<drill-bit>)
        <drill-bit>: type Drill-Bit
        Pre:  (no-drill)
        Prim: add (holds-drill <drill-bit>)
        Side: del (no-drill)

    remove-drill(<drill-bit>)
        <drill-bit>: type Drill-Bit
        Pre:  (holds-drill <drill-bit>)
        Prim: add (no-drill)
        Side: del (holds-drill <drill-bit>)

    drill-spot(<part>, <drill-bit>)
        <part>: type Part
        <drill-bit>: type Drill-Bit
        Pre:  (spot-drill <drill-bit>)
              (holds-drill <drill-bit>)
              (holds-part <part>)
        Prim: add (has-spot <part>)
        Side: del (painted <part>)

    drill-hole(<part>, <drill-bit>)
        <part>: type Part
        <drill-bit>: type Drill-Bit
        Pre:  (twist-drill <drill-bit>)
              (has-spot <part>)
              (holds-drill <drill-bit>)
              (holds-part <part>)
        Prim: add (has-hole <part>)
        Side: del (has-spot <part>)
              del (painted <part>)

    paint-part(<part>)
        <part>: type Part
        Pre:  not (holds-part <part>)
        Prim: add (painted <part>)
        Side: del (has-spot <part>)

Fig. 4.1. Drilling Domain.

                                                            more important
    static level: (twist-drill <drill-bit>), (spot-drill <drill-bit>)
    level 2:      (has-hole <part>)
    level 1:      (has-spot <part>), (painted <part>)
    level 0:      (holds-part <part>), (no-part),
                  (holds-drill <drill-bit>), (no-drill)
                                                            less important

Fig. 4.2. Abstraction hierarchy in the Drilling Domain.



    Set of Objects:   part-1: type Part;  drill-1, drill-2: type Drill-Bit
    Initial State:    (twist-drill drill-1), (spot-drill drill-2),
                      (no-drill), (no-part)
    Goal Statement:   (has-hole part-1), (painted part-1)

Fig. 4.3. Problem in the Drilling Domain.

[Diagrams not reproduced: (a) Level 2: abstract solution; (b) Level 1: intermediate
refinement; (c) Level 0: complete solution, consisting of put-drill(drill-2),
put-part(part-1), drill-spot(part-1, drill-2), remove-drill(drill-2),
put-drill(drill-1), drill-hole(part-1, drill-1), remove-part(part-1), and
paint-part(part-1).]

Fig. 4.4. Abstract solution and its refinements. We italicize subgoals at the current
level of abstraction; thick rectangles mark operators inherited from higher levels.

Knoblock [1993] has developed an alternative model, which gives the same
efficiency estimate. He has shown that abstraction linearly reduces the search
depth; since the search time is exponential in the depth, abstraction leads to
an exponential reduction of the search time.
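
As a rough illustration of both models, suppose that the branching factor is b
and the final solution has length l, so that a single-level search examines on
the order of b^l nodes. If a k-level hierarchy divides the work evenly, each
level inserts about l/k operators, for a total near k·b^(l/k); with b = 3,
l = 20, and k = 4, this is about 4·3⁵ ≈ 10³ nodes instead of 3²⁰ ≈ 3.5·10⁹. The
exact assumptions of the two models differ, but both yield this exponential gap.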
Multiple experiments have confirmed that abstraction reduces the search
[Sacerdoti, 1974; Tenenberg, 1988; Knoblock, 1993; Yang et al., 1996]; how-
ever, experiments have also shown that the performance depends on the specific
hierarchy, and an inappropriate abstraction may result in gross inefficiency
[Bacchus and Yang, 1992; Smith and Peot, 1992]. We outline the main causes
of such inefficiency.
Backtracking across levels. A solver may construct an abstract solution
that has no refinement. For instance, if both drill bits in the example prob-
lem were twist drills, the solver could still produce the abstract solution in
Figure 4.4(a), but it would fail to refine this solution. After failing to find a
refinement, the solver backtracks to a higher level and constructs a different
abstract solution. Bacchus and Yang [1992; 1994] showed that backtracking
across levels may cause an exponential increase in the search time.
Intermixing abstraction levels. When a solver constructs a refinement,
it may insert an operator that adds or deletes a high-level literal, thus in-
validating the abstract solution. The solver then has to insert operators that
restore correctness at the high level. Thus, it has to intermix high-level and
low-level search, which reduces the effectiveness of abstraction. This situation
differs from backtracking to a higher level; the solver inserts additional op-
erators into the abstract solution, rather than abandoning it and producing
an alternative solution.
Generating long solutions. Abstraction may lead to finding lengthy so-
lutions. For instance, consider the problem in Figure 4.5(a). A solver may
construct the abstract solution in Figure 4.5(b) and refine it as shown in
Figure 4.5(c). In this example, the solver has found a shortest solution at
level 1 and its shortest refinement at level 0, but the resulting solution is not
optimal; we show a shorter solution in Figure 4.5(d). Since the search depth
is usually proportional to the solution length, the search time grows expo-
nentially with the length. In some cases, abstraction leads to unreasonably
long solutions, causing gross inefficiency [Backstrom and Jonsson, 1995].

4.1.4 Avoiding the problems

We have discussed three major problems with abstraction search. An increase
in the number of levels may reduce the search at each level, but it may also
aggravate the three problems (Figure 4.6).
Knoblock [1990; 1993] developed an algorithm, called ALPINE, for gen-
erating abstraction hierarchies that never cause intermixing of levels. This
property is called ordered monotonicity, and hierarchies that satisfy it are
ordered hierarchies. Experiments have confirmed that ordered monotonicity

(a) Problem. Initial state: (twist-drill drill-1), (spot-drill drill-2),
(no-drill), (no-part), (has-spot part-1).

[Diagrams for (b) the abstract solution and its intermediate refinement, (c) the
low-level refinement, and (d) the shortest solution are not reproduced.]

Fig. 4.5. Example of a lengthy solution. Abstraction search yields an eleven-operator
solution (c), whereas a shortest solution has nine operators (d). Rectangles mark the
operators inherited from the higher levels.

[Diagram not reproduced: ABSTRIPS, ALPINE, and HIGHPOINT lie along a spectrum
that trades more levels against less backtracking, less level intermixing, and
shorter solutions.]

Fig. 4.6. Trade-off in the construction of hierarchies.

improves the efficiency of PRODIGY [Knoblock et al., 1991a; Knoblock, 1993],
ABTWEAK [Yang et al., 1996], and other solvers; however, it reduces the num-
ber of levels and often leads to "collapsing" all literals into a single level.
Bacchus and Yang [1991; 1994] designed the HIGHPOINT system, which
generates abstractions with a stronger property. It ensures that the hierarchy
is ordered, and that every abstract solution has a refinement; thus, the solver
never backtracks across levels. To enforce these properties, the system im-
poses rigid constraints on the abstraction levels of literals, and it often fails
to find a hierarchy that satisfies them. On the positive side, when HIGHPOINT
finds a hierarchy, it usually gives better results than ALPINE.
In Figure 4.6, we illustrate the relative properties of ABSTRIPS [Sacerdoti,
1974], ALPINE, and HIGHPOINT. We have followed Knoblock's approach to
resolving this trade-off: we use ordered hierarchies and maximize the number
of levels within the constraints of ordered monotonicity.

4.1.5 Ordered monotonicity

We review Knoblock's technique for generating ordered hierarchies. It is based
on the following constraints, which determine the relative importance of pre-
conditions and effects in every instantiated operator:
• If prim1 and prim2 are primary-effect literals of an instantiated operator,
  then level(prim1) = level(prim2).
• If prim is a primary-effect literal and side is a side-effect literal,
  then level(prim) ≥ level(side).
• If prim is a primary effect and prec is a nonstatic precondition literal,
  then level(prim) ≥ level(prec).
Knoblock et al. [1991] showed that these constraints guarantee ordered
monotonicity for backward-chaining solvers and for PRODIGY; however, this
result is not applicable to forward chainers. We next outline their proof.
When a solver refines an abstract solution, it inserts operators for achiev-
ing low-level literals. The new operators may affect higher levels in two ways.
First, they may add or delete high-level literals and invalidate the abstract
solution. Second, they may have high-level preconditions that become new
abstract subgoals. If the hierarchy satisfies the constraints, neither situation
can arise. The first two inequalities ensure that the new operators have no
high-level effects, and the last inequality implies that they do not introduce
high-level subgoals. Thus, the solver does not intermix the refinement with
the abstract-level search.
We have described constraints for instantiated operators. If the system
uses them, it has to generate all instantiations, which may cause a combina-
torial explosion. To avoid this problem, Knoblock implemented an algorithm
that constructs a hierarchy of predicates rather than literals, by imposing
constraints on the precondition and effect predicates.
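
For instance, the drill-hole operator in Figure 4.1 has the primary effect
(has-hole <part>), the side effects (has-spot <part>) and (painted <part>), and
the nonstatic preconditions (has-spot <part>), (holds-drill <drill-bit>), and
(holds-part <part>); at the predicate level, these yield level(has-hole) ≥
level(has-spot), level(has-hole) ≥ level(painted), level(has-hole) ≥
level(holds-drill), and level(has-hole) ≥ level(holds-part), which apply to all
drill bits and parts at once rather than to individual instantiations.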
The resulting constraints are stronger than the constraints for instanti-
ated operators, and they may cause overconstraining and collapse of a hierar-
chy. To illustrate this problem, we consider the Drilling Domain (Figure 4.1)
and replace the predicates (has-spot <part>), (has-hole <part>), and (painted
<part>) with a more general predicate (has <feature> <part>), as shown in
Figure 4.7(a). Then, we cannot separate levels 1 and 2 of the hierarchy in
Figure 4.2, and we have to use fewer levels (Figure 4.7b).
As a more extreme example, we can encode the domain using two general
predicates: (pred-0 <name-0>) replaces the predicates with no arguments, and
(pred-1 <name-1> <thing>) replaces the one-argument predicates (Figure 4.8).
The <name-0> and <name-1> variables range over the names of the original
predicates, whereas the values of <thing> are drill bits and parts. Then,
Knoblock's algorithm cannot separate the predicates into multiple levels.

(a) Replacing three predicates with a more general predicate:
    (has-spot <part>)  →  (has spot <part>)
    (has-hole <part>)  →  (has hole <part>)
    (painted <part>)   →  (has paint <part>)

(b) Resulting hierarchy, which has fewer levels:
    static level: (twist-drill <drill-bit>), (spot-drill <drill-bit>)
    level 1:      (has <feature> <part>)
    level 0:      (holds-part <part>), (no-part),
                  (holds-drill <drill-bit>), (no-drill)

Fig. 4.7. Reduction in the number of levels due to the use of general predicates.

(pred-0 <name-0>) replaces the zero-argument predicates:
    (no-part)   →  (pred-0 no-part)
    (no-drill)  →  (pred-0 no-drill)

(pred-1 <name-1> <thing>) replaces the one-argument predicates:
    (twist-drill <drill-bit>)  →  (pred-1 twist-drill <drill-bit>)
    (spot-drill <drill-bit>)   →  (pred-1 spot-drill <drill-bit>)
    (has-spot <part>)          →  (pred-1 has-spot <part>)
    (has-hole <part>)          →  (pred-1 has-hole <part>)
    (painted <part>)           →  (pred-1 painted <part>)
    (holds-part <part>)        →  (pred-1 holds-part <part>)
    (holds-drill <drill-bit>)  →  (pred-1 holds-drill <drill-bit>)

Fig. 4.8. Collapse of an ordered hierarchy due to general predicates. We encode
the Drilling Domain with two predicates, which leads to a one-level hierarchy.

4.2 Hierarchies for the PRODIGY domain language

ALPINE was designed for a limited sublanguage of PRODIGY; in particular, it
did not handle if-effects and inference rules. We present a set of constraints
that account for all features of PRODIGY (Section 4.2.1), and then give an
algorithm for generating abstraction hierarchies (Section 4.2.2).

4.2.1 Additional constraints

We describe constraints for if-effects and then discuss the use of eager and
lazy inference rules with abstraction.
If-effects. Since PRODIGY uses if-effect actions in the same way as simple
effects, they require the same constraints; in addition, we have to constrain
the levels of if-effect conditions. If the system uses a primary action of an if-
effect, it adds the conditions of this if-effect to the operator's preconditions.
Therefore, nonstatic conditions require the same constraints as preconditions:

• If prim is a primary action and cnd is a nonstatic condition of an if-effect,
  then level(prim) ≥ level(cnd).
We list all constraints in Figure 4.9(a), where the term "effect" refers to
both simple effects and if-effect actions. For example, consider an extended
version of painting in the Drilling Domain, which allows the choice of a color.
We encode it by two operators, pick-paint and paint-part, given in Fig-
ure 4.10(a). The first operator does not add constraints because it contains
only one predicate. The second operator requires the following constraints:
For primary effects:
    level(painted) = level(part-color)
For side effects:
    level(painted) ≥ level(has-spot)
    level(painted) ≥ level(part-color)
    level(part-color) ≥ level(has-spot)
For preconditions:
    level(painted) ≥ level(holds-part)
    level(part-color) ≥ level(holds-part)
For if-effect conditions:
    level(part-color) ≥ level(paint-color)

Eager inference rules. PRODIGY can use eager rules in the same way as
operators (see Section 2.3.2); thus, they inherit all operator constraints. In
addition, the system uses them in forward chaining from the current state,
which poses the need for additional constraints.
When PRODIGY applies an eager rule, it must not add or delete any literals
above the current level. Therefore, the effects of the rule must be no higher
than its nonstatic preconditions. Similarly, the actions of each if-effect must
be no higher than its conditions. Thus, we need the following constraints:
• If eff is an effect and prec is a nonstatic precondition of an inference rule,
  then level(eff) ≤ level(prec).
• If eff is an action and cnd is a nonstatic condition of an if-effect,
  then level(eff) ≤ level(cnd).

(a) Constraints for an operator.

    For every two primary effects, prim1 and prim2:
        level(prim1) = level(prim2).
    For every primary effect prim and side effect side:
        level(prim) ≥ level(side).
    For every primary effect prim and nonstatic precondition prec:
        level(prim) ≥ level(prec).
    For every primary action prim and nonstatic condition cnd of an if-effect:
        level(prim) ≥ level(cnd).

(b) Constraints for an inference rule.

    For every two primary effects, prim1 and prim2:
        level(prim1) = level(prim2).
    For every primary effect prim and side effect side:
        level(prim) ≥ level(side).
    For every primary effect prim and nonstatic precondition prec:
        level(prim) = level(prec).
    For every side effect side and nonstatic precondition prec:
        level(side) ≤ level(prec).
    For every primary action prim and nonstatic condition cnd of an if-effect:
        level(prim) = level(cnd).
    For every side action side and nonstatic condition cnd of an if-effect:
        level(side) ≤ level(cnd).

Fig. 4.9. Constraints on the literal levels in an ordered abstraction hierarchy.

Type Hierarchy: Part, Drill-Bit, and Color.

(a) Painting operators.

    pick-paint(<old-color>, <new-color>)
        <old-color>, <new-color>: type Color
        Prim: add (paint-color <new-color>)
        Side: del (paint-color <old-color>)

    paint-part(<part>)
        <part>: type Part
        <old-color>, <new-color>: type Color
        Pre:  not (holds-part <part>)
        Prim: add (painted <part>)
              (if (paint-color <new-color>)
                  add (part-color <part> <new-color>))
        Side: del (has-spot <part>)
              (if (part-color <part> <old-color>)
                  del (part-color <part> <old-color>))

(b) Inference rules.

    Inf-Rule add-idle
        Pre:  (no-drill)
              (no-part)
        Prim: add (idle)

    Inf-Rule del-color(<part>, <old-color>)
        <part>: type Part
        <old-color>: type Color
        Pre:  not (painted <part>)
              (part-color <part> <old-color>)
        Side: del (part-color <part> <old-color>)

Fig. 4.10. Extensions to the Drilling Domain.

We combine these inequalities with the operator constraints (Figure 4.9a)
and obtain the constraint set in Figure 4.9(b). For example, consider the
inference rules in Figure 4.10(b). The first rule indicates that, if the drill press
has neither a drill bit nor a part, then it is idle. The second rule ensures that
unpainted parts have no color. These rules require the following constraints:
Inf-Rule add-idle:
    level(idle) = level(no-drill)
    level(idle) = level(no-part)
Inf-Rule del-color:
    level(part-color) ≤ level(painted)
Lazy inference rules. Since PRODIGY uses lazy rules in backward chaining,
these rules also inherit the operator constraints; however, the use of their
effects differs from that of operator effects and requires more constraints. If
the system moves a lazy rule from the tail to the head and later applies an
operator that invalidates the rule's preconditions, then it cancels the rule's
effects; that is, it removes all effects of the rule from the current state (see
Section 2.3.2). This removal must not affect higher levels; thus, the effects of a
lazy rule must be no higher in the hierarchy than its preconditions. Similarly,
the actions of an if-effect must be no higher than its conditions. We conclude
that lazy rules require the same constraints as eager rules (Figure 4.9b).

4.2.2 Abstraction graph

We now describe the Abstractor algorithm, which generates ordered hierar-
chies for the full domain language of PRODIGY. It generates a hierarchy of
predicates; that is, it places all literals with a common predicate name on the
same level. Its purpose is to generate a hierarchy that satisfies the constraints
in Figure 4.9 and has as many levels as possible. The algorithm builds a hier-
archy for a specific selection of primary effects. If it has no information about
primary effects, it assumes that all effects are primary. We summarize the
specification of Abstractor in Figure 4.11.
Encoding of constraints. We encode the constraint set by a directed graph
[Knoblock, 1993]; the nonstatic predicates are the nodes of this graph, and the
constraints are its edges. If the level of some predicate pred1 must be no smaller
than that of pred2, the algorithm adds an edge from pred1 to pred2. If the two
predicates must be on the same level, it adds two edges: from pred1 to pred2
and from pred2 to pred1.
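
To make the encoding concrete, here is a minimal Python sketch of such a
constraint graph; the class and method names are ours, not PRODIGY's.

    class ConstraintGraph:
        def __init__(self):
            self.edges = {}               # predicate name -> set of successors

        def add_constraint(self, pred1, pred2):
            """Require level(pred1) >= level(pred2): one directed edge."""
            self.edges.setdefault(pred1, set()).add(pred2)
            self.edges.setdefault(pred2, set())

        def add_equality(self, pred1, pred2):
            """Require level(pred1) = level(pred2): edges in both directions."""
            self.add_constraint(pred1, pred2)
            self.add_constraint(pred2, pred1)

    g = ConstraintGraph()
    g.add_constraint("has-hole", "has-spot")  # primary effect above side effect
    g.add_equality("painted", "part-color")   # two primary effects: same level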
Constraint edges for an operator. If an operator has no primary effects,
it requires no constraints. For an operator with primary effects, we apply
the Add-Operator procedure in Figure 4.12, which imposes the constraints in
Figure 4.13(a). It picks one of the primary effects and uses this effect as a
"pivot" for adding constraints; in Figure 4.13, the pivot is shown by an oval.

Type of description change: Generating an abstraction hierarchy.
Purpose of description change: Maximizing the number of levels, while ensuring
ordered monotonicity.
Use of other algorithms: None.
Required input: Description of the operators and inference rules.
Optional input: Selection of primary effects.
Fig. 4.11. Specification of the Abstractor algorithm.

Add-Operator(op)
    Pick a primary effect prim of op.
    For every other primary effect other-prim of op:
        Add an edge from prim to other-prim.
        Add an edge from other-prim to prim.
    For every side effect side of op:
        Add an edge from prim to side.
    For every nonstatic precondition prec of op:
        Add an edge from prim to prec.
    For every if-effect eff of op:
        If eff has some primary action:
            For every nonstatic condition cnd of eff:
                Add an edge from prim to cnd.

Add-Prim-Rule(inf)
    Pick a primary effect prim of inf.
    For every other primary effect other-prim of inf:
        Add an edge from prim to other-prim.
        Add an edge from other-prim to prim.
    For every side effect side of inf:
        Add an edge from prim to side.
    For every nonstatic precondition prec of inf:
        Add an edge from prim to prec.
        Add an edge from prec to prim.
    For every if-effect eff of inf:
        If eff has some primary action:
            For every nonstatic condition cnd of eff:
                Add an edge from prim to cnd.
                Add an edge from cnd to prim.
        If eff has no primary actions:
            For every action side of eff:
                For every nonstatic condition cnd of eff:
                    Add an edge from cnd to side.

Add-Side-Rule(inf)
    For every effect side of inf:
        For every nonstatic precondition prec of inf:
            Add an edge from prec to side.
    For every if-effect eff of inf:
        For every action side of eff:
            For every nonstatic condition cnd of eff:
                Add an edge from cnd to side.

Fig. 4.12. Adding constraint edges.

[Diagrams not reproduced: constraint edges for (a) an operator, (b) an inference
rule with primary effects, and (c) an eager inference rule without primary effects.]

Fig. 4.13. Constraint edges. We show preconditions and if-effect conditions by
filled circles, primary effects by thick circles, and side effects by thin circles. The
oval is the pivot, selected arbitrarily among the primary effects.

The procedure adds edges from the pivot to every other primary effect, as
well as opposite edges from other primary effects to the pivot. In Figure 4.13,
we mark primary effects by thick circles; note that they include both simple
effects and if-effect actions. Then, the procedure adds edges from the pivot to
all side effects (thin circles) and to all nonstatic preconditions (filled circles).
Finally, for every if-effect with primary actions, it adds edges from the pivot
to the nonstatic conditions of the if-effect (also filled circles). The resulting
constraints are equivalent to the inequalities in Figure 4.9(a). We show the
constraint edges for the paint-part operator in Figure 4.14(a).
Edges for an inference rule. If an inference rule has primary effects,
we impose constraints using the Add-Prim-Rule procedure in Figure 4.12;
these constraints are shown in Figure 4.13(b). The procedure picks a pivot
among the primary effects, and then adds edges from the pivot to all other
effects and to all nonstatic preconditions, as well as opposite edges from
primary effects and nonstatic preconditions to the pivot. For every if-effect
with primary actions, the procedure connects its nonstatic conditions with
the pivot by "two-way" edges. For an if-effect without primary actions, it
adds edges from every nonstatic condition to every action. The resulting
constraints are equivalent to the inequalities in Figure 4.9(b). We illustrate
them for the add-idle rule in Figure 4.14(b).
If a lazy rule has no primary effects, then PRODIGY never uses it, which
means that it requires no constraints. For an eager rule without primary
effects, we apply the Add-Side-Rule procedure (Figure 4.12). First, it inserts

constraint edges from every nonstatic precondition to every effect. Then, for
each if-effect, it adds edges from every nonstatic condition to every action.
We show these edges in Figure 4.13(c) and illustrate them for the del-color
rule in Figure 4.14(c).
Construction of the hierarchy. After adding edges for all operators and
inference rules, we obtain a graph that encodes all constraints (Figure 4.15a).
For every two predicates pred1 and pred2, the graph contains a path from pred1
to pred2 if and only if level(pred1) ≥ level(pred2). In particular, if there is a
path from pred1 to pred2 and back from pred2 to pred1, then the two pred-
icates are on the same level. Therefore, the strongly connected components
of the graph correspond to the levels of the hierarchy. The system identifies
the components, thus grouping the predicates by levels (Figure 4.15b); the
resulting encoding of the hierarchy is called an abstraction graph.

Before using the hierarchy, the system enumerates its levels and adds the
static level (Figure 4.15c). The enumeration is consistent with the edges; that is,
if there is an edge from level1 to level2, then level1 > level2. If the graph allows
several enumerations, we can use any of them. The system applies topological
sorting to the components and enumerates them in the resulting order. In
Figure 4.16, we summarize the Abstractor algorithm, which constructs an
ordered hierarchy.
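
The following Python sketch carries out this final step on the dict-of-sets graph
from the sketch above. Tarjan's algorithm is one possible way to find the
strongly connected components; it emits them in reverse topological order, so
the emission index itself is a consistent level number (0 = least important). The
static level would still be added on top afterward.

    def abstraction_levels(edges):
        """Map each nonstatic predicate to a level; predicates in one
        strongly connected component share a level."""
        index, low, level_of = {}, {}, {}
        stack, on_stack, sccs, counter = [], set(), [], [0]

        def strongconnect(v):
            index[v] = low[v] = counter[0]; counter[0] += 1
            stack.append(v); on_stack.add(v)
            for w in edges.get(v, ()):
                if w not in index:
                    strongconnect(w)
                    low[v] = min(low[v], low[w])
                elif w in on_stack:
                    low[v] = min(low[v], index[w])
            if low[v] == index[v]:        # v is the root of a component
                component = []
                while True:
                    w = stack.pop(); on_stack.discard(w); component.append(w)
                    if w == v:
                        break
                sccs.append(component)

        for v in edges:                   # assumes every predicate is a key
            if v not in index:
                strongconnect(v)
        for level, component in enumerate(sccs):
            for v in component:
                level_of[v] = level
        return level_of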
Running time. To analyze the complexity of Abstractor, we denote the
number of effect predicates in an operator or inference rule by e, and the
number of nonstatic preconditions along with if-effect conditions by p. The
time of adding the constraint edges for an operator is linear, O(e + p). If an
inference rule has primary effects, and all its if-effects have primary actions,
then the complexity of adding the corresponding constraints is also O(e + p).
Otherwise, the time for processing the rule is O(e·p).
We define E as the total number of effects in all operators and inference
rules, and P as the total number of preconditions and if-effect conditions:

    E = Σ_op e_op + Σ_inf e_inf;        P = Σ_op p_op + Σ_inf p_inf.

If all inference rules have primary effects, and all their if-effects have primary
actions, then the complexity of adding all edges is O(E + P). If not, the
complexity is superlinear; however, such situations rarely occur in practice
and do not result in significant deviations from linearity. Finally, we denote
the number of nonstatic predicates by N. The abstraction graph contains N
nodes, and the time for identifying and sorting its components is O(N²).

We have implemented Abstractor in Common Lisp and tested it on a Sun
Sparc 5. Its execution time is about (11·E + 11·P + 6·N²)·10⁻⁴ seconds,
which is usually negligible in comparison with the problem-solving time.

[Diagrams not reproduced: edges for (a) the paint-part operator, (b) the add-idle
inference rule, and (c) the del-color inference rule.]

Fig. 4.14. Some constraint edges in the Drilling Domain.

(a) Graph of constraints and (b) strongly connected components.
[Diagrams not reproduced.]

(c) Enumeration of components:
    level 4 (static): twist-drill, spot-drill
    level 3: has-hole
    level 2: has-spot, painted, part-color
    level 1: holds-drill, holds-part, no-drill, idle, no-part
    level 0: paint-color

Fig. 4.15. Generating an abstraction hierarchy for the Drilling Domain.

Abstractor
    Create a graph whose nodes are the nonstatic predicates, with no edges.
    For every operator op:
        If op has primary effects, then call Add-Operator(op).
    For every inference rule inf:
        If inf has primary effects, then call Add-Prim-Rule(inf).
        If inf is an eager rule without primary effects, then call Add-Side-Rule(inf).
    Identify the strongly connected components of the graph.
    Topologically sort the components; enumerate them accordingly.

Fig. 4.16. Constructing an ordered hierarchy.

4.3 Partial instantiation of predicates


The effectiveness of Abstractor depends on the user's choice of predicates
for the domain encoding. For example, an appropriate choice for the Drilling
Domain leads to a four-level hierarchy (Figure 4.2), whereas the use of general
predicates may reduce the number of levels (Figures 4.7 and 4.8).
We say that the hierarchy in Figure 4.2 is finer-grained than the one in
Figure 4.7, which means that we can obtain the second hierarchy from the
first one by merging some of its levels. Knoblock [1993] showed that fine
granularity improves the performance, and Yang et al. [1996] came to the
same conclusion during their work on ABTWEAK.
We have developed a procedure that improves the granularity of Abstrac-
tor's hierarchy. It identifies predicates that cause unnecessary constraints and
replaces them with more specific predicates. We give an informal overview of
this technique (Section 4.3.1), describe the related data structures and algo-
rithms (Sections 4.3.2-4.3.4), and then present a procedure for determining
the levels of literals in the improved hierarchy (Section 4.3.5).

4.3.1 Improving the granularity

We review previous methods for increasing the number of abstraction levels,
and then introduce the new technique.
Full and partial instantiation. After Knoblock noted that general pred-
icates cause a collapse of ALPINE'S hierarchy, he found two alternatives for
avoiding this problem. The first approach is to generate all instantiations of
nonstatic predicates. The system invokes an instantiation procedure, similar
to Matcher (see Section 3.4.2), and builds a hierarchy of the resulting literals.
For example, if we apply it to the Drilling Domain with the general pred-
icate "has" (Figure 4.7a), it generates the hierarchy in Figure 4.17(a). As an-
other example, its application to the Extended Robot Domain (Figure 3.30)
leads to the three-level hierarchy in Figure 4.17(b).
This technique completely eliminates unnecessary constraints caused by
general predicates; however, it has two major drawbacks. First, it causes a
combinatorial explosion in the number of literals. Second, it cannot produce
a problem-independent abstraction because objects vary across problems.
The alternative approach is to "instantiate" predicates with low-level
types, that is, with leaves of the type hierarchy. For instance, we may replace
the nonstatic predicates "at" and "robot-at" in the Extended Robot Domain
with more specific predicates (Figure 4.18b) and then build the hierarchy in
Figure 4.18(c). We call it a partial instantiation of general predicates.
As another example, suppose that we apply this technique to the Lo-
gistics Domain (Figure 3.43), which involves the delivery of packages among
locations in different cities. The system replaces the two nonstatic predicates,
"at" and "in," with eight specific predicates (Figure 4.19b) and then generates
a four-level hierarchy (Figure 4.19c).

(a) Abstraction for the Drilling Domain:
    static level (not instantiated): (twist-drill <drill-bit>), (spot-drill <drill-bit>)
    level 2: (has hole part-1), (has hole part-2)
    level 1: (has spot part-1), (has spot part-2),
             (has paint part-1), (has paint part-2)
    level 0: (holds-part part-1), (holds-part part-2),
             (holds-drill drill-1), (holds-drill drill-2), (no-part), (no-drill)

(b) Abstraction for the Extended Robot world:
    static level: (within <location> <room>) (not instantiated)
    level 1: (in box-1 room-1), (in box-1 room-2), (in box-1 room-3),
             (in box-2 room-1), (in box-2 room-2), (in box-2 room-3),
             (at box-1 table-1), (at box-1 table-2), (at box-1 table-3),
             (at box-1 door-a), (at box-1 door-b),
             (at box-2 table-1), (at box-2 table-2), (at box-2 table-3),
             (at box-2 door-a), (at box-2 door-b),
             (on box-1 table-1), (on box-1 table-2), (on box-1 table-3),
             (on box-2 table-1), (on box-2 table-2), (on box-2 table-3),
             (on-floor box-1), (on-floor box-2)
    level 0: (robot-in room-1), (robot-in room-2), (robot-in room-3),
             (robot-on table-1), (robot-on table-2), (robot-on table-3),
             (robot-at table-1), (robot-at table-2), (robot-at table-3),
             (robot-at door-a), (robot-at door-b), (robot-on-floor),
             (open door-a), (open door-b), (closed door-a), (closed door-b)

Fig. 4.17. Examples of fully instantiated hierarchies. The first hierarchy is for
Drilling problems with two drill bits and two parts. The second is for the Robot
world (Figure 3.30a) with two boxes, three rooms, two doors, and three tables.

(a) Types of objects. [Diagram not reproduced.]

(b) Refinement of nonstatic predicates:
    (at <box> <location>)   →  (at <box> <table>), (at <box> <door>)
    (robot-at <location>)   →  (robot-at <table>), (robot-at <door>)

(c) Hierarchy of refined predicates:
    static level (not refined): (within <location> <room>)
    level 1: (in <box> <room>), (on-floor <box>), (at <box> <table>),
             (at <box> <door>), (on <box> <table>)
    level 0: (robot-in <room>), (robot-on-floor), (robot-at <table>),
             (robot-at <door>), (robot-on <table>), (open <door>), (closed <door>)

Fig. 4.18. Partial instantiation in the Extended Robot Domain (Figure 3.30). The
system "instantiates" predicates with leaf types and arranges them into a hierarchy.

The partial instantiation takes less time than the construction of a literal
hierarchy, but it is not immune to an explosion in the number of predicates.
Furthermore, it does not always prevent the collapse of a hierarchy. If we
apply it to the Drilling Domain with the predicate (has <feature> <part>),
then Abstractor produces the three-level hierarchy in Figure 4.7(b). To build
a four-level hierarchy, we have to replace <feature> with specific objects.
Minimal partial instantiation. We have shown a major trade-off in con-
structing ordered hierarchies: general predicates allow fast generation and
compact storage of an abstraction graph, whereas a full or partial instantia-
tion leads to a finer-grained hierarchy. We have developed an algorithm for
finding the right partial instantiation, which prevents unnecessary constraints
without a combinatorial explosion; we give its specification in Figure 4.21.
We have named it Refiner for its ability to refine the generality of predicates.
The algorithm constructs a hierarchy of literals, but it does not encode
literals explicitly. The encoding of the abstraction graph includes sets of liter-
als, subset relations between them, and constraints on their abstraction levels
(Figure 4.22); we call it an instantiation graph.
Refiner generates the minimal partial instantiation that prevents over-
constraining of the hierarchy. For example, its application to the Drilling,
Robot, and Logistics Domains leads to the hierarchies in Figure 4.20. We de-
scribe the encoding of an instantiation graph, basic operations on the graph,
and their use in constructing a hierarchy.

4.3.2 Instantiation graph

The structure of an instantiation graph is similar to that of the constraint graph
in Section 4.2.2. Its nodes are sets of literals, and its edges are constraints
on their relative importance. The main difference is that literal sets in an
instantiation graph may not be disjoint; that is, a literal may belong to several
nodes. The graph encoding consists of two main parts (Figure 4.22); the first
part is a collection of literal sets with links denoting subset relations, and the
second is a graph of strongly connected components with constraint edges.
Sets of literals. The nodes of an instantiation graph are typed predicates,
which encode literal sets (Figure 4.22a). A node is defined by a predicate
name and a list of argument types; each element of the list is a simple type,
disjunctive type, or specific object. Recall that a disjunctive type is a union
of several simple types (see Section 2.3.3); for example, the first argument of
(at <pack-or-transport> <place>) is the disjunctive type (or Package Transport).

We view a typed predicate as the set of its instantiations, and define a sub-
set relation and intersection of predicates. A predicate is a subset of another
predicate if every instantiation of the first predicate is also an instantiation
of the second; for example, (at <pack> <airport>) is a subset of (at <pack>
<place>). Two predicates intersect if they have common instantiations; for
example, (at <transport> <airport>) intersects with (at <van> <place>).

(a) Types of objects. [Diagram not reproduced.]

(b) Refinement of the predicates (at <pack-or-transport> <place>)
    and (in <pack> <transport>).

(c) Abstraction hierarchy:
    static level (not refined): (within <place> <city>), (within <van> <city>)
    level 2: (at <pack> <post>), (at <pack> <airport>),
             (in <pack> <van>), (in <pack> <plane>)
    level 1: (at <plane> <airport>)
    level 0: (at <van> <post>), (at <van> <airport>)

Fig. 4.19. Partial instantiation in the Logistics Domain given in Figure 3.43.

(a) Drilling abstraction:
    static level
    level 2: (has hole <part>)
    level 1: (has spot <part>), (has paint <part>)
    level 0: (holds-part <part>), (no-part), (holds-drill <drill-bit>), (no-drill)

(b) Robot abstraction:
    static level: (within <location> <room>)
    level 1: (in <box> <room>), (on-floor <box>),
             (at <box> <location>), (on <box> <table>)
    level 0: (robot-in <room>), (robot-on-floor),
             (robot-at <location>), (robot-on <table>)

(c) Logistics abstraction:
    static level: (within <place> <city>), (within <van> <city>)
    level 2: (at <pack> <place>), (in <pack> <transport>)
    level 1: (at <plane> <airport>)
    level 0: (at <van> <place>)

Fig. 4.20. Results of applying Refiner to (a) the Drilling Domain with the predicate
(has <feature> <part>), (b) the Extended Robot world, and (c) the Logistics Domain.

Type of description change: Generating more specific predicates.
Purpose of description change: Maximizing the number of levels in an ordered
hierarchy, while avoiding too specific instantiations.
Use of other algorithms: The Abstractor algorithm, which generates hierarchies
based on partially instantiated predicates.
Required input: Description of the operators and inference rules.
Optional input: Selection of primary effects; all possible values for some variables.

Fig. 4.21. Specification of the Refiner algorithm.

[Diagrams not reproduced: (a) sets of literals, that is, the typed predicates
(in <pack> <plane>), (in <pack> <van>), (at <pack> <airport>), (at <pack> <place>),
(at <plane> <airport>), and (at <van> <place>), with a subset link from
(at <pack> <airport>) to (at <pack> <place>); (b) the component graph over these
predicates; (c) an implicit literal graph for a world with two airports, one post
office, one airplane, one van, and two packages.]

Fig. 4.22. Instantiation graph for the Logistics Domain. Its encoding includes
(a) nonstatic typed predicates with subset links between them, and (b) components
and constraint edges. It corresponds to (c) an implicit graph of constraints between
literals. The dashed line shows a subset link, whereas solid lines are constraint edges.

pred                typed predicate, which is a node of the graph
disj-type           disjunctive type, which is an argument type in a predicate
simple-type         simple type, which is part of a disjunctive type
compt               strongly connected component of the graph
compt[pred]         component that contains a given predicate pred
(compt1, compt2)    constraint edge from the first to the second component

Fig. 4.23. Notation in the pseudocode of Refiner in Figures 4.24-4.28.

When Refiner generates a graph, it identifies all subset relations between
nodes and encodes them by directed links. The graph in Figure 4.22(a) in-
cludes a subset link from (at <pack> <airport>) to (at <pack> <place>).
Components and constraints. The edges of the graph determine relative
positions of typed predicates in the hierarchy, and strongly connected com-
ponents correspond to abstraction levels. Refiner constructs the hierarchy by
a sequence of modifications to a component graph (Figure 4.22b), which con-
sists of connected components and constraint edges between them. It initially
creates a separate component for every node of the instantiation graph and
then searches for cycles of constraint edges. When the algorithm identifies a
cycle, it merges all components that form this cycle.
We view the component graph as a compact encoding of an implicit literal
graph, whose nodes include all instantiations of nonstatic predicates. This
implicit graph has an edge from l1 to l2 if the instantiation graph includes an
edge from one of the components that contain l1 to some component with l2.
For example, suppose that the Logistics world consists of two airports, one
post office, one airplane, one van, and two packages. Then, the component
graph in Figure 4.22(b) corresponds to the literal graph in Figure 4.22(c).
If some typed predicate is in a cycle of constraint edges, then the lit-
eral graph includes paths from every instantiation of this predicate to every
other instantiation; hence, all its instantiations are on the same level. Refiner
detects predicates that belong to constraint cycles, called looped predicates,
and uses this information to simplify the graph. If a predicate is not looped,
Refiner can distribute its instantiations among several levels of abstraction.

4.3.3 Basic operations

We next describe three basic procedures: verifying a subset relation between
two typed predicates, checking whether given predicates intersect, and propa-
gating constraints from a predicate to its subset. To simplify the description,
we assume that all argument types are disjunctive; that is, we view simple
types as one-element disjunctions. If a disjunction includes specific objects
along with simple types, we view each object as a simple type. In Figure 4.23,
we summarize the notation for main elements of the instantiation graph.

Predicate-Subset(pred1, pred2)
    Return true if every instantiation of pred1 is also an instantiation of pred2.
    If pred1 and pred2 have different names or different numbers of arguments,
        then return false.
    Let a be the number of arguments in pred1 and pred2.
    Repeat for i from 1 to a:
        Let disj-type1 be the ith argument type in pred1,
            and disj-type2 be the ith type in pred2.
        If not Disj-Type-Subset(disj-type1, disj-type2), then return false.
    Return true.

Disj-Type-Subset(disj-type1, disj-type2)
    Check whether disj-type1 is a subtype of disj-type2.
    For every simple-type1 in disj-type1:
        If not Simple-Type-Subset(simple-type1, disj-type2), then return false.
    Return true.

Simple-Type-Subset(simple-type1, disj-type2)
    Check whether simple-type1 is a subtype of disj-type2.
    For every simple-type2 in disj-type2:
        If simple-type1 and simple-type2 are identical,
            or simple-type1 is a subtype of simple-type2,
            then return true.
    Return false.

Fig. 4.24. Subset test for typed predicates.

Predicate-Intersection(pred1, pred2)
    Return true if some instantiation of pred1 is also an instantiation of pred2.
    If pred1 and pred2 have different names or different numbers of arguments,
        then return false.
    Let a be the number of arguments in pred1 and pred2.
    Repeat for i from 1 to a:
        Let disj-type1 be the ith argument type in pred1,
            and disj-type2 be the ith type in pred2.
        If not Disj-Type-Intersection(disj-type1, disj-type2), then return false.
    Return true.

Disj-Type-Intersection(disj-type1, disj-type2)
    Check whether disj-type1 and disj-type2 have common values.
    For every simple-type1 in disj-type1:
        For every simple-type2 in disj-type2:
            If simple-type1 and simple-type2 are identical,
                or simple-type1 is a subtype of simple-type2,
                or simple-type2 is a subtype of simple-type1,
                then return true.
    Return false.

Fig. 4.25. Intersection test for typed predicates.
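
For readers who prefer running code, here is a compact Python rendering of the
two tests. The representation of a typed predicate as (name, list of disjunctive
types), where each disjunctive type is a collection of simple types or objects,
and the is_subtype oracle over the type hierarchy are assumptions of this sketch.

    def predicate_subset(pred1, pred2, is_subtype):
        """True if every instantiation of pred1 also instantiates pred2."""
        (name1, args1), (name2, args2) = pred1, pred2
        if name1 != name2 or len(args1) != len(args2):
            return False
        return all(all(any(s1 == s2 or is_subtype(s1, s2) for s2 in d2)
                       for s1 in d1)
                   for d1, d2 in zip(args1, args2))

    def predicate_intersection(pred1, pred2, is_subtype):
        """True if pred1 and pred2 have a common instantiation."""
        (name1, args1), (name2, args2) = pred1, pred2
        if name1 != name2 or len(args1) != len(args2):
            return False
        return all(any(s1 == s2 or is_subtype(s1, s2) or is_subtype(s2, s1)
                       for s1 in d1 for s2 in d2)
                   for d1, d2 in zip(args1, args2))

    # Example over a toy fragment of the Logistics type hierarchy:
    is_subtype = lambda t1, t2: (t1, t2) in {("airport", "place"),
                                             ("post", "place")}
    assert predicate_subset(("at", [["pack"], ["airport"]]),
                            ("at", [["pack"], ["place"]]), is_subtype)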



[Diagrams not reproduced: components compt1 and compt2 (a) before and
(b) after the inheritance of edges.]

Fig. 4.26. Propagation of constraint edges. If a typed predicate pred1 is a subset
of pred2, the component of pred1 inherits all edges from the component of pred2.
We show the inherited edges by thick lines.

Copy-Edges(compt1, compt2)
    Ensure that the first component inherits all incoming and outgoing edges
    of the second.
    For every incoming edge (other-compt, compt2):
        If other-compt and compt1 are distinct,
            and there is no edge (other-compt, compt1),
            then add this edge.
    For every outgoing edge (compt2, other-compt):
        If other-compt and compt1 are distinct,
            and there is no edge (compt1, other-compt),
            then add this edge.

Fig. 4.27. Propagation of constraint edges in the component graph.

Subset and intersection tests. We give the procedures for testing the subset
relation and intersection in Figures 4.24 and 4.25. The first algorithm checks
that the given predicates have the same name and equal numbers of arguments,
and that every argument type in the first predicate is a subset of the corresponding
type in the second. The other algorithm verifies that every type in the first
predicate intersects with the corresponding type in the second. The running
time of these tests depends on the number a of arguments in the given predi-
cates, and on the number of simple types in each disjunction. If the maximal
length of a type disjunction is d, the complexity of both tests is O(a·d²).
Inheritance of constraint edges. Consider two typed predicates, pred1
and pred2, that belong to distinct components, and suppose that pred1 is a
subset of pred2 (Figure 4.26a). We may copy all incoming and outgoing edges
of pred2's component to pred1's component (Figure 4.26b) without affecting
the implicit literal graph. We say that pred1 inherits the edges of pred2. The
insertion of inherited edges helps to simplify the component graph. In Fig-
ure 4.27, we give a procedure for inserting edges; its running time is propor-
tional to the number of incoming and outgoing edges of pred2's component.

4.3.4 Construction of a hierarchy

Refiner constructs a hierarchy in three steps: generating an initial graph,
adding subset links, and identifying strongly connected components. We give
its pseudocode in Figure 4.28 and illustrate the construction in Figure 4.29.
Initial constraint graph. The first step of Refiner is similar to that of
Abstractor; it creates a graph of typed predicates and adds constraints for
all operators and inference rules. The nodes of the initial graph correspond
to effects and nonstatic preconditions of operators and inference rules. The
graph may have cyclic edges, which point from a node to itself. If a predicate
has a cyclic edge, all its instantiations are on the same level. In Figure 4.29(a),
we show the initial graph for the Logistics Domain.
Subsets and initial components. The next step is identification of sub-
sets and intersections among typed predicates (see Identify-Subsets in Fig-
ure 4.28). The system applies the Predicate-Subset and Predicate-Intersection
tests to every pair of predicates and adds the appropriate subset links (Fig-
ure 4.29b). If it finds a pair of intersecting predicates, neither of which is a
subset of the other, then it inserts a two-way constraint edge between them,
thus ensuring that they are on the same level.
After completing the subset and intersection tests, Refiner initializes the
component graph by creating a separate component for every predicate (see
Initialize-Components in Figure 4.28). It does not copy cyclic edges to the new
graph; instead, it marks the nodes with these edges as looped predicates. In
Figure 4.29(c), we show the initial component graph for the Logistics Domain.
Strongly connected components. The final stage is finding the connected
components (Figure 4.29d-i). The system repetitively applies two procedures,
Combine-Components and Add-Subset-Edges (Figure 4.28). The first proce-
dure finds the strongly connected components (Figure 4.29d) and modifies
the component graph (Figure 4.29e). The second procedure iterates through
the predicates and propagates constraints to their subsets. If pred is a looped
predicate, the procedure adds the two-way constraints between pred and its
subsets, thus ensuring that they are on the same level (Figure 4.29f). Other-
wise, it calls Copy-Edges to add the appropriate constraints for pred's subsets.
Running time. The construction of an initial graph (Step 1 of Refiner in
Figure 4.28) has the same complexity as the similar step of Abstractor. If all
inference rules have primary effects, the complexity of Step 1 is O(E + P);
if not, its complexity may be superlinear (see Section 4.2.2). In practice, the
execution on a Sun Sparc 5 takes about (P + E)·14·10⁻⁴ seconds.

The running time of Step 2 depends on three factors: the number N of
typed predicates, the maximal number a of arguments in a predicate, and the
maximal length d of a type disjunction. The complexity of Identify-Subsets is
O(a·d²·N²), and that of Initialize-Components is O(N²). In practice, these
two procedures together take about (2·a·d² + 5)·N²·10⁻⁴ seconds.

Refiner
    1. Build the initial constraint graph; its nodes are typed predicates (Figure 4.29a).
    2. Call Identify-Subsets (Figure 4.29b) and Initialize-Components (Figure 4.29c).
    3. Repeat the computation until Add-Subset-Edges returns false:
        Call Combine-Components and Add-Subset-Edges (Figure 4.29d-i).

Identify-Subsets
    For every two distinct predicates pred1 and pred2 in the graph:
        Call Predicate-Subset(pred1, pred2) and Predicate-Subset(pred2, pred1).
        If both tests return true (pred1 and pred2 are identical):
            Make pred1 inherit all incoming and outgoing edges of pred2.
            If there is an edge between pred1 and pred2 in either direction,
                then add the cyclic edge (pred1, pred1).
            Remove pred2 from the graph.
        Else, if pred1 is a subset of pred2, then add a subset link from pred1 to pred2.
        Else, if pred2 is a subset of pred1, then add a subset link from pred2 to pred1.
        Else, if Predicate-Intersection(pred1, pred2),
            then add two constraint edges: (pred1, pred2) and (pred2, pred1).
            (Thus, pred1 and pred2 will be in the same component.)

Initialize-Components
    For every predicate pred in the graph:
        Create a one-element component containing pred.
        If the graph includes the cyclic edge (pred, pred),
            then mark pred as a looped predicate.
    For every edge (pred1, pred2) in the graph:
        If pred1 and pred2 are distinct,
            then add the corresponding edge (compt[pred1], compt[pred2]).

Combine-Components
    Identify the strongly connected components of the constraint graph (Figure 4.29d,g)
    and combine old components into the resulting new components (Figure 4.29e,h).
    For every predicate pred in the graph:
        If compt[pred] contains more than one predicate,
            then mark pred as a looped predicate (it is in a loop of constraint edges).

Add-Subset-Edges
    For every predicate pred in the graph:
        For every predicate s-pred that is a subset of pred:
            If pred and s-pred are in the same component,
                then remove s-pred from the graph (Figure 4.29i).
            Else, if pred is a looped predicate:
                Add edges (compt[pred], compt[s-pred]) and (compt[s-pred], compt[pred]).
                (Thus, pred and s-pred will be in the same component; Figure 4.29f.)
            Else:
                Call Copy-Edges(compt[s-pred], compt[pred]).
                If there is an edge between compt[pred] and compt[s-pred]
                    in either direction,
                    then mark s-pred as a looped predicate.
    Return true if the procedure has added at least one new edge, and false otherwise.

Fig. 4.28. Building a hierarchy of partially instantiated predicates.



[Diagrams not reproduced: panels (a) through (i).]

Fig. 4.29. Application of Refiner to the Logistics Domain. We show the constraint
edges by solid lines, and the subset link by dashes. The algorithm (a) gener-
ates an initial graph, (b) adds subset links, (c) makes one-element components,
(d-h) combines connected components, and (i) removes redundant predicates.

Refiner makes at most N iterations at Step 3 because every iteration
reduces the number of components. The complexity of Combine-Components
is O(N²), and that of Add-Subset-Edges is O(N³); hence, the worst-case time
of Step 3 is O(N⁴). Experiments have shown that its average-case complexity
is O(N²), and its execution time is between N²·10⁻³ and 3·N²·10⁻³ seconds.

4.3.5 Level of a given literal

When PRODIGY performs an abstraction search, it uses the component graph
to determine the levels of instantiated preconditions. For every precondi-
tion literal, it finds a matching typed predicate and looks up the level of
its component. If a literal matches several predicates, PRODIGY selects the
most specific predicate among them. We describe a data structure for a fast
look-up of literal levels (Figure 4.30).
Sorting the typed predicates. The system builds a sorted array of pred-
icate names; each entry of the array points to a list of all typed predicates
with this name. For example, the "at" entry in Figure 4.30 points to a list of
two predicates, (at <plane> <airport>) and (at <van> <place>).

If all predicates in a list belong to the same component, the system re-
places them with a general predicate, defined by name without argument
types; for example, it replaces (in <pack> <plane>) and (in <pack> <van>)
with "in." If not, it topologically sorts the list by the subset relation; that is,
if pred1 is a subset of pred2, then pred1 precedes pred2 in the sorted list.
In Figure 4.31, we give a procedure for generating an array of predicate
lists, called Sort-Predicates. Its running time depends on the number N of
typed predicates and on the number S of subset links. The worst-case com-
plexity of Sort-Predicates is O(N·lg N + S); its execution time on a Sun
Sparc 5 is at most (2·N + 3·S)·10⁻⁵ seconds.
Finding components. The Find-Component procedure (Figure 4.31) looks
up the component of a given literal. If the graph includes only one predicate
with the matching name, the look-up time is O(lg N); in practice, it takes
up to 3·10⁻⁵ seconds. If the procedure has to search through a list of sev-
eral predicates with a common name, then the look-up time depends on the
length L of the list, on the number a of the literal's arguments, and on the
maximal length d of type disjunctions. The look-up complexity is O(a·d·L),
and its empirical time is within a·d·L·5·10⁻⁵ seconds.

4.4 Performance of the abstraction search

We have tested abstraction in PRODIGY, using the same five domains as in the
experiments with primary effects (see Section 3.7). Abstractor has constructed
hierarchies for the Robot world (Figure 3.30), the Sokoban puzzle (Figure 3.35),
and the Logistics world (Figure 3.43), but it has failed in the other two domains.

[Diagram not reproduced: a sorted array with the entries "at", "in", and "within";
the "within" entry points to the static level (level 3), the "in" entry to level 2,
(at <plane> <airport>) to level 1, and (at <van> <place>) to level 0.]

Fig. 4.30. Data structure for determining the abstraction levels of literals. The
system constructs a sorted array of predicate names; each entry contains a list of
typed predicates with a common name. Every predicate points to its component in
the instantiation graph, and the component's number is the abstraction level.

Sort-Predicates
    For every predicate name in the graph:
        If all predicates with this name are in the same component:
            Replace them with a generalized predicate,
                defined by name without argument types.
            Create a one-element list with the generalized predicate.
        Else (these predicates are in several different components):
            Topologically sort them by the subset relation.
            Store the resulting list of sorted predicates.
    Create an array of the generated lists
        (each list contains all predicates with a certain name).
    Arrange the lists in alphabetical order by name.

Find-Component(l)
    The literal l is an instantiation of some predicate in the graph.
    Identify the list of predicates with the matching name.
    If it contains a single predicate pred, then return compt[pred].
    For every pred from the list, in the subset-to-superset order:
        If Predicate-Subset(l, pred), then return compt[pred].

Fig. 4.31. Identifying the components of literals. The first procedure constructs an
array of predicates, and the second uses the array to look up a literal's component.

Extended Robot Domain. If Abstractor does not use primary effects, it
fails to construct a hierarchy for the Robot world. On the other hand, the
automatically selected effects lead to the hierarchy in Figure 4.32(a). We
have tested PRODIGY on the nine problems in Table 3.7; for every problem,
we have run it without a cost bound and with two different bounds. The loose
bound is twice the optimal solution cost, whereas the tight bound equals the
optimal cost. The abstraction improves the efficiency for the problems that
require moving boxes; it does not affect the search for the other problems. In
Table 4.1, we give the times for solving the box-moving problems without and
with abstraction; the time-reduction factor varies from 1 to almost 10000.

(a) Extended Robot abstraction:
  static:  (within <location> <room>)
  level 1: (in <box> <room>), (at <box> <location>), (on-floor <box>), (on <box> <table>)
  level 0: (robot-in <room>), (robot-at <location>), (robot-on-floor), (robot-on <table>), (open <door>), (closed <door>)

(b) Sokoban abstraction:
  static:  (blocked <x> <y>)
  level 1: (at <rock> <x> <y>)
  level 0: (dozer-at <x> <y>)

(c) Logistics abstraction:
  static:  (within <place> <city>), (within <van> <city>)
  level 2: (at <pack> <place>), (in <pack> <transport>)
  level 1: (at <plane> <airport>)
  level 0: (at <van> <place>)

Fig. 4.32. Automatically generated hierarchies for the (a) Extended Robot Do-
main, (b) Sokoban Domain, and (c) Logistics Domain.

Table 4.1. Performance in the Extended Robot Domain on four box-moving prob-
lems given in Table 3.7. We show times in seconds for the search without abstraction
("w/o abs") and with the automatically generated abstraction ("with abs").
  #    No Cost Bound         Loose Bound           Tight Bound
       w/o abs    with abs   w/o abs    with abs   w/o abs    with abs
  4    0.16       0.13       0.17       0.14       0.16       0.15
  7    2.67       1.26       2.49       1.31       3.43       1.56
  8    >1800.00   0.19       >1800.00   0.21       >1800.00   1660.54
  9    >1800.00   0.26       >1800.00   0.26       >1800.00   >1800.00

Sokoban Domain. If Abstractor has no information about primary effects,
it fails to generate a Sokoban hierarchy. The use of primary effects yields
the three-level hierarchy in Figure 4.32(b). We have applied PRODIGY to
the same 320 problems as in the experiments of Section 3.7.2, with no cost
bounds. The results of abstraction search are summarized in Figure 4.33
(dashed lines). We also show the performance with primary effects and no
abstraction (solid lines), as well as the results without primary effects (dotted
lines). The abstraction increases the percentage of solved problems, especially
for larger grids; the time-reduction factor varies from 1 to greater than 100.
Logistics Domain. The operators in the Logistics Domain do not have
unimportant effects; hence, Chooser and Completer do not improve the per-
formance. On the other hand, Abstractor constructs a four-level hierarchy

[Figure 4.33: four plots, for (a) the 4x4 grid, (b) the 6x6 grid, (c) the 8x8 grid, and (d) the 10x10 grid, each showing the percentage of solved problems against the time limit in CPU seconds.]

Fig. 4.33. Performance in the Sokoban Domain. We plot the percentage of prob-
lems solved by different time limits in three series of experiments: with primary ef-
fects and abstraction (dashed lines), with primary effects and no abstraction (solid
lines), and without primary effects (dotted lines).

(Figure 4.32c) and reduces the search. We have tested this hierarchy on five
hundred problems with different numbers of cities, vans, airplanes, and pack-
ages. We have experimented without cost bounds and with heuristically com-
puted bounds. The computation of cost bounds is based on the number of
packages and cities, and the resulting bounds range from the tight bound to
50% greater than the tight bound.
The results without cost bounds are summarized in Figure 4.34, and those
with the heuristic bounds are in Figure 4.35. We show the percentage of prob-
lems solved by different time limits, without abstraction (solid lines) and with
the four-level hierarchy (dashed lines). In Figure 4.36, we show the depen-
dency between the problem size and the percentage of unsolved problems. If
we do not use cost bounds, the abstraction increases the percentage of solved
problems; the time-reduction factor ranges from 1 to 100. On the other hand,
the abstraction gives little advantage in the experiments with cost bounds.
Summary. The experiments have confirmed that ordered hierarchies reduce
the search. If Abstractor constructs a hierarchy, it usually improves the efficiency.

[Figure 4.34: five plots, for problems with (a) one, (b) two, (c) three, (d) four, and (e) five packages, each showing the percentage of solved problems against the time limit in CPU seconds.]

Fig. 4.34. Performance in the Logistics Domain without cost bounds. We show the
percentage of problems solved by different time limits, without abstraction (solid
lines) and with the automatically generated abstraction (dashed lines).

[Figure 4.35: five plots, for problems with (a) one, (b) two, (c) three, (d) four, and (e) five packages, each showing the percentage of solved problems against the time limit in CPU seconds.]

Fig. 4.35. Performance in the Logistics Domain with cost bounds. We give results
without abstraction (solid lines) and with abstraction (dashed lines).

On the negative side, it often fails to generate a hierarchy. The observed
efficiency improvement is similar to that in Knoblock's [1993] experiments
with ALPINE and to the ABTWEAK results by Yang et al. [1996].
We summarize the results in Table 4.2; the upward arrows (↑) denote
efficiency improvement, the two-way arrow (↕) indicates mixed results, and
the dashes (-) mark the domains that do not allow the construction of ordered
hierarchies. The abstraction improves the performance in the Extended Robot
world, Sokoban puzzle, and Logistics problems without cost bounds. If we
use it for Logistics problems with near-tight bounds, the results vary from a
hundred-fold reduction to a hundred-fold increase of the search time.

[Figure 4.36: two plots, (a) without cost bounds and (b) with cost bounds, showing the percentage of unsolved problems against the number of packages (2 to 5).]

Fig. 4.36. Percentage of Logistics problems that are not solved in 60 seconds,
without abstraction (solid lines) and with abstraction (dashed lines).

Table 4.2. Summary of the experiments with abstraction. The upward arrow (↑)
indicates that abstraction reduces the search, whereas the two-way arrow (↕) marks
the domain with mixed results. When we use Abstractor without primary effects,
it constructs a hierarchy only in the Logistics Domain. When we apply it with the
automatically selected effects, it generates hierarchies for two more domains.
  Domain            Overall Result   Search-Time Reduction
  without primary effects
  Extended Robot    -                -
  Machining         -                -
  Sokoban           -                -
  Extended STRIPS   -                -
  Logistics         ↕                0.01-100
  with primary effects
  Extended Robot    ↑                1-10000
  Machining         -                -
  Sokoban           ↑                1-100
  Extended STRIPS   -                -
5. Summary and extensions

An immediate research direction is to investigate the integration of these
different learning strategies within PRODIGY. This will entail further work on
incorporating other learning methods into the nonlinear planner framework.
- Manuela M. Veloso [1994], Planning and Learning by Analogical Reasoning.

We have described techniques for reducing PRODIGY search based on primary
effects and abstraction. First, we have presented algorithms for the selection
of primary effects, and given an analytical and empirical evaluation of the
resulting search reduction. Second, we have extended Knoblock's abstraction
technique to the full domain language of PRODIGY4.
We now give two procedures for enhancing the abstraction. The first pro-
cedure chooses primary effects that improve an abstraction hierarchy. We out-
line a heuristic for selecting appropriate effects and show that it reduces the
search in the Machining Domain and Extended STRIPS world (Section 5.1).
The second technique allows construction of a specialized domain description
for each given problem; it improves the efficiency in three out of the five test
domains (Section 5.2). In conclusion, we discuss the results of the work on
speed-up techniques and identify some open problems (Section 5.3).

5.1 Abstracting the effects of operators

The procedure for abstracting effects, called Margie, is the result of joint work
with Yang. The underlying idea is to increase the number of abstraction levels
by choosing appropriate primary effects [Fink and Yang, 1992a].
If a domain has no primary effects, Abstractor assigns the same impor-
tance to all effects of an operator or inference rule (see Section 4.1.5). Intu-
itively, it may abstract some preconditions of an operator, but it does not
abstract the effects. If the system first invokes Chooser and Completer, it may
abstract side effects of operators and inference rules; however, the selected
primary effects do not always lead to an abstraction hierarchy.
The purpose of Margie is to choose primary effects that improve the ab-
straction. It may optionally utilize pre-selected primary effects and a desirable
cost increase; we give its specification in Figure 5.1. The implementation is


Type of description change: Selecting primary effects of operators and inference rules.
Purpose of description change: Maximizing the number of levels in Abstractor's
ordered hierarchy, while ensuring near-completeness and a limited cost increase.
Use of other algorithms: The Abstractor algorithm, which builds hierarchies for the
selected primary effects.
Required input: Description of the operators and inference rules.
Optional input: Cost-increase limit C; pre-selected primary and side effects.

Fig. 5.1. Specification of the Margie algorithm.

based on Chooser, combined with a version of Abstractor that produces
hierarchies for intermediate selections of primary effects.
Recall that, when Chooser picks an operator for achieving some predicate,
it may have to select among several operators. Similarly, when choosing an
additional primary effect of an operator, it may need to select among several
effects. Margie prefers the choices that maximize the number of abstraction
levels; it generates an abstraction graph for each alternative, and makes the
choice that maximizes the number of components. If several choices lead to
the same number, it prefers the graph with fewer constraint edges.
To summarize, Margie consists of Chooser (Figure 3.14) and the two pro-
cedures in Figure 5.2, called Select-Step and Select-Effect, which serve as
Chooser's selection heuristics. It does not ensure the completeness of search
with primary effects; hence, we need to call Completer after Margie.
For example, consider the application of Margie to an Extended Tower-
of-Hanoi Domain, which includes the six operators given in Figures 1.5(b)
and 1.9(a). First, the algorithm invokes Choose-Initial, which marks all effects
of move-small, move-medium, and move-large as primary. After Margie
makes these choices, the abstraction graph is as shown in Figure 5.3(a). Sec-
ond, the Choose-Extra subroutine picks primary effects of move-sml-mdm,
move-sml-lrg, and move-med-lrg.
When Choose-Extra processes move-sml-mdm, the choice is between
"add small" and "add medium." If "add small" is selected as a primary effect,
Abstractor must ensure that level(small) ≥ level(medium); hence, it adds a
new constraint edge (Figure 5.3b), which reduces the number of components.
On the other hand, if "add medium" is primary, then Abstractor does not add
new constraints. Since the goal is to maximize the number of components,
Margie prefers the "add medium" effect. Then, it applies the same technique
to move-sml-lrg and move-med-lrg, and chooses "add large" as a primary
effect of both operators.
After running Margie, we apply Completer and obtain the selection in
Figure 1.9(c). Finally, we invoke Abstractor, which constructs the three-level
hierarchy in Figure 5.3(c).

Select-Step(Steps, pred)
  The input is a list of operators and inference rules, Steps, with a common candidate
  effect pred. The output is an operator or rule for achieving pred as a primary effect.
  step-best := none (selected operator or inference rule)
  n-compts-best := 0 (corresponding number of components in the abstraction graph)
  n-edges-best := 0 (corresponding number of constraint edges between components)
  For every operator and inference rule step in Steps:
    Call Evaluate-Choice(step, pred) to determine n-compts and n-edges.
    If Better-Choice(n-compts, n-edges, n-compts-best, n-edges-best),
      then step-best := step; n-compts-best := n-compts; n-edges-best := n-edges.
  Return step-best.

Select-Effect(step, Preds)
  The input is an operator or inference rule, step, and a list of its candidate
  effects, Preds. The output is the effect that should be primary.
  pred-best := none (selected effect)
  n-compts-best := 0
  n-edges-best := 0
  For every effect pred in Preds:
    Call Evaluate-Choice(step, pred) to determine n-compts and n-edges.
    If Better-Choice(n-compts, n-edges, n-compts-best, n-edges-best),
      then pred-best := pred; n-compts-best := n-compts; n-edges-best := n-edges.
  Return pred-best.

Evaluate-Choice(step, pred)
  Temporarily select pred as a primary effect of step.
  Call Abstractor to generate a hierarchy for the current selection.
  Let n-compts be the number of components in the resulting graph,
    and n-edges be the number of constraint edges between components.
  Change pred back to a candidate effect.
  Return n-compts and n-edges.

Better-Choice(n-compts, n-edges, n-compts-best, n-edges-best)
  If either n-compts > n-compts-best,
    or n-compts = n-compts-best and n-edges < n-edges-best,
  then return true.
  Else, return false.

Fig. 5.2. Heuristics for selecting primary effects in the Margie algorithm.
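The heart of these heuristics is the lexicographic comparison in Better-Choice. The following Python fragment is a minimal rendering of the shared selection loop; the evaluate function, which stands for Evaluate-Choice and returns the pair (n_compts, n_edges) for a tentative choice, is an assumption.

    def better_choice(cand, best):
        # Lexicographic rule: more components is better; on a tie, fewer
        # constraint edges between components is better.
        (c_compts, c_edges), (b_compts, b_edges) = cand, best
        return (c_compts > b_compts or
                (c_compts == b_compts and c_edges < b_edges))

    def select(choices, evaluate):
        # Shared skeleton of Select-Step and Select-Effect: evaluate each
        # alternative and keep the one with the best abstraction graph.
        best_choice, best_score = None, (0, 0)
        for choice in choices:
            score = evaluate(choice)
            if better_choice(score, best_score):
                best_choice, best_score = choice, score
        return best_choice

Equivalently, the rule maximizes the key (n_compts, -n_edges) over the alternatives.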

[Figure 5.3, panels (a) and (b): intermediate abstraction graphs over the components large-on, medium-on, and small-on. In the suboptimal graph (b), an extra constraint edge between small-on and medium-on merges these two components.]

(c) Abstraction hierarchy:
  level 2: (large-on <peg>)
  level 1: (medium-on <peg>)
  level 0: (small-on <peg>)

Fig. 5.3. Intermediate abstraction graphs (a, b) and final hierarchy (c) in the
Extended Tower-of-Hanoi Domain, given in Figures 1.5(b) and 1.9(a). Margie uses
intermediate graphs to evaluate alternative choices of primary effects.

We have tested Margie using the same domains as in the experiments with
primary effects (Section 3.7) and abstraction (Section 4.4). For every domain,
we have compared the utility of the three strategies shown in Figure 5.4. The
first strategy is to apply Margie, Completer, and Abstractor (Figure 5.4a).
The second is to utilize the primary effects and abstraction generated with-
out Margie (Figure 5.4b). The third is to use no primary effects or abstraction
(Figure 5.4c). Margie has improved the performance in the Machining Do-
main and Extended STRIPS world. When we have applied it to the other three
domains, it has selected the same primary effects as Chooser.
Machining domain. We first give the results for the Machining Domain,
introduced in Section 3.6.2. If the system does not use Margie, the selected
primary effects do not allow ordered abstraction. We have shown this se-
lection in Figure 3.31, and we partially reproduce it in Figure 5.5(a). If we
apply Margie, it chooses the primary effects in Figure 5.5(b), which allow the
construction of a two-level hierarchy (Figure 5.5c).
In Figure 5.6, we give the results of search without a cost bound. The
graphs show the running time and solution quality in three series of exper-
iments: with the two-level abstraction (dashed lines), with primary effects
and no abstraction (solid lines), and without primary effects (dotted lines).
Abstraction has improved the running time and enabled the system to find
optimal solutions to all problems. The time-reduction factor ranges from
1.3 to 1.6, with a mean of 1.39; the factor of solution-cost reduction is be-
tween 1.3 and 1.5, with an average of 1.42.
In Figure 5.7, we show the analogous results for search with loose cost
bounds. These bounds have not affected the performance of abstraction
search; however, they have increased the time in the other two cases (Fig-
ure 5.7a). For most problems, abstraction has improved the search time by a
factor of 1.3 to 2.0 and reduced the solution cost by a factor of 1.3 to 1.5.
We have also run PRODIGY with tight cost bounds, thus forcing search
for optimal solutions. The results of abstraction search are identical to the
results without cost bounds. On the other hand, if we do not use abstraction,
PRODIGY does not solve any problems within the 600-second time limit.

Extended STRIPS domain. We have described the STRIPS world in Sec-
tion 3.7.2, and demonstrated that Chooser and Completer select appropriate
primary effects; however, the resulting selection does not allow ordered ab-
straction. Margie constructs a slightly different selection, which leads to a
four-level abstraction. We show the differences between the two selections in
Figure 5.8 and the resulting hierarchy in Figure 5.9.
In Figure 5.10, we summarize the performance with the four-level abstrac-
tion (dashed lines) and without abstraction (solid lines) in three series of ex-
periments: without cost bounds, with loose bounds, and with tight bounds.
Margie's hierarchy reduces the search in all three cases.
We give a different summary of the same results in Figure 5.11. The
horizontal axes show the search time with Margie's abstraction, whereas the

vertical axes give the time without abstraction on the same problems. The
plot includes not only the solved problems, but also the problems that have
caused the interrupt upon reaching the 600-second time limit. The abstraction
search is faster for most problems; thus, most points are above the diagonal.
The ratio of the search time without abstraction to that with abstraction
varies from 0.1 to more than 1000.
If we run PRODIGY without any selection of primary effects, its perfor-
mance is very poor. It solves only 2% of the problems when searching with
tight cost bounds, and no problems at all without tight bounds.
Summary. We have combined the selection of primary effects with the gen-
eration of an abstraction hierarchy. In Table 5.1, we give a summary of
Margie's performance; it enhances the abstraction in two test domains and
does not affect the search in the other three domains.

(a) initial domain description → Margie → Completer → Abstractor → new domain description
(b) initial domain description → Chooser → Completer → Abstractor → new domain description
(c) initial domain description used directly, with no primary effects or abstraction

Fig. 5.4. Evaluation of Margie. We have compared three strategies: (a) search
with Margie's abstraction, (b) use of abstraction generated without Margie, and
(c) search without primary effects and abstraction.

Table 5.1. Results of using Margie. It reduces the search in two domains, marked
by the upward arrow (↑), and does not affect the search in the other domains.
  Domain            Overall Result   Search-Time Reduction
  Extended Robot    -                -
  Machining         ↑                1.3-2.0
  Sokoban           -                -
  Extended STRIPS   ↑                0.1-1000
  Logistics         -                -

[Figure 5.5, panels (a) and (b): the primary and side effects of the low-quality operations cut-roughly, drill-roughly, polish-roughly, and paint-roughly, (a) as selected without Margie and (b) as selected with Margie; Margie makes fewer of the deletion effects primary.]

(c) Abstraction hierarchy:
  level 1: (cut <part>), (finely-cut <part>), (drilled <part>), (finely-drilled <part>)
  level 0: (polished <part>), (finely-polished <part>), (painted <part>), (finely-painted <part>)

Fig. 5.5. Primary effects of the low-quality operations in the Machining Domain.
If we do not use Margie, these operations cause the collapse of an ordered hierarchy.
On the other hand, the effects selected by Margie lead to a two-level hierarchy.

[Figure 5.6: two plots, (a) efficiency of problem solving and (b) quality of the resulting solutions, against the length of an optimal solution (8 to 18).]

Fig. 5.6. Performance in the Machining Domain without a cost bound. We plot the
results with Margie's primary effects and abstraction (dashed lines), with Chooser's
effects and no abstraction (solid lines), and without primary effects (dotted lines).
The vertical bars mark the 95% confidence intervals.

[Figure 5.7: two plots, (a) efficiency of problem solving and (b) quality of the resulting solutions, against the length of an optimal solution (8 to 18); the running-time axis is logarithmic.]
Fig. 5.7. Performance in the Machining Domain with loose cost bounds. The legend
is the same as in Figure 5.6, but the running-time scale is logarithmic.

(a) Selection without Margie:

pick-up(<small>, <room>)
  Prim: del (arm-empty)
        del (in <small> <room>)
        add (holding <small>)
        (forall <other> of type (or Thing Door)
           del (next-to <other> <small>))
  Side: (forall <other> of type (or Thing Door)
           del (next-to <small> <other>))
  Cost: 1

move-aside(<small>, <room>)
  Prim: (forall <other> of type (or Thing Door)
           del (next-to <small> <other>))
  Side: (forall <other> of type (or Thing Door)
           del (next-to <other> <small>))
  Cost: 1

(b) Selection with Margie:

pick-up(<small>, <room>)
  Prim: del (arm-empty)
        del (in <small> <room>)
        add (holding <small>)
  Side: (forall <other> of type (or Thing Door)
           del (next-to <other> <small>))
        (forall <other> of type (or Thing Door)
           del (next-to <small> <other>))
  Cost: 1

move-aside(<small>, <room>)
  Prim: (forall <other> of type (or Thing Door)
           del (next-to <small> <other>))
        (forall <other> of type (or Thing Door)
           del (next-to <other> <small>))
  Cost: 1
Fig. 5.8. Primary effects of two operators in the Extended STRIPS Domain.

(a) [Type hierarchy of the domain objects; a door's status may be open, closed, or locked.]

(b) Abstraction hierarchy:
  static:  (in <stable> <room>), (connects <door> <room>), (fits <key> <door>)
  level 2: (in <large> <room>), (robot-at <large>), (next-to <thing> <door>), (next-to <door> <thing>), (next-to <thing> <other-thing>)
  level 1: (robot-at <stable>)
  level 0: (robot-in <room>), (robot-at <door>), (holding <small>), (arm-empty), (in <small> <room>), (robot-at <small>), (status <door> <status>)
Fig. 5.9. Abstraction in the Extended STRIPS Domain. We show the object
types (a) and give the hierarchy based on Margie's selection of primary effects (b).

[Figure 5.10: three plots, (a) no cost bound, (b) loose bound, and (c) tight bound, showing the percentage of solved problems against the time limit in CPU seconds (logarithmic scale, 0.1 to 100).]
Fig. 5.10. Results in the Extended STRIPS Domain with Margie's primary effects
and abstraction (dashed lines) and with Chooser's effects (solid lines). We show the
percentage of problems solved by different time limits in three series of experiments:
without cost bounds (a), with loose bounds (b), and with tight bounds (c).

[Figure 5.11: three scatter plots, for (a) no cost bound, (b) loose bound, and (c) tight bound; the horizontal axes give the search time with Margie's abstraction and the vertical axes the time without it (CPU sec).]
Fig. 5.11. Comparison of the search times without abstraction and with Margie's
abstraction. Since Margie reduces the search, most points are above the diagonal.

5.2 Identifying the relevant literals


We have considered algorithms that do not utilize information about partic-
ular problems. When they produce a new domain description, PRODIGY can
employ it for all problems. On the other hand, problem-specific knowledge
often allows more effective choice of primary effects and abstraction.
Knoblock [1994] explored problem-specific abstraction during his work on
ALPINE. He designed a procedure for pinpointing the literals relevant to a
given problem and extended ALPINE to utilize this information. We have also
implemented an algorithm for identifying relevant literals, called Relator, and
enabled Chooser, Completer, Abstractor, and Margie to use relevance data.
Formally, a literal is relevant if it can become a subgoal during search for a
solution, and the purpose of Relator is to compile a set of relevant literals. For
example, consider the Drilling Domain in Figure 4.1 and suppose that we are
solving a drilling problem with the goal (has-spot part-1). Then, the relevance
set includes has-spot, spot-drill, holds-drill, holds-part, no-drill, and no-part.
Relator inputs a list of goals and identifies all literals that may be relevant
to achieving these goals. If the user pre-selects side effects of some operators,
the algorithm utilizes this information. We give a specification of Relator in
Figure 5.12 and pseudocode in Figure 5.13.
The algorithm finds all operators that match the goals, and inserts their
preconditions into the relevance set. Then, it identifies the preconditions of
the operators that achieve the newly added literals. For example, suppose
that we call Relator for the goal "has-spot" in the Drilling Domain. The algo-
rithm determines that the drill-hole operator achieves this goal, and adds its
preconditions, "spot-drill," "holds-drill," and "holds-part," to the relevance set.
Then, it identifies the put-drill and put-part operators, which match the
newly added literals, and adds their preconditions, "no-drill" and "no-part."
We have measured the search time with problem-specific abstraction and
primary effects (Figure 5.14a), and compared it with problem-independent
domain descriptions (Figure 5.14b). Relator allows the construction of finer-
grained hierarchies for the Robot world, Machining Domain, and STRIPS
world, and it does not affect the search in the other two domains.
Extended Robot Domain. If the system does not utilize Relator in the
Robot world, it selects the primary effects in Figure 3.30 and generates the
three-level hierarchy in Figure 4.32(a). The application of Relator leads to se-
lecting fewer primary effects and constructing finer-grained hierarchies. For
example, suppose that we need an abstraction for problems that have no dele-
tion goals. Relator selects the relevant primary effects shown in Figure 5.15(a),
which allow the six-level abstraction in Figure 5.15(b).
We have tested this hierarchy on the nine problems from Table 3.6 and
compared it with the problem-independent abstraction (Table 5.2). The six-
level hierarchy reduces the search on problems 6 and 9, increases the search on
problem 7, and performs identically to the problem-independent abstraction
in the other six cases.

Type of description change: Identifying relevant literals.
Purpose of description change: Minimizing the set of selected literals, while includ-
ing all relevant literals.
Use of other algorithms: None.
Required input: Description of the operators and inference rules; list of goal predi-
cates, which may be partially instantiated.
Optional input: Pre-selected side effects.

Fig. 5.12. Specification of the Relator algorithm.

Relator(Goals)
  The algorithm inputs a list of goals and returns the set of relevant literals.
  It accesses the operators and inference rules, with pre-selected side effects.
  New-Literals := Goals (newly added literals)
  Relevance-Set := Goals (set of relevant literals)
  Repeat while New-Literals is not empty:
    Add-Literals := Add-Relevant(New-Literals)
    New-Literals := Add-Literals - Relevance-Set
    Relevance-Set := Relevance-Set ∪ New-Literals
  Return Relevance-Set

Add-Relevant(New-Literals)
  Add-Literals := ∅
  For every literal l in New-Literals:
    For every operator and inference rule step that achieves l:
      If l is not pre-selected as a side effect of step,
        then add the preconditions of step to Add-Literals.
  Return Add-Literals.

Fig. 5.13. Identifying the relevant literals for given goals.
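Since the pseudocode is a straightforward fixed-point computation, it translates almost line for line into executable form. In the Python sketch below, the helper functions achievers, preconditions, and side_effect are assumptions standing for the domain-description queries that Relator performs.

    def relator(goals, achievers, preconditions, side_effect):
        # achievers(l): operators and inference rules with an effect that
        #   matches the literal l;
        # preconditions(step): the precondition literals of step;
        # side_effect(l, step): whether l is pre-selected as a side effect.
        relevance_set = set(goals)
        new_literals = set(goals)
        while new_literals:
            added = set()
            for l in new_literals:
                for step in achievers(l):
                    if not side_effect(l, step):
                        added.update(preconditions(step))
            new_literals = added - relevance_set   # keep only the new ones
            relevance_set |= new_literals
        return relevance_set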

(a) initial domain description → Relator → Chooser → Completer → Abstractor → new goal-specific description
(b) initial domain description → Chooser → Completer → Abstractor → new goal-independent description
Fig. 5.14. Evaluation of Relator. We have tested a problem-specific abstraction (a),
and compared it with a problem-independent version (b).

(a) Problem-specific primary effects:

open(<door>)
  Prim: add (open <door>)
  Side: del (closed <door>)

close(<door>)
  Prim: add (closed <door>)
  Side: del (open <door>)

climb-up(<table>)
  Prim: add (robot-on <table>)
  Side: del (robot-on-floor)

climb-down(<table>)
  Side: del (robot-on <table>)
        add (robot-on-floor)

carry-up(<box>, <table>)
  Prim: add (on <box> <table>)
  Side: del (on-floor <box>)
        del (robot-on-floor)
        add (robot-on <table>)

carry-down(<box>, <table>)
  Side: del (on <box> <table>)
        add (on-floor <box>)
        del (robot-on <table>)
        add (robot-on-floor)

go-within-room(<from-loc>, <to-loc>, <room>)
  Prim: add (robot-at <to-loc>)
  Side: del (robot-at <from-loc>)

go-thru-door(<from-room>, <to-room>, <door>)
  Prim: add (robot-in <to-room>)
  Side: del (robot-in <from-room>)

carry-within-room(<box>, <from-loc>, <to-loc>, <room>)
  Prim: add (at <box> <to-loc>)
  Side: del (at <box> <from-loc>)
        del (robot-at <from-loc>)
        add (robot-at <to-loc>)

carry-thru-door(<box>, <from-room>, <to-room>, <door>)
  Prim: add (in <box> <to-room>)
  Side: del (in <box> <from-room>)
        del (robot-in <from-room>)
        add (robot-in <to-room>)

(b) Problem-specific abstraction:
  static:  (within <location> <room>)
  level 4: (on <box> <table>)
  level 3: (in <box> <room>), (at <box> <location>)
  level 2: (robot-on <table>)
  level 1: (robot-in <room>), (robot-at <location>), (open <door>), (closed <door>)
  level 0: (on-floor <box>), (robot-on-floor)

Fig. 5.15. Results of using Relator in the Extended Robot Domain. The improved
abstraction is suitable for all problems that have no deletion goals.

Table 5.2. Performance in the Extended Robot Domain on problems from Ta-
ble 3.6. We give search times for problem-independent abstraction ("indep") and
for problem-specific abstraction ("spec").
  #   No Cost Bound      Loose Bound        Tight Bound
      indep     spec     indep     spec     indep      spec
  6   2.66      0.91     2.61      0.90     0.27       0.24
  7   1.26      3.08     1.31      3.11     1.56       1.68
  9   0.26      0.26     0.26      0.26     >1800.00   571.14

Machining Domain. The problem-independent selection of primary effects
in the Machining Domain includes deletion effects (Figure 5.16a), which cause
the collapse of a hierarchy. Relator constructs a selection with fewer effects
(Figure 5.16b), which leads to a three-level hierarchy (Figure 5.16c).
In Figure 5.17, we summarize the results of search without cost bounds.
We give the times and solution costs for search with problem-specific abstrac-
tion and primary effects (dashed lines), with problem-independent primary
effects (solid lines), and without primary effects (dotted lines). In Figure 5.18,
we give analogous results for search with loose cost bounds.
The problem-specific abstraction reduces the search for all problems. The
reduction factor varies from 1.5 to 2.3, and its mean value is 1.74. Further-
more, this abstraction yields optimal solutions to all problems. The factor of
solution-cost reduction ranges from 1.6 to 1.8, with a mean of 1.69.
Extended STRIPS Domain. In Section 3.7.2, we have described the applica-
tion of Chooser and Completer to the STRIPS Domain. The resulting pri-
abstraction in Figure 5.19, which works for problems with no deletion goals.
In Figure 5.20, we give the results of search with the problem-specific ab-
straction (dashed lines) and without abstraction (solid lines). The graphs
show the percentage of problems solved by different time limits. In Fig-
ure 5.21, we present an alternative summary of the same results.
When we use abstraction without cost bounds or with tight bounds, it
slightly increases the number of solved problems; however, it has the opposite
effect in the experiments with loose bounds. The ratio of the search times with
and without abstraction ranges from less than 0.001 to greater than 1000.
Summary. Relator helps to identify relevant primary effects and gener-
ate finer-grained hierarchies, but the efficiency improvements are modest.
Problem-specific abstraction reduces the search in the Machining Domain,
but it gives mixed results in the robot and STRIPS worlds (Table 5.3). Thus,
an increase in the number of abstraction levels does not always improve the
efficiency. This observation is different from the results of Knoblock [1993],
who reported that finer-grained hierarchies lead to better performance, and
that identification of relevant literals is often essential for efficient abstraction.
Table 5.3. Summary of experiments with Relator. Its application improves the
efficiency in the Machining Domain, gives mixed results in the Robot and STRIPS
worlds, and does not affect the search in the Sokoban and Logistics Domains.
  Domain            Overall Result   Search-Time Reduction
  Extended Robot    ↕                0.4-3.1
  Machining         ↑                1.5-2.3
  Sokoban           -                -
  Extended STRIPS   ↕                0.001-1000
  Logistics         -                -

[Figure 5.16, panels (a) and (b): the effects of the cut-roughly, drill-roughly, polish-roughly, and paint-roughly operators, with (a) the problem-independent primary effects, which include many deletion effects, and (b) the problem-specific primary effects, in which most deletions are demoted to side effects.]

(c) Problem-specific abstraction:
  level 2: (cut <part>), (finely-cut <part>)
  level 1: (finely-drilled <part>)
  level 0: (finely-polished <part>), (finely-painted <part>)

Fig. 5.16. Effects of the low-quality operations in the Machining Domain.

[Figure 5.17: two plots, (a) efficiency of problem solving and (b) quality of the resulting solutions, against the length of an optimal solution (8 to 18).]

Fig. 5.17. Performance in the Machining Domain without cost bounds. We give the
results of problem-specific description improvements (dashed lines) and problem-
independent improvements (solid lines) , as well as the results of search with the ini-
tial description (dotted lines). The vertical bars show the 95% confidence intervals.

[Figure 5.18: two plots, (a) efficiency of problem solving and (b) quality of the resulting solutions, against the length of an optimal solution (8 to 18); the running-time axis is logarithmic.]

Fig. 5.18. Performance in the Machining Domain with loose cost bounds. The
legend is the same as in Figure 5.17, but the running-time scale is logarithmic.

  static:  (in <stable> <room>), (fits <key> <door>), (connects <door> <room>)
  level 6: (next-to <large> <thing>), (next-to <thing> <large>)
  level 5: (in <large> <room>), (next-to <large> <door>), (next-to <door> <large>), (robot-at <large>)
  level 4: (next-to <small> <other-small>)
  level 3: (next-to <small> <stable>), (next-to <stable> <small>)
  level 2: (next-to <small> <door>), (next-to <door> <small>)
  level 1: (robot-at <stable>)
  level 0: (robot-in <room>), (robot-at <door>), (holding <small>), (arm-empty), (in <small> <room>), (robot-at <small>), (status <door> <status>)

Fig. 5.19. Problem-specific abstraction in the Extended STRIPS Domain.



[Figure 5.20: three plots, (a) no cost bound, (b) loose bound, and (c) tight bound, showing the percentage of solved problems against the time limit in CPU seconds.]
Fig. 5.20. Results in the STRIPS Domain with problem-specific abstraction and
primary effects (dashed lines), and with problem-independent primary effects (solid
lines). We show the percentage of problems solved by different time limits.

[Figure 5.21: three scatter plots, for (a) no cost bound, (b) loose bound, and (c) tight bound; the horizontal axes give the search time with problem-specific abstraction and the vertical axes the time without abstraction (CPU sec).]
Fig. 5.21. Comparison of the search times without abstraction (vertical axes) and
with problem-specific abstraction (horizontal axes).

5.3 Summary of work on description changers

We now summarize the work on speed-up techniques and outline some direc-
tions for future research. First, we list the developed algorithms and describe
interactions among them (Section 5.3.1). Then, we discuss unexplored tech-
niques and related research problems (Sections 5.3.2 and 5.3.3).

5.3.1 Library of description changers

We have developed seven algorithms for reducing PRODIGY search. In Fig-
ure 5.22, we give a summary of these algorithms, divided into two groups.
The first group contains the main description changers, which improve the
performance of PRODIGY, whereas the second group includes the auxiliary
description changers, which enhance the main changers. In Table 5.4, we
summarize the performance of the developed techniques in five domains.
A description changer may interact with other changers in two ways.
First, it can use other changers as subroutines. In Figure 5.23(a), we show
subroutine calls among the changers; for example, Margie invokes two other
changers, Matcher and Abstractor.
The second type of interaction is the sequential application of multiple
changers; for example, if we apply Abstractor after Chooser and Completer,
it utilizes the selected primary effects in generating an abstraction. In Fig-
ure 5.23(b), we show the appropriate order of applying the developed al-
gorithms. When generating a description for a specific problem, we apply
Relator before all other changers. We may apply either Chooser or Margie to
select primary effects, and then Completer to improve the selection. Finally,
we apply Abstractor after selecting primary effects. Note that we do not have
to use all description changers, and we may skip any steps in Figure 5.23(b).
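The following sketch states this ordering as a driver function; the changer functions themselves are placeholders for the algorithms described in this book, and every step is optional.

    def improve_description(domain, goals=None, use_margie=False,
                            relator=None, chooser=None, margie=None,
                            completer=None, abstractor=None):
        # Problem-specific information, if any, is exploited first.
        if goals is not None and relator is not None:
            domain = relator(domain, goals)
        # Either Chooser or Margie selects primary effects, never both.
        select = margie if use_margie else chooser
        if select is not None:
            domain = select(domain)
        # Completer improves the selection to ensure completeness.
        if completer is not None:
            domain = completer(domain)
        # Abstractor is applied last, after the primary effects are fixed.
        if abstractor is not None:
            domain = abstractor(domain)
        return domain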

Table 5.4. Summary of experiments with the speed-up techniques in five domains.
The table includes the "Overall Result" columns from Tables 3.8, 4.2, 5.1, and 5.3.
  Domain      Primary Effects   Abstraction (Chapter 4)    Abstraction of          Identification of Relevant
              (Chapter 3)       w/o prims    with prims    Effects (Section 5.1)   Literals (Section 5.2)
  Robot       -                 -            ↑             -                       ↕
  Machining   ↑                 -            -             ↑                       ↑
  Sokoban     ↑                 -            ↑             -                       -
  STRIPS      ↑                 -            -             ↑                       ↕
  Logistics   -                 ↕            ×             -                       -

  Notation:
  ↑  positive results: it improves the performance on almost all problems
  ↕  mixed results: it reduces the search in some cases, but increases it in others
  -  no effect on the performance: it does not affect the system's behavior
  ×  no experiments: operators in the Logistics Domain have no unimportant
     effects, and we cannot test abstraction with primary effects

Primary effects and abstraction


Chooser: Heuristic selection of primary effects (Section 3.4.1)
Chooser selects primary effects of operators and inference rules based on several
simple heuristics. The selection algorithm is very fast, but it does not guarantee
completeness of search with the chosen primary effects.
Completer: Learning additional primary effects (Section 3.5)
Completer chooses more primary effects to ensure completeness and limited cost
increase. On the negative side, it takes significant time and needs a hand-coded
generator of initial states.
Abstractor: Building an abstraction hierarchy (Section 4.2)
Abstractor is an advanced version of the ALPINE abstraction generator, extended to
the full domain language of PRODIGY4. It imposes constraints on the relative im-
portance of predicates in operators and inference rules, and uses these constraints
to abstract some preconditions and side effects.
Margie: Abstracting effects of operators and inference rules (Section 5.1)
Margie combines the selection of primary effects with the construction of an ab-
straction hierarchy, thus abstracting unimportant effects.

Auxiliary description changers


Matcher: Instantiating operators and inference rules (Section 3.4.2)
Matcher generates all possible instantiations of operators and inference rules. We
use it to improve the effectiveness of Chooser and Margie.
Refiner: Partial instantiation of predicates (Section 4.3)
Refiner generates a partial instantiation of predicates for abstraction graphs. It
finds a minimal instantiation that does not cause the collapse of abstraction.
Relator: Identifying relevant literals (Section 5.2)
Relator inputs a collection of goals and determines which literals are relevant to
achieving these goals. We use it to enhance Chooser, Abstractor, and Margie.

Fig. 5.22. Description changers in the developed system.

(a) [Subroutine calls among the changers; for example, Margie invokes Matcher and Abstractor.]

(b) Order of applying changers:
  initial description → Relator → Chooser or Margie → Completer → Abstractor → new description → PRODIGY
Fig. 5.23. Interactions among description changers. We boldface the main changers
and use the standard font for the auxiliary changers.

If the user provides restrictions on the allowed problems, the system uti-
lizes them in choosing primary effects and abstraction. In Figure 5.24, we
summarize the use of problem-specific information; the auxiliary description
changers use it to pre-process the domain, and the main changers utilize the
results of the pre-processing. For example, Matcher uses information about
static features of the initial state and about possible values of variables. Rela-
tor inputs restrictions on goals to identify the relevant literals. Chooser uses
the pre-processing results of Matcher and Relator, thus utilizing information
about static literals, possible values of variables, and allowed goals.

5.3.2 Unexplored description changes

We outline several speed-up techniques that are not part of the implemented
system. We first suggest improvements to abstraction and then discuss three
other description changes, listed in Figure 5.25.
Semantic abstraction. The developed abstraction generators exploit syn-
tactic properties of a domain; minor syntactic changes may affect the quality
of the resulting hierarchy. A related open problem is to develop abstraction
techniques that use semantic analysis of a domain.
As a first step, we have designed an algorithm that eliminates backtracking
across abstraction levels, by revising the hierarchy in the process of search.
It detects backtracking episodes and merges the corresponding levels. For
example, consider the Drilling Domain in Figure 4.1, the hierarchy in Fig-
ure 4.2, and a problem of drilling and painting a mechanical part without the
use of a spot drill (Figure 5.26a). PRODIGY constructs the abstract solution
in Figure 4.4, and then tries to refine it at level 1, which causes a failure and
backtracking to level 2. The algorithm detects this backtracking and merges
levels 1 and 2, thus producing the hierarchy in Figure 5.26(b).
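A sketch of the merging step follows, assuming the hierarchy is stored as a map from predicates to level numbers; the representation is ours, for illustration.

    def merge_levels(level, lower, upper):
        # Search has backtracked from level `lower` to level `upper`;
        # collapse the band of levels between them into a single level.
        assert lower < upper
        new_level = {}
        for pred, lev in level.items():
            if lower <= lev <= upper:
                lev = lower                  # merge the band into one level
            elif lev > upper:
                lev -= upper - lower         # renumber the levels above it
            new_level[pred] = lev
        return new_level

For the drilling example, merge_levels(hierarchy, 1, 2) collapses levels 1 and 2 while leaving level 0 and the static level intact.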
Bacchus and Yang [1994] used a different method to prevent backtracking
across levels. Their HIGHPOINT system used rigid syntactic constraints on the
relative importance of predicates, which often resulted in the collapse of a
hierarchy. On the positive side, HIGHPOINT was very fast, and the resulting
hierarchy did not require modification in the process of search.
Unnecessary operators. We have observed in Section 1.2.3 that the use of
unnecessary operators may worsen the performance. For instance, if we add
two-disk moves to the Tower of Hanoi, PRODIGY needs more time to solve it.
Veloso has shown that unnecessary operations may increase the search by
more than two orders of magnitude. She has constructed the Three-Rocket
Domain, which includes a planet, three moons, three rockets, and several
packages (Figure 5.27a). Initially, all rockets and packages are on the planet.
A rocket can carry any number of packages to any moon, but it cannot return
from the moon. The task is to deliver certain packages to certain moons; we
show two problems and their solutions in Figures 5.27(b) and 5.27(c).

• All possible values of variables in the domain description.
• Static literals that hold in the initial states of all problems.
• Restrictions on the literals in goal statements.

(a) Optional problem-specific restrictions.

[Diagram relating the restrictions to the changers that access them: static features of initial states, possible values of all variables, and possible values of some variables.]

(b) Access to restrictions.

                     possible   static     goal
                     values     features   literals
  Main changers
  Chooser            +          +          +
  Completer          +          +
  Abstractor         +          +
  Margie             +          +          +
  Auxiliary changers
  Matcher            +          +
  Refiner            +
  Relator                       +          +

(c) Utilization of restrictions.

Fig. 5.24. Use of problem-specific information.

Removing operators: Identifying unnecessary operators and deleting them from
the domain description.
Generating macro operators: Replacing some operators in the domain descrip-
tion with macro operators.
Generating new predicates: Replacing some predicates with new ones, con-
structed from conjunctions and disjunctions of old predicates.

Fig. 5.25. Some description changes that have not been used in PRODIGY.

(a) Drilling problem that causes backtracking:
  Set of Objects: part-1: type Part; drill-1: type Drill-Bit
  Initial State: (twist-drill drill-1), (no-drill), (no-part)
  Goal Statement: (has-hole part-1), (painted part-1)

(b) Modified hierarchy:
  static:  (twist-drill <drill-bit>), (spot-drill <drill-bit>)
  merged levels 1 and 2: (has-hole <part>), (has-spot <part>), (painted <part>)
  level 0: (holds-part <part>), (no-part), (holds-drill <drill-bit>), (no-drill)
Fig. 5.26. Modifying an abstraction hierarchy to prevent backtracking across levels.

The domain description in Figure 5.27(a) causes an extensive search be-
cause PRODIGY tries to use the same rocket for multiple flights [Stone and
Veloso, 1994]. To improve the efficiency, we construct all instantiations of
the fly operator and then remove unnecessary instantiations, thus replacing
the general fly operation with the three operations in Figure 5.27(d). These
new operators explicitly encode the knowledge that each rocket can fly only
once, and they enable PRODIGY to construct transportation plans with little
search. In Table 5.5, we show the resulting search reduction for six problems.
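The removal can be mechanized as an instantiate-and-prune step. The sketch below is an illustration rather than part of PRODIGY; the predicate necessary encodes the domain knowledge that rocket-i need only ever fly to moon-i.

    from itertools import product

    def specialize_fly(rockets, moons, necessary):
        # Expand the general fly operator into its instantiations and keep
        # only those that the pruning predicate judges necessary.
        instances = []
        for rocket, moon in product(rockets, moons):
            if necessary(rocket, moon):
                instances.append({
                    "name": "fly(%s, %s)" % (rocket, moon),
                    "pre":  [("at", rocket, "planet")],
                    "eff":  [("del", ("at", rocket, "planet")),
                             ("add", ("at", rocket, moon))]})
        return instances

    # Keeping only the matching rocket-moon pairs yields the three
    # operators of Figure 5.27(d).
    rockets = ["rocket-1", "rocket-2", "rocket-3"]
    moons = ["moon-1", "moon-2", "moon-3"]
    fly_ops = specialize_fly(rockets, moons,
                             lambda r, m: r[-1] == m[-1])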
Macro operators. The learning of macro operators is one of the oldest
search-reduction techniques, which dates back to the GPS problem solver
[Newell et al., 1960] and the STRIPS system [Fikes et al., 1972]. Although sim-
ple use of macro operators often gives disappointing results [Minton, 1985;
Etzioni, 1992], their synergy with other techniques may improve the perfor-
mance. For instance, Korf [1985a; 1985b] integrated macro operators with an
implicit use of abstraction and primary effects, and Yamada and Tsuji [1989]
combined them with heuristics for selecting appropriate operators.
We can use macro operators for improving the abstraction. For instance,
consider the Tower-of-Hanoi Domain in Figure 5.28(b), which includes a robot
hand for moving disks. Abstractor fails to generate a hierarchy because the
predicate (in-hand <disk>) requires additional constraints and causes the col-
lapse of a hierarchy. If we replace the operators with the macro operators
shown in Figure 5.29(b), then we eliminate the "in-hand" predicate, and Ab-
stractor generates the three-level hierarchy in Figure 5.29(c); we give the
results of using this hierarchy in the last row of Table 5.6.
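The composition of a pick and a put operator into a move macro can be sketched as follows; the dictionary representation and the simplified preconditions of pick-small and put-small are our assumptions, and the computation considers only the add and delete lists.

    def compose(first, second):
        # Preconditions: those of the first step, plus the preconditions of
        # the second step that the first step does not itself achieve.
        pre = list(first["pre"]) + [p for p in second["pre"]
                                    if p not in first["add"]]
        # Net effects: cancel the intermediate literals that the first step
        # adds and the second step deletes.
        dels = list(first["del"]) + [d for d in second["del"]
                                     if d not in first["add"]]
        adds = [a for a in first["add"] if a not in second["del"]]
        adds += list(second["add"])
        return {"pre": pre, "del": dels, "add": adds}

    pick_small = {"pre": [("on", "small", "<from>")],
                  "del": [("on", "small", "<from>")],
                  "add": [("in-hand", "small")]}
    put_small  = {"pre": [("in-hand", "small")],
                  "del": [("in-hand", "small")],
                  "add": [("on", "small", "<to>")]}

    move_small = compose(pick_small, put_small)
    # Result: Pre (on small <from>); Eff del (on small <from>),
    # add (on small <to>).  The "in-hand" literal is eliminated.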
New predicates. When the user encodes a new domain, she has to choose
predicates for representing states of the simulated world, and a wrong choice
may negatively affect the performance. For example, consider the encoding
of the Tower of Hanoi in Figure 5.28(c), with trays in place of pegs. This
encoding makes most problems difficult for PRODIGY; we show the resulting
search times in the next-to-last row of Table 5.6. We can improve the en-
coding by defining new predicates through conjunctions and disjunctions of
old predicates. For example, we can define the "on" predicate as shown in
Figure 5.30, which allows conversion into the standard encoding.
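As a concrete reading of the definitions in Figure 5.30, the fragment below transcribes them into Python; holds(state, literal), which tests a literal in a state, is an assumed helper.

    def on_large(holds, state, tray):
        return holds(state, ("upon", "large", tray))

    def on_medium(holds, state, tray):
        return (holds(state, ("upon", "medium", tray)) or
                (holds(state, ("upon", "large", tray)) and
                 holds(state, ("upon", "medium", "large"))))

    def on_small(holds, state, tray):
        return (holds(state, ("upon", "small", tray)) or
                (holds(state, ("upon", "medium", tray)) and
                 holds(state, ("upon", "small", "medium"))) or
                (holds(state, ("upon", "large", tray)) and
                 holds(state, ("upon", "small", "large"))) or
                (holds(state, ("upon", "large", tray)) and
                 holds(state, ("upon", "medium", "large")) and
                 holds(state, ("upon", "small", "medium"))))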

(a) Three-Rocket Domain: a planet, three moons (moon-1, moon-2, moon-3), three rockets (rocket-1, rocket-2, rocket-3), and several packages, with the operators:

load(<pack>, <rocket>, <place>)
  <pack>: type Package; <rocket>: type Rocket; <place>: type Place
  Pre: (at <pack> <place>), (at <rocket> <place>)
  Eff: del (at <pack> <place>); add (in <pack> <rocket>)

unload(<pack>, <rocket>, <place>)
  <pack>: type Package; <rocket>: type Rocket; <place>: type Place
  Pre: (in <pack> <rocket>), (at <rocket> <place>)
  Eff: del (in <pack> <rocket>); add (at <pack> <place>)

fly(<rocket>, <moon>)
  <rocket>: type Rocket; <moon>: type Moon
  Pre: (at <rocket> planet)
  Eff: del (at <rocket> planet); add (at <rocket> <moon>)

(b) Delivery of three packages to the same moon:
  Initial State: (at rocket-1 planet), (at rocket-2 planet), (at rocket-3 planet), (at pack-1 planet), (at pack-2 planet), (at pack-3 planet)
  Goal Statement: (at pack-1 moon-1), (at pack-2 moon-1), (at pack-3 moon-1)
  Solution: load(pack-1, rocket-1, planet); load(pack-2, rocket-1, planet); load(pack-3, rocket-1, planet); fly(rocket-1, moon-1); unload(pack-1, rocket-1, moon-1); unload(pack-2, rocket-1, moon-1); unload(pack-3, rocket-1, moon-1)

(c) Delivery of three packages to different moons:
  Initial State: same as in (b)
  Goal Statement: (at pack-1 moon-1), (at pack-2 moon-2), (at pack-3 moon-3)
  Solution: load(pack-1, rocket-1, planet); fly(rocket-1, moon-1); unload(pack-1, rocket-1, moon-1); load(pack-2, rocket-2, planet); fly(rocket-2, moon-2); unload(pack-2, rocket-2, moon-2); load(pack-3, rocket-3, planet); fly(rocket-3, moon-3); unload(pack-3, rocket-3, moon-3)

(d) Search-saving encoding of the fly operation:

fly(rocket-1, moon-1)
  Pre: (at rocket-1 planet)
  Eff: del (at rocket-1 planet); add (at rocket-1 moon-1)

fly(rocket-2, moon-2)
  Pre: (at rocket-2 planet)
  Eff: del (at rocket-2 planet); add (at rocket-2 moon-2)

fly(rocket-3, moon-3)
  Pre: (at rocket-3 planet)
  Eff: del (at rocket-3 planet); add (at rocket-3 moon-3)

Fig. 5.27. Removing unnecessary operators from the Three-Rocket Domain. Initial
encoding causes an extensive search, whereas the new encoding of the fly operation
enables PRODIGY to solve delivery problems with little search.

Table 5.5. Search times for six problems in the Three-Rocket Domain. The deletion
of unnecessary fly operations reduces the search by two orders of magnitude.
  delivery           to the same moon                        to different moons     mean
                     1 pack   2 packs   3 packs   4 packs    2 packs   3 packs     time
  with extra fly's   0.1      107.9     >1800.0   >1800.0    12.2      >1800.0     >920.0
  w/o extra fly's    0.1      0.8       4.6       52.5       0.2       0.4         9.8

(a) Standard description:

move-small(<from>, <to>)
  <from>, <to>: type Peg
  Pre: (on small <from>)
  Eff: del (on small <from>); add (on small <to>)

move-medium(<from>, <to>)
  <from>, <to>: type Peg
  Pre: (on medium <from>), not (on small <from>), not (on small <to>)
  Eff: del (on medium <from>); add (on medium <to>)

move-large(<from>, <to>)
  <from>, <to>: type Peg
  Pre: (on large <from>), not (on small <from>), not (on medium <from>),
       not (on small <to>), not (on medium <to>)
  Eff: del (on large <from>); add (on large <to>)

(b) Use of a robot hand:

pick-small(<from>)
  <from>: type Peg
  Pre: (on small <from>), not (in-hand medium), not (in-hand large)
  Eff: del (on small <from>); add (in-hand small)

pick-medium(<from>)
  <from>: type Peg
  Pre: (on medium <from>), not (on small <from>),
       not (in-hand small), not (in-hand large)
  Eff: del (on medium <from>); add (in-hand medium)

pick-large(<from>)
  <from>: type Peg
  Pre: (on large <from>), not (on small <from>), not (on medium <from>),
       not (in-hand small), not (in-hand medium)
  Eff: del (on large <from>); add (in-hand large)

put-small(<to>)
  <to>: type Peg
  Pre: (in-hand small)
  Eff: del (in-hand small); add (on small <to>)

put-medium(<to>)
  <to>: type Peg
  Pre: (in-hand medium), not (on small <to>)
  Eff: del (in-hand medium); add (on medium <to>)

put-large(<to>)
  <to>: type Peg
  Pre: (in-hand large), not (on small <to>), not (on medium <to>)
  Eff: del (in-hand large); add (on large <to>)

(c) Trays in place of pegs:

move-small(<from>, <to>)
  <from>, <to>: type (or Medium Large Tray)
  Pre: (upon small <from>), (clear <to>)
  Eff: del (upon small <from>); add (clear <from>);
       del (clear <to>); add (upon small <to>)

move-medium(<from>, <to>)
  <from>, <to>: type (or Large Tray)
  Pre: (upon medium <from>), (clear medium), (clear <to>)
  Eff: del (upon medium <from>); add (clear <from>);
       del (clear <to>); add (upon medium <to>)

move-large(<from>, <to>)
  <from>, <to>: type Tray
  Pre: (upon large <from>), (clear large), (clear <to>)
  Eff: del (upon large <from>); add (clear <from>);
       del (clear <to>); add (upon large <to>)


Fig. 5.28. Alternative descriptions of the Tower of Hanoi.

[Figure: operator sequences and the resulting macro operators]
(a) Sequences of operators: pick-small(<from>) followed by put-small(<to>), and
    the analogous pick-and-put sequences for the medium and large disks.
(b) Corresponding macro operators: move-small(<from>, <to>),
    move-medium(<from>, <to>), and move-large(<from>, <to>).
(c) Abstraction based on macro operators: a hierarchy of the literals
    (on large <peg>), (on medium <peg>), and (on small <peg>).

Fig. 5.29. Replacing operators with macro operators in the Tower-of-Hanoi Domain
in Figure 5.28(b). This replacement leads to generating a three-level hierarchy.

(on large <tray>) = (upon large <tray>)

(on medium <tray>) =
    (or (upon medium <tray>)
        (and (upon large <tray>)
             (upon medium large)))

(on small <tray>) =
    (or (upon small <tray>)
        (and (upon medium <tray>)
             (upon small medium))
        (and (upon large <tray>)
             (upon small large))
        (and (upon large <tray>)
             (upon medium large)
             (upon small medium)))

Fig. 5.30. Defining a new predicate for the Tower-of-Hanoi Domain in
Figure 5.28(c). This new predicate allows conversion to the standard encoding.

Table 5.6. PRODIGY performance on six problems in the Tower-of-Hanoi Domain.
We give running times in seconds for the standard encoding (Figure 5.28a), the
domain with a robot hand (Figure 5.28b), the domain with trays in place of pegs
(Figure 5.28c), and the standard encoding with abstraction.

                               problem number                         mean
                            1      2        3        4      5       6      time
standard encoding          2.0   34.1    275.4    346.3  522.4   597.4    296.3
robot hand                 0.6    0.6      3.9     11.7    1.5     2.8      3.5
trays in place of pegs    35.1    4.2  >1800.0  >1800.0  479.0 >1800.0   >986.4
standard abstraction       0.5    0.4      1.9      0.3    0.5     2.3      1.0

5.3.3 Toward a theory of description changes

Researchers have designed many systems that improve domain descriptions
by static analysis and learning (see Section 1.3.2); however, they have done
little investigation of the common principles underlying these systems. An
important research direction is to develop general methods for the design
and evaluation of description changers.
We have described a scheme for specifying key properties of description
changers (see Section 1.4.2). Although these specifications are semi-informal,
they help to abstract the major design decisions from implementation. When
constructing a new changer, we first determine its desirable properties and
then implement a learning or static-analysis algorithm with these properties.
A challenging open problem is to develop a system that automatically
builds new description changers according to a user's specification. This prob-
lem is related to Minton's [1996] work on the automated construction of
constraint-satisfaction programs.
We can estimate the utility of a description changer by analyzing its
effect on the search space. Researchers have applied this approach to evaluate
the effectiveness of macro operators [Korf, 1987; Etzioni, 1992], abstraction
[Knoblock, 1991; Bacchus and Yang, 1992], control rules [Cohen, 1992], and
alternative search strategies [Minton et al., 1991; Knoblock and Yang, 1994].
We have used a similar approach to analyze the efficiency of search with
primary effects. The analysis has revealed the factors that affect the trade-off
between search time and solution quality, and experiments have confirmed
the analytical predictions.
Although researchers have analyzed several types of description changes,
they have not developed a general framework for the evaluation of speed-up
techniques. A related problem is to study standard methods for estimating
the efficiency improvements, and to combine them with empirical evaluation.
Part III

Top-level control
6. Multiple representations

Every problem-solving effort must begin with creating a representation for the
problem-a problem space in which the search for the solution can take place.
- Herbert A. Simon [1996], The Sciences of the Artificial.

We have described the PRODIGY search and algorithms for changing domain
descriptions. We now outline top-level tools for the use of these algorithms
(Section 6.1). Then, we propose a general model of using multiple represen-
tations (Section 6.2) and evaluating their utility (Section 6.3). Finally, we
discuss the assumptions that underlie the developed model (Section 6.4).

6.1 Solvers and changers


We explain basic operations on domain descriptions (Section 6.1.1), use of
solvers and changers (Sections 6.1.2 and 6.1.3), and construction of new rep-
resentations (Section 6.1.4).

6.1.1 Domain descriptions


We have defined a problem description as an input to a solver. A domain de-
scription is the part common for all problems in a domain. The main elements
of a PRODIGY domain description include object types, operators, inference
rules, control rules, primary effects, and abstraction. We have considered the
automatic selection of primary effects, abstraction, and control rules. We do
not allow multiple versions of other elements in the current implementation.
Since descriptions differ only in primary effects, abstraction, and control
rules, we store only these three elements for each description. A description
triple is a data structure that includes a selection of primary effects, an
abstraction graph, and a set of control rules; some of these three slots may be
empty. If the primary-effect slot is empty, all effects are considered primary.
If the abstraction slot is empty, PRODIGY runs without abstraction. If the
control-rule slot is empty, PRODIGY does not use control rules.
We often construct new descriptions for a specific class of problems, and
we use applicability conditions to encode the limitations on the use of a de-
scription. A condition consists of three parts: a set of allowed goals, a set of
static literals, and a collection of Lisp functions that input a PRODIGY prob-
lem and return true or false. A problem satisfies the condition if the following
three restrictions hold:
1. All goals of the problem belong to the set of allowed goals.
2. The initial state contains all static literals of the condition.
3. All functions from the third part of the condition return true.
We have implemented a procedure for constructing a conjunction of ap-
plicability conditions. The goal set in the conjunction is the intersection of
the original goal sets, the static-literal set is the union of the original sets,
and the function list is the union of the original lists.
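To make these operations concrete, here is a minimal sketch of conditions and
of the conjunction procedure in Python; the names are hypothetical, and we
assume that a problem is represented by its goal set and initial state rather
than by the actual SHAPER data structures.

    from dataclasses import dataclass
    from typing import Callable, FrozenSet, List

    @dataclass
    class Condition:
        allowed_goals: FrozenSet[str]    # goals that the description can handle
        static_literals: FrozenSet[str]  # literals required in the initial state
        tests: List[Callable]            # functions from a problem to true or false

        def satisfied_by(self, goals, initial_state) -> bool:
            # The three restrictions above: allowed goals, static literals,
            # and the test functions.
            return (frozenset(goals) <= self.allowed_goals
                    and self.static_literals <= frozenset(initial_state)
                    and all(test(goals, initial_state) for test in self.tests))

    def conjoin(cond1: Condition, cond2: Condition) -> Condition:
        # Goal sets are intersected; literal sets and function lists are united.
        return Condition(cond1.allowed_goals & cond2.allowed_goals,
                         cond1.static_literals | cond2.static_literals,
                         cond1.tests + cond2.tests)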
The system generates new descriptions by applying changers to old de-
scriptions. For every new description, SHAPER stores its construction history,
which includes the changer that has generated the new description and the
corresponding old description. We use this information in selecting solvers
and deciding which changers can make further improvements.

6.1.2 Problem solvers


A problem solver is an algorithm that inputs a problem, domain description,
and time bound, and then searches for a solution. A solver terminates when
it finds a solution, exhausts its search space, or hits the time bound. We have
constructed multiple solvers by setting different values of the knob variables
that control the PRODIGY search.
We can perform a series of problem-specific description changes before
applying a solver. For example, we may select primary effects for a specific
problem and generate the corresponding abstraction. We thus define a fixed
sequence of algorithms, which includes several changers and a solver, and
apply it to different problems; we call it a solver sequence.
We often use solvers that work only for certain classes of domain descrip-
tions. For instance, if a solver is based on abstraction, it requires descriptions
with an abstraction hierarchy. For every solver, we specify a condition that
limits appropriate descriptions, which consists of five parts:
1. Elements of a description triple that must not be empty.
2. Elements of a triple that must be empty.
3. Changers used to construct a description.
4. Changers not used to construct a description.
5. A Lisp function that inputs a description and returns true or false.
We use a description only if it satisfies all five parts of the condition.
For example, we can specify that the description must have primary effects
and no abstraction, and that we must have used Abstractor and Chooser in
generating it. We may further limit the use of a solver by restricting the
input problems. This restriction is similar to the applicability condition of a
domain description.
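As an illustration, the five-part check might look as follows in Python; the
fields on the description object (a slot table and a construction history) are
hypothetical stand-ins for the actual data structures.

    def description_matches(desc, cond) -> bool:
        # Check a solver's five-part condition on a domain description:
        # required non-empty slots, required empty slots, changers that must
        # or must not appear in the construction history, and a test function.
        return (all(desc.slots[s] is not None for s in cond.nonempty_slots)
                and all(desc.slots[s] is None for s in cond.empty_slots)
                and cond.required_changers <= desc.history
                and not (cond.forbidden_changers & desc.history)
                and cond.test(desc))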

6.1.3 Description changers


A description changer is an algorithm that inputs a domain description and
time bound, and tries to construct a new description. A changer terminates
when it generates a new description, determines that it cannot improve the
original description, or runs out of time.
We have presented several algorithms for improving domain descriptions
in Part II. We have constructed multiple changers by setting specific knob
values in these algorithms. We can also arrange several changers in a fixed
sequence, called a changer sequence. When we apply a changer sequence, the
system consecutively executes its algorithms and returns the final description.
We usually need to restrict the applicability of a changer; for example,
we should apply Abstractor only if the current description has no abstraction
hierarchy. We encode these restrictions in the same way as a solver's condition
that limits the appropriate descriptions.

6.1.4 Representations
Informally, a representation is a specific approach to a problem, which deter-
mines the actions during the search for a solution. Researchers used this in-
tuition to formulate several definitions of a representation (see Section 1.1.2).
For example, Larkin and Simon [1987] defined it as data structures and pro-
grams operating on them, and Korf [1980] as a space of world states.
A representation in SHAPER is a domain description with a solver that
uses this description. If the solver does not make random choices, the repre-
sentation uniquely determines the search space for every problem. The system
constructs representations by pairing the available solvers with descriptions.
We can limit the use of a representation by an applicability condition, en-
coded in the same way as conditions for descriptions (see Section 6.1.1). When
pairing a solver with a description, the system constructs the conjunction of
their conditions and uses it as a condition for the resulting representation.

6.2 Description and representation spaces


Although SHAPER is based on the PRODIGY architecture, the high-level con-
trol of solvers and changers is not specific to PRODIGY. We can abstract from
the underlying architecture and view descriptions, solvers, and changers as
basic objects. We define objects in this abstract model (Section 6.2.1), for-
malize the notion of a description space (Section 6.2.2), and show how it gives
rise to a representation space (Section 6.2.3).

6.2.1 Descriptions, solvers, and changers


We consider four classes of objects: problems, domain descriptions, problem
solvers, and description changers.

Domain descriptions. A description is a specific domain encoding used for
solving problems. We limit its use by a condition on the relevant problems.
Formally, this condition is a boolean function, p-cond(prob), defined on the
problems in the domain. When its value is false, we do not use the description
to solve the problem.
Problem solvers. A solver inputs a domain description and a problem,
and runs until it either solves the problem or exhausts the search space. The
former outcome is a success, and the latter is a failure. If the search space is
infinite, the solver may run forever.

Solver input: description, problem.


Possible outcomes: solution, failure, infinite execution.

Solver operators. We restrict the use of a solver by two conditions. The first
condition limits descriptions used with the solver; formally, it is a boolean
function, d-cond( desc), defined on the descriptions. The second condition
limits problems for each description; it is a function of a description and
problem, dp-cond( desc, prob). We can apply a solver to a problem prob with
a description desc only when both functions give true. A solver operator is a
solver with the two conditions, denoted by a triple (solver, d-cond, dp-cond).
Description changers. A problem-independent changer inputs a domain
description and converts it into a new description, suitable for multiple prob-
lems. The changer may successfully generate a new description, terminate
with failure, or run forever. We can restrict problems solvable with a new
description, and then the changer uses this restriction in generating a de-
scription; for example, Abstractor can use a limitation on allowed goals.
Changer input: description, restriction on the problems.
Possible outcomes: new description, failure, infinite execution.
A problem-specific changer generates a description for a given problem.
Changer input: description, problem.
Possible outcomes: new description, failure, infinite execution.
Changer operators. We restrict the use of a problem-independent changer
by two conditions. The first condition, d-cond( desc) , limits input descriptions.
The second condition, dp-cond( desc, prob), limits the problems that we solve
with a new description. A changer operator is a problem-independent changer
with the two conditions, denoted by a triple (changer, d-cond, dp-cond). If we
apply a changer operator to some description desc, and the applicability of
desc is limited by a condition p-cond, then the changer's input is desc with the
problem restrictions formed by the conjunction of dp-cond and p-cond, and
we use the resulting description only for problems that satisfy the conjunction
of the two conditions.

Solver and changer sequences. We can use a solver operator that contains
a sequence of problem-specific changers and a solver. When applying this
operator, we first execute the changers and then use the solver with the final
description. A sequence of problem-specific changers with a solver at the end
is called a solver sequence. We view solver sequences as a type of solver and
do not distinguish them from "simple" solvers. A changer sequence consists
of problem-independent changers, and its use involves executing the changers
in order. We do not distinguish between such sequences and simple changers.

6.2.2 Description space

The system applies changer operators to initial domain descriptions and to
newly generated descriptions. The description space of a domain is the set of
all descriptions that the system can potentially generate using the available
changer operators. The descriptions that have already been generated form
the expanded part of the space, stored as a collection of nodes; each node
includes a description and its applicability condition.
The application of different changers may result in generating the same
description. When adding a node, we check whether its description is different
from the other nodes. If it is identical to some old description, we merge the
new node with the old one. The result is a node with the same description,
and its applicability condition is the disjunction of the two original conditions.
We allow the use of hand-coded rejection rules for pruning description
nodes. Formally, a rejection rule is a boolean function that inputs a node and
returns true for inappropriate nodes. After generating a new node, SHAPER
applies the available rejection rules. If the node matches some rule, the system
does not add it to the expanded space.
We also use rules that compare description nodes; formally, a comparison
rule is a boolean function, better-node(node1, node2). If it returns true, then
the description of node1 is always better than that of node2, and we can
prune node2. After generating a new node, the system compares it with the
old nodes. If some old nodes are better, SHAPER does not add the new node
to the expanded space; if some old nodes are worse, SHAPER removes them.
In Figure 6.1, we give a procedure for applying a changer operator to a
description node. It checks whether the initial description satisfies the opera-
tor's condition (step 1), and then generates a new description (steps 2-4). If
some old node already has this description, the procedure modifies the node's
condition (step 5); otherwise, it creates a new node (step 6) and applies re-
jection and comparison rules (steps 7-9).

6.2.3 Representation space

A representation is a triple (desc, solver, p-cond); the first element is a de-
scription, the second is a solver, and the third is an applicability condition.

Make-Description(changer, d-cond, dp-cond; init-desc, p-cond)

The input includes a changer operator (changer, d-cond, dp-cond), and
a node with description init-desc and applicability condition p-cond.

1. If init-desc does not satisfy d-cond, then terminate.
2. Define condition new-p-cond such that, for every prob,
   new-p-cond(prob) = p-cond(prob) ∧ dp-cond(init-desc, prob).
3. Apply changer to init-desc with new-p-cond.
4. If it fails to produce a new description, then terminate;
   else, it returns some description new-desc.
5. If some old node has an identical description, and its condition is old-p-cond,
   then replace its condition with old-p-cond ∨ new-p-cond and terminate.
6. Make a node new-node with description new-desc and condition new-p-cond.
7. If new-node matches some rejection rule, then mark it "rejected."
8. For every old node, old-node:
     For every comparison rule, better-node:
       If better-node(old-node, new-node), then mark new-node "rejected;"
       else, if better-node(new-node, old-node), then remove old-node.
9. If new-node is not "rejected," then add it to the description space.

Fig. 6.1. Applying a changer operator to a description node.

When we use a representation to solve a problem, the system first checks its
applicability. If the problem does not match the condition, it is rejected. If
it matches, the system may solve it, terminate with failure after exhausting
the search space, or interrupt the solver upon reaching a time bound.

Representation input: problem, time bound.


Possible outcomes: solution, failure, rejection, interrupt.

Note that successes, failures, and rejections do not require an external
interrupt of the solver. We refer to them as terminations to distinguish them
from interrupts. In case of an interrupt, SHAPER stores the expanded search
space. If we later try to solve the problem with the same representation,
we can continue the expansion of the space; however, this reuse of spaces is
limited by the available memory for storing them.
The representation space of a domain is the set of representations that
SHAPER can potentially generate based on the description space and avail-
able solvers. After adding a new description desc with condition p-cond, the
system produces the corresponding representations. For every solver opera-
tor (solver, d-cond, dp-cond), the system checks whether desc matches d-cond.
If it does, SHAPER creates a representation (desc, solver, new-p-cond), where
new-p-cond is the conjunction of p-cond and dp-cond. The system supports
rejection and comparison rules for representations, similar to rules for de-
scription nodes. We give a procedure for generating new representations in
Figure 6.2. SHAPER invokes it after creating a new description desc with
condition p-cond.

Make-Reps(desc, p-cond)

The algorithm inputs a description desc and its applicability condition p-cond.
It also accesses the solver operators.
For every solver operator (solver, d-cond, dp-cond):
  If desc satisfies d-cond, then:
    Define condition new-p-cond such that, for every prob,
      new-p-cond(prob) = p-cond(prob) ∧ dp-cond(desc, prob).
    Define representation new-rep as (desc, solver, new-p-cond).
    Call Add-Rep(new-rep).

Add-Rep(new-rep)
If new-rep matches some rejection rule, then mark it "rejected."
For every old representation, old-rep:
  For every comparison rule, better-rep:
    If better-rep(old-rep, new-rep), then mark new-rep "rejected;"
    else, if better-rep(new-rep, old-rep), then remove old-rep.
If new-rep is not "rejected," then add it to the representation space.

Fig. 6.2. Generating representations for a new description.

6.3 Utility functions

We describe a utility model for evaluating representations, which accounts
for the number of solved problems, running time, and solution quality.

6.3.1 Gain function

We assume that the application of a solver leads to one of four outcomes: find-
ing a solution, terminating with failure, rejecting a problem, or hitting a time
bound. We pay for running time and get a reward for solving a problem; the
reward may depend on a specific problem and its solution. The overall gain
is a function of a problem, time, and search result. We denote this function
by gn(prob, time, result), where prob is the problem, time is the running time,
and result may be a solution or one of three unsuccessful outcomes: failure
(denoted fail), rejection (reject), or a time-bound interrupt (intr). Note that
fail, reject, and intr are not variables; hence, we do not italicize them. A user
has to provide a specific gain function, thus encoding value judgments about
different outcomes. We impose four constraints on the allowed functions.
1. The gain decreases with the time:
For every prob, result, and time1 < time2,
gn(prob, time1, result) ≥ gn(prob, time2, result).
2. A rejection gives the same gain as an interrupt:
For every prob and time,
gn(prob, time, reject) = gn(prob, time, intr).
3. A zero-time interrupt gives zero gain:

For every prob,
gn(prob, 0, intr) = 0.
4. The gain of solving a problem or exhausting the search
space is no smaller than the interrupt gain:
For every prob, time, and soln,
(a) gn(prob, time, soln) ≥ gn(prob, time, intr),
(b) gn(prob, time, fail) ≥ gn(prob, time, intr).
Constraint 3 means that the gain of doing nothing is zero. Constraints 1 and 3
imply that rejections and interrupts never give a positive gain. Constraints 3
and 4 imply that the gain of instantly finding a solution is nonnegative.
We define a relative quality of solutions through a gain function. Suppose
that soln1 and soln2 are two solutions for a problem prob.
soln1 has higher quality than soln2 if, for every time,
gn(prob, time, soln1) ≥ gn(prob, time, soln2).
If soln1 gives larger gains than soln2 for some running times and lower gains
for others, then neither of them has higher quality than the other.

6.3.2 Additional constraints

We give additional constraints used in some special cases; we do not assume
that they hold for all gain functions.
5. If the time approaches infinity, the gain approaches negative infinity:
   For every prob, result, and negative value g,
   there is time such that gn(prob, time, result) ≤ g.
6. A failure gives the same gain as an interrupt:
   For every prob and time,
   gn(prob, time, fail) = gn(prob, time, intr).
7. If soln1 gives a larger gain than soln2 for zero time,
   it gives larger gains for all other times:
   For every prob, time, soln1, and soln2,
   if gn(prob, 0, soln1) ≥ gn(prob, 0, soln2),
   then gn(prob, time, soln1) ≥ gn(prob, time, soln2).
If Constraint 7 holds, we can compare the quality of any two solutions; that
is, for every problem, its solutions are totally ordered by quality. We can
then define a quality function, quality(prob, result), that satisfies the following
conditions for every problem and every two results:
• quality(prob, intr) = 0.
• If gn(prob, 0, result1) = gn(prob, 0, result2),
  then quality(prob, result1) = quality(prob, result2).
• If gn(prob, 0, result1) > gn(prob, 0, result2),
  then quality(prob, result1) > quality(prob, result2).

Most PRODIGY domains have natural quality measures. For example, suppose
that we evaluate a solution by the total cost of its operators. Suppose further
that, for each problem prob, there is a maximal acceptable cost, cost_max(prob).
Then, we define quality as follows:

quality(prob, soln) = cost_max(prob) - cost(soln).

We may view gain as a function of a problem, time, and solution quality,
denoted gnq(prob, time, quality), which satisfies the following condition:
For every prob, time, and result,
gnq(prob, time, quality(prob, result)) = gn(prob, time, result).
Note that gain is an increasing function of quality:
If quality1 ≥ quality2, then
gnq(prob, time, quality1) ≥ gnq(prob, time, quality2).
We next consider two other optional constraints:
8. We can decompose the gain into the payment for time and the reward
for solving a problem:
For every prob, time, and result,
gn(prob, time, result) = gn(prob, time, intr) + gn(prob, 0, result).
9. The sum payment for two interrupts that take time1 and time2 is the
   same as the payment for an interrupt that takes time1 + time2:
   For every prob, time1, and time2,
   gn(prob, time1, intr) + gn(prob, time2, intr) = gn(prob, time1 + time2, intr).
If Constraint 8 holds, then Constraint 7 also holds; furthermore, if we de-
fine quality as quality(prob, result) = gn(prob, 0, result), then gnq is a linear
function of quality:

gnq(prob, time, quality) = gn(prob, time, intr) + quality.


If Constraint 9 holds, the interrupt gain is proportional to time:

gn(prob, time, intr) = time· gn(prob, 1, intr).

Constraints 8 and 9 together lead to the following decomposition of the gain:

gn(prob, time, result) = time· gn(prob, 1, intr) + gn(prob, 0, result).
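For instance, the example gain function of Section 7.1.2 has exactly this form:
its unit-time interrupt gain is gn(prob, 1, intr) = -1, its zero-time rewards are
gn(prob, 0, soln) = R and gn(prob, 0, fail) = 0, and the decomposition yields
gn(prob, time, soln) = R - time and gn(prob, time, fail) = -time.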

6.3.3 Representation quality

We derive a utility function for evaluating a representation. We assume that
solvers never make random choices; then, for every problem prob, a repre-
sentation uniquely determines the running time, time(prob), and the result,
result(prob). If we do not interrupt the solver, its running time may be infinite.
If we use a time bound B, the time and result are as follows:

time' = min(B, time(prob))

result' = { result(prob), if B ≥ time(prob)
          { intr,         if B < time(prob)

The choice of a bound B may affect the time and result, which implies that
it affects the gain. We denote the function that maps problems and bounds
into gains by gn':

gn'(prob, B) = gn(prob, time', result').    (6.1)
If the gain function satisfies Constraint 7 of Section 6.3.2, and we use a
quality measure quality(prob, result), then we can define gn' in terms of the
solution quality. If we use a time bound B, the solution quality is as follows:

quality'(prob, B) = { quality(prob, result(prob)), if B ≥ time(prob)
                    { quality(prob, intr),         if B < time(prob)

We express gn' through the function gnq defined in Section 6.3.2:

gn'(prob, B) = gnq(prob, time', quality'(prob, B)).    (6.2)
We will describe heuristics for choosing a time bound B in Chapters 7
and 8. These heuristics determine a function that maps every prob into a
bound B(prob). We determine a representation utility by averaging the gain
over all problems [Koenig, 1997]. We denote the set of problems by P, assume
a fixed probability distribution on P, and denote the probability of encoun-
tering prob by p(prob). If we select a problem at random, the expected gain
is as follows:
G = Σ_{prob ∈ P} p(prob) · gn'(prob, B(prob)).    (6.3)

We use G as a utility function for evaluating a representation. It unifies the
three utility dimensions: near-completeness, speed, and solution quality.
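The computation of G is a direct transcription of Equation 6.3; a minimal
sketch, with hypothetical names, assuming the problem set and distribution
are given explicitly:

    def expected_gain(problems, p, gn_prime, bound):
        # Equation 6.3: average the bounded gain over the distribution.
        # p(prob): probability of prob; gn_prime(prob, B): the gain gn';
        # bound(prob): the time bound selected for prob.
        return sum(p(prob) * gn_prime(prob, bound(prob)) for prob in problems)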

6.3.4 Use of multiple representations

We next define a utility for a collection of representations. We denote the
number of available representations by k, and consider the respective gain
functions gn_1, ..., gn_k and bound-selection functions B_1, ..., B_k. When solving
a problem prob with representation i, we set the time bound B_i(prob) and
use gn_i to determine the gain. For every i, we define the function gn'_i in the
same way as we have defined gn' in Section 6.3.3. The gain of solving prob
with representation i is gn'_i(prob, B_i(prob)).
In Chapters 7 and 8, we will describe algorithms that choose a represen-
tation for each given problem. These algorithms determine a function that
maps every prob into a representation i(prob). When solving prob, we iden-
tify the representation i(prob) and time bound B_{i(prob)}(prob). If we select a
problem at random, the expected gain is as follows:

G = Σ_{prob ∈ P} p(prob) · gn'_{i(prob)}(prob, B_{i(prob)}(prob)).    (6.4)
The utility G depends on the gain function, probability distribution, rep-
resentation, and time bounds. SHAPER inputs the user-specified gain function
and gradually learns the probability distribution. It chooses representations
and time bounds that maximize the expected gain.

6.3.5 Summing gains

Suppose that we apply SHAPER to several problems, called prob_1, ..., prob_n.
We denote the time of solving prob_i by time_i, the result by result_i, and the
corresponding gain function by gn_i.
A. If prob_1, prob_2, ..., prob_n are all distinct,
   the total gain is Σ_{i=1}^{n} gn_i(prob_i, time_i, result_i).
We use this rule in SHAPER although it does not always hold in the real
world. For example, if two problems are parts of a larger problem, solving
one of them may be worthless without the other.
If we try to solve a problem with different representations, and all at-
tempts except one lead to an interrupt or rejection, then the total gain is
also the sum of individual gains:
B. If result_1, ..., result_{n-1} for prob are all interrupts and rejections,
   the total gain of solving prob is Σ_{i=1}^{n} gn_i(prob, time_i, result_i).
We cannot use this rule if two different runs lead to a solution. For example,
if two runs give the same solution, the rule would count the reward twice.
If we allow search for multiple solutions, we need Constraints 6 and 8 of
Section 6.3.2 for defining the total gain. We compute the payment for time
as the sum of payments for individual runs, and the reward for solving a
problem as the maximum of rewards for individual solutions:
C. If result_1, ..., result_n are all for the same problem prob,
   the gain is Σ_{i=1}^{n} gn_i(prob, time_i, intr) + max_{i=1}^{n} gn_i(prob, 0, result_i).
This rule is also a simplification since obtaining multiple solutions may be
more valuable than finding only the best solution.
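A minimal sketch of Rule C, assuming for simplicity that all runs use the same
gain function gn and that gn accepts the marker "intr" for interrupts (the
names are hypothetical):

    def total_gain_same_problem(prob, runs, gn):
        # Rule C: sum the payments for time and add the largest
        # zero-time reward; runs is a list of (time, result) pairs.
        payment = sum(gn(prob, time, "intr") for time, _ in runs)
        reward = max(gn(prob, 0, result) for _, result in runs)
        return payment + reward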
When using description changers, we include the payment for their use in
the gain computation. Since a changer does not solve any problems, its "gain"
is negative. We assume that the gain of applying a changer is a function of
the description and running time, gn-change(desc, time), and that it has the
same properties as the interrupt gain of solvers:
• For every desc, gn-change(desc, 0) = 0.
• If time1 < time2, then gn-change(desc, time1) ≥ gn-change(desc, time2).
We compute the total gain by summing all changer gains and adding them
to the solver gains.

6.4 Simplifying assumptions

We have formalized the use of multiple representations. We now discuss the
related simplifying assumptions and the user's role in guiding SHAPER.
Solvers and changers. We assume that solvers and changers are sound;
that is, they always produce correct solutions and valid descriptions. Fur-
thermore, solvers do not use any-time behavior; that is, a solver finds a so-
lution and terminates, rather than outputting successively better solutions.
Description changers also do not use any-time behavior.
Utility model. We assume the availability of functions that determine an
exact gain for every problem-solving episode. The performance analysis does
not account for the time of the gain computation and selection of representa-
tions; the time of these operations in SHAPER is smaller than the search time
by two orders of magnitude. Since SHAPER does not run solvers and chang-
ers in parallel, we have defined utility only for their sequential application;
parallelism would require a significant extension to the utility model. We
have also assumed that solvers do not make random choices (Section 6.3.3);
however, we can readily extend the model to randomized solvers by viewing
every possible behavior as a separate problem-solving episode.
Additional assumptions. We next introduce assumptions that underlie
the automatic selection of representations. First, SHAPER views solvers and
changers as black boxes; it calls a selected algorithm and waits for a termina-
tion, without providing any guidance during the execution of the algorithm.
Second, the system faces one problem at a time; it does not select among
problems or decide on the order of solving them. Third, SHAPER does not
interleave the use of different representations in solving a problem, although
interleaving may be more effective than trying them one after another.
User role. The user must encode domains in the PRODIGY language, pro-
vide solvers and changers, and specify gain functions that encode her value
judgments. The PRODIGY architecture includes a number of solvers, as well as
the changers described in Part II. The user can adjust the knobs of existing
algorithms or add new algorithms.
The rest of the user's input is optional; it may include solver and changer
operators, rejection and comparison rules, and domain descriptions. Solver
and changer operators consist of algorithm sequences and their applicability
conditions. If the user does not specify them, SHAPER assumes that all solvers
and changers are always applicable. The user can also provide procedures for
estimating problem complexity and similarity between problems, and then
SHAPER utilizes them in the performance analysis.
7. Statistical selection

Good ideas are based on past experience.


- George Polya [1957], How to Solve It.

We describe statistical techniques for selecting a representation and time
bound. We formalize the problem of estimating the utility of a representation
(Section 7.1), derive a solution to this problem (Sections 7.2 and 7.3), use it
in choosing a representation (Section 7.4), and give empirical results that
confirm the effectiveness of the selection algorithm (Sections 7.5 and 7.6).

7.1 Selection task


We outline the main results (Section 7.1.1) and then formalize the selection
task (Section 7.1.2).

7.1.1 Previous and new results


Researchers have long realized the importance of automatic evaluation and
selection of search methods, and developed techniques for various special
cases of this problem. In particular, Horvitz [1988] described a framework for
evaluating algorithms based on trade-offs between computational cost and so-
lution quality, and used it in the selection of sorting algorithms. Breese and
Horvitz [1990] designed a decision-theoretic procedure that evaluated differ-
ent methods of belief-network inference and selected the optimal method.
Hansson and Mayer [1989], and Russell [1990] applied related techniques to
choose promising branches of a search space.
Russell et al. [1993] formalized a general problem of selecting among alter-
native search methods, and used dynamic programming to solve special cases
of this problem. Minton developed an inductive learning system that config-
ured constraint-satisfaction programs by choosing among search strategies
[Minton, 1996; Allen and Minton, 1996].
Hansen and Zilberstein [1996] studied trade-offs between running time
and solution quality in simple any-time algorithms, and used dynamic pro-
gramming for deciding when to terminate the search. Mouaddib and Zilber-
stein [1995] developed a similar technique for knowledge-based algorithms.


Howe et al. [1999] built the meta-planner system, which integrated six plan-
ners and chose among them based on features of a given problem. It used
linear regression to compute the expected running time of each planner.
We found that the previous results are not applicable to selecting rep-
resentations in SHAPER because the developed techniques rely on analysis
of a large sample of past performance data. When we apply SHAPER to a
new domain or generate new representations, we usually have no prior data.
Acquisition of sufficient data is often impractical because experimentation is
more expensive than solving given problems.
We have developed a selection technique that makes the best use of the
available data even when these data do not allow an accurate estimate. We
present a learning algorithm that accumulates performance data and selects
the representation that maximizes the expected gain. We also describe a
statistical technique for setting time bounds, and show that determining an
appropriate bound is as crucial as choosing the right representation.
The described techniques are aimed at selecting a representation and time
bound before solving a given problem. We do not provide a mechanism for
switching a representation or revising the selected bound during the search for
a solution. Developing a revision mechanism is an important open problem.

7.1.2 Example and general problem

Suppose that we use PRODIGY to construct plans for transporting packages
between different locations in a city [Veloso, 1994], and we consider three
alternative representations. The first of them is based on the SAVTA search,
described in Section 2.2.5, with control rules designed by Veloso [1994] and
Perez [1995]. SAVTA applies the selected actions to the current state of the
simulated world as early as possible; we call the representation based on this
algorithm Apply.
The second representation uses the SABA search (see Section 2.2.5) with
the same control rules. It delays the application of the selected actions and
forces more emphasis on backward search; we call this representation Delay.
Finally, the third representation, called Abstract, is a combination of SAVTA
with the problem-specific version of Abstractor.
When we use one of these representations, we get one of three outcomes:
the system may solve a problem, terminate with failure after exhausting the
search space, or interrupt the search upon reaching a time bound.
In Table 7.1, we give the results of solving thirty transportation prob-
lems with each representation; we denote successes by s, failures by f, and
interrupts upon hitting a time bound by b. Note that these data are only for
illustrating the selection problem, and not for a general comparison of these
three techniques; their performance may be different in other domains.
Although each representation outperforms the others on at least one prob-
lem, a glance at the data reveals that Apply is probably the best among the

Table 7.1. Performance of Apply, Delay, and Abstract on thirty problems.

 #   time (sec) and outcome      # of | #   time (sec) and outcome      # of
      Apply    Delay   Abstract packs |      Apply    Delay   Abstract packs
 1    1.6 s    1.6 s    1.6 s     1   | 16   4.4 s   68.4 s    4.6 s     4
 2    2.1 s    2.1 s    2.0 s     1   | 17   6.0 s  200.0 b    6.2 s     6
 3    2.4 s    5.8 s    4.4 s     2   | 18   7.6 s  200.0 b    7.8 s     8
 4    5.6 s    6.2 s    7.6 s     2   | 19  11.6 s  200.0 b   11.0 s    12
 5    3.2 s   13.4 s    5.0 s     3   | 20 200.0 b  200.0 b  200.0 b    16
 6   54.3 s   13.8 f   81.4 s     3   | 21   3.2 s    2.9 s    4.2 s     2
 7    4.0 s   31.2 f    6.3 s     4   | 22   6.4 s    3.2 s    7.8 s     4
 8  200.0 b   31.6 f  200.0 b     4   | 23  27.0 s    4.4 s   42.2 s    16
 9    7.2 s  200.0 b    8.8 s     8   | 24 200.0 b    6.0 s  200.0 b     8
10  200.0 b  200.0 b  200.0 b     8   | 25   4.8 s   11.8 f    3.2 s     3
11    2.8 s    2.8 s    2.8 s     2   | 26 200.0 b   63.4 f    6.6 f     6
12    3.8 s    3.8 s    3.0 s     2   | 27   6.4 s   29.1 f    5.4 f     4
13    4.4 s   76.8 s    3.2 s     4   | 28   9.6 s   69.4 f    7.8 f     6
14  200.0 b  200.0 b    6.4 s     4   | 29 200.0 b  200.0 b   10.2 f     8
15    2.8 s    2.8 s    2.8 s     2   | 30   6.0 s   19.1 s    5.4 f     4

three. We use statistical analysis to confirm this intuitive conclusion and to
set a time bound for the chosen representation.
We suppose that the user has specified a gain function, and that we have
sample data on the past performance of every representation. We need to
select a representation and time bound that maximize the expected gain G
determined by Equation 6.3.
We can estimate the gain for all candidate representations and time
bounds, and select the representation and bound with the largest estimate.
We distinguish between past terminations and interrupts in deriving the esti-
mate, but we do not distinguish among the three termination types: successes,
failures, and rejections. We thus consider the following statistical problem.
Problem: Suppose that a representation terminated on n problems, called
pt_1, ..., pt_n, and gave an interrupt upon hitting a time bound on m problems,
called pb_1, ..., pb_m. The termination times were t_1, ..., t_n and the correspond-
ing results were result_1, ..., result_n; the interrupt times were b_1, ..., b_m. Given a
gain function and a time bound B, estimate the expected gain and determine
the standard deviation of the estimate.
The termination results, denoted result_1, ..., result_n, may include solutions,
failures, and rejections. On the other hand, all interrupts by definition give
the same result, denoted intr. We need a gain estimate that makes the best use
of the available data, even if they are not sufficient for statistical significance.
We next give an example of a gain function for the transportation domain.
Suppose that we get a certain reward R for solving a problem, and that we
pay for each second of running time. If the system solves a problem, the
overall gain is (R - time); if it fails or hits a time bound, the gain is (-time):

For every prob, time, and soln,
(a) gn(prob, time, soln) = R - time,
(b) gn(prob, time, fail) = -time,
(c) gn(prob, time, intr) = -time.
Suppose that we choose one of the three representations and use some
fixed time bound B. For each prob, the representation uniquely determines
the (finite or infinite) running time until termination, time(prob), and we can
rewrite Equation 6.1 as follows:

               { R - time(prob), if B ≥ time(prob) and the outcome is success
gn'(prob, B) = { -time(prob),    if B ≥ time(prob) and the outcome is failure
               { -B,             if B < time(prob)

We need to select the representation and time bound that maximize the
expected value of gn' (prob, B).
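This example translates directly into code; a minimal sketch with hypothetical
names, assuming that the outcome and running time of an uninterrupted run
are known:

    R = 30.0  # reward for solving a problem

    def gain(time, outcome):
        # Example gain: R - time for a solution; -time for a failure or interrupt.
        return R - time if outcome == "success" else -time

    def bounded_gain(run_time, outcome, B):
        # gn'(prob, B): the run is cut off upon reaching the time bound B.
        if B >= run_time:
            return gain(run_time, outcome)  # the solver terminates on its own
        return gain(B, "intr")              # interrupt upon hitting the bound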

7.2 Statistical foundations

We evaluate the results of search with a fixed representation and time
bound, and estimate the expected gain. For convenience, we assume that
the termination and interrupt times are sorted in increasing order; that is,
t_1 ≤ t_2 ≤ ... ≤ t_n and b_1 ≤ b_2 ≤ ... ≤ b_m. We first consider the case when the
time bound B is no larger than the lowest of the past time bounds; that is,
B ≤ b_1. We denote the number of termination times that are no larger than
B by c; that is, t_c ≤ B < t_{c+1}.
We estimate the expected gain by averaging the gains that would be
obtained in the past if we used the bound B for all problems. The solver
would terminate without hitting this bound on the problems pt_1, ..., pt_c,
earning the gains gn(pt_1, t_1, result_1), ..., gn(pt_c, t_c, result_c). It would hit the
time bound on pt_{c+1}, ..., pt_n, getting the negative gains gn(pt_{c+1}, B, intr), ...,
gn(pt_n, B, intr). It would also hit the bound on pb_1, ..., pb_m, with the negative
gains gn(pb_1, B, intr), ..., gn(pb_m, B, intr). The estimated gain is equal to the
mean of these gain values:

( Σ_{i=1}^{c} gn(pt_i, t_i, result_i) + Σ_{i=c+1}^{n} gn(pt_i, B, intr)
  + Σ_{j=1}^{m} gn(pb_j, B, intr) ) / (n + m).    (7.1)
For instance, suppose that we solve transportation problems with Ab-
stract, and that we use the example gain function with reward R = 30.0. The
gain of solving a problem is (30.0 - time), and the negative gain of a failure
or interrupt is (-time). If we apply Equation 7.1 to estimate the gain for
B = 6.0 using the data in Table 7.1, we find that the expected gain is 6.0.
If we raise the bound to 8.0, the expected gain becomes 11.1. In Figure 7.1,

[Figure: three plots, for Apply, Delay, and Abstract, of the expected gain
(vertical axis) against the time bound (horizontal axis, logarithmic scale)]

Fig. 7.1. Dependency of the expected gain on the time bound for a reward of 10.0
(dash-and-dot lines), 30.0 (dashed lines), and 100.0 (solid lines). The dotted lines
show the standard deviation of the expected gain for the 100.0 reward.

we show the dependency of the expected gain on the time bound for the
three representations. We give the dependency for three different values of
the reward R: 10.0, 30.0, and 100.0.
Since we have computed the mean gain for a sample of problems, it may
differ from the mean of the overall population. We estimate the standard deviation
of the expected gain using the expression for the deviation of a sample mean:

√( (SqrSum - Sum²/(n + m)) / ((n + m) · (n + m - 1)) ),    (7.2)

where
Sum = Σ_{i=1}^{c} gn(pt_i, t_i, result_i) + Σ_{i=c+1}^{n} gn(pt_i, B, intr) + Σ_{j=1}^{m} gn(pb_j, B, intr),
SqrSum = Σ_{i=1}^{c} gn(pt_i, t_i, result_i)² + Σ_{i=c+1}^{n} gn(pt_i, B, intr)² + Σ_{j=1}^{m} gn(pb_j, B, intr)².
This result is an approximation based on the Central Limit Theorem, which
states that the distribution of sample means is close to the normal distribu-
tion. Its accuracy improves with sample size; for thirty or more problems, it
is near-perfect.
For example, if we use Abstract with reward 30.0 and time bound 6.0,
the standard deviation of the expected gain is 2.9. In Figure 7.1, we use
dotted lines to show the standard deviation of the gain estimates for the 100.0
reward. The lower dotted line is one standard deviation below the estimate,
and the upper line is one standard deviation above.
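In the simple case B ≤ b_1, the estimate and its deviation are a few lines of
code; a minimal sketch with hypothetical names, assuming the gain function is
split into a termination part and an interrupt part:

    from math import sqrt

    def estimate_gain(terminations, interrupt_times, B, term_gain, intr_gain):
        # Equations 7.1 and 7.2 for B <= b_1.
        # terminations: list of (t_i, result_i) pairs; interrupt_times: all >= B;
        # term_gain(t, result): gain of terminating at t; intr_gain(t): interrupt gain.
        gains = [term_gain(t, r) for t, r in terminations if t <= B]
        cutoffs = (len(terminations) - len(gains)) + len(interrupt_times)
        gains += [intr_gain(B)] * cutoffs        # runs that would hit the bound B
        n = len(gains)
        total = sum(gains)
        sqr_total = sum(g * g for g in gains)
        mean = total / n
        deviation = sqrt((sqr_total - total * total / n) / (n * (n - 1)))
        return mean, deviation

For Abstract's data in Table 7.1 with the example gain function, reward 30.0,
and bound 6.0, this computation gives the expected gain 6.0 and the deviation
2.9 reported above.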
We have so far assumed that B ≤ b_1. We next consider the case when B
is larger than d of the past interrupt times; that is, b_d < B ≤ b_{d+1}. We
cannot use b_1, b_2, ..., b_d directly in the gain estimate because the use of the
time bound B would cause the solver to run beyond these old bounds. The
collected data do not show the results of solving the corresponding problems

Table 7.2. Distributing the weights of interrupts among the larger-time outcomes.

        (a)                  (b)                    (c)
  #   Abstract's       weight    time         weight    time
        time
  1      1.6 s          1.000     1.6 s        1.000     1.6 s
  2      2.0 s          1.000     2.0 s        1.000     2.0 s
  3      4.4 s          1.000     4.4 s        1.000     4.4 s
  4      4.5 b            -         -            -         -
  5      5.0 s          1.048     5.0 s        1.048     5.0 s
  6     81.4 s          1.048    81.4 s        1.118    81.4 s
  7      5.5 b          1.048     5.5 b          -         -
  8    200.0 b          1.048   200.0 b        1.118   200.0 b
  9      8.8 s          1.048     8.8 s        1.118     8.8 s
 ...      ...             ...      ...           ...      ...
 29     10.2 f          1.048    10.2 f        1.118    10.2 f
 30      5.4 f          1.048     5.4 f        1.048     5.4 f

with the time bound B. We estimate these results by "re-distributing" the
probabilities of reaching these low bounds among the other outcomes.
If we had not interrupted the solver at b_1 in the past, it would have ter-
minated at some larger time or hit a larger bound. We estimate the expected
gain using the data on the past episodes in which the solver ran beyond b_1.
We get this estimate by averaging the gains for all the larger-time outcomes.
To incorporate this averaging into the computation, we remove b_1 from the
sample and distribute its chance to occur among the larger-time outcomes.
To implement this re-distribution, we assign weights to all past outcomes;
initially, the weight of every outcome is 1. After removing b_1, we distribute its
weight among all the larger-than-b_1 outcomes. If the number of such outcomes
is a_1, each of them gets the weight of 1 + 1/a_1. Note that b_2, ..., b_d are all larger
than b_1, and thus they all get the new weight.
We next remove b_2 and distribute its weight, which is 1 + 1/a_1, among the
larger-than-b_2 outcomes. If the number of such outcomes is a_2, we increase
their weights by (1 + 1/a_1) · 1/a_2; that is, their weights become
(1 + 1/a_1) · (1 + 1/a_2). We repeat the distribution process for all interrupt
times smaller than B; that is, for b_3, ..., b_d.
We illustrate the re-distribution for the data on Abstract's performance
(Table 7.1). Suppose that we interrupted Abstract on problem 4 after 4.5
seconds of the execution and on problem 7 after 5.5 seconds, thus obtaining
the data in Table 7.2(a), and we need to estimate the gain for B = 6.0.
We first distribute the weight of b_1. In this example, b_1 is 4.5, and there are
21 problems with larger times; thus, we remove 4.5 and increase the weights
of the larger-time outcomes from 1 to 1 + 1/21 = 1.048 (Table 7.2b). We next
perform the distribution for b_2, which is 5.5. The table contains 15 problems
with larger-than-b_2 times. We distribute b_2's weight, 1.048, among these 15
problems, thus increasing their weights to 1.048 + 1.048/15 = 1.118 (Table 7.2c).

We have removed the interrupt times b_1, b_2, ..., b_d, and assigned weights to
the termination times and remaining interrupt times. We denote the resulting
weights of the termination times t_1, ..., t_c by u_1, ..., u_c; recall that these times
are smaller than B. All termination and interrupt times larger than B have
the same weight, denoted by u*. The sum of the weights is equal to the
number of problems in the original sample; that is, Σ_{i=1}^{c} u_i + (n + m - c -
d) · u* = n + m. We use the weighted times to compute the expected gain,
which leads to the following expression:

( Σ_{i=1}^{c} u_i · gn(pt_i, t_i, result_i) + u* · Σ_{i=c+1}^{n} gn(pt_i, B, intr)
  + u* · Σ_{j=d+1}^{m} gn(pb_j, B, intr) ) / (n + m).    (7.3)

Similarly, we use the weights in computing the standard deviation:

√( (SqrSum - Sum²/(n + m)) / ((n + m) · (n + m - d - 1)) ),    (7.4)

where
Sum = Σ_{i=1}^{c} u_i · gn(pt_i, t_i, result_i) + u* · Σ_{i=c+1}^{n} gn(pt_i, B, intr)
      + u* · Σ_{j=d+1}^{m} gn(pb_j, B, intr),
SqrSum = Σ_{i=1}^{c} u_i · gn(pt_i, t_i, result_i)² + u* · Σ_{i=c+1}^{n} gn(pt_i, B, intr)²
         + u* · Σ_{j=d+1}^{m} gn(pb_j, B, intr)².
The application of these results to the data in Table 7.2(c), for reward 30.0
and bound 6.0, gives the expected gain 6.1 and the standard deviation 3.0.
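The re-distribution takes one pass over the sorted outcomes; a minimal sketch
with hypothetical names, assuming distinct times and at least one outcome
beyond each removed interrupt (see the discussion below):

    def redistribute(termination_times, interrupt_times, B):
        # Remove the interrupt times smaller than B and spread their weights
        # among the larger-time outcomes; return the weights u_1, ..., u_c of
        # the terminations no larger than B and the common weight u* of all
        # larger times.
        outcomes = sorted([(t, "term") for t in termination_times] +
                          [(b, "intr") for b in interrupt_times])
        weight = 1.0         # current weight of the outcomes not yet passed
        term_weights = []    # u_1, ..., u_c
        for k, (time, kind) in enumerate(outcomes):
            if kind == "term" and time <= B:
                term_weights.append(weight)
            elif kind == "intr" and time < B:
                larger = len(outcomes) - k - 1  # outcomes with larger times
                weight += weight / larger       # spread this interrupt's weight
        return term_weights, weight             # final weight is u*

Applied to the data in Table 7.2(a) with B = 6.0, this procedure reproduces
the weights 1.048 and 1.118 of Tables 7.2(b) and 7.2(c), which then plug into
Equations 7.3 and 7.4.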
If B is larger than the largest of the past bounds (that is, B > b_m), and
the largest bound is larger than all past termination times (that is, b_m > t_n),
then the re-distribution procedure does not work. We need to distribute the
weight of b_m among the larger-time outcomes, but the sample has no such
outcomes. In this case, the data are not sufficient for the statistical analysis
because we do not have past experience with large bounds.

7.3 Computation of the gain estimates

We describe an algorithm that computes the gain estimates for multiple val-
ues of the time bound B. We have assumed in deriving the gain estimate
that we try to solve each problem once. In practice, if we have interrupted
the search, we may later return to the same problem and continue from the
point where we have stopped. We develop a gain-estimate procedure that
accounts for this reuse of previously expanded space.
Suppose that we have tried to solve prob in the past and hit a bound B_o,
where the subscript "o" stands for "old." We can store the complete space of
the interrupted search or the part of the space expanded until reaching some

bound b_o, where b_o ≤ B_o; we refer to b_o as the reused time. If we did not
reuse the old space, the time until termination on prob would be time(prob);
if we reuse the space, the time until termination is time(prob) - b_o.
Now suppose that we are solving prob again, with the same representation
and a new bound B. We measure the time from the beginning of the old
search; that is, if we reuse the space expanded in time b_o, then the new
search begins at time b_o and reaches the bound at time B, which means that
the running time of the new search is bounded by B - b_o. Since we know that
the solver hits the bound B_o without finding a solution, we must set a larger
bound; that is, B > B_o. We modify Equation 6.1 for search with bound B
and reused time b_o:

gn'(prob, B) = { gn(prob, time(prob) - b_o, result(prob)), if B ≥ time(prob)
               { gn(prob, B - b_o, intr),                  if B < time(prob)
We determine the expected gain based on the past performance data.
First, we remove the previous attempt to solve the given problem from the
data. We denote the termination times in the resulting sample by t_1, ..., t_n
and interrupt times by b_1, ..., b_m.
Let e be the number of past termination times that are no larger than B_o,
which means that t_e ≤ B_o < t_{e+1}; note that, since B_o < B, we have e ≤ c.
We know that time(prob) > B_o; hence, we must not use the past termination
times t_1, ..., t_e in computing the expected gain. We remove them and use the
remaining termination times t_{e+1}, ..., t_n and interrupt times b_1, ..., b_m; thus,
the reduced sample includes n + m - e past outcomes.
We apply the estimation technique of Section 7.2 to this sample. If some
interrupt times b_1, ..., b_d are smaller than B, we distribute their weights among
the larger-time outcomes. We denote the weights of the termination times
t_{e+1}, ..., t_c by u'_{e+1}, ..., u'_c, and the weight of all other times by u'_*. The sum of
all weights equals the number of problems in the reduced sample, Σ_{i=e+1}^{c} u'_i +
(n + m - c - d) · u'_* = n + m - e. We compute the expected gain as follows:

Sum / (n + m - e),    (7.5)

where
Sum = Σ_{i=e+1}^{c} u'_i · gn(pt_i, t_i - b_o, result_i)
      + u'_* · Σ_{i=c+1}^{n} gn(pt_i, B - b_o, intr)
      + u'_* · Σ_{j=d+1}^{m} gn(pb_j, B - b_o, intr).
The standard deviation of this estimate is

√( (SqrSum - Sum²/(n + m - e)) / ((n + m - e) · (n + m - e - d - 1)) ),    (7.6)

where Sum is the same as in Equation 7.5 and

SqrSum = Σ_{i=e+1}^{c} u'_i · gn(pt_i, t_i - b_o, result_i)²
         + u'_* · Σ_{i=c+1}^{n} gn(pt_i, B - b_o, intr)²
         + u'_* · Σ_{j=d+1}^{m} gn(pb_j, B - b_o, intr)².

c          number of the processed termination times;
           the next termination time to process will be t_{c+1}
d          number of the processed interrupt times
h          number of the processed time bounds
u          weight of the times that are larger than the current bound B_{h+1}
T_Sum      weighted sum of the gains for the processed terminations;
           that is, Σ_{i=e+1}^{c} u'_i · gn(pt_i, t_i - b_o, result_i)
I_Sum      unweighted sum of the interrupt gains for the current bound B_{h+1};
           that is, Σ_{i=c+1}^{n} gn(pt_i, B - b_o, intr) + Σ_{j=d+1}^{m} gn(pb_j, B - b_o, intr)
Sum        weighted sum of the gains for all problems, for the bound B_{h+1}
T_SqrSum   weighted sum of the squared gains for the processed terminations;
           that is, Σ_{i=e+1}^{c} u'_i · gn(pt_i, t_i - b_o, result_i)²
I_SqrSum   unweighted sum of the squared interrupt gains for the bound B_{h+1}
SqrSum     weighted sum of squared gains for all problems, for the bound B_{h+1}

Fig. 7.2. Variables in the gain-estimate algorithm in Figure 7.3.
We now describe the computation of the gain estimate for multiple can-
didate bounds B_1, ..., B_l. We list the related variables in Figure 7.2 and give
pseudocode in Figure 7.3. The algorithm computes the weights and gain es-
timates in one pass through the sorted list of termination times, interrupt
times, and time bounds. When processing a termination, it increments the
sum of the weighted gains and the sum of their squares. When processing
an interrupt, it modifies the weight value. When processing a time bound, it
computes the gain estimate and deviation for this bound.
We execute Step 2 of the algorithm at most n - e times, Step 3 at most m
times, and Step 4 at most l times. We compute one value of the gain function
at Step 2, and n + m - e - d values at Step 4. If the time complexity of
computing this function is Comp(gn), the overall complexity of the algorithm
is O(l · (n + m - e) · Comp(gn)). The computation of I_Sum and I_SqrSum at
Step 4 is the most time-consuming part; however, we can speed up this step
for some special cases of the gain function. We consider two cases that allow
computing the gain estimates in linear time.
First, suppose that the interrupt gains do not depend on a specific prob-
lem; that is, we use an interrupt gain function gn-intr(time):
For every prob and time,
gn(prob, time, intr) = gn-intr(time).
We then compute I_Sum and I_SqrSum as follows:
I_Sum = (n + m - c - d) · gn-intr(B_{h+1} - b_o),
I_SqrSum = (n + m - c - d) · gn-intr(B_{h+1} - b_o)².

Since we execute Step 4 at most l times, the overall complexity of the algo-
rithm is O((l + n - e) · Comp(gn) + m). If the computation of the gain function
takes constant time, then the overall complexity is linear, O(l + n + m - e).
Next, we consider a gain function that satisfies Constraint 9 of Sec-
tion 6.3.2; that is, the interrupt gain is proportional to the running time:

gn(prob, time, intr) = time · gn(prob, 1, intr).

We use this property in computing the values of I_Sum and I_SqrSum:

I_Sum = (B_{h+1} - b_o) · (Σ_{i=c+1}^{n} gn(pt_i, 1, intr) + Σ_{j=d+1}^{m} gn(pb_j, 1, intr)),

I_SqrSum = (B_{h+1} - b_o)² · (Σ_{i=c+1}^{n} gn(pt_i, 1, intr)² + Σ_{j=d+1}^{m} gn(pb_j, 1, intr)²).

We denote the four sums in this computation as follows:
Sum_c = Σ_{i=c+1}^{n} gn(pt_i, 1, intr)      SqrSum_c = Σ_{i=c+1}^{n} gn(pt_i, 1, intr)²
Sum_d = Σ_{j=d+1}^{m} gn(pb_j, 1, intr)      SqrSum_d = Σ_{j=d+1}^{m} gn(pb_j, 1, intr)²
We can pre-compute these sums for all values of c and d before executing the
statistical algorithm. We give a procedure for finding them in Figure 7.4; its
complexity is O((m + n - e) · Comp(gn)). If we use the pre-computed sums
at Step 4, the overall complexity of the statistical algorithm is O((m + n -
e) · Comp(gn) + l). If we can compute each gain value in constant time, the
complexity is O(l + n + m - e).
We have implemented the algorithm in Common Lisp and tested it on a
Sun Sparc 5. For the gain function described at the end of Section 7.1.2, the
running time is about (l + n + m) · 3 · 10⁻⁴ seconds.

7.4 Selection of a representation and time bound

We use the statistical estimate to choose a representation and time bound.
For instance, if we solve transportation problems and use the example gain
function with reward 30.0, then the best choice is Apply with the bound 11.6,
which gives the expected gain of 14.0. This choice corresponds to the maxi-
mum of the dashed lines in Figure 7.1. If the expected gain for all represen-
tations and time bounds is negative, we are better off avoiding the problem.
We describe a technique for learning the performance of available repre-
sentations. For each new problem, the system uses statistical estimates to
select a representation and time bound. After using the selected representa-
tion, it adds the result to the performance data. Sometimes, it deviates from
the maximal-expectation choice in order to explore new opportunities.
We describe the choice of candidate bounds (Section 7.4.1), consider the
selection of a bound for a fixed representation (Section 7.4.2), show how to
select a representation (Section 7.4.3), and discuss the choice of a represen-
tation and bound in the absence of past data (Section 7.4.4).

Estimate-Gains

The input includes the gain function gn; the sorted list of termination times t_{e+1}, ..., t_n,
with the corresponding problems pt_{e+1}, ..., pt_n and results result_{e+1}, ..., result_n; the
sorted list of interrupt times b_1, ..., b_m, with the corresponding problems pb_1, ..., pb_m; a
sorted list of candidate time bounds B_1, ..., B_l; and the reused time b_0. The variables
used in the computation are described in Figure 7.2.

Set the initial values:
    c := e; d := 0; h := 0
    u := 1; T_Sum := 0; T_SqrSum := 0
Repeat the computation until finding the gains for all bounds, that is, until h = l:
1. Select the smallest among the following three times: t_{c+1}, b_{d+1}, and B_{h+1}.
2. If the termination time t_{c+1} is selected, increment the related sums:
       T_Sum := T_Sum + u · gn(pt_{c+1}, t_{c+1} − b_0, result_{c+1})
       T_SqrSum := T_SqrSum + u · gn(pt_{c+1}, t_{c+1} − b_0, result_{c+1})²
       c := c + 1
3. If the interrupt time b_{d+1} is selected:
   If no termination or interrupt times are left, that is, c = n and d + 1 = m,
   then terminate (we cannot estimate the gains for the remaining bounds);
   else, distribute the interrupt's weight among the remaining times:
       u := u · (n + m − c − d) / (n + m − c − d − 1)
       d := d + 1
4. If the time bound B_{h+1} is selected:
   Compute the unweighted sums of interrupt gains and their squares:
       I_Sum := Σ_{i=c+1}^{n} gn(pt_i, B_{h+1} − b_0, intr) + Σ_{j=d+1}^{m} gn(pb_j, B_{h+1} − b_0, intr)
       I_SqrSum := Σ_{i=c+1}^{n} gn(pt_i, B_{h+1} − b_0, intr)² + Σ_{j=d+1}^{m} gn(pb_j, B_{h+1} − b_0, intr)²
   Compute the overall sums of the sample-problem gains and their squares:
       Sum := T_Sum + u · I_Sum
       SqrSum := T_SqrSum + u · I_SqrSum
   Compute the gain estimate and its deviation for the bound B_{h+1}:
       Gain estimate: Sum / (n + m − e)
       Estimated deviation: sqrt( (SqrSum − Sum² / (n + m − e)) / ((n + m − e) · (n + m − e − d − 1)) )
   Increment the number of processed bounds:
       h := h + 1

Fig. 7.3. Computing the gain estimates and their deviations.
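As an illustration, the following Python sketch mirrors the Figure 7.3 sweep. It is not the implemented Common Lisp system; the data layout, the helper name, and the use of the string 'intr' as the interrupt marker are assumptions of this sketch, and the gain function gn(problem, time, outcome) is supplied by the caller.

from math import sqrt

def estimate_gains(gn, terms, intrs, bounds, b0, e):
    # terms: sorted (time, problem, result) triples for t_{e+1}, ..., t_n;
    # intrs: sorted (time, problem) pairs for b_1, ..., b_m;
    # bounds: sorted candidate bounds B_1, ..., B_l; b0: reused time;
    # e: number of termination times excluded as being at most b0.
    # Returns a list of (bound, gain estimate, deviation) triples.
    n, m = e + len(terms), len(intrs)
    c, d = e, 0                    # processed terminations and interrupts
    u = 1.0                        # weight of the remaining times
    t_sum = t_sqr = 0.0            # weighted sums for processed terminations
    estimates = []
    for B in bounds:
        # Steps 1-3: process every termination and interrupt below B.
        while True:
            t_next = terms[c - e][0] if c < n else float('inf')
            b_next = intrs[d][0] if d < m else float('inf')
            if min(t_next, b_next) >= B:
                break
            if t_next <= b_next:   # Step 2: a termination
                time, prob, result = terms[c - e]
                g = gn(prob, time - b0, result)
                t_sum += u * g
                t_sqr += u * g * g
                c += 1
            else:                  # Step 3: an interrupt
                if c == n and d + 1 == m:
                    return estimates    # cannot estimate the remaining bounds
                rest = n + m - c - d
                u *= rest / (rest - 1)  # redistribute the interrupt's weight
                d += 1
        # Step 4: unweighted interrupt gains for the bound B.
        i_sum = i_sqr = 0.0
        for time, prob, _ in terms[c - e:]:
            g = gn(prob, B - b0, 'intr')
            i_sum, i_sqr = i_sum + g, i_sqr + g * g
        for time, prob in intrs[d:]:
            g = gn(prob, B - b0, 'intr')
            i_sum, i_sqr = i_sum + g, i_sqr + g * g
        total, sqr = t_sum + u * i_sum, t_sqr + u * i_sqr
        size = n + m - e
        dev = sqrt(max(0.0, sqr - total * total / size)
                   / (size * (size - d - 1)))
        estimates.append((B, total / size, dev))
    return estimates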

Pre-Compute-Sums(gn; pt_{e+1}, ..., pt_n; pb_1, ..., pb_m)

Set the initial values:
    Sum_n := 0;  SqrSum_n := 0
    Sum_m := 0;  SqrSum_m := 0
Repeat for c from n − 1 down to e:
    Sum_c := Sum_{c+1} + gn(pt_{c+1}, 1, intr)
    SqrSum_c := SqrSum_{c+1} + gn(pt_{c+1}, 1, intr)²
Repeat for d from m − 1 down to 0:
    Sum_d := Sum_{d+1} + gn(pb_{d+1}, 1, intr)
    SqrSum_d := SqrSum_{d+1} + gn(pb_{d+1}, 1, intr)²

Fig. 7.4. Pre-computing the sums of interrupt gains.
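A Python counterpart of Figure 7.4 might precompute the four sums as arrays indexable by c and d in constant time; the code below is a hypothetical companion to the earlier sketch, with the same assumed data layout.

def precompute_sums(gn, terms, intrs, e):
    # Suffix sums of the unit-time interrupt gains and their squares.
    n, m = e + len(terms), len(intrs)
    sum_c = [0.0] * (n + 1); sqr_c = [0.0] * (n + 1)
    for c in range(n - 1, e - 1, -1):          # Sum_c, SqrSum_c
        g = gn(terms[c - e][1], 1.0, 'intr')   # gn(pt_{c+1}, 1, intr)
        sum_c[c] = sum_c[c + 1] + g
        sqr_c[c] = sqr_c[c + 1] + g * g
    sum_d = [0.0] * (m + 1); sqr_d = [0.0] * (m + 1)
    for d in range(m - 1, -1, -1):             # Sum_d, SqrSum_d
        g = gn(intrs[d][1], 1.0, 'intr')       # gn(pb_{d+1}, 1, intr)
        sum_d[d] = sum_d[d + 1] + g
        sqr_d[d] = sqr_d[d + 1] + g * g
    return sum_c, sqr_c, sum_d, sqr_d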

Generate-Bounds(gn; t_{e+1}, ..., t_n)

Set the initial values:
    Bounds := ∅ (list of candidate bounds)
    B := 0 (largest selected bound; 0 if none)
Repeat for i from e + 1 to n:
    If t_i ≥ 1.05 · B, then:
        B := 1.001 · t_i
        Bounds := Bounds ∪ {B}

Fig. 7.5. Generating candidate time bounds based on the past termination times.

7.4.1 Candidate bounds

We have described an algorithm that estimates gains for a finite list of candi-
date time bounds (Figure 7.3). We use past termination times as candidate
bounds and compute expected gains only for these bounds. If we used some
other bound B, we would get a smaller estimate than for the closest lower
termination time t_i, where t_i < B < t_{i+1}, because extending the bound from
ti to B would not increase the number of terminations on the past problems.
Suppose that we have tried to solve some problem in the past and hit a
time bound B_0, and that we need to estimate gains for a new attempt to
solve this problem. All candidate bounds must be larger than B_0; thus, if
t_e ≤ B_0 < t_{e+1}, we use only t_{e+1}, ..., t_n as candidate bounds.
We multiply the selected bounds by 1.001 to avoid rounding errors. If
several bounds are "too close" to each other, we drop some of them. In the
implementation, we consider bounds too close if they are within the factor of
1.05 from each other. We summarize the procedure for generating candidate
bounds in Figure 7.5.
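In code, the Figure 7.5 procedure amounts to a few lines; the sketch below is ours, with the 1.05 spacing factor and the 1.001 safety margin taken from the text.

def generate_bounds(term_times):
    # term_times: sorted past termination times t_{e+1}, ..., t_n.
    bounds, largest = [], 0.0
    for t in term_times:
        if t >= 1.05 * largest:     # drop bounds "too close" to the last one
            largest = 1.001 * t     # margin against rounding errors
            bounds.append(largest)
    return bounds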

7.4.2 Setting a time bound

We consider the use of a fixed representation to solve a sequence of problems,
and describe a technique for learning a time bound. If we have no previous
data, we set some default bound; we will discuss heuristics for selecting this
initial bound in Section 7.4.4. If we use the gain function described at the
end of Section 7.1.2, the initial bound equals the reward R. This heuristic is
based on the observation that, for PRODIGY search, the probability of solving
a problem usually declines with the passage of search time. For example, if a
solver has not terminated within half a minute, chances are it will not find a
solution in the next half minute. Thus, if the reward is 30.0, and the solver
has already run for 30.0 seconds, then it is time to interrupt the search.
Now suppose that we have accumulated some performance data, which
allow finding the time bound that maximizes the gain estimate. To encourage
exploration, we select the largest bound whose gain is "close" to the maxi-
mum. We denote the maximal estimate by g_max and its standard deviation

[Figure: three panels (Apply, Delay, Abstract); the horizontal axes show the problem's number, the vertical axes the running times and bounds in seconds.]

Fig. 7.6. Learning a time bound: running times (solid lines), selected bounds
(dotted lines), and maximal-gain bounds (dashed lines). The successes are marked
by circles (o), and the failures by pluses (+).

by σ_max. Suppose that the estimate for some bound is g and its deviation
is σ. Then, the expected difference between the gain g and the maximal gain
is g_max − g, and the standard deviation of the expected difference is
approximately sqrt(σ_max² + σ²).
We say that g is "close" to the maximal gain if the ratio of the expected
difference to its deviation is bounded by some constant. In most experiments,
we set this constant to 0.1:

    (g_max − g) / sqrt(σ_max² + σ²) ≤ 0.1.

We thus select the largest time bound whose gain estimate g satisfies this
condition. We have also experimented with other bounding constants, from
0.02 to 0.5, and we give the empirical results in Part IV.
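The selection rule translates directly into code; the following sketch is our own, with an assumed input format, and picks the largest bound whose estimate passes the ratio test.

from math import sqrt

def select_bound(estimates, ratio=0.1):
    # estimates: (bound, gain, deviation) triples for the candidate bounds.
    g_max, s_max = max((g, s) for _, g, s in estimates)
    best = None
    for bound, g, s in estimates:
        if g_max - g <= ratio * sqrt(s_max ** 2 + s ** 2):
            best = bound if best is None else max(best, bound)
    return best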
In Figure 7.6, we show the results of this selection strategy with the
bounding constant 0.1. In this experiment, we determine time bounds for the
transportation domain with a reward of 30.0. We apply each of the three
representations to solve the thirty transportation problems from Table 7.1.
The horizontal axes show the number of a problem, from 1 to 30, and the
vertical axes show the running time. The dotted lines show the selected time
bounds, and the dashed lines mark the bounds that give the maximal gain
estimates. The solid lines show the running times; they touch the dotted lines
where the system hits the time bound. The successfully solved problems are
marked by circles, and the failures are shown by pluses.
Apply's total gain is 360.3, which gives an average of 12.0 per problem. If
we used the maximal-gain bound, 11.6, for all problems, the gain would be
14.0 per problem. Thus, the learning has yielded a near-optimal gain despite
the initial ignorance. The time bounds (dotted line) converge to the estimated
maximal-gain bounds (dashed line) since the deviations of the gain estimates
decrease as the system solves more problems. After solving all problems,

Apply's estimate of the maximal-gain bound is 9.6. It differs from the 11.6
bound, based on the data in Table 7.1, because the use of bounds that ensure
near-maximal gains has prevented sufficient exploration.
Delay's total gain is 115.7, which is 3.9 per problem. If we used the data
in Table 7.1 to find the optimal bound, which is 6.2, and solved all problems
with this bound, we would earn 5.7 per problem. Finally, Abstract's total gain
is 339.7, which is 11.3 per problem. The estimate based on Table 7.1 gives the
bound 11.0, which would result in earning 12.3 per problem. Unlike Apply,
both Delay and Abstract have eventually found the optimal bound.
The main losses occur on the first ten problems, when the past data are
not sufficient for choosing an appropriate bound. After this initial period,
the choice of a bound becomes close to the optimal. The selected bound has
converged to the optimal in two out of three experiments. Further tests have
shown that insufficient exploration prevents finding the optimal bound in
about half of all cases.
The total time of the statistical computations while solving the thirty
problems is 0.26 seconds, which is less than 0.01 seconds per problem. It is
negligible in comparison with the search time, which averages 6.5 seconds
per problem for Apply, 7.7 seconds per problem for Delay, and 7.1 seconds
per problem for Abstract.

7.4.3 Selecting a representation

If we have no past data for some representation, we choose this representation,
thus encouraging exploration. If we have data for all representations, we first
select a time bound for each of them and determine the gain estimates for the
selected bounds. Then, for every representation, we estimate the probability
that it is the best among the available representations. Finally, we make a
weighted random selection; the chance of choosing each representation equals
its probability of being the best. This strategy leads to a frequent use of
representations that perform well, but it also encourages some exploratory
use of poor performers.
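In code, the final weighted random selection is a one-liner; this minimal sketch (ours) assumes that the per-representation probabilities of being the best have already been estimated, as described next.

import random

def choose_representation(reps, probs):
    # probs[i]: estimated probability that reps[i] is the best.
    return random.choices(reps, weights=probs, k=1)[0]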
We now describe a procedure for estimating the probability that a given
representation is the best. Suppose that we have k representations, and we
select one of them, with gain estimate g and deviation σ. Recall that we
compute g and σ from past data for the selected time bound. We denote
the expected gain for the selected bound by G (Equation 6.3); then, g is an
unbiased statistical estimate of G. Finally, we denote the gain estimates of
the other representations by g_1, ..., g_{k−1}, and their deviations by σ_1, ..., σ_{k−1}.
First, suppose that we know the exact value of G, that is, the mean gain
for the population of all possible problems. The selected representation is the
best if G is larger than the expected gains of the other representations. We
apply the statistical z-test to determine the probability that G is the largest
among the expected gains.

We begin by finding the probability that G is greater than the expected
gain of another representation i, with gain estimate g_i and deviation σ_i.
The expected difference between the two gains is G − g_i, and the standard
deviation of the difference is σ_i. The z value is the ratio of the expected
difference to its deviation; that is, z = (G − g_i) / σ_i. The z-test converts this value
into the probability that the expected gain for the selected representation
is larger than that for representation i. Note that this test uses the value
of G, which cannot be found from the past data. We denote the resulting
probability by p_i(G).
We next determine the probability p(G) that G is larger than the expected
gains of all other representations. If the estimates g_1, ..., g_{k−1} are independent,
then the probabilities p_1(G), ..., p_{k−1}(G) are also independent, and p(G) is
their product:

    p(G) = Π_{i=1}^{k−1} p_i(G).

Since we cannot compute G from the past data, we use its unbiased estimate g.
The distribution of the possible values of G is approximately normal
with mean g and deviation σ, which means that its probability density function
is as follows:

    f(G) = e^{−(G − g)² / (2σ²)} / (σ · sqrt(2π)).

To determine the probability P that the selected representation is the best,
we integrate over the possible values of G:

    P = ∫_{−∞}^{+∞} f(G) · p(G) dG.    (7.7)

Note that we have made two simplifying assumptions to derive this result.
First, we have assumed that the sample means g, g_1, ..., g_{k−1} are normally
distributed; however, if we compute them from small samples, their distributions
may not be normal. Second, we have viewed g, g_1, ..., g_{k−1} as independent
variables. If we use the statistical learning, then the choice of a representation
for each problem depends on the previous choices, and the data collected for
different representations are not independent. Although Equation 7.7 is an
approximation, it is satisfactory for the learning algorithm.
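A numerical sketch of Equation 7.7 follows; it is our own code rather than the system's Lisp, and it approximates the integral by a midpoint sum over six standard deviations on each side of g.

from math import erf, exp, pi, sqrt

def phi(z):
    # Standard normal cumulative distribution function (the z-test table).
    return 0.5 * (1.0 + erf(z / sqrt(2.0)))

def prob_best(g, s, others, steps=2000):
    # g, s: gain estimate and deviation of the selected representation;
    # others: (g_i, sigma_i) pairs for the remaining representations.
    lo, hi = g - 6.0 * s, g + 6.0 * s
    dx = (hi - lo) / steps
    total = 0.0
    for k in range(steps):
        G = lo + (k + 0.5) * dx
        f = exp(-(G - g) ** 2 / (2 * s * s)) / (s * sqrt(2 * pi))
        p = 1.0
        for gi, si in others:
            p *= phi((G - gi) / si)     # p_i(G)
        total += f * p * dx
    return total

With the estimates from the example that follows, prob_best(13.5, 3.3, [(5.3, 3.0), (11.2, 3.2)]) should yield a value close to the 0.68 reported for Apply.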
For example, suppose that we are choosing among Apply, Delay, and Ab-
stract based on the data in Table 7.1. We select bound 13.1 for Apply, which
gives a gain estimate 13.5 with deviation 3.3; bound 5.3 for Delay, with a gain
5.3 and deviation 3.0; and bound 13.2 for Abstract, with a gain 11.2 and de-
viation 3.2. Apply outperforms the other two methods with probability 0.68;
the probability that Delay is the best is 0.01; and Abstract's chance of being
the best is 0.31. We choose one of the representations at random; the chance
of choosing each representation equals its probability of being the best.

[Figure: running times on ninety problems; the horizontal axis shows the problem's number, the vertical axis the running time.]

Fig. 7.7. Selection of a representation and time bound on ninety transportation
problems. The graph shows the running times (solid line), successes (o) and failures (+),
and the choice among Apply (o), Delay (x), and Abstract (*).

In Figure 7.7, we show the results for the reward of 30.0. In this exper-
iment, we first use the thirty problems from Table 7.1 and then sixty other
problems. The horizontal axis shows the number of a problem, and the ver-
tical axis is the running time; we mark successes by circles, and failures by
pluses. The rows of symbols below the curve show the choice of a represen-
tation: a circle for Apply, a cross for Delay, and an asterisk for Abstract.
The total gain is 998.3, which gives an average of 11.1 per problem. The
overall time of the statistical computations is 0.78 seconds, or about 0.01
seconds per problem. The selection converges to the use of Apply with the
time bound 12.7, which is the optimal for this set of ninety problems. If we
used the final selection on all problems, we would earn 13.3 per problem.

7.4.4 Selection without past data

When the system faces a new domain, it has to choose the initial representa-
tion and time bound without analysis of past data. We discuss basic heuristics
for making these choices.
Initial time bounds. The learning procedure never considers time bounds
that are larger than the initial bound (see Section 7.4.2); therefore, the initial
choice should overestimate the optimal bound. We choose an initial bound B
that satisfies the following condition for all known problems and all possible
results, which almost always gives a larger-than-optimal bound:

    gn(prob, B, result) ≤ 0.    (7.8)

An informal justification for this heuristic is based on the observation
that, if we get a negative gain for solving a problem, then we should skip
the problem. On the other hand, a larger-than-B bound would encourage the
system to solve negative-gain problems.

We describe two techniques for computing an initial bound based on
Inequality 7.8. The first technique is for gain functions that satisfy Constraints 8
and 9 of Section 6.3.2, whereas the second technique is for gains defined
through solution quality.
Linear gain function. If gain satisfies Constraints 8 and 9, the user can
specify it by the payment for a unit time, pay(prob), and the reward for
solving a problem, R(prob, result):

    gn(prob, time, result) = R(prob, result) − time · pay(prob).

In this case, we compute the initial bound B through the minimum of pay
and the maximum of R, taken over all known problems:

    B = max R(prob, result) / min pay(prob).

Solution-quality function. If gain satisfies Constraint 7 of Section 6.3.2,
the user can specify a solution-quality function, quality(prob, result), and
express gain through quality, gnq(prob, time, quality). Then, we can compute a
time bound that satisfies Inequality 7.8 for a specific problem.
The computation requires the use of an upper bound on solution quality
for a given problem; we denote this bound by qual-bd(prob). For example, if
we define the quality through the total cost of operators in a solution (see
Section 6.3.2), its upper bound is the quality of a zero-cost solution.
The function gnq(prob, time, qual-bd(prob)) is an upper bound on prob's
gain for each specific running time. Therefore, if a time bound B satisfies the
following equation, then it also satisfies Inequality 7.8:

    gnq(prob, B, qual-bd(prob)) = 0.

The function gnq is decreasing in time; thus, we can solve this equation
using a binary search. We use this technique to determine B for each known
problem, and use the maximum of the resulting values as the initial bound.
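A minimal sketch of this binary search, assuming user-supplied gnq and qual-bd functions (the names below are ours):

def initial_bound(gnq, qual_bd, prob, hi=1.0, tol=1e-3):
    # Find B with gnq(prob, B, qual_bd(prob)) ~ 0; gnq decreases in time.
    q = qual_bd(prob)
    while gnq(prob, hi, q) > 0:     # grow the interval until the gain
        hi *= 2.0                   # drops below zero
    lo = 0.0
    while hi - lo > tol:
        mid = (lo + hi) / 2.0
        if gnq(prob, mid, q) > 0:
            lo = mid
        else:
            hi = mid
    return hi       # the smallest time bound with a non-positive gain

We would run this procedure for each known problem and take the maximum of the returned bounds.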
Exploring new representations. When a representation has no past data,
the system uses it with the initial time bound until it successfully solves two
problems; then, it switches to the statistical learning. If the representation
gives only interrupts and failures, the system gives up on it after a pre-set
number of trials. By default, it gives up after seven trials without any success
or after fifteen trials with only one success.
The system accumulates initial data for all available representations be-
fore switching to the statistical learning. If the data for some representation
are not sufficient for statistical selection, the system chooses this unexplored
representation. This optimistic use of the unknown encourages exploration
during early stages of learning.

Table 7.3. Performance in the extended transportation domain.

  #   time (sec) and outcome       # of      #   time (sec) and outcome       # of
      Apply    Delay    Abstract   packs         Apply    Delay    Abstract   packs
  1     4.7 s    4.7 s    4.7 s      1      16    35.1 s   21.1 s    6.6 f      2
  2    96.0 s    9.6 f    7.6 f      2      17    60.5 s   75.0 f   13.7 s      2
  3     5.2 s    5.1 s    5.2 s      1      18     3.5 s    3.4 s    3.5 s      1
  4    20.8 s   10.6 f   14.1 s      2      19     4.0 s    3.8 s    4.0 s      1
  5   154.3 s   31.4 s    7.5 f      2      20   232.1 s   97.0 s    9.5 f      2
  6     2.5 s    2.5 s    2.5 s      1      21    60.1 s   73.9 s   14.6 s      2
  7     4.0 s    2.9 s    3.0 s      1      22   500.0 b  500.0 b   12.7 f      2
  8    18.0 s   19.8 s    4.2 s      2      23    53.1 s   74.8 s   15.6 s      2
  9    19.5 s   26.8 s    4.8 s      2      24   500.0 b  500.0 b   38.0 s      4
 10   123.8 s  500.0 b   85.9 s      3      25   500.0 b  213.5 s   99.2 s      4
 11   238.9 s   96.8 s   76.6 s      3      26   327.6 s  179.0 s  121.4 s      6
 12   500.0 b  500.0 b    7.6 f      4      27    97.0 s   54.9 s   12.8 s      6
 13   345.9 s  500.0 b   58.4 s      4      28   500.0 b  500.0 b   16.4 f      8
 14   458.9 s   98.4 s  114.4 s      8      29   500.0 b  500.0 b  430.8 s     16
 15   500.0 b  500.0 b  115.6 s      8      30   500.0 b  398.7 s  214.8 s      8

7.5 Empirical examples


We give results of the statistical selection in two domains. We first use an
extended transportation domain (Section 7.5.1), and then determine how long
one should wait on the phone when there is no answer (Section 7.5.2).

7.5.1 Extended transportation domain

We consider a domain that includes airplanes for carrying packages between
cities and vans for local deliveries [Veloso, 1994]. In Table 7.3, we give the
results of applying Apply, Delay, and Abstract to thirty problems in this do-
main. We use the example gain function with the reward R = 400.0; that is,
the gain of solving a problem is (400.0 - time), and the negative gain of a
failure or interrupt is (-time).
We present the results of learning a time bound in Figure 7.8. The Apply
learning gives a gain of 110.1 per problem and eventually selects a bound
of 127.5. The optimal bound for this set of problems is 97.0; if we used it
for all problems, we would earn 135.4 per problem. Delay earns 131.1 per
problem and chooses a bound of 105.3. The optimal bound for Delay is 98.4,
which would give a per-problem gain of 153.5. Finally, Abstract earns 243.5
per problem and chooses a bound of 127.6. The optimal bound for Abstract
is 430.8, which would give a per-problem gain of 255.8.
Although the bound learned for Abstract is much smaller than the optimal
(127.6 versus 430.8), the resulting gain is close to the optimal. In this exper-
iment, the dependency of gain on bound has a long plateau, and the choice
of a bound within the plateau does not make much difference. Note that Ab-
stract's optimal bound is larger than the initial bound (430.8 versus 400.0),
which shows the imperfection of the heuristic for choosing the initial bound.

[Figure: three panels (Apply, Delay, Abstract); the horizontal axes show the problem's number, the vertical axes the running times and bounds in seconds.]

Fig. 7.8. Learning a bound in the extended transportation domain: running times
(solid lines), selected bounds (dotted lines), and maximal-gain bounds (dashed
lines). The successes are marked by circles (o), and the failures by pluses (+).

[Figure: running times on ninety problems; the horizontal axis shows the problem's number, the vertical axis the running time.]

Fig. 7.9. Selection of a representation in the extended transportation domain. The
graph shows the running times (solid line), successes (o) and failures (+), and the
choice among Apply (o), Delay (x), and Abstract (*).

In Figure 7.9, we show the results of selecting a representation; we first
use the thirty problems from Table 7.3 and then sixty other problems. The
learning process converges to the choice of Abstract with a bound 300.6, and
gives a gain of 207.0 per problem. The optimal choice is Abstract with a
bound 517.1, which would give a per-problem gain of 255.8.

7.5.2 Phone-call domain

We apply the learning technique to the bound selection when calling a friend
on the phone. The algorithm determines how many seconds (or rings) one
should wait for an answer before hanging up. In Table 7.4, we give the time
measurements for sixty phone calls, rounded to 0.05 seconds. We made these
calls to sixty different people at their home numbers. We measured the time
from the beginning of the first ring, skipping the connection delays. A success
occurred when our party answered the phone; a reply by an answering
machine was considered a failure.

Table 7.4. Waiting times (seconds) in sixty phone calls.


# time # time # time # time
1 5.80 f 16 4.10 s 31 6.45 s 46 4.25 s
2 8.25 s 17 8.25 s 32 6.80 s 47 7.30 s
3 200.00 b 18 5.40 s 33 8.10 s 48 10.95 s
4 5.15 s 19 4.50 s 34 13.40 s 49 10.05 s
5 8.30 s 20 32.85 f 35 5.40 s 50 6.50 s
6 200.00 b 21 200.00 b 36 2.20 s 51 15.10 f
7 9.15 s 22 200.00 b 37 26.70 f 52 25.45 s
8 6.10 f 23 10.50 s 38 6.20 s 53 20.00 f
9 14.15 f 24 14.45 f 39 24.45 f 54 24.20 f
10 200.00 b 25 11.30 f 40 29.30 f 55 20.15 f
11 9.75 s 26 10.20 f 41 12.60 s 56 10.90 s
12 3.90 s 27 4.15 s 42 26.15 f 57 23.25 f
13 11.45 f 28 14.70 s 43 7.20 s 58 4.40 s
14 3.70 s 29 2.50 s 44 16.20 f 59 3.20 f
15 7.25 s 30 8.70 s 45 8.90 s 60 200.00 b

We first consider the gain function that gives (R - time) for a success
and (-time) for a failure or interrupt. We thus assume that the caller is not
interested in leaving a message, which means that a reply by a machine gets
a reward of zero. The reward R may be determined by the amount of time
that the caller is willing to wait in order to talk now, as opposed to hanging
up and calling again later. In Figure 7.10(a), we show the dependency of the
expected gain on the time bound for the rewards of 30.0, 90.0, and 300.0.
The optimal bound for the 30.0 and 90.0 rewards is 14.7 (three rings), and
the optimal bound for the 300.0 reward is 25.5 (five rings).
If the caller plans to leave a message, then the failure reward is not zero,
but it may be smaller than the success reward. We denote the failure reward
by R_f and define the gain as follows:

    gn = R − time,      if success;
         R_f − time,    if failure;
         −time,         if interrupt.

In Figure 7.10(b), we show the expected gain for the success reward of 90.0
with three different failure rewards: 10.0, 30.0, and 90.0. The optimal bound
for the 10.0 failure reward is 26.7 (five rings); for the other two rewards, it
is 32.9 (six rings).
In Figure 7.11, we show the results of selecting a bound for the 90.0 success
reward and zero failure reward. The learned bound converges to the optimal
bound, 14.7. The gain obtained during the learning process is 38.9 per call.
If we used the optimal bound for all calls, we would earn 41.0 per call.
To summarize, the experiments in the two PRODIGY domains and the
phone-call domain show that the learning algorithm usually finds a near-

[Figure: expected gain versus time bound (log-scale time axis); panel (a) gains without failure rewards, panel (b) gains with failure rewards.]

Fig. 7.10. Dependency of the expected gain on the time bound in the phone-call
domain: (a) for the rewards of 30.0 (dash-and-dot line), 90.0 (dashed line), and
300.0 (solid line); (b) for the success reward of 90.0 and failure rewards of 10.0
(dash-and-dot line), 30.0 (dashed line), and 90.0 (solid line).

[Figure: running times and the selected bounds over the sequence of calls; the horizontal axis shows the call's number.]

Fig. 7.11. Learning a time bound in the phone-call domain.

optimal time bound after solving twenty or thirty problems, and that the
resulting gain is close to the optimal.

7.6 Artificial tests

We give the results of testing the selection mechanism on artificially generated
data. The "running times" in these tests are values produced by a
random-number generator, which allows controlled experiments with known
distributions. The system is effective for all tested distributions, and we have
not found a significant difference in performance for different distributions.
We have used the linear gain function with reward 100.0; that is, the
success gain is (100.0 - time), and the negative gain of a failure or interrupt
is (-time). We have considered four distributions of success and failure times:

Normal: The normal distribution corresponds to the situation in which the
running times for most problems are close to some "typical" value.
Log-Normal: The distribution is called log-normal if time logarithms are
distributed normally; intuitively, it occurs when the "complexity" of most
problems is close to some typical complexity, and the time grows exponentially
with the complexity.
Uniform: The times belong to some fixed interval, and all values in this
interval are equally likely.
Log-Uniform: The logarithms of running times are distributed uniformly;
intuitively, the complexity of problems is within some fixed interval, and
the time grows exponentially with the complexity.
For each distribution, we have run multiple tests, varying the values of
the following parameters:
Success and failure probabilities: We have varied the probabilities of success,
failure, and infinite looping.
Mean and deviation: We have experimented with different values of the mean
and standard deviation of success-time and failure-time distributions.
Length of a problem sequence: We have tested the learning mechanism on
sequences of 50, 150, and 500 problems.
We have run fifty independent experiments for each setting of the param-
eters and averaged their results; thus, every graph (Figures 7.12-7.17) shows
the average of fifty experiments.
Short and long problem sequences. We present the results of learning
a time bound on sequences of fifty and five hundred problems. The success
probability in these experiments is 1/2, the failure probability is 1/4, and the
probability of infinite looping is also 1/4. The mean of success times is 20.0,
and their standard deviation is 8.0; the failure-time mean is 10.0, and the
standard deviation is 4.0. We have experimented with all four distributions;
for each distribution, we have run fifty experiments and averaged their results.
In Figure 7.12, we summarize the results for fifty-problem sequences. The
horizontal axes in all graphs show the number of a problem in a sequence.
In the top row of graphs, we give the average per-problem gain obtained up
to the current problem. The circles mark the gain that the system would
obtain if it knew the distribution in advance and used the optimal bound.
The vertical bars show the width of the distribution of gain values obtained
in different experiments. Each bar covers two standard deviations up and
down, which means that 95% of the experiments fall within it.
The middle row of graphs shows the selected time bounds, and the bot-
tom row gives the system's estimates of the optimal bounds; recall that the
selected bounds are larger than the optimal, to encourage exploration. The
crosses mark the true values of the optimal bounds; note that the system's
estimates of the optimal bounds converge to their true values.

In Figure 7.13, we present similar results for 500-problem sequences. In
these experiments, per-problem gains come closer to the optimal. The differ-
these experiments, per-problem gains come closer to the optimal. The differ-
ence between the obtained and optimal gains comes from losses during early
stages of learning.
Varying success and failure probabilities. We give the results of learn-
ing a time bound for different probabilities of success and failure. The means
and standard deviations of the success and failure times are the same as in
the previous experiments.
We summarize the results in Figure 7.14. The top row of graphs is for a
solver that succeeds, fails, and goes into an infinite loop equally often; that
is, the probability of each outcome is 1/3. In the middle row, we give the
results for a solver that succeeds half of the time, fails half of the time, and
never goes into an infinite loop. Finally, the bottom row is for a solver that
succeeds half of the time and loops forever otherwise.
The solid lines show the average gain up to the current problem, the dotted
lines are the selected time bounds, and the dashed lines are the estimates of
the optimal bounds. The crosses mark the true optimal bounds, and the
circles are the expected gains for the optimal bounds.
Varying the mean of time distributions. We vary the mean of failure
times. We keep the mean success time equal to 20.0 (with standard deviation 8.0),
and experiment with failure means of 10.0 (with deviation 4.0), 20.0
(with deviation 8.0), and 40.0 (with deviation 16.0); we give the results in
Figure 7.15. The gains for normal and log-normal distributions come closer
to the optimal than the gains for uniform and log-uniform distributions, but
the difference is not statistically significant.
Selection of a representation. We show the results of the selection among
three representations on 150-problem sequences. In the first series of experi-
ments, we have adjusted mean success and failure times in such a way that
the optimal per-problem gain for the first representation is 10% larger than
that for the second, and 20% larger than that for the third.
We show the average per-problem gains in the top row of Figure 7.16, and
give the probability of choosing each representation in the bottom row. The
distance from the bottom of the graph to the lower curve is the probability of
selecting the first representation, the distance between the two curves is the
chance of the second representation, and the distance from the upper curve to
the top is the third representation's chance. The graphs show that the prob-
ability of selecting the first (best-performing) representation increases during
the learning process. The probability of selecting the third (worst-performing)
representation decreases faster than that of the second representation.
In the second series of experiments (Figure 7.17), the optimal gain of the
first representation is 30% larger than that of the second, and 60% larger
than that of the third. The probability of selecting the first representation
grows much faster due to the larger difference in the expected gains.

[Figure: four columns of plots, one per distribution (normal, log-normal, uniform, log-uniform).]

Fig. 7.12. Per-problem gains (top row), time bounds (middle row), and estimates
of the optimal bounds (bottom row) for learning on fifty-problem sequences. The
crosses mark the optimal time bounds, and the circles show the expected gains for
the optimal bounds.

[Figure: four columns of plots, one per distribution (normal, log-normal, uniform, log-uniform).]

Fig. 7.13. Per-problem gains (top row), time bounds (middle row), and estimates
of the optimal bounds (bottom row) for learning on 500-problem sequences.

[Figure: three rows of plots, one per probability setting, with columns for the normal, log-normal, uniform, and log-uniform distributions.]

Fig. 7.14. Per-problem gains (solid lines), time bounds (dotted lines), and estimates
of the optimal bounds (dashed lines) for different success and failure probabilities.
The crosses mark the optimal time bounds, and the circles show the expected
gains for the optimal bounds. We give the values of success probability ("succ")
and failure probability ("fail") to the left of each row.

[Figure: three rows of plots, one per failure-time mean, with columns for the normal, log-normal, uniform, and log-uniform distributions.]

Fig. 7.15. Per-problem gains (solid lines), time bounds (dotted lines), and estimates
of the optimal bounds (dashed lines) for different mean values of failure
times. The mean of success times is 20.0 in all experiments.

[Figure: per-problem gains and selection probabilities, with columns for the normal, log-normal, uniform, and log-uniform distributions.]

Fig. 7.16. Selection among three representations. The average gain for the first
representation is 10% larger than that for the second, and 20% larger than that
for the third. We show the average per-problem gains (top row of graphs) and the
probability of selecting each representation (bottom row).

[Figure: per-problem gains and selection probabilities, with columns for the normal, log-normal, uniform, and log-uniform distributions.]

Fig. 7.17. Selection among three representations. The average gain for the first
representation is 30% larger than that for the second, and 60% larger than that for
the third.
8. Statistical extensions

We have considered the task of selecting a representation that works well for
most problems in a domain. If we have additional information about a given
problem, we can utilize it to make a more accurate selection. We describe
the use of problem-specific gain functions (Section 8.1), estimated problem
complexity (Section 8.2), and similarity among problems (Section 8.3).

8.1 Problem-specific gain functions


If the gain function satisfies certain conditions, we can construct a problem-
specific function and use it to estimate gains for a given problem. We first
consider the linear function defined in Section 7.1.2. If the system solves a
problem, the gain is (R - time); if it fails or hits a time bound, the gain is
(-time). We suppose that the user specifies different rewards for different
problems; that is, she provides a function R(prob) that maps problems into
respective rewards. Suppose further that the system has solved ns problems,
failed on niproblems, and hit a time bound on m problems. The success times
are tSI, ... , ts ns with the respective rewards R I , ... , R ns , the failure times are
tA, ... , tin!' and the interrupt times are bl , ... , bm .
We can use Equation 7.3 to estimate the gain for a specific bound B.
We denote the number of success times that are no larger than B byes, the
number of failures no larger than B by ei, and the number of interrupts no
larger than B by d. We distribute the weights of bl , ... , bd among the larger-
time outcomes (see Section 7.3), and denote the weights of ts l , ... , ts ns by
USI, ... , US ns , the weights of tA, ... , tin! by uA, ... , uinf , and the weight of all
other times by u*. We substitute these values into Equation 7.3 and get the
following gain estimate:
2:~:luSi . (Ri-tSi)+ 2:;=1 uk( -tJj)+u. ·(ns+nJ+m-cs-cJ-d)·( -B)
ns + nJ+ m (8.1)

If we fix some reward R* and use it for all problems, Equation 7.3 gives
a different gain estimate:
2:~:1 USi . (R. -tsi)+ 2:;=1 u:0· (-tJj )+U. ·(ns+nJ+m-cs- cJ-d)·( -B)
ns + nJ+ m (8.2)
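As an illustration, Equation 8.2 reduces to a few lines of Python; the argument names in this sketch are ours, and the weights are assumed to come from the weight-distribution step of Section 7.3.

def problem_specific_gain(R_star, succ, fail, u_star, n_s, n_f, m, d, B):
    # succ: (weight us_i, time ts_i) pairs for successes no larger than B;
    # fail: (weight uf_j, time tf_j) pairs for failures no larger than B;
    # u_star: weight of all other times; d: interrupts no larger than B.
    c_s, c_f = len(succ), len(fail)
    weighted = (sum(u * (R_star - t) for u, t in succ)
                + sum(-u * t for u, t in fail)
                - u_star * (n_s + n_f + m - c_s - c_f - d) * B)
    return weighted / (n_s + n_f + m)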


Now suppose that the rewards change from problem to problem, and we
need to estimate the gain for a specific problem with reward R*. If we use
Equation 8.1, we get an estimate for a randomly selected problem. On the
other hand, Equation 8.2 incorporates the problem-specific knowledge of the
reward; thus, it gives a more accurate estimate.
We can use this problem-specific estimate only if the reward R(prob) does
not correlate with the problem complexity. To formalize this condition, we
define time(prob) as the running time on prob without a time bound. The
outcome of search without a bound may be a success, failure, or infinite run.
Conditions 8.1 We use Equation 8.2 if the following two conditions hold:
1. The reward R(prob) does not correlate with time(prob).
2. The mean reward of success-outcome problems is equal to the mean reward
of failure-outcome problems and to that of infinite-run problems.

If Condition 8.1 does not hold, Equation 8.2 may give misleading results.
For example, suppose that the system solves all problems, and the reward
is proportional to the running time, R(prob) = 0.9· time(prob). Then, the
gain of solving any problem is negative, -0.1· time(prob). If the system uses
Equation 8.1, it gets negative estimates for all time bounds, and it correctly
concludes that the best course is to avoid problem solving. If it applies Equa-
tion 8.2, it gets high gain estimates for problems with large rewards. These
misleading estimates encourage the system to solve problems with larger-
than-average rewards, which leads to losses.
We next consider an arbitrary gain function that satisfies Constraint 7 of
Section 6.3.2. We can define a measure of solution quality, quality(prob, result),
and express the gain in terms of quality, gnq(prob, time, quality). We obtain
a problem-specific function by substituting prob* into gnq, which gives the
following gain estimate:

    Sum / (n + m),    (8.3)

where

    Sum = Σ_{i=1}^{c} u_i · gnq(prob*, t_i, quality(pt_i, result_i))
        + u* · Σ_{i=c+1}^{n} gn(prob*, B, intr)
        + u* · Σ_{j=d+1}^{m} gn(prob*, B, intr).

We use the same gain values in the problem-specific version of Equation 7.4,
which gives the standard deviation of the estimate:

    sqrt( (SqrSum − Sum² / (n + m)) / ((n + m) · (n + m − d − 1)) ),    (8.4)

where Sum is the same as in Equation 8.3 and

    SqrSum = Σ_{i=1}^{c} u_i · gnq(prob*, t_i, quality(pt_i, result_i))²
        + u* · Σ_{i=c+1}^{n} gn(prob*, B, intr)²
        + u* · Σ_{j=d+1}^{m} gn(prob*, B, intr)².
To state the applicability condition for this estimate, we define time(prob)
as the running time on prob without a time bound, and quality(prob) as the
quality of the corresponding result.
Conditions 8.2 We use problem-specific estimates if, for every fixed pair of
time* and quality*, the following three conditions hold:
1. The function gnq(prob, time*, quality*) does not correlate with time(prob).
2. For success-outcome problems, gnq(prob, time*, quality*) does not corre-
late with quality(prob).
3. The mean value of gnq(prob, time*, quality*) for success-outcome prob-
lems is equal to its mean value for failure-outcome problems, to that for
rejection-outcome problems, and to that for infinite-run problems.

8.2 Problem sizes


If the system can estimate problem sizes, it adjusts the choice of a represen-
tation and time bound to the estimated size. We define a problem size as an
easily computable positive value that correlates with the problem complex-
ity; the larger the value, the longer it takes to solve the problem. The choice
of a size measure is the user's responsibility; finding an accurate measure is
usually difficult, but many domains allow rough estimates. For example, we
can estimate the complexity of package delivery by the number of packages.
In the rightmost column of Tables 7.1 and 7.3, we show the number of pack-
ages in each problem. We apply regression to find the dependency between
the size and the search time (Sections 8.2.1 and 8.2.2), and show results of
this strategy (Section 8.2.3).

8.2.1 Dependency of time on size

We apply linear regression to find the dependency between the sizes of sample
problems and the times of solving them. We use three separate regressions: the
first is for successes, the second is for failures, and the third is for rejections.
We assume that the dependency of time on size is either polynomial or
exponential. If it is polynomial, the time logarithm depends linearly on the
size logarithm; for an exponential dependency, the time logarithm depends
linearly on the size. We thus use linear regression to find both polynomial and
exponential dependencies. In Figure 8.1(a,b), we give the least-square regres-
sion for a polynomial dependency; the regression for an exponential depen-
dency is similar. We denote the number of sample problems by n, their sizes
by size_1, ..., size_n, and the corresponding running times by time_1, ..., time_n.

(a) Approximate dependency of the running time on the problem size:

    ln time = α + β · ln size;
    that is, time = e^α · size^β.

(b) Regression coefficients:

    β = (Σ_{i=1}^{n} ln size_i · ln time_i − SizeSum · TimeSum / n) / (SizeSqrSum − SizeSum² / n),
    α = TimeSum / n − β · SizeSum / n,

where

    TimeSum = Σ_{i=1}^{n} ln time_i,
    SizeSum = Σ_{i=1}^{n} ln size_i,
    SizeSqrSum = Σ_{i=1}^{n} (ln size_i)².

(c) The t value for evaluating the regression accuracy:

    t-value = β · sqrt( (n − 2) · (SizeSqrSum − SizeSum² / n) / SumSqrErr ),

where

    SumSqrErr = Σ_{i=1}^{n} (ln time_i)² − TimeSum² / n
        − β · (Σ_{i=1}^{n} ln size_i · ln time_i − SizeSum · TimeSum / n).

Fig. 8.1. Regression for the polynomial dependency of time on size.

We evaluate the regression results using the t-test; the t value is the
ratio of the estimated slope of the regression line to the standard deviation
of the slope estimate (Figure 8.1c). The t-test converts the t value into the
probability that the regression gives no better prediction of running time than
ignoring the sizes and simply taking the mean. This probability is called the
P value; it is a function of the t value and the number n of sample problems.
When the regression gives a good fit, t is large, and P is small.
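The Figure 8.1 computation is easy to reproduce; the sketch below is our code, not the book's incremental Lisp procedure, and it fits the polynomial dependency and returns the t value. The exponential variant is identical except that the raw sizes replace their logarithms.

from math import log, sqrt

def regress_poly(sizes, times):
    # Fit ln time = alpha + beta * ln size; return (alpha, beta, t value).
    n = len(sizes)
    ls = [log(s) for s in sizes]
    lt = [log(t) for t in times]
    size_sum, time_sum = sum(ls), sum(lt)
    size_sqr = sum(x * x for x in ls)
    cross = sum(x * y for x, y in zip(ls, lt))
    beta = ((cross - size_sum * time_sum / n)
            / (size_sqr - size_sum ** 2 / n))
    alpha = time_sum / n - beta * size_sum / n
    sum_sqr_err = (sum(y * y for y in lt) - time_sum ** 2 / n
                   - beta * (cross - size_sum * time_sum / n))
    t_value = beta * sqrt((n - 2) * (size_sqr - size_sum ** 2 / n)
                          / sum_sqr_err)
    return alpha, beta, t_value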
In Figure 8.2, we give the results of regressing the success times for the
problems in Table 7.1. The top three graphs show the polynomial depen-
dency, whereas the bottom graphs are for the exponential dependency. The
horizontal axes show the problem sizes, and the vertical axes show the times.
The circles mark the sizes and times of problems, and the solid lines show
the regression results. For each regression, we give the t value and the corre-
sponding interval of the P value.
We use the regression only if P is smaller than a certain threshold; in
the experiments, we use it when P < 0.2. We have chosen 0.2 rather than
the more "customary" 0.05 because the early detection of the dependency is
more important than establishing its high certainty. For example, all three
polynomial regressions in the top row of Figure 8.2 pass the P < 0.2 test;
the exponential regressions for Apply and Abstract also satisfy this condition.
On the other hand, the exponential regression for Delay fails the test. The
choice between the polynomial and exponential regression is based on the t
value; specifically, we prefer the regression with larger t. In Figure 8.2, the
polynomial regression wins for all three representations.
Note that the least-square regression is based on strong assumptions about
the distribution. First, for fixed-size problems, the distribution of the time

[Figure: six log-scale plots of the running time versus the problem size for Apply, Delay, and Abstract; circles mark the sample problems, and solid lines show the regression. Polynomial dependency (top row): Apply t = 4.2, P < 0.01; Delay t = 1.6, 0.1 < P < 0.2; Abstract t = 3.5, P < 0.01. Exponential dependency (bottom row): Apply t = 3.8, P < 0.01; Delay t = 0.5, P > 0.2; Abstract t = 3.3, P < 0.01.]

Fig. 8.2. Dependency of the success time on the problem size. The top graphs
show the regression for a polynomial dependency, and the bottom graphs are for
an exponential dependency.

logarithms must be normal; second, for all problem sizes, the standard devia-
tion of the distribution must be the same. The regression, however, provides
a good approximation even when these assumptions are not satisfied.
The time complexity of the regression is linear in the number of data
points. For n terminations and m interrupts, the implemented Lisp procedure
on a Sun Sparc 5 performs both polynomial and exponential regression, along
with the related t-tests, in (n + m) · 7·10⁻⁴ seconds. During the statistical
learning, the system does not perform regression from scratch for each new
problem. Instead, it stores the sums used in the regression computation and
modifies them after adding a new problem. The system updates the sums,
recomputes the regression coefficients, and finds the new t values in constant
time, which is about 8·10⁻⁴ seconds.

8.2.2 Scaling of running times

The use of sizes in estimating the gain is based on "scaling" the times of
sample problems to a given size. We illustrate it in Figure 8.3; we scale
Delay's times of a 1-package success, an 8-package success, and an 8-package
failure for estimating the gain on a 3-package problem. To scale a problem's

[Figure: log-scale plot of the running time versus the problem size.]

Fig. 8.3. Scaling two successes (o) and a failure (+) of Delay to a 3-package problem.

time to a given size, we draw the line with the regression slope through the
point representing the problem (solid lines in Figure 8.3), to the intersection
with the vertical line through the given size (dotted line). The ordinate of the
intersection is the scaled time. Suppose that the size of a problem is size_0,
the running time is time_0, and we need to scale it to size using a regression
slope β. Then, we compute the scaled time as follows:

Polynomial regression:
    ln time = ln time_0 + β · (ln size − ln size_0);
    that is, time = time_0 · (size / size_0)^β.

Exponential regression:
    ln time = ln time_0 + β · (size − size_0);
    that is, time = time_0 · e^{β · (size − size_0)}.
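Both scaling rules fit in one small function; this sketch is ours, and beta is the slope of the matching regression (success, failure, or rejection).

from math import exp

def scale_time(time0, size0, size, beta, polynomial=True):
    # Scale a sample running time to a given problem size.
    if polynomial:                  # time = time0 * (size / size0)^beta
        return time0 * (size / size0) ** beta
    return time0 * exp(beta * (size - size0))   # exponential dependency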
We use the slope of the success regression in scaling success times, the
slope of the failure regression in scaling failures, and the slope of the rejection
regression in scaling rejections. The slope for scaling an interrupt time should
depend on whether the system would succeed, fail, or reject the problem if we
did not interrupt it; however, we do not know which of these three outcomes
would occur. We "distribute" each interrupt point among the success, failure,
and rejection slopes according to the probabilities that the continuation of
the execution would result in the respective outcomes. We thus break an
interrupt into three weighted points, with the total weight of 1.
To determine these three weights, we scale the success, failure, and rejection
times of the sample problems to the size of the problem that has caused
the interrupt. The weights are proportional to the numbers of successes, failures,
and rejections that are larger than the interrupt time. If the number of
larger success times is n_s, the number of larger failures is n_f, and the number
of larger rejections is n_r, then the corresponding weights are
n_s / (n_s + n_f + n_r), n_f / (n_s + n_f + n_r), and n_r / (n_s + n_f + n_r).
The implemented computation of n_s, n_f, and n_r is somewhat different for
efficiency reasons. The system does not scale success, failure, and rejection
times to every interrupt point; instead, it scales them to some fixed size, and
then scales interrupt points to the same size (Figure 8.4). The system scales

[Figure: plot of the running time versus the problem size.]

Fig. 8.4. Computation of n_s for an interrupt point. The system scales all success
times (o) and the interrupt time (•) to a fixed size using the success slope.

every interrupt three times: it uses the success slope for determining n_s, the
failure slope for n_f, and the rejection slope for n_r.
After scaling the sample times to a given size, we apply the technique of
Section 7.3 to compute the gain estimate and its standard deviation (Equations
7.5 and 7.6). The only difference is that we reduce the second term
in the denominator for the deviation by 3, because the success, failure, and
rejection regressions reduce the degrees of freedom of the sample data; thus,
we compute the deviation as follows:

    sqrt( (SqrSum − Sum² / (n + m − e)) / ((n + m − e) · (n + m − e − d − 4)) ).    (8.5)

The running time of the scaling procedure is proportional to the number
of data points. For a sample of n terminations and m interrupts, the
implemented procedure takes (2·n + 6·m) · 10⁻⁴ seconds. We now give the
overall running time of the statistical computation, which includes polynomial
and exponential regression, related t-tests, scaling the sample times, and
determining the expected gains for l time bounds. If the system performs the
regression from scratch, the total time is (3·l + 9·n + 13·m) · 10⁻⁴ seconds.
If it incrementally updates the regression coefficients and t values, the time
is (3·l + 2·n + 8·m + 8) · 10⁻⁴ seconds.
In Figure 8.5, we show the dependency of the expected gain on the time bound
when using Apply on 1-package, 3-package, and 10-package problems. If we
use sizes in the experiments of Sections 7.4 and 7.5, we get larger gains in all
eight experiments (Table 8.1). The regression and scaling take 0.03 seconds
per problem, which is much smaller than the resulting gain increase.

8.2.3 Artificial tests

We have tested the regression on artificially generated values of running times,
using the same setup as in the artificial tests of Section 7.6. We have used
using the same setup as in the artificial tests of Section 7.6. We have used
the linear gain function with the reward R = 100.0 and considered four
distributions of running times: normal, log-normal, uniform, and log-uniform.
The experiments have shown that the regression improves the performance
when there is a correlation between time and size, and it does not worsen the
results when there is no correlation.

[Figure: three panels (1-package, 3-package, and 10-package problems); expected gain versus time bound, log-scale time axis.]

Fig. 8.5. Dependency of Apply's gain on the time bound in the transportation
domain with rewards of 10.0 (dash-and-dot lines), 30.0 (dashed lines), and 100.0
(solid lines). The dotted lines show the standard deviation for the 100.0 reward.

Table 8.1. Per-problem gains with and without the use of sizes.

                                           w/o sizes    with sizes
 transportation by vans (Section 7.4)
   Apply's bound selection                    12.0         12.2
   Delay's bound selection                     3.9          4.7
   Abstract's bound selection                 11.3         11.9
   selection of a representation              11.1         11.8
 transportation by vans and airplanes (Section 7.5)
   Apply's bound selection                   110.1        121.6
   Delay's bound selection                   131.1        137.4
   Abstract's bound selection                243.5        248.3
   selection of a representation             207.0        215.6

Selecting a time bound. We give the results of learning a bound on 50-problem
sequences for different correlations between time and size, and com-
pare the gains with and without the regression. Problem sizes in these exper-
iments are natural numbers from 1 to 10; the logarithms of mean success and
failure times are proportional to the size logarithms. The correlation between
time logarithms and size logarithms is 0.9 in the first series of experiments,
0.6 in the second series, and 0 in the third series.
We give the results in Figure 8.6; the solid lines show the average per-
problem gains with the regression, and the dashed lines are the gains without
the regression. The use of sizes improves the performance, and the improve-
ment is greater for larger correlations. If there is no correlation, the system
performs identically with and without sizes.
Choosing a representation. We show the results of the selection among
three representations with the same distributions as in the tests of Section 7.6.
In Figures 8.7 and 8.8, we present experiments with 150-problem sequences.
In Figures 8.7, the optimal gain for the first representation is 10% larger
than that for the second and 20% larger than that for the third. The top row of

[Figure: four plots, one per distribution (normal, log-normal, uniform, log-uniform).]

Fig. 8.6. Gains with the regression (solid lines) and without the regression (dashed
lines) for different correlations between size logarithms and time logarithms.

graphs shows the average per-problem gains with the use of sizes (solid lines)
and without sizes (dashed lines). The other two rows give the probability of
choosing each representation in the experiments with and without sizes. In
Figure 8.8, the optimal gain for the first representation is 30% larger than
that for the second and 60% larger than that for the third.

8.3 Similarity among problems

We have estimated the expected gain by averaging the gains for all sample
problems. If some problems are especially similar to a new problem, we can
improve the estimate by averaging only the gains for these similar problems.
We explain the use of similarity (Sections 8.3.1 and 8.3.2), and give related
experiments in the transportation and phone-call domains (Section 8.3.3).

8.3.1 Similarity hierarchy

We encode similarity by a tree-structured hierarchy. The leaves of the hierarchy
are groups of similar problems; the other nodes represent a weaker similarity
among groups. The construction of this hierarchy is the user's responsibility.
For instance, we may divide transportation problems into within-city and
between-city deliveries. We extend this example with a new problem type,
which involves transportation of containers within a city. A van can carry
only one container, which makes container delivery harder than package de-
livery. In Table 8.2, we give the performance of Apply, Delay, and Abstract on
ten container problems.

[Figure 8.7 contains graphs for the normal, log-normal, uniform, and
log-uniform distributions.]

Fig. 8.7. Selection among three representations. The average gain for the first
representation is 10% larger than that for the second and 20% larger than that
for the third. We show the average per-problem gains in the experiments with and
without the regression (top row of graphs), and the probability of selecting each
representation (two lower rows).

[Figure 8.8 contains graphs for the normal, log-normal, uniform, and
log-uniform distributions.]

Fig. 8.8. Selection among three representations with and without the regression.
The average gain for the first representation is 30% larger than that for the second
and 60% larger than that for the third.

Table 8.2. Performance on ten container-delivery problems.

 #   time (sec) and outcome      # of     #   time (sec) and outcome      # of
     Apply    Delay    Abstract  conts        Apply    Delay    Abstract  conts
 1   2.3 s    2.3 s    2.1 s      1       6   200.0 b  200.0 b  10.1 f     8
 2   3.1 s    5.1 s    4.1 s      2       7   3.2 s    3.2 s    3.2 s      2
 3   5.0 s    20.2 s   4.8 s      3       8   24.0 s   200.0 b  26.3 s     8
 4   3.3 s    8.9 s    3.2 s      2       9   4.8 s    86.2 s   3.4 s      4
 5   6.7 s    36.8 s   6.4 s      4      10   8.0 s    200.0 b  9.4 s      6

(a) Similarity hierarchy: the domain divides into within-city and between-city
    deliveries, and within-city deliveries divide into packages and containers.

(b) Abstract's deviations without regression:
        domain:          succ dev 1.39, fail dev 0.38
        within city:     succ dev 0.86, fail dev 0.44
        between cities:  succ dev 1.60, fail dev 0.33
        packages:        succ dev 0.92, fail dev 0.27
        containers:      succ dev 0.75, fail dev --

(c) Abstract's deviations with regression:
        domain:          succ dev 1.08, fail dev 0.37
        within city:     succ dev 0.64, fail dev 0.23
        between cities:  succ dev 0.69, fail dev 0.29
        packages:        succ dev 0.73, fail dev 0.08
        containers:      succ dev 0.38, fail dev --

Fig. 8.9. Similarity hierarchy and the deviations of Abstract's search times.

We subdivide within-city problems into package and
container deliveries, and show this hierarchy in Figure 8.9(a).
We can estimate the similarity of problems in a group by the standard
deviation of time logarithms:

    TimeDev = \sqrt{ \frac{1}{n-1} \left( \sum_{i=1}^{n} (\ln time_i)^2
              - \frac{ \left( \sum_{i=1}^{n} \ln time_i \right)^2 }{n} \right) }.    (8.6)
We compute the deviations separately for successes and failures; the smaller
these deviations for the leaf groups, the better the hierarchy. If we use the
regression, we apply it separately to each group in the hierarchy. If the re-
gression confirms the dependency between sizes and times, we compute the
deviation of time logarithms by a different formula:

    TimeDev = \sqrt{ \frac{SumSqrErr}{n-2} },    (8.7)
where SumSqrErr is as defined in Figure 8.1(c). For example, the deviations
for Abstract are as shown in Figure 8.9. We give the deviations without the
regression in Figure 8.9(b), and those with the regression in Figure 8.9(c).
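For illustration, a direct Python transcription of the two deviation formulas
follows; the function names are ours.

import math

def time_dev(times):
    # Standard deviation of the time logarithms (Equation 8.6).
    n = len(times)
    logs = [math.log(t) for t in times]
    total = sum(logs)
    sum_sq = sum(v * v for v in logs)
    return math.sqrt((sum_sq - total * total / n) / (n - 1))

def time_dev_regression(sum_sqr_err, n):
    # Deviation around the regression line (Equation 8.7);
    # sum_sqr_err is the regression's sum of squared errors.
    return math.sqrt(sum_sqr_err / (n - 2))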

8.3.2 Choice of a group

We can estimate the expected gain for a new problem by averaging the gains
of problems in the same leaf group. Alternatively, we can use a larger sample
from one of its ancestors. The selection between a group and its parent is
based on two tests. The first test shows the difference between the distribution
of the group's problems and the distribution of the other problems in the
parent's sample. If the two distributions are different, we use the group rather
than its parent. If not, we perform the second test to determine whether the
group's sample provides a more accurate estimate than the parent's sample.
If we do not use the regression, the first test is the statistical t-test that
shows whether the mean of the group's time logarithms differs from the mean
of the other time logarithms in the parent's sample. We perform this test sep-
arately for successes and failures; the means are considered different when we
can reject the null-hypothesis that they are equal with the 0.75 confidence. If
we use the regression, we apply a different t-test; specifically, we determine
whether the regression lines are different with the 0.75 confidence. A statisti-
cally significant difference for either successes or failures is a signal that the
group's problems differ from the other problems in the group's parent.
For example, suppose that we use the data in Tables 7.1, 7.3, and 8.2 with
the hierarchy in Figure 8.9(a), and we need to estimate Abstract's gain on
within-city package delivery. We consider the choice between the correspond-
ing leaf group and its parent. The mean of the success-time logarithms for
the package-delivery problems is 4.07, and its standard deviation is 0.20. The
mean for the other problems in the parent group is 4.03, and its deviation
is 0.16. The difference between the two means is not statistically significant.
Since the container-transportation sample has only one failure, we cannot
estimate the deviation of its failure logarithms; thus, the difference between
the failure-logarithm means is also insignificant.
The second test is the comparison of the standard deviations of the mean
estimates for the group and its parent. The deviation of the mean estimate
is equal to the deviation of the time logarithms divided by the square root of
the sample size, TimeDev/√n. We compute it separately for successes and failures,
and use it as an indicator of the sample's accuracy. If the group's deviation
of the mean estimate is smaller than that of the group's parent for either
successes or failures, then we prefer the group to its parent. If the parent's
deviation is smaller for both successes and failures, we use the parent.
Suppose that we apply this test for estimating Abstract's gain on within-
city package delivery. The deviation of the success-time estimate for the leaf is
0.20, and the deviation for its parent is 0.16. The deviation of the failure-time
logarithms is also smaller for the parent; thus, we prefer the parent.
After selecting between the leaf group and its parent, we apply the same
two tests to choose between the resulting "winner" and the group's grand-
parent; then, we compare the new winner with the great-grandparent and so
on. The time of the statistical computation is proportional to the height h

Table 8.3. Per-problem gains for different group-selection techniques.

                                  using leaf   using the   heuristic group
                                  groups       top group   selection
without problem sizes
  Apply's bound selection           11.8         10.5         12.1
  Delay's bound selection            7.0          4.7          7.5
  Abstract's bound selection        19.5         18.1         19.5
  selection of a representation     13.1         11.1         13.4
with problem sizes
  Apply's bound selection           16.3         11.1         16.8
  Delay's bound selection           12.1          5.2         12.0
  Abstract's bound selection        22.6         18.4         22.6
  selection of a representation     19.4         13.7         21.0

of a hierarchy. For n terminations and m interrupts, the amortized time of
performing the regressions, selecting a group, scaling, and determining the
expected gains for l time bounds is (3·l + 2·n + 8·m + 22·h) · 10^{-4} seconds.
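The following Python sketch shows one possible rendering of the two tests and
of the climb through the hierarchy. It covers a single outcome type (say,
successes), replaces the 0.75-confidence t-test by a fixed threshold on the
t statistic, and uses illustrative names throughout; it is a sketch of the
scheme, not the system's code.

import math

def mean_and_dev(logs):
    # Mean and standard deviation of a list of time logarithms.
    n = len(logs)
    m = sum(logs) / n
    if n < 2:
        return m, None
    return m, math.sqrt(sum((v - m) ** 2 for v in logs) / (n - 1))

def distributions_differ(group, others, threshold=0.7):
    # First test: do the group's logarithms differ from the other
    # logarithms in the parent's sample?  The fixed threshold stands
    # in for the 0.75-confidence t-test.
    if len(group) < 2 or len(others) < 2:
        return False
    m1, d1 = mean_and_dev(group)
    m2, d2 = mean_and_dev(others)
    se = math.sqrt(d1 ** 2 / len(group) + d2 ** 2 / len(others))
    return se > 0 and abs(m1 - m2) / se > threshold

def more_accurate(group, parent):
    # Second test: is the group's deviation of the mean estimate,
    # TimeDev / sqrt(n), smaller than the parent's?
    _, dg = mean_and_dev(group)
    _, dp = mean_and_dev(parent)
    if dg is None or dp is None:
        return dg is not None
    return dg / math.sqrt(len(group)) < dp / math.sqrt(len(parent))

def select_group(leaf):
    # Climb from the leaf toward the root; each node has .logs
    # (time logarithms of its sample) and .parent (None for the root).
    winner = leaf
    node = leaf.parent
    while node is not None:
        others = [v for v in node.logs if v not in winner.logs]  # crude difference
        if not distributions_differ(winner.logs, others) and \
           not more_accurate(winner.logs, node.logs):
            winner = node          # the larger sample gives a better estimate
        node = node.parent
    return winner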

8.3.3 Empirical examples

We describe experiments in the transportation and phone-call domains, and
show that the use of similarity increases gains in both domains.
Transportation domain. In Table 8.3, we give the results of using the
similarity hierarchy of Figure 8.9. We have run the bound-selection tests on
a sequence of seventy problems constructed by interleaving the problem sets
of Tables 7.1, 7.3, and 8.2. Then, we have experimented with choosing among
Apply, Delay, and Abstract on a sequence of 210 problems.
The first column includes the results of using only leaf groups in esti-
mating gains. The second column shows the use of the top-level group for all
estimates, which means that the system does not distinguish among the three
problem types. The third column contains the results of using the hierarchy
with the group-selection algorithm, which gives larger gains than both the
leaf groups and the top-level group; however, this improvement is not large.
Phone-call domain. We have considered the outcomes of sixty-three calls
to six different people. We have phoned two of them, A and B, at their
office phones; we have called the other four, C, D, E, and F, at their homes.
We show the similarity hierarchy and call outcomes in Figure 8.10. For each
group, we give the mean of success and failure time logarithms ("mean"),
the deviation of the time logarithms ("deviation"), and the deviation of the
mean estimate ("mean's dev").
We have run learning experiments with the success reward of 90.0 and
zero failure reward. If we use leaf groups for all estimates, the gain is 57.8 per
call. If we use the top-level group for all estimates, the gain is 55.9 per call.
Finally, the use of the hierarchy with the group-selection algorithm yields

all phone calls:
  successes: mean 1.55, deviation 0.72, mean's dev 0.10
  failures:  mean 2.72, deviation 0.32, mean's dev 0.11

calls to an office:
  successes: mean 0.92, deviation 0.89, mean's dev 0.19
  failures:  NONE
calls to a home:
  successes: mean 1.84, deviation 0.55, mean's dev 0.09
  failures:  mean 2.72, deviation 0.32, mean's dev 0.11

calls to A: successes: mean 0.92, deviation 0.18,  mean's dev 0.06;  failures: NONE
calls to B: successes: mean 0.92, deviation 1.13,  mean's dev 0.43;  failures: NONE
calls to C: successes: mean 1.77, deviation 0.44,  mean's dev 0.14;  failures: NONE
calls to D: successes: mean 1.81, deviation 0.60,  mean's dev 0.18;  failures: mean 2.98, deviation 0.02, mean's dev 0.01
calls to E: successes: mean 1.89, deviation 0.50,  mean's dev 0.17;  failures: mean 2.98, deviation 0.11, mean's dev 0.08
calls to F: successes: mean 2.02, deviation 0.007, mean's dev 0.004; failures: mean 2.40, deviation 0.20, mean's dev 0.05

[The figure also lists the individual outcomes of the calls to each person;
these outcome columns are not reproduced here.]

Fig. 8.10. Similarity hierarchy and call outcomes in the phone-call domain.

59.8 per call. If we knew the time distributions in advance, determined the
optimal bound for each leaf, and used these optimal bounds for all calls,
then the gain would be 61.9 per call. These results confirm that a similarity
hierarchy improves the performance, but not by much.
9. Summary and extensions

We have described statistical procedures for evaluating representations. We
now outline heuristic rules for choosing representations (Section 9.1), and
review the results of the work on SHAPER'S top-level control (Section 9.2).

9.1 Preference rules

The system allows the use of heuristic rules for selecting representations. In
Section 6.2, we have discussed rejection and comparison rules, which prune
ineffective representations. We now describe preference rules, which gener-
ate judgments about the performance of the remaining representations. We
explain the application of preference rules (Section 9.1.1), conflict resolution
(Section 9.1.2), and combination of these rules with the statistical selection
(Section 9.1.3). We also outline the use of rules for delaying representation
changes (Section 9.1.4).

9.1.1 Preferences

A preference is an expectation that a certain representation rep1 is better
than another representation rep2. We encode its reliability by two values,
called priority and certainty. The priority is a natural number that helps to
resolve conflicts among preferences; the certainty is an approximate proba-
bility that the expectation is correct. We denote a preference by a quadruple
(rep1, rep2, prior, cert); rep1 and rep2 are two representations, prior is a pri-
ority, and cert is a certainty.
A preference rule is a heuristic for generating preferences, which inputs
two representations and determines whether one of them is better than the
other. We encode a rule by an applicability condition and two numeric func-
tions. The condition determines whether we can use the rule to compare two
given representations. When the rule is applicable, the first function returns
the priority of the preference, and the second gives its certainty. A user must
hand-code a rule's condition and priority function, and she can optionally
provide a certainty function. If she does not specify a certainty, the system
determines it automatically; we use two techniques for learning certainties.


The first technique computes a certainty from past performance data.
The system identifies the pairs of old representations that match a rule's
condition, determines the percentage of pairs in which the first representation
gives larger gains, and uses this percentage as the rule's certainty.
The second technique compares representations by their performance on
test problems. When two representations match a rule's condition, the system
applies them to solve several small problems, and determines the percentage
of problems on which the first representation gives larger gains. The user has
to supply test problems or a procedure for generating them.
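A minimal Python sketch of the first technique, assuming that the average
per-problem gain of each old representation has already been recorded; the
rule object and the gain table are illustrative stand-ins.

def learn_certainty(rule, representation_pairs, past_gain):
    # Certainty = fraction of matching pairs of old representations
    # in which the first representation earned the larger gain.
    matching = [(r1, r2) for (r1, r2) in representation_pairs
                if rule.condition(r1, r2)]
    if not matching:
        return None                # no data: certainty is unknown
    wins = sum(1 for (r1, r2) in matching
               if past_gain[r1] > past_gain[r2])
    return wins / len(matching)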

9.1.2 Preference graphs

A preference graph is a data structure for analysis of preferences; its nodes are
representations or groups of representations, and its edges are preferences. For
every domain, the system maintains a full graph and a reduced graph. The first
graph includes all preferences, whereas the second is the result of resolving
conflicts. We describe these graphs and conflict-resolution algorithms.
Full and reduced graphs. The system stores the results of applying pref-
erence rules in the full graph. The available representations are nodes of this
graph, and all generated preferences are its edges; it may contain multiple
edges connecting the same pair of nodes. When SHAPER generates a new
representation, it inserts the corresponding node into the graph, applies the
available rules to compare the new representation with old representations,
and adds appropriate edges.
Preferences in the full graph may conflict with each other. The system
prunes duplicate edges, resolves conflicts, and identifies equally preferable
representations, thus constructing the reduced graph. The nodes of this graph
are groups of equally preferable representations.
If several edges connect the same two nodes in the same direction, the
system keeps the highest-priority edge and prunes the others; if several edges
have the highest priority, the system keeps the one with the highest cer-
tainty (Figure 9.1a). If, for some edge (rep1, rep2, prior, cert), there is an op-
posite path from rep2 to rep1, and the priorities of all edges in this path are
strictly greater than prior, then the system prunes the edge from rep1 to rep2
(Figure 9.1b). After removing such edges, the system identifies the strongly
connected components, which correspond to the groups of equally preferable
representations (Figure 9.1c).
Constructing the reduced graph. The construction of the reduced graph
involves two intermediate graphs, called simple and transitive graphs (Fig-
ure 9.2). The first graph is the result of pruning duplicate edges, and the
second is the transitive closure of preferences.
In Figure 9.3, we give an algorithm that removes duplicate edges and
outputs the simple graph (Figure 9.2b). It uses two matrices, max-prior and
max-cert, indexed on representations. The first matrix is for computing the

~~"'--"""II".
~- 0.8
0.9
(a) Duplicate edges. (b) Conflict. (c) Preference groups.
Fig. 9.1. Operations for constructing the reduced graph: pruning duplicate
edges (a), resolving conflicts (b), and identifying preference groups (c). We show
higher-priority preferences by thicker lines and specify certainties by numbers.

maximal priority of duplicate edges, and the second is for the maximal cer-
tainty of the highest-priority edges. If the full graph includes k representations
and m_f edges, the running time of the algorithm is O(k^2 + m_f).
To identify conflicts, the system constructs the transitive closure of the
simple graph (Figure 9.2c). For every two nodes rep1 and rep2, if the simple
graph has a path from rep1 to rep2, the transitive graph has an edge from
rep1 to rep2. The priority of a path is the minimal priority of its edges, and
the priority of the transitive edge from rep1 to rep2 is the maximal priority
of paths from rep1 to rep2. This construction is a special case of the all-pairs
algebraic-path problem [Carre, 1971; Lehmann, 1977], solved by generalized
shortest-path algorithms. If the simple graph has k representations and m_s
preferences, the time of constructing the transitive graph is O(k^2 · lg k + k · m_s).
The system uses the transitive graph to detect conflicts in the simple
graph. We give an algorithm for resolving conflicts in Figure 9.3 and illus-
trate the results of its execution in Figure 9.2(d); its complexity is O(m_s).
After removing conflicts, the system identifies strongly connected components
(Figure 9.2e), which takes O(k + m_s) time. The overall time of constructing
the simple, transitive, and reduced graphs is O(m_f + k^2 · lg k + k · m_s).
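For concreteness, a Python sketch of two of these steps follows; the edge
encoding is illustrative, and the transitive priorities are assumed to be
precomputed by a generalized shortest-path algorithm.

def build_simple_graph(full_edges):
    # Keep, for each ordered pair of representations, the edge with the
    # highest priority, breaking ties by certainty (Figure 9.3).
    best = {}                                  # (rep1, rep2) -> (prior, cert)
    for (rep1, rep2, prior, cert) in full_edges:
        key = (rep1, rep2)
        if key not in best or (prior, cert) > best[key]:
            best[key] = (prior, cert)
    return [(r1, r2, p, c) for (r1, r2), (p, c) in best.items()]

def remove_conflicts(simple_edges, transitive_priority):
    # Drop an edge if the transitive graph has an opposite edge of
    # strictly higher priority; transitive_priority maps an ordered
    # pair to the maximal, over all paths, of the minimal edge
    # priority along the path.
    return [(r1, r2, p, c) for (r1, r2, p, c) in simple_edges
            if transitive_priority.get((r2, r1), float('-inf')) <= p]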
Modifying the reduced graph. If the system adds a new representation to
the full graph, it updates the other three graphs. SHAPER adds the new node
to the simple graph and determines its incoming and outgoing edges using
a procedure similar to Build-Simple-Graph in Figure 9.3. This procedure
processes only the new edges in the full graph; if the number of these edges
is m_new, the processing time is O(m_new).
After modifying the simple graph, SHAPER updates the transitive graph.
First, it computes all incoming and outgoing transitive edges of the new node.
This computation is a special case of the single-source algebraic-path problem,
which takes O(k + m_s) time. Second, it updates transitive edges between old
representations using the procedure in Figure 9.4, which takes O(k^2) time.
Then, SHAPER resolves path conflicts using the Remove-Conflicts algo-
rithm in Figure 9.3, which takes O(m_s) time. Finally, it constructs the re-
duced graph from scratch in O(k + m_s) time. The overall time of adding a
representation to the simple, transitive, and reduced graphs is O(m_new + k^2).

(a) Full graph.  (b) Simple graph.  (c) Transitive graph.
(d) Simple graph w/o conflicts.  (e) Reduced graph.

Fig. 9.2. Constructing the reduced graph. Thicker lines denote higher priorities.

Build-Simple-Graph
  Set the initial values:
    For every two representations rep1 and rep2:
      max-prior[rep1, rep2] := 0
      max-cert[rep1, rep2] := 0
  Compute the maximal priorities:
    For every edge (rep1, rep2, prior, cert) in the full graph:
      If prior > max-prior[rep1, rep2],
      then max-prior[rep1, rep2] := prior.
  Compute the maximal certainties:
    For every edge (rep1, rep2, prior, cert) in the full graph:
      If prior = max-prior[rep1, rep2] and cert > max-cert[rep1, rep2],
      then max-cert[rep1, rep2] := cert.
  Build the simple graph:
    Create a graph with the same representations and no edges.
    For every two representations rep1 and rep2:
      Add edge (rep1, rep2, max-prior[rep1, rep2], max-cert[rep1, rep2]).

Remove-Conflicts
  For every edge (rep1, rep2, prior, cert) in the simple graph:
    If the transitive graph has the opposite edge with a higher priority,
    then remove the edge (rep1, rep2, prior, cert) from the simple graph.

Fig. 9.3. Construction of the simple graph and removal of conflicts.



Update-Transitive-Graph(new-rep)
  For every two old representations rep1 and rep2:
    If there are transitive edges from rep1 to new-rep and from new-rep to rep2:
      Let prior be the smaller of the priorities of these two edges.
      If there is no edge from rep1 to rep2,
      then add this edge and set its priority to prior.
      If there is an edge from rep1 to rep2 and its priority is below prior,
      then set its priority to prior.

Fig. 9.4. Updating edges between old nodes of the transitive graph.

[Two example reduced graphs, (a) and (b).]
Fig. 9.5. Fringe nodes in the reduced graph. We show explored nodes by shaded
circles, unexplored nodes by empty circles, and fringe nodes by thicker lines.

9.1.3 Use of preferences

If we use the statistical learning without preference rules, the system begins
with eager exploration (see Section 7.4.4), which leads to large initial losses.
The use of preference rules allows delaying the exploration.
We consider a representation unexplored as long as the system uses it with
the initial time bound before accumulating data for the statistical learning. A
node in the reduced graph is unexplored if it contains at least one unexplored
representation.
The system identifies the unexplored nodes that do not have incoming
edges from other unexplored nodes. We call them fringe nodes; intuitively,
they are the most preferable among unexplored nodes (Figure 9.5). If some
fringe nodes do not have incoming edges (Figure 9.5a), the system selects one
of the unexplored representations in these nodes for solving a new problem.
If all fringe nodes have incoming edges (Figure 9.5b), the system computes
the maximal certainty of incoming edges for every fringe node. This value is an
estimated probability that the node's representations are less effective than
the best explored representation; in Figure 9.5(b), these estimates are 0.8
(left) and 0.9 (right). The system chooses the fringe node with the lowest
estimate, which is the most promising among the unexplored nodes.
Finally, the system decides between using an explored and unexplored rep-
resentation for solving a new problem. It makes a weighted random choice;
the probability of using an explored representation is equal to the estimate
of the chosen fringe node. If the random choice favors an explored represen-

tation, the system invokes the statistical learning; otherwise, it uses one of
the unexplored representations from the chosen fringe node.
When SHAPER faces a new domain, it eagerly experiments with preferable
representations until it accumulates initial data for all nodes that do not
have incoming edges. Then, it interleaves the exploration with the statistical
selection; the exploration process slows down as the system moves to less
preferable nodes.
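A Python sketch of this decision follows; the node and edge attributes
(fully_explored, incoming, source, certainty) are illustrative names for the
quantities described above.

import random

def choose_step(nodes):
    # Decide between exploring an unexplored fringe node and invoking
    # the statistical selection over explored representations.
    unexplored = [n for n in nodes if not n.fully_explored]
    if not unexplored:
        return ('statistical', None)
    # Fringe: unexplored nodes without incoming edges from unexplored nodes.
    fringe = [n for n in unexplored
              if all(e.source not in unexplored for e in n.incoming)]
    free = [n for n in fringe if not n.incoming]
    if free:
        return ('explore', free[0])
    # Every fringe node has incoming edges: pick the most promising node,
    # that is, the one with the lowest maximal certainty of incoming edges.
    node = min(fringe, key=lambda n: max(e.certainty for e in n.incoming))
    estimate = max(e.certainty for e in node.incoming)
    if random.random() < estimate:
        return ('statistical', None)   # weighted choice favors exploitation
    return ('explore', node)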

9.1.4 Delaying representation changes

By default, the system generates all possible representations before solving
any problems. The user can override this default strategy by suspension and
cancellation rules.
A suspension rule is a boolean Lisp function with two arguments: a
changer operator and description node. If it returns true, the system delays
the application of the operator to the description. When SHAPER generates
a new description, it identifies the applicable changer operators and deter-
mines whether some of them should be delayed. If SHAPER has delayed the
application of some operators, it periodically re-invokes the suspension rules
to determine whether the delay is over.
A cancellation rule prunes the suspended description changes that have
become obsolete. It is a boolean Lisp function that inputs a changer operator
and description node; if it returns true, SHAPER never applies the operator
to the description. If the system has suspended some description changes, it
periodically invokes the cancellation rules and prunes the matching changes.
After applying a changer, SHAPER combines the resulting description with
matching solvers, thus producing new representations (see Section 6.2.3). The
user may provide rules for delaying the use of new representations and for
cancellation of delayed representations.
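The rule protocol can be pictured with a short Python sketch of the periodic
re-invocation; each rule is a boolean function of a changer operator and a
description node, mirroring the Lisp signature described above, and all names
are illustrative.

def update_delayed_changes(delayed, suspension_rules, cancellation_rules):
    # Prune the delayed changes matched by a cancellation rule, release
    # the changes no longer matched by any suspension rule, and keep
    # the rest delayed.
    ready, still_delayed = [], []
    for (changer, description) in delayed:
        if any(rule(changer, description) for rule in cancellation_rules):
            continue                   # obsolete: never apply this change
        if any(rule(changer, description) for rule in suspension_rules):
            still_delayed.append((changer, description))
        else:
            ready.append((changer, description))
    return ready, still_delayed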

9.2 Summary of work on the top-level control

We review the control architecture and its main limitations, outline tools for
the optional user participation in the top-level control, and propose some
directions for future research.
Architecture. We have developed a system for selecting solvers and im-
proving domain descriptions; it includes solvers, changers, a top-level control
module, and a library of descriptions (Figure 9.6). The changers and control
module compose the mechanism for changing representations. The control
is based on the concept of description and representation spaces. The main
objects in these spaces are solver operators, changer operators, descriptions,
and representations. In Figure 9.7, we show the main decision points of the
control module.

[Figure 9.6 shows the architecture: the representation-changing mechanism,
composed of the description changers and the top-level control, connected to
the problem solvers and the library of domain descriptions.]

Fig. 9.6. Architecture of the SHAPER system.

[Figure 9.7 shows the decision cycle. After inputting a problem, the control
module decides whether to solve the problem or construct a new description.
To construct a description, it selects a changer operator and an old
description, applies the changer, and generates the corresponding
representations. To solve the problem, it decides whether to solve or skip
the problem, selects a representation and a time bound, and uses them;
after a success, failure, or interrupt, it waits for the next problem.]

Fig. 9.7. Decision cycle of the control module.

The system uses intelligent changers, which construct potentially good de-
scriptions and skip a vast majority of ineffective descriptions. For this reason,
the number of candidate representations for a particular domain is relatively
small, usually within one hundred. A small representation space is an impor-
tant advantage over Korf's [1980] model, which requires the exploration of a
huge space of all possible representations.
Evaluation model. We have proposed a utility model that unifies the three
main dimensions of performance: the number of solved problems, running
time, and solution quality. It allows the comparison of competing search algo-
rithms. The performance measure depends not only on a specific algorithm,
but also on the domain, problem distribution, gain function, and bound-
selection strategy. This dependency formalizes the observation that no search
technique is universally better than the others [Minton et al., 1991; Knoblock
and Yang, 1994; Veloso and Blythe, 1994]. The relative performance of search
algorithms depends on a specific domain and on the user's value judgments.

Statistical selection. We have formalized the task of selecting among rep-
resentations, derived an approximate solution, and demonstrated its effective-
ness in selecting representations and time bounds. The selection algorithm
combines exploitation of past experience with exploration of new alterna-
tives. It can use an approximate measure of problem sizes and information
about the similarity between problems. The algorithm has proved effective
for all tested distributions of search times. It gives good results even when
distributions do not satisfy the assumptions of the statistical analysis.
The statistical model raises many open problems, which include relax-
ing the simplifying assumptions, extending the model to account for more
features of real-world situations, and improving the heuristics used with the
statistical estimates. To make the model more flexible, we need a mecha-
nism for switching the representation and revising the time bound during the
search for a solution. We should also allow interleaving of several promising
representations, which is often more effective than sticking to one represen-
tation [Howe et al., 1999]. In addition, we need to study the use of any-time
solvers, as well as solvers that improve their performance with experience.
We have constructed an alternative selection mechanism based on prefer-
ence rules, and combined it with the statistical learning. The system uses the
statistical evaluation as the main selection mechanism, and invokes preference
rules when statistical data are insufficient.
Limitations. Although the top-level control does not rely on the specifics
of PRODIGY, we have used some general properties of PRODIGY in making
assumptions about solvers and changers. These assumptions limit the gener-
ality of the control mechanism.

• Solvers run much longer than the statistical computation; hence, we neglect
the time of the statistical analysis.
• Changers always terminate in a reasonable time; hence, the system does
not interrupt their execution.
• An input language for describing domains and problems is the same for all
solvers and changers; hence, we do not use "translation" algorithms.
Manual control. The mechanism for generating new representations is fully
automated; however, the user can optionally participate in the top-level con-
trol. She can undertake any control decisions and leave the other decisions to
the system. The user interface allows specification of problem-solving tasks
and related knowledge, inspection and modification of intermediate results,
manual selection of representations and changer operators, and evaluation of
the system's effectiveness; it includes three groups of commands.
The first group is for supplying initial knowledge, providing additional
information in later stages, and correcting intermediate results. The system
supports an extended version of the PRODIGY domain language for specifying
problems, utility functions, representations, and heuristics (Figure 9.8).

The second group of commands allows the user to inspect the initial in-
formation, new representations, accumulated performance data, and results
of the statistical learning. When the control module is active, it outputs a
real-time trace of the main results. After SHAPER has reached a termination
condition, the user can get more details about the search results.
The third group is for invoking the control module and imposing restric-
tions on its decisions. The user can either manually invoke solvers and chang-
ers, or pass the control to the top-level module. When the user starts the
control module, she can select the order of processing problems, restrict the
allowed choices of representations, specify time bounds, and provide guidance
for selecting changers. In addition, she can define a condition for terminating
the problem-solving process.
Future research. We have assumed that the execution of solvers and chang-
ers is sequential, and that the system gets one problem at a time. The use of
parallelism is an important research direction, which may require significant
extensions to the selection mechanism. If SHAPER gets several problems at
once, it has to decide on the order of solving them. If it does not have time
for all problems, it has to choose the problems that give large gains. These
decisions also require an enhancement of the statistical selection.
Although all solvers and changers in SHAPER are domain-independent,
the top-level control allows the use of domain-specific algorithms. We expect
that a large library of specialized solvers and changers can give better results
than a small collection of general algorithms.
We have proposed a formal model of problem solving with multiple rep-
resentations. The future challenges include further investigation of the role
of representation changes and their formal properties, such as soundness and
completeness. A major open problem is to develop a unified theory that will
subsume our current formalism, Korf's model of representation transforma-
tions [Korf, 1980], and Simon's view of cognitive representations [Larkin and
Simon, 1987; Kaplan and Simon, 1990; Tabachneck et al., 1997], as well as
theories of abstraction [Knoblock, 1993; Giunchiglia and Walsh, 1992] and
macro operators [Korf, 1985; Cohen, 1992; Tadepalli and Natarajan, 1996].

SHAPER supports a collection of tools for describing domains, problems, and initial
knowledge. These tools extend the PRODIGY domain language and enable the user
to specify the following information.
Domains and problems: A domain includes a type hierarchy, operators, and in-
ference rules (Sections 2.2.1 and 2.3.2). A problem instance is specified by an object
list, initial state, and goal statement (Section 2.2.1).
Control rules: The user can construct heuristic rules for guiding the PRODIGY
search (Section 2.4.3).
Primary effects and abstraction: A domain may include multiple selections of
primary effects (Section 3.1.2) and abstraction hierarchies (Section 4.1.2).
Domain descriptions: A description consists of a domain encoding along with a
specific selection of primary effects, abstraction, control rules, and restrictions on
the allowed problems (Section 6.1.1).
Solver and changer operators: A solver operator consists of a solver and two
applicability conditions (Section 6.2.1). It may also include a sequence of problem-
specific changers, which improve a problem description before the invocation of the
solver. A changer operator includes a sequence of problem-independent changers
and two applicability conditions (Section 6.2.1).
Gain functions: The user specifies gains by a numeric Lisp function with three
arguments: a problem, running time, and search outcome (Section 6.3.1). If the
gain linearly decreases with the time, the user can encode it by the reward function
and unit-time cost (Section 6.3.2).
Size functions and similarity hierarchies: A size function is a Lisp procedure
that inputs a problem and returns a real-valued size (Section 8.2.1). A similarity
hierarchy consists of similarity groups, arranged into a tree, and a Lisp function
that identifies the leaf group for each given problem (Section 8.3.1).
Rejection, comparison, and preference rules: The user can provide heuris-
tic rules for pruning ineffective descriptions and representations (Sections 6.2.2
and 6.2.3), and for selecting among the remaining representations (Section 9.1.1).
Suspension and cancellation rules: The user can also provide rules for delaying
representation changes and pruning some of the delayed changes (Section 9.1.4).

Fig. 9.8. Main elements of the extended PRODIGY language.


Part IV

Empirical results

I used to be solely responsible for the quality of my work, but now my com-
puter shares the responsibility with me.
- Herbert A. Simon, personal communications.

The experiments with the main components of SHAPER have confirmed the
system's effectiveness. We have tested primary effects (Sections 3.6 and 3.7),
abstraction (Sections 4.4 and 5.1), problem-specific descriptions (Section 5.2),
and statistical selection (Sections 7.5, 7.6, and 8.2). We now present an em-
pirical evaluation of the integrated system on the Machining problems (Chap-
ter 10), Sokoban puzzle (Chapter 11), STRIPS world (Chapter 12), and Lo-
gistics tasks (Chapter 13). Note that we have not used these domains during
the development of SHAPER, nor "fine-tuned" the system for any of them.
We have composed solver libraries from three PRODIGY search engines;
specifically, we have used SAVTA and SABA, as well as the LINEAR engine,
similar to PRODIGY2 (see Section 2.2.5). We have constructed solvers by com-
bining these three engines with cost-bound heuristics. For each domain, we
have plotted SHAPER'S gains on series of randomly ordered problems.
The results have confirmed that the effectiveness of the control mechanism
does not depend on specific properties of domains, solvers, or gain functions.
In Chapter 10, we explain the design of experiments and give the results for
the Machining Domain; in Chapters 11-13, we describe similar tests in the
other three domains.
10. Machining Domain

We apply SHAPER to the model of a machine shop described in Section 3.6.2.
All problems in this domain are solvable, and they never cause a failure of
PRODIGY search.

10.1 Selecting a description

Suppose that the system includes only the SAVTA search without cost bounds.
We can apply changers to construct new descriptions of the Machining Do-
main, and then SHAPER selects the description that maximizes SAVTA's gains.
In Figure 10.1, we show solver and changer operators constructed from
SAVTA and the available changers. The control module applies changer op-
erators to the initial encoding of the Machining Domain and constructs two
new descriptions (Figure 10.2). The first is based on primary effects (Fig-
ure 3.31), and the second is a combination of primary effects with abstrac-
tion (Figure 5.5). The system pairs these descriptions with solver operators,
thus producing four representations. We describe the results of the statistical
selection among them for several utility functions.
Linear gain functions. We consider simple gain functions that depend on
the number of goals, search time, and solution cost. First, we experiment
with a function that depends only on the time:

    gain = \begin{cases}
        0.5 - time, & \text{if success} \\
        -time, & \text{if failure or interrupt}
    \end{cases}    (10.1)

Second, we consider the situation in which the gain depends on the number
of goals and solution cost, whereas its dependency on the time is negligible:

    gain = \begin{cases}
        2 \cdot n\text{-}goals - cost - 0.01 \cdot time, & \text{if success and } cost < 2 \cdot n\text{-}goals \\
        -0.01 \cdot time, & \text{otherwise}
    \end{cases}    (10.2)

Third, we use a gain function that accounts for all three factors:

    gain = \begin{cases}
        4 \cdot n\text{-}goals - cost - 50 \cdot time, & \text{if success and } cost < 4 \cdot n\text{-}goals \\
        -50 \cdot time, & \text{otherwise}
    \end{cases}    (10.3)
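These three functions translate directly into code; the following Python
transcription is for illustration only, with argument names standing for the
quantities in the formulas.

def gain_10_1(time, success):
    # Function 10.1: gain linearly decreases with the running time.
    return 0.5 - time if success else -time

def gain_10_2(time, cost, n_goals, success):
    # Function 10.2: gain mainly depends on the goals and solution cost.
    if success and cost < 2 * n_goals:
        return 2 * n_goals - cost - 0.01 * time
    return -0.01 * time

def gain_10_3(time, cost, n_goals, success):
    # Function 10.3: gain accounts for goals, cost, and time.
    if success and cost < 4 * n_goals:
        return 4 * n_goals - cost - 50 * time
    return -50 * time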


(a) Solver operators:
    SAVTA w/o cost bound
        No applicability conditions
    Relator, Chooser, Abstractor, SAVTA w/o cost bound
        Applicability conditions:
        Description has no primary effects and no abstraction

(b) Changer operators:
    Chooser, Completer
        Applicability conditions:
        Description has no primary effects;
        Abstractor has not been applied
    Abstractor
        Applicability conditions:
        Description has no abstraction
    Margie, Completer, Abstractor
        Applicability conditions:
        Description has no primary effects and no abstraction

Fig. 10.1. Solver and changer operators in the description-choice experiments.


--------------- r---------------------
( initial

1
-". ~

II cost bound I
SAVTAw/o Relator
Chooser
1 Chooser. 1
Completer ( initial ) Abstractor
SAVTA wlo
~ dash-and -dot lines cost bound
( prim 'I
~
( initial )
Margie
Completer
If SAVTA W/O]
cost bound dashed lines

Abstractor I
( I!rim )
~ I
I lower dotted lines
( E[imand I I
·erarchy J ~

II I
I
I
I
I SAVTAw/o
I cost bound

I(Eierarchy
I
I
I rimand )
I
I
upper dotted lines
space of descriptions J ~ - - space of representations --

Fig. 10.2. Descriptions and representations generated by the operators in Fig-


ure 10.1. The Abstractor operator does not produce new descriptions because the
hierarchy collapses into a single level. The small-font subscriptions refer to the
legend in Figures 10.3-10.6, which show the performance of each representation.

We have run two experiments with each function. First, SHAPER has pro-
cessed sequences of fifty problems; then, it has run on 500-problem sequences.
We give the results of fifty-problem experiments in Figure 10.3; the top
row of graphs shows the system's behavior with Function 10.1, the middle row
is for Function 10.2, and the bottom row is for Function 10.3. The horizontal
axes show the number of a problem, and the vertical axes are the gains.
For every function, we give the raw data in the left-hand graph, which
shows the gain on each problem, and provide alternative views of the same
data in the other two graphs. The solid line in the middle graph is the result
of averaging the gains over five-problem intervals; we plot the mean gain for
problems 1-5,6-10, and so on. The solid curve on the right is analogous to
the artificial-test graphs in Section 7.6; it shows the cumulative per-problem
gain up to the current problem.
The other lines illustrate the performance of the four available represen-
tations without time bounds. For each representation, we plot the results of
applying it to all fifty problems. In particular, the dashed curves show the be-
havior of the problem-specific abstraction with unlimited search time, which
is the optimal strategy for all three gain functions.
In Figure 10.4, we give the results of applying SHAPER to 500-problem
sequences, and compare them with the performance of each representation.
The left-hand graphs are smoothed gain curves, averaged over ten-problem
intervals; the graphs on the right show the cumulative per-problem gains.
The system has converged to the optimal choice of a representation in all
three cases, after processing about a hundred problems.
Discontinuous gain functions. We describe three series of tests with non-
linear functions. In the first series, SHAPER earns rewards only for optimal
solutions:
    gain = \begin{cases}
        0.5 - time, & \text{if the solver finds an optimal solution} \\
        -time, & \text{otherwise}
    \end{cases}    (10.4)

In the second series, we reward the system for solving a problem by a deadline:
    gain = \begin{cases}
        1.0, & \text{if success and } time \le 0.5 \\
        -1.0, & \text{otherwise}
    \end{cases}    (10.5)
In the third series, we use a complex function, which depends on the number
of goals, search time, and solution quality:
    gain = \begin{cases}
        4 \cdot n\text{-}goals, & \text{if success and } time \le 0.3 \\
        4 \cdot n\text{-}goals - 50 \cdot time, & \text{if success and } 0.3 < time \le 0.5 \\
        4 \cdot n\text{-}goals - cost - 50 \cdot time, & \text{if success, } time > 0.5 \text{, and } cost < 4 \cdot n\text{-}goals \\
        -50 \cdot time, & \text{otherwise}
    \end{cases}    (10.6)
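Again, a direct Python transcription of the three discontinuous functions,
for illustration only:

def gain_10_4(time, optimal):
    # Function 10.4: reward only for an optimal solution.
    return 0.5 - time if optimal else -time

def gain_10_5(time, success):
    # Function 10.5: fixed reward for solving by the 0.5-second deadline.
    return 1.0 if success and time <= 0.5 else -1.0

def gain_10_6(time, cost, n_goals, success):
    # Function 10.6: complex dependency on goals, time, and quality.
    if success and time <= 0.3:
        return 4 * n_goals
    if success and time <= 0.5:
        return 4 * n_goals - 50 * time
    if success and cost < 4 * n_goals:   # here time > 0.5
        return 4 * n_goals - cost - 50 * time
    return -50 * time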
The system has identified the optimal strategy for each function. We sum-
marize the results for fifty-problem sequences in Figure 10.5, and the results
for 500-problem sequences in Figure 10.6.

Summary. We have shown SHAPER'S ability to choose appropriate descrip-
tions in the Machining Domain. The system converges to the right strat-
egy after processing fifty to a hundred problems; however, it suffers significant
losses in the beginning when trying alternative representations. The system
has proved equally effective for a variety of gain functions, and its behavior
on Machining problems is very similar to the artificial tests of Section 7.6.
In Table 10.1, we summarize the cumulative per-problem gains on short
and long sequences. If the system had prior knowledge of the optimal strategy
for each function, it would earn the gains given in the rightmost column. The
cumulative gains on the fifty-problem sequences range from 53% to 85% of
the optimal, whereas the gains on the long sequences are between 88% and
99% of the optimal.

Table 10.1. Average per-problem gains and their comparison with the optimal-
strategy gains. For each gain, we give the corresponding percentage of the optimal.

gain          problem sequences           optimal
function      short          long         gain
10.1          0.148 (80%)    0.177 (96%)  0.184
10.2          3.91  (55%)    6.67  (93%)  7.17
10.3          10.3  (53%)    17.1  (88%)  19.4
10.4          0.115 (63%)    0.176 (96%)  0.184
10.5          0.680 (68%)    0.956 (96%)  1.000
10.6          38.6  (85%)    44.6  (99%)  45.2

[Figure 10.3 contains three rows of graphs:
(a) Gain linearly decreases with the running time (Function 10.1).
(b) Gain mainly depends on the solution quality (Function 10.2).
(c) Gain depends on both time and quality (Function 10.3).
Each row includes a detailed gain curve, a smoothed gain curve, and average
per-problem gains, plotted against the problem's number.]

Fig. 10.3. Performance on fifty-problem sequences with linear gain functions. The
system chooses among the four representations in Figure 10.2 and determines ap-
propriate time bounds. We give the gains on each of the fifty problems (left), as
well as smoothed gain curves (middle) and cumulative per-problem gains (right).
We plot not only the system's gains (solid lines), but also the performance of the
available representations, which include the standard PRODIGY search (dash-and-
dot lines), primary effects (lower dotted lines), abstraction (upper dotted lines),
and problem-specific abstraction (dashed lines). We do not show the lower dotted
curves for Function 10.1 because they coincide with the dash-and-dot curves.

[Figure 10.4 contains three rows of graphs:
(a) Gain linearly decreases with the running time (Function 10.1).
(b) Gain mainly depends on the solution quality (Function 10.2).
(c) Gain depends on both time and quality (Function 10.3).
Each row includes a smoothed gain curve and average per-problem gains,
plotted against the problem's number.]

Fig. 10.4. Performance on 500-problem sequences with linear gain functions. For
every function, we give the smoothed gain curve (solid lines, left) and cumula-
tive per-problem gains (solid lines, right). We also show the performance of each
representation (the other lines) using the legend of Figure 10.3.

[Figure 10.5 contains three rows of graphs:
(a) System gets a reward only for the optimal solution (Function 10.4).
(b) Gain is a discontinuous function of the running time (Function 10.5).
(c) Gain is a complex discontinuous function (Function 10.6).
Each row includes a detailed gain curve, a smoothed gain curve, and average
per-problem gains, plotted against the problem's number.]

Fig. 10.5. Results of applying SHAPER to fifty-problem sequences with discontinu-
ous gain functions. The graphs include the raw gain curves (left), smoothed curves
(middle), and cumulative gains (right); the legend is the same as in Figure 10.3.
We do not show the lower dotted lines (primary effects) for Functions 10.4 and 10.5
because they are identical to the dash-and-dot curves (standard PRODIGY search).

[Figure 10.6 contains three rows of graphs:
(a) System gets a reward only for the optimal solution (Function 10.4).
(b) Gain is a discontinuous function of the running time (Function 10.5).
(c) Gain is a complex discontinuous function (Function 10.6).
Each row includes a smoothed gain curve and average per-problem gains,
plotted against the problem's number.]

Fig. 10.6. Results for 500-problem sequences with discontinuous gain functions.
We give the smoothed gain curves (left) and cumulative per-problem gains (right)
using the legend of Figure 10.3.

10.2 Selecting a solver

We present experiments with a library of four solvers based on SAVTA and
SABA (Figure 10.7). Two solvers perform depth-first search without a cost
bound, whereas the other two utilize a bound that approximates the doubled
cost of an optimal solution. We have not included the LINEAR search because
its behavior in the Machining Domain is identical to SAVTA.
The cost bound depends on the number of goals and on the required
quality of machining operations. We divide the goal literals into two groups:
effects of rough operators, such as drilled and polished, and fine effects, such as
finely-polished. If the number of literals in the first group is n-rough, and that
in the second group is n-fine, the cost bound is 2 · n-rough + 4 · n-fine.
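A one-function Python sketch of this bound; the predicate that classifies a
goal literal as a fine effect is a hypothetical stand-in, since the full list
of fine effects is not reproduced here.

def cost_bound(goal_literals, is_fine_effect):
    # Cost bound from Section 10.2: 2 * n-rough + 4 * n-fine, where a
    # fine effect is a literal such as finely-polished, and a rough
    # effect is a literal such as drilled or polished.
    n_fine = sum(1 for lit in goal_literals if is_fine_effect(lit))
    n_rough = len(goal_literals) - n_fine
    return 2 * n_rough + 4 * n_fine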
Four solvers. If SHAPER runs with the four solvers in Figure 10.7 and no
changers, it generates the four representations in Figure 10.8. We have tested
the system's ability to choose among them for Functions 10.1, 10.2, and 10.6.
We give the results in Figures 10.11 and 10.12; solid lines mark the system's
performance, and the other lines show the behavior of each representation.
The gains of SABA without cost bounds are identical to its gains with bounds
(dashed lines), but SHAPER has no prior knowledge of this identity. The
best choice for Functions 10.1 and 10.6 is SABA. On the other hand, if we
use Function 10.2, SHAPER should not try to solve any problems because all
available algorithms give negative gains. The system adopts the right strategy
in all three cases after processing about twenty problems.
Sixteen representations. We next consider the library of eight solver op-
erators in Figure 10.9, combined with the changer library in Figure 10.1(b),

Four solver operators, none with applicability conditions:
    SAVTA w/o cost bound;  SAVTA with cost bound;
    SABA w/o cost bound;   SABA with cost bound.

Fig. 10.7. Basic solver operators.


[Figure 10.8 shows the four representations that pair the initial description
with SAVTA w/o cost bound (dash-and-dot lines), SAVTA with cost bound (dotted
lines), SABA w/o cost bound, and SABA with cost bound (dashed lines).]

Fig. 10.8. Representations constructed without changers. The small-font subscripts
refer to the gain curves in Figures 10.11 and 10.12.

Eight solver operators. Four basic operators without applicability conditions:
    SAVTA w/o cost bound;  SAVTA with cost bound;
    SABA w/o cost bound;   SABA with cost bound.
Four operators that prefix a solver with Relator, Chooser, and Abstractor,
applicable only when the description has no primary effects and no abstraction:
    Relator, Chooser, Abstractor, SAVTA w/o cost bound;
    Relator, Chooser, Abstractor, SAVTA with cost bound;
    Relator, Chooser, Abstractor, SABA w/o cost bound;
    Relator, Chooser, Abstractor, SABA with cost bound.

Fig. 10.9. Extended set of solver operators.

Table 10.2. Average per-problem gains in the solver-selection experiments (Fig-
ures 10.11 and 10.12). We give the results of processing short and long problem
sequences, and convert each result into a percentage of the optimal-strategy gain.

gain          problem sequences             optimal
function      short           long          gain
10.1           0.112 (61%)     0.176 (96%)  0.184
10.2          -0.0044 (--)    -0.0004 (--)  0
10.6          36.8   (82%)    44.5   (99%)  45.1

which leads to generating sixteen representations (Figure 10.10). We plot the
gains in Figures 10.13 and 10.14 (solid lines), and compare them with the
results of choosing among four descriptions (dash-and-dot lines) and among
four solvers (dotted lines). We also show the utility of SAVTA with problem-
specific descriptions and no cost bounds, which is the best strategy.
Summary. The tests have confirmed the system's ability to identify the right
strategy with moderate initial losses. The performance is similar to the results
of choosing a description (Table 10.1) and to the artificial experiments. In
Table 10.2, we list the per-problem gains in the solver-selection experiments.
In Table 10.3, we give the gains obtained with sixteen representations, and
compare them with the selection among four descriptions.
[Figure 10.10 shows the space of descriptions (initial; prim; prim-and-
hierarchy) and sixteen representations: each of the four solvers (SAVTA and
SABA, with and without cost bounds) paired with the initial description, with
the Relator-Chooser-Abstractor prefix, with primary effects, and with primary
effects and abstraction.]

Fig. 10.10. Large representation space. The system utilizes the solvers in Fig-
ure 10.9 and changers in Figure 10.1(b), which give rise to sixteen representations.
We show the results of selecting among them in Figures 10.13 and 10.14.

Table 10.3. Average per-problem gains in the experiments with sixteen representa-
tions (Figures 10.13 and 10.14). We also give the results of using four representations
(Figures 10.3-10.6) and the performance of the best available strategy.

gain          selection among                        optimal
function      sixteen             four               gain
              representations     representations
short problem sequences
  10.1         0.009  (5%)         0.148 (80%)        0.196
  10.2         5.11  (71%)         3.91  (55%)        7.17
  10.6        33.8   (75%)        38.6   (85%)       45.2
long problem sequences
  10.1         0.158 (86%)         0.177 (96%)        0.184
  10.2         6.93  (97%)         6.67  (93%)        7.17
  10.6        44.1   (98%)        44.6   (99%)       45.2

detailed gain curve smoothed gain curve average per-problem gains

0.2 0.2 ,_ /' ~ - - - - - -

U) 0 H+-I---------I
c:
'(0
0> -0.2

-0.4 -0.4 -0.4 :

o 20 40 o 20 40 o 20 40
problem's number problem's number problem's number
(a) Gain linearly decreases with the running time (Function 10.1).
detailed gain curve smoothed gain curve average per-problem gains
or---;::::=====I or-----;:::==::::::j o

,'\
-0.01 I './ - . / ............. -0.01 ....... _._._.

lA

-0.02 -0.02 -0.02 ,-''_ _ _ _ _~--'


o ~ ~ 0 ~ ~ 0 ~ ~
problem's number problem's number problem's number
(b) Gain mainly depends on the solution quality (Function 10.2).
detailed gain curve smoothed gain curve average per-problem gains
60 60
40 40 ..... ----------
U)
20
c:
'(0
0> 0 /
\'

-20 "

-40 -40 -40

o ~ ~ 0 ~ ~ 0 ~ ~
problem's number problem's number problem's number
(c) Gain is a discontinuous function of time and quality (Function 10.6).
Fig. 10.11. Selection among four solvers on fifty-problem sequences. The graphs
include the raw gains (left), smoothed gain curves (solid lines, middle), and cumu-
lative per-problem gains (solid lines, right). We also show the performance of every
solver: SAVTA without cost bounds (dash-and-dot lines), SAVTA with loose bounds
(dotted lines), and SABA (dashed lines). The behavior of SABA without cost bounds
is identical to that with loose bounds.
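The middle and right panels of these figures are derived from the raw per-problem gains. The sketch below shows one plausible way to compute the two derived curves; the fixed moving-average window is an assumption, since the exact smoothing procedure is described elsewhere in the text.

    # A sketch of the two derived curves: a smoothed gain curve and the
    # cumulative per-problem average. The ten-problem moving average is
    # an assumption, not the text's exact smoothing procedure.
    def smoothed_curve(gains, window=10):
        curve = []
        for i in range(len(gains)):
            recent = gains[max(0, i - window + 1):i + 1]
            curve.append(sum(recent) / len(recent))
        return curve

    def cumulative_average(gains):
        total, curve = 0.0, []
        for i, g in enumerate(gains, start=1):
            total += g
            curve.append(total / i)  # average gain over the first i problems
        return curve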
[Figure 10.12 appears here; its plots show the smoothed gain curves (left) and the average per-problem gains (right) for each row:]
(a) Gain linearly decreases with the running time (Function 10.1).
(b) Gain mainly depends on the solution quality (Function 10.2).
(c) Gain is a discontinuous function of time and quality (Function 10.6).
Fig. 10.12. Selection among four solvers on 500-problem sequences. We give the
smoothed gain curves (solid lines, left), cumulative gains (solid lines, right), and
the performance of every solver (the other lines) using the legend of Figure 10.11.
[Figure 10.13 appears here; its plots show, left to right, the detailed gain curve, the smoothed gain curve, and the average per-problem gains for each row:]
(a) Gain linearly decreases with the running time (Function 10.1).
(b) Gain mainly depends on the solution quality (Function 10.2).
(c) Gain is a discontinuous function of time and quality (Function 10.6).
Fig. 10.13. Selection among sixteen representations on fifty-problem sequences.
The graphs include the raw gains (left), smoothed gain curves (solid lines, middle),
and cumulative per-problem gains (solid lines, right). We compare these results with
two smaller-scale tasks: the choice among the four representations in Figure 10.2
(dash-and-dot lines) and among the four solvers in Figure 10.7 (dotted lines). We
also show the performance of SAVTA with problem-specific abstraction and no time
bound (dashed lines), which is the optimal strategy.
[Figure 10.14 appears here; its plots show the smoothed gain curves (left) and the average per-problem gains (right) for each row:]
(a) Gain linearly decreases with the running time (Function 10.1).
(b) Gain mainly depends on the solution quality (Function 10.2).
(c) Gain is a discontinuous function of time and quality (Function 10.6).
Fig. 10.14. Selection among sixteen representations on 500-problem sequences
(solid lines). We also plot the performance of the optimal strategy (dashed lines)
and the results for smaller-scale selection tasks (dotted and dash-and-dot lines);
the legend is the same as in Figure 10.13.

10.3 Different time bounds

When SHAPER tests the available representations, it uses larger-than-optimal
time bounds. In Section 7.4.2, we have described a mechanism for choosing
a time bound. First, the system estimates the optimal bound and computes
the expected gain $g_{\max}$, as well as its standard deviation $\sigma_{\max}$. Second, it
processes larger bounds, and estimates the gain $g$ and deviation $\sigma$ for each
bound. Third, it identifies the maximal bound with gain close to the optimal;
by default, it considers $g$ close to $g_{\max}$ if $\frac{g_{\max} - g}{\sqrt{\sigma_{\max}^2 + \sigma^2}} \le 0.1$.

We can change the limit for the $\frac{g_{\max} - g}{\sqrt{\sigma_{\max}^2 + \sigma^2}}$ ratio. If this limit is less
than 0.1, SHAPER chooses bounds that are closer to the optimal, which incur
smaller initial losses. On the other hand, a large limit causes a more
thorough exploration at the expense of greater initial losses. We have tried
several settings of this knob; the optimal setting varies across domains and
gain functions, but the default 0.1 value always gives near-optimal results.
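As a concrete reading of this rule, the sketch below picks a bound from a list of (bound, gain, deviation) candidates; the function and variable names are hypothetical, and the statistical estimation of the gains and deviations, described in Section 7.4.2, is not reproduced here.

    import math

    # A sketch of the bound-selection rule: among all candidate bounds
    # whose estimated gain is close to the optimal gain, pick the largest
    # bound. Each candidate is a (bound, gain, deviation) triple; the
    # deviations are assumed to be nonzero.
    def select_time_bound(candidates, knob=0.1):
        g_max, sigma_max = max((g, s) for (_, g, s) in candidates)
        eligible = [bound for (bound, g, sigma) in candidates
                    if (g_max - g) / math.sqrt(sigma_max**2 + sigma**2) <= knob]
        return max(eligible)

For instance, lowering the knob to 0.02 shrinks the eligible set toward the estimated optimum, while raising it to 0.5 admits larger bounds, which matches the exploration behavior described above.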
We have re-run the sixteen-representation tests for two small knob values,
0.02 and 0.05. In Figures 10.15 and 10.16, we plot the differences between the
resulting gains and the default-knob gains; the performance is very similar
to the default case. We have also tested two large knob values, 0.2 and 0.5,
and again observed a close-to-default behavior (Figures 10.17 and 10.18).
In Table 10.4, we show the dependency of average gains on the exploration
knob, and convert the results into the percentages of the default gains. In
most cases, the knob changes affected the gains by less than two percent.
The only exception is the fifty-problem experiment with Function 10.1, which
has revealed a possibility of 22% improvement on the default behavior (first
line of Table 10.4); however, the absolute gain difference is not significant.
Since the cumulative gain is close to zero due to initial losses, the small
absolute improvement translates into a large percentage.

Table 10.4. Dependency of the average per-problem gains on the exploration knob. We list the gains for different knob values and give the corresponding percentages of the default-strategy gains.

  gain        small knob values                   default    large knob values
  function    0.02            0.05                0.1        0.2             0.5
  short problem sequences
  10.1         0.008  (89%)    0.009 (100%)        0.009      0.011 (122%)    0.008  (89%)
  10.2         5.113 (100%)    5.113 (100%)        5.113      5.113 (100%)    5.113 (100%)
  10.6        33.50   (99%)   33.84 (100%)        33.80      33.84 (100%)   33.76 (100%)
  long problem sequences
  10.1         0.159 (101%)    0.161 (102%)        0.158      0.156  (99%)    0.160 (101%)
  10.2         6.963 (100%)    6.961 (100%)        6.931      6.957 (100%)    6.943 (100%)
  10.6        43.97 (100%)    43.84  (99%)        44.13      44.13 (100%)   44.07 (100%)
[Figure 10.15 appears here; its plots show, left to right, the detailed difference curve, the smoothed difference curve, and the average per-problem differences for each row:]
(a) Gain linearly decreases with the running time (Function 10.1).
(b) Gain mainly depends on the solution quality (Function 10.2).
(c) Gain is a discontinuous function of time and quality (Function 10.6).
Fig. 10.15. Performance with small values of the exploration knob on fifty-problem
sequences. The solid lines show the differences between the gains with the knob
value 0.02 and that with the knob value 0.1. The dashed lines mark the differences
between the 0.05-knob gains and 0.1-knob gains. We plot the difference for each
problem (left), smoothed difference curves (middle), and cumulative per-problem
differences (right).
[Figure 10.16 appears here; its plots show the smoothed difference curves (left) and the average per-problem differences (right) for each row:]
(a) Gain linearly decreases with the running time (Function 10.1).
(b) Gain mainly depends on the solution quality (Function 10.2).
(c) Gain is a discontinuous function of time and quality (Function 10.6).
Fig. 10.16. Performance with knob values 0.02 and 0.05 on 500-problem sequences.
We plot the smoothed gain-difference curves (left) and cumulative per-problem
differences (right) using the legend of Figure 10.15.
[Figure 10.17 appears here; its plots show, left to right, the detailed difference curve, the smoothed difference curve, and the average per-problem differences for each row:]
(a) Gain linearly decreases with the running time (Function 10.1).
(b) Gain mainly depends on the solution quality (Function 10.2).
(c) Gain is a discontinuous function of time and quality (Function 10.6).
Fig. 10.17. Processing fifty-problem sequences with large knob values. The solid
lines show the differences between the 0.5-knob gains and 0.1-knob gains. The
dashed curves mark the differences between the 0.2-knob gains and 0.1-knob gains.
[Figure 10.18 appears here; its plots show the smoothed difference curves (left) and the average per-problem differences (right) for each row:]
(a) Gain linearly decreases with the running time (Function 10.1).
(b) Gain mainly depends on the solution quality (Function 10.2).
(c) Gain is a discontinuous function of time and quality (Function 10.6).
Fig. 10.18. Processing 500-problem sequences with large knob values. We plot the
gain-difference curves for knob values 0.5 (solid lines) and 0.2 (dashed lines) using
the legend of Figure 10.17.
11. Sokoban Domain

The Sokoban puzzle (see Section 3.7.2) is more difficult than the Machining
Domain; it usually requires long search, and it includes unsolvable problems.
In most cases, PRODIGY cannot find near-optimal solutions, and cost bounds
may lead to gross inefficiency; hence, we use PRODIGY without cost limits.

11.1 Three representations

We begin with the selection among three representations. We use the gain
functions in Figure 11.1, which do not account for the solution quality. The
first function is a linear dependency on the running time:

\[
gn = \begin{cases}
10 - time, & \text{if success}\\
-time, & \text{if failure or interrupt}
\end{cases}
\tag{11.1}
\]

The second is a discontinuous function:

\[
gn = \begin{cases}
20 - time - \lfloor time \rfloor, & \text{if success}\\
-time - \lfloor time \rfloor, & \text{if failure or interrupt}
\end{cases}
\tag{11.2}
\]

The third is a linear dependency on the time logarithm:

\[
gn = \begin{cases}
5 - \ln(time + 1), & \text{if success}\\
-\ln(time + 1), & \text{if failure or interrupt}
\end{cases}
\tag{11.3}
\]
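For concreteness, the three functions can be transcribed directly as code; the transcription below adds nothing beyond the formulas.

    import math

    # Direct transcription of Functions 11.1-11.3; 'time' is the running
    # time in seconds, and 'success' is false for both failures and
    # interrupts, which these three functions treat alike.
    def gain_11_1(time, success):
        return (10 - time) if success else -time

    def gain_11_2(time, success):
        # The floor term lowers the gain by an extra point at every whole
        # second, which produces the discontinuities.
        step = time + math.floor(time)
        return (20 - step) if success else -step

    def gain_11_3(time, success):
        return (5 - math.log(time + 1)) if success else -math.log(time + 1)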

Selecting a description. The first experiment involves the LINEAR solver


and three changer operators (Figure 11.2a,b). The system inputs the do-
main description in Figure 3.35 and constructs two more descriptions (Fig-
ure 11.2c), which are based on primary effects and abstraction. In Figures 11.4
and 11.5, we give the results of selecting a representation (solid lines), and
the behavior of each representation with the optimal time bound (the other
lines). The system has identified the right representation and time bound in
all three cases. The local maxima of gains between problems 50 and 100 in
Figure 11.5 are due to a random fluctuation in problem difficulty.


[Figure 11.1 appears here; each panel plots gain against running time, with separate curves for success and no success: (a) Function 11.1, (b) Function 11.2, (c) Function 11.3.]
Fig. 11.1. Gain functions in the Sokoban experiments.

[Figure 11.2 appears here as a diagram: (a) the solver operator, LINEAR without a cost bound, which has no applicability conditions; (b) the changer operators -- Chooser with Completer (applicable when the description has no primary effects and Abstractor has not been applied), Abstractor (applicable when the description has no abstraction), and Margie with Completer and Abstractor (applicable when the description has no primary effects and no abstraction); (c) the expanded spaces, where LINEAR is paired with the initial description (dash-and-dot lines), with primary effects (dotted lines), and with primary effects and an abstraction hierarchy (dashed lines).]
Fig. 11.2. Solvers, changers, and representations in the description-choice experiments. The labels refer to the gain curves in Figures 11.4 and 11.5. Note that the application of Abstractor to the initial description does not produce an abstraction hierarchy. On the other hand, when the system invokes Abstractor after selecting primary effects, it produces the same hierarchy as Margie.
[Figure 11.3 appears here as a diagram: (a) the basic solver operators -- LINEAR, SAVTA, and SABA, each without a cost bound and with no applicability conditions; (b) the expanded spaces, where each solver is paired with the prim-and-hierarchy description: LINEAR (dashed lines), SAVTA (dotted lines), and SABA (dash-and-dot lines).]
Fig. 11.3. Basic solver operators. The system pairs them with a given domain description, thus producing three representations. The small-font labels refer to the gain curves in Figures 11.6 and 11.7.

Selecting a solver. The next experiment is based on the solver library


in Figure 11.3(a). We have provided a domain description with primary
effects and abstraction, and SHAPER has paired it with the three solvers
(Figure 11.3b). We give the results of choosing among them in Figures 11.6
and 11.7. SHAPER has found appropriate solvers and time bounds for Func-
tions 11.2 and 11.3, but it has made a wrong choice for Function 11.1. When
SHAPER runs with this function, it correctly identifies LINEAR as the best
solver, but converges to the 1.62-second time bound, whereas the optimal
bound is 3.02 seconds.
Summary. The system has selected the right representation in all experi-
ments and found an appropriate time bound in all but one case. The learning
curves are similar to those in the Machining Domain; SHAPER usually adopts
the right behavior after solving fifty to a hundred problems. In Table 11.1,
we list the average per-problem gains and compare them with the optimal-
strategy gains. The percentage values of cumulative gains are smaller than
in the Machining Domain due to greater initial losses. Since many Sokoban
problems require infeasibly long search, large bounds incur high losses during
the early stages of learning.
The experiments have shown that a drastic reduction of search times may
not lead to a proportional upsurge of gains. The primary effects reduce the
search by three orders of magnitude (Section 3.7.2), and their synergy with
abstraction gives an even greater reduction (Section 4.4). On the other hand,
the resulting gain increase is relatively modest; the increase factor ranges
from 1.1 to 2.7 (Table 11.2). Intuitively, a much faster procedure is not nec-
essarily much better in practice. For example, a software company usually


cannot achieve a sharp growth of profits by purchasing faster machines. The
results in other domains have supported this intuition; the gain-increase fac-
tor has always been less than 5.

Table 11.1. Average per-problem gains in the experiments with small representation spaces. We summarize the results of choosing among three descriptions (Figures 11.4 and 11.5) and among three solvers (Figures 11.6 and 11.7). For every gain, we show the corresponding percentage of the optimal-strategy gain.

  gain        selection among                             optimal
  function    three descriptions    three solvers         gain
  short problem sequences
  11.1         0.625 (37%)           0.245 (14%)           1.711
  11.2         2.989 (99%)           0.886 (29%)           3.007
  11.3         0.689 (56%)           0.249 (20%)           1.240
  long problem sequences
  11.1         0.610 (76%)           0.421 (52%)           0.802
  11.2         1.940 (88%)           1.939 (88%)           2.213
  11.3         0.477 (85%)           0.371 (66%)           0.563

Table 11.2. Time savings and the respective growth of gains. A huge search reduction translates into a modest gain increase.

  gain        primary         primary effects
  function    effects         and abstraction
  search-reduction factor
              > 500           > 5000
  gain-increase factor
  11.1        1.1             1.8
  11.2        1.7             2.3
  11.3        1.1             2.7
[Figure 11.4 appears here; its plots show, left to right, the detailed gain curve, the smoothed gain curve, and the average per-problem gains for each row:]
(a) Gain linearly decreases with the running time (Function 11.1).
(b) Gain is a discontinuous function of the running time (Function 11.2).
(c) Gain linearly decreases with the time logarithm (Function 11.3).
Fig. 11.4. Selection among three descriptions on fifty-problem sequences. We plot
the raw gains (left), smoothed gain curves (solid lines, middle), and cumulative gains
(solid lines, right). We also show the performance of each description: the standard
PRODIGY search (dash-and-dot lines), primary effects (dotted line), and abstraction
(dashed lines). We do not show the dash-and-dot curves for Function 11.1 because
they coincide with the dotted lines.
[Figure 11.5 appears here; its plots show the smoothed gain curves (left) and the average per-problem gains (right) for each row:]
(a) Gain linearly decreases with the running time (Function 11.1).
(b) Gain is a discontinuous function of the running time (Function 11.2).
(c) Gain linearly decreases with the time logarithm (Function 11.3).
Fig. 11.5. Selection among three descriptions on 500-problem sequences. We give
the smoothed gain curves (solid lines, left) and cumulative per-problem gains (solid
lines, right), as well as the performance of each description (the other lines); the
legend is the same as in Figure 11.4.
[Figure 11.6 appears here; its plots show, left to right, the detailed gain curve, the smoothed gain curve, and the average per-problem gains for each row:]
(a) Gain linearly decreases with the running time (Function 11.1).
(b) Gain is a discontinuous function of the running time (Function 11.2).
(c) Gain linearly decreases with the time logarithm (Function 11.3).
Fig. 11.6. Selection among LINEAR, SAVTA, and SABA on fifty-problem sequences.
We plot the raw gains (left), smoothed gain curves (middle), and cumulative gains
(right). We also show the performance of LINEAR (dashed lines), SAVTA (dotted
lines), and SABA (dash-and-dot lines). We do not plot the dash-and-dot curves for
Function 11.1 because they coincide with the dotted lines.
[Figure 11.7 appears here; its plots show the smoothed gain curves (left) and the average per-problem gains (right) for each row:]
(a) Gain linearly decreases with the running time (Function 11.1).
(b) Gain is a discontinuous function of the running time (Function 11.2).
(c) Gain linearly decreases with the time logarithm (Function 11.3).
Fig. 11.7. Selection among LINEAR, SAVTA, and SABA on 500-problem sequences.
We give the smoothed gain curves (solid lines, left) and cumulative per-problem
gains (solid lines, right), as well as the performance of each solver (the other lines);
the legend is the same as in Figure 11.6.

11.2 Nine representations

If SHAPER utilizes the changers in Figure 11.2(b) along with the solvers in
Figure 11.3(a), it generates the nine representations in Figure 11.8; we give
the results of using them with six different gain functions.
Simple gain functions. SHAPER has found the right representation and
time bound for all three gain functions in Section 11.1. In Figures 11.9
and 11.10, we show the resulting gains (solid lines), and compare them with
the selection among three descriptions (dash-and-dot lines) and among three
solvers (dotted lines). We also plot the performance of the optimal strategies
(dashed lines), which are based on the LINEAR solver with primary effects
and abstraction. When SHAPER utilizes nine representations, it converges to
the optimal strategy after processing 150 to 200 problems, whereas the choice
among three alternatives requires at most a hundred problems.
Distinguishing failures from interrupts. We next consider gain func-
tions that differentiate failures from interrupts. The first function is a linear
dependency on the search time with a partial reward for a failure:

\[
gn = \begin{cases}
10 - time, & \text{if success}\\
5 - time, & \text{if failure}\\
-time, & \text{if interrupt}
\end{cases}
\tag{11.4}
\]

The second involves a reward proportional to the area of the Sokoban grid;
thus, the system gets more points for harder problems:

\[
gn = \begin{cases}
area - time, & \text{if success}\\
0.5 \cdot area - time, & \text{if failure}\\
-time, & \text{if interrupt}
\end{cases}
\tag{11.5}
\]

The third is the product of the grid size and a linear dependency on time:

\[
gn = \begin{cases}
area \cdot (10 - time), & \text{if success}\\
area \cdot (5 - time), & \text{if failure}\\
-area \cdot time, & \text{if interrupt}
\end{cases}
\tag{11.6}
\]
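Transcribed directly as code (again adding nothing beyond the formulas above):

    # Direct transcription of Functions 11.4-11.6; 'outcome' is "success",
    # "failure", or "interrupt", and 'area' is the area of the Sokoban grid.
    def gain_11_4(time, outcome):
        return {"success": 10 - time,
                "failure": 5 - time,
                "interrupt": -time}[outcome]

    def gain_11_5(time, outcome, area):
        return {"success": area - time,
                "failure": 0.5 * area - time,
                "interrupt": -time}[outcome]

    def gain_11_6(time, outcome, area):
        return {"success": area * (10 - time),
                "failure": area * (5 - time),
                "interrupt": -area * time}[outcome]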

The system has found the right strategy for all functions (Figures 11.11
and 11.12). The results confirm that the large representation space causes
slower convergence than the two smaller spaces.
Summary. The system has converged to the right representation and time
bound for all six functions. In Table 11.3, we summarize the gains and com-
pare them with the analogous data for smaller-scale selection tasks.
[Figure 11.8 appears here as a diagram of the representation space: the changers produce the descriptions "initial", "prim", and "prim and hierarchy", and each description is paired with LINEAR, SAVTA, and SABA without cost bounds, giving nine representations.]

Fig. 11.8. Experiments with nine representations.

Table 11.3. Average per-problem gains in the experiments with nine representations (Figures 11.9-11.12). We compare these gains with the analogous data for the two smaller-scale selection tasks, and convert each gain into a percentage of the optimal-strategy gain.

  gain        selection among                                                     optimal
  function    nine representations   three descriptions     three solvers        gain
              (solid lines)          (dash-and-dot lines)   (dotted lines)       (dashed lines)
  short problem sequences
  11.1        -0.077     -            0.625 (37%)            0.245 (14%)          1.711
  11.2        -2.876     -            2.989 (99%)            0.886 (29%)          3.007
  11.3        -0.762     -            0.689 (56%)            0.249 (20%)          1.240
  11.4         0.172 (10%)            0.699 (41%)            1.361 (80%)          1.711
  11.5         3.70  (32%)            6.43  (56%)            4.17  (36%)         11.43
  11.6        25.5   (97%)           25.5   (97%)           25.7   (98%)         26.2
  long problem sequences
  11.1         0.330 (41%)            0.610 (76%)            0.421 (52%)          0.802
  11.2         1.169 (53%)            1.940 (88%)            1.939 (88%)          2.213
  11.3         0.334 (59%)            0.477 (85%)            0.371 (66%)          0.563
  11.4         0.382 (48%)            0.407 (51%)            0.747 (93%)          0.802
  11.5         6.82  (77%)            7.50  (85%)            7.59  (86%)          8.78
  11.6        15.1   (83%)           17.8   (98%)           17.8   (98%)         18.1
[Figure 11.9 appears here; its plots show, left to right, the detailed gain curve, the smoothed gain curve, and the average per-problem gains for each row:]
(a) Gain linearly decreases with the running time (Function 11.1).
(b) Gain is a discontinuous function of the running time (Function 11.2).
(c) Gain linearly decreases with the time logarithm (Function 11.3).
Fig. 11.9. Selection among nine representations on fifty-problem sequences with
simple gain functions. We plot SHAPER'S gains (solid lines) along with the results of
two smaller-scale tasks: the selection among three descriptions (dash-and-dot lines)
and among three solvers (dotted lines). We also show the performance of LINEAR
with abstraction (dashed lines), which is the best available strategy.
[Figure 11.10 appears here; its plots show the smoothed gain curves (left) and the average per-problem gains (right) for each row:]
(a) Gain linearly decreases with the running time (Function 11.1).
(b) Gain is a discontinuous function of the running time (Function 11.2).
(c) Gain linearly decreases with the time logarithm (Function 11.3).
Fig. 11.10. Selection among nine representations on 500-problem sequences with
simple gain functions. We compare the results (solid lines) with two smaller-scale
selection tasks (dotted and dash-and-dot lines) and with the best available strat-
egy (dashed lines); the legend is the same as in Figure 11.9.
[Figure 11.11 appears here; its plots show, left to right, the detailed gain curve, the smoothed gain curve, and the average per-problem gains for each row:]
(a) Gain is a linear function of the running time with a partial reward for a failure (Function 11.4).
(b) Reward is proportional to the size of the Sokoban grid (Function 11.5).
(c) Gain is a complex function of the time and grid size (Function 11.6).
Fig. 11.11. Processing fifty-problem sequences with Functions 11.4-11.6. We show
the results of the selection among nine representations (solid lines), as well as the
performance on two smaller-scale selection tasks (dotted and dash-and-dot lines).
We also show the performance of LINEAR with abstraction, which is the best avail-
able strategy (dashed lines).
[Figure 11.12 appears here; its plots show the smoothed gain curves (left) and the average per-problem gains (right) for each row:]
(a) Gain is a linear function of the running time with a partial reward for a failure (Function 11.4).
(b) Reward is proportional to the size of the Sokoban grid (Function 11.5).
(c) Gain is a complex function of the time and grid size (Function 11.6).
Fig. 11.12. Processing 500-problem sequences with Functions 11.4-11.6. We show
the results of the selection among nine representations (solid lines), the results of two
smaller-scale selection tasks (dotted and dash-and-dot lines), and the performance
of the optimal strategy (dashed lines); the legend is the same as in Figure 11.11.

11.3 Different time bounds

We have experimented with several values of the exploration knob that con-
trols the time bounds (see Section 10.3). In Figures 11.13 and 11.14, we
show the results for two small knob values, 0.02 and 0.05. We plot the differ-
ences between the resulting gains and the gains with the default 0.1 value. In
Figures 11.15 and 11.16, we give the difference curves for two large knob val-
ues, 0.2 and 0.5. The spikes of the solid curves in Figures 11.13(a), 11.14(c),
11.15(a), and 11.16(b) are due to random fluctuations in problem difficulty.
The experiments have shown that the default 0.1 value usually ensures near-
optimal behavior (Table 11.4); the only exception is a fifty-problem experi-
ment with Function 11.1.

Table 11.4. Dependency of the gains on the exploration knob. We give the average per-problem gains and the respective percentages of the default-strategy gains.

  gain        small knob values                  default    large knob values
  function    0.02            0.05               0.1        0.2             0.5
  short problem sequences
  11.1         0.171    -      0.171    -        -0.077      0.171    -      0.023    -
  11.3        -0.756    -     -0.759    -        -0.762     -0.762    -     -0.762    -
  11.6        25.46 (100%)    25.46 (100%)       25.46      25.46 (100%)   25.46 (100%)
  long problem sequences
  11.1         0.370 (112%)    0.323  (98%)       0.330      0.422 (128%)    0.262  (79%)
  11.3         0.085  (25%)    0.079  (24%)       0.334      0.334 (100%)    0.334 (100%)
  11.6        14.67  (97%)    15.13 (100%)       15.12      15.12 (100%)   15.12 (100%)
[Figure 11.13 appears here; its plots show, left to right, the detailed difference curve, the smoothed difference curve, and the average per-problem differences for each row:]
(a) Gain linearly decreases with the running time (Function 11.1).
(b) Gain linearly decreases with the time logarithm (Function 11.3).
(c) Gain is a complex function of the time and grid size (Function 11.6).
Fig. 11.13. Performance with small values of the exploration knob on fifty-problem
sequences. The solid lines represent the differences between the gains with the knob
value 0.02 and that with the knob value 0.1. The dashed lines show the differences
between the 0.05-knob gains and the O.l-knob gains.
[Figure 11.14 appears here; its plots show the smoothed difference curves (left) and the average per-problem differences (right) for each row:]
(a) Gain linearly decreases with the running time (Function 11.1).
(b) Gain linearly decreases with the time logarithm (Function 11.3).
(c) Gain is a complex function of the time and grid size (Function 11.6).
Fig. 11.14. Performance with knob values 0.02 and 0.05 on 500-problem sequences.
We plot the smoothed gain-difference curves (left) and cumulative differences (right)
using the legend of Figure 11.13.
[Figure 11.15 appears here; its plots show, left to right, the detailed difference curve, the smoothed difference curve, and the average per-problem differences for each row:]
(a) Gain linearly decreases with the running time (Function 11.1).
(b) Gain linearly decreases with the time logarithm (Function 11.3).
(c) Gain is a complex function of the time and grid size (Function 11.6).
Fig. 11.15. Processing fifty-problem sequences with large knob values. We show
the differences between the 0.5-knob gains and 0.1-knob gains (solid lines), as well
as the differences between the 0.2-knob gains and 0.1-knob gains (dashed lines).
[Figure 11.16 appears here; its plots show the smoothed difference curves (left) and the average per-problem differences (right) for each row:]
(a) Gain linearly decreases with the running time (Function 11.1).
(b) Gain linearly decreases with the time logarithm (Function 11.3).
(c) Gain is a complex function of the time and grid size (Function 11.6).
Fig. 11.16. Processing 500-problem sequences with knob values 0.5 and 0.2. We
plot the smoothed gain-difference curves (left) and cumulative differences (right)
using the legend of Figure 11.15.
12. Extended Strips Domain

The STRIPS world is larger than the other domains; it includes ten ob-
ject types and twenty-three operators (Figures 3.38-3.41). We have tested
SHAPER with two linear and one nonlinear gain function (Figure 12.1).

12.1 Small-scale selection


We consider limited libraries of solvers and changers, which give rise to small
representation spaces.
Four descriptions. The first experiment involves two solver operators and
three changer operators (Figure 12.2a,b), which lead to four representations
(Figure 12.2c). We show the results of processing fifty-problem sequences
in Figure 12.4, and the results for 500-problem sequences in Figure 12.5.
The graphs include the learning curves (solid lines) and the performance of
three fixed strategies with optimal time bounds: the search with primary ef-
fects (dotted lines), abstraction (dashed lines), and problem-specific abstrac-
tion (dash-and-dot lines). The search without primary effects yields negative

Gain linearly decreases with the running time:

I - time, if success
. (12.1)
gn ={ -time, if failure or interrupt

Gain linearly decreases with the time and solution cost:

100 - cost - 50 . time, if success and cost < 100


gn ={ -50 . time, otherwise (12.2)

Gain is a nonlinear function of the time and solution cost:


100 - cost· time, if success and cost < 50
{ 100 - 50 . time, if success and cost> 50
(12.3)
gn = 50 - 50 . time, if failure -
-50 . time, if interrupt

Fig. 12.1. Gain functions in the Extended STRIPS domain.

E. Fink, Changes of Problem Representation


© Springer-Verlag Berlin Heidelberg 2002
300 12. Extended Strips Domain

gains, and we do not show its performance. The problem-independent ab-


straction is the most effective. When the system runs with the linear func-
tions, it correctly chooses this abstraction with appropriate time bounds. On
the other hand, it selects the second best description for Function 12.3.
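The three functions of Figure 12.1, transcribed directly as code (the transcription adds nothing beyond the formulas):

    # Direct transcription of Functions 12.1-12.3; 'cost' is the solution
    # cost, which is meaningful only when the outcome is "success".
    def gain_12_1(time, outcome):
        return (1 - time) if outcome == "success" else -time

    def gain_12_2(time, outcome, cost=None):
        if outcome == "success" and cost < 100:
            return 100 - cost - 50 * time
        return -50 * time

    def gain_12_3(time, outcome, cost=None):
        if outcome == "success":
            return (100 - cost * time) if cost < 50 else (100 - 50 * time)
        if outcome == "failure":
            return 50 - 50 * time
        return -50 * time  # interrupt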
Three solvers. The next experiment involves the LINEAR, SAVTA, and SABA
solvers without cost bounds. The system inputs a description with primary
effects and abstraction, and pairs it with the three solvers (Figure 12.3a).
In Figures 12.6 and 12.7, we show the learning curves (solid lines) and
the performance of each solver (the other lines). The outcome is similar to
the previous experiment; SHAPER finds the right strategy for Functions 12.1
and 12.2, but it chooses the second best solver for Function 12.3.
Three cost bounds. We have also experimented with three versions of
LINEAR (Figure 12.3b). The first version runs without cost bounds; the second
utilizes loose bounds, which are twice as large as the optimal-cost estimates; the
third employs tight bounds, which closely approximate the optimal costs. We
present the learning results in Figures 12.8 and 12.9; the system converges to
the right choice of a solver and time bound in all three cases.
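A sketch of the three heuristics follows; the procedure named estimate is hypothetical, since the cost-estimation mechanism is not given here.

    # A sketch of the three cost-bound versions of LINEAR; 'estimate'
    # stands for a hypothetical procedure that approximates the optimal
    # solution cost of a problem.
    def cost_bound(version, estimate):
        if version == "none":
            return None          # unbounded search
        if version == "loose":
            return 2 * estimate  # twice the optimal-cost estimate
        if version == "tight":
            return estimate      # close approximation of the optimal cost
        raise ValueError(version)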
Summary. The system has found appropriate strategies for the linear gain
functions, but made suboptimal choices for the nonlinear function. The learn-
ing behavior is similar to the results in the Machining and Sokoban domains;
the system converges to its final strategy after solving 50 to 200 problems. In
Table 12.1, we summarize the cumulative gains and the respective percent-
ages of the optimal-strategy gains. The percentage values for fifty-problem
sequences range from 17% to 84%, and the values for 500-problem sequences
are between 68% and 93%, which is similar to the Sokoban results.

Table 12.1. Summary of the experiments with small-scale selection tasks. We list the per-problem gains and the respective percentages of the optimal-strategy gains.

  gain        selection among                                             optimal
  function    four descriptions   three solvers     three cost bounds     gain
  short problem sequences
  12.1         0.048 (17%)         0.144 (50%)       0.162 (56%)           0.288
  12.2        11.1   (50%)        18.7   (84%)       8.8   (39%)          22.3
  12.3        34.5   (55%)        52.1   (83%)      44.5   (71%)          62.6
  long problem sequences
  12.1         0.164 (68%)         0.182 (76%)       0.214 (89%)           0.240
  12.2        14.4   (71%)        18.4   (91%)      18.3   (91%)          20.2
  12.3        40.6   (69%)        50.8   (87%)      54.5   (93%)          58.6
[Figure 12.2 appears here as a diagram: (a) the solver operators -- LINEAR without a cost bound (no applicability conditions) and Relator with Chooser, Abstractor, and LINEAR (applicable when the description has no primary effects and no abstraction); (b) the changer operators -- Chooser with Completer (applicable when the description has no primary effects and Abstractor has not been applied), Abstractor (applicable when the description has no abstraction), and Margie with Completer and Abstractor (applicable when the description has no primary effects and no abstraction); (c) the expanded space of four representations: LINEAR with the initial description (not shown in the gain plots), with problem-specific abstraction (dash-and-dot lines), with primary effects (dotted lines), and with primary effects and an abstraction hierarchy (dashed lines).]
Fig. 12.2. Experiments with the LINEAR search. The labels in the representation space specify the corresponding curves in Figures 12.4 and 12.5.
[Figure 12.3 appears here as a diagram of two expanded spaces built on the prim-and-hierarchy description:]
(a) Search without cost bounds; the labels refer to Figures 12.6 and 12.7.
(b) LINEAR with different bounds; the labels refer to Figures 12.8 and 12.9.

Fig. 12.3. Experiments without description changers. The first task is to select a
search engine (a), and the second is to choose a cost-bound heuristic (b).
[Figure 12.4 appears here; its plots show, left to right, the detailed gain curve, the smoothed gain curve, and the average per-problem gains for each row:]
(a) Gain linearly decreases with the running time (Function 12.1).
(b) Gain linearly decreases with the time and solution cost (Function 12.2).
(c) Gain is a nonlinear function of the time and solution cost (Function 12.3).
Fig. 12.4. Selection of a description on fifty-problem sequences. We show the raw
gains (left), smoothed gain curves (solid lines, middle), and cumulative gains (solid
lines, right). We also give the performance of each available description: the search
with primary effects (dotted lines), abstraction (dashed lines), and problem-specific
abstraction (dash-and-dot lines).
[Figure 12.5 appears here; its plots show the smoothed gain curves (left) and the average per-problem gains (right) for each row:]
(a) Gain linearly decreases with the running time (Function 12.1).
(b) Gain linearly decreases with the time and solution cost (Function 12.2).
(c) Gain is a nonlinear function of the time and solution cost (Function 12.3).
Fig. 12.5. Selection of a description on 500-problem sequences. We show the learning curves (solid lines) and the performance of each description (the other lines) using the legend of Figure 12.4.
[Figure 12.6 appears here; its plots show, left to right, the detailed gain curve, the smoothed gain curve, and the average per-problem gains for each row:]
(a) Gain linearly decreases with the running time (Function 12.1).
(b) Gain linearly decreases with the time and solution cost (Function 12.2).
(c) Gain is a nonlinear function of the time and solution cost (Function 12.3).
Fig. 12.6. Choosing among LINEAR, SAVTA, and SABA on fifty-problem sequences.
We compare the learning curves (solid lines) with the performance of LINEAR
(dashed lines), SAVTA (dotted lines), and SABA (dash-and-dot lines).
[Figure 12.7 appears here; its plots show the smoothed gain curves (left) and the average per-problem gains (right) for each row:]
(a) Gain linearly decreases with the running time (Function 12.1).
(b) Gain linearly decreases with the time and solution cost (Function 12.2).
(c) Gain is a nonlinear function of the time and solution cost (Function 12.3).
Fig. 12.7. Choosing among LINEAR, SAVTA, and SABA on 500-problem sequences. We show the learning curves (solid lines) and the performance of each solver (the other lines) using the legend of Figure 12.6.
[Figure 12.8 appears here; its plots show, left to right, the detailed gain curve, the smoothed gain curve, and the average per-problem gains for each row:]
(a) Gain linearly decreases with the running time (Function 12.1).
(b) Gain linearly decreases with the time and solution cost (Function 12.2).
(c) Gain is a nonlinear function of the time and solution cost (Function 12.3).
Fig. 12.8. Selection among three cost-bound heuristics on fifty-problem sequences (solid lines). These heuristics include unbounded search (dashed lines), loose cost bounds (dotted lines), and tight cost bounds (dash-and-dot lines).
[Figure 12.9 appears here; its plots show the smoothed gain curves (left) and the average per-problem gains (right) for each row:]
(a) Gain linearly decreases with the running time (Function 12.1).
(b) Gain linearly decreases with the time and solution cost (Function 12.2).
(c) Gain is a nonlinear function of the time and solution cost (Function 12.3).
Fig. 12.9. Selection among three cost-bound heuristics on 500-problem sequences.
The legend is the same as in Figure 12.8.

12.2 Large-scale selection

If SHAPER uses the solvers in Figure 12.10 and the changers in Figure 12.2(b),
it generates twelve representations (Figure 12.11), and it identifies the right
representation and time bound after processing 150 to 300 problems (dotted
lines in Figures 12.12 and 12.13). We compare the learning curves with a
smaller-scale experiment (dash-and-dot lines), which involves the four repre-
sentations in Figure 12.2, and with the best available strategy (dashed lines),
which is based on LINEAR with problem-independent abstraction.
We have also experimented with eighteen solver operators, which include
the operators in Figure 12.10 and their combinations with the cost-bound
heuristics. The system has paired the initial description with all eighteen
solver operators, and the other two descriptions with nine operators, thus
producing thirty-six representations. It has correctly determined that LINEAR
with abstraction and no cost bounds is the best choice for all three gain
functions, and found the right time bounds for the linear functions. When
SHAPER has run with the nonlinear gain function, it has chosen a wrong time
bound (0.92 seconds), which is smaller than the optimal (1.55 seconds).
In Figures 12.12 and 12.13, we give the learning curves (solid lines),
and compare them with the optimal strategy (dashed lines) and with two
smaller-scale selection tasks (dotted and dash-and-dot lines). In Figures 12.14
and 12.15, we compare the same curves with the results of Section 12.1. The
large representation space causes greater initial losses and slower convergence;
the system processes 300 to 500 problems before choosing its final strategy.
The tests have confirmed SHAPER's ability to choose from a sizable suite of
strategies. The system discards most representations in the beginning of the
learning process, and then gradually selects among near-optimal strategies.
In Table 12.2, we list the gains obtained with thirty-six representations and
compare them with the results of smaller-scale selection tasks.

Table 12.2. Average per-problem gains in the experiments with thirty-six representations, and analogous results for two smaller-scale selection tasks.

  gain        selection among                                                      optimal
  function    thirty-six repr.    twelve repr.       four descriptions             gain
              (solid lines)       (dotted lines)     (dash-and-dot lines)          (dashed lines)
  short problem sequences
  12.1        -0.128    -         -0.104    -         0.048 (17%)                   0.288
  12.2        -29.5     -         -17.4     -        11.1   (50%)                  22.3
  12.3         -6.3     -         10.6   (17%)       34.5   (55%)                  62.6
  long problem sequences
  12.1         0.108 (45%)         0.145 (60%)        0.164 (68%)                   0.240
  12.2        11.7   (58%)        14.8   (73%)       14.4   (71%)                  20.2
  12.3        38.5   (66%)        43.9   (75%)       40.6   (69%)                  58.6
[Figure 12.10 appears here; it lists LINEAR, SAVTA, and SABA without cost bounds (each with no applicability conditions), along with the Relator-Chooser-Abstractor version of each solver (applicable when the description has no primary effects and no abstraction).]
Fig. 12.10. Six solver operators.


---------------1~--------------------------------

l
I

initial " I

~
I
E,\ " ~ ~ I
I
"
"
"
LINEAR wlo
cost bound
SAVTA wlo
cost bound
I I cost
SABA wlo
bound
I
I
I
" I
"
::
"
( initial ) ( initial ) ( initial )
I
I
I

~
I
I
I

~ ~ I
I
I
Relator Relator Relator I
I
Chooser Chooser Chooser
Abstractor Abstractor Abstractor
LINEAR wlo SAVTA wlo SABA wlo
ICompleter
Chooser I cost bound cost bound cost bound
( initial ) ( initial ) ( initial )
W
( prim
~ ~ ~
Margie
Completer
LINEAR wlo
cost bound
SAVTA w/o.1
cost bound
II SABA wlo
cost bound
Abstractor
~
( prim ) ( !2rim ) ( I!rim )
( Krimand
ierarchy
" ~ ~ ~
"
"
"
LINEAR wlo
cost bound
SAVTA wlo
cost bound
I II SABA wlo
cost bound
"
"
" Krimand K[imand ) I( Krimand
" ierarchy 'erarchy ierarchy
"
"
"
-space of descriptions _,!.. - - - - - - - space of representations -------•

Fig. 12.11. Space of twelve representations.


[Figure 12.12 comprises nine plots: detailed gain curves, smoothed gain curves, and average per-problem gains for (a) Function 12.1, where gain linearly decreases with the running time; (b) Function 12.2, where gain linearly decreases with the time and solution cost; and (c) Function 12.3, where gain is a nonlinear function of the time and solution cost.]

Fig. 12.12. Comparison of three selection tasks on fifty-problem sequences. The
tasks are to choose among thirty-six representations (solid lines), among the twelve
representations in Figure 12.11 (dotted lines), and among the four descriptions in
Figure 12.2 (dash-and-dot lines). We also show the performance of the best available
representation (dashed lines).
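A note on reading these plots: the smoothed gain curve and the average per-problem gains both derive from the raw per-problem gains. The sketch below is our own illustration (the smoothing scheme and window size are assumptions, since the text does not specify them):

    def smoothed_curve(gains, window=10):
        """Moving-average smoothing of raw per-problem gains; the window
        size is an arbitrary illustrative choice."""
        return [sum(gains[max(0, i + 1 - window): i + 1]) / min(i + 1, window)
                for i in range(len(gains))]

    def average_gains(gains):
        """Average per-problem gain after each problem, as plotted in the
        right-hand panels."""
        curve, total = [], 0.0
        for i, g in enumerate(gains, 1):
            total += g
            curve.append(total / i)
        return curve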
[Figure 12.13 comprises six plots: smoothed gain curves and average per-problem gains for (a) Function 12.1, (b) Function 12.2, and (c) Function 12.3.]

Fig. 12.13. Comparison of three selection tasks on 500-problem sequences. The
tasks are to choose among thirty-six representations (solid lines), among twelve
representations (dotted lines), and among four descriptions (dash-and-dot lines).
We also show the performance of the best available representation (dashed lines).
[Figure 12.14 comprises nine plots: detailed gain curves, smoothed gain curves, and average per-problem gains for (a) Function 12.1, (b) Function 12.2, and (c) Function 12.3.]

Fig. 12.14. Selection among thirty-six representations on fifty-problem sequences.
We plot the learning curves (solid lines) along with analogous data for three small-
scale tasks: the choice among four descriptions (dash-and-dot lines), among three
solvers (dotted lines), and among three time-bound heuristics (dashed lines).
[Figure 12.15 comprises six plots: smoothed gain curves and average per-problem gains for (a) Function 12.1, (b) Function 12.2, and (c) Function 12.3.]

Fig. 12.15. Selection among thirty-six representations on 500-problem sequences.
We plot the learning curves (solid lines) along with analogous data for the choice
among four descriptions (dash-and-dot lines), among three solvers (dotted lines),
and among three time-bound heuristics (dashed lines).

12.3 Different time bounds

We have re-run the large-scale selection experiments with different values of
the exploration knob using the same setup as in Section 10.3. We give the re-
sults for small knob values in Figures 12.16 and 12.17, and the results for large
knob values in Figures 12.18 and 12.19. For each value, we plot the differ-
ence between the resulting gains and the default-setting gains. The tests have
confirmed that the default setting gives near-optimal results (Table 12.3).

Table 12.3. Average per-problem gains for different values of the exploration knob,
along with the corresponding percentages of the default-strategy gains.

gain        small knob values               default    large knob values
function    0.02             0.05           0.1        0.2             0.5
short problem sequences
12.1        -0.119    --     -0.150   --    -0.128     -0.119    --    -0.124    --
12.2       -33.0      --    -32.9     --   -29.5      -29.1      --   -29.7      --
12.3        -9.68     --    -10.46    --    -6.27      -7.14     --    -6.35     --
long problem sequences
12.1        0.106   (98%)    0.088  (81%)   0.108      0.110  (102%)   0.096   (89%)
12.2        9.3     (79%)   10.1    (86%)  11.7       10.5    (90%)    8.9     (76%)
12.3       33.5     (87%)   37.6    (98%)  38.5       40.9   (106%)   39.2    (102%)
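The curves in Figures 12.16 through 12.19 compare each knob setting against the default; the sketch below (our illustration, not the original code) computes the plotted quantities from two gain sequences:

    def difference_curves(knob_gains, default_gains):
        """Per-problem differences between the gains for a given knob value
        and the default-setting gains, along with their running averages."""
        diffs = [k - d for k, d in zip(knob_gains, default_gains)]
        averages, total = [], 0.0
        for i, d in enumerate(diffs, 1):
            total += d
            averages.append(total / i)
        return diffs, averages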
[Figure 12.16 comprises nine plots: detailed difference curves, smoothed difference curves, and average per-problem differences for (a) Function 12.1, (b) Function 12.2, and (c) Function 12.3.]

Fig. 12.16. Performance with small knob values on fifty-problem sequences. The
solid lines show the differences between the 0.02-knob gains and 0.1-knob gains. The
dashed lines mark the differences between the 0.05-knob gains and 0.1-knob gains.
[Figure 12.17 comprises six plots: smoothed difference curves and average per-problem differences for (a) Function 12.1, (b) Function 12.2, and (c) Function 12.3.]

Fig. 12.17. Performance with small knob values on 500-problem sequences. We plot
the differences between the 0.02-knob gains and 0.1-knob gains (solid lines), as well
as the differences between the 0.05-knob gains and 0.1-knob gains (dashed lines).
[Figure 12.18 comprises nine plots: detailed difference curves, smoothed difference curves, and average per-problem differences for (a) Function 12.1, (b) Function 12.2, and (c) Function 12.3.]

Fig. 12.18. Processing fifty-problem sequences with large knob values. We plot
the differences between the 0.5-knob gains and 0.1-knob gains (solid lines), as well
as the differences between the 0.2-knob gains and 0.1-knob gains (dashed lines).
[Figure 12.19 comprises six plots: smoothed difference curves and average per-problem differences for (a) Function 12.1, (b) Function 12.2, and (c) Function 12.3.]

Fig. 12.19. Processing 500-problem sequences with large knob values. We plot the
differences between the 0.5-knob gains and 0.1-knob gains (solid lines), as well as
the differences between the 0.2-knob gains and 0.1-knob gains (dashed lines).
13. Logistics Domain

The last series of experiments is based on the Logistics Domain [Veloso, 1994],
which includes eight object types, six operators, and two inference rules (Fig-
ure 3.43). We have experimented with three linear gain functions and three
nonlinear functions (Figure 13.1).

13.1 Selecting a description and solver

We have experimented with the representation spaces in Figures 13.2 and 13.3.
First, SHAPER has run with three changers and generated two descriptions
(Figure 13.2). We have tested the system with the linear gain functions, and
it has chosen the right description and time bound for each function (Fig-
ures 13.4 and 13.5). The second experiment has involved three solvers with
abstraction (Figure 13.3). The system has correctly determined that LINEAR
is the best solver, and found the right time bound for each gain function
(Figures 13.6 and 13.7). It has converged to the right strategy after process-
ing thirty to one hundred problems. In Table 13.1, we summarize the gains and
compare them with the optimal-strategy results.

Table 13.1. Average per-problem gains in the small-scale selection experiments
(Figures 13.4-13.7) and the respective percentages of the optimal-strategy gains.

gain        selection among two     selection among three     optimal
function    descriptions            solvers                   gain
short problem sequences
13.1         0.325 (77%)             0.225 (53%)               0.424
13.2        13.6   (75%)            13.9   (77%)              18.1
13.3        55.4   (84%)            54.6   (83%)              65.7
long problem sequences
13.1         0.397 (95%)             0.360 (87%)               0.416
13.2        21.6   (95%)            19.1   (84%)              22.8
13.3        58.3   (97%)            54.2   (90%)              60.1


Linear functions

Gain linearly decreases with the running time:

    gn = { 1 - time,   if success
         { -time,      if failure or interrupt                              (13.1)

Gain depends on the time and solution cost:

    gn = { 100 - cost - 50 · time,   if success and cost < 100
         { -50 · time,               otherwise                              (13.2)

Reward is proportional to the number of packages:

    gn = { 50 · n-packs - cost - 50 · time,   if success and cost < 50 · n-packs
         { -50 · time,                        otherwise                     (13.3)

Nonlinear functions

Gain linearly decreases with the cube of the running time:

    gn = { 1 - time^3,   if success
         { -time^3,      if failure or interrupt                            (13.4)

SHAPER has to find a solution with cost less than 50:

    gn = { 2 - time,   if success and cost < 50
         { -time,      otherwise                                            (13.5)

Payment for the unit time is proportional to the number of packages:

    gn = { 4 - n-packs · time,   if success
         { -n-packs · time,      if failure or interrupt                    (13.6)

Fig. 13.1. Gain functions in the Logistics Domain.
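For concreteness, the six gain functions translate directly into code. The sketch below is our Python transcription of Figure 13.1 (SHAPER itself is implemented in Lisp); success, time, cost, and n_packs stand for the outcome, running time, solution cost, and number of packages:

    def gain_13_1(success, time):                      # Function 13.1
        return 1 - time if success else -time

    def gain_13_2(success, time, cost):                # Function 13.2
        return 100 - cost - 50 * time if success and cost < 100 else -50 * time

    def gain_13_3(success, time, cost, n_packs):       # Function 13.3
        if success and cost < 50 * n_packs:
            return 50 * n_packs - cost - 50 * time
        return -50 * time

    def gain_13_4(success, time):                      # Function 13.4
        return 1 - time ** 3 if success else -time ** 3

    def gain_13_5(success, time, cost):                # Function 13.5
        return 2 - time if success and cost < 50 else -time

    def gain_13_6(success, time, n_packs):             # Function 13.6
        return 4 - n_packs * time if success else -n_packs * time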


[Figure 13.2 shows (a) three changer operators: Chooser followed by Completer, applicable when the description has no primary effects and Abstractor has not been applied; Abstractor, applicable when the description has no abstraction; and Margie followed by Completer and Abstractor, applicable when the description has no primary effects and no abstraction. Part (b) shows the expanded space of descriptions and space of representations, with the initial description (dash-and-dot lines) and the abstraction hierarchy (dashed lines).]

Fig. 13.2. Experiments with three changer operators. Abstractor builds a four-level
hierarchy, whereas the other changers fail to generate new descriptions.

[Figure 13.3 shows the hierarchy description paired with LINEAR w/o cost bound (dashed lines), SAVTA w/o cost bound (dotted lines), and SABA w/o cost bound (dash-and-dot lines), within the space of descriptions and the space of representations.]

Fig. 13.3. Experiments without description changers.


[Figure 13.4 comprises nine plots: detailed gain curves, smoothed gain curves, and average per-problem gains for (a) Function 13.1, where gain linearly decreases with the running time; (b) Function 13.2, where gain linearly decreases with the time and solution cost; and (c) Function 13.3, where reward is proportional to the number of packages.]

Fig. 13.4. Choosing between the initial description and abstraction on fifty-
problem sequences. We show the learning curves (solid lines), and the performance
of the initial description (dash-and-dot lines) and abstraction (dashed lines).
[Figure 13.5 comprises six plots: smoothed gain curves and average per-problem gains for (a) Function 13.1, (b) Function 13.2, and (c) Function 13.3.]

Fig. 13.5. Choosing between the initial description and abstraction on 500-problem
sequences. We plot the smoothed gain curves (left) and cumulative gains (right)
using the legend of Figure 13.4.
[Figure 13.6 comprises nine plots: detailed gain curves, smoothed gain curves, and average per-problem gains for (a) Function 13.1, (b) Function 13.2, and (c) Function 13.3.]

Fig. 13.6. Choosing among LINEAR, SAVTA, and SABA on fifty-problem sequences.
We give the learning curves (solid lines) and the performance of LINEAR (dashed
lines), SAVTA (dotted lines), and SABA (dash-and-dot lines).
[Figure 13.7 comprises six plots: smoothed gain curves and average per-problem gains for (a) Function 13.1, (b) Function 13.2, and (c) Function 13.3.]

Fig. 13.7. Choosing among LINEAR, SAVTA, and SABA on 500-problem sequences.
We plot the smoothed gain curves (left) and cumulative gains (right) using the
legend of Figure 13.6.

13.2 Twelve representations

We use LINEAR, SAVTA, and SABA without cost bounds and with loose-bound
heuristics, along with the changers in Figure 13.2(a), which lead to twelve
representations (Figure 13.8). We have experimented with linear gain func-
tions (Figures 13.9 and 13.10) and with nonlinear functions (Figures 13.11
and 13.12). LINEAR with abstraction and no cost bounds is the most effec-
tive strategy, and SHAPER has correctly chosen it in all cases. The system
has found the right time bounds for Functions 13.1-13.5; however, it has un-
derestimated the optimal bound for Function 13.6, and the selected bound
(0.99 seconds) has given worse results than the optimal bound (1.15 seconds).
The convergence is slower than in the other domains; SHAPER finds the
right strategy after processing 300 to 500 problems. On the other hand, the
percentage values of the cumulative gains (Table 13.2) are no smaller than
those in the Sokoban and STRIPS domains. The slow convergence is due to
several close-to-optimal representations, which cause the system to "hesitate"
among them. Since SHAPER discards ineffective strategies during the early
stages of learning, its later hesitation has little effect on the performance.

Table 13.2. Average per-problem gains in the experiments with twelve represen-
tations, and analogous results for two smaller-scale selection tasks.

gain        twelve                 two                      three               optimal
function    representations        descriptions             solvers             gain
            (solid lines)          (dash-and-dot lines)     (dotted lines)      (dashed lines)
short problem sequences
13.1        -0.314   --             0.325 (77%)              0.225 (53%)         0.424
13.2         3.4   (25%)           13.6   (75%)             13.9   (77%)        18.1
13.3        12.3   (22%)           55.4   (84%)             54.6   (83%)        65.7
13.4         0.071 (16%)            0.339 (75%)              0.432 (95%)         0.454
13.5         0.074 (15%)            0.274 (55%)              0.398 (79%)         0.501
13.6         0.083  (5%)            1.388 (84%)              1.262 (77%)         1.646
long problem sequences
13.1         0.268 (64%)            0.397 (95%)              0.360 (87%)         0.416
13.2        14.1   (62%)           21.6   (95%)             19.1   (84%)        22.8
13.3        37.7   (63%)           58.3   (97%)             54.2   (90%)        60.1
13.4         0.365 (70%)            0.454 (87%)              0.498 (96%)         0.521
13.5         0.423 (70%)            0.385 (64%)              0.582 (96%)         0.605
13.6         0.712 (44%)            1.511 (92%)              1.529 (94%)         1.634

[Figure 13.8 shows a diagram of the space of twelve representations: the initial description and the abstraction hierarchy built by Abstractor, each paired with LINEAR, SAVTA, and SABA, both without cost bounds and with loose-bound heuristics; the diagram separates the space of descriptions from the space of representations.]

Fig. 13.8. Space of twelve representations.


[Figure 13.9 comprises nine plots: detailed gain curves, smoothed gain curves, and average per-problem gains for (a) Function 13.1, (b) Function 13.2, and (c) Function 13.3.]

Fig. 13.9. Choosing representations for the linear gain functions on fifty-problem
sequences. We show the results of selection among twelve representations (solid
lines), among two descriptions (dash-and-dot lines), and among three solvers (dot-
ted lines). We also plot the performance of the optimal strategy (dashed lines),
which is based on LINEAR with abstraction and no cost bounds.
[Figure 13.10 comprises six plots: smoothed gain curves and average per-problem gains for (a) Function 13.1, (b) Function 13.2, and (c) Function 13.3.]

Fig. 13.10. Choosing representations for the linear gain functions on 500-problem
sequences. The legend is the same as in Figure 13.9.
[Figure 13.11 comprises nine plots: detailed gain curves, smoothed gain curves, and average per-problem gains for (a) Function 13.4, where gain linearly decreases with the cube of the running time; (b) Function 13.5, where SHAPER gets a reward only for a low-cost solution; and (c) Function 13.6, where gain depends on the time and problem size.]

Fig. 13.11. Choosing representations for the nonlinear gain functions on fifty-
problem sequences. We show the selection among twelve representations (solid
lines), among two descriptions (dash-and-dot lines), and among three solvers (dot-
ted lines), as well as the performance of the optimal strategy (dashed lines).
[Figure 13.12 comprises six plots: smoothed gain curves and average per-problem gains for (a) Function 13.4, (b) Function 13.5, and (c) Function 13.6.]

Fig. 13.12. Choosing representations for the nonlinear gain functions on 500-
problem sequences. The legend is the same as in Figure 13.11.

13.3 Different time bounds

Experiments with different values of the exploration knob have confirmed
that the default value usually ensures near-optimal performance. For each
value, we have re-run the tests of Section 13.2 with Functions 13.1, 13.3,
and 13.6. We present the difference curves for small knob values in Fig-
ures 13.13 and 13.14, the difference curves for large knob values in Fig-
ures 13.15 and 13.16, and the cumulative gains in Table 13.3. The optimal
choice of a knob value depends on the gain function, but the default setting
usually gives satisfactory results.

Table 13.3. Average per-problem gains for different values of the exploration knob,
along with the corresponding percentages of the default-strategy gains.

gain        small knob values                default    large knob values
function    0.02              0.05           0.1        0.2              0.5
short problem sequences
13.1        -0.338     --     -0.319   --    -0.314     -0.323     --    -0.373     --
13.3        14.3     (80%)    16.1   (90%)   17.8       10.2     (57%)   13.9     (78%)
13.6         0.090  (108%)     0.201 (242%)   0.083      0.221  (266%)    0.233  (281%)
long problem sequences
13.1         0.050   (19%)     0.256  (96%)   0.268      0.227   (85%)    0.173   (65%)
13.3        42.0     (91%)    41.5   (90%)   46.2       44.8    (97%)    44.7    (97%)
13.6         0.491   (69%)     0.653  (92%)   0.712      0.802  (113%)    0.918  (129%)
[Figure 13.13 comprises nine plots: detailed difference curves, smoothed difference curves, and average per-problem differences for (a) Function 13.1, where gain linearly decreases with the running time; (b) Function 13.3, where reward is proportional to the number of packages; and (c) Function 13.6, where gain depends on the time and problem size.]

Fig. 13.13. Performance with small values of the exploration knob on fifty-problem
sequences. We show the differences between the 0.02-knob gains and 0.1-knob gains
(solid lines), as well as the differences between the 0.05-knob gains and 0.1-knob
gains (dashed lines).
[Figure 13.14 comprises six plots: smoothed difference curves and average per-problem differences for (a) Function 13.1, (b) Function 13.3, and (c) Function 13.6.]

Fig. 13.14. Performance with the knob values 0.02 and 0.05 on 500-problem se-
quences. We give the smoothed gain-difference curves (left) and cumulative differ-
ences (right) using the legend of Figure 13.13.
[Figure 13.15 comprises nine plots: detailed difference curves, smoothed difference curves, and average per-problem differences for (a) Function 13.1, (b) Function 13.3, and (c) Function 13.6.]

Fig. 13.15. Performance with large values of the exploration knob on fifty-problem
sequences. We show the differences between the 0.5-knob gains and 0.1-knob gains
(solid lines), as well as the differences between the 0.2-knob gains and 0.1-knob
gains (dashed lines).
[Figure 13.16 comprises six plots: smoothed difference curves and average per-problem differences for (a) Function 13.1, (b) Function 13.3, and (c) Function 13.6.]

Fig. 13.16. Performance with the knob values 0.5 and 0.2 on 500-problem se-
quences. We give the smoothed gain-difference curves (left) and cumulative differ-
ences (right) using the legend of Figure 13.15.
Concluding remarks

A book is never finished, it is only published; therefore, I am solely responsible
for all errors and omissions.
    - Derick Wood [1993], Data Structures, Algorithms, and Performance.

The main contribution of the reported work is an architecture for improv-
ing representations in AI problem solving. The architecture is based on two
principles proposed by Simon during his studies of human problem solving:
1. "A representation consists of both data structures and programs oper-
ating on them" [Larkin and Simon, 1987]. A representation change may
involve not only modification of the data structures that describe a prob-
lem, but also selection of an appropriate search procedure.
2. "The same processes that are ordinarily used to search within problem
space can be used to search for problem space (problem representa-
tion)." When a system operates with multiple representations, its be-
havior "could be divided into two segments, alternating between search
for a solution in some problem space and search for a new problem space"
[Kaplan and Simon, 1990].
We have implemented SHAPER in Allegro Common Lisp and integrated
it with PRODIGY4. The size of the code is about 60,000 lines, which includes
PRODIGY (20,000 lines), description changers (15,000 lines), and control mod-
ule (25,000 lines).

Architectural decisions

The key architectural decisions underlying SHAPER follow from Simon's two
principles. We have defined a representation as a combination of a domain
description and solver, and subdivided the process of representation change
into the generation of a new description and the selection of a solver. The
system alternates between the construction of new representations and their
use in problem solving.
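In code terms, this definition of a representation amounts to a small pairing (a hypothetical rendering; SHAPER's actual data structures are Lisp structures):

    from collections import namedtuple

    # A representation pairs a domain description with a solver; changing
    # the representation means changing either component.
    Representation = namedtuple("Representation", ["description", "solver"])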
The evaluation of representations is based on a general utility model,
which accounts for the percentage of solved problems, efficiency of search,

and quality of the resulting solutions. We have developed a statistical tech-
nique for the automated selection of representations, and integrated it with
heuristics that guide the search in the representation space.
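As a rough caricature of this selection step (a simplified sketch under our own assumptions; the actual statistical technique also weighs exploration and the choice of time bounds), one can estimate each strategy's expected gain from its past gains and pick the best estimate:

    def expected_gain(sample):
        """Estimate the expected per-problem gain of a strategy from a
        sample of its past gains (a crude stand-in for the statistical
        model described in the text)."""
        return sum(sample) / len(sample) if sample else float("-inf")

    def select_strategy(history):
        """history maps (representation, time_bound) pairs to lists of
        past gains; return the pair with the largest estimate."""
        return max(history, key=lambda pair: expected_gain(history[pair]))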
The work on SHAPER has led to the following results, which correspond
to the main components of the system:
• Automated selection and use of primary effects in AI problem solving.
• Abstraction for the PRODIGY language and its synergy with primary effects.
• Statistical analysis and prediction of the performance.
• Automated exploration of the representation space.

Advantages and limitations

The control architecture does not rely on properties of specific solvers or
changers. We can use it with any collection of search and learning algorithms
that have a common input language. On the other hand, the current imple-
mentation is based on the PRODIGY data structures. The construction of a
system-independent version of the control mechanism is an important engi-
neering problem.
The construction of new representations involves search at three differ-
ent levels. Description changers form the first level; every changer searches
in its own limited space of descriptions. For example, Chooser and Com-
pleter explore alternative selections of primary effects, Abstractor searches
for a fine-grained ordered hierarchy, and Refiner operates with alternative
partial instantiations of predicates. The second level is the expansion of the
global description space; the system invokes changers to generate new nodes
of this space. Finally, the third level is the selection of solvers for the resulting
descriptions. This three-level structure limits the size of the representation
space; it prevents the combinatorial explosion in the number of representa-
tions, reported by Korf [1980].
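The sketch below (hypothetical names, our illustration only) renders the second of these levels: changers expand the space of descriptions, subject to their applicability conditions.

    def expand_descriptions(frontier, changers):
        """Second-level search: invoke every applicable changer on every
        description node, producing new nodes of the description space.
        Each changer is a (name, is_applicable, apply_changer) triple,
        where is_applicable encodes conditions such as 'the description
        has no abstraction'."""
        return [apply_changer(d) for d in frontier
                for name, is_applicable, apply_changer in changers
                if is_applicable(d)]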
The problem-solving power of the developed architecture is limited by
the capabilities of the embedded solvers and changers. The system learns to
make the best use of the available algorithms, but it cannot go beyond their
potential. For example, SHAPER cannot learn macro operators or control
rules because it does not include appropriate changers. To our knowledge,
the only system without similar limitations is Korf's [1980] universal engine
for generating new representations, which systematically explores the huge
space of all isomorphic and homomorphic transformations of a given problem.

Empirical evaluation

We have tested SHAPER in several domains with a variety of problems and
gain functions. The results have confirmed the feasibility of our approach,
and demonstrated the main properties of the developed algorithms:

• Primary effects and ordered abstraction are powerful tools for reducing the
search, but their impact varies across domains, ranging from a thousand-
fold speed-up to a substantial slow-down; thus, the system's performance
depends on the choice among the implemented search-reduction tools.
• The statistical algorithm always chooses an optimal or near-optimal com-
bination of a description, solver, and time bound; its effectiveness does not
depend on the properties of specific solvers, domains, or gain functions.
• Control heuristics enhance the system's performance in the initial stages
of the statistical learning, when the accumulated data are insufficient for
an accurate selection of representations.
The empirical evaluation of the control architecture has two major lim-
itations. First, we have tested this architecture only with PRODIGY solvers.
Second, we have constructed only a small library of solvers and changers, and
the number of candidate representations has varied from two to thirty-six.
Applying the developed techniques to a larger library of AI algorithms is an
important research direction.

Future challenges

The described work is a step in exploring two broad subareas of artificial
intelligence: the automated improvement of representations and the integra-
tion of multiple AI algorithms. We have shown a close relationship between
these subareas and proposed an approach to addressing them.
The main open problem is to extend the control mechanism and build a
large library of search and learning engines, which may include both general
and domain-specific algorithms. The grand challenge is to develop an archi-
tecture that integrates thousands of AI engines and domain descriptions in
the same way as an operating system integrates file-processing programs. It
must provide standard protocols for communication among search and learn-
ing procedures, and support the routine inclusion of new domains, AI algo-
rithms, and control techniques.
Other major challenges include developing a unified theory of search with
multiple representations, automating the synthesis of specialized description
changers, and investigating the representation changes performed by human
problem solvers.
References

[Aho et al., 1974] Alfred V. Aho, John E. Hopcroft, and Jeffrey D. Ullman (1974).
The Design and Analysis of Computer Algorithms. Addison-Wesley Publishers,
Reading, MA.
[Allen et al., 1992] John A. Allen, Pat Langley, and Stan Matwin (1992). Knowl-
edge and regularity in planning. In Proceedings of the AAAI 1992 Spring Sympo-
sium on Computational Considerations in Supporting Incremental Modification
and Reuse, pages 7-12.
[Allen and Minton, 1996] John A. Allen and Steven Minton (1996). Selecting the
right heuristic algorithm: Runtime performance predictors. In Gordon McCalla,
editor, Advances in Artificial Intelligence: The Eleventh Biennial Conference of
the Canadian Society for Computational Studies of Intelligence, pages 41-53.
Springer-Verlag, Berlin, Germany.
[Amarel, 1961] Saul Amarel (1961). An approach to automatic theory formation.
In Heinz M. Von Foerster, editor, Principles of Self-Organization: Transactions.
Pergamon Press, New York, NY.
[Amarel, 1965] Saul Amarel (1965). Problem solving procedures for efficient syn-
tactic analysis. In Proceedings of the ACM Twentieth National Conference.
[Amarel, 1968] Saul Amarel (1968). On representations of problems of reasoning
about actions. In Donald Michie, editor, Machine Intelligence 3, pages 131-171.
American Elsevier Publishers, New York, NY.
[Amarel, 1971] Saul Amarel (1971). Representations and modeling in problems of
program formation. In Bernard Meltzer and Donald Michie, editors, Machine
Intelligence 6, pages 411-466. American Elsevier Publishers, New York, NY.
[Anthony and Biggs, 1992] Martin Anthony and Norman Biggs (1992). Computa-
tional Learning Theory. Cambridge University Press, Cambridge, United King-
dom.
[Bacchus and Yang, 1991] Fahiem Bacchus and Qiang Yang (1991). The down-
ward refinement property. In Proceedings of the Twelfth International Joint
Conference on Artificial Intelligence, pages 286-291.
[Bacchus and Yang, 1992] Fahiem Bacchus and Qiang Yang (1992). The expected
value of hierarchical problem-solving. In Proceedings of the Tenth National
Conference on Artificial Intelligence, pages 369-374.
[Bacchus and Yang, 1994] Fahiem Bacchus and Qiang Yang (1994). Downward
refinement and the efficiency of hierarchical problem solving. Artificial Intelli-
gence, 71(1):43-100.
[Backstrom and Jonsson, 1995] Christer Backstrom and Peter Jonsson (1995).
Planning with abstraction hierarchies can be exponentially less efficient. In
Proceedings of the Fourteenth International Joint Conference on Artificial In-
telligence, pages 1599-1604.

[Barrett and Weld, 1994] Anthony Barrett and Daniel S. Weld (1994). Partial
order planning: Evaluating possible efficiency gains. Artificial Intelligence,
67(1):71-112.
[Blum and Furst, 1997] Avrim L. Blum and Merrick L. Furst (1997). Fast planning
through planning graph analysis. Artificial Intelligence, 90(1-2):281-300.
[Blumer et al., 1987] Anselm Blumer, Andrzej Ehrenfeucht, David Haussler, and
Manfred K. Warmuth (1987). Occam's razor. Information Processing Letters,
24(6):377-380.
[Blythe and Reilly, 1993a] Jim Blythe and W. Scott Reilly (1993). Integrating re-
active and deliberative planning for agents. Technical Report CMU-CS-93-155,
School of Computer Science, Carnegie Mellon University.
[Blythe and Reilly, 1993b] Jim Blythe and W. Scott Reilly (1993). Integrating
reactive and deliberative planning in a household robot. In AAAI Fall Symposium
on Instantiating Real-World Agents, pages 6-13.
[Blythe and Veloso, 1992] Jim Blythe and Manuela M. Veloso (1992). An analysis
of search techniques for a totally-ordered nonlinear planner. In Proceedings of
the First International Conference on Artificial Intelligence Planning Systems,
pages 13-19.
[Boehm-Davis et al., 1989] Deborah A. Boehm-Davis, Robert W. Holt, Matthew
Koll, Gloria Yastrop, and Robert Peters (1989). Effects of different database
formats on information retrieval. Human Factors, 31(5):579-592.
[Borrajo and Veloso, 1996] Daniel Borrajo and Manuela M. Veloso (1996). Lazy
incremental learning of control knowledge for efficiently obtaining quality plans.
Artificial Intelligence Review, 10:1-34.
[Breese and Horvitz, 1990] John S. Breese and Eric J. Horvitz (1990). Ideal refor-
mulation of belief networks. In Proceedings of the Sixth Conference on Uncer-
tainty in Artificial Intelligence, pages 64-72.
[Carbonell, 1983] Jaime G. Carbonell (1983). Learning by analogy: Formulating
and generalizing plans from past experience. In Ryszard S. Michalski, Jaime G.
Carbonell, and Tom M. Mitchell, editors, Machine Learning: An Artificial In-
telligence Approach, pages 371-392. Tioga Publishers, Palo Alto, CA.
[Carbonell, 1990] Jaime G. Carbonell, editor (1990). Machine Learning: Paradigms
and Methods. MIT Press, Cambridge, MA.
[Carbonell et al., 1992] Jaime G. Carbonell, Jim Blythe, Oren Etzioni, Yolanda
Gil, Robert Joseph, Dan Kahn, Craig A. Knoblock, Steven Minton, Alicia Perez,
Scott Reilly, Manuela M. Veloso, and Xuemei Wang (1992). PRODIGY4.0: The
manual and tutorial. Technical Report CMU-CS-92-150, School of Computer
Science, Carnegie Mellon University.
[Carbonell and Gil, 1990] Jaime G. Carbonell and Yolanda Gil (1990). Learning by
experimentation: The operator refinement method. In Ryszard S. Michalski and
Yves Kodratoff, editors, Machine Learning: An Artificial Intelligence Approach,
volume 3, pages 191-213. Morgan Kaufmann, San Mateo, CA.
[Carbonell et al., 1991] Jaime G. Carbonell, Craig A. Knoblock, and Steven Minton
(1991). PRODIGY: An integrated architecture for planning and learning. In
Kurt VanLehn, editor, Architectures for Intelligence, pages 241-278. Lawrence
Erlbaum Associates, Mahwah, NJ.
[Carre, 1971] B. A. Carre (1971). An algebra for network routing problems. Journal
of the Institute of Mathematics and Its Applications, 7:273-294.
[Chapman, 1987] David Chapman (1987). Planning for conjunctive goals. Artificial
Intelligence, 32:333-377.
[Cheng and Carbonell, 1986] Patricia W. Cheng and Jaime G. Carbonell (1986).
The FERMI system: Inducing iterative macro-operators from experience. In

Proceedings of the Fifth National Conference on Artificial Intelligence, pages
490-495.
[Christensen, 1990] Jens Christensen (1990). A hierarchical planner that generates
its own abstraction hierarchies. In Proceedings of the Eighth National Confer-
ence on Artificial Intelligence, pages 1004-1009.
[Cohen, 1995] Paul R. Cohen (1995). Empirical Methods for Artificial Intelligence.
MIT Press, Cambridge, MA.
[Cohen, 1992] William W. Cohen (1992). Using distribution-free learning the-
ory to analyze solution-path caching mechanisms. Computational Intelligence,
8(2):336-375.
[Cohen et al., 1994] William W. Cohen, Russell Greiner, and Dale Schuurmans
(1994). Probabilistic hill-climbing. In Stephen Jose Hanson, Thomas Petsche,
Michael Kearns, and Ronald L. Rivest, editors, Computational Learning Theory
and Natural Learning Systems, volume II, pages 171-181. MIT Press, Cambridge,
MA.
[Cormen et al., 2001] Thomas H. Cormen, Charles E. Leiserson, Ronald L. Rivest,
and Clifford Stein (2001). Introduction to Algorithms. MIT Press, Cambridge,
MA, second edition.
[Cox and Veloso, 1997a] Michael T. Cox and Manuela M. Veloso (1997). Con-
trolling for unexpected goals when planning in a mixed-initiative setting. In
Ernesto Costa and Amilcar Cardoso, editors, Progress in Artificial Intelli-
gence: Eighth Portuguese Conference on Artificial Intelligence, pages 309-318.
Springer-Verlag, Berlin, Germany.
[Cox and Veloso, 1997b] Michael T. Cox and Manuela M. Veloso (1997). Support-
ing combined human and machine planning: An interface for planning by analog-
ical reasoning. In David B. Leake and Enric Plaza, editors, Case-Based Reason-
ing Research and Development: Second International Conference on Case-Based
Reasoning, pages 531-540. Springer-Verlag, Berlin, Germany.
[Drastal et al., 1997] George A. Drastal, Russell Greiner, Stephen Jose Hanson,
Michael Kearns, Thomas Petsche, Ronald L. Rivest, and Jude W. Shavlik, ed-
itors (1994-1997). Computational Learning Theory and Natural Learning Sys-
tems, volume I-IV. MIT Press, Cambridge, MA.
[Driskill and Carbonell, 1996] Robert Driskill and Jaime G. Carbonell (1996).
Search control in problem solving: A gapped macro operator approach. Un-
published Manuscript.
[Duncker, 1945] Karl Duncker (1945). On problem solving. Psychological Mono-
graphs, 58:1-113.
[Ellman and Giunchiglia, 1998] Tom Ellman and Fausto Giunchiglia, editors
(1998). Proceedings of the Symposium of Abstraction, Reformulation and Ap-
proximation.
[Ernst and Goldstein, 1982] George M. Ernst and Michael M. Goldstein (1982).
Mechanical discovery of classes of problem-solving strategies. Journal of the
American Association for Computing Machinery, 29(1):1-23.
[Erol et al., 1994] Kutluhan Erol, James Hendler, and Dana S. Nau (1994). HTN
planning: Complexity and expressivity. In Proceedings of the Twelfth National
Conference on Artificial Intelligence, pages 1123-1128.
[Etzioni, 1990] Oren Etzioni (1990). A Structural Theory of Explanation-Based
Learning. PhD thesis, School of Computer Science, Carnegie Mellon University.
Technical Report CMU-CS-90-185.
[Etzioni, 1992] Oren Etzioni (1992). An asymptotic analysis of speedup learning. In
Proceedings of the Ninth International Conference on Machine Learning, pages
129-136.

[Etzioni, 1993a] Oren Etzioni (1993). Acquiring search control knowledge via static
analysis. Artificial Intelligence, 62(2):255-301.
[Etzioni, 1993b] Oren Etzioni (1993). A structural theory of explanation-based
learning. Artificial Intelligence, 60(1):93-140.
[Etzioni and Minton, 1992] Oren Etzioni and Steven Minton (1992). Why EBL pro-
duces overly-specific knowledge: A critique of the PRODIGY approaches. In Pro-
ceedings of the Ninth International Conference on Machine Learning, pages
137-143.
[Fikes et al., 1972] Richard E. Fikes, Peter E. Hart, and Nils J. Nilsson (1972).
Learning and executing generalized robot plans. Artificial Intelligence,
3(4):251-288.
[Fikes and Nilsson, 1971] Richard E. Fikes and Nils J. Nilsson (1971). STRIPS: A
new approach to the application of theorem proving to problem solving. Arti-
ficial Intelligence, 2(3-4):189-208.
[Fikes and Nilsson, 1993] Richard E. Fikes and Nils J. Nilsson (1993). STRIPS, a
retrospective. Artificial Intelligence, 59(1-2):227-232.
[Fink and Blythe, 1998] Eugene Fink and Jim Blythe (1998). A complete bidi-
rectional planner. In Proceedings of the Fourth International Conference on
Artificial Intelligence Planning Systems, pages 78-84.
[Fink and Veloso, 1996] Eugene Fink and Manuela M. Veloso (1996). Formalizing
the PRODIGY planning algorithm. In Malik Ghallab and Alfredo Milani, ed-
itors, New Directions in AI Planning, pages 261-271. IOS Press, Amsterdam,
Netherlands.
[Fink and Yang, 1992a] Eugene Fink and Qiang Yang (1992). Automatically ab-
stracting effects of operators. In Proceedings of the First International Confer-
ence on Artificial Intelligence Planning Systems, pages 243-251.
[Fink and Yang, 1992b] Eugene Fink and Qiang Yang (1992). Formalizing plan
justifications. In Proceedings of the Ninth Conference of the Canadian Society
for Computational Studies of Intelligence, pages 9-14.
[Fink and Yang, 1995] Eugene Fink and Qiang Yang (1995). Planning with pri-
mary effects: Experiments and analysis. In Proceedings of the Fourteenth Inter-
national Joint Conference on Artificial Intelligence, pages 1606-1611.
[Fink and Yang, 1997] Eugene Fink and Qiang Yang (1997). Automatically select-
ing and using primary effects in planning: Theory and experiments. Artificial
Intelligence, 89(1-2):285-315.
[Foulser et al., 1992] David E. Foulser, Ming Li, and Qiang Yang (1992). Theory
and algorithms for plan merging. Artificial Intelligence, 57(2-3):143-182.
[Gentner and Stevens, 1983] Dedre Gentner and Albert L. Stevens, editors (1983).
Mental Models. Lawrence Erlbaum Associates, Mahwah, NJ.
[Gil, 1991] Yolanda Gil (1991). A specification of manufacturing processes for plan-
ning. Technical Report CMU-CS-91-179, School of Computer Science, Carnegie
Mellon University.
[Gil, 1992] Yolanda Gil (1992). Acquiring Domain Knowledge for Planning by Ex-
perimentation. PhD thesis, School of Computer Science, Carnegie Mellon Uni-
versity. Technical Report CMU-CS-92-175.
[Gil and Perez, 1994] Yolanda Gil and Alicia Perez (1994). Applying a general-
purpose planning and learning architecture to process planning. In Proceedings
of the AAAI 1994 Fall Symposium on Planning and Learning, pages 48-52.
[Giunchiglia and Walsh, 1992] Fausto Giunchiglia and Toby Walsh (1992). A the-
ory of abstraction. Artificial Intelligence, 57(2-3):323-389.
[Golding et al., 1987] Andrew R. Golding, Paul S. Rosenbloom, and John E. Laird
(1987). Learning general search control from outside guidance. In Proceedings
of the Tenth International Joint Conference on Artificial Intelligence, pages
334-337.
[Goldstein, 1978] Michael M. Goldstein (1978). The Mechanical Discovery of
Problem-Solving Strategies. PhD thesis, Computer Engineering Department,
Case Western Reserve University. Technical Report ESCI-77-1.
[Ha and Haddawy, 1997] Vu Ha and Peter Haddawy (1997). Problem-focused in-
cremental elicitation of multi-attribute utility models. In Proceedings of the
Thirteenth Conference on Uncertainty in Artificial Intelligence, pages 215-222.
[Haigh, 1998] Karen Zita Haigh (1998). Situation-Dependent Learning for Inter-
leaved Planning and Robot Execution. PhD thesis, School of Computer Science,
Carnegie Mellon University. Technical Report CMU-CS-98-108.
[Haigh and Veloso, 1996] Karen Zita Haigh and Manuela M. Veloso (1996). In-
terleaving planning and robot execution for asynchronous user requests. In
Proceedings of the International Conference on Intelligent Robots and Systems.
[Haigh and Veloso, 1997a] Karen Zita Haigh and Manuela M. Veloso (1997). High-
level planning and low-level execution: Towards a complete robotic agent. In
Proceedings of the First International Conference on Autonomous Agents, pages
363-370.
[Haigh and Veloso, 1997b] Karen Zita Haigh and Manuela M. Veloso (1997). In-
terleaving planning and robot execution for asynchronous user requests. Au-
tonomous Robots, 5(1):79-95.
[Haigh and Veloso, 1998] Karen Zita Haigh and Manuela M. Veloso (1998). Plan-
ning, execution and learning in a robotic agent. In Proceedings of the Fourth In-
ternational Conference on Artificial Intelligence Planning Systems, pages 120-
127.
[Hall, 1989] Rogers P. Hall (1989). Computational approaches to analogical rea-
soning: A comparative analysis. Artificial Intelligence, 39(1):39-120.
[Hansen and Zilberstein, 1996] Eric A. Hansen and Shlomo Zilberstein (1996).
Monitoring the progress of anytime problem-solving. In Proceedings of the Thir-
teenth National Conference on Artificial Intelligence, pages 1229-1234.
[Hansson and Mayer, 1989] Othar Hansson and Andrew Mayer (1989). Heuristic
search as evidential reasoning. In Proceedings of the Fifth Workshop on Uncer-
tainty in Artificial Intelligence, pages 152-161.
[Haussler, 1988] David Haussler (1988). Quantifying inductive bias: AI learning
algorithms and Valiant's learning framework. Artificial Intelligence, 36:177-
221.
[Hayes and Simon, 1974] John R. Hayes and Herbert A. Simon (1974). Under-
standing written problem instructions. In Lee W. Gregg, editor, Knowledge
and Cognition, pages 167-200. Lawrence Erlbaum Associates, Mahwah, NJ.
[Hayes and Simon, 1976] John R. Hayes and Herbert A. Simon (1976). The under-
standing process: Problem isomorphs. Cognitive Psychology, 8:165-190.
[Hayes and Simon, 1977] John R. Hayes and Herbert A. Simon (1977). Psycho-
logical difference among problem isomorphs. In N. John Castellan, David B.
Pisoni, and G. R. Potts, editors, Cognitive Theory. Lawrence Erlbaum Asso-
ciates, Mahwah, NJ.
[Hibler, 1994] David Hibler (1994). Implicit abstraction by thought experiments. In
Proceedings of the Workshop on Theory Reformulation and Abstraction, pages
9-26.
[Holte, 1988] Robert C. Holte (1988). An Analytical Framework for Learning Sys-
tems. PhD thesis, Artificial Intelligence Laboratory, University of Texas at
Austin. Technical Report AI-88-72.
[Holte et al., 1994] Robert C. Holte, Chris Drummond, Maria B. Perez, Robert M.
Zimmer, and Alan J. MacDonald (1994). Searching with abstractions: A unify-
ing framework and new high-performance algorithm. In Proceedings of the Tenth
Conference of the Canadian Society for Computational Studies of Intelligence,
pages 263-270.
[Holte et al., 1996a] Robert C. Holte, Taieb Mkadmi, Robert M. Zimmer, and
Alan J. MacDonald (1996). Speeding up problem solving by abstraction: A
graph-oriented approach. Artificial Intelligence, 85:321-361.
[Holte et al., 1996b] Robert C. Holte, Maria B. Perez, Robert M. Zimmer, and
Alan J. MacDonald (1996). Hierarchical A*: Searching abstraction hierarchies
efficiently. In Proceedings of the Thirteenth National Conference on Artificial
Intelligence, pages 530-535.
[Horvitz, 1988] Eric J. Horvitz (1988). Reasoning under varying and uncertain
resource constraints. In Proceedings of the Seventh National Conference on
Artificial Intelligence, pages 111-116.
[Howe et al., 1999] Adele E. Howe, Eric Dahlman, Christopher Hansen, Michael
Scheetz, and Anneliese von Mayrhauser (1999). Exploiting competitive planner
performance. In Proceedings of the Fifth European Conference on Planning,
pages 62-72.
[Hull, 1999] John C. Hull (1999). Options, Futures, and Other Derivatives. Prentice
Hall, Upper Saddle River, NJ, fourth edition.
[Jones and Schkade, 1995] Donald R. Jones and David A. Schkade (1995). Choos-
ing and translating between problem representations. Organizational Behavior
and Human Decision Processes, 61(2):214-223.
[Joseph, 1992] Robert L. Joseph (1992). Graphical Knowledge Acquisition for
Visually-Oriented Domains. PhD thesis, School of Computer Science, Carnegie
Mellon University. Technical Report CMU-CS-92-188.
[Junghanns and Schaeffer, 1998] Andreas Junghanns and Jonathan Schaeffer
(1998). Single-agent search in the presence of deadlocks. In Proceedings of
the Fifteenth National Conference on Artificial Intelligence, pages 419-424.
[Junghanns and Schaeffer, 1999] Andreas Junghanns and Jonathan Schaeffer
(1999). Domain-dependent single-agent search enhancements. In Proceedings
of the Sixteenth International Joint Conference on Artificial Intelligence, pages
570-575.
[Junghanns and Schaeffer, 2001] Andreas Junghanns and Jonathan Schaeffer
(2001). Sokoban: Improving the search with relevance cuts. Theoretical Com-
puter Science, 252(1-2):151-175.
[Kambhampati and Srivastava, 1996a] Subbarao Kambhampati and Biplav Srivas-
tava (1996). Unifying classical planning approaches. Technical Report 96-006,
Department of Computer Science, Arizona State University.
[Kambhampati and Srivastava, 1996b] Subbarao Kambhampati and Biplav Srivas-
tava (1996). Universal classical planner: An algorithm for unifying state-space
and plan-space planning. In Malik Ghallab and Alfredo Milani, editors, New
Directions in AI Planning, pages 61-75. IOS Press, Amsterdam, Netherlands.
[Kaplan, 1989] Craig A. Kaplan (1989). SWITCH: A simulation of representational
change in the Mutilated Checkerboard problem. Technical Report CIP 477,
Department of Psychology, Carnegie Mellon University.
[Kaplan and Simon, 1990] Craig A. Kaplan and Herbert A. Simon (1990). In search
of insight. Cognitive Psychology, 22:374-419.
[Knoblock, 1990] Craig A. Knoblock (1990). Learning abstraction hierarchies for
problem solving. In Proceedings of the Eighth National Conference on Artificial
Intelligence, pages 923-928.
[Knoblock, 1991] Craig A. Knoblock (1991). Search reduction in hierarchical prob-
lem solving. In Proceedings of the Ninth National Conference on Artificial In-
telligence, pages 686-691.
[Knoblock, 1992] Craig A. Knoblock (1992). An analysis of ABSTRIPS. In Proceed-
ings of the Second International Conference on Artificial Intelligence Planning
Systems, pages 126-135.
[Knoblock, 1993] Craig A. Knoblock (1993). Generating Abstraction Hierarchies:
An Automated Approach to Reducing Search in Planning. Kluwer Academic
Publishers, Boston, MA.
[Knoblock, 1994] Craig A. Knoblock (1994). Automatically generating abstractions
for planning. Artificial Intelligence, 68(2):243-302.
[Knoblock et al., 1991a] Craig A. Knoblock, Steven Minton, and Oren Etzioni
(1991). Integrating abstraction and explanation-based learning in PRODIGY. In
Proceedings of the Ninth National Conference on Artificial Intelligence, pages
541-546.
[Knoblock et al., 1991b] Craig A. Knoblock, Josh Tenenberg, and Qiang Yang
(1991). Characterizing abstraction hierarchies for planning. In Proceedings
of the Ninth National Conference on Artificial Intelligence, pages 692-697.
[Knoblock and Yang, 1994] Craig A. Knoblock and Qiang Yang (1994). Evaluating
the trade-offs in partial-order planning algorithms. In Proceedings of the Tenth
Conference of the Canadian Society for Computational Studies of Intelligence,
pages 279-286.
[Knoblock and Yang, 1995] Craig A. Knoblock and Qiang Yang (1995). Relat-
ing the performance of partial-order planning algorithms to domain features.
SIGART Bulletin, 6(1):8-15.
[Koenig, 1997] Sven Koenig (1997). Goal-Directed Acting with Incomplete Infor-
mation. PhD thesis, School of Computer Science, Carnegie Mellon University.
Technical Report CMU-CS-97-199.
[Kolodner, 1984] Janet L. Kolodner (1984). Retrieval and Organization Strategies
in Conceptual Memory: A Computer Model. Lawrence Erlbaum Associates,
Mahwah, NJ.
[Kook and Novak, 1991] Hyung Joon Kook and Gordon S. Novak Jr. (1991). Rep-
resentation of models for expert problem solving in physics. IEEE Transactions
on Knowledge and Data Engineering, 3(1):48-54.
[Korf, 1980] Richard E. Korf (1980). Toward a model of representation changes.
Artificial Intelligence, 14:41-78.
[Korf, 1985a] Richard E. Korf (1985). Learning to Solve Problems by Searching for
Macro-Operators. Pitman Publishing, Boston, MA.
[Korf, 1985b] Richard E. Korf (1985). Macro-operators: A weak method for learn-
ing. Artificial Intelligence, 26(1):35-77.
[Korf, 1987] Richard E. Korf (1987). Planning as search: A quantitative approach.
Artificial Intelligence, 33(1):65-88.
[Kuokka, 1990] Daniel R. Kuokka (1990). The Deliberative Integration of Planning,
Execution, and Learning. PhD thesis, School of Computer Science, Carnegie
Mellon University. Technical Report CMU-CS-90-135.
[Laird et al., 1987] John E. Laird, Allen Newell, and Paul S. Rosenbloom (1987).
Soar: An architecture for general intelligence. Artificial Intelligence, 33(1):1-64.
[Laird et al., 1986] John E. Laird, Paul S. Rosenbloom, and Allen Newell (1986).
Chunking in Soar: The anatomy of a general learning mechanism. Machine
Learning, 1(1):11-46.
[Langley, 1983] Pat Langley (1983). Learning effective search heuristics. In Pro-
ceedings of the Eighth International Joint Conference on Artificial Intelligence,
pages 419-421.
[Larkin et al., 1988] Jill H. Larkin, Frederick Reif, Jaime G. Carbonell, and An-
gela Gugliotta (1988). FERMI: A flexible expert reasoner with multi-domain
inferencing. Cognitive Science, 12(1):101-138.
[Larkin and Simon, 1981] Jill H. Larkin and Herbert A. Simon (1981). Learning
through growth of skill in mental modeling. In Proceedings of the Third Annual
Conference of the Cognitive Science Society, pages 106-111.
[Larkin and Simon, 1987] Jill H. Larkin and Herbert A. Simon (1987). Why a
diagram is (sometimes) worth ten thousand words. Cognitive Science, 11(1):65-
99.
[Lehmann, 1977] Daniel J. Lehmann (1977). Algebraic structures for transitive
closure. Theoretical Computer Science, 4(1):59-76.
[Levy and Nayak, 1995] Alon Y. Levy and P. Pandurang Nayak, editors (1995).
Proceedings of the Symposium on Abstraction, Reformulation and Approxima-
tion.
[Lowry, 1992] Michael R. Lowry, editor (1992). Proceedings of the Workshop on
Change of Representation and Problem Reformulation. NASA Ames Research
Center. Technical Report FIA-92-06.
[McAllester and Rosenblitt, 1991] David A. McAllester and David Rosenblitt
(1991). Systematic nonlinear planning. In Proceedings of the Ninth National
Conference on Artificial Intelligence, pages 634-639.
[Mendenhall et al., 1999] William Mendenhall, Robert J. Beaver, and Barbara M.
Beaver (1999). Introduction to Probability and Statistics. Duxbury Press,
Boston, MA, tenth edition.
[Minton, 1985] Steven Minton (1985). Selectively generalizing plans for problem-
solving. In Proceedings of the Ninth International Joint Conference on Artificial
Intelligence, pages 596-599.
[Minton, 1988] Steven Minton (1988). Learning Search Control Knowledge: An
Explanation-Based Approach. Kluwer Academic Publishers, Boston, MA.
[Minton, 1990] Steven Minton (1990). Quantitative results concerning the utility
of explanation-based learning. Artificial Intelligence, 42(2-3):363-391.
[Minton, 1993a] Steven Minton (1993). An analytical learning system for special-
ized heuristics. In Proceedings of the Thirteenth International Joint Conference
on Artificial Intelligence, pages 922-929.
[Minton, 1993b] Steven Minton (1993). Integrating heuristics for constraint satis-
faction problems: A case study. In Proceedings of the Eleventh National Con-
ference on Artificial Intelligence, pages 120-126.
[Minton, 1996] Steven Minton (1996). Automatically configuring constraint satis-
faction programs: A case study. Constraints: An International Journal, 1(1-
2):7-43.
[Minton et al., 1991] Steven Minton, John Bresina, and Mark Drummond (1991).
Commitment strategies in planning: A comparative analysis. In Proceedings
of the Twelfth International Joint Conference on Artificial Intelligence, pages
259-265.
[Minton et al., 1994] Steven Minton, John Bresina, and Mark Drummond (1994).
Total-order and partial-order planning: A comparative analysis. Journal of
Artificial Intelligence Research, 2:227-262.
[Minton et al., 1989a] Steven Minton, Jaime G. Carbonell, Craig A. Knoblock,
Dan R. Kuokka, Oren Etzioni, and Yolanda Gil (1989). Explanation-based
learning: A problem-solving perspective. Artificial Intelligence, 40(1-3):63-118.
[Minton et al., 1989b] Steven Minton, Dan R. Kuokka, Yolanda Gil, Robert L.
Joseph, and Jaime G. Carbonell (1989). PRODIGY2.0: The manual and tuto-
rial. Technical Report CMU-CS-89-146, School of Computer Science, Carnegie
Mellon University.
[Mitchell et al., 1983] Tom M. Mitchell, Paul E. Utgoff, and Ranan B. Banerji
(1983). Learning by experimentation: Acquiring and refining problem-solving
heuristics. In Ryszard S. Michalski, Jaime G. Carbonell, and Tom M. Mitchell,
editors, Machine Learning: An Artificial Intelligence Approach, pages 163-190.
Tioga Publishers, Palo Alto, CA.
[Mooney, 1988] Raymond J. Mooney (1988). Generalizing the order of operators
in macro-operators. In Proceedings of the Fifth International Conference on
Machine Learning, pages 270-283.
[Mouaddib and Zilberstein, 1995] Abdel-illah Mouaddib and Shlomo Zilberstein
(1995). Knowledge-based anytime computation. In Proceedings of the Four-
teenth International Joint Conference on Artificial Intelligence, pages 775-781.
[Mühlpfordt and Schmid, 1998] Martin Mühlpfordt and Ute Schmid (1998). Syn-
thesis of recursive functions with interdependent parameters. In Proceedings
of the Workshop on Applied Learning Theory, pages 132-139, Kaiserslautern,
Germany.
[Natarajan, 1991] Balas K. Natarajan (1991). Machine Learning: A Theoretical
Approach. Morgan Kaufmann, San Mateo, CA.
[Newell, 1965] Allen Newell (1965). Limitations of the current stock of ideas about
problem solving. In Allen K. Kent and Oren E. Taulbee, editors, Electronic
Information Handling, pages 195-208. Spartan Books, Washington, DC.
[Newell, 1966] Allen Newell (1966). On the representations of problems. In Com-
puter Science Research Reviews. Carnegie Institute of Technology, Pittsburgh,
PA.
[Newell, 1992] Allen Newell (1992). Unified theories of cognition and the role of
Soar. In John A. Michon and Aladin Akyürek, editors, Soar: A Cognitive Ar-
chitecture in Perspective, pages 25-79. Kluwer Academic Publishers, Boston,
MA.
[Newell et al., 1960] Allen Newell, J. Cliff Shaw, and Herbert A. Simon (1960).
A variety of intelligent learning in a general problem solver. In Marshall C.
Yovits and Scott Cameron, editors, International Tracts in Computer Science
and Technology and Their Applications, volume 2: Self-Organizing Systems,
pages 153-189. Pergamon Press, New York, NY.
[Newell and Simon, 1961] Allen Newell and Herbert A. Simon (1961). GPS, a pro-
gram that simulates human thought. In Heinz Billing, editor, Lernende Auto-
maten, pages 109-124. R. Oldenbourg Verlag, Munich, Germany.
[Newell and Simon, 1972] Allen Newell and Herbert A. Simon (1972). Human
Problem Solving. Prentice Hall, Upper Saddle River, NJ.
[Nilsson, 1971] Nils J. Nilsson (1971). Problem-Solving Methods in Artificial Intel-
ligence. McGraw-Hill, New York, NY.
[Nilsson, 1980] Nils J. Nilsson (1980). Principles of Artificial Intelligence. Morgan
Kaufmann, San Mateo, CA.
[Novak, 1995] Gordon S. Novak Jr. (1995). Creation of views for reuse of software
with different data representations. IEEE Transactions on Software Engineering,
21(12):993-1005.
[Ohlsson, 1984] Stellan Ohlsson (1984). Restructuring revisited I: Summary and
critique of the Gestalt theory of problem solving. Scandinavian Journal of
Psychology, 25:65-78.
[Paige and Simon, 1966] Jeffrey M. Paige and Herbert A. Simon (1966). Cognitive
processes in solving algebra word problems. In Benjamin Kleinmuntz, editor,
Problem Solving. John Wiley and Sons, New York, NY.
[Pednault, 1988a] Edwin P. D. Pednault (1988). Extending conventional planning
techniques to handle actions with context-dependent effects. In Proceedings of
the Seventh National Conference on Artificial Intelligence, pages 55-59.
[Pednault, 1988b] Edwin P. D. Pednault (1988). Synthesizing plans that contain
actions with context-dependent effects. Computational Intelligence, 4(4):356-
372.
[Penberthy and Weld, 1992] J. Scott Penberthy and Daniel S. Weld (1992). UCPOP:
A sound, complete, partial-order planner for ADL. In Proceedings of the Third
International Conference on Knowledge Representation and Reasoning, pages
103-114.
[Peot and Smith, 1993] Mark A. Peot and David E. Smith (1993). Threat-removal
strategies for partial-order planning. In Proceedings of the Eleventh National
Conference on Artificial Intelligence, pages 492-499.
[Perez, 1995] M. Alicia Perez (1995). Learning Search Control Knowledge to Im-
prove Plan Quality. PhD thesis, School of Computer Science, Carnegie Mellon
University. Technical Report CMU-CS-95-175.
[Perez and Carbonell, 1993] M. Alicia Perez and Jaime G. Carbonell (1993). Auto-
mated acquisition of control knowledge to improve the quality of plans. Techni-
cal Report CMU-CS-93-142, School of Computer Science, Carnegie Mellon Uni-
versity.
[Perez and Etzioni, 1992] M. Alicia Perez and Oren Etzioni (1992). DYNAMIC: A
new role for training problems in EBL. In Proceedings of the Ninth International
Conference on Machine Learning, pages 367-372.
[Peterson, 1994] Donald Peterson (1994). Re-representation and emergent infor-
mation in three cases of problem solving. In Terry Dartnall, editor, Artificial
Intelligence and Creativity, pages 81-92. Kluwer Academic Publishers, Boston,
MA.
[Peterson, 1996] Donald Peterson, editor (1996). Forms of Representation. Intellect
Books, Exeter, United Kingdom.
[Pohl, 1971] Ira Pohl (1971). Bi-directional search. In Bernard Meltzer and Don-
ald Michie, editors, Machine Intelligence 6, pages 127-140. American Elsevier
Publishers, New York, NY.
[Polya, 1957] George Polya (1957). How to Solve It. Doubleday, Garden City, NY,
second edition.
[Qin and Simon, 1992] Yulin Qin and Herbert A. Simon (1992). Imagery and men-
tal models in problem solving. In N. Hari Narayanan, editor, Proceedings of
the AAAI 1992 Spring Symposium on Reasoning with Diagrammatic Represen-
tations. Stanford University, Palo Alto, CA.
[Rich and Knight, 1991] Elaine Rich and Kevin Knight (1991). Artificial Intelli-
gence. McGraw-Hill, New York, NY, second edition.
[Russell, 1990] Stuart J. Russell (1990). Fine-grained decision-theoretic search con-
trol. In Proceedings of the Sixth Conference on Uncertainty in Artificial Intel-
ligence, pages 436-442.
[Russell and Norvig, 1995] Stuart J. Russell and Peter Norvig (1995). Artificial
Intelligence: A Modern Approach. Prentice Hall, Upper Saddle River, NJ.
[Russell et al., 1993] Stuart J. Russell, Devika Subramanian, and Ronald Parr
(1993). Provably bounded optimal agents. In Proceedings of the Thirteenth
International Joint Conference on Artificial Intelligence, pages 338-344.
[Sacerdoti, 1974] Earl D. Sacerdoti (1974). Planning in a hierarchy of abstraction
spaces. Artificial Intelligence, 5(2):115-135.
[Sacerdoti, 1977] Earl D. Sacerdoti (1977). A Structure for Plans and Behavior.
Elsevier Science Publishers, Amsterdam, Netherlands.
[Schank et al., 1975] Roger C. Schank, Neil M. Goldman, Charles J. Rieger III,
and Christopher K. Riesbeck (1975). Inference and paraphrase by computer.
Journal of the Association for Computing Machinery, 22(3):309-328.
[Schmid and Wysotzki, 1996] Ute Schmid and Fritz Wysotzki (1996). Induction of
recursive program schemes. In Proceedings of the Tenth European Conference
on Machine Learning, pages 214-226.
[Shell and Carbonell, 1989] Peter Shell and Jaime G. Carbonell (1989). Towards a
general framework for composing disjunctive and iterative macro-operators. In
Proceedings of the Eleventh International Joint Conference on Artificial Intel-
ligence, pages 596-602.
[Simon, 1975] Herbert A. Simon (1975). The functional equivalence of problem
solving skills. Cognitive Psychology, 7:268-288.
[Simon, 1979] Herbert A. Simon (1979). Models of Thought, volume I. Yale Uni-
versity Press, New Haven, CT.
[Simon, 1989] Herbert A. Simon (1989). Models of Thought, volume II. Yale Uni-
versity Press, New Haven, CT.
[Simon, 1996] Herbert A. Simon (1996). The Sciences of the Artificial. MIT Press,
Cambridge, MA, third edition.
[Simon et al., 1985] Herbert A. Simon, Kenneth Kotovsky, and John R. Hayes
(1985). Why are some problems hard? Evidence from the Tower of Hanoi.
Cognitive Psychology, 17:248-294.
[Smirnov, 1997] Yury V. Smirnov (1997). Hybrid Algorithms for On-Line Search
and Combinatorial Optimization Problems. PhD thesis, School of Computer
Science, Carnegie Mellon University. Technical Report CMU-CS-97-171.
[Smith and Peot, 1992] David E. Smith and Mark A. Peot (1992). A critical look
at Knoblock's hierarchy mechanism. In Proceedings of the First International
Conference on Artificial Intelligence Planning Systems, pages 307-308.
[Stefik, 1981] Mark Stefik (1981). Planning with constraints (MOLGEN: Part 1).
Artificial Intelligence, 16(2):111-140.
[Stone and Veloso, 1994] Peter Stone and Manuela M. Veloso (1994). Learning
to solve complex planning problems: Finding useful auxiliary problems. In
Proceedings of the AAAI 1994 Fall Symposium on Planning and Learning, pages
137-141.
[Stone and Veloso, 1996] Peter Stone and Manuela M. Veloso (1996). User-guided
interleaving of planning and execution. In Malik Ghallab and Alfredo Milani,
editors, New Directions in AI Planning, pages 103-112. IOS Press, Amsterdam,
Netherlands.
[Stone et al., 1994] Peter Stone, Manuela M. Veloso, and Jim Blythe (1994). The
need for different domain-independent heuristics. In Proceedings of the Sec-
ond International Conference on Artificial Intelligence Planning Systems, pages
164-169.
[Tabachneck, 1992] Hermina J. M. Tabachneck (1992). Computational Differences
in Mental Representations: Effects of Mode of Data Presentation on Reasoning
and Understanding. PhD thesis, Department of Psychology, Carnegie Mellon
University.
[Tabachneck-Schijf et al., 1997] Hermina J. M. Tabachneck-Schijf, Anthony M.
Leonardo, and Herbert A. Simon (1997). CaMeRa: A computational model
of multiple representations. Cognitive Science, 21(3):305-350.
[Tadepalli and Natarajan, 1996] Prasad Tadepalli and Balas K. Natarajan (1996).
A formal framework for speedup learning from problems and solutions. Journal
of Artificial Intelligence Research, 4:445-475.
[Tambe et al., 1990] Milind Tambe, Allen Newell, and Paul S. Rosenbloom (1990).
The problem of expensive chunks and its solution by restricting expressiveness.
Machine Learning, 5(3):299-348.
[Tate, 1976] Austin Tate (1976). Project planning using a hierarchical nonlinear
planner. Technical Report 25, Department of Artificial Intelligence, University
of Edinburgh.
[Tate, 1977] Austin Tate (1977). Generating project networks. In Proceedings of the
Fifth International Joint Conference on Artificial Intelligence, pages 888-893.
[Tenenberg, 1988] Josh D. Tenenberg (1988). Abstraction in Planning. PhD the-
sis, Department of Computer Science, University of Rochester. Technical Re-
port 250.
[Unruh and Rosenbloom, 1989] Amy Unruh and Paul S. Rosenbloom (1989). Ab-
straction in problem solving and learning. In Proceedings of the Eleventh Inter-
national Joint Conference on Artificial Intelligence, pages 681-687.
[Unruh and Rosenbloom, 1990] Amy Unruh and Paul S. Rosenbloom (1990). Two
new weak method increments for abstraction. In Proceedings of the Workshop
on Automatic Generation of Approximations and Abstractions, pages 78-86.
[Valiant, 1984] Leslie G. Valiant (1984). A theory of the learnable. Communications
of the Association for Computing Machinery, 27(11):1134-1142.
[Van Baalen, 1989] Jeffrey Van Baalen (1989). Toward a Theory of Representation
Design. PhD thesis, Artificial Intelligence Laboratory, Massachusetts Institute
of Technology. Technical Report 1128.
[Van Baalen, 1994] Jeffrey Van Baalen, editor (1994). Proceedings of the Workshop
on Theory Reformulation and Abstraction. Computer Science Department, Uni-
versity of Wyoming. Technical Report 123.
[Veloso, 1989] Manuela M. Veloso (1989). Nonlinear problem solving using intelli-
gent casual commitment. Technical Report CMU-CS-89-210, School of Computer
Science, Carnegie Mellon University.
[Veloso, 1994] Manuela M. Veloso (1994). Planning and Learning by Analogical
Reasoning. Springer-Verlag, Berlin, Germany.
[Veloso and Blythe, 1994] Manuela M. Veloso and Jim Blythe (1994). Linkability:
Examining causal link commitments in partial-order planning. In Proceedings of
the Second International Conference on Artificial Intelligence Planning Systems,
pages 170-175.
[Veloso and Borrajo, 1994] Manuela M. Veloso and Daniel Borrajo (1994). Learn-
ing strategy knowledge incrementally. In Proceedings of the Sixth International
Conference on Tools with Artificial Intelligence, pages 484-490.
[Veloso and Carbonell, 1990] Manuela M. Veloso and Jaime G. Carbonell (1990).
Integrating analogy into a general problem-solving architecture. In Maria Ze-
mankova and Zbigniew W. Ras, editors, Intelligent Systems, pages 29-51. Ellis
Horwood, Chichester, United Kingdom.
[Veloso and Carbonell, 1993a] Manuela M. Veloso and Jaime G. Carbonell (1993).
Derivational analogy in PRODIGY: Automating case acquisition, storage, and
utilization. Machine Learning, 10:249-278.
[Veloso and Carbonell, 1993b] Manuela M. Veloso and Jaime G. Carbonell (1993).
Towards scaling up machine learning: A case study with derivational analogy in
PRODIGY. In Maria Zemankova and Zbigniew W. Ras, editors, Machine Learning
Methods for Planning, pages 233-272. Morgan Kaufmann, San Mateo, CA.
[Veloso et al., 1995] Manuela M. Veloso, Jaime G. Carbonell, M. Alicia Perez,
Daniel Borrajo, Eugene Fink, and Jim Blythe (1995). Integrating planning and
learning: The PRODIGY architecture. Journal of Experimental and Theoretical
Artificial Intelligence, 7(1):81-120.
[Veloso et al., 1997] Manuela M. Veloso, Alice M. Mulvehill, and Michael T. Cox
(1997). Rationale-supported mixed-initiative case-based planning. In Proceed-
ings of the Fourteenth National Conference on Artificial Intelligence, pages
1072-1077.
[Veloso and Stone, 1995] Manuela M. Veloso and Peter Stone (1995). FLECS: Plan-
ning with a flexible commitment strategy. Journal of Artificial Intelligence
Research, 3:25-52.
[Wang, 1992] Xuemei Wang (1992). Constraint-based efficient matching in
PRODIGY. Technical Report CMU-CS-92-128, School of Computer Science,
Carnegie Mellon University.
[Wang, 1994] Xuemei Wang (1994). Learning planning operators by observation
and practice. In Proceedings of the Second International Conference on Artificial
Intelligence Planning Systems, pages 335-340.
[Wang, 1996] Xuemei Wang (1996). Learning Planning Operators by Observation
and Practice. PhD thesis, School of Computer Science, Carnegie Mellon Uni-
versity. Technical Report CMU-CS-96-154.
[Warren, 1974] David H. D. Warren (1974). WARPLAN: A system for generating
plans. Technical Report Memo 76, Department of Computational Logic, Uni-
versity of Edinburgh.
[Weld, 1994] Daniel S. Weld (1994). An introduction to least commitment plan-
ning. Artificial Intelligence Magazine, 15(4):27-61.
[Wilkins, 1984] David E. Wilkins (1984). Domain-independent planning: Repre-
sentation and plan generation. Artificial Intelligence, 22(3):269-301.
[Wilkins, 1988] David E. Wilkins (1988). Practical Planning: Extending the Clas-
sical AI Planning Paradigm. Morgan Kaufmann, San Mateo, CA.
[Wilkins and Myers, 1995] David E. Wilkins and Karen L. Myers (1995). A com-
mon knowledge representation for plan generation and reactive execution. Jour-
nal of Logic and Computation, 5(6):731-761.
[Wilkins and Myers, 1998] David E. Wilkins and Karen L. Myers (1998). A mul-
tiagent planning architecture. In Proceedings of the Fourth International Con-
ference on Artificial Intelligence Planning Systems, pages 154-162.
[Wilkins et al., 1995] David E. Wilkins, Karen L. Myers, John D. Lowrance, and
Leonard P. Wesley (1995). Planning and reacting in uncertain and dynamic
environments. Journal of Experimental and Theoretical Artificial Intelligence,
7(1):197-227.
[Wood, 1993] Derick Wood (1993). Data Structures, Algorithms, and Performance.
Addison-Wesley Publishers, Reading, MA.
[Yamada and Tsuji, 1989] Seiji Yamada and Saburo Tsuji (1989). Selective learn-
ing of macro-operators with perfect causality. In Proceedings of the Eleventh
International Joint Conference on Artificial Intelligence, pages 603-608.
[Yang, 1997] Qiang Yang (1997). Intelligent Planning: A Decomposition and Ab-
straction Based Approach. Springer-Verlag, Berlin, Germany.
[Yang et al., 1998] Qiang Yang, Philip Fong, and Edward Kim (1998). Design pat-
terns for planning systems. In Proceedings of the 1998 AIPS Workshop on Knowl-
edge Engineering and Acquisition for Planning: Bridging Theory and Practice,
pages 104-112.
[Yang and Murray, 1994] Qiang Yang and Cheryl Murray (1994). An evaluation
of the temporal coherence heuristic in partial-order planning. Computational
Intelligence, 10(3):245-267.
[Yang and Tenenberg, 1990] Qiang Yang and Josh Tenenberg (1990). ABTWEAK:
Abstracting a non-linear, least-commitment planner. In Proceedings of the
Eighth National Conference on Artificial Intelligence, pages 204-209, Boston,
MA.
[Yang et al., 1996] Qiang Yang, Josh Tenenberg, and Steve Woods (1996). On
the implementation and evaluation of ABTWEAK. Computational Intelligence,
12:307-330.