


Further Implications of the Theory

The Spectrum of Cognitive Processes
Some Qualitative Features of Thinking and Problem Solving
Set
Functional Fixity
Incubation
Insight
Rote Versus Meaningful Learning
Concepts
Hierarchies of Processes
Summary
Concepts
Some Conditions of Creativity
Planning and Imagery
Some Comments on Representation
Representation and the Sense Modalities
Abstraction and Generalization
The Uses of Imagery
Summary: Imagery
Unconventionality and Creativity
Change of Set and Learning
Stereotypy
Is Unconventionality Enough?
Comparison with Other Theories
Associationism
Gestalt Theories
The Processes of Creative Thinking
Problem Solving and Creativity
Vagueness of the Distinction
Simulation of Problem Solving
Appendix
One-Line Rules
Two-Line Rules
References

Further Implications of the Theory

The programs for the Logic Theorist, the General Problem
Solver, and the Chess Player constitute the formal part of our theory
of human thinking and problem solving. Of these, only the General
Problem Solver is intended as a detailed program for certain humans
solving a particular class of problems, and only this part of the theory
has been tested in any detail. The other two programs illustrate, how-
ever, various elements that are almost certain to be encountered in
human programs, in addition to those incorporated in GPS.
Let us make quite clear what is the fundamental assertion of
the theory: in the area of behavior that the programs handle, the be-
havior of the human thinker is to be explained by postulating that it
is governed by a program like these.
The programs we have written, even to the extent that they
prove valid explanations, handle only a relatively narrow range of the
total spectrum of human behavior. In the next few paragraphs we should
like to indicate what part of that total spectrum they cover, and what
kinds of programs are called for to deal with the parts they do not
cover.

The Spectrum of Cognitive Processes

We shall limit our discussion to cognitive processes. Writing
programs that will explain the affective and emotive aspects of
human behavior is no less important than writing programs for the
cognitive aspects, but we shall have nothing to say about it because we
know nothing about it. The "whole man" we shall try to describe will
be only a "whole cognizing man."
What are the main classes of processes we encounter in this
man? In terms of conventional classifications, which will prove
adequate for present purposes, we can list them as:
Sensory processes
Motor processes
Perception processes
Learning processes
Storage processes
Other thinking and problem-solving processes.
We use these terms as they are generally used in experimental
psychology, and shall not be concerned with drawing precise boundaries
between them. "Other thinking and problem-solving processes" means
those not encompassed in the other categories, including most of the
programs we have described in this volume. Some aspects of our programs,
of course, incorporate processes of perception, learning, and storage.
It would serve no particular purpose to estimate how far the
present GPS, supplemented by some of the processes incorporated in the
Chess Player, falls short of being the "whole cognizing man." We would
not know what metric to apply in such an estimate, and if we had a metric,
the reader might find our estimate too optimistic for his tastes. We
can take a hint here from the General Problem Solver. We need not be
concerned with how far we are from the goal if we can find some steps
to take us toward it. What are the next steps toward a more complete
explanation of the cognitive aspects of human behavior?

An extremely important line of work is to develop programs in
the areas of perception and storage that are as elaborate as those we
now have in the area of problem solving. Some significant steps have
already been taken in this direction. The Elementary Perception and
Memorizing Machine is a first attempt at simulating human behavior in
the nonsense syllable memorizing experiment. But, in terms of the
processes it has available, it is a far more general device than this.
Among other things, it is capable of simulating the stages a child goes
through in learning to understand, speak, read and write language. How
good a simulation it will provide of the human process, in this or other
tasks, remains to be seen. Experience with it to date is limited to
less than one hour of computer operation. But it incorporates a suf-
ficiently wide repertory of processes to permit it to undertake the tasks
we have listed.
Feldman's program for binary choice and the program for concept
attainment are experiments aimed at discovering the processes
involved in pattern recognition and concept formation. Again, they have
not yet reached a point where we can evaluate their worth as theories.
Although most of these programs respond "meaningfully" to sym-
bolic expressions of one sort or another, and are capable of producing
such expressions (including the "thinking aloud" trace), none of them
handles a language remotely resembling English in its syntax. Hence,
elaboration of the programs to handle such a language (a line of
inquiry now being pursued by linguists and by investigators interested in
machine language translation) is an important direction of search.

In addition to these major explorations in new directions, both
the performance and learning aspects of GPS and the Chess Player need
to be further elaborated until they incorporate, as they certainly do
not at present, a reasonably complete repertory of the processes that
humans use in these tasks.
The test of whether a program explains a particular kind of be-
havior is whether it simulates humans exhibiting that kind of behavior.
The test of whether a program encompasses the "whole cognizing man" is
whether it achieves such a simulation at least over the whole range of
tasks that has been used to study cognitive behavior in the laboratory.
In the next sections of this chapter we shall compare what we have learned
about the behavior of our programs with what has been reported in the
literature of psychology about human problem-solving behavior.

Some Qualitative Features of Thinking and Problem Solving

A good part of the literature on thinking and problem solving,
particularly that coming out of the Gestalt tradition, is concerned with
establishing some of the qualitative phenomena that are observable in
or accompanying problem-solving behavior. We shall refer to these
phenomena by their usual names: set, insight, hierarchy, concepts, rote and
meaningful learning, stages of problem solving, imagery. We shall see
what light our programs can cast on the meanings of these terms, and the
significance of the phenomena they denote for human thinking.

Set

The term "set," sometimes defined "as a readiness to make a
specified response to a specified stimulus" (Johnson, p. 65), covers
a variety of psychological phenomena. We can illustrate several of
these by LT. We should not be surprised to find that more than one
aspect of LT's behavior exhibits "set," nor that these several evidences
of set correspond to quite different underlying processes.
1. Suppose that after the program has been loaded in LT, the
axioms and a sequence of problem expressions are placed in its memory.
Before the machine undertakes to prove the first problem expression, it
goes through the list of axioms and computes a description of each for
subsequent use in the "similarity" tests. For this reason, the proof of the
first theorem takes an extra interval of time amounting, in fact, to
about twenty seconds. Functionally and phenomenologically, this
computation process and interval represent a preparatory set in the sense in
which that term is used in reaction time experiments.*

* It turns out in LT that this preparatory set saves about one third
of the computing time that would otherwise be required in later
stages of the program.

2. Directional set is also evident in LT's behavior. When it
is attempting a particular subproblem, LT tries first to solve it by
the substitution method. If this proves fruitless, and only then, it
tries the detachment method, then chaining forward, then chaining
backward. Now when it searches for theorems suitable for the substitution
method, it will not notice theorems that might later be suitable for
detachment (different similarity tests being applied in the two cases).
It attends single-mindedly to possible candidates for substitution until
the theorem list has been exhausted; then it turns to the detachment
method.
3. Hints and the change in behavior they induce have been
mentioned earlier. Variants of LT exist in which the order of methods
attempted by LT and the choice of units in describing expressions depend
upon appropriate hints from the experimenter.
4. Effects from directional set occur in certain learning
situations, as illustrated, for example, by the classical experiments of
Luchins. Luchins gave his subjects a series of problems, each problem
being solvable either by a hard general method, or by both the hard
method and an easy special method. After a sequence of problems
solvable only by the general method, the subjects tended to persist in this
method with subsequent problems that could be solved by the easy method.
After a sequence of problems solvable by the easy method, they had
difficulty in solving subsequent problems to which only the general method
applied.
We have already illustrated, in Chapter 7, the Luchins effect
as it arises from the learning of special methods in LT. With the special
methods learning program, when LT has learned that a particular theorem
may be useful, it gives that theorem a priority over others. Trying
the high-priority theorems first greatly reduces the time needed
to solve problems when these theorems work, but increases the time if
LT has to revert to its general theorem list for a solution. The extra
time required for the latter problems is attributable, as in Luchins'
experiments, to "persistence of inappropriate set."
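
The effect of priority ordering on search time can be sketched in a few
lines of Python. This is an illustration only, not the LT program: the
theorem names T1 to T6, the function names, and the two example problems
are hypothetical. The sketch counts how many theorems are tried before a
solution is found, with and without a learned priority.

```python
def search(candidates, works):
    """Try candidates in order; return (solution, number of trials)."""
    for trials, candidate in enumerate(candidates, start=1):
        if works(candidate):
            return candidate, trials
    return None, len(candidates)


def with_set(priority, general):
    """Put high-priority theorems first, then the rest of the general list."""
    return priority + [t for t in general if t not in priority]


# Hypothetical theorem list and a theorem that proved useful earlier.
general_list = ["T1", "T2", "T3", "T4", "T5", "T6"]
learned_priority = ["T5"]
ordered = with_set(learned_priority, general_list)

# Problem A is solved by T5 (the learned theorem); problem B only by T4.
_, trials_a_plain = search(general_list, lambda t: t == "T5")
_, trials_a_set = search(ordered, lambda t: t == "T5")
_, trials_b_plain = search(general_list, lambda t: t == "T4")
_, trials_b_set = search(ordered, lambda t: t == "T4")

print(trials_a_plain, trials_a_set)  # fewer trials with the learned priority
print(trials_b_plain, trials_b_set)  # more trials with the learned priority
```

In this toy case the learned priority shortens the search when the
favored theorem works and lengthens it when the solution lies in the
general list, which is the pattern described above as persistence of
inappropriate set.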
The instances of set observable in the program of LT are natural
and unintended by-products of a program constructed to solve problems
in an efficient way. In fact, it is difficult to see how we could have
avoided such effects. In its simplest aspect, the problem-solving
process is a search for a solution in a very large space of possible
solutions. The possible solutions must be examined in some particular
sequence, and if they are, then certain possible solutions will be examined
before others. The particular rule that induces the order of search induces
thereby a definite set in the ordinary psychological meaning of that term.
Preparatory set also arises from the need for computing
efficiency. If certain information is needed each time a possible solution or
group of solutions is to be examined, it may be useful to compute this
information, once and for all, at the beginning of the problem-solving
process, and to store it instead of recomputing it each time.
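
A minimal sketch of this trade, under stated assumptions: the
`describe` function below is a hypothetical stand-in for an expensive
description routine (it merely counts distinct one-letter variables),
and the axiom and problem strings are chosen for illustration. The
sketch is not the LT code; it only shows that computing descriptions
once, before the first problem, reduces the number of description
computations later.

```python
calls = {"describe": 0}


def describe(expr):
    """Stand-in description: the number of distinct one-letter variables."""
    calls["describe"] += 1
    tokens = expr.replace("(", " ").replace(")", " ").split()
    return len({tok for tok in tokens if len(tok) == 1})


axioms = ["p implies (q or p)", "(p or p) implies p", "(p or q) implies (q or p)"]
problems = ["r implies (r or r)", "(a or b) implies (b or a)"]


def solve_without_set(problems, axioms):
    """Recompute every axiom description for every problem."""
    matches = []
    for prob in problems:
        d = describe(prob)
        matches.append([ax for ax in axioms if describe(ax) == d])
    return matches


def solve_with_set(problems, axioms):
    """Compute axiom descriptions once, before the first problem."""
    prepared = [(ax, describe(ax)) for ax in axioms]  # the preparatory step
    matches = []
    for prob in problems:
        d = describe(prob)
        matches.append([ax for ax, ad in prepared if ad == d])
    return matches


calls["describe"] = 0
plain = solve_without_set(problems, axioms)
plain_calls = calls["describe"]

calls["describe"] = 0
with_prep = solve_with_set(problems, axioms)
prep_calls = calls["describe"]

print(plain == with_prep, plain_calls, prep_calls)
```

Both versions select the same axioms; the prepared version pays its
cost up front, which is why the first problem takes longer and the later
ones take less time.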
Examples of directional set are particularly easy to find in the
Chess Player. A feature on the board (e.g., a piece under attack) evokes
a goal, or "set" (e.g., to prevent loss of the attacked piece). The
moves that are generated for consideration are the moves that are rele-
vant to that goal. Since goals are evoked in sequence, and since the Chess
Player accepts the first satisfactory move it finds, we may expect that in
general the move selected will depend on the set that was established
by the goal that evoked it. Indeed, one of the problems in the
construction of the executive routine is to prevent the effect of this
set from being excessive, that is, to prevent CP from evaluating a move
only in terms of the goal that evoked it. Thus, the evaluative procedure must
be kept apart from the move-evoking procedure at least sufficiently
to permit CP to evaluate a move from several aspects. Achieving this
detachment from the set that evoked a move is one of the important and
difficult problems that humans also face in learning to play a good game
of chess.
Set can also arise from the way in which information is stored
in memory. Information is stored in lists, and is found on these lists
by searching the lists in sequence. Hence, the order of items on lists
may have major effects on behavior. We have, in fact, made important
use of this property in the LT special methods learning program, for
this program operated simply by putting items on lists and changing
their order on lists. Similarly, in the Chess Player, there are lists
of moves and generators of moves associated with each goal. Which moves
will be considered first depends on the order in which moves are selected
from these lists or generated.
The examples cited show that set can arise in almost every aspect
of the problem-solving process. It can govern the sequence in which
alternatives are examined, the concepts that are used in classifying
perceptions, and the order in which information is recovered from memory.

Functional Fixity

The phenomenon of functional fixity, first studied by Duncker,
is usually assumed to be caused by persistence of set. The following
experiment illustrates what is meant by the term. The subject is given a
task and discovers that he can perform it if he has some water. If there
is a jar of water in the room, he will discover and use it much more
readily than if there is a vase with water and a bouquet of flowers in it.
The usual (verbal) explanation of the phenomenon is that the use of the
water for the flowers bars the idea of using it for other purposes.
The Chess Player exhibits functional fixity, and in its program
the fixity proves to be adaptive. If one piece is essential to the
protection of another, the Chess Player will generally notice this if it
considers moving the first piece, and will reject the move. The Logic
Theorist and GPS do not, however, exhibit functional fixity: the use of
a theorem for one purpose does not lead the programs to reject its use
for another. Since we have seen no evidence of functional fixity in our
human subjects in the logic experiment, we do not know whether its
absence from the behavior of GPS in these situations should be regarded as
a defect.

Incubation

Anecdotes about solving problems by "sleeping on them" are
standard fare in books on creativity. The anecdotes are so common as to
suggest that they have some foundation in fact. The phenomenon is
usually explained in terms of "overcoming set," but no one has said
precisely how sleeping (or other forms of inattention to the problem)
accomplishes this. To the best of our knowledge, none of the programs
we have written so far exhibits anything analogous to incubation,
although we shall watch for evidence of it as we add learning routines to
the programs.

Insight

In the psychological literature, "insight" has two principal
connotations: (1) "suddenness" of discovery, and (2) grasp of the
"structure" of the problem, as evidenced by absence of trial and error. It
has often been pointed out that there is no necessary connection between
the absence of overt trial-and-error behavior and grasp of the problem
structure, for trial and error may be perceptual or ideational, and no
obvious cues may be present in behavior to show that it is going on.
In any of the problem-solving programs an observer's assessment
of how much trial and error there is will depend on how much of the record
of its problem-solving processes the machine prints out. Moreover, the
amount of trial and error going on "inside" varies within very wide limits
depending on small changes in the program.
The performance of our programs throws some light on the classi-
cal debate between proponents of trial-and-error learning and proponents
of "insight," and shows that this controversy, as it is usually phrased,
rests on ambiguity and confusion. A problem-solving program searches
for solutions to the problems that are presented to it. This search must
be carried out in some sequence, and the program's success in actually
finding solutions for rather difficult problems rests on the fact that
the sequences it uses are not chosen casually but do, in fact, depend
on problem "structure."
To keep matters simple, let us consider just one of the methods
LT uses: proof by substitution. The number of valid proofs (of some
theorem) that the program can construct by substitution of new
expressions for the variables in the axioms is limited only by its patience in
generating expressions. Suppose now that the program is presented with
a problem expression to be proved by substitution. The crudest
trial-and-error procedure we can imagine is for the program to generate
substitutions in a predetermined sequence that is independent of the
expression to be proved, and to compare each of the resulting expressions
with the problem expression, stopping when a pair are identical.
Suppose, now, that the generator of substitutions is constructed
so that it is not independent of the problem expression, so that it tries
substitutions in different sequences depending on the nature of the latter.
Then, if the dependence is an appropriate one, the amount of search
required on the average can be reduced. A simple strategy of this sort
would be to try in the axioms only substitutions involving variables
that actually appear in the problem expression.
The actual generator employed by LT is more efficient (and hence
more "insightful" by the usual criteria) than this. In fact, it works
backward from the problem expression and takes into account necessary
conditions that a substitution must satisfy if it is to work. For
example, suppose we are substituting in the axiom "p implies (q or p),"
and are seeking to prove "r implies (r or r)." Working backward, it
is clear that if the latter expression can be obtained from the former
by substitution at all, then the variable that must be substituted for
p is r. This can be seen by examining the first variable in each
expression, without considering the rest of the expression at all.
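
Working backward from the problem expression can be sketched as a
simple matching procedure. The Python fragment below is an assumption-laden
illustration, not LT's generator: expressions are represented as
nested tuples such as ("implies", "p", ("or", "q", "p")), plain strings
stand for variables, and the function name `match` is hypothetical. The
procedure reads the required substitution off the problem expression
instead of generating substitutions blindly.

```python
def match(pattern, target, subst=None):
    """Return a dict of substitutions turning pattern into target, or None."""
    subst = dict(subst or {})
    if isinstance(pattern, str):  # a variable in the axiom
        if pattern in subst and subst[pattern] != target:
            return None  # two incompatible requirements on one variable
        subst[pattern] = target
        return subst
    if isinstance(target, str):
        return None  # a compound pattern cannot become a bare variable
    if pattern[0] != target[0] or len(pattern) != len(target):
        return None  # different connectives or different arity
    for p_arg, t_arg in zip(pattern[1:], target[1:]):
        subst = match(p_arg, t_arg, subst)
        if subst is None:
            return None
    return subst


axiom = ("implies", "p", ("or", "q", "p"))
goal = ("implies", "r", ("or", "r", "r"))
other_goal = ("implies", "r", ("or", "r", "s"))

found = match(axiom, goal)
not_found = match(axiom, other_goal)
print(found)
print(not_found)
```

For the example in the text, the first variable examined already fixes
the substitution of r for p; the remaining positions either agree with
that requirement or show that no substitution can work.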
Trial and error is reduced to still smaller proportions by
LT's method for searching the list of theorems. Only those theorems
are extracted from the list for attempted substitution which are
"similar" in a defined sense to the problem expression. This means, in
practice, that substitution is attempted in only about ten per cent of
the theorems. Thus a trial-and-error search of the theorem list to
find theorems similar to the problem expression is substituted for a
trial-and-error series of attempted substitutions in each of the theorems.
In these examples, the concept of proceeding in a "meaningful"
fashion is entirely clear and explicit. Trial-and-error attempts take
place in some "space" of possible solutions. To approach a problem
"meaningfully" is to have a strategy that either permits the search to
be limited to a smaller subspace, or generates elements of the space
in an order that makes probable discovery of one of the solutions early
in the process. The discussion of Bartlett's problem in the Appendix to
Chapter 2 illustrates this aspect of meaningfulness very clearly.

We see that any heuristic that introduces selectivity into
search gives an appearance of insightfulness to the problem-solving
process it guides. But the element of "suddenness" in insight is most
dramatically illustrated by the planning method in GPS. This method,
it will be recalled, abstracts from the detail of the actual problem,
attempts to solve the abstracted problem and, if successful, uses the
solution as a plan for solving the original problem. When our
laboratory subjects used the planning method, their discovery of a plan
(effective or illusory) was often accompanied by an "aha!" and other
evidences of the pleased surprise that is the usually reported subjective
concomitant of suddenly achieved insight.
The heuristics incorporated in our problem-solving programs
seem to provide, then, an adequate explanation both of the understanding
of "structure" and the suddenness that are commonly taken as the earmarks
of insightful problem solving.

Rote Versus Meaningful Learning

Discussions of insight in the problem-solving literature have
often been associated with discussions of the difference between "rote"
and "meaningful" learning. The issue has great practical as well as
theoretical importance, for it lies at the very foundations of pedagogy.
Let us see what light our explorations of learning programs cast on the
issue.

Learning is program change. It involves discovering the new
program (or the changes) and fixating it. Attempts to classify learning
in terms of the processes of acquisition (discovery and fixation) might
lead to quite different distinctions from attempts to classify learning
in terms of the content of the program changes, that is, what is learned.
Discussions of the differences between rote and meaningful learning really
refer to the latter distinction, to possible different content in the
program changes. We shall begin on this end of the problem.
In terms of everyday notions of what is "rote" and what is
"meaningful," simple memorization of a proof would be regarded universally
as rote learning. But what about memorization of a plan? If the plan
were simple compared with the whole proof, but efficacious in aiding
rediscovery of the proof, we would generally think of this as "meaningful"
learning. In this case, the meaningfully-learned plan has no wider
transferability than the rote-learned proof, but it is much more
efficient in terms of the amount of fixation required to learn it, and
the amount of detail that has to be retained. This fact might help explain
the common finding that "meaningful" materials are better retained than
"rote" materials.
The special methods learning of LT is, again, a rather direct
fixation, on the basis of a simple and not very deep analysis, of certain
information about proofs. It is functional (and sometimes dysfunctional)
for transfer to new problems. It enhances the selectivity of future
search, hence reflects "insight" into problem structure. We would be even
more inclined to think of it as meaningful if it were elaborated so
that the special methods were only applied to problems resembling
those in which they had previously worked. The table of connections
between differences and operators in GPS is very similar to the special
methods programs in LT, and would certainly, from its behavioral
manifestations, be regarded as meaningful. The same remark applies to the
learning of a "good" set of differences.
We may summarize as follows: (1) One basis on which we judge
learning to be meaningful is that it is attended by behavior that
reveals insight into problem structure. (2) Programs are more likely to
have this characteristic if they arise out of fairly complex analysis
during or after the performance process than if they simply involve
literal storage of the product. (3) Learning that we would adjudge
meaningful on these criteria is more likely to be transferable to other tasks
than "rote" learning, although the learning of plans provides a striking
counterexample. (4) The greater parsimony of memory in meaningful as
compared with rote learning may provide part of the explanation for the
better retention of meaningful materials.*

* Our speculations on this last point have convinced us that other
factors besides parsimony are involved, but since we have no
experimental facts to report, we will not develop the topic of retention.

Some recent experiments growing out of the work reported by
Katona in his Organizing and Memorizing have led to a further division
of meaningful learning into learning of methods versus learning of
principles. In the typical experiment, of three groups of subjects, one is
taught the specific solution for the problem, the second is taught a
method applicable to some range of such problems, and the third is taught
a principle that underlies the correct solution. The subjects are then
tested for retention of the solution and for transfer to other problems.
The general result has been that subjects taught a method of solution
have outperformed, on both retention and transfer tests, subjects taught
a specific solution or a principle.* What does the latter result
(the difference between teaching a principle and teaching a method) mean?

* Of course, when they are tested for their ability to state the
principle, rather than to solve an old or new problem, subjects who have
been taught the principle explicitly outperform the others.

In our general theoretical formulation, we distinguish
between the state language, a language for describing problem expressions
and their differences, and the process language, a language for describing
solutions as sequences of steps. Problem solving was described as a
translation of state-language expressions (the problem statement) into
process-language expressions (the solution). Selective heuristics aid
in problem solving by suggesting what to do next; that is, they exercise
their selection in the space of possible processes, not in the space of
possible states. A statement about a characteristic of the solution of
a problem, if made in the state language, is of no help in searching for
a solution unless there is already available a simple translation of that
statement into a statement in the process language.

Suppose that GPS faces a problem equipped with the problem
expressions, lists of differences, and a list of operators. Suppose,
however, that it does not yet have a table of connections for this class
of problems. A hint like: "Try operator i3" can be very helpful, for
it selects a particular subset of paths through the problem maze. A
hint like: "Eliminate the difference in number of variables between
premise and conclusion" will be of no help, since, in the absence of a
table of connections, it suggests nothing about the operator to be
applied next. Of course, if GPS had a table of connections, it could
determine which operators were relevant to the difference in question
(translate the state-language hint about differences into a
process-language hint about operators), and try to apply these operators.
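
The role of the table of connections as a translator between the two
languages can be sketched as a lookup. The Python fragment below is
illustrative only and is not taken from GPS: the operator names op1 to
op3, the difference label, and the function `operators_to_try` are all
hypothetical. A process-language hint names an operator directly; a
state-language hint names a difference and is useful only if the table
maps that difference to operators.

```python
def operators_to_try(hint, operators, table_of_connections):
    """Translate a hint into operators to try; [] means it gives no guidance."""
    if hint in operators:
        # Process-language hint: it already names what to do next.
        return [hint]
    # State-language hint: usable only through the table of connections.
    return list(table_of_connections.get(hint, []))


operators = ["op1", "op2", "op3"]
difference = "number of variables"

empty_table = {}  # no table of connections learned yet
learned_table = {"number of variables": ["op2", "op3"]}

process_hint = operators_to_try("op3", operators, empty_table)
state_hint_no_table = operators_to_try(difference, operators, empty_table)
state_hint_with_table = operators_to_try(difference, operators, learned_table)

print(process_hint)
print(state_hint_no_table)
print(state_hint_with_table)
```

Without a table the state-language hint selects nothing, whereas the
same hint becomes selective once the table supplies the translation.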
We believe that the distinction we have just been using
corresponds precisely to the difference in the experimental situations between
teaching a principle and teaching a method. If we examine the actual
examples in the literature, we will find that teaching a principle means
providing a true statement, in the state language, about a characteristic
of the solution; teaching a method means providing a selective heuristic
in the process language. We would not expect the principle to assist
materially in solving the problem, or to transfer to other problem
situations. And the data show that it doesn't.

Concepts

An information processing system uses concepts as naturally and
necessarily as Molière's hero spoke prose. How do we test whether a
person or a laboratory animal has a particular concept? By noting whether,
when appropriately motivated, he or it will behave differentially toward
stimuli that are instances of the concept and stimuli that are not. Such
differential behavior is based upon some test, simple or complex, which,
applied to a stimulus, gives one result when it is an instance of the
concept, another when it is not.
Among the concepts that GPS employs in the environment of
symbolic logic are: connective, variable, wedge, horseshoe, number of
variables (in an expression), number of distinct variables, difference in
order, and so on. An example of a somewhat more complex kind of concept
in LT is a class of similar theorems, i.e., theorems that yield the same
output from a specified similarity test. There is a routine in LT for
describing theorems and searching for theorems similar to the problem
expression or some part of it in order to attempt substitutions,
detachments, or chainings.
The bases for concepts, the criteria of classification, are
purely pragmatic. Any set of objects can be put in a class and distinguished
from other objects if some kind of test can be constructed that will
perform the discrimination. We may have, for example, the concept of a logic
expression that has a single variable, one argument place on its left side,
and two argument places on its right side. "P implies (P or P)" is an
expression exemplifying this concept; so is "Q implies (Q implies Q)."
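
The concept just described can be written down as an explicit test.
The Python sketch below rests on assumptions that are not in the text:
expressions are nested tuples of the form (connective, left, right),
strings stand for variables, and an "argument place" is read as one
occurrence of a variable. The function names are hypothetical.

```python
def leaves(expr):
    """List every variable occurrence in an expression, left to right."""
    if isinstance(expr, str):
        return [expr]
    return [leaf for arg in expr[1:] for leaf in leaves(arg)]


def is_instance(expr):
    """Test: one distinct variable, one place on the left, two on the right."""
    if isinstance(expr, str) or len(expr) != 3:
        return False
    _, left, right = expr
    return (
        len(set(leaves(expr))) == 1
        and len(leaves(left)) == 1
        and len(leaves(right)) == 2
    )


first = ("implies", "P", ("or", "P", "P"))
second = ("implies", "Q", ("implies", "Q", "Q"))
counterexample = ("implies", "p", ("or", "q", "p"))  # two distinct variables

print(is_instance(first), is_instance(second), is_instance(counterexample))
```

The test gives one result for the two expressions named in the text and
another for an expression with two distinct variables, which is all
that having the concept requires.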
In general, it will be useful to comprehend two objects in a
concept if there are occasions when the appropriate response to the
occurrence of either object is the same. Hence, the set of concepts available
to an information processing system is an important basis for relating
its state description of the environment to its process description of
the environment.
Let us elaborate a bit on this point. In Chapter 7, we proposed
the following as an important criterion for a good set of differences and
table of connections: that for every difference there should exist one
and only one relevant operator. A difference that corresponds to a single
relevant operator defines a "useful" concept because it indicates to the
system what action should be taken when the difference occurs.
Earlier in this chapter we provided an example from the Chess
Player of the adaptive function of set. Each goal in the chess program
corresponds to a concept: a piece under attack, a serious threat, and so
on. This concept involves tests in the state language, tests of
characteristics of the position on the board. With each concept are also
associated action programs in the process language: processes for generating
moves relevant to the concept and for evaluating moves with respect to
the concept. Hence, the concept again establishes a correlation between
states and processes.
A part of the body of psychological research on thinking and
problem solving is devoted to the topic of concepts. Indeed, a recent
volume on concept attainment is titled "A Study of Thinking." The two
subtopics that have perhaps been most explored are availability of
concepts and concept attainment. Representative of the former are studies
by comparative linguists of interlingual differences of concepts,
studies by Heidbreder of relative rates of learning of new names
for concepts, and the developmental studies of Piaget. Representative
of the latter are the studies by Bruner et al. just mentioned, and many
others.
Every concept is a logical function, simple or complex, of the
elementary discriminations the information processing system can make.
When the function is complex, and the space of elementary discriminations
large, the process of acquiring a concept is usually called "concept
formation" or "learning pattern recognition." When the function is simple,
the phrase "concept attainment" is sometimes used. Thus, learning to
recognize a circle in the visual field would be an example of concept
formation or learning pattern recognition. Learning to choose all the
green triangles from a set of colored simple shapes would be an example
of concept attainment. Whether this distinction between formation and
attainment has more than relative significance is not clear. It may well
be that quite different learning programs are used in concept formation
from those used in concept attainment. We do not have the evidence that
would answer this question.
If the distinction just discussed is more than superficial, then
the program for learning a good set of differences in GPS must be regarded
as a concept formation program, as must also the program of Selfridge and
Dinneen. None of the other learning programs we described for LT or GPS
involves the attainment or formation of new concepts.

Hierarchies of Processes
Another characteristic of the behavior of our programs that
resembles human problem-solving behavior is the hierarchical structure
of their processes. In LT, for example, two kinds of hierarchies exist.
In solving a problem LT breaks it down into component problems.
First of all, it makes three successive attempts: a proof by substitution,
a proof by detachment, or a proof by chaining. In attempting to
prove a theorem by any of these methods, it divides its task into two
parts: first, finding likely raw materials in the form of axioms or
theorems previously proved; second, using these materials in matching.
In order to find theorems similar to the problem expression, the first
step is to compute a description of the problem expression, the second
step, to search the list of theorems for expressions with the same
description. The description-computing program divides, in turn, into a program
for computing the number of levels in the expression, a program for
computing the number of distinct variables, and a program for computing the
number of argument places.
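
A minimal sketch of this description-and-search step follows. It is not LT's code: the nested-tuple encoding of expressions, the function names, and the exact counting conventions (a bare variable counts as one level, every variable occurrence as one argument place) are assumptions made for illustration.

```python
# Expressions are nested tuples (connective, left, right); variables are strings.
# This encoding is an assumption of the sketch, not LT's internal format.

def count_levels(expr):
    """Depth of the expression tree; a bare variable counts as one level."""
    if isinstance(expr, str):
        return 1
    return 1 + max(count_levels(arg) for arg in expr[1:])


def distinct_variables(expr):
    """Set of distinct variable names occurring in the expression."""
    if isinstance(expr, str):
        return {expr}
    return set().union(*(distinct_variables(arg) for arg in expr[1:]))


def count_argument_places(expr):
    """Number of variable occurrences (argument places)."""
    if isinstance(expr, str):
        return 1
    return sum(count_argument_places(arg) for arg in expr[1:])


def describe(expr):
    """The description: (levels, distinct variables, argument places)."""
    return (count_levels(expr),
            len(distinct_variables(expr)),
            count_argument_places(expr))


def find_similar(problem, theorems):
    """Theorems whose description equals that of the problem expression."""
    target = describe(problem)
    return [t for t in theorems if describe(t) == target]


problem = ("implies", ("or", "p", "q"), "p")
theorems = [
    ("implies", ("or", "a", "b"), "b"),
    ("implies", "p", ("or", "q", "p")),
    ("implies", ("or", "p", "p"), "p"),
]
similar = find_similar(problem, theorems)
```

The three counting functions mirror the three subprograms named in the text, and `describe` plays the part of the description-computing program that calls them. The third theorem is excluded because it has only one distinct variable.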
LT has a second kind of hierarchy in the generation of new
expressions to be proved. Both the detachment and chaining methods do not
give proofs directly, but instead provide new alternative expressions to
prove. LT keeps a list of these subproblems and, since they are of the
same type as the original problem, it can apply all its problem-solving
methods to them. These methods, of course, yield yet other subproblems,
and in this way a large network of problems is developed during the course
of proving a given logic expression. The importance of this type of
hierarchy is that it is not fixed in advance, but grows in response
to the problem-solving process itself, and shows some of the flexibility
and transferability that seem to characterize human higher mental
processes.
The problem-subproblem hierarchy in LT's program is quite
comparable with the hierarchies that have been discovered by students of
human problem-solving processes, and particularly by de Groot in his
detailed studies of the thought methods of chess players.6/ Our earlier
discussion of insight shows how the program structure permits an
efficient combination of trial-and-error search with systematic use of
experience and cues in the total problem-solving process.

6/ Op. cit., pp. 78-33, 105-111.

We have now examined a number of aspects of problem-solving
behavior that are prominent in the literature of the subject to see what
light is cast on them by the problem-solving programs. First, the
programs provide rather clearcut and conclusive explanations for a number
of phenomena that have been sufficiently puzzling and unclear to have
engendered vast amounts of discussion, and controversy between
behaviorists and Gestaltists. The programs exhibit both preparatory and
directional set. In one case, they provide an example of functional fixity.
Their problem-solving activity is often insightful, both in the sense of
making use of problem structure and in the sense of sudden discovery.
The programs help us to understand some of the differential effects
that attend rote and meaningful learning, and enable us to distinguish
in unambiguous terms between the learning of methods and the learning
of principles. The programs cast light both on the nature of concepts
in an information processing system, and upon the learning of concepts.
Finally, the programs exhibit the hierarchical structures that are so
characteristic of thinking in complex situations.
The fact that the programs exhibit so well some of the most
striking characteristics that have been noted in phenomenological
descriptions of thinking gives us further reason for accepting them as
valid theories of human thinking and problem solving. If we do so
accept them, we see that they give us a very powerful and unambiguous
language for talking about complicated phenomena that previously we have
talked about in exceedingly vague terms.
An operationalist might object that we have paid a large price
for this clarity, that we have erected an elaborate system of
intervening variables, in the form of programs and information processes, to
explain the observables. We agree that we have done just this, but we
have done what is done in every successful physical theory. Our programs
are no more and no less objectionable as explanatory constructs than the
Schrödinger equation or the Heisenberg matrix in quantum mechanics.

If the kind of theory we have proposed be accepted, it will
exercise a very strong influence on the kind of experimental work that
is done in the study of higher mental processes. In the past, the
general paradigm for experimental work in this field has been very
simple: take as the dependent variable the effectiveness of the thinking
or problem-solving process (number of problems solved, or what not); take
as the independent variable one or more conditions that can be varied
by varying the task, the instructions, or the subjects; determine the
effects of variations of the independent variable upon the dependent
variable.

Independent variable --> Effectiveness

We would substitute for this scheme a more complex one. For
we can construct experimental situations in which the intervening
variables, the programs, serve either as independent or dependent variables.
That is, we can study the effect of changes in experimental conditions
on the programs that result (these are experiments on learning); and we
can study the effect of differences of program upon problem-solving
performance. Introduction of the intervening variables will help us,
as it has helped us in the work discussed in this book, understand the
complex ways in which "inputs" into the information processing system
govern its "outputs."

Environment --> Programs --> Effectiveness

Most of the psychological research on concepts has focussed
on the processes of their formation. The current version of LT is
mainly a performance program, and hence shows no concept formation.
There is in the program, however, a clearcut example of the use of
concepts in problem solving. This is the routine for describing the
theorems and searching for theorems "similar" to the problem expression
or some part of it in order to attempt substitutions, detachments, or
chainings. All theorems having the same description exemplify a common
concept. We have, for example, the concept of an expression that has
a single variable, one argument place on its left side, and two
argument places on its right side.

Some Conditions of Creativity

In the remaining pages of this paper we shall use the
theory of problem solving developed in preceding sections to
cast light on three topics that are often discussed in relation
to creativity:
(1) the use of imagery in problem solving;
(2) the relation of unconventionality to creativity;
(3) the role of hindsight in the discovery of new
These three topics were chosen because we think our theory
has something to say about them. We have not tried to include
all the traditional topics in the theory of creative activity
(we do not, for example, discuss the phenomenon of incubation),
nor will we try to treat definitively the topics we have
included. We are still far from having all the mechanisms
that will be required for a complete theory of creativity.
Hence, these last pages are necessarily extrapolations and
are more speculative than the earlier sections.

Planning and Imagery

Among the issues that have surrounded the topic of imagery
in the literature on thinking the following have been prominent:
1. What internal "language" is used by the organism in
thinking? To what extent is this "language" related to the
sense modalities, and is the thinking represented by elements
that correspond to abstract "symbols" or to pictures, or to
something else?
2. To what extent do the internal representations, whatever
their nature, involve generalization and abstraction from that
which they represent?
Using the example of planning we have been considering,
we believe some clarification can be achieved of both issues.

Some Comments on Representation

How are the objects of thought represented internally?
We are asking here neither a physiological nor a "hardware"
question. We wish an answer at the level of information
processing, rather than neurology or electronics. In a state
description of an information-processing system, we can talk of
patterns of elementary symbols. These symbols may be electric
charges, as in some computer memories, or they may be the cell
assemblies of Hebb's theories, or they may be something quite
different. We are not interested in what they are made of.
Given that there are some such patterns (that the system is an
information processing system), our question is in what way the
patterns within mirror, or fail to mirror, the patterns without
that they represent.
Let us take a simple example from logic. We may write on
a piece of paper the expression "(pvq)⊃p." What would it mean
to say that the "same" expression was held in memory by the
Logic Theorist? With the present program it would mean that
somewhere in memory there would be a branching pattern of
elementary symbols (or the internal counterparts of elementary
symbols) that would look like:

        ⊃
       / \
      v   p
     / \
    p   q

Of course, there would not literally be mounds of ink like
"p", but there would be internal elementary patterns in one-one
correspondence to these. Note, however, that the correspondence
between the internal and external representations as a whole
is far more complicated than the correspondences between
elementary symbols. The external representation of the
expression in the illustration is a linear array of symbols; the
internal representation has branches that make it topologically
distinct from a linear array. The external representation uses
symbols like "(" and ")"; these are absent from the internal
representations; the grouping relations they denote are
implicit in the branching structure itself (i.e., the cluster
pvq, which is enclosed in parentheses in external representation,
is a subtree of the entire expression in the internal
representation).
The implicitness of certain aspects of the internal
representation goes even deeper than we have just indicated.
For the tree structure we represented on the paper above by
connecting symbols with lines is represented within the computer
memory by the fact that there are certain information processes
available that will "find the left subtree" and "find the
right subtree" of such a tree structure. The actual physical
locations of these elements in memory can be (and usually are)
completely scattered, so long as these information processes
have means for finding them.
Let us take another example. If we wish to represent on
paper the concept of a pair of elements, P and Q, abstracted
from the order of the pair, we can write something like: (PQ),
and append to it the statement that (PQ) is equivalent for all
purposes with (QP). In an internal representation,
order-independence of the terms of the pair might be secured in a quite
different way. Suppose that the symbols P and Q were stored
(in some order) on a list in memory, but that the only
information processes available for dealing with lists were processes
that produced the same output regardless of the order of the
items on the list. Suppose, for example, that the "print list"
process always alphabetized the list before printing. Then
this process would always print out "(PQ)" regardless of
whether the items were stored on the list as PQ or QP. In
this case, the order-independence of the information processes
applicable to the lists would be an implicit internal
representation of the equivalence of (PQ) with (QP).*
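
A minimal sketch of this example, with a hypothetical `print_list` process standing in for the one described: the stored order differs, but no available process can reveal the difference.

```python
def print_list(items):
    """The only output process for lists: alphabetize, then print."""
    return "(" + "".join(sorted(items)) + ")"


# The same pair, stored in two different orders in memory.
stored_one_way = ["P", "Q"]
stored_other_way = ["Q", "P"]

first = print_list(stored_one_way)
second = print_list(stored_other_way)
```

Nothing in memory states that the two orders are equivalent. The equivalence is carried entirely by the fact that `print_list` gives the same output for both.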
The main lesson that we learn from these examples is that
the internal representation may mirror all the relevant properties
of the external representation without being a "picture" of it
in any simple or straightforward sense. It is not at all clear
whether a human subject would be aware that his internal
representation of a logic expression "carried" the information about
the expression in quite a different way from the string of
symbols on paper, or that, if he were aware, he could verbalize
what the differences were.

* A simple example of this in humans is well known to teachers
of matrix algebra. Since all the elementary arithmetic and
algebraic systems that students have encountered previously
contain the commutative law, students must be taught that the
matrix product AB is not equivalent to the product BA.

A similar point has been made in discussions of "encoding."
Our examples show, however, that encoding may involve something
far more complex than translating a string of symbols in one
alphabet into another string of symbols in another alphabet.
The encoded representation may not be a string at all, and
there may be important differences in what is explicit and
what implicit in the two representations.

Representation and the Sense Modalities.

Since the internal representation of information need not
be a simple mapping of what is "out there," or even of what is
received by the sense organs, it is not easy to know what is
meant by saying that a particular internal representation is
or is not "visual" or "auditory." Is the internal branching
structure that represents the logic expression inside the
Logic Theorist a visual image of the string of symbols on paper
or is it not?
There is an obvious fallacy in saying that it is not just
because the spatial (or even the topological) relations are not
the same in the two. The internal representations we carry
around in our heads of even the most visual of pictures cannot
possibly have the same metrical relations within (and possibly
not even the same topologic relations) as without.
We believe that the explanation of why some memories are
visual, some auditory, and some verbal lies in a quite different
direction from a simple "mapping" theory. Since our explanation
rests on considerations that have not even been touched upon
in the present paper, we cannot discuss it at length. However,
a very brief statement of it may help us understand the role
of imagery in creative thought.
We will assert that an internal representation is visual
if it is capable of serving as an input to the same information
processes as those that operate on the internal representations
of immediate visual sensory experiences. These information
processes that can be applied to visual sensations literally
serve as a "mind's eye," for they can operate on memories that
have been encoded in the same way as sensory inputs, and when
they are so applied produce the phenomena of visual imagination.
Since there must be processes that can deal with sensory inputs,
there is nothing mysterious in the notion that these same
processes can deal with inputs from memory, and hence nothing
metaphysical or non-operational about the concept of "mind's
eye" or "mind's ear."
But the mind's eye is used not only to process inputs that
"nature" coded in visual form. Often we deliberately construct
visual representations of abstract relations (e.g., we draw
boxes to represent states of a system, and arrows connecting
the boxes to represent the processes that transform one state
into another). What can be the advantage of the imagery? The
advantage lies in the fact that when we encode information so
as to be accessible to visual processes, we have automatically
built into the encoded information all the relations that are
implicit in the information processes that constitute the
mind's eye. For example, when we represent something as an
arrow, we determine the order in which the items connected by
the arrow will be called into attention.
We are led in this way to the concept of systems of
imagery. A system of imagery comprises a set of conventions
for encoding information, and coordinated with these a set of
information processes that apply to the encoded information.
As we have seen, the information processes for interpreting
the encoded information may be just as rich in implicit
conventions as the processes for encoding. It is the fact that the
encoding makes available the former as well as the latter that
makes it useful sometimes to represent information in a modality
for which we have a rich and elaborate system of imagery.

Abstraction and Generalization.

Bishop Berkeley founded his epistemology on the personal
difficulty he experienced in imagining a triangle which is
"neither oblique nor rectangle, equilateral, equicrural nor
scalenon, but all and none of these at once." Hume, on the
other hand, found this feat of imagination perfectly feasible.
The Logic Theorist would have to take Hume's side against
Berkeley. For in the planning program the problem solver has
the capacity to imagine a logic expression comprised of two
variables joined by a connective, in which the connective is
neither v nor · nor ⊃, but all and none of these at once.
For this is precisely what the representation (PQ) stands for
and the way in which it is used by the planning processes.
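
As a rough illustration only (the planning program's actual abstraction rules are not given in this section), the abstraction can be sketched as a function that discards the connective and keeps the terms it joins. The tuple encoding and the name `abstract` are assumptions, and only binary connectives are handled.

```python
def abstract(expr):
    """Drop every connective, keeping only the grouping of the terms.

    Expressions are assumed to be nested tuples (connective, left, right),
    with variables as strings. Binary connectives only.
    """
    if isinstance(expr, str):
        return expr
    _connective, left, right = expr
    return (abstract(left), abstract(right))


# Three different concrete expressions over P and Q ...
concrete = [(c, "P", "Q") for c in ("or", "and", "implies")]

# ... all collapse to the same abstracted image.
images = {abstract(e) for e in concrete}
```

The single image `("P", "Q")` is neither the disjunction, the conjunction, nor the implication, yet it stands for each of them. This is the sense in which such an image is "all and none of these at once."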
Once we admit that the relation between the object sensed
and its internal representation is complex, there is no difficulty
in admitting as corollary that the internal representation may
abstract from all but a few of the properties of the object
"out there." What we call "visual imagery", for example, may
admit of colorless images even if all light that falls on the
retina is colored.
The fact that the planning heuristic of the Logic Theorist
possesses generalized or abstracted images of logic expressions
does not prove, of course, that humans construct similar
abstractions. What it does prove is that the notion of an
image of a triangle "neither oblique nor rectangle, equilateral,
equicrural, nor scalenon, but all and none of these at once" is
not contradictory, but can be given a straightforward operational
definition in an information-processing system. Finally, since
the information processes that can operate on the abstracted
expressions in the Logic Theorist are of the same kind as those
that operate on the full-bodied expressions, we would be forced,
by any reasonable criterion, to regard the two images as
belonging to the same modality.

The Uses of Imagery.

We have already hinted at the uses of imagery, but we would
like now to consider them a little more explicitly. To think
about something, that something must have an internal
representation of some kind, and the thinking organism must have some
processes that are capable of manipulating the representation.
We have called such a combination of representation and processes
a system of imagery.

Often, the term image is used somewhat more narrowly to
refer to those representations that correspond to one or another
of the sense modalities. Thus, we have visual images, auditory
images, and tactile images, but we would not, in this narrower
usage, speak of "abstract images," i.e., representations and
processes not used for representations of any of the sensory
modalities.

When a particular representation is used for something, a
large number of properties are imputed implicitly to the object
represented because these properties are imbedded in the
information processes that operate on representations of the kind in
question. Thus, if we represent something as a line, we are
likely (because that is the way our visual imagery operates)
to impute to it the property of continuity.

Herein lies both the power and the danger of imagery as a
tool of thought. The richer the properties of the system of
imagery we employ, the more useful is the imagery in
manipulating the representation, but the more danger there is that
we will draw conclusions based on properties of the system of
imagery that the object represented doesn't possess. When we
are aware of the danger (and are conscious that we have encoded
information into a system of imagery with strong properties),
we are likely to call the image a "metaphor."
Often we are not aware of the danger. As has often been
observed, Aristotle's logic and epistemology sometimes mistook
accidents of Greek grammar for necessary truths. From this
standpoint, the significance of modern mathematics, with its
emphasis on rigor and the abstract axiomatic method, is that it
provides us with tests that we can apply to the products of
thinking to make sure that only those assumptions are being
used that we are aware of.
The imagery used in the planning heuristic drastically
reduces the space searched by the solution generator by
abstracting from detail. This is probably not the only function of
imagery for humans, although it is the one best documented by
our present programs. We think there is evidence from data on
human subjects that even in those cases where there is not a
rich set of processes associated with the representation, imagery
may provide a plan to the problem solver at least in the sense
of a list of the elements he is dealing with and a list of which
of these are related. We will have to leave detailed discussion
of this possibility to another occasion.

Summary: Imagery

We have applied our problem-solving theory to the classical
problem of the role of imagery in thought. Although our analysis
of imagery is admittedly speculative, it provides a possible
explanation of the relation of internal representations to the
sense modalities, and provides an example from one of the computer
programs of generalization or abstraction, and of an abstract
"visual" image. Finally, the theory shows how images of various
kinds can be used as the basis for planning heuristics.

Unconventionality and Creativity

Thus far, our view of the problem-solving process has been
a short-range one. We have taken as starting point a system of
heuristics possessed by the problem solver, and have asked how
it would govern his behavior. Since his initial system of
heuristics may not enable the problem solver to find a solution
in a particular case, we must also understand how a system of
heuristics is modified and developed over time when it is not
adequate initially.

Change of Set and Learning

Although all adaptive change in heuristics might be termed
"learning," it is convenient to distinguish relatively short-run
and temporary changes from longer-run more or less permanent
changes. If we use "learning" to refer only to the latter,
then we may designate the former as "changes in set."
There is a basis for the distinction between set change
and learning in the structure of the problem-solving organism.
The human problem solver (and the machine simulation) is
essentially a serial rather than a parallel instrument, which
because of the narrow span of its attention, does only one or
a few things at a time. If it has a rich and elaborate system
of heuristics relevant to a particular problem, only a small
part of these can be active in guiding search at any given
moment. When in solving a problem one subsystem of heuristics
is replaced by another, and the search, as a consequence, moves
off in a new direction, we refer to this shift as a change in
set. Change in set is a modification of the heuristics that
are actively guiding search, by replacing them with other
heuristics in the problem solver's repertoire; learning is
change in the repertoire of heuristics itself.


Stereotypy

A major function of heuristics is to reduce the size of
the problem space so that it can be searched in reasonable time.
Effective heuristics exclude those portions of the space where
solutions don't exist or are rare, and retain those portions
where solutions are relatively common. Heuristics that have
been acquired by experience with some set of problems may be
exceedingly effective for problems of that class, but may
prove inappropriate when used to attack new problems.
Behaviorally, stereotypy is simply the subject's persistence in using
a system of heuristics that the experimenter knows is
inappropriate under the circumstances.
It is a very common characteristic of puzzles that the
first steps toward solution require the solver to do something
that offends common sense, experience, or physical intuition.
Solutions to chess-mating problems typically begin with
"surprising" moves. In the same way, a number of classical
experiments with children and animals show that a simple problem of
locomotion to a goal can be made more difficult if a barrier
forces the subject to take his first steps away from the goal
in order ultimately to reach it. When the task has this
characteristic, the problem solver is obviously more likely to
succeed if his repertoire of heuristics includes the injunction:
"if at first you don't succeed, try something counterintuitive."

Is Unconventionality Enough?

It sometimes seems to be argued that people would become
effective problem solvers if only we could teach them to be
unconventional. If our analysis here is correct,
unconventionality may be a necessary condition for creativity, but it is
certainly not a sufficient condition. If unconventionality
simply means rejecting some of the heuristics that restrict
search to a limited subspace, then the effect of
unconventionality will generally be a return to relatively inefficient
trial-and-error search in a very much larger space. We have
given enough estimates of the sizes of the spaces involved,
with and without particular heuristics, to cast suspicion on
a theory of creativity that places its emphasis on increase
in trial and error.
Let us state the matter more formally. Associated with
a problem is a space of possible solutions. Since the problem
solver operates basically in a serial fashion, these solutions
must be taken up and examined in some order. If the problem
solver has no information about the distribution of solutions
in the space of possibilities, and no way of extracting clues
from his search, then he must resort to a solution generator
that is, to all intents and purposes, "random"—that leads
him to solutions no more rapidly than would a chance selection.
At some later stage the problem solver learns how to change
the solution generator so that—at least for some range of
problems—the average search required to find a solution is
greatly reduced. But if the modified generator causes some
elements in the solution space to be examined earlier than
they would otherwise have been, it follows that the examination
of others will be postponed.
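
The trade-off can be illustrated with a toy solution space. Everything in this sketch is assumed for illustration: the space of 1000 integers, the fixed baseline order standing in for an uninformed generator, and the "even numbers first" heuristic.

```python
def examined_until_found(order, target):
    """Number of candidates a serial searcher examines before hitting target."""
    for count, candidate in enumerate(order, start=1):
        if candidate == target:
            return count
    raise ValueError("target is not in the solution space")


# Baseline generator: a fixed order that uses no information about
# where solutions lie (it stands in for the "random" generator).
baseline = list(range(1000))

# Modified generator: a hypothetical heuristic that examines even numbers
# first. Python's sort is stable, so order within each group is preserved.
heuristic = sorted(baseline, key=lambda n: n % 2)

# A solution the heuristic favors is reached sooner ...
favored_before = examined_until_found(baseline, 900)
favored_after = examined_until_found(heuristic, 900)

# ... but a solution it disfavors is necessarily reached later.
disfavored_before = examined_until_found(baseline, 101)
disfavored_after = examined_until_found(heuristic, 101)
```

Because both generators enumerate the same finite space one element at a time, every position gained by one candidate is a position lost by others. The heuristic pays off only if solutions to the problems actually met tend to lie among the favored elements.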
The argument for unconventionality is that at some point
a class of problems may be faced where the generator looks at
just the wrong elements first, or even carefully filters out
the right ones so that they will never be noticed (as in the
chess example of queen sacrifices). A return to the original
trial-and-error generator would eliminate this perverse
blindness of the generator, but at the expense of reinstating a
search through an enormous space. What is needed in these
cases is not an elimination of the selective power of a solution
generator, but the replacement of the inappropriate generator
by an appropriate one.
We have neither the data nor the space to illustrate this
point from classical instances of scientific creativity, but we
can give a simple example from chess. A chess novice is always
stunned when his opponent demolishes him with a "creative"
unconventional move like a sacrifice of a major piece. The
novice has carefully trained himself to reject out of hand
moves that lose pieces (and kicks himself for his oversights).
If he tries to imitate his more experienced opponent, he
usually loses the sacrificed piece. Clearly the opponent's
secret is not simply that he is willing to be unconventional—
to consider paths the novice rejects. The secret is that the
experienced player has various additional pieces of heuristic
that guide him to promising "unconventional" moves by giving
him clues of their deeper and more devious consequences. It
is the possession of this additional selectivity that allows
him, in appropriate positions, to give up the selectivity
embodied in the novice's rule of always preserving major pieces.
The evidence we possess on the point indicates rather strongly
that the amount of exploration undertaken by the chess master
is no greater than that undertaken by relatively weak players.(1)


Comparison with Other Theories

We have proposed a theory of the higher mental processes,
and have shown how the General Problem Solver, which is a particular
exemplar of the theory, provides an explanation for some processes
used by humans to solve problems in symbolic logic and other areas.
What is the relation of this explanation to others that have been

Associationism

The broad class of theories usually labelled
"associationist" share a generally behaviorist viewpoint and a
commitment to reducing mental functions to elementary
mechanistic neural events. We agree with the associationists
that the higher mental processes can be performed by mechanisms;
indeed, we have exhibited a specific set of mechanisms capable
of performing some of them.
We have avoided, however, specifying these mechanisms
in neurological or pseudo-neurological terms. Problem solving,
at the information-processing level at which we have described
it, has nothing specifically "neural" about it, but can be
performed by a wide class of mechanisms including both human
brains and digital computers. We do not believe that this
functional equivalence between brains and computers implies
any structural equivalence at a more minute anatomical level
(e.g., equivalence of neurons with circuits). Discovering
what neural mechanisms realize these information-processing
functions in the human brain is a task for another level of
theory construction. Our theory is a theory of the information
processes involved in problem solving, and not a theory of
neural or electronic mechanisms for information processing.

The picture of the central nervous system to which our
theory leads is a picture of a more complex and active system
than that contemplated by most associationists. The notions
of "trace," "fixation," "excitation," and "inhibition" suggest
a relatively passive electro-chemical system (or, alternatively,
a passive "switchboard"), acted upon by stimuli, altered by
that action, and subsequently behaving in a modified manner
when later stimuli impinge on it.
In contrast, we postulate an information-processing
system with large storage capacity that holds, among other
things, complex strategies (programs) that may be evoked by
stimuli. The stimulus determines what strategy or strategies
will be evoked; the content of these strategies is already
largely determined by the previous experience of the system.
The ability of the system to respond in complex and highly
selective ways to relatively simple stimuli is a consequence
of this storage of programs and this "active" response to
stimuli. The phenomena of set and insight that we have already
described, and the hierarchical structure of the response
system are all consequences of this "active" organization
of the central processes.
The historical preference of behaviorists for a theory
of the brain that pictured it as a passive photographic plate
or switchboard, rather than as an active computer, is no doubt
connected with the struggle against vitalism. The invention
of the digital computer has acquainted the world with a device
(obviously a mechanism) whose response to stimuli is clearly
more complex and "active" than the response of more traditional
switching networks. It has provided us with operational and
unobjectionable interpretations of terms like "purpose,"
"set," and "insight." The real importance of the digital
computer for the theory of higher mental processes lies not
merely in allowing us to realize such processes "in the metal"
and outside the brain, but in providing us with a much
profounder idea than we have hitherto had of the characteristics
a mechanism must possess if it is to carry out complex
information-processing tasks.
Gestalt Theories
The theory we have presented resembles the associationist
theories largely in its acceptance of the premise of mechanism,
and in few other respects. It resembles much more closely
some of the Gestalt theories of problem solving, and perhaps
most closely the theories of "directed thinking" of Selz and
de Groot. A brief overview of Selz's conceptions of problem
solving, as expounded by de Groot, will make its relation to
our theory clear.
1. Selz and his followers describe problem solving in
terms of processes or "operations" (de Groot, op. cit., p. 42).
These are clearly the counterparts of the basic processes in
terms of which GPS is specified.
2. These operations are organized in a strategy, in
which the outcome of each step determines the next (ibid.,
p. 44). The strategy is the counterpart of the program of GPS.

3. A problem takes the form of a "schematic anticipation."
That is, it is posed in some such form as: Find an x that stands in
the specified relation R to the given element E (ibid., pp. 44-46).
The counterpart of this in GPS is the problem: Find a sequence of
sentences (x) that stands in the relation of proof (R) to the given
problem expression (E). Each method in GPS is an example of
"schematic anticipation."
4. The method that is applied toward solving the problem is
fully specified by the schematic anticipation. The counterpart in GPS
is that, upon receipt of the problem, the executive program for solving
such problems specifies the next processing step. Similarly, when a
subproblem is posed (like "find an operator to reduce the difference"),
the response to this subproblem is the initiation of a corresponding
program (here, the method of reducing differences).
5. Problem solving is said to involve (a) finding means of
solution, and (b) applying them (ibid., pp. 47-53). A counterpart in
GPS is the division between the method of reducing differences, which
finds "likely" operators, and the Apply Operator method, which tries
to use these materials. In applying means, there are needed both
ordering processes (to assign priorities when more than one method is
available), and control processes (to evaluate the application)
(ibid., p. 50).

6. Long sequences of solution-methods are coupled together.
This coupling may be cumulative (the following step builds on the result
of the preceding), or subsidiary (the previous step was unsuccessful, and
a new attempt is now made) (ibid., p. 51). In GPS the former is
illustrated by a successful Reduce Difference goal followed by creation
of a new Transform goal; the latter by the failure of an Apply Operator
goal, which is then followed by a new attempt to reduce the difference
with another operator.
7. In cumulative coupling, we can distinguish complementary
methods from subordinated methods (ibid., p. 52). The former are
illustrated by successive substitutions and replacements in successive
elements of a pair of logic expressions. The latter are illustrated by
the role of matching as a subordinate process in the Transform method.
We could continue this list a good deal further. Our purpose
is not to suggest that the theory of GPS can or should be translated
into the language of "directed thinking." On the contrary, the
specification of the program for GPS clarifies to a considerable extent
notions whose meanings are only vague in the earlier literature. What
the list illustrates is that the processes that we observe in GPS are
basically the same processes that have been observed in human problem
solving in other contexts.
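The interplay of the three GPS goal types named in the list above (Transform, Reduce Difference, Apply Operator) can be illustrated with a small sketch. This is not the GPS program itself: the set-of-conditions state representation, the `Operator` record, the depth limit, and the errand domain are all assumptions made for illustration. It shows only how cumulative and subsidiary coupling might arise from mutually recursive goals.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Operator:
    name: str
    needs: frozenset                  # conditions required before use
    adds: frozenset                   # conditions established by use
    removes: frozenset = frozenset()  # conditions destroyed by use


def transform(state, goal, ops, depth=6):
    """Transform goal: reach a state containing every condition in `goal`.

    Returns (plan, final_state), or None if no plan is found within `depth`.
    """
    missing = goal - state
    if not missing:
        return [], state
    if depth <= 0:
        return None
    for difference in sorted(missing):
        result = reduce_difference(state, goal, difference, ops, depth)
        if result is not None:
            return result
    return None


def reduce_difference(state, goal, difference, ops, depth):
    """Reduce Difference goal: find an operator that supplies `difference`."""
    for op in ops:
        if difference not in op.adds:
            continue
        applied = apply_operator(state, op, ops, depth - 1)
        if applied is None:
            continue  # subsidiary coupling: the attempt failed, try another operator
        plan, new_state = applied
        # Cumulative coupling: a new Transform goal builds on the result.
        rest = transform(new_state, goal, ops, depth - 1)
        if rest is not None:
            return plan + rest[0], rest[1]
    return None


def apply_operator(state, op, ops, depth):
    """Apply Operator goal: first transform the state so that `op` is applicable."""
    prepared = transform(state, op.needs, ops, depth)
    if prepared is None:
        return None
    plan, ready = prepared
    return plan + [op.name], (ready - op.removes) | op.adds


# Hypothetical toy domain, not taken from the text.
OPS = [
    Operator("walk_to_shop", frozenset({"at_home"}), frozenset({"at_shop"}),
             frozenset({"at_home"})),
    Operator("get_money", frozenset({"at_home"}), frozenset({"have_money"})),
    Operator("buy_bread", frozenset({"at_shop", "have_money"}),
             frozenset({"have_bread"}), frozenset({"have_money"})),
]

plan, final_state = transform(frozenset({"at_home"}), frozenset({"have_bread"}), OPS)
print(plan)
```

In this toy run the first attempt to prepare `buy_bread` (walking to the shop before fetching money) fails and is abandoned, which is the subsidiary case; the successful steps then build on one another, which is the cumulative case.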
We come to the rather paradoxical conclusion that a theory of
thinking man as an information processing system is compatible with the
points of view of both S-R and Gestalt theories. We confidently
expect that as information processing theories receive broader
confirmation they will be seized upon by both S-R and Gestalt theorists
as vindications of their positions. How can this be?
There are strong drives in any theory-building toward completeness,
toward explaining everything that needs explaining. When a theory
has difficulty achieving completeness in some direction, its exponents
are likely to indulge in "hand waving," substituting plausible rhetoric
for proof. (We have no doubt that appropriate examples of hand waving
will be found by reviewers in this volume too.) Now hand waving in
science takes two main forms:
1. If you can't explain it, name it. This is what Moliere's
physician was doing when he explained the sleep-producing powers of
opium by its possession of a dormitive quality.
2. If you can't explain the whole complex phenomenon, explain
some much simpler phenomenon and claim that the complex is simply an
extrapolation of the simple.
Gestalt psychology, in its insistence on starting with the
rich complexity of the higher mental processes, has been more often
guilty of the first kind of hand waving. It has named "insight,"
"good Gestalt," "Prägnanz," and other phenomena which, though real
enough, are not better explained for having been named.
S-R psychology, in its abhorrence of vitalism, has been more
often guilty of the second kind of hand waving. It explains, or claims
to, the learning of nonsense syllables and asserts without proof that
learning the Gettysburg Address is a simple extrapolation involving the
same mechanisms.
The programs we have constructed show that the information
processing languages and technology are powerful enough to produce
explanations of complex human behavior, not just by naming or in
principle, but by accounting for that behavior in detail in terms of
clearly specified mechanisms. The particular theories we have advanced
may be wrong, but they are precise, testable, and capable of handling
complex behavior. It is this feature of the information processing
technology that makes it such a promising addition to the set of tools
we have available for gaining an understanding of the workings of the
human mind.


Allen Newell, J. C. Shaw, and H. A. Simon*

What is meant by an "explanation" of the creative process?

In the published literature on the subject, the stages of
thought in the solution of difficult problems have been
described, and the processes that go on at each stage discussed.
Interest has focussed particularly on the more dramatic and
mysterious aspects of creativity: the unconscious processes
that are supposed to occur during "incubation," the imagery
employed in creative thinking and its significance for the
effectiveness of the thinking, and, above all, the phenomenon
of "illumination," the sudden flash of insight that reveals the
solution of a problem long pursued. Experimental work, to the
limited extent that it has been done, has been most concerned
with directional set, including the motivational and cognitive
conditions that produce set and that alter set, and
interpersonal differences in "inappropriate" persistence of set.
All of the topics we have mentioned are interesting
enough, and are appropriate parts of a theory of creative
thinking. In our own orientation to creativity, however, we
have felt the need for a clearer idea of the overall
requirements and aims of such a theory. We propose that a theory of
creative thinking should consist in:

*Carnegie Institute of Technology


1. completely operational specifications* for the
behavior of mechanisms (or organisms) that, with appropriate
initial conditions, would in fact think creatively;
2. a demonstration that mechanisms behaving as specified
(by these programs) would exhibit the phenomena that commonly
accompany creative thinking (e.g., incubation, illumination,
formation and change in set, etc.);
3. a set of statements, verbal or mathematical, about
the characteristics of the class of specifications (programs)
that includes the particular examples specified.
Stated otherwise, we would have a satisfactory theory
of creative thought if we could design and build some mechanisms
that could think creatively (exhibit behavior just like that of
a human carrying on creative activity), and if we could state
the general principles on which the mechanisms were built and
operated.
When it is put in this bald way, these aims sound utopian.
How utopian they are (or rather, how imminent their realization)
depends on how broadly or narrowly we interpret the term "creative."
If we are willing to regard all complex human problem solving as
creative, then, as we shall point out, successful programs for
problem-solving mechanisms that simulate human problem solvers
already exist, and a number of their general characteristics are
*As we shall explain later, we propose that such a set of
specifications take the form of a program, as that term is used in
the digital computer field, and we will henceforth refer to them
as "programs."

known. If we reserve the term "creative" for activities like
discovery of the special theory of relativity or composition
of Beethoven's Seventh Symphony, then no example of a creative
mechanism exists at the present time.

However, the success already achieved in synthesizing
mechanisms that solve difficult problems in the same manner as
humans is beginning to provide a theory of problem solving that
is highly specific and operational. The purpose of this paper
is to draw out some of the implications of this theory for
creative thinking. To do so is to assume that creative
thinking is simply a special kind of problem-solving behavior. This
seems to us a useful working hypothesis.
We start by discussing the relation of creative thinking
to problem solving in general, and by inquiring to what extent
existing problem-solving programs may be considered creative.
Next we sketch the theory of problem solving that underlies
these programs, and then use the theory to analyze the programs,
and to compare them with some human problem-solving behavior
exhibited in thinking-aloud protocols of subjects in the
laboratory. Finally, we consider some topics that have been
prominent in discussions of creativity to see what this analysis
of problem solving has to say about them.

Problem Solving and Creativity

In the psychological literature, "creative thinking"
designates a special class of activities, with somewhat vague
and indefinite boundaries (see, e.g., Johnson, pp. 166-16?).

Problem solving is called creative to the extent that one or
more of the following conditions are satisfied:
1. The product of the thinking has novelty and value
(either for the thinker or for his culture).
2. The thinking is unconventional, in the sense that it
requires modification or rejection of previously-accepted ideas.
3. The thinking requires high motivation and persistence,
either taking place over a considerable span of time
(continuously or intermittently), or occurring at high intensity.
4. The problem as initially posed was vague and
ill-defined, so that part of the task was to formulate the problem.
Vagueness of the Distinction
A problem-solving process can exhibit all of these
characteristics to a greater or lesser degree, but we are
unable to find any more specific criteria separating creative
from non-creative thought processes. Moreover, the data
currently available about the processes involved in creative
and non-creative thinking show no particular differences
between the two. We may cite, as examples, the data of Patrick
(11, 12) on the processes involved (for both professionals and
amateurs) in drawing a picture or writing a poem, or the data
of de Groot (1) on the thought processes of chess players.
Not only do the processes appear to be remarkably similar from
one task to another (agreeing well with Wallas' (16) account
of the stages in problem solving), but it is impossible to
distinguish, by looking solely at the statistics describing
the processes, the highly skilled practitioner from the rank
amateur.
Similarly, there is a high correlation between creativity
(at least in the sciences) and proficiency in the more routine
intellective tasks that are commonly used to measure intelligence.
There is little doubt that virtually all the persons who have
made major creative advances in science and technology in
historic times have possessed very great general problem-solving
powers (U, pp. 431-432).
Thus, creative activity appears simply to be a special
class of problem-solving activity characterized by novelty,
unconventionality, persistence, and difficulty in problem
formulation.
Simulation of Problem Solving
As we indicated earlier, the theory of problem solving
we are putting forth derives from mechanisms that solve problems
in the same manner as humans, and whose behavior can be observed,
modified, and analyzed. The only available technique for
constructing problem solvers is to write programs for digital
computers; no other physical mechanisms are complex enough.
The material in the present paper rests mostly on several
programs that we have constructed. These are:
1. The Logic Theorist. The Logic Theorist is a computer
program that is capable of discovering proofs for theorems in
elementary symbolic logic, using heuristic techniques similar
to those used by humans. Several versions of the Logic
Theorist have been coded for a computer, and a substantial
amount of experience has been accumulated with one of these
versions and some of its variants (6, 7, 8, 9).
2. The Chess Player. We have written a program that
plays chess. It is just now being checked out on the computer,
but we have done a good deal of hand simulation with the
program so that we know some of its more immediate
characteristics (10).
When we say that these programs are simulations of human
problem solving, we do not mean merely that they solve problems
that had previously been solved only by humans, although they
do that also. We mean that they solve these problems by using
techniques and processes that resemble more or less closely
the techniques and processes used by humans. The most recent
version of the Logic Theorist was designed explicitly as a
simulation of a (particular) human problem solver whose
behavior had been recorded under laboratory conditions.
Although the RAND-Carnegie group is the only one to our
knowledge that has been trying explicitly to construct programs
that simulate human higher mental processes, a number of workers
have been exploring the capabilities of computer programs to
solve complex and difficult problems. Many of these programs
provide additional information about the nature of the
problem-solving process. Some of the more relevant are:

3. Musical Composition. A computer program has been
written and run on the ILLIAC that composes music employing
Palestrina's rules of counterpoint. Some of its music has
been performed by a string quartet and tape-recorded, but as
far as we are aware, no description of the program has been
published. Other experiments in musical composition have also
been made.
4. Chess Playing. Two programs besides ours have been
written that play chess. Although both of these proceed in
a manner that is fundamentally different from the ways humans
play chess, some of their features provide illuminating
comparisons (10).
5. Design of Electric Motors. At least two, and probably
more, computer programs have been written, and are now being
used by industrial concerns, that design electric motors.
These programs take as their inputs the customers' design
specifications, and produce as their outputs the manufacturing
specifications that are sent to the factory floor. The programs
do not simply make calculations needed in the design process,
but actually carry out the analysis itself and make the
decisions that were formerly the province of the design
engineer.
The main objective of these motor design programs, of
course, is to provide effective problem-solving routines that
are economical substitutes for engineers. Hence, these programs
simulate human processes only to the extent that such processes
are believed to enhance the problem-solving capabilities and
efficiency of the programs.
6. Visual Pattern Recognition. A program has been
written that attempts to learn a two-dimensional pattern (like
an "A") from examples. The program was developed by Selfridge
and Dinneen (2, 15). Although only partly successful, it was
a pioneering attempt to use computer simulation as a technique
for investigating an area of human mental functioning.

Is the Logic Theorist Creative?

The activities carried on by these problem-solving
computer programs lie in areas not far from what is usually
regarded as "creative." Discovering proofs for mathematical
theorems, composing music, designing engineering structures,
and playing chess would ordinarily be thought creative if the
product were of high quality and original. Hence, the
relevance of these programs to the theory of creativity is
clear, even if the present programs fall short of exact
simulation of human processes and produce a fairly mundane
product.
Let us consider more specifically whether we should
regard the Logic Theorist as creative. When the Logic Theorist
is presented with a purported theorem in elementary symbolic
logic, it attempts to find a proof. In the problems we have
actually posed it, which were theorems drawn from Chapter 2
of Whitehead and Russell's Principia Mathematica (17), it has
found the proof about three times out of four. The Logic
Theorist does not pose its own problems (it must be given
these), although in the course of seeking a proof for a
theorem it will derive the theorem from other expressions and
then attempt to prove the latter. Hence, in proving one
theorem, the Theorist is capable of conjecturing other theorems
and then trying to prove these.
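The way a proof attempt generates conjectured subproblems can be sketched as a backward search. The sketch below is a deliberate simplification and not the Logic Theorist's code: expressions are treated as opaque strings, known implications are supplied as `(antecedent, consequent)` pairs, and substitution and matching are omitted. The function name `prove` and the sample data are assumptions for illustration.

```python
def prove(goal, theorems, implications, depth=4, trail=()):
    """Work backward from `goal`.

    Returns the chain of expressions [goal, subproblem, ..., known theorem],
    or None if no chain is found within `depth` steps.
    """
    if goal in theorems:
        return [goal]
    if depth == 0 or goal in trail:
        return None
    for antecedent, consequent in implications:
        if consequent == goal:
            # Conjecture: if `antecedent` can be proved, `goal` follows from it.
            sub = prove(antecedent, theorems, implications,
                        depth - 1, trail + (goal,))
            if sub is not None:
                return [goal] + sub
    return None


# Hypothetical data: "p" is already proved; each pair (a, b) stands for "a implies b".
theorems = {"p"}
implications = [("r", "s"), ("p", "q"), ("q", "s")]
chain = prove("s", theorems, implications)
print(chain)
```

Each expression after the first in the returned chain is a subproblem that was conjectured while working on the one before it; the abandoned conjecture "r" does not appear because it led nowhere.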
Now no one would deny that Whitehead and Russell were
creative when they wrote Principia Mathematica (17). Their
book is one of the most significant intellectual products of
the twentieth century. If it was creative for Whitehead and
Russell to write these volumes, it is possibly creative for
the Logic Theorist to reinvent large portions of Chapter 2,
rediscovering in many cases the very same proofs that Whitehead
and Russell discovered originally. Of course the Logic
Theorist will not receive much acclaim for its discoveries,
since these have been anticipated, but, subjectively although
not culturally, its product is novel and original. In at
least one case, moreover, the Logic Theorist has discovered a
proof for a theorem in Chapter 2 that is far shorter and more
elegant than the one published by Whitehead and Russell.1/

1/ Perhaps even this is not creative. The Journal of Symbolic
Logic has thus far declined to publish an article, co-authored
by the Logic Theorist, describing this proof. The
principal objection offered by the editor is that the same
theorem could today be proved (using certain meta-theorems
that were available neither to Whitehead and Russell nor the
Logic Theorist) in a much simpler way.

If we wish to object seriously to calling the Logic
Theorist creative, we must rest our case on the way it gets
the problems it tackles, and not on its activity in tackling
them. Perhaps the program is a mathematical hack, since it
relies on Whitehead and Russell to provide it with significant
problems, and then merely finds the answers to these; perhaps
the real creativity lies in the problem selection. This
certainly is the point of the fourth characteristic we listed
for creativity. But we have already indicated that the
Theorist has some powers of problem selection. In working
backwards from the goal of proving one theorem, it can
conjecture new theorems (or supposed theorems) and set up the
subgoal of proving these. Historically, albeit on a much
broader scale, this is exactly the process whereby Whitehead
and Russell generated the theorems that they then undertook
to prove. For the task they originally set themselves was to
take the basic postulates of arithmetic (as set forth by Peano
and his students), and to derive these as theorems from the
axioms of logic. The theorems of Chapter 2 of Principia were
generated, as nearly as we can determine the history of the
matter, in the same way that subproblems are generated by the
Logic Theorist: as subproblems whose solution would lead to
the solution of the problem originally posed.
We do not wish to exaggerate the extent to which the
Logic Theorist is capable of matching the higher flights of
the human mind. We wish only to indicate that the boundary



It may be helpful to the reader, in following the specific
examples in the text, to have a brief description of the
problem-solving task involving logic expressions that was
designed by O. K. Moore and Scarvia B. Anderson.
A logic expression is a sequence of symbols of two types:
(1) variables (P, Q, R, etc.) and (2) connectives: not (~), and
(·), or (∨), and implies (⊃). An example from the text is
R·(~P⊃Q), which may be interpreted as "R and (not-P implies
Q)." The subjects are not provided with this interpretation,
however, but are told that the expressions are code messages
and that the connectives are named "tilde" (~), "dot" (·),
"wedge" (∨), and "horseshoe" (⊃).
The following rules are provided for transforming one or
two given logic expressions into a new expression (recoding
expressions). We will state them here only approximately,
omitting certain necessary qualifications.

One-Line Rules

1. A∨B ↔ B∨A
   A·B ↔ B·A
2. A⊃B ↔ ~B⊃~A
3. A∨A ↔ A
   A·A ↔ A
4. A∨(B∨C) ↔ (A∨B)∨C
   A·(B·C) ↔ (A·B)·C
5. A∨B ↔ ~(~A·~B)
   A·B ↔ ~(~A∨~B)
6. A⊃B ↔ ~A∨B
7. A∨(B·C) ↔ (A∨B)·(A∨C)
   A·(B∨C) ↔ (A·B)∨(A·C)
8. A·B → A
   A·B → B
9. A → A∨X, where X is any expression


The rules can be applied to complete expressions, or (except
rule 8) to subexpressions. Double tildes cancel, i.e., ~~A ↔ A,
but this cancellation is not stated in a separate rule.
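A small sketch may help make the recoding task concrete. It applies rule 6 (A⊃B to ~A∨B, read left to right) to a subexpression of the example R·(~P⊃Q) and then cancels the double tilde. The nested-tuple encoding and the function names are assumptions made for illustration; they are not part of the original task materials.

```python
# Expressions: a variable is a string; compound expressions are tuples
# ("not", A), ("and", A, B), ("or", A, B), ("implies", A, B).

def rule6(e):
    """Rule 6, left to right: A implies B  ->  (not A) or B. None if no match."""
    if isinstance(e, tuple) and e[0] == "implies":
        return ("or", ("not", e[1]), e[2])
    return None


def rewrite_first(rule, e):
    """Apply `rule` to the whole expression or, failing that, to the first
    matching subexpression (left to right). Returns None if nothing matches."""
    out = rule(e)
    if out is not None:
        return out
    if isinstance(e, str):
        return None
    for i in range(1, len(e)):
        sub = rewrite_first(rule, e[i])
        if sub is not None:
            return e[:i] + (sub,) + e[i + 1:]
    return None


def cancel_tildes(e):
    """Remove every pair of adjacent tildes: not not A  ->  A."""
    if isinstance(e, str):
        return e
    if e[0] == "not" and isinstance(e[1], tuple) and e[1][0] == "not":
        return cancel_tildes(e[1][1])
    return (e[0],) + tuple(cancel_tildes(x) for x in e[1:])


# R and (not-P implies Q)
expr = ("and", "R", ("implies", ("not", "P"), "Q"))
recoded = cancel_tildes(rewrite_first(rule6, expr))
print(recoded)
```

The result encodes R·(P∨Q). Rule 8 would need a separate check, since it may be applied only to a complete expression.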

Two-Line Rules

10. If A and B are given, they can be recoded into A·B.
11. If A and A⊃B are given, they can be recoded into B.
12. If A⊃B and B⊃C are given, they can be recoded into A⊃C.
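The two-line rules take a pair of given expressions and produce one new expression. A minimal sketch, assuming a nested-tuple encoding in which a variable is a string and A⊃B is written `("implies", A, B)`; the function names are hypothetical.

```python
def is_implication(e):
    return isinstance(e, tuple) and e[0] == "implies"


def rule10(a, b):
    """From A and B, recode into A and B (conjunction)."""
    return ("and", a, b)


def rule11(a, implication):
    """From A and (A implies B), recode into B. None if the rule does not apply."""
    if is_implication(implication) and implication[1] == a:
        return implication[2]
    return None


def rule12(first, second):
    """From (A implies B) and (B implies C), recode into (A implies C)."""
    if is_implication(first) and is_implication(second) and first[2] == second[1]:
        return ("implies", first[1], second[2])
    return None


p_to_q = ("implies", "P", "Q")
q_to_r = ("implies", "Q", "R")
p_to_r = rule12(p_to_q, q_to_r)
print(p_to_r, rule11("P", p_to_r), rule10("P", "Q"))
```

Returning `None` when a rule does not apply mirrors the fact that a subject may only write a new line when the given lines actually fit the rule.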
Subjects were instructed in the use of these rules, then
were given problems like those described in the text. They
were asked to think aloud while working on the problems, and
each time they applied a rule to recode one or two given
expressions, the new expression was written on the blackboard
by the experimenter, together with the numbers of the expressions
and rule used to obtain it.
By inspection of the rules it can be seen that in the
planning space, where connectives and the order of the symbols are
disregarded, rules 1, 2, 5 and 6 would leave expressions
unchanged. These are the inessential rules; the others, in
altered form, become the essential rules. Rule 8, for example,
becomes simply:


References

1. De Groot, A. D., Het Denken van den Schaker, Amsterdam,
Noord-Hollandsche Uitgevers Maatschappij, 1946.
2. Dinneen, G. P., "Programming Pattern Recognition,"
Proceedings of the 1955 Western Joint Computer
Conference, Institute of Radio Engineers, 1955,
pp. 94-100.
3. Duncker, K., "On Problem-Solving," Psychol. Monogr.,
1945, 58, No. 270.
4. Johnson, D. M., The Psychology of Thought and Judgment,
Harper, New York, 1955.
5. Moore, O. K. and Scarvia Anderson, "Search Behavior
and Problem Solving," Amer. Sociol. Rev., 1954,
pp. 702-714.
6. Newell, A. and H. A. Simon, "The Logic Theory Machine,"
IRE Transactions on Information Theory, Vol. IT-2,
No. 3, September, 1956.
7. Newell, A., J. C. Shaw and H. A. Simon, "Empirical
Explorations of the Logic Theory Machine: a Case Study
in Heuristics," Proceedings of the Western Joint
Computer Conference, Institute of Radio Engineers,
February, 1957.
8. Newell, A. and J. C. Shaw, "Programming the Logic Theory
Machine," Proceedings of the Western Joint Computer
Conference, February, 1957.
9. Newell, A., J. C. Shaw and H. A. Simon, "Elements of
a Theory of Human Problem Solving," Psychol. Rev.,
1958, 65.
10. Newell, A., J. C. Shaw and H. A. Simon, "Chess Playing
Programs and the Problem of Complexity," J. Res. and
Development, IBM Corporation.
11. Patrick, Catherine, "Creative Thought in Poets," Arch.
Psychol., 1935, 178.
12. Patrick, Catherine, "Creative Thought in Artists,"
J. Psychol., 1937, 4, pp. 35-73.
13. Polya, G., Mathematics and Plausible Reasoning, Princeton
University Press, Princeton, 1954.
14. Polya, G., How to Solve It, Doubleday, New York, 1957.
15. Selfridge, O. G., "Pattern Recognition and Modern Computers,"
Proceedings of the 1955 Western Joint Computer Conference,
Institute of Radio Engineers, 1955, pp. 91-93.
16. Wallas, G., The Art of Thought, Harcourt, New York, 1926.
17. Whitehead, A. N. and Russell, B., Principia Mathematica,
University Press, Cambridge, 1925-1927.