Contents

6. Procedural Learning
7. Language Acquisition  251
Notes  307
References  315
Index of Authors  335
General Index  340
Preface
In the mid 1960s, when I was an undergraduate, three of the active areas of research in psychology were learning theory, psycholinguistics, and cognitive psychology. At that time there was no coherent connection among the three, but the question of how language could be acquired and integrated with the rest of cognition seemed interesting. However, there was no obvious way to tackle the question because the field just did not have the relevant concepts. There were the options of pursuing a graduate career in learning theory, cognitive psychology, or psycholinguistics. I chose cognitive psychology and I believe I chose wisely (or luckily).

When I went to Stanford in 1968, Gordon Bower assigned me a hot issue of the time: to understand the categorical structure in free recall. As we analyzed free recall it became clear that we needed a complete model of the structure of human memory, with a particular focus on meaningful structure. Bower and I worked on this for a number of years, first developing the model FRAN (Anderson, 1972), then HAM (Anderson and Bower, 1973). Through this work we came to appreciate the essential role of computer simulation in developing complex models of cognition. HAM, the major product of that effort, was a complete model of the structures and processes of human memory, having as its central construct a propositional network representation. It successfully addressed a great deal of the memory literature, including a fair number of experiments on sentence memory that we performed explicitly to test it.

Critical to predicting a particular experimental phenomenon was deciding how to represent the experimental material in our system. Usually our intuitions about representation did lead to correct predictions, but that was not a very satisfactory basis for
[...] Parser.
The first step to developing a more complete theory was identifying an appropriate formalism to model procedural knowledge. My first effort was to use the augmented transition networks (ATNs), which had seemed so appropriate for natural-language processing. A theory of language acquisition was developed in that framework. However, ATNs proved to be at once too restrictive as computational formalisms and too powerful as models of human cognition. These difficulties led me to consider the production-system models Allen Newell had been promoting, which had some similarities to ATNs. I have to admit that at first, when I focused on surface features, production systems seemed unattractive. But as I dug deeper, I became more and more convinced that they contained key insights into the nature of human procedural knowledge.

The ACT system, the product of my three years at Michigan, 1973-1976, and described in Anderson (1976), was a synthesis of the HAM memory system and production-system architecture. In this system a production system was used for the first time as an interpreter of a propositional network. ACT also borrowed the spreading activation concept from other researchers, such as Collins and Quillian. Its production system operated on a working memory defined by the active portion of its long-term memory network. In a computer simulation version of ACT, called ACTE, it was possible to "program" production sets that modeled various tasks. Much of my 1976 book describes such production sets that modeled various small tasks. However, although ACTE answered the question of where propositional representations came from (they came from the actions of productions), the natural question now was, where do productions come from? This led to the issue of a learning theory for production systems. Paul Kline, Charles Beasley, and
how the system sets itself to do a particular task in a particular way is left to intuition. In a production system the choice of what to do next is made in the choice of what production to execute next. Central to this choice are the conflict resolution strategies (see Chapter 4). Thus production systems have finally succeeded in banishing the homunculus from psychology.
In writing this book I have been helped by a great many people who provided comments on drafts. Gary Bradshaw, Bill Chase, Renee Elio, Ira Fischler, Susan Fiske, Jim Greeno, Keith Holyoak, Bill Jones, Paul Kline, Steve Kosslyn, Matt Lewis, Brian MacWhinney, Michael McCloskey, Allen Newell, Jane Perlmutter, Rolf Pfeifer, Steven Pinker, Peter Pirolli, Zenon Pylyshyn, Roger Ratcliff, Lance Rips, Paul Rosenbloom, Miriam Schustack, and Jeff Shrager have commented on one or more of these chapters. Robert Frederking, Jay McClelland, Ed Smith, and two anonymous reviewers have read the manuscript in its entirety. Eric Wanner has been an exceptional editor, reading and commenting on the entire book. Lynne Reder has gone through this book with me, word by word and idea by idea, exposing weaknesses and providing a number of the ideas developed here. The ACT research group, as well as my "Thinking" classes, have gone over these ideas many times with me. Many people have worked hard to get the programs and experiments running. My graduate students, Gary Bradshaw, Renee Elio, Bill Jones, Matt Lewis, Peter Pirolli, and Miriam Schustack, have been valuable collaborators. Takashi Iwasawa and Gordon Pamm wrote countless experiment-running programs and data analyses; Barbara Riehle and Rane Winslow greatly assisted in testing subjects; and Takashi Iwasawa and Frank Boyle did much work on the geometry project. Two outstanding undergraduates, Rob Farrell and Ron Sauers, have been responsible for the GRAPES simulation of the acquisition of LISP programming skills. Among her many other duties, Rane Winslow had primary responsibility for preparing this manuscript, and I am deeply grateful to her for all of her resourcefulness and hard work. Monica Wallace was responsible for the last months of shepherding the book into publication.

Finally, I want to thank those government agencies who have supported my research. Research grants from NIE and NIMH were critical in the development of ACTE and ACTF, and a current grant from the Information Sciences branch of NSF (IST-80-15357) has supported the geometry
Production Systems and ACT

[...] learning theory, or of some yet-to-be-conceived general-purpose learning strategy -- that are common to all domains. (Chomsky, 1980, p. 3)

This faculty approach holds that distinct cognitive principles underlie the operation of distinct cognitive functions. The unitary approach holds that all higher-level cognitive functions can be explained by one set of principles. In some ways the faculty approach seems just plain common sense. Many cognitive theories, extending back at least to the phrenology of Gall (see Boring, 1950, for a discussion of this and the more "reputable" faculty theories) have held this view. Its truth might almost seem a tautology: clearly we perform different intellectual functions, so it might seem that we must have different faculties for these functions. The faculty proposals that have been advanced have always gotten into difficulties in their specifics, but it never has been clear whether there is anything fundamentally wrong with the faculty approach.

The early proposals for unitary systems (for example, stimulus-response, or S-R, theories) were also shown to be basically inadequate. However, the unitary theory found an important metaphor in the modern general-purpose computer and, perhaps more significantly, in symbolic programming languages, which showed how a single set of principles could span a broad range of computational tasks. It also became clear that the set of computational functions was unlimited, meaning that general processing principles were essential to span broad ranges of tasks. It made no sense to create a special system for each conceivable function.

A number of candidates for a general system have been offered, including general problem solvers (Fikes and Nilsson, 1971; Newell and Simon, 1972; Sacerdoti, 1977), general inference systems (Green and Raphael, 1969; McDermott and Doyle, 1980; Robinson, 1965), and general schema systems (Bobrow and Winograd, 1977; Minsky, 1975; Rumelhart and Ortony, 1976; Schank and Abelson, 1977). My research has been predicated on the hypothesis that production systems provide the right kind of general computational architecture for achieving a unitary mental system. The particular line of production system theories I have developed all go under the name ACT. This book will describe a special ACT instantiation called ACT* (to be read ACT-star). As will become clear, ACT* is not just a random member of the ACT series. It is the product I have been working toward for the past seven years. In ACT* the same core system, if given one set of experiences, develops a linguistic facility; if given another set of experiences, develops a geometry facility; and if given another set of experiences, develops a programming facility. Therefore ACT* is very much a unitary theory of mind.
lack of the right knowledge, not because of a fundamental incompatibility of the knowledge categories. What is remarkable is the ease with which novices switch among categories and the sheer impossibility of identifying where one faculty might begin and another end. Compartmentalization is similarly impossible in the case of language use (see Schank and Birnbaum, in press).
In summary then, there are three lines of evidence for the unitary approach. One is the short evolutionary history of many of the higher human intellectual functions, such as those concerned with mathematical problem solving. The second is that humans display great plasticity in acquiring functions for which there was no possibility of evolutionary anticipation. The third is that the various cognitive activities have many features in common.

I would like to head off two possible misinterpretations of my position. First, the unitary position is not incompatible with the fact that there are distinct systems for vision, audition, walking, and so on. My claim is only that higher-level cognition involves a unitary system. Of course, the exact boundaries of higher-level cognition are a little uncertain, but its contents are not trivial; language, mathematics, reasoning, memory, and problem solving should certainly be included. Second, the unitary position should not be confused with the belief that the human mind is simple and can be explained by just one or two principles. An appropriate analogy would be to a programming language like INTERLISP (Teitelman, 1976), which is far from simple and which supports a great variety of data structures and functions. However, it is general-purpose; that is, one can use the same data structures and processes in programs for language and for problem solving. Individual programs can be created that do language and problem solving as special cases. In analogy to INTERLISP, I claim that a single set of principles underlies all of cognition and that there are no principled differences or separations of faculties. It is in this sense that the theory is unitary.
Production Systems:History and Status
Production systems began with Newell's work at Carnegie-Mellon in the early sixties and Waterman's later dissertation work (1970) at Stanford. Right from the start production systems had an ambiguous status, being in part programming languages for computer science and in part psychological theories. A series of publications in the early seventies (Hunt and Poltrock, 1974; Newell, 1972, 1973; Newell and Simon, 1972) brought production systems to the awareness of psychologists. The concept of a production system is vague, and it is hard to determine where its boundaries end and where other computer science formalisms or psychological theories begin. In computer science the more general category of pattern-directed systems (see Waterman and Hayes-Roth, 1978) includes such deduction systems as MYCIN (Shortliffe, 1976) and other types of systems such as schema architectures and linguistic rewrite systems. As in production systems, control is in response to the appearance of data, but behavior does not necessarily involve adding elements to working memory. For instance, a major category of action in MYCIN was to update the probabilities of various medical diagnoses.

Production system theories are similar to Post's formalism; condition-action pairs can be viewed as derived from the rewrite rules of Post systems, but most production systems bear little resemblance to the proposals of Post (1943). [...] In many ways production system theories are like stimulus-response theories, but with differences that remove the computational inadequacies of S-R theories. Most notably, the production is very much like the stimulus-response bond (Anderson, 1976; Newell and Simon, 1972). One might conceive of production systems as "cognitive" S-R theories, despite the contradiction in the connotations of these terms.

An Example of a Production System

An example of a production system that performs a specific task will be useful as a concrete referent for interpreting more abstract points that will be made later.

TRACING THE BEHAVIOR

An example set of productions for performing addition is given in Table 1.1. This example assumes that the subject has memorized the addition table. The conditions of these productions are given by the IF part and the actions by the THEN part. Figure 1.1 illustrates the flow of control among the productions in that set. Application of these productions is controlled by the setting of goals. Figure 1.1 basically shows which productions respond to which goals and which productions set which goals. It is easiest to understand such a production system by tracing its application to an example problem such as the following addition problem:

    614
     48
    686

Production P1, the first to apply, would set as a subgoal to iterate through the columns. Production P2 then changes the sub-
Table 1.1 A production system for performing addition

P1  IF the goal is to do an addition problem
    THEN the subgoal is to iterate through the columns of the problem.

P2  IF the goal is to iterate through the columns of an addition problem
       and the rightmost column has not been processed
    THEN the subgoal is to iterate through the rows of that rightmost column
       and set the running total to 0.

P3  IF the goal is to iterate through the columns of an addition problem
       and a column has just been processed
       and another column is to the left of this column
    THEN the subgoal is to iterate through the rows of this column to the left
       and set the running total to the carry.

P4  IF the goal is to iterate through the columns of an addition problem
       and the last column has been processed
       and there is a carry
    THEN write out the carry
       and POP the goal.

P5  IF the goal is to iterate through the columns of an addition problem
       and the last column has been processed
       and there is no carry
    THEN POP the goal.

P6  IF the goal is to iterate through the rows of a column
       and the top row has not been processed
    THEN the subgoal is to add the digit of the top row to the running total.

P7  IF the goal is to iterate through the rows of a column
       and a row has just been processed
       and another row is below it
    THEN the subgoal is to add the digit of the lower row to the running total.

P8  IF the goal is to iterate through the rows of a column
       and the last row has been processed
       and the running total is a digit
    THEN write the digit
       and delete the carry
       and mark the column as processed
       and POP the goal.

P9  IF the goal is to iterate through the rows of a column
       and the last row has been processed
       and the running total is of the form "string digit"
    THEN write the digit
       and set the carry to the string
       and mark the column as processed
       and POP the goal.

P10 IF the goal is to add a digit to another digit
       and a sum is the sum of the two digits
    THEN the result is the sum
       and mark the digit as processed
       and POP the goal.

P11 IF the goal is to add a digit to a number
       and the number is of the form "string digit"
       and a sum is the sum of the two digits
       and the sum is less than 10
    THEN the result is "string sum"
       and mark the digit as processed
       and POP the goal.

P12 [...]
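The goal-driven control structure of Table 1.1 can be sketched in code. The following is a minimal Python sketch, not Anderson's implementation: the function names and the working-memory layout are invented, and productions P1-P11 are collapsed into three goal handlers (column iteration, row iteration, and the addition-table lookup).

```python
# Minimal sketch of the Table 1.1 control structure (invented code, not
# from the book): a goal stack drives productions that iterate over the
# columns and rows of an addition problem, keeping a running total and a
# carry in working memory.

def run_addition(rows):
    """Add numbers given as equal-width digit strings, Table-1.1 style."""
    wm = {"rows": rows, "col": len(rows[0]) - 1, "row": 0,
          "total": 0, "carry": 0, "out": []}
    goals = ["iterate-columns"]              # P1 sets this top subgoal

    def iterate_columns():
        if wm["col"] < 0:                    # P4/P5: all columns processed
            if wm["carry"]:
                wm["out"].append(str(wm["carry"]))
            goals.pop()                      # POP the goal
        else:                                # P2/P3: start the next column
            wm["total"] = wm["carry"]        # 0 for the rightmost column
            wm["row"] = 0
            goals.append("iterate-rows")

    def iterate_rows():
        if wm["row"] < len(wm["rows"]):      # P6/P7: another row remains
            goals.append("add-digit")
        else:                                # P8/P9: write digit, set carry
            wm["out"].append(str(wm["total"] % 10))
            wm["carry"] = wm["total"] // 10
            wm["col"] -= 1
            goals.pop()

    def add_digit():                         # P10/P11: addition-table fact
        wm["total"] += int(wm["rows"][wm["row"]][wm["col"]])
        wm["row"] += 1
        goals.pop()

    productions = {"iterate-columns": iterate_columns,
                   "iterate-rows": iterate_rows,
                   "add-digit": add_digit}
    while goals:
        productions[goals[-1]]()             # fire the matching production
    return "".join(reversed(wm["out"]))

print(run_addition(["123", "456", "789"]))   # prints 1368
```

Popping a goal returns attention to the goal above it on the stack, which is exactly the control discipline the chapter describes for P1-P11.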
goal to adding the digits of the rightmost column and sets the running total to 0. Then production P6 sets the new subgoal to adding the top digit of the row (4) to the running total. In terms of Figure 1.1, this sequence of three productions has moved the system down from the top goal of doing the problem to the bottom goal of performing a basic addition operation. The system has four goals stacked, with attention focused on the bottom goal.

At this point production P10 applies, which calculates 4 as the new value of the running total. In doing this it retrieves from the addition table the fact that 4 + 0 = 4. Production P10 also pops the goal of adding the digit to the running total. Popping a goal means shifting attention from the current goal to the one above it in the hierarchy. In this situation, attention will return to iterating through the rows of the column. Then P7 ap-
[Figure 1.1: the goal tree for the addition problem. Recoverable labels include PROBLEM, ITERATE THROUGH COLUMNS, ITERATE THROUGH ROWS OF COLUMN 3, and the production labels linking goals.]

[...]ory:

The goal is to iterate through the rows of column 2
Row 3 is the last row of column 2
Row 3 has been processed
Running total is of the form "2 4"
Working memory in the neoclassical system consists of a number of slots, each of which holds one clause or element. An element is a list or relational structure built out of relations and arguments; (8 + 3 = 11) could be an element. Working memory, which can hold a limited number of elements, orders them according to recency. As new elements enter, old ones are pushed out. This conception shows the strong influence of the buffer model of Atkinson and Shiffrin (1968). The original limit on working memory was taken to be on the order of seven elements, reflecting the results of memory span tests. However, it often proves difficult to simulate behavior given this limit, and informal proposals have been made to increase it to twenty or more elements. In nonpsychological engineering applications of the OPS system (McDermott, 1981), the limit on working memory is simply eliminated.

Another feature of the neoclassical architecture is that the only long-term memory is production memory. There is not a separate declarative memory to encode facts like "George Washington was the first president of the United States" or the addition facts used in the simulation in Table 1.1. This is a major point where ACT differs from the neoclassical system. Newell would achieve the effect of a separate declarative memory by having one or more productions fire to deposit declarative information into working memory. For instance, a production responsible for retrieving this information might look like:

IF George Washington is mentioned
THEN note that he was the first president of the United States.
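The slot-limited, recency-ordered working memory described above can be sketched as a bounded buffer. This is a hypothetical illustration, not code from any neoclassical system; the class name and default capacity are assumptions.

```python
from collections import deque

# Sketch of a neoclassical working memory: a fixed number of slots ordered
# by recency. Adding a new element pushes the oldest one out; re-adding an
# element refreshes its recency.

class WorkingMemory:
    def __init__(self, capacity=7):            # the classic "about seven"
        self.slots = deque(maxlen=capacity)    # a full deque drops the oldest

    def add(self, element):
        if element in self.slots:
            self.slots.remove(element)         # refresh recency
        self.slots.appendleft(element)         # most recent element first

    def contents(self):
        return list(self.slots)                # newest to oldest

wm = WorkingMemory(capacity=3)
for e in ["(8 + 3 = 11)", "goal-A", "goal-B", "goal-C"]:
    wm.add(e)
print(wm.contents())   # the arithmetic element has been pushed out
```

Raising `capacity` to twenty or removing `maxlen` altogether mirrors the informal proposals and the OPS engineering practice mentioned in the text.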
PARALLELISM, SERIALITY, AND CONFLICT RESOLUTION
In the earlier neoclassical architecture (but not in Newell, 1980), multiple productions may have their conditions matched, but only a single production can apply at a time. A set of conflict resolution principles determines which production will apply. The original conflict resolution principle in PSG involved a simple ordering of productions, with the highest-ordered production applying. The ordering was simply specified by the person who wrote the productions, with the goal of making them function correctly. The conflict resolution principles in the OPS system, which seem more plausible, involve a combination of refractoriness, recency, specificity, and production ordering. Refractoriness prevents a production from repeating if it matches the same data structure, thus preventing most accidental looping. Recency, probably the most powerful principle, selects the production that matches the data element most recently added to working memory. If two or more productions match the same most recent element, then a test is performed to see which has the second most recent element, and so on. Should a tie remain after all the elements have been matched, the specificity principle is applied. Essentially this principle says that the production with more condition elements is preferred. If there still are competing productions (and this is very seldom the case), then the conflict is resolved by an ordering of the productions.
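These four principles can be sketched as a selection function. The data layout below is invented for illustration (each candidate instantiation carries the working-memory timestamps it matched, newest first); the real OPS algorithm is more elaborate.

```python
# Sketch of OPS-style conflict resolution (illustrative, not the real OPS
# code): refractoriness, then recency, then specificity, then the fixed
# production ordering.

def resolve(candidates, fired_before, ordering):
    """Pick one instantiation; `matched` lists timestamps, newest first."""
    # Refractoriness: never refire a production on the same data.
    pool = [c for c in candidates
            if (c["id"], tuple(c["matched"])) not in fired_before]
    if not pool:
        return None
    # Recency: lexicographic comparison of newest-first timestamp lists;
    # specificity: more condition elements wins; ordering: lower index wins.
    pool.sort(key=lambda c: (c["matched"],
                             c["specificity"],
                             -ordering.index(c["id"])),
              reverse=True)
    return pool[0]

candidates = [
    {"id": "P1", "specificity": 2, "matched": [9, 4]},
    {"id": "P2", "specificity": 3, "matched": [9, 4]},
    {"id": "P3", "specificity": 1, "matched": [8, 7]},
]
winner = resolve(candidates, fired_before=set(), ordering=["P1", "P2", "P3"])
print(winner["id"])   # P2: ties P1 on recency, wins on specificity
```

Because the sort key compares the timestamp lists first, a candidate loses as soon as any of its matched elements is older than a rival's, which reproduces the element-by-element recency test described above.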
Because only a single production can apply during any cycle of the system, it is difficult to model the parallelism in human cognition. We can be simultaneously perceiving objects, driving a car, generating a sentence, and processing a conversation. Underlying a process like language comprehension are a large number of parallel, interacting processes such as perception, syntactic parsing, semantic interpretation, and inference (Anderson, Kline, and Lewis, 1977). In addition to the behavioral evidence for parallelism, it is clear that our neural hardware supports it. It is argued (J. A. Anderson, 1973) that the brain is not at all a serial computer because individual operations take relatively long times (10 msec or more) to perform. However,
the brain achieves computational power by doing many operations in parallel. The neoclassical theory depends heavily on parallel processing in its pattern matching, but parallelism must go beyond pattern matching.

Newell (1980) proposed a variant of production system architecture called HPSA77, which involved a major departure from the previous serial conception. He proposed it as an analysis of how the HARPY speech recognition system (Lowerre, 1976) could be achieved in a production system framework. HPSA77 is probably the best current instantiation of Newell's beliefs about production system architecture. He distinguished between productions that did not involve variables and those that did. He proposed that productions without variables could apply in parallel without restriction and that productions with variables had a limited serial restriction. His basic claim was that there was a variable-using mechanism that could handle the variables in only one instantiation of a production at a time. All limitations on parallelism came from this limitation; if the production involved no variables, there was no limitation; if it did involve variables, the production would have to "wait its turn" to use the variable mechanism.
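The restriction can be sketched as a scheduling rule for a single recognize-act cycle. The data layout below is invented; HPSA77's actual mechanism is more elaborate.

```python
# Sketch of Newell's HPSA77 restriction (illustrative): variable-free
# productions fire in parallel; variable-using instantiations queue for
# the single variable-binding mechanism, one per cycle.

def cycle(matched):
    """Return the productions fired on this cycle."""
    no_vars = [p for p in matched if not p["has_vars"]]
    with_vars = [p for p in matched if p["has_vars"]]
    fired = list(no_vars)                  # all apply in parallel
    if with_vars:
        fired.append(with_vars[0])         # the rest "wait their turn"
    return fired

matched = [{"id": "A", "has_vars": False},
           {"id": "B", "has_vars": True},
           {"id": "C", "has_vars": True},
           {"id": "D", "has_vars": False}]
print([p["id"] for p in cycle(matched)])   # ['A', 'D', 'B']
```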
LEARNING
Although many of its design decisions were motivated by learning considerations, the neoclassical system has not had a strong position on how learning proceeds. What has been implemented basically follows the ideas of Waterman (1974, 1975), who proposed that productions could deposit in working memory specifications as to the condition and action of a new production. Then a special BUILD operator would be called to create the production according to these specifications. The following production might cause new productions to be built to encode arithmetic facts:

IF the addition table reads "LVA + LVB = LVC"
THEN tag [the goal is to add LVA and LVB] as condition
   and tag [the answer is LVC] as action
   and BUILD.

If this production read "3 + 2 = 5" from the addition table, it would deposit in working memory the following clauses:

(CONDITION [the goal is to add 3 and 2])
(ACTION [the answer is 5]).
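Waterman's scheme can be sketched in a few lines. The representation below (tagged tuples in working memory and a dictionary per production) is hypothetical, not Waterman's actual data structures.

```python
# Sketch of a Waterman-style BUILD operator (hypothetical representation):
# a rule deposits CONDITION and ACTION specifications in working memory,
# and BUILD assembles them into a new production.

def build(working_memory, production_memory):
    condition = next(spec for tag, spec in working_memory
                     if tag == "CONDITION")
    action = next(spec for tag, spec in working_memory if tag == "ACTION")
    production_memory.append({"IF": condition, "THEN": action})

wm = [("CONDITION", "the goal is to add 3 and 2"),
      ("ACTION", "the answer is 5")]
productions = []
build(wm, productions)
print(productions[0]["IF"])    # the goal is to add 3 and 2
```

Once built, the new production encodes the arithmetic fact directly, so the addition table no longer needs to be consulted for that problem.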
ACTF differed from ACTE in some details of the performance theory, and principally in that it had a theory of production acquisition. Both systems were developed as simulation programs that embodied all the assumptions of the then-current theory. These simulations had the character of programming languages in which various ACT models could be implemented. The experience with ACTE and ACTF over the past several years has provided a basis for major reformulations of many aspects of the original theory to improve its psychological accuracy. Each reformulated aspect has been implemented and tested as a simulation program. Simulating the various aspects has become quite expensive (in computer cycles) as we have moved to assumptions that are less efficient in the computer simulation, although not necessarily in the operations of the human brain. Because of the computational cost, we have not created a general simulation that embodies all these assumptions at once.

One might think that by logical progression this system should be called ACTG. However, we call it ACT* to reflect the belief that it is the final major reformulation within the ACT framework. In previous ACT theories the performance subtheories were not fully integrated with the learning subtheories, but this gap is now closed. The assumptions of ACT* have also been revised to remedy the known difficulties of the earlier theories. The only part I feel is tentative concerns certain assumptions about the pattern matcher. These assumptions will be flagged when the pattern matcher is discussed in Chapter 4.

The previous paragraph implies that I expect ACT* to be wrong. This expectation is based not on any weaknesses in the theory, but on the nature of progress in science, which comes from formulating theories that account for a wide range of known phenomena and then finding out what is wrong with these theories. I regard a progression like the one from HAM (Anderson and Bower, 1973) through ACTE (Anderson, 1976) to ACT* as an example of reasonable scientific progress. In my 1976 book on ACTE, I compared it to the HAM theory: "In completing the 1973 book we had the feeling that we had more or less defined the HAM theory once and for all and that the major task ahead of us was to test its empirical consequences." I feel less certain that ACT has achieved its final shape as a theory. There remains much exploration to be done as to the potential of the theory and variations on it, largely because ACT is a theory of much broader generality than HAM and consequently has more potential to explore. In this book I have
[Figure 1.2: diagram. Recoverable label: OUTSIDE WORLD.]
Figure 1.2 A general framework for the ACT production system, identifying the major structural components and their interlinking processes.
1. The representational properties of the knowledge structures that reside in working memory and their functional consequences.
2. The nature of the storage process.
3. The nature of the retrieval process.
4. The nature of production application, which breaks down to:
   a. The mechanism of pattern matching.
   b. The process that deposits the results of production actions in working memory.
   c. The learning mechanisms by which production application affects production memory.

ACT* differs from ACTE in its instantiation of each of these points except 2 and 4b, where the two theories basically agree.
THE ASSUMPTIONS OF ACT*

Table 1.2 lists the fourteen basic assumptions of ACT* (what is listed as assumption 7 is not an additional assumption), and
Table 1.2 Assumptions of ACT*

0. Technical time assumption. Time is continuous.
1. Basic architectural assumption. There is a production system component that operates on the declarative knowledge representation.
2. Declarative representation. Declarative knowledge can be decomposed into a tangled hierarchy of cognitive units. Each cognitive unit consists of a set of no more than five elements in a specified relation.
3. Activation of declarative memory. At any time t, any cognitive unit or element i has a nonnegative level of activation a_i(t) associated with it.
4. Strength in declarative memory. Each memory node i (cognitive unit or element) has a strength s_i. The relative strength r_ij of a link between node i and node j is defined as r_ij = s_j / (sum of s_k), where the summation is over the nodes connected to i.
5. Spread of activation. The change in activation at a node i is described by the following differential equation:

   da_i(t)/dt = B*n_i(t) - p*a_i(t)

   where n_i(t) is the input to the node at time t and is defined as

   n_i(t) = c_i(t) + (sum over j of r_ji * a_j(t))

6.-12. [...]
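Assumption 5 can be explored numerically. The sketch below integrates the activation equation with simple Euler steps on a two-node network; the network, the parameter values, and all names are illustrative assumptions, not part of ACT*.

```python
# Euler-step sketch of Assumption 5 (illustrative parameters):
#   da_i/dt = B*n_i(t) - p*a_i(t),  n_i(t) = c_i(t) + sum_j r_ji * a_j(t)

def step(a, c, r, B=1.0, p=0.8, dt=0.01):
    """Advance all node activations by one Euler step of size dt."""
    n = [c[i] + sum(r[j][i] * a[j] for j in range(len(a)))
         for i in range(len(a))]
    return [a[i] + dt * (B * n[i] - p * a[i]) for i in range(len(a))]

# Two connected nodes; node 0 receives source activation c_0 = 1.
r = [[0.0, 0.3],   # r[j][i]: relative strength of the link from j to i
     [0.3, 0.0]]
a, c = [0.0, 0.0], [1.0, 0.0]
for _ in range(500):
    a = step(a, c, r)
print(a[0] > a[1] > 0.0)   # True: activation spreads to node 1 but less
```

With these values the activations settle toward a fixed point where B*n_i = p*a_i, so a node's asymptotic activation is proportional to its input, which is what makes continuous activation usable as a graded relevancy measure.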
a satisfactory match to a set of declarative structures, the production is selected to apply. The pattern matcher is represented as a data-flow network of pattern tests. The rate at which these tests are performed is a function of the level of activation of the pattern node that performs the tests. The level of activation of that node is a positive function of the strength of the node, the level of activation of the data structures being matched, and the degree of match to these structures. It is a negative function of the level of activation of competing patterns matching to the same data.

Goal-directed processing. Productions can specify a goal in their condition. If their goal specification matches the current goal, these productions are given special precedence over productions that do not.
difference in the time taken to store declarative versus procedural information. A fact may be committed to memory after a few seconds of study. In contrast, it appears that new procedures can be created only after much practice (see Chapter 6). It is difficult to explain this huge discrepancy if both types of knowledge are encoded in the same way.
Declarative representation. Knowledge comes in chunks, or cognitive units as they are called in ACT*. Cognitive units can be such things as propositions (for example, (hate, Bill, Fred)), strings (one, two, three), or spatial images (a triangle above a square). In each case a cognitive unit encodes a set of elements in a particular relationship. Chunks contain no more than five elements. More complex structures can be created by hierarchical structures, such as a sentence's phrase structure or a proposition embedded in another. In such cases one cognitive unit appears as an element of another. These are familiar ideas in cognitive psychology, and stated at this level, there is much evidence for cognitive units (Anderson and Bower, 1973; Bad-
Table 1.3 Assumptions of ACTE

0. Technical time assumption. Time is discrete.
1. Basic architectural assumption. There is a production system com- [...]
[Figure: restaurant-script schema. Recoverable labels: Customer, Hungry, Has Money, Walks into Restaurant, Decide, Go To Table, Check Reservation, Shown to Table, (ENTER, ORDER, EAT).]
more active, contextually primed sense is selected. We were unable to model such selection in the all-or-none activation system of ACTE (Anderson, Kline, and Lewis, 1977) because both senses would be active, and there would be no basis for selection.

Activation serves the function in ACT of an associative relevancy heuristic. That is, activation measures how closely associated a piece of information is to information currently used, and it is a heuristic assumption that associated information is most likely to be needed to guide processing. To simply have two levels of activation (on and off) severely limits the value of activation as a relevancy heuristic. The value is maximized when activation can take on a continuous range of levels. There are neurophysiological arguments for a continuously varying level of activation. At some level of abstraction it is reasonable to identify activation with rate of neural firing (although nodes probably do not correspond to simple neurons; see Hinton and Anderson, 1981). Since the rate of neural firing varies continuously, it follows that activation should do so also.
Strength in declarative memory. Each node in declarative memory has an associated strength, which is basically a function of the frequency of use of that cognitive unit or element. The exact factors that seem to be implicated in strength accumulation will be discussed in Chapter 5. With the concept of node strength, it is possible to define the relative strength of association between nodes. If we are considering the connection from node i to node j, its relative strength is defined as r_ij = s_j / Σ_k s_k, where s_j is the strength of node j and the summation is over all the nodes k, including j, that are connected to i. These relative strengths are important in the spread of activation (Assumption 5), because more activation flows down stronger paths.
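A minimal sketch of this definition, with an invented three-node network and arbitrary strength values:

```python
# Illustrative computation of ACT*-style relative link strength:
# r_ij = s_j / (sum of s_k over all nodes k connected to i).
# The network and the strength values here are invented for illustration.

strength = {"dog": 6.0, "cat": 3.0, "bone": 1.0}

# Adjacency: the nodes connected to each node (toy network).
connected = {"dog": ["cat", "bone"]}

def relative_strength(i, j):
    """Fraction of node i's outgoing activation sent toward node j."""
    total = sum(strength[k] for k in connected[i])
    return strength[j] / total
```

Here `relative_strength("dog", "cat")` is 0.75 and `relative_strength("dog", "bone")` is 0.25, so three quarters of the activation emitted from dog flows down the stronger path toward cat.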
In ACTE the strength of links was the primitive concept, and there was no separate notion of node strength. Link strength was treated separately because links were formed independently of the propositions they defined. Thus different links could have different strengths depending on how frequently they were used. With the introduction of all-or-none storage of cognitive units, this reason for an independent link strength evaporates. On the other hand, there is evidence that the strength of a node determines how much activation it can emit (as reviewed in Chapter 5). Thus there has been a switch from treating link strength as primitive in ACTE to treating node strength as primitive in ACT*. However, the derived concept of relative link strength serves the same function in ACT* as the
more primitive construct in ACTE; it determines how much activation goes down a path.
Spread of activation. The basic equation in Assumption 5 in Table 1.2 for spread of activation is continuous both in time and in quantity of activation. This is in keeping with the character of the ACT* theory. The momentary change in activation, according to the differential equation, is a function of the input to a node and a spontaneous rate of decay at that node. The parameter B governs how rapidly activation accumulates from the input converging on the node, and the parameter p* governs how rapidly the activation of the node decays. The input to that node is a possible source activation plus the sum of activation from associated nodes weighted by relative strengths. Source nodes provide the "springs" from which activation flows throughout the network. Note that the amount of activation provided by a source node is a function of the strength of the node.
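The qualitative behavior can be sketched numerically under the assumption that the equation has the form da/dt = B·n(t) − p*·a(t), with n(t) the input converging on the node; the parameter values below are arbitrary illustrations, not ACT*'s fitted values:

```python
# Euler integration of one node's activation: da/dt = B * n - pstar * a.
# B and pstar stand for the accumulation and decay parameters described
# in the text; the numeric values are illustrative only.

def simulate(n_input, B=1.0, pstar=2.0, dt=0.01, steps=1000):
    """Integrate the node's activation under a constant input n_input."""
    a = 0.0
    for _ in range(steps):
        a += dt * (B * n_input - pstar * a)
    return a
```

With a constant input the activation rises quickly toward the asymptote B·n/p* (here 0.5 for n = 1), and with the input removed it stays at zero, which matches the claim that a source's pattern of activation decays once the source is gone.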
ACTE's spreading activation process was different from ACT*'s in two ways. First, as already mentioned, in ACTE activation was an all-or-none property of nodes rather than a continuous property. Second, in ACTE the time necessary for spread of activation determined the temporal properties of information processing; in ACT* the important variable is time for pattern matching. As will be discussed in Chapter 3, the parameters we assume for the differential equation in Assumption 5 imply that activation spreads rapidly from the source nodes, and asymptotic levels of activation are rapidly achieved throughout the long-term structure. Chapter 3 also will review the current evidence indicating that activation effects can be rapidly felt throughout the network. What takes time is the processing of information, once activated. ACT* proposes that level of activation controls the rate of pattern matching required for production application.
Maintenance of activation. Activation is spread from various source nodes, each of which supports a particular pattern of activation over the declarative network. The total activation pattern is the sum of the patterns supported by the individual source nodes. When a node ceases to be a source, its pattern of activation quickly decays.
Nodes can become source nodes in a number of ways. They can be created by the perception of objects in the environment, and any elements deposited in working memory by a production's action become transient source nodes. These stay active
Figure 1.4 A simple data-flow network for discussion of pattern matching. [The network combines tests on primitive features such as vertical bars V1 and V2 and horizontal bars H1 and H2, for example that a bar lies to the left of H1 or to the right of H2.]
Note that this pattern says nothing directly about the features which, if present, indicate that the element is not an A (for example, a bottom horizontal bar). Other patterns that include these features that must be absent in an A will inhibit the A pattern. Thus these features have their influence implicitly through inhibition. Pattern matching is achieved by combining tests up from the bottom of such a network. Tests are performed to determine what elements in working memory match the primitive features at the bottom nodes. Supernodes combine the test results to determine if they are compatible. For instance, the F pattern is a supernode that tests if the left point of the horizontal bar corresponds to the midpoint of the vertical bar. Thus the data flows up from the bottom of the network until compatible instantiations of the top node are found. The individual nodes in the network are computing in parallel so all patterns are being matched simultaneously. This system design
was strongly influenced by the data-flow pattern matching proposed by Forgy (1979).
Because the process of performing tests at a node takes considerable time, evidence for a match at a node will gradually build up or decrease with time. The node's level of activation, which reflects the system's current confidence that the node will result in a match, determines the speed of pattern matching. The level of activation of the bottom terminal nodes is determined by the activation of the data structures to which they are attached. The level of activation of higher nodes is determined by how closely they match, how closely their subpatterns and superpatterns match, and how closely alternative patterns match. There is a set of excitatory and inhibitory influences among the pattern nodes somewhat similar to the scheme proposed by McClelland and Rumelhart (1981) and Rumelhart and McClelland (1982). Subpatterns of a node send positive activation to the node. This positive activation is referred to as bottom-up excitatory influence. Thus activation of the F pattern in Figure 1.4 increases activation of the A pattern. Alternate interpretations (in this case, A and H) of the same subpatterns compete by means of lateral inhibitory influences. Finally, patterns support the activation of their subpatterns by top-down excitatory influences. So a highly active A will tend to support activation of the F node. These computations on activation are designed to achieve the data refractoriness principle, the idea that each data structure will be matched to a single pattern. This phenomenon is ubiquitous in perception (the mother-in-law and wife picture can be seen only one way at a time), and similar phenomena also exist at higher levels of cognition (a person who shoots himself is not perceived as a murderer).
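The mix of bottom-up excitation, lateral inhibition, and top-down support just described can be sketched in the style of an interactive-activation update. The tiny network and all weights below are invented for illustration; this is not the ACT* pattern matcher itself:

```python
# Toy interactive-activation sketch: a subpattern node F feeds two
# competing interpretations A and H. Bottom-up excitation, lateral
# inhibition, and top-down support as described in the text; the
# network and all weights are invented for illustration.

def clamp(x):
    return max(0.0, min(1.0, x))

def step(act, evidence, up=0.3, down=0.1, inhibit=0.4, decay=0.2):
    """One synchronous update of the three nodes' activations."""
    f, a, h = act["F"], act["A"], act["H"]
    return {
        # F receives external evidence plus top-down support from A and H.
        "F": clamp(f - decay * f + evidence + down * (a + h)),
        # A and H receive bottom-up excitation from F and laterally
        # inhibit each other, so only one interpretation of the shared
        # subpattern survives (data refractoriness).
        "A": clamp(a - decay * a + up * f - inhibit * h),
        "H": clamp(h - decay * h + up * f - inhibit * a),
    }

# A small head start for A lets lateral inhibition drive H out.
act = {"F": 0.0, "A": 0.1, "H": 0.0}
for _ in range(30):
    act = step(act, evidence=0.2)
```

After the updates settle, A is fully active and H is suppressed to zero: the shared subpattern F has been matched to a single pattern.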
Much of the actual pattern matching is achieved by the lateral inhibition process forcing a choice among different interpretations of the same subpattern. A fundamental difference between ACT* and ACTE is the role of goals in pattern matching: patterns that refer to a goal are matched more rapidly than those that do not refer to a goal. This explains why focusing attention on an unfamiliar pattern will facilitate matching it, as observed by LaBerge (1973). This will also cause capacity to be shifted from matching other patterns, producing the cost-benefit tradeoff observed by Posner and Snyder (1975). These points will be expanded in Chapters 3 and 4.
Production compilation. Because ACT* has a set of mechanisms to account for acquisition of productions, it can deal with the major source of underdeterminacy that plagued earlier systems like ACTE. This underdeterminacy derives from the fact that there are many possible production sets for performing a particular task, and the choice of one rather than another may appear arbitrary. As noted earlier, a principled explanation for the choice of a particular production set lies in the learning mechanisms that give rise to it. In practice, it may be difficult to derive the subject's current production set, because it may be difficult to identify all the relevant aspects of the learning history. However, in principle it is always possible to make such derivations, and in practice a learning theory does add constraint to the postulation of production sets.
According to ACT*, all knowledge initially comes in declarative form and must be interpreted by general procedures. However, by performing a task, proceduralization gradually replaces the interpretive application with productions that perform the behavior directly. For example, rather than verbally rehearsing the side-angle-side rule in geometry and figuring out how it applies to a problem, a student eventually has a production that directly recognizes the application of side-angle-side. This proceduralization process is complemented by a composition process that can combine a sequence of productions into a single production. Proceduralization and composition, which together are called knowledge compilation, create task-specific productions through practice. Knowledge compilation is the means by which new productions enter the system. They are acquired gradually and with caution because productions control behavior rather directly. Gradual creation of a set of task-specific productions makes it more likely that errors will be detected in the learning process before the system has totally relinquished control to the new production set. This process corresponds to the automatization noted in many domains (for example, Shiffrin and Schneider, 1977).
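The composition half of knowledge compilation can be illustrated with a toy sketch: two rules that fire in sequence are collapsed into one macro-production whose condition contains everything the first rule tested, plus whatever the second rule tested that the first did not itself deposit in working memory. This is a simplification of ACT*'s actual composition mechanism, and the door-opening rules are invented:

```python
# Simplified sketch of production composition (not ACT*'s full algorithm):
# collapse two condition->action rules that apply in sequence into a
# single macro-production.

def compose(p1, p2):
    """Compose productions p1 = (cond1, act1) and p2 = (cond2, act2)."""
    cond1, act1 = p1
    cond2, act2 = p2
    # The composed rule tests everything p1 tested, plus whatever p2
    # tested that p1 did not itself add to working memory.
    cond = cond1 | (cond2 - act1)
    return (cond, act1 | act2)

# Toy rules: unlock the door, then open it.
p1 = (frozenset({"at-door", "have-key"}), frozenset({"door-unlocked"}))
p2 = (frozenset({"at-door", "door-unlocked"}), frozenset({"door-open"}))
macro = compose(p1, p2)
```

The macro-production's condition is {at-door, have-key}: the intermediate test for door-unlocked disappears because the first rule supplies it, which is how repeated sequences shrink into single productions through practice.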
Production tuning. Once a production set has been created, it can be tuned for a domain. Productions accumulate strength according to their history of success. Generalization and discrimination processes search for problem features that are predictive of the success of a particular method. Composition continues to collapse frequently repeated sequences of productions into single productions. These various learning processes produce the gradual, continued improvement in skill performance that has been noted by many researchers (see Fitts, 1964). Chapter 6 will discuss the wide range of skill improvement, and Chapter 7 will show how tuning accounts for many of the phenomena observed in first-language acquisition.
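Strength accumulation according to a history of success can be sketched as simple bookkeeping; the increments and the winner-take-the-preference rule below are invented stand-ins, not ACT*'s actual strength mechanisms (those are detailed in Chapter 6):

```python
# Illustrative production-strength bookkeeping: successful applications
# strengthen a production, failures weaken it, and conflict resolution
# favors the stronger production. Update constants are invented.

strengths = {"method-A": 1.0, "method-B": 1.0}

def record_outcome(name, success, gain=1.0, penalty=0.25):
    """Strengthen on success, weaken on failure (toy update rule)."""
    strengths[name] += gain if success else -penalty

def preferred():
    """Conflict resolution: pick the strongest production."""
    return max(strengths, key=strengths.get)

# method-A succeeds three times; method-B fails once.
for _ in range(3):
    record_outcome("method-A", True)
record_outcome("method-B", False)
```

After this history method-A's strength is 4.0 against method-B's 0.75, so the tuned system comes to prefer the method that has been predictive of success.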
The procedural learning mechanisms, which are gradual and inductive, contrast sharply with the abrupt and direct learning that is characteristic of the declarative domain. This contrast is the best evidence for the declarative-procedural distinction, which is basic to the ACT architecture.

Rules and Neural Plausibility
It is worth noting that ACT* is a rule-based system (the productions being the rules), but one in which we are concerned with how these rules are implemented in a psychologically and neurally plausible manner. Recently, it has been argued (see Hinton and Anderson, 1981, and especially Anderson and Hinton, 1981) that there are no rules in the brain, only a system of parallel interactions among connected neurons involving excitation and inhibition. It is argued that the apparent rulelike quality of human behavior is an illusion, that humans do not truly possess rules any more than a bouncing ball possesses rules that guide its behavior.
ACT* makes the point that there need be no contradiction between a rule-based system and the kind of neural system that was found plausible by Anderson and Hinton (1981). According to the ACT* framework these researchers have simply confused the levels of description. They have focused on the neural interactions that implement the rules and have not acknowledged the rules, a case of not seeing the forest for the trees. Again, a computer metaphor may help make this point: a program in a high-level language is compiled into machine code, but the fact that machine code executes the program should not obscure the fact that it derived from the higher-level language. Just as both higher-level and lower-level computer languages exist, so do both cognitive rules and neurons.
One might counter that the human "machine code" does not correspond to any higher-level rules, that the human system programs itself directly in terms of machine code, but this
seems unlikely. Intelligent behavior requires an integration of units that probably could not be achieved if the system were organized at such a myopic level. It is telling that most
of the interesting applications of neural models have been to peripheral processes like word recognition (for example, McClelland and Rumelhart, 1981) which require little higher-level integration. When we look at complex cognitive processes we see a seriality and a structure unlike anything that has been achieved in a neural model before ACT* used rules to structure the networks. Even these modest "ruleless" neural networks have to be carefully structured by the theorist (an implicit reservoir of rules) rather than grown according to general principles. The examples of successful modification of neural nets are even more modest (but see Hinton and Anderson, 1981, for a few examples). When we look at learning in ACT* (Chapters 5 and 7) we will see that rule organization plays an absolutely critical role in structuring the acquisition of complex cognitive skills. It is unlikely that we can have an accurate psychological theory without rules, any more than we can have an accurate psychological theory without a plausible neural implementation.
Implementing Schemata in Production Systems
It is possible to have much of the computational character of a schema system in a production system. For instance, the processes that match production conditions, especially in a partial-match system like ACT*, provide many of the features associated with schema instantiation. Indeed, much of the literature on prototype formation (such as Franks and Bransford, 1971; Hayes-Roth and Hayes-Roth, 1977; Medin and Schaffer, 1978) associated with schema theory can be explained by the ACT learning processes defined on the patterns contained in production conditions (Anderson, Kline, and Beasley, 1979; Elio and
Anderson, 1981). However, the pattern-matching processes in production systems are not nearly so powerful or unbounded as in schema systems. For example, it is not possible to dynamically invoke a second production in service of matching the condition of a first one. The amount of computation that goes into matching any one production is bounded by the size of its condition. In a schema system, on the other hand, in trying to recognize a street scene one can evoke a truck schema, which can evoke a car schema that will evoke a wheel schema, which can evoke a tire schema, which can evoke a rubber schema, and so on.
On the other hand, because it is computationally universal, ACT* can simulate the operation of any schema by the operation of some production set. For instance, consider the use of schemata to elaborate stories, as is done by script theory (Schank and Abelson, 1977). As shown in the appendix to Chapter 5, this situation can be modeled by having a set of productions apply to a declarative representation of the schemata. A sequence of productions applies in which each production either puts a small piece of the schema into correspondence with a story or uses a small piece of the schema to elaborate on the story. While the end products in the ACT case and the schema case are identical, the performance details are presumably not the same. In the ACT case, time to make the correspondence should be basically linear with the complexity of the schema, and errors should increase as more partial products are kept in memory. Schema theories are rather thin in their performance assumptions, but there is no reason to expect the same predictions to hold.
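The prediction that correspondence time is linear in schema complexity can be sketched as follows: productions apply one at a time, each handling one piece of the schema, so the number of production applications (a proxy for time) grows with the number of schema pieces. The script contents below are invented for illustration:

```python
# Sketch: matching a story against a declaratively represented script,
# one schema piece per production application. The application count
# grows linearly with the script's size, mirroring the linearity claim.

restaurant_script = ["enter", "order", "eat", "pay", "leave"]
story = ["enter", "eat", "pay"]

def correspond(script, story_events):
    """Put story events into correspondence with script pieces in order."""
    applications = 0
    matched, elaborated = [], []
    events = list(story_events)
    for piece in script:
        applications += 1              # one production application per piece
        if events and events[0] == piece:
            matched.append(events.pop(0))
        else:
            elaborated.append(piece)   # script piece used to elaborate story
    return applications, matched, elaborated

apps, matched, elaborated = correspond(restaurant_script, story)
```

Here every piece of the five-element script costs one application, and unmentioned pieces (order, leave) become elaborations of the story rather than matches.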
Criticisms of Schema Theory
A major problem with schema theory is that it blurs the procedural-declarative distinction and leaves unexplained all the
do digestion, I am almost certain it will be the exact same mechanisms as are involved in the stomach. And if, God forbid, that seems not to be the case, I will search for some level where there is only a uniform set of digestive principles, even if that takes me to a subatomic level."
We can be sure that the human mind is not to be explained by a small set of assumptions. There is no reason to suppose the mind is simpler than the body. Every effort to discern a small set of operating principles for the human mind has fallen on disaster. The issue between the faculty approach and the unitary approach is only secondarily one of complexity. The major issue is whether a complex set of structures and processes spans a broad range of phenomena (a unitary approach) or whether different structures and processes underlie different cognitive functions (the faculty approach). The only choice is between two complex viewpoints.
charFrom a biotogicai point of view, it seemsclear that our
of
level
intermediate
some
at
be
to
going
mindis
acterizationof
is
p.rsi*ony. On the oni ha|d, a great deal of information
amount
the
iransmitt6a Uy the geneticcode.On the other hand,
comof information in tf,e code is much less than the potential
(such
as
ways
piexity of the body or the-brain. There are many
made
is
information
fn"t"rut symmetry) that the samegenetic
numerous
to do multiple duty. Thus we should expect to see
princieach
see
to
expect
also
mental principles, but we should
ple showing uP in manY contexts'
The Danger of Ignoring Parsimony
The danger in this freedom from the tyrannical rule of parsimony is that assumptions will be spun out in a totally unprincipled and frivolous way. Although the system is complex, it is equally certain that most principles will have at least a modestly general range of application. If one feels free to propose a new assumption at each turn, there is a real danger that one will miss the fact that the same process underlies two phenomena and so miss the truth about cognition. For this reason, new assumptions should not be added frivolously. I have worked with the following criteria for introducing a new theoretical distinction:
1. There is a broad range of evidence from numerous empirical domains that points to its existence.
2.
3.
The first two criteria are standard statements related to considerations of parsimony. However, because of the third I do not feel the need to give the first two as much weight as I did when I formulated the 1976 ACT theory. The consequences of this slight deemphasis on parsimony have hardly been disastrous. Although it is impossible to apply a metric to parsimony, it is clear that ACT* is much less than twice as complex as ACTE, while it accounts for much more than twice as many phenomena.
A good example is the concept of node strength (Assumptions 4 and 5 in Table 1.2). For six years I assumed only link strength for dividing activation. I refused to adopt the concept that different nodes have different capacities for spreading activation, even though in 1976 strong circumstantial evidence pointed to such a concept. Eventually, as partially documented in Chapter 5, the evidence became overwhelming. The current theory has node strength as a primitive concept and link strength as a defined construct, so it is only marginally more complex, but it is much more accurate. In retrospect I think I was being foolishly faithful to parsimony in trying to avoid the assumption of differential node capacity. It makes eminent adaptive sense that more frequently used nodes would have greater capacity for spreading activation. The cognitive system is just concentrating its resources where they will be most used. Some interesting assumptions would be required to explain why we would not have evolved differential capacity for nodes of differential frequency.
If the human mind is to be understood through many principles, one cannot expect ACT* or any other theory at this time to be a complete theory of mind. It takes a good deal of effort to identify and formulate a single principle, and the time put into ACT is hardly sufficient to do more than make a beginning. Hopefully, it will at least provide a framework for discovering new principles. Again, consider an analogy: it is important, perhaps essential, to understand the overall functioning of the circulatory system before trying to understand the principles that govern the behavior of capillaries in providing nutrients to tissues. Similarly, it is useful to have an understanding of the
overall architecture of cognition before focusing on any single principle of the theory.
2 | Knowledge Representation
representation is a major departure from my long-standing position (Anderson and Bower, 1973; Anderson, 1976) that the only type of knowledge representation is propositional. It might also seem contradictory to my more recent claims (Anderson, 1976, 1978, 1979) that it is not possible to discriminate among various representational notations. In Anderson (1982) I discuss at length the connection between the current proposal and past theoretical and metatheoretical discussions in cognitive science. The central issue is the distinction between representations from the point of view of the processes defined on them and representations from the point of view of the notations used to express them. My past claims had been about notation; the current claims are in terms of processes. It is impossible to identify whether a particular notation correctly expresses the structure of a representation or whether different knowledge structures are encoded according to different notations (Anderson, 1978, 1982). On the other hand, it is possible to decide that different knowledge structures have different processes defined upon them. The basic assumption is that representations can be defined in terms of the processes that operate on them rather than the notation that expresses them.
This chapter sets forth the representational assumptions that are the foundation for the rest of the book. I will specify the notation to be used, although I make no claim for its uniqueness. Also I will specify how the production system processes operate on the various representational types and present empirical and logical evidence for the existence of such processes. The research in this book is based mostly on propositions, less on strings, and least on images. Therefore, the claims for image representation, especially, are somewhat tentative and incomplete. However, it is important to address this topic because imagery has been so important in discussions of mental representation.
There may well be more than the three representational types proposed here. However, there are severe constraints on the postulation of new representational types. There must be substantial empirical data that cannot be explained by existing types, and there must reasonably have been time in our evolutionary history to create such a representation and an adaptive advantage to doing so. As an example of the second constraint, it would make no sense to propose a separate "algebra code" for parenthetical expressions. The most likely candidate for an additional code is a kinesthetic and/or motor code as suggested by Posner (1967, 1973).
Table 2.1 The properties of the three representations

Process: Temporal string / Spatial image / Abstract proposition
What is preserved: temporal sequence / configural information / semantic relations
Storage process: all or none of phrase units / all or none of image units / all or none of propositions
Match process, (a) degree of match: end-anchored at the beginning / function of distance and configurations / function of set overlap
Match process, (b) salient properties: ordering of any two elements, next element / distance, direction, and overlap / degree of connectivity
Execution, construction of new structures: combination of objects into linear strings, insertion / synthesis of existing images, rotation / insertion of objects into relational slots, filling in of missing slots
structure of an event sequence is critically dependent on the ordinal structure but often is not critically dependent on the interval structure. Events like pauses may be significant, but they become another element, a pause, in the event sequence. Pauses may also be important in determining the hierarchical structure of an ambiguous stimulus (Bower and Springston, 1970).
Further evidence for the belief that temporal strings encode only ordinal information lies in the poor quality of human judgment about absolute time. The current proposal for ordinal encoding corresponds to Ornstein's theory (1969) that perception of time is related to the number of intervening events (or units in a string representation). This is not to say that we cannot perceive or remember interval properties of a time sequence, but that such properties are not directly encoded in the structure of the temporal string. Such information can optionally be encoded as attributes of the ordered elements ("the goal was scored at 2:03 of the second period").
Long sequences of events are not encoded as single-level linear structures but rather as hierarchies of strings within strings. This is frequently referred to as phrase structuring, where a phrase refers to the units in a level of the hierarchy. A phrase contains five or fewer elements. The idea of such hierarchical organization for temporal strings with a limited number of elements at any level has been proposed by many researchers (Broadbent, 1975; Chase and Ericsson, 1981; Johnson, 1970; Lee and Estes, 1981; Wickelgren, 1979). These phrase structures are often indicated by the location of pauses in serial recall.
Propositional versus string representation. One of the standard arguments for a propositional system over a multicode system (Pylyshyn, 1973; Anderson and Bower, 1973) has been that the propositional code is sufficient to encode all kinds of information. Much of our research on language processing (Anderson, Kline, and Lewis, 1977) has worked with propositional encodings of sentence word order. Figure 2.1 shows the propositional network representation for the tall young man. The reader is undoubtedly struck by the complexity of this representation relative to the simplicity of the string to be represented. It is complex because one has to use conventions that are optimized for representing the complexities of other knowledge structures, and because one cannot capitalize on the peculiar properties of the simple knowledge to be represented. The problem is not just one of awkward notation; there is considerable inefficiency because the processes must also operate on the needless detail.
Figure 2.2 A string representation of the information shown in Figure 2.1. This network makes explicit the type-token distinction. Also represented are substructure links giving part information and attribute links giving additional properties of the tokens.

Figure 2.1 A propositional representation of string information, also representing the information that young is stressed and that tall was pronounced without articulating the l. (Adapted from Anderson, 1976.)
Figure 2.2, showing a possible network notation for the string encoding the tall young man, illustrates how other information besides word order would be encoded. It shows that tall was pronounced without articulating the l, and that young was stressed. To represent particular information about these instantiations of the words, one needs a type-token distinction; that is, it is not the word young in general that is stressed, but rather this token of it. When one needs to be explicit about the type-token distinction, it is useful to use a network notation. In Figure 2.2, individual nodes represent word tokens, which point to the words by type links. Also illustrated here are attribute links (to indicate that token 3 is stressed) and a substructure link to indicate the contents of token 2. Thus, in addition to the string structure, there are conventions to represent category information (the types for the tokens), attribute information, and substructure information. Similar category, attri-
54
taking longer. When it comes to longer, hierarchically structured"strinls, the main factor is position within the subphrase
(Klahr, Chase,and Lovelace,in press).
The operation of extracting the next element distinguishes
tempori from spatial images. If no direction is specified, the
,r""i element in a multidimensional spatial image is not defined. Even with a direction specified, Kosslyn (1980) has
shown that the time taken to go between two elements is a
function of the physical distance between them and not the
number of intervening obiects.
Order judgments. Frequently one must judge the order of two elements from a string. This is accomplished in a production system by the use of special match predicates for order. For instance, the following production would directly retrieve the answer to the question whether A is before D in the string ABCDEF:

IF asked "Is LVX before LVY?"
and LVX is before LVY
THEN respond Yes.
The first clause just matches the question while the second requires direct access to information about order in the memorized string.
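The special before-predicate can be sketched as a direct comparison of stored positions in the memorized string, so the judgment needs no chaining through intermediate elements. This is a toy stand-in for ACT*'s match predicate, not its implementation:

```python
# Sketch of a 'before' match predicate over a memorized string: order is
# read off directly from stored positions rather than by chaining
# through a series of nexts.

memorized = "ABCDEF"
position = {ch: i for i, ch in enumerate(memorized)}

def before(x, y):
    """Direct test that x precedes y in the memorized string."""
    return position[x] < position[y]

def respond(x, y):
    """Mimic the production: if asked 'Is x before y?' and it holds, say Yes."""
    return "Yes" if before(x, y) else "No"
```

Asking whether A is before D yields Yes in a single position comparison, regardless of how many elements lie between them.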
A great deal of work has been done on linear orderings (Potts, 1972, 1975; Trabasso and Riley, 1975), suggesting that such order information is quite directly available and that one does not have to chain through the intermediate terms between A and D to determine their order (that is, order judgments are not implemented as a series of nexts). In these experiments subjects learn a linear ordering by studying a set of pair orderings, for instance, A is before B, B is before C, C is before D, D is before E, and E is before F. Even though subjects memorize only the adjacent pairs, judgments are easier to make the farther apart the elements are. This result is generally taken as evidence against a propositional encoding for linear order information. Even when linear order judgments are made over very long strings (Woocher, Glass, and Holyoak, 1978) in which the elements must be chunked into subphrases, with many judgments crossing phrase boundaries, judgment time is still a function of ordinal distance between elements. Woocher, Glass, and Holyoak failed to find that phrase boundaries had any effect on order judgments.
ACT* does not have a psychological mechanism for extracting linear order information. Although it would be perfectly reasonable to develop such a mechanism, our current purposes are served if we can demonstrate that temporal strings show a distance effect unlike that for other representational types. One might be able to borrow one of the current proposals about how ordering judgments are made (see Holyoak and Patterson, 1981), and embed it within ACT.
It is interesting to consider the results in experiments like those of Hayes (1965) or Hayes-Roth and Hayes-Roth (1975), in which subjects must commit to memory partial orderings rather than simple linear orderings. That is, they have to remember things like A > B, A > C; but B and C are not ordered. In such situations subjects cannot encode the information in a simple linear string; possibly they encode it propositionally. In this case, the opposite distance effect is obtained: the farther apart a pair of items, the longer it takes the subject to judge. Presumably, subjects have to chain through the propositions about the pairwise orderings that define the partial ordering.
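The chaining process implied by these data can be sketched as a search through stored pairwise propositions (an illustrative sketch, not the authors' model; the example facts are mine):

```python
from collections import deque

# Pairwise orderings defining a hypothetical partial order.
FACTS = {("A", "B"), ("A", "C"), ("B", "D"), ("C", "E"), ("D", "F")}

def greater(x, y):
    """Decide whether x > y by chaining through the stored pairwise
    propositions.  The number of levels searched grows with the distance
    between x and y, which is the direction of the distance effect
    reported for partial orderings."""
    frontier, steps, seen = deque([x]), 0, {x}
    while frontier:
        nxt = deque()
        for node in frontier:
            for a, b in FACTS:
                if a == node and b not in seen:
                    if b == y:
                        return True, steps + 1
                    seen.add(b)
                    nxt.append(b)
        frontier, steps = nxt, steps + 1
    return False, steps

print(greater("A", "B"))  # (True, 1): adjacent pair, one proposition
print(greater("A", "F"))  # (True, 3): must chain A > B, B > D, D > F
```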
The current proposal, that images preserve orientation and not size, is based on what seems important about a spatial image. People need to recognize a pattern when the size changes (for example, letters in a different size type) and seemingly do so with ease. On the other hand, the identity of an object may change if it is rotated (for example, a Z becomes an N; a square becomes a diamond), and people find it difficult to make matches that must correct for rotation. However, I also must stress that no major claim in this book rests on the proposal that images do not encode absolute size in their structure. Far from that, the tri-code proposal would be strengthened if it could be shown that images do encode absolute size, because this property is not encoded in temporal strings or abstract propositions.
A notation for expressing spatial images is somewhat awkward for scientific discourse, and much of the debate over image representation has really been caused by this awkwardness. Figure 2.3 illustrates a number of ways to represent spatial images. One might use the actual stimulus (a), but this notation is ambiguous about exactly what is stored in the image. Because of working memory limitations, all the details of all three objects in (a) could not be encoded in a single image. If the visual detail is encoded it must be done by means of subimages. In (a) it is not clear whether the detail of the objects is encoded at all, and (b) and (c) illustrate that ambiguity. The letters in each array are tokens for the parts, but in (b) these tokens point to subimages that encode the visual detail, while in (c) there are only categorical descriptors. In (b) each of these subimages would have an encoding of its subparts, but this is not shown. The full encoding in (b), including subimages, allows judgments such as whether the triangle is equilateral. Part (d) gives a more precise encoding of (c), reducing the array to coordinate information. Part (e) is a loose English rendition of the image information. There is not a "correct" notation. One chooses a notation to make salient the information being used by the processes operating on the representation. If one is concerned with the processes that operate on the position of subelements, then (d), where that information is explicit, is preferable to (c), where it is implicit. Whatever the choice of notation, it should not obscure what the image really is: information about the spatial configuration of a set of elements.
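The contrast between notations (c) and (d) can be made concrete as data structures (a sketch of my own; the field names and the coordinate values are illustrative, not taken from the figure):

```python
# (c) categorical descriptors only: parts are typed tokens, positions implicit.
image_c = {"identity": "x",
           "subparts": [("TRIANGLE", "a"), ("SQUARE", "b"), ("CIRCLE", "c")]}

# (d) coordinate encoding on a 10 x 10 grid (illustrative positions, y upward).
image_d = {"a": (2.5, 7.5), "b": (7.5, 7.5), "c": (5.0, 2.5)}

def left_of(img, p, q):
    """A process that needs explicit position: trivial on (d), impossible on (c)."""
    return img[p][0] < img[q][0]

print(left_of(image_d, "a", "b"))  # True: the triangle is left of the square
```

This is the sense in which (d) makes position explicit for processes that operate on the positions of subelements.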
Images, like strings or propositions, can encode structure, category, and attribute information. Parts (b) and (c) of the figure illustrate the use of categorical and substructure information, which can be used to embed images within images.
[Figure 2.3: ways to represent a spatial image. (a) Actual stimulus; (b) array with subimages; (c) array with categorical descriptors, e.g. (TRIANGLE a), (SQUARE b), (CIRCLE c); (d) coordinate encoding on a 10 x 10 grid, with subpart coordinates such as (7.5, 2.5), (7.5, 7.5), (2.5, 5.0); (e) loose English description: "There is a triangle to the left of a square and both are above a circle."]
59
t'-
f'
60
TheArchitectureof Cogttition
RePresentation
IQrowledge
AO
tr
6l
Tcsl
Jrrt) i
S:rnr clcmcns,
lincarc<xrfiguration
f)ifferent elemcns,
sanrcrlrfiguration
Ditlcrcnl clcmcnts.
lincarcunfiguratiorr
( B) vtrbal G)ndrli()n
,4
II t.z;
q,)
cq)
Gcomctric
Verlxl
'E
C
tt't:
$
at
0)
&
Srme words,
linear configurarion
Differcnt words.
sameconfiguraion
Differcnt wortls,
linearctxrfigurarion
Same
configurati<ln
l.inc:tr
ltt
corttigttrtttit
be'
Figure 2.5 Reactiontimesfor santa'sexpeiment, showingintetaction
tween thetype of mateial anil the testconfigurlt_ion.(AnderandCo',@1980')
W. H. Freeman
permissionof
son,1980biby
SALIENT PROPERTIES

It has been frequently noted (for example, Kosslyn and Pomerantz, 1977; Paivio, 1977) that spatial images have many salient properties. For instance, just as it was possible to directly test for the order of two objects in a temporal string, so it is possible to directly test an image for the distance between two objects, the direction between two objects, and whether they overlap.
An example adapted from Simon (1978) illustrates the salient property of overlap judgment: "Imagine but do not draw a rectangle 2 inches wide and 1 inch high, with a vertical line cutting it into two 1-inch squares. Imagine a diagonal from the upper left-hand corner to the lower right-hand corner of the 2 x 1-inch rectangle. Imagine a second diagonal from the upper right-hand corner to the lower left-hand corner of the right square. Do the two diagonals cross?" One's intuition is that the answer to this question is immediately available upon formation of the image.
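What the image delivers at a glance can also be checked analytically from a coordinate encoding. The following sketch (mine, using a standard segment-intersection test) confirms the intuitive answer for Simon's figure:

```python
def ccw(a, b, c):
    """Signed area test: positive if a -> b -> c turns counterclockwise."""
    return (b[0] - a[0]) * (c[1] - a[1]) - (b[1] - a[1]) * (c[0] - a[0])

def segments_cross(p1, p2, q1, q2):
    """True if segment p1-p2 properly crosses segment q1-q2."""
    return (ccw(p1, p2, q1) * ccw(p1, p2, q2) < 0 and
            ccw(q1, q2, p1) * ccw(q1, q2, p2) < 0)

# Simon's figure: a 2 x 1 rectangle with corners (0,0) and (2,1).
diag1 = ((0, 1), (2, 0))   # upper left to lower right of the whole rectangle
diag2 = ((2, 1), (1, 0))   # upper right to lower left of the right square
print(segments_cross(*diag1, *diag2))  # True: the diagonals do cross
```

The point of the example, of course, is that people get this answer from the image without any such computation.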
Another salient property of an image concerns the relative positions of two objects. Maki (1981) and Maki, Maki, and Marsh (1977) have looked at subjects' ability to make north-south or east-west judgments about the positions of cities on a map. It is of interest to compare their results with the results on judgment of linear orderings. As in the research on linear orderings, they found the judgments were faster the farther apart the cities were. However, there is an important difference; the hierarchical structure of the picture affects the judgment, so distance effects disappear for pairs of cities on opposite sides of a state boundary (Maki, 1981). Stevens and Coupe (1978) have shown that position judgments can be severely distorted by the hierarchical character of an image representation. They documented the common misconception that Reno is east of San Diego, which derives from the fact that Nevada is east of California and the location of the cities is represented with respect to the location of the states. On the other hand, judgments of linear ordering seem unaffected by attempts to impose a hierarchical structure (Woocher, Glass, and Holyoak, 1978).
PATTERN MATCHING: DEGREE OF MATCH
The Santa study illustrated that speed of matching is affected by the spatial configuration of a series of objects. It is also affected by the interval properties of the figure. Speed of pattern matching seems to be a continuous function of the degree of distortion between the pattern configuration and the data to be matched. For instance, Posner and Keele (1970) documented that time to recognize a random dot pattern is a function of the distance the dots in the test pattern had been moved from their positions in the prototype. This is unlike what is seen with matching strings or propositions.
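A continuous degree of match for dot patterns can be sketched as the mean displacement between corresponding dots (my own formalization for illustration, not the measure Posner and Keele used):

```python
import math

def distortion(prototype, test):
    """Mean distance the dots have moved from their prototype positions."""
    return sum(math.dist(p, t) for p, t in zip(prototype, test)) / len(prototype)

proto = [(1, 1), (4, 2), (2, 5)]
near  = [(1.2, 1.1), (3.9, 2.2), (2.1, 4.8)]   # slightly perturbed pattern
far   = [(2, 2), (5, 4), (4, 6)]               # heavily perturbed pattern

# Recognition time is modeled as increasing continuously with this score.
print(distortion(proto, near) < distortion(proto, far))  # True
```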
Another unique feature of image pattern matching is that a data structure can be matched to a pattern solely on the basis of a match in the configuration of the elements, even if the elements do not at all match the pattern. For instance, without trying, one can see the structure in Figure 2.6 (adapted from Palmer, 1975) as a face. With temporal strings, in contrast, it is unlikely that one string of elements will evoke recognition of a completely different one. Of course, one-dimensional strings without interval structure do not permit the same variety of configurational properties as images and so lack much of the uniqueness of an image configuration.
IMAGE CONSTRUCTION
New knowledge structures can be created by the action of productions. In the case of a temporal string the possibility for construction was pretty mundane: elements could be ordered.
Figure 2.8 is a hierarchical representation of a Shepard and Metzler figure. It is analyzed into two overlapping elbows (subfigures), each of which is analyzed into two overlapping arms (subsubfigures). It is assumed that for rotation the arms need not be broken into individual cubes.

Just and Carpenter identify three substages in matching these figures. The first is to find end parts (arms) of the two figures that correspond. The second is to rotate the end part of one figure into congruence with the end part of the other, beginning the creation of an image that is the rotation of one figure. The third substage completes construction of this image by moving copies of the remaining pieces of the figure to the image and testing for congruence. Just and Carpenter suggest that this third stage can involve rotation or not; the current model assumes no rotation. Just and Carpenter call these stages search, transformation, and confirmation. The current model maintains these distinctions but breaks each stage down into more information-processing detail.
The appendix to this chapter provides a detailed specification of the production system and its application to the problem of taking two figures and trying to create an image of one rotated into congruence with the other. Figure 2.9, a summary illustration of the operation of that production system, shows the two figures, the focus of attention on each figure (the shaded areas), and the contents of the mental image. The simulation starts out in (a) by focusing on the two upper arms; in (b) an image of the upper arm of object 1 is created, and by a series of rotations it is brought into congruence with the upper arm of object 2 in (c). When an attempt is made to attach the other arm in the upper elbow of object 1, a mismatch is discovered (e), which leads to abandonment of the attempt to make the upper parts of the two objects correspond. In (f) an attempt is made to create a correspondence between the lower elbow of object 2 and the upper elbow of object 1. In (g) an image is created of the lower arm of object 2 and is rotated into congruence with the upper arm of object 1. Then in (h)-(l) the various arm fragments are attached until a complete image is achieved.

This production system predicts that the angle of rotation has two effects. First, the greater the angle of rotation the greater the likelihood of false starts, as illustrated in Figure 2.9(a)-(e). Second, a number of rotations are required to align the initial segments, (c) and (g). The production system also predicts that the complexity of the image has an effect in terms of the number of separate pieces that need to be imaged, (b)-(l).
[Figure 2.9: a summary of the operation of the mental rotation production system, showing in panels (a)-(l) the two figures, the focus of attention on each figure (shaded), and the contents of the mental image at each step of the correspondence process.]
ENCODING

The encoding of propositional representations is more abstract than that of temporal strings or spatial images in that the code is independent of the order of information. For instance, the propositional representation (hit John Bill) does not encode the difference between John hit Bill and Bill was hit by John. What it does encode is that the two arguments, John and Bill, are in the abstract relation of hitting. In encoding a scene of John hitting Bill, the propositional representation does not code who is on the left and who is on the right. Propositional encodings are abstract also in that they identify certain elements as critical and ignore all others. Thus the encoding of the scene may ignore all physical details, such as John's or Bill's clothing.
One of the principal lines of empirical evidence for propositional representations comes from the various sentence memory studies showing that memory performance is better predicted by semantic variables than by the word structure of the original sentence. This research includes the long tradition of experiments showing that memory for the gist is better than memory for exact wording (Begg, 1971; Bransford and Franks, 1971; Sachs, 1967; Wanner, 1968) and experiments showing that the best prompts for recalling a particular word in a sentence are words that are semantically rather than temporally close (Anderson and Bower, 1973; Lesgold, 1972; Wanner, 1968). Similar demonstrations have been offered with respect to picture memory (Bower, Karlin, and Dueck, 1975; Mandler and Ritchey, 1977), showing that it is the underlying semantic relations that are predictive of memory. In reaction against this research, some experiments have demonstrated good verbatim memory for sentences (Graesser and Mandler, 1975) or good memory for visual detail that is not semantically important (Kolers, 1978). However, these observations are not troublesome for the multirepresentational position being advanced here, although they can be embarrassing for the pure propositional position that has been advanced. What is important for current purposes is that there are highly reproducible circumstances in which memory is good for the meaning of a stimulus event and not for its physical details. To account for these situations it is necessary to propose a representation that extracts the significant semantic relationships from these stimuli. To account for situations that show good memory for detail, one can use the temporal string or image representation.
Another distinctive feature of abstract propositions is that there are strong constraints among the elements. Thus hit takes two arguments, give takes three, and decide must have as one of its arguments an embedded proposition. There is nothing like this with strings or images, which encode directly what is in the world, and any combination of elements is logically possible. One element of a string or image does not constrain what the other elements might be. Propositions, on the other hand, represent relational categorizations of experience, and the mind has learned to represent only certain patterns.
Propositional representations have been particularly important to the development of ACT*. Historically, ACT* emerged from ACTE, which had only propositional representations, and therefore many of the empirical and theoretical analyses were concerned with these. Although I believe that the ACT architecture applies equally well to all knowledge representations.
[A propositional network fragment, with RELATION, ARGUMENT, ATTRIBUTE, and CATEGORY links connecting nodes such as TALL, LAWYER, and PLURAL.]
Our experience has been that it is not adaptive to store or retrieve partial propositions, because partial information cannot be easily used in further processing, and its retrieval only clutters up working memory or, worse, misleads the information processing. It seems unlikely that an adaptive system would waste capacity on useless partial information. In this case, where the empirical evidence is ambiguous, our general framework can guide a decision.
Arithmetic facts may be the easiest examples for illustrating the impact of partial encoding. If we store the proposition (5 = (plus 3 2)) as (= (plus 3 2)), with the sum omitted in a partial encoding, the fact is of no use in a system. Whether partial encoding leads to disaster rather than just waste depends on one's system assumptions; if propositions encoded facts like (5 = (plus 3 2)), imagine what a disaster the partial encoding (6 (plus 3 2)) would lead to! The crisp semantics associated with arithmetic propositions makes clear the consequences of partial encoding. However, similar problems occur in an inferential system if (Reagan defeated Carter) is encoded as (defeated Carter) or if (Mary give Bill Spot tomorrow) is encoded as (Mary give Bill Spot).
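The uselessness of a partial arithmetic proposition can be sketched with a toy fact store (illustrative only; the data layout is mine):

```python
# Full propositions pair a relation pattern with its value; a partial
# encoding has stored the pattern but omitted the sum.
FACTS = {("plus", 3, 2): 5}        # complete encoding of (5 = (plus 3 2))
PARTIAL = {("plus", 3, 2): None}   # sum omitted at encoding time

def retrieve_sum(store, a, b):
    """Look up the stored sum of a + b, if the store has one."""
    return store.get(("plus", a, b))

print(retrieve_sum(FACTS, 3, 2))    # 5
print(retrieve_sum(PARTIAL, 3, 2))  # None: stored and retrieved, but useless
```

The partial entry still costs storage and retrieval effort, which is exactly the point made below about partial encoding being maladaptive.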
The complaint against the partial encoding scheme is not simply that errors are made because of the failure to encode information. It is transparent that humans do fail to encode information and do make errors as a consequence. It can be argued that failure to encode is actually adaptive in that it prevents the memory system from being cluttered. The occasional failure of memory may be worth the savings in storage and processing. This is a difficult argument to evaluate, but it does make the point that failure to encode may not be maladaptive. However, the result of partial encoding is maladaptive; the system still has to store, retrieve, and process the information, so there is no savings in processing or storage. The partial memory that is retrieved is no better than nothing and perhaps is worse. An adaptive system would either jettison such partial encodings if they occurred or at least refuse to retrieve them.
PATTERN MATCHING: DEGREE OF MATCH
A has thrown a ball (A raised his hand over his head, A's hand held a round object, A's hand thrust forward, A's hand released the ball, the ball moved forward, etc.) or the exact words of the sentence that was parsed into this meaning, the code represents the significant relationship directly. The propositional representation does yield economy of storage in long-term memory, but other advantages are probably more significant. For instance, the representation will occupy less space in working memory and will not burden the pattern matcher with needless detail. Thus, it often will be easier to manipulate (think about) these abstracted structures. Another advantage is that the inferential rules need be stored only for the abstract relation and not separately for all the types of input that can give rise to that relation.
Storage and Retrieval
In ACT* theory storage and retrieval are identical for the three representational types. A unit (phrase, image, or proposition) is treated as an unanalyzed package. Its internal contents are not inspected by the declarative processes, so these processes have no basis for responding to a phrase differently than to a proposition. It is only when units are operated upon by productions that their contents are exposed and processing differences occur. This is a fundamental difference between declarative memory, which treats all types the same, and production memory, which treats them differently. If ACT* is correct in this hypothesis, traditional memory research (for example, Anderson and Paulson, 1978; Paivio, 1971), which attempted to find evidence for different types, is doomed to failure because it is looking at declarative memory. Traditional research has been used to argue for different types by showing better memory for one type of material, but one can argue that the research is better explained in terms of differential elaboration (Anderson, 1976, 1980; Anderson and Bower, 1973; Anderson and Reder, 1979).
The term cognitive unit (Anderson, 1980) is used for structures that have all-or-none storage and retrieval properties. By this definition, all three types are cognitive units. Because of limits on how much can be encoded into a single unit, large knowledge structures must be encoded hierarchically, with smaller cognitive units embedded in larger ones. It has been suggested (Broadbent, 1975) that the limits on unit size are related to the limits on the ability to access related information in working memory. For a unit to be encoded into long-term memory, all of its elements must be active together in working memory.
[Figure 2.11: a hierarchical encoding of the numbers 1 through 27, grouped into fragments such as F = (10, 11, 12).]
Note that if cued with 10, the subject would be able to retrieve the fragment F and hence the elements 11 and 12 but nothing else of the hierarchy. Such hierarchical retrieval would produce the phrase patterns documented for linear strings (Johnson, 1970), propositional structures (Anderson and Bower, 1973), and story structures (Owens, Bower, and Black, 1979; Rumelhart, 1978). To my knowledge no one else has explored this issue with respect to picture memory, but it would be surprising if such hierarchical recall structures were not also found there.
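Hierarchical, all-or-none retrieval of cognitive units can be sketched as follows (the structure is loosely modeled on Figure 2.11; the fragment names and members are illustrative):

```python
# Each fragment (cognitive unit) is stored all-or-none with its members.
FRAGMENTS = {
    "F": [10, 11, 12],     # one bottom-level fragment
    "G": [13, 14, 15],     # a sibling fragment
    "TOP": ["F", "G"],     # a higher unit pointing at the fragments
}

def retrieve_with_cue(cue):
    """Retrieving the fragment containing the cue yields the cue's siblings,
    but nothing else in the hierarchy unless higher units are also retrieved."""
    for members in FRAGMENTS.values():
        if cue in members:
            return [m for m in members if m != cue]
    return []

print(retrieve_with_cue(10))  # [11, 12] -- the rest of fragment F only
```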
If one could segment a structure to be recalled into its hierarchical units, one should see all-or-none recall for the separate units. The empirical phenomenon is never that clear-cut, partly because subjects do not adopt entirely consistent hierarchical encoding schemes. The hierarchical structure of their scheme might differ slightly from the one assumed by the experimenter. A more important reason for deviation from hierarchical all-or-none recall is that the subject may produce elaborations that deviate from the specified hierarchy. For example, consider a subject's memory for the multiproposition sentence The rich doctor greeted the sick banker. A typical propositional analysis (Kintsch, 1974; Goetz, Anderson, and Schallert, 1981) would assign rich and doctor to one unit, and sick and banker to another. However, a subject, elaborating mentally on what the sentence means, might introduce direct memory connections between rich and banker or between doctor and sick. Then, in recall, he may recall sick and doctor but not rich and banker, violating the expected all-or-none pattern. The complications produced by such elaborations are discussed in Anderson (1976).
MIXED HIERARCHIES AND TANGLED HIERARCHIES
To this point the discussion has largely assumed that hierarchies consist of units of the same representational type. However, representations can be mixed, and there is considerable advantage to this flexibility. If one wanted to represent John chanted "One, two, three," it is more economical to represent the object of John's chanting as a string. That is, the string would appear as an element of a proposition. Again, to represent the sequence of events at a ball game, one might want a linear ordering of a sequence of propositions describing the significant events. Strings and images would be mixed to represent a spatial array of nonsense syllables or a sequence of distinct images. One would use a mixture of images and propositions to encode comments about pictures.
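A mixed hierarchy can be sketched as nested structures, with a string unit embedded as the argument of a proposition (the notation is mine, for illustration only):

```python
# Proposition (chant John <string>) with a temporal string as one argument.
chant_event = ("chant", "John", ("string", ["one", "two", "three"]))

# A linear ordering of propositions describing the events at a ball game.
ballgame = [("hit", "batter", "ball"),
            ("catch", "fielder", "ball"),
            ("cheer", "crowd", "play")]

relation, agent, (unit_type, elements) = chant_event
print(unit_type, elements)  # the embedded unit is a string of three elements
```

Declarative processes would treat the embedded string as one unanalyzed package inside the proposition, in line with the storage assumptions described above.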
[Figure 2.12: A tangled hierarchy of multiple representational types, linking nodes such as Restaurant, Customer, Hungry, and Has Money.]
Appendix: Production Set for Mental Rotation

Table 2.2 provides a production set that determines if two Shepard and Metzler figures are congruent. This production set is much more general than the Shepard and Metzler task, however; it will decide whether any pair of simultaneously presented connected figures is congruent after rotation. Figure 2.13 illustrates the flow of control produced by the production system among the subgoals of the task. This production set assumes that the subject uses the stimuli as an external memory and is internally building an image in working memory. Figure 2.9, earlier, indicated where attention is focused in external memory and what is currently being held in internal working memory at various points during the correspondence.
[Table 2.2: the production set P1-P20 for the mental rotation task.]
[Figure 2.13: the flow of control among the subgoals of the mental rotation production system.]
Spread of Activation
Mechanisms of Activation

Activation flows from a source and sets up levels of activation throughout the associative network. In ACT*, various nodes can become sources. In this section I will specify how nodes become and stay sources, how activation spreads through the network, and how levels of activation in the network and rate of production firing are related. Because this section uses quite a lot of mathematical notation, Table 3.1 provides a glossary.
SOURCES OF ACTIVATION

Table 3.1  Glossary of notation

a_i(t)   activation level of node i at time t
A(t)     vector of the activation levels of all n nodes at time t
A_i(t)   vector of activation levels due to source i
n_i(t)   total activation input to node i at time t
N(t)     vector giving input to all nodes at time t
c_i(t)   source input to node i at time t (c* if i is a source, 0 otherwise)
C*(t)    vector of source inputs
B        rate constant on activation input
p*       rate of decay of activation
p        amount of activation maintained in spread, defined as p = B/p*
δt       delay in transmission between connected nodes
R        n x n matrix of connection strengths r_ij
s_i(T)   total activation supported by source i, T seconds after it became a source
The elements entered into working memory (letter features, words, and so on) all start out as sources of activation. However, unless focused, these elements stay active for only a short period of time. Thus a great deal of information can be active at any one time, but only a relatively small amount will be maintained active. This is like the conception of short-term memory sketched out in Shiffrin (1975). Sources of activation are not lost, however, if they are encodings of elements currently being perceived. Even if the node encoding the perceptual object did become inactive, it would be immediately reactivated by reencoding. An element that is made a source by direct perception can become inactive only after it is no longer directly perceived. Also, the current goal element serves as a permanent source of activation.
Figure 3.1  A hypothetical network structure. In temporary active memory is encoded The doctor hates the lawyer who is in the bank. In long-term memory are illustrated the facts The doctor hates the priest, The lawyer hates the judge, The lawyer was in the court on Friday, The sailor is in the bank, The bank was robbed on Friday, and The sailor was robbed on Friday.
SPREAD OF ACTIVATION
The network over which activation spreads can be represented as an n x n matrix R with entries r_ij. The value r_ij = 0 if there is no connection between i and j; otherwise, its value reflects the relative strength of connection from i to j. A constraint in the ACT theory is that Σ_j r_ij = 1. This guarantees that the activation from node i is divided among the attached nodes according to their strengths of connection. The level of activation of a node will increase or decrease as the activation coming into it from various sources increases or decreases.

The activation of a node varies continuously with increases or decreases in the input. A bound is placed on the total activation of a node by a decay process. This conception can be captured by the model of a leaky capacitor (Sejnowski, 1981), and the change in activation of node i is described by the differential equation:

da_i(t)/dt = Bn_i(t) - p*a_i(t)    (3.1)

Over one transmission interval with the input held constant, this has the solution:

a_i(t) = e^(-p*δt)a_i(t - δt) + (B/p*)n_i(t - δt)(1 - e^(-p*δt))    (3.2)

The input to node i is:

n_i(t) = c_i(t) + Σ_j r_ji a_j(t - δt)    (3.3)

where c_i(t) = 0 if i is not a source node at time t, and c_i(t) = c* if it is. The term in the sum indicates that the activation from a node j is divided among the attached nodes according to the relative strengths r_ji. The value δt appears in Eqs. (3.2) and (3.3) to reflect the delay in transmission between nodes. In a neural model it might reflect the time for information to travel down an axon.
Equations (3.1) to (3.3) may be represented more generally if we consider an n-element vector A(t) to represent the activation of the n nodes in the network, another n-element vector N(t) to represent the inputs to them, and a third n-element vector C*(t) to reflect the activation from the source nodes. Then Eqs. (3.1) and (3.3) become:

dA(t)/dt = BN(t) - p*A(t)    (3.4)

and

N(t) = C*(t) + RA(t - δt)    (3.5)

where R is the matrix of connection strengths. The general solution for Eq. (3.4) is given as:

A(t) = e^(-p*t)A(0) + ∫_0^t e^(-p*(t-x))BN(x) dx    (3.6)
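The dynamics of Eqs. (3.4) and (3.5) can be sketched numerically (a toy simulation of mine, not the ACT* implementation; the four-node ring network is chosen only so that each row of R sums to 1, as the theory requires, while the parameter values B = .8, p* = 1, δt = 1 msec follow the text):

```python
# Directed ring 0 -> 1 -> 2 -> 3 -> 0; node 0 is the only source (c* = 1).
n = 4
R = [[0.0] * n for _ in range(n)]
for i in range(n):
    R[i][(i + 1) % n] = 1.0

B, p_star, dt = 0.8, 1.0, 1.0      # parameter values from the text; dt in msec
C = [1.0, 0.0, 0.0, 0.0]
A = [0.0] * n
snapshot = None
for step in range(1, 201):         # Euler steps of dA/dt = B*N(t) - p* A(t)
    N = [C[i] + sum(R[j][i] * A[j] for j in range(n)) for i in range(n)]
    A = [A[i] + dt * (B * N[i] - p_star * A[i]) for i in range(n)]
    if step == 10:
        snapshot = A[:]            # state after 10 msec

print(round(snapshot[0] / A[0], 2))  # 0.93
```

In this toy network the source node is already at 93 percent of its asymptotic activation after 10 steps, echoing the point made below that asymptote is approached within tens of milliseconds.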
Setting the derivative in Eq. (3.4) to zero (and ignoring the transmission delay) gives the asymptotic pattern of activation:

A = (B/p*)C* + pRA    (3.7)

where p = B/p*. Solving this system of simultaneous equations yields:

A = (I - pR)^(-1)(B/p*)C*    (3.8)

Let s_i(T) denote the sum of activation over all the nodes supported by source i, T seconds after i became a source node. s_i(T) represents the activation of i due to its source contribution, plus the activation of its associated nodes due to the source contribution, plus the activation of their associated nodes, and so on. That is, s_i(T) is the sum of the entries in A_i(T). To calculate s_i(T) we need to represent T in terms of the number, n, of δt time units that have passed: T = nδt. The activation will have had a chance to spread n units deep in the network by the time T. Then,

s_i(nδt) = (B/p*)c* Σ_(k=0)^(n) p^k    (3.9)

In the limit this has the value:

s_i(∞) = (B/p*)c*/(1 - p)
Reasonable values for the parameters might be p* = 1, B = .8 (and hence p = .8), δt = 1 msec. With such values, the quantity in Eq. (3.9) achieves 73 percent of its asymptotic value in 10 msec, 91 percent in 20 msec, and 99 percent in 40 msec. Thus, to a good approximation one can reason about the impact of the asymptotic activation values without worrying about how asymptote was achieved. Thus Eq. (3.7) is the important one for determining the activation pattern. The appendix to this chapter contains some uses of Eq. (3.7) for deciding about the impact of various network structures on distribution of activation. It shows that level of activation tends to decay exponentially with distance from source.
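Solving the simultaneous equations in Eq. (3.7) can be sketched by fixed-point iteration (my own illustration; since p < 1 the iteration converges, and the ring network is again chosen only so that each row of R sums to 1):

```python
def asymptotic_activation(R, C, B=0.8, p_star=1.0, iters=500):
    """Fixed-point iteration of Eq. (3.7): A = (B/p*)C* + pRA."""
    n, p = len(C), B / p_star
    A = [0.0] * n
    for _ in range(iters):
        A = [(B / p_star) * C[i] + p * sum(R[j][i] * A[j] for j in range(n))
             for i in range(n)]
    return A

# Directed ring 0 -> 1 -> 2 -> 3 -> 0; node 0 is the only source.
n = 4
R = [[0.0] * n for _ in range(n)]
for i in range(n):
    R[i][(i + 1) % n] = 1.0
A = asymptotic_activation(R, [1.0, 0.0, 0.0, 0.0])

# Each link away from the source multiplies activation by p = .8:
# the exponential falloff with distance mentioned in the text.
print([round(a, 3) for a in A])  # [1.355, 1.084, 0.867, 0.694]
```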
The rapidity of spread of activation is in keeping with a good number of other experimental results that will be reviewed in this chapter and with the adaptive function of spreading activation. As mentioned earlier, activation is supposed to be a relevancy heuristic for determining the importance of various pieces of information. It guides subsequent processing and makes it more efficient. It would not make much adaptive sense to compute a spreading activation process if that computation took a long time to identify the relevant structures of memory.
SUMMING UP

This section has shown in detail how a spreading activation mechanism, consistent with our knowledge of neural functioning, produces a variation in activation levels. Equation (3.7) describes asymptotic patterns of activation, which, it was shown, are approached so rapidly that one can safely ignore spreading time in reaction time calculations. Thus a coherent rationale has been established for using Eq. (3.7). In some enterprises (see Chapter 5), faced with complex networks, one might want to resort to solving the simultaneous equations in (3.7). However, for the purposes of this chapter, its qualitative implications are sufficient.

The level of activation of a piece of network structure determines how rapidly it is processed by the pattern matcher. Chapter 4 will go into detail about how the pattern matcher performs its tests and how it is controlled by level of activation.
Priming Studies
Priming involves presenting subjects with some information and then testing the effect of the presentation on access to associated information. In the lexical decision task (Becker, 1980; Fischler, 1977; McKoon and Ratcliff, 1979; Meyer and Schvaneveldt, 1971; Neely, 1977), the one most frequently used, it is found that less time is required to decide that an item is a word if it is preceded by an associated word. For instance, butter is more rapidly judged to be a word when preceded by bread. These paradigms strongly demonstrate the importance of associative spread of activation. They are particularly convincing because the priming information is not directly part of the measured task, and subjects are often unaware of the associative relations between the prime and the target. The fact that priming is obtained without awareness is a direct reflection of the automatic and ubiquitous character of associative spread of activation. Beneficial effects of associatively related primes have also been observed in word naming (Warren, 1977), free association (Perlmutter and Anderson, unpublished), item recognition (Ratcliff and McKoon, 1981), sentence completion and sensibility judgments (Reder, in press), and word completion (Schus…, unpublished).

Although facilitation of the associated word is the dominant result, inhibition is sometimes found for words unrelated to the prime (Neely, 1977; Fischler and Bloom, 1979; Becker, 1980). For instance, judgment of butter may be slower when preceded by the unrelated glove than when preceded by xxxxx if the subject consciously expects the following word to be related (a word such as hand, rather than butter, after glove). If the subject is not aware of the predictive relation between the associative prime and the target, or if the prime-to-target interval is too short to permit an expectation to develop, only positive facilitation is observed. Neely has related this to Posner and Snyder's (1975) distinction between automatic and conscious activation.
Spread of Activation
Figure 3.2 A representation of the flow of control among the productions in Table 3.2.
such production per word. Not only will they label actual words, but they will label near-words like FEATHFR with its closest matching word. The mechanisms of such word identification and partial matching will be described in the next chapter. It is reasonable to suppose that such productions exist, given that throughout our lives we have to recognize briefly presented and imperfectly produced words. The difficulty of proofreading text for spelling errors is also evidence that such partial-matching productions exist.
Since productions like P1 will label partial matches, the subject cannot respond simply upon the firing of P1; the spelling of the similar word must be checked against the stimulus. It has been suggested that the results typically observed in lexical decision tests depend on the fact that the distractors are similar to words (James, 1975; Neely, 1977). Certainly the subject needs some basis for rejecting distractors, and that basis may well have an impact upon the process for recognizing targets. According to the model presented here, the subject rejects a stimulus as a nonword if it mismatches a hypothesized word and accepts the stimulus if a mismatch cannot be found.
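The decision rule just described can be sketched as follows. The lexicon, the similarity measure, and the function names here are illustrative stand-ins, not ACT's actual representations.

```python
# Sketch of the lexical decision rule above: label the stimulus with
# its closest-matching word (as production P1 would), then check the
# word's spelling against the stimulus. The lexicon and the overlap
# measure are hypothetical simplifications.

LEXICON = ["bread", "butter", "feather", "glove", "hand"]

def closest_word(stimulus):
    """Return the lexicon entry sharing the most letter positions."""
    def overlap(w):
        return sum(a == b for a, b in zip(w, stimulus)) - abs(len(w) - len(stimulus))
    return max(LEXICON, key=overlap)

def lexical_decision(stimulus):
    candidate = closest_word(stimulus)   # P1: label with a similar word
    if candidate == stimulus:            # spelling check: no mismatch
        return "word"
    return "nonword"                     # a mismatch was found

print(lexical_decision("butter"))    # word
print(lexical_decision("feathfr"))   # nonword (a near-word, like FEATHFR)
```

Note that a rejection here is a positive detection of a spelling mismatch, in line with the model's claim that nonword responses require finding a mismatch rather than exhausting the lexicon.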
Productions P2 and P3 model performance in those situations where the subject is not consciously anticipating a particular
The Pattern Network
Figure 3.3 provides a schematic illustration of how the pattern matching for productions P2-P5 is implemented. The top half of the figure illustrates the pattern network, and the bottom half a hypothetical state of the declarative network that drives the pattern matcher. Critical to the operation of this system is the subnode A, which detects conflicts between word spelling and the information in the stimulus. When the goal is to judge a similar word, P2 and P3 will be applicable. Note that P2 receives positive input from both A, which detects a spelling mismatch, and from the clause noting the most similar word. P2 thus performs a test to see if there is a misspelling of the similar word. P3 receives a positive input from the similar word node but a negative input from the A node. P3 thus checks that there
Figure 3.3 The pattern network representing the conditions for productions P2, P3, P4, and P5 in the lexical decision task. Also represented are the temporary structures created to encode the probe, the sources of activation (from which rays emanate), and the connections to long-term memory. For simplicity, the goal elements in P2-P5 are not represented.
[Table 3.3: expected activation sources for the match-anticipation, not-match-anticipation, and no-anticipation conditions; the entries are garbled in the scan.]
words.
Neely (1977) came close to including all the possibilities of Table 3.3 in one experiment. He presented subjects with a prime followed at varying delays by a target stimulus. He was able to separate automatic priming from conscious priming by telling subjects to anticipate the name of a building when they saw a body part as a prime. In this way there would be conscious anticipation of a building but automatic priming of a body part. On some trials Neely surprised subjects by following a body-part prime with a body-part word such as leg. When there was a 700 msec delay between prime and target, the benefit of priming for a body part was less than the cost of the failed anticipation, yielding a net inhibition. However, on such trials there was less inhibition for a surprise body-part word than for a surprise bird word. This is the result predicted if one assumes that the benefits of automatic priming combine with the cost of conscious inhibition. It corresponds to the contrast between the not-match-anticipation, match-prime situation and the not-match-anticipation, not-match-prime control in Table 3.3.
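The additive logic of this account can be made explicit with a small calculation. The millisecond values below are hypothetical, chosen only to exhibit the predicted ordering, and `net_priming` is an invented helper name.

```python
# Illustrative arithmetic for the additive account above: on surprise
# trials the cost of a failed conscious anticipation combines with any
# automatic benefit from the prime. The msec values are hypothetical.

AUTOMATIC_BENEFIT = 40   # body-part prime automatically primes body parts
CONSCIOUS_COST = 90      # failed anticipation of a building slows responding

def net_priming(automatic_related):
    """Net slowing in msec (positive = net inhibition) on surprise trials."""
    benefit = AUTOMATIC_BENEFIT if automatic_related else 0
    return CONSCIOUS_COST - benefit

surprise_body_part = net_priming(automatic_related=True)   # 50: less inhibition
surprise_bird = net_priming(automatic_related=False)       # 90: more inhibition
assert 0 < surprise_body_part < surprise_bird
```

Both surprise conditions show net inhibition, but the automatically related target shows less, which is Neely's result.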
Fischler and Bloom (1979) and Neely (1977) have reported that conscious priming has a beneficial effect on nonword judgments, contradicting the predictions in Table 3.3. There appear to be small decreases in latency for nonwords similar to the primed word. However, there is also a slight increase in the false-alarm rate. According to Table 3.2, conscious priming has no beneficial effects for nonwords because execution of P5, which rejects the word as failing expectations, must be followed
With Stimulus Quality
The Stroop task involves naming the color of a word. It
takes longer to name the color if the word is preceded by an associatively related word. In this task the priming effect is negative. To explain this, it is necessary to assume, as is basic to all analyses of the Stroop task (for a review see Dyer, 1973), that there are competing tendencies to process the color of the word and to name the word. These competing tendencies can be represented in ACT by the following pair of productions:

P7   IF [condition garbled in scan]
        and LVcode is the articulatory code for LVconcept
     THEN generate LVcode.

P8   [garbled in scan]
Consider what happens when dog is presented in red. The following information will be active in working memory:

[items 1-5 garbled in scan]
Production Implementation
Figure 3.4 illustrates schematically the structure of production memory for P1-P4 and their connections to declarative memory. Each production is represented as consisting of two clauses, which are represented as separate elements at the bottom of the figure. The elements, called terminal nodes, perform
…gets in which, for example, the subject and verb were from the target sentence; and (4) two-element foils in which the subject and verb came from different sentences. Response to these four types of probes is handled by productions P1-P4, respectively, in Table 3.4.

Production P1 recognizes that the current probe matches a proposition previously stored in memory. The elements in quotes refer to the probe, and the elements in parentheses refer to the proposition in memory. P1 determines whether the probe and proposition match by checking whether the variables (LVlocation, LVperson, and LVaction) bind to the same elements in the probe and the proposition. P2 will fire if the variables do not match in a three-element probe. Productions P3 and P4 are like P1 and P2 except that they respond to two-element probes.
Figure 3.4 [caption garbled in scan]
…go to any propositional trace or to the probe encoding it. Pattern matching for targets is a function of the activation of the propositional trace and the probe.

2. It should take longer to recognize larger probes because more tests must be performed. In this experiment, three-element probes took longer than two-element probes. For ample additional evidence in support of this prediction, see Anderson (1976, chap. 8).

3. Foils that are more similar to studied sentences should be harder to reject. In this experiment "overlap" foils were used that had two of three elements in common with a studied sentence. Subjects found these harder to reject than nonoverlap foils. Again, for many confirmations of this prediction, see Anderson (1976, chap. 8). ACT* predicts that a partial match of the positive production pattern (for example, P1) will inhibit growth of evidence at the negative production pattern (for example, P2). More generally, ACT* predicts difficulty in rejecting partial matches.
Rejection of Foils
An interesting question is, how does a person decide that he does not know something? In these experiments this question is addressed by ACT's model for foil rejection.9 The most obvious model, which is obviously incorrect, is that subjects exhaustively search their memories about a concept. However, foils are rejected much too quickly for this to be true; typically, the times to reject foils are only slightly longer than the times to accept targets. Anderson (1976) and King and Anderson (1976) proposed what was called the waiting model, in which subjects waited some amount of time for the probe to be recognized. If it was not recognized in that time, they would reject it. The assumption was that subjects would adjust their waiting time to reflect factors, like fan, that determine the time taken to recognize targets.
Implementation of the waiting model. The current ACT theory provides a more mechanistic instantiation of the waiting model. As indicated in Figure 3.4, a foil is rejected by a production whose condition pattern detects the absence of information in memory. If a production is looking for the presence of subpattern S1 and the absence of pattern S2, two pattern nodes are created. One corresponds to the positive conjunction S1&S2, and the other to S1&~S2. In Figure 3.4 the S1&S2 conjunctions correspond to productions P1 and P3, and the S1&~S2 conjunctions to P2 and P4. An inhibitory relation is established between the positive S1&S2 and the negative S1&~S2. Both positive and negative patterns receive activation from S1, but only the positive pattern receives activation from S2.10 In the figure, the common subpattern S1 refers to the encoding of the probe, and S2 refers to the memory proposition. The S1&~S2 pattern builds up activation either until total activation reaches a threshold or until it is repressed by accruing evidence for the positive S1&S2 pattern.
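The mechanism can be sketched as a race to threshold. The accumulation rates, step size, and threshold below are hypothetical parameters, not ACT*'s, and `waiting_race` is an invented helper name.

```python
# A minimal race sketch of the waiting model: the absence-detecting
# S1&~S2 pattern accrues activation toward a rejection threshold,
# while evidence for the positive S1&S2 pattern both drives acceptance
# and represses the negative pattern. All parameters are hypothetical.

def waiting_race(s1_rate, s2_rate, threshold=10.0, dt=1.0, max_steps=1000):
    """Return ('accept' | 'reject', time) for a probe."""
    positive, negative = 0.0, 0.0
    for step in range(1, max_steps + 1):
        positive += s2_rate * dt                  # evidence the trace is there
        negative += s1_rate * dt - s2_rate * dt   # repressed by positive evidence
        if positive >= threshold:
            return "accept", step * dt            # S1&S2 completes: a target
        if negative >= threshold:
            return "reject", step * dt            # absence detected: a foil
    return "timeout", max_steps * dt

print(waiting_race(s1_rate=1.0, s2_rate=1.5))   # ('accept', 7.0)
print(waiting_race(s1_rate=1.0, s2_rate=0.0))   # ('reject', 10.0)
```

For a target, positive evidence wins and simultaneously suppresses the rejection accumulator; for a foil, the negative pattern climbs unopposed to threshold, so rejection times are only somewhat longer than acceptance times, as the data require.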
A long-standing question in the ACT theory is how subjects adjust their waiting time to reflect the fan of elements in a foil. Such an adjustment makes sense, because if they did not wait long enough for high-fan targets, they would be in danger of spuriously rejecting them. However, for a long time there was no plausible mechanism to account for adjusting the waiting time. The obvious idea of counting links out of a node and setting waiting time according to the counted fan is implausible. But the current pattern-matching system provides a mechanism for adjusting waiting time with no added assumptions. Note in Figure 3.4 that the fan of elements will affect not only the activation of the memory elements but also that of the probe encoding. This activation will determine the amount of activation that arrives at the S1&~S2 conjunctions in P2 and P4, and thus fan will cause activation to build more slowly to threshold for the foil-detecting productions. It will also have the desired effect of giving high-fan targets more time to complete pattern matching.
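The automatic adjustment falls out of a simple division of activation, which can be shown in two lines. The source activation and threshold below are hypothetical constants, and `rejection_time` is an invented helper name.

```python
# Sketch of how fan adjusts waiting time with no added assumptions:
# a source's activation is divided among its f outgoing links, so the
# absence-detecting pattern accrues evidence at rate a/f and crosses
# threshold later for high-fan probes. Constants are hypothetical.

def rejection_time(fan, source_activation=2.0, threshold=10.0):
    rate = source_activation / fan    # activation split across the fan
    return threshold / rate

assert rejection_time(1) < rejection_time(2) < rejection_time(3)
print(rejection_time(1), rejection_time(2))   # 5.0 10.0
```

No fan-counting process is needed: the same divided activation that slows target recognition also slows the foil detector, lengthening the effective wait.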
One should not conclude from this discussion that the only way of responding no is by this waiting process. As in the Glucksberg and McCloskey (1981) experiment, subjects can retrieve information that allows them to explicitly decide they don't know. They may also retrieve information that implies the probe is false (for example, Ronald Reagan is a famous liberal senator from Alabama). However, absence detection by waiting is an important basic mechanism for concluding that one does not know anything about a particular fact.
Tests of the waiting model. It is predicted that a foil is harder to reject the more features it shares with a target (leading to activation of S2 and S1&S2 in the above analysis). As reported earlier, Anderson (1976) found that overlap foils including a number of words from the studied sentence took longer to reject than nonoverlap foils. The experiment by King and Anderson illustrates another kind of similarity that may slow down foil rejection. We had subjects study sentences such as
…tions between subject and predicate.
recognition. The activation sum for the recognition condition was (2/f + 1)A, and for the other recall condition was (2/f)A. The 2/f refers to the summed activation from subject and verb; the 1 in the recognition equation refers to the extra activation from the one-fan object. Reaction time is predicted by an inverse function of activation, and a single intercept was estimated for the recall and recognition conditions. As can be seen, this model does a good job of accounting for the differences in the size of fan effects. In
Recalls
  Obs:  1.35 sec    Pred: 1.33 sec
  Obs:  1.58 sec    Pred: 1.55 sec
  Obs:  1.70 sec    Pred: 1.58 sec

1. Obs = observed; Pred = predicted; RT = reaction time. Correlation: r = .991; standard error of observed times = .07 sec.
2. Predicting equation: RT = .88 + 1.Y/(2/f + 1).
3. Predicting equation: RT = .88 + 1.Y/(2/f).
Anderson (1981) a similar but more elaborate model has been applied to predicting the differences in interference (fan) effects obtained in paired-associate recognition versus recall.
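The inverse-activation prediction can be sketched numerically. The intercept and scale constants below are hypothetical stand-ins (the scale constant is illegible in the scan), and `predicted_rt` is an invented helper name.

```python
# Sketch of the inverse-activation model above: reaction time is an
# intercept plus a scale divided by total activation, with recall
# activation proportional to 2/f and recognition to 2/f + 1.
# INTERCEPT matches the text; SCALE is a hypothetical value.

INTERCEPT, SCALE = 0.88, 1.35   # sec

def predicted_rt(fan, recognition):
    activation = 2.0 / fan + (1.0 if recognition else 0.0)
    return INTERCEPT + SCALE / activation

# The extra one-fan object in recognition adds a constant to the
# denominator, which dilutes the 2/f term and shrinks the fan effect.
recall_effect = predicted_rt(3, recognition=False) - predicted_rt(1, recognition=False)
recog_effect = predicted_rt(3, recognition=True) - predicted_rt(1, recognition=True)
assert recog_effect < recall_effect
```

This is why the model predicts smaller fan effects for recognition than for recall: the same change in 2/f is a proportionally smaller change in 2/f + 1.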
Judgments of Connectedness
As discussed in Chapter 2, subjects can judge whether elements are connected in memory more rapidly than they can judge how they are connected (Glucksberg and McCloskey, 1981). Detecting connectivity is a more primitive pattern-matching operation than identifying the type of connection. I speculated in Chapter 2 that detection of connectivity might be a property unique to propositional structures.

Another unpublished study confirms again the salience of connectivity information within propositional structures and also checks for an interaction with fan. After studying true and false sentences of the form It is true that the doctor hated the lawyer and It is false that the sailor stabbed the baker, subjects were presented with simple subject-verb-object sentences (The sailor stabbed the baker). They were asked to make one of three judgments about the sentence: whether it was true, whether it was false, and whether it had been studied (as either a true or false sentence). The last judgment could be made solely on the basis of connectivity, but the other two required a more complex pattern match that would retrieve the studied truth value. Crossed with the type of question was the type of sentence (true, false, or a re-pairing of studied elements) about which the question was asked. The terms could be either one- or two-fan. Thus the design of the experiment was 3 (types of question) x 3 (types of sentence) x 2 (fan).
Table 3.5 presents the results of the experiment classified according to the described factors. Table 3.7 presents the average times and fan effects collapsed over type of question or type of material. Note that subjects were fastest to make studied judgments, in which they only had to judge the connectivity of the elements. They were slower on true and false judgments. Significantly, the fan effect is 497 msec for true and false judgments but only 298 msec for studied judgments. Thus fan has less effect on connectivity judgments than on judgments of exact relationship. This is further evidence for the expected interaction between complexity of the pattern test and level of activation. Subjects are also faster to judge re-paired material than other material, but they do not show smaller fan effects. Thus, the effect of question type on fan is not just a matter of longer times showing larger effects.
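The reported fan effects can be recomputed from the cell means of Table 3.5. The values and their cell placement below are as best recovered from the scan, so treat them as a reconstruction; the results come out within a few msec of the 497 and 298 msec figures in the text.

```python
# Fan effects from the Table 3.5 cell means (values and cell layout
# reconstructed from a degraded scan). Lists are ordered by material:
# true, false, re-paired. Keys are the three question types.

no_fan = {"True?": [1.859, 2.612, 1.703],
          "False?": [2.886, 2.741, 2.174],
          "Studied?": [1.658, 1.804, 1.430]}
fan = {"True?": [2.457, 2.863, 2.171],
       "False?": [3.262, 3.429, 2.786],
       "Studied?": [1.896, 2.165, 1.728]}

def fan_effect(questions):
    diffs = [f - n for q in questions for f, n in zip(fan[q], no_fan[q])]
    return sum(diffs) / len(diffs)

print(round(1000 * fan_effect(["True?", "False?"])))   # ~499 msec
print(round(1000 * fan_effect(["Studied?"])))          # ~299 msec
```

The studied (connectivity) judgments show roughly three-fifths the fan effect of the true and false judgments, which is the interaction between pattern-test complexity and activation that the text emphasizes.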
An interesting series of experiments (Glass and Holyoak, 1979; Meyer, 1970; Rips, 1975) has been done on the relative difficulty of universal statements (All collies are dogs) versus particular statements (Some collies are dogs, Some pets are cats). When subjects have to judge only particulars or universals, they judge particulars more quickly than universals. On the other hand, when particulars and universals are mixed, subjects are faster to judge universals. This can be explained by assuming that subjects adopt a connectivity strategy in the particular-only blocks. That is, it is possible to discriminate most true particulars
Table 3.5  Mean reaction times (sec) and percentages correct (in parentheses) in the truth experiment

                      True?          False?         Studied?

No-fan question
  True                1.859 (.913)   2.886 (.841)   1.658 (.903)
  False               2.612 (.832)   2.741 (.785)   1.804 (.903)
  Re-paired           1.703 (.982)   2.174 (.931)   1.430 (.970)

Fan question
  True                2.457 (.929)   3.262 (.822)   1.896 (.862)
  False               2.863 (.801)   3.429 (.774)   2.165 (.962)
  Re-paired           2.171 (.829)   2.786 (.881)   1.728 (.942)
Table 3.7  Average reaction times and fan effects (sec), collapsed over type of material or type of question

Type of question   Average RT      Type of material   Average RT   Fan effect
True?              2.276           True               2.335        .402
False?             2.880           False              2.603        .435
Studied?           1.783           Re-paired          1.999        .459
the goal element. The size of the memory span can be seen to partly reflect the number of elements that can be maintained active by the goal element. Rehearsal strategies can be viewed as an additional mechanism for pumping activation into the network. By rehearsing an item, one makes that item a source of activation for a short time.
Broadbent (1975) has argued that memory span consists of a reliable three or four elements that can always be retrieved and a second set of variable size that has a certain probability of retrieval. That is, subjects can recall three or four elements perfectly but can recall larger spans, in the range of seven to nine elements, only with a probability of around .5. The ACT analysis offered here corresponds to Broadbent's analysis of memory span. The certain three or four elements are those whose activation is maintained from the goal. The other elements correspond to those being maintained probabilistically by rehearsal. According to this view rehearsal is not essential for a minimal short-term memory. Probably the correct reading of the relation between rehearsal and short-term memory is that while rehearsal is often involved and is supportive, minimal sets can be maintained without rehearsal (Reitman, 1971, 1974; Shiffrin, 1973). It is also relevant to note here the evidence linking memory span to the rate of rehearsal (Baddeley, Thomson, and Buchanan, 1975).
The Sternberg Paradigm
The various paradigms for studying memory span have been one methodology for getting at the traditional concept of short-term memory. Another important methodology has been the Sternberg paradigm (Sternberg, 1969). In the Sternberg experiment, time to recognize that an element is a member of a studied set is found to be an approximately linear function (with slope about 35 msec per item) of the size of the set. This is true for sets that can be maintained in working memory. The traditional interpretation of this result is that the subject performs a serial high-speed scan of the contents of short-term memory looking for a match to the probe. J. A. Anderson (1973) has pointed out that it is implausible that a serial comparison process of that speed could be implemented neurally. Rather he argues that the comparisons must be performed in parallel. It is well known (Townsend, 1974) that there exist parallel models which can predict the effects attributed to serial models.
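One way a parallel activation account yields the serial-looking result: if the probe's fixed activation is divided among the s studied items and reaction time is an intercept plus a constant over per-item activation, the predicted function is exactly linear in s. The constants below are hypothetical, with the ratio chosen to give the classic 35 msec-per-item slope.

```python
# Sketch: a parallel, divided-activation account of the Sternberg
# set-size effect. With total activation A split over s items and
# RT = I + B / (A / s), we get RT = I + (B/A) * s, linear in s.
# I, B, and A are hypothetical constants.

I, B, A = 400.0, 35.0, 1.0   # msec intercept, msec scale, total activation

def rt(set_size):
    per_item_activation = A / set_size
    return I + B / per_item_activation   # = I + (B/A) * set_size

slopes = [rt(s + 1) - rt(s) for s in range(1, 5)]
assert all(abs(m - 35.0) < 1e-6 for m in slopes)
print(rt(1), rt(4))   # 435.0 540.0
```

No serial scan is required: the linear set-size function falls out of dividing a fixed activation resource among the comparisons running in parallel.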
In the ACT framework, these judgments could be implemented by productions of the form:
P5   [garbled in scan]
P6   [garbled in scan]
Figure 3.5 Two simple linear chains for calculating effects of activation at distance.
[Figure: a hierarchical network with analyzed structure at levels 1 and 2 and unanalyzed structure at level 0.]
…tions only to nodes in levels i - 1, i, and possibly i + 1. Let s1 be the relative strength of all connections to level i - 1, s2 the relative strength of all connections to level i, and s3 the strength of all connections to level i + 1, with s1 + s2 + s3 = 1. It will be assumed that the same values for these parameters apply at all levels except 0.

For all levels except 0 and 1, the following equation describes the pattern of activation:

    a(i) = p*s1*a(i-1) + p*s2*a(i) + p*s3*a(i+1)    (3.11)

Its solution decays geometrically with level,

    a(i) = k*r^i    (3.12)

where r is the root less than one of p*s3*r^2 - (1 - p*s2)*r + p*s1 = 0, that is,

    r = [(1 - p*s2) - sqrt((1 - p*s2)^2 - 4*p^2*s1*s3)] / (2*p*s3)    (3.13)

The constants k and a(0) are fixed by the boundary conditions at levels 0 and 1 (Eqs. 3.14 and 3.15, garbled in the scan).
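The chain equation (3.11) can be iterated numerically to confirm the exponential falloff with distance. The parameter values below are hypothetical; the check is that successive ratios a(i+1)/a(i) settle to a single constant r, as the geometric solution requires.

```python
# Numerically relaxing the chain equation (3.11),
#   a(i) = p*s1*a(i-1) + p*s2*a(i) + p*s3*a(i+1),
# with the source clamped at a(0) = 1 and a far absorbing end.
# The parameter values are hypothetical.

p, s1, s2, s3 = 0.8, 0.5, 0.3, 0.2
N = 30
a = [1.0] + [0.0] * N        # a[0] is the clamped source; a[N] stays 0

for _ in range(5000):        # sweep until the asymptotic pattern is reached
    for i in range(1, N):
        a[i] = p * (s1 * a[i - 1] + s2 * a[i] + s3 * a[i + 1])

ratios = [a[i + 1] / a[i] for i in range(2, 10)]
assert max(ratios) - min(ratios) < 1e-6   # a constant ratio: a(i) ~ k * r**i
print(round(ratios[0], 4))   # the decay ratio r, ~0.6028 for these parameters
```

The constant ratio is the r of Eq. (3.13); activation at distance d from the source is down by a factor r^d, which is the exponential decay with distance claimed earlier in the chapter.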
Control of Cognition
…the distinction is also found in many models of perceptual processing. Interestingly, continued advances in architecture, as in the design of augmented transition networks (Woods, 1970) for parsing or the HARPY system for speech perception (Lowerre, 1976), have blurred this distinction. In such systems the processing occurs in response to goals and data jointly.

Whether one studies tasks that are basically perceptual (that is, they start at the bottom of the cognitive system) or basically
…ing psychological results concerning this mixing. Deliberately focusing attention on a perceptual task can facilitate its performance (LaBerge, 1973; Posner and Snyder, 1975). It has been shown, for instance, that subjects are faster to recognize letters they expect. Contextual factors can have massive effects on what is perceived. The effects of priming at the perceptual level have long been
known and are now well documented in the laboratory. A recent surprise has been how much low-level processes affect what is supposedly high-level processing. Some of the early evidence of this came from studies of chess and the game of go, in which it was found that what separated experts from novices was the ability to perceive relevant patterns on the game board (Chase and Simon, 1973; Reitman, 1976; Simon and Gilmartin, 1973). For an expert, the lines of development in the game are suggested directly by the patterns on the board, just as our perception of a dog is usually determined by the data, without any intent or plan to see a dog. The key to expertise in many problem-solving domains like physics (Larkin, 1981) or geometry (Anderson, 1982) is development of these data-driven rules. Such rules respond to configurations of data elements with recommendations for problem development, independent of any higher-level goals. In the problem shown in Figure 4.1, from our work on geometry, what distinguishes experienced from novice students is that experts very quickly perceive that ∠ACM = ∠BDM even though they do not know how this fact will figure in the final proof.
Figure 4.1 A geometry problem that serves to distinguish novice students from experienced. Experienced students immediately perceive that ∠ACM = ∠BDM without knowing how the fact will fit into the final proof.
that are useful for identifying what is significant about the ACT architecture.

Certain aspects of the HEARSAY architecture have much in common with production systems. For instance, there is a blackboard which, like working memory, contains a wide range of relevant data, organized generally according to level. In speech perception, HEARSAY's blackboard contained hypotheses about sounds, syllable structures, word structures, word sequences, syntax, semantics, and pragmatics. Numerous knowledge sources, which are like productions, respond to data at one level and introduce data at another level.

At any point in time the system must choose which source to apply, from a set of potentially relevant knowledge sources. This is the conflict-resolution problem, and it is in the solution to this problem that the HEARSAY architecture differs most fundamentally from ACT. In HEARSAY, conflict-resolution decisions are made dynamically and intelligently by considering any relevant information. Various knowledge sources are responsible for evaluating the state of knowledge and deciding what should be done next. This contrasts with the simple, compiled conflict-resolution schemes of production systems. The HEARSAY scheme has the potential for cognitive flexibility but
[Figure: a blackboard planning architecture with EXECUTIVE, PLAN ABSTRACTION, and KNOWLEDGE BASE components; most labels are garbled in the scan.]
The Principles of Conflict Resolution

The human production system probably consists of somewhere between tens of thousands and tens of millions of productions. For many reasons, we do not want to apply all the productions that could apply at any time. For instance, two productions that request the hand to move in conflicting directions should not apply simultaneously. Conflict resolution involves deciding which productions to apply. In the ACT theory there are five principles of conflict resolution: degree of match, production strength, data refractoriness, specificity, and goal-dominance. Each of these principles is achieved in ACT* through the operation of its pattern matcher for production conditions. In this, ACT* is unique; other production systems apply conflict resolution after pattern matching to choose a subset of the matched productions. In ACT*, conflict resolution is thus an integral part of information processing rather than something tacked on at the end to get adequate performance.
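The flavor of such principles can be conveyed with a toy scoring pass over candidate productions. The productions, the scoring order, and the weights are invented for illustration; in ACT* itself these principles fall out of the pattern matcher rather than a separate selection step like the one below.

```python
# A toy sketch of conflict resolution as a ranking of candidate
# productions by (degree of match, specificity, strength). The
# productions and scoring scheme are hypothetical; ACT* realizes
# these principles inside pattern matching, not as a post-pass.

def score(production, working_memory):
    matched = sum(1 for c in production["condition"] if c in working_memory)
    degree_of_match = matched / len(production["condition"])
    specificity = len(production["condition"])
    return (degree_of_match, specificity, production["strength"])

productions = [
    {"name": "general", "condition": {"goal-prove"}, "strength": 1.0},
    {"name": "specific", "condition": {"goal-prove", "angles-equal"}, "strength": 0.9},
]
wm = {"goal-prove", "angles-equal"}

winner = max(productions, key=lambda p: score(p, wm))
print(winner["name"])   # specific: both fully match, so specificity decides
```

With both productions fully matched, the more specific one wins, which is the usual role of a specificity principle in production systems.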
Degree of Match
In most production systems it is a necessary but not sufficient condition for production application that the condition of a production be fully matched by the contents of working memory. On computational grounds it might seem desirable to insist on full matching to avoid making errors. However, this constraint must be relaxed to have a psychologically valid model. Any number of phenomena point to the fact that productions can be evoked by partial matches.

Partial matching is essential because the world is full of partial data structures that must be matched. Faces change but still can be recognized. We can recognize partial lexical patterns, as the example in Figure 4.3 shows. It is also hard to explain why generally well-working human procedures occasionally make errors, unless one invokes the concept of partial match. Norman (1981) describes many such errors. For instance, there is the case of a person who threw a dirty shirt "by mistake" into the toilet rather than into the hamper. Apparently a "hamper-matching" pattern partially matched the toilet.
Looking at novices learning to program in LISP, we find
Strength
Figure 4.4 In these problems, beginning geometry students have difficulty in seeing that the same data element simultaneously serves two roles.
…tal bar with right and left articulation points and a vertical bar with bottom, middle, and top articulation points. This is a small subset of the four-letter words used by McClelland and Rumelhart, but it is sufficient for illustration.

Unlike McClelland and Rumelhart but like Forgy, patterns in ACT are put together by a binary piecing of subpatterns. Working from the top to the bottom, EACH is composed of the letter cluster patterns EA and CH; EA is formed from the letter patterns E and A; A is formed from two letter subpatterns, one of which is formed by joining the left end of a horizontal bar to the middle of a vertical bar.
The exact combinations are somewhat arbitrary, and a goal of the project was that the pattern-matching principles would not depend on specifics of the network for their success. It is unlikely that experience is careful in how it "wires" the net, so we do not want brittle pattern-matching principles. In the Forgy data-flow scheme, simple data elements (in this case, vertical and horizontal bars) come in at the terminal nodes and flow through the various higher-order two-input pattern nodes. These pattern nodes perform tests for compatible combinations of the subpatterns. For instance, the A node performs two tests to determine if its two subpatterns join at the correct places. The EACH pattern performs a test to determine if the EA and CH patterns are adjacent. The Forgy data-flow scheme can be more efficient under one network organization than another but will function successfully under all.
A difference between my scheme in Figure 4.5 and the McClelland and Rumelhart network is the existence of intermediate units of letter clusters and feature clusters. McClelland and Rumelhart have only letter and word levels. The current proposal is not that there are specific new levels for word recognition, but that it is artificial to suppose the pattern network is neatly layered into a small number of levels.
Data-Flow Matching
An example network. A prerequisite to explaining the ACT pattern matcher is understanding some basics about data-flow matching. Data-flow matchers are not especially designed to recognize letter patterns and, as far as I know, mine is the only example of such a use. They are designed to match the more general class of patterns that appear in production conditions. If a word pattern is represented as a production condition, a data-flow matcher will handle letter recognition as a special case. However, before turning to letter patterns, let us consider how
The Architecture of Cognition
Control of Cognition
two instantiations of node H (these are instantiations of the left side) incompatible with any instantiations of the right side. These first-level combinations would then flow to the next level to be combined. Combination 1, 2, 4, and 8 would be an instantiation of node I, which represents a postulate relevant to the current line recommended by the teacher. Combination 1, 2, and 5 would be an instantiation of node J, representing a postulate relevant to the current line that has not been tried. There are no instantiations of node K; that is, the current goal does not involve a statement to be proven with no untried methods.

The instantiation of node I means production P2 can apply, and the instantiation of node J means production P1 can apply. Both productions change the goal, so application of one will prevent application of the other. Whichever production applies first, as determined by the conflict resolution principles, will effectively block the other. Should P1 apply first, it will tag angle-side-angle as tried. Now, if the system returns to the goal of proving statement A, there will be no untried postulates relevant to triangle congruence, and production P3 can apply.
Significant features of the example. A number of features of this data-flow pattern matcher give it a computational advantage over other proposals for pattern matching. First, it capitalizes on the fact that multiple productions share the same clauses, and the matcher performs the tests relevant to these clauses only once. This feature of data-flow networks is described as spatial redundancy. Second, if part of a pattern is in working memory, it can flow through the network to the relevant nodes, and the network can be effectively primed to receive the rest of the pattern. This feature is referred to as temporal redundancy.
In contrast to discrimination networks (Simon and Feigenbaum, 1964), a data-flow network does not expect features to appear in a certain order but will match the pattern no matter what order the features come in. This is particularly important in achieving a viable partial matcher. In a discrimination network, if the first tests fail, the pattern will never be matched. The data-flow scheme does not need to weigh any particular feature so heavily. Also, such a pattern matcher is at a considerable advantage over a discrimination network because it can take advantage of parallelism and perform many tests simultaneously.
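The order-free arrival of features and the sharing of test nodes described above can be sketched in a few lines of code. This is a hypothetical illustration of the general data-flow idea, not ACT's or Forgy's actual implementation; the PatternNode class and the EA test are my own inventions:

```python
# Sketch of a data-flow pattern node (illustrative only). Instantiations
# arrive at either input in any order; each new arrival is tested against
# those already waiting on the other input, and successful combinations
# are stored and forwarded to any superpattern nodes. Because a node is
# shared by every pattern that uses it, its test runs only once per
# combination (spatial redundancy).

class PatternNode:
    def __init__(self, name, test):
        self.name = name
        self.test = test        # compatibility test over (left, right) instantiations
        self.left, self.right = [], []
        self.parents = []       # (parent_node, side) links to superpatterns
        self.matches = []       # successful combinations formed at this node

    def receive(self, side, inst):
        own = self.left if side == "left" else self.right
        other = self.right if side == "left" else self.left
        own.append(inst)
        for waiting in other:
            pair = (inst, waiting) if side == "left" else (waiting, inst)
            if self.test(*pair):
                self.matches.append(pair)
                for parent, pside in self.parents:
                    parent.receive(pside, pair)

# An "EA" node testing that an E sits immediately to the left of an A:
ea = PatternNode("EA", lambda e, a: a["pos"] - e["pos"] == 1)
ea.receive("left", {"letter": "E", "pos": 2})
ea.receive("right", {"letter": "A", "pos": 3})   # later arrival; order is irrelevant
```

Feeding the A instantiation before the E would produce the same single match, which is exactly the order-independence that a discrimination network lacks.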
Unlike the pattern matchers associated with early versions of ACT (Anderson, 1976), this one has a clear sense of time built into it. The time for a pattern to be matched will be the time for the tests to be performed at the two-input node that feeds into
This production is testing for spatial code information as defined in Chapter 2. Specifically, tests are being performed for the location of various bars, the connections among them, and the juxtaposition of various feature clusters. The various nodes in the network perform tests of how well the spatial constraints specified in the condition are met. For instance, the F pattern tests whether a vertical and a horizontal bar come close to meeting the specification that the middle of the vertical bar is connected to the left of the horizontal.

The production above is quite complex relative to the other ones in this book. This is partly because the production is being represented so explicitly, but also because the condition contains more information than most productions. Such large-condition productions can only be built with much practice (see Neves and Anderson, 1981, and Chapter 6).
Unlike Rumelhart and McClelland's scheme, the theory does not require separate pattern recognizers for separate positions. Rather, a single node represents the feature E in any position. As mentioned earlier, different instantiations of the E pattern are created for each instance.
Use of Activation in Pattern Matching
As Forgy (1979) documents, time to match in his implementation is determined by factors such as size of working memory, number of patterns, and pattern complexity. The exact relationship of these factors to time is determined by constraints of the computer hardware, so no one proposes that the temporal properties of Forgy's system be taken in detail as a serious model of psychological processing time. In ACT* the pattern nodes in the data-flow network perform their tests at a rate determined by the level of activation. The level of activation of a node is determined by parallel excitatory and inhibitory interactions with other nodes in the network. Another important difference is that in McClelland and Rumelhart, activation determines whether the pattern matches, but in ACT, level of activation determines the rate of pattern matching. In both systems activation plays an important role in determining if a pattern will match, but in ACT there is also an important role for tests at the pattern nodes.
The level of activation at a particular pattern node is determined by four factors:

1. The individual data elements that are matched to the terminal nodes have a level of activation. This activation is
The nodes that test for the combination of this feature, that is, F, should have much lower activation.
Bottom-up excitation. Excitation starts at the bottom (terminal) nodes in the network. Activation only flows upward. The level of activation of the bottom nodes is a function of the level of activation of the data that instantiate them. The activation flowing from a node to its superpatterns is divided among the superpatterns according to their strength, producing a "fan effect." Indeed, the process for upward spread of activation is very similar to the spread of activation in the declarative network.
The rules for calculating bottom-up spread of activation depend on whether the node is a positive two-input node or a negative one-input node. The activation of a positive two-input node is just the sum of the activation from the two inputs. If there are no data from one of the two subpatterns (for instance, the horizontal bar is missing for the F subpattern in the letter A when the input is Λ), the activation is just that received from the one subpattern.
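These two rules, strength-proportional division of outgoing activation (the fan effect) and summation at a positive two-input node, can be sketched numerically. The node names and numbers here are my own toy illustrations, not part of the theory:

```python
# Illustrative sketch of the bottom-up rules described above.

def outflow(activation, strengths):
    """Divide a node's outgoing activation among its superpatterns in
    proportion to their strengths (the "fan effect")."""
    total = sum(strengths.values())
    return {name: activation * s / total for name, s in strengths.items()}

def two_input_activation(a_left, a_right):
    """A positive two-input node sums its inputs; if data for one
    subpattern are missing (None), the node still receives the
    activation delivered by the other input."""
    return (a_left or 0.0) + (a_right or 0.0)

shares = outflow(6.0, {"F-node": 2.0, "E-node": 1.0})  # split 2:1 by strength
partial = two_input_activation(3.0, None)              # one subpattern missing
```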
Figure 4.9 A representation of the situation that leads to lateral inhibition. Node K receives input from I and J. It receives inhibition from all the nodes that receive input from I and J.
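The arrangement in Figure 4.9 can be sketched as a simple iterative competition. These are my own toy dynamics, not the book's equations: each node is pushed down in proportion to the activation of its rivals until the strongest interpretation dominates.

```python
# Illustrative lateral inhibition among nodes competing over the same
# inputs: each cycle, a node loses activation proportional to the total
# activation of its competitors, floored at zero.

def inhibit(acts, rate=0.2, cycles=10):
    acts = dict(acts)
    for _ in range(cycles):
        total = sum(acts.values())
        acts = {k: max(0.0, a - rate * (total - a)) for k, a in acts.items()}
    return acts

final = inhibit({"K": 1.0, "rival1": 0.6, "rival2": 0.4})
# The initially strongest node K ends up as the only active one.
```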
The exact interaction of C, S, A, G, and I depends on the specifics of the data-flow network structure. It is almost impossible
to decide on such specifics, and they probably vary from subject
to subject. In fact the ACT theory does not prescribe a way to
assign a numerical value to each factor. However, it is useful to
have some general, if approximate, description of the relationship among these factors:
T ∝ CI/SAG    (4.1)
Conflict Resolution Principles
Figure 4.10 Various test stimuli, panels (a) through (g), presented to the network in Figure 4.5.
and LOAF. Our simulation also produces this result and for a similar reason. For instance, we contrasted recognition of the second letter in parts (d) and (e) of Figure 4.10. In (d) the second letter was recognized as O and the whole stimulus as LOAF. In (e) the second letter was recognized as A because of frequency, and the whole stimulus was not identified.

A third result reported by Rumelhart and McClelland is that context is more helpful if it precedes than if it follows the word. That is, subjects do better at perceiving O if they see F-OL context first and then O than vice versa. ACT predicts this because the system can be locked into the incorrect perception of a letter if it is presented too early. Once a pattern is decided, that interpretation will hold even in a contradictory context.

Subpart bias. One of the interesting phenomena of human perception is that it is easier to perceive a pattern when complete subparts of it are given than when the pattern is totally fragmented. For instance, consider the contrast between parts (f) and (g) of Figure 4.10. Both consist of nine letter strokes, but it is much easier to see (g), which has a complete subpart, as forming the word HEEL (or HELL). ACT reproduces this phenomenon of perception. The subpart forms a focus of high activation around which the whole pattern can be built and the
Figure 4.11 (x-axis: inhibitory cycles)
The section on data refractoriness in this chapter discussed the evidence for this principle of conceptual and perceptual pattern matching.

The model easily produces an effect of pattern frequency on rate of matching.

LaBerge (1974) has described some interactions between attention and perception that can be accounted for in this framework (to be discussed in the next section). More generally, this model produces the effects of goals on pattern matching.
IF the goal is to perceive pattern 1
Hierarchical Goal Structuring
it would be possible to calculate the next move from the configuration. These productions are similar to the sophisticated perceptual strategy described in Simon (1975).

The effect of losing track of a hierarchical goal structure can be to make behavior seem less structured and more data-driven. One might describe as opportunistic the behavior of a subject who forgets his goals, analyzes the current state, and constructs some partial solution to it in response to its peculiar properties. While this might be an adequate description of the behavior, it need not imply that the architecture supporting this planning was nonhierarchical. The production system in Table 4.1 shows how such opportunistic behavior can be obtained from a hierarchical system subject to failures of working memory.

Behavior from the ACT architecture might appear to be opportunistic in character in that it responds to a complex, nonhierarchical control structure. It is quite possible in ACT for a data-driven production to produce a radical shift in the current goal, for instance:
Goal Trees
In a conjunctive goal tree like that in Figure 4.13, all the subgoals of a goal are necessary to satisfy the goal. If one subgoal fails, it is necessary to generate another conjunction of goals to achieve it. A search problem arises when one must consider a number of alternative decompositions (conjunctions) as ways to achieve a goal. As a simple example, consider the application of the following three productions to the problem in Figure 4.14(a):

IF the goal is to prove ΔXYZ ≅ ΔUVW
THEN try the method of angle-side-angle (ASA)
and set as subgoals to prove
1. ∠ZXY ≅ ∠WUV
2. XY ≅ UV
3. ∠XYZ ≅ ∠UVW.
Figure 4.14 (a) A simple geometry problem: prove ΔABC ≅ ΔDEF. (b) The goal structure that would be generated by the productions in the text. The or-level reflects different attempts by different productions to solve the problem. The and-level reflects the subgoals required for each attempt.
IF the goal is to prove ΔXYZ ≅ ΔUVW
THEN try the method of side-side-side (SSS)
and set as subgoals to prove
1. XY ≅ UV
2. YZ ≅ VW
3. XZ ≅ UW.
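The or-level of alternative methods and the and-level of conjunctive subgoals that these productions generate can be sketched as a small AND-OR search. The function and goal names below are hypothetical illustrations of the search idea, not ACT machinery:

```python
# AND-OR goal search: a goal succeeds if it is directly given, or if some
# method (an OR alternative) decomposes it into subgoals that ALL succeed.

def achieve(goal, methods, given):
    if goal in given:
        return True
    for subgoals in methods.get(goal, []):
        if all(achieve(s, methods, given) for s in subgoals):
            return True
    return False

methods = {
    "ABC≅DEF": [
        ["∠A≅∠D", "AB≅DE", "∠B≅∠E"],   # angle-side-angle attempt
        ["AB≅DE", "BC≅EF", "AC≅DF"],   # side-side-side attempt
    ],
}
given = {"AB≅DE", "BC≅EF", "AC≅DF"}      # only the side correspondences hold
ok = achieve("ABC≅DEF", methods, given)   # ASA fails, SSS succeeds
```

If one conjunction fails (here ASA, for lack of the angle correspondences), the search falls back to another decomposition, which is the situation that generates the or-level in Figure 4.14(b).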
The repair for this situation would be proposed by the following general production:
P5
Figure: goal structures for painting (recovered node labels: PAINT LADDER, PAINT CEILING, APPLY PAINT TO CEILING, APPLY PAINT TO LADDER).
da_x(t)/dt = B·n_x(t) - p·a_x(t)    (4.2)

where n_x(t) is the input to node x. It is the sum of excitatory bottom-up input, e_x(t), and of inhibitory input, i_x(t). Thus

n_x(t) = e_x(t) + i_x(t)    (4.3)

The value of e_x(t) is positive, but the value of i_x(t) can be positive or negative.
The excitatory input depends on whether the node is a positive two-input node or a negative one-input node. If it is a positive two-input node, the excitatory factor is defined as the weighted sum of activation of its two subpatterns, y and z:

e_x(t) = r_y·a_y(t) + r_z·a_z(t)    (4.4)

r_y = s_x / Σ_k s_k    (4.5)

where the weight r_y is the strength of x relative to the strengths of all the superpatterns k of y, and strength is determined by frequency and recency of exposure. In the case of a negative one-input node x, its excitatory input is a function of the activation of the one-input node y that feeds into it:

e_x(t) = r_y·a_y(t)    (4.6)
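Numerically, the growth-and-decay reading of Eq. (4.2), da_x/dt = B·n_x(t) - p·a_x(t), drives a node's activation toward the asymptote B·n/p at a rate set by the decay parameter p. A small Euler-integration sketch, with arbitrary illustrative parameter values:

```python
# Euler integration of da/dt = B*n - p*a from a(0) = 0; with constant
# input n the activation approaches the asymptote B*n/p.

def simulate(n, B=1.0, p=0.5, dt=0.01, steps=5000):
    a = 0.0
    for _ in range(steps):
        a += dt * (B * n - p * a)
    return a

a_final = simulate(n=2.0)   # approaches the asymptote B*n/p = 4.0
```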
Introduction

Human memory has been a major domain for testing the ACT theory. Although the theory is more general than any particular memory paradigm, it is important to show that its assumptions, when combined, predict the results of specific paradigms. Every experimental result that can be derived from existing assumptions is further evidence for the accuracy and generality of those assumptions. The first part of this chapter will discuss the ACT assumptions about encoding, retention, and retrieval of facts. The later sections will consider some of the major phenomena documented in the experimental literature on human memory.

Encoding

Encoding in ACT refers to the process by which cognitive units (in the sense of Chapter 2) become permanent long-term memory traces. When a cognitive unit is created, to record either some external event or the result of some internal computation, it is placed in the active state. However, this active unit is transient. There is a fixed probability that it will be made a permanent trace. The term trace is reserved for permanent units.

This encoding assumption is spectacularly simple. The probability is constant over many manipulations. It does not vary with intention or motivation to learn, which is consistent with the ample research indicating that intention and motivation are irrelevant if processing is kept constant (Nelson, 1976; Postman, 1964). This is probably an adaptive feature, because it is unlikely that our judgments about what we should remember are very predictive of what will in fact be useful to remember.
when a subject reads and then rereads a story that involves too
many cognitive units to be all held simultaneously in working
memory. In this situation the subject must shift his working
memory over the material. The second effect of duration of
study is that the subject may use the extra study time to engage
in elaborative processing, which produces redundant traces.
That is, while added duration of residence in working memory
will not increase that trace's probability of being stored, the
time may be used to create traces that are redundant with the
target trace. For example, an image of a horse kicking a boy can
be regarded as redundant with the paired associate horse-boy.
This possibility will be discussed at great length later in the
chapter.
This view is somewhat similar to the original depth-of-processing model (Craik and Lockhart, 1972; Craik and Watkins, 1973), which held that duration of rehearsal was not relevant to
memory if the rehearsal simply maintained the trace in working
memory. Rather, duration would have a positive impact only if
the rehearsal involved elaborative processing. However, the
current proposal differs from that in the claim that spaced repetition of an item will increase memory even if neither presentation leads to elaborative processing. The current proposal only
claims that duration of uninterrupted residence in working
memory is ineffective, which is more in keeping with Nelson's
conclusions (1977) on the topic.
Additional presentations after a trace has been established
increase the strength of the trace. All traces have an associated
strength. The first successful trial establishes the trace with a
strength of one unit, and each subsequent trial increases the
strength by one unit. The strength of a trace determines its
probability and speed of retrieval. Thus strength is the mechanism by which ACT predicts that overlearning increases the
probability of retention and speed of retrieval.
Retention
According to the ACT theory, a trace once formed is not lost, but its strength may decay. Studies of long-term retention show gradual but continuous forgetting. Based on data summarized by Wickelgren (1976) and data of our own, trace strength, S, is a power function of time with the form

S = t^(-b)    (5.1)

where the time t is measured from the point at which the trace was formed.
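A small illustration of Eq. (5.1); the exponent b = 0.5 is an arbitrary choice for the example. Forgetting is steep at first and then decelerates, the signature of a power function:

```python
# Power-law trace strength S = t**(-b), t measured from encoding (t >= 1).

def strength(t, b=0.5):
    return t ** (-b)

early_loss = strength(1) - strength(2)     # loss over an early interval
late_loss = strength(10) - strength(11)    # equal-length interval, later
# early_loss is much larger than late_loss: forgetting slows down.
```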
Memory for Facts
Retrieval
The probability of retrieving a trace and the time taken to retrieve it are functions of the trace's level of activation. Retrieval of a trace in the ACT framework requires that the trace be matched by a production or productions that generate the memory report. As developed in the last two chapters, the time to perform this matching will be an inverse function of the level of activation. Thus the time, T(A), to generate a memory report of a trace with activation A is:

T(A) = I + B/A    (5.2)

Evidence for this assumption will be given later, when the effects of extensive practice are discussed.
Traces are cognitive units that interconnect elements. Not
only do the unit nodes have strength but so do the element
nodes they connect. Every time a unit node acquires an increment in strength, there is also an increment in the strength of the
elements. An element node can have more strength than a unit
it participates in because it can participate in other units and
gain strength when these are presented too. Thus all nodes in the declarative network have associated strengths, and all accumulate strength with practice.
The strength of a node affects its level of activation in two ways. First, the amount of activation spread to a node from associated nodes is a function of the relative strength, r_ij, of the link from node i to node j, as developed in Chapter 3. Let S_j be the strength of j. Then r_ij = S_j/Σ_k S_k, where the summation is over all nodes k that are connected to element i. Second, every time a
trace is strengthened there will be an increment in the activation capacity of its elements. This capacity determines how
much activation the element can spread if it is presented as part
of the probe (that is, the source activation capacity, in the terms
of Chapter 3). This idea will be developed in the section on
practice.
P = 1 - e^(-KA/B)    (5.4)
This is most naturally read as the probability of an exponentially variable matching process with mean B/A completing before a fixed cutoff K. However, it is possible to read it as the probability of a fixed matching process completing before a variable cutoff.
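Eqs. (5.2) and (5.4) together say that higher activation makes retrieval both faster and more likely. A sketch with arbitrary illustrative parameter values (I, B, and K here are not fitted estimates):

```python
import math

def retrieval_time(A, I=0.4, B=1.0):
    """Eq. 5.2: time to generate a memory report, T(A) = I + B/A."""
    return I + B / A

def retrieval_prob(A, K=2.0, B=1.0):
    """Eq. 5.4: probability that an exponential matching process with
    mean B/A completes before the fixed cutoff K."""
    return 1 - math.exp(-K * A / B)
```

Doubling A halves the variable component B/A of retrieval time and raises the probability that matching beats the cutoff.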
The probabilistic retrieval process implies that if repeated memory tests are administered, the results should be a mixture
Thematic Judgments
The experiments of Reder and Anderson (1980) and Smith, Adams, and Schorr (1978) are examples of situations in which subjects can use this process of connectedness judgment to verify statements. These researchers had subjects study a set of facts about a person that all related to a single theme, such as running. So a subject might study:

Marty preferred to run on the inside lane.
Marty did sprints to improve his speed.
Marty bought a new pair of Adidas.
The subjects then had to recognize these facts about the individual and reject facts from different themes (for example, Marty cheered the trapeze artist). The number of such facts studied about a person was varied. Based on the research on the fan effect (see Chapter 3) one might expect recognition time to increase with the number of facts studied about the individual. However, this material is thematically integrated, unlike the material in the typical fan experiment. In these experiments, recognition time did not depend on how many facts a subject studied about the individual.

Reder and Anderson postulated, on the basis of data from Reder (1979), that subjects were actually judging whether a probe fact came from the theme and not carefully inspecting memory to see if they had studied that fact about the individual. To test this idea, we examined what happened when the foils involved predicates consistent with the theme of the facts that had been studied about the probed individual. So if the subject had studied Marty preferred to run on the inside lane, a foil might be Marty ran five miles every day (the subject would have studied ran five miles every day about someone other than Marty). In this situation subjects took much longer to make their verifications, and the fan effect reemerged.
We proposed that subjects set up a representation like that in Figure 5.1, in which the thematically related predicates are already associated with a theme node like running. A subnode has been created to connect the traces of that subset of the facts that were studied about the person. This subnode is associated with the theme and with the individual theme predicates. The subnode is basically a token of the theme node, which is the type. The figure shows two such theme nodes and two subnodes. Reder and Anderson proposed that in the presence of
Figure 5.1 (recovered node labels: Check Schedule, Arrive at Station, Buy Ticket, Subnode 1, Warmed up by Jogging, Preferred Inside Lane, Ran Five Miles, Subnode 2, Did Sprints, Trace 6, Wanted to Make Team, Bought New Adidas)
Number of facts about other theme (rows: 0, 1, 3); columns: 1 fact about tested theme, 3 facts about tested theme.

0    7.78    6.83
1    7.26    6.57
3    7.24    6.56

0    7.44    7.49
1    5.59    6.29
3    5.50    6.22

Subnode in presence of target / Subnode in presence of related foil:
0    5.30    5.94
1    4.07    4.88
3    4.00    4.81

In presence of unrelated foil:
0    1.40    1.42
1     .70     .75
3     .66     .71
sentences of the form The lawyer hated the doctor, they had to discriminate these sentences from foil sentences consisting of the same words as the target sentence but in new combinations. There were twenty-five days of tests and hence practice. Each day subjects in one group were tested on each sentence twelve times, and in the other group, twenty-four times. There was no difference in the results for these two groups, which is consistent with earlier remarks about the ineffectiveness of massing of practice, so the two groups will be treated as one in the analysis.

There were two types of sentences: no-fan sentences, consisting of words that had appeared in only one sentence, and fan sentences, with words that had appeared in two sentences. Figure 5.2 shows the change in reaction time with practice. The functions that are fit to the data in the figure are of the form T = I + BP^(-c), where I is an intercept not affected by strengthening, I + B is the time after one day's practice, P is the amount of practice (measured in days), and the exponent c is the rate of improvement. It turns out that these data can be fit, assuming different values of B for the fan and no-fan sentences and keeping I and c constant. The equations are:
Practice

People get better at remembering facts by practicing them, and it should come as no surprise that ACT predicts this. However, the serious issue is whether the ACT theory can predict the exact shape of the improvement function and how this varies with factors such as fan.

Accumulation of Strength with Practice

ACT makes interesting predictions about the cumulative effects of extensive practice at wide intervals such as twenty-four hours. The reason for looking at such wide spacings is to avoid complications due to the diminished effects of presentations when they are massed together. In a number of studies I have done on this topic, we have given the subject multiple repetitions of an item on each day and repeated this for many days. In the analyses to follow, the cumulative impact of the multiple repetitions in a day will be scaled as one unit of strength.

Assume the material has been studied for a number of sessions, each one day apart. The total strength of a trace after the pth day and just before day p + 1 will be (by Eq. 5.1):

S = Σ_{d=1}^{p} d^(-b)    (5.5)

It can be shown (Anderson, 1982) that this sum is closely approximated as:

S = Dp^(1-b)    (5.6)

where D = 1/(1 - b).
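Assuming the strength of a trace after p daily sessions is the sum of power-law residues of the individual days, the closed-form power-function approximation can be checked numerically. This is a sketch under that assumption; b = 0.5 is an arbitrary example value:

```python
# Compare the day-by-day sum of strengths with the power-function
# approximation S ~ p**(1-b) / (1-b).

def strength_sum(p, b=0.5):
    return sum(d ** (-b) for d in range(1, p + 1))

def strength_approx(p, b=0.5):
    return p ** (1 - b) / (1 - b)

ratio = strength_sum(10000) / strength_approx(10000)
# ratio is close to 1, and the agreement improves as p grows.
```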
Effects of Extensive Practice

Figure 5.2 Recognition times for fan and no-fan sentences as a function of practice. The solid lines represent the predictions of the model described in the text.
The activation emitted by a concept is a function of its strength:

I(P) = I_1 + I_2·P^(-c)    (5.9)

RT(P) = I(P) + nB'·P^(-c)    (5.10)

where B' = B/3D and I(P) is the intercept after P days of practice. To the extent that s', the prior strength of the concept, is small relative to the impact of the massive experimental practice, this function becomes nB'P^(-c), and total reaction time is predicted to be of the form

RT(P) = I_1 + (I_2 + nB')P^(-c)    (5.12)
Figure 5.3 Recognition of fan and no-fan sentences about familiar and unfamiliar concepts as a function of practice. (Conditions shown: fan, unfamiliar; fan, familiar; no-fan, unfamiliar; no-fan, familiar.)
Results

The hypothesis that more frequent nodes have greater activation capacity is consistent with a number of results in the literature. For instance, Keenan and Baillet (1980) found that subjects are better able to remember material about more familiar people and locations. They used an incidental-learning paradigm in which subjects were asked an encoding question about these concepts (Is your best friend kind? versus Is your teacher kind?, where the second is about a less familiar person). Later, subjects were asked whether they had studied pairs like teacher-kind. Subjects were better able to remember such pairs when the people were more familiar. Keenan and Baillet also found that subjects answered the encoding questions more rapidly about familiar items, which is also what would be expected if familiarity affected level of activation.
A similar concept is needed to understand the marvelous performance of the subject SF, studied by Chase and Ericsson (1981), who was able to increase his memory span to over eighty digits after hundreds of hours of practice. He was able to commit these numbers to long-term memory with a rate of
presentation typical of memory-span experiments and to reliably retrieve these items from long-term memory. Some of this performance depended on developing mnemonic encoding techniques, such as encoding three- and four-digit numbers into running times. He might encode 3492 as 3 minutes and 49.2 seconds, a near-record mile time. This allowed him to get his memory span up to more than twenty digits (for example, 7 running times × 4 digits = 28 digits). Presumably, SF's ability to remember these times reflects the fact that running times were more frequently used concepts for him than strings of four digits. He did not in fact have a prior concept of 3:49.2. However, the various elements that went into this concept were sufficiently familiar that they could be reliably retrieved.
Note that this mnemonic technique by itself left SF far short of the eighty-digit level. To achieve this, he developed an abstract retrieval structure for organizing the running times in a hierarchy. He stored these running times in long-term memory and then was able to reliably retrieve them back into working memory. Figure 5.4 illustrates the hierarchical structure proposed by Chase and Ericsson for SF's performance on an eighty-digit list. The numbers at the bottom refer to the size of the chunks he encoded mnemonically. If we ignore the last five numbers, which are in a rehearsal buffer, SF has a node-link structure that organizes twenty-one chunks in long-term memory. He had to practice with a particular hierarchy for a considerable time (months) before he became proficient with it. This practice was necessary to increase the capacity of the individual nodes in this hierarchy so that they could support accurate and rapid retrieval. This indicates that practice can increase the capacity of completely abstract nodes, such as the ones in this structure, as well as the capacity of more concrete nodes, such as people.

More generally, an important component of many mnemonic techniques is converting the task of associating unfamiliar, weak elements to the task of associating familiar, strong elements. For instance, the critical step in the key-word method (Atkinson and Raugh, 1975) for foreign language acquisition is to convert the unfamiliar foreign word into a familiar word. When the foreign word becomes familiar through practice, the conversion step is no longer required for accurate recall of meaning.
Recognition versus Recall
The difference between recognition and recall is straightforward under the ACT analysis. In a recognition paradigm, parts of a trace are presented and the subject is asked whether they were studied. In a recall paradigm, the subject is also asked to retrieve other components of the trace. In ACT, activation converges on the trace from all presented components. If there is sufficient activation, the trace becomes available. In a recognition test the subject simply announces that there is a trace. In a recall test the subject must also generate parts of the trace. More of the trace is usually presented in a recognition paradigm. For instance, in paired-associate recognition both stimulus and response are typically presented, but in paired-associate recall only the stimulus is presented. However, it is possible to test paired-associate recognition by simply presenting a stimulus. One would expect a high conditional probability between success at recognizing the stimulus and success at recalling the response, and indeed there is (Martin, 1957).
PAIRED-ASSOCIATE RECOGNITION

In one interesting analysis of recognition versus recall, Wolford (1971) tried to relate recognition of a paired associate to the probability that the stimulus would lead to recall of the response, and to the probability that the response would lead to recall of the stimulus. He showed that, correcting for guessing, recognition of a paired associate could be predicted by the probabilities of forward and backward recall. His model was basically that a paired associate could be recognized if the subject could retrieve the response from the stimulus or the stimulus from the response, which predicts the recognition probability in Eq. (5.13).
Memory for Facts
Pr = Pf + (1 - Pf)·Pb                  (5.13)
Under the ACT* theory the subject is viewed not as performing two independent retrievals in recognition but rather as converging activation from the two sources. This is an important way in which ACT* differs from the earlier ACTs (Anderson, 1976) and their predecessor HAM (Anderson and Bower, 1973). Nonetheless, ACT* can predict the relationship documented by Wolford. Let As denote the amount of activation that comes from the stimulus and Ar the amount from the response. The probability of retrieving the memory trace will be a function of the activation from the stimulus in the case of forward recall, of the activation from the response in the case of backward recall, and of the sum of these two activations in the case of recognition. The following equations specify the probability of forward recall, backward recall, and recognition:

Pf = 1 - e^(-K·As)                     (5.14)

Pb = 1 - e^(-K·Ar)                     (5.15)

Pr = 1 - e^(-K·(As + Ar))              (5.16a)
   = 1 - (e^(-K·As))(e^(-K·Ar))        (5.16b)
   = 1 - (1 - Pf)(1 - Pb)              (5.16c)
   = Pf + (1 - Pf)·Pb                  (5.16d)
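Equations (5.14)-(5.16d) are easy to check numerically. The sketch below uses arbitrary parameter values (not fitted to any data) and confirms that converging the two activation sources reproduces Wolford's probabilistic-sum relationship exactly:

```python
import math

def p_recall_forward(K, A_s):
    # Eq. (5.14): forward recall driven by activation from the stimulus
    return 1 - math.exp(-K * A_s)

def p_recall_backward(K, A_r):
    # Eq. (5.15): backward recall driven by activation from the response
    return 1 - math.exp(-K * A_r)

def p_recognition(K, A_s, A_r):
    # Eq. (5.16a): in recognition, activation from both sources converges
    return 1 - math.exp(-K * (A_s + A_r))

# Illustrative values only
K, A_s, A_r = 0.5, 2.0, 1.0
pf = p_recall_forward(K, A_s)
pb = p_recall_backward(K, A_r)
pr = p_recognition(K, A_s, A_r)

# Eq. (5.16d): recognition equals the probabilistic sum of the two recalls
assert abs(pr - (pf + (1 - pf) * pb)) < 1e-12
```

The identity holds for any K, As, and Ar, which is the algebraic point of Eqs. (5.16b)-(5.16d).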
Figure 5.5 Network representation of the word-context associations for a single list. (The CONTEXT node is linked to each of the list words Word 1, Word 2, …, Word i, …, Word n.)
The Architecture of Cognition

Recognition involves retrieving the context from the word and verifying that it
is indeed a list context. Direct recall involves retrieving list
words from the list context. However, because of the high fan
from the context, the subject will have limited success at this.
Thus the subject will use an auxiliary process, using various
strategies to generate words that can then be tested for recognition. This has been called the generate-test model of recall. For a
review of relevant positive evidence see Anderson and Bower
(1972) or Kintsch (1970). The major challenge to this analysis has
come from various experiments showing contextual effects.
These results will be reviewed in the next sections.
The basic assumptions of the generate-test model are consistent with the current framework. The current framework makes
it clear that recognition is better than recall because the fan from
the word nodes is smaller than that from the context nodes. If
the same word appears in multiple contexts the fan from the
word node would be large, and this would hurt recognition.
Anderson and Bower (1972, 1974) present evidence that recognition performance degrades if a word appears in multiple contexts.
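The fan-based account of why recognition beats recall lends itself to a toy calculation. The following is only a sketch, not the full ACT* activation equations: each presented source node is assumed to divide a fixed unit of activation among its outgoing links, so a high-fan source (such as a list context) contributes little to any one trace, while a low-fan word node contributes a lot. All numbers are invented for illustration:

```python
import math

def trace_activation(source_fans):
    # Each presented source divides one unit of activation among its
    # "fan" of links; the trace sums what arrives from every source.
    return sum(1.0 / fan for fan in source_fans)

def p_retrieve(activation, K=1.0):
    # Same exponential retrieval function as in Eqs. (5.14)-(5.16)
    return 1 - math.exp(-K * activation)

# Recognition: activation converges from the word (fan 2) and the
# list context (fan 20, since many list words share the context).
p_recog = p_retrieve(trace_activation([2, 20]))

# Recall: only the high-fan list context is presented as a source.
p_recall = p_retrieve(trace_activation([20]))
assert p_recog > p_recall   # recognition benefits from the low-fan word

# A word studied in many contexts has a larger fan, hurting recognition.
p_recog_many = p_retrieve(trace_activation([10, 20]))
assert p_recog_many < p_recog
```

The second comparison mirrors the Anderson and Bower finding that recognition degrades when a word appears in multiple contexts.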
Figure 5.6 A representation of the relevant network structure in the encoding-specificity experiments. The numbers associated with the links are the strengths assumed in the spreading activation analysis. This is the memory representation assumed when the subject has studied train-black.

EFFECTS OF ENCODING CONTEXT
A large number of studies have displayed an effect of encoding context on both recognition and recall (for example, Flexser
and Tulving, 1978; Tulving and Thomson, 1971; Watkins and
Tulving, 1975). These experiments are said to illustrate the encoding specificity principle, that memory for an item is specific to
the context in which it was studied. The experiment by Tulving
and Thomson (1971) is a useful one to consider. Subjects studied items (for example, black) either in isolation, in the presence
of a strongly associated encoding cue (white, say), or in the
presence of a weakly associated encoding cue (train). The strong
and weak cues were selected from association norms. Orthogonal to this variable, subjects were tested for recognition of the
word in one of these three contexts. Recognition was best when
the study context matched the test context.
We have explained this result in terms of selection of word
senses (Reder, Anderson, and Bjork, 1974; Anderson and
Bower, 1974) or in terms of elaborative encodings (Anderson,
1976). These explanations still hold in the current ACT* framework. Figure 5.6 illustrates the network structure that is assumed in this explanation for the case of study with a weak encoding cue, that is, the subject has studied the pair train-black.
Black, the to-be-recalled word, has multiple senses. In this case
it is illustrated as having two senses: black1, to which is attached the weak associate train, and black2, to which is attached the strong associate white. The oval nodes in Figure 5.6 are the traces encoding these associations; the nodes leading to others1 and others2 represent other unidentified associations. Similarly, the nodes at the bottom attached to train and white represent other unidentified associations. For simplicity, only the multiple senses of black are represented.

At first, people often have the intuition that there is only one sense for a word like black. However, there are a number of distinct if similar senses. In the presence of white, one is likely to think of black as referring to a prototypical color or a race of people. In the presence of train one is likely to associate it with soot or the glistening color of a polished toy train.

The encoding context determines the sense of the word chosen, and a trace is formed involving that sense and, perhaps, the encoding context. When the subject is tested, context will again determine the sense chosen, and activation will spread from the chosen sense. The probability of recognition will be greater when the same sense is chosen, because activation will be spreading from a node directly attached to the trace.

It should be noted that a sense for the word can also be chosen by means of spreading activation. That is, the sense of black chosen in the context of train-black is the one that receives the greatest activation from train and black. In Figure 5.6 this will be black1, which lies at the intersection of train and black.
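This sense-selection mechanism can be sketched as a simple maximum over summed activation. The link strengths below are invented for the example; only the structure (two senses of black, with train linked to one and white linked to the other) follows the text:

```python
# Invented link strengths from presented words to candidate senses
links = {
    "black1": {"train": 0.7, "black": 1.0},   # the soot / toy-train sense
    "black2": {"white": 0.9, "black": 1.0},   # the color / race sense
}

def choose_sense(presented, links):
    # Each sense sums the activation arriving from the presented words;
    # the sense receiving the greatest total activation is chosen.
    totals = {sense: sum(w for src, w in srcs.items() if src in presented)
              for sense, srcs in links.items()}
    return max(totals, key=totals.get)

assert choose_sense({"train", "black"}, links) == "black1"
assert choose_sense({"white", "black"}, links) == "black2"
```

The two assertions mirror the text: the study context train-black selects black1, while white-black selects black2.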
Figure 5.7 A representation of the relevant memory structures for the Light and Carter-Sobell experiments. The subject has studied raspberry-jam. The nodes S1 and S2 identify the two senses. (Other concepts in the figure: LOG, TRAFFIC, STRAWBERRY.)
Elaborative Processing
additional concepts not in the probe but which were part of the
elaboration at study. This involves elaborating on the probe at
test to try to generate additional concepts from which to spread
activation. The third method involves inferring or reconstructing what the target trace must have been by using elaborations
retrieved at test.
Figure 5.8 Network representation of the elaborative structure generated to connect the pair dog-chair.
SOURCES OF ACTIVATION

It is possible to retrieve an elaboration, focus on elements in the retrieved elaboration, and make them additional sources for activating the target. So if the subject recalls his elaboration The dog loved his master, he can use master as another point from which to spread activation to the target structure. That is, the subject need not confine himself to spreading activation from the presented word.
Configural cuing. The research on configural cuing of sentence memory can be understood in terms of this elaborative analysis. Anderson and Bower (1972) had subjects study sentences such as The minister hit the landlord, and then cued subjects for memory of the objects of the sentence with prompts such as

The minister ___ the ___     S-cue
The ___ hit the ___          V-cue
The minister hit the ___     SV-cue
                                       (5.18)
One can get the inequality in Eq. (5.18) to the extent that a trace encoding the sentence is not formed. That is, let P*(C) be the probability of reviving the trace from cue C conditional on the trace being formed, and let P(C) be the unconditional probability of trace retrieval. Let a be the probability of trace formation. Then,

P(C1) = a·P*(C1)     (5.21)

P(C2) = a·P*(C2)     (5.22)
Figure 5.9 Network representation of a subject-verb-object trace and the overlapping network of associations among the concepts. One of the intersecting concepts has been included in the trace. (Concepts labeled in the figure: TRACE, hit, landlord, cross.)
and cued subjects for recall with a term like actress. They point out that nurse has two interpretations, one as a female person and the other as a profession. Similarly, beautiful has two interpretations, one appropriate to women and one appropriate to landscapes. In the case of A, actress is appropriate to the selected senses of nurse and beautiful. This is not the case in B or C. They were interested in the relationship among the probability of actress evoking recall of sentence A, the probability of it evoking recall of sentence B, and the probability of it evoking recall of sentence C. Referring to these three probabilities as f, s, and p, respectively, they observed the following: f > s + (1 - s)p. This was interpreted as contrary to associative theory. The argument was that s reflected the probability that actress could retrieve a sentence through nurse, and p was the probability that actress could retrieve a sentence through beautiful. It was argued that according to associative theory, the probability of retrieving A, a sentence that involved both nurse and beautiful, should be no greater than the probabilistic sum of s and p.
Figure 5.10 shows the network schematics for situations A, B, and C. In each case, concepts at the intersection of subject and predicate are chosen for inclusion in the trace elaboration. As an example, in case A glamor is at the intersection of nurse and beautiful. This is closely associated to actress, so actress is closely associated to the trace. In contrast, the elaborations at the intersection of subject and predicate in B and C are not closely associated to actress.
Figure 5.10 Network schematics for situations A, B, and C, showing nurse and beautiful, the intersecting concepts (among them glamor, license, diploma, and landscapes), and their associations to actress and the trace.

The elaborative productions in the appendix would map this schema onto the to-be-studied sentence, generating the following elaboration. This elaboration is generated from the original memory by simple substitution of cat for rat. Suppose the subject was not able to recall the target sentence but was able to recall two elaborations.
Conclusions
That these phenomena can be understood in terms of this spreading activation mechanism is evidence for that mechanism.
A large part of the current research in memory concerns the
encoding, retention, and retrieval of stories and discourse. A
frequent question is whether these phenomena can be explained by the same principles that apply to more traditional memory paradigms. The answer of the ACT theory should be apparent from the way results from the two domains are mixed in this chapter. Processing of discourse differs only in that the
material is richer and more complex. For instance, there might
be more elaborative processing, and it might be more predictable. However, there is no reason to believe that the embellishment of a story is different in kind from the embellishment of the lowly paired associate.
Appendix

Table 5.3 provides a production set for generating elaborations. Besides providing a concrete model for elaboration generation, this production set has relevance to the issue discussed in Chapter 1, whether the basic architecture consists of production systems or schema systems. This production set will illustrate how the effect of a schema system can be achieved in a production system architecture.
The knowledge attributed to schemata is stored in knowledge structures in declarative memory. The productions in Table 5.3 serve to interpret these declarative structures and to create elaborations by analogically mapping the schema onto the to-be-studied material. The declarative knowledge structures will be sets of one or more propositions describing specific or general events. Each proposition will involve a relation and a set of arguments. The fact that these schemata describe events and involve propositions makes them rather like the scripts of Schank and Abelson (1977). A wider set of schemata (to include things like descriptions of spatial objects) could have been handled by augmenting the set of productions in Table 5.3. However, what is provided is enough to get the general idea across.
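The general idea of interpreting a declarative schema with general productions can be sketched in a few lines of code. Everything below (the data structures, the function, the chase/flee schema) is invented for illustration and mirrors only the broad logic of Table 5.3: match schema propositions against studied facts, record argument correspondences, and infer missing facts by substitution:

```python
# Toy sketch of schema application (illustrative only, not ACT's actual
# representations). Both the schema and the studied material are lists of
# (relation, args) propositions.

def embellish(schema, studied):
    mapping, inferred = {}, []
    for relation, s_args in schema:
        match = next((f_args for r, f_args in studied if r == relation), None)
        if match is not None:
            # Make correspondences between arguments in the same roles
            for s_arg, f_arg in zip(s_args, match):
                mapping.setdefault(s_arg, f_arg)
        else:
            # No such studied fact: infer one by substituting the
            # correspondences established so far
            inferred.append((relation,
                             tuple(mapping.get(a, a) for a in s_args)))
    return mapping, inferred

schema = [("chase", ("predator", "prey")), ("flee", ("prey", "predator"))]
studied = [("chase", ("cat", "rat"))]
mapping, inferred = embellish(schema, studied)
# inferred == [("flee", ("rat", "cat"))] -- the embellishment
```

The inferred flee-fact is produced by simple substitution, the same move as the cat-for-rat substitution discussed in the text.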
Figure 5.11 illustrates the flow of control among the productions in Table 5.3. Suppose a subject studies:
1.
2.
3.
4.
Table 5.3 Some of the productions that would be needed to embellish studied material with stored schemata

P1   IF the goal is to embellish the studied material
     and LVschema is a connected knowledge structure
     THEN …

     IF …
     and there is no fact from the studied material that involves LVrelation
     THEN infer LVfact for the studied material involving LVrelation
     and set as a subgoal to make a correspondence between LVschema-unit and LVfact.

P6   IF the goal is to make a correspondence between LVschema-unit and LVfact
     and LVschema-unit involves LVargument1 in LVrole, which has not been mapped,
     and LVfact involves LVargument2 in LVrole
     THEN set as a subgoal to check the correspondence between LVargument1 and LVargument2.

P7   IF the goal is to make a correspondence between LVschema-unit and LVfact
     and LVschema-unit involves LVargument1 in LVrole, which has not been mapped,
     and LVfact has no argument in LVrole
     THEN set as a subgoal to infer the correspondence for LVargument1.

P14  IF the goal is to check the correspondence …
     THEN POP.

P15  IF the goal is to infer the correspondence of LVargument1
     and no corresponding argument can be found
     THEN POP with failure.
Figure 5.11 The flow of control among the productions in Table 5.3. (Goal boxes in the figure: EMBELLISH STUDIED MATERIAL; EMBELLISH WITH SCHEMA-UNIT; MAKE CORRESPONDENCE BETWEEN ARGUMENTS OF SCHEMA UNIT AND FACT; CHECK CORRESPONDENCE BETWEEN ARGUMENTS; INFER CORRESPONDENCE FOR SCHEMA ARGUMENT; CHECK COMPATIBILITY OF ARGUMENTS; RESOLVE CONTRADICTION.)
At this point, control returns to the top-level goal of embellishing the story. Another schema might be retrieved, for instance one about breaking a bottle against a metal object or about taking a trip on a ship. The eventual product can be represented much more richly than the original material. For purposes of understanding memory, two features should be noted. First, the final memory structure is more densely interconnected. Second, the resulting knowledge structure will overlap with the schemata that generated it. This means that if only fragments of the structure remain at recall, the schemata can be reinvoked to replace the forgotten fragments.

This shows that production systems are capable of the kind of inferential embellishment associated with a schema system. The fact that one does not need a special architecture to achieve this embellishment is an argument of a sort against schema systems as a primitive architecture. ACT brings to this problem one element that is not well handled by schema systems. This is the spreading activation mechanism, which handles the initial selection of a schema and the problem of identifying possible correspondences among the arguments (production P13).

6 | Procedural Learning

Introduction
IF THERE IS ONE TERM that is most central to the ACT theory, it is "production." Productions provide the connection between declarative knowledge and behavior. They solve the dilemma of Tolman's rat that was lost, buried in thought and inaction (Guthrie, 1952, p. 143). The productions themselves are not part of the fixed architecture of the system; rather, they are a second kind of knowledge that complements the declarative knowledge contained in long-term memory. The productions constitute the procedural knowledge of ACT, that is, knowledge about how to do things. The phrase "how to do things" first brings to mind motor skills because these are the most concrete examples of procedural knowledge. Motor skills like riding a bike or typing are certainly instances of procedural knowledge, but they have not been the focus of the ACT theory. Its focus has been on cognitive skills such as decision making, mathematical problem solving, computer programming, and language generation.
The acquisition of productions is unlike the acquisition of facts or cognitive units in the declarative component. It is not possible to simply add a production in the way it is possible to simply encode a cognitive unit. Rather, procedural learning occurs only in executing a skill; one learns by doing. This is one of the reasons why procedural learning is a much more gradual process than declarative learning.
DECLARATIVE VERSUS PROCEDURAL FOUNDATIONS OF SKILL

That productions cannot be directly acquired is not a perversity of the cognitive system. Rather, it is an adaptive design.
In ACT, acquisition of a skill first begins with an interpretive stage in which skill-independent productions use declarative representations relevant to the skill to guide behavior. Then skill-specific productions are compiled. This sequence corresponds…
Written Exercises

Proof:

STATEMENTS                    REASONS
1. RO ≅ NY                    1. Given
2. RO = NY                    2. Def. of = segments
3. ON = ON                    3. Reflexive prop. of equal.
4. RO + ON = ON + NY          4. Add. prop. of equal.
5. RONY                       5. Given
6. RO + ON = RN               6. Def. of between
7. ON + NY = OY               7. Def. of between
8. RN = OY                    8. Substitution principle

Given: AKD; AD = AB
Prove: AK + KD = AB

Proof:

STATEMENTS                    REASONS
1. AKD                        1. Given
2. AK + KD = AD               2. Definition of between
3. AD = AB                    3. Given
4. AK + KD = AB               4. Transitive property of equality

Figure 6.1 The text instruction on two-column proofs in a geometry textbook. AKD means the points A, K, and D are collinear. (From Jurgensen et al., 1975.)
Figure 6.2 (Boxes in the flow chart: Do List of Problems; Later Problems; Find a Reason; Try a Particular Method; Statement Among List; Statement Implied by a Rule; Establish Antecedents of Rule; with arcs labeled Success and Failure.)
Table 6.1 (continued)

P18  IF the goal is to find a statement implied by a rule in a set
     and there is no untried rule in the set which matches the statement
     THEN POP the goal with failure.

P19  IF the goal is to determine if antecedents correspond to established statements
     and there is an unestablished antecedent clause
     and the clause matches an established statement
     THEN tag the clause as established.
USE OF ANALOGY

The preceding example illustrated how general problem-solving procedures are used to interpret declarative knowledge. This is only one type of interpretive use of declarative knowledge. Analogy is another way of using knowledge interpretively (see also Gentner, in press; Rumelhart and Norman, 1981). Anderson (1981d) discusses in detail the role of analogy in many geometry problem-solving procedures. For instance, students will try to transform a solution worked out for one problem into a solution for another problem. A somewhat different possibility is that students will transform the solution of a similar nongeometry problem into a solution for the current geometry problem.

To illustrate the variety of uses of analogy, this chapter will consider an example drawn from the language-generation domain, concerning how students can use their knowledge of English plus translation rules to speak a foreign language. Table 6.2 provides the relevant production rules, and Figure 6.3 gives a hierarchical trace of the process of translation. Production P1 sets the basic plan for generating the French sentence: first an English sentence will be planned, then the translation to French will be planned, and then the French sentence will be generated. Productions P2-P4 give three of the many productions that would be involved in generating English sentences. Figure 6.4(a) illustrates how they would apply to generate the sentence The smart banker is robbing the poor widow.

Productions P5-P13 model some of the procedural knowledge actually involved in the sentence translation. Note that these productions are not specific to French. They attempt to model a person's general ability to apply textbook rules to map sentences from the native language onto a foreign language. They are the interpretive productions. The declarative knowledge they are using is the sentence representation and the textbook translation rules for French.

Production P5 is a default decomposition rule. It maps the goal of translating a clause onto the goal of translating the elements of the clause. Applied to Figure 6.4, it decomposes the sentence generation into generation of subject, verb, and object; applied a second time, it decomposes the subject generation into the subgoals of generating article, adjective, and noun.

In this example, it is at the point of the individual word translation that nondefault rules apply. For instance, P7 applies to the translation of the and retrieves that le is an appropriate
Figure 6.3 A hierarchical trace of the process of translation. (Node labels in the figure include TRANSLATE, INSERT "BANQUIER", INSERT "EN TRAIN DE", and SAY "MALIN"; productions P9, P10, and P12 apply.)

Table 6.2
This translation example illustrates some features of analogy.
If two sides and the included angle of one triangle are congruent to the corresponding parts of another triangle, the triangles are congruent.

AN EXAMPLE OF COMPILATION
Given: ∠1 and ∠2 are right angles; JS = KS
Prove: ΔRSJ ≅ ΔRSK

Figure 6.6 The first proof-generation problem a student encounters that requires application of the SAS postulate.
[long pause] Wait, I see how they work. [long pause] JS is congruent to KS [long pause] and with angle 1 and angle 2 are right angles, that's a little problem. [long pause] OK, what does it say? Check it one more time. "If two sides and the included angle of one triangle are congruent to the corresponding parts." So I have got to find the two sides and the included angle. With the included angle you get angle 1 and angle 2. I suppose [long pause] they are both right angles, which means they are congruent to each other. My first side is JS is to KS. And the next one is RS to RS. So these are the two sides. Yes, I think it is the side-angle-side postulate.
After reaching this point the student still had to go through a
long process of writing out the proof, but this is the relevant
portion for assessing what goes into recognizing the application of SAS.
After a series of four more problems (two were solved by SAS and two by SSS), the student came to the problem illustrated in Figure 6.7.

Figure 6.7 (Given: ∠1 ≅ ∠2; AE ≅ DE. Prove: ΔABK ≅ ΔDCK.)
Composition creates "macroproductions" that do the operation of a pair of productions that occurred in sequence. Applied
to the sequence of P1 followed by P2, composition would create
P1&P2
Like the rest of ACT's learning mechanisms, composition and proceduralization do not replace old productions with new ones. Rather
they augment the existing production set with the new productions. In this way the old productions are still around to deal
with situations not captured by the new ones. Also, if the new
productions are faulty they may be weakened (as will be explained in the section on tuning), and then the old productions
can return to full control of behavior.
This example illustrates how knowledge compilation produces the three phenomena noted in the geometry protocols
at the beginning of this section. The composition of multiple
steps into one produces the speed-up and leads to unitary
rather than piecemeal application. Verbal rehearsal drops out
because proceduralization eliminates the need to hold long-term memory information in working memory.
In addition to these general phenomena, a fair amount of empirical data can be explained by the mechanisms of knowledge
compilation. Some of this is reviewed in Anderson (1982b),
where it was shown that proceduralization mechanisms can
produce some of the phenomena associated with the automatization of behavior. In particular, proceduralization reduces the
set-size effect in the Sternberg task with repeated practice on a
constant set (Briggs and Blaha, 1969). Repeated practice compiles specific productions to recognize specific set members and
thus eliminates a search of a working-memory set. By a similar
process ACT reduces the effect of display size (Shiffrin and
Schneider, 1977). Proceduralization does seem to have many of
the characteristics of automatization as described in Shiffrin
and Schneider (1977) and Shiffrin and Dumais (1981). Composition was also shown to be responsible for a combination of
speedup and rigidity in behavior. In particular, it was shown to
produce the Einstellung phenomenon (Luchins, 1942). That is,
specific productions are composed that embody a particular
problem solution, and these specific productions can prevent
more general productions from detecting a simpler solution.
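The composition operation itself can be sketched as an operation on condition-action pairs. This toy version (all names and the working-memory representation are invented) merges the actions of two productions and keeps only those conditions of the second that were not supplied by the first's actions:

```python
# A production is a (conditions, actions) pair of lists of clauses.

def compose(p1, p2):
    cond1, act1 = p1
    cond2, act2 = p2
    # The macroproduction's conditions are P1's conditions plus any
    # P2 conditions not produced by P1's actions; its actions are both.
    merged_cond = list(cond1) + [c for c in cond2 if c not in act1]
    merged_act = list(act1) + list(act2)
    return (merged_cond, merged_act)

p1 = (["goal is to dial", "number is recalled"], ["digit1 is ready"])
p2 = (["digit1 is ready"], ["dial digit1"])
p1_and_p2 = compose(p1, p2)
# The intermediate clause "digit1 is ready" no longer appears as a
# condition, so the pair now fires in a single step.
```

This single-step firing is the source of both the speedup and the rigidity noted above: the composed production no longer tests the intermediate state.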
LVtelephone-number = Mary's number
LVdigit1 = 1
LVdigit2 = 3

A problem with defining composition by temporal contiguity is that unrelated productions can intervene between the application of productions that belonged together. For instance, suppose the following three productions
happened to apply in sequence:
P5
P6
P7

The first and third productions process subgoals in solving the problem. The first sets up the subgoal that is met by the third. The second production is not related to the other two and is merely an inference production that interprets the sound of the teacher approaching. It just happens to intervene between the other two. Composition as described by Neves and Anderson would produce the following pairs:

P5&P6   IF … THEN … to the running total
        and the teacher is coming my way.

P6&P7   IF … THEN … the sum.
Anderson (1982b) proposed that composition should apply to
productions that are logically contiguous in terms of the goal
structure rather than temporally contiguous. This would avoid
OF KNOWLEDGE COMPILATION
5. It is not the case that productions always apply in the same sequence. The longer the sequence, the less likely it is to be repeated. This means that opportunities for learning large compositions occur less frequently. Suppose productions 1, 2, 3, 4, 5, 6, 7, and 8 can apply in that order. There are many occasions when a pair (say 1 and 2) will be repeated, giving an opportunity to form a pair composition; there are considerably fewer occasions when a quadruple (1, 2, 3, and 4) will repeat, giving an opportunity to compose two pair compositions; and still less frequently will all eight be repeated, giving an opportunity to compose two quadruple productions into a production covering the whole sequence. (This idea is developed at great length by Newell and Rosenbloom, 1981.)
Rapid proceduralization. In the implementation of Neves and Anderson, proceduralization occurred as rapidly as composition. In one pass, all condition clauses that matched long-term memory elements were deleted and all variables bound in these clauses were replaced by constants from the long-term memory structures. However, proceduralization should be made to occur less rapidly. Verbal rehearsal of information in protocols does not disappear after one application. In an automatization task like that in Schneider and Shiffrin (1977), it takes much practice before the effects of retrieving the memory set disappear. It seems reasonable to propose that proceduralization only occurs when long-term memory knowledge has achieved some threshold of strength and has been used some criterion number of times.
THE ADAPTIVE VALUE OF KNOWLEDGE COMPILATION

In contrast to computer compilation, human compilation is gradual and occurs as a result of practice. The computer is told what is program, what is data, and how to apply one to the other. Thus, it can decide once and for all what the flow of control will be. The human does not know which are going to be the procedural components in the instructions before using the
knowledge in the instructions. Another reason for gradual compilation is to provide some protection against the errors that
occur in a compiled procedure because of the omission of conditional tests. For instance, if an ACT-like system is interpreting a series of steps that includes pulling a lever, it can first reflect on the lever-pulling step to see if it involves any unwanted consequences in the current situation. (An error-checking production that performs such tests can be made more specific so that it will take precedence over the normal course of action.)
GENERALIZATION

The ability to perform successfully in novel situations is an important component of human intelligence. For example, productivity has often been identified as a critical feature of natural language, referring to the speaker's ability to generate and comprehend utterances never before encountered. Traditional learning theories have been criticized (McNeill, 1968) because of their inability to account for this productivity, and one of our goals in designing the generalization mechanism for ACT was to avoid this sort of problem.

Good examples of generalization occur in first language acquisition, as will be discussed in the next chapter. Here I will illustrate generalization by a couple of examples from the game of bridge. Many bridge players learn the rules of play from a text or through expert instruction, in which case the rules are already quite general. However, these examples assume that the person has compiled rather more special-case production rules from experience. Suppose the following two rules have been created:

P1   IF I am playing no trump
     and my dummy has a long suit
     THEN try to establish that suit
     and then run that suit.

P2   IF I am playing spades
     and my dummy has a long suit
     THEN try to establish that suit
     and then run that suit.

As another example of generalization, consider the following pair of productions:

P4   IF I am playing no trump
     and I have a king in a suit
     and I have a queen in that suit
     and my opponent leads a lower card in that suit
     THEN I can play the queen.

P5   IF I am playing no trump
     and I have a queen in a suit
     and I have a jack in the suit
     and my opponent leads a lower card in that suit
     THEN I can play the jack.
IF I am playing no trump
and I have LVcard1 in a suit
and I have LVcard2 in the suit
and my opponent leads a lower card in that suit
THEN play LVcard2.
Here the variable replacement has been made explicit with the
terms prefixed with LV.
The above generalized production has lost the important structure of the original productions; thus, it makes the point that one does not want simply to replace constants with variables. One wants to find constraints that are true of the variables in both of the to-be-generalized productions and add these constraints to the generalized production. The relevant facts are that the cards are honors and that they follow each other. Adding these constraints produces the following, more satisfactory generalization:
P7
IF I am playing no trump
and I have LVcard1 in a suit
and LVcard1 is an honor
and I have LVcard2 in the suit
and LVcard2 is an honor
and my opponent leads a lower card in that suit
THEN play LVcard2.

The constrained variables will match touching honors in bridge.
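The constraint-preserving step just illustrated can be sketched as a small routine. This is a minimal sketch, not ACT's implementation: conditions are encoded as aligned clause tuples, the clause vocabulary and the `LVcard` variable naming are invented for illustration, and the honor constraints survive only because both source productions contain them.

```python
# Hedged sketch of ACT-style generalization: align two condition lists,
# replace mismatching constants with shared variables, and keep the
# clauses (constraints) that are true of both productions.
def generalize(cond1, cond2):
    variables = {}  # maps a (const1, const2) mismatch to one shared variable
    def var_for(a, b):
        key = (a, b)
        if key not in variables:
            variables[key] = f"LVcard{len(variables) + 1}"
        return variables[key]
    general = []
    for clause1, clause2 in zip(cond1, cond2):
        # identical terms are kept; differing constants become variables
        new_clause = tuple(
            t1 if t1 == t2 else var_for(t1, t2)
            for t1, t2 in zip(clause1, clause2)
        )
        general.append(new_clause)
    return general

p4 = [("playing", "no-trump"), ("have", "king"), ("have", "queen"),
      ("king", "is-honor"), ("queen", "is-honor"), ("opponent-leads", "lower")]
p5 = [("playing", "no-trump"), ("have", "queen"), ("have", "jack"),
      ("queen", "is-honor"), ("jack", "is-honor"), ("opponent-leads", "lower")]
p7 = generalize(p4, p5)
print(p7[1], p7[3])  # → ('have', 'LVcard1') ('LVcard1', 'is-honor')
```

The king/queen and queen/jack mismatches map onto the same two variables, so the honor constraints on those variables carry over into the generalized rule.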
P1
These two productions could be compiled from study and practice of the side-side-side (SSS) and the side-angle-side (SAS) postulates. They are general rules without any strategic component. There are no tests for the suitability of the SSS or the SAS postulate, so a student would be at a loss in trying to decide which production to apply.

These productions seem a reasonable model of the competence of one high-school student we studied at the point where he was just learning about triangle-congruence proofs. At this point SSS and SAS were the only two rules he possessed for proving triangle congruence. Part (a) of Figure 6.8 illustrates a critical problem, and part (b) illustrates the goal tree of his attempt to solve it (as inferred from his verbal protocol). First he tried SSS, the method that had worked on the previous problem and that perhaps had a recency advantage. However, he was not able to prove RK and RJ congruent. Then he turned to SAS and focused on proving that the included angles were congruent. Interestingly, it was only in the context of this goal
Figure 6.8 A geometry problem (given: angles 1 and 2 are right angles and JS ≅ KS; prove: triangle RSJ ≅ triangle RSK) and the goal tree of its solution. This leads ACT to form both condition and action discriminations.
With this information ACT can form two types of discriminations of P1 (SSS). It could formulate a production for the current situation that recommends the correct SAS action:

P3

The first discrimination, P3, is called an action discrimination because it involves learning a new action, while the second, P4, is called a condition discrimination because it restricts the condition for the old action.⁴ Because of specificity ordering, the action discrimination will block misapplication of the overly general P1. The condition discrimination, P4, is an attempt to reformulate P1 to make it more restrictive. These discriminations do not replace the original production; rather, they coexist with it.
Principles for forming discriminations. ACT does not always form both types of discriminations. It can form an action discrimination only when it obtains feedback about the correct action for the situation. If the feedback is only that the old action is incorrect, it can only form a condition discrimination. However, ACT will only form a condition discrimination if the old rule (P1 in the above example) has achieved a level of strength that indicates it has some history of success. The reason for this restriction is that if a rule is formulated that is simply wrong, it can perseverate forever by a process of endlessly proposing new condition discriminations.
The feature selected for discrimination is determined by comparing the variable bindings in the successful and unsuccessful applications.
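The comparison of bindings can be sketched as follows. The `find_discrimination` helper, its binding dictionaries, and the feature table are hypothetical stand-ins, assuming that the features of each bound value are available for inspection.

```python
# Hedged sketch: pick a discriminating feature by comparing the variable
# bindings of a successful and an unsuccessful production application.
def find_discrimination(good_bindings, bad_bindings, features):
    """features maps a bound value to the set of features true of it."""
    for var in good_bindings:
        good_val, bad_val = good_bindings[var], bad_bindings.get(var)
        if good_val == bad_val:
            continue  # no difference on this variable to exploit
        # any feature of the successful value absent from the failed one
        diff = features.get(good_val, set()) - features.get(bad_val, set())
        if diff:
            return var, sorted(diff)[0]  # (variable, discriminating feature)
    return None

features = {"lawyers": {"plural"}, "lawyer": {"singular"}}
good = {"LVterm": "lawyers"}   # application that succeeded
bad = {"LVterm": "lawyer"}     # application that failed
print(find_discrimination(good, bad, features))  # → ('LVterm', 'plural')
```

The proposed feature (here, plural) then becomes the added test in the condition-discriminated variant of the rule.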
There are two basic ways in which ACT can identify that a production application is in error and determine the correct action. One is through external feedback, and the other is through internal computation. In the external feedback situation the

STRENGTHENING
The generalization and discrimination mechanisms are the inductive components in the learning system in that they try to extract from examples of success and failure the features that characterize when a particular production rule is applicable. These two processes produce multiple variants on the conditions controlling the same action. At any point in time the system is entertaining as its hypothesis not just a single production but a set of different productions with different conditions to control the action. Having multiple productions, differing in their conditions, for the same action provides advantages in expressive power. Since the features in a production condition are treated conjunctively, but separate productions are treated disjunctively, one can express the condition for an action as a disjunction of conjunctions of conditions. Many real-world categories need this rather powerful expressive logic. Also, because of specificity ordering, productions can enter into even more complex logical relations.
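The disjunction-of-conjunctions point can be illustrated with a toy interpreter; the bridge-flavored conditions and the `fires` helper are invented for illustration.

```python
# Hedged sketch: two productions with conjunctive conditions implement a
# disjunction of conjunctions for the same action, i.e. the action fires
# when (no-trump AND long-suit) OR (spades AND long-suit) holds.
productions = [
    {"condition": {"no-trump", "long-suit"}, "action": "run-suit"},
    {"condition": {"spades", "long-suit"},   "action": "run-suit"},
]

def fires(working_memory):
    """Return actions whose condition (a conjunction) is fully matched."""
    return [p["action"] for p in productions
            if p["condition"] <= working_memory]

print(fires({"no-trump", "long-suit"}))  # → ['run-suit']
print(fires({"spades", "long-suit"}))    # → ['run-suit']
print(fires({"spades"}))                 # → []
```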
However, because they are inductive processes, generalization and discrimination will sometimes err and produce incorrect productions. There are possibilities for overgeneralization and useless discrimination. The phenomenon of overgeneralization is well documented in the language-acquisition literature, occurring in the learning of both syntactic rules and natural-language concepts. The phenomenon of pseudodiscrimination is less well documented in language because it does not lead to incorrect behavior, just unnecessarily restrictive behavior. However, there are some documented cases in careful analyses of language development (Maratsos and Chalkley, 1981). One reason a strength mechanism is needed is because of these inductive failures. It is also the case that the system may simply create productions that are incorrect, either because of misinformation or because of mistakes in its computations. ACT uses its strength mechanism to eliminate wrong productions, whatever their source.

Other examples of incorrect production rules can be drawn from domains that are more clearly problem solving in character. For instance, beginners in bridge often start with a rule that they should always play the highest card in the suit led, a disastrously overgeneral rule. As another example, high-school geometry students often make their proof rules specific to a particular orientation of a problem. In this case the rule is too restrictive. Again, ACT's strengthening mechanism can serve to identify the best problem-solving rules.
The strength of a production determines the amount of activation it receives in competition with other productions during pattern matching. Thus, all other things being equal, the condition of a stronger production will be matched more rapidly and so repress the matching of a weaker production. Chapter 4 specified the algebra by which strength determined the rate of pattern matching. Here I will specify the algebra by which experience determines the strength of productions. When first created, a production has a strength of 1. Each time it applies, its strength increases by 1. However, when a production applies and receives negative feedback, its strength is reduced by 25 percent. Because a multiplicative adjustment produces a greater change in strength than an additive adjustment, this "punishment" has much more impact than a reinforcement. The exact strengthening values encoded into the ACT system are somewhat arbitrary. The general relationships among the values are certainly important, but the exact relationships probably are not.
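The strength rule just stated can be written out directly. This is a minimal sketch of the stated values (initial strength 1, increment of 1 per application, a 25 percent multiplicative reduction on negative feedback); the dictionary representation is invented for illustration.

```python
# Hedged sketch of the strength rule described in the text.
def new_production():
    return {"strength": 1.0}   # a new production starts at strength 1

def apply_production(p, negative_feedback=False):
    if negative_feedback:
        p["strength"] *= 0.75  # reduced by 25 percent ("punishment")
    else:
        p["strength"] += 1.0   # each application adds 1
    return p["strength"]

p = new_production()
apply_production(p)                          # strength becomes 2.0
apply_production(p)                          # strength becomes 3.0
apply_production(p, negative_feedback=True)
print(p["strength"])                         # → 2.25
```

Note that after a few successes a single punishment undoes more than one reinforcement, which is the asymmetry the text describes.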
There are further principles to deal with the strength values of new general productions. Because these new productions are introduced with low strength, they would seem to be victims of a vicious cycle: they cannot apply unless they are strong, and they are not strong unless they have applied. What is required to break out of this cycle is a way to strengthen a production that does not rely on its actual application. This is achieved by taking all of the strength adjustments (increases or decreases) that are made to a production that does apply and making these adjustments to all of its generalizations as well. Since a general production will be strengthened every time any one of its possibly numerous specializations applies, new generalizations can acquire enough strength to extend the range of situations in which ACT performs successfully. Also, because a general production applies more widely, a successful general production will gather more strength than its specific variants.⁵

It is possible for the learning mechanisms of proceduralization, composition, generalization, and discrimination to create productions that are identical to ones already in the system. In this case, rather than creating a copy, the old production receives a strength increment of 1, and so do all of its generalizations.
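Propagating adjustments to generalizations can be sketched as follows, assuming a simple parent table from each production to its generalizations; the production names are invented.

```python
# Hedged sketch: strength adjustments to a production are also made to
# all of its generalizations, so a new general rule can gain strength
# from its specializations before it ever applies itself.
strengths = {"specific-1": 1.0, "specific-2": 1.0, "general": 1.0}
generalizations = {"specific-1": ["general"], "specific-2": ["general"]}

def strengthen(name, delta=1.0):
    strengths[name] += delta
    for parent in generalizations.get(name, []):
        strengths[parent] += delta  # same adjustment, passed upward

strengthen("specific-1")
strengthen("specific-2")
print(strengths["general"])  # → 3.0
```

Because the general rule collects the adjustments of both specializations, it ends up stronger than either, breaking the vicious cycle the text describes.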
Except for the decrement rule in the case of negative feedback, ACT's strengthening mechanism for productions is the same as the strengthening rule for declarative memory. It is also part of the ACT theory that there is a power-law decay in production strength with disuse, just as there is a power-law decay in the strength of declarative units. It might be argued from these identical strengthening principles that declarative and procedural memory are really the same. However, there are so many differences between them, including the learning principles discussed in this chapter, that one common feature does not provide much evidence for a common base. The similarity of the strengthening principles probably exists because the systems are implemented in the same kind of neural tissue, and that neural tissue has certain basic retentiveness characteristics.
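A power-law decay of strength with disuse might be sketched as below; the functional form and the exponent 0.5 are illustrative assumptions, not values taken from the theory.

```python
# Hedged sketch of power-law decay of production strength with disuse;
# the exponent d = 0.5 is an arbitrary illustration.
def decayed_strength(initial, time_since_use, d=0.5):
    return initial * (1 + time_since_use) ** (-d)

# Strength falls quickly at first, then ever more slowly.
print(decayed_strength(8.0, 0))   # → 8.0
print(decayed_strength(8.0, 3))   # → 4.0
print(decayed_strength(8.0, 15))  # → 2.0
```

The characteristic power-law signature is that each successive halving of strength takes much longer than the last, in contrast with the constant half-life of exponential decay.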
Comparison with Thorndike's laws of effect and exercise. It is interesting to contrast the ACT theory with Thorndike's (1932, 1935) ideas about learning. Thorndike concluded that pure practice produced no improvement in performance and that punishment also had no effect. Only rewarding the behavior led to improvement. ACT has almost the opposite system: exercise leads to increased production strength when the system has no reason to judge its performance a failure. On the other hand, negative feedback (not necessarily punishment) leads to a weakening of production strength.

Thorndike's evidence for the ineffectiveness of pure practice consisted of observations of behavior such as repeated attempts to draw a line exactly three inches long while blindfolded. Without feedback there was no improvement in accuracy, no matter how frequent the repetitions. ACT would produce the same result, though it would produce a system that would draw the line more rapidly. That is, without feedback there would be no discrimination to change the production, but the repeated practices would strengthen it. There are numerous problem-solving and game-playing situations where it can be shown that repeated practice without external feedback leads to more rapid performance, as ACT's strengthening mechanism would predict (see Newell and Rosenbloom's [1981] discussion of the card game Stair). Whether the accuracy of the performance improves depends on whether the system can make reliable internal judgments about correctness and so inform the discrimination mechanism. Internal or external feedback is necessary for discrimination, but not for strengthening. Both discrimination and strengthening are necessary for improving accuracy; only strengthening is necessary for improving speed.

Again, punishment does not lead to improved performance unless there is feedback about what the correct behavior is in that situation. This is just what Thorndike failed to provide in his experiments that manipulated punishment. Again, ACT's strengthening mechanism will not improve accuracy in the absence of information that can be used by the discrimination mechanisms. Only when there are competing tendencies, one correct and the other not, can punishment repress the incorrect tendency. Punishment without the opportunity to identify the correct response will lead to slower performance but will not affect the accuracy.
Figure 6.9
ACT's generalization and discrimination processes can do a search over these features (Baptist, high-school education, and so on), looking for combinations that identify a category member. ACT's learning mechanisms were shown to be superior to mechanisms that store correlations between single features and categories (rather than correlations between combinations of features and categories), mechanisms that generate a single category prototype (rather than multiple productions describing different types of category members), or mechanisms that store individual instances (rather than generalizations summarizing multiple instances).
Summary

Skill development, according to the ACT learning theory, starts out as the interpretive application of declarative knowledge; this is compiled into a procedural form; and this procedural form undergoes a process of continual refinement of conditions and raw increase in speed. In a sense this is a stage analysis of human learning. Much like other such analyses of behavior, this stage analysis of ACT is offered as an approximation of a rather complex system of interactions. Any interesting behavior is produced by a set of elementary components, with different components at different stages. For instance, part of a task can be performed interpretively while another part is performed compiled.

The claim is that the configuration of learning mechanisms described is involved in the full range of skill acquisition, from language acquisition to problem solving to schema abstraction. Another strong claim is that the mechanisms of skill acquisition function within the mold provided by the basic problem-solving nature of skills. Moreover, to function successfully, these learning mechanisms require goal structures and planning structures. As skills develop they are compiled and better tuned; consequently, the original search of the problem space may drop out as a significant aspect.
Appendix: Production Composition

We are still experimenting with various principles for composing productions. However, the following principles reflect the ideas we have developed so far in our GRAPES simulation (Anderson, Farrell, and Sauers, 1982) of the acquisition of LISP programming skills.
BROTHER PRODUCTIONS

On some occasions two productions apply in sequence as steps to a common goal. An example is the telephone productions (P1 and P2) discussed earlier in the chapter. Two productions of this sort can be composed simply, as were the telephone productions.
SUBGOAL SETTING AND RETURN

A second case, illustrated by the addition productions (P5 and P7), occurs when one production sets a subgoal and a second production returns success from that subgoal. Then a composite can be formed, for example P5&P7, that omits the subgoal setting and return. The composition directly produces the effect of the subgoal.

An elaboration of this circumstance occurs when a series of subgoals is set that fails, followed by a subgoal that succeeds. As an example, consider the following productions that search a set to see if any member begins with a letter (for instance, find a month that begins with A).
P8

P8a

This is a highly specific and complex production. It may well be too complex to be actually formed, because of working-memory limitations. It is presented here principally for contrast with the earlier P8&P9.
Composition does not need to "understand" the semantics of first and member to behave differently in composing new productions for these two types of problems. Its behavior is totally determined by the semantics of goal structures and by examining how the variables are instantiated in the case of the successful production. In the member case the instantiation did not depend on the prior failed productions; in the first case each failed production was relevant.
AN ILLUSTRATION

The ideas considered so far can be combined to produce a relatively interesting composition involving the production set in Table 6.1. Figure 6.10 illustrates the application of that production set to the justification of RO = ON. First givens was attempted, then definitions. In attempting definitions, the definition of congruence was selected, and RO = ON was found among the established statements. Note that P16 followed by P19 followed by P20 constitute a successful call to and return
Figure 6.10 The goal tree for the justification: the goal FIND A REASON FOR RO = ON leads to SEE IF RO = ON AMONG GIVENS, then SEE IF RO = ON IMPLIED BY DEFINITION, then ESTABLISH RT = OT.
P8&P11&P16&P19&P20&P17&P1
IF the goal is to find a relation (1)
and there is a set of methods for achieving the relation (2)
and the method involves establishing that a statement is
implied by a rule in a set (3)
and the set contains a rule of the form consequent if
antecedent (4)
and the consequent matches the statement (5)
and the antecedent matches an established statement (6)
THEN the method has been successfully tried with the rule. (7)
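The subgoal-eliminating composition can be sketched schematically; the tuple encoding and the strings standing in for the addition productions P5 and P7 are invented for illustration.

```python
# Hedged sketch of composing two productions when the first sets a
# subgoal and the second returns success from it: the composite omits
# the subgoal setting and return and directly produces the subgoal's
# effect before the first production's own action.
def compose(p_set, p_return):
    """p_set: (condition, subgoal, action); p_return: (condition, result)."""
    cond_set, subgoal, action = p_set
    cond_ret, result = p_return
    assert subgoal == cond_ret  # the second production answers the subgoal
    return (cond_set, [result, action])

p5 = ("goal: add column", "subgoal: add digits", "write sum digit")
p7 = ("subgoal: add digits", "digits added")
composite = compose(p5, p7)
print(composite)
# → ('goal: add column', ['digits added', 'write sum digit'])
```

The composite fires on the first production's condition alone, so the intermediate subgoal never has to be posted in working memory.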
7 | Language Acquisition

Table 7.1 A set of productions (G1-G10) for generating a small fragment of English (continued).
PRODUCTIONS: AN EXAMPLE

Table 7.1 provides a set of productions that will generate a small fragment of English. Production G1 generates the main clause of a subject-verb-object structure, productions G2-G6 generate noun phrases (by expanding on various attributes of the topic of the noun phrase), and productions G7-G10 generate verb-auxiliary structures. There are many simplifications in this production set, such as the fact that the verb productions do not test for number or that G3 and G4 capture only two of a complex constellation of principles for using definite and indefinite articles. But these simplifications are inessential to the points being made here. Also, for purposes of readability, linguistically mnemonic labels like LVagent have been used. However, the reader should not be deceived by these labels into thinking that there is anything language-specific in the way the production-system interpreter processes these productions.

The noun-phrase rules are ordered according to specificity. The first rule that would apply if legal is G2, since this contains the most specific information in its condition. Then G3 and G4, which test for a specific property, would be ordered next before
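Specificity ordering among such rules can be sketched as a conflict-resolution rule that prefers the largest matching condition. The rule labels below echo Table 7.1, but the conditions and actions are invented stand-ins, not the table's actual rules.

```python
# Hedged sketch of specificity ordering in conflict resolution: among
# the matching productions, the one with the most specific (largest)
# condition wins, so specific rules block their more general siblings.
rules = [
    ("G5", {"noun-phrase"},                       "default noun phrase"),
    ("G3", {"noun-phrase", "definite"},           "definite noun phrase"),
    ("G2", {"noun-phrase", "definite", "plural"}, "definite plural phrase"),
]

def select(context):
    matching = [r for r in rules if r[1] <= context]
    return max(matching, key=lambda r: len(r[1]))[0]  # most specific wins

print(select({"noun-phrase", "definite", "plural"}))  # → G2
print(select({"noun-phrase", "definite"}))            # → G3
print(select({"noun-phrase"}))                        # → G5
```

The residual rule with the smallest condition matches everywhere but fires only when nothing more specific applies.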
[Figure: the goal structure for generating a question, a network with nodes such as Communicate, Agent, Object, Relation, and Category (Lawyer), and attributes such as Unknown, Ongoing, Current, Recent, Completed, Supposed, and Expensive (Buy).]
This production would generate the declarative sentence, transform it to question form, then say the transformed sentence. Thus ACT's planning and transformation processes (see Chapter 4) underlie the transformational component of English. The compilation process can apply to this goal structure too, and it could create the following production, which compiles out the planning:

IF the goal is to question whether the proposition (LVrelation
LVagent LVobject) is true
and LVrelation describes ongoing LVaction
and LVaction is currently happening
THEN set as subgoals
1. To generate is
2. To describe LVagent
3. To generate the name for LVrelation
4. To generate ing
5. To describe LVobject.
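The compiled production above might be sketched as a function that emits the subgoal outputs in order. This is a simplification that realizes subgoals 3 and 4 by bare string concatenation and ignores all other morphology; the function and its arguments are invented for illustration.

```python
# Hedged sketch of the compiled question production: the planning has
# been compiled out, leaving subgoals that directly emit the
# auxiliary-fronted form for an ongoing, current action.
def question(agent, relation, obj):
    subgoals = [
        "is",                # 1. generate "is"
        agent,               # 2. describe the agent
        relation + "ing",    # 3-4. name the relation, then generate "ing"
        obj,                 # 5. describe the object
    ]
    return " ".join(subgoals) + "?"

print(question("the lawyer", "buy", "the book"))
# → is the lawyer buying the book?
```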
referential words before learning of syntax begins. There is evidence that children accomplish their initial lexicalization by having individual words paired directly with their referents (MacWhinney, 1980). (Certainly this was very much the case for my son.) This assumption allows us to focus on the learning of syntax; it is not essential to the working of the program. Again, the child-language simulation will consider what happens when the child starts without knowing the meanings of any words. In that simulation most of the initial learning is devoted to inducing word meaning; only later does the program pick up any interesting syntax. A function of the one-word stage in children's language acquisition may be to permit this initial lexicalization.
IDENTIFYING THE PHRASE STRUCTURE

Before ACT can learn from a paired sentence and meaning, it must identify the sentence's hierarchical phrase structure. For a number of reasons, inducing the syntax of a language becomes easier once the phrase structure has been identified:

1. Much of syntax is concerned with placing phrase units within other phrase units.
2. Much of the creative capacity for generating natural-language sentences depends on recursion through phrase-structure units.
3. Syntactic contingencies that have to be inferred are often localized to phrase units, bounding the size of the induction problem by the size of the phrase unit.
4. Natural-language transformations are best characterized according to phrase units, as the transformational school has argued.
5. Finally, many of the syntactic contingencies are defined by phrase-unit arrangements. For instance, the verb is inflected to reflect the number of the surface-structure subject.

The ACT theory predicts that natural language will have a phrase structure. Given ACT's use of hierarchical goal structures, it has no choice but to organize sentence generation into phrases. The learning problem for the child is to identify the adult's hierarchical organization and make the hierarchical control of his behavior match that. Seen this way, language acquisition is just a particularly clear instance of acquiring a society-defined skill.
[Figure 7.3(a): the semantic referent of a sentence, a network with nodes such as BOY, BOOK, HOUSE, and LIVES, links labeled AGENT, OBJECT, RELATION, and CATEGORY, and attributes SMALL and BIG.]
identifies the location of the terms for which there are meanings in the surface structure of the sentence. However, a term like the before big girl remains ambiguous in its placement. It could either be part of the noun phrase or directly part of the main clause. Thus, some ambiguity about surface structure remains and will have to be resolved on other bases. In LAS the remaining morphemes were inserted by a set of ad hoc heuristics that worked in some cases and completely failed in others. One of the goals in the current enterprise is to come up with a better set of principles for determining phrase boundaries.

Research on acquisition of artificial grammars provides empirical evidence for the use of the graph-deformation condition. Moeser and Bregman (1972, 1973) have shown that possession of a semantic referent is critical to induction of syntax, as would be predicted by the graph-deformation condition. Morgan and Newport (1981) show that the critical feature of the semantic referent is that it provides evidence for the chunking of elements, as predicted by the graph-deformation condition. Anderson (1975) demonstrated that languages with referents that systematically violated the graph-deformation condition are at least as difficult to learn as languages with no semantic referent.
The graph-deformation condition is violated by certain sentences that have undergone structure-modifying transformations that create discontinuous elements. Examples in English are: The news surprised Fred that Mary was pregnant, and John and Bill borrowed and returned, respectively, the lawnmower. Transformations that create discontinuous elements are more common in languages in which word order is less important than it is in English. However, the graph-deformation condition is a correct characterization of the basic tendency of all languages. The general phenomenon has been frequently commented upon and has been called Behaghel's First Law (see Clark and Clark, 1977).

Sentences that violate the graph-deformation condition cause two problems. First, such a sentence cannot be hierarchically organized by using the semantic referent, as was done in Figure 7.1. Fortunately, ACT has other principles for chunking. Second, it is necessary to learn the transformations underlying these sentences. As will be seen, ACT can learn such transformations but with difficulty.
Other principles for phrase-structure identification. The existence of nonreferential morphemes means that the graph-deformation condition in itself cannot provide an adequate basis for the phrase structuring of sentences. For instance, consider the placement of the article a between gave and book in Figure 7.3(b). Given that a has no meaning association, there is no basis for deciding whether it is part of the verb phrase or the noun phrase. To assign such morphemes to the appropriate phrases, the simulation will have to use nonsemantic cues.

A number of other cues can be used for chunking a string into phrases. First, there may be pauses after certain morphemes and not after others. Normal speech does not always have such pauses in the correct places and sometimes has pauses in the wrong places; however, pausing is a fairly reliable indicant of phrase structure (Cooper and Paccia-Cooper, 1980). Also, as Cooper and Paccia-Cooper discuss, there is information in the intonational contour as to the correct segmentation of a string. It is argued that parent speech to children is much better segmented than adult speech to adults (see de Villiers and de Villiers, 1978). Furthermore, ACT does have the facility to recover from the occasional missegmentation. Children also occasionally missegment (MacWhinney, 1980) and, of course, they recover eventually.

Another basis for segmentation relies on the use of statistics about morpheme-to-morpheme transitions.
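The transition-statistics idea can be sketched by counting morpheme bigrams and proposing a boundary wherever the transition probability is low; the toy corpus and the 0.5 threshold are invented for illustration.

```python
# Hedged sketch: segment a morpheme string by proposing phrase
# boundaries where the morpheme-to-morpheme transition probability,
# estimated from a corpus, falls below a threshold.
from collections import Counter

corpus = [["the", "boy", "s", "jump", "ed"],
          ["the", "boy", "s", "dance", "ed"],
          ["the", "boy", "s", "play", "ed"]]

pairs = Counter()   # counts of adjacent morpheme pairs
firsts = Counter()  # counts of each morpheme as the first of a pair
for sentence in corpus:
    for a, b in zip(sentence, sentence[1:]):
        pairs[(a, b)] += 1
        firsts[a] += 1

def boundaries(sentence, threshold=0.5):
    """Indices where a phrase boundary is proposed."""
    return [i + 1 for i, (a, b) in enumerate(zip(sentence, sentence[1:]))
            if pairs[(a, b)] / firsts[a] < threshold]

print(boundaries(["the", "boy", "s", "jump", "ed"]))  # → [3]
```

Within the noun phrase the transitions are nearly deterministic, while the transition out of the phrase ("s" to any of several verbs) is unpredictable, so the low-probability point marks the phrase boundary.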
FORMATION OF INITIAL RULES

The program started out not knowing any of the syntactic rules of the language. In this situation the program could generate sentences only by falling back on default orders based on the structure of the meaning referent. There is some evidence that children use default word orders. By adjusting the random generation we made the sentences more complex on the average with time. However, this gradual increase in syntactic complexity is not particularly critical to performance of the simulation. See Anderson (1981c) for a rather similar record of success on a rather similar subset without gradually increasing syntactic complexity.
THE FIRST SENTENCES

The first target sentence presented to the program was The boys are tall; its meaning structure is illustrated in Figure 7.4. All the relevant semantic information controlling this syntactic structure is present in this figure, including the subject's definite and plural features and the predicate's present and stative features. However, it is unreasonable to suppose that the learner actually knows that all these features are relevant to the communication. Suppose stative and definite are thought to be relevant, but not present or plural. Then we should represent the hierarchical structure imposed by the to-be-communicated structure as ((stative (*tall)) (definite (*boy))).¹¹ In all meaning structures used in the simulation, the relational descriptors are embedded around the core relation (for example, tall) of the communication and similarly around the core nouns. This will produce a "nouniness" ordering on words in the noun matrix and a "verbiness" ordering on verbs in the verb matrix. As discussed earlier in the chapter, the same effect can be achieved by a data-specificity principle of production selection. However, this simulation did not have that conflict-resolution principle built into it in a way that reflected the specificity of the meaning structure. Therefore one might think of noun-phrase and verb-phrase embedding as implementing this aspect of the specificity principle.

The program had no syntactic rules at this point, so it fell back on the default rule of generating the terms it knew in the order they occurred in the semantic referent (generally ordered as
279
the
predicate-obiector relation-agent-obiect).since it knew
sentence
target
The
boy.
tall
generated
iords for tati andboy, it
(are
it received as feedbick uit"t chunking was ((the (boy + s),)
to
led
meaning
the
and
sentence
(tall))). comparing the target
it
formed
phrase
noun
the
From
.r"riior, of generu'tiotrrules.
the rules:
't,. (Definiteobiect)+ "the obiect"'
2. (*BoYterm) + "boY+ s"'
From the main clause, the following rule was learned:
3.(Predicateobject).-+,,obiectpredicate,'ifpredicateascrib
+tall.
Figure 7.4 The meaning structure of The boys are tall, with links for DEFINITE, PLURAL, PRESENT, STATIVE, ATTRIBUTE, PREDICATE, and *TALL.
6. (Relation agent object) → "agent relation object" if the relation is about *shoot.
7. (*Shoot) → "shoot + s."
8. (Indefinite object) → "a object."
9. (*Lawyer) → "lawyer."

Rule 1, which was learned from the first sentence and which generated the the in this context, was correct and so was strengthened by one unit. Rule 2, which inflected the noun with s, was incorrect and so was weakened to zero strength and disappeared.
Some of the more interesting of the first twenty-five sentences are illustrated in Table 7.2. Note that sentence 9 is the first one in which ACT's generation matched the target sentence. It acquired the rule for the from the first sentence. From the sixth sentence it learned the agent-action ordering for jump and the ed inflection. From sentence 7 it learned the s inflection for lawyer. At this point all of the grammatical rules are specific to single concepts.

ACT's performance on the tenth sentence reveals that its performance on the ninth was luck. Here it uses the s inflection for lawyer when the noun is singular. It also incorrectly uses the ed inflection for dance (acquired from the third sentence). This leads to a set of three discriminations. The s inflection for lawyer is strong enough to justify a condition discrimination as well as an action discrimination:

11. (*Lawyer term) → "lawyer + s" if term is plural.
12. (*Lawyer term) → "lawyer" if term is singular.

It also makes an action discrimination to give the s inflection for dance. Contrasting sentence 3 with sentence 10, it selects the tense feature:
13.

Table 7.2 Sentence number, sentence generated by ACT, and feedback from the target sentence (selected sentences 7 through 838).
282
LanguageAcquisition
Acr will also consider elements in the semantic structure before it considers the syntactic structure (goal structure). other
than that, it chooses randomry from u*6.g the possibl" f.rtures. Therefore,the fact that all three discriminations were correct in the above example was somewhata matter of luck.
sentence14 is a good illustration of an erroneous discrimination. Note that ACT had generated the term a where the target
used some.ln responseto this difference ACT formed an acti,on
discrimination. This time, by chance,ACT noted that the successfuluse of a had not been in the context of *kfss.Therefore it
built the following erroneous rule:
30.
31.
32.
33.
34.
35.
283
(Relation
agent)+ "agentrelation"if relationinvolvesa verb.
(rVerb)-+ "verb * s" if presenttense.
(*Verb)+ "verb * ed."
(*Verb)+ "verb * s" if presentand singular.
(*Verb)+ "verb" if presentand plural.
(*Verb)+ "verb + ing" if in the contextof progressive.
In these rules verb refers to a word in the verb class, and *verb refers to its meaning.

The development of the verb word class does not stop after the first twenty-five sentences. As evidence builds up about the syntactic properties of other words, ACT will want to add additional words to the class. A major issue is when words should be added to the same class. It is not the case that this occurs whenever two rules can be merged, as above. The existence of overlapping declensions and overlapping conjugations in many languages would result in disastrous overgeneralizations. ACT considers the set of rules that individual words appear in. It will put two words into a single class when
These rules are not a particularly parsimonious (let alone accurate) characterization of the language structure. Because of specificity ordering, they work better than might seem possible. For instance, rules 22 and 23 take precedence over 13, which does not test for number, so 13 will not cause trouble. Similarly, rules 13, 22, and 23 all take precedence over rule 21. Thus rule 21 will not be able to generate an ed inflection when the tense is present, even though 21 does not explicitly test for past tense.
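The precedence relations described here can be pictured as conflict resolution that fires the most specific matching production. The following is a toy sketch, not ACT's production system; the particular condition tests assigned to rules 22 and 23 are assumptions made for illustration.

```python
def most_specific(rules, context):
    """Conflict resolution by specificity: among the rules whose tests
    all hold in the context, fire the one with the most tests.  A rule
    with no tests acts as a residual that applies only when nothing
    more specific matches."""
    matching = [r for r in rules if all(t(context) for t in r["tests"])]
    return max(matching, key=lambda r: len(r["tests"]), default=None)

# Hypothetical re-encoding of rules 21-23, for illustration only.
rules = [
    {"name": "21", "tests": [], "action": "verb + ed"},
    {"name": "22", "tests": [lambda c: c["tense"] == "present",
                             lambda c: c["number"] == "singular"],
     "action": "verb + s"},
    {"name": "23", "tests": [lambda c: c["tense"] == "present",
                             lambda c: c["number"] == "plural"],
     "action": "verb"},
]
fired = most_specific(rules, {"tense": "present", "number": "plural"})
```

In a present-tense context the specific rule fires, and the untested residual rule 21 wins only when no more specific rule matches; this is how an unconditional "verb + ed" rule can still fail to produce ed inflections in the present.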
ACT's generalization mechanism would like to generalize rules 20, 24, and 28 together. This would involve replacing the concepts *dance, *jump, and *play by a variable. However, as discussed in the previous chapter, there must be a constraint on
The final class was the noun class, which included all the nouns. The rules for this class were:

34. (*Noun term) →
35. (*Noun term) →
36. (*Noun term) →
37. (*Noun term) →

[The right-hand sides and strengths of these rules are not recoverable from this copy.] To the right of each rule is its eventual strength. Rule 34 is a residual rule that will always be blocked by the others according to specificity. Rules 35 and 36 are the standard rules for plural and singular inflection. Rule 37 is an exception that was formed by an action discrimination from 36.
In contrast to the noun word class, which had a relatively parsimonious structure, the rules for inflection of the perfect aspect as has, had, or have had a relatively complex structure. Figure 7.5 is an attempt to illustrate the structure of these rules. Each terminus in the discrimination tree represents a single production; the number beside it is its strength. The paths through the network reflect the operation of conflict resolution in choosing a production. The tests involving class 3795 and class 4750 are tests of whether perfect occurs in the context of a modal. By
Figure 7.5 A discrimination network specifying the various rules learned for inflecting perfect tense. [The diagram itself is not recoverable from this copy; its branches test SING?, PAST?, PLUR?, PRES?, and ACTION?, and its termini are has, had, and have productions with their strengths.]
Because of specificity, the last rule would apply only if the first two did not. Given its learning history, ACT has worked its way to a more complex characterization. In general, rules derived from language learning are not likely to be the most parsimonious, nonredundant characterization of the language. This is an important way in which the conception of language based on ACT differs from the conception promoted in linguistics. Parsimony has been applied in linguistics to a too narrow range of phenomena, and the result is a seriously misleading characterization of language (see Chapter 1).
Acquisition of Transformations
ACT needs to get its syntactic rules clear before it is ready to deal with question transformations. Actually, the subject question does not involve learning a transformation. Given the meaning structure ((progressive (hit)) (query X) (definite (boy Y))), ACT would generate by its subject-verb-object rule ? was hitting the boys. The question mark indicates that it failed to have a lexical item for query. Comparing this with the feedback Who was hitting the boys?, ACT would infer that query is realized as Who. However, the other two constructions are more difficult: Was the girl hitting the boys? and Who was the girl hitting? In both cases the graph-deformation condition is violated in moving the participle was from the verb matrix to the front of the sentence. Early in the learning history, ACT typically refused to try to learn from a sentence that violated the graph-deformation condition because it could not formulate any phrase-structure rule to accommodate it and could not see a way to transform the output of existing phrase-structure rules. However, as its phrase-structure rules became more adequate, it was eventually able to propose a transformation.
Figure 7.6, showing ACT's first postulation of a question transformation, illustrates the decomposition of the meaning structure into components according to the rules ACT possessed at this time.

[Figure 7.6: the meaning structure (QUESTION ((FUTURE (*RUN)) (DEFINITE (*FUNNY (*LADY X))))) is decomposed into components such as (FUTURE (*RUN)) and (DEFINITE (*FUNNY (*LADY X))), realized as (THE (FUNNY (LADY S))) and (WILL (RUN)); the full diagram is not recoverable from this copy.]

Not knowing how to realize the question element, ACT simply tried to generate the embedded proposition that was questioned. This was realized as The funny lady will run. The target sentence came in chunked as (will (the (funny (lady s))) (run)). ACT noticed that it could achieve the target sentence by simply rearranging the structure of its generation. This is one of the circumstances for learning a transformation. Therefore it formed the following planning rule:

IF the goal is to communicate (question assertion)
THEN plan the generation of assertion
and move the first morpheme in the relation matrix to the front
and then generate the sentence.
As discussed at the beginning of this chapter, this planning rule might eventually be replaced by a compiled rule of the form:

IF the goal is to communicate (question (relation agent))
and the tense is future
THEN set as subgoals
1. to generate will
2. to describe agent
3. to describe relation.
IF the goal is to communicate (question ((*verb) agent object))
and the tense is past
and there is no modal, perfect, or progressive aspect
THEN plan the generation of ((*verb) agent object)
and then insert did at the front
and replace verb + ed by verb
and then generate the sentence.
Compiled, this becomes

IF the goal is to communicate (question ((*verb) agent object))
and the tense is past
and there is no modal, perfect, or progressive
THEN set as subgoals
1. to say did
2. then to describe agent
3. then to say verb
4. then to describe object.
Thus our compiled rule for did insertion is no different from any of the other compiled rules for the question transformation. The same learning principles can deal with did insertion without any special-case heuristics.
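The four subgoals of the compiled did-insertion rule can be sketched directly. This is an illustrative reduction, not the ACT production itself; surface details such as capitalization and the question mark are omitted.

```python
def question_past(verb, agent, obj):
    """Compiled did-insertion: rather than generating 'agent verb+ed
    object' and then transforming it, the compiled rule sets subgoals
    to say did, describe the agent, say the bare verb, and describe
    the object, in that order."""
    subgoals = ["did", agent, verb, obj]   # subgoals 1-4 of the rule
    return " ".join(subgoals)

sentence = question_past("kick", "the boy", "the ball")
```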
Summary of ACT's Performance
Figure 7.7 summarizes ACT's progress. The number of syntactic errors per sentence is plotted against the number of sentences studied. Since the complexity of sentences tends to increase through the learning history (because of a bias in the random sentence generator), there are more opportunities for errors on later sentences. Thus the rate of learning is actually faster than the graph implies. The sentences in Table 7.2 are a sample biased to represent errors, because these are the more interesting cases. The table lists all the errors made in the last hundred sentences. As can be seen, the final errors almost exclusively involve irregular verbs in less frequent constructions. With time ACT would learn these too, but it has not yet had enough opportunities to learn them. Children also have a long history of residual trouble with irregular constructions.
One type of error produced by the program seems unlike the errors made by children. This is illustrated in sentence 117, where the program generated the verb matrix has is jumping rather than has been jumping. It had learned from previous sentences that is plus ing communicates the progressive in the context of present singular. The program has not yet learned that the element is becomes been in the context of perfect. Thus, it is a perfectly (pun intended) reasonable generalization. Perhaps children remember word-to-word transitions and are unwilling to venture an infrequent transition (such as has-is) except in the context of a strong transition. This idea will be employed in the section on child language.

Figure 7.7 The mean number of errors per sentence generated as a function of the number of sentence-meaning pairings. [The plot itself, on a log scale of number of sentences, is not recoverable from this copy.]

unbounded from the start. Every sentence it generates is a serious attempt to reproduce a full-length English sentence and, as we see in sentence 9, it can hit the target early. In contrast, utterances generated by children are one word in length for a long while, then two words for quite a while; only after considerable experience do longer utterances start to appear. The first multiword combinations are clearly not English sentences. (My son's first noted two-word utterance was "more gish," translated "I want more cheese.")

In summary, this example simulation should make a fairly convincing case for the power of ACT's language-acquisition system. Except for limitations of time and computer memory, it is unclear whether there are any aspects of a natural language that this program cannot acquire. The full import of this claim is a little uncertain, however. The success of the learning program depends on properties of the semantic referent, and it is difficult to know whether I have built in solutions to critical problems in the structure of the semantic referent. This question can be resolved by using the same semantic referent to learn very different languages, but that project will require a great deal of effort.
Child Language
The performance generated by the previous simulation is definitely unchildlike in ways other than its vocabulary choice. Its rate of initial progress is rapid, and its length of utterance is
Short-term memory limitations and conceptual limitations. There are severe limitations on the intake and analysis of a meaning-sentence pairing in the absence of comprehension of the sentence. According to traditional assertions and the discussion in Chapter 3, a child can hold five or fewer chunks in memory. Early in language these chunks may be a morpheme or less; later they may be as large as a word; still later they will be the length of comprehended phrases. Moreover, when a child's short-term memory is overloaded, it will be selective in what it retains. Anderson (1975) proposed the telegraphic-perception hypothesis that children tend to retain the words they know and the semantically important words. The child presented with The dog is catching the ball might record doggy catch ball. Small wonder, then, that his utterances have the same telegraphic quality.
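A minimal sketch of the telegraphic-perception hypothesis, assuming for illustration a set of known words, a set of semantically important words, and a five-chunk capacity:

```python
def telegraphic_encode(words, known, important, capacity=5):
    """Retain the words the child knows and the semantically important
    content words, dropping function words, up to a short-term memory
    capacity of a few chunks."""
    kept = [w for w in words if w in known or w in important]
    return kept[:capacity]

heard = "the dog is catching the ball".split()
recorded = telegraphic_encode(heard,
                              known={"dog", "ball"},
                              important={"dog", "catching", "ball"})
```

The function words the and is drop out, leaving a telegraphic residue much like the child's doggy catch ball.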
In addition to a limitation on the strings he encodes, there is a limitation on the meaning structures the child can pair with these utterances. These are limitations of memory capacity but also of conceptual development. If the child is not perceiving possession in the world he can hardly acquire the syntactic structures that communicate it. Slobin (1973) has argued that much of the timing of the child's syntactic development is determined by his conceptual development.
Understanding the nature of language. The child's appreciation of the function of linguistic communication is probably also developing as he acquires language. In the preceding simulation, and indeed in the forthcoming one, it is assumed that the learner is always operating with the motivation to learn the full communicative structure of language. However, a child may well start out with more limited goals, and restrictions on his utterances may reflect only his aspiration level.
Impact of the constraints. It is important to determine what happens to the performance of the ACT language-learning program when the child's situation is better approximated. The character of its generations should better approximate the character of the child's generations. And it should still eventually learn a natural language just as a child does. This second point is not trivial. The child initially produces (and presumably learns) a language which is not the target language. His rules are not simply a subset of the adult grammar; they are different from the adult grammar. Therefore, in acquiring the correct rules, the child must overcome his initial formulations.

Are initial rules hindrances, neutral, or stepping stones to acquiring the correct rules? For instance, where the adult will
discussion in the previous section). Consequently, the program's initial utterances are repeats of sequences in the adult model; only later does it come up with novel utterances. The same was observed of J.J. His first uses of more were utterances like more gish (I want more cheese) and more nana (I want more banana), and only later did he produce novel generations that had not been modeled. An example of a novel combination is more high, by which he asked to be lifted high again. (Please note that high does not refer to drug use.) Interestingly, when he had wanted to be lifted high before this point, he simply would say more or high.

An interesting consequence of these assumptions can be illustrated with utterances like Russ bark or Mommy bark. For a while, the simulation was not capable of these two-word transitions, nor was J.J. This simply reflected the weakness of the rule authorizing this two-word transition. Thus J.J. and the program would generate only one of these two words. If the simulation generated this plan and went through it left to right, it would generate Russ or Mommy. However, J.J.'s one-word utterance in this situation was bark, although he knew the words Mommy and Russ, and we had modeled the two-word transitions many times. Therefore, the program has a "subgoal omitter"; if it came upon a subgoal that was going to exhaust its resources and a more important part of the utterance remained, it simply omitted that subgoal and went on to the more important part. Similar ideas have been proposed by Bloom (1970). Thus, in our program, as in J.J., the first generations were bark and only later did they become agent + bark. The program regarded the relational term as most important, then the object, and finally the agent. This reproduced J.J.'s ordering, although it is reported (MacWhinney, 1980) that many children prefer the agent. In the case of J.J., an only child, it was usually obvious from context who the agent was, and this perhaps accounts for his preference. MacWhinney (1980) reports that some children will move the most important term forward out of its correct position, but we did not observe any clear cases of this with J.J.
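The subgoal omitter can be sketched as a resource-bounded, left-to-right walk over the planned subgoals that skips a subgoal when saying it would exhaust the budget while a more important part of the utterance remains. The importance ordering (relation over object over agent) follows the text; the numeric weights and the one-word budget are invented for the example.

```python
def generate(subgoals, resources):
    """Walk (word, importance) subgoals left to right; omit a subgoal
    if the resource budget is nearly exhausted and a more important
    subgoal still lies ahead."""
    out, remaining, goals = [], resources, list(subgoals)
    while goals:
        word, importance = goals.pop(0)
        if remaining <= 1 and any(i > importance for _, i in goals):
            continue            # skip the less important subgoal
        out.append(word)
        remaining -= 1
    return out

# With a one-word budget, the plan for 'Russ bark' yields 'bark',
# not 'Russ', because the relational term matters more.
utterance = generate([("Russ", 1), ("bark", 3)], resources=1)
```

With a two-word budget the same plan comes out whole, which is the later agent + bark stage.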
Another interesting consequence of the assumptions concerns the frequency of generations of the structure operator + object versus object + operator. Braine (1963) has characterized early child speech as being formed by these two structures, and indeed early on the J.J. simulation did create two word classes, one consisting of first-position operators and one consisting of all second-position operators. However, in the early speech of J.J. and the program, the operator + object
Rate of learning. In line with the earlier discussion on rate of learning, the program was limited to learning one rule per meaning-string presentation. Because the utterances it worked with were so short, there was often just a single rule to learn. However, when there were multiple rules to learn, the program chose the left-most rule with the smallest span. For instance, if given the generation Doggy chases kitty and it did not know the meaning for doggy, it would learn this lexical item in preference to learning the sequence agent-action-object. This ordering was produced by a program that scanned the sentence in a left-to-right depth-first manner looking for the first to-be-learned structure. It seemed reasonable that the program should focus on the pieces before learning how to put the pieces together. One consequence was that the utterances first produced tended to be short and fragmentary.
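The selection policy, the left-most to-be-learned structure with the smallest span, reduces to taking a minimum over (start, span) pairs, which is what a left-to-right depth-first scan finds first. The candidate encoding below is an assumption made for illustration.

```python
def choose_rule_to_learn(candidates):
    """One rule per presentation: take the left-most, smallest-span
    to-be-learned structure, as a left-to-right depth-first scan
    would, so the pieces are learned before the sequence that puts
    them together.  Candidates are (start, span, label) triples."""
    return min(candidates, key=lambda c: (c[0], c[1]))

# 'Doggy chases kitty' with the meaning of 'doggy' still unknown:
choice = choose_rule_to_learn([(0, 3, "sequence: agent-action-object"),
                               (0, 1, "lexical item: doggy")])
```

The unknown lexical item wins over the enclosing three-word sequence, so vocabulary precedes syntax.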
Quality of training data. When the program failed to generate any utterance or when it failed to match the target utterance, 50 percent of the time it received incorrect feedback as to the correct utterance. (When incorrect feedback is given, another random utterance is chosen from the model's repertoire.) Even when its generations matched the target utterance, it was given incorrect feedback 20 percent of the time. The smaller percentage in this second situation reflects the idea that learner and model are often not in correspondence when the learner cannot express himself, but are more often in correspondence when the learner does express himself correctly. One consequence of this is that when the program does hit upon the correct rule, that rule is generally reinforced.
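The feedback noise model can be sketched with the two rates given in the text: 50 percent incorrect feedback when the generation misses the target, 20 percent when it matches. The repertoire and utterances are illustrative; this is not the actual training program.

```python
import random

def feedback(generated, target, repertoire, rng=random.Random(1)):
    """Return the feedback utterance: usually the target, but with
    probability 0.5 (on a mismatch) or 0.2 (on a match) a random
    other utterance from the model's repertoire."""
    p_wrong = 0.2 if generated == target else 0.5
    if rng.random() < p_wrong:
        return rng.choice([u for u in repertoire if u != target])
    return target

repertoire = ["more gish", "hi Russ", "bye-bye Daddy"]
trials = [feedback("more gish", "more gish", repertoire)
          for _ in range(1000)]
wrong_rate = sum(t != "more gish" for t in trials) / 1000
```

Because correct generations are misinformed only a fifth of the time, a rule that hits the target is usually reinforced rather than punished.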
The program takes various measures to protect itself from noisy data. When it detects too great a disparity between its generation and the model's, it just refuses to learn. More important, it does not weaken the rules that led to generation of the mismatched sentences.

Incorrect feedback creates the greatest problems for acquisition of word-meaning pairings. Almost 50 percent of the program's hypotheses as to a word's meaning are incorrect. Therefore, a word-meaning correspondence is encoded only after a certain number (currently three) of consistent reinforcements. So set, there was one occasion when a wrong hypothesis
4. (Request (*up/down (? X))) → "up/down JJ."
5. (Describe (*see (*person X))) → "hi person" ("hi Russ").
6. (Describe (*depart (*person X))) → "bye-bye person" ("bye-bye Daddy").
7. (Request/describe (*action (*person X))) → "person action" ("Mommy jump").
8. (Describe (*property (*object X))) → "property object" ("hot bagel").
9. (Negative conceptualization) → "no conceptualization" ("no more Sally").
10. (Describe (bodypart (*person X))) → "person part" ("Mommy nose").
11. (Describe (associate (*object 1 X) (*object 2 X))) → "object 1 object 2" ("baby book").
12. (Describe/request (*verb (*agent X) (*object Y))) → "agent verb object" ("daddy feed Russ").
13. (Describe/request (*direction (*agent X))) → "agent go direction" ("JJ go down").
14. (Describe/request (*back (*agent X) (*object Y))) → "agent put back object" ("Daddy put back blocks").
15. (Describe/request (*off (*agent X) (*object Y))) → "agent turn off object" ("Sarah turn off Sally").
16. (Describe (*open/*close/*broken (*object X))) → "object open/close/broken" ("door open").
17. (Question (location (object X))) → "where's object" ("where's balloon").
18. (Describe/request (*up (agent X) (object Y) (location Z))) → "agent put object up on location" ("Ernie put duck up on bed").
19. Delete agent in all requests.
20. Optional transformations: put back object → put object back; turn off object → turn object off.
21. Optional transformation: replace inanimate object by it.
22. (Proposition location) → proposition in location ("Mommy read book in playroom").
These constructions are given in the order they were introduced into the training sequence. As can be seen, there is an increase in complexity of both meaning and utterance. The first nine were introduced by the 1,500th pairing, which I equate with J.J.'s development at seventeen months. The next eight were introduced by the 3,000th pairing, which I equate with nineteen months. The last five pairings were introduced by the 4,000th pairing.

The system started out just learning single word pairings (construction 1). This is intended to model a true one-word stage in ability when the child is given or records only a single word and selects a single object as its referent. Construction 2
was introduced to account for our child's early tendency to present objects to us with the utterance dis. Construction 3 received a lot of modeling at the kitchen table: "Does J.J. want more cheese?" Sometimes we modeled just the two-word pairing "More apple juice?" and sometimes full sentences, but again the assumption was that J.J. extracted the critical two words from the longer utterances. The expression "up JJ" derives from our idiosyncratic use of "up with J.J." and "down with J.J." Although we did frequently use the more normal order "Does J.J. want down?" our son latched onto the other order.

Rules 12, 14, and 15 all describe agent-verb-object sequences that we started modeling with fairly high frequency. Until the twenty-first month, we heard only verb-object utterances from J.J. There are multiple hypotheses for the omission of the agent. He may have been modeling the command form, which omits the agent. Or he may have been omitting the most obvious term because of capacity limitations and to avoid blocking the critical verb.

Initially, the simulation was trained on requests with sentences that contained the agent. This corresponds to forms we sometimes used to J.J.: "Russ, drop the ball" or "Will J.J. give me the ball?" However, I decided at the 3,000th pairing that it was more appropriate to provide the simulation with the deleted-agent version of requests (rule 19). Whatever the justification for this midstream change, it demonstrated the learning program's ability to recover from mistraining. Rules 20 and 21 reflect optional transformations and so tap the program's ability to deal with constructions in free variation. In this case, if the program generated the equivalent expression (Daddy put back ball for Daddy put ball back), the program was not given feedback that it was incorrect, but was presented with the alternative form as another construction.
Note that function morphemes (a, the, -ed, 's, and so on) are not part of the input to the program. This reflects the belief that J.J. did not initially record these items (the telegraphic-perception hypothesis). When he began echoing our phrases at the twenty-first month he often succeeded in echoing constructions and words he was not producing spontaneously. However, usually he omitted the function words. So for instance, "Russ is a good dog" would become "Russ good dog," and "Nicky goes to the doctor" would become (approximately) "Nicka go docka." The first function word he began to use productively was it (rule 21), at about twenty-two months. The articles (a, the) and s for pluralization and possessive only appeared in the
Pairings        Vocabulary size    Mean utterance length
1-500                 12                 1.00
501-1,000             31                 1.00
1,001-1,500           39                 1.09
1,501-2,000           44                 1.13
2,001-2,500           54                 1.16
2,501-3,000           55                 1.26
3,001-3,500           97                 1.24
3,501-4,000          110                 1.26
4,001-4,500          122                 1.23
4,501-5,000          136                 1.41
5,001-5,500          142                 1.50
5,501-6,000         (illegible)          1.58
MOMMY READ
HI MOMMY
WANNA GRAPES
BYE DADDY
PLEASE MORE
DOWN JJ
DADDY GO DOWN
UP JJ
RUSS WALK
MOMMY BARK
NO MORE
MOMMY CHIN
GOOD FOOD
MOMMY TALK
JJ COOK
READ BOOK
HOT FIRE
DADDY GO UP
GO WALK
JJ GO DOWN, DADDY
ERNIE GO BY CAR
WANNA IT
NICE RUSS
Adequacy
This chapter has reported two simulation efforts to assess the adequacy of the ACT learning mechanisms for natural language. The first simulation showed that it was capable of acquiring a reasonable subset of natural language. The second showed that when constrained by reasonable information-processing limitations, it provides a good approximation to child language acquisition. These two demonstrations are in lieu of the impractical demonstration, that is, that the program can learn any natural language in a humanlike manner.
Wexler and Culicover (1980) have attempted an approximation of this goal for a rather different learning system. They showed that a set of learning mechanisms could acquire any transformational grammar that satisfied a good number of constraints. Their formal proof of this did not say anything about whether the course of language acquisition would approximate child language acquisition. Nonetheless, if natural languages satisfy these constraints, then they at least have a sufficiency proof for their learning scheme. Coming at the problem from this point of view, they actually are more interested in the constraints than the learning scheme. Although some of the constraints seem purely technical for purposes of achieving a successful proof, others are important claims about the universal properties of natural language.
While the Wexler and Culicover endeavor makes sense if one accepts their framework, it does not if one comes at the problem from the ACT point of view. If ACT is the right kind of model for language acquisition, there is no elegant characterization of what constitutes a natural language. A natural language is anything that the learning mechanisms can acquire, given meaning-sentence pairings. This is unlikely to be a member of some pristine formal class satisfying just a small number of elegant formal constraints.18 Thus, while it is certainly worthwhile to identify the formal properties of languages that ACT can and cannot learn, a proof in the mold of Wexler and Culicover does not make much sense.
Linguistic Universals

Wexler and Culicover propose that their constraints on natural language are linguistic universals. Chomsky popularized the idea that the syntax of all natural languages satisfies certain uni-
1 and 5 but not 2, 6, or 9. (A similar statement of the transformation for 7 and 8 would produce that phenomenon.)

There is nothing language-specific about this analysis. Such ambiguities in specification abound in computer text editors, where one has the same problem of retrieving multiple elements that meet the same description. In various problem-solving situations, one is likely to find that an optimization transformation selects the first object that satisfies its description. Consider a plan to paint three objects, with the structure of the plan specified as:
1a.
1b.
2a.
2b.
3a.
3b.
If the bucket can hold enough paint for any two objects, a likely transformation of this plan is
La'.
Lb'.
1c'.
2a'.
2b'.
The optimization transformation has applied to the first two objects in the sequence but not the last two.

If this discussion is correct, this constraint is not a fundamental limitation on natural language. There is no reason why, if the learner received appropriate training, he could not learn a fuller description that could select the appropriate object. This would require more complex training data and make learning somewhat more difficult. However, it would just be a matter of degree. Thus, this analysis predicts that if English had constructions like 2, 6, and 9, it would still be learnable, if more difficult.
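The first-match behavior of the optimization transformation can be made concrete with a small sketch. The plan's individual steps (1a, 1b, and so on) are not recoverable in this copy, so the "open bucket" and "paint" steps below are invented stand-ins; the point is only that the transformation merges setup steps over the first objects that fit its description, in order of appearance.

```python
def optimize(plan, capacity=2):
    """Merge repeated 'open bucket'/'paint X' pairs so that one
    opening covers up to `capacity` paintings, taking the objects
    in their order of appearance (first match first)."""
    objects = [step.split()[-1] for step in plan if step.startswith("paint")]
    merged = []
    for i in range(0, len(objects), capacity):
        merged.append("open bucket")
        merged += [f"paint {o}" for o in objects[i:i + capacity]]
    return merged

new_plan = optimize(["open bucket", "paint A",
                     "open bucket", "paint B",
                     "open bucket", "paint C"])
```

The transformation covers the first two objects with one opening and leaves the third to a second opening, reproducing the first-two-but-not-the-last pattern discussed above.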
There has been a considerable constellation of research around the A-over-A constraint and its related manifestations. It may well be that the exact idea offered here will not extend to all of this constellation. However, a general point about such "constraints on transformations" is that given finite bounds on pattern descriptions, it is always possible to come up with data structures sufficiently complex that the pattern descriptions will
Notes
Notes to Pages 19-78

(1955).
7. Galambos and Rips (1974) document that a script involves temporal string structures organized hierarchically. That is, they show effects associated both with hierarchical structure (an advantage for information high in the hierarchy) and with temporal structure (distance effects for order judgments).
3. Spread of Activation

1. As this example illustrates, I do not intend to correlate active memory with what one is currently conscious of.
2. While the units in Figure 3.1 are all propositions, the same analysis would apply if the units were strings or images.
3. If the source node is an encoding of an external stimulus, its source activation reflects sensory stimulation. If it is the result of a production execution, its source activation is a result of activation pumped into the network from the production. If it is linked to the goal, its activation comes from the goal.
4. This assumes that there is a set of activation sources that stay constant for a long period of time relative to the decay rate and the delay of transmission. In many situations this can be a useful idealization.
5. In contrast to productions like P1 that exist before the experiment, it is assumed that productions P2-P6 in Table 3.2 are created specifically to perform in the experiment and that they are compiled (see Chapter 6) during the first few trials.
6. This implies that subjects go through a stage of processing in which a nonword is identified with a word. There is no need to assume that subjects are conscious of this at the level of being able to make a verbal report, but I can report that in piloting such experiments I am aware of identifying a nonword with a similar word on the large majority of the trials.
7. It should be noted that this analysis predicts particularly large Stroop interference when the word is itself a color name. In this case the response code for the word is being primed by the task and the color to be named. Also, new partial instantiations of productions will compete with the correct one.
8. For simplification the first goal clause is ignored since it is constant across the productions.
9. I thank Robert Frederking for pointing out to me the analysis of foil judgments reported here.
11. There are, of course, limits on this prediction, related to limits on the number of items that can be maintained as sources of activation.
12. It is important to note that these judgments cannot be done by deciding if the probe belongs to a category. That is, plane, mountain, crash, clouds, and wind do not belong to any category that would exclude the foils.
Notes to Pages 123-152

Notes to Pages 161-194
13. For instance, no node in the unanalyzed structure for X can be connected to node C.

14. If the node had any connections to nodes of lower level than i - 1, its minimum link distance from node X would not be i. The node cannot have connections to nodes higher than i + 1 because they are only one link removed from a level-i node.
8. For instance, rather than simply executing a pop, other production languages might have many special-case production rules like:
4. Control of Cognition
1. One example of the distinction was developed in the preceding chapter in the distinction between conscious and automatic priming in the lexical decision task. We saw that automatic priming occurred when information was made especially active in the network, and conscious priming occurred when the subject set his goal to allow a potentially more efficient production to apply. In general, conscious attention or controlled processing corresponds in ACT to setting goals to favor a special mode of processing.
2. In the last chapter I argued that the momentary capacity of working memory is high. However, opportunistic planning requires a large sustained capacity.
3. It appears that students get around these difficulties by building larger patterns that encompass both uses of the data. Thus, rather than seeing two triangle patterns sharing the same segment, they see one "adjacent triangles" pattern that involves a single shared segment. Associated with the pattern is the information that the one segment can be used with the reflexive rule of congruence.
4. Later in the chapter I discuss how goal structures have special mechanisms to deal with data refractoriness.
5. As will be discussed later, "goal" in ACT refers to immediate goals of current behavior, such as generating a sentence. Only one such goal can be active. This is not to deny that a person may have many "high-level" goals like "making money." Such goals are better called policies because they do not have direct control over current behavior. Again, under this analysis hunger, fear, attraction, and the like are not goals but states that can lead to the setting of goals.
6. It should be noted that this seriality is a property of the goal productions only. Productions that do not involve the goal element and match disjoint sets of data can apply in parallel. A particularly important class of such productions consists of those for automatic pattern matching. Thus multiple letters on a page can be matched simultaneously.
7. Many people's introspective reports do not agree with this particular simulation. Some people fail to see either (a) or (b) as a word. A number of people have reported seeing (b) as FOUL, which is not represented in Figure 4.5. Such examples are only meant to illustrate how the word-superiority effect is obtained. Until we actually implement a reasonable approximation to a human lexicon, the perceptions of our simulation are unlikely to correspond exactly to human perceptions.
6. Procedural Learning
1. This assertion does not imply that composition can only apply to productions that involve goals. However, goal structures provide one important means for determining which productions belong together and for producing certain optimizations in the products of composition.
2. Each time an attempt is made to recreate an existing production (for instance, through composition), the existing production gains an increment of strength rather than a copy being created.
3. No claim is made for the appropriateness of these rules for actually playing bridge.
4. These terms are slight misnomers. A condition discrimination just changes the condition of the incorrect production. However, an action discrimination makes a change in both condition and action.
5. Recall, however, from Chapter 4, that an exception to a strong general production can still take precedence because of a specificity advantage. This requires that the exception have a modest amount of strength. This has some interesting implications for exceptions to inflectional rules for words (for example, the plural of man is men). For an exception production to take precedence over the regular rule, the word must occur with at least moderate frequency. The exceptions to general inflectional rules do appear to occur for more frequent words.
6. Data refractoriness (see Chapter 4) will prevent the selection of January or September a second time.

7. Language Acquisition
1. It is somewhat unfortunate that the term "generation" is used throughout this chapter, since it has a different technical meaning in linguistics. However, the other obvious choice, "production," is even more contaminated in the current context.
2. The earlier set of ideas about declarative knowledge representation and spreading activation defined on that knowledge representation are no doubt also applicable to language processing, as indeed the priming studies indicate (Swinney, 1979). However, these mechanisms are more important to the semantics of language than to the syntax.
3. Recently, it has been argued in linguistics (for example, Bresnan, 1981) that there are no transformations. While earlier linguistic theories may have overemphasized their use, it seems improbable that there are no transformations. For instance, constructions like respectively and vice versa remain for me consciously transformational.
4. Later I will discuss what to do about verb phrases that do not have auxiliaries for pre-posing.
5. As discussed later in the chapter, content words in ACT are those that have an associated concept.
6. Of course, at certain points in the processing of content words, implicit, production-based knowledge comes in. Thus it is very reasonable to propose that our knowledge of what a dog looks like is embedded in dog-recognition productions. However, in generating the word dog, the speaker calls on a declarative link between the lexical item and its meaning representation.
7. Some work has been done with Latin and French; see Anderson (1981c).
8. A description of this production system will be provided upon request.
9. It would also be possible to form analogous comprehension rules from this experience. Thus, the acquisition of comprehension and generation rules need not be independent, even if the rules are.
10. See the discussion in Chapter 6 of analogy as a basis for using declarative rules.
11. Asterisks are being used to denote the concepts corresponding to content words.
12. Each of these rules is specific to a particular concept. For instance, (1) is specific to definite and (2) is specific to boy.
13. Unlike the weakening principles set forth in the last chapter, which involved a multiplicative change, this simulation simply subtracts one unit of strength. The success of the current simulation illustrates the arbitrariness of the exact strengthening principles.
14. If MacWhinney is right in this debate and there are distinguishing semantic features, the ACT learning mechanisms described in Chapter 5 will apply without this modification to incorporate arbitrary classes.
15. As it turns out, ACT learns the passive as a separate phrase-structure rule rather than as a transformation. Unlike the question transformations to be discussed, the passive does not violate the graph-deformation condition.
16. How one should measure the length of these utterances is an interesting question. We are fairly certain that I need, coming back, and in a minute are fixed phrases for J.J.
17. It is of some interest to consider J.J.'s use of the fourteen morphemes discussed by Brown (1973). He used the ing inflection to describe actions, but he most definitely did not have the present progressive auxiliary (is, are, and so on). He used in and on quite successfully (thank you, Sesame Street). He occasionally used articles, plural inflections, and possessives. Except for ing, all verb auxiliaries and inflections were missing.
18. However, it is not the case that any language is learnable in an ACT-like framework. Given an input, ACT will entertain certain possible hypotheses about the structure of that input, and not others. It is logically impossible for any inductive system to identify all possible languages, given finite input. ACT is no exception. It is an open question whether ACT's preferences about language hypotheses always correspond to human choice. The chapter has shown that it does in the case of the two subsets of English.
References
Begg, I. 1971. Recognition memory for sentence meaning and wording. Journal of Verbal Learning and Verbal Behavior 10, 176-181.
Begg, I., and Paivio, A. 1969. Concreteness and imagery in sentence memory. Journal of Verbal Learning and Verbal Behavior 8, 821-827.
Biederman, I., Glass, A. L., and Stacy, E. W. 1973. Searching for objects in real world scenes. Journal of Experimental Psychology 97, 22-27.
Bjork, R. A., and Allen, T. W. 1970. The spacing effect: consolidation or differential encoding? Journal of Verbal Learning and Verbal Behavior 9, 567-572.
Bloom, L. M. 1970. Language Development: Form and Function in Emerging Grammars. Cambridge, Mass.: MIT Press.
Bobrow, D. G., and Winograd, T. 1977. An overview of KRL, a knowledge representation language. Cognitive Science 1, 3-46.
Boring, E. G. 1950. A History of Experimental Psychology. New York: Appleton-Century.
Bower, G. H. 1972. Mental imagery and associative learning. In L. Gregg, ed., Cognition in Learning and Memory. New York: Wiley.
——— 1977. Contacts of cognitive psychology with social learning theory. Address to the convention of the American Association of Behavior Therapists, Atlanta, Georgia, Dec. 9.
Bower, G. H., Black, J. B., and Turner, T. J. 1979. Scripts in memory for text. Cognitive Psychology 11, 177-220.
Bower, G. H., Karlin, M. B., and Dueck, A. 1975. Comprehension and memory for pictures. Memory and Cognition 3, 216-220.
Bower, G. H., and Springston, F. 1970. Pauses as recoding points in letter series. Journal of Experimental Psychology 83, 421-430.
Bradshaw, G. L., and Anderson, J. R. 1982. Elaborative encoding as an explanation of levels of processing. Journal of Verbal Learning and Verbal Behavior 21, 165-174.
Braine, M. D. S. 1963. The ontogeny of English phrase structure: the first phase. Language 39, 1-13.
Bransford, J. D., Barclay, J. R., and Franks, J. J. 1972. Sentence memory: a constructive versus interpretive approach. Cognitive Psychology 3, 193-209.
Bransford, J. D., and Franks, J. J. 1971. The abstraction of linguistic ideas. Cognitive Psychology 2, 331-350.
Bresnan, J. W. 1981. The Mental Representation of Grammatical Relations. Cambridge, Mass.: MIT Press.
Briggs, G. E. 1974. On the predictor variable for choice reaction time. Memory and Cognition 2, 575-580.
Briggs, G. E., and Blaha, J. 1969. Memory retrieval and central comparison times in information processing. Journal of Experimental Psychology 79, 395-402.
Broadbent, D. E. 1975. The magical number seven after fifteen years. In R. A. Kennedy and A. Wilkes, eds., Studies in Long-term Memory. New York: Wiley.
Brown, J. S., and VanLehn, K. 1980. Repair theory: a generative theory of bugs in procedural skills. Cognitive Science 4, 379-426.
Cooper, L. A., and Podgorny, P. 1976. Mental transformation and visual comparison processes: effects of complexity and similarity. Journal of Experimental Psychology: Human Perception and Performance 2, 503-514.
Cooper, W. E., and Paccia-Cooper, J. 1980. Syntax and Speech. Cambridge, Mass.: Harvard University Press.
Craik, F. I. M., and Lockhart, R. S. 1972. Levels of processing: a framework for memory research. Journal of Verbal Learning and Verbal Behavior 11, 671-684.
Craik, F. I. M., and Watkins, M. J. 1973. The role of rehearsal in short-term memory. Journal of Verbal Learning and Verbal Behavior 12, 599-607.
Crowder, R. G. 1976. Principles of Learning and Memory. Hillsdale, N.J.: Erlbaum Associates.
de Villiers, J. G., and de Villiers, P. A. 1978. Language Acquisition. Cambridge, Mass.: Harvard University Press.
Dosher, B. A. 1976. The retrieval of sentences from memory: a speed-accuracy study. Cognitive Psychology 8, 291-310.
Dyer, F. N. 1973. The Stroop phenomenon and its use in the study of perceptual, cognitive, and response processes. Memory and Cognition 1, 106-120.
Eccles, J. C. 1972. Possible synaptic mechanisms subserving learning. In A. G. Karczmar and J. C. Eccles, eds., Brain and Human Behavior. New York: Springer-Verlag.
Elio, R., and Anderson, J. R. 1981. Effects of category generalizations and instance similarity on schema abstraction. Journal of Experimental Psychology: Human Learning and Memory 7, 397-417.
Erman, L. D., and Lesser, V. R. 1979. The HEARSAY-II speech understanding system. In W. A. Lea, ed., Trends in Speech Recognition. Englewood Cliffs, N.J.: Prentice-Hall.
Estes, W. K. 1960. Learning theory and the new "mental chemistry." Psychological Review 67, 207-223.
Fahlman, S. E. 1981. Representing implicit knowledge. In G. E. Hinton and J. A. Anderson, eds., Parallel Models of Associative Memory. Hillsdale, N.J.: Erlbaum Associates.
Fikes, R. E., and Nilsson, N. 1971. STRIPS: a new approach to the application of theorem proving to problem solving. Artificial Intelligence 2, 189-208.
Fischler, I. 1977. Semantic facilitation without association in a lexical decision task. Memory and Cognition 5, 335-339.
Fischler, I., and Bloom, P. A. 1979. Automatic and attentional processes in the effects of sentence context on word recognition. Journal of Verbal Learning and Verbal Behavior 18, 1-20.
Fischler, I., Bryant, K., and Queens, E. n.d. Rapid priming of episodic and semantic word recognition. Unpublished manuscript.
Fischler, I., and Goodman, G. O. 1978. Latency of associative activation in memory. Journal of Experimental Psychology: Human Perception and Performance 4, 455-470.
Post, E. L. 1943. Formal reductions of the general combinatorial decision problem. American Journal of Mathematics 65, 197-268.
Postman, L. 1964. Short-term memory and incidental learning. In A. W. Melton, ed., Categories of Human Learning. New York: Academic Press.
Potts, G. R. 1972. Information processing strategies used in the encoding of linear orderings. Journal of Verbal Learning and Verbal Behavior 11, 727-740.
——— 1975. Bringing order to cognitive structures. In F. Restle, R. M. Shiffrin, N. J. Castellan, H. R. Lindman, and D. B. Pisoni, eds., Cognitive Theory, vol. 1. Hillsdale, N.J.: Erlbaum Associates.
Pylyshyn, Z. W. 1973. What the mind's eye tells the mind's brain: a critique of mental imagery. Psychological Bulletin 80, 1-24.
——— 1979. The rate of "mental rotation" of images: a test of a holistic analogue hypothesis. Memory and Cognition 7, 19-28.
Quillian, M. R. 1969. The teachable language comprehender. Communications of the ACM 12, 459-476.
Rabinowitz, J. C., Mandler, G., and Barsalou, L. W. 1977. Recognition failure: another case of retrieval failure. Journal of Verbal Learning and Verbal Behavior 16, 639-663.
Ratcliff, R. 1978. A theory of memory retrieval. Psychological Review 85, 59-108.
Ratcliff, R., and McKoon, G. 1981. Does activation really spread? Psychological Review 88, 454-457.
Reddy, D. R., Erman, L. D., Fennell, R. D., and Neely, R. B. 1973. The HEARSAY-I speech understanding system: an example of the recognition process. Proceedings of the International Joint Conference on Artificial Intelligence, Stanford: 185-194.
Reder, L. M. 1979. The role of elaborations in memory for prose. Cognitive Psychology 11, 221-234.
——— 1980. The role of elaboration in the comprehension and retention of prose. Review of Educational Research 50, 5-53.
——— 1982. Plausibility judgment versus fact retrieval: alternative strategies for sentence verification. Psychological Review 89, 250-280.
Index of Authors

Abelson, R. P., 2, 36, 37, 38, 79, 209
Adams, N., 177
Allen, T. W., 172
Allison, T., 191
Anderson, J. A., 15, 27, 35, 36, 179
Anderson, J. R., 6, 13, 15, 17, 18, 23, 27, 38, 39, 45, 46, 49, 52, 59, 70, 71, 72, 74, 76, 78, 79, 80, 86, 87, 96, 107, 111, 112, 113, 114, 116, 120, 128, 135, 143, 145, 164, 175, 177-180, 181, 185, 190, 191-192, 194, 195, 197, 198-199, 200, 203, 205, 206, 207, 224, 226, 235, 237, 238-239, 240, 243, 244, 249, 253, 254, 255, 268, 269, 270, 272, 273, 275, 278, 284, 292
Anderson, R. C., 72, 78
Anderson, R. E., 51
Angiolillo-Bent, J. S., 52-53
Atkinson, R. C., 14, 189
Atwood, D. W., 52
Atwood, M. E., 7
Baddeley, A. D., 23-25, 29, 119, 120
Baillet, S. D., 187
Barclay, J. R., 199
Barsalou, L. W., 196
Bryant, K., 107
Buchanan, M., 25, 29, 119
Buschke, H., 176
Fahlman, S. E., 88
Farrell, R., 255
Feigenbaum, E. A., 143
Fennell, R. D., 128
Fikes, R. E., 2
Fischler, I., 96, 102, 104, 105, 107
Rychener, M. D., 73, 230
Sacerdoti, E. D., 2, 130, 164
Sachs, J., 70
Samuels, S. J., 127
Santa, J. L., 53, 64-67
Sauers, R., 255
Schaffer, M. M., 38, 133, 254
Schallert, D. L., 72, 78
Schank, R. C., 2, 5, 36, 38, 39, 71, 75, 79, 209
Schneider, W., 34, 237, 240
Schorr, D., 177
Schustack, M. W., 96, 105, 205, 269
Schvaneveldt, R. W., 87, 96, 103, 104
Sejnowski, T. J., 92
Selfridge, M., 270, 272
Seymour, P. H. K., 106
Shepard, R. N., 64-66, 85
Shiffrin, R. M., 14, 34, 90, 118, 119, 127, 237, 240
Shortliffe, E. H., 6
Shwartz, S. P., 69
Simon, D. P., 127, 241
Simon, H. A., 2, 6, 25, 62, 127, 128, 143, 157, 162, 241
Slobin, D. I., 292
Smith, E. E., 177
Snyder, C. R., 34, 96, 127, 128
Springston, F., 49
Stephenson, G. M., 199
Sternberg, S., 53, 103, 179
Stevens, A., 62
Stirling, N., 106
Sulin, R. A., 199, 206
Swinney, D. A., 26, 104
Teitelman, W., 5
Thompson, C. P., 196
Thomson, D. M., 192, 194, 195
Thomson, N., 25, 29, 119
Thorndike, E. L., 252
Thorndyke, P. W., 107, 199
Townsend, J. T., 119, 120
Trabasso, T., 54
Tulving, E., 192, 194, 195, 196
Turner, A. A., 7
Turner, T. J., 207
Tweney, R. D., 268
Tzeng, O. J. L., 172
Wanner, H. E., 70
Warren, R. E., 96, 104, 105
Waterman, D. A., 6, 16
Watkins, M. J., 173, 192, 194, 195
Wexler, K., 40, 270, 301
Wheeler, D. D., 33, 152
White, W. A., 52
Wickelgren, W. A., 49, 105, 173
Winograd, T., 2, 36
Winston, P. H., 248
Wiseman, S., 196
Wolford, G., 189, 191
Wollen, K., 191
Woocher, F. D., 54, 63
Woods, W. A., 127
Woodward, A. E., 172
Yegnanarayana, B., 48
Young, R. K., 56
General Index

Associative relatedness, see Connectedness judgments
Attention, 156-157. See also Automatization; Goal structures; Priming paradigm
Automaticity, see Automatization; Priming paradigm
Automatization, 34, 127, 237, 267
Bottom-up processing, see Top-down processing versus bottom-up processing
Cognitive units, 23, 76-78, 91-92, 171-176. See also Knowledge representation
Compilation, see Knowledge compilation
Composition, see Knowledge compilation
Computer programming, 4-5. See also LISP programming
Concept formation, see Prototype formation
Conflict resolution, 21, 125-170; data refractoriness, 134-135, 151-152; degree of match, 132-133, 151; goal dominance, 136-137, 156-157; in neoclassical system, 15-16; production strength, 133-134; specificity, 135-136, 151, 263-264, 278. See also Pattern matching
Connectedness judgments, 73-74, 176-178; judgments of associative relatedness, 176-182. See also Fact retrieval paradigm; Intersection detection
Faculty theory, see Unitary versus faculty theory
Fan effects, 110-111, 112, 114, 115-116, 118-120, 183-187, 199; Sternberg paradigm, 118-120. See also Fact retrieval paradigm
Feedback in learning, 248-249, 252, 270-271, 295-299
Foil rejection, see Fact retrieval paradigm
Frameworks, 12-13
Generalization, see Production tuning
Geometry problem solving, 128, 134-135, 140-143, 218-225, 232-235, 245-247, 258-260
Goal structures, 33-34, 126-137, 156-169; flow of control, 157-161; goal as source of activation, 89-90, 119, 156-157; goal dominance, 136-137; hierarchical, 157-168; intentions, 168-169; logical structure of goal trees, 161-162; memory for, 161-162; planning, 164-168, 226-230, 260; role in learning, 33, 238-239, 249, 255-260. See also Language generation
Graph deformation principle, 272-274
HEARSAY, 128-131. See also Opportunistic planning
Syntax, see Goal structures; Language; Language acquisition; Language generation
Tangled hierarchies, 25, 78-79
Temporal strings: construction, 55-56; encoding process, 48-51; in language, 272; pattern matching, 52-55; purpose, 56; similarity to lists, 53, 55-56; storage, 52; to implement goal structures, 159; versus image representations, 51, 54, 60-61; versus propositional representations, 49-51. See also Representational types
Thematic judgments, see Connectedness judgments
Theories, 12-13
Top-down processing versus bottom-up processing, 126-131. See also Data-driven versus goal-directed processing; Goal structures
Tower of Hanoi problem, 157-162
Transformations, 165-167, 266-267; acquisition, 287-289. See also Goal structures
Tri-code theory, 45-48. See also