The Perceptron: A Probabilistic Model For Information Storage and Organization in The Brain

Psychological Review
Vol. 65, No. 6, 1958
THE PERCEPTRON: A PROBABILISTIC MODEL FOR INFORMATION STORAGE AND ORGANIZATION IN THE BRAIN 1
F. ROSENBLATT
Cornell Aeronautical Laboratory
1 The development of this theory has been carried out at the Cornell Aeronautical Laboratory, Inc., under the sponsorship of the Office of Naval Research, Contract Nonr-2381(00). This article is primarily an adaptation of material reported in Ref. 15, which constitutes the first full report on the program.

If we are eventually to understand the capability of higher organisms for perceptual recognition, generalization, recall, and thinking, we must first have answers to three fundamental questions:

1. How is information about the physical world sensed, or detected, by the biological system?
2. In what form is information stored, or remembered?
3. How does information contained in storage, or in memory, influence recognition and behavior?

The first of these questions is in the province of sensory physiology, and is the only one for which appreciable understanding has been achieved. This article will be concerned primarily with the second and third questions, which are still subject to a vast amount of speculation, and where the few relevant facts currently supplied by neurophysiology have not yet been integrated into an acceptable theory.

With regard to the second question, two alternative positions have been maintained. The first suggests that storage of sensory information is in the form of coded representations or images, with some sort of one-to-one mapping between the sensory stimulus and the stored pattern. According to this hypothesis, if one understood the code or "wiring diagram" of the nervous system, one should, in principle, be able to discover exactly what an organism remembers by reconstructing the original sensory patterns from the "memory traces" which they have left, much as we might develop a photographic negative, or translate the pattern of electrical charges in the "memory" of a digital computer. This hypothesis is appealing in its simplicity and ready intelligibility, and a large family of theoretical brain models has been developed around the idea of a coded, representational memory (2, 3, 9, 14). The alternative approach, which stems from the tradition of British empiricism, hazards the guess that the images of stimuli may never really be recorded at all, and that the central nervous system simply acts as an intricate switching network, where retention takes the form of new connections, or pathways, between centers of activity. In many of the more recent developments of this position (Hebb's "cell assembly," and Hull's "cortical anticipatory goal response," for example) the "responses" which are associated to stimuli may be entirely contained within the CNS itself. In this case the response represents an "idea" rather than an action. The important feature of this approach is that there is never any simple mapping of the stimulus into memory, according to some code which would permit its later reconstruction. Whatever information is retained must somehow be stored as a preference for a particular response; i.e., the information is contained in connections or associations rather than topographic representations. (The term response, for the remainder of this presentation, should be understood to mean any distinguishable state of the organism, which may or may not involve externally detectable muscular activity. The activation of some nucleus of cells in the central nervous system, for example, can constitute a response, according to this definition.)

Corresponding to these two positions on the method of information retention, there exist two hypotheses with regard to the third question, the manner in which stored information exerts its influence on current activity. The "coded memory theorists" are forced to conclude that recognition of any stimulus involves the matching or systematic comparison of the contents of storage with incoming sensory patterns, in order to determine whether the current stimulus has been seen before, and to determine the appropriate response from the organism. The theorists in the empiricist tradition, on the other hand, have essentially combined the answer to the third question with their answer to the second: since the stored information takes the form of new connections, or transmission channels in the nervous system (or the creation of conditions which are functionally equivalent to new connections), it follows that the new stimuli will make use of these new pathways which have been created, automatically activating the appropriate response without requiring any separate process for their recognition or identification.

The theory to be presented here takes the empiricist, or "connectionist" position with regard to these questions. The theory has been developed for a hypothetical nervous system, or machine, called a perceptron. The perceptron is designed to illustrate some of the fundamental properties of intelligent systems in general, without becoming too deeply enmeshed in the special, and frequently unknown, conditions which hold for particular biological organisms. The analogy between the perceptron and biological systems should be readily apparent to the reader.

During the last few decades, the development of symbolic logic, digital computers, and switching theory has impressed many theorists with the functional similarity between a neuron and the simple on-off units of which computers are constructed, and has provided the analytical methods necessary for representing highly complex logical functions in terms of such elements. The result has been a profusion of brain models which amount simply to logical contrivances for performing particular algorithms (representing "recall," stimulus comparison, transformation, and various kinds of analysis) in response to sequences of stimuli, e.g., Rashevsky (14), McCulloch (10), McCulloch & Pitts (11), Culbertson (2), Kleene (8), and Minsky (13). A relatively small number of theorists, like Ashby (1) and von Neumann (17, 18), have been concerned with the problems of how an imperfect neural network, containing many random connections, can be made to perform reliably those functions which might be represented by idealized wiring diagrams. Unfortunately, the language of symbolic logic and Boolean algebra is less well suited for such investigations. The need for a suitable language for the mathematical analysis of events in systems where only the gross organization can be characterized, and the
precise structure is unknown, has led the author to formulate the current model in terms of probability theory rather than symbolic logic.

The theorists referred to above were chiefly concerned with the question of how such functions as perception and recall might be achieved by a deterministic physical system of any sort, rather than how this is actually done by the brain. The models which have been produced all fail in some important respects (absence of equipotentiality, lack of neuroeconomy, excessive specificity of connections and synchronization requirements, unrealistic specificity of stimuli sufficient for cell firing, postulation of variables or functional features with no known neurological correlates, etc.) to correspond to a biological system. The proponents of this line of approach have maintained that, once it has been shown how a physical system of any variety might be made to perceive and recognize stimuli, or perform other brainlike functions, it would require only a refinement or modification of existing principles to understand the working of a more realistic nervous system, and to eliminate the shortcomings mentioned above. The writer takes the position, on the other hand, that these shortcomings are such that a mere refinement or improvement of the principles already suggested can never account for biological intelligence; a difference in principle is clearly indicated. The theory of statistical separability (Cf. 15), which is to be summarized here, appears to offer a solution in principle to all of these difficulties.

Those theorists, namely Hebb (7), Milner (12), Eccles (4), and Hayek (6), who have been more directly concerned with the biological nervous system and its activity in a natural environment, rather than with formally analogous machines, have generally been less exact in their formulations and far from rigorous in their analysis, so that it is frequently hard to assess whether or not the systems that they describe could actually work in a realistic nervous system, and what the necessary and sufficient conditions might be. Here again, the lack of an analytic language comparable in proficiency to the Boolean algebra of the network analysts has been one of the main obstacles. The contributions of this group should perhaps be considered as suggestions of what to look for and investigate, rather than as finished theoretical systems in their own right. Seen from this viewpoint, the most suggestive work, from the standpoint of the following theory, is that of Hebb and Hayek.

The position, elaborated by Hebb (7), Hayek (6), Uttley (16), and Ashby (1), in particular, upon which the theory of the perceptron is based, can be summarized by the following assumptions:

1. The physical connections of the nervous system which are involved in learning and recognition are not identical from one organism to another. At birth, the construction of the most important networks is largely random, subject to a minimum number of genetic constraints.
2. The original system of connected cells is capable of a certain amount of plasticity; after a period of neural activity, the probability that a stimulus applied to one set of cells will cause a response in some other set is likely to change, due to some relatively long-lasting changes in the neurons themselves.
3. Through exposure to a large sample of stimuli, those which are most "similar" (in some sense which must be defined in terms of the particular physical system) will tend
to form pathways to the same sets of responding cells. Those which are markedly "dissimilar" will tend to develop connections to different sets of responding cells.
4. The application of positive and/or negative reinforcement (or stimuli which serve this function) may facilitate or hinder whatever formation of connections is currently in progress.
5. Similarity, in such a system, is represented at some level of the nervous system by a tendency of similar stimuli to activate the same sets of cells. Similarity is not a necessary attribute of particular formal or geometrical classes of stimuli, but depends on the physical organization of the perceiving system, an organization which evolves through interaction with a given environment. The structure of the system, as well as the ecology of the stimulus-environment, will affect, and will largely determine, the classes of "things" into which the perceptual world is divided.

THE ORGANIZATION OF A PERCEPTRON

The organization of a typical photo-perceptron (a perceptron responding to optical patterns as stimuli) is shown in Fig. 1. The rules of its organization are as follows:

FIG. 1. Organization of a perceptron.

1. Stimuli impinge on a retina of sensory units (S-points), which are assumed to respond on an all-or-nothing basis, in some models, or with a pulse amplitude or frequency proportional to the stimulus intensity, in other models. In the models considered here, an all-or-nothing response will be assumed.
2. Impulses are transmitted to a set of association cells (A-units) in a "projection area" ($A_I$). This projection area may be omitted in some models, where the retina is connected directly to the association area ($A_{II}$). The cells in the projection area each receive a number of connections from the sensory points. The set of S-points transmitting impulses to a particular A-unit will be called the origin points of that A-unit. These origin points may be either excitatory or inhibitory in their effect on the A-unit. If the algebraic sum of excitatory and inhibitory impulse intensities is equal to or greater than the threshold ($\theta$) of the A-unit, then the A-unit fires, again on an all-or-nothing basis (or, in some models, which will not be considered here, with a frequency which depends on the net value of the impulses received). The origin points of the A-units in the projection area tend to be clustered or focalized, about some central point, corresponding to each A-unit. The number of origin points falls off exponentially as the retinal distance from the central point for the A-unit in question increases. (Such a distribution seems to be supported by physiological evidence, and serves an important functional purpose in contour detection.)
3. Between the projection area and the association area ($A_{II}$), connections are assumed to be random. That is, each A-unit in the $A_{II}$ set receives some number of fibers from origin points in the $A_I$ set, but these origin points are scattered at random throughout the projection area. Apart from their connection distribution, the $A_{II}$ units are identical with the $A_I$ units, and respond under similar conditions.
4. The "responses," $R_1, R_2, \ldots, R_n$ are cells (or sets of cells) which
respond in much the same fashion as the A-units. Each response has a typically large number of origin points located at random in the $A_{II}$ set. The set of A-units transmitting impulses to a particular response will be called the source-set for that response. (The source-set of a response is identical to its set of origin points in the A-system.) The arrows in Fig. 1 indicate the direction of transmission through the network. Note that up to $A_{II}$ all connections are forward, and there is no feedback. When we come to the last set of connections, between $A_{II}$ and the R-units, connections are established in both directions. The rule governing feedback connections, in most models of the perceptron, can be either of the following alternatives:

(a) Each response has excitatory feedback connections to the cells in its own source-set, or
(b) Each response has inhibitory feedback connections to the complement of its own source-set (i.e., it tends to prohibit activity in any association cells which do not transmit to it).

The first of these rules seems more plausible anatomically, since the R-units might be located in the same cortical area as their respective source-sets, making mutual excitation between the R-units and the A-units of the appropriate source-set highly probable. The alternative rule (b) leads to a more readily analyzed system, however, and will therefore be assumed for most of the systems to be evaluated here.

Figure 2 shows the organization of a simplified perceptron, which affords a convenient entry into the theory of statistical separability. After the theory has been developed for this simplified model, we will be in a better position to discuss the advantages of the system in Fig. 1. The feedback connections shown in Fig. 2 are inhibitory, and go to the complement of the source-set for the response from which they originate; consequently, this system is organized according to Rule b, above. The system shown here has only three stages, the first association stage having been eliminated. Each A-unit has a set of randomly located origin points in the retina. Such a system will form similarity concepts on the basis of coincident areas of stimuli, rather than by the similarity of contours or outlines. While such a system is at a disadvantage in many discrimination experiments, its capability is still quite impressive, as will be demonstrated presently. The system shown in Fig. 2 has only two responses, but there is clearly no limit on the number that might be included.

FIG. 2A. Schematic representation of connections in a simple perceptron. (Broken lines are inhibitory connections.)

FIG. 2B. Venn diagram of the same perceptron (shading shows active sets for $R_1$ response).

The responses in a system organized in this fashion are mutually exclusive. If $R_1$ occurs, it will tend to inhibit $R_2$, and will also inhibit the source-set for $R_2$. Likewise, if $R_2$ should occur, it will tend to inhibit $R_1$. If the total impulse received from all the A-units in one source-set is stronger or more frequent than the impulse received by the alternative (antagonistic) response, then the first response will
tend to gain an advantage over the other, and will be the one which occurs. If such a system is to be capable of learning, then it must be possible to modify the A-units or their connections in such a way that stimuli of one class will tend to evoke a stronger impulse in the $R_1$ source-set than in the $R_2$ source-set, while stimuli of another (dissimilar) class will tend to evoke a stronger impulse in the $R_2$ source-set than in the $R_1$ source-set.

It will be assumed that the impulses delivered by each A-unit can be characterized by a value, $V$, which may be an amplitude, frequency, latency, or probability of completing transmission. If an A-unit has a high value, then all of its output impulses are considered to be more effective, more potent, or more likely to arrive at their endbulbs than impulses from an A-unit with a lower value. The value of an A-unit is considered to be a fairly stable characteristic, probably depending on the metabolic condition of the cell and the cell membrane, but it is not absolutely constant. It is assumed that, in general, periods of activity tend to increase a cell's value, while the value may decay (in some models) with inactivity. The most interesting models are those in which cells are assumed to compete for metabolic materials, the more active cells gaining at the expense of the less active cells. In such a system, if there is no activity, all cells will tend to remain in a relatively constant condition, and (regardless of activity) the net value of the system, taken in
TABLE 1
COMPARISON OF LOGICAL CHARACTERISTICS OF α, β, AND γ SYSTEMS

| | α-System (Uncompensated Gain System) | β-System (Constant Feed System) | γ-System (Parasitic Gain System) |
|---|---|---|---|
| Total value-gain of source set per reinforcement | $N_{ar}$ | $K$ | 0 |
| $\Delta V$ for A-units active for 1 unit of time | +1 | $K/N_{ar}$ | +1 |
| $\Delta V$ for inactive A-units outside of dominant set | 0 | $K/N_{Ar}$ | 0 |
| $\Delta V$ for inactive A-units of dominant set | 0 | 0 | $-N_{ar}/(N_{Ar} - N_{ar})$ |
| Mean value of A-system | Increases with number of reinforcements | Increases with time | Constant |
| Difference between mean values of source-sets | Proportional to differences of reinforcement frequency ($n_{sr_1} - n_{sr_2}$) | 0 | 0 |

Note: In the β and γ systems, the total value-change for any A-unit will be the sum of the $\Delta V$'s for all source-sets of which it is a member.
$N_{ar}$ = Number of active units in source-set
$N_{Ar}$ = Total number of units in source-set
$n_{sr_j}$ = Number of stimuli associated to response $r_j$
$K$ = Arbitrary constant
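The value changes listed in Table 1 can be made concrete with a short sketch. The Python below is only an illustration under stated assumptions, not the procedure of the original report: the function names, the `system` labels, and the treatment of the two edge cases (no active units, or every unit active in the gamma system) are hypothetical choices introduced here. It applies one unit-time reinforcement to the A-units of a single dominant source-set, and does not model the β-system feed to units outside the dominant set.

```python
def reinforce_dominant_set(values, active, system, K=1.0):
    """Apply one unit-time reinforcement to the A-units of the dominant source-set.

    values : list of current unit values V
    active : list of bools, True where the unit responded to the stimulus
    system : "alpha", "beta" or "gamma" (hypothetical labels for the Table 1 columns)
    K      : the arbitrary constant of the beta system
    """
    n_total = len(values)    # N_Ar, total number of units in the source-set
    n_active = sum(active)   # N_ar, number of active units in the source-set
    if n_active == 0:
        return list(values)  # assumption: with no activity there is nothing to reinforce

    if system == "alpha":
        # Active units gain +1; inactive units are unchanged.
        d_active, d_inactive = 1.0, 0.0
    elif system == "beta":
        # The constant feed K is shared among the active units.
        d_active, d_inactive = K / n_active, 0.0
    elif system == "gamma":
        if n_active == n_total:
            # Assumption: no inactive units to draw value from, so leave values as they are.
            return list(values)
        # Active units gain +1 at the expense of the inactive units of the same set.
        d_active = 1.0
        d_inactive = -n_active / (n_total - n_active)
    else:
        raise ValueError("system must be 'alpha', 'beta' or 'gamma'")

    return [v + (d_active if a else d_inactive) for v, a in zip(values, active)]


def total_gain(values, active, system, K=1.0):
    """Total value-gain of the source-set for one reinforcement."""
    new_values = reinforce_dominant_set(values, active, system, K)
    return sum(new_values) - sum(values)


if __name__ == "__main__":
    values = [0.0] * 5
    active = [True, True, False, False, False]
    for name in ("alpha", "beta", "gamma"):
        print(name, reinforce_dominant_set(values, active, name, K=3.0))
```

With two of five units active, the total gain of the source-set comes out as $N_{ar}$ for the alpha system, $K$ for the beta system, and zero for the gamma system, which is the first row of Table 1.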
its entirety, will remain constant at all times. Three types of systems, which differ in their value dynamics, have been investigated quantitatively. Their principal logical features are compared in Table 1. In the alpha system, an active cell simply gains an increment of value for every impulse, and holds this gain indefinitely. In the beta system, each source-set is allowed a certain constant rate of gain, the increments being apportioned among the cells of the source-set in proportion to their activity. In the gamma system, active cells gain in value at the expense of the inactive cells of their source-set, so that the total value of a source-set is always constant.

For purposes of analysis, it is convenient to distinguish two phases in the response of the system to a stimulus (Fig. 3). In the predominant phase, some proportion of A-units (represented by solid dots in the figure) responds to the stimulus, but the R-units are still inactive. This phase is transient, and quickly gives way to the postdominant phase, in which one of the responses becomes active, inhibiting activity in the complement of its own source-set, and thus preventing the occurrence of any alternative response. The response which happens to become dominant is initially random, but if the A-units are reinforced (i.e., if the active units are allowed to gain in value), then when the same stimulus is presented again at a later time, the same response will have a stronger tendency to recur, and learning can be said to have taken place.

FIG. 3. Phases of response to a stimulus. A: Predominant phase. Inhibitory connections are not shown. Solid black units are active. B: Postdominant phase. Dominant subset suppresses rival sets. Inhibitory connections shown only for $R_1$.

ANALYSIS OF THE PREDOMINANT PHASE

The perceptrons considered here will always assume a fixed threshold, $\theta$, for the activation of the A-units. Such a system will be called a fixed-threshold model, in contrast to a continuous transducer model, where the response of the A-unit is some continuous function of the impinging stimulus energy.

In order to predict the learning curves of a fixed-threshold perceptron, two variables have been found to be of primary importance. They are defined as follows:

$P_a$ = the expected proportion of A-units activated by a stimulus of a given size,
$P_c$ = the conditional probability that an A-unit which responds to a given stimulus, $S_1$, will also respond to another given stimulus, $S_2$.

It can be shown (Rosenblatt, 15) that as the size of the retina is increased, the number of S-points ($N_s$) quickly ceases to be a significant parameter, and the values of $P_a$ and $P_c$ approach the value that they would have for a retina with infinitely many points. For a large retina, therefore, the equations are as follows:

$$P_a = \sum_{e=\theta}^{x} \; \sum_{i=0}^{\min(y,\, e-\theta)} P(e,i) \tag{1}$$
where

$$P(e,i) = \binom{x}{e} R^{e} (1-R)^{x-e} \times \binom{y}{i} R^{i} (1-R)^{y-i}$$

and

$R$ = proportion of S-points activated by the stimulus
$x$ = number of excitatory connections to each A-unit
$y$ = number of inhibitory connections to each A-unit
$\theta$ = threshold of A-units.

(The quantities $e$ and $i$ are the excitatory and inhibitory components of the excitation received by the A-unit from the stimulus. If the algebraic sum $\alpha = e + i$, in which the inhibitory component counts as negative so that the net excitation is $e - i$ in terms of the numbers of active origin points, is equal to or greater than $\theta$, the A-unit is assumed to respond.)

$$P_c = \frac{1}{P_a} \sum_{e=\theta}^{x} \; \sum_{i=0}^{\min(y,\, e-\theta)} \; \sum_{l_e=0}^{e} \; \sum_{l_i=0}^{i} \; \sum_{g_e=0}^{x-e} \; \sum_{g_i=0}^{y-i} P(e, i, l_e, l_i, g_e, g_i) \tag{2}$$

$$(e - i - l_e + l_i + g_e - g_i \geq \theta)$$

where

$$P(e, i, l_e, l_i, g_e, g_i) = \binom{x}{e} R^{e} (1-R)^{x-e} \times \binom{y}{i} R^{i} (1-R)^{y-i} \times \binom{e}{l_e} L^{l_e} (1-L)^{e-l_e} \times \binom{i}{l_i} L^{l_i} (1-L)^{i-l_i} \times \binom{x-e}{g_e} G^{g_e} (1-G)^{x-e-g_e} \times \binom{y-i}{g_i} G^{g_i} (1-G)^{y-i-g_i}$$

and

$L$ = proportion of the S-points illuminated by the first stimulus, $S_1$, which are not illuminated by $S_2$
$G$ = proportion of the residual S-set (left over from the first stimulus) which is included in the second stimulus ($S_2$).

The quantities $R$, $L$, and $G$ specify the two stimuli and their retinal overlap. $l_e$ and $l_i$ are, respectively, the numbers of excitatory and inhibitory origin points "lost" by the A-unit when stimulus $S_1$ is replaced by $S_2$; $g_e$ and $g_i$ are the numbers of excitatory and inhibitory origin points "gained" when stimulus $S_1$ is replaced by $S_2$. The summations in Equation 2 are between the limits indicated, subject to the side condition $e - i - l_e + l_i + g_e - g_i \geq \theta$.

Some of the most important characteristics of $P_a$ are illustrated in Fig. 4, which shows $P_a$ as a function of the retinal area illuminated ($R$). Note that $P_a$ can be reduced in magnitude by either increasing the threshold, $\theta$, or by increasing the proportion of inhibitory connections ($y$). A comparison of Fig. 4b and 4c shows that if the excitation is about equal to the inhibition, the curves for $P_a$ as a function of $R$ are flattened out, so that there is little variation in $P_a$ for stimuli of different sizes. This fact is of great importance for systems which require $P_a$ to be close to an optimum value in order to perform properly.

The behavior of $P_c$ is illustrated in Fig. 5 and 6. The curves in Fig. 5 can be compared with those for $P_a$ in Fig. 4. Note that as the threshold is increased, there is an even sharper reduction in the value of $P_c$ than was the case with $P_a$. $P_c$ also decreases as the proportion of inhibitory connections increases, as does $P_a$. Fig. 5, which is
FIG. 4. $P_a$ as function of retinal area illuminated. Horizontal axis: proportion of S-points illuminated ($R$). Panels: (a) effect of inhibitory-excitatory mixture, $\theta = 1$; (b) variation with $\theta$ for $x = 10$, $y = 0$; (c) variation with $x$ and $\theta$ for mixtures about 50% inhibitory (solid lines are for $x = 5$, $y = 5$).
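The two predominant-phase quantities can be evaluated numerically from the definitions of $R$, $L$, $G$, $x$, $y$, and $\theta$ given above. The following Python sketch is an illustration only, written under the large-retina assumption that every origin point is illuminated independently; the function names `binom_pmf`, `p_a`, and `p_c` are hypothetical and do not come from the original report. It sums the binomial terms directly, counting an A-unit as responding when the number of active excitatory points minus the number of active inhibitory points reaches the threshold.

```python
from math import comb


def binom_pmf(n, k, p):
    """Probability of exactly k successes in n independent trials of probability p."""
    return comb(n, k) * p**k * (1 - p) ** (n - k)


def p_a(R, x, y, theta):
    """Expected proportion of A-units activated by a stimulus covering a fraction R."""
    total = 0.0
    for e in range(x + 1):          # active excitatory origin points
        for i in range(y + 1):      # active inhibitory origin points
            if e - i >= theta:      # unit responds to the stimulus
                total += binom_pmf(x, e, R) * binom_pmf(y, i, R)
    return total


def p_c(R, L, G, x, y, theta):
    """Probability that a unit responding to S1 also responds to S2."""
    joint = 0.0
    for e in range(x + 1):
        for i in range(y + 1):
            if e - i < theta:
                continue            # unit does not respond to S1
            p_ei = binom_pmf(x, e, R) * binom_pmf(y, i, R)
            for le in range(e + 1):                  # excitatory points lost
                for li in range(i + 1):              # inhibitory points lost
                    for ge in range(x - e + 1):      # excitatory points gained
                        for gi in range(y - i + 1):  # inhibitory points gained
                            if e - i - le + li + ge - gi >= theta:
                                joint += (
                                    p_ei
                                    * binom_pmf(e, le, L)
                                    * binom_pmf(i, li, L)
                                    * binom_pmf(x - e, ge, G)
                                    * binom_pmf(y - i, gi, G)
                                )
    denom = p_a(R, x, y, theta)
    if denom == 0.0:
        raise ValueError("P_a is zero, so P_c is undefined for these parameters")
    return joint / denom


if __name__ == "__main__":
    # P_a against retinal area for x = 10, y = 0, as in the panel that varies theta.
    for theta in (1, 3, 5):
        print(theta, [round(p_a(R / 10, 10, 0, theta), 4) for R in range(1, 6)])
```

Two checks follow from the definitions: identical stimuli ($L = 0$, $G = 0$) give $P_c = 1$, and with $x = 10$, $y = 0$, $\theta = 10$ the unit must keep all ten excitatory points, so $P_c = (1 - L)^{10}$.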
calculated for nonoverlapping stimuli, illustrates the fact that $P_c$ remains greater than zero even when the stimuli are completely disjunct, and illuminate no retinal points in common. In Fig. 6, the effect of varying amounts of overlap between the stimuli is shown. In all cases, the value of $P_c$ goes to unity as the stimuli approach perfect identity. For smaller stimuli (broken line curves), the value of $P_c$ is lower than for large stimuli. Similarly, the value is less for high thresholds than for low thresholds. The minimum value of $P_c$ will be equal to

$$P_{c\,\min} = (1 - L)^{x} (1 - G)^{y}. \tag{3}$$

In Fig. 6, $P_{c\,\min}$ corresponds to the curve for $\theta = 10$. Note that under these conditions the probability that the A-unit responds to both stimuli ($P_c$) is practically zero, except for stimuli which are quite close to identity. This condition can be of considerable help in discrimination learning.

MATHEMATICAL ANALYSIS OF LEARNING IN THE PERCEPTRON

The response of the perceptron in the predominant phase, where some fraction of the A-units (scattered throughout the system) responds to the stimulus, quickly gives way to the postdominant response, in which activity is limited to a single source-set, the other sets being suppressed. Two possible systems have been studied for the determination of the "dominant" response, in the postdominant phase. In one (the mean-discriminating system, or μ-system), the response whose inputs have the greatest mean value responds first, gaining a slight advantage over the others, so that it quickly becomes dominant. In the second case (the sum-discriminating system, or Σ-system), the response whose inputs have the greatest net value gains an advantage. In most cases, systems which respond to mean values have an advantage over systems which respond to sums, since the means are
less influenced by random variations in $P_a$ from one source-set to another. In the case of the γ-system (see Table 1), however, the performance of the μ-system and Σ-system become identical.

FIG. 5. $P_c$ as a function of $R$ (retinal area illuminated), for nonoverlapping stimuli.

FIG. 6. $P_c$ as a function of $C$ (proportion of overlap between stimuli). $x = 10$, $y = 0$. Solid lines: $R = .5$; broken lines: $R = .2$.

We have indicated that the perceptron is expected to learn, or to form associations, as a result of the changes in value that occur as a result of the activity of the association cells. In evaluating this learning, one of two types of hypothetical experiments can be considered. In the first case, the perceptron is exposed to some series of stimulus patterns (which might be presented in random positions on the retina) and is "forced" to give the desired response in each case. (This forcing of responses is assumed to be a prerogative of the experimenter. In experiments intended to evaluate trial-and-error learning, with more sophisticated perceptrons, the experimenter does not force the system to respond in the desired fashion, but merely applies positive reinforcement when the response happens to be correct, and negative reinforcement when the response is wrong.) In evaluating the learning which has taken place during this "learning series," the perceptron is assumed to be "frozen" in its current condition, no further value changes being allowed, and the same series of stimuli is presented again in precisely the same fashion, so that the stimuli fall on identical positions on the retina. The probability that the perceptron will show a bias towards the "correct" response (the one which has been previously reinforced during the learning series) in preference to any given alternative response is called $P_r$, the probability of correct choice of response between two alternatives.

In the second type of experiment, a learning series is presented exactly as before, but instead of evaluating the perceptron's performance using the same series of stimuli which were shown before, a new series is presented, in which stimuli may be drawn from the same classes that were previously experienced, but are not necessarily identical. This new test series is assumed to be composed of stimuli projected onto random retinal positions, which are chosen independently of the positions selected for the learning series. The stimuli of the test series may also differ in size or rotational position from the stimuli which were previously experienced. In this case, we are interested in the probability that the perceptron will give the correct response for the class of stimuli which is represented, regardless of whether the particular stimulus has been seen before or not. This probability is called $P_g$, the probability of correct generalization. As with $P_r$, $P_g$ is actually the probability that a bias will be found in favor of the proper response rather than any one alternative; only one pair of responses at a time is considered, and the fact that the response bias is correct in one pair does not mean that there may not be other pairs in which the bias favors the wrong response. The probability that the correct response will be preferred over all alternatives is designated $P_R$ or $P_G$.

In all cases investigated, a single general equation gives a close approximation to $P_r$ and $P_g$, if the appropriate constants are substituted. This equation is of the form:

$$P = P(N_{ar} > 0) \cdot \phi(Z) \tag{4}$$

where

$$P(N_{ar} > 0) = 1 - (1 - P_a)^{N_e}$$

$\phi(Z)$ = normal curve integral from $-\infty$ to $Z$

and

$$Z = \frac{c_1 n_{sr} + c_2}{\sqrt{c_3 n_{sr}^2 + c_4 n_{sr}}}$$

If $R_1$ is the "correct" response, and $R_2$ is the alternative response under consideration, Equation 4 is the probability that $R_1$ will be preferred over $R_2$ after $n_{sr}$ stimuli have been shown for each of the two responses, during the learning period. $N_e$ is the number of "effective" A-units in each source-set; that is, the number of A-units in either source-set which are not connected in common to both responses. Those units which are connected in common contribute equally to both sides of the value balance, and consequently do not affect the net bias towards one response or the other. $N_{ar}$ is the number of active units in a source-set, which respond to the test stimulus, $S_t$. $P(N_{ar} > 0)$ is the probability that at least one of the $N_e$ effective units in the source-set of the correct response (designated, by convention, as the $R_1$ response) will be activated by the test stimulus, $S_t$.

In the case of $P_g$, the constant $c_2$ is always equal to zero, the other three constants being the same as for $P_r$. The values of the four constants depend on the parameters of the physical nerve net (the perceptron) and also on the organization of the stimulus environment.

The simplest cases to analyze are those in which the perceptron is shown stimuli drawn from an "ideal environment," consisting of randomly placed points of illumination, where there is no attempt to classify stimuli according to intrinsic similarity. Thus, in a typical learning experiment, we might show the perceptron 1,000 stimuli made up of random collections of illuminated retinal points, and we might arbitrarily reinforce $R_1$ as the "correct" response for the first 500 of these, and $R_2$ for the remaining 500. This environment is "ideal" only in the sense that we speak of an ideal gas in physics; it is a convenient artifact for purposes of analysis, and does not lead to the best performance from the perceptron. In the ideal environment situation, the constant $c_1$ is always equal to zero, so that, in the
case of P_g (where c_2 is also zero), the value of Z will be zero, and P_g can never be any better than the random expectation of 0.5. The evaluation of P_r for these conditions, however, throws some interesting light on the differences between the alpha, beta, and gamma systems (Table 1).

First consider the alpha system, which has the simplest dynamics of the three. In this system, whenever an A-unit is active for one unit of time, it gains one unit of value. We will assume an experiment, initially, in which n_sr (the number of stimuli associated to each response) is constant for all responses. In this case, for the sum system,

[Equation 5: the coefficient expressions are not legible in this copy.]

where ω = the fraction of responses connected to each A-unit. If the source-sets are disjunct, ω = 1/N_R, where N_R is the number of responses in the system. For the μ-system,

[Equation 6: the coefficient expressions are not legible in this copy.]

The reduction of c_3 to zero gives the μ-system a definite advantage over the Σ-system. Typical learning curves for these systems are compared in Fig. 7 and 8. Figure 9 shows the effect of variations in P_a upon the performance of the system.

FIG. 7. P_r(Σ) as function of ωn_sr, for discrete subsets. (ω_c = 0, P_a = .005. Ideal environment assumed.)

FIG. 8. P_r(μ) as function of ωn_sr. (For P_a = .07, ω_c = 0. Ideal environment assumed.)

FIG. 9. P_r(μ) as function of P_a. (For n_sr = 1,000, ω_c = 0. Ideal environment assumed.)

If n_sr, instead of being fixed, is treated as a random variable, so that the number of stimuli associated to each response is drawn separately from some distribution, then the performance of the α-system is considerably poorer than the above equations indicate. Under these conditions, the constants for the μ-system are

c_1 = 0
c_2 = 1 - P_a
c_3 = [not legible in this copy]                    (7)
c_4 = 2(1 - P_a) / [denominator not legible in this copy]

where
q = ratio of σ_nsr to n_sr
N_R = number of responses in the system
N_A = number of A-units in the system
ω_c = proportion of A-units common to R_1 and R_2.

For this equation (and any others in which n_sr is treated as a random variable), it is necessary to define n_sr in Equation 4 as the expected value of this variable, over the set of all responses.

For the β-system, there is an even greater deficit in performance, due to the fact that the net value continues to grow regardless of what happens to the system. The large net values of the subsets activated by a stimulus tend to amplify small statistical differences, causing an unreliable performance. The constants in this case (again for the μ-system) are

c_1 = 0
c_2 = (1 - P_a) N_e
c_3 = 2(P_a N_e q ω N_R)^2 [exponents partly illegible in this copy]    (8)
c_4 = 2(1 - ... [truncated in this copy]

In both the alpha and beta systems, performance will be poorer for the sum-discriminating model than for the mean-discriminating case. In the gamma-system, however, it can be shown that P_r(Σ) = P_r(μ); i.e., it makes no difference in performance whether the Σ-system or μ-system is used. Moreover, the constants for the γ-system, with variable n_sr, are identical to the constants for the alpha μ-system, with n_sr fixed (Equation 6). The performance of the three systems is compared in Fig. 10, which clearly demonstrates the advantage of the γ-system.

FIG. 10. Comparison of α, β, and γ systems, for variable n_sr. (N_R = 100, σ_nsr = .5 n_sr, N_A = 10,000, P_a = .07, ω = .2.)

Let us now replace the "ideal environment" assumptions with a model for a "differentiated environment," in which several distinguishable classes of stimuli are present (such as squares, circles, and triangles, or the letters of the alphabet). If we then design an experiment in which the stimuli associated to each response are drawn from a different class, then the learning curves of the perceptron are drastically altered. The most important difference is that the constant c_1 (the coefficient of n_sr in the numerator of Z) is no longer equal to zero, so that Equation 4 now has a nonrandom asymptote. Moreover, in the form for P_g (the probability of correct generalization), where c_2 = 0, the quantity Z remains greater than zero, and P_g actually approaches the same asymptote as P_r. Thus the equation for the perceptron's performance after infinite experience with each class of stimuli is identical for P_r and P_g:

$$P_{r\infty} = P_{g\infty} = \left[1 - (1 - P_a)^{N_e}\right] \phi(Z_\infty), \qquad Z_\infty = \frac{c_1}{\sqrt{c_3}} \tag{9}$$

This means that in the limit it makes no difference whether the perceptron has seen a particular test stimulus before or not; if the stimuli are drawn from a differentiated environment, the performance will be equally good in either case.

In order to evaluate the performance of the system in a differentiated environment, it is necessary to define the quantity P_cαβ. This quantity is interpreted as the expected value of P_c between pairs of stimuli drawn at random from classes α and β. In particular, P_c11 is the expected value of P_c between members of the same class, and P_c12 is the expected value of P_c between an S_1 stimulus drawn from Class 1 and an S_2 stimulus drawn from Class 2. P_c1x is the expected value of P_c between members of Class 1 and stimuli drawn at random from all other classes in the environment.

If P_c11 > P_a > P_c12, the limiting performance of the perceptron (P_R∞) will be better than chance, and learning of some response, R_1, as the proper "generalization response" for members of Class 1 should eventually occur. If the above inequality is not met, then improvement over chance performance may not occur, and the Class 2 response is likely to occur instead. It can be shown (15) that for most simple geometrical forms, which we ordinarily regard as "similar," the required inequality can be met, if the parameters of the system are properly chosen.

The equation for P_r, for the sum-discriminating version of an α-perceptron, in a differentiated environment where n_sr is fixed for all responses, will have the following expressions for the four coefficients:

[Equation 10: only c_2 = P_a N_e (1 - P_c11) is legible in this copy; the expressions for c_1, c_3, and c_4, the last two of which involve sums over r = 1, 2, are not.]

where
σ_s²(P_c1r) and σ_s²(P_c1x) represent the variance of P_c1r and P_c1x measured over the set of possible test stimuli, S_t, and
σ_j²(P_c1r) and σ_j²(P_c1x) represent the variance of P_c1r and P_c1x measured over the set of all A-units,
ε = covariance of P_c1r P_c1x, which is assumed to be negligible.

The variances which appear in these expressions have not yielded, thus far, to a precise analysis, and can be treated as empirical variables to be determined for the classes of stimuli in question. If the sigma is set equal to half the expected value of the variable, in each case, a conservative estimate can be obtained. When the stimuli of a given class are all of the same shape, and uniformly distributed over the retina, the subscript s variances are equal to zero. P_g will be represented by the same set of coefficients, except for c_2, which is equal to zero, as usual.

For the mean-discriminating system, the coefficients are:

[Equation 11: the coefficient expressions are not legible in this copy.]

Some covariance terms, which are considered negligible, have been omitted here.

FIG. 11. P_r and P_g as function of n_sr. Parameters based on square-circle discrimination.

A set of typical learning curves for the differentiated environment model is shown in Fig. 11, for the mean-discriminating system. The parameters are based on measurements for a
square-circle discrimination problem. Note that the curves for P_r and P_g both approach the same asymptotes, as predicted. The values of these asymptotes can be obtained by substituting the proper coefficients in Equation 9. As the number of association cells in the system increases, the asymptotic learning limit rapidly approaches unity, so that for a system of several thousand cells, the errors in performance should be negligible on a problem as simple as the one illustrated here.

As the number of responses in the system increases, the performance becomes progressively poorer, if every response is made mutually exclusive of all alternatives. One method of avoiding this deterioration (described in detail in Rosenblatt, 15) is through the binary coding of responses. In this case, instead of representing 100 different stimulus patterns by 100 distinct, mutually exclusive responses, a limited number of discriminating features is found, each of which can be independently recognized as being present or absent, and consequently can be represented by a single pair of mutually exclusive responses. Given an ideal set of binary characteristics (such as dark, light; tall, short; straight, curved; etc.), 100 stimulus classes could be distinguished by the proper configuration of only seven response pairs. In a further modification of the system, a single response is capable of denoting by its activity or inactivity the presence or absence of each binary characteristic. The efficiency of such coding depends on the number of independently recognizable "earmarks" that can be found to differentiate stimuli. If the stimulus can be identified only in its entirety and is not amenable to such analysis, then ultimately a separate binary response pair, or bit, is required to denote the presence or absence of each stimulus class (e.g., "dog" or "not dog"), and nothing has been gained over a system where all responses are mutually exclusive.

BIVALENT SYSTEMS

In all of the systems analyzed up to this point, the increments of value gained by an active A-unit, as a result of reinforcement or experience, have always been positive, in the sense that an active unit has always gained in its power to activate the responses to which it is connected. In the gamma-system, it is true that some units lose value, but these are always the inactive units, the active ones gaining in proportion to their rate of activity. In a bivalent system, two types of reinforcement are possible (positive and negative), and an active unit may either gain or lose in value, depending on the momentary state of affairs in the system. If the positive and negative reinforcement can be controlled by the application of external stimuli, they become essentially equivalent to "reward" and "punishment," and can be used in this sense by the experimenter. Under these conditions, a perceptron appears to be capable of trial-and-error learning. A bivalent system need not necessarily involve the application of reward and punishment, however. If a binary-coded response system is so organized that there is a single response or response-pair to represent each "bit," or stimulus characteristic that is learned, with positive feedback to its own source-set if the response is "on," and negative feedback (in the sense that active A-units will lose rather than gain in value) if the response is "off," then the system is still bivalent in its characteristics. Such a bivalent
system is particularly efficient in reducing some of the bias effects (preference for the wrong response due to greater size or frequency of its associated stimuli) which plague the alternative systems.

Several forms of bivalent systems have been considered (15, Chap. VII). The most efficient of these has the following logical characteristics.

If the system is under a state of positive reinforcement, then a positive ΔV is added to the values of all active A-units in the source-sets of "on" responses, while a negative ΔV is added to the active units in the source-sets of "off" responses. If the system is currently under negative reinforcement, then a negative ΔV is added to all active units in the source-set of an "on" response, and a positive ΔV is added to active units in an "off" source-set. If the source-sets are disjunct (which is essential for this system to work properly), the equation for a bivalent γ-system has the same coefficients as the monovalent α-system, for the μ-case (Equation 11).

The performance curves for this system are shown in Fig. 12, where the asymptotic generalization probability attainable by the system is plotted for the same stimulus parameters that were used in Fig. 11. This is the probability that all bits in an n-bit response pattern will be correct. Clearly, if a majority of correct responses is sufficient to identify a stimulus correctly, the performance will be better than these curves indicate.

FIG. 12. P_g∞ for a bivalent binary system (same parameters as Fig. 11).

In a form of bivalent system which utilizes more plausible biological assumptions, A-units may be either excitatory or inhibitory in their effect on connected responses. A positive ΔV in this system corresponds to the incrementing of an excitatory unit, while a negative ΔV corresponds to the incrementing of an inhibitory unit. Such a system performs similarly to the one considered above, but can be shown to be less efficient.

Bivalent systems similar to those illustrated in Fig. 12 have been simulated in detail in a series of experiments with the IBM 704 computer at the Cornell Aeronautical Laboratory. The results have borne out the theory in all of its main predictions, and will be reported separately at a later time.

IMPROVED PERCEPTRONS AND SPONTANEOUS ORGANIZATION

The quantitative analysis of perceptron performance in the preceding sections has omitted any consideration of time as a stimulus dimension. A perceptron which has no capability for temporal pattern recognition is referred to as a "momentary stimulus perceptron." It can be shown (15) that the same principles of statistical separability will permit the perceptron to distinguish velocities, sound sequences, etc., provided the stimuli leave some temporarily persistent trace, such as an altered threshold,
which causes the activity in the A-system at time t to depend to some degree on the activity at time t - 1.

It has also been assumed that the origin points of A-units are completely random. It can be shown that by a suitable organization of origin points, in which the spatial distribution is constrained (as in the projection area origins shown in Fig. 1), the A-units will become particularly sensitive to the location of contours, and performance will be improved.

In a recent development, which we hope to report in detail in the near future, it has been proven that if the values of the A-units are allowed to decay at a rate proportional to their magnitude, a striking new property emerges: the perceptron becomes capable of "spontaneous" concept formation. That is to say, if the system is exposed to a random series of stimuli from two "dissimilar" classes, and all of its responses are automatically reinforced without any regard to whether they are "right" or "wrong," the system will tend towards a stable terminal condition in which (for each binary response) the response will be "1" for members of one stimulus class, and "0" for members of the other class; i.e., the perceptron will spontaneously recognize the difference between the two classes. This phenomenon has been successfully demonstrated in simulation experiments, with the 704 computer.

A perceptron, even with a single logical level of A-units and response units, can be shown to have a number of interesting properties in the field of selective recall and selective attention. These properties generally depend on the intersection of the source sets for different responses, and are elsewhere discussed in detail (15). By combining audio and photo inputs, it is possible to associate sounds, or auditory "names" to visual objects, and to get the perceptron to perform such selective responses as are designated by the command "Name the object on the left," or "Name the color of this stimulus."

The question may well be raised at this point of where the perceptron's capabilities actually stop. We have seen that the system described is sufficient for pattern recognition, associative learning, and such cognitive sets as are necessary for selective attention and selective recall. The system appears to be potentially capable of temporal pattern recognition, as well as spatial recognition, involving any sensory modality or combination of modalities. It can be shown that with proper reinforcement it will be capable of trial-and-error learning, and can learn to emit ordered sequences of responses, provided its own responses are fed back through sensory channels.

Does this mean that the perceptron is capable, without further modification in principle, of such higher order functions as are involved in human speech, communication, and thinking? Actually, the limit of the perceptron's capabilities seems to lie in the area of relative judgment, and the abstraction of relationships. In its "symbolic behavior," the perceptron shows some striking similarities to Goldstein's brain-damaged patients (5). Responses to definite, concrete stimuli can be learned, even when the proper response calls for the recognition of a number of simultaneous qualifying conditions (such as naming the color if the stimulus is on the left, the shape if it is on the right). As soon as the response calls for the recognition of a relationship between stimuli (such as "Name the object left of the square." or "Indicate the pattern that appeared before the circle."), however, the
problem generally becomes excessively difficult for the perceptron. Statistical separability alone does not provide a sufficient basis for higher order abstraction. Some system, more advanced in principle than the perceptron, seems to be required at this point.

CONCLUSIONS AND EVALUATION

The main conclusions of the theoretical study of the perceptron can be summarized as follows:

1. In an environment of random stimuli, a system consisting of randomly connected units, subject to the parametric constraints discussed above, can learn to associate specific responses to specific stimuli. Even if many stimuli are associated to each response, they can still be recognized with a better-than-chance probability, although they may resemble one another closely and may activate many of the same sensory inputs to the system.

2. In such an "ideal environment," the probability of a correct response diminishes towards its original random level as the number of stimuli learned increases.

3. In such an environment, no basis for generalization exists.

4. In a "differentiated environment," where each response is associated to a distinct class of mutually correlated, or "similar" stimuli, the probability that a learned association of some specific stimulus will be correctly retained typically approaches a better-than-chance asymptote as the number of stimuli learned by the system increases. This asymptote can be made arbitrarily close to unity by increasing the number of association cells in the system.

5. In the differentiated environment, the probability that a stimulus which has not been seen before will be correctly recognized and associated to its appropriate class (the probability of correct generalization) approaches the same asymptote as the probability of a correct response to a previously reinforced stimulus. This asymptote will be better than chance if the inequality P_c12 < P_a < P_c11 is met, for the stimulus classes in question.

6. The performance of the system can be improved by the use of a contour-sensitive projection area, and by the use of a binary response system, in which each response, or "bit," corresponds to some independent feature or attribute of the stimulus.

7. Trial-and-error learning is possible in bivalent reinforcement systems.

8. Temporal organizations of both stimulus patterns and responses can be learned by a system which uses only an extension of the original principles of statistical separability, without introducing any major complications in the organization of the system.

9. The memory of the perceptron is distributed, in the sense that any association may make use of a large proportion of the cells in the system, and the removal of a portion of the association system would not have an appreciable effect on the performance of any one discrimination or association, but would begin to show up as a general deficit in all learned associations.

10. Simple cognitive sets, selective recall, and spontaneous recognition of the classes present in a given environment are possible. The recognition of relationships in space and time, however, seems to represent a limit to the perceptron's ability to form cognitive abstractions.

Psychologists, and learning theorists in particular, may now ask: "What has the present theory accomplished,
beyond what has already been done in the quantitative theories of Hull, Bush and Mosteller, etc., or physiological theories such as Hebb's?" The present theory is still too primitive, of course, to be considered as a full-fledged rival of existing theories of human learning. Nonetheless, as a first approximation, its chief accomplishment might be stated as follows:

For a given mode of organization (α, β, or γ; Σ or μ; monovalent or bivalent) the fundamental phenomena of learning, perceptual discrimination, and generalization can be predicted entirely from six basic physical parameters, namely:

x: the number of excitatory connections per A-unit,
y: the number of inhibitory connections per A-unit,
θ: the expected threshold of an A-unit,
ω: the proportion of R-units to which an A-unit is connected,
N_A: the number of A-units in the system, and
N_R: the number of R-units in the system.

N_S (the number of sensory units) becomes important if it is very small. It is assumed that the system begins with all units in a uniform state of value; otherwise the initial value distribution would also be required. Each of the above parameters is a clearly defined physical variable, which is measurable in its own right, independently of the behavioral and perceptual phenomena which we are trying to predict.

As a direct consequence of its foundation on physical variables, the present system goes far beyond existing learning and behavior theories in three main points: parsimony, verifiability, and explanatory power and generality. Let us consider each of these points in turn.

1. Parsimony. Essentially all of the basic variables and laws used in this system are already present in the structure of physical and biological science, so that we have found it necessary to postulate only one hypothetical variable (or construct) which we have called V, the "value" of an association cell; this is a variable which must conform to certain functional characteristics which can clearly be stated, and which is assumed to have a potentially measurable physical correlate.

2. Verifiability. Previous quantitative learning theories, apparently without exception, have had one important characteristic in common: they have all been based on measurements of behavior, in specified situations, using these measurements (after theoretical manipulation) to predict behavior in other situations. Such a procedure, in the last analysis, amounts to a process of curve fitting and extrapolation, in the hope that the constants which describe one set of curves will hold good for other curves in other situations. While such extrapolation is not necessarily circular, in the strict sense, it shares many of the logical difficulties of circularity, particularly when used as an "explanation" of behavior. Such extrapolation is difficult to justify in a new situation, and it has been shown that if the basic constants and parameters are to be derived anew for any situation in which they break down empirically (such as change from white rats to humans), then the basic "theory" is essentially irrefutable, just as any successful curve-fitting equation is irrefutable. It has, in fact, been widely conceded by psychologists that there is little point in trying to "disprove" any of the major learning theories in use today, since by extension, or a change in parameters, they
have all proved capable of adapting to any specific empirical data. This is epitomized in the increasingly common attitude that a choice of theoretical model is mostly a matter of personal aesthetic preference or prejudice, each scientist being entitled to a favorite model of his own. In considering this approach, one is reminded of a remark attributed to Kistiakowsky, that "given seven parameters, I could fit an elephant." This is clearly not the case with a system in which the independent variables, or parameters, can be measured independently of the predicted behavior. In such a system, it is not possible to "force" a fit to empirical data, if the parameters in current use should lead to improper results. In the current theory, a failure to fit a curve in a new situation would be a clear indication that either the theory or the empirical measurements are wrong. Consequently, if such a theory does hold up for repeated tests, we can be considerably more confident of its validity and of its generality than in the case of a theory which must be hand-tailored to meet each situation.

3. Explanatory power and generality. The present theory, being derived from basic physical variables, is not specific to any one organism or learning situation. It can be generalized in principle to cover any form of behavior in any system for which the physical parameters are known. A theory of learning, constructed on these foundations, should be considerably more powerful than any which has previously been proposed. It would not only tell us what behavior might occur in any known organism, but would permit the synthesis of behaving systems, to meet special requirements. Other learning theories tend to become increasingly qualitative as they are generalized. Thus a set of equations describing the effects of reward on T-maze learning in a white rat reduces simply to a statement that rewarded behavior tends to occur with increasing probability, when we attempt to generalize it from any species and any situation. The theory which has been presented here loses none of its precision through generality.

The theory proposed by Donald Hebb (7) attempts to avoid these difficulties of behavior-based models by showing how psychological functioning might be derived from neurophysiological theory. In his attempt to achieve this, Hebb's philosophy of approach seems close to our own, and his work has been a source of inspiration for much of what has been proposed here. Hebb, however, has never actually achieved a model by which behavior (or any psychological data) can be predicted from the physiological system. His physiology is more a suggestion as to the sort of organic substrate which might underlie behavior, and an attempt to show the plausibility of a bridge between biophysics and psychology.

The present theory represents the first actual completion of such a bridge. Through the use of the equations in the preceding sections, it is possible to predict learning curves from neurological variables, and likewise, to predict neurological variables from learning curves. How well this bridge stands up to repeated crossings remains to be seen. In the meantime, the theory reported here clearly demonstrates the feasibility and fruitfulness of a quantitative statistical approach to the organization of cognitive systems. By the study of systems such as the perceptron, it is hoped that those fundamental laws of organization which are common to all information handling systems, machines and men included, may eventually be understood.

REFERENCES

1. ASHBY, W. R. Design for a brain. New York: Wiley, 1952.
2. CULBERTSON, J. T. Consciousness and behavior. Dubuque, Iowa: Wm. C. Brown, 1950.
3. CULBERTSON, J. T. Some uneconomical robots. In C. E. Shannon & J. McCarthy (Eds.), Automata studies. Princeton: Princeton Univer. Press, 1956. Pp. 99-116.
4. ECCLES, J. C. The neurophysiological basis of mind. Oxford: Clarendon, 1953.
5. GOLDSTEIN, K. Human nature in the light of psychopathology. Cambridge: Harvard Univer. Press, 1940.
6. HAYEK, F. A. The sensory order. Chicago: Univer. Chicago Press, 1952.
7. HEBB, D. O. The organization of behavior. New York: Wiley, 1949.
8. KLEENE, S. C. Representation of events in nerve nets and finite automata. In C. E. Shannon & J. McCarthy (Eds.), Automata studies. Princeton: Princeton Univer. Press, 1956. Pp. 3-41.
9. KOHLER, W. Relational determination in perception. In L. A. Jeffress (Ed.), Cerebral mechanisms in behavior. New York: Wiley, 1951. Pp. 200-243.
10. MCCULLOCH, W. S. Why the mind is in the head. In L. A. Jeffress (Ed.), Cerebral mechanisms in behavior. New York: Wiley, 1951. Pp. 42-111.
11. MCCULLOCH, W. S., & PITTS, W. A logical calculus of the ideas immanent in nervous activity. Bull. math. Biophysics, 1943, 5, 115-133.
12. MILNER, P. M. The cell assembly: Mark II. Psychol. Rev., 1957, 64, 242-252.
13. MINSKY, M. L. Some universal elements for finite automata. In C. E. Shannon & J. McCarthy (Eds.), Automata studies. Princeton: Princeton Univer. Press, 1956. Pp. 117-128.
14. RASHEVSKY, N. Mathematical biophysics. Chicago: Univer. Chicago Press, 1938.
15. ROSENBLATT, F. The perceptron: A theory of statistical separability in cognitive systems. Buffalo: Cornell Aeronautical Laboratory, Inc. Rep. No. VG-1196-G-1, 1958.
16. UTTLEY, A. M. Conditional probability machines and conditioned reflexes. In C. E. Shannon & J. McCarthy (Eds.), Automata studies. Princeton: Princeton Univer. Press, 1956. Pp. 253-275.
17. VON NEUMANN, J. The general and logical theory of automata. In L. A. Jeffress (Ed.), Cerebral mechanisms in behavior. New York: Wiley, 1951. Pp. 1-41.
18. VON NEUMANN, J. Probabilistic logics and the synthesis of reliable organisms from unreliable components. In C. E. Shannon & J. McCarthy (Eds.), Automata studies. Princeton: Princeton Univer. Press, 1956. Pp. 43-98.

(Received April 23, 1958)
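
The general performance equation (Equation 4) lends itself to a short numerical illustration. The Python sketch below is not part of the original analysis. It assumes that Z has the form (c_1 n_sr + c_2) / sqrt(c_3 n_sr^2 + c_4 n_sr), which is consistent with the statements that c_1 is the coefficient of n_sr in the numerator of Z and that Z is zero when c_1 and c_2 are both zero. The function names and all sample constants are hypothetical and chosen only for illustration.

```python
import math


def normal_cdf(z: float) -> float:
    """Normal curve integral from -infinity to z (phi(Z) in Equation 4)."""
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))


def prob_correct(n_sr: float, p_a: float, n_e: int,
                 c1: float, c2: float, c3: float, c4: float) -> float:
    """Evaluate P = P(N_ar > 0) * phi(Z) under the assumed form of Z.

    n_sr : number of stimuli associated to each response
    p_a  : probability that an A-unit is activated by a stimulus
    n_e  : number of effective A-units in the source-set
    c1..c4 : the four constants of Equation 4 (illustrative values only)
    """
    # Probability that at least one of the n_e effective units is active.
    p_any_active = 1.0 - (1.0 - p_a) ** n_e
    radicand = c3 * n_sr ** 2 + c4 * n_sr
    if radicand <= 0.0:
        raise ValueError("c3 * n_sr**2 + c4 * n_sr must be positive")
    z = (c1 * n_sr + c2) / math.sqrt(radicand)
    return p_any_active * normal_cdf(z)


if __name__ == "__main__":
    # Ideal environment (c1 = 0): performance decays toward chance.
    for n in (10, 100, 1000, 10000):
        print(n, round(prob_correct(n, 0.07, 100, 0.0, 1.0, 0.0, 1.0), 4))
```

With c_1 = 0 the value of Z shrinks as n_sr grows, so P falls toward half of P(N_ar > 0), which mirrors the ideal-environment result. With c_1 > 0 and c_3 > 0 the same function levels off at a nonrandom asymptote, as described for the differentiated environment.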

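
The reinforcement rule of the most efficient bivalent system, and the count of response pairs needed under binary coding, can also be sketched in a few lines. The Python below is a minimal illustration under stated assumptions, not the simulation program run on the IBM 704. It assumes disjunct source-sets (which the text calls essential for this system), represents A-unit values as a plain list, and uses hypothetical names (`bivalent_update`, `response_pairs_needed`) and a hypothetical fixed increment `delta_v`.

```python
from typing import Dict, List, Set


def response_pairs_needed(n_classes: int) -> int:
    """Smallest number of binary response pairs that can label n_classes."""
    if n_classes < 1:
        raise ValueError("n_classes must be at least 1")
    return (n_classes - 1).bit_length()


def bivalent_update(values: List[float],
                    active: Set[int],
                    source_sets: Dict[str, Set[int]],
                    response_on: Dict[str, bool],
                    positive: bool,
                    delta_v: float = 1.0) -> List[float]:
    """Apply one bivalent reinforcement step in place and return values.

    Positive reinforcement: active units in the source-sets of "on"
    responses gain delta_v; active units in "off" source-sets lose it.
    Negative reinforcement reverses both signs. Inactive units are
    left unchanged. Source-sets are assumed to be disjunct.
    """
    sign_for_on = 1.0 if positive else -1.0
    for response, units in source_sets.items():
        sign = sign_for_on if response_on[response] else -sign_for_on
        for unit in units & active:
            values[unit] += sign * delta_v
    return values


if __name__ == "__main__":
    vals = [0.0] * 6
    sets = {"R1": {0, 1, 2}, "R2": {3, 4, 5}}
    state = {"R1": True, "R2": False}
    print(bivalent_update(vals, {0, 2, 3}, sets, state, positive=True))
    print(response_pairs_needed(100))
```

Under this rule a positive step followed by a negative step on the same activity pattern restores the original values, which is one simple way to check an implementation. The pair count for 100 stimulus classes agrees with the seven response pairs quoted in the discussion of binary coding.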