
IEEE TRANSACTIONS ON NEURAL NETWORKS, VOL. 18, NO. 2, MARCH 2007, p. 551

Adaptive WTA With an Analog VLSI Neuromorphic Learning Chip


Philipp Häfliger, Member, IEEE
Abstract—In this paper, we demonstrate how a particular spike-based learning rule (where exact temporal relations between input and output spikes of a spiking model neuron determine the changes of the synaptic weights) can be tuned to express rate-based classical Hebbian learning behavior (where the average input and output spike rates are sufficient to describe the synaptic changes). This shift in behavior is controlled by the input statistics and by a single time constant. The learning rule has been implemented in a neuromorphic very large scale integration (VLSI) chip as part of a neurally inspired spike signal image processing system. The latter is the result of the European Union research project Convolution AER Vision Architecture for Real-Time (CAVIAR). Since it is implemented as a spike-based learning rule (which is most convenient in the overall spike-based system), even if it is tuned to show rate behavior, no explicit long-term average signals are computed on the chip. We show the rule's rate-based Hebbian learning ability in a classification task in both simulation and chip experiments, first with artificial stimuli and then with sensor input from the CAVIAR system.

Index Terms—Classification, competitive Hebbian learning, neuromorphic electronics, winner-take-all (WTA).

I. INTRODUCTION

A. Neuromorphic Electronics

AT THE beginning of the 1990s, Mead coined the term neuromorphic engineering for applying the operation principles of the nervous system to artificial devices [1]. Researchers have applied this idea to subthreshold analog complementary metal-oxide-semiconductor (CMOS) integrated circuits. These are well suited to the purpose, since the basic electronic building blocks have much in common with the basic building blocks of the nervous system. In fact, neurophysiologists had often used electronic components already to describe functional models of neural processes. Most prominent among those might be the action potential generation mechanism in the giant squid axon described by Hodgkin and Huxley [2]. By implementing such models with integrated circuits, it started to be more plausible that one day they could rival biology in compactness, power efficacy, and real-time performance. The more prominent examples are to be found within intelligent sensors like the silicon retina [3]-[9] and the silicon cochlea [10]-[12].

B. Neuromorphic Learning Circuits

Among all the learning algorithms that are used to train synaptic weights in artificial neural networks, it is foremost the so-called Hebbian learning rules (based on a postulate by Hebb [13]) that have a basis in neurophysiological observations. In Hebbian learning rules, correlated pre- and postsynaptic activity leads to long-lasting strengthening of synaptic efficacy and vice versa. This has also been known for some time to occur in real synapses [14]. Activity in that context was always defined as an average firing rate. Only recently has it been discovered that the exact relative timing of pre- and postsynaptic action potentials exerts an influence on the changes in the synapse. Thus, stimulation patterns that are indistinguishable when only observing the average frequency can lead to very different behaviors [15], [16]. Approximately at the same time as the neurophysiological evidence had been reported, this principle was also applied to artificial learning algorithms [17] and neuromorphic electronic circuits [18]. Some more recent examples of neuromorphic circuits that have made use of the principle are, for example, [19]-[25]. Neuromorphic learning circuits in general are based on those neurophysiological findings and express either classical Hebbian or spike-based learning [also known as spike-timing-dependent plasticity (STDP)]. The learning rule implemented in this paper was originally conceived as an implementation of a spike-based learning rule. Consequently, it is a hybrid (analog/digital) asynchronous circuit and all its processing is timed entirely by its input and output pulse event signals. By adjusting biases controlling one time constant of the algorithm, the learning behavior can be changed from expressing spike-based to rate-based behavior.

C. Weight Storage in Neuromorphic Learning Circuits

A central problem in implementing neuromorphic synaptic models with learning capability is that of the storage of the synaptic weights. Biological synapses are believed to be processing units including local long-term storage for at least one variable, i.e., the synaptic efficacy/weight. Learning changes that parameter. Thus, neuromorphic learning chips often implement a number of synapses as simple processing units, each with a long-term storage cell. Preferably, this storage cell should be analog, accessible asynchronously, and it should retain its value indefinitely. Such storage, though, is not easy to come by on a CMOS chip.

Manuscript received November 9, 2005; revised July 7, 2006; accepted August 2, 2006. This work was supported by the European Union Fifth Framework Information Society Technologies (IST) Future and Emerging Technologies (FET) under Project Convolution AER Vision Architecture for Real-Time (CAVIAR). The author is with the Institute of Informatics, University of Oslo, Oslo N-0316, Norway (e-mail: hafliger@ifi.uio.no). Color versions of one or more of the figures in this paper are available online at http://ieeexplore.ieee.org. Digital Object Identifier 10.1109/TNN.2006.884676

1045-9227/$25.00 © 2007 IEEE


Of course, there is a huge knowledge base to draw on for synchronous digital memory cells. These, combined with digital-to-analog (DA) conversion, are a viable solution to the problem, but with a number of drawbacks. If the processing of the synapse is to be analog, then it also needs to include an analog-to-digital (AD) conversion to update the weight in the digital memory cell, consuming again more space on the chip die and making the circuit more complex. If the memory cell is synchronous, updates are limited to certain time slots, and a global digital clock signal adds noise to the analog components and consumes power. If the memory is digital asynchronous, then still some digital control signals will have to be generated that may add noise. There are also some easy approaches to a purely analog storage. At first, one might think of capacitive storage. It is, however, not well suited for long-term storage: If the storage capacitance is connected to a transmission gate, for example, there will always be leakage. Some refresh mechanism might be employed to counter this leakage. This, again, would increase the space consumption and complexity of the circuit and would possibly add noise by control signals. Alternatively, the capacitor might be chosen really big to limit the effect of the leakage, thus sacrificing layout space for longevity. Much attention has been given recently to nonvolatile analog storage on floating gates (FG), i.e., electrically isolated nodes that still can be charged/discharged by Fowler-Nordheim tunneling or hot electron injection [26], [27]. This is basically the same technique used extensively these days in digital flash memory. It offers many of the sought-after properties: completely analog and compact long-term storage, asynchronously accessible.
Its major drawbacks are big mismatches in the tunneling or injection structures, which even change with use, and difficult handling in standard CMOS processes, making it hard to port the circuits to other CMOS technologies. Thus, in the neuromorphic learning circuit used in this paper, yet another compromise solution has been explored, that of weak multilevel static memory [28]. It is a compromise between digital multilevel storage and analog capacitive storage. The memory cell is basically just a capacitor that can hold an analog voltage for short periods of time. Additionally, this capacitor is connected to circuits that always drive its voltage weakly towards one of several attractors. Thus, the loss through leakage is not completely abolished but limited.

D. Address-Event Representation (AER): Neuromorphic Pulse Transmission

The chip introduced later in this paper has been conceived to fit into an overall neuromorphic spike signal processing system [29]. A major bottleneck when implementing such neural systems in analog integrated circuits is that of density of connections. The sheer numbers of connections between neurons in the brain are staggering: The human brain contains on the order of 10^11 neurons. A cortical neuron typically makes point-to-point connections with on the order of 10^4 other neurons. Thus, there is a total on the order of 10^15 connections between neurons in the brain. Although cables on a computer chip can be thinner than an axon [i.e., the cable conveying a neuron's (brain cell's) action potential (AP, nerve pulse)], the brain uses its three-dimensional (3-D) volume to route those cables, whereas integrated circuits and/or printed

circuit boards are limited to a discrete number of two-dimensional (2-D) layers. On the other hand, electronics is substantially faster than most processes in the nervous system. An AP in an axon is transmitted at a speed on the order of tens of meters per second. Furthermore, an AP lasts about a millisecond and, after an AP, a neuron undergoes a refractory period of several milliseconds during which it is unable to fire again. Consequently, most neurons are incapable of firing at more than 200 Hz. In contrast, an electronic signal can be transmitted on a cable at close to the speed of light, and serial transmission rates of digital signals in today's electronic devices go up to several GHz. AER, a widely used method for emulating point-to-point pulse event connections in neuromorphic devices these days, trades the speed advantage of electronics against its inferior cabling density [30], [31]. Instead of an individual cable between each pair of neurons, two assemblies of neurons share one digital bus. An event (an AP from a neuron) is encoded as a digital ID/address (a number that identifies the neuron producing said AP) and is transmitted on this time-multiplexed digital bus. On the receiver side, this address is then again converted into pulses that are distributed to the receiving neurons that are connected to the sender. In the implementation used for this paper, the transmission time for one AP is approximately 100 ns. Thus, within the time of one physiologically shaped AP that lasts for about 1 ms, 10 000 events can be transmitted between different sender-receiver pairs. Such a bus can, therefore, serve about 50 000 senders that fire at no more than 200 Hz.

E. CAVIAR Project

The learning chip presented here has been developed within the European Union research project Convolution AER Vision Architecture for Real-Time (CAVIAR) [29].
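The throughput figures above follow from simple arithmetic. The sketch below only restates the numbers quoted in the text (100 ns per event, 1 ms per AP, 200 Hz maximum firing rate):

```python
# Back-of-the-envelope AER bus capacity, using the figures quoted above.
EVENT_TIME_S = 100e-9    # transmission time of one address-event
AP_DURATION_S = 1e-3     # duration of a physiologically shaped action potential
MAX_RATE_HZ = 200        # upper firing rate bound set by the refractory period

events_per_ap = AP_DURATION_S / EVENT_TIME_S   # events fitting into one AP time
bus_events_per_s = 1.0 / EVENT_TIME_S          # raw bus throughput
max_senders = bus_events_per_s / MAX_RATE_HZ   # senders the bus can serve

print(round(events_per_ap))   # 10000 events within the time of one 1-ms AP
print(round(max_senders))     # 50000 senders firing at up to 200 Hz
```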
Within the project, a general AER communication framework has been developed, as well as several AER processing components, i.e., AER VLSI integrated circuits. These components can be combined within the communication framework in innumerable ways. However, as one particular objective of the project, one demonstrator system has been proposed that we will briefly summarize here. The summary will by no means be exhaustive: We only intend to give the reader a context into which to place the work that is described in this paper. The interested reader may follow the references given in the text for a more complete picture of the project. In the CAVIAR demonstrator system, several integrated circuits are assembled into a bioinspired image processing chain by means of the CAVIAR AER communication framework. The chain consists of five successive processing blocks as depicted in Fig. 1. The result of the image processing should be compact and robust real-time information on the position and motion of a moving ball for rapid tracking. The first stage is the actual image sensor (silicon retina in Fig. 1), which is actually not only a passive sensor: It does already process the image. It is an array of AER pixels that emit pulses at a frequency proportional to the change of the logarithm of the light intensity [32]. That means that the pixels are only active if something happens in the observed scene. This is very power efficient on the AER bus and a compact way of encoding image


Fig. 1. Five blocks of the demonstrator of the CAVIAR project.

information. Furthermore, it enables really fast detection of motion, since the imager does not suffer from frame delays (refer to [32] for more details). The second stage is an AER convolution chip which extracts a programmed feature from the observed scene (feature map in Fig. 1). By a mathematical convolution with a given 2-D target feature, the image is transformed into a topological map that indicates the presence of this feature in the original image [33]. The shape that is planned to be extracted for the demonstrator system is that of a circle: Since the imager only reacts to changes in light intensity, only the edges of the ball that should be tracked will be represented, and it will thus be visible as a circle. Note here that this convolution chip is a component that operates on so-called rate encoding: It needs to integrate several events over time before it decides that it has detected a feature. Its output, as well, indicates a reliable detection of a feature only after it has produced several events in one location. Thus, also the rest of the processing chain should be viewed in the light of the rate encoding paradigm as opposed to temporal encoding. After the convolution chip, the address-event (AE) timing is no longer precisely related to events in the real world (only the average frequency is). Even if the later stages can express certain behaviors in response to precise timing of sequences of input events (like the learning chip), they need to be examined in the light of random event timing at a given average frequency. This will have some impact on the later discussion in this paper. The third stage [winner-take-all (WTA) in Fig. 1] has the task of deciding on the most likely target for tracking. The output from the second stage might still be ambiguous: Several circle-like shapes may be present in the scene. The task of the third stage is to clean up the various distractors and to isolate the strongest indication of a circle.
This is performed by a 2-D WTA operation [34], [35]. The output of this WTA stage already contains most of the information one would need to track the target ball: A decision on where the ball is located has been taken with very little delay since all processing steps are performed in parallel for all pixels by dedicated hardware. Up to this stage, information has always been represented in topological 2-D maps, where every pulse event can be directly related to a particular location in the original scene and neighboring events will relate to neighboring locations in the real world as well. The two following steps will further reduce the amount of information in the data stream and extract a few high-level features of the scene that are in the end no longer arranged topologically. These features will not only depend on position but also on direction and speed of motion of the ball; and these high-level features will not be programmed into the system but the system itself will decide on relevant features to extract by means of

unsupervised learning. This is where the chip discussed in this paper will be used. The high-level information can then either be used directly for tracking or in combination with the position information from the third stage. It may, for instance, provide extra information in cases where the pure position information is insufficient, like if the ball is moving too fast or is behaving erratically due to obstacles in the scene. The fourth stage (time-to-space expansion in Fig. 1) first expands the data representation. It expands temporal information into a spatial dimension by producing delayed copies of the 2-D position information. It takes the AEs coming from the WTA stage and sends them down delay lines that copy every single event several times at regular intervals, assigning a new output address to every delay. Now, a trajectory of the ball, for example, will be represented as a line in a 3-D volume. Finally, in a fifth stage (learning classification in Fig. 1), this resulting 3-D pattern is shown to the learning chip that is the subject of this paper. Its task is to classify the patterns that are formed by the activity of neurons in a 3-D array (the outputs of the delay line stage). In fact, this learning chip knows nothing about the arrangement of these input neurons in a 3-D cube. It just treats them as a dimensionless set of input neurons. This chip has a small number of output neurons. They are also connected in a WTA fashion, such that only one of them will be active at any one time. All of them learn to become responsive to a class of input patterns. Thus, some of those neurons might choose to become responsive to left-to-right motion, others to right-to-left motion. Some areas in the observed arena might be inaccessible to the ball, because of a wall or another obstacle. In that case, the neurons would learn not to depend on any activity from these silent areas and only cover motion in the active areas.
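The class-formation behavior described above, where competing output neurons divide the input patterns among themselves, can be sketched as plain competitive learning. This is a rate-level abstraction of the chip's behavior, not the spike-based rule itself; the function name, the Euclidean winner metric, and the initial weights are illustrative assumptions:

```python
import numpy as np

def adaptive_wta(patterns, w_init, lr=0.1, epochs=20):
    """Toy adaptive-WTA / vector-quantization sketch: for each input pattern
    the single winning output neuron (the one whose weight vector is closest)
    moves its weight vector a little toward that input."""
    w = np.array(w_init, dtype=float)
    for _ in range(epochs):
        for x in patterns:
            winner = np.argmin(np.linalg.norm(w - x, axis=1))  # WTA decision
            w[winner] += lr * (x - w[winner])                  # move winner toward input
    return w

# Two well-separated pattern clusters end up claimed by different neurons.
patterns = np.array([[1.0, 0.0], [0.9, 0.1], [0.0, 1.0], [0.1, 0.9]])
w = adaptive_wta(patterns, w_init=[[0.6, 0.4], [0.4, 0.6]])
```

After training, each weight vector sits near the mean of one cluster, so each output neuron responds to one class of inputs.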
To express the task in dimensionless terms once more: The learning neurons will be exposed to a sequence of input patterns and will learn to divide those patterns among themselves, i.e., to divide those patterns into a number of classes, where that number is given by the number of output neurons. In the title, we have chosen to call this behavior descriptively adaptive WTA; however, it is classically also known as learning vector quantization or learning classification. Another component of this CAVIAR system that is not explicitly shown in Fig. 1 but is usually present between any two processing stages is a so-called AE-mapper [36]-[38]. The input and output address spaces of the AER VLSI chips in this system are hard-wired. To be able to connect two chips, it is usually necessary to have an intermediary projecting one address space onto the other in the desired way. This can, for example, be a compression of the 128 x 128 address space of the retina to the 64 x 64 address space of the feature map. The CAVIAR AE-mapper can also perform many more projections, like expansions, compressions, rotations, shifting, etc.
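The 128 x 128 to 64 x 64 compression mentioned above amounts to a simple address projection. The sketch below assumes a flat row-major address layout purely for illustration; the actual CAVIAR address formats are defined in the referenced mapper publications:

```python
def compress_address(addr, src=128, dst=64):
    """Project a flat row-major address from a src x src array onto a
    dst x dst array by dividing both pixel coordinates by the size ratio
    (here 2-to-1, so 2 x 2 retina pixels share one feature-map address)."""
    ratio = src // dst
    y, x = divmod(addr, src)              # recover 2-D pixel coordinates
    return (y // ratio) * dst + (x // ratio)

# Four neighboring retina pixels land on the same feature-map address.
print(compress_address(0), compress_address(1),
      compress_address(128), compress_address(129))  # 0 0 0 0
```

An AE-mapper applies such a projection to every event address as it passes between two chips; expansions, rotations, or shifts are just different address functions.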


Fig. 2. Example of the dynamics of the variables at one synapse. Incoming spikes ($x_i$) increment the membrane voltage $m$ by $w_i$ and the causality measure $c_i$ by 1, and both decay in time. $m$ is also incremented by inputs to other synapses. As it exceeds 1, $w_i$ is updated according to the learning rule (6) and both $m$ and $c_i$ are reset. If $c_i$ is big enough ($c_i > w_i/\lambda^2$), $w_i$ will be increased, whereas a too low $c_i$ will cause a decrement of $w_i$.

F. Goal of This Paper

In the course of this paper, we will assess the basic ability of a prototype chip to perform the learning task of the fifth stage of the CAVIAR demonstrator as described in Section I-E, i.e., to learn to classify a set of rate-encoded input patterns.

II. LEARNING RULE

The learning algorithm that has been implemented in this paper as a neuromorphic analog integrated circuit is a spike-based learning algorithm: It operates in continuous time, and the timing of the spike patterns defines its behavior.

A. Definitions

In the following, we will define the dynamics of the synapse, including its learning behavior. All variables are also depicted in a simulated example in Fig. 2.

1) Spike Signals: For analytical purposes, inputs and outputs of a neuron are defined as sums of time-shifted Dirac delta functions $\delta(t)$, i.e., a spike is infinitely short in time, infinite in magnitude, and the integral over a time period is equal to the number of spikes within that period. A neuron's output is expressed as

  $y(t) = \sum_j \delta(t - t^{out}_j)$   (1)

where $t^{out}_j$ is the time of the $j$th postsynaptic spike/action potential. The input to the $i$th synapse of the neuron is

  $x_i(t) = \sum_k \delta(t - t^{in}_{i,k})$   (2)

where $t^{in}_{i,k}$ is the time of the $k$th presynaptic spike at synapse $i$. $y$ and $x_i$ are shown as short pulses in the first and third trace of the example in Fig. 2.
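In simulation, such abstract spike signals have to be realized as concrete event times. A minimal sketch of generating the independent Poisson distributed spike trains that the rate-encoded inputs call for (function name and rates are illustrative assumptions):

```python
import random

def poisson_spike_train(rate_hz, duration_s, seed=None):
    """Sample the spike times of a homogeneous Poisson process by summing
    exponentially distributed inter-spike intervals."""
    rng = random.Random(seed)
    t, spikes = 0.0, []
    while True:
        t += rng.expovariate(rate_hz)   # exponential inter-spike interval
        if t >= duration_s:
            return spikes
        spikes.append(t)

train = poisson_spike_train(rate_hz=50.0, duration_s=2.0, seed=1)
# The spike count over the window approximates rate * duration (here ~100),
# while the individual spike times carry no consistent temporal structure.
```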

In a few instances, it is easier to work with a function that is derived from these sums of Dirac delta functions. It is the function that is zero except for when there is a spike: Then, it is 1. We denote this function with a tilde, for instance, as $\tilde y$ and $\tilde x_i$, and we define this operator formally as

  $\tilde y(t) = \begin{cases} 1, & \text{if } \exists j : t = t^{out}_j \\ 0, & \text{otherwise} \end{cases}$   (3)

2) Integrate and Fire Neuron: The dynamics of the membrane potential $m$ for the particular integrate and fire (I&F) neuronal model that will be used in this paper is given by

  $\dot m(t) = -\frac{1}{\tau}\, m + \vec w^T \vec x - m\, y$.   (4)

The first right-hand side summand expresses an exponential decay of the membrane voltage with time constant $\tau$. The second summand is vector notation for $\sum_i w_i x_i$ and represents the synaptic inputs. Note that, since the variables $x_i$ are sums of Dirac delta functions, integrating over them will lead to discontinuous steps in the membrane dynamics, increasing the membrane by the respective weight $w_i$ when an input event arrives at the synapse. In the same manner, the third summand resets the membrane at the event of an output spike of the neuron. See trace 2 of Fig. 2 for an illustration. (Note that in the figure the neuron also receives inputs from other synapses besides the inputs to the observed synapse shown in the first trace.) There is a formal problem with the use of the product $m\,y$ in (4). That is because this is a product of a discontinuous function $m$ and a sum of Dirac pulses $y$. As $m$ is reset, this discontinuous step coincides with a pulse from the function $y$. Since the value of $m$ is not defined at this moment, the integral over the product $m\,y$ is also not defined for this moment. To solve this dilemma and to enable us to obtain a solution as we integrate over the differential equation (4) (and other equations


later in this paper), we assign a value by definition to discontinuous points in all functions in the following: At discontinuous steps, functions in this paper are defined to assume the value they have as the step is initiated. In other words, we implicitly replace the discontinuous variables ($m$, $\vec w$, and later $\vec c$) on the right-hand side of equations with the ones obtained by the tilde-operator ($\tilde m$, $\tilde{\vec w}$, and $\tilde{\vec c}$) as defined in (3). To keep the equations simple, we will refrain from explicitly writing the operator in these instances. The times $t^{out}_j$ of output events are not independent parameters. They are actually given by the neuron's state and the inputs. The relation is defined as follows:

  $t^{out}_{j+1} = \min \{\, t > t^{out}_j \mid m(t) \geq 1 \,\}$.   (5)

This equation can be expressed in words as: Whenever the membrane would exceed the threshold value 1, it fires an AP. The equation states that the time $t^{out}_{j+1}$ of an output event is the first time after the previous event that the membrane reaches the firing threshold. (Compare traces 2 and 3 in the illustrating Fig. 2.)

3) Learning Synapse: The weight vector $\vec w$ of a neuron evolves according to the following dynamics. (Also see trace 4 in Fig. 2.) This is the central equation here that defines the learning behavior that is later implemented on the chip:

  $\dot{\vec w}(t) = \alpha\, y \left( \vec c - \frac{1}{\lambda^2}\, \vec w \right)$.   (6)

This learning rule has been presented in [39] and, in a first form, in [18]. The multiplication with $y$ (the output spike signal) causes discontinuous weight update steps when there is an output spike. Here, we have changed the form of the two free parameters somewhat from [39]. The reason is that now the free parameter $\lambda$ directly represents an important property of the learning rule: It has been proven in [39] (and we will repeat this proof in slightly different form in II-B1) that, for an I&F neuron using this learning rule, any attractor of the weight vector is of length $\lambda$. Thus, this learning rule implicitly normalizes the weight vector. $\alpha$ is another free parameter, reflecting the learning rate. $\vec c$ is a vector of variables that are local to each synapse. They behave according to the following dynamics:

  $\dot{\vec c}(t) = -\frac{1}{\tau}\, \vec c + \vec x - \vec c\, y$.   (7)

The dynamics of $\vec c$ (example in trace 4 in Fig. 2) is defined here in a very similar manner to the membrane voltage's [see (4)]. Note that they both decay with the same time constant $\tau$. $c_i$ can be seen as storing information about recent presynaptic activity. The positive term in (6) thus rewards synapses that have received much recent activity just before an output spike, rewarding potential causal relationships. We will refer to $c_i$ as the causality measure.

B. Properties

1) Weight Vector Normalization: It will be shown in the following that the length $|\vec w^*|$ of an attractor of the dynamics

(6) [we use the $^*$ to refer to the fixed point of (6), i.e., its solution with respect to $\vec w$ if the left side is set to zero] must be equal to the parameter $\lambda$. Thus, this learning rule implicitly normalizes the weight vector. $\vec c$ can be seen as storing the contribution of the synapses to the membrane potential without the synapses' weighting and can actually be expressed as

  $m = \vec w^T \vec c$.   (8)

If we multiply (6) (with the left side set to zero) from the left with $\vec w^{*T}$ and use (8), we get

  $0 = \alpha\, y \left( \tilde m - \frac{1}{\lambda^2}\, |\vec w^*|^2 \right)$.   (9)

Remember that there is a postsynaptic spike as the membrane voltage crosses the threshold 1. Thus, we can write $\tilde m = 1$. $y$ cannot be eliminated without caution by dividing both sides of the equation through it. Remember that $y$ is a sum of Dirac delta functions. Basically, the multiplication by $y$ means that this equation is only valid at times of postsynaptic firings. In this special case, however, we are left only with constants: the parameters and the fixed point $\vec w^*$ (that is constant by definition). Thus, what is true for these constants at times of postsynaptic pulses is also valid at any other time, and we can remove $y$ from both sides of the equation

  $|\vec w^*| = \lambda$.   (10)

2) Direction of the Attractor: We have established that the length of an attractor $\vec w^*$ is equal to the parameter $\lambda$. Now, if we could deduce the direction of $\vec w^*$, too, it would be completely described. If we set (6) to zero once more and solve for $\vec w^*$, we get

  $\vec w^*\, y = \lambda^2\, \vec c\, y$.   (11)

We can see that the direction of $\vec w^*$ needs to be that of $\vec c$ at the times when the neuron fires a spike (i.e., $y \neq 0$), and only at these times because of the multiplication with $y$. Now, this is unfortunately not a closed solution for $\vec w^*$, but only a requirement: The firing times are in turn dependent on $\vec w$.

3) Approximate Rate Behavior: In our target application within the CAVIAR project, we expect the inputs to be rate encoded. Fortunately, the learning behavior can be simplified to be dependent on average rates only, instead of precise spike timing, if certain assumptions are met. If consistent temporal correlations of the input and output spike signals are not present, it becomes possible to describe the learning behavior in terms of average spike rates. We can achieve that among the inputs by encoding the input signals as independent Poisson distributed spike trains. (Others have deduced rate behavior for forms of spike-based learning rules that do not cover the one discussed


This learning rule behaves Hebbian as becomes small. Hebbian in the sense that the positive right-hand term ap, thus, rewarding correlation between and . proaches By tweaking the parameter can be limited for a given length of the input . ( could also be used since it determines the weight vector length and, thus, limits the magnitude of the output , for a given magnitude of the input vector .) It should not be made too small though since an I&F neuron can be prevented from spiking entirely by a too strong leakage (i.e., small ).
Fig. 3. Conceptual block diagram of one neuron circuit. There are 32 such neurons on the chip. Each has 64 learning synapses (numbered in the drawing; see Fig. 4), an excitatory synapse with a xed weight set by an external bias (marked by +; see Fig. 8) and one inhibitory synapse (marked by -, see Fig. 9) all connected to a simple dendritic compartment (psc) which conducts charge to the cell body, the soma ( soma; see Fig. 10). If enough charge has been accumulated, the soma emits an action potential (online ap). The synapses of all neurons are arranged in a 2-D array and are addressed by the AER receiver logic (not shown) by two simultaneous active low pulses ( rec x and rec y) along the two dimensions.

III. CIRCUIT IMPLEMENTATION The chip has been implemented in the Austria microsystems (AMS) 0.35 m process and contains 32 neuronal circuits. Each neuron is equipped with 64 learning synapses, one inhibitory synapse for off-chip inhibitory input and on-chip global cross inhibition, and an excitatory synapse with a constant weight (see Fig. 3 for a sketch). Off-chip spike communication is provided by AER communication blocks (not shown). Fig. 4 shows the block diagram of the learning synapse that implements the learning rule (6). Synaptic inputs are received by the active low digital signals on rec x and rec y. These signals come from the AER receiver. If they are both low simultaneously, the synapse is receiving an input event. The lu and ld blocks compute the positive and negative term in the weight update rule (6). Triggered by an output ap (and its inverse ap) of the neuron (coming from the block in Fig. 10), they emit digital pulses of a duration proportional to those terms. The ML mem cell stores the synaptic weight. The pulses from the lu and ld block activate currents that increase and decrease the voltage on the capacitor in the memory cell. The esyn block releases a charge package (psc: postsynaptic current) proportional to the weight (voltage on node w) to the soma (Fig. 10), when it registers an input event. Fig. 5 is an overview of the weak multilevel memory cell. It is described in more detail in [28]. The follower connected ampliers to the left do have the special property of turning themselves off, if the two input voltages are more than a few hundred millivolts apart (please refer to [28] for details). Connected as followers, they drive their output voltage actively towards their input voltage only if the output is already close to that target voltage. In the memory cell, the outputs of six of those followers ) are shorted with different input target voltages (level together. 
Thus, the voltage on that node is always driven to the closest target voltage by a small current. (Actually, that current is tunable by a bias voltage that is not shown.) p bias out is a follower bias for the read buffer of the actual memory capacitance. The state of this memory cell is changed by injecting currents onto that node, the sum of which may or may not move the memory voltage from one basin of attraction to another. In this implementation, these currents are turned on and of by digital signals on up and down and they are limited by bias voltages on up bias and down bias. These two biases up bias and down bias are used to adjust the theoretical parameters and [compare (6)], where the current through the bias transistor with gate up bias is proportional to and the current through transistor down bias is proportional to

here1 under these same assumptions [40][42].) However, with an I&F neuron, the output spike train will never be completely independent from all inputs, since there will always be one input spike driving the membrane over threshold and, thus, directly triggering the output spike. However, the weight vector normalization parameter can be chosen relatively small. Thus, it will always take many input spikes before one of them drives the neuron over threshold. This way, we can limit the temporal correlation between the inputs and the output spikes as well. [For an alternative input/output (I/O) model that would fully fulll the assumption of no temporal correlation between input and output see Appendix I-B. Unfortunately, this I/O relationship is less suited for implementation.] and the output Let us make the assumption that all inputs are independently Poisson distributed signals. spike train In that case, it can be shown that the average of is (see Appendix I-A for the complete proof) (12) Note that we do use the hat to express averages over time. is thus vector averaged over time. is the average of the a sum of Dirac delta functions, i.e., the number of action potentials divided by the averaging period. is the average of the vector multiplied with the sum of Dirac delta function , i.e., the elements of are the sum of the elements of as sampled at the times of action potentials and divided by the averaging period. The learning rule in terms of rates becomes (13)
1Neither [40], [41] nor [42] considered weight-dependent terms in the learning rules they discussed [like the weight-decay term in (6)]. The publications [40] and [41] did not explore the effects of limiting terms dependent on pairs of pre- and postsynaptic spikes to nearest neighbors [the correlation signal in (6), which is reset with every postsynaptic spike]. A mechanism that was not considered in [42] were terms activated by only a single pre- or postsynaptic spike, i.e., independent of any intervals [such as the output-triggered weight decay in (6)].
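The time-averages described above for (12) can be estimated directly from spike data; below is a minimal Python sketch (the function name and argument interface are mine, not from the paper): the average output rate is the spike count over the averaging period, and the correlation average samples the input vector at the output spike times.

```python
import numpy as np

def time_averages(x_at, output_spike_times, T):
    """Estimate the time-averages entering the rate form of the rule.

    x_at(t): returns the input vector at time t (illustrative interface).
    output_spike_times: postsynaptic spike times within [0, T].
    """
    out = list(output_spike_times)
    # average output: number of action potentials / averaging period
    y_hat = len(out) / T
    # average of the input vector multiplied with the output Dirac comb:
    # sum the input vector sampled at the output spike times, divide by T
    if out:
        xy_hat = np.sum([x_at(t) for t in out], axis=0) / T
    else:
        xy_hat = np.zeros_like(np.asarray(x_at(0.0), dtype=float))
    return y_hat, xy_hat
```

With these two estimates and the average input rates, the rate form of the rule can be evaluated offline, even though no explicit long term average signal is ever computed on the chip itself.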

HÄFLIGER: ADAPTIVE WTA WITH AN ANALOG VLSI LEARNING CHIP

557

Fig. 4. Block diagram of learning synapse. It transforms an input pulse (if both rec y and rec x are low) into a charge package on node psc and also updates its weight w according to the learning rule (6). The individual blocks are further detailed in Figs. 5–8.

Fig. 5. Weak multilevel memory cell (block ML mem in Fig. 4). It stores an analog voltage on a capacitor on a short time scale and settles to one of six levels (level⟨0⟩ … level⟨5⟩) on a longer time scale (on the order of several milliseconds and above). Digital input pulses on down and up decrement and increment the voltage.

For instance, with the set of parameters given in Table I (up bias = 2.32 V and down bias = 0.54 V), the two parameters measured approximately 0.01 and 0.22.2 A slight change of the bias voltage up bias to 2.25 V changed those theoretical parameters to 0.013 and 0.29. Changing up bias to 2.39 V resulted in 0.007 and 0.14. The schematic of the lu (learn up) block is shown in Fig. 6. The dynamics of the theoretical variable (7) is represented by
2This has been measured by stimulating only one synapse with a number of spikes that kept it below threshold. Then, the neuron was immediately made to fire by the nonlearning excitatory synapse. The resulting change in weight was measured and the two parameters were deduced from these observations.

the voltage on node corr. An input event (coincident active low pulses on rec x and rec y) increments that voltage by an amount given by the current limiting bias on lu inc. If the neuron releases an output event, the signal ap goes high. This enables a current comparison between the currents through M2 (given by the bias voltage on lu threshold and set to be small) and M1 (given by the gate voltage on corr): as long as the voltage on corr remains big enough, the voltage on node learn up will be low. At the same time, corr is depleted through current limiting transistor M3 towards the voltage on corr baseline, and as it becomes small enough for the current through M1 to become smaller than the one through M2, the learn up signal


Fig. 6. Circuit computing the weight increment (block lu in Fig. 4). Triggered by a pulse on ap, it converts the voltage on node corr into a pulse duration on learn up (see Section III). corr is incremented as the synapse receives spikes (if both rec y and rec x are low) and it decays in time, controlled by lu leak. TABLE I PARAMETER VOLTAGES USED IN ALL EXPERIMENTS. THIS CORRESPONDS TO THE THEORETICAL PARAMETERS 0.01 AND 0.22 IN (6)

goes high again. If the bias lu threshold is tuned such that this only happens as corr reaches corr baseline, the duration of

the active low pulse learn up will be nicely proportional to the voltage on corr at the moment the neuron fires. This circuit actually constitutes a voltage to pulse width converter. The constant leakage of corr is implemented as a constant depleting current through M4, regulated by the bias voltage on corr leak. This is not quite correct with respect to the theory, which would require an exponential decay of that voltage. This has an effect on the weight vector normalization, which is no longer perfect, but still approximate: (8) is no longer exact but still holds approximately. The ld block is further described in Fig. 7. It computes the decrementing term in the learning rule (6). This circuit too, like the lu circuit, is based on a voltage to pulse length conversion, which can be adjusted in the same manner by the biases ld threshold, ld pulse length, and w baseline. Thus, the proportional relationship between the decrementing term and the weight is realized quite easily: as the neuron fires an AP (active high pulse on node ap and active low on ap), the circuit emits an active low pulse on node learn down with duration proportional to the voltage on w (in the same manner as the lu circuit described in the previous paragraph). The circuit in Fig. 8 converts a voltage into a charge package, with tunable gain. Note that the sense of the membrane voltage in the circuit implementation of the soma is inverted as compared to biology and the theoretical model in (4): excitatory


Fig. 7. Circuit computing the weight decrement (block ld in Fig. 4). Triggered by a pulse on ap, it converts the voltage on w into a pulse duration on learn down (see Section III).

Fig. 8. Excitatory synapse (block esyn in Fig. 4). Triggered by a spike received by the synapse (if both rec y and rec x are low), this block draws a charge package proportional to the voltage on w on line psc. The tilted current mirror can be used to tweak the proportionality constant of this charge package (controlled by the balance of w baseline and amp-).

input (from this circuit) discharges the membrane voltage and inhibitory input charges the membrane voltage (compare Fig. 10). With an incoming synaptic event ( rec x and rec y low), the voltage w is sampled onto a capacitance and then drained towards the bias voltage on w baseline. The drainage current is mirrored and amplified through a tilted current mirror (transistors M1 and M2) onto node psc. This output current is the synaptic current charging the membrane capacitance of the neuron. The balance of the source voltages on the tilted mirror (w baseline and amp-) allows adjustment of the current gain of this step. The circuit depicted in Fig. 9 implements an inhibitory synapse with a weight set by bias w-. It can be triggered by

either an external event or by the other neurons on the chip. It is designed such that inhibitory effects can outlast the triggering event and virtually clamp the membrane voltage of the soma to its resting potential (Vdd) with a leakage current. This current, controlled by the transistor with gate w-, is active as long as there is a voltage bigger than 0 V on shunt. This voltage is raised to Vdd by either an AER event targeting this synapse ( rec x and rec y low) or by the line xinhib being pulled low. This node xinhib is common to the inhibitory synapses in all neurons and there is one global pull up transistor keeping it at Vdd if there is no neuronal activity. Any of the neurons firing can pull xinhib low. Only in the particular neuron that causes this is shunt prevented from being pulled


Fig. 9. Inhibitory synapse with on-chip global cross inhibition. It can be triggered either by other neurons firing (cross inhibition: node xinhib going low) or by an input spike directed at it from off chip (if both rec y and rec x are low). If triggered, it applies a strong leakage current to the I&F neuron for a duration set by the voltage on shunt duration.

Fig. 10. Soma, the I&F neuron's cell body. The voltage on node soma is pulled down by charge drawn from excitatory synapses and pulled up by current from the inhibitory synapse, i.e., the voltage in this neuron moves in the inverted direction as compared to the biological counterpart. If it falls below the switching threshold of inverter I1, the neuron fires an action potential ap and is reset.

up (by ap going high and closing the P-type field-effect transistor (PFET) it is connected to). This prevents a neuron from inhibiting itself. The bias voltage on shunt init limits the current that charges shunt in case of cross inhibition, to make sure that the initiating neuron really is not affected, because as the ap ends there will be a slight delay between the lower PFET being switched on again and the one above being turned off. The bias voltage on shunt duration controls the duration during which the inhibition remains effective.
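Taken together, the lu and ld blocks give each learning synapse a simple behavioral model: at every action potential, the weight is incremented proportionally to corr and decremented proportionally to the weight itself. A Python sketch with illustrative constants (the chip realizes both terms as pulse-width conversions acting on the multilevel memory, and corr decays linearly through a current source):

```python
class LearnSynapseModel:
    """Behavioral sketch of one learning synapse (constants illustrative)."""

    def __init__(self, w=0.5, eta_up=0.05, eta_down=0.02,
                 corr_inc=1.0, corr_leak=0.1):
        self.w = w                  # synaptic weight (voltage on node w)
        self.corr = 0.0             # causality measure (voltage on node corr)
        self.eta_up = eta_up        # gain of the learn-up pulse
        self.eta_down = eta_down    # gain of the learn-down pulse
        self.corr_inc = corr_inc    # increment per input spike (lu inc)
        self.corr_leak = corr_leak  # linear decay rate (corr leak)

    def input_spike(self):
        # a presynaptic event increments corr
        self.corr += self.corr_inc

    def decay(self, dt):
        # constant depleting current -> linear, not exponential, decay
        self.corr = max(0.0, self.corr - self.corr_leak * dt)

    def output_spike(self):
        # lu: increment proportional to corr at the moment the neuron fires
        # ld: decrement proportional to the current weight
        self.w += self.eta_up * self.corr - self.eta_down * self.w
        self.corr = 0.0             # corr is reset by every postsynaptic spike
```

A synapse that was active just before the action potential gains weight; a silent synapse only feels the weight-proportional decrement, which is what keeps the weight vector approximately normalized.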

Finally, the implementation of the I&F neuron soma is shown in Fig. 10. It is based on the implementation proposed in [5] with a few extensions. The core I&F circuit only comprises the first two inverters (I1, I2), the reset branch (M1, M2), the soma's membrane capacitance (C1), and the feedback capacitance (C2). Note that, contrary to biology and [5], this neuron's resting potential is at Vdd and excitation lowers the soma's membrane voltage by drawing a current from node psc. As input current decreases the soma voltage beyond the inverter's


switching threshold, the output of I2 (-ap) goes low. The feedback capacitance lowers the membrane voltage further, to make this state temporarily stable. The limited reset current through M1, regulated by soma pulse length, raises the membrane voltage back to the switching threshold. As this threshold is reached again, the output of I2 goes high and the feedback capacitance pulls the membrane well above the threshold again. If the feedback capacitance matches the total capacitance of the soma node well, including parasitics (as was attempted for this implementation), it will pull the membrane voltage all the way up to the resting potential at Vdd. One difference from the original in [5] is that the membrane capacitance is split into two explicit capacitances (C1 and C3) separated by a resistor. This divides the model into two cable compartments, emulating some spatial separation of the dendrites and the soma proper. The voltage dynamics in those two compartments are coupled by the resistor but will be somewhat different between resets. These separated dynamics are not particularly relevant in the rate code application here but only in a temporal encoding context. We will thus only give a brief explanation here. Consider two neurons that are at the same membrane potential close to the firing threshold and both receive a spike to a synapse at the very same moment. Both synaptic inputs are sufficient to drive the neurons over threshold. However, one of the synapses has a higher weight than the other. In this case, the neuron that receives its input from the stronger synapse will fire sooner than the other. That is because the charge drawn by a synapse is first dumped from C3. If that charge package is big, the voltage on C3 will drop far, resulting in a big voltage difference across the resistor and, consequently, in a big initial current flowing through this resistor, whereas a small charge withdrawn from C3 will result in a smaller initial current through the resistor.
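The earlier-firing effect of the split capacitance can be checked with a crude two-compartment simulation (all component values below are illustrative, not the chip's): a synaptic charge is dumped instantaneously onto the dendritic capacitor, and the soma deviation then follows through the coupling resistor.

```python
def time_to_threshold(q, c_dend=1.0, c_soma=1.0, r=1.0,
                      v_thresh=0.5, dt=1e-3, t_max=10.0):
    """Time until the soma deviation crosses v_thresh after a synaptic
    charge q hits the dendritic compartment (None if it never does)."""
    v_d = q / c_dend            # instantaneous dendritic deflection
    v_s = 0.0                   # soma deviation from rest
    t = 0.0
    while t < t_max:
        if v_s >= v_thresh:
            return t
        i = (v_d - v_s) / r     # current through the coupling resistor
        v_d -= i * dt / c_dend
        v_s += i * dt / c_soma
        t += dt
    return None
```

A bigger charge package (stronger weight) produces a larger initial voltage difference, hence a larger resistor current, and the soma compartment crosses threshold sooner.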
This may be decisive if two neurons that are connected by cross inhibition receive the same temporal spike pattern. The one with a better affinity to this particular pattern will fire somewhat earlier (even if the same spike would drive both neurons over their threshold) and it would be the winner in this temporal WTA competition [34], [35], [43]. In the context of the rate experiments presented in this paper, however, this feature will not be of importance, since the synapses receive independent Poisson spike trains. Further extensions from the original are a constant leakage current (regulated by the bias soma leak), and that an AP resets both capacitors, before and after the resistor. As in the lu circuit, the leakage should really be exponential to match the theoretical rule precisely. As a consequence, the weight vector normalization will not be perfectly precise. Fig. 11 shows an example of the signals when the neuron is stimulated. The stimulation happens faster than the neuron's and the weak multilevel memory's time constants, thus, there is no leakage observable. The top trace shows the input spikes (in spike is high when both rec x and rec y are low) to this particular synapse. At first, it receives a burst of enough spikes to drive the membrane voltage ( soma, the second trace) below the firing threshold. At the same time, it increases the causality measure (corr, the fourth trace). Since the signal corr is high as the neuron fires the first time, the weight (w, the third trace) is increased. After that first action potential

Fig. 11. Snapshot of synaptic variables on-chip under stimulation. Trace in spike corresponds to the NOR of rec x and rec y. These are the input spikes to the synapse. Trace soma shows the membrane potential. Note that it receives input not exclusively from the observed synapse (trace in spike) but it may also be decremented by input from other synapses. Trace w is the synaptic weight that gets updated when the neuron fires an AP (as the trace soma is rapidly drawn down to ground) according to the learning rule. Trace corr is the causality measure of this synapse, incremented by incoming spikes and reset by APs.

(while soma is below the firing threshold) the synapse receives a few more inputs. Note that the decrements of soma are now bigger, since the synapse has a bigger weight. After that, another synapse with a fixed big weight is stimulated with the same frequency and the membrane voltage continues to fall towards its threshold. However, the signal corr that is local to the first synapse is no longer incremented and is now much smaller than before as the neuron reaches threshold a second time. Thus, the weight is decreased this time. When the neuron is driven to fire a third time by only the fixed weight synapse, corr is zero and, thus, the weight is reduced even more. IV. LEARNING TASKS A. Classification That Minimizes Information Loss Let us specify once again the task this chip is to solve. It is to receive a multidimensional input vector (e.g., from a sensor array) and quantize it, i.e., it projects the array of rate encoded inputs onto a discrete number of possible outputs. This is known as vector quantization or classification. In this way, the original information content of the sensor array data is much reduced and a lot of information is disregarded. By means of learning, this classification should be optimized and the information loss minimized: Given the limited number of output states, it should be attempted to maximize the information content of the output of the system. Information theory provides a measure for the information content of a variable called entropy [44]. The entropy of a variable with a discrete number of states (in the case of a WTA the number of output neurons) is maximal if the probability of the variable being in any of those states is equal. We assume that the output of the chip is a purely deterministic encoding of the input and that all its entropy is information on the input. Thus, maximizing the output entropy is equal to minimizing the information loss through the encoding.


The entropy of a WTA network of $n$ neurons (where only one neuron is active at a time and that thus has $n$ possible states) with probabilities $p_i = P(\mathbf{y} = \mathbf{e}_i)$ ($\mathbf{e}_i$ being the unity vector in the $i$th dimension)3 is defined as
$$H(\mathbf{y}) = -\sum_{i=1}^{n} p_i \log_2 p_i \qquad (14)$$
and it is maximal if all probabilities are equal

$$p_i = \frac{1}{n} \quad \forall i \qquad (15)$$
In the described experiments, we will use the output's entropy as a measure of the quality of the encoding obtained through learning on the chip and in simulation.
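The entropy measure (14) is straightforward to compute from the measured winner probabilities; a small Python helper (names mine):

```python
import math

def output_entropy(probs):
    """Entropy in bits of the WTA output distribution, as in (14).

    Zero-probability states contribute nothing (lim p*log p = 0)."""
    return -sum(p * math.log2(p) for p in probs if p > 0.0)
```

For eight equiprobable winners the entropy is the maximal 3 bits; if one neuron always wins, it drops to 0 bits.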
Fig. 12. Competitive Hebbian learning in a two-neuron WTA on 2-D input vectors. The inputs are marked as dots and are made up of pairs of random numbers between 0 and 100. The weights multiplied by 100 are marked in the same coordinate system. The two initial weight vectors are indicated by a *. They are chosen to be not perfectly normalized at the start to illustrate the normalization by the learning rule. The final weight vectors are marked by o and have come to lie on the quarter circle with radius 100 (marked with a solid line) and are thus normalized. Intermediate weights are marked with dots to visualize the progress of the weight vectors during learning. Since the input values are more numerous in the upper left of the figure, the weight vectors move such that the WTA separates the input space not symmetrically along the vector (1, 1) but slightly shifted to the upper left, too. The final separation line is drawn as a dash-dotted line in the graph.

B. Competitive Hebbian Learning In the following, we will use our chip to optimize classification of a set of input patterns. To this end, it employs competitive Hebbian learning. Note that the term competitive Hebbian learning has been used in the literature with rather different meanings and in different contexts, for example, [16] and [45]. We refer to competitive Hebbian learning as the learning in artificial neurons that form a WTA network due to strong cross inhibition, and are set to learn according to a Hebbian learning scheme only if they are the winner. To analyze the behavior of a WTA network of neurons that receive the same rate-encoded set of input patterns and that learn according to a Hebbian learning rule that normalizes the weight vector, we have to define the I/O relationship of these neurons. In the Appendix, it is deduced that a linear threshold model is a possible approximation of the output of a leaky integrate-and-fire neuron (24). The only term in this I/O relationship that is dependent on the input vector is the scalar product of the input vector and the weight vector. Remember that the weight vectors are normalized to the same length for all neurons by the learning rule used here. Since the scalar product can also be written as the product of the two vector lengths and the cosine of the angle between them, the winner in the WTA competition for a given input pattern will thus be the one with the smallest angle between the weight vector and the input vector. Thus, the neurons carve up the input space along subspaces at the same angular distance from the closest two weight vectors. If the weight vectors were not normalized, the angular partition of the space could also be more irregular, i.e., short weight vectors would cause their neuron to respond only to input vectors at a small angular distance from themselves whereas long weight vectors would cover a bigger area.
This may be problematic if weight vector growth is not held in check at all: Long weight vectors could take over the entire input space and short vectors may not be able to respond any longer. Some limited imprecision in the weight vector normalization is not problematic, though, and may even be used to advantage by a more clever learning rule, allowing more possibilities to divide the input space.
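The angular competition described above can be sketched in a few lines of Python. The update below is a generic normalizing competitive Hebbian step (pulling the winner's weight vector toward the unit vector of the input); it illustrates the mechanism but is not the chip's exact rule (6)/(13), and the learning rate is illustrative:

```python
import numpy as np

rng = np.random.default_rng(1)
inputs = rng.uniform(0, 100, size=(20, 2))   # 20 random 2-D rate vectors

# two initial weight vectors, deliberately not normalized
w = np.array([[0.3, 0.9], [0.8, 0.2]])

eta = 0.05   # illustrative learning rate
for cycle in range(100):
    for x in inputs[rng.permutation(len(inputs))]:
        # WTA winner: largest scalar product w . x; with equal-length
        # weight vectors this is the smallest angle to the input
        k = int(np.argmax(w @ x))
        # winner-only Hebbian step pulling w[k] toward the unit vector
        # of the input -- this keeps the weight vectors normalized
        w[k] += eta * (x / np.linalg.norm(x) - w[k])
```

After learning, both weight vectors lie close to the unit circle and split the input space angularly, qualitatively reproducing the behavior shown in Fig. 12.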
Fig. 12 illustrates the learning behavior in a two-neuron WTA with 2-D input vectors. In this simulation, the rate-based description of the learning rule (13) has been used together with the neuron rate I/O relationship (24). The three learning parameters were set to 1, 0.0001, and 0.5 in this example. Twenty random input vectors are repeatedly presented in random order. It can be seen that the weight vectors are becoming normalized because of the learning rule and that the linear separation of the input space is influenced by the slightly irregular input distribution with a bigger concentration towards the upper left. The final separation line is indicated by a dash-dotted line and it separates 12 from 8 inputs. This is not a perfectly even separation, which would be optimal to maximize the output entropy. The quality of the outcome depends on the distribution and initial weights and an optimal outcome cannot be guaranteed in the general case. However, the algorithm is a good heuristic method that lends itself to implementation in neural models. C. Experimental Setup We tune the time constant to achieve rate-based Hebbian learning according to (13). Global cross inhibition was tuned to be very strong: Every output would feed into the inhibitory synapses of all other neurons, resetting them and clamping the membrane to its resting potential for several milliseconds. This network configuration is depicted in Fig. 13: Neurons are numbered 0 to 7 from left to right. They are depicted with a triangular soma (cell body) and a long dendrite (neuronal input branch) extending on top. Excitatory learning synapses are represented as small triangles on the left of the dendrite and numbered from 0 to 15, top to bottom, for every neuron. Synapses on different neurons with the same number will

3In simpler words, p_i is the probability of the ith neuron being the winner.


Fig. 13. Neural network conguration in the experiments with global cross inhibition. Eight (of 32) neurons, 16 (of 64) learning synapses, and one inhibitory synapse per neuron were used.

receive inputs of identical average frequency. In addition, there is one inhibitory synapse shown on the right of the dendrite. The cross inhibitory connection scheme is indicated with the lines at the bottom: Every neuron's output is connected to the inhibitory synapses of all other neurons. A complete list of the chip biases used in the experiments is given in Table I. The same set of biases was used for all experiments, i.e., the chip was not tweaked to perform optimally for every individual set of input patterns. One bias in that list that is not shown in the schematics is mem bias. It is the bias for the fusing followers. Levels ⟨1⟩ to ⟨4⟩ are generated on-chip, equally spaced between level⟨0⟩ and level⟨5⟩ by resistive voltage division. For comparison, we also conducted spike-based simulations of the learning tasks in Matlab (for the three experiments with simulated input patterns). In contrast to the experiments, the simulations were more true to the theory: They operated with analog memory for the synaptic weights instead of weak multilevel memory, and the membrane potential and the synaptic local variable decayed exponentially (through a resistor), not linearly like on the chip (through a current source). We presented input patterns to the neurons that were defined as 16-dimensional vectors of average frequencies. Those frequencies were delivered to the synapses as Poisson distributed rate signals. The learning progress was monitored by computing the entropy for every learning cycle. A cycle is defined here as one presentation of the complete set of input vectors in random order. The output activity was monitored during each individual input pattern presentation. To compute the entropy, the probabilities of the WTA output neurons being active were computed as the sum of the probabilities of the input patterns to which they responded. In the case of the chip implementation, it did occur that two output neurons responded during one input pattern presentation.
In that case, the contribution to their activity probability was split between the two according to their respective number of output spikes during that period. D. Nonoverlapping Clear Patches At first, we presented eight nonoverlapping binary input vectors/patterns (synapses were either stimulated with a given

Fig. 14. Stimulus set: nonoverlapping clear patches. Eight stimuli in total that are copies of the first stimulus that is shown here, shifted by steps of two.

frequency or not stimulated) to eight neurons. Sixteen synapses were used per neuron. A pattern would consist of the same two distinct synapses being stimulated in all eight neurons with 100-Hz independently Poisson-distributed spike trains. Thus, the inputs can be represented by the set of 16-dimensional vectors depicted as a bar plot in Fig. 14. The synaptic weights for all neurons were randomly initialized by sending independent Poisson spike trains (100 Hz) simultaneously to all synapses, while forcing the neuron to fire irregularly via Poisson input to a strong nonlearning excitatory synapse (50 Hz). Then, we presented the input patterns to all neurons simultaneously in random order for 1 s per pattern, and we repeated this for 100 cycles. The information content/entropy of this input set and distribution is rather poor, poor enough that it can be represented in the output with no information loss, that is, if every one of the eight neurons responds to exactly one of the eight input patterns. If the learning process is simulated, this optimal result is always reached with ease. The learning parameters are not very critical: In simulation, with the weight vector length parameter chosen to be 0.1, the ideal outcome was always reached for learning rates between 0.0001 and 0.005.
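The nonoverlapping stimulus set and its Poisson delivery are easy to reproduce; a Python sketch (pattern layout as described for Fig. 14; function and variable names are mine):

```python
import numpy as np

# eight nonoverlapping binary patterns over 16 synapses: pattern i
# stimulates synapses 2i and 2i+1 at 100 Hz (Fig. 14, shifted in steps of two)
patterns = np.zeros((8, 16))
for i in range(8):
    patterns[i, 2 * i:2 * i + 2] = 100.0

def poisson_spike_times(rate_hz, duration_s, rng):
    """Homogeneous Poisson spike times via exponential inter-spike intervals."""
    times, t = [], 0.0
    if rate_hz <= 0.0:
        return np.array(times)
    while True:
        t += rng.exponential(1.0 / rate_hz)
        if t >= duration_s:
            return np.array(times)
        times.append(t)
```

Each 1-s pattern presentation then amounts to drawing one such spike train per active synapse.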


Fig. 15. Experiment with nonoverlapping clear patches. (a) Probability distribution of the output entropy with random weights, and equally distributed probability of input vectors. Monte Carlo simulation. Avg: 2.25. (b) Output entropy after 100 learning cycles. Avg: 2.85. Chip measurement, 51 trials total. (c) Average over 51 trials of the evolution of the output entropy over 100 learning cycles. Chip measurement.

If the learning rate was chosen smaller, two neurons would adapt to the same input pattern and would swap being the winner when this pattern was presented. Thus, another neuron would adapt to two input patterns instead of just one. The stronger learning rate (0.0001) makes such undesired behavior much less likely. A learning rate bigger than about 0.005 would make the weight vector overshoot a momentary attractor. Weight vector normalization would, thus, not be achieved smoothly. Therefore, there would be a danger that a weight vector became too short and the neuron would stop responding entirely. The physical implementation, unlike the simulations with the parameters within the indicated range, does not always reach the optimal solution. Fig. 15(a) shows a histogram of the estimated output entropy after random initialization4 of the weights and Fig. 15(b) shows the outcome after 100 learning cycles. The topmost bin includes three discrete possible outcomes: the optimal with an output entropy of 3 bits (17 cases, every output neuron responds to exactly one input pattern), 2.81 bits (23 cases, seven output neurons respond to exactly one input pattern and one input pattern elicits no response), and 2.75 bits (six cases, six output neurons respond to exactly one input pattern and one neuron responds to two input patterns). Fig. 15(c) is the average of the output entropy as it changes while learning over all 51 experiments. Note that the first value is recorded after the first cycle of presentation, i.e., as the network has already learned for one cycle. Since the learning rate is rather high, quite an improvement as compared to the initial state had already been achieved at this point. Thus, this value is not the same as the average of the histogram in Fig. 15(a). As has been stated in the description of the circuits, the learning rate had been chosen higher than the simulations suggest is optimal. We will go more into the reason for this choice in Section IV-E. As a consequence of this choice, some
4This is not measured, since by presenting the input patterns, the chip did already learn and thus improve its performance [resulting in the first point in the graph in Fig. 15(c), bottom]. Instead, the performance before learning has been estimated with a simple Monte Carlo simulation in the software model: The output entropy of the network is computed with 10 000 different sets of random weights and the histogram of these 10 000 outcomes is shown. The chip's performance might even be worse than this estimation because of mismatch.

Fig. 16. Stimulus set: overlapping clear patches. Eight stimuli in total that are copies of the first stimulus that is shown here, shifted by steps of two.

neurons stopped responding. This is consistent with simulations with too high learning rates. In a few other cases, one output neuron consistently responded to two nonoverlapping input patterns (let us call them pattern A and pattern B) despite a relatively high learning rate. This only occurred in the chip implementation and was caused by a combination of mismatch and the multilevel memory. In the simulation, a sufficiently high learning rate tips the balance and makes the neuron forget about, for example, pattern A while pattern B is presented. The synapses that receive input from pattern A have their weight decreased by the learning rule while the neuron responds to pattern B. Thus, its response to pattern A is diminished and even if it is not reduced to zero, it will be incapable of relearning pattern A completely when it is next presented, due to the learning speed being dependent on the output activity. The neuron would specialize more and more on pattern B and eventually forget about pattern A completely, allowing another neuron to adapt and respond to pattern A. Actually, the parameters in the chip implementation are chosen such that this is the expected behavior for the average synapse. However, the misbehavior of this one neuron can still occur due to mismatching properties: Even with the same learning rate


Fig. 17. Example traces for classication learning task with overlapping clear patches. (a) Simulation data. Low learning rate. (b) Chip measurement. High learning rate. Top: Example trace of activity of one neuron. Gray scale from (a) 0 events (black) to 15 and (b) 23 events (white), respectively. Bottom: Convergence of output entropy.

biases as all the other neurons, it might be incapable of depressing the weights on these synapses sufficiently for them to jump to the basin of attraction of the lowest level of the multilevel weight memory cell. Thus, these synapses can never be suppressed completely and the neuron can continue to respond to two patterns. This behavior may be aided by another mismatch: If the neuron in question is responding more strongly than most other neurons with the same weights, it is even more difficult for other neurons to win against it in the WTA competition, even if its synapses are weakened. Fortunately, the mismatches did lead to such misbehaviors only in a few cases and the outcome of these experiments was always very close to the optimal 3 bits. E. Overlapping Clear Patches A similar experiment has been conducted with eight more difficult input patterns. This time, for every pattern, there were four synapses per neuron stimulated with 71-Hz Poisson-distributed spike trains. There would always be two synapses overlapping between neighboring patterns. The patterns are depicted in Fig. 16. 71 Hz was used to maintain the same vector length as in the previous experiment.
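The 71-Hz figure follows from equating Euclidean vector lengths: two synapses at 100 Hz versus four synapses at f Hz gives f = 100·sqrt(2)/2 ≈ 70.7 Hz. A quick check:

```python
import math

# length of a rate vector with two active synapses at 100 Hz
len_two_at_100 = math.sqrt(2) * 100.0   # sqrt(2 * 100^2)
# length of a rate vector with four active synapses at 71 Hz
len_four_at_71 = math.sqrt(4) * 71.0    # sqrt(4 * 71^2)
# the two stimulus sets have (almost) the same vector length
```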

In this experiment, the learning parameters were much more critical for achieving an increase of the output entropy through learning. Besides reaching a maximal output entropy, stability of the network state was also a major concern. Since the input patterns are overlapping and presented as independent Poisson-distributed spike trains, even an optimal state of the neuronal net (where each of the eight neurons is optimally tuned for one of the eight inputs) is not necessarily stable. Let us look at two neurons that are optimally tuned for two neighboring patterns. Optimally tuned means that each neuron's weight vector has the same direction as its input pattern vector, scaled by the learning rule parameter that defines the weight vector length [see (6) and (10)]. Thus, the optimally tuned neuron will give the highest response when presented with the corresponding input pattern and will very likely win the WTA competition.5 Due to the overlap of the input patterns, the two neighboring pattern vectors will have common nonzero elements, and, therefore,
5Compare with the I/O relationship of the rate model of (24): The output is linearly dependent on the scalar product of the weight vector and the input vector, which is, for given vector lengths, maximal if the weight vector and the input vector are pointing in the same direction and the angle between them is zero.


Fig. 18. Experiment with overlapping clear patches. (a) Probability distribution of output entropy with random weights: initial state before learning. Monte Carlo simulation. Avg: 2.01. (b) Output entropy after 100 learning cycles. Avg: 2.53. Chip Measurement, 50 trials total. (c) Average over the 50 trials of the evolution of the output entropy over 100 learning cycles. Chip measurement.

Fig. 19. Stimulus set: overlapping diffuse patches. Sixteen stimuli in total that are copies of the first stimulus shown here, shifted by steps of one.

each neuron will also respond to the other neuron's input pattern in the absence of cross inhibition, albeit more weakly. Thus, due to the randomness of the Poisson-distributed inputs, it is always possible that not the optimally tuned neuron but the other one fires the first action potential after the stimulus appears. Since there actually is strong cross inhibition, it will then suppress the optimally tuned neuron, continue to fire, and itself become more tuned to this stimulus through learning. In this manner, neurons may swap stimulus preferences. There are basically two ways of addressing this problem. Render the chances of a false winner small: One way to make such an incident less likely is to use the weight vector normalization parameter to keep the weights small, such that more inputs are needed to reach the firing threshold. Thus, the randomness of the Poisson spike trains is somewhat averaged out before the threshold is reached the first time. Render the damage in case of a false winner small: Alternatively or additionally, one can keep the learning rate very small. Thus, a few lapses in determining the correct winner will not invalidate the preferences of the neurons.
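The first remedy, integrating more inputs before the threshold is reached, can be illustrated with a small Monte Carlo experiment. Under the simplifying assumption that a neuron fires once its synapses have delivered a fixed number of Poisson input spikes, the time to threshold is gamma distributed, and the chance that a weaker-driven competitor fires first shrinks as that number grows (rates, thresholds, and names below are illustrative, not the chip's):

```python
import numpy as np

def false_winner_prob(n_thresh, rate_tuned=284.0, rate_other=142.0,
                      trials=20000, seed=2):
    """Monte Carlo estimate of the probability that the weaker-driven neuron
    accumulates n_thresh Poisson input spikes (and thus fires) first."""
    rng = np.random.default_rng(seed)
    # The time to the n-th event of a Poisson process is gamma distributed.
    t_tuned = rng.gamma(n_thresh, 1.0 / rate_tuned, size=trials)
    t_other = rng.gamma(n_thresh, 1.0 / rate_other, size=trials)
    return float(np.mean(t_other < t_tuned))

p_low = false_winner_prob(2)    # low threshold: false winners are common
p_high = false_winner_prob(20)  # high threshold: randomness averages out
```

With a rate ratio of two, the false-winner probability falls from roughly a quarter at a threshold of two spikes to a few percent at twenty.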

On the other hand, quick convergence and the demands of symmetry breaking6 would profit from a big learning rate and a long weight vector. Thus, there is a tradeoff one has to make (like in many other learning algorithms) between adaptability and stability. Still, in simulation, with the full freedom of choice for the learning parameters, excellent convergence and stability properties were achieved. Fig. 17(a) illustrates this. The top gray-scale figure shows the response of one arbitrary neuron of the eight over the 100 learning cycles (x-axis) in one experiment. The y-axis represents the eight input vectors. Thus, the first gray patch on the left in row 7 indicates a weak response to input pattern 7 in the first cycle of pattern presentation. Then, the neuron does not respond at all in the next cycle, and then in cycles 3 and 4 there is a response to pattern 1, as indicated by the gray patches in row 1, and so on. After some initial inactivity and occasional responses to patterns 1, 7, and 8, this neuron finally becomes responsive to pattern 7. Due to the moderate learning rate, the occasional responses to neighboring patterns later on do not change the neuron's preference. The lower trace shows the convergence of the output entropy in this experiment: Already during the first presentation, the neurons are adapting and an information capacity of about 2.5 bits is immediately reached. After a few more cycles, the network settles on the optimal solution (output entropy of 3.0), with a few occasional small drops whenever a false winner responds. The situation for the physical chip implementation is different though, and one limitation became obvious. It was actually not possible to choose a learning rate smaller than a certain limit. The six-level weak multilevel weight storage cell is mainly to blame. Its six levels are slow attractors, such that on a short-time scale, the weight storage is capacitive and fully analog. Only in the course of several milliseconds is
6We speak of symmetry breaking here when the following situation is resolved: Two neurons have very similar weight vectors and show a tendency to adapt to the same input pattern. This tendency is broken if one of the neurons makes a big step towards responding exclusively and strongly to that stimulus and, thus, clearly wins over the other neuron. That other neuron is not yet completely adapted to the fought-over pattern and can, thus, still adapt to another stimulus pattern, where there is no or less competition. As previously mentioned, in simulations with a small learning rate, such situations may not get resolved at all and both neurons may gradually adapt to the same stimulus.


Fig. 20. Example traces for the classification learning task with overlapping diffuse patches. (a) Simulation data, low learning rate. (b) Chip measurement, high learning rate. Top: example trace of the activity of one neuron. Gray scale from 0 events (black) to (a) 15 and (b) 25 events (white), respectively. Bottom: convergence of the output entropy.

the weight converging to the nearest of the six levels unless it is actively updated again. (Measurements indicate that the time constant of this attractive behavior is on the order of 10 ms.) Now, with the ideal parameter setting from the simulation, neurons would fire at about 10 Hz. Remember that the learning rule dictates weight updates when the neurons fire. Thus, these updates lie approximately 100 ms apart. This unfortunately makes it impossible to use the cumulative effect of two or more weight updates to drive the weights from one basin of attraction of the multilevel memory to another: After 100 ms, the weight will already have settled again on one of the six attractors of the memory and the effect of any small weight update will no longer be evident. Thus, the learning rate has to be big enough that a single update can cause a sufficiently large change (approximately 170 mV for attractors that lie 340 mV apart) to move the weight from one memory level to another. Consequently, the learning rate (and the average firing frequencies too, as controlled by the input frequencies and the weight vector normalization length) had to be chosen somewhat higher than what would have been ideal in simulation, resulting in the network state being less stable. This can be seen in Fig. 17(b). The top graph shows the stimulus preference of one neuron on the chip in the course of an experiment, coding activity in gray scale. The neuron finds its preferred stimulus quickly enough, but then has frequent escapades to one neighboring stimulus and is sometimes on the verge of switching over to that stimulus completely. The bottom graph shows the convergence of the output entropy in this trial. A good level (transiently 2.93 in this example) is reached more quickly than in the simulation, due to the higher learning rate, but then the state is less stable and never completely reaches the optimum.

In Fig. 18(b), 50 learning trials are summarized: The output entropies in the 100th learning cycle are shown in a histogram. Fig. 18(a) shows the probability distribution of the performance of the network after random initialization of the weights. Both histograms indicate that this learning task was considerably harder than the previous one (compare Fig. 15): On average, an output entropy of 2.53 was reached (as opposed to 2.85 in the previous task) and the initial state after weight randomization also performs worse (2.01 as opposed to 2.25 before). However, the improvement achieved by learning (i.e., the difference of the entropies) is of the same order in both cases.

F. Overlapping Diffuse Patches

Finally, some experiments were conducted with stimuli closer to what may appear in a real-world context. Sixteen stimuli were used on a network of eight neurons again, such that lossless information transmission is no longer possible. The patterns were diffuse patches, and more than one frequency was used such that the edges of the patches were not sharp. A pattern stimulated four synapses: two central synapses with 100 Hz and two to either side with 20 Hz (see Fig. 19). The performance of the simulation is again nearly flawless, and the chip too reaches much better results than with the previous set of inputs. Fig. 20(a) shows an example of a simulated learning experiment and Fig. 20(b) shows the test run on the chip. Again, the performance of the simulation is almost perfect: The optimal information capacity is quickly reached and stays stable in a small entropy range between 2.9 and 3.0. The chip implementation also reaches a very good output entropy, but is again less stable, as can be seen in the top graph, where the observed neuron shifts preference to neighboring stimuli along the way. Fig. 21 summarizes the output entropy as estimated after weight random initialization and for a total of 66 trials on the chip after 100 learning cycles.

Fig. 21. Experiments with overlapping diffuse patches. (a) Probability distribution of the neural net's output entropy with random weights, i.e., initial state before learning. Monte Carlo simulation. Avg: 2.35. (b) Output entropy after 100 learning cycles. Chip measurement, 66 trials total. Avg: 2.83 bits. (c) Average over the 66 trials of the evolution of the output entropy over 100 learning cycles. Chip measurement.

Fig. 22. Raster plot of neuron activity as all 32 neurons on the chip are shown a real-world cyclic stimulus from the CAVIAR processing chain, after random initialization of the weights with learning turned off. Cycle times are indicated by the vertical dotted lines. Two neurons (9 and 20) are very dominant and mostly active for more than one cycle. This is a bad representation of the stimulus that does not allow the stimulus position within a cycle to be determined.

Fig. 23. Same as Fig. 22 but with learning turned on. The dominance of the two neurons 9 and 20 gets broken. Many neurons become active and are sometimes phase locked with the stimulus for a few cycles at different phase delays. Their response pattern is, however, not stable. The neurons suffer from the same stability problem as in the experiments with artificial stimuli, to an even bigger degree because of the noisier real-world input.

G. Learning in the CAVIAR System

To indicate the kind of tasks that are intended for this chip in the near future, we would like to show a preliminary successful experiment in the context of the entire CAVIAR system (compare Fig. 1), where learning improved the representation of a cyclic stimulus. The CAVIAR processing chain was shown a circle drawn on a panel that was rotated in front of the CAVIAR silicon retina at one revolution per 4.07 s. The feature map was programmed to extract the center of this circle and the CAVIAR WTA chip removed any but the strongest response in the observed plane, simultaneously reducing the resolution to 16 × 16 pixels. An AE-mapper (not explicitly shown in Fig. 1) was used to project this 16 × 16 output to a 2 × 2 output passed on to the CAVIAR delay line chip.
The delay line chip then expanded the temporal information into a 3-D spatial representation by producing two copies of the AER stream at delays of approximately 0.3 and 0.6 s. Thus, the learning neurons were exposed to 12 spike trains from a 2 × 2 × 3 spatiotemporal representation of the stimulus position.
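Functionally, the delay line stage can be sketched as follows. This Python fragment is a behavioral toy model of the expansion into delayed copies, not a description of the actual delay line circuit; the event format and function name are our own:

```python
def expand_with_delays(events, delays=(0.0, 0.3, 0.6)):
    """Expand a time-sorted stream of (t, x, y) events with delayed copies,
    tagging each copy with its delay-tap index: (t + d, x, y, tap)."""
    out = [(t + d, x, y, tap)
           for (t, x, y) in events
           for tap, d in enumerate(delays)]
    out.sort()  # merge the delayed streams into one time-ordered stream
    return out

# Two events on a 2 x 2 grid become six: a 2 x 2 x 3 spatiotemporal code.
stream = expand_with_delays([(0.00, 0, 1), (0.10, 1, 0)])
```

Each pixel's recent history is thereby converted into simultaneously available inputs, which is what lets the learning neurons pick up on the stimulus phase.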


Fig. 24. Same as Fig. 23 but with learning turned off again. The network remains in exactly the state it was in when the learning was turned off. This is one of several unstable states the network went through during learning, and all of those states, if frozen, result in a good stimulus representation. The activities of several neurons are reliably phase locked at different phase delays and allow stimulus reconstruction with a better resolution than the cycle time.

The weights were at first randomized by stimulating all learning synapses of one neuron with independent Poisson spike trains of high frequency (high as compared to the neuron time constant) together with a low-frequency spike train to a strong nonlearning synapse. Then, the response of the 32 neurons of the chip to the cyclic stimulus was recorded with the learning turned off (by setting the down bias to 0 V and the up bias to 3.3 V; see Fig. 22). The resulting stimulus representation is quite bad, with only a few neurons dominating the network activity for long periods of time. Just by looking at this activity, it is not even possible to guess the cycle time of the stimulus. In Fig. 23, the learning was turned on. More neurons get the chance to be active for periods shorter than a stimulus cycle and some of them become phase locked with the stimulus for a few cycles. The network response, however, is not stable, suffering from the same stability problems encountered in the previous experiments with the artificial stimuli. The situation is even more difficult because of the noisy real-world input. Fig. 24 illustrates a simple solution to this for many practical applications: One can simply turn the learning off again and the network will remain in one of the intermediate states. The result is a big improvement over the initial state and now, of course, also stable. The neurons that are active are all phase locked with the stimulus and respond quite reliably at a certain phase delay of the stimulus, allowing the stimulus position to be deduced with a good resolution. The temporal information from the delay line chip is used to achieve a resolution that is better than that of the 2 × 2 spatial representation that is its input.

V. CONCLUSION

In this paper, a circuit that was originally conceived to express spike-based learning (or spike-timing-dependent plasticity) proves to be an efficient implementation of classical Hebbian learning as well.
The implemented learning rule can express either of those behaviors depending on the input statistics and a single time constant. [That is, one constant in the theoretical rule, controlled by two voltage biases (soma leak and lu leak) on the chip.] Thus, an implementation of Hebbian learning

has been achieved without explicit computation/representation of long-term signal averages. That is to say, although the circuit behavior can be described in terms of the average input frequencies and the average output frequency, these averages are not explicitly represented. This method of implementing classical Hebbian learning is very well suited for a chip that is to be used in an AER communication framework. It requires no detour of converting the spike signal representation from the AER bus to an average rate representation and back again. Such conversions would be sources of additional noise and delay. Furthermore, the local variables that are used at the synapse are also plausible in a biological neuron. For instance, a local increase in calcium is known to occur when a synapse on a dendritic spine receives an action potential [46]. This could account for the presynaptic correlation variable in the learning rule. It is also known that if a cell fires an action potential, this signal is also propagated back through the dendrites, and is thus noticeable at the synapses [47]. Thus, the postsynaptic firing signal is also accessible at a real synapse. An application has been demonstrated where learning neurons are connected by global cross inhibition into a WTA network and, thus, express competitive Hebbian learning. This configuration has been used to optimize the encoding of a set of input patterns, i.e., to reduce information loss (increase the output entropy) in a classification task. In three series of experiments, with three different sets of computer-generated inputs, the performance of the chip implementation and simulation has been compared. Eight output neurons were used, which limited the maximal information they could convey to 3 bits. The chip implementation always reduced information loss substantially after random initialization. This was achieved with the same set of parameters for all three input sets. Still, the chip's performance could not quite equal the ideal simulations, for reasons we briefly summarize below.
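The output entropy used as the figure of merit throughout is the Shannon entropy [44] of the distribution of winning neurons, which for eight neurons is bounded by log2(8) = 3 bits. A minimal sketch (function name our own):

```python
import numpy as np

def output_entropy(winner_counts):
    """Shannon entropy (in bits) of the WTA winner distribution."""
    p = np.asarray(winner_counts, dtype=float)
    p = p[p > 0] / p.sum()
    return float(-(p * np.log2(p)).sum())

uniform = output_entropy([10] * 8)         # all 8 neurons win equally often
skewed = output_entropy([40, 20, 10, 10])  # a few neurons dominate
```

A uniform winner distribution over eight neurons yields the full 3 bits; any dominance of a few neurons, as observed before learning, lowers the entropy.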
The simulations with analog memory reached the optimal result (3-bit entropy on the neurons' outputs) in nearly every trial. The improvement by the chip implementation measured between 0.48 and 0.60 bits, i.e., between 16% and 20% of the maximal possible value of 3 bits. The relative improvement was, thus, quite comparable between the different input sets. The absolute end results, however, were more dependent on the input sets and measured between 2.53 and 2.85 bits (84% and 95% of the optimum). One reason for the difference in performance between simulation and physical implementation was element mismatch. This caused unfair advantages for some synapses and neurons, resulting in less even distributions of the WTA output activity. The main reason, however, was the limitation to relatively high learning rates on the chip. This prevents the fine tuning necessary to reach an optimal solution. The relatively high learning rate is imposed by the too short convergence time constant of the particular multilevel memory that has been used to store the synaptic weights [28]. This memory acts as a perfect analog memory on a short-time scale but discretizes its content over a longer time scale. If the time scale of discretization can be extended in future versions of the memory, finer weight updates will not be lost as quickly and can accumulate. Consequently, a lower learning rate will be possible. The forced high learning rate on the chip did not just influence the quality but also the stability of the learning states: Although a


good quality of the encoding was usually quickly learned, the neural network tended to continue swapping between different, equally good solutions. This same stability problem was also very evident in a first experiment with real-world inputs within the CAVIAR image processing system. The chip learned to improve the representation of a cyclic stimulus. The stability problem in this practical context was solved by simply turning the learning off again, and an improved, stable performance was the result.
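The behavior of the multilevel weight storage cell discussed above can be sketched as a relaxation toward the nearest of six attractor levels. The Python toy model below uses the 340 mV level spacing and roughly 10 ms time constant quoted in the text; everything else (names and the exact exponential relaxation law) is an illustrative assumption:

```python
import numpy as np

LEVELS = np.arange(6) * 0.34   # six attractor levels, 340 mV apart (in volts)
TAU = 0.010                    # ~10 ms relaxation time constant

def relax(w, dt):
    """Let the stored voltage drift toward the nearest attractor level."""
    target = LEVELS[np.argmin(np.abs(LEVELS - w))]
    return float(target + (w - target) * np.exp(-dt / TAU))

# A 100 mV update is below the 170 mV half-spacing: after 100 ms of drift
# (the typical gap between weight updates), the weight is back at its old
# level and the update is lost.
w_small = relax(LEVELS[2] + 0.10, dt=0.1)

# A 200 mV update crosses the midpoint and settles on the next level.
w_large = relax(LEVELS[2] + 0.20, dt=0.1)
```

This is why small updates cannot accumulate across output spikes, forcing the relatively high learning rate described above.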

APPENDIX I
SUPPLEMENTARY PROOFS FOR RATE BEHAVIOR

A. Equation (12)

In the following, we will derive (12). Central to this will be the probability density function of the probability variable r. r is the fraction to which an increment of the correlation variable caused by a synaptic input will have decayed (with time constant τ) as the cell produces an output. For instance, if a synapse receives an input at time t_in and the neuron fires the next AP at time t_AP, then the contribution of 1 to the variable at that synapse will have decayed to e^(-(t_AP - t_in)/τ); so, for this particular synaptic input event, r = e^(-(t_AP - t_in)/τ). p(r) is the probability density function for r as an arbitrary input event is chosen, under the assumption that both input and output events are independently Poisson-distributed with average frequencies f_in and f_out. Then, with this probability density, we can compute the expected value E(r) of r. This value multiplied with the average input frequency is nothing else but the average element value of the causality measure vector as it is sampled at the times of APs, which will turn out to be (12).

1) Expected Decay of Increments: First, we will compute the probability density p(T) of an arbitrary presynaptic spike arriving in a postsynaptic interspike interval of length T. This probability density will be proportional to the interval length T multiplied with the probability of a postsynaptic interval being of that length

    p(T) = f_out² T e^(-f_out T).    (16)

A next property we will deduce for later use is the probability density of the contribution of an input spike to the correlation signal having decayed from 1 down to r as the neuron fires, given that it finds itself in a postsynaptic interspike interval of length T. The presynaptic spike may find itself with equal probability anywhere in the interval. Let us denote its time of occurrence relative to the beginning of the interval as t, with 0 ≤ t ≤ T. Then,

    p(t|T) = 1/T for 0 ≤ t ≤ T, 0 otherwise.    (17)

Its integral P(t|T), the accumulated probability, is given as

    P(t|T) = t/T.    (18)

We now use the variable r from before, the fraction to which a contribution of a presynaptic spike has decayed as the neuron fires. It can be deduced from t by r = e^(-(T-t)/τ). Its inverse is t = T + τ ln r. Thus, P(r|T) = 1 + (τ/T) ln r. Differentiating this, we get

    p(r|T) = τ/(rT) for e^(-T/τ) ≤ r ≤ 1, 0 otherwise.    (19)

Finally, the probability density p(r) of the contribution of a spike having decayed to the fraction r by the time of the next output spike can be computed from p(T) and p(r|T)

    p(r) = ∫ p(T) p(r|T) dT = f_out τ r^(f_out τ - 1) for 0 < r ≤ 1, 0 otherwise    (20)

where the integral runs over T ≥ -τ ln r.

2) Expected Value of r: The expected value E(r) of the probability variable r is therefore

    E(r) = ∫₀¹ r p(r) dr = f_out τ / (1 + f_out τ).    (21)

Now remember the definition of r: It is the value to which an increment of a local variable has decayed as the cell produces an AP. Thus, an individual input to this synapse will cause the variable to be on average equal to E(r) as it is sampled for a weight update, and the f_in/f_out inputs that arrive on average between two weight updates will cause the time average of the variable, as it is sampled for weight updates, to be (f_in/f_out) E(r) = f_in τ / (1 + f_out τ), which yields (12) and completes our proof.

B. I/O Relation for Rate Model

In Section II-B3, an assumption is made in order to be able to deduce the behavior of the learning rule (13) in terms of the average rates of the inputs and the output. The assumption is that the input spike trains and the output spike train are all independently Poisson distributed. It is also said that, by using an I&F neuron, this assumption is never completely true. Here, we propose another theoretical I/O relationship defined for a rate model neuron that fulfills this assumption by definition, and that 1) approximates the I&F neuron's rate and 2) complements the rate learning rule (13) with a rate I/O relationship that also achieves exact implicit weight vector normalization (just like the I&F neuron).

An I&F neuron transforms the charge on the cell membrane into spikes. In its case, that charge is always just at the threshold (that we have defined to be 1) as this transformation happens. Thus, the amount of charge transformed into spikes is approximately equal to the number of spikes, and the amount of charge per time that is thus transformed is equal to the output rate (22). Now, we try to derive a rate I/O relationship with (per definition) independent Poisson spike signals that shares this property with I&F neurons: Using the fact expressed in (8), using (12), and separating the slower time scale of change of the weights, (22) can be rewritten as (23).
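The rate result of the appendix rests on the independent-Poisson assumption: for a Poisson output train, the wait from an arbitrary input spike to the next output spike is exponentially distributed (memoryless property), so the expected decay fraction of a unit increment with time constant τ is f_out·τ/(1 + f_out·τ). This can be checked numerically (the values below are illustrative, not the chip's settings):

```python
import numpy as np

rng = np.random.default_rng(3)
f_out, tau = 10.0, 0.05   # illustrative output rate (Hz) and decay constant (s)

# Wait from an input spike to the next Poisson output spike is exponential.
wait = rng.exponential(1.0 / f_out, size=200_000)
e_r = float(np.exp(-wait / tau).mean())          # empirical expected decay
analytic = f_out * tau / (1.0 + f_out * tau)     # = 1/3 for these values
```

The Monte Carlo estimate agrees with the closed form to within sampling error, supporting the derivation's use of the memoryless property.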


This equation is already sufficient to prove the implicit weight vector normalization for this combination of learning rule and I/O relationship: Setting the left-hand side of (13) to zero, multiplying it from the left with the weight vector, and then using (23) and solving for the weight vector length gives the same result (10) as for the spike-based case. Finally, to get an explicit solution for the output rate, we can derive a quadratic equation for it from (23), with two solutions. The following linear threshold I/O relationship is thus a valid solution: (24). This I/O relationship is also a good approximation in general for the rate behavior of a leaky I&F neuron.

REFERENCES
[1] C. A. Mead, Neuromorphic electronic systems, Proc. IEEE, vol. 78, no. 10, pp. 1629–1636, Oct. 1990. [2] A. L. Hodgkin and A. F. Huxley, Current carried by sodium and potassium ions through the membrane of the giant axon of Loligo, J. Physiol., vol. 116, p. 449, 1952. [3] K. Fukushima, Y. Yamaguchi, M. Yasuda, and S. Nagata, An electronic model of the retina, Proc. IEEE, vol. 58, no. 12, pp. 1950–1951, Dec. 1970. [4] C. A. Mead and M. Mahowald, A silicon model of early visual processing, Neural Netw., vol. 1, pp. 91–97, 1988. [5] C. Mead, Analog VLSI and Neural Systems. Reading, MA: Addison-Wesley, 1989. [6] A. G. Andreou and K. A. Boahen, A contrast sensitive silicon retina with reciprocal synapses, in Advances in Neural Information Processing Systems (NIPS). Cambridge, MA: MIT Press, 1991, vol. 4, pp. 764–772. [7] K. A. Boahen, The retinomorphic approach: Pixel-parallel adaptive amplification, filtering, and quantization, in Neuromorphic Systems Engineering, T. S. Lande, Ed. Norwell, MA: Kluwer, 1998, ch. 11. [8] E. Culurciello, R. Etienne-Cummings, and K. Boahen, A biomorphic digital image sensor, IEEE J. Solid-State Circuits, vol. 38, no. 2, pp. 281–294, Feb. 2003. [9] M. Azadmehr, J. Abrahamsen, and P. Häfliger, A foveated AER imager chip, in Proc. IEEE Int. Symp. Circuits Syst., Kobe, Japan, May 2005, vol. 3, pp. 2751–2754. [10] J. Lazzaro and C. A. Mead, Circuit models of sensory transduction in the cochlea, in Analog VLSI Implementations of Neural Networks, C. A. Mead and M. Ismail, Eds. Norwell, MA: Kluwer, 1989, pp. 85–101. [11] R. Sarpeshkar, R. F. Lyon, and C. Mead, A low-power wide-dynamic-range analog VLSI cochlea, in Neuromorphic Systems Engineering, T. S. Lande, Ed. Norwell, MA: Kluwer, 1998, ch. 3, pp. 49–103. [12] A. van Schaik and S.-C. Liu, AER EAR: A matched silicon cochlea pair with address event representation interface, in Proc. IEEE Int. Symp. Circuits Syst., Kobe, Japan, 2005, vol. 5, pp. 4213–4216. [13] D. O. Hebb, The Organization of Behavior.
New York: Wiley, 1949. [14] T. V. P. Bliss and T. Lømo, Long-lasting potentiation of synaptic transmission in the dentate area of the anaesthetized rabbit following stimulation of the perforant path, J. Physiol., vol. 232, no. 2, pp. 331–356, Jul. 1973. [15] H. Markram, J. Lübke, M. Frotscher, and B. Sakmann, Regulation of synaptic efficacy by coincidence of postsynaptic APs and EPSPs, Science, vol. 275, pp. 213–215, 1997. [16] S. Song, K. D. Miller, and L. F. Abbott, Competitive Hebbian learning through spike-timing dependent synaptic plasticity, Nature Neurosci., vol. 3, pp. 919–926, 2000. [17] W. Gerstner, R. Kempter, J. van Hemmen, and H. Wagner, A neuronal learning rule for sub-millisecond temporal coding, Nature, vol. 383, pp. 76–78, 1996. [18] P. Häfliger, M. Mahowald, and L. Watts, A spike based learning neuron in analog VLSI, in Advances in Neural Information Processing Systems (NIPS). Cambridge, MA: MIT Press, 1996, vol. 9, pp. 692–698.

[19] S. Fusi, M. Annunziato, D. Badoni, A. Salamon, and D. J. Amit, Spike-driven synaptic plasticity: Theory, simulation, VLSI implementation, Neural Comput., vol. 12, pp. 2227–2258, 2000. [20] A. Bofill, A. F. Murray, and D. Thompson, Circuits for VLSI implementation of temporally asymmetric Hebbian learning, in Advances in Neural Information Processing Systems (NIPS). Cambridge, MA: MIT Press, 2001, vol. 14. [21] A. Bofill and A. F. Murray, Synchrony detection by analogue VLSI neurons with bimodal STDP synapses, in Advances in Neural Information Processing Systems (NIPS). Cambridge, MA: MIT Press, 2003, vol. 16. [22] G. Indiveri, Neuromorphic bistable VLSI synapses with spike-timing-dependent plasticity, in Advances in Neural Information Processing Systems (NIPS). Cambridge, MA: MIT Press, 2003, vol. 16. [23] K. Cameron, V. Boonsobhak, A. Murray, and D. Renshaw, Spike timing dependent plasticity (STDP) can ameliorate process variations in neuromorphic VLSI, IEEE Trans. Neural Netw., vol. 16, no. 6, pp. 1626–1637, Nov. 2005. [24] G. Indiveri, E. Chicca, and R. Douglas, A VLSI array of low-power spiking neurons and bistable synapses with spike-timing dependent plasticity, IEEE Trans. Neural Netw., vol. 17, no. 1, pp. 211–221, Jan. 2006. [25] Z. Yang, A. Murray, F. Wörgötter, K. Cameron, and V. Boonsobhak, A neuromorphic depth-from-motion vision model with STDP adaptation, IEEE Trans. Neural Netw., vol. 17, no. 2, pp. 482–495, Mar. 2006. [26] M. Holler, S. Tam, H. Castro, and R. Benson, An electrically trainable artificial neural network (ETANN) with 10240 floating gate synapses, in Proc. Int. Joint Conf. Neural Netw., Jun. 1989, no. II, pp. 191–196. [27] C. Diorio, S. Mahajan, P. Hasler, B. Minch, and C. Mead, A high-resolution non-volatile analog memory cell, in Proc. IEEE Int. Symp. Circuits Syst., 1995, vol. 3, pp. 2233–2236. [28] P. Häfliger and H. K. Riis, A multi-level static memory cell, in Proc. IEEE Int. Symp. Circuits Syst. (ISCAS), Bangkok, Thailand, May 2003, vol. 1, pp. 22–25.
[29] R. Serrano-Gotarredona, M. Oster, P. Lichtsteiner, A. Linares-Barranco, R. Paz-Vicente, F. Gómez-Rodríguez, H. K. Riis, T. Delbrück, S. C. Liu, S. Zahnd, A. M. Whatley, R. Douglas, P. Häfliger, G. Jiménez-Moreno, A. Civit, T. Serrano-Gotarredona, A. Acosta-Jiménez, and B. Linares-Barranco, in Advances in Neural Information Processing Systems. Cambridge, MA: MIT Press, 2006, vol. 18. [30] M. Mahowald, An Analog VLSI System for Stereoscopic Vision. Norwell, MA: Kluwer, 1994. [31] A. Mortara and E. A. Vittoz, A communication architecture tailored for analog VLSI artificial neural networks: Intrinsic performance and limitations, IEEE Trans. Neural Netw., vol. 5, no. 3, pp. 459–466, May 1994. [32] P. Lichtsteiner, C. Posch, and T. Delbrück, A 128 × 128 120 dB 30 mW asynchronous vision sensor that responds to relative intensity change, in Proc. ISSCC Dig. Tech. Papers, 2006, pp. 508–509. [33] A. Serrano-Gotarredona, T. Serrano-Gotarredona, A. J. Acosta-Jiménez, and B. Linares-Barranco, An arbitrary kernel convolution AER-transceiver chip for real-time image filtering, in Proc. IEEE Int. Symp. Circuits Syst. (ISCAS), Kos, Greece, 2006, pp. 3145–3148. [34] M. Oster and S. C. Liu, A winner-take-all spiking network with spiking inputs, in Proc. IEEE Int. Conf. Electron., Circuits Syst., Tel Aviv, Israel, 2004, pp. 203–206. [35] S. C. Liu and M. Oster, Feature competition in a spike-based winner-take-all VLSI network, in Proc. IEEE Int. Symp. Circuits Syst., Kos, Greece, 2006, pp. 3634–3637. [36] P. Häfliger, Asynchronous event redirecting in bio-inspired communication, in Proc. 8th IEEE Int. Conf. Electron., Circuits, Syst., Malta, Sep. 2001, vol. 1, pp. 87–90. [37] R. P. Vicente, F. G. Rodriguez, M. A. R. Jodar, A. L. Barranco, G. J. Moreno, and A. A. C. Balcells, Test infrastructure for address-event-representation communications, in Lecture Notes in Computer Science. Berlin, Germany: Springer-Verlag, 2005, vol. 3512, pp. 518–526. [38] H. K. Riis and P. Häfliger, An asynchronous 4-to-4 AER mapper, in Lecture Notes in Computer Science. Berlin, Germany: Springer-Verlag, 2005, vol. 3512, pp. 494–501. [39] P. Häfliger, A spike based learning rule and its implementation in analog hardware, Ph.D. dissertation, ETH, Zürich, Switzerland, 2000 [Online]. Available: http://www.i.uio.no/haiger [40] R. Kempter, W. Gerstner, and J. L. van Hemmen, Spike-based compared to rate-based Hebbian learning, Neural Inf. Process. Syst., vol. 11, pp. 125–131, 1999.


[41] ——, Hebbian learning and spiking neurons, Phys. Rev. E, vol. 59, no. 4, pp. 4498–4514, 1999. [42] E. M. Izhikevich and N. S. Desai, Relating STDP to BCM, Neural Comput., vol. 15, pp. 1511–1523, 2003. [43] J. Abrahamsen, P. Häfliger, and T. S. Lande, A time domain winner-take-all network of integrate-and-fire neurons, in Proc. IEEE Int. Symp. Circuits Syst. (ISCAS), Vancouver, BC, Canada, May 2004, vol. V, pp. 361–364. [44] C. E. Shannon, A mathematical theory of communication, in Claude Elwood Shannon Collected Papers, N. J. A. Sloane and A. D. Wyner, Eds. Piscataway, NJ: IEEE Press, 1993. [45] A. Guazzelli, M. Bota, and M. A. Arbib, Competitive Hebbian learning and the hippocampal place cell system: Modeling the interaction of visual and path integration cues, Hippocampus, vol. 11, pp. 216–239, 2001. [46] R. Yuste and W. Denk, Dendritic spines as basic functional units of neuronal integration, Nature, vol. 375, pp. 682–684, 1995. [47] G. Stuart, N. Spruston, B. Sakmann, and M. Häusser, Action potential initiation and backpropagation in neurons of the central nervous system, Trends Neurosci. (TINS), vol. 20, pp. 125–131, 1997.

Philipp Häfliger (M'02) was born in Zürich, Switzerland, in 1970. He studied computer science at the Federal Institute of Technology (ETH), Zürich, Switzerland, with astronomy as a second subject, from 1990 to 1995. He received the Ph.D. degree from the Institute of Neuroinformatics at ETH in 2000. Currently, he works as an Associate Professor in the Microelectronics Group of the Department of Informatics, University of Oslo, Oslo, Norway. He teaches a lecture on neuromorphic electronics and he is a Local Coordinator for the European 5th Framework Information Society Technologies (IST) Project Convolution AER Vision Architecture for Real-Time (CAVIAR), with partners in Spain and in Switzerland. Dr. Häfliger is a member of several IEEE Circuits and Systems (CAS) Society technical committees.
