Fellous and Suri. The roles of Dopamine.

To appear in The Handbook of Brain Theory and Neural Networks, Second edition,
(M.A. Arbib, Ed.), Cambridge, MA: The MIT Press, 2002. http://mitpress.mit.edu

The Roles of Dopamine


Jean-Marc Fellous and Roland E. Suri

The Salk Institute for Biological Studies


Computational Neurobiology Laboratory
10010 N. Torrey Pines Rd.,
La Jolla, CA 92037
Tel: 858-453-4100, x1618
Fax: 858-587-0417
fellous@salk.edu
suri@salk.edu

INTRODUCTION
Dopamine (DA) is a neuromodulator (see NEUROMODULATION IN INVERTEBRATE
NERVOUS SYSTEMS and SYNAPTIC CURRENTS, NEUROMODULATION AND KINETIC
MODELS) that originates from small groups of neurons in the
mesencephalon (the ventral tegmental area (A10), the substantia nigra
(A9) and A8) and in the diencephalon (areas A13, A14 and A15).
Dopaminergic projections are in general very diffuse and reach large
portions of the brain. The time scales of dopamine actions range from a
few hundred milliseconds to several hours. We will focus here on the
mesencephalic dopamine centers because they are the most studied, and
because they are thought to be involved in diseases such as Tourette's
syndrome, schizophrenia, Parkinson's disease, Huntington's disease, drug
addiction or depression (see DISEASE: NEURAL NETWORK MODELS and
(Tzschentke, 2001)). These centers are also involved in normal brain
functions such as working memory, reinforcement learning, and attention.
This article briefly summarizes the main roles of dopamine, in
particular with respect to recent modeling approaches.

BIOPHYSICAL EFFECTS OF DOPAMINE
The effects of dopamine on membrane currents and synaptic transmission
are complex and depend on the nature and distribution of the
postsynaptic receptors. At the single cell level, in the in vitro rat
preparation, DA has been found to either increase or decrease the
excitability of neurons, through the modulation of specific sets of
sodium, potassium and calcium currents (see (Gulledge and Jaffe, 1998),
and (Nicola et al., 2000) for reviews). While the exact nature of the
modulation is still debated, it is likely to depend on the opposing
contributions of the D1/D5 and D2/D3 families of dopamine receptors,
which are respectively positively and negatively coupled with adenylate
cyclase. Studies in monkey cortical tissue showed that the D1/D5 family
of receptors was 20-fold more abundant than the D2/D3 family, and that
they were present distally in both pyramidal and non-pyramidal cells
(Goldman-Rakic et al., 2000).

Dopamine modulates excitatory and inhibitory synaptic transmission.
While the nature of neuromodulation of inhibitory transmission is still
debated, it appears that in both the cortex and the striatum, D1
receptor activation selectively enhances NMDA but not AMPA synaptic
transmission. Because of their voltage dependence, NMDA currents are
smaller at rest than in a depolarized state when the postsynaptic cell
is firing. Experimental and theoretical evidence suggests that the
dopamine enhancement of NMDA currents may be used to
induce working memory-like (see below) bistable states in large networks
of pyramidal neurons (Lisman et al., 1998).

In rats in vivo, stimulation of the ventral tegmental area or local
application of dopamine decreases the spontaneous firing of the
prefrontal cortex (Thierry et al., 1994), striatum and nucleus accumbens
(Nicola et al., 2000), suggesting that dopamine may be able to control
the levels of noise, and hence signal-to-noise ratios.

Given that dopamine modulation strongly depends on the particular
distribution of D1/D5 and D2/D3 receptors and on the particular pattern
of incoming synaptic transmission, the biophysical effects of dopamine
on the intrinsic and synaptic properties are likely to differ from one
neuron to the next, raising the intriguing possibility of the existence
of several subclasses of neurons that differ only by their responses to
this neuromodulator.

DOPAMINE LEVELS INFLUENCE WORKING MEMORY
Working memory refers to the ability to hold a few items in mind, with
the explicit purpose of working with them to yield a behavior (see
SHORT-TERM MEMORY). Typically, working memory tasks such as spatial
delayed match-to-sample tasks consist of the brief presentation of a
cue-stimulus (bright dot flashing once) in one of the 4 quadrants of a
screen, followed by a delay period of several seconds, and by a test
where the subject has to respond only if the test stimulus appears in
the same quadrant as the cue-stimulus. Single-cell studies in monkeys
revealed that some prefrontal cortical cells increased their firing rate
during the delay period, when the stimulus is no longer present but when
the animal has to remember its location in order to later perform the
correct action. Both pyramidal cells and interneurons may present this
property. The activity of these cells is stimulus dependent, so that
only the cells that encode the spatial location where the cue-stimulus
occurred remain active during the delay period.

Local iontophoretic administrations of DA in the prefrontal cortex of
monkeys performing a working memory task increase the cells' firing rate
during the delay period, without increasing background noise,
essentially increasing the signal-to-noise ratio during the task. There
is however an optimal level of dopamine concentration above and below
which working memory becomes impaired. Current theories propose that
this effect is due to the enhancement by dopamine of excitatory inputs
on pyramidal cells and interneurons observed in vitro. Because DA is
more effective in facilitating excitatory transmission on pyramidal
cells than on interneurons, intermediate levels of DA improve
performance, while higher levels of DA recruit feedforward inhibition
and decrease pyramidal cell outputs, therefore resulting in impairments
in the task. Low levels of DA would not be sufficient to induce
excitatory facilitation, yielding a poor pyramidal cell output, and
hence an impairment (Fig. 1 and (Goldman-Rakic et al., 2000)). There
have been a few attempts at modeling the neural substrate of working
memory, but very little has yet been done to account for the role of
dopamine (Tanaka, 2001).

DOPAMINE RESPONSES RESEMBLE REWARD PREDICTION SIGNAL OF TD MODEL
A large body of experimental evidence led to the hypothesis that
Pavlovian learning depends on the degree of the unpredictability of the
reinforcer (Dickinson, 1980). According to this hypothesis, reinforcers
become progressively less efficient for behavioral adaptation as their
predictability grows during the course of learning. The difference
between the actual occurrence and the prediction of the reinforcer is
usually referred to as the error in the reinforcer prediction. This
concept has been used in the temporal-difference model (TD model) of
Pavlovian learning (see REINFORCEMENT LEARNING IN MOTOR CONTROL). If the
reinforcer is a reward, the TD model uses a reward prediction error
signal to learn a reward prediction signal. The error signal
progressively decreases and shifts to the time of earlier stimuli that
predict the reinforcer. The characteristics of the reward prediction
signal are comparable to those of anticipatory responses such as
salivation in Pavlov's experiment.
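
The shift of the prediction error from the time of the reward to the
time of the predictive stimulus can be illustrated with a minimal
sketch. It assumes a tabular TD(0) rule with one weight per time step
(a tapped delay line); the trial layout, learning rate, and discount
factor are hypothetical values chosen only for illustration, and the
code is not the Extended TD model discussed in this article.

```python
# Hypothetical trial layout and parameters, chosen only for illustration.
N_STEPS, STIM_T, REWARD_T = 20, 5, 15   # steps per trial, stimulus time, reward time
ALPHA, GAMMA = 0.3, 0.98                # learning rate, temporal discount factor


def run_trial(w, rewarded=True, learn=True):
    """Run one trial and return the prediction error at every time step."""
    delta = [0.0] * N_STEPS
    for t in range(N_STEPS - 1):
        # The reward prediction is zero until the stimulus has appeared.
        v_now = w[t] if t >= STIM_T else 0.0
        v_next = w[t + 1] if t + 1 >= STIM_T else 0.0
        r = 1.0 if (rewarded and t + 1 == REWARD_T) else 0.0
        # TD error: reward plus discounted next prediction minus current prediction.
        delta[t + 1] = r + GAMMA * v_next - v_now
        if learn and t >= STIM_T:
            w[t] += ALPHA * delta[t + 1]
    return delta


w = [0.0] * N_STEPS                        # one weight per time step (tapped delay line)
before = run_trial(list(w), learn=False)   # untrained model
for _ in range(200):
    run_trial(w)                           # repeated stimulus-reward pairings
after = run_trial(list(w), learn=False)    # trained model, reward delivered
omitted = run_trial(list(w), rewarded=False, learn=False)  # trained model, no reward

print(round(before[REWARD_T], 3), round(after[STIM_T], 3), round(omitted[REWARD_T], 3))
```

Under these assumptions the error appears at the reward before learning,
moves to stimulus onset after learning, and becomes negative at the
expected time of the reward when the reward is omitted.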

The reward prediction error signal of the TD model remained a purely
hypothetical signal until researchers discovered that the activity of
midbrain dopamine neurons is strikingly similar to the reward prediction
error of the TD model (Fig. 2A) (Montague et al., 1996; Schultz, 1998).
Advances in reinforcement learning theories and evidence for the
involvement of dopamine in sensorimotor learning and in cognitive
functions led to the development of the Extended TD model. The reward
prediction error signal of the TD model by (Suri and Schultz, 1999)
reproduces dopamine neuron activity in several situations: (1) upon
presentation of unpredicted rewards, (2) before, during, and after
learning that a stimulus precedes a reward, (3) when two stimuli precede
a reward with fixed time intervals, (4) when the interval between the
two stimuli is varied, (5) in the case of unexpectedly omitted reward,
(6) delayed reward, (7) reward earlier than expected, (8) in the case of
unexpectedly omitted reward-predictive stimulus, (9) in the case of a
novel, physically salient stimulus that has never been associated with
reward (see allocation of attention, below), and (10) for the blocking
paradigm. To reach this close correspondence, three constants of the TD
model were tuned to characteristics of dopamine neuron activity
(learning rate, decay of eligibility trace, and temporal discount
factor), some weights were initialized with positive values to achieve
(9), and some ad hoc changes of the TD algorithm were introduced to
reproduce (7) (see below).

In Pavlov's experiment, the salivation response of the dog does not
influence the food delivery. The TD model is a model of Pavlovian
learning and therefore computes predictive signals, corresponding to the
salivation response, but does not select optimal actions. In contrast,
instrumental learning paradigms, such as learning to press a lever for
food delivery, demonstrate that animals are able to learn to perform
actions that optimize reward. To model sensorimotor learning in such
paradigms, a model component called the Actor is taught by the reward
prediction error signal of the TD model. In such architectures, the TD
model is also called the Critic. This approach is consistent with animal
learning theory and was successfully applied to machine learning studies
(see REINFORCEMENT LEARNING IN MOTOR CONTROL). Midbrain dopamine neurons
project to the striatum and cortex and are characterized by rather
uniform responses throughout the whole neuron population. Computational
modeling studies with Actor-Critic models show that such a dopamine-like
reward prediction error can serve as a powerful teaching signal for
learning with delayed reward and for learning of motor sequences (Suri
and Schultz, 1999). These models are also consistent with the role of
dopamine in drug addiction and electrical self-stimulation (see below).

Comparison of the Actor-Critic architecture to biological structures
suggests that the Critic may correspond to pathways from limbic cortex
via limbic striatum (or striosomes) to dopamine neurons, whereas the
Actor may correspond to pathways from neocortex via sensorimotor
striatum (or matrisomes) to basal ganglia output nuclei (see BASAL
GANGLIA) (Fig. 2B). Whereas this standard Actor-Critic model mimics
learning of sensorimotor associations or habits, it does not imply that
dopamine is involved in anhedonia.

ALLOCATION OF ATTENTION
Several lines of evidence suggest that dopamine is also involved in
attention processes. Although the firing rates of dopamine neurons can
be increased or decreased for aversive stimuli, dopamine concentrations
in striatal and cortical target areas are often increased (Schultz,
1998). The two findings are not necessarily inconsistent, since small
differences in firing rates of dopamine neurons are hard to detect with
single neuron recordings, and measurement methods for dopamine
concentration usually have lower temporal resolution than recordings of
the spiking activity of dopamine neurons. Furthermore, dopamine
concentration is not only influenced by dopamine neuron activity but
also by local regulatory processes. Slow changes in cortical or striatal
dopamine concentration may signal information completely unrelated to
reward. Alternatively, relief following aversive situations may
influence dopamine neuron activity as if it were a reward, which would
be consistent with opponent processing theories (see CONDITIONING).
Allocation of
attentional resources seems to determine dopamine neuron activity in the
situation when a reward is delivered earlier than usual. In contrast to
any linear model, including the standard TD model, dopamine neuron
activity is at baseline levels at the time of the expected reward in
this situation. This suggests that delivery of the reward earlier than
usual reallocates attentional resources through competitive mechanisms
(Suri and Schultz, 1999).

Dopamine neurons respond to novel, physically salient stimuli even if
the stimulus has never been associated with a reward (Schultz, 1998). In
contrast to reward-predictive responses, for stimuli of equal physical
salience, the increase due to novelty responses seems to be smaller and
is followed by a pronounced decrease of neural activity below baseline
levels. (Brief and less pronounced decreases of dopamine neuron activity
sometimes also occur after a response to a reward.) In contrast to
responses to conditioned stimuli, novelty responses extinguish for
repeated stimulus presentations. The characteristics of this novelty
response are consistent with the TD model if certain associative weights
are initialized with positive values instead of using initial values of
zero (Suri and Schultz, 1999). Such initialization of weights with
positive values was proposed in machine learning studies to stimulate
exploration of novel actions. Simulation studies demonstrated that such
a novelty bonus hardly influences slow movements of more than 100 ms
duration because the effects of the two phases in the firing of dopamine
neurons cancel out and the movement starts after the biphasic response.
However, dopamine novelty responses may stimulate exploration for very
brief actions, which may include saccades or allocation of attentional
resources (Suri and Schultz, 1999).

Redgrave and collaborators (Redgrave et al., 1999) argued that the
latency of dopamine responses is too short to be consistent with the
hypothesis that dopamine is a reward prediction signal. Onsets of
dopamine novelty responses as well as reward responses seem to occur
just before the start of the saccade or during the saccade. The dopamine
response will likely occur after the superior colliculus has detected a
visual target but prior to the triggering (by collicular neurons) of the
saccadic movement required to bring the target to the fovea. If it is
assumed that the animal must execute a saccade to a visually presented
stimulus before it can adequately assess its predictive value, the
latency of dopamine response would be too short to signal reward. We
argue against this view of Redgrave and colleagues. Neural activities in
cortical and subcortical areas reflect the anticipated future visual
image before a saccade is elicited (Ross et al., 2001). Therefore, these
representations of future visual images may influence dopamine neuron
activity as if the saccade had already been executed, and thus the
dopamine response may start slightly before the saccade. The Extended TD
model computes such predictive signals and uses them to select
goal-directed actions in a cognitive task (Suri et al., 2001). According
to this complex Actor-Critic model, the interactions between dopamine
neuron activities (computed by the Critic) and activities that reflect
the preparation for intended actions (in the Actor) select the actions
that maximize reward predictions. The model evaluates the expected
values of future actions, without necessarily executing them, in order
to select the action with the optimal predicted outcome. The model
selects the optimal action from such action ideas or imagined actions.
This optimal action is selected by assuming that dopamine neuron
activity increases the signal-to-noise ratio in target neurons.
According to this advanced Actor-Critic model, dopamine improves
focusing of attention on intended actions and selects actions. Since
some neural activities anticipate the retinal images that result from
saccades before these saccades are executed (Ross et al., 2001), animals
may indeed use such predictive mechanisms for the selection of
intentional saccades. Furthermore, similar internal mechanisms may bias
intentional switching capabilities of the basal ganglia to facilitate
the allocation of behavioral and cognitive processing capacity towards
unexpected events (see BASAL GANGLIA and (Redgrave et al., 1999)). If we
assume similar functions of dopamine for short-term memory, this model
suggests that dopamine may select the items that should be kept in
short-term memory and may also help to sustain their representation over
time.
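
The account of novelty responses given in this section, in which
certain weights start at positive values, can be illustrated with a
minimal sketch. It assumes a tabular TD(0) rule with one weight per time
step, a single positive initial weight at stimulus onset, and a stimulus
that is never followed by reward; the constants and the name
present_stimulus are hypothetical, and the code is not the
implementation of (Suri and Schultz, 1999).

```python
# Hypothetical trial layout and parameters, chosen only for illustration.
N_STEPS, STIM_T = 12, 3      # steps per trial, time of the novel stimulus
ALPHA, GAMMA = 0.3, 0.98     # learning rate, temporal discount factor
W0 = 0.5                     # positive initial weight (the "novelty bonus")


def present_stimulus(w):
    """One unrewarded presentation; returns the prediction error per time step."""
    delta = [0.0] * N_STEPS
    for t in range(N_STEPS - 1):
        # The reward prediction is zero until the stimulus has appeared.
        v_now = w[t] if t >= STIM_T else 0.0
        v_next = w[t + 1] if t + 1 >= STIM_T else 0.0
        # No reward is ever delivered, so the reward term is absent.
        delta[t + 1] = GAMMA * v_next - v_now
        if t >= STIM_T:
            w[t] += ALPHA * delta[t + 1]
    return delta


w = [0.0] * N_STEPS
w[STIM_T] = W0                      # only the onset weight starts above zero
first = present_stimulus(w)         # response to the first presentation
for _ in range(30):
    last = present_stimulus(w)      # response after repeated presentations

print(round(first[STIM_T], 3), round(first[STIM_T + 1], 3))
print(max(abs(d) for d in last))
```

In this toy setting the first presentation yields a positive error at
stimulus onset followed by a negative error one step later, and both
phases shrink toward zero as the stimulus is repeated without reward.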

CONCLUSIONS
In vitro studies of the biophysical effects of dopamine demonstrate a
wide range of dopamine effects on the intrinsic and synaptic properties
of individual cells. In vivo studies suggest however that the main
overall effect of dopamine may be to control noise levels and to
selectively enhance the signal-to-noise ratio of neural processing. This
action may behaviorally lead to an improvement of working memory and to
better selection of goal-directed actions. The TD model reproduces
dopamine neuron activity in many behavioral situations and suggests that
dopamine neuron activity codes for an error in reward prediction. A
complex TD model was described that solves cognitive tasks including
goal-directed actions (also called planned or intentional actions) and
attempts to reproduce the function of dopamine in attention and
preparation processes.

FIGURE CAPTIONS

Fig. 1. Biphasic effects of dopamine during a working memory task. The
task consisted of the brief presentation of a cue (C), a delay of 3
seconds (D) and a response (R). Moderate levels of local application of
SCH39166 (25 nA), a D1 receptor agonist, dramatically enhanced the
activity of this cell, without significantly increasing its background
activity (before cue). Higher levels of SCH39166 (75 nA) decreased the
activity of this cell throughout the task. Histogram units are spikes/s.
Figure adapted from (Goldman-Rakic et al., 2000).

Fig. 2. A: Prediction error signal of the TD model (left) similar to
dopamine neuron activity (right) (figure adapted from (Suri and Schultz,
1998)). If a neutral stimulus A is paired with reward, the prediction
error signal and dopamine activity respond to the reward (before
learning). After repeated pairings, the prediction error signal and
dopamine activity are already increased by stimulus A and are at
baseline levels at the time of the reward (after learning). If the
stimulus A is conditioned to a reward but is occasionally presented
without reward, the prediction error signal and dopamine activity are
decreased below baseline levels at the predicted time of reward (omitted
reward). B: Interactions between cortex, basal ganglia, and midbrain
dopamine neurons mimicked by Actor-Critic models. The limbic areas are
proposed to correspond to the Critic and the sensorimotor areas to the
Actor. The striatum is divided into matrisomes (sensorimotor) and
striosomes (limbic). Limbic cortical areas project to striosomes,
whereas neocortical areas chiefly project to matrisomes. Midbrain
dopamine neurons are contacted by medium spiny neurons in striosomes and
project to both striatal compartments. They are proposed to influence
sensorimotor learning in the matrisomes (instrumental learning) and
learning of reward predictions in the striosomes (Pavlovian learning).
Striatal matrisomes inhibit the basal ganglia output nuclei GPi/SNr and
can elicit actions due to their projections via thalamic nuclei to motor
cortical areas. Several additional functions of this architecture were
proposed in (Suri et al., 2001).

REFERENCES

Dickinson A., 1980, Contemporary animal learning theory, Cambridge, UK:
  Cambridge University Press.
Goldman-Rakic P.S., Muly E.C., 3rd, Williams G.V., 2000, D(1) receptors
  in prefrontal cells and circuits, Brain Res Rev, 31:295-301.
Gulledge A.T., Jaffe D.B., 1998, Dopamine decreases the excitability of
  layer V pyramidal cells in the rat prefrontal cortex, J Neurosci,
  18:9139-9151.
Lisman J.E., Fellous J.-M., Wang X.-J., 1998, A role for NMDA-receptor
  channels in working memory, Nature Neuroscience, 1:273-275.
Montague P.R., Dayan P., Sejnowski T.J., 1996, A framework for
  mesencephalic dopamine systems based on predictive Hebbian learning,
  J Neurosci, 16:1936-1947.
Nicola S.M., Surmeier J., Malenka R.C., 2000, Dopaminergic modulation of
  neuronal excitability in the striatum and nucleus accumbens, Annu Rev
  Neurosci, 23:185-215.
Redgrave P., Prescott T.J., Gurney K., 1999, Is the short-latency
  dopamine response too short to signal reward error?, Trends Neurosci,
  22:146-151.
Ross J., Morrone M.C., Goldberg M.E., Burr D.C., 2001, Changes in visual
  perception at the time of saccades, Trends Neurosci, 24:113-121.
Schultz W., 1998, Predictive reward signal of dopamine neurons,
  J Neurophysiol, 80:1-27.
Suri R.E., Schultz W., 1998, Learning of sequential movements by neural
  network model with dopamine-like reinforcement signal, Exp Brain Res,
  121:350-354.
Suri R.E., Schultz W., 1999, A neural network model with dopamine-like
  reinforcement signal that learns a spatial delayed response task,
  Neuroscience, 91:871-890.
Suri R.E., Bargas J., Arbib M.A., 2001, Modeling functions of striatal
  dopamine modulation in learning and planning, Neuroscience, 103:65-85.
Tanaka S., 2001, Computational approaches to the architecture and
  operations of the prefrontal cortical circuit for working memory, Prog
  Neuropsychopharmacol Biol Psychiatry, 25:259-281.
Thierry A.M., Jay T.M., Pirot S., Mantz J., Godbout R., Glowinski J.,
  1994, Influence of afferent systems on the activity of the rat
  prefrontal cortex: Electrophysiological and pharmacological
  characterization. In: Motor and Cognitive Functions of the Prefrontal
  Cortex (Thierry A.M., Glowinski J., Goldman-Rakic P.S., Christen Y.,
  eds), pp 35-50. New York: Springer-Verlag.
Tzschentke T.M., 2001, Pharmacology and behavioral pharmacology of the
  mesocortical dopamine system, Prog Neurobiol, 63:241-320.
[Fig. 1: firing-rate histograms (vertical axis 0 to 60, horizontal axis
Time (s), 0 to 3); panels: Control, Medium D1, High D1.]
[Fig. 2: A: two columns of traces, Reward Prediction Error and Dopamine
Neuron Activity; rows: before learning, after learning, omitted reward;
marked events: stimulus A, stimulus B, reward; scale bar 1 sec.
B: diagram with Neocortex, Limbic Cortex, striatum (matrisomes,
striosomes), dopamine neurons, GPi/SNr and thalamus; labels: Actor,
Critic, stimuli, reward, acts.]