Learning to Pay Attention to What Matters: The Conditioning Circuit Revisited

Marcos M. Campos

February 18, 1996

Abstract

This study proposes an alternative to the conditioning circuit discussed in Grossberg and Levine (1987). The new version builds on recent work on selective attention (LaBerge, 1995), fear conditioning (Weinberger, 1995), and adaptive timing (Grossberg & Merrill, 1992), and addresses several shortcomings of Grossberg and Levine (1987): self-priming, weight transport, and the habituation mechanism of short-term memory. The performance of the present implementation is analyzed on two classical conditioning paradigms, simple acquisition and blocking, and comparisons with Grossberg and Levine (1987) are drawn.

1 Introduction

Any information processing system with limited processing capacity has to address some very difficult problems. Such a system needs to be able to select, from a large number of events, those that are most relevant while ignoring the others (the problem of selective information processing). At the same time, in order to explain classical conditioning data, the model needs to be able to develop associations between a conditioned stimulus (CS) and an unconditioned stimulus (US) in spite of the variability of the time lag between the CS and the US (the synchronization problem). Finally, these associations should not degrade quickly when many cues are processed in parallel and some of the cues are already conditioned to motivationally incompatible responses (the persistence problem).¹

The conditioning circuit is part of a larger research program attempting to account for these different capabilities. This program has been developed over the years in a number of studies (Grossberg & Levine, 1987; Grossberg & Schmajuk, 1987; Grossberg & Merrill, 1992; Bullock, Fiala, & Grossberg, 1994; Grossberg & Merrill, 1994). The conditioning circuit provides a framework for addressing, among others, the following two tasks: the creation of meaningful associations, that is, the assignment of meaning to cues, and selective attention, where cues are selected on the basis of their previously assigned meaning and the system's internal states or drives. The extension proposed here assumes that these two tasks are the main roles fulfilled by the conditioning circuit. From this perspective the circuit modulates the flow of information from the sensory representations based on their previously assigned meaning and the organism's internal states.

¹ See Grossberg and Levine (1987) for a detailed discussion of these problems.


2 The Basic Model

The conditioning circuit has three basic mechanisms. First, sensorial cues compete for short-term memory (STM). Second, the US is capable of activating a group of drive nodes as long as a drive input (internal signal) is also present. Then, through associative learning, the CSs eventually become capable of activating the same group of drive nodes, much in the same way as the US does. Third, each drive node has associated with it an "expected" sensorial representation. This "expectation" is the driving force behind selectivity. That is, the system will pay more attention to those environmental signals that are relevant to its present internal state.

The model uses three sets of units to implement the above mechanisms (see Figure 1). The first set of cells (x_{i1}) implements a STM and is used to store a sensory representation of the conditioned stimuli (CSs) and unconditioned stimuli (USs). The cells in this group have self-excitatory connections and inhibitory connections among themselves. This implements the competition for STM resources between sensorial cues.


Figure 1: Left: original conditioning circuit. Right: the modified version proposed here. The double arrows represent the existence of pathways in both directions. Unless indicated, all connections are excitatory. Modifiable synapses are represented by semi-disks; nonmodifiable ones are represented by arrows.

The second set of cells, or drive nodes (represented by the single node y in Figure 1), is used to encode the association between sensorial cues and internal drives. These cells receive inputs from the sensorial representation in the first set of cells and also a drive input. In order to fire, these cells need to be excited by the drive input and the inputs from the sensory representation (x_{i1}) simultaneously. The weights z_{i1} modulating the incoming sensorial signals to these cells encode the association between the conditioned stimuli (CSs) and the drive input. These cells implement the second basic mechanism behind the model: the US is capable of activating the y node as long as the drive input is also present, and, through associative learning, the CSs eventually become capable of activating the y node on their own.

The third set of cells (x_{i2}), the polyvalent sensory nodes, also encodes a sensory representation of the conditioned stimuli (CSs) and unconditioned stimuli (USs). However, in order for these cells to fire, they also need to receive input from the drive nodes. In this way this set of cells can be seen as encoding the expected sensorial representation associated

with a given drive. If the drive is active, then these cells are primed to fire. Those sensorial cues present in the first set of cells (x_{i1}) that match the "expected" sensorial representation primed in x_{i2} will then be reinforced and will gain a competitive advantage over those that are not expected. This feedback mechanism implements the selectivity mentioned above: cues that are relevant to a given internal state (drive) are boosted, and those that are not are suppressed through competition. The connection between the drive nodes and the x_{i2} cells is modifiable. It encodes the expected sensorial representation for the different drive nodes. Cells in this third group are also the initiators of motor responses.

The implementation of the conditioning circuit in Grossberg and Levine (1987) had a number of shortcomings. First, conditioned sensory cues were capable of self-priming. In other words, once a CS had "learned" to control the drive node it would be capable of increasing the strength of the association even if no US was presented. This is not in agreement with the data on classical conditioning. Second, there was no explicit mechanism implementing the learning of the drive expectations encoded in the connections between the drive nodes and the polyvalent sensory nodes (x_{i2}). Third, if none of the cues presented to the circuit was capable of activating a drive node, the circuit would have no output; in effect, no motor activity would be generated by the circuit. Finally, Grossberg and Levine (1987) provided a crude mechanism for the habituation of the STM without any discussion of the process behind it.

3 An Alternative Model

The version proposed here is illustrated on the right-hand side of Figure 1. It has two sets of cells. The first set implements a STM and is used to store a sensory representation of the conditioned stimuli (CSs) and unconditioned stimuli (USs). As in the original model, the nodes encoding the sensory cue representation (S1) compete for a limited capacity, or limited amount of activation. This is modeled here by a recurrent competitive field with transmitter gated activity (see Figure 2). The use of transmitter gated activity in the recurrent competitive field in S1 allows for the implementation of a STM that, for an appropriate choice of parameters, slowly decays, due to transmitter habituation, in the absence of an external input (I_i) to the node.

A key difference between the current version and the original model is the interpretation given to the role played by selective attention in the two models. Grossberg and Levine (1987) implements a particularly odd selective attention mechanism. That model selectively attends to cues only if at least one of the cues has been previously conditioned. In this case, even non-conditioned cues can receive attention if they are salient enough. However, if no cue has been previously conditioned, the model ignores all cues, even very salient ones. Here, on the other hand, the circuit works as a filter only if some cue has been previously conditioned (has acquired meaning). In this case, the circuit biases the competition in favor of the conditioned cues. However, if none of the stimuli has previously been associated with some meaning, the circuit lets the incoming signals go through unmodified. In the latter case, the competition mechanism favors those cues that are "naturally" more salient. This alternative formulation was motivated by recent work on selective attention (LaBerge, 1995) and fear conditioning (Weinberger, 1995) (see section 5 below).

The formulation proposed here also avoids the need for polyvalent cells at S2 and thus the problem, found in Grossberg and Levine (1987), of learning the weights between the drive nodes and the sensory cue representation. In the present version, the need for polyvalent cells is shifted to the motor response part, where the weights can easily be learned using an outstar learning law. Because of this, it is possible, for simplicity's sake, to lump together the S1 and S2 fields of the original model into a single field (the S1 field in the version proposed here) without affecting the overall behavior of the system.


Figure 2: Recurrent competitive units with transmitter gated activity.

3.1 Sensory cue representation

The local potentials x_{i0} of the S1 nodes obey the equation

\[ \frac{dx_{i0}}{dt} = -\alpha_{x0}\, x_{i0} + (\beta_{x0} - x_{i0})\big[ I_i + \gamma_{x0} (1 + J_i)\, x_{i1} \big] - \delta_{x0}\, (x_{i0} + \varepsilon_{x0}) \sum_{k \ne i} (1 + J_k)\, x_{k1}, \tag{1} \]

where I_i are the external inputs or cues, x_{i1} is the output of the node, and J_i is a facilitation term (see below). The output x_{i1} is related to the node's activity, or local potential, x_{i0} by

\[ x_{i1} = f(x_{i0}, \Gamma_{x0})\, w_i, \tag{2} \]

where the activation signal ("firing rate") f(x_{i0}, Γ_{x0}) interacts with a habituative neurotransmitter w_i. In this way x_{i1} can be interpreted as the chemical transduction of the cell's firing rate. In this paper the signal function f(x, Γ) was chosen to be

\[ f(x, \Gamma) = [x - \Gamma]^+ \equiv \max(x - \Gamma, 0). \tag{3} \]


3.2 Habituative transmitter

The transmitter level w_i is controlled by

\[ \frac{dw_i}{dt} = \alpha_w (\beta_w - w_i) - \gamma_w\, x_{i1}. \tag{4} \]

According to (4) the amount of neurotransmitter w_i accumulates to a constant target level β_w via the term α_w(β_w − w_i), and is inactivated, or habituates, via the term −γ_w x_{i1}.
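To make the S1 dynamics concrete, the sketch below implements the right-hand sides of (1), (2) and (4) in Python with NumPy. It is a minimal illustration, not the original simulation code: the Greek parameter names follow the reconstruction used in this section, and the values are those listed in Section 4.

```python
import numpy as np

def f(x, gamma):
    """Threshold-linear signal function of (3): [x - Gamma]^+."""
    return np.maximum(x - gamma, 0.0)

def s1_derivatives(x0, w, I, J, p):
    """Right-hand sides of (1) and (4) for all S1 nodes at once.

    x0: local potentials x_i0    w: transmitter levels w_i
    I:  external inputs I_i      J: facilitation terms J_i of (5)
    """
    x1 = f(x0, p['Gamma_x0']) * w          # gated outputs x_i1 of (2)
    gated = (1.0 + J) * x1                 # facilitated feedback signals
    dx0 = (-p['alpha_x0'] * x0
           + (p['beta_x0'] - x0) * (I + p['gamma_x0'] * gated)
           # lateral inhibition: each node is inhibited by all k != i
           - p['delta_x0'] * (x0 + p['eps_x0']) * (gated.sum() - gated))
    dw = p['alpha_w'] * (p['beta_w'] - w) - p['gamma_w'] * x1   # habituation (4)
    return dx0, dw

# Parameter values from Section 4, named per the reconstruction above.
params = dict(alpha_x0=1.0, beta_x0=2.0, gamma_x0=1.0, delta_x0=10.0,
              eps_x0=0.5, Gamma_x0=0.1,
              alpha_w=1.5, beta_w=1.0, gamma_w=6.0)
```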

3.3 Facilitation

The facilitation term J_i in (1) is given by

\[ J_i = \beta_J\, z_{i2}\, f(y, \Gamma_y), \tag{5} \]

where z_{i2} is the strength of the synapse connecting the drive node y to the ith node in S1, and f(y, Γ_y) is the output of the drive node. This facilitation term embodies the idea that the drive node can only facilitate the S1 nodes. In other words, the drive node is not capable of activating an S1 node by itself; it can only amplify the effect of the incoming signal I_i. As a CS becomes associated with a US, it becomes capable of activating the drive node. This has a dual facilitatory effect. First, it facilitates the CS representation at S1

through the positive feedback term in (1). Second, it increases the inhibitory effect of the CS node on the other cells in S1 through the inhibitory term in (1).

3.4 Drive node

The activity (local potential) y of the drive node is controlled by

\[ \frac{dy}{dt} = -\alpha_y\, y + (\beta_y - y)\, d \sum_i z_{i1}\, x_{i1}, \tag{6} \]

where d is the drive level and z_{i1} is the strength of the synapse connecting the ith node in S1 to the drive node D. According to (6), in order to fire, the drive node needs to be excited by the drive input and the inputs from the sensory representation (x_{i1}) simultaneously.
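Continuing the sketch above (and reusing its f and params), the facilitation term (5) and the drive node equation (6) translate directly into code. The multiplicative form of (6) makes y a coincidence detector for drive input and sensory input, while J_i enters (1) only as a gain and so can amplify but never create activity.

```python
def facilitation(y, z2, p):
    """Facilitation terms J_i of (5). Because J_i appears in (1) only
    inside the gain (1 + J_i), the drive node can amplify an S1 node's
    input but cannot activate the node on its own."""
    return p['beta_J'] * z2 * f(y, p['Gamma_y'])

def drive_derivative(y, x1, z1, d, p):
    """Right-hand side of (6). The product d * sum_i z_i1 * x_i1 means
    the drive node fires only when the drive input and the sensory
    inputs are present simultaneously."""
    return -p['alpha_y'] * y + (p['beta_y'] - y) * d * np.dot(z1, x1)

params.update(beta_J=10.0, alpha_y=1.5, beta_y=2.0, Gamma_y=0.05)
```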

3.5 Learning

Learning in the S1 → D pathway obeys a modified outstar rule

\[ \frac{dz_{i1}}{dt} = \begin{cases} \alpha_{z1}\, f(\tilde{x}_{i1}, \Gamma_{z1})\,(N - \beta_{z1}\, z_{i1}) & \text{for a CS,} \\ 0 & \text{for a US,} \end{cases} \tag{7} \]

where N is the Now Print signal of (10), and x̃_{i1} is a memory trace of the activity of the x_{i1} node. This learning scheme is similar to the one used in Grossberg and Merrill (1992). The term α_{z1} f(x̃_{i1}, Γ_{z1}) gives the sampling signal. In order to have learning, x̃_{i1} has to be above the threshold Γ_{z1}. Learning causes z_{i1} to approach N/β_{z1} during the sampling interval at a rate proportional to the sampling signal. The amount of change in z_{i1} reflects the degree to which x̃_{i1} and N simultaneously have large values over time. If x̃_{i1} is large when N is large, then ż_{i1} > 0. If x̃_{i1} is large when N is small, then ż_{i1} < 0. As indicated by (7), the strength of the synapse associated with a US does not change over time.

Learning in the D → S1 pathway obeys a modified instar rule

\[ \frac{dz_{i2}}{dt} = \begin{cases} \alpha_{z2}\, f(\tilde{x}_{i1}, \Gamma_{z2})\,(N - \beta_{z2}\, z_{i2}) & \text{for a CS,} \\ 0 & \text{for a US.} \end{cases} \tag{8} \]

The learning implied by this equation has the same properties as those discussed above for (7).

3.6 Activity trace

The activity trace x̃_{i1} obeys the equation

\[ \frac{d\tilde{x}_{i1}}{dt} = -\alpha_{x1}\, \tilde{x}_{i1} + \beta_{x1}\, x_{i1}. \tag{9} \]

The trace x̃_{i1} keeps a time average of the activity x_{i1} and, in this way, always lags behind x_{i1} (compare the top graph in Figure 4 with the bottom one). This mechanism is partially responsible for avoiding self-priming and for the lack of conditioning at ISI = 0. As a result of using x̃_{i1} to control the sampling in (7) and (8), it is possible to have learning even after the activity x_{i1} has terminated.
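A sketch of (7)-(9) under the same naming assumptions; the boolean mask is_us, which freezes the weights attached to the US node, is an implementation device introduced here for illustration.

```python
def trace_derivative(xt, x1, p):
    """Right-hand side of (9): the trace is a leaky time average of x_i1,
    so it lags x_i1 and can keep sampling after the STM activity ends."""
    return -p['alpha_x1'] * xt + p['beta_x1'] * x1

def learning_derivatives(z1, z2, xt, N, is_us, p):
    """Right-hand sides of (7) and (8), gated by the Now Print signal N.
    Sampling occurs only while the trace exceeds the threshold Gamma_z."""
    dz1 = p['alpha_z1'] * f(xt, p['Gamma_z1']) * (N - p['beta_z1'] * z1)
    dz2 = p['alpha_z2'] * f(xt, p['Gamma_z2']) * (N - p['beta_z2'] * z2)
    freeze = np.asarray(is_us)             # US weights do not change
    return np.where(freeze, 0.0, dz1), np.where(freeze, 0.0, dz2)

params.update(alpha_x1=0.35, beta_x1=0.2,
              alpha_z1=1.2, Gamma_z1=0.01, beta_z1=0.015,
              alpha_z2=1.2, Gamma_z2=0.01, beta_z2=0.015)
```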

3.7 Now print signal

The Now Print signal N modulates the learning process of (7) and (8). It is modeled using the same mechanism as in Grossberg and Merrill (1992).² The signal N can be activated by a US or by a CS that has already become a conditioned reinforcer. It is turned on by sufficiently large and rapid increments in the activity of the drive node D, and it is derived from the activity y according to the equation

\[ N = f\big( f(y, \Gamma_y) - E,\; \Gamma_N \big), \tag{10} \]

where E is a time average of f(y, Γ_y) which obeys

\[ \frac{dE}{dt} = \alpha_E \big( f(y, \Gamma_y) - E \big). \tag{11} \]

Figure 4 illustrates how N responds to increases in y. The top graph shows the activity

y for the simple acquisition simulation and the bottom one shows the associated values of N. A key property of N is that it increases in amplitude in response to larger f(y, Γ_y) without significantly changing its duration. In this way it works as a temporal marker for rapid and large changes in the activity of the drive node. This specification biases learning towards fast increases in y. Because of this, for large but slow changes in f(y, Γ_y), N will be small and very little learning will take place.

² See Grossberg and Merrill (1992) for a discussion of how the Now Print signal can be implemented with interneurons.
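In code, (10) and (11) amount to thresholding the difference between the drive node output and its own running average; again a sketch under the assumptions above.

```python
def now_print(y, E, p):
    """Now Print signal of (10): only drive output exceeding the running
    average E by more than Gamma_N yields a nonzero N, so N marks rapid,
    large increments in the drive node activity."""
    return f(f(y, p['Gamma_y']) - E, p['Gamma_N'])

def average_derivative(E, y, p):
    """Right-hand side of (11): E tracks f(y, Gamma_y) at rate alpha_E.
    Slow changes in y are absorbed by E and generate little N."""
    return p['alpha_E'] * (f(y, p['Gamma_y']) - E)

params.update(alpha_E=10.0, Gamma_N=0.02)
```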


4 Simulations

In order to evaluate the performance of the model proposed here, two classical conditioning paradigms were simulated: simple acquisition and blocking. The impact of varying the ISI on the associative strength conditioned in the simple acquisition paradigm was also investigated.

In all simulations the network shown on the right-hand side of Figure 1 was modeled using equations (1)-(11). The equations were solved using a fourth-order Runge-Kutta method with a step size of 0.01 seconds. All simulations used rectangular pulses of amplitude 1.0 lasting 2.0 seconds to represent the CSs and rectangular pulses of amplitude 2.0 lasting 1.0 second to represent the US. This is illustrated in the top parts of Figure 3 (simple acquisition) and Figure 7 (blocking). The intertrial interval (ITI) was fixed at 7.0 seconds. The synaptic strengths (z_{31} and z_{32}) for the US were set to 1. The following values were used for the remaining parameters: α_{x0} = 1, β_{x0} = 2, γ_{x0} = 1, δ_{x0} = 10, ε_{x0} = 0.5, Γ_{x0} = 0.1, β_J = 10, α_{x1} = 0.35, β_{x1} = 0.2, α_w = 1.5, β_w = 1, γ_w = 6, α_y = 1.5, β_y = 2, Γ_y = 0.05, d = 1, α_E = 10, Γ_N = 0.02, α_{z1} = 1.2, Γ_{z1} = 0.01, β_{z1} = 0.015, α_{z2} = 1.2, Γ_{z2} = 0.01, and β_{z2} = 0.015.

For the simple acquisition simulation 5 trials pairing CS1 and the US were performed using an interstimulus interval (ISI) of 1.5 seconds. These were followed by one trial where CS1 was presented by itself. The results for this simulation are illustrated in Figures 3 through 5. Figure 3 (bottom) shows that the maximum CS1 activity (x_{11}) increased gradually with time. This was the result of feedback facilitation due to the increasing control of the drive node by CS1. Alongside this, as CS1 became a conditioned reinforcer, it started to activate the drive node prior to the occurrence of the US. This anticipation of the CR is illustrated in Figure 4 (top). As expected, the trace x̃_{i1} lagged the activity x_{i1} (see Figure 4).

In the initial trials, the Now Print signal N was activated only by the onset of the US. In later trials, as CS1 became a conditioned reinforcer, N was also activated by the onset of CS1. Furthermore, in later trials, the maximum value of N due to the onset of the US decreased. This happened because the change in the drive node activity y brought about by the onset of the US became progressively smaller as CS1 became progressively more capable of activating the drive node. In other words, the onset of the US was less of a "surprising event" as CS1 became a conditioned reinforcer.

The average values of the LTM variables z_{11} and z_{12} (see Figure 5, top and center) seemed to converge asymptotically to a fixed value as the number of trials increased. At first, for each trial, the LTM variables showed only one peak, due to the onset of the US. In later trials, they showed two peaks. The first (the smaller one) was due to self-priming by CS1. It was immediately followed by a "trough", resulting in a net decay or "unlearning". The second peak, following the trough, was due to the onset of the US. It became progressively smaller due to the loss of the "surprise effect" discussed above. The decay of the LTM in the last trial (CS1 without the US) illustrates that the present version avoids the problem of self-priming.

The ISI curve was computed by gradually varying the ISI until no conditioning was obtained. For each ISI value, 30 trials pairing CS1 and the US were performed using an intertrial interval (ITI) of 7.0 seconds, and the maximum z_{11} was then recorded.

The results are illustrated in Figure 6. The ISI curve displays the traditional inverted-U shape found in the animal learning literature. The shape of the curve can be understood by taking a closer look at the model. Small ISIs did not give CS1 enough time to establish itself in STM before being overtaken by the US. As the ISI increased, CS1 was allowed to progressively approach its "peak value" and learning could take place. However, as the ISI continued to increase, there came a point where learning began to diminish because of the decay of the STM trace (see the bottom part of Figure 3). In other words, even though CS1 could establish itself in STM without interference, by the time the US came along the STM trace of CS1 was already too weak for learning to be effective.

In the blocking simulation 11 trials divided into three phases (see the top part of Figure 7) were performed. Phase I consisted of 5 trials pairing CS1 and the US using an ISI of 1.5 seconds. In phase II, 5 trials were performed presenting CS1 and CS2 simultaneously, followed by a US with an ISI of 1.5 seconds. In phase III, a single trial was performed presenting CS2 by itself. The results for this simulation are illustrated in Figures 7 through 9. In the first five trials (phase I), simple acquisition and anticipation of the CR were obtained (see the discussion above). In the next five trials (phase II), there was further learning of the CS1-US association. As a result, the maximum values of z_{11} and z_{12} continued to rise. During phase II, x_{21} was kept small due to competition with x_{11}. Some association between CS2 and the US did take place during phase II, as illustrated by the increase in the values of z_{21} and z_{22} (see Figure 9). However, it was not enough to elicit a response when CS2 was presented by itself in phase III.
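As a concrete illustration of the numerical procedure, the sketch below shows the rectangular stimulus pulses and a generic fourth-order Runge-Kutta step with h = 0.01 s. The rhs argument stands for a hypothetical wrapper, not given here, that stacks the derivative functions sketched in Section 3 into a single state vector.

```python
def pulse(t, onset, duration, amplitude):
    """Rectangular input pulse: amplitude 1.0 for 2.0 s for a CS,
    amplitude 2.0 for 1.0 s for the US."""
    return amplitude if onset <= t < onset + duration else 0.0

def rk4_step(state, t, h, rhs):
    """One fourth-order Runge-Kutta step for dstate/dt = rhs(state, t),
    with step size h = 0.01 s as in the simulations."""
    k1 = rhs(state, t)
    k2 = rhs(state + 0.5 * h * k1, t + 0.5 * h)
    k3 = rhs(state + 0.5 * h * k2, t + 0.5 * h)
    k4 = rhs(state + h * k3, t + h)
    return state + (h / 6.0) * (k1 + 2.0 * k2 + 2.0 * k3 + k4)

# Example trial timing: CS1 at t = 0 for 2 s, the US at t = ISI for 1 s,
# with ISI = 1.5 s and an intertrial interval of 7 s.
```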

5 Final Comments

The version of the conditioning circuit proposed here overcomes many of the problems encountered in Grossberg and Levine (1987), such as self-priming, the ISI curve, weight transport, and the lack of an explicit habituation mechanism for short-term memory.

The different parts and mechanisms of the proposed model can also be interpreted in terms of biological structures if, following Grossberg (1975, 1978), it is considered that S1 is located in the Thalamus, S2 (here lumped into the S1 field) is located in the Cortex, and D involves the Thalamus and Amygdala. LaBerge (1995) has attributed the role of selective attention to the thalamic-cortical circuitry. As pointed out in that work, inputs flowing from the Thalamus to the Cortex are capable of firing the cortical cells. However, feedback from the Cortex to the Thalamus, inside the same "thalamic-cortical" column, is only facilitatory. The thalamic activity is modulated by the inhibitory interneurons of the thalamic reticular nucleus, which receives inputs from the Thalamus and from the cortical area to which the thalamic signal projects. This structure for the thalamic-cortical circuitry agrees with the STM mechanism proposed here for S1, with self-excitatory connections and inhibitory connections among the nodes. The reticular nucleus, because of its long-range connections, provides a good way of mediating the competition for STM resources across modalities. Intramodal competition is probably mediated at the cortical level through inhibitory interneurons. The need for a different treatment of intermodal and intramodal competition might require the use of an

unlumped version of the present model, with separate fields for the Thalamus (S1) and the Cortex (S2).

Weinberger (1995) found that the thalamic nucleus where the association between the CS and the US takes place (represented by the D node in the present model) connects to the apical dendrites of the pyramidal cells in its cortical projection area, thus having only a facilitatory effect on the cortical cells. This is also in agreement with the general structure of the model proposed here. In the lumped version of the circuit presented above, the feedback from Cortex to Thalamus and that from the associative nucleus in the Thalamus to the Cortex, both facilitatory, have been combined into a single facilitatory mechanism from D to S1.

Although it is possible to interpret some of the structures and mechanisms in terms of biological structures, it should be kept in mind that these are oversimplified analogies. More detailed models are required before these analogies can be tested against data.

References

Bullock, D., Fiala, J. C., & Grossberg, S. (1994). A neural model of timed response learning in the cerebellum. Neural Networks, 7, 1101-1114.

Grossberg, S. (1975). A neural model of attention, reinforcement learning, and discrimination learning. International Review of Neurobiology, 18, 263-327.

Grossberg, S. (1978). A theory of human memory: Self-organization and performance of sensory-motor codes, maps, and plans. In Rosen, R., & Snell, F. (Eds.), Progress in Theoretical Biology, Vol. 5, pp. 233-374. Academic Press, New York.

Grossberg, S., & Levine, D. S. (1987). Neural dynamics of attentionally modulated Pavlovian conditioning: Blocking, interstimulus interval, and secondary reinforcement. Applied Optics, 26(23), 5015-5030.

Grossberg, S., & Merrill, J. W. L. (1992). A neural network model of adaptively timed reinforcement learning and hippocampal dynamics. Cognitive Brain Research, 1, 3-38.

Grossberg, S., & Merrill, J. (1994). The hippocampus and cerebellum in adaptively timed learning, recognition, and movement. Tech. Rep. CAS/CNS-TR-93-065, Boston University, Boston, MA.

Grossberg, S., & Schmajuk, N. A. (1987). Neural dynamics of attentionally modulated Pavlovian conditioning: Conditioned reinforcement, inhibition, and opponent processing. Psychobiology, 15, 359-362.

LaBerge, D. (1995). Attentional Processing: The Brain's Art of Mindfulness. Harvard University Press, Cambridge, MA.

Weinberger, N. (1995). Retuning the brain by fear conditioning. In Gazzaniga, M. (Ed.), The Cognitive Neurosciences. MIT Press, Cambridge, MA.


Figure 3: Simple acquisition. Top: inputs. Center: US representation at S1. Bottom: CS1 representation at S1.


Figure 4: Simple acquisition. Top: drive node output compared with the activity of CS1 in S1. Center: transmitter levels. Bottom: Now Print signal and activity trace for CS1.


Figure 5: Time evolution of the LTM traces for the simple acquisition simulation.


Figure 6: Plot of maximal z_{11}, computed over thirty paired trials, as a function of the ISI.


Figure 7: Blocking. Top: inputs. Center: CS1 representation at S1. Bottom: CS2 representation at S1.


Figure 8: Blocking. Top: drive node output. Center: transmitter levels. Bottom: Now Print signal and activity trace for CS2.


Figure 9: Time evolution of the LTM traces for the blocking simulation.
