Theoretical and Implementation Aspects of Pulse Streams: an Overview

L.M. Reyneri
Dipartimento di Elettronica - Politecnico di Torino - [email protected]

Abstract

This tutorial paper presents an annotated overview of existing hardware implementations of Artificial Neural Systems based on “Pulse Stream” modulations. Pulse Streams are quasi-periodic binary waveforms which convey analog information on waveform timing. The theoretical bases of Pulse Stream computation are shown for the major techniques, and basic circuits are briefly described for most Neural and Fuzzy functions. Pulse Stream modulations and multiplexing are then analyzed in terms of accuracy, response time, and both power and energy requirements. The performances of the various techniques are compared both with each other and with those of other analog computing systems.

1 INTRODUCTION

Because of the advantages they provide, “Pulse Streams” (PSs) [1, 2] are gaining support in the field of ANS hardware implementations [1]...[37]. PSs are a class of modulation techniques widely used in other fields of electronics as well (e.g. telecommunications). They are based on “quasi-periodic” binary waveforms, where information is contained in the timing instead of the amplitude. Therefore PSs are primarily used to encode analog values using binary signals. In practice PSs can be carried by any physical medium (e.g. currents, voltages, light beams, etc.), depending on the implementation.

Applying PSs to Artificial Neural Systems (ANSs) and Fuzzy Systems (FSs) provides clear advantages over both analog and digital implementations, such as: high noise immunity and insensitivity to signal attenuation [38] (e.g. in inter-chip communications); ease of multiplexing (see sect. 2.3); low energy and power requirements (see sect. 4.3); straightforward interface with analog and digital systems [16] (PSs are particularly suited to control power actuators and to be handled by optocouplers); additional stochastic properties.

On the other hand, PS implementations suffer from a few drawbacks when compared with purely analog systems, such as: higher electromagnetic interference (due to switching of binary waveforms); incompatibility with subthreshold operation of MOS in very low power systems; worse coupling and signal cross-talk within VLSI chips; sometimes larger size. The drawbacks of PS implementations with respect to digital systems are mostly the same as those of analog systems, namely a limited accuracy and a higher noise sensitivity.

Figure 1. Timing diagram of the major Pulse Stream modulations.

This work has been partially supported by the ASI-ARS contract 96.138 “Optimized Structures for Intelligent Control of Flexible Arms”.

2 PULSE STREAM MODULATIONS

In Neural and Fuzzy systems, PSs are primarily used to encode input and output signals $x_i$ and synaptic weights $w_{ji}$, either unilateral ($x_i, w_{ji} \in [0, 1]$) or bilateral ($x_i, w_{ji} \in [-1, 1]$). Several PS techniques have been considered so far and most of them have been used to build existing ANS chips [1]...[37].

2.1 Plain Pulse Stream Modulations

PS techniques are briefly described below, with reference to Figs. 1 and 2, which show the timing diagrams of the most interesting ones.

a. Pulse Rate Modulation (PRM) (also called Pulse Frequency Modulation): pulses usually have a constant width $T_{on}$, while their average repetition rate is proportional to the activity:

$f_i = f_{max}\, x_i$, with $\rho = \frac{1}{T_{on} f_{max}} \ge 1$,

where $x_i \in [0 \ldots 1]$ and $\rho$ is the pulse period-to-width ratio at the maximum rate. Typical values of $f_{max}$ range between 500 kHz and 5 MHz [3, 5, 7, 8]. Frequencies below a given minimum $f_{min}$ are usually considered as zero.

Sometimes, pulses are used individually (especially in bioinspired systems [14]) to carry individual quanta of information. In these cases pulse rate as such may be immaterial, although the overall behavior of such systems is affected by the average repetition rate. Although historically this was the first PS technique to be used [3], it has very poor performance in deterministic systems (see sect. 4). It is much more used in bioinspired systems [7, 12, 13, 32] for its superior multiplexability (see sect. 2.3).

b. (Incoherent) Pulse Width Modulation (PWM): pulses have a constant frequency $f_O = \frac{1}{T_O}$, while their width is proportional to the desired value, either activity or weight:

$T_i = T_{max}\, x_i$ or $T_{ji} = T_{max}\, w_{ji}$,

respectively, where both $x_i, w_{ji} \in [0, 1]$. Clearly, $T_{max} \le T_O$ and, in most cases, $T_{max} \approx T_O$. Typical values of $f_O$ range between 100 kHz and 500 kHz [1, 9, 10, 15]. This PS technique has the worst performance of all (see sect. 4).

c. Coherent Pulse Width Modulation (CPWM) is a variation of PWM, where all incoming streams have a known phase relationship with each other and with an additional reference clock (CCK) common to the whole system, as shown in fig. 2.a. As described in sects. 2.3 and 4.1, CPWM outperforms the other PS techniques (except for bioinspired systems), since it presents the lowest computation energy and response time, and can be multiplexed more easily. A complete CPWM chip set for ANS is presented in [16, 4]. In spite of what is commonly believed [17], the phase relationship of CPWM streams does not imply synchronism among leading and trailing edges of the waveforms, which would cause high current spikes on the power supply.

d. Duty Cycle Modulation (DCM) and Coherent DCM (CDCM) are variations of PWM and CPWM, respectively, where activity values are associated with the waveform duty-cycle:

$x_i, w_{ji} = \frac{T_{P1}}{T_{P1} + T_{P0}}$ or $x_i, w_{ji} = \frac{T_{P1} - T_{P0}}{T_{P1} + T_{P0}}$

for unilateral and bilateral values, respectively.

Figure 2. Timing diagram of a) CPWM.

e. Pulse Burst Modulation (PBM): the activity value is associated with the number of pulses contained in a relatively short burst:

$n_i = K_N\, x_i$,

where $K_N$ is the maximum number of pulses in each burst. Within bursts, $f_B$ is the peak bit rate, and $\rho = \frac{1}{T_{on} f_B} \ge 1$ is the pulse period-to-width ratio.

Quite often, $K_N$ is a low number, therefore PBM heavily discretizes activity values. The effects of this discretization can sometimes be reduced by using a nonlinear (e.g. logarithmic) PBM [8]. This modulation (especially in its differential form, see sect. 2.2) has recently been used in several bioinspired systems, such as retinas [8], in conjunction with Event Driven Multiplexing (see sect. 2.3).

f. Pulse Code Modulation (PCM): the waveform is a sequence of bits representing the binary encoding of the desired value, with bit-rate $f_B$:

$n_i = K_N\, x_i$ or $n_{ji} = K_N\, w_{ji}$,

respectively, where $K_N$ is usually a power of two. This technique has been used in digital neurons as a reduced-size variation of PWM [5, 18], but it can also be associated with PBM to improve the performance of Event Driven Multiplexing.

g. Stochastic Pulse Modulation (SPM) is a pseudorandom sequence of bits [19, 20, 21] with an average bit rate $f_Q = \frac{1}{T_Q}$. The probability $P(1)$ of having a “1” in the sequence is proportional to the desired value:

$P_i(1) = x_i$ or $P_{ji}(1) = w_{ji}$.

So far, SPM is the only PS technique used in a commercial device [19], in which $f_Q \approx 25$ MHz.

h. Pulse Amplitude Modulation (PAM): although not a binary modulation, it has been included here since it is often used for neural computation in conjunction with the other PS modulations, as described in sect. 3.2. Pulse amplitude (either current, or voltage, or resistance) can be made proportional to the synaptic weight:

$A_{ji} = K_A\, w_{ji}$,

where $K_A$ is a suitable factor which depends on the technology and the circuit used.

i. Pulse Phase Modulation (PPM) and Pulse Delay Modulation (PDM): the activity is associated with the time $T_d$ between two pulses on either a pair of lines (PPM, not shown), or on the same line (PDM) [34, 35].

Note that all intrinsically unilateral modulations (i.e. PRM, PWM, PBM, SPM and PDM) can be made bilateral by substituting $\frac{x_i + 1}{2}$ for $x_i$. Because of their nature, PRM, PBM, PPM and PDM are the modulations used in the so-called spiking neurons, which are often used in several perceptive applications and bioinspired systems [7, 12, 13, 32] because of some of their implementation and stochasticity characteristics.
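The encoding rules of this section can be sketched in a few lines of Python. This is only an illustrative sketch: the function names are hypothetical, the activity is written `x`, and the rounding used for PBM and the floor-and-clamp quantization used for PCM are assumptions made here, since the text only gives the proportionality $n_i = K_N x_i$.

```python
import random

def prm_rate(x, f_max):
    """PRM: average pulse rate f_i = f_max * x for a unilateral x in [0, 1]."""
    return f_max * x

def pwm_width(x, t_max):
    """PWM: pulse width T_i = T_max * x."""
    return t_max * x

def dcm_value(t_p1, t_p0, bilateral=False):
    """DCM: value carried by a waveform with high time t_p1 and low time t_p0."""
    total = t_p1 + t_p0
    return (t_p1 - t_p0) / total if bilateral else t_p1 / total

def pbm_count(x, k_n):
    """PBM: number of pulses in a burst, n_i = K_N * x (rounding is assumed)."""
    return round(k_n * x)

def pcm_bits(x, n_bits):
    """PCM: binary code (MSB first) of x on K_N = 2**n_bits levels.
    Floor quantization and clamping of x = 1 are assumptions of this sketch."""
    level = min(int(x * 2 ** n_bits), 2 ** n_bits - 1)
    return [(level >> b) & 1 for b in reversed(range(n_bits))]

def spm_stream(x, n, rng):
    """SPM: n pseudorandom bits with P(1) = x."""
    return [1 if rng.random() < x else 0 for _ in range(n)]

def to_unilateral(x_bilateral):
    """Map a bilateral value in [-1, 1] onto [0, 1] via (x + 1) / 2."""
    return (x_bilateral + 1) / 2
```

For instance, `pcm_bits(0.5, 4)` yields the code `[1, 0, 0, 0]`, and a bilateral activity is first passed through `to_unilateral` before being fed to any of the unilateral encoders.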

2.2 Differential Pulse Stream Modulations

There is also a class of Differential PS Modulations, which are used to transmit activity values only (i.e. no weights) and which offer interesting performance in particular applications (mostly bioinspired systems). Differential modulations are variations of plain PS modulations where signals are modulated according to the time variation $\Delta x_i$ of the activity: $\Delta x_i(t) = x_i(t) - x_i(t-1)$. The most relevant and useful cases are:

a. Differential PRM cannot exist as such, because of the intrinsic time-continuity of PRM; D-PBM (see further) is used instead.

b...d. Differential Pulse Width Modulations (D-PWM, D-CPWM, D-DCM, D-CDCM) have no practical applications, since they have the same performance as the corresponding plain modulations.

e, f. Differential PBM (D-PBM) and Differential PCM (D-PCM) are among the most useful differential PS modulations. They are used in spiking neurons and find applications in processing signals with high temporal correlation, such as slowly varying waveforms and images from silicon retinas [7, 35]. In these cases, the total number of pulses is very low and this justifies an efficient use of Event Driven Multiplexing (see sect. 2.3).

g. Differential SPM (D-SPM) is similar to the sigma-delta modulation used in some A/D converters.
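As a rough illustration of why differential modulations suit slowly varying signals, the sketch below computes $\Delta x_i(t)$ and the signed pulse counts a D-PBM encoder might send per time step. The function names are hypothetical, and tracking the already-transmitted level (so that rounding errors do not accumulate) is a design choice assumed here, not something stated in the text.

```python
def differential(samples):
    """Return dx(t) = x(t) - x(t-1); the first sample is compared with 0."""
    out, prev = [], 0.0
    for x in samples:
        out.append(x - prev)
        prev = x
    return out

def d_pbm_events(samples, k_n):
    """Signed number of pulses sent at each step by a hypothetical D-PBM encoder.

    The encoder remembers the quantized level already transmitted and only
    sends the difference, so a constant input produces no pulses at all.
    """
    counts, sent = [], 0
    for x in samples:
        target = round(k_n * x)   # quantized level, in units of 1/k_n
        counts.append(target - sent)
        sent = target
    return counts
```

With `k_n = 8`, the sequence `[0.0, 0.5, 0.5, 0.25]` produces the counts `[0, 4, 0, -2]`: pulses are sent only when the activity changes.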

2.3 Pulse Stream Multiplexing

One interesting advantage of PS signals is that they can be multiplexed more easily than analog signals, thanks to their digital nature. For the scope of this work, multiplexing is defined as a method to transfer a number of independent signals over just one physical channel (e.g. an electrical wire, an optical link, etc.). Multiplexing significantly reduces the number of physical channels (mostly in retinas [7, 35] and cochleas), at the expense of some performance reduction, as described in sect. 4 and shown in table 3.

Multiplexing is also useful to “duplicate” networks in complex ANSs, as for instance in inverse control applications, where the same ANS is used in consecutive time slots with different input and output vectors. Several techniques have been considered so far [1, 2, 7, 8, 17]. Most techniques (except Event Driven) are based on a digital multiplexer which cyclically scans $M_R$ inputs at a multiplexing rate $f_{mux}$ (i.e. the average rate at which the multiplexer switches from one input to the next).

a. Asynchronous High Frequency Multiplexing: the multiplexer is controlled by a high-frequency clock, which does not significantly affect pulse shape. This technique can be used with PRM and PBM (if $f_{mux} \gg \frac{M_R}{T_{on}}$ and $T_{on} \le \frac{1}{2 f_{max}}$), all PWMs and DCMs (if $f_{mux} \gg M_R f_O$) and SPM (if $f_{mux} \gg M_R f_Q$). The drawback is obviously the high bandwidth required.

b. Synchronous PRM Multiplexing: a variation of the previous technique, where the PRM stream is synchronized with the multiplexer clock [5]. This reduces bandwidth requirements but adds an unwanted phase jitter or rate quantization.

c. Coherent Multiplexing for CPWM, CDCM and SPM: the multiplexer switches to the next input when the previous pulse (or bit) has been completely transmitted. This technique is straightforward as the clock coincides with the reference CCK (or the SPM clock).

d. Sequential Multiplexing is a variation of Coherent Multiplexing used mainly with incoherent PWM and DCM. The major difference is that now the multiplexer first has to wait until the pulse on the present input has completely finished and then until another one on the next input begins. This additional delay is completely random and on average is about 50% of the pulse period. This method has also been used with PRM, by converting each PRM cycle into a PWM pulse [17].

e. Event Driven Multiplexing [7, 8] is used in conjunction with spiking neurons, namely with PRM and PBM, although the best performance is obtained primarily with D-PBM and D-PCM. The major difference with respect to the other multiplexing techniques is that this one does not use a cyclically scanning multiplexer. Instead, all neurons share a common transmission line (or a bus). For each pulse to transmit, a neuron sends asynchronously (on a collision-detection basis) a packet of data on the bus, containing the binary address of either the source or the destination neuron. When used with PRM or (D-)PBM, each packet has the same effect as a single pulse, therefore the average packet rate (or packet count, for PBM) is proportional to the transmitted activity. In the case of (D-)PBM, a sign bit is also added after the address. Instead, when used with (D-)PCM, the PCM coding of $(\Delta)x_i$ is appended after the packet address.

This technique offers more channel bandwidth to those neurons which are more active and less to the others, and therefore provides a faster response to the most active signals. This resembles what happens in biological neural systems, therefore this technique is often used in bioinspired systems [7, 12]. There are two forms of Event Driven Multiplexing, namely with and without retransmission. In the former case, when two pulses from different neurons collide, they are retransmitted, while in the latter case they are not, therefore some pulses may be lost, causing an unwanted signal attenuation [7]. As shown in sect. 4.2, Event Driven Multiplexing with retransmission has better performance, at the expense of increased circuit complexity.

Asynchronous, Synchronous and Coherent Multiplexing require exchanging a synchronization signal between transmitter and receiver, while Event Driven Multiplexing explicitly contains the address of the receiver.
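The difference between Event Driven Multiplexing with and without retransmission can be illustrated with a toy, slot-based bus model. Everything below is an assumption made for illustration: time is divided into discrete slots, packets are (slot, address) pairs, and retransmissions are served one per free slot in arrival order, which is a deterministic stand-in for whatever back-off scheme a real collision-detection bus would use.

```python
def event_driven_bus(events, retransmit):
    """Toy model of a shared bus carrying address packets.

    events: list of (slot, address) pairs, one per pulse to transmit.
    Returns the list of (slot, address) pairs actually delivered.
    """
    slots = {}
    for slot, addr in events:
        slots.setdefault(slot, []).append(addr)

    delivered, backlog = [], []
    slot = 0
    last = max(slots) if slots else -1
    while slot <= last or backlog:
        new = slots.get(slot, [])
        if backlog and not new:
            # Free slot: retransmit the oldest collided packet.
            delivered.append((slot, backlog.pop(0)))
        elif len(new) == 1 and not backlog:
            # Exactly one sender and nothing waiting: packet goes through.
            delivered.append((slot, new[0]))
        elif new:
            # Collision (among new packets, or with a pending retransmission).
            if retransmit:
                backlog.extend(sorted(new))
            # Without retransmission the colliding packets are simply lost.
        slot += 1
    return delivered
```

In this model, the events `[(0, 3), (1, 1), (1, 2), (2, 5)]` lose the two packets of slot 1 without retransmission, whereas with retransmission all four addresses are eventually delivered, at the cost of extra latency.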

3 TAXONOMY OF PS TECHNIQUES

At least five functions are required by classical ANSs and FSs, Multi Layer Perceptrons in particular: 1) weight storage, 2) synaptic multiplication, 3) summation of synaptic contributions, 4) non-linear activation function, and 5) transmission (and routing) of input and output activities among neurons. Other functions (such as Hamming [4] or Euclidean distance, Winner-takes-all, etc.), which may find applications in Radial Basis Function networks and in FSs, have not been analyzed here, since at present they are seldom implemented using PS techniques.

All functions can be computed by combining two or more PS modulations. Analog, digital and optical PS techniques can be mixed together in a large number of combinations, which leads to the taxonomy shown in fig. 3. Weight storage and synaptic multiplication are both synaptic functions, while summations and activation functions are performed by neuron bodies. As regards terminology, a PS computing system on the whole takes its name from the modulation technique used to transmit input/output activities, while neurons are said to be either analog, digital or optical according to the nature of the summation.

3.1 Weight Storage

Synapses may store a weight with either digital, analog or optical techniques. Digital storage generally requires a P-bit digital storage cell, connected either as a conventional memory [19] or as a shift register [5, 18, 16]. Analog storage usually uses either one or more capacitors [1, 22, 17], or one or more analog EEPROM devices [23, 24, 28]. Optical storage uses computer-generated holograms [25].

Analog storage using capacitors is simple and straightforward, although it suffers from weight decay. Two techniques have been proposed to overcome this problem: periodic refresh from an external memory [16] and multi-level self-refresh [27]. EEPROM and optical storage are permanent but require ad-hoc technologies [24, 28].

Figure 3. Taxonomy of Pulse Stream neural functions.

3.2 Synaptic multiplication

Synaptic multiplication is always performed by combining two PS techniques. Multiplication is based on the property of the average pulse power $P_X$, which is the triple product of pulse amplitude, pulse width and pulse frequency. To perform PS multiplication, two of these parameters are associated with input activities $x_i$ and synaptic weights $w_{ji}$, respectively, while the third is held constant ($K_X$). Average pulse power is thus proportional to the product:

$P_X = K_X (w_{ji}\, x_i)$,

as desired. The factor $K_X$ can often be used to tune the steepness of the non-linear transfer function $F(z)$. More details can be found in [2, 29] and in most bibliographical references.

There is a wide choice of combinations of PS modulations, as shown in fig. 3. The following paragraphs briefly describe some of the most commonly used and interesting techniques. Each combination is given the name (InputModulation+WeightModulation):



(PRM+PAM): this has been the first PS technique used in analog neurons [1, 17]. At present it is used mainly in bioinspired systems [14, 33]. The basic principle is shown in fig. 4: one or more current generators ($I_1$) are switched on by the incoming PRM stream, at a rate $f_{max}\, x_i$. The current $I_S = K_W w_{ji}$ is proportional to the synaptic weight, therefore the average value of the pulsed current is, as desired, $\bar{I}_{ji} = K_X (w_{ji}\, x_i)$, where $K_X = K_W f_{max} T_{on}$.

Synapses are usually associated with analog neurons, sometimes with digital [3, 10] and optical ones. When used with digital neurons, current generators are substituted by Boolean AND gates, as for (PRM+PWM) (chopping clocks) [9]. Multiplication can span either 1 or 2 quadrants.

  Figure 4. Working principle of multipliers (PRM+PAM), (PWM+ PAM), (CPWM+PAM), (DCM+PAM) and (SPM+PAM). Current generators can be replaced by resistors (R1 ). Additional generators (I3 ) can add offset terms.

Current generators generally consist of one to four MOS transistors controlled by the voltage stored on a capacitor (analog weight storage). Multiplication can span either 1 or 2 quadrants.









(PBM+PAM) and (D-PBM+PAM): both techniques have been used in bioinspired systems [7, 8], mainly as a method to exploit the advantages of both (D-)PBM and Event Driven Multiplexing. The basic circuit can be reduced to that shown in fig. 4, with the addition of an address detection circuit. When the synapse is addressed, a quantum of charge (either fixed or proportional to $w_{ji}$) is injected into the neuron. Multiplication can span either 1, 2 or 4 quadrants.

(PWM+PAM) and (DCM+PAM): both techniques use the same circuit as (PRM+PAM), but with a PWM input stream [1, 15]. The average synaptic current is now $\bar{I}_{ji} = K_X (w_{ji}\, x_i)$, where $K_X = K_W \frac{T_{max}}{T_O}$. Multiplication can span either 1, 2 or 4 quadrants.

(CPWM+PAM) and (CDCM+PAM) derive from (PWM+PAM), but the use of CPWM improves performance by more than one order of magnitude (up to two) with respect to other techniques [16, 29, 6, 36]. In addition, this combination perfectly matches Coherent Multiplexing. Multiplication can span either 1, 2 or 4 quadrants.

(PRM+PWM) and (PRM+DCM): both techniques [1, 10] derive from (PRM+PAM), with the difference that current generators now draw a fixed current $I_D$, irrespective of weights. Synapses also contain a pulse stretcher which stretches the width of each input pulse to $T_{ji} = T_{max}\, w_{ji}$. Average synaptic current is then $\bar{I}_{ji} = K_X (w_{ji}\, x_i)$, where $K_X = I_D T_{max} f_{max}$. Clearly, $f_{max} \le \frac{1}{T_{max}}$, therefore $f_{max}$ is limited to about 500 kHz to 1 MHz and $T_{max}$ to 1 to 2 µs. Weight storage and pulse stretcher are often analog.

(PRM+PCM) is a variation of (PRM+PWM) with digital weight memory and pulse stretcher [5, 9, 18].

(SPM+SPM) is a stochastic technique [19] where the incoming unilateral SPM stream has a probability $P_i(1) = x_i$, while the weighting SPM stream has a probability $P_w(1) = w_{ji}$. Provided that the two streams are uncorrelated, a simple Boolean AND generates an SPM sequence with probability $P_z(1) = w_{ji}\, x_i$. This combination is usually associated with digital neurons, sometimes with analog ones, while weight memory is in most cases digital. Multiplication can span either 1 quadrant (with an AND) or 4 quadrants (with an EXOR). So far, this is the only PS modulation to be used in a commercial chip [19], although weights are limited to negative powers of two. Other non-commercial circuits [21] do not have such a limitation. Another interesting application of (SPM+SPM) techniques is in the so-called pRAMs [30], where a vector of $N_X$ input SPM streams ($x_1, x_2, \ldots, x_{N_X} \in [0, 1]$) is used as the address of a digital RAM. The output word from the RAM is a vector of $N_Y$ SPM streams ($y_1, y_2, \ldots, y_{N_Y} \in [0, 1]$).

(SPM+PAM): this technique was first used by [20]. The incoming SPM stream has a probability $P(1) = x_i$, while the weight modulates an analog current generator switched by the input signal, with a circuit similar to that shown in fig. 4.

(PWM+SPM) uses a Boolean AND to multiply a PWM input stream by an SPM weight stream [21].

(D-PCM+PAM): similar to (D-PBM+PAM), but the number of pulses is encoded using PCM. Multiplication can span either 1, 2 or 4 quadrants.
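The (SPM+SPM) multiplier lends itself to a short simulation. The sketch below is illustrative only: it uses Python's pseudorandom generator in place of hardware bit sources, and the function names and the stream length are assumptions. It checks that the AND of two uncorrelated streams with $P_i(1) = x_i$ and $P_w(1) = w_{ji}$ has a fraction of ones close to $w_{ji}\, x_i$.

```python
import random

def spm_stream(p, n_bits, rng):
    """Generate n_bits pseudorandom bits with probability p of being 1."""
    return [1 if rng.random() < p else 0 for _ in range(n_bits)]

def spm_and_multiply(x, w, n_bits, seed=0):
    """Estimate w * x as the fraction of ones in the AND of two SPM streams.

    Both streams are drawn from the same generator but from independent
    draws, which stands in for the 'uncorrelated streams' requirement.
    """
    rng = random.Random(seed)
    x_bits = spm_stream(x, n_bits, rng)
    w_bits = spm_stream(w, n_bits, rng)
    z_bits = [a & b for a, b in zip(x_bits, w_bits)]
    return sum(z_bits) / n_bits
```

The estimate fluctuates around the true product, and the fluctuation shrinks as the stream gets longer; this is the SPM evaluation error discussed in sect. 4.1.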

3.3 Summation of synaptic contributions

Summation of synaptic contributions is a straightforward operation in PS ANSs. In most cases PS synapses generate pulsed analog currents (in analog neurons) or light beams (in optical neurons), while in other less frequent cases they generate digital signals (in digital neurons; e.g. (SPM+SPM), and some types of (PRM+PWM) and (PRM+PCM)). As already mentioned, it is the nature of the summation itself (i.e. analog, digital or optical) which gives the name to neurons. Therefore, although several optical PS networks such as [25] have optical synapses, they convert light beams into currents before summation, so their neurons are clearly analog, with optical interconnections. In truly optical neurons all light beams are focused on the same detector (optical summation of light intensities).

Within analog and optical neurons, currents (or light beams) are summed up together on a common node (or a common light detector). Currents (or detector currents) must then be low-pass filtered or integrated to evaluate average pulse power. Either RC filters or simple integrators are commonly used for this purpose. On the other hand, pulses in digital neurons are OR-ed together [3, 19] and counted up by a digital counter (i.e. digital integration). It is the time constant $T_M$ of either the RC filter or the integrator, or the word length of the digital counter, which primarily affects ANS performance, namely response time and accuracy (see sect. 4).

3.4 Non-linear activation functions

Analog activation functions are usually based on nonlinear amplifiers (e.g. CMOS inverters with resistive feedback, or transconductance amplifiers) [6].

Pulsed activation functions have been proposed in [3] and subsequently used in [19, 21]. They are used in digital neurons, where pulses from different synapses are OR-ed together. The fixed shape of $F(z)$ derives from the binomial distribution [3].

Digital non-linearities are only used with digital neurons. They are usually based on a digital counter plus a RAM used as a look-up table.

Waveform-driven non-linearities [16, 20, 29, 36] are only used in analog neurons. They directly convert analog values to PSs (often CPWM or CDCM): the output pulse stream is mostly the binary result of a comparison of the analog internal activity with an ad-hoc periodic waveform. A triangular waveform can also be used for linear analog-to-PS conversion [16]. Using a shaping waveform in the current domain can also compensate for the mismatches and nonlinearities of the integration capacitors [4].

Waveform-driven non-linearities can also be generated by comparing an analog (or a digital) value with an appropriate noise signal (SPM). A non-uniform distribution causes a non-linear transfer function. This method can also be used for analog-to-PS conversion [20].

Table 1. Synaptic errors for different combinations of PS modulations ((†) $e'_{PAM}$ does not apply to digital neurons).

SYNAPSE            | $e'_x$          | $e'_w$
(PRM+PAM)          | $e'_{PRM}$      | $e'_{PAM}$ (†)
(PRM+PCM)          | $e'_{PRM}$      | $\sqrt{(e'_{PCM})^2 + (e'_{PAM})^2}$ (†)
(PWM/DCM+PAM)      | $e'_{PWM}$      | $e'_{PAM}$
(CPWM/CDCM+PAM)    | $e'_{CPWM}$     | $e'_{PAM}$
(PRM+PWM/DCM)      | $e'_{PRM}$      | $\sqrt{(T_C/T_O)^2 + (e'_{PAM})^2}$ (†)
(SPM+SPM)          | $e'_T = e'_{SPM}$ (overall error)
(SPM+PAM)          | $e'_{SPM}$      | $e'_{PAM}$ (†)
(PBM/D-PBM+PAM)    | $e'_{PBM}$      | $e'_{PAM}$ (†)
(PCM+PAM)          | $e'_{PCM}$      | $e'_{PAM}$ (†)

3.5 Analog to/from PS Conversion

Since PS ANSs have to interface to an external world which is usually analog, the problem of converting between analog and PS has been considered by several authors. Examples can be found in the literature: A/PRM [7], A/CPWM and CPWM/A [16], A/PBM, A/SPM [19, 20]; see also sect. 3.4.

4 PERFORMANCE ANALYSIS

All PS networks suffer from a “computation inaccuracy” caused mostly by the intrinsic discreteness of PSs. Errors are mainly due to the process of synaptic multiplication (i.e. synaptic errors), since errors in the summation and the nonlinearity may often be made small enough. The relative synaptic error $e_T$ is defined as:

$e_T = \frac{\Delta(w_{ji}\, x_i)}{\max\{w_{ji}\, x_i\}} \le \frac{|\Delta x_i|}{x_{max}} + \frac{|\Delta w_{ji}|}{w_{max}} = e'_x + e'_w = e'_T$

where $\Delta(w_{ji}\, x_i)$ is the absolute synaptic error, while the upper bounds $e'_x$ and $e'_w$ are relative evaluation errors due to the use of PSs for input activities and for synaptic weights, respectively. Depending on what PS modulation is associated with $x_i$ and $w_{ji}$, the two error components assume different values (see tables 1 and 2). The actual computation errors can be higher than $e'_T$, for a number of different reasons, depending on the chosen circuit and on the design effort. Therefore the formulae given in table 2 represent only a lower bound for the real computation errors. Further details and proofs can be found in [2].
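A small numerical check may help to see how the two error components combine. The sketch below (hypothetical function name, arbitrary example values) compares the exact relative error of a perturbed product with the sum $e'_x + e'_w$. Note that the sum is a first-order bound: it neglects the second-order term $\Delta w_{ji}\, \Delta x_i$, which only matters when both operands are close to their maxima.

```python
def relative_synaptic_error(x, w, dx, dw, x_max=1.0, w_max=1.0):
    """Return (exact, bound) for the relative error on the product w * x.

    exact: |(w + dw)(x + dx) - w x| / (w_max * x_max)
    bound: |dx| / x_max + |dw| / w_max   (first-order upper bound)
    """
    exact = abs((w + dw) * (x + dx) - w * x) / (w_max * x_max)
    bound = abs(dx) / x_max + abs(dw) / w_max
    return exact, bound

# Example: a 1% activity error and a 2% weight error at mid-range operands.
exact, bound = relative_synaptic_error(x=0.5, w=0.5, dx=0.01, dw=0.02)
```

In this example the exact error is well below the bound, because both operands are at half scale; the bound is approached when $x_i$ and $w_{ji}$ are near $x_{max}$ and $w_{max}$.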

4.1 Evaluation Errors and Response Time

Pulsed synaptic contributions are summed up together and either integrated or low-pass filtered by the neuron body. Due to the binary shape of pulses, the value of internal activity is subject to the fluctuations shown in fig. 5, which add noise and random errors to the neuron output. Figure 6 shows the sources of uncertainties:

PRM: a $\pm 1$ uncertainty out of $(f_i T_M)$ counts;

PWM: a $\Delta T$ mismatch between PWM periods in source and destination neurons;

Table 2. Comparison among theoretical performances of various Pulse Stream techniques. MODUL.

PRM

e0 TC 6= 0 r 2  2 1 + TC p 2 3f max TM T on EVALUATION ERROR for

r PWM (DCM)

TO 8p3TM

PBM

1

2p3K

2 2  + TC T on N

2  2 1 + TC 2p3KN T on r  2 TQ + TC 16TM TQ

SPM

1 q  ? p C 2 2 3f max (e0 )2 ? TTon T q O ?  p C 2 8 3 (e0 )2 ? TTO TO

r PCM

TM TC 6= 0

TC TO

CPWM (CDCM)

r

2  2 + TC TO

INTEGRATION PERIOD for

1 q  ? p C 2 0 2 3fB (e )2 ? TTon  q   ? C 2 ? log2 2p3 (e0 )2 ? TTon fB TQ   2  16 (e0 )2 ? TTC Q

e0 TC = 0 1 2p3f max TM ERROR FOR

OPTIMAL ERROR

e0 opt

1

1

FOR

M

p3(eC0 )2

T

T on;opt =

p2T C i e0 

TC 4p3TM

TC 4p3(e0 )2

T O;opt =

p2T C i e0

TC TM

TO = TC0 e

0

2p3fB TM

T M;opt

p3TC

q q

not applic.

OPTIMAL INTEGR. PERIOD

q

T

T

T

p3TC

p3(eC0 )2

M

1 2p3  2(fB TM ) q q TQ TC 3 16TM 2(7=2) TM

p2T

C

q ?  ln p61e0

ln(2)e0

TC 2(7=2) (e0 )3

TO = TC0 e

i

1 K N;opt = p 0 i 6e 

1 K N;opt = p 0 6e T Q;opt =

1 2

p2T C i e0

Figure 5. Random activity fluctuations (integrator output versus time t, in generic units, for low $T_M$, high $T_M$ and CPWM).

  

CPWM: a $T_C$ switching error (switching time);

PAM: a $\Delta A$ error in amplitude (due to parametric mismatches in devices; see also sect. 5);

SPM: fluctuations due to randomness of signals.

Note that integration must always take place independently of the nature of synapses and neurons, either analog, digital or optical. Usually, the longer the integration period (or the filter time constant) $T_M$, the smaller the error. The only exceptions are CPWM and CDCM: thanks to their intrinsic coherence, fluctuations due to pulses are negligible at the end of each active phase, as shown in fig. 5, therefore activations can be sampled anywhere during the idle phase.

Table 2 shows that evaluation errors $e'$ (either $e'_x$ or $e'_w$, as from table 1) are mostly functions of $T_M$, except for $e'_{PAM}$ which is analyzed in sect. 5. Proofs and further details can be found in [2]. The inverse relationship $T_M(e')$ gives the minimum integration period which guarantees a given synaptic error $e'$. This period can be as short as one pulse period (e.g. for CPWM) or as long as a few fractions of a millisecond (e.g. for PRM and PWM).

Observe that $T_M(e')$ is also a function of the PS parameters (either $T_{on}$, or $T_O$, or $K_N$, etc.). There is an optimal value for such parameters [2] which minimizes $T_M$ and provides the optimal relationships ($T_{M,opt}(e')$ and $e'_{opt}(T_M)$) shown in table 2 and in fig. 7.

Figure 6. Intrinsic uncertainties in PSs.

The computation speed of an ANS is given by

$S_C = \frac{N_S}{T_S} \le \frac{N_S}{T_M}$

where $S_C$ and $N_S$ are the number of connections per second (CPS) and the total number of synapses in the system, respectively, while the response time $T_S \ge T_M$ is the delay required to accurately compute the output.

By looking at the results, it can be noted that CPWM is the best when a high accuracy is required (namely, $e' < 0.1$), while PWM and SPM are better when a lower accuracy is sufficient (namely, $e' > 0.1$). Instead, PRM and PBM show the worst performance, although they are often used in bioinspired systems because of the advantages they provide with multiplexing (see later).
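The existence of an optimal PS parameter can be checked numerically for PRM. The sketch below assumes that the PRM entry of table 2 reads $T_M = \frac{1}{2\sqrt{3}\, f_{max} \sqrt{(e')^2 - (T_C/T_{on})^2}}$, with $f_{max} = \frac{1}{\rho T_{on}}$; under that assumption, minimizing over $T_{on}$ gives $T_{on,opt} = \sqrt{2}\, T_C / e'$ and $T_{M,opt} = \frac{\rho T_C}{\sqrt{3} (e')^2}$. The function names are hypothetical and the formulas should be checked against [2] before being relied upon.

```python
import math

def prm_integration_period(e, t_on, t_c, rho):
    """Minimum integration period for PRM at error e (assumed table 2 form).

    Returns infinity when the switching error T_C / T_on alone already
    exceeds the target error, since no integration period can help then.
    """
    f_max = 1.0 / (rho * t_on)
    margin = e ** 2 - (t_c / t_on) ** 2
    if margin <= 0:
        return math.inf
    return 1.0 / (2 * math.sqrt(3) * f_max * math.sqrt(margin))

def prm_optimum(e, t_c, rho):
    """Closed-form optimum obtained by minimizing the expression over T_on."""
    t_on_opt = math.sqrt(2) * t_c / e
    t_m_opt = rho * t_c / (math.sqrt(3) * e ** 2)
    return t_on_opt, t_m_opt

# Example with the parameters quoted for fig. 7: rho = 2, T_C = 10 ns.
t_on_opt, t_m_opt = prm_optimum(e=0.01, t_c=10e-9, rho=2)
```

With these values the optimal pulse width is about 1.4 µs and the optimal integration period about 115 µs; moving $T_{on}$ away from the optimum in either direction lengthens the integration period.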

Figure 7. Optimal integration period $T_{M,opt}$ (in µs) versus normalized error $e'$ for various PS techniques (PRM and PBM, PWM, CPWM, PCM, SPM), for a pulse period-to-width ratio $\rho = 2$ and $T_C = 10$ ns.

4.2 Multiplexing Errors and Latency Times This section discusses how multiplexing affects the performance of PS modulations. In particular, the transmission  (transerror e and the corresponding integration period TM mission delay) are considered. For each signal to be multiplexed, an equivalent  Nyquist [38] bandwidth B = 2T1 is defined. Then, the M following bandwidth factor is introduced as a performance parameter for multiplexing:

= MfMB

(1)

R

where fM is the maximum number of transitions per unit of time (either 0!1 or 1!0) which can be transferred over the transmission channel. In practice, is proportional to the ratio between the bandwidth of the transmitted waveform ( fM ) and the total bandwidth of all the MR signals to be multiplexed (MR B ). The lower is the more efficient is multiplexing. Therefore, from (1):

M TM = 2fM

Figure 8. Bandwidth factor versus normalized error e , with  = 2 and MR = 256: a) CPWM (Async); and Event (D-PBM, without retx), j j  0:01; b) PWM (Async); c) SPM (Async, Coherent) d) Event (PRM, without retx),  0:2; and Event (PRM, with retx), = 1; f) Event (PCM, without retx); g) Event (PCM, with retx).

Table 3 and fig. 8 show $T_M$ and $\beta$ for the different techniques, both for $T_C \neq 0$ and $T_C = 0$. Note that both $\beta$ and $T_M$ decrease for an increasing $e'$ and are often independent of $M_R$. Details and proofs are given in [2]. The results show that Event Driven multiplexing with retransmission and PCM is the best choice when a high accuracy is required (namely, $e' < 0.05$), while Event Driven without retransmission with D-PBM, and Asynchronous with either CPWM, SPM or PWM, are better when a lower accuracy is sufficient (namely, $e' > 0.05$). Asynchronous multiplexing with CPWM is either the best or the second best choice in all cases, and it is also much simpler than any other technique. In several perceptive applications [7] it is required to transmit a 1-bit information of the type “I am active”. Such a piece of information can be reliably transmitted with an error of $\pm 0.5$, namely $e^* = \frac{1}{2\sqrt{3}}$.

$$T_M = \frac{M_R}{2 f_M}$$

4.3 Power Dissipation A generic PS ANS is a combination of digital, analog and optical circuits. For the sake of this analysis, optical devices are treated as analog:

Dynamic Power in the Digital Part is mainly caused by charging and discharging the parasitic capacitances of a synapse:

$$P_D \approx (C_{tot}) V_{dd}^2 f_P,$$

where $f_P$ and $V_{dd}$ are the PS frequency and the supply voltage, respectively.




Idle Power in the Analog Part is due to the supply current drawn by current generators (or light emitters) when the controlling pulse is “0”:

$$P_I = V_{dd} I_o,$$

where $I_o$ is the supply current value in the OFF state. For optical implementations $I_o$ is the current drawn by the light emitter(s) either during the OFF state (if the light source itself is switched) or during the ON state (if the light beam is modulated separately).

Active Power in the Analog Part is due to the supply current drawn by current generators (or light emitters) when the controlling pulse is “1”:

$$P_A = V_{dd} I_{on},$$

where $I_{on}$ is the supply current value in the ON state. Depending on the PS technique used, $I_{on}$ can be either constant or a function of the synaptic weight $w_{ji}$. Active power $P_A$ is the only one really useful for proper operation (except for digital neurons).

Table 3. Latency time and bandwidth factors for multiplexing. See table II for TM . For PWMs, TC MULTIPLEXING

MODULATION

TECHNIQUE PRM, PBM

PWM

Asynchronous high frequency

CPWM

PCM

TM

Synchronous

PRM

TM

Coherent

CPWM

MR TM

SPM

MR TM

PWM

1:5MR TM

PRM

Event Driven, without retransmission

PRM, PBM

PCM

Event Driven, with retransmission

PRM, PBM

PCM



BANDWIDTH FACTOR

 TL  T M TM 2 MR 2 12f mux TO (e )2 MR 2p3f muxe TM

SPM

Sequential

TC 6= 0 p2 p3e 1 pp 2 3 6(e )3=2 1 p3e

LATENCY

TIME

for

-

1:5MR f min 2:93MR log2 (2MR )  fB e   MR log2 eln(2)fBpT3 M;opt fB e MR log2 (2MR ) p3f e B  ? MR log2 pM3R e fB

Computation Energy $E_C$ is the product of the response time $T_S \approx T_M$ and the total average power dissipation of each synapse:

$$E_C \approx \left(P_D + (1 - \delta_1) P_I + \delta_1 P_A\right) \cdot T_M$$

where $\delta_1 = \frac{T_1}{T_1 + T_0}$ is the pulse duty cycle, from table 2. The physical meaning of $E_C$ (in J/connection or W/CPS) is the energy required to accurately compute one neural connection; it is a function of the desired error $e'_T$ (from table 2).

From the point of view of Computation Energy, the same considerations drawn for TM also apply here.
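As an illustration of how the three power components combine into the computation energy, here is a minimal sketch. It assumes the reading of the formulas used in this section: dynamic power as $C_{tot} V_{dd}^2 f_P$, idle and active power as supply voltage times OFF and ON current, and the average power weighted by the pulse duty cycle $T_1/(T_1+T_0)$. All function names and numerical values are hypothetical and are not taken from the paper.

```python
def duty_cycle(t1, t0):
    """Pulse duty cycle T1 / (T1 + T0): fraction of time the pulse is "1"."""
    return t1 / (t1 + t0)


def dynamic_power(c_tot, v_dd, f_p):
    """Dynamic power of the digital part: C_tot * V_dd^2 * f_P."""
    return c_tot * v_dd ** 2 * f_p


def computation_energy(p_d, p_i, p_a, duty, t_m):
    """Average synapse power (duty-cycle weighted) multiplied by the integration period T_M."""
    p_avg = p_d + (1.0 - duty) * p_i + duty * p_a
    return p_avg * t_m


# Hypothetical synapse parameters, for illustration only.
v_dd = 5.0        # supply voltage, V
c_tot = 100e-15   # total parasitic capacitance, F
f_p = 1e6         # pulse stream frequency, Hz
i_off = 0.1e-6    # supply current in the OFF state, A
i_on = 1e-6       # supply current in the ON state, A
t_m = 10e-6       # integration period, s

p_d = dynamic_power(c_tot, v_dd, f_p)   # dynamic power, W
p_i = v_dd * i_off                      # idle power, W
p_a = v_dd * i_on                       # active power, W
duty = duty_cycle(t1=0.25e-6, t0=0.75e-6)

e_c = computation_energy(p_d, p_i, p_a, duty, t_m)
print(f"E_C = {e_c * 1e12:.2f} pJ per connection")
```

The sketch makes the trade-off visible: for a fixed $T_M$, lowering the duty cycle shifts weight from the active term to the idle term, so the idle current matters most for low-activity synapses.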

5 DETAILED COMPARISON OF (XXX+PAM) AND ANALOG SYSTEMS This section describes a few possible sources of errors, namely $e'_{PAM}$, in both analog and (XXX+PAM) systems (that is, the combination of any Pulse Stream modulation with PAM), and compares the performance of the two classes of neural implementations. Errors are given by the sum of two major components, namely analog errors due to those transistors (called analog) operated in an analog fashion (i.e. linear or saturated

1 8(p e )2 2 p3e 1 e 1 8(e )2 1:5 q  ?  2 (e ) ? jp3ffo j 2 o 3 p2(e )2 -

-

-

-

BANDWIDTH FACTOR for

TC = 0  p3e

 2f1M .



not applicable

not applicable



1 2p3e 1 8(e )2 1 p3e

2 log2



not applicable

1 8(e )2 not applicable

not applicable

5:86 log2 (2MR ) e   e  ln(2) fBpT  M;opt 2 log2 3 e 2 log2 (2MR ) p3e   2 log2 pMR 3e

region) and digital errors due to those transistors (called digital) which are operated as ON/OFF switches. Analog transistors can be connected as a differential pair or as a transconductance amplifier, or used as voltage-controlled resistors. Non-ideal devices and parametric spreads ($\Delta\beta$ and $\Delta V_T$) cause the weight-voltage to current characteristic to differ from the desired one, and therefore introduce errors. Digital errors are given by timing errors (e.g. limited rise and fall times of digital signals) and by charge injection and redistribution effects during MOS switching. In particular, comparative simulations have been done for a (CPWM+PAM) synapse and an analog Gilbert synaptic multiplier operated in strong inversion, with two values of supply current, $I_0 = 1\,\mu$A and $I_0 = 100\,\mu$A (respectively, 5 µW and 500 µW power dissipation). Errors are a function of the frequency of input signals, for analog synapses, or of the equivalent Nyquist [38] frequency $f_N = \frac{1}{2 T_M}$, for (XXX+PAM) synapses. Errors may have three components [29]:



Non-linearity $\theta_L$: although a multiplier is by definition a non-linear system, it should behave partially linearly, in the sense that the relationship between output voltage and one of the inputs, for a given value of the other input, should be linear.

Figure 9. A) Static characteristic of analog and (CPWM+PAM) multipliers versus input voltage for different values of weight voltages, with $\Delta\beta = \Delta V_T = 0$ (normalised output voltage versus normalised input voltage $V_1$). B) Static characteristic of CPWM multipliers versus pulse input for different values of weight voltages, with $\Delta\beta = \Delta V_T = 0$ (normalised output voltage versus normalised input timing $V_1$).

Figure 10. Non-linearity $\theta_L$ vs. frequency relationship of analog and CPWM multipliers, for $I_0 = 1\,\mu$A, $\Delta\beta = \Delta V_T = 0$ (non-linearity versus frequency in MHz; curves: analog simulated, CPWM simulated).

Figure 11. Offset $\theta_M$ vs. frequency relationship of analog and CPWM multipliers, for $I_0 = 1\,\mu$A, $\Delta\beta = \Delta V_T = 0$ (normalised offset versus frequency in MHz; curves: analog simulated, CPWM theoretical, CPWM simulated).

Instead, the input/output relationship of analog multipliers often shows the saturating behavior shown in fig. 9.A, while the relationship between pulse input and output voltage in (XXX+PAM) is in principle linear, as it relies on the integration of a current on a capacitor (see fig. 9.B). Also the relationship between weight input and output in (XXX+PAM) systems is like that shown in fig. 9.A, but this is usually compensated by learning. Figure 10 shows how non-linearity depends on the (Nyquist) frequency. (CPWM+PAM) is more linear, except at very high frequencies.




Offset errors: mismatches among transistors, caused by $\Delta\beta$ and $\Delta V_T$, and charge injections, in (XXX+PAM) synapses, mostly result in an offset $\theta_M$ of synaptic characteristics. Offset has two components, which are due to analog ($\theta_A$) and digital ($\theta_D$) transistors, respectively [29]:

$$\theta_M \approx \theta_A + \theta_D = \theta_A(\Delta\beta, \Delta V_T) + K f_N$$

Figure 11 and table 4 show how offset varies with (Nyquist) frequency and with mismatches, for different values of supply current.

Figure 12. Gain $g/g_o$ vs. frequency relationship of analog and CPWM multipliers, for $I_0 = 1\,\mu$A, $\Delta\beta = \Delta V_T = 0$ (normalised gain in dB versus frequency in MHz; curves: analog simulated, CPWM theoretical, CPWM simulated).



Gain errors are mainly due to analog transistors. Gain is a function of both the (Nyquist) frequency and the transistor mismatches:

$$g \approx \frac{g_o(\Delta\beta, \Delta V_T)}{1 + \left(\frac{f}{f_P}\right)^2}$$

where $f_P$ and $g_o$ are, respectively, the multiplier cut-off frequency and the low-frequency gain, which is a function of $\Delta\beta$ and $\Delta V_T$ (see fig. 12 and table 4).
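The gain roll-off can be cross-checked against the two frequency columns of table 4. The sketch below assumes the gain expression is read as $g = g_o / (1 + (f/f_P)^2)$, with no square root in the denominator, and computes the frequency at which the gain falls by 3 dB. Function names are hypothetical; the two $f_P$ values are the first and fifth entries of the (CPWM+PAM) $f_p$ column as read from table 4.

```python
import math


def gain(f, g_o, f_p):
    """Gain versus frequency, assuming g = g_o / (1 + (f / f_P)^2)."""
    return g_o / (1.0 + (f / f_p) ** 2)


def f_3db(f_p):
    """Frequency where g / g_o = 1 / sqrt(2), i.e. (f / f_P)^2 = sqrt(2) - 1."""
    return f_p * math.sqrt(math.sqrt(2.0) - 1.0)


for f_p in (2.1, 15.0):  # MHz, as read from the f_p column of table 4
    f3 = f_3db(f_p)
    drop_db = 20.0 * math.log10(gain(f3, 1.0, f_p))
    print(f"f_P = {f_p:5.1f} MHz -> f_3dB = {f3:5.2f} MHz ({drop_db:.2f} dB)")
```

Under this assumed form the ratio $f_{3dB}/f_P$ is about 0.64, which is close to the ratio between the $f_{3dB}$ and $f_p$ values listed for (CPWM+PAM) in table 4 (for example 9.6 MHz against 15.0 MHz). This agreement is an observation about the tabulated numbers, not a derivation from [29].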



PAM error $e'_{PAM}$ is then given by the sum of the three components:

$$e'_{PAM} \approx \theta_L + \theta_M + \left(1 - \frac{g}{g_o}\right)$$
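To see how the three components add up as the Nyquist frequency grows, the following minimal sketch sums a fixed non-linearity term, an offset that is linear in frequency, and the gain error. It assumes the offset is modelled as an analog part plus $K f_N$ and the gain ratio as $1/(1 + (f_N/f_P)^2)$. The function names and all numerical values are hypothetical, chosen only to be of a plausible order of magnitude.

```python
def offset(theta_a, k, f_n):
    """Total offset: analog part plus a digital part proportional to frequency."""
    return theta_a + k * f_n


def gain_ratio(f_n, f_p):
    """Assumed normalised gain g / g_o at frequency f_n for cut-off frequency f_p."""
    return 1.0 / (1.0 + (f_n / f_p) ** 2)


def pam_error(nonlinearity, theta_a, k, f_n, f_p):
    """Sum of non-linearity, offset and gain error (1 - g / g_o)."""
    gain_error = 1.0 - gain_ratio(f_n, f_p)
    return nonlinearity + offset(theta_a, k, f_n) + gain_error


# Hypothetical parameters: frequencies in MHz, k in 1/MHz.
params = dict(nonlinearity=0.01, theta_a=0.006, k=0.059, f_p=2.1)

for f_n in (0.01, 0.1, 1.0):
    e = pam_error(f_n=f_n, **params)
    print(f"f_N = {f_n:5.2f} MHz -> total error = {e:.4f}")

e_ref = pam_error(f_n=0.1, **params)
```

With these assumed numbers the error is dominated by the fixed terms at low frequency, while the frequency-dependent offset and the gain error take over as $f_N$ approaches $f_P$.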

Further details can be found in [29]. It is clear from the plots that analog systems have a slightly larger bandwidth for the same power dissipation, while non-linearity is in favor of (XXX+PAM) systems (except at very high frequencies, where the gain starts to reduce). Offset, instead, is in favor of analog systems.

Table 4. Typical parameters for analog and (CPWM+PAM) multipliers. Data are extracted from simulations. Mismatches $\Delta\beta = 5\%$ and $\Delta V_T = 20$ mV were applied separately and/or together. $I_0$ is the supply current, for $V_{dd} = 5$ V.

                        | (CPWM+PAM)                                        | Analog
I_0 (µA)  Mismatch      | g_o   f_p (MHz)  f_3dB (MHz)  θ_A    K (MHz^-1)   | g_o   f_3dB (MHz)  θ_A
1         -             | 1.00  2.1        1.3          0.006  0.059        | 1.00  3.5          0.002
1         Δβ            | 0.76  2.1        1.4          0.158  0.042        | 0.79  3.5          0.126
1         ΔV_T          | 0.84  2.1        1.3          0.105  0.049        | 1.08  3.7          0.068
1         Δβ + ΔV_T     | 0.55  2.3        1.5          0.278  0.031        | 1.41  3.6          0.263
100       -             | 1.00  15.0       9.6          0.028  0.038        | 1.00  105          0.001
100       Δβ            | 0.71  13.4       8.6          0.152  0.034        | 1.24  150          0.300
100       ΔV_T          | 0.85  14.4       9.3          0.128  0.035        | 1.15  106          0.063
100       Δβ + ΔV_T     | 0.51  11.4       7.3          0.269  0.032        | 0.81  64           0.451

Figure 13. Simulated power dissipation for analog and digital data transmitters, for $C_L = 50$ pF and $V_{max} = 1$ V (average power dissipation in mW versus Nyquist or cut-off frequency in MHz; curves: analog, CPWM).

Figure 14. Equivalent noise impedance, for $V_{max} = 1$ V, $C_L = 50$ pF (equivalent noise impedance $Z_{eq}$ in Ohm MHz^0.5 versus static power dissipation in mW; curves: analog simulated, CPWM with $\tau_B f_N = 0.10$, CPWM with $\tau_B f_N = 0.02$).

6 PS DATA TRANSMISSION In many systems, input and output activations have to be transmitted among chips and/or the input/output devices. Like multiplication, data transmission can be done using either analog or PS techniques. The two methods are compared here in terms of performance. A case study is considered, where a driver must transmit time-varying information over a capacitively loaded line $C_L$ (e.g. over a long distance, or a bus with several taps) to a receiver. Analog and un-multiplexed CPWM systems are compared. Results can be extended to other multiplexed PS techniques, simply by multiplying the frequency $f_N$ by a factor $\frac{\beta_{XXX}}{\beta_{CPWM}}$, as from table 3. The following performance parameters are considered:

Figure 15. Gain and offset error of CPWM data transmission versus normalized frequency (offset error $\theta$ and gain $g$ versus $\tau f_N$; curves: theoretical gain error, simulated gain error, theoretical offset error, simulated offset error).

Power dissipation is a function of the data frequency (Nyquist, for CPWM) and, for analog transmission, the signal amplitude $V_{max}$. Figure 13 compares the (simulated) power dissipation of a CPWM+PAM and an analog transmitter.

Equivalent Noise Impedance indicates sensitivity of the transmission system to injected noise. For white noise with spectral density $N_O$ (in pA$^2$/Hz), we find [29] that noise variance at the receiving end is:

$$\sigma_n = \frac{\sqrt{N_O}\, Z_{eq}}{V_{max}} \qquad \sigma_n = \sqrt{N_O}\, Z'_{eq}$$

respectively, for analog and CPWM systems, where $Z_{eq}$ and $Z'_{eq}$ are the equivalent noise impedances plotted in fig. 14 versus the average power dissipation. For CPWM, $Z'_{eq}$ depends also on the time constant $\tau_B f_N$ of the digital driver.

Gain and Offset also depend on frequency (and on $\tau_B$, for CPWM), as shown in fig. 15.

Further details can be found in [29]. It is clear from the plots that analog and CPWM systems have a comparable power dissipation, while noise impedance is in favor of CPWM systems. Most other PS systems are poorer than CPWM, as their bandwidth factor is mostly higher.

Acknowledgments The author wishes to thank Dr. H.C.A.M. Withagen and Dr. M. Chiaberge, for their help in simulations.

References
[1] A. F. Murray, D. Del Corso, and L. Tarassenko, “Pulse-Stream VLSI Neural Networks Mixing Analog and Digital Techniques”, IEEE Trans. on Neural Networks, Vol. 2, No. 2, March 1991, pp. 193-204.
[2] L.M. Reyneri, “A Performance Analysis of Pulse Stream Neural Networks”, in IEEE Trans. on Circuits and Systems - II, Vol. 42, no. 11, October 1995, pp. 642-660.
[3] A.F. Murray and A.V.W. Smith, “Asynchronous Arithmetic for VLSI Neural Systems”, Electronics Letters, Vol. 23, June 1987, pp. 642-643.
[4] M. Chiaberge, E. Miranda Sologuren, L.M. Reyneri, “A Pulse Stream System for Low Power Neuro-Fuzzy Computation”, in IEEE Trans. on Circuits and Systems - I, Vol. 42, no. 11, November 1995, pp. 946-954.
[5] D. Del Corso, F. Gregoretti, C. Pellegrini, L. M. Reyneri, “An Artificial Neural Network based on Multiplexed Pulse Stream”, Proc. of 1st Int’l Workshop on Microelectronics for Neural Networks, Dortmund, June 1990, pp. 28-39.
[6] E.I. El-Masry, H.K. Yang, M.A. Yakout, “Implementations of Artificial Neural Networks Using Current-Mode Pulse Width Modulation Techniques”, in IEEE Trans. on Neural Networks, Vol. 8, No. 3, May 1997, pp. 532-548.
[7] A. Mortara, E. Vittoz, “A Communication Architecture Tailored for Analog VLSI Artificial Neural Networks: Intrinsic Performance and Limitations”, in IEEE Trans. on Neural Networks, Vol. 5, no. 3, May 1994, pp. 459-466.
[8] Lazzaro, et al., “Silicon Auditory Processors as Computer Peripherals”, in IEEE Trans. on Neural Networks, Vol. 4, no. 3, May 1993, pp. 523-528.
[9] A.F. Murray, A. V. W. Smith, “Asynchronous VLSI Neural Networks using Pulse Stream Arithmetic”, IEEE Jou. of Solid State Circuits, Vol. 23, June 1988, pp. 688-697.
[10] L.W. Massengill and D.B. Mundie, “An Analog Neural Hardware Implementation Using Charge-Injection Multipliers and Neuron-Specific Gain Control”, IEEE Trans. on Neural Networks, Vol. 3, May 1992, pp. 354-362.
[11] A. Murray, L. Tarassenko, “Analogue Neural VLSI: A Pulse Stream Approach”, Chapman and Hall, London (UK), 1994.
[12] W. Maass, “Networks of Spiking Neurons: The Third Generation of Neural Networks Models”, in Neural Networks, Vol. 10, 1997, pp. 1659-1671.
[13] B. Ruf, M. Schmitt, “Self-Organization of Spiking Neurons Using Action Potential Timing”, in IEEE Trans. on Neural Networks, Vol. 9, No. 3, May 1998, pp. 575-578.
[14] S. Wolpert, E. Micheli-Tzanakou, “A Neuromime in VLSI”, in IEEE Trans. on Neural Networks, Vol. 7, No. 2, March 1996, pp. 300-306.
[15] B.A. De Cock, D. Maurissens, J. Cornelis, “A CMOS Pulse-Width Modulator/Pulse-Amplitude Modulator for Four-Quadrant Analog Multipliers”, in IEEE Jou. Solid-State Circuits, Vol. 27, No. 9, September 1992, pp. 1289-1293.
[16] L.M. Reyneri, M. Chiaberge, D. Del Corso, F. Gregoretti, “Using Coherent Pulse Width and Edge Modulations in Artificial Neural Systems”, Int’l Jou. Neural Systems, Vol. 4, no. 4, December 1993, pp. 407-418.
[17] A. Hamilton, A.F. Murray, D.J. Baxter, S. Churcher, H.M. Reekie, and L. Tarassenko, “Integrated Pulse Stream Neural Networks: Results, Issues and Pointers”, IEEE Trans. on Neural Networks, Vol. 3, May 1992, pp. 404-413.
[18] L. M. Reyneri, “Procedimento e Dispositivo per la Moltiplicazione di Segnali, particolarmente per Sinapsi di Reti Neuronali”, patent no 67315-A/90 (I), 27 April 1990.
[19] Neural Semiconductors, “NS3232 data sheet”.
[20] M. Verleysen, and P. Jespers, “Analog VLSI Synapse Matrix with Enhanced Stochastic Computations”, in Lecture Notes on Computer Science, Springer Verlag, 1991, pp. 315-321.
[21] G.E. Salam, R.M. Goodman, “A Digital Neural Network Architecture Using Random Pulse Trains”, in Silicon Implementation of Pulse Coded Neural Networks, M. Zaghloul, J. Meador, and R. Newcomb, Eds., Kluwer Academic, 1994.
[22] J.E. Tomberg, and K. Kaski, “Pulse-density Modulation Technique in VLSI Implementations of Neural Network Algorithms”, IEEE J. Solid State Circuits, Vol. 25, October 1990, pp. 1277-1286.
[23] M. Holler, S. Tam, H. Castro, and R. Benson, “An Electrically Trainable Artificial Neural Network (ETANN) with 10240 Floating Gate Synapses”, Proc. of IEEE 3rd Int. Conf. on Neural Networks, Washington, 1989, pp. 191-196.
[24] H.J. Oguey, “Analog EEPROM Principle and Application to Neural Network”, Proc. of ISSCC 90, 1990.
[25] A.V. Krishnamoorthy, G. Yayla, and S.C. Esener, “A Scalable Optoelectronic Neural System Using Free-Space Optical Interconnect”, IEEE Trans. on Neural Networks, Vol. 3, May 1992, pp. 404-413.
[26] -, IEEE MICRO, Special Issue “Silicon Neural Networks”, December 1989.
[27] -, IEEE MICRO, Special Issue “Analog VLSI Neural Networks”, June 1994.
[28] D.A. Durfee and F.S. Shoucair, “Comparison of Floating Gate Neural Network Memory Cells in Standard VLSI CMOS Technology”, IEEE Trans. on Neural Networks, Vol. 3, May 1992, pp. 347-353.
[29] L.M. Reyneri, H.C.A.M. Withagen, J.A. Hegt, M. Chiaberge, “A Comparison between Analog and Pulse Stream VLSI Hardware for Neural Networks and Fuzzy Systems”, in Proc. of MICRONEURO 94, Torino (I), IEEE Computer Society Press, September 1994, pp. 77-86.
[30] J. Austin, “A Review of RAM Based Neural Networks”, in Proc. of MICRONEURO 94, Torino (I), IEEE Computer Society Press, September 1994, pp. 58-66.
[31] J. Meador, A. Wu, C. Cole, N. Nintunze, and P. Chintrakulchai, “Programmable Impulse Neural Circuits”, IEEE Trans. Neural Networks, Vol. 2, pp. 101-109, Jan. 1991.
[32] R. Goebel, “Biology Inspired Neuron Models and Networks: The Functional Role of Temporal Coding”, in Proc. of MICRONEURO 97, Dresden (D), pp. 65-74.
[33] T. Zahn, P. Pasche, K. Trott, R. Izak, “A Mixed-Signal Neural Network for Auditory Attention”, in Proc. of MICRONEURO 97, Dresden (D), pp. 234-239.
[34] R. Ros, F.J. Pelayo, B. Pino, A. Prieto, “Firing Rate and Phase Coding Circuits for Neural Computation”, in Proc. of MICRONEURO 97, Dresden (D), pp. 305-311.
[35] P.F. Ruedi, “Motion Detection Silicon Retina Based on Event Correlations”, in Proc. of MICRONEURO 96, Lausanne (CH), pp. 23-29.
[36] D.J. Mayes, A.F. Murray, H.M. Reekie, “Pulsed VLSI for RBF Neural Networks”, in Proc. of MICRONEURO 96, Lausanne (CH), pp. 177-184.
[37] T. Lehmann, “Teaching Pulsed Integrated Neural Systems: a Psychobiological Approach”, in Proc. of MICRONEURO 96, Lausanne (CH), pp. 185-190.
[38] L.W. Couch II, “Digital and Analog Communication Systems”, New York, Maxwell MacMillan International Editions, 1989.