IEEE 2007 Custom Integrated Circuits Conference (CICC)

Cochlear Implant Signal Processing ICs

Brett Swanson¹, Erika Van Baelen², Mark Janssens³, Michael Goorevich¹, Tony Nygard¹, Koen Van Herck²

¹ Cochlear Ltd, Lane Cove, NSW, Australia
² Cochlear Technology Centre, Mechelen, Belgium
³ NXP, Leuven, Belgium

Abstract – The Nucleus Freedom cochlear implant system enables a profoundly deaf person to hear. The system consists of a surgically implanted stimulator and a battery-powered external sound processor. The processor is based on a 0.18 µm CMOS ASIC containing four DSP cores. The signal processing includes a two-microphone adaptive beamformer, a 22-channel quadrature FFT filterbank, multi-band automatic gain control, a psycho-acoustic masking model and non-linear compression. The key design challenge was power consumption.

I. INTRODUCTION

A cochlear implant system provides hearing to an adult or child who has severe to profound hearing loss [1]. More than 80,000 Nucleus® cochlear implant systems, manufactured by Cochlear Ltd, are presently in use. This paper begins with an overview of the latest-generation Nucleus Freedom™ cochlear implant system, gives a brief history of how earlier ASIC implementations of the signal processing path have met the challenge of low power consumption, and then describes the ASIC architecture and the signal processing algorithms used in the Freedom processor.

II. COCHLEAR IMPLANT SYSTEM OVERVIEW

The Nucleus Freedom cochlear implant system [2] consists of two components: a surgically implanted stimulator, and an external processor worn behind the ear (Fig. 1). The battery-powered processor contains two microphones, a digital signal processing (DSP) ASIC, user controls, and a radio frequency (RF) coil for sending power and commands to the stimulator. The stimulator extracts power and data from the received RF signal and applies pulses of current to an array of 22 electrodes inserted into the cochlea (inner ear). These pulses stimulate the auditory nerve, giving the perception of sound.

The overall signal processing is shown in Fig. 2. The front end amplifies and combines the microphone signals and incorporates automatic gain control (AGC). The filterbank splits the sound into multiple frequency bands, emulating the behavior of the cochlea in a normal ear, where different locations along the length of the cochlea are sensitive to different frequencies [3]. The envelope of each filter output controls the amplitude of the stimulation pulses delivered to a corresponding electrode. Electrodes positioned at the basal end of the cochlea (closer to the middle ear) are driven by the high frequency bands, and electrodes at the apical end are driven by low frequencies [1].

The sampling and selection block samples the filterbank envelopes and determines the timing and pattern of the stimulation on each electrode. Stimulation rates on each electrode range from 250 to 3500 pulses per second, typically using pulse widths of 10 to 25 µs. The amplitude mapping block compresses the filterbank envelopes to determine the current level of each pulse. The required currents are typically in the range 100 to 1000 µA, and vary both amongst implant recipients and across the electrode array.

The final block is the RF transceiver, which transmits power and stimulus commands to the implant by on-off keying of a 5 MHz carrier. It also receives status and telemetry data from the implant.
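For concreteness, the loudness contribution of an individual pulse depends on the charge delivered in each phase (as Section V-D discusses), i.e. the product of current and phase width. A mid-range example using the typical values above:

\[ q = I \, t_{\mathrm{phase}} = 500\,\mu\mathrm{A} \times 25\,\mu\mathrm{s} = 12.5\,\mathrm{nC\ per\ phase}. \]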

Fig. 1. Nucleus Freedom cochlear implant (left) and processor (right)

Fig. 2. Cochlear implant sound processing: the microphone signals pass through the front end, filterbank, sampling & selection, and amplitude mapping blocks to the RF transceiver, which sends the RF signal to the implant


The batteries in the external processor power the entire system. The implant's share of the power budget depends on the stimulation rates and currents being delivered, and is large because of the relatively low efficiency of power transfer across the RF link.

III. PROCESSOR HISTORY

Over the last 15 years, Cochlear has introduced four generations of processor. Table I shows some characteristics of the ASICs used in these processors. As the filterbank generally consumed the most power of the signal processing functions, decisions on filterbank implementation determined the overall processor architecture. Two different technologies have been used to implement the filterbank: Switched Capacitor Filters (SCF) and Digital Signal Processing (DSP).

TABLE I. PROCESSOR ASIC CHARACTERISTICS

                   Spectra     SPrint      ESPrit   Freedom
Processor type     Body worn   Body worn   BTE      BTE
Filterbank type    SCF         DSP         SCF      DSP
IC design          1992        1993        1997     2003
Voltage (V)        5.0         3.3         2.2      0.7
Feature size (µm)  1.5         0.8         1.5      0.18

The Spectra used two custom ASICs [4]. The first ASIC contained 20 fourth-order band-pass SCFs, each comprised of two biquad sections. A digital ASIC controlled stimulation timing, performed amplitude mapping, and encoded the RF signal.

The SPrint was a body-worn processor powered by two AA batteries. As commercial DSP IC manufacturers were focussing on increasing DSP execution speed rather than decreasing power consumption, a custom DSP ASIC was designed. It had a Harvard architecture [5], with a 1024-word, 20-bit program memory and two 512-word, 16-bit data memories. Instructions executed at 5 MHz, and it could perform a 16×16-bit multiply-accumulate and two addressing operations in a single cycle. A separate analog front-end IC contained an AGC and a 12-bit ADC with a sample rate of 16 kHz.

The ESPrit, introduced in 1997, was the first behind-the-ear (BTE) processor for a multi-channel cochlear implant system. The key challenge was reducing power consumption. This was achieved by returning to an SCF architecture similar to the Spectra's, but with the entire processor implemented on one mixed-signal ASIC with on-chip EEPROM. To reduce power consumption, the biquad filters utilized biased inverter amplifiers. The processor was powered by two zinc-air hearing-aid batteries, with a 2.2 V supply rail. No further reduction of power consumption was possible without compromising the dynamic range of the signal processing.

By 2003, the relentless advances in CMOS technology clearly favoured DSP solutions over the earlier SCF implementations. In CMOS technology, dynamic power consumption is proportional to the square of the supply voltage, so lowering the supply voltage can dramatically reduce power consumption [6]. In contrast to an SCF, the dynamic range of a DSP is independent of the supply voltage, being determined by the number of bits used to represent the signal. For a fixed feature size, lowering the supply voltage has the detrimental effect of increasing circuit delay. However, as feature size has been scaled down, the corresponding reduction in node capacitance has compensated for any potential speed penalty. Thus a Freedom DSP, despite using a 0.7 V supply, runs at the same 5 MHz instruction clock as the 3.3 V SPrint. To achieve an increase in computational capability, the Freedom ASIC contains four independent DSP cores, each of which does more than twice as much work per clock cycle as a SPrint DSP.

IV. FREEDOM ASIC DESIGN

The Freedom ASIC is implemented in a 0.18 µm CMOS technology, with an area of less than 25 mm². The architecture is driven by the ideas of parallel processing and supply voltage reduction [6, 7]. There are three domains: analog, control, and signal processing (Fig. 3).
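A first-order estimate illustrates why voltage scaling dominates the design. Assuming comparable switched capacitance and switching activity, CMOS dynamic power follows

\[ P_{\mathrm{dyn}} \propto C \, V_{DD}^{2} \, f, \qquad \frac{P(0.7\,\mathrm{V})}{P(3.3\,\mathrm{V})} \approx \left(\frac{0.7}{3.3}\right)^{2} \approx 0.045, \]

i.e. roughly a 22× reduction at the same 5 MHz clock, before any additional savings from the smaller feature size.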

Fig. 3. Architecture top level: the analog domain (low-power oscillator, ADCs, DC/DC converter), the control domain (microcontroller), and the signal processing domain (four DSPs and the RF protocol generator), linked by the HSB

A. Analog domain

The analog blocks include a low-power oscillator; three power-efficient, 16-bit sigma-delta analog-to-digital converters (ADCs); a class-D audio output; and a DC/DC converter. The DC/DC converter enables voltage scaling: it converts the battery voltage to a number of supply voltages, allowing the optimal supply voltage for each domain to be selected separately [6]. These voltages are programmable by the microcontroller.

B. Control domain

An 8051 microcontroller acts as a supervisor. It communicates with a PC over a serial port when the processor is being configured for a particular recipient. During normal operation, it manages the user interface, responding to button pushes and updating the LCD. It transfers code and data from an external Flash memory into the DSP memories at power-up, or when the recipient requests a change of program. It monitors the battery voltage and alerts the recipient when it is running low. The software is interrupt-driven; after handling an event, the microcontroller is put into a power-saving state.

C. The signal processing domain

The signal processing domain contains four identical DSP cores, the down-sample filters for the over-sampled ADCs, the RF transceiver, and the inter-processor communication hardware.

Fig. 4. Inter-processor communication: a Data Transfer Unit (DTU) connects each of the four DSPs to the High Speed Bus, which also serves the ADCs and the RF transceiver

D. DSP core architecture

This section describes the architecture of an individual DSP core. The design exploits instruction-level parallelism using a Very Long Instruction Word (VLIW) approach [5]. The core has many execution units operating in parallel: two multipliers, two arithmetic-logic units (ALUs), a register move unit, and five load-store units. It uses a Harvard architecture with a 1024-word program memory and three 1024-word, 16-bit data memories, denoted X, Y and Z. The X and Y memories can be accessed twice in each processor cycle, and there is a separate address arithmetic unit for each memory. Each instruction is 128 bits wide, with separate fields for each execution unit. In one cycle, the core can execute:
• 2 multiply-accumulates,
• 2 ALU operations,
• 5 load-store operations with address register updates.
The instruction set contains operations not commonly found in DSPs, such as exp, log, and normalization functions. These operations reduce the number of cycles required for the cochlear implant processing algorithms, and hence the power consumption. For example, the channel selection routines (Section V) use special instructions for efficiently finding the maximum value of a vector of amplitudes. A single Freedom DSP core takes fewer clock cycles than the SPrint DSP or popular commercially available DSP cores [8] to execute common DSP operations, as shown in Table II.

Table II. DSP execution benchmarks (in execution cycles)

Benchmark                    SPrint   TI C54x   Freedom (1 core)
128-point real-valued FFT    3500     2516      489
128-tap FIR                  270      136       74
16-bit square root           85       42        21
IIR biquad, 5 coefficients   23       16        9
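The FIR row of Table II is consistent with a simple resource count, assuming the benchmark is multiply-accumulate bound. With two MAC units, the lower bound for a 128-tap filter is

\[ \frac{128\ \text{taps}}{2\ \text{MACs/cycle}} = 64\ \text{cycles}, \]

close to the 74 cycles reported, the remainder being loop and pointer overhead; the single-MAC C54x bound of 128 cycles similarly matches its reported 136.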

E. Inter-processor communication

Distributing the signal processing over multiple DSP cores requires good data communication and synchronization. A centralized shared-memory scheme was rejected because of memory bandwidth and arbitration problems; it would also consume more power, because the greater distance the data must travel increases capacitance. Instead, a synchronous 16-bit High Speed Bus (HSB) is used for communication. A Data Transfer Unit (DTU) is attached to each DSP core, as shown in Fig. 4. The DTUs act as Direct Memory Access (DMA) controllers.

From the software point of view, the Z data memory of each DSP contains a number of special buffers, called winsters. The microcontroller configures a communication channel between two winsters in different DSPs. Several such channels can be active concurrently, as shown in Fig. 5. Channels can also be set up between the DSPs and the peripherals (the ADCs and the RF transceiver).




Fig. 5. An example set of communication channels between winsters


A communication channel is one-way: a source DSP transfers data to a destination DSP. The source DSP writes a block of data to a winster in its Z memory, and tells its DTU to transfer the data. The DTU then transfers the data onto the HSB, one word at a time, while the DSP continues to execute its algorithm. Meanwhile, the destination DTU transfers the data from the HSB into a winster in the Z memory of the destination DSP. The reception of a complete buffer of data triggers an event in the destination DSP, which it can use to start processing the received data. This allows the four DSPs to be synchronized.


A central bus controller multiplexes the channels onto the HSB. It grants access to the HSB by issuing a channel number on a separate address bus. The DTUs that are involved in the issued channel recognize their assigned channel number, and react by moving a data word from source to destination. This exchange only happens when the source has something to send, and the destination is ready to receive. The capacity of each channel can be set independently. The bus controller maintains a schedule of channel numbers, and issues a new channel number every clock cycle. The number of times a channel number occurs in the schedule determines its data rate.
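The schedule-driven arbitration can be captured in a few lines of simulation. The sketch below is illustrative only: the paper does not give the schedule format, buffer depths, or flow-control details, so the channel tables and readiness test are assumptions.

```python
# Illustrative model of the HSB bus controller: each clock it issues the next
# channel number from a repeating schedule; one word moves only when the
# source has data and the destination has room (the ready/ready handshake).
from collections import deque
from itertools import cycle

def run_hsb(schedule, sources, sinks, clocks):
    slots = cycle(schedule)                    # the controller's repeating schedule
    for _ in range(clocks):
        ch = next(slots)                       # channel number issued on the address bus
        src, dst = sources[ch], sinks[ch]
        if src and len(dst) < dst.maxlen:      # source ready and destination ready
            dst.append(src.popleft())          # DTUs move one 16-bit word

# A channel's data rate is set by how often it appears in the schedule:
sources = {1: deque(range(100)), 2: deque(range(100))}
sinks = {1: deque(maxlen=64), 2: deque(maxlen=64)}
run_hsb([1, 1, 1, 2], sources, sinks, clocks=40)  # channel 1 gets 3x the slots
print(len(sinks[1]), len(sinks[2]))               # -> 30 10
```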

F. Signal processing pipeline

The first step in implementation is to partition the signal processing path into four stages. Each stage must fit into the program and data memory of a single DSP core, and must execute within given cycle-count limits. The data transfer between DSPs must also be considered.

Fig. 6. Signal processing pipeline

The scheduling of the signal processing is shown in Fig. 6. Execution timing is driven from the ADCs. ADC samples are transferred via the HSB into memory on DSP1. When a predetermined number of samples has been collected, the signal processing on DSP1 is triggered. The number of samples is programmable, and determines the analysis rate, i.e. the rate at which a new pass of the signal processing code is executed. DSP1 must complete processing of one input buffer before the next input buffer is ready. The output data from DSP1 is then passed to DSP2 over the HSB, which triggers DSP2 to execute its code. While DSP2 is processing the next stage of the signal path, DSP1 begins processing a new buffer of input samples as soon as they are ready. DSP2 must complete its stage of processing before new data is ready from DSP1 again. This pattern continues for DSP3 and DSP4. Data for a set of stimulus pulses is then transferred from DSP4 across the HSB to the RF transceiver, which sends the appropriate commands to the implant. To save power, the instruction clock of each DSP can be turned off when it has finished processing each buffer. Although this cascade arrangement is the most common, the hardware supports arbitrary communication channels between processors, and any processor can access the ADCs and the RF transceiver.

V. FREEDOM SIGNAL PROCESSING

Fig. 7 is a block diagram of the signal path of the Freedom processor, showing how the processing is divided across the four DSPs. The following sections explain each block in more detail.

Fig. 7. Freedom sound processing block diagram: DSP1 runs the beamformer on the two microphone ADC inputs; DSP2 runs gain control (AGC, telecoil/auxiliary input mixing, audio output); DSP3 runs the window & FFT (64 bins) and channel combination (22 bands); DSP4 runs channel equalisation & channel gains, channel selection, and mapping, with the RF data encoder producing the RF signal to the implant
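The analysis rate follows directly from the programmed sample count. Using the figures given in Section V-C:

\[ f_{\mathrm{analysis}} = \frac{f_{\mathrm{ADC}}}{N_{\mathrm{samples}}} = \frac{16\,\mathrm{kHz}}{16} = 1000\,\mathrm{Hz}. \]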

A

DSP1: Beamformer Noise reduction strategies can improve speech intelligibility for cochlear implant recipients in noisy environments. Freedom incorporates an adaptive noise reduction system called Beam™ [9]. The Freedom processor has a directional microphone at the front, and an omni-directional microphone at the rear (earlier processors had a single microphone). The two microphone signals are sampled simultaneously by two of the ADCs and transferred to DSP1. Beam is a two-stage algorithm as shown in Fig. 8. The first stage is a fixed delay-and-sum spatial beamformer, which produces a directional response with maximum sensitivity to the front, and suppression of sounds from the side and rear. Its outputs are a speech reference and a noise reference. The second stage is an unconstrained adaptive noise canceller that attenuates the residual noise. The second stage is only allowed

16-1-4

440

to adapt during non-speech periods, as determined by a Voice Activity Detector (VAD). The result is that a null in the spatial response pattern is steered towards the loudest noise that is not in front of the recipient.
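The two-stage structure maps naturally onto a short simulation. The sketch below assumes a Griffiths-Jim-style fixed stage (sum and difference of the two microphones) and an NLMS update for the adaptive filter; the actual fixed filter, delays, tap count, and VAD used in Beam are not specified here.

```python
# Sketch of a two-stage adaptive beamformer in the style of Fig. 8.
# The fixed-stage combiner, filter length, step size and VAD are assumptions.
import numpy as np

def beam(front, rear, speech_detected, taps=32, mu=0.1, eps=1e-8):
    speech_ref = 0.5 * (front + rear)        # fixed stage: forward-looking sum
    noise_ref = 0.5 * (front - rear)         # fixed stage: noise reference
    w = np.zeros(taps)                       # second stage: adaptive noise canceller
    out = np.zeros(len(front))
    for i in range(taps, len(front)):
        u = noise_ref[i - taps:i][::-1]      # recent noise-reference samples
        e = speech_ref[i] - w @ u            # cancel estimated residual noise
        out[i] = e
        if not speech_detected[i]:           # VAD gate: adapt in non-speech periods,
            w += mu / (eps + u @ u) * e * u  # steering a null at the dominant noise
    return out
```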

Fig. 8. Beam™: two-stage adaptive noise canceller. The first stage combines the delayed directional and omni-directional microphone signals through a fixed filter and summing nodes to form the signal reference and noise reference; in the second stage, the adaptive filter output is subtracted from the signal reference to give the output (speech) signal, and the filter adapts on the error only when speech is not detected

B. DSP2: Gain Control

DSP2 implements the AGC functions, including a syllabic compressor [10] and a slow-acting autosensitivity function [2]. It allows an auxiliary audio signal (from a built-in telecoil, an assistive listening device such as an FM receiver, or a consumer audio device) to be mixed into the signal path. It also provides an audio output signal that can be monitored, for example by the parents of a child with a cochlear implant, to check the operation of the processor.

C. DSP3: Filterbank

DSP3 implements an FFT filterbank with quadrature envelope detection. It operates on overlapping buffers of 128 samples, applying a von Hann window, then a 128-point real FFT. The real part of each FFT bin represents one new sample of a 128-point FIR band-pass filter with center frequency equal to the bin frequency. The imaginary part of each bin represents one new sample from the corresponding quadrature filter, i.e. a filter whose frequency response has the same magnitude but whose phase lags by 90 degrees. Equivalently, each bin can be considered one complex output sample of a single FIR filter with complex coefficients [11]. The 64 bins form a filterbank with center frequencies spaced linearly at multiples of 125 Hz. Typically, the filters with center frequencies from 250 Hz to 1000 Hz are allocated to the seven lowest-frequency (most apical) electrodes. In the channel combination block, wider filters for the subsequent electrodes are formed by summing adjacent FFT bins. The quadrature envelope of each filter is calculated by taking the magnitude of the complex output value,

e = \sqrt{x^2 + y^2},

where x is the real part and y is the imaginary part. The DSP can evaluate square roots efficiently (Table II).

An example of the output of the 625 Hz FFT bin is shown in Fig. 9. If an FFT were performed every time a new ADC sample arrived, the real part of this bin would trace out the thin solid line, and the imaginary part the dashed line. The quadrature envelope is shown as a thick solid line. In practice, the FFT only has to be calculated at the channel stimulation rate (1000 Hz in this example, i.e. one new FFT after every 16 ADC samples). The envelope output samples are marked with filled squares in Fig. 9. In contrast, an IIR filterbank would have to be evaluated at the ADC sample rate of 16 kHz. The FFT approach reduces the computational load, and hence the power consumption.

Fig. 9. Quadrature filterbank outputs
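A compact model of the filterbank follows. The windowing, 128-point real FFT, adjacent-bin summation, and magnitude envelope are as described above; the particular bin-to-channel grouping shown is only an illustration of the typical allocation.

```python
# Sketch of the DSP3 quadrature FFT filterbank (parameters from the text:
# 16 kHz sampling, 128-point FFT, 125 Hz bin spacing). Groupings are examples.
import numpy as np

FS, N = 16000, 128

def envelopes(buf, groups):
    """buf: latest 128 input samples; groups: FFT-bin indices per channel."""
    bins = np.fft.rfft(np.hanning(N) * buf)      # real/imag = quadrature filter pair
    return [abs(bins[g].sum()) for g in groups]  # sum adjacent bins, then sqrt(x^2+y^2)

# Bins 2-8 (250-1000 Hz) one-per-channel for the seven most apical electrodes,
# then wider channels formed from summed adjacent bins:
groups = [[k] for k in range(2, 9)] + [[9, 10], [11, 12], [13, 14, 15]]
t = np.arange(N) / FS
print(envelopes(np.sin(2 * np.pi * 625 * t), groups))  # energy appears in bin 5
```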

The channel gains after the filterbank can be used to adjust the overall frequency response according to recipient preference. Alternatively, Adaptive Dynamic Range Optimization (ADRO) can be applied [12]. This is a form of slow-acting gain control that operates on each band independently: it estimates the long-term amplitude statistics of each filter envelope, and adjusts the gain so that each frequency band is presented at a comfortable level.

D. DSP4: Channel selection and amplitude mapping

To avoid undesirable interactions between electrodes, pulses are delivered sequentially, one electrode at a time, in rapid succession. The pulses have two phases of equal duration but opposite polarity. This avoids direct current flow, which can have a detrimental effect on auditory nerve function [1]. The loudness contribution of an individual pulse depends on the amount of charge delivered in each phase (current × phase width). It is not practical to use phase widths much less than 10 µs, because the peak currents needed would be too high. This limits the overall pulse rate to around 30,000 pulses per second (pps).

Different stimulation strategies have been developed to distribute the pulses across the available electrodes. The simplest strategy is Continuous Interleaved Sampling (CIS), where the channels are stimulated in a fixed round-robin order [13]. In the SPEAK and ACE strategies [14], each time a new set of filter envelope samples is calculated, those with the largest amplitudes are selected, and the corresponding electrodes are stimulated. Thus the electrodes stimulated in each scan vary as the sound spectrum changes. SPEAK typically stimulates 6 electrodes every 4 ms (i.e. 250 pps per electrode). ACE stimulates 8 to 12 electrodes at up to 3500 pps per electrode.

In normal hearing, the presence of loud spectral components makes it more difficult to hear softer components at nearby frequencies. This masking effect is the basis of audio bit-rate reduction algorithms such as MP3, and it has also inspired a new cochlear implant strategy, presently under investigation [15]. On each scan, the channels that are perceptually most significant are stimulated. Channel selection is iterative, starting with the largest-amplitude channel. After a channel is selected, a psycho-acoustic model calculates the masking thresholds at all remaining frequencies, and on each subsequent pass the channel that protrudes highest above the cumulative masking threshold is selected.

The final block in the signal processing chain is amplitude mapping. Each electrode is characterized by its threshold current (the lowest current that is audible, often called T-level) and its maximum comfortable current (C-level). These levels are measured by an audiologist following the implant surgery. The selected envelope amplitudes are compressed with a logarithmic function and mapped to a current between the T-level and C-level of the corresponding electrode. The resulting pattern of electrical stimulation is illustrated in Fig. 10. In this electrodogram (analogous to a spectrogram), each pulse is represented by a short vertical line, with height proportional to the current.
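The per-scan selection and mapping described above reduces to a few operations. In the sketch below, the maxima selection follows the ACE description; the log-compression constant and the T/C levels are placeholder values, since real maps are fitted per recipient and per electrode.

```python
# Sketch of ACE-style channel selection and logarithmic amplitude mapping.
# T-levels, C-levels and the compression shape are illustrative, not clinical.
import numpy as np

def select_and_map(env, t_level, c_level, n_maxima=8, base=0.02):
    """env: 22 band envelopes in [0, 1] -> [(electrode, current in uA), ...]"""
    picks = np.sort(np.argsort(env)[-n_maxima:])          # largest envelopes this scan
    pulses = []
    for e in picks:
        p = np.log1p(env[e] / base) / np.log1p(1 / base)  # log compression to [0, 1]
        pulses.append((int(e), t_level[e] + p * (c_level[e] - t_level[e])))
    return pulses                                         # currents between T and C

t_level = np.full(22, 100.0)   # e.g. thresholds around 100 uA
c_level = np.full(22, 900.0)   # comfortable levels around 900 uA
print(select_and_map(np.random.rand(22), t_level, c_level))
```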


Fig. 10. Electrodogram of the word "choice" with ACE processing

VI. CONCLUSION

The Nucleus Freedom cochlear implant system incorporates over 25 years of experience in cochlear implant research and development. The Freedom processor ASIC implements a complete cochlear implant processing path, from microphones to RF signals, in a few milliwatts. The increased computational capability allows more sophisticated signal processing algorithms, such as a two-microphone adaptive beamformer and a psycho-acoustic masking model. The low power consumption enables a behind-the-ear processor to provide the hearing relied upon by Nucleus cochlear implant recipients around the world.

VII. REFERENCES

[1] G. Clark, Cochlear Implants: Fundamentals and Applications. New York: Springer-Verlag, 2003.
[2] J. F. Patrick, P. A. Busby, and P. J. Gibson, "The development of the Nucleus Freedom cochlear implant system," Trends in Amplification, vol. 10, pp. 175-200, 2006.
[3] B. C. J. Moore, An Introduction to the Psychology of Hearing, 4th ed. London: Academic Press, 1997.
[4] P. M. Seligman and H. J. McDermott, "Architecture of the Spectra 22 speech processor," Ann. Otol. Rhinol. Laryngol., vol. S166, pp. 139-141, 1995.
[5] J. L. Hennessy and D. A. Patterson, Computer Architecture: A Quantitative Approach, 2nd ed. Morgan Kaufmann, 1996.
[6] A. P. Chandrakasan and R. W. Brodersen, "Minimizing power consumption in digital CMOS circuits," Proceedings of the IEEE, vol. 83, pp. 498-523, 1995.
[7] J. M. Rabaey and M. Pedram, Low Power Design Methodologies. Kluwer Academic Press, 1996.
[8] TMS320C54x DSPLIB User's Guide (SPRA480). Texas Instruments, 1998.
[9] A. Spriet, L. Van Deun, K. Eftaxiadis, J. Laneau, M. Moonen, B. van Dijk, A. van Wieringen, and J. Wouters, "Speech understanding in background noise with the two-microphone adaptive beamformer BEAM in the Nucleus Freedom cochlear implant system," Ear & Hearing, vol. 28, pp. 62-72, 2007.
[10] H. J. McDermott, K. R. Henshall, and C. M. McKay, "Benefits of syllabic input compression for users of cochlear implants," Journal of the American Academy of Audiology, vol. 13, pp. 14-24, 2002.
[11] F. J. Harris, "The Discrete Fourier Transform applied to time domain signal processing," IEEE Communications Magazine, vol. 20, pp. 13-22, 1982.
[12] C. J. James, P. J. Blamey, L. Martin, B. A. Swanson, Y. Just, and D. Macfarlane, "Adaptive dynamic range optimization for cochlear implants: a preliminary study," Ear & Hearing, vol. 23, pp. 49S-58S, 2002.
[13] B. Wilson, C. Finley, D. Lawson, R. Wolford, D. Eddington, and W. Rabinowitz, "Better speech understanding with cochlear implants," Nature, vol. 352, pp. 236-238, 1991.
[14] M. W. Skinner, L. K. Holden, L. A. Whitford, K. L. Plant, C. Psarros, and T. A. Holden, "Speech recognition with the Nucleus 24 SPEAK, ACE, and CIS speech coding strategies in newly implanted adults," Ear & Hearing, vol. 23, pp. 207-223, 2002.
[15] W. Nogueira, A. Büchner, T. Lenarz, and B. Edler, "A psychoacoustic 'N of M'-type speech coding strategy for cochlear implants," EURASIP Journal on Applied Signal Processing, vol. 2005, pp. 3044-3059, 2005.
