
IEEE JOURNAL OF SOLID-STATE CIRCUITS, VOL. 38, NO. 12, DECEMBER 2003, pp. 2121–2130

Equalization and Clock Recovery for a 2.5–10-Gb/s 2-PAM/4-PAM Backplane Transceiver Cell

Jared L. Zerbe, Member, IEEE, Carl W. Werner, Member, IEEE, Vladimir Stojanovic, Member, IEEE, Fred Chen, Member, IEEE, Jason Wei, Member, IEEE, Grace Tsang, Member, IEEE, Dennis Kim, Member, IEEE, William F. Stonecypher, Andrew Ho, Member, IEEE, Timothy P. Thrush, Ravi T. Kollipara, Member, IEEE, Mark A. Horowitz, Fellow, IEEE, and Kevin S. Donnelly, Member, IEEE

Abstract—A folded multitap transmitter equalizer and a multitap receiver equalizer counteract the losses and reflections present in the backplane environment. A flexible 2-PAM/4-PAM clock data recovery circuit uses selected transitions for receive clock recovery. A bit-error rate below 10^-15 and a power of 40 mW/Gb/s have been measured when operating at 10 Gb/s over a 20-in backplane with two connectors.

Index Terms—Adaptive equalizers, decision feedback equalizers, multilevel systems, pulse amplitude modulation, SerDes, serial links, transceivers.

I. INTRODUCTION

A. Backplane Environment

The backplane is a complex environment consisting of many components, and it presents a serious challenge to signaling rates above 5 Gb/s. As shown in Fig. 1, the signal path includes over 11 different components, each with its own impedance variations. In addition, there are up to ten vias in the signal path, each having both a through and a stub component and thus presenting an additional potential impedance discontinuity and resonant pole. As a result, the transfer functions (S21s) of channels in this environment vary significantly, as can be seen in Fig. 2. At Nyquist frequencies below 2 GHz, there are some channel differences, but the vias and impedance discontinuities have no significant impact. Above 2 GHz, channels vary significantly depending on the signaling layer (and thus the through/stub ratio of the via), the trace length (and thus the skin and dielectric loss), and the dielectric material. Achieving high data rates across such a wide variety of channel behaviors is a significant challenge for high-speed serial links. Architectures that achieve 10-Gb/s data rates with newer materials and connectors have often also demonstrated operation in older legacy backplane environments at rates up to 6 Gb/s, showing that the two problems are similar. However, a significant group of 10-Gb/s transceivers [1] were not designed for this harsh electrical environment and are often ill-suited to the variety of difficulties it presents.
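The dependence on trace length and dielectric material can be illustrated with the usual first-order two-term loss model: conductor (skin-effect) loss grows as the square root of frequency, and dielectric loss grows linearly with frequency. The minimal Python sketch below evaluates that model at the Nyquist frequency (half the symbol rate). The coefficients `k_skin` and `k_diel` and the function names are hypothetical placeholders, not fitted to the channels of Fig. 2, and the via stubs and impedance discontinuities that make channels diverge above 2 GHz are deliberately left out, so this captures only the smooth part of the loss.

```python
import math


def nyquist_hz(bit_rate_bps: float, bits_per_symbol: int) -> float:
    """Nyquist frequency: half the symbol rate."""
    symbol_rate = bit_rate_bps / bits_per_symbol
    return symbol_rate / 2.0


def trace_loss_db(f_hz: float, length_in: float, k_skin: float, k_diel: float) -> float:
    """First-order trace insertion loss in dB (illustrative model only).

    k_skin: skin-effect coefficient, dB per inch per sqrt(GHz)  (grows as sqrt(f))
    k_diel: dielectric coefficient, dB per inch per GHz          (grows as f)
    Vias, stubs, connectors, and reflections are deliberately ignored.
    """
    f_ghz = f_hz / 1e9
    return length_in * (k_skin * math.sqrt(f_ghz) + k_diel * f_ghz)


if __name__ == "__main__":
    # Hypothetical coefficients for a 20-in trace; not measured data.
    for bits in (1, 2):
        f = nyquist_hz(6.4e9, bits)
        loss = trace_loss_db(f, 20.0, k_skin=0.05, k_diel=0.15)
        print(f"{2 ** bits}-PAM at 6.4 Gb/s: Nyquist {f / 1e9:.1f} GHz, loss {loss:.1f} dB")
```

At the same bit rate, the 4-PAM Nyquist frequency is half that of 2-PAM (1.6 versus 3.2 GHz at 6.4 Gb/s), so the same trace presents less loss to 4-PAM; Section II-A builds on this difference.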

Manuscript received April 12, 2003; revised June 25, 2003. J. L. Zerbe, C. W. Werner, V. Stojanovic, F. Chen, J. Wei, G. Tsang, D. Kim, W. F. Stonecypher, A. Ho, T. P. Thrush, R. T. Kollipara, and K. S. Donnelly are with Rambus Inc., Los Altos, CA 94022 USA. M. A. Horowitz is with Stanford University, Stanford, CA 94305 USA.

Digital Object Identifier 10.1109/JSSC.2003.818572

B. Worst Case Sequence

As can be seen in the raw single-bit response of Fig. 3, a single 200-ps pulse undergoes both serious loss and dispersion when sent down a backplane channel. In addition, it initiates reflections that can be a significant percentage of an equalized eye. Fig. 3 (inset) zooms in on these reflections, plotted on a scale roughly equivalent to a single 4-PAM eye after transmit equalization. Because a transmit equalizer works by attenuating the lower frequency components in a peak-power-constrained environment, the single-bit response is smaller after transmit equalization, even though the intersymbol interference (ISI) has been reduced. The total usable amplitude after equalization, shown in Fig. 3, is slightly smaller than the distance between the peak sample and the next sample of the raw pulse response.

While the magnitude of any individual channel reflection may not appear significant compared with the equalized eye height, the complete set of reflections can quickly become significant when combined in a worst case sequence. In such a sequence, the polarity of each bit is chosen so that all of the reflections sum in the same direction onto a single victim bit. Because an eye can be encroached upon from either voltage extreme, there are two such sequences for a 2-PAM eye and six for a 4-PAM eye.

The magnitude and importance of the worst case sequences can be seen in Fig. 4. In Fig. 4(b), 2000 symbols of a simple 2-PAM pseudorandom bit sequence (PRBS) are run across a channel at 6.4 Gb/s. This sequence is followed by the two worst case sequences (plotted in bold) for this particular channel, and the total result is folded into an eye. The worst case sequences appear as encroachments into the eye at the sample point and cause a readily discernible degradation in voltage margin there. Fig. 4(a) shows the probability distribution function (PDF), plotted on a log scale, of the waveform voltages at the sample point, calculated using the technique of [2]. The PDF can be read as the probability that a given voltage margin will occur at the sample point. As shown, the upper encroaching worst case sequence aligns well with the low-probability tail of the PDF, and the mean sample voltage of the eye aligns with the peak of the PDF. Notably, the PDF shows smooth, continuous, nonzero tails which, while bounded, indicate that it will be extremely difficult to rely on coding to minimize the impact of these worst case sequences. If coding were used to eliminate or minimize the worst case sequence, another sequence with nearly the same voltage margin would be right behind it. As the PDF shows, there are simply too many adjacent sequences with nearly identical properties for an efficient code to eliminate; thus, the link must actively cancel reflections if it is to minimize their impact on system margins.

In summary, the backplane environment is complex because of the number of different elements in the signal path, and the fundamental obstacles to higher performance are loss and reflections. Backplane traces often exhibit significant reflections that can seriously degrade worst case margins, and these have proven difficult to code around. These constraints define the design environment for high-performance backplane links.

Fig. 1. Backplane signaling environment.

Fig. 2. Backplane transfer functions showing variations between channels, each with 20-in backplane traces. Top and bottom layers, 300-mil-thick FR-4 and Nelco-6000 backplanes. Nelco-6000 top layer via was counter-bored to 100 mils.

Fig. 3. 200-ps pulse response for a 20-in FR4 backplane trace using GbX connectors, 3-in linecards, and 100-mil vias, showing dispersion and attenuation of the main pulse with associated reflections. The inset shows the size of the reflections on a scale equivalent to an equalized data eye. Each dot is a symbol sample point.

0018-9200/03$17.00 © 2003 IEEE

II. DESIGN
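The design that follows is driven by the worst case sequence of Section I-B. One standard way to construct that sequence from a sampled pulse response is peak distortion analysis: every non-cursor sample is given the polarity that pushes the victim toward the decision threshold. The minimal Python sketch below does this for a 2-PAM "+1" victim. It is a deterministic bound, not the statistical PDF method of [2], and the function names and example pulse values are hypothetical.

```python
from typing import List, Tuple


def worst_case_2pam(pulse: List[float], cursor: int) -> Tuple[List[int], float]:
    """Peak-distortion worst case for a '+1' victim bit in 2-PAM.

    pulse  : symbol-spaced single-bit response (ISI and reflections included)
    cursor : index of the main sample in `pulse`
    Returns (bits, sample): `bits` lists the +/-1 symbols in transmission
    order (oldest first), and `sample` is the victim's worst-case voltage.
    """
    n = len(pulse)
    bits = []
    for k in reversed(range(n)):  # the oldest bit is weighted by pulse[n - 1]
        if k == cursor:
            bits.append(+1)  # the victim itself
        else:
            # choose the polarity that pulls the victim sample downward
            bits.append(-1 if pulse[k] > 0 else +1)
    sample = pulse[cursor] - sum(abs(p) for i, p in enumerate(pulse) if i != cursor)
    return bits, sample


def victim_sample(bits: List[int], pulse: List[float]) -> float:
    """Superpose the pulse responses of `bits` at the victim's sample time."""
    n = len(pulse)
    return sum(b * pulse[n - 1 - m] for m, b in enumerate(bits))


if __name__ == "__main__":
    # Made-up symbol-spaced response with the main sample at index 1.
    pulse = [0.05, 0.60, 0.25, 0.10, -0.04, 0.03]
    bits, worst = worst_case_2pam(pulse, cursor=1)
    print(bits, f"{worst:.2f}")  # -> [-1, 1, -1, -1, 1, -1] 0.13
    assert abs(victim_sample(bits, pulse) - worst) < 1e-9
```

The "-1" victim sequence is simply the negation of this one, giving the two 2-PAM sequences; for 4-PAM the same construction is applied to each side of each of the three eyes, with interferers at the extreme levels, giving six.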

A. 2-PAM/4-PAM Modes

The use of multilevel signaling to achieve higher bandwidth in high-loss systems is well understood [3]–[5]. Any system with roughly 10 dB or more of loss difference between the 2-PAM and 4-PAM Nyquist fundamental frequencies is likely to benefit from 4-PAM signaling. This follows from a simple first-order comparison of the relative eye sizes: at the same peak swing, each 4-PAM eye is one third the height of the 2-PAM eye, a penalty of 20 log10(3) ≈ 9.5 dB, so 4-PAM wins whenever the channel attenuates the 2-PAM Nyquist frequency by more than about that much relative to the 4-PAM Nyquist frequency. The transfer functions of two example backplane channels and the resulting 2-PAM and 4-PAM eyes at 6.4 Gb/s are shown in Fig. 5. Interestingly, both channels are from the same backplane, with equal trace length and total via length; they differ only in the signaling layer and the ratio of through via to stub via. In Fig. 5(a), the transfer function is not very steep between the 4-PAM Nyquist frequency of 1.6 GHz and the 2-PAM Nyquist frequency of 3.2 GHz, and, as expected, the 2-PAM eye has superior voltage margin. In Fig. 5(b), the transfer function differs by almost 30 dB between 1.6 and 3.2 GHz, and, as expected, the 4-PAM eye shows superior voltage margin. Because these two channels are almost identical physically but so different electrically, there is clearly no universal answer to the question of which is better, 2-PAM or 4-PAM; each channel's individual characteristics determine the answer for that particular channel.

This design supports both 2-PAM and 4-PAM operation via the Gray-coded levels shown in Fig. 6. The differential output driver can be operated in 4-PAM mode, as in Fig. 6(a), with a 2-bit input T[1:0]. Alternatively, by simply setting the LSB to zero, Gray coding allows the driver to operate in 2-PAM mode, as shown in Fig. 6(b). Importantly, 2-PAM mode is a subset of 4-PAM operation, so the 4-PAM transmitter and receiver can be used throughout. When switching between 2-PAM and 4-PAM operation, the phase-locked loop (PLL) multiplier must change by a factor of two (halved when moving from 2-PAM to 4-PAM) to maintain a consistent data rate.

Fig. 4. (a) 2-PAM PDF showing the probability of voltages at the sample point. (b) Eye diagram formed by 2000 symbols of a PRBS sequence and overlaid with the worst case patterns. 6.4 Gb/s over 20-in backplane.

Fig. 5. (a) Transfer functions of low-loss and high-loss backplane channels. (b) Measured 2-PAM and 4-PAM signaling at 6.4 Gb/s over the channels. All four plotted on the same vertical and horizontal scales.

Fig. 6. Compatible (a) 4-PAM and (b) 2-PAM modes via the use of Gray-coded levels and LSB = 0 when in 2-PAM mode.

When considering whether to use 2-PAM or 4-PAM signaling, the effect of reflections must also be carefully considered, because the size of the minimum eye relative to the maximum transition has decreased by a factor of three. This can be understood by referring to Fig. 3: the minimum eye size is one third of the total usable amplitude in 4-PAM, but the full amplitude in 2-PAM. The worst case reflection magnitude, however, does not decrease, because the maximum swing remains equal to that of a 2-PAM swing. Thus, the
impact of reflections on the 4-PAM receive eyes can be very destructive. In complex backplanes, some channels may have low high-frequency loss and can tolerate 2-PAM signaling, while others may have higher loss and lower reflections and will thus be better suited to 4-PAM operation.

Fig. 7. (a) Block diagram showing how transmit and receive equalizers are combined to make a range-restricted DFE. (b) Equalizer ranges overlaid with an unequalized single-bit response. Each dot is a symbol sample point.

B. Equalization Architecture

Approaches to combating ISI are well known, the most common being equalization [6]. In the backplane link environment, the question becomes how to perform effective equalization at very high data rates with very low cost in area and power. While multiple signaling levels and transmit equalization can effectively minimize the effects of dispersion [3], [4], transmit-only equalization is an expensive way to combat reflections, which can be even more destructive to multilevel signaling. Decision-feedback-based receive equalization (DFE) can be effective against configuration-dependent reflections. This work therefore uses both transmit and receive equalizers, together with flexible clock recovery circuits, to operate in a backplane environment with these issues. The transmit and receive equalizers are combined to form a range-restricted DFE with the effective ranges shown in Fig. 7. Because dispersion varies as a function of many backplane properties, flexibility in the transmit equalizer, in both the number of taps and the tap settings, is highly desirable. At one fully flexible extreme is a digital filter driving a digital-to-analog converter (DAC) [7]; at the simplest extreme is two-tap pre-emphasis [8]. Any technique must be evaluated for additional insertion loss as well as for power and complexity.

Fig. 8. Five-tap 2-PAM/4-PAM equalizing transmitter without equalization (original).

C. Transmit Equalization

A simple thermometer-coded 2-PAM/4-PAM transmitter structure is shown in Fig. 8. Pre-decoded data is sent to three different output differential-pair drivers, which can be selected to produce any of the 4-PAM levels. One simple way to extend this to a five-tap equalizing transmitter is to replicate the original driver five times and feed each copy with an individually symbol-delayed version of the original data, as shown in Fig. 8 (inset). For each tap to have the same range and resolution as the original, each replicated driver must be just as large and have a DAC just as fine as the original transmitter. Consequently, this simple approach would result in a 5× increase in the diffusion capacitance on the output pad and a similar increase in power and area. The five-tap merged differential transmitter/equalizer shown in Fig. 9(b) leverages the fact that the transmitter is peak-power constrained by the saturation margin of the output differential pair. Thus, only 1/5 of the equalizing transmitter (a total gate width equal to that of the original single-tap transmitter) will be active at any
given time. Rather than keep this device overhead, a single transmitter is divided into segments that can be shared by any of the taps. However, this approach alone limits the resolution of the output driver to the inverse of the number of segments into which the transmitter is split. For example, with 16 segments, the transmitter would have a resolution of only 4 bits. It would also require a five-tap, 4-bit digital finite-impulse-response (FIR) filter with five 4-bit adders running at the symbol rate, which would consume an unacceptable amount of power. Instead, the equalizer is partitioned into two sections: a shared section and a dedicated section. The shared section consists of seven large subdrivers, each driving 16 units of current, and each shared subdriver can select any of the five equalization tap streams A–E. The dedicated section consists of five binary-weighted drivers, one for each equalization tap, each capable of driving up to 15 units of current. This combination of shared and dedicated drivers gives each equalization tap the same current range (127 units) and resolution (1 unit) as a nonequalizing 7-bit transmitter, with only 50% additional parasitic overhead.

Fig. 9. (a) Original five-tap 2-PAM/4-PAM equalizing transmitter. (b) New shared equalizing transmitter.

Fig. 10. Five-tap adjustable receive equalizer including variable delay for removing output clock-to-Q delay.

D. Receive Equalization

For receive equalization, the linearity and high bandwidth of the transmission-line environment are leveraged by adding and subtracting currents directly at the input pads, as shown in Fig. 10. The receive equalizer reuses the transmit filter design and is simply equivalent to a scaled transmit equalizer. High-latency reflections are effectively cancelled by the receive equalizer in this configuration; it is preferred over a transmit equalizer for reflections because the past data is readily available in the receive pipeline. As reflections vary in both location and intensity between channels, the receive equalizer was designed to be very flexible, allowing any five taps to be selected within a window of 5–17 symbols after the received bit.
The taps need not be sequential; their positions are selected based on the magnitude of the reflection at each sample point. Thus, the tap-select multiplexer and tap weights are separately configured and optimized for each backplane channel. One difficulty with this type of receive equalizer is aligning the equalizer outputs in time with the incoming receive data: the equalizer output has a clock-to-Q delay that varies over process, voltage, and temperature and must be compensated for. This is accomplished by a simple limited-range variable delay element in the equalizer clock path. The delay element is adjusted during a training sequence in which the receive equalizer sends a 0101 pattern that is received by the data path. During training, the clock data recovery (CDR) outputs are used to adjust the variable delay element while the normal receive phase value is held fixed. An adaptive approach, whose goal is to optimize the signal-to-noise ratio (SNR) at the sample point, is used to set the coefficient values for both the transmit and receive equalizers.

E. Input Receiver

The input receiver design, shown in Fig. 11, consists of six slicers: two for the MSB and four for the LSB levels. Odd and even slicers receive data on the rising and falling edges of the bit clock, performing an immediate 2:1 deserialization. The LSB slicers have an input offset voltage applied via a DAC, with a simple polarity reversal between the slicers for the upper eye and the lower eye. Characterization showed the receiver to be sensitive to common-mode differences between the input signal and the DAC output, because the nMOS clocking device does not act as a current source. In later versions of the design, a preamplifier stage was added to rectify this problem.

F. CDR

A flexible 2-PAM/4-PAM CDR was designed that uses the optimal transitions available for clock recovery in either 2-PAM or 4-PAM mode. The complete set of 4-PAM transitions, shown in Fig. 12, consists of three minor transitions (the smallest possible change in voltage level), one major transition (the largest possible change), and two intermediate transitions, for a total of six transition types. If a conventional zero-crossing CDR is used to recover the clock from uncoded 4-PAM data, the problem arises that the
edge distribution at the MSB sampler threshold [as shown in Fig. 12(a)] is not uniform; instead, there are three distinct crossing regions. Similarly, the offset LSB sampler thresholds also contain three distinct crossing regions. Such distributions can cause jitter or, worse, phase offsets if the data pattern exhibits a predominance of one transition type over another. In this design, the optimal transitions [Fig. 12(b) and (d)] are used for clock recovery depending on the link mode. In 2-PAM mode, the MSB major transition [Fig. 12(d)] is used. In 4-PAM mode, the minor transitions of either the MSB or LSB [Fig. 12(b)] are also included, while the transitions with skewed crossings [Fig. 12(c)] are ignored. Both clock jitter and phase offset are thus minimized. The use of only minor transitions also guarantees immunity to any pathological offset-inducing patterns that 4-PAM data could present to a simple 2-PAM CDR. The CDR logic developed to perform this edge exclusion is shown in Fig. 13; both MSB and LSB edge and data samplers are used. An adequate density of optimal transitions is ensured by scrambling, XORing with a PRBS, or coding.

Fig. 11. Receiver design showing (a) six input data slicers with LSB DAC and LSB slicers sharing inverted offset polarities. (b) Schematic of single slicer.

Fig. 12. Optimal 4-PAM and 2-PAM CDR transitions. The complete transition space (a) is made up of (b) minor transitions, (c) the simultaneous LSB/MSB transition, and (d) the major transition. Group (c) has undesirable timing distributions at the LSB slicer thresholds and its timing is ignored in 4-PAM mode.

Fig. 13. Bimodal 2-PAM/4-PAM CDR with edge exclusion to eliminate the use of transitions with poor timing information.
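The edge-exclusion rule above can be sketched as a bang-bang (early/late) phase detector that votes only on the allowed transition types. This minimal Python sketch is not the CDR logic of Fig. 13: it assumes normalized levels -3, -1, +1, +3 with the Gray mapping 00, 01, 11, 10 (so that forcing the LSB to 0 leaves only -3 and +3), and the names `classify`, `cdr_votes`, and the mode strings are hypothetical.

```python
from typing import List, Optional

# Assumed normalized 4-PAM levels -3, -1, +1, +3 (Gray codes 00, 01, 11, 10).
# With the LSB forced to 0 (2-PAM mode) only -3 and +3 are ever sent.
TRANSITION_TYPES = {1: "minor", 2: "intermediate", 3: "major"}
USED_IN_MODE = {"2pam": {"major"}, "4pam": {"minor", "major"}}


def classify(a: int, b: int) -> Optional[str]:
    """Name a transition by how many level spacings (2 units each) it spans."""
    return TRANSITION_TYPES.get(abs(b - a) // 2)


def cdr_votes(symbols: List[int], edges: List[float], mode: str) -> List[int]:
    """Bang-bang phase votes with edge exclusion.

    symbols : sliced data levels, one per symbol
    edges   : edge-sample voltages taken midway between consecutive symbols
    mode    : "2pam" or "4pam"; intermediate transitions are never used
    Returns +1 (clock late), -1 (clock early) or 0 (transition excluded).
    """
    used = USED_IN_MODE[mode]
    votes = []
    for a, b, e in zip(symbols, symbols[1:], edges):
        if classify(a, b) not in used:
            votes.append(0)
            continue
        threshold = (a + b) / 2  # minor: an LSB or MSB slicer level; major: MSB level 0
        # Edge sample already on the new symbol's side: the crossing came early
        # relative to the clock, so the clock is late (ties count as early).
        late = (e - threshold) * (b - threshold) > 0
        votes.append(1 if late else -1)
    return votes


if __name__ == "__main__":
    syms = [-3, -1, 1, 3, -3, 1]
    edge_v = [-1.5, 0.2, 1.0, -0.5, 0.0]
    print(cdr_votes(syms, edge_v, "4pam"))  # -> [1, 1, -1, 1, 0]
    print(cdr_votes(syms, edge_v, "2pam"))  # -> [0, 0, 0, 1, 0]
```

Because minor and major transitions are symmetric about the threshold they cross, each vote reflects phase error rather than data pattern; the intermediate (-3 to +1, -1 to +3) transitions are dropped because their crossings are skewed.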

Fig. 14. Complete link block diagram.
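Returning to the receive equalizer of Section II-D, its tap placement (any five taps, not necessarily sequential, within 5–17 symbols after the received bit, chosen by reflection magnitude) and the subtraction of each selected reflection using past decisions can be sketched as follows. This is a minimal Python sketch under stated assumptions: `h[k]` is a hypothetical residual response k symbols after the victim's sample, the cancellation is done digitally rather than by summing currents at the input pads, and decision errors are ignored. The function names are hypothetical.

```python
from typing import Dict, List


def select_rx_taps(h: List[float], n_taps: int = 5, first: int = 5, last: int = 17) -> Dict[int, float]:
    """Pick the n_taps largest-magnitude residual reflections in [first, last].

    h[k] is the residual response k symbols after the victim's sample
    (h[0] is the main cursor and is never selected).
    """
    window = [k for k in range(first, last + 1) if k < len(h)]
    strongest = sorted(window, key=lambda k: abs(h[k]), reverse=True)[:n_taps]
    return {k: h[k] for k in sorted(strongest)}


def cancel_reflections(samples: List[float], decisions: List[int], taps: Dict[int, float]) -> List[float]:
    """Subtract each selected reflection using the decision made k symbols earlier."""
    out = []
    for n, y in enumerate(samples):
        correction = sum(w * decisions[n - k] for k, w in taps.items() if n >= k)
        out.append(y - correction)
    return out


if __name__ == "__main__":
    h = [1.0] + [0.0] * 19
    h[3], h[6], h[9], h[12], h[14], h[16], h[17] = 0.2, 0.05, -0.08, 0.03, 0.02, -0.06, 0.01
    print(sorted(select_rx_taps(h)))  # -> [6, 9, 12, 14, 16]
```

In this example, the residual at k = 3 lies outside the receive window and is left untouched, consistent with the split of equalizer ranges shown in Fig. 7.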

III. RESULTS

A complete block diagram of the link, shown in Fig. 14, consists of a transmitter and receiver that share a common PLL, along with a CDR and digitally controlled phase mixers in a clocking architecture similar to that of [9]. Separate phase mixers are used for the receiver edge, data, and receive-equalizer clocks to allow maximum flexibility. The system transmits and receives data on both edges of a CMOS bit clock. The transmit clock also uses a phase mixer, with a fixed setting, so that the PLL can be closed around a common element of the clock path, minimizing low-frequency jitter.

Fig. 15. (a) 2-PAM eye with no equalization at 6.4 Gb/s over 20-in backplane. (b) 2-PAM eye with transmit equalization at 6.4 Gb/s over 20-in backplane. (c) 4-PAM eye with transmit equalization at 10 Gb/s over 20-in backplane.

Fig. 16. Without receive equalization. (a) 4-PAM PDF showing broad distributions. (b) Eye including worst case transitions. 10 Gb/s over 20-in backplane.

Fig. 17. With receive equalization. (a) 4-PAM PDF showing narrowed distributions. (b) Eye including worst case transitions showing improvement of both distributions and eyes. 10 Gb/s over 20-in backplane.

A. Equalization Results

Results for the equalization architecture are shown in Figs. 15–18. Fig. 15(a) shows a measurement of the transmitter running at 6.4 Gb/s over a 20-in backplane with two connectors without any equalization; the eye is completely closed due to ISI. Fig. 15(b) shows the same environment with the five-tap transmit equalizer enabled: the eye opens with significant margin and a clear improvement in SNR. Fig. 15(c) shows the transmitter running over the same backplane at 10 Gb/s in 4-PAM mode.
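The shared/dedicated partition of the transmit equalizer described in Section II-C, which produced the equalized eyes of Fig. 15(b) and (c), can be sketched as an allocation of integer tap weights. This minimal Python sketch assumes signed tap weights expressed in unit currents and assumes the peak-current limit is the sum of tap magnitudes not exceeding 127 units. It also assumes, hypothetically, that each tap's polarity is applied by inverting that tap's data stream. The constants follow the text; the names are hypothetical.

```python
from typing import List, Tuple

SEGMENT_UNITS = 16   # current units per shared subdriver
N_SHARED = 7         # shared subdrivers, each assignable to any tap stream A-E
DEDICATED_MAX = 15   # largest value of each tap's 4-bit binary-weighted driver
FULL_SCALE = N_SHARED * SEGMENT_UNITS + DEDICATED_MAX  # 127 units, as in a 7-bit driver


def allocate(weights: List[int]) -> Tuple[List[int], List[int]]:
    """Split signed integer tap weights into (shared segments, dedicated units) per tap.

    Assumes the peak-current limit is sum(|w|) <= FULL_SCALE. Tap polarity is
    assumed to be applied by inverting that tap's data stream, so only
    magnitudes are allocated here.
    """
    total = sum(abs(w) for w in weights)
    if total > FULL_SCALE:
        raise ValueError(f"sum of |weights| = {total} exceeds {FULL_SCALE} units")
    segments = [abs(w) // SEGMENT_UNITS for w in weights]
    dedicated = [abs(w) % SEGMENT_UNITS for w in weights]  # always <= DEDICATED_MAX
    # The sum of floor(|w|/16) over taps is at most floor(127/16) = 7,
    # so the shared segments can never run out under the peak-current limit.
    assert sum(segments) <= N_SHARED
    return segments, dedicated


if __name__ == "__main__":
    print(allocate([-10, 100, -15, 2, 0]))  # -> ([0, 6, 0, 0, 0], [10, 4, 15, 2, 0])
```

Summing the hardware, 7 × 16 + 5 × 15 = 187 units of driver compared with 127 units for a nonequalizing driver (about the 50% overhead quoted in Section II-C), versus 5 × 127 = 635 units for five fully replicated drivers.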

Fig. 16 shows simulations of the effectiveness of the receive equalizer when operating at 10 Gb/s over a 20-in backplane. In this figure, the receive equalizer is disabled; while the PRBS eye appears open, the worst case sequence shows an eye that is virtually closed. The PDF curves of Fig. 16(a) also show inadequate margin between the distributions in their low-probability tails, indicating high bit-error rates (BERs) for patterns with nearly the same probability as the worst case. In the simulation of Fig. 17, the receive equalizer is enabled, and the worst case sequence, along with the PRBS distribution, is compressed to create maximum SNR at the sample point. The PDF curves of Fig. 17(a) also show a significant improvement in the spacing between the worst case sequences, as well as steeper distribution slopes and thus improved BER versus voltage offset.

Fig. 18 shows the measured effectiveness of the receive equalizer in terms of system voltage and timing margin. The system was margined by offsetting the receiver in both time and voltage and testing a PRBS sequence for several microseconds at each data point. When operating at 6.25 Gb/s over a relatively short 10-in backplane, a significant improvement in overall system margin can be observed.

Fig. 18. Effect of receive equalization. Measured system margin shmoos showing final receiver voltage and timing margin (a) without and (b) with the receive equalizer enabled. Both axes plotted on a 1-UI scale at 6.25 Gb/s over 10-in FR-4 backplane.

B. CDR Results

Measured results for the 2-PAM/4-PAM CDR are shown in Fig. 19, where the measured receive clock phase is plotted versus cycle; PLL and CDR locking can be observed in the initial transients. When 4-PAM data is received with the CDR in 2-PAM mode, as in Fig. 19(a), a peak-to-peak jitter of 60 ps was measured. When transition limiting is enabled so that the CDR uses only the minor transitions, as in Fig. 19(b), a peak-to-peak jitter of 35 ps was measured. Measurements were taken at the symbol rate with a real-time oscilloscope.

Fig. 19. Measured receive clock phase versus cycle using (a) 2-PAM mode and (b) 4-PAM mode on 4-PAM data. Initial PLL and CDR locking can also be observed. Edge diagrams indicate the transition types used by CDR (circles) in each mode.

Fig. 20. Measured (a) 2-PAM and (b) 4-PAM performance by configuration [0:7] for five different connectors.

C. Complete System Results

Complete system results are shown in Fig. 20, where systems were margined (using the receiver voltage and timing offset technique) to a point equivalent to the target BER. Before these experiments, the margins were correlated with measured BER in order to establish the required voltage and timing margins. The data in Fig. 20 was taken over two materials (standard FR-4 and Nelco-6000), two trace lengths (10 and 20 in), two layers (top and bottom stripline layers of a 0.3-in-thick backplane), and five connector types from multiple vendors (each bar represents a different connector type). In addition, the Nelco backplanes were counter-bored, reducing the top-layer via from 0.3 in to 0.1 in. All systems were configured as in Fig. 1, with two connectors, two linecards, and a 0.3-in-thick backplane. The results indicate that 10 Gb/s is achievable using 4-PAM over most Nelco-6000 configurations and some FR-4 configurations. In 2-PAM mode, all configurations achieved data rates between 5 and 6.4 Gb/s. A summary of the link characteristics is shown in Fig. 21.
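The per-configuration choice between 2-PAM and 4-PAM seen in Fig. 20 can be framed with the first-order rule of Section II-A: at equal peak swing, a 4-PAM eye is one third the height of a 2-PAM eye, so 4-PAM is favored when the channel's extra loss at the 2-PAM Nyquist frequency exceeds 20 log10(3) ≈ 9.5 dB. This minimal Python sketch is only that rule of thumb; it ignores reflections, which Section II-A notes are more damaging to 4-PAM, so measurements such as Fig. 20 remain the real arbiter. The function name and the example losses are hypothetical.

```python
import math

# At equal peak swing, a 4-PAM eye is one third the height of a 2-PAM eye.
PAM4_EYE_PENALTY_DB = 20 * math.log10(3)  # about 9.54 dB


def prefer_4pam(loss_pam2_nyq_db: float, loss_pam4_nyq_db: float) -> bool:
    """First-order rule of thumb (reflections ignored).

    loss_pam2_nyq_db: channel loss (positive dB) at the 2-PAM Nyquist frequency
    loss_pam4_nyq_db: channel loss (positive dB) at the 4-PAM Nyquist frequency
    """
    extra_loss_db = loss_pam2_nyq_db - loss_pam4_nyq_db
    return extra_loss_db > PAM4_EYE_PENALTY_DB


if __name__ == "__main__":
    # Hypothetical loss pairs in dB.
    print(prefer_4pam(35.0, 6.0))  # 29 dB extra loss, steep like Fig. 5(b): True
    print(prefer_4pam(12.0, 6.0))  # 6 dB extra loss, gentle slope: False
```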


Fig. 21. (a) Cell micrograph. (b) Summary table.

IV. CONCLUSION

The backplane environment can be very complex because of the number of different components involved and the variability of each. In addition to loss, there are significant reflections that degrade overall signal quality, and the exact symbol location of these reflections varies, making the environment even more challenging. Increasing performance in this environment requires implementation flexibility in order to adjust to each of these varying problems. The combination of 2-PAM and 4-PAM modes with flexible transmit and receive equalization architectures has enabled high performance over a broad configuration space of materials, trace lengths, via configurations, and connector types.

REFERENCES

[1] M. M. Green et al., “OC-192 transmitter in standard 0.18-μm CMOS,” in IEEE Int. Solid-State Circuits Conf. Dig. Tech. Papers, 2002, pp. 248–249.
[2] V. Stojanovic and M. Horowitz, “Modeling and analysis of high speed links,” in IEEE Custom Integrated Circuits Conf. Dig. Tech. Papers, 2003, pp. 589–594.
[3] J. Stonick et al., “An adaptive PAM-4 5-Gb/s backplane transceiver in 0.25-μm CMOS,” IEEE J. Solid-State Circuits, vol. 38, pp. 436–443, Mar. 2003.
[4] J. Zerbe et al., “A 2 Gb/s/pin 4-PAM parallel bus interface with crosstalk cancellation, equalization, and integrating receivers,” in IEEE Int. Solid-State Circuits Conf. Dig. Tech. Papers, 2001, pp. 66–67.
[5] R. Farjad-Rad, C.-K. Yang, M. Horowitz, and T. Lee, “A 0.3-μm CMOS 8-Gb/s 4-PAM serial link transceiver,” IEEE J. Solid-State Circuits, vol. 35, pp. 757–764, May 2000.
[6] R. W. Lucky, “Techniques for adaptive equalization of digital communication systems,” Bell Syst. Tech. J., vol. 45, pp. 255–286, 1966.
[7] C.-K. Yang et al., “A serial-link transceiver based on 8-GSamples/s A/D and D/A converters in 0.25-μm CMOS,” IEEE J. Solid-State Circuits, vol. 36, pp. 1684–1692, Nov. 2001.
[8] A. Fiedler et al., “A 1.0625 Gbps transceiver with 2x-oversampling and transmit signal pre-emphasis,” in IEEE Int. Solid-State Circuits Conf. Dig. Tech. Papers, 1997, pp. 238–239.
[9] K. Chang et al., “A 0.4–4 Gb/s CMOS quad transceiver cell using on-chip regulated dual-loop PLLs,” in Symp. VLSI Circuits Dig. Tech. Papers, 2002, pp. 88–91.
[10] S. Sidiropoulos and M. Horowitz, “A semidigital dual delay-locked loop,” IEEE J. Solid-State Circuits, vol. 32, pp. 1683–1692, Nov. 1997.
[11] B. Song and D. C. Soo, “NRZ timing recovery technique for band-limited channels,” IEEE J. Solid-State Circuits, vol. 32, pp. 514–520, Apr. 1997.
[12] J. G. Maneatis, “Low-jitter process-independent DLL and PLL based on self-biased techniques,” IEEE J. Solid-State Circuits, vol. 31, pp. 1723–1732, Nov. 1996.

Jared L. Zerbe (M’90) was born in New York, NY, in 1965. He received the B.S. degree in electrical engineering from Stanford University, Stanford, CA, in 1987. In 1987, he joined VLSI Technology, Inc., where he worked on custom and semicustom ASIC design. In 1989, he joined MIPS Computer Systems, where he designed high-performance CPU floating-point blocks. In 1992, he joined Rambus, Inc., Los Altos, CA, where he has since specialized in the design of high-speed I/O, PLL/DLL clock-recovery, and data-synchronization circuits. He has authored many papers and patents in the area of high-speed clocking and data transmission. He currently leads a design group focused on high-speed backplane serial links.

Carl W. Werner (M’97) was born in Chicago, IL, in 1962. He received the B.S. degree in electrical engineering from the University of Illinois at Urbana-Champaign in 1984. He was with Siliconix Inc. from 1984 to 1988, and with National Semiconductor from 1988 to 1997, where he worked on CMOS, bipolar and BiCMOS, analog and mixed-signal integrated circuit design. He holds several U.S. and foreign patents. In 1999, he joined the technical staff of Rambus Inc., Los Altos, CA, where he currently manages a team focused on high-speed circuit design and test.

Vladimir Stojanovic (M’00) was born in Kragujevac, Serbia, Yugoslavia. He received the Dipl.Ing. degree from the University of Belgrade, Yugoslavia, in 1998 and the M.S. degree in electrical engineering from Stanford University, Stanford, CA, in 2000. He is currently working toward the Ph.D. degree at Stanford University, where he is a member of the VLSI Research Group. He has also been with Rambus, Inc., Los Altos, CA, since 2001. He was a Visiting Scholar with the Advanced Computer Systems Engineering Laboratory, Department of Electrical and Computer Engineering, University of California, Davis, during 1997–1998. His current research interests include design and modeling of CMOS-based electrical and optical interfaces, application of digital communication techniques to high-speed links (equalization, noise cancellation), and high-speed mixed-signal IC design.

Fred Chen (M’00) was born in Wichita, KS, in 1975. He received the B.S. degree in electrical engineering from the University of Illinois at Urbana-Champaign in 1997 and the M.S. degree in electrical engineering from the University of California at Berkeley in 2000. In 1997, he joined Motorola, Libertyville, IL, where he worked on discrete RF design for CDMA cell phones. In 2000, he joined Rambus, Inc., Los Altos, CA, where he has worked on the design of high-speed I/O and equalization circuits.



Jason Wei (M’00) was born in Taipei, Taiwan, R.O.C. He received the B.S. degree from National Cheng-Kung University, Tainan, Taiwan, in 1985 and the M.S. degree in electrical engineering from San Jose State University, San Jose, CA, in 1989. From 1989 to 1994, he was with Raytheon and OKI Semiconductor working on emitter-coupled logic and analog/digital PLLs. In 1994, he joined the technical staff at Rambus Inc., Los Altos, CA, where he has designed high-speed CMOS PLL circuits for clock recovery and data synchronization and high-speed I/O circuits.

Timothy P. Thrush was born in Kansas City, MO. He attended Foothill College, Los Altos Hills, CA. He has been an IC layout designer since 1972, working for Fairchild, Signetics, Intel, Hewlett-Packard Labs, and Digital Equipment Corporation. He has been a Member of the Technical Staff with Rambus Inc., Los Altos, CA, since 1991.

Grace Tsang (M’01) received the B.S.E.E. degree from the Massachusetts Institute of Technology, Cambridge, in 1984, and the M.S.E.E. degree from the California Institute of Technology, Pasadena, in 1991. She designed high-speed bipolar circuits at Tektronix from 1984 to 1989. From 1991 to 1995, she was with Western Digital working on hard disk drive read channel ICs. Since 1995, she has been with Rambus Inc., Los Altos, CA, working on chip-to-chip interfaces and clock recovery.

Dennis Kim (M’01) was born in Pusan, South Korea, in 1975. He received the B.S. degree in electrical and biomedical engineering from Duke University, Durham, NC, in 1997 and the M.S.E. degree in electrical engineering from Stanford University, Stanford, CA, in 2000. In 2000, he joined the technical staff of Rambus, Inc., Los Altos, CA, where he has been engaged in high-speed I/O circuit design and test.

William F. Stonecypher was born in Huntsville, AL, in 1964. He received the B.S. and M.S. degrees in electrical engineering from the Georgia Institute of Technology, Atlanta, in 1986 and 1987, respectively. He joined Rambus, Inc., Los Altos, CA, in 1992, where he is currently a Principal Engineer working on the development of high-speed serial links.

Andrew Ho (M’99) was born in Taipei, Taiwan, R.O.C., in 1979. He received the B.S. and M.S. degrees in electrical engineering from Stanford University, Stanford, CA, in 2001 and 2002, respectively. In 2002, he joined Rambus Inc., Los Altos, CA, working in the areas of high-speed signaling and I/O design and test.

Ravi T. Kollipara (M’88) is a Senior Principal Engineer with Rambus Inc., Los Altos, CA, responsible for the signal integrity of the high-speed serial link channels. His responsibilities include design and development of models for packages, line cards, backplanes, connectors, traces and vias, and performing simulations for system level voltage and timing budgets and jitter characterization.

Mark A. Horowitz (S’77–M’78–SM’95–F’00) received the B.S. and M.S. degrees in electrical engineering from the Massachusetts Institute of Technology, Cambridge, in 1978, and the Ph.D. degree from Stanford University, Stanford, CA, in 1984. He is the Yahoo Founder’s Professor of Electrical Engineering and Computer Science at Stanford University. His research area is in digital system design, and he has led a number of processor designs including MIPS-X, one of the first processors to include an on-chip instruction cache, TORCH, a statically scheduled, superscalar processor that supported speculative execution, and FLASH, a flexible DSM machine. He has also worked in a number of other chip design areas, including high-speed and low-power memory design, high-bandwidth interfaces, and fast floating point. In 1990, he took leave from Stanford to help start Rambus Inc., Los Altos, CA, a company designing high-bandwidth memory interface technology. His current research includes multiprocessor design, low-power circuits, memory design, and high-speed links. Dr. Horowitz received the Presidential Young Investigator Award and an IBM Faculty Development Award in 1985. In 1993, he received the Best Paper Award at the IEEE International Solid-State Circuits Conference.

Kevin S. Donnelly (M’91) was born in Los Angeles, CA, in 1961. He received the B.S. degree in electrical engineering and computer science from the University of California at Berkeley in 1985 and the M.S. degree in electrical engineering from San Jose State University, San Jose, CA, in 1992. Beginning in 1984, he worked at Memorex, Sipex, and National Semiconductor, specializing in bipolar and BiCMOS analog circuits for disk drive read/write and servo channels. In 1992, he joined Rambus, Inc., Los Altos, CA, where he has designed high-speed CMOS PLL circuits for clock recovery and data synchronization and high-speed I/O circuits. He is currently the Vice President of a division at Rambus responsible for developing high-speed serial links. He has authored several papers and received several patents in the areas of high-speed clocking and I/O circuits.