Two High-Bandwidth Memory Bus Structures


BUS STRUCTURES

BRUCE MILLAR
PETER GILLINGHAM
MOSAID Technologies

The authors evaluate two next-generation memory bus architectures approximating SLDRAM and Direct Rambus. Quantifying the sources of error that degrade signal integrity and considering power dissipation, they show that a fully loaded SLDRAM configuration has a greater timing margin than Direct Rambus.


SYNCHRONOUS-LINK DRAM (SLDRAM)1-5 and Direct Rambus DRAM (Direct RDRAM)6-8 are examples of next-generation DRAM architectures that address the speed requirements of tomorrow's processors. Both employ packet command protocols, which combine the separate command and address pins of previous memory interfaces into command bursts. This reduces the number of pins required for addressing and control and facilitates the pipelining of requests to memory. SLDRAM and Direct RDRAM transfer commands and data on both edges of the clock.

SLDRAM, Inc., defines its first-generation SLDRAM interface as a 16- or 18-bit-wide bus supporting up to eight loads and operating at 400 Mbps/pin with a 200-MHz clock.2 Using buffered modules, it can support up to 64 loads. The company has defined an evolutionary path to 600 and 800 Mbps but has not yet disclosed the details of the bus structures supporting these higher data rates. Direct RDRAM also has a 16- or 18-bit-wide bus, but it supports up to 32 loads and operates at 800 Mbps/pin using a 400-MHz clock,7 twice the speed of the first-generation SLDRAM.

In this article, we evaluate and compare the performance of bus structures approximating the SLDRAM and Direct RDRAM bus structures. Our study focuses on modeling both structures accurately using SPICE simulations. We do not attempt to quantify the effectiveness of the different protocols, core power requirements, chip complexity, or cost of the two alternatives.

0740-7475/99/$10.00 © 1999 IEEE

Background

The goal of a computer’s memory hierarchy is to achieve a balance between economy and performance. Although this statement applies to systems ranging from supercomputers to embedded controllers, we focus here on the memory requirements of the personal computer, since the PC market drives memory technology. In today’s PCs, a small amount (about 16 Kbytes each for instructions and data) of extremely high bandwidth and very low latency L1 (level 1) cache memory is located on the processor chip. These caches run at processor speed (say, 400 MHz). There is a high probability (around 90%) that the L1 cache will provide the required data, and the processor will not have to wait. Typically, a 64-bit bus operating around 100 MHz connects a second-level cache, L2, to the processor. The size of L2 is typically 256 to 512 Kbytes. Once again, there is a high probability that the L2 cache can supply the data following an L1 cache miss. The main memory is a DRAM that connects to the processor through a 64-bit interface on the north-bridge chip set, which also supports other high-bandwidth interfaces to connect peripherals and graphics to the processor.
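The hierarchy described above trades cost for speed at each level. As a back-of-the-envelope illustration, the standard average-memory-access-time recurrence can be sketched as follows; the hit rates echo the roughly 90% figures above, but the latency numbers are illustrative assumptions, not values from the article.

```python
# Back-of-the-envelope average memory access time (AMAT) for the
# L1/L2/DRAM hierarchy described above. Hit rates echo the article's
# ~90% figures; the latencies are illustrative assumptions only.

def amat(l1_hit, l1_ns, l2_hit, l2_ns, dram_ns):
    """Average access time in ns, walking down the hierarchy."""
    return l1_ns + (1 - l1_hit) * (l2_ns + (1 - l2_hit) * dram_ns)

# Assumed: 2.5-ns L1 (400 MHz), 20-ns L2, 60-ns DRAM, 90% hits per level.
print(round(amat(0.90, 2.5, 0.90, 20.0, 60.0), 2))  # -> 5.1
```

With these assumed numbers, main-memory latency contributes well under half the average access time, which is the statistical leverage the Background section appeals to.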

IEEE DESIGN & TEST OF COMPUTERS


DRAM interfaces have evolved in the last few years from asynchronous EDO (extended data-out) DRAM, which operated at 33 MHz, to 66-MHz SDRAM (synchronous DRAM), to today’s 100-MHz PC100 SDRAM. DDR (double data rate) SDRAM is a new standard, in which data is triggered by both edges of the clock, achieving 200 Mbps/pin bandwidth with a 100-MHz clock.9 DDR SDRAM uses terminated reduced-swing signaling known as SSTL_2 (stub series terminated logic for 2.5-V supplies).1

Today, due to the statistical leverage of higher cache levels in the memory hierarchy, the main memory bus can operate at a lower frequency than the processor without any appreciable effect on system performance. Nevertheless, as processor speed increases further, bus rates throughout the system must correspondingly increase. Next-generation memory bus architectures will push printed-circuit-board and packaging technologies to their bandwidth limits.

Overview

The SLDRAM approach explores the economic and electrical limits of current motherboard, module, and packaging technologies, using familiar signaling and bus techniques. Thus, it is a straightforward, evolutionary step beyond SDRAM and DDR. In contrast, Direct RDRAM, which exploits recent advances in IC packaging technologies while maintaining the established Rambus signaling system, is a more revolutionary departure from the SDRAM development trend.

We carried out simulations in a Unix environment using HSPICE from Meta-Software.10 We created simplified five-wire replicas of the two bus structures and simulated them using the nominal signaling, bus dimensions, and loading specified in the data sheets for each structure. Since Direct RDRAM is a licensed technology, not in the public domain, detailed information on the bus structure was not readily available. However, we found sufficient data in publicly available data sheets to develop an equivalent electrical model of a Direct RDRAM-style bus for use in PCs. Starting with approximate bus and module dimensions obtained from the most up-to-date published sources, we designed the Direct RDRAM motherboard and module’s microstrip sections to obtain the best performance. The configuration recommended by Rambus Inc. may differ from that described here.

We simulated memory read/write operations at the nominal operating frequency for each bus. We used data-bit patterns known to maximize cross-talk and intersymbol interference (ISI) effects. We obtained both graphical and numerical results from the simulations so that we could compare signal waveform quality along with bandwidth and skew figures.

For the simulated performance of the two high-speed buses to be realistic, the use of accurate transmission line models was an absolute necessity. We modeled entire data paths, from pad to pad, as transmission lines for more realistic results. In other words, we modeled not only the printed circuit microstrip lines but also, where warranted, connectors and package leads as transmission lines. We avoided the traditional use of lumped LC (inductance and capacitance) networks to model IC package leads on high-speed lines because the lumped LC network’s low-pass characteristic reduces simulation bandwidth compared to a distributed lead model. It also does not properly account for loading and propagation delay. For even more realism, we used the HSPICE coupled transmission line models throughout the simulations, so that we could represent cross-talk effects such as switching noise, edge jitter, and bit-pattern dependencies (ISI). (For more information on our HSPICE line models, see our Web site at www.mosaid.com.)

SLDRAM bus structure

The SLDRAM bus can be characterized as short and stubby. Figure 1 illustrates the basic physical features of a 400-Mbps SLDRAM bus implementation in a typical PC. The structure consists of a short (175-mm), shielded bus on the PC main board with the memory controller at one end and termination resistors (Rterm) at the other. Eight module connectors are soldered to the bus at 15-mm intervals along the bus, starting 60 mm from the controller chip and ending 10 mm from the terminating resistors. The leadoff section of the bus is interrupted by 20-Ω series stub resistors (Rstub) 10 mm from the first connector.

[Figure 1. Fully loaded SLDRAM bus configuration (drawing not to scale): memory controller, eight SLDRAM modules, Rstub = 20 Ω, Rterm = 28 Ω; dimensions 50 mm, 15 mm, 175 mm.]
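The distributed line models described above are characterized by per-unit-length inductance and capacitance rather than a single lumped LC. A minimal sketch of how those parameters follow from a line's characteristic impedance and propagation delay; the 60-Ω, 6.7-ps/mm values are assumed for illustration, not taken from the article.

```python
# A lossless transmission line is fixed by two numbers: Z0 = sqrt(L'/C')
# and per-unit-length delay td' = sqrt(L'*C'). Inverting these gives the
# distributed L' and C' a simulator needs -- behavior a single lumped LC
# cannot reproduce without the low-pass artifact noted above.

def per_unit_length(z0_ohms, delay_s_per_m):
    c = delay_s_per_m / z0_ohms   # F/m
    l = delay_s_per_m * z0_ohms   # H/m
    return l, c

# Assumed 60-ohm microstrip with 6.7 ns/m (6.7 ps/mm) delay.
l_m, c_m = per_unit_length(60.0, 6.7e-9)
print(f"L' = {l_m * 1e9:.0f} nH/m, C' = {c_m * 1e12:.0f} pF/m")  # -> 402 nH/m, 112 pF/m
```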

JANUARY–MARCH 1999


Modules containing from one to eight SLDRAM chips plug into each connector on the bus. Modules with more than two SLDRAM chips are buffered so that no module presents more than two loads to the bus. The module connector, module tracking, and SLDRAM package leads comprise the stub. The overall stub length is roughly 20 mm. Approximately 5 mm is within the connector itself, 11 mm is in module tracks to the SLDRAM or buffer chip, and 4 mm is the end-to-end length of SLDRAM package leads and bond wires. High-speed tracking on the module is interrupted by 20-Ω series stub resistors placed between the module connector and the SLDRAM pins. Most of the stub length is over ground plane. Because of the relatively short stub length, shielding between stub signal wires does not significantly improve signal quality and is therefore not required. Not shielding stubs greatly simplifies SLDRAM module layout and reduces costs.

Direct RDRAM bus structure

The Direct RDRAM bus achieves high bandwidth in a long, nearly stubless system. Figure 2 illustrates the basic physical features of a Direct RDRAM bus implementation in a typical PC. On the basis of available information,7 we estimate the bus to be approximately 575 mm long. It begins on the PC main board with the memory controller at one end and winds its way through three modules plugged into connectors on the main board. It transfers from module to module and finally ends on the main board with termination resistors and a clock generator. The three module connectors are soldered to the bus on the main board at intervals of approximately 12 mm, starting 75 mm from the controller chip and ending 25 mm from the terminating resistors and clock generator. An attractive feature of this bus is that it uses no series stub resistors. However, one must plug dummy modules into unused connectors to maintain bus continuity in a partially populated memory system.

[Figure 2. Fully loaded Direct RDRAM bus configuration (not to scale): motherboard top view (a), with dimensions 75 mm, 12 mm, 35 mm, 133 mm, 80 mm, and 25 mm; module top view (b), with dimensions 124 mm and 9 mm.]

Although Direct RDRAM modules can contain from one to 16 chips, the bus is restricted to a total of 32 Direct RDRAM chips. For simulation, we assumed the first module had 16 Direct RDRAM chips installed and the second and third modules, only eight chips each. Direct RDRAM chips are packaged in CSPs (chip-scale packages) featuring extremely short lead lengths on critical signals (estimated at 2 mm to 4 mm including bond wire). The CSPs are soldered at regular intervals directly to the bus on the module. This portion of the bus was designed to maintain the bus’s nominal characteristic impedance with capacitive loading.

Signaling characteristics

Figure 3 compares SLDRAM and Direct RDRAM signaling levels, impedances, and power dissipation. SLDRAM uses a series-stub, center-tap-terminated signaling scheme, referred to as SLIO.2 This scheme is similar to the SSTL_2 signaling standards1 defined for DDR SDRAM memory systems, except that the voltage swing is more precisely defined by in-system calibration. The source impedance of SLIO drivers is calibrated to produce a 0.7-V swing on the bus through 20-Ω series stub resistors. The bus terminates at 28 Ω and voltage Vterm (defined as 50% of VDD). Using source and sink currents of ±12.5 mA, the high-low signal swing is symmetrical above and below Vterm. As shown in the figure, total SLIO signaling power is 15.6 mW per line. For an 18-line interface, this adds up to 0.28 W of signaling power. However, only 0.15 W of that dissipates in the SLDRAM package.


In the Direct RDRAM signaling scheme, 28-Ω bus terminators pull up signals to the system-supplied Vterm voltage (+1.8 V) in the absence of drive, for a logic 0. Any device on the bus can assert a logic 1 by sinking 28.6 mA of current, using an open-drain NMOS structure. Each device on the bus adjusts its output current sink automatically to keep the signal swing nominally at 0.8 V. Consequently, when asserting a low level, the bus dissipates 51.5 mW of signaling power per line, adding up to 0.93 W of worst-case power for an 18-line interface. The worst-case output drive power dissipated on chip is 0.51 W. No power is dissipated when the entire bus pulls up to Vterm. For typical data patterns with a 50-50 mixture of 1’s and 0’s, the average interface power across the bus and over time is 0.46 W, of which 0.26 W dissipates on chip.

[Figure 3. SLDRAM (a) and Direct RDRAM (b) signaling. SLIO (VDDQ = 2.5 V): Vih = 1.6 V, Vil = 0.9 V, Vref = Vterm = 1.25 V; per-line power: Rterm 4.4 mW, Rs 3.1 mW, driver 8.1 mW, total 15.6 mW. Direct RDRAM (Vterm = 1.8 V): Voh = 1.8 V, Vol = 1.0 V, Vref = 1.4 V; per-line power for a data ‘1’: Rterm 22.9 mW, driver 28.6 mW, total 51.5 mW; for a data ‘0’: 0 mW.]

Under typical conditions, the Direct RDRAM interface dissipates 73% more on-chip power than the SLDRAM interface. Although Direct RDRAM operates at twice the frequency of SLDRAM, the larger Direct RDRAM I/O power has nothing to do with the frequency of operation. Power dissipated in a terminated bus configuration would be the same at 1 bps or 1 Gbps. The difference is due largely to the open-drain versus push-pull drivers. The bandwidth of the two systems is limited by bus topology, not the I/O interface, as we will show in the following sections. Minimizing on-chip power dissipation to meet thermal constraints is a key challenge in high-bandwidth DRAM architectures.

Bandwidth

In memory systems using high-speed buses, data uncertainty is the primary mechanism limiting data transfer bandwidth, not the ac bandwidth of the components comprising the data transfer system. For example, for data reads, suppose that the dispersion of data applied to the bus is 200 ps and the accuracy of the timing system clocking data onto the bus is 200 ps. Then the data uncertainty for memory chips on the bus is 400 ps. Likewise, assume that the dispersion of data setup and hold times to the memory controller is 200 ps and the data sampling clock accuracy is 200 ps. Then the uncertainty for read data clocked into the controller from various memory chips is 400 ps. Allowing another 200 ps for noise and cross-talk jitter effects, we have a total data transfer uncertainty of 400 + 400 + 200 = 1,000 ps.

If we bring data-pattern-dependent skew into the picture, we can estimate the maximum read data transfer rate possible for a particular memory system. An interesting aspect of pattern-dependent skew is that it is virtually nonexistent up to some critical bit rate. From there, it worsens gradually at a nearly constant rate as the bit rate increases. This increase in skew depends largely on the bus’s settling properties and data transitions occurring both before and after the bus has settled. For example, if the bus settles in 2 ns, pattern-dependent skew starts at 500 Mbps, or 1/(2 ns). At some high bit rate, near the available bandwidth figures obtained from the simulations presented here, skew increases dramatically as signals start collapsing due to slew rate and ac bandwidth limitations. When total data dispersion, or skew, exceeds the data bit period, the latency of transferred data is uncertain, and the system pipeline breaks down. In other words, data bus skew exceeding the data bit period closes the data eye completely, making reliable data recovery impossible. Static skew is fixed for a given system and depends on printed circuit board layout, lead frame design, and loading imbalances. Dynamic skew includes all skew parameters that vary with time, such as edge jitter resulting from noise, cross-talk effects, and data-pattern-dependent effects.
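The per-line and interface power figures quoted in the signaling section follow directly from the stated currents and resistances. A minimal check, using only values given in Figure 3 and the text:

```python
# Reproducing the signaling power figures from the stated parameters.

# SLIO (SLDRAM): a push-pull driver forces +/-12.5 mA through the
# 52-ohm source, 20-ohm series stub, and 28-ohm termination.
I = 12.5e-3
p_term  = I**2 * 28            # ~4.4 mW in Rterm
p_stub  = I**2 * 20            # ~3.1 mW in Rs
p_drive = I**2 * 52            # ~8.1 mW in the driver (on chip)
slio_line = p_term + p_stub + p_drive      # 15.6 mW per line

# Direct RDRAM: an open-drain driver sinks 28.6 mA from Vterm = 1.8 V,
# dropping 0.8 V across the 28-ohm terminator and 1.0 V across the
# driver; an undriven (logic 0) line dissipates nothing.
Isink = 28.6e-3
p_rterm  = (Isink * 28) * Isink            # ~22.9 mW in Rterm
p_driver = (1.8 - Isink * 28) * Isink      # ~28.6 mW in the driver (on chip)
rdram_line = p_rterm + p_driver            # 51.5 mW per asserted line

print(round(slio_line * 18 * 1e3))         # -> 281  (0.28 W, 18-line SLIO)
print(round(rdram_line * 18 * 1e3))        # -> 927  (0.93 W worst case)
print(round(rdram_line * 18 / 2 * 1e3))    # -> 463  (0.46 W, 50-50 data)
```

The 73% figure quoted above is the ratio of the on-chip components: 18 lines of driver power give roughly 0.26 W for Direct RDRAM against 0.15 W for SLIO.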


Table 1. Bus simulation conditions.

Parameter                              SLDRAM
Total bus length                       175 mm
Loads                                  8
Stub length                            20 mm
Module connector spacing               15 mm
Main board intersignal shielding       Yes
Module intersignal shielding           No
Pad capacitance, Cp                    2.0 pF
Bus vias and solder lands, Cv          0.5 pF
Lead frame model                       U model
Driver source type                     Voltage
Driver source resistance, Rq           52 Ω
Series stub resistance, Rs             20 Ω
Terminating resistance, Rt             28 Ω
Terminating voltage, Vterm             1.25 V
Reference voltage, Vref                1.25 V
Nominal high level, Vih                1.6 V
Nominal low level, Vil                 0.9 V
Nominal driver source current, Ioh     12.5 mA
Nominal driver sink current, Iol       12.5 mA
HSPICE stimulus ramp time              500 ps
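As a consistency check on Table 1, the nominal SLIO drive current and bus levels follow from the series resistances, assuming the calibrated driver swings between a 2.5-V VDDQ rail and ground (an assumption consistent with the signaling description, not a value given in the table):

```python
# Nominal SLIO drive current and bus levels from the Table 1 values.
# Assumes the driver, series stub, and termination resistances form a
# simple series DC path from the 2.5-V rail to Vterm.

VDDQ, VTERM = 2.5, 1.25
RQ, RS, RT = 52.0, 20.0, 28.0       # driver source, series stub, termination

i_drive = (VDDQ - VTERM) / (RQ + RS + RT)   # nominal source current
vih = VTERM + i_drive * RT                  # bus pulled above Vterm
vil = VTERM - i_drive * RT                  # bus pulled below Vterm

print(round(i_drive * 1e3, 1), "mA")        # -> 12.5 mA, matching Ioh/Iol
print(round(vih, 2), round(vil, 2))         # -> 1.6 0.9, matching Vih/Vil
```

The resulting 0.7-V swing (1.6 V to 0.9 V), symmetric about Vterm, matches the signaling description in the text.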

Table 2. Microstrip physical parameters (in mils).

Parameter                        SLDRAM    Direct RDRAM
Module bus (loaded)
  Track width, WD                NA        5
  Track spacing, SP              NA        11
  Track thickness, TH            NA        2.7
  Track height above plane, HT   NA        8
Module bus (unloaded)
  Track width, WD                5         26
  Track spacing, SP              15        53
  Track thickness, TH            1.4       2.7
  Track height above plane, HT   5         8
Main board
  Track width, WD                5         16.5
  Track spacing, SP              15        22.5
  Track thickness, TH            1.4       2.7
  Track height above plane, HT   5         5
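As a rough sanity check on the Table 2 geometry, the unloaded characteristic impedance of a microstrip can be estimated with a common IPC-style approximation. The FR-4 dielectric constant εr = 4.5 is an assumption; Table 2 does not specify the board dielectric.

```python
import math

# Common IPC-style microstrip approximation:
#   Z0 ~ 87 / sqrt(er + 1.41) * ln(5.98*h / (0.8*w + t))
# reasonable for roughly 0.1 < w/h < 2. er = 4.5 (FR-4) is assumed.

def microstrip_z0(w_mils, t_mils, h_mils, er=4.5):
    return 87.0 / math.sqrt(er + 1.41) * math.log(
        5.98 * h_mils / (0.8 * w_mils + t_mils))

# SLDRAM main-board trace from Table 2: WD = 5, TH = 1.4, HT = 5 (mils)
print(round(microstrip_z0(5, 1.4, 5), 1))   # roughly 61 ohms unloaded
```

The effective impedance of the populated bus is lower than this unloaded estimate, which is consistent with termination resistor values well below 60 Ω.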

Bus simulation models

We extracted the net lists used in the HSPICE bus simulations from circuit schematics representing the SLDRAM and Direct RDRAM bus configurations in reduced and simplified form. Each includes only five bus wires to model bus behavior. We used no active components in the simulations. We modeled the SLDRAM CMOS drivers