OSCILATORLESS CLOCK MULTIPLICATION Rui L. Aguiar, Dinis M. Santos Dpt. Electrónica e Telecomunicações, Universidade de Aveiro / Instituto de Telecomunicações Campus de Santiago, 3810-193 Aveiro, Portugal, Tel: +351.34.370200, Fax: +351.34.381128. Email: [email protected], [email protected]

ABSTRACT
This paper presents a technique for clock multiplication without local oscillators. The technique uses a DLL, and thus presents lower jitter than traditional PLL-based oscillator systems. Furthermore, it directly provides clocks with a 50% duty-cycle. The method is implemented both in a programmable custom circuit able to perform clock multiplication with integer factors from 2 to 8, and in a simpler hybrid system. Both simulations of the full-custom design and experimental results from the hybrid system support our proposal.

1. INTRODUCTION
Clock multiplication is frequent in digital systems. This is due to interconnection problems, which impair widespread distribution of the high clock frequencies available inside current chips. Traditional strategies use high-frequency local clocks synchronized with lower-frequency global clocks [1]. This strategy resorts to locally placed Phase Locked Loops (PLLs). These PLLs derive the high-frequency local clocks through multiplication of the global clock; the multiplication is achieved by introducing a divider in the PLL feedback loop. Flexibility and power considerations often require variable ratios between the local and global clocks [2].

This implementation technique is further complicated by the requirements usually placed on duty-cycles. A 50% clock duty-cycle is usually desirable because of the requirements of common clocking strategies [3]. However, both typical integrated local oscillators and long global interconnections distort clock waveforms, and both effects may change the duty-cycle of the signal derived by the PLL. To solve this problem, a signal with twice the required frequency is created internally in the local oscillator, and the local clock signal is derived from the output of a divider. This guarantees the desired 50% duty-cycle [3, 4].

This process presents several problems. First, the PLL (with its local oscillator) introduces significant jitter in the clock. Second, the 50% duty-cycle requirement demands that the synchronization electronics operate controllably at twice the frequency required by the rest of the circuit, just for duty-cycle purposes. Both problems become more and more important as clock frequencies increase in recent CMOS technologies.

Delay Locked Loops (DLLs) are well known in clock distribution applications [5, 6]. They have lower jitter than PLLs designed with similar delay elements [7], but are not able to “generate” any signal without input signals. Thus DLLs are traditionally considered unsuitable for clock multiplication applications [8].

This paper presents a method for clock multiplication using DLLs. For a fixed multiplication factor, this method has an implementation complexity similar to that of traditional methods with local oscillators. However, it presents lower jitter than these oscillator-based methods. Furthermore, this method inherently provides a 50% duty-cycle, and thus avoids the controlled higher frequencies required in PLL-based circuits.

The paper is organized as follows. In Section 2 we discuss the proposed clock multiplication architecture concept. In Section 3 we describe a full-custom ASIC circuit for implementing this concept with programmable multiplication factors. In Section 4 we discuss the results achieved so far, both with simulations of this custom system, and with an experimental board using specialized circuits and off-the-shelf components. Section 5 presents the major conclusions of this work.

2. CONCEPT
For simplicity, we will first discuss the proposed technique in terms of a basic model for a 4x multiplication factor. Fig. 1 schematically shows an instance of the proposed multiplication technique.

Figure 1. Conceptual circuit for a 4x clock multiplier. (Labels in the figure: Clock; Controlled Delay Line with eight delay elements, DE; PD with Clk_Ref and Clk_Ctrl; DLL; taps A1 to A8; transition detectors; B1 to B4; flip-flop FF with S, R and Q; Out.)

Conceptually, for a multiplication factor of four, the circuit uses an RS flip-flop, some auxiliary logic (transition detectors) and a Delay Locked Loop (DLL) with eight delay elements. A DLL is a voltage-controlled delay line inserted in a feedback loop which forces the synchronization of the output (clock) signal with the input clock signal; this delay line is controlled by a phase detector (PD). Note that these elements (flip-flop, phase detector, controlled delay elements) are the same elements required in traditional methods using a PLL for frequency multiplication. The DLL synchronizes in such a way that it presents a total delay of one clock cycle. Thus the eight delay elements can be considered as a time division line, with 1/8 of a period between them (see Fig. 2 for the waveforms at the reference points in Fig. 1, assuming no propagation delay inside the logic). These delay elements are then connected through transition detectors (see next section) to two different OR gates, in an alternate fashion (all odd elements are connected to one of the gates, and all the even elements are connected to the other gate). These create two impulse sequences with four times the clock frequency, with the sequences separated by 1/8 of the input clock period. These clock pulses drive the RS flip-flop, forcing state transitions. (In the general case, this clock multiplication mechanism can be conceptually implemented for a multiplication factor F by applying a clock signal to a DLL with 2×F delay elements connected to 2×F “transition detectors”.) Thus the flip-flop switches state 2×F times per input clock period, creating the desired output clock (with a frequency four times larger than the original). Furthermore, this clock has a 50% duty-cycle, as all clock pulses depend on the same clock transition (ascending, in the case of the figure), and on the synchronization characteristics of the DLL. The input clock duty-cycle is not a factor in this circuit, as long as reasonable duty-cycle values are kept (that is, the high and low levels of the input clock have to last longer than the individual propagation time of each element in this circuit).

Figure 2. Conceptual waveforms in the 4x clock multiplier. (Signals shown: A1, A2, A3, A8, B1, B3, B8, R, S, Out.)

As DLLs present smaller phase noise than PLLs [4], this method provides a 50% duty-cycle output clock with smaller jitter than previous approaches. Furthermore, this method can be easily extended to provide multiple clock phases (with 50% duty-cycle), using adequately chosen signals from the DLLs. This system can also be conceptually implemented with a T flip-flop. Another alternative would remove the transition detectors, and combine the outputs of these elements to create the desired output clock. However, the RS flip-flop implementation both allows for the withdrawal of the pulse detectors as such, and provides an output clock with very steep transitions, in a very compact implementation.

Note that this circuit does not present higher frequency requirements than those presented by the typical logic circuits requiring these clock frequencies. For a given CMOS technology, clock frequency is fundamentally limited in digital circuits by the setup and hold times of registers (although in practice it is much lower than this limit due to critical path constraints). The circuit elements for our multiplication technique have simple implementations, and do not inherently pose timing constraints tighter than these. The main frequency issue in this circuit would be the very fast outputs of the transition detectors; however, these are embedded in the RS flip-flop control (see next section), and are essentially local pulses, without any width control, which are easy to implement.

3. PROPOSED SYSTEM

The full-custom system developed is able to multiply clocks by any integer factor between 2 and 8. This circuit (illustrated in Fig. 3) was implemented in a standard double-metal, double-poly, 0.8µm CMOS technology. It uses a DLL with 16 delay elements, already developed for rail-to-rail clock distribution applications [5] (maximum frequency: 100MHz). These delay elements are simple inverters with controllable switching current. The PD is based on a Müller C-element, and has already been tested in several clock-related applications [5, 9].

Figure 3. Programmable Clock Multiplication System. (Labels in the figure: Clock; DLL with Controlled Delay Line of 16 delay elements, DE; PD with Clk_Ref and Clk_Ctrl; Switching delay compensation; Switching and Transition Detection Matrix; Control Logic; Selection Bits (3); bus widths 8, 32 and 8; flip-flop FF with S, R and Q; Out; Out_Clk.)

The RS flip-flop is implemented as a simple cross-coupled bistable circuit, driven by suitably sized logic gates. This register requires careful design, so that the delays from both inputs to the output Q are equal. The logic gates discussed previously were actually implemented as transistors connected to the RS flip-flop, according to the concept briefly summarized in Fig. 4. Several parallel control blocks are connected to the RS flip-flop (in the general case, F circuits like those indicated in the highlighted block in Fig. 4 are required, all connected in parallel to points A, B and C). All the (sequentially delayed) outputs of the DLL are connected to these blocks, in such a way that each control signal to a flip-flop input blocks the previous signal and will be blocked by the succeeding DLL output. These connections alternate between the R and S inputs. As a consequence, the working frequency depends fundamentally on these transistors connected to the RS register. The maximum multiplication factor depends on the performance of the DLL, both in terms of internal pulse width distortion and in terms of PD characteristics (especially its dead-band).

Figure 4. Implementation for edge-active RS flip-flop. Only conceptual elements are shown, without any timing or level considerations. (Labels in the figure: points A, B and C; Q; edge i-1, edge i, edge i+1; parallel blocks.)

The final circuit is more complex than this in order to support programmability. The outputs of the delay elements are connected to two selection circuits. These circuits (the “switching and transition detection matrix”) perform two functions. The first function is the selection of the “final” delay element of the DLL; this creates a variable-size DLL, in order to support the variable multiplication factor. The second function is the transition detection for each output; this detection has to be made in a controllable (enabled) way, as the outputs of these detectors are always connected to the transistors driving the RS flip-flop inputs. Enabling threshold detection is simply performed by resorting to adequate elements for timing and proper control signaling, i.e. the DLL outputs are connected both to inverters and logical switches (these are not represented in Fig. 4) before driving this edge-active RS flip-flop. This control logic selects the proper size of the DLL as a function of the selected multiplication factor, thus using the delay elements as described in the previous section (e.g. for a multiplication factor of 4, the output of the eighth delay element will be connected to the PD), and disables signal propagation for the delay elements outside the DLL loop.
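The tap selection and edge steering described above can be sketched with an idealized, discrete-time model. This is only an illustration under stated assumptions, not the circuit itself: the DLL is assumed to be already locked, all delay elements are assumed equal, logic delays are ignored, and each transition detector is modelled as "tap i high and tap i+1 not yet high", which is one reading of the blocking scheme of Fig. 4. The names `feedback_tap`, `simulate_multiplier` and `measure` are hypothetical.

```python
def feedback_tap(factor, n_elements=16):
    """Index of the delay element whose output closes the DLL loop (2 x F)."""
    if not 2 <= factor <= n_elements // 2:
        raise ValueError("factor must be between 2 and n_elements // 2")
    return 2 * factor


def simulate_multiplier(factor, input_duty=0.5, steps_per_tap=20, periods=4):
    """Ideal model of the multiplier; returns (output samples, input period in steps)."""
    n_taps = feedback_tap(factor)
    # Locked DLL (assumed): the selected taps span exactly one input period.
    period = n_taps * steps_per_tap
    high_steps = round(input_duty * period)

    def clock(t):
        # Input clock with a rising edge at every multiple of the period.
        return (t % period) < high_steps

    def tap(i, t):
        # Output of the i-th delay element, i = 1..n_taps (i = n_taps + 1
        # coincides with tap 1 because the input clock is periodic).
        return clock(t - i * steps_per_tap)

    q = False
    out = []
    for t in range(periods * period):
        # Each edge pulse is blocked by the rising edge of the succeeding tap.
        pulses = [tap(i, t) and not tap(i + 1, t) for i in range(1, n_taps + 1)]
        set_pulse = any(pulses[0::2])    # odd taps drive S
        reset_pulse = any(pulses[1::2])  # even taps drive R
        if set_pulse and not reset_pulse:
            q = True
        elif reset_pulse and not set_pulse:
            q = False
        out.append(q)
    return out, period


def measure(out, period):
    """Output cycles per input period and output duty-cycle, over the last input period."""
    window = out[-period:]
    previous = out[-period - 1]
    rising_edges = 0
    for value in window:
        if value and not previous:
            rising_edges += 1
        previous = value
    return rising_edges, sum(window) / period


if __name__ == "__main__":
    for factor, duty in [(4, 0.5), (5, 0.65), (8, 0.5)]:
        out, period = simulate_multiplier(factor, duty)
        cycles, out_duty = measure(out, period)
        print(f"F={factor}, input duty {duty:.0%}: "
              f"{cycles} output cycles per input period, output duty {out_duty:.0%}")
```

In this model the output depends only on the positions of the rising edges along the delay line, which is why the input duty-cycle does not appear in the result. Unequal delay elements or unequal S and R path delays, which the model ignores, would show up directly as duty-cycle error.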

Figure 3 also shows the time compensation circuit required in the PD due to the delays inside the switching matrix and the control logic for these circuits. Dummy elements are also used at several points of the circuit to maximize element matching, which is critical in this circuit. Note that after each change of the multiplication factor, it is essential to wait the time required for the DLL to achieve synchronization: this process is the critical part of our technique. Unfortunately the PD used has stable false synchronization points, and thus some extra control logic is required.

4. RESULTS

4.1 Full Custom Circuit
Several simulations were run on the proposed system. All simulations are shown after the DLL has entered synchronism, as the system is not able to dynamically change the clock multiplication factor. Figure 5 shows a simulation run with a 100MHz input clock, and a multiplication factor of 5. The input clock had a 65% duty-cycle. As expected, the output clock has a 500MHz frequency with a 50% duty-cycle. In Fig. 6 we present simulation waveforms for another 100MHz input clock, but with a multiplication factor of 8. The output clock shown is an 800MHz clock with a nearly 50% duty-cycle. The small imprecision in the duty-cycle is due to propagation delays in the 5V digital elements used in this circuit. With proper high-frequency circuit optimization this circuit could output 1GHz clocks for input frequencies around 100MHz.

Figure 5. Simulation results for a multiplication factor of 5 with a 65% input clock. From top to bottom: input clock; clock signal inside the DLL; output clock.

Figure 6. Simulation results for a multiplication factor of 8 with a 50% input clock. From top: clock inside the DLL; output clock; typical pulsed signal at the output of a gate; DLL control voltage.

4.2 Hybrid Implementation
As mentioned, the DLLs have already been implemented in a test circuit for clock distribution applications [5]. Their availability allowed the design of a test board with the DLLs and some extra off-the-shelf logic, implementing the concept depicted in Fig. 3 with some restrictions. This test board was only able to multiply the clock frequency by two, three or four, with manual programming. Nonetheless this board presented strong performance problems, due to conflicting frequency limitations. On the one hand, the commercial CMOS logic used presented clear frequency limitations, making its general usage impossible for input frequencies larger than 5MHz. On the other hand, the DLLs had low-frequency limits: they were originally developed for input frequencies between 25MHz and 100MHz, and thus presented noisy and unreliable operation below 5MHz. Even so, the board was able to provide satisfactory results at input clock frequencies around 5MHz. Figures 7 to 9 present the results of the circuit with several different signals. All figures present the output clock on top, and the input 5MHz clock signal on the bottom. Figure 7 shows the system operating with an 80% duty-cycle input clock, and with a 10MHz output signal. Figure 8 shows the input clock with a 50% duty-cycle, and a 15MHz output signal. Finally, Fig. 9 shows a 20% duty-cycle input clock, and a 20MHz output clock. Note that the output clock does not present the expected 50% duty-cycle. This is due to frequency limitations of the discrete logic, as can easily be inferred from the duty-cycle variation with the increase in the output frequency (for the 10MHz output clock, the duty-cycle is nearly 50%, even with an 80% duty-cycle input clock). Nevertheless, this test board is able to provide a multiplied clock even with widely varying duty-cycles at its input.
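One simple way to see why a limitation of the discrete logic would show up as a duty-cycle error that grows with output frequency is a first-order model, assumed here for illustration and not derived from the measurements: if the reset path is slower than the set path by a fixed amount, every high phase of the output is stretched by that amount, so its share of the (shrinking) output period increases. The function name `duty_cycle_with_skew` and the 5 ns skew are hypothetical; the 5MHz input and the factors two, three and four are those of the test board.

```python
def duty_cycle_with_skew(f_out_hz, skew_s):
    """Output duty-cycle when the reset path lags the set path by skew_s seconds."""
    period = 1.0 / f_out_hz
    high_time = period / 2.0 + skew_s  # ideal half period, stretched by the skew
    return high_time / period


F_IN_HZ = 5e6   # input clock frequency used on the test board
SKEW_S = 5e-9   # hypothetical set/reset path mismatch, for illustration only

if __name__ == "__main__":
    for factor in (2, 3, 4):
        f_out = factor * F_IN_HZ
        duty = duty_cycle_with_skew(f_out, SKEW_S)
        print(f"x{factor}: {f_out / 1e6:.0f} MHz output, duty-cycle {duty:.1%}")
```

Under this assumption the error is the skew multiplied by the output frequency, so it doubles between the 10MHz and 20MHz outputs; the actual mismatch on the board is not given in the text.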

Figure 7. Measured response of the hybrid experimental set-up, with an 80% duty-cycle input clock, and a multiplication factor of two.

Figure 8. Measured response of the hybrid experimental set-up, with a 50% duty-cycle input clock, and a multiplication factor of three.

Figure 9. Measured response of the hybrid experimental set-up, with a 20% duty-cycle input clock, and a multiplication factor of four.

5. SUMMARY
A technique for clock multiplication has been described. This technique allows clock multiplication to be performed with arbitrary integer factors. As this method uses DLLs, it presents potentially lower jitter than traditional oscillator-based methods. Furthermore, as the method is based on the propagation time of a single clock edge across the delay line, the resulting clock inherently presents a 50% duty-cycle, as long as the delay elements are equal.

The technique has been demonstrated by two independent methods. First, a full-custom circuit has been designed in a 0.8µm CMOS technology, with programmable multiplication factors between 2 and 8. Simulations run on this circuit showed the technique to be generally applicable, and identified performance problems in terms of the propagation delays inside the logic elements. Nevertheless, output frequencies near 1GHz seem to be achievable with some design optimization. However, in this circuit, input clock frequencies have to be much lower than these, as the DLL used was originally designed for 100MHz signals. The method proposed here has also been demonstrated with a simpler hybrid system, with off-the-shelf logic and a full-custom DLL. Tests on this circuit verified the main characteristics of this clock multiplication technique, although performance problems were apparent due to conflicting frequency constraints.

6. REFERENCES

[1] Anceau, F., "A Synchronous Approach for Clocking VLSI Systems", IEEE Journal of Solid-State Circuits, vol. 17, no. 1, Feb 1982.
[2] Alvarez, J., Sanchez, H., Gerosa, G. and Countryman, R., "A Wide-Bandwidth Low-Voltage PLL for PowerPC Microprocessors", IEEE Journal of Solid-State Circuits, vol. 30, no. 4, Apr 1995.
[3] Rabaey, J., Digital Integrated Circuits. Prentice-Hall, 1996.
[4] Rusu, S. and Tam, S., "Clock Generation and Distribution for the First IA-64 Microprocessor", IEEE Intl. Solid-State Conf. ISSCC’00, Feb. 2000.
[5] Aguiar, R.L. and Santos, D.M., "Clock Distribution Strategy for IP-Based Development" in VLSI: Systems in a Chip, Silveira, L.M., Devadas, S. and Reis, R. (eds.). Chapman & Hill, 1999.
[6] Friedman, E., "Introduction - Clock Distribution Networks in VLSI Circuits and Systems", Editor paper in the IEEE reprint Clock Distribution Networks in VLSI Circuits and Systems, IEEE Press, 1995.
[7] Kim, B., Weingandt T., and Gray, P., "PLL/DLL System Noise Analysis for Low Jitter Clock Synthesizer Design", ISCAS’94, IEEE International Symposium on Circuits and Systems, Jun. 94.
[8] Yuan, M.-S. and Wang, C.-K., "PLL Circuits" in The VLSI Handbook, Chen, W.-K. (ed.), IEEE Press, 2000.
[9] Vasconcelos, E., Aguiar, R.L., and Santos, D., "A 0.8 µm CMOS, 622 Mb/s SDH/SONET Communication System", MWSCAS’99, 42nd Midwest Symposium on Circuits and Systems, Las Cruces, New Mexico, Aug 1999, pp. 843-846.