A Design Technique for Energy Reduction in NORA CMOS Logic

IEEE TRANSACTIONS ON CIRCUITS AND SYSTEMS—I: REGULAR PAPERS, VOL. 53, NO. 12, DECEMBER 2006


Konstantinos Limniotis, Yiorgos Tsiatouhas, Member, IEEE, Themistoklis Haniotakis, Member, IEEE, and Angela Arapoyanni, Member, IEEE

Abstract—In this work, a design technique to reduce the energy consumption in NO RAce (NORA) circuits is presented. The technique is based on a unidirectional switch topology combined with a new clocking scheme, permitting both charge recycling between circuit nodes and elimination of the short-circuit current. Calculations show that energy savings higher than 20% can be achieved. Simulation results from NORA designs in a 0.18-μm CMOS technology are presented to demonstrate the effectiveness of the proposed technique in reducing both the energy and the energy-delay product.

Index Terms—Charge recycling, low-power design, NO RAce (NORA) CMOS circuits.

I. INTRODUCTION

THE growing demand for portable, battery-operated systems makes low-power design one of the most important areas of microelectronics. There are three major sources of power consumption in digital CMOS circuits: the short-circuit current due to the direct current path between the power supplies, the dynamic current due to the switching activity of the circuit, and the leakage current of the devices [1], [2]. Methods proposed in the open literature for low-power digital-circuit design deal with the elimination of the short-circuit current by waveform shaping [3], the dynamic current reduction by voltage scaling [1], power-down strategies, architecture-driven [1] and scheduling [4] techniques, charge recycling [5]–[12], voltage swing reduction [13], [14], adiabatic logic design styles [15]–[18], gate resizing [19], and also the reduction of the leakage current by multi- and variable-threshold voltage design techniques [20].

Few low-power design techniques have been proposed in the open literature for dynamic design styles. In [17], an adiabatic dynamic logic family is presented that combines adiabatic theory with conventional CMOS dynamic logic. Low-power oriented differential logic families based on charge recycling between the differential output nodes of the circuits have been

Manuscript received September 21, 2005; revised June 19, 2006. This work is supported in part by the Greek Ministry of Education and the European Social Fund (ESF) within the framework of project “PYTHAGORAS II.” This paper was recommended by Associate Editor M. Stan.
K. Limniotis and A. Arapoyanni are with the Department of Informatics and Telecommunications, National and Kapodistrian University of Athens, 15784 Athens, Greece (e-mail: [email protected]; [email protected]).
Y. Tsiatouhas is with the Department of Computer Science, University of Ioannina, 45110 Ioannina, Greece (e-mail: [email protected]).
T. Haniotakis is with the Department of Electrical & Computer Engineering, Southern Illinois University, 62901 Carbondale, USA (e-mail: haniotak@siu.edu).
Digital Object Identifier 10.1109/TCSI.2006.885690

proposed in [21] and [22]. Moreover, a low-power differential current switch logic family has been introduced in [23]. Finally, low-power Domino logic designs, based on low-voltage swing techniques, have been presented in [24] and [25].

In this paper, we propose a low-power design technique for NO RAce (NORA) circuits, which is based on the charge recycling concept to reduce dynamic energy dissipation. According to this technique, charges stored at circuit internal nodes during a clock cycle are reused in the following cycle by means of a new recycle switching topology, which enables unidirectional charge transfer between circuit nodes. In addition, as a side effect, the short-circuit current is eliminated.

The paper is organized as follows. In Section II, the principles of the charge recycling technique are presented, while in Section III the new recycle switch is introduced and the operation of the proposed circuit is analyzed. Finally, in Section IV, simulation results on two case studies are presented and in Section V the conclusions are drawn.

II. CHARGE RECYCLING CONCEPT IN NORA LOGIC

The NORA or np-CMOS design style has been proposed as a race-free dynamic CMOS technique for pipelined circuits [26]. NORA logic is constructed of cascaded nMOS and pMOS dynamic logic networks that end on C²MOS latches, as shown in Fig. 1. A clock signal CLK and its complement CLKB are utilized for the circuit operation, which is divided into two phases, the precharge and the evaluation. In the precharge phase the latch is in the hold mode of operation, while in the evaluation phase it is in the transparent mode of operation. For the circuit in Fig. 1, the precharge phase starts when the CLK signal turns to “low”: the output nodes of the nMOS logic networks are precharged to “high” while the output node of the pMOS logic network is pre-discharged to “low.” The latch output node is at a high-Z state, keeping its previous value.

The evaluation phase starts when CLK rises to “high,” and the values of these nodes are determined according to the logic functions implemented by the corresponding logic networks. A circuit with the above clocking style is defined as a CLK-section. By interchanging the CLK and CLKB signals in Fig. 1, a NORA circuit with the precharge phase starting when CLKB is “low” (CLK is “high”) is constructed. In that case the circuit is defined as a CLKB-section.

As mentioned earlier, the NORA design style has been mainly proposed for the implementation of pipelined logic structures. According to this design approach, a circuit is constructed of a sequence of CLK- and CLKB-sections that are separated by C²MOS latches to form a pipeline structure. When the CLK-sections are in the precharge phase the CLKB-sections are in the evaluation phase, using as inputs

1057-7122/$20.00 © 2006 IEEE


Fig. 1. NORA logic design technique.

the values stored in the latches of the previous section, and vice versa.

Compared to the popular Domino logic, the NORA design style has the disadvantages of using: a) pMOS dynamic logic networks; and b) both the clock signal and its complement for its operation. On the other hand, NORA is a complete logic family capable of providing inverted signals (which is not true for Domino), while it is fully compatible with Domino logic, permitting the use of Domino gates in a NORA design. Moreover, NORA has reduced intrinsic delay stages and in general requires less silicon area due to its increased logic flexibility with respect to Domino [27].

For the sake of simplicity, let us consider in our discussion the NORA circuit of Fig. 2(a), consisting of two cascaded gates. The first gate is constructed of an nMOS dynamic logic network (transistor MN) while the second of a pMOS dynamic logic network (transistor MP). During the precharge phase (CLK = “low”) the output node of the former is precharged to $V_{DD}$ through transistor M1, while the output node of the latter is discharged to Gnd (ground) through transistor M4. In the evaluation phase the logic networks of the gates may be activated (discharging the output of the first gate through MN and M2 and charging the output of the second gate through MP and M3) or not (keeping both outputs at their precharge states), depending on the input signal IN.

In a clock cycle (an evaluation and its successor precharge phase), whenever the logic networks of the gates are activated during the evaluation phase (active evaluation phase), the total dissipated dynamic energy ($E$) on the two output nodes is provided by the well-known equation that follows [1], [28]:

$$E = E_1 + E_2 = C_1 V_{DD}^2 + C_2 V_{DD}^2 \qquad (1)$$

where $E_1$ and $E_2$ are the dissipated dynamic energies on the first and the second gate, respectively, while $C_1$ and $C_2$ are the corresponding equivalent capacitances at their outputs. Half of this energy is dissipated during the evaluation phase and half during the precharge phase [29].
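The half-and-half split quoted above follows from the standard energy balance of charging a capacitor through a resistive switch. The short derivation below is a textbook sketch rather than part of the original analysis; the symbols $E_{\text{supply}}$, $E_{\text{stored}}$, $E_{\text{charge}}$, and $E_{\text{discharge}}$ are introduced here only for illustration, and $C$ stands for either output capacitance.

```latex
\begin{align*}
E_{\text{supply}} &= \int_0^{\infty} V_{DD}\, i(t)\, dt
                   = V_{DD} \int_0^{V_{DD}} C\, dv = C V_{DD}^2, \\
E_{\text{stored}} &= \int_0^{V_{DD}} C\, v\, dv = \tfrac{1}{2} C V_{DD}^2, \\
E_{\text{charge}} &= E_{\text{supply}} - E_{\text{stored}} = \tfrac{1}{2} C V_{DD}^2, \qquad
E_{\text{discharge}} = E_{\text{stored}} = \tfrac{1}{2} C V_{DD}^2 .
\end{align*}
```

For the nMOS gate the discharge happens in the evaluation phase and the charging in the precharge phase; for the pMOS gate the roles are swapped. In both cases each phase accounts for half of $C V_{DD}^2$.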
In case the evaluation paths are not activated during a clock cycle, the total dissipated dynamic energy is obviously zero. In what follows, whenever we deal with energy dissipation we refer to dynamic energy dissipation.

Our target is to reduce the energy dissipation by recycling part of the used charges in an active evaluation phase of the circuit. Towards this direction, we place an ideal switch (SW) that connects the outputs of the two gates, as presented in Fig. 2(b). Furthermore, we insert an extra (optional) phase between the evaluation and its successor precharge phase. We will call this new phase the recycle phase; its duration can be from zero to any desirable time interval. In practice, we divide the original precharge phase into two sub-phases. During the recycle phase the gates’ outputs are not connected to any power supply ($V_{DD}$ or Gnd). To achieve this capability, an extra clock signal (CLKM) and its complement (CLKMB) are used, as shown in Fig. 2(b). This clock signal has the same period as CLK but a higher duty cycle (see Fig. 2(c)). The duty cycle extension of CLKM is equal to the recycle phase duration. The switch is set to the “on” (conducting) state only during a recycle phase that follows an active evaluation phase.

Fig. 2. Charge recycling concept in NORA circuits.

At the end of an active evaluation phase, the output of the first gate has been discharged to Gnd through its nMOS network while the output of the second gate has been charged to $V_{DD}$ through its pMOS network. Thus, when SW is turned on, a desirable charge sharing occurs between the two outputs: charges leave the output of the second gate (which is at a high potential and by the circuit operation will be discharged in the succeeding precharge phase) and reach the output of the first gate (which is at a low potential and by the circuit operation will be charged in the succeeding precharge phase). In this way, part of the charge used in an active evaluation phase is recycled and reused in the subsequent precharge phase, permitting eventual energy savings in the circuit operation.

In more detail, at the end of an active evaluation phase only the output of the second gate is charged to $V_{DD}$, with a charge equal to $Q = C_2 V_{DD}$, which is the total charge stored in the circuit output nodes ($C_1$ and $C_2$ being the equivalent output capacitances of the first and the second gate). Providing the required time interval for a full recycle phase, that is, the time to equalize through charge sharing the voltages on both output nodes, both voltages will be equal to $V_f$, while the charges on the outputs of the first and the second gate will be $Q_1$ and $Q_2$, respectively, and the following equations will stand:

$$Q_1 = C_1 V_f, \quad Q_2 = C_2 V_f, \quad Q_1 + Q_2 = Q = C_2 V_{DD} \qquad (2)$$

and consequently

$$V_f = \frac{C_2}{C_1 + C_2} V_{DD} \qquad (3)$$

Then, in the next precharge phase, the output of the first gate will be charged from $V_f$ to $V_{DD}$ while the output of the second gate will be discharged from $V_f$ to Gnd.

The energy dissipated on the first gate in a clock cycle that contains an active evaluation phase will be the energy dissipated to “flip a bit,” discharging its output from $V_{DD}$ to Gnd during the evaluation phase, plus the energy to charge the node back from $V_f$ to $V_{DD}$ during the precharge phase, that is

(4)

Similarly, the corresponding energy dissipated on the second gate will be

(5)

Fig. 3. Proposed charge recycling switch (SW).

From (4) and (5), it follows that the total energy dissipation $E_r$ when a full recycle phase is applied will be equal to

$$E_r = (C_1 + C_2) V_{DD}^2 - \frac{C_1 C_2}{C_1 + C_2} V_{DD}^2 \qquad (6)$$

where $C_1$ and $C_2$ are the equivalent output capacitances of the first and the second gate. From (1) and (6), we can easily derive that $E_r < E$, resulting in energy reduction when the proposed technique is applied. In general, the energy reduction factor $R = (E - E_r)/E$ is provided by (7)

$$R = \frac{C_1 C_2}{(C_1 + C_2)^2} \qquad (7)$$

The maximum value of $R$ is achieved when $C_1 = C_2$ and it is equal to 0.25. Although in the previous analysis a full recycle phase is considered, energy savings can also be achieved with reduced recycle times, since a portion of the charge is still reused.

III. PROPOSED CHARGE RECYCLING TECHNIQUE

The energy reduction discussed in the previous section concerns an ideal switch SW. In practice, we need an efficient switching mechanism that permits charge recycling only after an active evaluation phase and never in the opposite case, where extra energy dissipation may take place. A suitable recycle switch SW is proposed for this purpose. It consists of a diode-connected transistor (M5) in series with a pass transistor (M6), as illustrated in Fig. 3. The diode-connected transistor determines the direction of the charge flow and ensures that charge transfer will occur only from the charged output node of the second gate to the discharged output node of the first one. Moreover, the pass transistor is driven by the complementary clock signal, CLKB, and ensures that the charge transfer will not take place during an active evaluation phase but just after it. In case a recycle phase is inserted into the circuit’s operation, after an evaluation phase and before a precharge phase, the modified clock signal, CLKM, is also used to drive the transistor M1 and its complement, CLKMB, to drive M4, according to the signal waveforms of Fig. 2(c).
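The ideal-switch analysis of Section II can be checked numerically with a minimal sketch. It assumes supply-side accounting: the energy of a cycle is whatever is drawn from $V_{DD}$, an ideal switch, and a full recycle phase. The function name `ideal_recycle`, the 10 fF capacitances, and the 1.8 V supply are hypothetical choices made for illustration; they are not taken from the paper.

```python
def ideal_recycle(c1, c2, vdd):
    """First-order energy bookkeeping for an ideal recycle switch.

    c1, c2: output capacitances of the first (nMOS) and second (pMOS) gate [F].
    vdd:    supply voltage [V].
    Returns (v_final, e_original, e_recycled, reduction_factor).
    """
    # After an active evaluation only the second gate's output holds charge.
    q_total = c2 * vdd
    # Full recycle phase: both outputs settle to the same voltage.
    v_final = q_total / (c1 + c2)
    # Original circuit: each node draws C * Vdd^2 from the supply per active cycle.
    e_original = (c1 + c2) * vdd ** 2
    # Recycling circuit (assumed accounting): the second gate is still charged
    # from Gnd to Vdd, the first gate only from v_final up to Vdd.
    e_recycled = c2 * vdd ** 2 + c1 * vdd * (vdd - v_final)
    reduction = (e_original - e_recycled) / e_original
    return v_final, e_original, e_recycled, reduction


if __name__ == "__main__":
    # Sweep the capacitance ratio C1/C2 with illustrative values.
    for ratio in (0.25, 0.5, 1.0, 2.0, 4.0):
        v_f, e0, e1, r = ideal_recycle(ratio * 10e-15, 10e-15, 1.8)
        print(f"C1/C2={ratio:4.2f}  Vf={v_f:.3f} V  reduction={r:.3f}")
```

Under these assumptions the reduction depends only on the capacitance ratio and peaks for equal capacitances, which agrees with the 0.25 maximum stated in the text.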


The circuit operation is as follows. In an active evaluation phase (CLK = “high” and CLKM = “high”) the output node of the first gate is discharged through MN and M2 while the output node of the second gate is charged to $V_{DD}$ through M3 and MP. The switch is in the “off” state. Next, during the recycle phase (CLK = “low” and CLKM = “high”) the circuit is disconnected from the power supplies, since the clocked transistors M1-M4 are in the “off” state. However, the transistor M6 turns to the conducting state and permits charge transfer from the output node of the second gate to the output node of the first gate, discharging the former and charging the latter. Finally, in the precharge phase that follows (CLK = “low” and CLKM = “low”), the output node of the first gate is further charged up to $V_{DD}$ through M1 while the output node of the second gate is further discharged to Gnd through M4, according to the standard circuit precharge mechanism. In case there is no recycle phase present (CLKM = CLK), the circuit operation is the same as that of the standard NORA design, except that at the beginning of the precharge phase a part of the charge stored on the output node of the second gate is transferred to the output node of the first gate through the switch.

An interesting remark is that, since the proposed switch is activated after the evaluation phase, there are no unwanted (switch-related) charge redistribution phenomena at the circuit internal nodes during an inactive evaluation phase. Charge redistribution in the evaluation phase is a well-known problem in dynamic designs.

Similar to the analysis in Section II, at the end of an active evaluation phase the output node of the second gate is charged to $V_{DD}$ with a charge equal to $Q = C_2 V_{DD}$, while the output node of the first gate is discharged ($C_1$ and $C_2$ being the equivalent output capacitances of the first and the second gate). Next, let us consider the application of a full recycle phase, which is defined as a recycle phase whose duration ends when the current through the switch SW turns to zero. In that case the final voltages at the output nodes of the first and the second gate will be $V_1$ and $V_2$, respectively, while the corresponding charges will be $Q_1$ and $Q_2$, respectively, and the following equations will stand:

$$Q_1 = C_1 V_1, \quad Q_2 = C_2 V_2, \quad Q_1 + Q_2 = Q = C_2 V_{DD} \qquad (8)$$

These equations hold under the assumption that the charge trapped on the node between the diode-connected transistor M5 and the pass transistor M6 is too low compared to $Q_1$ and $Q_2$ and thus negligible. Furthermore, it is proven that at the end of a full recycle phase the relation between the final voltages $V_1$ and $V_2$ is

$$V_1 = V_2 - V_{tn} \qquad (9)$$

where $V_{tn}$ denotes the threshold voltage of an nMOS transistor. Indeed, at the end of a full recycle phase the current through the switch is zero. The current through M5 becomes zero when $V_{GS} = V_{tn}$, where $V_{GS}$ is the gate to source voltage of the diode-connected transistor M5. In that case it stands that

(10)

which involves the voltage on the node between M5 and M6. Moreover, in order for the current through M6 to be also zero, two cases are possible: i) $V_{GS} \le V_{tn}$; or ii) $V_{DS} = 0$, where $V_{GS}$ and $V_{DS}$ are the gate to source voltage and the drain to source voltage of transistor M6, respectively. Next, each case is considered separately.

i) For $V_{GS} \le V_{tn}$, the conditions that hold, considered together with (10), lead to a relation which is not valid. Consequently, assumption i) is not true.

ii) For $V_{DS} = 0$, the two terminals of M6 are at the same voltage and, by also considering (10), it comes out that $V_1 = V_2 - V_{tn}$.

The above analysis shows that (9) is true. From (8) and (9), it is easy to derive new expressions for the voltages $V_1$ and $V_2$

$$V_1 = \frac{C_2 (V_{DD} - V_{tn})}{C_1 + C_2}, \quad V_2 = \frac{C_2 V_{DD} + C_1 V_{tn}}{C_1 + C_2} \qquad (11)$$

and (9), (11) lead to

(12)

Thus, the energy dissipated on the first gate in a clock cycle that contains an active evaluation phase will be

(13)

Similarly, the corresponding energy dissipated on the second gate will be

(14)

Summing the dissipated energies of (13) and (14), we get the total energy dissipation of the circuit in a clock cycle when a full recycle phase is applied

(15)

The dissipated energy utilizing the proposed switch is higher than that of the ideal switch but still lower than that of the original circuit. The new energy reduction factor is given by the following equation:

(16)

Equation (16) shows that the maximum energy reduction is provided when the two output capacitances are equal ($C_1 = C_2$) and that it is better for small $V_{tn}$ values. For example, in a 0.18-μm technology, the energy reduction factor is evaluated with respect to that of the ideal switch. In Fig. 4, a graphical representation of the energy reduction factor, for the above paradigm, as a function of the capacitances ratio is given.

Fig. 4. Energy reduction factor as a function of the capacitances ratio (r).

In order to be more precise, the body effect of transistor M5 should also be considered in (16). The threshold voltage can be written as

$$V_{tn} = V_{tn0} + \gamma \left( \sqrt{|2\phi_F| + V_{SB}} - \sqrt{|2\phi_F|} \right) \qquad (17)$$

where $V_{SB}$ is the source to bulk voltage of M5, which is equal to its source voltage since the bulk is grounded, $V_{tn0}$ is the threshold voltage of the nMOS transistor when $V_{SB} = 0$, $\gamma$ the gamma parameter, and $|2\phi_F|$ the surface inversion potential.

Additionally, in the energy dissipation of the circuit under consideration we should take into account the energy dissipated to switch “on” and “off” the transistor M6. This additional term will be equal to $C_{sw} V_{DD}^2$, where $C_{sw}$ is the equivalent capacitance at the gate of M6. Thus

(18)

If we express $C_{sw}$ relative to the node capacitances (the relative switch capacitance), then (18) can be written as

(19)

Considering for simplification the case where $C_1 = C_2$, (19) takes the following form:

(20)

By setting $V_{DD}$ and $V_{tn}$ as previously, the energy reduction factor is expressed in terms of the relative switch capacitance alone. Note that the expected values of the relative switch capacitance are very small, especially for large node capacitances $C_1$ and $C_2$.

However, an important parameter in the estimation of the overall energy dissipation and the possible reductions achieved by the proposed design technique is the switching activity of the corresponding gates. Let us define as $s$ the switching activity factor of the two gates in Fig. 3, that is, the probability that their outputs change state from the precharge value during the evaluation phase. Then, from (1), the average energy dissipation of the original circuit in Fig. 2(a), for $C_1 = C_2$, will be

(21)

Considering (20), the average energy dissipation of the recycling circuit in Fig. 3 should be written as follows:

(22)

The second term of (22) expresses the average energy dissipation of the circuit when there is not any switching activity on the two output nodes. Actually, this is the contribution of the switch to the energy dissipation, since also in that case it dissipates energy switching on and off. According to the expressions of the energy dissipation for the original (21) and the recycling (22) versions of the circuit, and setting for simplification $V_{DD}$ and $V_{tn}$ as previously, the energy reduction factor takes the following form:

(23)

As expected, (23) shows that the switching activity is an important parameter in the decision to apply the recycling mechanism in a circuit. In Fig. 5 a 3-D graphical representation of the energy reduction factor as a function of the relative switch capacitance and the switching activity is presented. The higher the switching activity, the higher will be the energy reduction factor. Fortunately, the switching activity factor of dynamic gates is quite high, much higher than that of full CMOS gates [30]. For example, a two-input full CMOS NOR gate has a switching factor equal to 3/16 (when the probability of logic “high” for each one of the inputs is 1/2) while the switching factor of its dynamic version is 3/4.

Fig. 5. Energy reduction factor as a function of the relative switch capacitance and the switching activity (s).

Finally, note that the adopted clocking scheme eliminates the short-circuit current, resulting in further energy reduction. Practically, in a NORA design short-circuit currents may be present only during the transition from the evaluation phase to the precharge phase. However, the insertion of the recycle phase between these two phases, where the clocked transistors M1-M4 are in the nonconducting state, results in the elimination of the short-circuit currents.

IV. CASE STUDIES

The proposed low-power design technique has been applied to a commonly used, fast, NOR-type NORA 4×16 decoder and a NORA carry look-ahead (CLA) adder. The two circuits have been designed in a standard 0.18-μm CMOS technology.

A. Decoder

The decoder circuit is illustrated in Fig. 6(a). It consists of sixteen equivalent stages and each stage is composed of a decoding unit followed by a buffer. In Fig. 6(b), a modified stage is shown where the recycle switch is included. Both transistors of the recycle switch have the minimum size for the used technology.

SPECTRE simulations have been carried out for the evaluation of the proposed technique. In Fig. 7, simulated waveforms of the circuit operation with the three phases are presented. According to the simulations, a total energy reduction higher than 18% can be achieved with a negligible penalty in delay and silicon area. The experimental results are summarized in Table I, considering various recycle times for the proposed design approach. The second column presents the total energy dissipation of the standard and the proposed design techniques. These measurements also take into account the energy dissipation of the clock signals. Next, in the third column, the percentage reduction in the dynamic energy due to the recycling mechanism is given while, in the fourth column, the total energy reduction (including both dynamic and short-circuit-related energies) is provided. As revealed, about 1.5 percentage units of the total energy reduction are due to the short-circuit-related energy reduction.
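The simulated reduction can be set next to a first-order estimate in the spirit of Section III. The sketch below assumes that the recycle switch stops conducting once the output of the first gate sits one nMOS threshold below the output of the second gate, and it uses supply-side energy accounting. It ignores the body effect of M5, the gate capacitance of M6, and the switching activity. The function name `threshold_limited_recycle` and the 0.45 V threshold are hypothetical, chosen only for illustration.

```python
def threshold_limited_recycle(c1, c2, vdd, vtn):
    """Estimate final node voltages and energy reduction for a recycle
    switch that drops one nMOS threshold (illustrative assumptions only).

    c1, c2: output capacitances of the first and second gate [F].
    vdd:    supply voltage [V]; vtn: nMOS threshold voltage [V], vtn < vdd.
    Returns (v1, v2, reduction_factor).
    """
    if not 0.0 <= vtn < vdd:
        raise ValueError("expected 0 <= vtn < vdd")
    # Charge conservation: c1*v1 + c2*v2 = c2*vdd, with v1 = v2 - vtn.
    v1 = c2 * (vdd - vtn) / (c1 + c2)
    v2 = v1 + vtn
    # Supply energy per active cycle without recycling.
    e_original = (c1 + c2) * vdd ** 2
    # With recycling the first gate is precharged from v1 instead of Gnd.
    e_recycled = c2 * vdd ** 2 + c1 * vdd * (vdd - v1)
    return v1, v2, (e_original - e_recycled) / e_original


if __name__ == "__main__":
    # Equal 10 fF node capacitances, 1.8 V supply, assumed 0.45 V threshold.
    v1, v2, r = threshold_limited_recycle(10e-15, 10e-15, 1.8, 0.45)
    print(f"V1={v1:.3f} V  V2={v2:.3f} V  reduction={r:.4f}")
```

With a zero threshold the estimate falls back to the ideal-switch value of 0.25 for equal capacitances, so the threshold drop is what separates the practical switch from the ideal one in this simplified picture.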
In order to exclude the energy dissipation of the standard design that is related to the short-circuit phenomenon, we have also applied the modified clock signals (CLKM and CLKMB) to it. The energy dissipation of the standard design excluding the short-circuit-related energy is given within parentheses in Table I. The simulations show that the recycling operation has been completed at 500-ps recycle time; yet by the time of 300 ps, 90% of the recycle operation has been accomplished.

Fig. 6. (a) The 4×16 NORA decoder. (b) Stage of the decoder after the insertion of the recycle switch and the application of the modified clocks.

Fig. 7. Simulated circuit waveforms for a clock cycle.

However, the energy reduction alone is not the only criterion for evaluating a low-power oriented design technique. In case the design technique introduces extra delays in a circuit, it may be desirable for this to be taken into account. A commonly


TABLE I EXPERIMENTAL RESULTS

used metric that combines both the energy dissipation and the time delay of the circuit operation is the energy-delay product [2], [14], [29]. The delay penalty on the evaluation phase of the proposed design technique due to the recycle switch insertion is almost 1.7%. Considering next the precharge phase, it is easy to observe that in general the required precharge duration for a circuit is much smaller than the required evaluation duration. This is due to the fact that the precharge operation occurs in parallel across the whole circuit, while the evaluation takes place serially from the gates nearest to the circuit inputs towards those closer to its outputs. However, from the circuit operation, in a NORA pipeline structure the precharge phase duration must be equal to the evaluation phase duration. Consequently, there is enough time in a precharge phase to insert a recycle operation. Thus, considering the clock period of the system where the decoder may be embedded, the recycle time, being part of the original precharge phase duration, does not introduce any extra delays. In the fifth column of Table I, the percentage improvement in the energy-delay product is presented. Note that the silicon area penalty of the proposed design approach is less than 4%.

B. The 32-bit CLA Adder

The second circuit under consideration is a 32-bit NORA CLA adder. It is constructed as a two-stage pipeline and each stage is composed of four 4-bit CLA units in ripple carry connection. The clock signals are applied to each stage in a complementary fashion, establishing a CLK and a CLKB section, as proposed in [26], and C²MOS latches are used at the outputs of each section. When the first section (stage) is in the precharge phase the second is in the evaluation phase and vice versa. Fig. 8(a) presents the 4-bit CLA unit (the dotted lines denote the modified clock signals of the recycling version) while Fig. 8(b) shows the NORA complex gate of the CLA generator that provides the complement of the corresponding carry [31]. Similar gates provide the rest of the carry outputs for each unit.

Special attention has been given to the XOR gate implementation, since the basic principle of the NORA design style is violated. This is a well-known problem for the NORA XOR gate, where a precharged-to-high signal feeds a gate that also has a precharged-to-high output signal, resulting in an inherent race problem. This problem is solved by utilizing the race-rescue mechanisms proposed in [32] for NORA logic.

Fig. 8. (a) CLA adder topology. (b) The C carry generation complex gate.

The charge recycling switches have been placed in the interface between the propagate- and generate-related NOR/NAND gates and the CLA generator, as well as the interface between the CLA generator and the XOR gates. Minimum size transistors are used in the design of the switches. The original CLA adder has a stage evaluation delay of 683 ps. The stage evaluation delay of its charge recycling version is equal to 696 ps,


higher than that of the original one by 13 ps, due to the addition of the charge recycling switches. This results in a small delay penalty (1.8% delay increase) for the recycling version. Due to the pipeline structure of the circuit, the time of the evaluation phase equals the time of the precharge phase. Since the recycling phase requires about 300 ps for a complete charge recycling operation and less than 100 ps are needed for the remaining precharging, we conclude that the recycling phase takes place without adding any extra delay with respect to the original circuit. Both circuits have been designed with minimum size precharge transistors.

According to the simulations, the energy reduction achieved by the recycling adder version over the original one, for ten thousand randomly generated [linear feedback shift register (LFSR) based] input vectors, is 7.6% and the energy-delay product reduction is 5.9%. Finally, the silicon area cost is 5.7%. The above results show that the energy reduction achieved in the decoder is greater than the one achieved in the CLA adder; this is mainly due to the higher switching activity of the decoder’s gates, which results in higher energy savings, according to Section III.

The distribution of an extra clock signal, like “CLKM,” in the entire circuit contributes considerably to the increase of energy dissipation. To alleviate this, the local generation of “CLKM” for each recycling unit of the dynamic circuit has been adopted, leading to very small additional energy dissipation. The generation of the “CLKM” signal is quite simple to accomplish, by replacing the final buffer that provides the clock signal to the unit with the buffer design that is illustrated in Fig. 9. This circuitry has been used in the above designs. Note that possible variations in the propagation delay of the dashed delay chain are not a critical design constraint. These variations affect the duration of the recycle phase. However, the charge recycling has an exponential behaviour in time, and possible variations at the clock edge that defines the end of a full recycle phase have a very small effect on the overall recycled charge. Thus, the time duration of the recycle phase can be specified in a more or less relaxed manner, provided that the subsequent precharge phase will be given the required time duration.

Fig. 9. Local “CLKM” signal generation unit.

V. CONCLUSION

A low-power design technique for NORA circuits is presented in this work. It is based on the charge recycling approach and uses a unidirectional charge transfer topology and a new clocking scheme to allow charge recycling. This way, part of the charge used in a circuit operational phase is reused in a subsequent phase, providing reductions in the energy required for the circuit operation. Moreover, due to the proposed clocking scheme, the elimination of the short-circuit current is achieved. The proposed technique is characterized by an insignificant delay penalty, so that considerable reductions in the energy-delay product can be achieved; these are also verified by the simulation results derived on circuit designs in a 0.18-μm CMOS technology.
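Two of the figures reported in Section IV can be related through simple arithmetic. The sketch below assumes a single-time-constant exponential for the recycle operation, calibrated on the reported 90% completion at 300 ps, and a plain product model for the energy-delay product. Both models and the function names are illustrative assumptions, not the authors' simulation setup.

```python
import math


def time_constant_ps(t_ref_ps, fraction_at_ref):
    """Time constant of an assumed first-order recycle transient,
    calibrated so that `fraction_at_ref` of the charge has moved at t_ref_ps."""
    return -t_ref_ps / math.log(1.0 - fraction_at_ref)


def recycled_fraction(t_ps, tau_ps):
    """Fraction of the recyclable charge transferred after t_ps."""
    return 1.0 - math.exp(-t_ps / tau_ps)


def edp_change(energy_reduction, delay_increase):
    """Relative change of the energy-delay product (negative means improvement)."""
    return (1.0 - energy_reduction) * (1.0 + delay_increase) - 1.0


if __name__ == "__main__":
    tau = time_constant_ps(300.0, 0.90)  # calibration point from Section IV
    for t in (100.0, 200.0, 300.0, 400.0, 500.0):
        print(f"t={t:5.0f} ps  recycled={recycled_fraction(t, tau):.3f}")
    # Adder figures from Section IV: 7.6% energy reduction, 1.8% delay increase.
    print(f"EDP change: {edp_change(0.076, 0.018):+.4f}")
```

Under the exponential assumption the transfer is already close to complete well before 500 ps, which illustrates why the exact position of the clock edge ending the recycle phase has little effect on the recycled charge. The product model gives an energy-delay reduction of about 5.9% for the adder, in line with the reported value.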

ACKNOWLEDGMENT The authors would like to thank the anonymous referees for many useful comments which greatly improved the manuscript. REFERENCES [1] A. P. Chandrakasan, S. Sheng, and R. W. Brodersen, “Low-power CMOS digital design,” IEEE J. Solid-State Circuits, vol. 27, no. 4, pp. 473–484, Apr. 1992. [2] K. Roy and S. Prasad, Low-Power CMOS VLSI Circuit Design. Singapore: Wiley Interscience, 2000. [3] K. Y. Khoo and A. Willson, “Low power CMOS clock buffer,” in Proc. IEEE Int. Symp. Circuits Syst., 1994, vol. 4, pp. 355–358. [4] S. P. Mohanty and N. Ranganathan, “Simultaneous peak and average power minimization during data path scheduling,” IEEE Tran. Circuits Syst. I, Re. Papers, vol. 52, no. 6, pp. 1157–1165, Jun. 2005. [5] T. Kawahara, Y. Kawajiri, M. Horiguchi, T. Akiba, G. Kitsukawa, T. Kure, and M. Aoki, “A charge recycle refresh for Gb-scale DRAMs in file applications,” IEEE J. Solid-State Circuits, vol. 29, no. 7, pp. 715–722, Jul. 1994. [6] E. D. Kyriakis-Bitzaros and S. S. Nikolaidis, “Design of low-power CMOS drivers based on charge recycling,” in Proc. IEEE Int. Symp. Circuits Syst., 1997, pp. 1924–1927. [7] X. Wang and W. Porod, “A low-power charge-recycling CMOS clock buffer,” in Proc. 9th Great Lakes Symp. VLSI, 1999, pp. 238–239. [8] H. Yamauchi, H. Akamatsu, and T. Fujita, “An asymptotically zero power charge-recycling bus architecture for battery-operated ultrahigh data rate ULSIs,” IEEE J. Solid-State Circuits, vol. 30, no. 4, pp. 423–431, Apr. 1995. [9] Y. Moisiadis, I. Bouras, and A. Arapoyanni, “A CMOS differential logic for low-power and high-speed applications,” in Proc. Int. Symp. Circuits Syst., 2001, vol. IV, pp. 140–143. [10] B.-D. Yang and L.-S. Kim, “A low-power ROM using charge recycling and charge sharing techniques,” IEEE J. Solid-State Circuits, vol. 38, pp. 641–653, Jun. 2003. [11] A. Abbasian and A. Afzali-Kusha, “Pipeline event-driven no-race charge recycling logic (PENCL) for low-power application,” in Proc. Int. Conf. 
Electronic Circuits Syst., 2003, vol. 1, pp. 220–223. [12] Y. Tsiatouhas, K. Limniotis, A. Arapoyanni, and T. Haniotakis, “A low-power NORA circuit design technique based on charge recycling,” in Proc. Int. Conf. Electronic Circuits Syst., 2003, vol. 1, pp. 224–227. [13] H. Kojima, S. Tanaka, and K. Sasaki, “Half-swing clock scheme for 75% power saving in clocking circuitry,” IEEE J. Solid-State Circuits, vol. 30, no. 4, pp. 432–435, Apr. 1995. [14] H. Zhang and J. M. Rabaey, “Low-swing interconnect interface circuits,” in Proc. Int. Symp. Low-Power Electronics Design, Aug. 1998, pp. 161–166. [15] W. Athas, L. Svensson, J. Koller, N. Tzartzanis, and E. Chou, “Lowpower digital systems based on adiabatic-switching principles,” IEEE Trans. Very Large Scale Integr. (VLSI) Syst., vol. 2, no. 4, pp. 398–407, Dec. 1994.


[16] S. Younis and T. Knight, “Asymptotically zero energy split-level charge recovery logic,” in Proc. Int. Symp. Low-Power Electronics Design, Aug. 1994, pp. 177–182.
[17] A. G. Dickinson and J. S. Denker, “Adiabatic dynamic logic,” IEEE J. Solid-State Circuits, vol. 30, no. 3, pp. 311–315, Mar. 1995.
[18] K.-Y. Cheung, “CRRDL: A novel charge recovery-recycling differential logic,” in Proc. Int. Symp. Circuits Syst., 2001, vol. 4, pp. 152–153.
[19] P. Girard, C. Landrault, S. Pravossoudovitch, and D. Severac, “A noniterative gate resizing algorithm for high reduction in power consumption,” Integr. VLSI J., vol. 24, pp. 37–52, 1997.
[20] T. Sakurai, “Reducing power consumption of CMOS VLSIs through VDD and VTH control,” in Proc. Int. Symp. Quality of Electronic Design, 2000, pp. 417–423.
[21] B.-S. Kong, J.-S. Choi, S.-J. Lee, and K. Lee, “Charge recycling differential logic (CRDL) for low-power application,” IEEE J. Solid-State Circuits, vol. 31, no. 9, pp. 1267–1276, Sep. 1996.
[22] J. Lee, J. Park, B. Song, and W. Kim, “Split-level precharge differential logic: A new type of high-speed charge-recycling differential logic,” IEEE J. Solid-State Circuits, vol. 36, no. 8, pp. 1276–1280, Aug. 2001.
[23] D. Somasekhar and K. Roy, “Differential current switch logic: A low-power DCVS logic family,” IEEE J. Solid-State Circuits, vol. 31, no. 7, pp. 981–991, Jul. 1996.
[24] A. Rjoub, O. Koufopavlou, and S. Nikolaidis, “Low-power/low-swing Domino CMOS logic,” in Proc. IEEE Int. Symp. Circuits Syst., 1998, vol. 2, pp. 13–16.
[25] A. Rao, T. Haniotakis, Y. Tsiatouhas, and H. Djemil, “The use of pre-evaluation phase in dynamic CMOS logic,” in Proc. IEEE CS Ann. Symp. VLSI, 2005, pp. 270–271.
[26] N. F. Goncalves and H. J. De Man, “NORA: A racefree dynamic CMOS technique for pipelined logic structures,” IEEE J. Solid-State Circuits, vol. SC-18, no. 3, pp. 261–266, Jun. 1983.
[27] K. Bernstein, K. Carrig, C. Durham, P. Hansen, D. Hogenmiller, E. Nowak, and N. Rohrer, High Speed CMOS Design Styles. Norwell, MA: Kluwer, 1999.
[28] F. Moll, M. Roca, and E. Isern, “Analysis of dissipation energy of switching digital CMOS gates with coupled outputs,” Microelectron. J., vol. 34, pp. 833–842, 2003.
[29] Low Power Design Methodologies, J. M. Rabaey and M. Pedram, Eds. Norwell, MA: Kluwer, 1996.
[30] J. M. Rabaey, A. Chandrakasan, and B. Nikolic, Digital Integrated Circuits: A Design Perspective. Englewood Cliffs, NJ: Prentice-Hall, 2003.
[31] A. Bellaouar and M. I. Elmasry, Low-Power Digital VLSI Design: Circuits and Systems. Norwell, MA: Kluwer, 1995.
[32] C.-H. Huang, J.-S. Wang, C. Yeh, and C.-J. Fang, “The CMOS carry-forward adders,” IEEE J. Solid-State Circuits, vol. 39, pp. 327–336, 2004.

Konstantinos Limniotis received the B.Sc. degree in informatics and the M.Sc. degree in communications systems and networks from the Department of Informatics and Telecommunications, University of Athens, Athens, Greece, in 1999 and 2002, respectively. He is currently working toward the Ph.D. degree at the same university. His research interests include VLSI design and low-voltage low-power design.


Yiorgos Tsiatouhas (M’98) received the B.S. degree in physics, the M.S. degree in electronic automation, and the Ph.D. degree in computer science from the University of Athens, Athens, Greece, in 1990, 1993, and 1998, respectively. From 1992 to 1996, he was with the National Center of Scientific Research “Demokritos,” Athens, Greece. From 1998 to 2002, he was with Integrated Systems Development (ISD) S.A. as Cooperative Projects Director and Technical Manager of the Advanced Silicon Solutions Group. In 2002, he joined the Department of Computer Science, University of Ioannina, Ioannina, Greece, as a Lecturer. His research interests include low-voltage low-power design, memory design, VLSI testing, and design for testability. He is the main author or coauthor of more than 50 papers in scientific periodicals and conferences as well as two filed patents. Prof. Tsiatouhas is a member of the EDAA and the IEEE Test Technology Technical Council as well as a member of the IEEE International On-Line Testing Symposium program committee. He received the Best Paper Award of the 2002 International Symposium of Quality Electronic Design.

Themistoklis Haniotakis (M’98) received the B.S. degree in physics and the Ph.D. degree in computer science from the University of Athens, Athens, Greece, in 1991 and 1997, respectively. From 1991 to 1995, he was with the National Centre of Scientific Research (NCSR) “Demokritos,” Athens, Greece. From 1998 to 1999, he was a Senior Engineer at Integrated Systems Development S.A. In 2000, he joined the Department of Electrical and Computer Engineering, Southern Illinois University, USA, as an Assistant Professor. His interests include VLSI design, fault-tolerant computing, VLSI testing, and design for testability. Prof. Haniotakis is the main author or coauthor of more than fifty papers in scientific periodicals and conferences as well as one filed patent. He received the Best Paper Award of the 2002 International Symposium of Quality Electronic Design. He is a member of the IEEE and of the Program Committee of the IEEE International On-Line Testing Symposium.

Angela Arapoyanni (M’99) received the B.S. degree in physics, the M.S. degrees in electronics and radioelectricity and in electronical automatism, and the Ph.D. degree in physics from the University of Athens, Athens, Greece, in 1973, 1975, 1976, and 1983, respectively. She was an Assistant at the Laboratory of Electronical Physics, University of Athens, from 1974 to 1983, Lecturer in the Department of Physics, Division of Applied Physics, University of Athens, from 1983 to 1988, and an Assistant Professor in Optoelectronics in the same Department from 1988. She is currently an Associate Professor in the Department of Informatics and Telecommunications, University of Athens. Since 1979, she has participated in the Optoelectronics Research Group of the University of Athens. Since 1985, she has been teaching microelectronics to the students of physics and later to the students of informatics. She is the main author or coauthor of more than 65 papers in scientific periodicals and conferences. Prof. Arapoyanni received the Best Paper Award of the 2002 International Symposium of Quality Electronic Design.