
Proceedings of 2014 RAECS UIET Panjab University Chandigarh, 06 – 08 March, 2014

HIGH SPEED VEDIC MULTIPLIER DESIGNS - A REVIEW

Yogita Bansal

Charu Madhu

Pardeep Kaur

Department of Electronics and Communication, University Institute of Engineering and Technology, Panjab University, Chandigarh, India.

Abstract: Multipliers are key blocks in high speed arithmetic logic units, multiply and accumulate units, digital signal processing units, etc. With increasingly tight constraints on delay, more and more emphasis is being laid on the design of faster multipliers. To enhance speed, many modifications of the standard modified Booth algorithm and Wallace tree methods for multiplier design have been made, and several new techniques are being worked on. Among these, Vedic multipliers based on Vedic mathematics are presently in focus because they are among the fastest and lowest power multipliers. There are sixteen sutras in Vedic mathematics, of which “Urdhva Tiryakbhyam” has been found to be the most efficient in terms of speed. A large number of high speed Vedic multipliers based on the Urdhva Tiryakbhyam sutra have been proposed. A few of them are presented in this paper, giving an insight into their methodology, merits and demerits. Compressor based Vedic multipliers show considerable improvements in speed and area efficiency over the conventional ones.

Keywords: Vedic Multiplier; Urdhva Tiryakbhyam (UT); Compressor Adder.

I. INTRODUCTION

Vedic multipliers are based on Vedic Sutras. The Sanskrit word ‘Veda’ stands for ‘knowledge’. Vedic mathematics is believed to have been reconstructed from the Vedas by Sri Bharti Krishna Tirathaji between 1911 and 1918 [1]. It is organized into sixteen Sutras that can be applied to any branch of mathematics, such as algebra, trigonometry and geometry. Its methods reduce complex calculations to simpler ones because they resemble the way the human mind works. Because the resulting designs are coherent and symmetrical, they have been reported to consume less power and occupy less chip area [1]. Designs based on Vedic mathematics have been used in many applications, such as ALUs and MAC units, and have shown better results [2-6].

II. VEDIC MATHEMATICS SUTRAS

Vedic Mathematics deals with sixteen Sutras [7]. They are listed below alphabetically with their brief meanings. Each has been studied extensively, and a discussion of all of them is beyond the scope of this paper. Only Sutra 14, “Urdhva Tiryakbhyam”, is discussed.

978-1-4799-2291-8/14/$31.00 ©2014 IEEE

1. Anurupye Shunyamanyat – If one is in ratio, the other is zero
2. Chalana-Kalanabhyam – Differences and similarities
3. Ekadhikina Purvena – By one more than the previous one
4. Ekanyunena Purvena – By one less than the previous one
5. Gunakasamuchyah – The factors of the sum are equal to the sum of the factors
6. Gunitasamuchyah – The product of the sum is equal to the sum of the product
7. Nikhilam Navatashcaramam Dashatah – All from 9 and the last from 10
8. Paraavartya Yojayet – Transpose and adjust
9. Puranapuranabhyam – By the completion or non-completion
10. Sankalana-vyavakalanabhyam – By addition and by subtraction
11. Shesanyankena Charamena – The remainders by the last digit
12. Shunyam Saamyasamuccaye – When the sum is the same, that sum is zero
13. Sopaantyadvayamantyam – The ultimate and twice the penultimate
14. Urdhva Tiryakbhyam – Vertically and crosswise
15. Vyashtisamanstih – Part and whole
16. Yaavadunam – Whatever the extent of its deficiency

A. Urdhva Tiryakbhyam

This sutra is based on the “Vertically and Crosswise” technique. It makes almost all numeric computations faster and easier. The advantage of a multiplier based on this sutra over others is that, as the number of bits increases, its area and delay increase at a smaller rate [3]. In Fig. 1, the method is illustrated with the multiplication of two decimal numbers, 325 and 738. The number of steps in the process depends on the number of digits. The digits at the two ends of each line are multiplied and the result is added to the carry from the previous step. When a step has more than one crossing line, all the products are added together along with the previous carry. Only the least significant digit of the resulting number is then taken as the product digit, and the rest are carried to the next step. The initial carry is taken as zero [8].
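The vertically-and-crosswise steps described above can be sketched in a few lines of Python. This is only an illustrative software model, not code from any of the reviewed papers; the function name `urdhva_tiryakbhyam` and the choice to pad both operands to a common number of digits are assumptions made for the example.

```python
def urdhva_tiryakbhyam(x: int, y: int) -> int:
    """Multiply two non-negative integers digit by digit (illustrative name)."""
    # Digits stored least significant first, padded to a common length.
    n = max(len(str(x)), len(str(y)))
    a = [int(d) for d in str(x).zfill(n)][::-1]
    b = [int(d) for d in str(y).zfill(n)][::-1]

    digits = []  # product digits, least significant first
    carry = 0    # the initial carry is zero
    for step in range(2 * n - 1):
        # Crossing lines of this step: digit pairs whose positions sum to `step`.
        total = carry + sum(a[i] * b[step - i]
                            for i in range(n) if 0 <= step - i < n)
        digits.append(total % 10)  # least significant digit is the product digit
        carry = total // 10        # the rest is carried to the next step

    # Any carry left after the last step forms the leading digits.
    result = carry
    for d in reversed(digits):
        result = result * 10 + d
    return result


print(urdhva_tiryakbhyam(325, 738))  # 239850
```

For 325 and 738 the five steps produce the totals 40, 35, 68, 29 and 23, which give the product digits 0, 5, 8, 9, 3 and a final carry of 2.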


Fig. 1 Multiplication of two decimal numbers using Urdhva Tiryakbhyam [8]

Fig. 3 Block diagram for 2X2 Vedic Multiplier [9]

Another way of carrying out the Urdhva Tiryakbhyam method is shown in Fig. 2. In this technique, the numbers to be multiplied, say 5498 and 2314, are written on two adjacent sides of a square table. The square is partitioned into rows and columns, with each row or column belonging to one digit of the two numbers, so that every digit of one number shares a small square with every digit of the other. Each small square is further divided into two equal parts by crosswise lines. Each digit of one number is then multiplied by every digit of the second number, and the two-digit products are placed in their corresponding squares. The digits along each crosswise (dotted) line are added together with the previous carry. The least significant digit of the resulting number is taken as the product digit and the rest are considered carry digits. The initial carry is again assumed to be zero [7].

The method can be extended to binary numbers. A 1-bit binary multiplication is simply an AND gate operation. Using this and the UT method, the 2X2 multiplication of a1a0 and b1b0 is implemented with 2 half adders, and the resultant bits are r2 (2 bits), r1 and r0, as shown in Fig. 3. The equations for this are given below [9].

Higher-order binary multipliers can be built from lower-order multiplier units and adder units. The individual partial products are obtained by the same partitioning method, ultimately using the 2X2 bit multiplier. An NXN multiplication unit requires four N/2-bit multipliers, two N-bit full adders, one half adder, and an N/2-bit full adder to add the sum and carry of the half adder, as shown in Fig. 5 [11-12]. The speed of the multiplier depends heavily on the speed of the adder units used.
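The partitioning of an NXN multiplier into four N/2-bit multipliers can be modelled recursively. The sketch below is an assumption-laden behavioural model rather than the hardware of Fig. 5: the name `vedic_multiply` is hypothetical, N is assumed to be a power of two, the 2X2 base block is replaced by a plain integer product, and Python's `+` stands in for the adder units.

```python
def vedic_multiply(a: int, b: int, n: int) -> int:
    """Multiply two n-bit unsigned integers; n is assumed to be a power of two >= 2."""
    if n == 2:
        # Stand-in for the 2X2 Vedic block (AND gates and half adders in hardware).
        return (a & 0b11) * (b & 0b11)

    half = n // 2
    mask = (1 << half) - 1
    a_lo, a_hi = a & mask, a >> half
    b_lo, b_hi = b & mask, b >> half

    # Four N/2-bit multiplications.
    p0 = vedic_multiply(a_lo, b_lo, half)
    p1 = vedic_multiply(a_hi, b_lo, half)
    p2 = vedic_multiply(a_lo, b_hi, half)
    p3 = vedic_multiply(a_hi, b_hi, half)

    # The adder stage: the cross terms weigh 2^(N/2), the high term 2^N.
    return p0 + ((p1 + p2) << half) + (p3 << n)


print(vedic_multiply(13, 11, 4))    # 143
print(vedic_multiply(200, 150, 8))  # 30000
```

In this model the recursion depth is log2(N) - 1, and every level contributes one adder stage, which is why the adder choice discussed in the following section dominates the delay.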

r0 (1 bit) = a0b0    (1)
r1 (1 bit) = a0b1 + a1b0    (2)
r2 (2 bits) = a1b1 + c1    (3)
Product = r2 & r1 & r0    (4)
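Equations (1) to (4) can be checked with a small gate-level model. The sketch assumes that c1 is the carry of the half adder that forms r1, that the two bits of r2 are the sum and carry of the second half adder, and that "&" in (4) denotes concatenation; the function names are illustrative.

```python
def half_adder(x: int, y: int) -> tuple[int, int]:
    """Return (sum, carry) of two bits."""
    return x ^ y, x & y


def vedic_2x2(a1: int, a0: int, b1: int, b0: int) -> tuple[int, int, int, int]:
    """2X2 multiplier from AND gates and two half adders; result bits MSB first."""
    r0 = a0 & b0                            # equation (1)
    r1, c1 = half_adder(a0 & b1, a1 & b0)   # equation (2); c1 is its carry
    r2_lo, r2_hi = half_adder(a1 & b1, c1)  # equation (3); r2 is two bits wide
    return r2_hi, r2_lo, r1, r0             # equation (4): concatenation


print(vedic_2x2(1, 1, 1, 1))  # (1, 0, 0, 1), i.e. 3 x 3 = 9
```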

Fig. 4 shows the general calculations for the 4-bit Vedic multiplier using the above sutra [10].

Fig. 2 Alternative way to calculate the Urdhva Tiryakbhyam [7]
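The square-table procedure of Fig. 2 can be sketched as follows. This is a minimal model under stated assumptions: the name `lattice_multiply` is hypothetical, and the two halves of each small square are taken to hold the tens and units digit of the corresponding digit product, with each crosswise line collecting the entries of equal decimal weight.

```python
def lattice_multiply(x: int, y: int) -> int:
    """Multiply two non-negative integers with the square-table method."""
    a = [int(d) for d in str(x)]  # digits of one number, most significant first
    b = [int(d) for d in str(y)]  # digits of the other number
    n, m = len(a), len(b)

    # One accumulator per crosswise line; the index is the decimal weight.
    lines = [0] * (n + m)
    for i, da in enumerate(a):
        for j, db in enumerate(b):
            tens, units = divmod(da * db, 10)  # the two halves of a small square
            weight = (n - 1 - i) + (m - 1 - j)
            lines[weight] += units
            lines[weight + 1] += tens

    result, carry = 0, 0  # the initial carry is zero
    for weight, total in enumerate(lines):
        total += carry
        result += (total % 10) * 10 ** weight  # product digit
        carry = total // 10                    # carried to the next line
    return result + carry * 10 ** len(lines)


print(lattice_multiply(5498, 2314))  # 12722372
```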

III. APPROACHES FOR HIGH SPEED VEDIC MULTIPLIER

A. Design For Vedic Multiplier With Ripple Carry Adder

Pushpalata [13] proposed an architecture that uses ripple carry adders in the Vedic multiplication unit for 4-bit binary numbers. The architecture can be extended to wider operands, such as 8, 16 and 32 bits. The 4X4 multiplier is implemented using 2X2 multiplier units and ripple carry adders, as shown in Fig. 6. An N-bit ripple carry adder consists of N-1 full adders and one half adder, as shown in Fig. 7 [14]. It is also called a parallel adder because the full and half adders are arranged side by side, each generating a sum bit and a carry bit. The sum bit is taken as the result bit and the carry is passed to the next adder unit as an input. Because the carry ripples and each full adder has to wait for the carry from the previous stage, the worst case computation time is a function of N, which limits the speed of this adder and thus of the proposed design. In this approach, three 4-bit ripple carry adders are used and the combinational path delay is found to be 13.102 ns. The results are compared with array and Booth multipliers, and the Vedic multiplier is observed to have a shorter execution time.
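The carry chain that limits this adder can be seen in a short bit-level model. The sketch below assumes the structure stated above (one half adder for the least significant bit followed by N-1 full adders); the helper names `to_bits` and `from_bits` are introduced only for the example.

```python
def half_adder(x: int, y: int) -> tuple[int, int]:
    return x ^ y, x & y


def full_adder(x: int, y: int, cin: int) -> tuple[int, int]:
    total = x ^ y ^ cin
    cout = (x & y) | (cin & (x ^ y))
    return total, cout


def ripple_carry_add(a_bits: list[int], b_bits: list[int]) -> list[int]:
    """Add two equal-length bit lists (LSB first); returns N+1 bits, LSB first."""
    total, carry = half_adder(a_bits[0], b_bits[0])
    out = [total]
    for x, y in zip(a_bits[1:], b_bits[1:]):
        # Each stage needs the carry of the previous one: this is the ripple.
        total, carry = full_adder(x, y, carry)
        out.append(total)
    out.append(carry)
    return out


def to_bits(value: int, n: int) -> list[int]:
    return [(value >> i) & 1 for i in range(n)]


def from_bits(bits: list[int]) -> int:
    return sum(bit << i for i, bit in enumerate(bits))


print(from_bits(ripple_carry_add(to_bits(11, 4), to_bits(6, 4))))  # 17
```

The loop body cannot start for stage i until stage i-1 has produced its carry, so the number of sequential steps grows linearly with N.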

Fig. 4 Block diagram for 4X4 Vedic multiplier [10]

Fig. 7 Ripple carry adder [14]

Fig. 5 Generalized block diagram for NXN multiplier [12]

B. Low Power And High Speed Vedic Multiplier

A fast and low power 16-bit multiplier architecture was proposed by Bathija, Meena, Sarkar and Sahu [15], replacing the ripple carry adder with a carry lookahead adder (CLA), as in Fig. 8. The adder architecture consists of two parts: a carry generator and a carry propagator. These parts generate the (N+1)th carry bit from the initial carry, so the adder does not need to wait for the Nth carry to propagate. Since the carry is generated in advance, the carry propagation time decreases and the operational speed of the architecture improves. Fig. 9(c) shows a tree-like circuit for a CLA with n=8, which consists of A and B modules [14]. The A module in Fig. 9a produces pi (ith carry propagate), gi (ith carry generate) and the sum bit for inputs Ai and Bi. The B module in Fig. 9b produces the block carry propagate, block carry generate and carry bits, which are used for large values of i. The worst case propagation delay for an n-bit CLA is two unit delays of the A module plus 2 log2 n - 1 unit delays of the B module. The power dissipation and propagation delay are 0.17 mW and 27.15 ns respectively. A comparison of propagation delay, power dissipation and number of transistors between this architecture, the array multiplier and the Booth radix-4 multiplier shows that the Vedic multiplier with carry lookahead adder is better than the other two in speed and power dissipation. However, the proposed architecture uses more transistors.
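The idea of computing every carry directly from the generate and propagate signals and the initial carry can be illustrated with a flat lookahead model. This sketch is not the tree of A and B modules from [14]; it expands each carry in full, and the name `carry_lookahead_add` and the use of XOR for the propagate signal are assumptions of the example.

```python
def carry_lookahead_add(a: int, b: int, n: int, c0: int = 0) -> int:
    """Add two n-bit unsigned integers with flat (non-tree) carry lookahead."""
    g = [(a >> i) & (b >> i) & 1 for i in range(n)]    # carry generate g_i
    p = [((a >> i) ^ (b >> i)) & 1 for i in range(n)]  # carry propagate p_i

    def carry_into(k: int) -> int:
        # c_k = g_{k-1} | p_{k-1} g_{k-2} | ... | p_{k-1} ... p_0 c0,
        # built from g, p and c0 only, never from another carry.
        carry = c0
        for t in range(k):
            carry &= p[t]
        for j in range(k):
            term = g[j]
            for t in range(j + 1, k):
                term &= p[t]
            carry |= term
        return carry

    result = 0
    for i in range(n):
        result |= (p[i] ^ carry_into(i)) << i  # sum bit s_i = p_i xor c_i
    return result | (carry_into(n) << n)       # carry out as bit n


print(carry_lookahead_add(200, 100, 8))  # 300
```

In hardware each `carry_into(k)` is a two-level AND-OR expression, so all carries are available after a fixed number of gate delays; the cost is the wider gates, which is consistent with the larger transistor count noted above.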

Fig. 6 Block diagram for 4x4 Vedic multiplier using ripple carry adder [13]

C. Fast Vedic Multiplier Using Carry Save Adder

A speed-efficient design was proposed by Devika, Kabiraj and Rutuparna [9] for 8 and 16-bit multiplication. In this architecture, the adder unit is a carry save adder (CSA). A MAC design implemented using this architecture has shown better results. Fig. 10 depicts a block diagram of a 4-bit Vedic multiplier with carry save adder. N-bit multiplication requires an N-bit carry save adder and an (N+1)-bit ripple carry adder. The carry save adder adds three or more N-bit operands by producing two N-bit outputs: one holds the partial sum bits and the other holds the carry bits. A normal adder, generally a ripple carry adder, then adds these two outputs to generate the final result. Unlike common adders such as the ripple carry adder and carry lookahead adder, this adder has no carry propagation: its propagation delay is that of a single full adder and does not change with the number of bits n. Therefore, for sufficiently large n, it is faster and smaller [16]. Fig. 11a depicts an n-bit CSA, which uses n full adders and produces an n-bit sum ‘S’ and an n-bit carry ‘C’. The carry save adder has a low transition count and improves speed because the additions are performed in parallel without waiting for a carry from the previous stage. The design is realized on two devices, VERTEX2P:XC2VP2:-7 and SPARTAN3:XC3S50:-4. The combinational delays are 13.07 ns and 25.06 ns respectively for 8-bit multiplication, and 18.58 ns and 36.09 ns respectively for 16-bit multiplication. Appreciable improvements are observed in comparison with the Modified Booth Wallace multiplier.
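The way a carry save adder reduces three operands to a sum word and a carry word can be modelled with bitwise operations. The sketch is a behavioural illustration only; the function names are hypothetical, and Python's `+` stands in for the final carry-propagating adder.

```python
def carry_save_add(x: int, y: int, z: int) -> tuple[int, int]:
    """Reduce three operands to (sum word, carry word) with no carry propagation."""
    sum_word = x ^ y ^ z                       # one full-adder sum per bit position
    carry_word = (x & y) | (y & z) | (x & z)   # one full-adder carry per bit position
    return sum_word, carry_word << 1           # carries weigh one position higher


def add_three(x: int, y: int, z: int) -> int:
    sum_word, carry_word = carry_save_add(x, y, z)
    # Final step: an ordinary adder (e.g. ripple carry) combines the two words.
    return sum_word + carry_word


print(carry_save_add(5, 6, 7))  # (4, 14)
print(add_three(5, 6, 7))       # 18
```

Every bit position of `carry_save_add` is computed independently of its neighbours, which is why the delay of this stage does not depend on the operand width.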

Fig. 8 Block diagram for 16-bit Vedic multiplier with CLA adder [15]

Fig. 9 Tree-like structure for carry lookahead adder [14] a. A module b. B module c. Circuit diagram for CLA adder with A and B modules

Fig 11 a. Block diagram for n bit CSA [16] b. Symbol for CSA [16]

D. Speed Efficient Design For Vedic Multiplier

A tree multiplication design for the Vedic multiplier was proposed by Abhishek, Utsav and Vinod [17]. It uses a new addition tree structure for adding the partially generated products, built on the decimal arithmetic multiplication principle for a three-digit multiplier and multiplicand. A divide and conquer technique is used in this architecture. A 4X4 bit multiplication is divided into four blocks, each handled by a 2-bit multiplier block, producing the partially generated products P3 to P0 [3:0]. In the proposed addition tree structure, P0, the first partial product, is not shifted; P1 and P2, which together form the second partial product term, are shifted by two bits; and the MSB product P3 is shifted by four bits with respect to P0 and by two bits with respect to P1 and P2. These are added to give the final result Q[7:0], as described in Fig. 12a. The technique can be extended to an N-bit Vedic multiplier, as in Fig. 12b. This architecture has given better results than Array, Booth, Wallace, Modified Booth Wallace multipliers, etc. The maximum combinational path delays obtained for 8 and 16-bit multiplication are 11.886 ns and 15.718 ns on device VERTEX2P:XC2VP2:-7, which are lower than those reported in [9].

Fig. 10 Block diagram for 4 bit multiplier using carry save adder [9]

Fig. 12 Addition tree structures a. Addition tree structure for four bit multiplier [17] b. N bit Vedic multiplier with addition tree structure [17]

E. An Approach Using 7:2 Compressor Adders

A novel architecture proposed by Sushma, Sudhir, Kalpana and Surabhi [18], utilizing 4:2 compressors and 7:2 compressors for 4-bit and 8-bit multiplication respectively, appeared to be both area and speed efficient. As the number of additions grows for higher bit multiplications, a circuit is needed that can add the bits in a single step rather than using multiple full adders and half adders. A new method in Vedic multiplication is to use a compressor adder, which can add more than three bits at a time, whereas a full adder can add only 3 bits at a time. Such a circuit effectively counts the number of 1’s. The design reduces the use of XOR gates, which minimizes delay, and uses MUXs that allow only a single input to be high at a time, which shortens the critical delay. Such adders are found to be both high speed and low power circuits [19]. Fig. 13b shows the modified design of the 4:2 compressor, which has less propagation delay than the conventional one in Fig. 13a. A 7:2 compressor built from modified 4:2 compressors (Fig. 14a) has been used to implement an 8X8 bit multiplier (Fig. 14b). The compressor based architecture needs only 12 parallel stages whereas the conventional Vedic multiplier requires 15 stages, which increases the speed. This compressor based Vedic multiplier is almost 1.12, 2.112 and 1.509 times faster than existing Vedic multipliers, Booth multipliers and Modified Booth multipliers, respectively. Furthermore, regarding area, an improvement of 1% over the existing methods and 3% over the Modified Booth algorithm is observed. However, area is increased in comparison with Booth methods.
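The statement that a compressor counts the number of 1's can be made concrete with a 4:2 compressor model. The sketch below uses the common two-full-adder form with a carry input, which is an assumption of this example and not necessarily the modified MUX-based design of [20]; the function names are illustrative.

```python
from itertools import product


def full_adder(x: int, y: int, z: int) -> tuple[int, int]:
    """Return (sum, carry) of three bits."""
    return x ^ y ^ z, (x & y) | (y & z) | (x & z)


def compressor_4_2(x1: int, x2: int, x3: int, x4: int, cin: int) -> tuple[int, int, int]:
    """Compress four bits plus a carry-in into (sum, carry, cout)."""
    s1, cout = full_adder(x1, x2, x3)       # cout does not depend on cin
    total, carry = full_adder(s1, x4, cin)
    return total, carry, cout


# The outputs encode the count of 1's: sum has weight 1, carry and cout weight 2.
for bits in product((0, 1), repeat=5):
    total, carry, cout = compressor_4_2(*bits)
    assert sum(bits) == total + 2 * (carry + cout)

print(compressor_4_2(1, 1, 1, 1, 0))  # (0, 1, 1): four 1's = 0 + 2 * (1 + 1)
```

Because `cout` is produced by the first full adder alone, a row of such compressors passes `cout` sideways without forming a ripple chain.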

Fig. 13 4:2 Compressor a. 4:2 compressor with full adders and half adder [20] b. Modified design for 4:2 compressor [20]

Fig. 14 Compressor Technique a. 7:2 compressor using 4:2 compressor [18] b. Hardware architecture for 8-bit compressor based Vedic multiplier [18]

IV. CONCLUSION AND FUTURE SCOPE

The Vedic multiplier is seen to be efficient in speed, power and area in digital designs compared with other multipliers. Considering all the designs discussed above, we conclude that the compressor based Vedic multiplier using the Urdhva Tiryakbhyam sutra is a promising technique in terms of speed and area. The work can be extended by using such a multiplier in arithmetic logic unit and multiply accumulate unit designs and comparing the results with existing designs.

REFERENCES

[1] Saokar, S. S., R.M., and Siddamal, S.: “High Speed Signed Multiplier for Digital Signal Processing Applications,” Proc. IEEE International Conference on Signal Processing, Computing and Control (ISPCC), Waknaghat Solan, 15-17 March 2012, pp. 1-6.
[2] Kumar, A. and Raman, A.: “Low Power ALU Design by Ancient Mathematics,” presented at IEEE ICAAE, Singapore, Feb. 2010, pp. 862-865.
[3] Hanumantharaju, M.C., Jayalaxmi, H., Renuka, R.K., and Ravishankar, M.: “A High Speed Block Convolution Using Ancient Indian Vedic Mathematics,” IEEE International Conference on Computational Intelligence and Multimedia Applications, Sivakasi, Tamil Nadu, 13-15 Dec. 2007, pp. 169-173.
[4] Prakash, A.R., Kirubaveni, S.: “Performance evaluation of FFT processor using conventional and Vedic algorithm,” IEEE International Conference on Emerging Trends in Computing, Communication and Nanotechnology (ICECCN), Tirunelveli, March 2013, pp. 89-94.
[5] Saha, P., Banerjee, A., Dandapat, A., and Bhattacharyya, P.: “ASIC design of a high speed low power circuit for factorial calculation using ancient Vedic mathematics,” Elsevier Microelectronics Journal, 2011, vol. 42, pp. 1343-1352.
[6] Thanushkodi, K., Deena Dayalan, K., Dharani, P.: “A Novel Time and Energy Efficient Cubing Circuit Using Vedic Mathematics for Finite Field Arithmetic,” IEEE International Conference on Advances in Recent Technologies in Communication and Computing, Kottayam, Kerala, 27-28 Oct. 2009, pp. 873-875.
[7] Tiwari, H.D., Gankhuyag, G., Kim, M., and Cho, B.: “Multiplier design based on ancient Indian Vedic Mathematics,” IEEE Proc. International SoC Design Conference (ISOCC), Busan, 2008, pp. II-65 - II-68.
[8] Kunchigi, V., Kulkarni, L. and Kulkarni, S.: “High speed and area efficient Vedic multiplier,” Proc. IEEE International Conference on Devices, Circuits and Systems (ICDCS), Coimbatore, 2012, pp. 360-364.
[9] Jaina, D., Sethi, K., and Panda, R.: “Vedic Mathematics Based Multiply Accumulate Unit,” Proc. IEEE Conference on Computational Intelligence and Communication Systems (CICN), Gwalior, Nov. 2011, pp. 754-757.
[10] Kayal, D., Mostafa, P., Dandapat, A., Sarkar, C. K.: “Design of High Performance 8 bit Multiplier using Vedic Multiplication Algorithm with McCMOS Technique,” Springer Journal of Signal Processing Systems, 10.1007/s11265-013-0818-3.
[11] Akhter, S.: “VHDL implementation of fast N X N multiplier based on Vedic Mathematic,” Seville, Proc. IEEE 27-30 Aug., 2007, pp. 472-475.
[12] Chanda, M., Banerjee, S., Saha, D., and Jain, S.: “Novel transistor level realization of ultra-low power high-speed adiabatic Vedic multiplier,” IEEE Proc. International Multi-Conference on Automation, Computing, Communication, Control and Compressed Sensing (iMac4s), Kottayam, March 2013, pp. 801-806.
[13] Verma, P.: “Design of 4X4 bit Vedic Multiplier using EDA Tool,” International Journal of Computer Application (IJCA), Vol. 8, June 2012.
[14] Cheng, F., Unger, S. H., Theobald, M.: “Self-Timed Carry-Lookahead Adders,” IEEE Transactions on Computers, Vol. 49, No. 7, July 2000, pp. 659-672.
[15] Bathija, R.K., Meena, R.S., Sarkar, S., Sahu, Rajesh: “Low Power High speed 16X16 bit Multiplier using Vedic Mathematics,” International Journal of Computer Applications (IJCA), Vol. 59, No. 6, December 2012.
[16] Kim, T., Jao, W., Tjiang, S.: “Circuit Optimization Using Carry-Save-Adder Cells,” IEEE Transactions on Computer-Aided Design of Integrated Circuits and Systems, Vol. 17, No. 10, 1998, pp. 974-984.
[17] Gupta, A., Malviya, U., Kapse, V.: “Design of Speed, Energy and power efficient Reversible logic based ALU for digital processors,” IEEE Proc. NUiCONE, Ahmedabad, 6-8 Dec. 2012, pp. 1-6.
[18] Huddar, S.R., Rupanagudi, S.R., M., Mohan, S.: “Novel high speed Vedic mathematics multiplier using compressors,” IEEE International multi Conference, 2013, pp. 465-469.
[19] Chang, C.H., Gu, J., Zhang, M.: “Ultra low-voltage low-power CMOS 4-2 and 5-2 compressors for fast arithmetic circuits,” IEEE Transactions on Circuits and Systems I: Regular Papers, Vol. 51, No. 10, 18 October 2004, pp. 1985-1997.
[20] Radhakrishnan, D., Preethy, A.P.: “Low power CMOS pass logic 4-2 compressor for high-speed multiplication,” Circuits and Systems, Proc. 43rd IEEE Midwest Symp., vol. 3, 2000, pp. 1296-1298.