
IEEE ICC 2013 - Signal Processing for Communications Symposium

A Polyphase-filter-based FFT for DFT Calculation in LTE Uplink

Yanbin Yao∗†, Yongtao Su∗, Shoujun Huang∗ and Jinglin Shi∗

∗ The Beijing Key Laboratory of Mobile Computing and Pervasive Device, Institute of Computing Technology, Chinese Academy of Sciences, Beijing, China
† University of Chinese Academy of Sciences, Beijing, China
Email: {yaoyanbin, ysu, huangshoujun01, sjl}@ict.ac.cn

Abstract: In the LTE uplink, a DFT with 34 possible lengths is performed. The traditional method transforms sequences of different lengths directly with a mixed-radix FFT, which is ill-suited to DSP processors that provide only radix-2 butterfly hardware. In this paper, we use a polyphase filter to resample an input sequence of arbitrary length so that a $2^n$-point FFT can be applied. Theoretical analysis demonstrates the equivalence of the proposed polyphase-filter-based FFT (PF-FFT) and the original DFT. Moreover, the PF-FFT retains low complexity and good fixed-point performance.

I. INTRODUCTION

With the rapid development of wireless communication, many access schemes [1][2] have been proposed. Among them, Single-Carrier Frequency Division Multiple Access (SC-FDMA) is adopted as the multiple-access scheme in the Long Term Evolution (LTE) uplink, aiming at a high data rate and a low peak-to-average power ratio (PAPR). Because user resources are allocated in the frequency domain, the time-domain data must first be transformed to the frequency domain by a Discrete Fourier Transform (DFT). As specified in [3], the DFT size supports 34 alternative lengths ranging from 12 to 1200, whose only prime factors are 2, 3 and 5.

The traditional approach to a variable-length DFT is the mixed-radix FFT, and the existing software designs [4] and hardware designs [5][6] all follow it. Regardless of the implementation, this approach requires support for the corresponding butterfly operations. Hardware designs realize radix-2, 3 and 5 butterfly circuits, which achieve a good compromise between performance and cost but lack flexibility: if a radix other than 2, 3 or 5 is involved, the hardware has to be redesigned. Software designs, on the other hand, make data parallelization and rearrangement difficult. In the popular DSP cores used for LTE, such as TI's TCI6616 [7], CEVA's XC-323 [8] and Tensilica's BBE16 [9], the bit-width of the vector unit is typically a power of 2, which is hard to reconcile with the factors 3 and 5. Because these cores provide only radix-2 and/or radix-4 butterfly instructions, radix-3 and radix-5 butterflies must be built from general addition and multiplication instructions, which makes the computational complexity very high. The traditional method also requires a large amount of storage for the twiddle factors and the software code.

In this paper, we propose a polyphase-filter-based FFT (PF-FFT) algorithm that reduces the variable-length DFT to a $2^n$-point FFT while maintaining high fixed-point performance. Unlike the traditional mixed-radix FFT, the PF-FFT first converts a sequence of arbitrary length into one of length $2^n$ by resampling. A $2^n$-point FFT is then performed, and finally the desired frequency samples are extracted from the FFT output. The advantages of the proposed algorithm are that 1) it applies to a DFT of any length, even a prime one; 2) since no radix-3 or radix-5 butterflies need to be computed, it can be implemented efficiently on popular DSP cores that support only radix-2 and/or radix-4 butterflies; and 3) the storage required for its twiddle factors is remarkably small.

978-1-4673-3122-7/13/$31.00 ©2013 IEEE

II. BACKGROUND

A. The variable-length DFT in LTE uplink

The generation of the SC-FDMA baseband signal in the LTE uplink is shown in Fig. 1. The input bits are modulated and transformed into the frequency domain by transform precoding, which is essentially a DFT. The complex symbols are then mapped onto the resource elements. Finally, an inverse FFT is performed to generate the SC-FDMA baseband signal.

Fig. 1. Block diagram for SC-FDMA baseband signal generation: Modulation → Transform Precoding → RE mapping → SC-FDMA signal gen.

Before the DFT, the complex-valued symbols $x(0), x(1), \ldots, x(M_{\mathrm{symb}}-1)$ are divided into $M_{\mathrm{symb}}/N$ sets, each corresponding to one SC-FDMA symbol, where $M_{\mathrm{symb}}$ is the number of modulated symbols and $N$ is the number of subcarriers occupied by the Physical Uplink Shared Channel (PUSCH). The DFT of an SC-FDMA symbol is defined as

$$y(k) = \sum_{n=0}^{N-1} x(n)\, W_N^{nk}, \qquad k = 0, \ldots, N-1, \qquad W_N^{nk} \triangleq e^{-j2\pi nk/N},$$

which can also be expressed in matrix form as

$$\mathbf{y} = \mathbf{F}_N \cdot \mathbf{x} \tag{1}$$
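For reference, Eq. (1) can be evaluated directly for any $N$. The short Python sketch below builds $\mathbf{F}_N$ explicitly and applies it; the function names `fourier_matrix` and `dft_direct` are hypothetical, chosen only for illustration. It costs on the order of $N^2$ complex operations, which is the cost that fast algorithms such as the mixed-radix FFT and the proposed PF-FFT are designed to avoid.

```python
import cmath


def fourier_matrix(n):
    """F_N = [W_N^{nk}] with W_N = exp(-j*2*pi/N), returned as a list of rows."""
    return [[cmath.exp(-2j * cmath.pi * k * col / n) for col in range(n)]
            for k in range(n)]


def dft_direct(x):
    """y = F_N x as in Eq. (1); about N^2 multiply-accumulates, valid for any N."""
    f_n = fourier_matrix(len(x))
    return [sum(w * x_n for w, x_n in zip(row, x)) for row in f_n]


# N = 12, the smallest LTE uplink DFT size: an impulse maps to a flat spectrum of ones.
print([round(abs(v), 6) for v in dft_direct([1] + [0] * 11)])
```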

In (1), $\mathbf{x}$ and $\mathbf{y}$ are $N \times 1$ data vectors and $\mathbf{F}_N \triangleq [W_N^{nk}]_{N \times N}$ is the $N \times N$ Fourier matrix. According to [3], the data length $N$ must satisfy

$$N = 12 \times 2^{\alpha_2} \times 3^{\alpha_3} \times 5^{\alpha_5} \le 1200, \tag{2}$$

where $\alpha_2, \alpha_3, \alpha_5$ are non-negative integers. From (2), it follows that $N$ has 34 possible values, ranging from 12 to 1200, whose only prime factors are 2, 3 and 5.

B. The mixed-radix FFT

The traditional approach applies the mixed-radix FFT, which is rooted in the Cooley-Tukey algorithm [10]. Following the divide-and-conquer principle, the Cooley-Tukey algorithm recursively factors the sequence length as $N = N_1 \times N_2$, where $N_1$ and $N_2$ are the so-called radices. The indices $n$ and $k$ can then be written as $n = n_2 N_1 + n_1$ and $k = k_1 N_2 + k_2$ ($n_i, k_i = 0, \ldots, N_i - 1$), respectively. Therefore, we have

$$y(k_1 N_2 + k_2) = \sum_{n_1=0}^{N_1-1} y_1(n_1, k_2)\, W_{N_1}^{n_1 k_1}, \tag{3}$$

where $y_1(n_1, k_2) = W_N^{n_1 k_2} \sum_{n_2=0}^{N_2-1} x(n_2 N_1 + n_1)\, W_{N_2}^{n_2 k_2}$. The decomposition is applied recursively until the remaining factor cannot be factorized any further. With this method, butterflies must be designed for every radix that occurs, whether in an application-specific integrated circuit (ASIC) or on a digital signal processor (DSP). In LTE, Eq. (2) implies that radix-2, 3 and 5 butterflies must be realized.

III. A POLYPHASE-FILTER-BASED FFT

The proposed PF-FFT algorithm, shown in Fig. 2, consists of three stages: resampling, the $2^n$-point FFT and extraction.

Fig. 2. The structure of the proposed PF-FFT algorithm: resampling (interpolation $\mathbf{U}$, $L$-point FFT $\mathbf{F}_L$, low-pass matrix $\boldsymbol{\Lambda}$, $L$-point IFFT $\mathbf{F}_L^H$, decimation $\mathbf{T}$), followed by the $2^n$-point FFT $\mathbf{F}_M$ and the extraction $\mathbf{E}$.

In the resampling stage, the $N \times 1$ complex data vector $\mathbf{x}$ is converted to an $M \times 1$ vector $\tilde{\mathbf{x}}$ with $M = 2^n$ by interpolation, low-pass filtering and decimation. Specifically, the $L \times N$ interpolation matrix $\mathbf{U}$, with $L = NI$, inserts $I - 1$ zeros after each sample of $\mathbf{x}$ and can be written as

$$\mathbf{U} \triangleq \begin{bmatrix} \mathbf{V}_1^T & \mathbf{V}_2^T & \cdots & \mathbf{V}_N^T \end{bmatrix}^T, \tag{4}$$

where the $I \times N$ submatrix $\mathbf{V}_n \triangleq [V(i,j)]$ ($n = 1, \ldots, N$) has a single one, in the first row and the $n$-th column, i.e. $V(1, n) = 1$, and zeros elsewhere. For the low-pass filtering, the $L$-point FFT matrix is defined as

$$\mathbf{F}_L \triangleq \left[W_L^{l_1 l_2}\right]_{L \times L}, \qquad l_1, l_2 = 0, \ldots, L-1, \tag{5}$$

and the inverse FFT (IFFT) matrix $\mathbf{F}_L^H$ is the conjugate transpose of $\mathbf{F}_L$. The low-pass matrix $\boldsymbol{\Lambda}$ is an $L \times L$ diagonal matrix whose first $\lceil N/2 \rceil$ and last $\lfloor N/2 \rfloor$ diagonal elements are 1 and whose other elements are 0, i.e.

$$\boldsymbol{\Lambda} \triangleq \mathrm{diag}[\underbrace{1, \ldots, 1}_{\lceil N/2 \rceil}, 0, \ldots, 0, \underbrace{1, \ldots, 1}_{\lfloor N/2 \rfloor}]_{L \times L}. \tag{6}$$

In the decimation, the $L \times 1$ data vector is downsampled by a factor $D$ to an $M \times 1$ vector, so that $M = L/D = NI/D$, using the $M \times L$ decimation matrix

$$\mathbf{T} \triangleq \begin{bmatrix} \mathbf{R}_1 & \mathbf{R}_2 & \cdots & \mathbf{R}_M \end{bmatrix}_{M \times L}, \tag{7}$$

where the $M \times D$ submatrix $\mathbf{R}_m \triangleq [R(i,j)]$ has a single one, in the $m$-th row and the first column, i.e. $R(m, 1) = 1$, and zeros elsewhere. In the $2^n$-point FFT stage, the decimated data vector is transformed to the frequency domain with radix-2/4 butterflies. The $M$-point FFT matrix is

$$\mathbf{F}_M \triangleq \left[W_M^{m_1 m_2}\right]_{M \times M}, \qquad m_1, m_2 = 0, \ldots, M-1. \tag{8}$$
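The $2^n$-point FFT stage of Eq. (8) needs only radix-2 (or radix-4) butterflies. As an illustration, the following minimal recursive radix-2 decimation-in-time FFT in Python follows the even/odd split that the paper writes out in Eq. (14) of Section V. It is meant for checking results, not as the DSP-core implementation the paper targets, which would use the vendor's butterfly instructions; the name `fft_radix2` is our own.

```python
import cmath


def fft_radix2(x):
    """Recursive radix-2 decimation-in-time FFT; len(x) must be a power of two.

    Uses X(k) = Xe(k) + W_N^k Xo(k) and X(k + N/2) = Xe(k) - W_N^k Xo(k),
    where Xe / Xo are the N/2-point DFTs of the even / odd samples.
    """
    n = len(x)
    if n == 0 or n & (n - 1):
        raise ValueError("length must be a power of two")
    if n == 1:
        return [complex(x[0])]
    even = fft_radix2(x[0::2])
    odd = fft_radix2(x[1::2])
    out = [0j] * n
    for k in range(n // 2):
        t = cmath.exp(-2j * cmath.pi * k / n) * odd[k]   # one twiddle multiplication
        out[k] = even[k] + t                             # butterfly: two additions
        out[k + n // 2] = even[k] - t
    return out
```

Each recursion level performs $M/2$ twiddle multiplications, so over $\log_2 M$ levels the sketch uses the $\frac{M}{2}\log_2 M$ multiplications counted in Section V.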

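In the extraction stage, only the first $\lceil N/2 \rceil$ and the last $\lfloor N/2 \rfloor$ elements of the $M$-length FFT output are retained; that is, the extraction matrix is

$$\mathbf{E} = \begin{bmatrix} \mathbf{I}_{\lceil N/2 \rceil \times \lceil N/2 \rceil} & \mathbf{0}_{\lceil N/2 \rceil \times (M - \lceil N/2 \rceil)} \\ \mathbf{0}_{\lfloor N/2 \rfloor \times (M - \lfloor N/2 \rfloor)} & \mathbf{I}_{\lfloor N/2 \rfloor \times \lfloor N/2 \rfloor} \end{bmatrix}_{N \times M}, \tag{9}$$

where $\mathbf{I}$ is an identity matrix. Based on the definitions in Eqs. (4) to (9), the relationship between the input vector $\mathbf{x}$ and the output vector $\mathbf{y}$ can be written as

$$\mathbf{y} = \frac{1}{M}\, \mathbf{E} \mathbf{F}_M \mathbf{T} \mathbf{F}_L^H \boldsymbol{\Lambda} \mathbf{F}_L \mathbf{U} \cdot \mathbf{x}. \tag{10}$$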

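From the system models (1) and (10), we obtain the following equivalence.

Theorem 1: The expressions (1) and (10) are equivalent, i.e.

$$\frac{1}{M}\, \mathbf{E} \mathbf{F}_M \mathbf{T} \mathbf{F}_L^H \boldsymbol{\Lambda} \mathbf{F}_L \mathbf{U} = \mathbf{F}_N. \tag{11}$$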
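Theorem 1 can be checked numerically on small cases. The Python sketch below applies the operators of Eq. (10) one at a time as vector operations instead of forming the matrices, using the ideal low-pass matrix $\boldsymbol{\Lambda}$ of Eq. (6), and compares the result with a direct DFT. The names `dft` and `pf_fft_ideal` are hypothetical. With the ideal $\boldsymbol{\Lambda}$ the two agree up to floating-point rounding; the FIR approximation of Section IV is deliberately not modelled here.

```python
import cmath


def dft(x, conjugate=False):
    """Unnormalised DFT X[k] = sum_n x[n] W^{nk}, W = exp(-j*2*pi/len(x)).
    conjugate=True applies F^H (kernel exp(+j*2*pi/len(x))), still without 1/len(x)."""
    size = len(x)
    sign = 1.0 if conjugate else -1.0
    return [sum(x[n] * cmath.exp(sign * 2j * cmath.pi * n * k / size) for n in range(size))
            for k in range(size)]


def pf_fft_ideal(x, interp, decim):
    """y = (1/M) E F_M T F_L^H Lambda F_L U x, Eq. (10), with the ideal Lambda."""
    n = len(x)
    big_l = n * interp                       # L = N I
    if big_l % decim:
        raise ValueError("D must divide N*I")
    m = big_l // decim                       # M = N I / D
    if m < n:
        raise ValueError("expected M >= N")
    head, tail = (n + 1) // 2, n // 2        # ceil(N/2), floor(N/2)
    u = [0j] * big_l                         # U: I - 1 zeros after each sample
    u[::interp] = x
    spec = dft(u)                            # F_L U x
    for l in range(head, big_l - tail):      # Lambda: zero the middle bins
        spec[l] = 0j
    v = dft(spec, conjugate=True)            # F_L^H
    w = v[::decim]                           # T: keep one sample in every D
    big_y = dft(w)                           # F_M
    kept = big_y[:head] + big_y[m - tail:]   # E: first ceil(N/2), last floor(N/2) bins
    return [val / m for val in kept]


x = [complex(k % 5, (3 * k) % 7) for k in range(12)]
y = pf_fft_ideal(x, 4, 3)                    # N = 12, I = 4, D = 3 -> M = 16 (Table I)
print(max(abs(a - b) for a, b in zip(y, dft(x))))   # rounding-level error only
```

An odd length such as $N = 5$ with $I = 8$, $D = 5$ ($M = 8$) exercises the unequal split between $\lceil N/2 \rceil$ and $\lfloor N/2 \rfloor$.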

Proof: See the Appendix.

Theorem 1 shows that the three stages above compute exactly the $N$-point DFT. In the rest of this paper, we discuss the implementation of the proposed algorithm in the LTE uplink.

IV. IMPLEMENTATION OF DFT IN LTE UPLINK

The $2^n$-point FFT and the extraction in Fig. 2 can be implemented conveniently in both software and hardware. Therefore, this paper focuses on the implementation of the resampling stage.
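The resampling stage can be prototyped in a few lines. The Python sketch below is illustrative only. The names `resampling_factors`, `windowed_sinc_lowpass`, `polyphase_resample` and `direct_resample` are hypothetical, and the Hann-windowed sinc is a stand-in for the Parks-McClellan equiripple design the paper uses; its cutoff is placed at $1/D$, the centre of the band between $\omega_p = 1/I$ and $\omega_s = 2/D - 1/I$ given in Section IV-B (assuming $I > D$, i.e. $M > N$). The sketch derives $I$ and $D$ from $M/N$ in lowest terms and splits the $I \cdot h_s$ prototype taps into $I$ groups as described in Section IV-A. It then checks that skipping the inserted zeros and the discarded outputs yields the same samples as explicit upsampling, filtering and decimation.

```python
import math
from fractions import Fraction


def resampling_factors(n, m):
    """I and D with I/D = M/N in lowest terms (the resampling rate fs)."""
    ratio = Fraction(m, n)
    return ratio.numerator, ratio.denominator


def windowed_sinc_lowpass(interp, decim, taps_per_phase):
    """Stand-in prototype (NOT the paper's equiripple design): Hann-windowed sinc
    with I*hs taps, cutoff 1/D of Nyquist, gain I to offset the zero insertion."""
    length = interp * taps_per_phase
    cutoff = 1.0 / decim
    center = (length - 1) / 2
    taps = []
    for k in range(length):
        t = k - center
        ideal = cutoff if t == 0 else math.sin(math.pi * cutoff * t) / (math.pi * t)
        hann = 0.5 - 0.5 * math.cos(2 * math.pi * k / (length - 1))
        taps.append(interp * ideal * hann)
    return taps


def polyphase_resample(x, taps, interp, decim, n_out):
    """n_out samples of 'upsample by I -> FIR -> keep every D-th' without
    touching the inserted zeros or computing the discarded outputs."""
    out = []
    for m in range(n_out):
        pos = m * decim                        # output position on the rate-I grid
        phase, base = pos % interp, pos // interp
        acc = 0.0
        # Only coefficient group taps[phase::I] (hs taps) meets non-zero inputs.
        for j, coeff in enumerate(taps[phase::interp]):
            if 0 <= base - j < len(x):
                acc += coeff * x[base - j]
        out.append(acc)
    return out


def direct_resample(x, taps, interp, decim, n_out):
    """Reference: explicit zero insertion, full FIR convolution, then decimation."""
    up = [0.0] * (len(x) * interp)
    up[::interp] = x
    out = []
    for m in range(n_out):
        pos = m * decim
        acc = 0.0
        for k, coeff in enumerate(taps):
            if 0 <= pos - k < len(up):
                acc += coeff * up[pos - k]
        out.append(acc)
    return out


n, m = 12, 16
i, d = resampling_factors(n, m)                # (4, 3), as in the first row of Table I
h = windowed_sinc_lowpass(i, d, 16)            # ht = I * hs = 64 taps
x = [math.cos(0.4 * k) for k in range(n)]
y = polyphase_resample(x, h, i, d, m)          # M = 16 resampled values
```

Each output uses at most $h_s$ = `len(taps) // I` coefficients instead of $h_t$, which is the saving described in Section IV-A. The sketch does not model the filter delay or the block edges, which a complete PF-FFT implementation would also have to handle.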


Fig. 3. The structure of the polyphase filter (input $x(n)$, a chain of $z^{-1}$ delay elements, output $y(n)$).

A. Simplified low-pass FIR filter using polyphase filtering

From Eq. (10), the resampling can be expressed as the matrix product $\mathbf{T}\mathbf{F}_L^H\boldsymbol{\Lambda}\mathbf{F}_L\mathbf{U}$, which involves a heavy computational load and is impractical. On closer inspection, $\mathbf{F}_L^H\boldsymbol{\Lambda}\mathbf{F}_L$ is an ideal low-pass filter (LPF), which can be approximated by an equiripple FIR filter of relatively low order. Furthermore, since $\mathbf{U}$ inserts $I - 1$ zeros after each sample of $\mathbf{x}$ before the FIR filter and $\mathbf{T}$ discards $D - 1$ of every $D$ samples after it, there is no need to compute with the inserted zeros or the discarded samples during FIR filtering. This simplification greatly reduces the number of multiplications and additions in the resampling stage.

In essence, the resampling is a form of multi-rate digital signal processing with a fractional rate $M/N$, which is typically implemented by a polyphase filter [11], [12]. The structure of the polyphase filter is shown in Fig. 3. Compared with the ordinary filtering structure, the polyphase filter has an additional input control switch and a filter coefficient memory. The switch feeds data into the filter according to rules derived from the interpolation factor $I$ and the decimation factor $D$. The coefficients of the low-pass FIR filter are divided into $I$ groups and stored in the memory. To calculate an output value, one group of coefficients is selected according to a rule determined by $I$ and $D$. In other words, $I$ and $D$ determine which combination of filter coefficients is used.

B. High-precision fixed-point LPF design

Since an FIR filter is used to approximate the ideal LPF, some loss of precision is inevitable, especially in fixed-point digital signal processing systems. Hence, another goal of the LPF design is to keep the mean square error (MSE) between the fixed-point and floating-point LPFs within a limited range. In this paper, we require the FIR filter coefficients to keep the MSE below -70 dB. The filter parameters are calculated as follows.

1) Length of the resampled sequence $M$: $M$ is initially chosen as the smallest integer satisfying $N < M = 2^n$.

2) Resampling rate $f_s$: the resampling rate is defined as $f_s = M/N = I/D$.

3) Order of the LPF $h_t$: the order of the filter is $h_t = I \times h_s$, where $h_s$ is the number of coefficients in each group. The value of $h_s$ strongly affects both the data accuracy and the storage space. In this paper, we set $h_s = 16$.

4) Passband ripple $\delta_p$ (or $A_p$): to ensure that the designed filter does not become the performance bottleneck of the whole system, we set the passband ripple to $A_p = -80$ dB. Thus, $\delta_p = \sqrt{10^{A_p/10}} = 0.0001$.

5) Normalized passband edge frequency $\omega_p$ and stopband edge frequency $\omega_s$: these two parameters determine the width of the transition band of the LPF, and their values must prevent aliasing (see Fig. 4). The choice of $\omega_p$ must ensure that all frequency samples of the original sequence lie in the passband, i.e.

$$\omega_p = \frac{N}{I \cdot N} = \frac{1}{I},$$

and the stopband edge frequency should be placed at the edge of the negative-frequency component to prevent aliasing, i.e.

$$\frac{\omega_s + \omega_p}{2} = \frac{M}{I \cdot N} = \frac{1}{D} \;\Rightarrow\; \omega_s = \frac{2}{D} - \frac{1}{I}.$$

6) Stopband ripple $\delta_s$ (or $A_s$): following [13] and [14], the stopband ripple of the equiripple FIR filter is given by

$$\delta_s = \delta_p \cdot 10^{-\frac{2.324\, h_t (\omega_s - \omega_p) + 13}{10}}, \tag{12}$$

$$A_s = 20 \log_{10}\!\left(\frac{\delta_s}{1 + \delta_p}\right). \tag{13}$$

The stopband ripple in dB is calculated from Eqs. (12) and (13). If $A_s$ is less than -70 dB, the design objective is achieved. Otherwise, the resampled length $M$ must be adjusted and the above parameters redesigned.

7) Design of the filter coefficients: once the design objective is satisfied, the coefficients of the equiripple FIR filter can be obtained with the Parks-McClellan algorithm [15], [16], whose details are beyond the scope of this paper.

Fig. 4. Schematic diagram of the frequency-domain samples: (a) before resampling ($N$ samples); (b) after resampling ($M = I \cdot N/D$ samples); (c) after upsampling by an integer factor ($I \cdot N$ samples, normalized), together with the frequency response $|H(e^{j\omega})|$ of the related FIR filter, its edges $\omega_p$ and $\omega_s$, and the aliasing region.

For all possible DFT lengths in the LTE uplink, the resulting parameters of the low-pass FIR filter are listed in Table I. Every resampling rate corresponds to one set of coefficients. Grouping each set of filter coefficients and defining the control rules for these groups yields the polyphase filter.

TABLE I
THE PARAMETER TABLE OF EACH DFT LENGTH

No.   N      M      I     D
1     12     16     4     3
2     24     32     4     3
3     36     64     16    9
4     48     64     4     3
5     60     128    32    15
6     72     128    16    9
7     96     128    4     3
8     108    256    64    27
9     120    256    32    15
10    144    256    16    9
11    180    256    64    45
12    192    256    4     3
13    216    512    64    27
14    240    512    32    15
15    288    512    16    9
16    300    512    128   75
17    324    512    128   81
18    360    512    64    45
19    384    512    4     3
20    432    1024   64    27
21    480    1024   32    15
22    540    1024   256   135
23    576    1024   16    9
24    600    1024   128   75
25    648    1024   128   81
26    720    1024   64    45
27    768    1024   4     3
28    864    2048   64    27
29    900    2048   512   225
30    960    2048   32    15
31    972    2048   512   243
32    1080   2048   256   135
33    1152   2048   16    9
34    1200   2048   128   75

Notably, since a high-order FIR filter can be replaced by a cascade of lower-order FIR filters, only the coefficients of the lower-order filters need to be stored, which further saves memory. For example, for a DFT of length $N = 972$, the data vector is resampled to length $M = 2048$ with the resampling rate $f_s = 512/243$. This results in an FIR filter whose order $h_t$ is proportional to $I = 512$. Such a high-order FIR filter can be decomposed into two cascaded FIR filters, one with the resampling rate $f_s = 4/3$ and the other with $f_s = 128/81$. The coefficients of these two filters can be reused in other FIR filter designs, such as those for $N = 12$ and $N = 324$. The optimized coefficient sets adopted by the proposed PF-FFT in practice are shown in Table II.

TABLE II
THE DESIGNED FILTER COEFFICIENT SCHEME IN LTE

fs        I     D
1.3333    4     3
1.7778    16    9
2.1333    32    15
1.2800    32    25
1.4222    64    45
1.5802    128   81
1.8963    256   135

V. THE EVALUATION OF THE PROPOSED ALGORITHM

In this section, we briefly evaluate the complexity and the fixed-point performance of the proposed PF-FFT.

A. Computational Complexity

The PF-FFT consists of two parts: a polyphase filter and the $2^n$-point FFT. As the order of the filter is $h_t$, computing each output requires $h_t$ multiplications and $h_t - 1$ additions. Therefore, the total numbers of multiplications and additions of the filter are $M h_t$ and $M(h_t - 1)$, respectively. On the other hand, assume that the $2^n$-point FFT adopts the radix-2 butterfly, which requires 1 multiplication and 2 additions, as shown in Eq. (14):

$$\begin{aligned} X(k) &= \sum_{n=0}^{N-1} x(n)\, W_N^{nk} = \sum_{l=0}^{N/2-1} x(2l)\, W_N^{2lk} + \sum_{l=0}^{N/2-1} x(2l+1)\, W_N^{(2l+1)k} \\ &= \sum_{l=0}^{N/2-1} x^{(e)}(l)\, W_{N/2}^{lk} + W_N^k \sum_{l=0}^{N/2-1} x^{(o)}(l)\, W_{N/2}^{lk} = X^{(e)}(k) + W_N^k X^{(o)}(k), \end{aligned} \tag{14}$$

where $x^{(e)}(l) = x(2l)$, $x^{(o)}(l) = x(2l+1)$, $X^{(e)}(k) = \mathrm{DFT}_{N/2}\{x^{(e)}(n)\}$ and $X^{(o)}(k) = \mathrm{DFT}_{N/2}\{x^{(o)}(n)\}$. Thus, for a sequence of length $M$, the numbers of multiplications and additions of the $2^n$-point FFT are $\frac{M}{2}\log_2 M$ and $M \log_2 M$, respectively. To summarize, the computational complexity of the PF-FFT can be expressed by the total number of multiplications $C_{\mathrm{mul}}$ and the total number of additions $C_{\mathrm{add}}$ as

$$C_{\mathrm{mul}} = \frac{M}{2}\log_2 M + M \cdot h_t, \tag{15}$$

$$C_{\mathrm{add}} = M \log_2 M + M \cdot h_t. \tag{16}$$

B. Space Complexity

In the PF-FFT algorithm, the coefficients of the polyphase filter and the twiddle factors of the $2^n$-point FFT must be stored in memory in advance. For the twiddle factors, a $2^n$-point FFT of length $M$ needs $M$ twiddle factors. Since the real and imaginary parts of each twiddle factor are 16 bits each, i.e. 4 bytes in total, the twiddle-factor storage in bytes for a $2^n$-point FFT is

$$T = 4M. \tag{17}$$

For the filter coefficients, each coefficient is assumed to be 16 bits, occupying 2 bytes. Since the filter coefficients are symmetric, only half of them need to be stored. The storage in bytes occupied by each set of filter coefficients in Table II is therefore

$$F = I \cdot h_s. \tag{18}$$
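Eqs. (15) to (18) are simple enough to tabulate directly. The Python helpers below evaluate them; the names are ours, chosen for illustration. The filter term takes the number of taps applied per output sample as an argument, so it can be evaluated with $h_t$ as in Eqs. (15) and (16) or with $h_s$, the size of one coefficient group, as in Table III. The script also totals the storage over all FFT sizes of Table I and all coefficient sets of Table II.

```python
def log2_exact(m):
    """log2 of a power of two M, i.e. the number of radix-2 stages."""
    if m <= 0 or m & (m - 1):
        raise ValueError("M must be a power of two")
    return m.bit_length() - 1


def pf_fft_ops(m, taps_per_output):
    """Eqs. (15)-(16): radix-2 FFT cost plus taps_per_output MACs per resampled output."""
    stages = log2_exact(m)
    c_mul = (m // 2) * stages + m * taps_per_output
    c_add = m * stages + m * taps_per_output
    return c_mul, c_add


def twiddle_bytes(m):
    """Eq. (17): M twiddles, 16-bit real + 16-bit imaginary part = 4 bytes each."""
    return 4 * m


def filter_bytes(interp, hs=16):
    """Eq. (18): I*hs taps of 2 bytes, only half stored thanks to symmetry -> I*hs bytes."""
    return interp * hs


fft_sizes = [2 ** n for n in range(4, 12)]       # 16 ... 2048, the M values of Table I
table2_interp = [4, 16, 32, 32, 64, 128, 256]   # I column of Table II
twiddle_total = sum(twiddle_bytes(m) for m in fft_sizes)
filter_total = sum(filter_bytes(i) for i in table2_interp)
print(f"{twiddle_total / 1024:.2f}KB + {filter_total / 1024:.2f}KB")  # prints 15.94KB + 8.31KB
print(pf_fft_ops(2048, 16))                      # (44032, 55296) for M = 2048, hs = 16
```

Summing Eq. (17) over the eight FFT sizes gives 16320 bytes, and summing Eq. (18) over the seven coefficient sets of Table II with $h_s = 16$ gives 8512 bytes. Together they match the data-memory entry of Table III, which suggests that this is how that entry was obtained.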


TABLE III
THE EVALUATION RESULT OF THE PROPOSED PF-FFT ALGORITHM

Metric                                        PF-FFT
Multiplications                               $\frac{M}{2}\log_2 M + M \cdot h_s$
Additions                                     $M \log_2 M + M \cdot h_s$
Data memory                                   15.94KB + 8.31KB
Fixed-point performance (DFT length: 972)     -72.58dB

C. Fixed-point Performance

We use MATLAB to simulate and evaluate the fixed-point performance of the PF-FFT. The data bit-width is 32 bits (16 bits each for the real and imaginary parts). For instance, the mean square error (MSE) for DFT length 972 is -72.58 dB.

VI. CONCLUSION

In this paper, a polyphase-filter-based FFT is proposed to calculate the variable-length DFT in the LTE uplink. First, the equivalence of the PF-FFT and the original DFT is proved. Then, the implementation of the proposed PF-FFT in the LTE uplink is presented. Finally, both the computational and the space complexity of the proposed PF-FFT algorithm are evaluated. The simulation results show that the proposed PF-FFT achieves high fixed-point performance.

ACKNOWLEDGMENT

This work was funded by the National Science and Technology Major Project (2011ZX03003-003-02) and the National Science and Technology Major Project (2012ZX03001007-004).

APPENDIX
PROOF OF THEOREM 1

Equation (10) can be rewritten as

$$\mathbf{y} = \mathbf{E} \cdot \underbrace{\left(\frac{1}{M}\mathbf{F}_M \mathbf{T} \mathbf{F}_L^H\right)}_{\triangleq\, \mathbf{S}} \cdot \underbrace{\left(\boldsymbol{\Lambda} \mathbf{F}_L \mathbf{U}\right)}_{\triangleq\, \mathbf{P}} \cdot \mathbf{x}. \tag{19}$$

Since the effect of $\mathbf{T}$ on $\mathbf{F}_L^H$ is to take one of every $D$ elements from each column of $\mathbf{F}_L^H$, the element $s(i,j)$ of $\mathbf{S}$ for $j = i + kM$ ($k = 0, \ldots, D-1$) can be calculated as

$$s(i,j) = \frac{1}{M}\sum_{m=0}^{M-1} W_M^{im} W_L^{-j(Dm)} = \frac{1}{M}\sum_{m=0}^{M-1} W_M^{im} W_L^{-iDm} W_L^{-Lkm} = \frac{1}{M}\sum_{m=0}^{M-1} W_M^{im} W_M^{-im} = 1 \tag{20}$$

by noting that $L = DM$. For $j \ne i + kM$, we have

$$s(i,j) = \frac{1}{M}\sum_{m=0}^{M-1} W_M^{im} W_L^{-j(Dm)} = \frac{1}{M}\sum_{m=0}^{M-1} W_M^{(i-j)m} = 0, \tag{21}$$

because $W_M^{i-j} \ne 1$ is an $M$-th root of unity, so the geometric sum $\sum_{m=0}^{M-1} W_M^{(i-j)m} = \frac{1 - W_M^{(i-j)M}}{1 - W_M^{i-j}}$ vanishes. From Eqs. (20) and (21), the matrix $\mathbf{S}$ has the form

$$\mathbf{S} = \begin{bmatrix} \mathbf{I}_M & \mathbf{I}_M & \cdots & \mathbf{I}_M \end{bmatrix}_{M \times L}, \tag{22}$$

where $\mathbf{I}_M$ is the $M \times M$ identity matrix. Let the vector $\mathbf{w}_L^i$ be the $i$-th column of the Fourier matrix $\mathbf{F}_L$, i.e. $\mathbf{w}_L^i = [W_L^{0 \cdot i}, W_L^{1 \cdot i}, \ldots, W_L^{(L-1)i}]^T$. We have

$$\mathbf{P} = \boldsymbol{\Lambda} \cdot \mathbf{F}_L \mathbf{U} = \boldsymbol{\Lambda} \cdot \begin{bmatrix} \mathbf{w}_L^0 & \mathbf{w}_L^I & \cdots & \mathbf{w}_L^{(N-1)I} \end{bmatrix}_{L \times N} = \boldsymbol{\Lambda} \cdot \begin{bmatrix} \mathbf{w}_N^0 & \mathbf{w}_N^1 & \cdots & \mathbf{w}_N^{N-1} \end{bmatrix}_{L \times N} = \begin{bmatrix} \mathbf{F}_f^T & \mathbf{0}^T & \mathbf{F}_l^T \end{bmatrix}^T \tag{23}$$

by using $L = NI$, so that $W_L^{lnI} = W_N^{ln}$ and $\mathbf{w}_N^n = [W_N^{0 \cdot n}, W_N^{1 \cdot n}, \ldots, W_N^{(L-1)n}]^T$ has length $L$. Here $\mathbf{F}_f$ is the first $\lceil N/2 \rceil$ rows of $\mathbf{F}_N$, and $\mathbf{F}_l$ collects the last $\lfloor N/2 \rfloor$ rows of $\mathbf{P}$. The entry of $\mathbf{P}$ in row $L - j$ ($j = 1, \ldots, \lfloor N/2 \rfloor$) and column $i$, which belongs to $\mathbf{F}_l$, is

$$W_N^{i(L-j)} = W_N^{iL - ij + iN} = W_N^{i(N-j)}, \tag{24}$$

since $W_N^{iL} = W_N^{iN} = 1$. This is the entry in row $N - j$ and column $i$ of $\mathbf{F}_N$, so $\mathbf{F}_l$ is exactly the last $\lfloor N/2 \rfloor$ rows of $\mathbf{F}_N$. Since $\mathbf{S}$ sums the $D$ consecutive $M$-row blocks of $\mathbf{P}$ and $N < M$, $\mathbf{S}\mathbf{P}$ carries $\mathbf{F}_f$ in its first $\lceil N/2 \rceil$ rows and $\mathbf{F}_l$ in its last $\lfloor N/2 \rfloor$ rows, with zeros in between. Thus, substituting Eqs. (9), (22) and (23) into (19) shows that $\mathbf{E}\mathbf{S}\mathbf{P} = \mathbf{F}_N$ holds.

REFERENCES

[1] Y. Zhou, J. Wang, and M. Sawahashi, "Downlink transmission of broadband OFCDM systems-Part I: Hybrid detection," IEEE Transactions on Communications, vol. 53, no. 4, pp. 718–729, 2005.
[2] Y. Zhou, T. Ng, J. Wang, K. Higuchi, and M. Sawahashi, "OFCDM: A promising broadband wireless access technique," IEEE Communications Magazine, vol. 46, no. 3, pp. 38–49, 2008.
[3] 3GPP Technical Specification 36.211 v8.9.0, "Evolved Universal Terrestrial Radio Access (E-UTRA); Physical Channels and Modulation (Release 8)," 2008.
[4] B. Beheshti, "On performance of LTE UE DFT and FFT implementations in flexible software based baseband processors," in Proc. IEEE Long Island Systems, Applications and Technology Conference (LISAT'09), 2009, pp. 1–4.
[5] DFT/IDFT Reference Design, Altera Corporation, 2007.
[6] Discrete Fourier Transform v3.0, Xilinx Inc., 2008.
[7] TMS320TCI6616 Data Manual, Texas Instruments, 2012.
[8] J. Byrne, "CEVA trains DSP guns on TI," Microprocessor Report, 2010.
[9] C. Rowen, P. Nuth, and S. Fiske, "A DSP architecture optimized for wireless baseband," in Proc. International Symposium on System-on-Chip (SOC 2009), IEEE, 2009, pp. 151–156.
[10] J. Cooley and J. Tukey, "An algorithm for the machine calculation of complex Fourier series," Mathematics of Computation, vol. 19, no. 90, pp. 297–301, 1965.
[11] R. Lyons, Understanding Digital Signal Processing. Pearson Education, 2010, ch. 10.
[12] D. Barker, "Efficient resampling implementations [DSP Tips & Tricks]," IEEE Signal Processing Magazine, vol. 25, no. 4, pp. 114–117, 2008.
[13] O. Herrmann, L. Rabiner, and D. Chan, "Practical design rules for optimum finite impulse response low-pass digital filters," Bell Syst. Tech. J., vol. 52, no. 6, pp. 769–799, 1973.
[14] J. Kaiser, "Nonrecursive digital filter design using the I0-sinh window function," in Proc. IEEE Int. Symp. Circuits Syst., vol. 3, 1974, pp. 20–23.
[15] A. Oppenheim, R. Schafer, and J. R. Buck, Discrete-Time Signal Processing. Prentice Hall, 1989, ch. 7.
[16] T. Parks and J. McClellan, "Chebyshev approximation for nonrecursive digital filters with linear phase," IEEE Transactions on Circuit Theory, vol. 19, no. 2, pp. 189–194, 1972.