
2013 IEEE 17th International Symposium on Consumer Electronics (ISCE)

Design of a (255, 239) Reed-Solomon Decoder Using a Simplified Step-by-Step Algorithm

Yu-Shan Su1, Chu Yu1, Bor-Shing Lin2, Po-Hsun Cheng3, and Sao-Jie Chen4

1 Department of Electronic Engineering, National ILan University, Yilan, Taiwan, R.O.C.
2 Department of Computer Science and Information Engineering, National Taipei University, Taipei, Taiwan, R.O.C.
3 Department of Software Engineering, National Kaohsiung Normal University, Kaohsiung, Taiwan, R.O.C.
4 Graduate Institute of Electronics Engineering, National Taiwan University, Taipei, Taiwan, R.O.C.

Abstract - This paper presents the design of a new (255, 239) Reed-Solomon (RS) decoder using a simplified step-by-step algorithm. For calculating the syndrome determinant in the RS decoder, the proposed architecture performs Gauss elimination on a 1-D systolic array that has lower hardware complexity and is more suited to be used on a higher-dimension matrix with the step-by-step algorithm. The proposed architecture, designed in 0.18 μm CMOS technology, has approximately 25K gates, and consumes approximately 40 mW at 250 MHz.

I. INTRODUCTION

Forward error correction (FEC) has been widely applied in telecommunications, information theory, and coding theory. The Reed-Solomon (RS) code, a widely used FEC code, provides excellent recovery from random and burst errors in many systems, such as storage devices, wireless communications, digital video broadcasting (DVB), and satellite communications. The well-known decoding methods used in RS decoders are based on an algebraic algorithm, such as the Berlekamp-Massey or Euclidean algorithm, which applies a key equation to find the error-location polynomial from the syndrome values and then determines the error values. On the other hand, the step-by-step algorithm proposed in [1]-[2] adopts another decoding procedure, which simultaneously determines the error locations and their corresponding error values by operating on a syndrome matrix for every received symbol. Because the order in which symbols are examined does not affect the decoding result, this algorithm can serve as the basis of a high-throughput parallel RS decoder. To further reduce the complexity of the step-by-step algorithm, a simplified step-by-step algorithm was proposed in [3], which is suited for VLSI hardware implementation.

In this study, the proposed design is based on [3] for (255, 239) RS decoding. However, the determinant of the high-dimension syndrome matrix used in [3] is difficult to calculate. To reduce this computation complexity, systolic Gauss elimination with partial pivoting has been proposed [4]. To save chip area, a 1-D systolic array for Gauss elimination [5] is used in the proposed design.

II. SIMPLIFIED STEP-BY-STEP ALGORITHM

To reduce the computation complexity, the simplified step-by-step algorithm proposed in [3] uses the following determinant of the temporarily changed syndrome matrix:

$$\det[N_v(\beta, j)] = \det(N_v^{0}) + \beta \sum_{x=1}^{v} \alpha^{(2x-1)j} \det(N_v^{xx}), \qquad (1)$$

where $j$ indexes the $j$th symbol of the received codeword, $v$ denotes the number of errors, $\beta$ is the nonzero element of GF($2^m$) temporarily added to that symbol, $\alpha$ is a primitive element in GF($2^m$), and $\det(N_v^{xx})$ is the determinant of the sub-matrix of the syndrome matrix $N_v^{0}$ obtained by deleting the $x$th row and $x$th column of $N_v^{0}$. Both $\det(N_v^{0})$ and $\det(N_v^{xx})$ can be calculated once, before trying any of the $2^m-1$ possible nonzero values of $\beta$, which saves more operations in the decoding procedure than the algorithm of [1]. After computing all the determinants, $H_{v,j}$ can be found as

$$H_{v,j} = \sum_{x=1}^{v} \alpha^{(2x-1)j} \det(N_v^{xx}). \qquad (2)$$
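The decomposition in (1) can be checked numerically with a short sketch. The paper does not spell out the syndrome matrix, so the code below assumes the Hankel form in which entry (a, b) of $N_v^{0}$ is $S_{a+b-1}$, and assumes that changing symbol $j$ by $\beta$ adds $\beta\alpha^{ij}$ to syndrome $S_i$. The field polynomial 0x11D, the choice $\alpha = 2$, and all function names are illustrative assumptions, not taken from the paper.

```python
PRIM_POLY = 0x11D  # assumed primitive polynomial for GF(2^8); not stated in the paper


def gf_mul(a, b):
    """Multiply two GF(2^8) elements by shift-and-add with modular reduction."""
    result = 0
    while b:
        if b & 1:
            result ^= a
        a <<= 1
        if a & 0x100:
            a ^= PRIM_POLY
        b >>= 1
    return result


def gf_pow(a, n):
    """Raise a to the n-th power by repeated multiplication."""
    result = 1
    for _ in range(n):
        result = gf_mul(result, a)
    return result


def gf_inv(a):
    """Inverse of a nonzero element: a^254, because a^255 = 1."""
    return gf_pow(a, 254)


def gf_det(matrix):
    """Determinant by Gaussian elimination; row swaps need no sign in characteristic 2."""
    m = [row[:] for row in matrix]
    n = len(m)
    det = 1
    for col in range(n):
        pivot = next((r for r in range(col, n) if m[r][col]), None)
        if pivot is None:
            return 0
        m[col], m[pivot] = m[pivot], m[col]
        det = gf_mul(det, m[col][col])
        inv = gf_inv(m[col][col])
        for r in range(col + 1, n):
            factor = gf_mul(m[r][col], inv)
            for c in range(col, n):
                m[r][c] ^= gf_mul(factor, m[col][c])
    return det


def syndrome_matrix(S, v):
    """Assumed Hankel form: entry (a, b) is S[a + b - 1], 1-indexed; S[0] is unused."""
    return [[S[a + b - 1] for b in range(1, v + 1)] for a in range(1, v + 1)]


def minor(matrix, x):
    """Delete the x-th row and x-th column (1-indexed)."""
    return [[val for c, val in enumerate(row, 1) if c != x]
            for r, row in enumerate(matrix, 1) if r != x]


def h_value(S, v, j, alpha=2):
    """Right-hand sum of (1), which is H_{v,j} in (2)."""
    n0 = syndrome_matrix(S, v)
    h = 0
    for x in range(1, v + 1):
        weight = gf_pow(alpha, ((2 * x - 1) * j) % 255)
        h ^= gf_mul(weight, gf_det(minor(n0, x)))
    return h


def det_changed(S, v, beta, j, alpha=2):
    """Left-hand side of (1): change every syndrome, then take the determinant directly."""
    changed = [0] + [S[i] ^ gf_mul(beta, gf_pow(alpha, (i * j) % 255))
                     for i in range(1, 2 * v)]
    return gf_det(syndrome_matrix(changed, v))
```

Under these assumptions, `det_changed(S, v, beta, j)` equals `gf_det(syndrome_matrix(S, v)) ^ gf_mul(beta, h_value(S, v, j))` for any syndrome list `S`, which is why the minors need to be computed only once per codeword rather than once per trial value of β.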

Finally, the corresponding error value can be calculated as $\beta = \det(N_v^{0})/H_{v,j}$. Since testing $\det[N_v(\beta,j)]$ detects only $v$ errors, the classic step-by-step algorithm needs to compute $\det[N_{v+1}(\beta,j)]$ in order to verify whether the received word has $(v+1)$ errors or not. However, based on (2), the simplified algorithm only needs to calculate $H_{v+1,j}$ instead of $\det[N_{v+1}(\beta,j)]$, which reduces the computation complexity considerably compared with [1].

III. PROPOSED ARCHITECTURE

Based on the above simplified step-by-step algorithm, a new (255, 239) RS decoding architecture is proposed as shown in Fig. 1. The proposed architecture is composed of a syndrome calculator (SC), a determinant calculator (DC), an error number detector (END), a determinant accumulation & summation circuit (DASC), an error correction circuit (ECC), and a FIFO buffer. The functions of these modules are described in more detail in the following paragraphs.

[Fig. 1 block diagram. Blocks: Input, Syndrome Calculator, Determinant Calculator, Error Number Detector, Delta Accumulation & Summation (two instances), Error Correct, 512x8 FIFO Register File, Output. Signal labels: Si, det(Nv0), det(Nt+10), det(Nvxx), det(Nv+1xx), Hv,j, Hv+1,j, Error Value.]

This work was supported by the National Chip Implementation Center in Taiwan and the National Science Council, R.O.C., under Grants NSC101-2220-E-197-001.

978-1-4673-6199-6/13/$31.00 ©2013 IEEE


Fig. 1 Proposed (255, 239) RS decoder.


The final stage is the error correction circuit (ECC) shown in Fig. 5(c), which receives $H_{v,j}$ and $H_{v+1,j}$ in parallel from the pair of DASCs. The ECC calculates $\beta = \det(N_v^{0})/H_{v,j}$ using a ROM of inverse elements and a multiplier instead of a divider. According to [3], this circuit is also required to compute $H_{v+1,j}$ and $\det[N_{t+1}(\beta,j)]$, which determine whether the error value is $\beta$ or zero.
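The ROM-plus-multiplier division can be illustrated with a small software model. This is only a sketch: the field polynomial 0x11D, the table name `INV_ROM`, the function names, and the choice to return zero when $H_{v,j} = 0$ are assumptions made for the example and are not specified in the paper.

```python
PRIM_POLY = 0x11D  # assumed primitive polynomial for GF(2^8); not stated in the paper


def gf_mul(a, b):
    """Multiply two GF(2^8) elements by shift-and-add with modular reduction."""
    result = 0
    while b:
        if b & 1:
            result ^= a
        a <<= 1
        if a & 0x100:
            a ^= PRIM_POLY
        b >>= 1
    return result


def build_inverse_rom():
    """Build a 256-entry table with rom[a] = a^-1; rom[0] = 0 is a placeholder."""
    rom = [0] * 256
    for a in range(1, 256):
        for b in range(1, 256):
            if gf_mul(a, b) == 1:
                rom[a] = b
                break
    return rom


INV_ROM = build_inverse_rom()


def error_value(det_n0, h):
    """beta = det(N_v^0) / H_{v,j}, computed as one table lookup and one multiply."""
    if h == 0:
        # Assumption for this sketch: no correction is attempted when H is zero.
        return 0
    return gf_mul(det_n0, INV_ROM[h])
```

The table costs 256 bytes in this model; trading a divider for a lookup and a multiplier is the design choice the ECC description above refers to.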

Triangulating the syndrome matrix by Gauss elimination with partial pivoting [4] simplifies determinant calculation. Based on this method, the architecture of a 1-D systolic array for Gauss elimination (1D-SAGE) [5] is shown in Fig. 2. This architecture consists mainly of two kinds of cells: a round cell and a square cell. The round cell first stores the input data in the register 'axis' to find the pivot, and then sends the OP signal to the square cell according to the conditions described in [4]. The round cell also divides each subsequent input by the value held in axis. The divider result is sent to the square cell, where it is multiplied by the first input data of the square cell. Because the proposed design adopts a 1-D systolic array to reduce hardware complexity and power consumption, it incurs a higher latency of 2 + … + (t-1) + t clocks for a t×t matrix, which is 35 clocks for t = 8. After the first pivot is found, the results of all the square cells are stored in the buffer and shifted back to the round cell to find the next pivot.
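A behavioral sketch of this pass-by-pass triangulation is given below. It models only the arithmetic, not the cycle-level timing or the OP signalling of [4]; the field polynomial 0x11D and the function names are assumptions. Over GF(2^m), "partial pivoting" reduces to picking any row with a nonzero leading entry, and row exchanges do not change the sign of the determinant.

```python
PRIM_POLY = 0x11D  # assumed primitive polynomial for GF(2^8); not stated in the paper


def gf_mul(a, b):
    """Multiply two GF(2^8) elements by shift-and-add with modular reduction."""
    result = 0
    while b:
        if b & 1:
            result ^= a
        a <<= 1
        if a & 0x100:
            a ^= PRIM_POLY
        b >>= 1
    return result


def gf_inv(a):
    """Inverse of a nonzero element: a^254, because a^255 = 1."""
    result = 1
    for _ in range(254):
        result = gf_mul(result, a)
    return result


def triangulate_det(matrix):
    """Determinant as the product of the pivots found in successive passes."""
    rows = [row[:] for row in matrix]
    det = 1
    while rows:
        # Round cell: the first row with a nonzero leading entry supplies 'axis'.
        idx = next((i for i, row in enumerate(rows) if row[0]), None)
        if idx is None:
            return 0  # no pivot in this column, so the matrix is singular
        pivot_row = rows.pop(idx)
        axis = pivot_row[0]
        det = gf_mul(det, axis)
        inv_axis = gf_inv(axis)
        # Square cells: scale the pivot row by (leading entry / axis) and add it,
        # which clears the leading entry; the reduced rows feed the next pass.
        reduced = []
        for row in rows:
            quotient = gf_mul(row[0], inv_axis)
            reduced.append([row[c] ^ gf_mul(quotient, pivot_row[c])
                            for c in range(1, len(row))])
        rows = reduced
    return det


def sage_latency(t):
    """Latency quoted in the text: 2 + 3 + ... + t clocks for a t-by-t matrix."""
    return sum(range(2, t + 1))
```

For the (255, 239) code, t = 8 and `sage_latency(8)` evaluates to 35, matching the figure quoted above.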


Fig. 3 shows the DC, which mainly uses a 1D-SAGE and a multiplier to compute the determinant of the syndrome matrix. Based on (1), the 1D-SAGE is run v+1 times to calculate det(Nv0), det(Nv+10), det(Nvxx), and det(Nv+1xx). However, the latency of these four det(⋅) calculations is still high: 44 + 35 × 8 = 324 clock cycles. As det(Nvxx) is generated from det(Nv0), the DC can use register buffers to store the intermediate data of det(Nv0) for latency reduction. With this strategy, the number of operation cycles can be reduced to within one codeword length. After the determinants are computed, the END examines det(Nv0) to determine the error number v and stores the values of all determinants from the DC. This circuit is illustrated in Fig. 4.


Fig. 3 Circuit diagram of the determinant calculator (DC).

Fig. 5(a) shows the accumulation circuit. According to (2), $\det(N_v^{xx})$ is multiplied by $\alpha^{(2x-1)j}$ and then accumulated with the preceding results for each symbol. Since $\alpha^{(2x-1)}$ is a constant, the proposed design uses a constant-variable multiplier to reduce complexity. Fig. 5(b) shows the DASC, which is composed of v+1 multipliers, v adders, and one (v+1)-to-1 multiplexer for computing $H_{v+1,j}$. Note that the proposed design needs a pair of DASCs to calculate $H_{v,j}$ and $H_{v+1,j}$ in parallel.
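The role of the constant multiplier can be sketched in software: each term of (2) lives in a register that is multiplied by the fixed constant $\alpha^{(2x-1)}$ once per symbol, so no general exponentiation is needed. The sketch below assumes the field polynomial 0x11D, $\alpha = 2$, and a symbol index starting at j = 0; the function names are illustrative and not from the paper.

```python
PRIM_POLY = 0x11D  # assumed primitive polynomial for GF(2^8); not stated in the paper


def gf_mul(a, b):
    """Multiply two GF(2^8) elements by shift-and-add with modular reduction."""
    result = 0
    while b:
        if b & 1:
            result ^= a
        a <<= 1
        if a & 0x100:
            a ^= PRIM_POLY
        b >>= 1
    return result


def gf_pow(a, n):
    """Raise a to the n-th power by repeated multiplication."""
    result = 1
    for _ in range(n):
        result = gf_mul(result, a)
    return result


def h_sequence(minor_dets, n_symbols=255, alpha=2):
    """Return [H_{v,0}, H_{v,1}, ...] where minor_dets[x-1] holds det(N_v^{xx})."""
    # One fixed multiplier constant per term: alpha^(2x-1).
    consts = [gf_pow(alpha, 2 * x - 1) for x in range(1, len(minor_dets) + 1)]
    regs = list(minor_dets)  # at j = 0 every weight alpha^0 equals 1
    outputs = []
    for _ in range(n_symbols):
        h = 0
        for reg in regs:
            h ^= reg  # summation stage: addition in GF(2^8) is XOR
        outputs.append(h)
        # Accumulation stage: advance every term to the next symbol index.
        regs = [gf_mul(reg, const) for reg, const in zip(regs, consts)]
    return outputs


def h_direct(minor_dets, j, alpha=2):
    """Reference evaluation of (2) for a single j, used only to check h_sequence."""
    h = 0
    for x, det_xx in enumerate(minor_dets, start=1):
        h ^= gf_mul(gf_pow(alpha, (2 * x - 1) * j), det_xx)
    return h
```

For example, `h_sequence([3, 7, 29])[j]` agrees with `h_direct([3, 7, 29], j)` for every j, while using only one constant multiplication per term per symbol.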


Fig. 2 Structures of (a) 1-D systolic array for Gauss elimination, (b) the round cell, and (c) the square cell.


Fig. 4 Circuit diagram of the error number detector (END).


Fig. 5 Circuit diagrams of (a) the accumulation circuit, (b) the determinant accumulation and summation circuit, and (c) the error correction circuit.

IV. CONCLUSION

This paper presents the design of a (255, 239) RS decoder using a simplified step-by-step algorithm. Unlike previous works, which were all based on algebraic algorithms, this work is the first to show a complete hardware implementation of an RS decoder using the simplified step-by-step algorithm. It adopts a low-complexity 1-D systolic Gauss eliminator that is suited to calculations on a high-dimension syndrome matrix. Based on this design, a high-throughput parallel RS decoder can further be constructed to support high-speed optical communications.

REFERENCES

[1] T. C. Chen, C. H. Wei, and S. W. Wei, “Step-by-step decoding algorithm for Reed–Solomon codes,” Proc. IEE Commun., vol. 147, no. 1, pp. 8-12, 2000.
[2] T. C. Chen, C. H. Wei, and S. W. Wei, “A pipeline structure for high-speed step-by-step RS decoding,” IEICE Trans. Commun., vol. E86-B, no. 2, pp. 847-849, Feb. 2003.
[3] X. Liu, C. Lu, T. H. Cheng, and S. N. Koh, “A simplified step-by-step decoding algorithm for parallel decoding of Reed–Solomon codes,” IEEE Trans. Commun., vol. 55, no. 6, pp. 1103–1109, Jun. 2007.
[4] B. Hochet, P. Quinton, and Y. Robert, “Systolic Gaussian elimination over GF(p) with partial pivoting,” IEEE Trans. Comput., vol. 38, pp. 1321–1324, Sep. 1989.
[5] L. L. Lin, VLSI Implementation for finding error locator polynomials in the decoding of Reed-Solomon codes and Hermitian codes, Master Thesis, National Tsing Hua University, 1998.
