INVITED PAPER

Turbo and Turbo-Like Codes: Principles and Applications in Telecommunications

Tremendous progress in reducing turbo code computational complexity, memory requirements and performance limitations is leading to their wide use in commercial communications systems.

By Ken Gracie and Marie-Hélène Hamon

ABSTRACT | For decades, the de facto standard for forward error correction was a convolutional code decoded with the Viterbi algorithm, often concatenated with another code (e.g., a Reed–Solomon code). But since the introduction of turbo codes in 1993, much more powerful codes referred to collectively as turbo and turbo-like codes have eclipsed classical methods. These powerful error-correcting techniques achieve excellent error-rate performance that can closely approach Shannon’s channel capacity limit. The lure of these large coding gains has resulted in their incorporation into a widening array of telecommunications standards and systems. This paper will briefly characterize turbo and turbo-like codes, examine their implications for physical layer system design, and discuss standards and systems where they are being used. The emphasis will be on telecommunications applications, particularly wireless, though others are mentioned. Some thoughts on the use of turbo and turbo-like codes in the future will also be given.

KEYWORDS | Composite codes; design and implementation; iterative decoding; low-density parity-check (LDPC) codes; turbo codes; turbo product codes

Manuscript received August 24, 2006; revised December 20, 2006. K. Gracie is with the Communications Research Centre (CRC), Ottawa, ON K2H 8S2, Canada (e-mail: [email protected]). M.-H. Hamon is with France Telecom Research and Development, 35512 Cesson-Sévigné Cedex, France (e-mail: [email protected]). Digital Object Identifier: 10.1109/JPROC.2007.895197

Proceedings of the IEEE | Vol. 95, No. 6, June 2007. 0018-9219/$25.00 © 2007 IEEE

I. INTRODUCTION

Channel coding, or forward error correction (FEC), introduces redundancy into a data-bearing signal so that errors incurred during transmission can be corrected at the receiver. The practical value of FEC for communications systems is clear: it allows a system to convey information more reliably than is possible otherwise. The ultimate limit on the performance improvement that is possible with FEC was defined in the 1940s by Claude Shannon [1], [2], but this landmark work gave no indication of how to construct good practical codes. The achievement of the Shannon channel capacity limit has been the goal of channel coding theorists ever since.

Succeeding decades saw the proposal of many FEC coding schemes in pursuit of this goal. Well-known examples include algebraic codes (e.g., Reed–Solomon codes), convolutional codes, product codes, concatenated codes, and, long before their significance was realized, low-density parity-check (LDPC) codes [3]–[5]. Convolutional codes with modest constraint lengths (e.g., seven or nine) and decoded with the Viterbi algorithm became particularly popular, either on their own or as part of a concatenated code. Offering efficient encoding and decoding, reasonable memory requirements, and good performance, these codes became a de facto industry standard for communications systems. However, the performance of these practical convolutional codes is still a long way from the Shannon limit.

Research into stronger codes that might ultimately be used in real implementations continued. In the 1980s, Tanner [6] introduced coding on graphs and Battail [7] proposed using soft-output decoding with product codes. In the early 1990s, Lodge et al. [8], [9] and Berrou et al. [10], [11] began investigating the soft iterative decoding of composite codes, the former with product codes and the latter with concatenated convolutional codes. These efforts yielded substantial performance improvements relative to more traditional methods. However, it was Berrou’s “turbo codes” that generated the most interest. Performance that was only 0.7 dB from Shannon’s channel capacity limit at a bit-error rate (BER) of $10^{-5}$ on an additive white Gaussian noise (AWGN) channel was achieved. Furthermore, this impressive performance was achieved with much lower computational complexity than a convolutional code with equivalent performance. This breakthrough in complexity was initially greeted with skepticism, but once confirmed it touched off intense efforts to investigate turbo codes and characterize their performance. Lodge’s early work on product codes was followed by others (e.g., [12] and [13]), and there was a strong resurgence of interest in Gallager’s LDPC codes starting in the mid-1990s [14]–[17].

Despite their tremendous performance, such turbo and “turbo-like” codes were only of interest to industry if they could be implemented effectively. The objections were many, including high computational complexity and high latency relative to standard coding methods, and error “floors” or “flares.” Today, less than 15 years after the publication of the initial turbo code result, there has been phenomenal progress in dealing with these obstacles, and a wide range of practical FEC solutions using turbo and turbo-like codes are in use. “Turbo FEC” is being added to an increasing array of applications as well as to more and more telecommunications standards. The impact of these codes has been such that even the business community and popular press are dimly aware of these extraordinary error-correction techniques [18]–[20].

This paper explores the principles and applications of turbo and turbo-like codes in four parts. The first (Section II) consists of an overview of the codes themselves and their general characteristics. Several broad categories of turbo and turbo-like codes are described and characterized, namely turbo codes, turbo product codes (TPCs), LDPC codes, and other “turbo-like” codes. Technical summaries for each category are given, along with example error-rate performance results for selected codes that have been adopted in major standards.
Numerous challenges to the practical use of turbo FEC in physical layer systems are then discussed (Section III), for example computational complexity, memory requirements, and flare performance. This is followed by a survey of telecommunications applications of turbo and turbo-like codes, focussed mainly on wireless. Some additional applications are mentioned but not discussed in detail. Standards that are known to include turbo FEC are identified, including those for third generation (3G) terrestrial wireless, digital broadcast, and wireless networking (Section IV). Some additional applications of turbo and turbo-like codes are then discussed, followed by a list of companies known to be supporting or developing commercial turbo FEC technology (Section V). The goals of these latter sections are to show how numerous standardized codes have resolved the practical challenges outlined previously and to illustrate the strong and growing level of industrial interest in these techniques. Finally, a number of trends in the use and development of turbo and turbo-like codes will be presented (Section VI). Section VII concludes the paper.

II. TURBO AND TURBO-LIKE ERROR-CORRECTING CODES

Though a generic definition of turbo and turbo-like codes is difficult to formulate, they typically have the following characteristics.

1) Composite structure: The information bits are encoded with multiple low-complexity constituent or component codes. Each constituent code may or may not involve all of the information bits.
2) Interleaving: There is some reordering or permutation of the information bits between encodings.
3) Soft iterative decoding: The component codes are decoded multiple times and soft reliability estimates derived from each code are used to improve the decoding of the other component codes.

That it is possible to construct good codes from other codes has been known for some time (e.g., concatenated codes [21]). If done correctly, multiple encodings of the data and intelligent reordering of bits between these encodings result in coded data bits that are strongly interconnected with one another, producing very powerful codes. Optimal maximum-likelihood (ML) decoding of such powerful code structures is computationally intractable and therefore impractical.

The application of iterative decoding techniques to these composite codes was a major breakthrough. Iterative decoders begin by decoding the constituent codes individually, either serially or in parallel, based on inputs derived from the channel and typically some a priori information (e.g., that bits are either zero or one with a probability of 0.5). The additional information about the data symbols derived from these decoding operations is then shared with the other constituent decoders and the process is repeated multiple times. That is, each constituent decoder improves its data estimates with the help of information produced by the other decoders, each building upon the results of the others to gradually enhance the data decisions.
When the process is halted, estimates of the data bits are obtained by making hard decisions on the combination of the data estimates derived directly from the channel outputs and the information from all of the component codes.

Another key element that delivers substantial performance gains is the use of so-called “soft” information rather than hard decisions, i.e., real numbers whose values are measures of confidence in the associated data symbol estimates. Soft-input decoding alone is known to deliver significant performance improvements [22], and exchanging soft information during the iterative decoding process yields even larger gains. Iterative decoders therefore require soft-in, soft-out (SISO) algorithms as decoders for the constituent codes.

Virtually all of the constituent decoding algorithms used for turbo and turbo-like codes may be interpreted as realizations of a technique known as message passing or belief propagation (BP).

While Gallager [4] was apparently the first to propose it, this probabilistic inference technique was also independently introduced in the context of artificial intelligence [23], the connection being noted later [24]. BP passes messages locally in the code structure, i.e., updates to a given data symbol estimate are derived from other symbols that are “nearby” in the code structure. Information from each data symbol therefore propagates through the overall code structure over time. The messages or updates may be exchanged in different forms. In the probability domain, messages may be in the form of likelihood ratios (LRs) for each data bit $d^k$, expressed as

$$\ell^k = \frac{\Pr\{d^k = 0 \mid y, C\}}{\Pr\{d^k = 1 \mid y, C\}} = \ell_n^k \prod_i \ell_{e_i}^k \tag{1}$$

where binary data symbols have been assumed, $y$ refers to the decoder inputs, $C$ refers to the code structure, $\ell_n^k$ denotes the contribution due to the decoder inputs (the intrinsic information), $\ell_{e_i}^k$ denotes the contribution due to the decoding of the constituent codes (the extrinsic information), and $i$ indexes the constituent decoding operations. Often, the constituent decoders operate in the log domain, in which case the decoders exchange log-likelihood ratios (LLRs), expressed as

$$L^k = \ln \frac{\Pr\{d^k = 0 \mid y, C\}}{\Pr\{d^k = 1 \mid y, C\}} = L_n^k + \sum_i L_{e_i}^k. \tag{2}$$
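The relationship between the product form of (1) and the sum form of (2) can be checked numerically. The following is a minimal sketch, not taken from the paper: the function names and the numerical values are illustrative assumptions, and the sign convention (a positive LLR favors bit 0) follows the ratio as written in (1) and (2).

```python
import math


def lr_to_llr(lr):
    """Convert a likelihood ratio P(d=0)/P(d=1) into a log-likelihood ratio."""
    return math.log(lr)


def combine_llrs(intrinsic, extrinsics):
    """Total LLR as in (2): intrinsic term plus the sum of the extrinsic terms."""
    return intrinsic + sum(extrinsics)


def hard_decision(llr):
    """With L = ln P(0)/P(1), a non-negative LLR favors bit 0."""
    return 0 if llr >= 0 else 1


def prob_zero(llr):
    """Recover P(d=0) from the LLR."""
    return 1.0 / (1.0 + math.exp(-llr))


# Illustrative values only (assumptions, not measured data).
l_n = 0.8          # intrinsic LR: the channel alone slightly favors bit 1
l_e = [1.5, 2.0]   # extrinsic LRs from two constituent decoding operations

lr_total = l_n * math.prod(l_e)                                        # form (1)
llr_total = combine_llrs(lr_to_llr(l_n), [lr_to_llr(x) for x in l_e])  # form (2)
decision = hard_decision(llr_total)
```

In this toy case the channel observation alone would give bit 1, while the combined intrinsic and extrinsic information gives bit 0, which is the kind of correction the extrinsic terms are meant to provide.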

The remainder of this section describes the major categories of turbo FEC, namely turbo codes, turbo product codes (TPCs), LDPC codes, and turbo-like codes. These descriptions include constituent decoding methods and example performance results using standardized codes. A number of key terms and concepts are used throughout the discussion [5]. All of the codes described are binary codes since they operate on bits. A codeword is the output from a block FEC encoder, i.e., it is the result of encoding a finite-length information sequence. Each information sequence is mapped to a unique codeword by the encoder. Each codeword differs from the others by some number of bits; this is referred to as the Hamming distance, or simply distance, between codewords. The smallest distance between any two codewords is the minimum distance ($d_{\min}$). Typically, it is desirable to have high distances between codewords.
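The definitions of codeword, Hamming distance, and minimum distance can be made concrete with a small block code. The sketch below uses a systematic (7, 4) Hamming code as an assumed example (this particular generator matrix is not taken from the paper) and finds $d_{\min}$ by exhaustive comparison, which is only practical for very short codes.

```python
from itertools import combinations, product

# Generator matrix G = [I | P] of a systematic (7, 4) Hamming code.
# This specific matrix is an illustrative assumption.
G = [
    [1, 0, 0, 0, 1, 1, 0],
    [0, 1, 0, 0, 1, 0, 1],
    [0, 0, 1, 0, 0, 1, 1],
    [0, 0, 0, 1, 1, 1, 1],
]


def encode(msg):
    """Map a 4-bit information sequence to its 7-bit codeword (arithmetic mod 2)."""
    return tuple(sum(m * g for m, g in zip(msg, col)) % 2 for col in zip(*G))


def hamming_distance(a, b):
    """Number of bit positions in which two codewords differ."""
    return sum(x != y for x, y in zip(a, b))


codewords = [encode(m) for m in product((0, 1), repeat=4)]

# Minimum distance: smallest distance over all pairs of distinct codewords.
d_min = min(hamming_distance(a, b) for a, b in combinations(codewords, 2))
```

For the codes discussed in this paper the number of codewords is far too large for such a search, so distance properties are normally obtained by analysis or by specialized search methods.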

A. Turbo Codes

The term “turbo code” usually refers to the error-correcting codes proposed by Berrou et al. in 1993 [10]. Turbo codes involve the parallel concatenation of two recursive systematic convolutional (RSC) codes and are referred to as parallel concatenated convolutional codes (PCCCs), convolutional turbo codes (CTCs), or turbo convolutional codes (TCCs). Fig. 1 shows the PCCC encoder adopted by the 3G wireless standards. This structure may be extended to a higher number of constituent codes at the cost of increased complexity. The interleaver permutes or reorders the information stream before the second encoding, introducing code diversity. The natural or base code rate for the code of Fig. 1 is 1/3. Higher code rates are obtained by puncturing, i.e., by removing either parity symbols or both parity and data symbols after the encoder and assigning these symbols low reliability values at the input to the decoder. Lower code rates may be obtained by adding feed-forward paths to the constituent encoders or additional constituent codes. Key elements affecting turbo code performance are the interleaver, the number of memory elements $m$ in each constituent code, the encoder polynomials (i.e., the taps on the encoder delay line), whether or not the encoders are properly flushed or terminated, the puncturing pattern, and the fact that the constituent codes are recursive.

Fig. 1. An 8-state, single-binary turbo (i.e., PCCC) encoder with two identical RSC encoders arranged in parallel ($m = 3$, polynomials (FB, FF) = $(13, 15)_8$). The interleaver is marked in the figure. This encoder is used in the 3G wireless standards (W-CDMA, cdma2000, TD-SCDMA).

Fig. 2 shows the structure of a turbo decoder. The $L_{e_i}$ are vectors of soft extrinsic information derived from the parity and code structure, typically LLRs or approximate LLRs. The iterative decoding process begins with the first constituent decoder processing the original data estimates and the first set of parity estimates. The resulting extrinsic information ($L_{e_1}$), either computed directly by the constituent decoder or obtained by subtracting the decoder input from the decoder output, is then passed to the second decoder. The second constituent decoder processes the original data estimates, the second set of parity estimates, and the extrinsics from the first decoder. A new set of extrinsic information ($L_{e_2}$) is then computed and passed to the first decoder, and the process repeats or iterates. Each decoder thus alternately builds upon the results of the other to gradually enhance the reliability of the decisions. Note that even if the original data estimates are independent, they lose this independence after the first constituent decoding operation (i.e., after the first half-iteration) because the extrinsics become correlated. To mitigate this correlation effect, each decoder ignores its own previous extrinsic information.
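The encoder structure of Fig. 1 and the puncturing step can be sketched in a few lines. This is a minimal illustration under stated assumptions: the octal polynomials (13, 15) are read as feedback $1 + D^2 + D^3$ and feedforward $1 + D + D^3$, the trellis is left unterminated, and both the toy interleaver and the alternating puncture pattern are hypothetical stand-ins rather than those specified in any standard.

```python
def rsc_parity(bits):
    """Parity stream of an 8-state RSC encoder with (FB, FF) = (13, 15) octal.

    Feedback 13 (octal) = 1011b -> 1 + D^2 + D^3.
    Feedforward 15 (octal) = 1101b -> 1 + D + D^3.
    The trellis is not terminated (an assumption made for brevity).
    """
    s1 = s2 = s3 = 0                    # delay line, s1 holds the newest value
    parity = []
    for u in bits:
        a = u ^ s2 ^ s3                 # recursive (feedback) sum
        parity.append(a ^ s1 ^ s3)      # feedforward taps 1, D and D^3
        s1, s2, s3 = a, s1, s2          # shift the delay line
    return parity


def turbo_encode(bits, perm):
    """Rate-1/3 PCCC: systematic bits plus one parity stream per RSC encoder."""
    interleaved = [bits[i] for i in perm]
    return list(bits), rsc_parity(bits), rsc_parity(interleaved)


def puncture_to_half_rate(systematic, parity1, parity2):
    """Keep every data bit and alternate between the two parity streams.

    This mask is a hypothetical example; standards define their own patterns.
    """
    out = []
    for t, d in enumerate(systematic):
        out.append(d)
        out.append(parity1[t] if t % 2 == 0 else parity2[t])
    return out


k = 8
perm = [(3 * i) % k for i in range(k)]   # hypothetical toy interleaver
data = [1, 0, 0, 0, 0, 0, 0, 0]
sys_bits, p1, p2 = turbo_encode(data, perm)
unpunctured_len = len(sys_bits) + len(p1) + len(p2)   # 3 coded bits per data bit
half_rate = puncture_to_half_rate(sys_bits, p1, p2)   # 2 coded bits per data bit
```

Because the constituent code is recursive, a single nonzero input bit produces a parity response that does not die out, which is one reason the recursive property is listed among the key performance factors. At the decoder, punctured positions would be given an LLR of zero, i.e., no reliability.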
For example, when the first decoder executes the second time, it processes the original data estimates, the first set of parity estimates, and the extrinsic information from the second decoder but not the previous extrinsic from the first decoder. This significantly improves performance. The name “turbo codes” stems from the similarity between this iterative decoding process and a turbo-charged engine: the latter feeds back energy from the exhaust to improve engine performance, while the former feeds back information derived from the component codes to improve decoder performance.

The optimal algorithm for each component code in terms of minimizing the probability of bit error given independent inputs is the Bahl–Cocke–Jelinek–Raviv (BCJR) algorithm [25], [26], i.e., it realizes the maximum a posteriori (MAP) criterion. Since the MAP criterion assumes hard decisions, a soft-output algorithm based on soft a posteriori probabilities is more accurately referred to as an a posteriori probability (APP) decoder. The BCJR algorithm is typically implemented in the log domain and is known as the log-APP algorithm. While the soft-output Viterbi algorithm (SOVA) has been used in the past [27], more common choices for simpler decoding algorithms are approximations to log-APP, specifically the linear log-APP, constant log-APP [28], max-log-APP [29], [30], and enhanced max-log-APP algorithms [31], [32]. These alternatives produce approximate LLRs and exchange degradations in error-rate performance for reduced complexity. For example, a max-log-APP decoder often displays a substantial loss in error-rate performance relative to log-APP decoding for an equal number of decoding iterations [30]. The enhanced max-log-APP algorithm is of particular interest. It exploits the observed fact that the extrinsic information produced by a max-log-APP decoder is optimistic. That is, a max-log-APP decoder exaggerates the reliability of the data estimates relative to those produced with log-APP decoding, particularly during the first few iterations.
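The difference between log-APP and max-log-APP decoding comes down to how sums of exponentials are evaluated in the log domain. The sketch below shows only that core operation, not a full trellis decoder; the function names are illustrative assumptions. It relies on the identity $\ln(e^a + e^b) = \max(a, b) + \ln(1 + e^{-|a-b|})$, often called the Jacobian logarithm.

```python
import math


def max_star(a, b):
    """Exact ln(exp(a) + exp(b)), as used in log-APP decoding."""
    return max(a, b) + math.log1p(math.exp(-abs(a - b)))


def max_approx(a, b):
    """max-log-APP approximation: the correction term is dropped."""
    return max(a, b)


def correction_term(a, b):
    """Size of the term that max-log-APP ignores; it never exceeds ln 2."""
    return max_star(a, b) - max_approx(a, b)


# The correction is largest when the two metrics are equal and
# vanishes as they move apart.
gap_equal = correction_term(1.0, 1.0)
gap_far = correction_term(0.0, 10.0)
```

Replacing `max_star` with `max_approx` removes the exponential and logarithm from the inner loop of the decoder, which is the source of the complexity saving. The linear and constant log-APP variants cited above approximate the correction term rather than dropping it entirely.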
An enhanced max-log-APP decoder compensates for this by scaling back the extrinsic information produced by each max-log-APP decoding operation, where this scale factor SF may be constant or may vary as the turbo decoder iterates. Relative to log-APP decoding for 8-state turbo codes, this technique significantly lowers decoding complexity while achieving performance that is typically degraded by only about 0.1 dB. Scaling of the extrinsic information has also been applied to log-APP decoding [33].

Fig. 2. A turbo decoder corresponding to the encoder of Fig. 1. The interleaver is marked in the figure.

The invention of turbo codes triggered a significant amount of research, leading to numerous enhancements and refinements of the original scheme. The design of the interleaver has been heavily investigated, yielding good permutations that substantially improve the distance properties that are achieved (e.g., [34] and [35]). Different values of $m$ have been studied: the original turbo codes used $m = 4$ (16-state), several current standards use $m = 3$ (8-state), and some recent work has shown good performance with $m = 2$ (4-state).

Another interesting innovation is multibinary convolutional turbo codes [36], [37]. Classical turbo codes process a single bit per time step and are referred to as single-binary turbo codes. Multibinary turbo codes use multibinary convolutional codes as constituent codes, i.e., multiple bits are processed simultaneously. Turbo codes that process bit pairs, known as duo-binary or double-binary turbo codes (Fig. 3), perform well and have been adopted in several standards. The multibit input property may be used in the design of the code, e.g., a second level of interleaving may be added within the multibinary symbols. These codes also tend to be more robust as they are punctured to higher code rates.

Fig. 3. An 8-state double-binary convolutional turbo code. DVB-RCS, DVB-RCT, and 802.16 convolutional turbo codes all use slight variations on this encoder structure. The interleaver is marked in the figure.

There has also been significant interest in partially systematic turbo codes (PSTCs), where higher code rates are achieved by puncturing both parity and data bits. This approach can substantially improve performance and has been applied to 4-state [38]–[42], 8-state [43], [44], and 16-state [45], [46] codes.

Figs. 4 and 5 show the BER and word error rate (WER) performance versus transmitted power expressed as the ratio of energy per information bit to the one-sided noise power spectral density ($E_b/N_0$). Results are shown for uncoded binary phase shift keying (BPSK) and several codes at a nominal code rate of 1/2: the cdma2000 8-state, single-binary turbo code, the digital video broadcasting–return channel satellite (DVB-RCS) 8-state, duo-binary turbo code, and zero-flushed (ZF) convolutional codes with constraint lengths $K = 7$ and $K = 9$. The turbo code curves correspond to 16 iterations of an enhanced max-log-APP decoder using a fixed extrinsic scale factor SF = 0.75. The interleavers, trellis termination methods, and puncture masks are those specified in the standards.

Fig. 4. BER performance of cdma2000 turbo code (8-state, single-binary) and DVB-RCS turbo code (8-state, double-binary) versus that of zero-flushed $K = 7$ and $K = 9$ convolutional codes and uncoded transmission. AWGN channel, binary antipodal signalling (i.e., BPSK), $k = 1504$, 300 packet errors per point, nominal code rate of 1/2, actual code rates as indicated. The turbo codes used enhanced max-log-APP decoding (SF = 0.75, $I = 16$ iterations per packet) while the convolutional codes used soft-input Viterbi decoding.

Fig. 5. WER performance of cdma2000 turbo code (8-state, single-binary) and DVB-RCS turbo code (8-state, double-binary) versus that of zero-flushed $K = 7$ and $K = 9$ convolutional codes and uncoded transmission. AWGN channel, binary antipodal signalling (i.e., BPSK), $k = 1504$, 300 packet errors per point, nominal code rate of 1/2, actual code rates as indicated. The turbo codes used enhanced max-log-APP decoding (SF = 0.75, $I = 16$ iterations per packet) while the convolutional codes used soft-input Viterbi decoding.

Many coding studies focus on the BER, though the WER is often of more interest to system designers, particularly where automatic-repeat-request (ARQ) is in use. Note that a “word” here is the information portion of an FEC codeword rather than a packet length associated with a higher level protocol. WER is also referred to as packet-error rate (PER) or frame-error rate (FER) in the literature.

Typical of turbo and turbo-like codes, these results display three distinct regions. At very low signal-to-noise ratios (SNRs) (on the far left in the plots), the signal is so corrupted by channel noise that the decoder cannot improve the error rate and may even degrade it. As the SNR increases, a “waterfall” region is encountered where the error rate drops sharply. As the SNR increases still further, a “flare” or “floor” region is encountered where the curve becomes less steep, limiting the performance gains that are possible as the transmitted power is increased. This flare is primarily a function of the distance properties of the code. Fig. 6 shows the flares in the turbo codes of Fig. 5 more clearly. Also shown in Fig. 6 is the performance achieved with an 8-state PSTC (i.e., an 8-state turbo code using data puncturing) that includes a well-designed dithered relative prime (DRP) interleaver [44]. This latter curve was generated with the software of [47]. The dramatic improvement in the flare region shows what is possible when using data puncturing with 8-state codes. Substantial improvements in waterfall performance are possible with 4-state PSTCs. Note that while the standard codes and the PSTC have different numbers of information bits ($k = 1504$ and $k = 1498$, respectively), the interleaver is of length 1504 in all cases.
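The uncoded BPSK reference curve used in Figs. 4 and 5, and the link between BER and WER for that reference, follow from standard closed-form expressions. The sketch below is an illustrative aid rather than the simulation method used for the figures; the function names are assumptions, and the BER-to-WER conversion is valid only when bit errors are independent, which holds for uncoded transmission on an AWGN channel but not for decoder outputs.

```python
import math


def uncoded_bpsk_ber(ebn0_db):
    """BER of uncoded BPSK on an AWGN channel: 0.5 * erfc(sqrt(Eb/N0))."""
    ebn0 = 10.0 ** (ebn0_db / 10.0)
    return 0.5 * math.erfc(math.sqrt(ebn0))


def wer_from_ber(ber, k):
    """WER for a word of k bits with independent bit errors (uncoded case only)."""
    return 1.0 - (1.0 - ber) ** k


def noise_sigma(ebn0_db, code_rate):
    """Noise standard deviation per real dimension for unit-energy BPSK symbols.

    With Es = code_rate * Eb and Es = 1, sigma^2 = N0 / 2 = 1 / (2 * R * Eb/N0).
    """
    ebn0 = 10.0 ** (ebn0_db / 10.0)
    return math.sqrt(1.0 / (2.0 * code_rate * ebn0))


k = 1504  # word length used in Figs. 4 and 5
curve = [(db, uncoded_bpsk_ber(db), wer_from_ber(uncoded_bpsk_ber(db), k))
         for db in (0.0, 2.0, 4.0, 6.0, 8.0)]
```

The `noise_sigma` helper shows why the code rate matters when plotting against $E_b/N_0$: a rate-1/2 code spreads each information bit's energy over two channel symbols, so the noise level per symbol is higher than in the uncoded case at the same $E_b/N_0$.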

B. Turbo Product Codes

The serial concatenation of block codes separated by a structured permutation (either implicit or explicit) was introduced in the 1950s. Codes with this structure are referred to as product codes [48], [49]. The constituent codes may be any kind of block code (including terminated convolutional codes), but common choices include Bose–Chaudhuri–Hocquenghem (BCH) codes, extended Hamming codes, or simple parity checks (SPCs). Product codes may have many dimensions but are usually restricted to two or three. Applying iterative decoding to such code structures results in TPCs, and exchanging soft extrinsic information yields good performance. TPCs are also referred to as block turbo codes (BTCs) or turbo block codes (TBCs).

Fig. 6. WER performance of cdma2000 turbo code (8-state, single-binary) and DVB-RCS turbo code (8-state, double-binary) versus that of PSTC generated with tool of [47] (8-state, single-binary). AWGN channel, binary antipodal signalling (i.e., BPSK), 300 packet errors per point, nominal code rate of 1/2, actual code rates as indicated. The cdma2000 and DVB-RCS results correspond to $k = 1504$, PSTC result to $k = 1498$, and the interleaver length in all cases is 1504 bits. All codes used enhanced max-log-APP decoding (SF = 0.75, $I = 16$ iterations per packet).


Fig. 7. Construction of a product code from two component codes $C_1$ and $C_2$ with parameters $(n_1, k_1, d_{\min,1})$ and $(n_2, k_2, d_{\min,2})$, respectively.

Fig. 7 depicts the construction of a product code from two block codes $C_1$ and $C_2$ with parameters $(n_1, k_1, d_{\min,1})$ and $(n_2, k_2, d_{\min,2})$, respectively. The variables $n_i$, $k_i$, and $d_{\min,i}$ represent the coded block length, the length of the information message, and the minimum distance, respectively, for each component code. The information bits are first represented as a matrix $A$ with $k_1$ columns and $k_2$ rows. The $k_2$ rows are encoded by $C_1$, yielding an intermediate matrix with $n_1$ columns and $k_2$ rows. The columns of this matrix are then encoded by $C_2$, resulting in a matrix of dimension $n_2 \times n_1$. In this case, the interleaving is uniform. By construction, all rows are codewords of $C_1$ and all columns are codewords of $C_2$. The parameters of the resulting product code are the products of the corresponding parameters of the constituent codes, i.e., $(n_1 n_2, k_1 k_2, d_{\min,1} d_{\min,2})$.

The code length and code rate are determined by the dimensions of the constituent codes, but puncturing and shortening techniques can be employed to adjust these values to meet specific requirements. Shortening consists of fixing some of the information bits to a known value, thereby reducing the code rate. These information bits are not transmitted and are assigned high reliability values at the input to the decoder.

Iterative decoding of TPCs is performed by alternately decoding along the different dimensions of the code, where again reliability information is represented as true or approximate LLRs. Numerous algorithms may be used to decode the component codes, for example [50]–[52]. A common choice with reasonable complexity is the Chase algorithm [53], [54]. This algorithm first determines a subset of most likely codewords, then performs ML decoding on this set of possibilities. This soft-in, hard-out (SIHO) algorithm produces hard decisions for each data symbol and requires an additional algorithm to compute soft decisions, for example the method described in [12].
Extrinsic information is then extracted from the soft decisions and combined with the received or intrinsic information in preparation for the decoding of the next dimension. As with turbo codes, the extrinsic LLRs may be optimistic at the beginning of the decoding process and attenuating or scaling back the extrinsic information compensates for this. The earliest known proposal of scaled extrinsic decoding was for TPCs using log-APP decoding [12]. The decoding process for a single dimension of a product code is illustrated in Fig. 8.

In the figure, $i$ denotes the number of code dimensions that have been processed, $L_{e_i}$ represents the extrinsic information for dimension $i$, $R$ represents the original data estimates derived from the channel, $R_i$ represents the sum of intrinsic and extrinsic information at the decoder input, and $SF_i$ is an extrinsic scale factor that varies during the decoding process. As with turbo codes, each constituent decoding operation processes a combination of intrinsic information and extrinsic information from other constituent codes.

Figs. 9 and 10 show BER and WER performance, respectively, for uncoded transmission, $K = 7$ and $K = 9$ zero-flushed (ZF) convolutional codes, and the optional $(64, 57)^2$ TPC with $d_{\min} = 16$ specified in the IEEE 802.16 standards. The latter is a 2-D TPC based on extended Hamming codes. The performance shown was achieved with the vector SISO decoder described in [51].
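The product code construction of Fig. 7 and the scaled-extrinsic iteration of Fig. 8 can be sketched together on a toy example. The sketch below is a simplified illustration under several assumptions: both component codes are single parity checks (so the product code has parameters (12, 6, 4)), the component SISO decoder is the exact tanh rule for a parity check rather than the Chase or vector SISO algorithms cited above, the scale factor is held constant, and the LLR convention is $L = \ln P(0)/P(1)$.

```python
import math


def spc_encode(bits):
    """Single parity check (SPC) encoder: append one even-parity bit."""
    return list(bits) + [sum(bits) % 2]


def product_encode(A):
    """Encode the k2 x k1 matrix A row-wise with C1, then column-wise with C2."""
    rows = [spc_encode(r) for r in A]             # k2 rows, n1 columns
    cols = [spc_encode(c) for c in zip(*rows)]    # n1 columns, each of length n2
    return [list(r) for r in zip(*cols)]          # n2 x n1 codeword matrix


def spc_extrinsic(llrs):
    """Extrinsic LLRs for one SPC codeword (tanh rule)."""
    t = [math.tanh(max(-20.0, min(20.0, x)) / 2.0) for x in llrs]
    out = []
    for j in range(len(t)):
        p = 1.0
        for i, v in enumerate(t):
            if i != j:
                p *= v
        p = max(-0.999999, min(0.999999, p))      # keep atanh finite
        out.append(2.0 * math.atanh(p))
    return out


def decode_product(R, iterations=4, sf=0.625):
    """Alternate row and column decoding, feeding each R_i = R + SF * L_e."""
    nr, nc = len(R), len(R[0])
    le_rows = [[0.0] * nc for _ in range(nr)]
    le_cols = [[0.0] * nc for _ in range(nr)]
    for _ in range(iterations):
        Ri = [[R[r][c] + sf * le_cols[r][c] for c in range(nc)] for r in range(nr)]
        le_rows = [spc_extrinsic(row) for row in Ri]
        Ri = [[R[r][c] + sf * le_rows[r][c] for c in range(nc)] for r in range(nr)]
        by_col = [spc_extrinsic([Ri[r][c] for r in range(nr)]) for c in range(nc)]
        le_cols = [[by_col[c][r] for c in range(nc)] for r in range(nr)]
    # Hard decisions on channel information plus extrinsics from both dimensions.
    return [[0 if R[r][c] + le_rows[r][c] + le_cols[r][c] >= 0 else 1
             for c in range(nc)] for r in range(nr)]


codeword = product_encode([[1, 0, 1], [0, 1, 1]])            # 3 x 4 matrix
R = [[2.0 * (1 - 2 * b) for b in row] for row in codeword]   # ideal channel LLRs
R[0][0] = 0.5            # one weak observation with the wrong sign (illustrative)
decoded = decode_product(R)
```

A hard decision on `R` alone would get the first bit wrong; the row and column extrinsics together outweigh the weak channel value and restore the transmitted codeword. The default `sf=0.625` mirrors the scale factor quoted for the 802.16 results but is used here purely as an illustrative value.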

Fig. 8. An elementary decoder for a single dimension of a multidimensional turbo product code.

Fig. 9. BER performance of the IEEE 802.16 $(64, 57)^2$ TPC (SF = 0.625, 32 iterations) versus that of $K = 7$ and $K = 9$ zero-flushed convolutional codes and uncoded transmission. AWGN channel, binary antipodal signalling (i.e., BPSK), $k = 3249$, 300 packet errors per point, nominal code rate of 4/5, actual code rates as indicated. The TPC was decoded with the vector SISO algorithm of [51] while the convolutional codes used soft-input decoding.

C. LDPC Codes

First proposed by Gallager in the early 1960s [3], [4], LDPC codes were largely forgotten until the mid-1990s. The discovery of turbo codes led to renewed interest [14]–[16] and since then they have received considerable attention [55]–[59]. LDPC codes on very large blocks have approached capacity limits extremely closely (e.g., [60] and [61]). LDPC codes are linear block codes based on simple parity check equations and specified by a sparse parity-check matrix, containing mostly zeros and a few ones (hence “low density”). They are often represented as bipartite (“Tanner”) graphs that contain loops or cycles [6]. All of the nodes in the graph are classified as either variable nodes corresponding to the data and parity bits or check nodes corresponding to the parity check equations, and the edges define which bits are involved in which equations. The degree of each node is the number of edges that connect to it.
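The correspondence between a parity-check matrix and its Tanner graph can be sketched directly. In the example below the matrix `H` is a hypothetical 4 x 8 example chosen for brevity (it is not the matrix shown in the paper's figures): each row is a check node, each column a variable node, and each 1 an edge.

```python
# Hypothetical parity-check matrix: 4 check nodes (rows), 8 variable nodes (columns).
H = [
    [1, 1, 1, 1, 0, 0, 0, 0],
    [0, 0, 0, 0, 1, 1, 1, 1],
    [1, 1, 0, 0, 1, 1, 0, 0],
    [0, 0, 1, 1, 0, 0, 1, 1],
]


def tanner_edges(H):
    """List the graph edges as (check_index, variable_index) pairs."""
    return [(c, v) for c, row in enumerate(H) for v, h in enumerate(row) if h]


def node_degrees(H):
    """Check-node degrees are row weights; variable-node degrees are column weights."""
    check_deg = [sum(row) for row in H]
    var_deg = [sum(col) for col in zip(*H)]
    return check_deg, var_deg


def satisfies_checks(H, word):
    """A word is a codeword when every parity check sums to zero modulo 2."""
    return all(sum(h & x for h, x in zip(row, word)) % 2 == 0 for row in H)


edges = tanner_edges(H)
check_deg, var_deg = node_degrees(H)
```

In this small example every check node has degree 4 and every variable node has degree 2. The example is far too short and dense to behave like a practical LDPC code, and it contains short cycles that a real design would try to avoid.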

An LDPC code is said to be regular if the check-node and variable-node degrees are constant and irregular if they are not. To illustrate, Fig. 11 shows the Tanner graph of a regular LDPC code with check-node degree dc ¼ 4 and variablenode degree dv ¼ 2. These degrees correspond to the number of ones in the rows (for checks) and columns (for variables) of the parity-check matrix H. The low-density feature of the code means that the node degrees are small compared to the codeword length n, so that each parity equation involves only a few data elements. This sparseness property means that there are relatively few edges in the corresponding graph, so that the computational complexity of message-passing decoding is relatively low. More importantly, the low node degrees aid the convergence of the iterative decoder, i.e., they help the decoder to find its best answer more quickly. The distribution of the node degrees largely determines the convergence behavior of the code. This suggests that irregular LDPC codes could potentially display better convergence, and this is typically the caseVthose bits that are involved in more equations tend to converge more quickly and help to Bbootstrap[ the decoder. Optimizing the connections between the nodes is analogous to optimizing the interleaver in a turbo code and determines the Binterconnectedness[ of the coded symbols. The design of efficient and low-complexity encoding algorithms is one of the main challenges involved in designing LDPC codes. The most straightforward encoding method consists of the direct multiplication of the data symbols by the generator matrix. However, this approach is unattractive for complexity reasons: a sparse paritycheck matrix often corresponds to a dense generator matrix. Efficient encoding techniques typically involve the introduction of structure into the parity-check matrix, for

Fig. 10. WER performance of the IEEE 802.16 (64, 57)^2 TPC (SF = 0.625, 32 iterations) versus that of K = 7 and K = 9 zero-flushed convolutional codes and uncoded transmission. AWGN channel, binary antipodal signalling (i.e., BPSK), k = 3249, 300 packet errors per point, nominal code rate of 4/5, actual code rates as indicated. The TPC was decoded with the vector SISO algorithm of [51] while the convolutional codes used soft-input decoding.

Fig. 11. Tanner graph representation of a (d_v, d_c) = (2, 4) regular LDPC code and the corresponding parity-check matrix H. Variable nodes are on the left, check nodes on the right.
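As an illustration of node degrees, the following Python sketch reads the check-node and variable-node degrees off the rows and columns of a small parity-check matrix. The matrix `H` is a hypothetical (d_v, d_c) = (2, 4) regular example chosen for this sketch and is not claimed to be the matrix of Fig. 11; only check c1 is wired to v2, v4, v5, and v7 so that it agrees with the c1-to-v2 message example in the decoding discussion. The helper names are likewise assumptions. At this toy size the matrix is not actually sparse; the low-density property only emerges when n is much larger than the node degrees.

```python
import numpy as np

# Hypothetical (d_v, d_c) = (2, 4) regular parity-check matrix (illustration only).
# Rows are check nodes c1..c4, columns are variable nodes v1..v8.
H = np.array([
    [0, 1, 0, 1, 1, 0, 1, 0],   # c1 checks v2, v4, v5, v7
    [1, 0, 1, 0, 0, 1, 0, 1],   # c2 checks v1, v3, v6, v8
    [1, 1, 1, 1, 0, 0, 0, 0],   # c3 checks v1..v4
    [0, 0, 0, 0, 1, 1, 1, 1],   # c4 checks v5..v8
], dtype=np.uint8)


def node_degrees(H):
    """Return (check-node degrees, variable-node degrees) of the Tanner graph."""
    return H.sum(axis=1), H.sum(axis=0)


def is_regular(H):
    """Regular means constant check-node degree and constant variable-node degree."""
    dc, dv = node_degrees(H)
    return len(set(dc.tolist())) == 1 and len(set(dv.tolist())) == 1


def neighbours_of_check(H, c):
    """1-based indices of the variable nodes attached to check node c (1-based)."""
    return (np.flatnonzero(H[c - 1]) + 1).tolist()


dc, dv = node_degrees(H)
print("check-node degrees:   ", dc.tolist())
print("variable-node degrees:", dv.tolist())
print("regular:", is_regular(H), "| edges in graph:", int(H.sum()))
print("c1 is connected to v", neighbours_of_check(H, 1))
```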

Vol. 95, No. 6, June 2007 | Proceedings of the IEEE


Gracie and Hamon: Turbo and Turbo-Like Codes: Principles and Applications in Telecommunications

Table 1 Examples of Practical LDPC Code Constructions

example a lower triangular structure [58], [62] or quasi-cyclic structure [63]–[67].

Gallager originally proposed both hard-decision (“bit-flipping”) and soft-decision decoding algorithms for LDPC codes [3], [4]. Though work continues on the simpler hard-decision approach (e.g., [68]), LDPC decoders typically pass soft messages. Information is exchanged between neighboring nodes in the graph by passing messages along the edges, and computations are performed at the nodes according to update rules based on the parity equations. BP decoding usually delivers good performance and is often used as a reference. A well-known instance of BP is the sum–product algorithm first proposed by Gallager [4], [6], [55], [57], [69], which may also be realized in the log domain.

As in the case of turbo codes, numerous simplified alternatives to BP have been proposed for decoding LDPC codes. The min-sum algorithm [6], [7], [49], more recently referred to as the uniformly most powerful (UMP) BP-based algorithm [70], operates in the log domain and considers only the dominant contribution when updating messages at a check node. The outgoing soft messages are composed of the smallest magnitude of the incoming messages and the product of all of the signs of the incoming messages. Performance with this simplified technique may be improved by scaling back the extrinsic information, resulting in the corrected min-sum algorithm [71]–[73]. Other approximations to BP decoding for LDPC codes include the λ-min algorithm [74], the A-min (approximate-min) algorithm [75], and degree-matched check-node decoding [76].

Note that turbo codes and TPCs can also be interpreted as codes on graphs with cycles. In particular, the sum–product and min-sum algorithms correspond to the BCJR and max-log-APP algorithms, respectively [15]. Trellis representations of LDPC codes are also possible (e.g., [77]).
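The check-node update rules described above can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not a reference implementation: the function names are invented for this sketch, the log-domain sum–product rule is written in its common textbook "tanh" form, and the scale factor of 0.75 used for the corrected (scaled) min-sum variant is an arbitrary example value rather than one taken from [71]–[73]. In each case the message returned on an edge is computed from the incoming messages on all the other edges of that check.

```python
import numpy as np


def check_update_sum_product(v2c):
    """Log-domain sum-product (tanh rule) update for one check node.

    v2c holds the incoming LLRs on the edges of the check; element j of the
    result is the extrinsic LLR returned on edge j, built from all other edges.
    """
    v2c = np.asarray(v2c, dtype=float)
    t = np.tanh(v2c / 2.0)
    out = np.empty_like(v2c)
    for j in range(len(v2c)):
        prod = np.prod(np.delete(t, j))
        # Clip so that arctanh stays finite when the other inputs are very reliable.
        out[j] = 2.0 * np.arctanh(np.clip(prod, -0.999999, 0.999999))
    return out


def check_update_min_sum(v2c, scale=1.0):
    """Min-sum update: product of the other signs times the smallest other magnitude.

    scale < 1 scales back the extrinsic information (assumed form of the
    corrected min-sum variant).
    """
    v2c = np.asarray(v2c, dtype=float)
    out = np.empty_like(v2c)
    for j in range(len(v2c)):
        others = np.delete(v2c, j)
        out[j] = scale * np.prod(np.sign(others)) * np.min(np.abs(others))
    return out


incoming = [-0.5, 2.0, 2.0, 2.0]      # one unreliable, negative input
print("sum-product:    ", check_update_sum_product(incoming))
print("min-sum:        ", check_update_min_sum(incoming))
print("scaled min-sum: ", check_update_min_sum(incoming, scale=0.75))
```

The min-sum magnitudes are never smaller than the sum–product magnitudes for the same inputs, which is why scaling back the min-sum output tends to bring it closer to the sum–product result.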
Again, each constituent decoder operates on a combination of intrinsic information and extrinsic information from other constituent codes. For example, for the code of Fig. 11, an updated message from c1 to v2 contains contributions from v4, v5, and v7, but not the previous message from v2 itself. Different ways of scheduling the variable and check node updates exist and can have a significant impact on the convergence of the decoding process. The default approach used in classical BP is flood scheduling, where all of the variable nodes are updated in parallel followed by all of the check nodes. Faster convergence is achieved with shuffle


scheduling [78], where updated extrinsic information is propagated to the other nodes in the graph as it becomes available. In the nominal case, this involves processing the parity checks serially, i.e., applying the first equation, propagating the updated messages, applying the second equation, and so on. Since parallel decoding is desirable in practice, a hybrid approach where sub-blocks of the code are updated in parallel is preferred (“group shuffle” scheduling). Other scheduling algorithms that improve on flood scheduling are explored in [79].

A variety of LDPC codes that feature efficient encoding, good performance, and efficient decoding have been proposed in recent years. Table 1 lists a selection of them. Note that Hyper codes and Skew codes are only LDPC codes if they use SPCs as component codes. Fig. 12 shows the BER and WER performance of the 802.16e LDPC code for two different block sizes. Finally, note that there has been work on constructing codes with sparse parity-check matrices based on component codes other than SPCs. This includes LDPC convolutional codes [87]–[89] and generalized LDPC codes [90], [91].
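The difference between flood scheduling and the serial form of shuffle scheduling discussed earlier in this subsection can be sketched as follows. This is a hedged illustration only: the decoder uses the min-sum check rule for brevity, the function names and the toy matrix `H` are assumptions made for this sketch (the matrix is a hypothetical (2, 4) regular example, not the matrix of Fig. 11 and not a good code), and the syndrome test at the end of each iteration is one simple possible stopping rule. Positive LLRs are taken to favor bit 0.

```python
import numpy as np


def min_sum(v2c):
    """Min-sum check rule: extrinsic output on each edge from all other edges."""
    out = np.empty_like(v2c)
    for j in range(len(v2c)):
        others = np.delete(v2c, j)
        out[j] = np.prod(np.sign(others)) * np.min(np.abs(others))
    return out


def decode(H, llr, schedule="flood", max_iters=20):
    """Min-sum decoding; returns (hard decisions, iterations used)."""
    H = np.asarray(H)
    llr = np.asarray(llr, dtype=float)
    checks = [np.flatnonzero(row) for row in H]
    c2v = np.zeros(H.shape)          # check-to-variable message on every edge
    post = llr.copy()                # running a posteriori LLR of every bit
    hard = (post < 0).astype(int)
    for it in range(1, max_iters + 1):
        if schedule == "flood":
            # All checks work from the messages of the previous iteration.
            new = np.zeros(H.shape)
            for i, idx in enumerate(checks):
                new[i, idx] = min_sum(post[idx] - c2v[i, idx])
            c2v = new
            post = llr + c2v.sum(axis=0)
        else:
            # Serial processing: each check sees the updates made by earlier checks.
            for i, idx in enumerate(checks):
                v2c = post[idx] - c2v[i, idx]     # remove this check's old message
                c2v[i, idx] = min_sum(v2c)
                post[idx] = v2c + c2v[i, idx]
        hard = (post < 0).astype(int)
        if not (H @ hard % 2).any():              # all parity checks satisfied
            return hard, it
    return hard, max_iters


# Hypothetical toy matrix and channel LLRs with one weakly wrong bit (index 1).
H = np.array([[0, 1, 0, 1, 1, 0, 1, 0],
              [1, 0, 1, 0, 0, 1, 0, 1],
              [1, 1, 1, 1, 0, 0, 0, 0],
              [0, 0, 0, 0, 1, 1, 1, 1]])
llr = np.full(8, 2.0)
llr[1] = -0.5
for name in ("flood", "shuffle"):
    bits, iters = decode(H, llr, schedule=name)
    print(name, bits.tolist(), "iterations:", iters)
```

Only one set of check-to-variable messages is kept in the serial branch, whereas the flood branch builds a new set from the old one in every iteration.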

Fig. 12. BER and WER performance of two IEEE 802.16e LDPC codes versus that of a zero-flushed K = 7 convolutional code. AWGN channel, binary antipodal signalling (i.e., BPSK), n = 576 and 2304, 100 packet errors per point, r = 1/2. The LDPC codes used 50 iterations, BP decoding, and flood scheduling while the convolutional code used soft-input Viterbi decoding.


D. “Turbo-Like” Codes

Some forms of turbo FEC do not fall neatly into any of the previous three categories and are referred to as “turbo-like” codes. Hybrids of turbo codes and LDPC codes fall into this category [92]. Much attention has also been devoted to serially concatenated codes [93], where the data and parity bits produced by the first encoder (the outer code) are interleaved and encoded by the second encoder (the inner code). Convolutional codes are often used as the constituent codes, resulting in serially concatenated convolutional codes (SCCCs) [94], [95]. SCCCs are sometimes also referred to as CTCs or TCCs. The outputs of the constituent codes are punctured to achieve the desired overall code rate and typically only parity bits are punctured. Decoding is performed in a manner similar to the parallel concatenated case, iteratively applying SISO decoders for each constituent code and exchanging extrinsic information between them. The same decoding algorithms can be used. SCCCs may achieve lower error flares than their parallel counterparts in some cases, though this may come at the cost of degraded waterfall performance.

An SCCC scheme was investigated in the Modem for Higher Order Modulation Schemes (MHOMS) project [96]. Constructed from 4-state constituent convolutional codes, it yields significantly lower complexity than the LDPC codes and PCCCs that were investigated. Finally, note that other code types or even modulation schemes can be employed in a serially concatenated structure. For example, schemes involving convolutional codes concatenated with continuous phase modulation (CPM) [97] and convolutional codes with 16-ary quadrature amplitude modulation (16-QAM) [98] have been investigated. Another notable example is the serial concatenation of an inner convolutional code, a set of SPCs that control the code rate, and an outer convolutional code, known as Flexicodes [83], [99].
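The outer-code, interleaver, inner-code chain of a serially concatenated scheme can be sketched with the simplest possible constituent codes: a repetition outer code and a rate-1 accumulator as the inner code, which is the repeat-accumulate structure. The function name, the repetition factor, and the randomly drawn permutation below are assumptions made for this illustration; practical designs choose the interleaver carefully and may transmit the data bits as well.

```python
import numpy as np


def ra_encode(data, repeats, perm):
    """Nonsystematic repeat-accumulate encoding (illustrative sketch)."""
    data = np.asarray(data, dtype=np.uint8)
    repeated = np.repeat(data, repeats)        # outer code: rate-1/repeats repetition
    interleaved = repeated[perm]               # interleaver between the two encoders
    # Inner code: rate-1 accumulator, i.e. running XOR of the interleaved bits.
    return (np.cumsum(interleaved) % 2).astype(np.uint8)


data = np.array([1, 0, 1], dtype=np.uint8)
rng = np.random.default_rng(0)                 # hypothetical random interleaver
perm = rng.permutation(len(data) * 3)
print("codeword:", ra_encode(data, repeats=3, perm=perm).tolist())
```

The overall rate is 1/repeats, since the accumulator adds no further redundancy.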
As well, repeat-accumulate (RA) codes [85] may be viewed as the serial concatenation of a repetition code and a rate-1 accumulator.

E. Observations on Code Structure and Error Rate Performance

The best possible performance of a turbo or turbo-like code is that achieved with an ML decoder, i.e., a decoder that compares the received data sequence to all possible codewords and chooses the codeword with the lowest distance from the received data. Though too complex to realize in practice, the performance of such a decoder is a useful means of evaluating code performance. An upper bound (the union bound [5]) on ML performance may be obtained from the distance spectrum of the code [100]. Recall that each codeword is separated from the others in a given code by some distance, i.e., by some number of differing bits. The distance spectrum is the distribution of these distances (number of codewords versus distance between codewords), with d_min the smallest. The distance spectrum is a function of the constituent codes and the

interconnections between bits of the code. For turbo codes these interconnections are largely determined by the interleaver but are also affected by trellis termination [101]–[103] and puncturing. For LDPC codes the interconnections are determined by the numbers and patterns of connections among the nodes.

The performance achieved with a turbo or turbo-like code in the flare region typically parallels the performance predicted by the distance spectrum but is degraded due to the use of suboptimal iterative decoding. The severity of the flare and where it occurs are a function of both the distance spectrum of the code and the decoding process. In practice, even this suboptimal performance may be masked, for example by a poor early-stopping rule [104].

Performance in the waterfall region is dominated by the convergence properties of the code and the decoder. Convergence and the best possible waterfall performance are strongly affected by both the interleaver/graph connections and the constituent codes. Useful analysis of convergence performance is often done assuming very large blocks and ideal interleavers, for example using EXIT charts [105] or density evolution [59], [60], [106]. Turbo and turbo-like codes may be optimized for either flare or waterfall performance or be designed to yield a compromise between the two.

A common misconception regarding turbo and turbo-like codes is that they do not produce good codes at smaller block lengths (e.g., hundreds of information bits). This misconception appears to have arisen for at least two reasons. First, it is often assumed that the fundamental capacity limits are the same for all codes. In fact, there is a distinct performance limit for each combination of block size and code rate, and smaller blocks are fundamentally unable to achieve the same ultimate performance as larger blocks.
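The link between the distance spectrum and the union bound discussed in this subsection can be made concrete with a toy code. The sketch below is an assumption-laden illustration: it enumerates the (7, 4) Hamming code, chosen only because all of its codewords can be listed (this is not feasible for a practical turbo or LDPC code), and it uses the standard textbook form of the union bound on word error rate for ML soft-decision decoding of BPSK on the AWGN channel, WER ≤ Σ_d A_d Q(√(2 d r E_b/N_0)). The function names are invented for this sketch.

```python
import itertools
import math
from collections import Counter

import numpy as np

# Systematic generator matrix of the (7, 4) Hamming code (toy example).
G = np.array([[1, 0, 0, 0, 1, 1, 0],
              [0, 1, 0, 0, 1, 0, 1],
              [0, 0, 1, 0, 0, 1, 1],
              [0, 0, 0, 1, 1, 1, 1]])


def distance_spectrum(G):
    """Map each nonzero distance d to the number of codewords A_d at that distance.

    For a linear code the distances seen from any codeword equal the weights of
    the nonzero codewords, so enumerating weights is sufficient.
    """
    k = G.shape[0]
    counts = Counter()
    for msg in itertools.product([0, 1], repeat=k):
        weight = int((np.array(msg) @ G % 2).sum())
        if weight:
            counts[weight] += 1
    return dict(sorted(counts.items()))


def q_func(x):
    """Gaussian tail probability Q(x)."""
    return 0.5 * math.erfc(x / math.sqrt(2.0))


def union_bound_wer(spectrum, rate, ebn0_db):
    """Union bound on ML word error rate (BPSK, AWGN, soft decisions)."""
    ebn0 = 10.0 ** (ebn0_db / 10.0)
    return sum(a * q_func(math.sqrt(2.0 * d * rate * ebn0))
               for d, a in spectrum.items())


spectrum = distance_spectrum(G)
print("distance spectrum:", spectrum, "| d_min =", min(spectrum))
for snr_db in (2, 4, 6):
    print(f"Eb/N0 = {snr_db} dB: WER bound = {union_bound_wer(spectrum, 4 / 7, snr_db):.3e}")
```

At high SNR the sum is dominated by the d_min term, which is why the minimum distance and its multiplicity largely set the level of the flare.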
Second, the central importance of interleaving/graph connections to good performance was not clearly understood when turbo FEC was first introduced. In particular, it was not understood that a random interleaver is a poor choice for the permutation for small block sizes. With a properly designed interleaver, turbo FEC may be effective for even very small block sizes (e.g., 128 information bits [102]). Discussions of the performance of turbo codes relative to capacity limits as a function of block length may be found in [107] and [108].

The importance of the “mixing” of bits to good performance in turbo FEC is illustrated by the measured minimum distance results for numerous unpunctured turbo codes shown in Fig. 13. Most results were computed using Garello’s true distance measurement method [109], while the highest distances were estimated with Crozier’s iterative impulse methods [110], [111]. This is an expanded version of a plot from [112]. Results are shown for the wideband code-division multiple-access (W-CDMA), DVB-RCS, and Consultative Committee for Space Data Systems (CCSDS) turbo codes (dashed lines) [109], [113], where the CCSDS point corresponds to an interleaver length of


Fig. 13. Measured minimum distance results for unpunctured (r = 1/3) turbo codes with 4-, 8-, and 16-state constituent codes, including the W-CDMA (8-state, single-binary), DVB-RCS (8-state, double-binary), and CCSDS (16-state, single-binary) codes.

3568 bits. For comparison, results are also given for a range of 4-, 8-, and 16-state turbo codes using random and DRP interleavers [34], [111], [112]. All of the codes are single binary except for the DVB-RCS code. The “Average Random” points were obtained by averaging the distances achieved by many random interleavers, while the “Best Random” results represent the best distances found by testing 10 000 random interleavers for each block length (5000 for 16-state and a block size of 1280) [109], [114]. These results clearly show the importance of the interconnections between bits implied by the interleaver. Choosing an interleaver at random is not a particularly attractive strategy for turbo codes, either in terms of average performance or the ease with which a good permutation may be found. Even codes that are designed well otherwise can be greatly improved if careful attention is paid to the interleaver design.

III. PRACTICAL ISSUES FOR TURBO FEC

The previous performance plots show examples of the large coding gains that are possible with turbo and turbo-like codes and explain the intense interest in them. However, achieving these gains in practice presents numerous challenges and costs. This section discusses these practical design issues and ways to resolve them.

A. Error Rate Performance and Power Savings

Compared to codes that found wide application in the past (e.g., K = 7 convolutional codes), turbo and turbo-like codes typically achieve much lower error rates for a fixed level of transmitted power or, alternatively, a given


error rate is maintained for a much lower level of transmitted power. Two example scenarios help to illustrate the potential benefits of these improvements.

First, assume that it is desired to maintain the same signal constellation, bandwidth, and target WER as an existing system. Better coding allows the same error rate to be achieved at a lower E_b/N_0, i.e., at a lower transmitted power per information bit. If this reduction is large enough, it may be translated into equipment cost reductions in the form of smaller antennas, cheaper electronics, or smaller batteries.

Alternatively, assume that the target WER, signal bandwidth, and transmitted power per information bit are held constant but the density of the signal constellation is increased, for example from quadrature phase shift keying (QPSK) to 8-ary phase shift keying (8-PSK). Throughput in bits per second increases due to the higher number of bits per symbol, but so does the error rate. More powerful channel codes can compensate for this degradation in error rate without a reduction in code rate. The result is a system that is able to convey more information per unit time using the same bandwidth and still achieve a desired level of performance. The latter approach has been adopted by many broadband satellite systems (Section V), where bandwidth is more constrained than in terrestrial systems.

However, if an application must operate at very low WERs, the value of some forms of turbo FEC may be reduced due to the presence of error flares. If ARQ is used, this does not present a problem; as long as the flare occurs below the target WER, retransmissions will eventually clean up any residual errors. But for broadcast video, for example, where a return channel and ARQ may not be available, the target error rate must be quite low (e.g., WER = 10^-6 or 10^-7). If the flare cannot be driven below this target at a low enough SNR, it may not make sense to use turbo FEC, at least not by itself.
Aside from retransmissions, there are numerous ways to lower the flare and/or increase its slope, including the following.

1) Concatenate the turbo or turbo-like code with a high-rate outer code on some or all of the transmitted bits, for example a BCH code [40], [81], [115]–[117]. This incurs penalties in energy per coded bit and code rate that grow as the block length decreases. This is a very attractive solution for large block sizes where such penalties are minimal.
2) Optimize the design of the turbo or turbo-like code to improve performance in the flare region, where performance tends to be dominated by the distance properties of the code. This involves optimizing interleaving or bit node connections, constituent codes, trellis termination, and/or puncturing.
3) Modify the decoding process.

An example of the latter approach is to force small numbers of bits to particular values and repeat the iterative decoding process multiple times. The forced bits are those most likely to be in error. Some performance improvements are possible if the constraints of the turbo or turbo-like code itself are used to check the results of the additional decoding operations [118], [119], but larger gains are obtained if a weak outer code is used for error detection [e.g., a cyclic redundancy check (CRC)] [120]. The preferred choice for flare reduction, if any, will vary widely depending on circumstances.

B. Computational Complexity

Computational complexity has a strong impact on throughput and power consumption. Processing requirements also affect system costs by affecting the choice of computation engine.

Assessing the computational complexity of turbo and turbo-like codes depends upon the basis of comparison. Compared with K = 7 convolutional codes, decoding turbo and turbo-like codes typically involves an increase in computation. Since these increases are not extremely large and result in substantially improved performance, the cost is deemed to be acceptable in many cases. However, a very different impression emerges if one imagines block or convolutional codes that could deliver the same performance as turbo FEC. Compared to such codes, the decoding complexity of turbo and turbo-like codes is quite low. In fact, block or convolutional codes that performed so well could not be seriously considered in practice. From this perspective, turbo and turbo-like codes may be recognized as a true breakthrough, not in performance, but in complexity.

Major factors affecting computational complexity for turbo and turbo-like codes are as follows.

1) Number and complexity of constituent codes: Solutions vary, from small numbers of relatively powerful component codes (e.g., turbo codes) to large numbers of relatively weak component codes (e.g., LDPC codes).
2) Decoding algorithm: More complex constituent decoders increase the computational requirements of the overall decoder. To achieve improved performance, it may be necessary to use a more complex constituent decoder. As noted, simplified decoders that scale back the extrinsic information are an attractive choice.
3) Number of decoding iterations: Decoding each constituent code a larger number of times implies a greater computational load.
4) Early stopping (convergence detection): Some form of early stopping may be used to reduce the computational load required by the iterative decoding process [104], [121], [122]. These methods may be used to reduce the average number of iterations (at the cost of increased buffering) or reduce average power consumption.
5) Encoding complexity: Encoding for turbo codes and TPCs is typically very efficient. LDPC codes that are used in practice are similarly easy to encode (e.g., 802.16e).
6) Memory access: This refers to how well the code structure lends itself to the efficient retrieval of data from memory. A regular structure that allows memory reads and writes to be done efficiently in parallel is preferable. However, one of the defining features of turbo FEC is the interconnectedness of the code structure, implying a tradeoff between structure and performance.

There is no longer any doubt that turbo and turbo-like codes can be decoded at high speeds, both in software and hardware. A hybrid approach is to implement key decoder functions in hardware coprocessors (e.g., [126] and [127]). To illustrate the kinds of speeds that are possible, Table 2 shows several hardware implementations of turbo FEC decoders, either on application-specific integrated circuits (ASICs) or field-programmable gate arrays (FPGAs), and their throughputs in megabits per second. Other hardware LDPC decoders are listed in [89]. Table 3 shows several examples of efficient software implementations of turbo or turbo-like codes on generic microprocessors and digital signal processing (DSP) processors along with their

Table 2 Examples of Commercially Available Hardware Decoders for Turbo and Turbo-Like Codes With Associated Throughputs

Table 3 Examples of Commercially Available Software Codecs for Turbo and Turbo-Like Codes With Associated Throughputs


throughputs in kilobits per second. It is evident that software written on generic processors is not able to achieve the same speeds as hardware, due both to slower clock speeds and the greater customization that is possible in hardware. Speeds will only improve as processing technology improves and further refinements in decoding technology are developed.

C. Parallelism

High-speed hardware implementations of turbo and turbo-like codes are achieved with parallel processing, i.e., by decoding distinct portions of the code structure in parallel. Parallelism tends to degrade the distance spectrum by restricting the code structure, resulting in degraded flare performance. However, this effect is typically not substantial unless high degrees of parallelism are used. Furthermore, some cost in terms of distance properties is typically acceptable in order to increase parallelism and therefore reduce latency. Moreover, degraded distance may not be important if another means of improving flare performance is available (e.g., a BCH outer code) or the desired WER is above the flare.

By their very nature, TPCs have a regular structure with many equations that are mutually independent. They therefore lend themselves naturally to parallel processing.

LDPC codes can also achieve high degrees of parallelism. This is true even for irregular LDPC codes that achieve the best performance. A notable example is the class of block structured or quasi-cyclic LDPC codes, where the parity-check matrix is structured in such a way that many equations may be updated simultaneously (e.g., [64] and [77]). Specifically, the parity-check matrix is composed of cyclically shifted identity matrices. This kind of structure is present in the LDPC codes specified by the Digital Video Broadcasting via Satellite, 2nd Generation (DVB-S2) and IEEE 802.16e standards (Section IV).

Turbo codes can achieve high degrees of parallelism by breaking the block into windows or subblocks [128]–[130]. However, this presents some challenges. First, a fully parallel implementation requires larger memory resources (see Section III-D). A combination of parallel and serial processing of subblocks may be used to reduce this. Second, subdividing the turbo interleaver to allow parallel processing may cause a degradation in the distance spectrum in the general case.
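As an aside on the quasi-cyclic LDPC structure mentioned above, the following Python sketch expands a small table of cyclic shifts into a parity-check matrix built from shifted identity blocks. Everything specific here is an assumption made for illustration: the function name, the use of -1 to mark an all-zero block, the right-shift convention, and the 2 × 4 table of shifts, which is not taken from DVB-S2, IEEE 802.16e, or any other standard.

```python
import numpy as np


def expand_qc(base_shifts, z):
    """Expand a table of shifts into a quasi-cyclic parity-check matrix.

    Each entry s >= 0 becomes a z-by-z identity matrix cyclically shifted right
    by s columns; a negative entry becomes a z-by-z all-zero block.
    """
    identity = np.eye(z, dtype=np.uint8)
    zero = np.zeros((z, z), dtype=np.uint8)
    block_rows = []
    for shifts in base_shifts:
        blocks = [zero if s < 0 else np.roll(identity, s, axis=1) for s in shifts]
        block_rows.append(np.hstack(blocks))
    return np.vstack(block_rows)


base = [[0, 1, -1, 2],      # hypothetical shift table (2 block rows, 4 block columns)
        [2, -1, 0, 1]]
H = expand_qc(base, z=4)
print(H)
# Within one block row, the z parity checks involve disjoint sets of bits,
# so they can be updated at the same time without memory conflicts.
print("max bit reuse inside block row 0:", int(H[:4].sum(axis=0).max()))
```

Only the shift table needs to be stored to describe the code, which also relates to the memory considerations discussed in Section III-D.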
This does not present a significant problem, however, since interleavers that are already structured are unlikely to suffer much loss. Examples of interleavers that may be subdivided include those in [35], [112], [131], and [132].

D. Memory Requirements

Memory requirements for turbo or turbo-like codes depend on the block size, the numerical precision (i.e., the number of bits) used to represent information in the decoder, and what information must be stored during the decoding process.


A turbo decoder requires significant amounts of memory for the internal state metrics, the received samples, the extrinsic information, and the interleaver indices. The state metrics are values produced by a trellis-based decoder that measure the probability that a given possible state at a given point in the decoding trellis matches the true encoder state [5]. An APP decoder must store these metrics for the entire set of data bits. Thus, the number of memory words required by a turbo decoder for the state metrics is equal to the number of states in the constituent decoder multiplied by the number of data bits and typically dominates memory requirements in this case. If the decoder uses subblock processing and the subblocks are processed serially, the memory requirements for the state metrics are dramatically reduced. However, to achieve high speeds/low latencies, hardware implementations may process subblocks in parallel. A fully parallel implementation therefore requires the full amount of state metric memory. A hybrid serial/parallel implementation may be used to reduce these memory requirements.

The memory requirements are slightly different for a double-binary turbo code. Since a double-binary decoder processes bit pairs, the number of trellis sections in the decoder is half that for a single-binary decoder, assuming the same number of data bits. This reduces the state metric memory requirements by half. On the other hand, for each pair of bits a double-binary decoder must store three extrinsic LLRs while a single-binary decoder must store only two.

Many systems require multiple block sizes and/or code rates. Each block size requires a separate interleaver and different interleavers may also be used for different code rates. If a full set of interleaver indices is stored in each case, the memory requirements can be prohibitive.
This has led to much research into structured interleavers for turbo codes that can be represented by a small number of parameters and/or generated at run time. Notable examples of algorithmic and/or low-memory interleaver designs are those specified by several standards discussed in the following [133]–[136], almost regular permutation (ARP) interleavers [35], and DRP interleavers [34], [112].

To illustrate, consider the decoding of an 8-state, single-binary turbo code with k = 8192 data bits encoded at r = 1/3 (i.e., there are n = 24 576 inputs to the decoder) and assume that a low-memory interleaver is being used. If the decoder inputs (data and parity) are stored as 4-bit values, they require 12 288 bytes of memory. The extrinsics must be stored in larger words since they will grow during the decoding process, but only one extrinsic per data bit must be stored (not two). Note that no extrinsics are generated for the parity bits. With 8-bit resolution, these buffers then require 8192 bytes of memory. For fully parallel decoding, the state metrics require at least 8192 × 8 states × 8 bits per metric = 65 536 bytes of memory, possibly more if greater resolution


is required. If the code rate is increased to r = 4/5, the state metric, extrinsic, and interleaver memory requirements are unchanged, while the decoder inputs require only 5120 bytes. The overall memory requirements are 86 016 bytes at r = 1/3 and 78 848 bytes at r = 4/5. If subblocks of 256 data bits were processed serially, the state metric requirements would be reduced by a factor of 32, yielding totals of 22 528 bytes at r = 1/3 and 15 360 bytes at r = 4/5. Totals for hybrid serial/parallel implementations would fall between these extremes.

To more efficiently compute the messages that are sent along the edges of the code graph, practical LDPC decoders may represent each parity check with a two-state trellis (e.g., [77]). Assuming this, the nominal case for LDPC codes involves log-domain sum–product decoding where one state metric is stored per bit per equation being processed (the other can always be normalized out). Relating to the code itself, interleaver indices are not required, but a similar problem exists due to the need for a code definition table at both the encoder and decoder that specifies which bits are involved in which equations. In practice, many constructions feature a structure that removes the need to store a large table. Memory is also required for the received samples and the extrinsic information. Of these, the extrinsic memory requirements dominate in this case. This is because these codes update the soft estimates for both the data and parity bits, the bits are involved in multiple equations, and the extrinsics typically require larger words to account for their growth.

Less memory is required if some form of min-sum decoding is used. In the latter case, only the minimum extrinsic magnitude and second minimum extrinsic magnitude must be stored as soft values, while the signs of the other extrinsics can be packed together in the same memory word [71], [137]. The state metrics are eliminated.
This advantage of min-sum decoding becomes more significant as the equations grow in length. Finally, note that flood scheduling requires the decoder to maintain two distinct sets of extrinsic information (old and new), while shuffle decoding only requires one set.

To illustrate, consider the decoding of an irregular LDPC code with k = 8192 data bits at r = 1/3 (i.e., n = 24 576) with average variable-node degree d_v = 3 and average check-node degree d_c = 4.5 that uses log-domain sum–product decoding and shuffle scheduling and does not require a large code definition table. The decoder requires 12 288 bytes for the decoder inputs at 4 bits per value. If the decoder is implemented with a degree of parallelism q = 128, the state metrics require q × d_c,max × 8 bits per metric = 128 × 5 × 8 bits per metric = 640 bytes. The extrinsic information requires 24 576 × d_v 8-bit values = 73 728 bytes. If the code rate is increased to r = 4/5 with d_v = 3 and d_c = 15, the extrinsics require 30 720 bytes, the decoder inputs require 5120 bytes, and the state metrics require 1920 bytes. The overall memory requirements are 86 656 bytes at r = 1/3 and 37 760 bytes at r = 4/5.

These examples show that the memory requirements of turbo codes versus those of LDPC codes may look more or less attractive at different code rates. An LDPC code needs more and more memory for its extrinsics as the code rate decreases, while for a turbo code the memory required for the state metrics and extrinsics is constant across all code rates. Thus, in terms of memory requirements, turbo codes are relatively attractive at low code rates while LDPC codes are relatively attractive at high code rates.
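The byte counts in the two worked examples of this subsection can be reproduced with a short calculation. The sketch below is only a bookkeeping aid under the stated word sizes (4-bit inputs, 8-bit extrinsics, 8-bit state metrics); the function and parameter names are invented here, interleaver or code-table storage is ignored as in the text, and the maximum check-node degree of 5 assumed for the r = 1/3 LDPC code is inferred from the quoted 640 bytes at q = 128.

```python
from fractions import Fraction


def turbo_memory_bytes(k, rate, states=8, window=None,
                       input_bits=4, extrinsic_bits=8, metric_bits=8):
    """Decoder memory for a single-binary turbo code with a low-memory interleaver.

    window=None models fully parallel decoding (metrics for all k trellis sections);
    window=w models serial processing of subblocks of w data bits.
    """
    n = int(k / Fraction(rate))                    # number of decoder inputs
    inputs = n * input_bits // 8
    extrinsics = k * extrinsic_bits // 8           # one extrinsic per data bit
    sections = k if window is None else window
    metrics = sections * states * metric_bits // 8
    return {"inputs": inputs, "extrinsics": extrinsics,
            "state_metrics": metrics, "total": inputs + extrinsics + metrics}


def ldpc_memory_bytes(k, rate, dv_avg, dc_max, parallelism=128,
                      input_bits=4, extrinsic_bits=8, metric_bits=8):
    """Decoder memory for sum-product decoding with shuffle scheduling."""
    n = int(k / Fraction(rate))
    inputs = n * input_bits // 8
    extrinsics = int(n * dv_avg * extrinsic_bits) // 8   # one extrinsic per edge
    metrics = parallelism * dc_max * metric_bits // 8
    return {"inputs": inputs, "extrinsics": extrinsics,
            "state_metrics": metrics, "total": inputs + extrinsics + metrics}


for r in (Fraction(1, 3), Fraction(4, 5)):
    print("turbo, fully parallel, r =", r, turbo_memory_bytes(8192, r))
    print("turbo, serial 256-bit subblocks, r =", r, turbo_memory_bytes(8192, r, window=256))
print("LDPC, r = 1/3", ldpc_memory_bytes(8192, Fraction(1, 3), dv_avg=3, dc_max=5))
print("LDPC, r = 4/5", ldpc_memory_bytes(8192, Fraction(4, 5), dv_avg=3, dc_max=15))
```

Changing the rate alters only the input buffer for the turbo decoder, whereas for the LDPC decoder it also changes the number of edges and hence the extrinsic memory.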

E. Latency

Latency is the difference between the time that a data bit is received and the time that the corresponding bit estimate is produced by the decoder. As with computational complexity, assessing latency depends upon the basis for comparison.

On one hand, the latency of turbo FEC is typically larger than that for K = 7 or K = 9 convolutional codes. The latency of a Viterbi decoder is determined by the history depth [5], typically a small multiple of the constraint length. The decoder will begin producing final data estimates after one history depth of inputs has been processed. Given the short constraint lengths of these example codes, the required history depth is likely to be much smaller than the data block. For example, a K = 7, rate 1/2 convolutional code typically requires a history depth on the order of ten constraint lengths or 70 bits, especially with puncturing. A turbo decoder for the same block length must receive all of the data estimates for the block before decoding can begin and must then perform some number of iterations. Thus, the latency of the turbo decoder is higher in this case.

On the other hand, if one compares a turbo or turbo-like code with a convolutional code that delivers the same performance, the latency increase with turbo FEC will be small. That is, there is an inherent correspondence between code performance and latency. The latency issue is one of the main reasons why parallel decoding is of interest.

Any latency advantage for a convolutional code is far smaller if it is used as part of a concatenated code (e.g., with Reed–Solomon codes), since decoding of the outer code cannot begin until decoding of the inner code is complete for the entire block of data. Many systems also include an additional interleaver to combat time variations in the channel. If such an interleaver is at least as long as the code block length, it will significantly affect the receiver latency.
In practice, the effective decoding latency may be determined by available memory resources. That is, the decoder may be forced to produce decisions when the receiver cannot buffer any more data blocks, regardless of the state of the decoding process.

F. Flexibility

The relative ease with which a designer can both select different block sizes and code rates and achieve good performance with these block sizes and code rates is a significant issue. Different approaches to this design

1241

Gracie and Hamon: Turbo and Turbo-Like Codes: Principles and Applications in Telecommunications

problem are typically used for turbo codes on the one hand and TPCs and LDPC codes on the other. Turbo codes achieve different code rates with puncturing. This approach allows many different code rates to be obtained easily from the same encoder and allows the use of a common decoder. At the same time, it has been found that convolutional codes such as those used in turbo codes are fairly robust to puncturing, maintaining good performance across many code rates. Nevertheless, challenges remain. Not all puncturing patterns deliver the same performance for a given code rate, so offline searches must be performed. The choice of puncturing pattern and interleaver are also intertwined, complicating code design. How best to deal with these design challenges is still an active area of research (e.g., [44] and [138]). TPCs and LDPC codes are typically designed for a particular combination of block size and code rate. Distinct codes at different rates may still be decoded by a generic decoderVthe challenge in this case is to design an array of separate codes with good performance. Puncturing and shortening are still used to match these codes to arbitrary block sizes.
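The puncturing approach described above for turbo codes can be sketched in a few lines of Python. The sketch assumes a rate-1/3 mother code that produces one systematic stream and two parity streams, ignores termination bits, and uses simple periodic keep/drop patterns. The function names and the example patterns are illustrative assumptions and are not taken from any standard.

```python
def puncture(systematic, parity1, parity2, keep1, keep2):
    """Puncture a rate-1/3 turbo codeword with periodic parity patterns.

    All systematic bits are kept; parity bit i of each encoder is kept
    only where the corresponding pattern entry is 1.
    """
    out = []
    for i, (u, p1, p2) in enumerate(zip(systematic, parity1, parity2)):
        out.append(u)
        if keep1[i % len(keep1)]:
            out.append(p1)
        if keep2[i % len(keep2)]:
            out.append(p2)
    return out


def depuncture(received, k, keep1, keep2, erasure=0.0):
    """Rebuild the three rate-1/3 streams for a common decoder.

    Punctured positions are filled with a neutral value (an LLR of zero),
    so the same decoder can serve every code rate.
    """
    it = iter(received)
    sys_, par1, par2 = [], [], []
    for i in range(k):
        sys_.append(next(it))
        par1.append(next(it) if keep1[i % len(keep1)] else erasure)
        par2.append(next(it) if keep2[i % len(keep2)] else erasure)
    return sys_, par1, par2


if __name__ == "__main__":
    u = [1, 0, 1, 1, 0, 0, 1, 0]
    p1 = [1] * 8
    p2 = [0] * 8
    half = puncture(u, p1, p2, (1, 0), (0, 1))              # alternate parities
    two_thirds = puncture(u, p1, p2, (1, 0, 0, 0), (0, 0, 1, 0))
    print(len(u) / len(half), len(u) / len(two_thirds))
```

Changing the two pattern tuples is all that is needed to change the code rate, which is why the encoder and decoder can stay fixed. Which patterns perform well at a given rate is the offline search problem mentioned in the text.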

G. Hardware (VLSI) Complexity and Power Dissipation

Ultimately, any codec that is used in significant volume will be implemented in very large-scale integration (VLSI) hardware to increase speeds and lower unit costs. Thus, turbo FEC schemes that lend themselves to hardware realizations are more attractive. One crucial issue that has already been addressed is parallelism. Another is the complexity of the constituent decoders and their parts, often expressed in terms of how much surface area they require on a chip. Adders are smaller and simpler than multipliers, favoring a log-domain implementation for the decoder. The component decoders for LDPC codes are very simple, and many of them will fit in the space required by a more complex SISO decoder. The impact of such considerations can be substantial. A notable example is the DVB-S2 standardization process, which made minimizing chip surface area a key design constraint. Candidate decoders were constrained to use no more than 14 mm$^2$ of silicon on an ASIC using 0.13 $\mu$m technology [81].

Another important issue is the power required by a VLSI implementation. Power consumption is affected by the computational complexity of the decoding algorithms as well as the iterative nature of the decoding process and is a particularly important issue with high-throughput applications and battery-powered devices. Efficient architectures and code designs targeting this issue have been investigated in an effort to achieve low power dissipation and high data rates without sacrificing performance (e.g., [89], [139], and [140]). Specific examples include: 1) the LDPC decoder of [141], which consumes 690 mW when decoding an irregular code with $d_v = 3.25$, $d_c = 6.5$ at a throughput of 1 Gb/s ($k = 1024$, $r = 0.5$, 64 iterations) and 2) the LDPC decoder of [142], which consumes 787 mW when decoding a (3, 6)-regular LDPC code at 640 Mb/s ($k = 2048$, $r = 0.5$, ten iterations).
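One way to compare the two decoder examples above is to convert power and throughput into energy per decoded bit. The Python sketch below does this arithmetic, assuming that the quoted throughput can be divided directly into the quoted power (the text does not say whether the throughput counts information bits or coded bits). The helper name is hypothetical.

```python
def energy_per_bit_nj(power_mw, throughput_mbps):
    """Energy per bit in nanojoules: (mW) / (Mb/s) = 1e-3 / 1e6 J = 1 nJ."""
    return power_mw / throughput_mbps


if __name__ == "__main__":
    # (label, power in mW, throughput in Mb/s, iterations) from the text.
    examples = [
        ("decoder of [141]", 690.0, 1000.0, 64),
        ("decoder of [142]", 787.0, 640.0, 10),
    ]
    for label, power, rate, iterations in examples:
        e_bit = energy_per_bit_nj(power, rate)
        print(f"{label}: {e_bit:.3f} nJ/bit, "
              f"{e_bit / iterations:.4f} nJ/bit/iteration")
```

The per-iteration figure is only a normalization aid. The two decoders use different codes, block sizes, and iteration counts, so the numbers indicate scale and do not give a like-for-like ranking.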

H. Numerical Precision

Though the constituent decoder may be implemented in the probability domain, practical implementations virtually always operate in the log domain. A significant reason for this is numerical precision. Probabilities are typically represented as floating-point numbers, composed of an exponent field and a mantissa or magnitude field. Small values can be represented very accurately with this format. But numbers close to one all have the same exponent and differ only slightly in their mantissas. Thus, as probabilities approach one, it becomes increasingly difficult to represent differences between them and the accuracy of quantities computed by the decoder suffers.

It is preferable to use as small a word length as possible in the decoder to minimize memory requirements. This means that the various data sets in the decoder are subjected to clipping and low levels of quantization (e.g., four bits for the decoder inputs, more for the extrinsics). This introduces quantization and clipping noise that may degrade performance (e.g., [143]).

I. Effect on Synchronization

An important issue that is often overlooked in discussions of turbo and turbo-like codes is their relationship to the system's synchronization strategy. The excellent performance promised by turbo FEC is only achievable if the receiver can estimate and remove frequency, symbol timing, and phase errors with sufficient accuracy. This is a bigger challenge for turbo FEC than for the $K = 7$ and $K = 9$ convolutional codes that have been widely used in the past. Turbo FEC typically achieves a desired error rate at a much lower value of $E_b/N_0$ than these convolutional codes, and therefore the desired operating point occurs at a much lower value of channel SNR. For example, Fig. 5 shows the $K = 7$ convolutional code reaching a WER of $10^{-3}$ at about 4.5 dB while both turbo codes achieve this error rate at roughly 1.4 dB.
The synchronization strategy must be able to function effectively at this lower channel SNR and may require an increase in synchronization overhead and/or processing in order to deliver acceptable performance. This issue has been explicitly identified in connection with the DVB-S2 standard [81].

J. Intellectual Property

Proper attention to intellectual property (IP) issues is crucial to the commercial success of any product. If a given technology is protected by patents that require licensing fees and/or royalty payments, the overall cost of a system can be significantly increased.

1) Turbo Codes: The fundamental technology involved in turbo codes is protected by a number of patents, jointly held by France Télécom, Télédiffusion de France (TDF), and Groupe des Écoles des Télécommunications (GET). Any company wishing to use turbo code technology in a commercial product must therefore obtain a license via the Turbo Code Licensing Program (TCLP) [144]. Despite the many advantages of using turbo codes, any decision to use them in a commercial system must account for the cost of licensing the core technology. There are also patents on various aspects of turbo code design and decoding (e.g., [145]).

2) Turbo Product Codes: There are no known patents on the fundamental theory surrounding product codes, since they have been widely used and studied for decades. However, there are numerous patented and proprietary decoding methods for TPCs that may represent an attractive alternative to turbo codes in some cases (e.g., [146]).

3) LDPC Codes: There are no patents covering the fundamentals of LDPC codes. However, individual companies may have patent protection on specific implementation technologies or decoding techniques (e.g., [137], [147], and [148]).
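Returning briefly to the numerical-precision argument of Section III-H, the short Python sketch below illustrates both effects described there. It assumes IEEE double-precision floats for the probability domain and a uniform, symmetric quantizer for the log domain. The helper names, the unit step size, and the four-bit word length applied to the example values are illustrative choices, not parameters from the text.

```python
import math


def llr_from_error_prob(eps):
    """Log-likelihood ratio log(p / (1 - p)) for p = 1 - eps, computed stably."""
    return math.log1p(-eps) - math.log(eps)


def quantize_llr(llr, bits=4, step=1.0):
    """Uniform symmetric quantizer with clipping (assumed design).

    Returns an integer level in [-(2**(bits-1) - 1), 2**(bits-1) - 1].
    """
    max_level = 2 ** (bits - 1) - 1
    level = round(llr / step)
    return max(-max_level, min(max_level, level))


if __name__ == "__main__":
    # Probability domain: two different, very reliable bits become identical.
    p_a = 1.0 - 1e-17
    p_b = 1.0 - 1e-18
    print(p_a == p_b)  # both round to exactly 1.0 in double precision

    # Log domain: the same two reliabilities remain clearly distinct.
    print(llr_from_error_prob(1e-17), llr_from_error_prob(1e-18))

    # A short word length then trades this range for memory via clipping.
    print(quantize_llr(llr_from_error_prob(1e-17)))
```

The last line shows the price of a short word length: a very confident value saturates at the largest level, which is the clipping noise referred to in the text.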

IV. TURBO OR TURBO-LIKE CODES IN STANDARDS

Turbo and turbo-like codes have been incorporated into a variety of standards. This section gives descriptions and limited comparisons for the most significant of them and notes several others. These lists are not exhaustive but are reasonably comprehensive. The descriptions illustrate how each standardized turbo FEC code has addressed the practical constraints discussed in the previous section. Also, note that the turbo and turbo-like codes described here are complemented by other FEC, often a convolutional code or concatenated convolutional/Reed–Solomon code. Additional information can be found in the standards documents themselves or some textbooks (e.g., [93]).

The performance of numerous codes below is compared with the binary-input capacity limit on an AWGN channel. That is, performance is evaluated relative to the channel capacity limit corresponding to the given code rate, an infinite block length, and error-free communication. As noted, comparing the performance of smaller block sizes with this limit may be misleading. When evaluating distances to the capacity limit, bear in mind the corresponding distances for the convolutional codes discussed earlier. For $r \approx 1/2$ and WER $= 10^{-3}$, the $K = 7$ code is 4.36 dB from capacity while the $K = 9$ code is 3.65 dB away. At WER $= 10^{-5}$, these values grow to 5.54 and 4.77 dB, respectively.
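For readers who wish to reproduce distances of this kind, the following Python sketch estimates the binary-input AWGN capacity limit for a given code rate. It assumes BPSK signalling with unit-energy symbols, evaluates the capacity by trapezoidal integration, and finds the limiting $E_b/N_0$ by bisection. The function names, integration grid, and search interval are choices made for this sketch and do not come from the paper.

```python
import math


def bi_awgn_capacity(ebn0_db, rate, points=2001, span=10.0):
    """Capacity (bits per channel use) of the binary-input AWGN channel.

    Uses C = 1 - E[log2(1 + exp(-L))], where L = 2y / sigma^2 is the channel
    LLR given that +1 was sent and sigma^2 = 1 / (2 * rate * Eb/N0).
    """
    ebn0 = 10.0 ** (ebn0_db / 10.0)
    sigma2 = 1.0 / (2.0 * rate * ebn0)
    sigma = math.sqrt(sigma2)
    lo, hi = 1.0 - span * sigma, 1.0 + span * sigma
    h = (hi - lo) / (points - 1)
    total = 0.0
    for i in range(points):
        y = lo + i * h
        z = 2.0 * y / sigma2
        # Numerically stable log(1 + exp(-z)).
        softplus = max(-z, 0.0) + math.log1p(math.exp(-abs(z)))
        pdf = math.exp(-(y - 1.0) ** 2 / (2.0 * sigma2)) / math.sqrt(
            2.0 * math.pi * sigma2)
        weight = 0.5 if i in (0, points - 1) else 1.0
        total += weight * pdf * softplus / math.log(2.0)
    return 1.0 - total * h


def capacity_limit_db(rate, lo_db=-2.0, hi_db=10.0, steps=50):
    """Smallest Eb/N0 (dB) at which the channel capacity equals the code rate."""
    for _ in range(steps):
        mid = 0.5 * (lo_db + hi_db)
        if bi_awgn_capacity(mid, rate) < rate:
            lo_db = mid
        else:
            hi_db = mid
    return 0.5 * (lo_db + hi_db)


if __name__ == "__main__":
    limit = capacity_limit_db(0.5)
    print(f"rate 1/2 limit: {limit:.3f} dB")
    # Distance for a code needing about 4.5 dB, as quoted for K = 7.
    print(f"distance at 4.5 dB: {4.5 - limit:.2f} dB")
```

For a rate of exactly 1/2 the search should settle close to 0.19 dB. The slightly lower rates that result from termination overhead move the limit down a little, which is presumably why the distances quoted in the text are stated for $r \approx 1/2$ rather than for exactly 1/2.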

A. 3G Wireless

8-state, single-binary turbo codes have been adopted by 3G terrestrial wireless systems. Strong coding is a key means of mitigating the effects of poor channels, for example those affected by multi-access (MA) interference. The propagation delays in such systems are low enough that some form of ARQ is typically used to remove residual packet errors, provided that the frequency of retransmissions is low enough (i.e., the target WER is perhaps $10^{-3}$). Flare performance is therefore unimportant as long as the target WER is met. The fact that turbo decoding is performed in mobile terminals (e.g., cellular telephones) with limited power and processing resources means that memory and processing requirements must be kept modest.

1) W-CDMA: Wideband code-division, multiple-access (W-CDMA) is the 3G migration goal for second generation (2G) systems based on global system for mobile communication (GSM) and time-division, multiple-access (TDMA) technology. It is supported by the Third Generation Partnership Project (3GPP). Table 4 summarizes the main features of the W-CDMA turbo code. It offers only the base code rate of 1/3. Both encoders begin and end each block in the all-zeroes state. Trellis termination is achieved with "feedback" termination, i.e., after the information bits have been encoded by a constituent encoder, the bits from the feedback path (i.e., the modulo-2 sum of the two feedback branches shown in Fig. 1) are inserted at the encoder input for $m = 3$ bit periods. From the figure, it is clear that this will result in zeroes being inserted into the delay line, returning the encoder to the all-zeroes state after $m = 3$ bits have been processed. These feedback termination bits and the corresponding parity bits from both encoders are included in the transmitted codeword, lowering the code rate. This scheme leaves the termination bits unprotected by the interleaver and offers performance inferior to dual tail-biting or dual termination [101]–[103]. The advantage of leaving these bits outside the interleaver is that encoding can be done independently, i.e., neither encoder requires knowledge of the termination bits for the other.

Table 4 Code Features for the W-CDMA (3GPP) Turbo Codes [133]

The interleaver itself is a modified block interleaver. Conceptually, inputs are read into a rectangular matrix row-wise (optionally with padding bits inserted), the bits are permuted both within rows and across rows, and any padding bits are removed as the outputs are read out column-wise. The dimensions of the matrix and the exact row-column permutations depend on the desired block size [133]. In practice, the interleaver indices are likely computed at run time. These interleavers typically perform substantially better than random interleavers but are not the best known for these 8-state constituent codes (e.g., Fig. 6).

Table 5 shows distances from the binary-input capacity limit for these codes. Slight improvements in performance might be achieved by optimizing the extrinsic scale factor SF or using log-APP decoding. Though substantially better than convolutional codes, these codes are still a significant distance from the capacity limit. However, much of this distance is due solely to the short block lengths: for the largest block size (5114 bits), the distance from capacity is less than 1 dB.

Table 5 Distances From the Binary-Input Capacity Limit (Infinite Block Size) for Several W-CDMA (3GPP) Turbo Codes [133]. Performance for 3GPP Codes was Measured at WER $= 10^{-3}$ Using Enhanced max-log-APP Decoding (SF = 0.75) and 16 Fixed Iterations

2) cdma2000: The term "cdma2000" refers to a family of standards that include 1xRTT, 1xEV-DO, and 1xEV-DV. These different standards were introduced as steps in a migratory path from 2G CDMA systems (i.e., IS-95) to 3G. Not surprisingly, their channel coding strategies are all closely related. The technologies defining cdma2000 are supported by the Third Generation Partnership Project 2 (3GPP2). Table 6 shows a summary of the code characteristics [134]. The nominal base code rate for these codes is 1/5, since each constituent code is rate 1/3. The code rate and block size options vary depending on the variant of cdma2000 and whether the forward or return channel is utilized. Feedback trellis termination is used, but the termination bits and the resulting parity bits are repeated at the encoder output in some cases. This partially compensates for the fact that they are left outside the interleaver. The data are encoded at rate 1/5 and then punctured to the higher rates with puncturing patterns specified in the standard. Different puncturing patterns are applied to the termination bits and information bits.

Table 6 Code Features for the cdma2000 (3GPP2) Turbo Codes [134]

As with W-CDMA, the interleaver is a modified block interleaver with row and column permutations. However, the block dimensions and permutations differ from W-CDMA. This means that the cdma2000 turbo code at rate 1/3 is not the same as the W-CDMA turbo code. A runtime algorithm for computing the interleaver indices is given in the standard document [134].

Table 7 shows distances from the binary-input capacity limit for several cdma2000 turbo codes. As for the W-CDMA codes, some improvements in performance could be achieved by optimizing the value of SF or using log-APP decoding. Distance from the binary-input capacity limit is roughly consistent across the different code rates for the same block length and is again substantially larger for the smaller block lengths, as expected. For the longest block of 20 730 information bits, the smallest distance of 0.79 dB is achieved at $r \approx 1/2$.

Table 7 Distances From the Binary-Input Capacity Limit (Infinite Block Size) for Several cdma2000 (3GPP2) Turbo Codes [134]. Performance for 3GPP2 Codes was Measured at WER $= 10^{-3}$ Using Enhanced max-log-APP Decoding (SF = 0.75) and 16 Fixed Iterations

3) TD-SCDMA: The time-division, synchronous CDMA (TD-SCDMA) standard is another 3G wireless standard closely related to W-CDMA and cdma2000 [149]. While W-CDMA is dominant in Europe and cdma2000 in North America, TD-SCDMA is favored by the Chinese government for their domestic market [150]. The channel coding specified in the TD-SCDMA standard is virtually identical to that for the W-CDMA standard, including the restriction to rate 1/3. The algorithm used to generate the interleaver is very similar to that specified for W-CDMA, but differs in some details [149].

Table 8 Code Features, Block Sizes, and Nominal Code Rates for the CCSDS Turbo Code [135]
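The feedback trellis termination described above for the W-CDMA constituent encoders can be sketched as follows. The sketch assumes the 8-state recursive systematic encoder with feedback polynomial $1 + D^2 + D^3$ and feedforward polynomial $1 + D + D^3$, which is the author's understanding of the 3GPP constituent code and is not stated in this section. Fig. 1 and the standard [133] remain the authoritative definitions, and the function names are hypothetical.

```python
def rsc_step(state, u):
    """One clock of an 8-state RSC encoder (polynomials assumed, see lead-in).

    state = (d1, d2, d3) holds the delay-line contents, newest first.
    Returns (parity_bit, next_state).
    """
    d1, d2, d3 = state
    feedback = d2 ^ d3          # modulo-2 sum of the two feedback branches
    a = u ^ feedback            # bit entering the delay line
    parity = a ^ d1 ^ d3        # feedforward taps 1 + D + D^3
    return parity, (a, d1, d2)


def encode_with_feedback_termination(bits, m=3):
    """Encode a block, then drive the encoder back to the all-zeroes state.

    During the m tail periods the encoder input is taken from the feedback
    path, so the bit entering the delay line is always zero.
    """
    state = (0, 0, 0)
    systematic, parity = [], []
    for u in bits:
        p, state = rsc_step(state, u)
        systematic.append(u)
        parity.append(p)
    for _ in range(m):
        tail = state[1] ^ state[2]      # feedback value becomes the input
        p, state = rsc_step(state, tail)
        systematic.append(tail)
        parity.append(p)
    return systematic, parity, state


if __name__ == "__main__":
    s, p, final_state = encode_with_feedback_termination([1, 0, 1, 1, 0, 1, 1, 1])
    print(final_state, len(s), len(p))
```

Because the tail bits depend on the encoder state, each constituent encoder produces its own tail, and both tails are transmitted together with their parity bits. This is the rate loss mentioned in the text, and it also explains why the tails sit outside the interleaver.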

B. Satellite Communications

Turbo and turbo-like codes have been incorporated into several satellite standards. Propagation delays are long, so ARQ is unattractive and target WERs are typically much lower than for terrestrial wireless links (e.g., WER $= 10^{-6}$).

1) Consultative Committee for Space Data Systems (CCSDS): The CCSDS was created to identify data communications issues experienced by member space agencies and recommend technical solutions. Adherence to these recommendations is voluntary, but they nonetheless appear to function as de facto standards. Recommendations have been made for telemetry channel coding that include a range of 16-state, single-binary turbo codes [135], and there have been proposals to extend the recommendations to include an LDPC code [151]. The intended application is data communication from deep space probes to earth. This is a scenario where the SNR is often poor, computational complexity and decoding latency are not limiting factors for the terrestrial receiver, and retransmissions are not possible due to the lack of a return channel.

Table 8 shows key code parameters for the CCSDS turbo codes. One of the feed-forward paths from the second encoder is never used, making the base code rate 1/6. Higher rates of 1/2, 1/3, and 1/4 are achieved by selecting a subset of the feed-forward paths from the two encoders and with additional puncturing for rate 1/2. Both encoders begin each block in the all-zeroes state and use feedback trellis termination, though the termination bits from the second encoder are not transmitted. The interleaver is a structured design attributed to Berrou [152] whose indices can be generated at run time [135].

Table 9 shows the range of block sizes offered by the CCSDS recommendation. Based on [107, Table 1] and [153, Fig. 2], distances from the binary-input capacity limit for a block of 10 200 information bits, ten decoding iterations, and performance measured at a BER of $10^{-6}$ are 0.79 dB at $r \approx 1/2$; 0.87 dB at $r \approx 1/3$; 0.92 dB at $r \approx 1/4$; and 0.95 dB at $r \approx 1/6$. Error-rate performance results and some additional distance results for these codes are given in [107], [153].

Table 9 Block Sizes and Nominal Code Rates for the CCSDS Turbo Codes [135]

The CCSDS turbo codes have been incorporated into numerous spacecraft. The National Aeronautics and Space Administration (NASA) has used the CCSDS turbo codes to support a range of missions (e.g., MESSENGER, STEREO) and has plans to use them on future missions (e.g., Kepler, Mars Telecommunication Orbiter) [151], [154]. The European Space Agency (ESA) has included them in the Rosetta and SMART-1 systems [19], [155].

2) Digital Video Broadcasting - Return Channel via Satellite (DVB-RCS): The DVB-RCS standard, adopted in 2001, specifies a return channel for satellite distribution systems [136]. The forward channel adheres to the Digital Video Broadcasting via Satellite (DVB-S) standard and will support DVB-S2. These two links define an interactive connection for satellite broadband systems serving remote areas, making fast Internet connections available to users unable to use more established technologies such as ADSL. Supported throughputs vary from 144 kb/s to 2 Mb/s. The return channel will typically process short messages (e.g., requests for Web pages or television programs), implying smaller block sizes and relatively low latency. The desired operating point appears to be WER $= 10^{-5}$ or below [156], consistent with the fact that long propagation delays over satellite links make ARQ unattractive.

DVB-RCS calls for 8-state, double-binary turbo codes and represents the first example of a double-binary turbo code in a standard. Table 10 shows the code parameters for this code and Fig. 3 the structure of the duo-binary encoder. The constituent encoders each process two data bits and produce two parity bits on each time step, making the natural code rate of the turbo code 1/3. Puncturing is used to achieve six higher code rates between 2/5 and 6/7. Both encoders use tail-biting to ensure that all bits are equally protected, i.e., each encoder starts and ends in the same state.

Table 10 Code Features for the DVB-RCS Turbo Codes [136]

The interleaver operates at two levels. The first involves reversing the order of bits within every second bit pair. The second involves permuting bit pairs ("symbols") in the same way that a single-binary turbo code permutes bits. The indices for this symbol interleaver are computed by a simple algorithm given in the standard and may therefore be generated at run time. The interleaver design parameters are specified by the standard for each of the 12 supported block sizes, from 12 to 216 bytes. This range includes both ATM and MPEG block lengths (424 and 1504 bits, respectively). Table 11 shows distances from the binary-input capacity limit for the latter two block sizes.

Table 11 Distances From the Binary-Input Capacity Limit (Infinite Block Size) for Two DVB-RCS Turbo Codes [136]. Performance for the DVB-RCS Codes was Measured at WER $= 10^{-3}$ Using Enhanced max-log-APP Decoding (SF = 0.75), 16 Fixed Iterations, and Tail-Biting Overlaps of 50 (ATM) or 75 (MPEG) Symbols

3) Digital Video Broadcasting via Satellite, Second Generation (DVB-S2): The DVB-S2 standard is aimed at satellite systems delivering primarily broadcast services (e.g., digital television) but also some interactive services (e.g., Internet access). The broadcast application means that delay sensitivity is low, and therefore large blocks with large decoding latencies may be used to improve performance. The satellite scenario also involves long propagation delays that make ARQ unattractive, so low error flares are required. The standard calls for a range of LDPC codes concatenated with high-rate BCH codes for two block sizes: 64 800 channel bits for delay-insensitive applications (e.g., television broadcast), 16 200 channel bits otherwise [62], [81], [157]. A selection of supported code rates is shown in Table 12. The lower code rates use QPSK modulation, while 8-PSK, 16-ary amplitude phase shift keying (16-APSK) and 32-APSK may be used at the higher rates.

Table 12 Selected Code Rates, Block Sizes, and BCH Overhead for the DVB-S2 LDPC + BCH Codes [157]

Aside from being sparse, the parity-check matrices for the DVB-S2 codes have some useful structural properties. First, they may be decomposed as

$$H_{S2} = [A : B] \qquad (3)$$

where $A$ is $(n - k) \times k$, $B$ is $(n - k) \times (n - k)$, and $B$ has a bidiagonal structure [81]. The code is thus systematic, and each parity bit is a function of a set of information bits and only one other parity bit. The sole exception is the first parity bit, which depends only on information bits. This "accumulator" structure for the parity bits yields efficient (i.e., linear-time) encoding, since the parity can be computed recursively. Second, the connections of bit nodes and check nodes are restricted within blocks of $q$ bit nodes. These blocks of $q$ can therefore be decoded in parallel. Also, only one set of connections per block of $q$ must be stored, simplifying the code definition problem, though different connections are required for each code rate and block size. The connections to these initial bit nodes are randomized such that length-four cycles and most length-six cycles in the corresponding Tanner graph are eliminated. The code also has an irregular structure, improving convergence. The "accumulator" structure of $B$ and the irregular structure make these codes similar to IRA codes (e.g., [86]).

A range of performance results using log-domain sum-product decoding are shown in [62], [157], though no results for the 16 200 bit blocks are given. The error-rate performance is impressive. However, much of the gain relative to the 3GPP/3GPP2 results is due to the larger block size rather than any inherent superiority of the LDPC code. The primary broadcast application can tolerate delays of seconds, so the longer latency associated with these long block lengths is acceptable.

Despite its virtues, the LDPC code alone would not achieve the WER specification for DVB-S2 of $10^{-7}$. This is due at least in part to the fact that some "hanging" bits are involved in only one parity equation, i.e., the code has not been "terminated" properly. Such bits will have a higher average error rate and degrade flare performance. However, this is of no importance in this case because of the presence of the BCH outer codes. This is an effective approach to lowering the flare but comes at a small cost (Table 12). First, there is a penalty in terms of $E_b/N_0$ due to the BCH parity bits. This loss is minimal for larger information blocks but becomes noticeable as the block shrinks. For example, the $(n, k) = (16\,200, 3072)$ code ($r \approx 1/5$, not shown in Table 12) incurs an $E_b/N_0$ penalty of 0.23 dB. Second, the reduction in code rate shifts the binary-input capacity limit to a lower SNR, further away from the actual code performance. This effect is more pronounced as the code rate climbs. Taken together, these two effects degrade performance relative to the capacity limit by roughly 0.04 dB for $n = 64\,800$ and between 0.1 and 0.2 dB for $n = 16\,200$ for the cases shown in Table 12. These are reasonable penalties in return for flares below WER $= 10^{-7}$, especially for the longer block length, but the overhead does become more substantial for smaller information block sizes.

Table 12 shows the distance from the adjusted binary-input capacity limit in three cases. The performance results used to compute these differences were read from the plots in [62] and are therefore approximate. Even with the additional penalties due to the BCH outer codes, these codes perform well.

Finally, there is nothing magical about the concatenation of LDPC codes with BCH codes. Comparable performance can be achieved by combining BCH outer codes with other forms of turbo FEC. For example, it was found in [40] that performance superior to the DVB-S2 code can be achieved at rate 1/2 using a well-designed 4-state turbo code concatenated with a BCH code.
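The linear-time encoding made possible by the decomposition in (3) can be illustrated with a toy example. The Python sketch below assumes that row $i$ of the bidiagonal matrix $B$ has ones in columns $i$ and $i - 1$ (column $i$ only for the first row), and it represents $A$ as a list of information-bit indices per parity equation. The tiny matrix used here is invented for illustration and is unrelated to the actual DVB-S2 connection tables.

```python
def encode_accumulator(a_rows, info):
    """Systematic encoding for H = [A : B] with bidiagonal B.

    a_rows[i] lists the information-bit indices in parity equation i.
    Each parity bit is the previous parity bit plus the check sum over the
    information bits, so the cost is linear in the number of ones in A.
    """
    parity = []
    previous = 0                        # the first parity bit has no predecessor
    for row in a_rows:
        check_sum = 0
        for j in row:
            check_sum ^= info[j]
        previous ^= check_sum           # running (accumulated) parity
        parity.append(previous)
    return list(info) + parity


def satisfies_parity_checks(a_rows, codeword):
    """Verify H c^T = 0 for the same H = [A : B] structure."""
    k = len(codeword) - len(a_rows)
    for i, row in enumerate(a_rows):
        total = codeword[k + i]
        if i > 0:
            total ^= codeword[k + i - 1]
        for j in row:
            total ^= codeword[j]
        if total:
            return False
    return True


if __name__ == "__main__":
    toy_a = [[0, 1], [1, 2], [0, 3], [2, 3]]    # hypothetical 4 x 4 matrix A
    word = encode_accumulator(toy_a, [1, 0, 1, 1])
    print(word, satisfies_parity_checks(toy_a, word))
```

The recursion also shows where "hanging" bits can arise: the last parity bit appears in only one equation under this structure, which is consistent with the text's remark about bits involved in a single parity equation.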

C. Wireless Networking

Recent years have seen explosive growth in terrestrial wireless networks that are intended to operate over local or metropolitan areas. The primary traffic is data (e.g., Internet access or file transfer) that is less sensitive to delay, but there is a desire to support some delay-sensitive services such as voice over Internet protocol (VoIP). The short propagation delays mean that ARQ may be used.

1) Wi-MAX (IEEE 802.16): The IEEE 802.16 family of standards defines a fixed wireless access system for wireless metropolitan area networks (WMANs) [158]. The intent is to cover a radius of roughly 30 miles (50 km), addressing the "last mile" issue by providing broadband connections (i.e., shared bandwidths of up to 70 Mb/s) to more isolated and remote areas. Unlike most standards, 802.16 specifies different physical layers to allow equipment manufacturers to distinguish themselves. The Wi-MAX Forum is a consortium that seeks to focus work on specific configurations to guarantee interoperability of Wi-MAX products in spite of these differences. The 802.16 standards specify three advanced error-correcting codes as optional codes: a block turbo code (i.e., a TPC), a double-binary convolutional turbo code, and an LDPC code. The target WER is $10^{-4}$.

The TPC option consists of a range of 2-D product codes based on either extended Hamming codes or SPCs. The standard supports (64, 57), (32, 26), (16, 11), or (8, 4) extended Hamming component codes and (64, 63), (32, 31), (16, 15), or (8, 7) SPC component codes. The product codes may be composed of extended Hamming codes in both dimensions, parity checks in both dimensions, or a combination of the two. The resulting blocks may be shortened by deleting rows and/or columns, in whole or in part. Mixing component codes and/or shortening allows the TPC to be matched to a desired information block size.

The turbo codes defined in 802.16 are 8-state, double-binary codes very similar (but not identical) to those standardized for Digital Video Broadcasting - Return Channel Terrestrial (DVB-RCT) and DVB-RCS (Fig. 3). The interleavers are generated with an algorithm that is very similar to that used by DVB-RCS. The input bits are passed to the encoder in a different order. Also, individual "sub-blocks" of each encoder output codeword are individually permuted and then interspersed with one another before puncturing. This appears to function as a channel interleaver. The set of data block sizes supported is different from DVB-RCS and DVB-RCT. The standard calls for 22 modulation and coding sets (MCSs) corresponding to different combinations of data payload size (from 6 to 60 bytes) and code rate. Additional MCSs are supported when hybrid ARQ (H-ARQ) mode is employed, up to 600 bytes (4800 bits).

The 802.16 standard family was recently amended to provide support for mobility. Referred to as 802.16e [82], this extension aims to provide shared bandwidths of up to 15 Mb/s to mobile terminals within distances of 1 to 3 miles (1.6 to 4.8 km). Among the new features of 802.16e was the addition of structured block LDPC codes based on quasi-cyclic parity-check matrices. The systematic portion of each parity-check matrix is divided into $q \times q$ submatrices, which are either all-zero matrices or elementary matrices constructed from cyclic shifts of the identity matrix. They are therefore characterized by: 1) a "mask," defining the positions of these elementary matrices within the parity-check matrix, and 2) a set of "roll" parameters that specify the shifts on the nonzero submatrices. The parameter $q$ determines the degree of parallelism. The parity section of the matrix has a bidiagonal accumulator structure similar to that of the DVB-S2 LDPC code, allowing very efficient encoding. These LDPC codes also include a form of "termination," i.e., the codes are constructed such that the initial and final groups of bits in each block are guaranteed to be included in more than one equation. This improves flare performance relative to the DVB-S2 LDPC codes, which rely on the BCH outer codes to achieve low flares.

The standard supports 19 coded block sizes (from $n = 576$ to $n = 2304$) and four code rates (1/2, 2/3, 3/4, and 5/6), where each value of $n$ corresponds to a different $q$. Each combination of block size and code rate requires a distinct parity-check matrix. To reduce memory requirements, a single "model matrix" is specified for each code rate that determines both the locations of the nonzero submatrices and their circular shifts. Parity-check matrices for different block sizes are then found by adjusting the value of $q$ and recomputing the roll parameters. Example performance results for this LDPC code are shown in Fig. 12.

The profiles produced by the Wi-MAX Forum determine which features of the standard must be implemented in certified products. The convolutional turbo codes have been included in mobile profiles, based on both 802.16 and 802.16e specifications, and will therefore be mandatory for all certified chipsets.

2) Wi-Fi (IEEE 802.11): The popular IEEE 802.11 family of standards enables wireless local area networks (WLANs) and is commonly referred to by the nickname "Wi-Fi" ("wireless fidelity"). The ever-increasing demand for higher throughput over WLANs motivated the creation of the new 802.11n task group in 2003, whose aim is to define an extension to the standard that would support data rates greater than 100 Mb/s above the media access (MAC) layer. This is a significant increase over the 11 and 54 Mb/s offered by 802.11b and 802.11a/g, respectively. The potential market for improved 802.11 devices is very large: in addition to the existing computer market, it is hoped that 802.11n chipsets will be introduced into consumer electronics (e.g., televisions and camcorders) and mobile handsets. Advanced error-correcting codes have been proposed as a means of improving the performance of the future 802.11n standard, due to be released in 2007. Specifically, structured block LDPC codes that are very similar to those included in 802.16e have been introduced as an optional code. Three coded block sizes ($n = 648$, 1296, and 1944) and four code rates (1/2, 2/3, 3/4, 5/6) are supported.
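The "mask" and "roll" description of the structured block LDPC codes above can be made concrete with a small sketch. The Python function below expands a model matrix into a binary parity-check matrix, assuming that a negative entry marks an all-zero submatrix and that a non-negative entry $s$ denotes the $q \times q$ identity matrix with its columns cyclically shifted by $s$. The shift direction, the encoding of the mask as negative entries, and the toy model matrix are assumptions for illustration. The matrices and the rule for recomputing roll parameters for each $q$ are defined in the standard [82] and are not reproduced here.

```python
def expand_model_matrix(model, q):
    """Expand a quasi-cyclic model matrix into a binary parity-check matrix.

    model[r][c] < 0   : q x q all-zero submatrix (position masked out)
    model[r][c] = s   : q x q identity, cyclically shifted by s columns
    """
    n_rows = len(model) * q
    n_cols = len(model[0]) * q
    h = [[0] * n_cols for _ in range(n_rows)]
    for block_row, shifts in enumerate(model):
        for block_col, shift in enumerate(shifts):
            if shift < 0:
                continue
            for i in range(q):
                h[block_row * q + i][block_col * q + (i + shift) % q] = 1
    return h


if __name__ == "__main__":
    toy_model = [[0, -1],
                 [1, 2]]        # hypothetical 2 x 2 model matrix
    for row in expand_model_matrix(toy_model, 3):
        print(row)
```

Only the small model matrix and the value of $q$ need to be stored, which is the memory saving noted in the text. Each group of $q$ consecutive rows touches disjoint columns within a block, which is what allows $q$ check nodes to be processed in parallel.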

D. Other Standards

1) Digital Video Broadcasting–Return Channel Terrestrial (DVB-RCT): The DVB-RCT [159] standard defines a terrestrial return channel for digital television and was adopted in 2001. The forward channel corresponds to the terrestrial broadcasting standard (Digital Video Broadcasting Terrestrial or DVB-T), and together the links constitute a two-way terrestrial communication channel. New services requiring interactivity, allowing real-time interaction with TV programmes and data transfers between the user and the service provider, motivated this standard. Typical applications include Internet access or real-time voting. The DVB-RCT standard covers large areas (up to 65 km) with user throughputs between a few kilobits per second for the largest cells and a few megabits per second for smaller cells.

The standard specifies 8-state, double-binary turbo codes very similar to the DVB-RCS and 802.16 turbo codes (Fig. 3). Specifically, DVB-RCT uses the same component codes, tail-biting, puncturing, and interleaver design algorithm as DVB-RCS. However, there are notable differences. The natural code rate for the DVB-RCS turbo codes is 1/3, whereas for DVB-RCT turbo codes it is 1/2 (the W parity bits in Fig. 3 are not generated). DVB-RCT only supports code rates of 1/2 and 3/4, fewer than DVB-RCS. The DVB-RCT turbo code supports five block lengths between 18 and 81 bytes, but none of these lengths matches those supported by the DVB-RCS turbo code.

2) Other Standards: Table 13 shows a number of other standards that either call for turbo FEC or are considering including it [160]–[162]. A competitor with DVB-H for the delivery of mobile digital video service is a proprietary system developed by QUALCOMM called Media Forward Link Only (MediaFLO). Though details are scarce, this system is using turbo codes concatenated with Reed–Solomon codes [163].

V. ADDITIONAL SYSTEMS AND APPLICATIONS USING TURBO OR TURBO-LIKE CODES

This section describes additional applications where turbo and turbo-like codes are being used, aside from those using the standards listed above. It also lists private companies

Table 13 Additional Standards Known to Include or be Considering Turbo or Turbo-Like Codes


Proceedings of the IEEE | Vol. 95, No. 6, June 2007


that are known to have developed expertise or technology related to turbo FEC. Such descriptions cannot hope to be exhaustive. The goal is to enhance the reader’s sense of how widespread the industrial use of turbo and turbo-like codes has become.

A. Broadband Satellite Systems

Satellite-based broadband services have the potential to occupy a key and viable niche in the telecommunications market [165]. They can cover a much wider geographical area than cable, ADSL, or wireless local loop (WLL) systems, they do not require the same terrestrial infrastructure (e.g., base stations), and they offer a simpler network topology. At the same time, satellite systems have more stringent limitations than terrestrial systems. The space segment hardware is difficult to upgrade once it is in orbit, bandwidth is less plentiful, and power resources are more limited for a satellite than for a terrestrial base station. These limitations have motivated the use of higher order signalling constellations such as 8-PSK or 16-QAM in order to maintain existing bandwidth requirements while increasing throughput. Turbo FEC facilitates this and also reduces power requirements.

Table 14 shows a selection of satellite broadband services that include some form of turbo FEC and shows that turbo and turbo-like codes are in widespread use by this sector of industry. Turbo FEC is also being applied to systems using DOCSIS over satellite even though the standard does not call for it. In addition, both Sirius Satellite Radio and XM Satellite Radio have licensed Raptor codes from Digital Fountain [161].

B. Commercial Activity

Table 15 lists numerous companies with significant expertise in turbo or turbo-like codes. Clearly, elements of all of the major classes of turbo FEC have come into the mainstream. The companies listed in Table 15 either: 1) have explicitly described themselves as working with turbo FEC technology; 2) are involved in producing technology that will necessarily include a turbo FEC component (e.g., 3G chipsets); or 3) manufacture products with turbo FEC in mind (e.g., DSP processors with coprocessors aimed at decoder applications). Note that companies producing turbo decoding devices for W-CDMA or cdma2000 could adapt them to the TD-SCDMA turbo code relatively easily.

What this industrial information makes abundantly clear is that turbo and turbo-like codes have moved far beyond the realm of potential or theoretical gains. Industry has come to grips with the technology, learned how to implement it effectively, and is using it to improve commercial communications services. Vendors are producing practical and effective turbo FEC codecs, and service providers are buying and using them. The technology has gone from early proposals to active service in a relatively short time.

VI. FUTURE TRENDS

A number of trends in the use of turbo and turbo-like codes may be identified.

1) Turbo and turbo-like codes will be widely used for at least the next decade and probably substantially longer. The performance gains are large enough to justify the system resources and design effort required.

2) No single class of turbo or turbo-like codes will dominate in the way that convolutional codes and Viterbi decoding did in the past. Multiple coding solutions are being used now, and robust competition between different options will continue.

3) Additional performance improvements will be modest. This is clear from the fact that it is already possible to achieve performance that is less than 1 dB from the capacity limit on the AWGN channel, even for relatively small block lengths.

4) Substantial improvements in computational efficiency and reductions in unit costs are still possible. More experience in implementation and a growing number of commercial solutions will make the technology more and more economical.

5) An increasing number of well-designed, practical LDPC codes are now available. This growth in popularity will likely continue, with a wider array of good LDPC codes finding their way into a wider range of standards.

Table 14 Satellite Broadband Services Using Turbo or Turbo-Like Codes


Table 15 Sampling of Companies With Expertise in Turbo and Turbo-Like Codes

6) An increasing number of turbo and turbo-like codes will be tailored to different channel conditions and system designs. One important example of this will be an increasing interest in jointly optimizing the design of turbo FEC and space-time block codes (STBCs) for multiple-input multiple-output (MIMO) systems.

7) Turbo and turbo-like codes will be used in a wider range of applications. Their use is already widespread in wireless telecommunications, but they are also being applied to wired systems and magnetic storage media.

8) The success of turbo FEC has led to the application of soft iterative decoding techniques beyond channel coding. These efforts have met with considerable success, for example in iterative demodulation and decoding [166]–[168]. Broader application of the principles involved in turbo FEC is expected.
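The third trend above refers to operating within 1 dB of the capacity limit on the AWGN channel. As a hedged illustration of what that limit looks like numerically, the sketch below evaluates the standard bound Eb/N0 ≥ (2^(2R) − 1)/(2R) for a code of rate R. It assumes a real-valued AWGN channel with unconstrained (Gaussian) inputs, one channel use per coded bit, and arbitrarily long blocks; the limit for binary-input signalling is somewhat higher. The function names and the 0.7 dB operating point are hypothetical and are not results from this paper.

```python
import math


def shannon_limit_ebn0_db(rate: float) -> float:
    """Minimum Eb/N0 in dB for reliable transmission at `rate` information bits
    per real channel use, assuming an unconstrained-input AWGN channel."""
    ebn0_linear = (2.0 ** (2.0 * rate) - 1.0) / (2.0 * rate)
    return 10.0 * math.log10(ebn0_linear)


def gap_to_capacity_db(required_ebn0_db: float, rate: float) -> float:
    """Distance in dB between a code's required Eb/N0 and the Shannon limit."""
    return required_ebn0_db - shannon_limit_ebn0_db(rate)


if __name__ == "__main__":
    for rate in (1.0 / 3.0, 1.0 / 2.0, 3.0 / 4.0):
        print(f"rate {rate:.3f}: limit {shannon_limit_ebn0_db(rate):+.2f} dB")

    # Hypothetical operating point: a rate-1/2 code that reaches its target
    # error rate at Eb/N0 = 0.7 dB.
    print(f"gap: {gap_to_capacity_db(0.7, 0.5):.2f} dB")
```

For rate 1/2 the expression reduces to Eb/N0 = 1, that is, 0 dB, so a code needing less than 1 dB at that rate is within 1 dB of this particular limit.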

VII. CONCLUSION

The advent of turbo and turbo-like codes has shown that excellent performance, closely approaching the ultimate Shannon capacity limit for an AWGN channel, can be achieved through the soft iterative decoding of composite channel codes. Remarkably, this performance is achieved with a tractable amount of computation. Implementing and using turbo and turbo-like codes in real systems does present challenges, but tremendous progress in addressing these issues has been made, and all varieties of turbo FEC are finding application. Evidence of this may be found in the widespread use of turbo and turbo-like codes in standards and commercial systems. Turbo and turbo-like codes are no longer a curiosity or novelty, but a powerful tool for improving the performance of communications systems. Their excellent and achievable performance means that turbo and turbo-like codes are expected to be in use for some time and to be applied to a wider and wider range of applications.

Acknowledgment

K. Gracie would like to thank his friends and colleagues J. Lodge, S. Crozier, P. Guinand, A. Hunt, R. Kerr, and Y. Ould Cheikh Mouhamedou for their many insights and helpful comments during the preparation and refinement of this paper. It would not have been possible without their guidance and support. Specifically, thanks to Dr. Lodge for his background knowledge, thoughts on complexity, and for pointing out key references, Dr. Crozier for the distance results in Section II-E and wording suggestions, Dr. Guinand for his input on LDPC codes, and Mr. Hunt for discussions on implementation. M.-H. Hamon would like to thank her colleagues J.-B. Doré, P. Pénard, and P. Siohan for their valuable inputs and comments (particularly the IEEE 802.16e LDPC results), P. Gelpi and S. Raes for their encouragement, and J.-C. Carlach and P. Tortelier for interesting discussions on coding issues.

REFERENCES

[1] C. E. Shannon, “A mathematical theory of communication (Part 1),” Bell Syst. Tech. J., vol. 27, pp. 379–423, Jul. 1948.
[2] C. E. Shannon, “A mathematical theory of communication (Part 2),” Bell Syst. Tech. J., vol. 27, pp. 623–656, Oct. 1948.
[3] R. Gallager, “Low-density parity-check codes,” IRE Trans. Inform. Theory, vol. 8, no. 1, pp. 21–28, Jan. 1962.
[4] R. G. Gallager, “Low-density parity-check codes,” Ph.D. dissertation, Massachusetts Inst. Technology (MIT), Cambridge, MA, 1963.
[5] S. Lin and D. J. Costello, Jr., Error Control Coding, 2nd ed. Englewood Cliffs, NJ: Prentice-Hall, 2004.
[6] R. M. Tanner, “A recursive approach to low complexity codes,” IEEE Trans. Inform. Theory, vol. 27, no. 5, pp. 533–547, Sep. 1981.
[7] G. Battail, “Building long codes by combination of simple ones, thanks to weighted-output decoding,” in Proc. URSI ISSSE 1989, Erlangen, Germany, Sep. 1989, pp. 634–637.
[8] J. Lodge, P. Hoeher, and J. Hagenauer, “The decoding of multidimensional codes using separable MAP ‘filters’,” in Proc. 16th Biennial Symp. Communications, Kingston, Canada, May 27–29, 1992, pp. 343–346, Queen’s Univ.
[9] J. Lodge, R. Young, P. Hoeher, and J. Hagenauer, “Separable MAP ‘filters’ for the decoding of product and concatenated codes,” in Proc. Int. Conf. Communications (ICC ’93), Geneva, Switzerland, May 23–26, 1993, pp. 1740–1745.
[10] C. Berrou, A. Glavieux, and P. Thitimajshima, “Near Shannon limit error-correcting coding and decoding: Turbo-codes,” in Proc. Int. Conf. Communications (ICC ’93), Geneva, Switzerland, May 23–26, 1993, pp. 1064–1070.
[11] C. Berrou and A. Glavieux, “Near optimum error correcting coding and decoding: Turbo-codes,” IEEE Trans. Commun., vol. 44, no. 10, pp. 1261–1271, Oct. 1996.
[12] R. Pyndiah, A. Glavieux, A. Picart, and S. Jacq, “Near optimum decoding of product codes,” in Proc. Global Telecommunications Conf. (Globecom’94), San Francisco, CA, Nov. 28–Dec. 2, 1994, pp. 339–343.
[13] R. Pyndiah, “Near-optimum decoding of product codes: Block turbo codes,” IEEE Trans. Commun., vol. 46, no. 8, pp. 1003–1010, Aug. 1998.
[14] N. Wiberg, H.-A. Loeliger, and R. Kötter, “Codes and iterative decoding on general graphs,” Eur. Trans. Telecommun., vol. 6, no. 5, pp. 513–525, Sep./Oct. 1995.
[15] N. Wiberg, “Codes and decoding on general graphs,” Ph.D. dissertation, Univ. Linköping, Linköping, Sweden, 1996.
[16] D. J. MacKay and R. M. Neal, “Near Shannon limit performance of low-density parity-check codes,” IEE Electron. Lett., vol. 32, no. 18, pp. 1261–1271, Aug. 1996.
[17] T. Richardson and R. Urbanke, “The renaissance of Gallager’s low-density parity-check codes,” IEEE Commun. Mag., vol. 41, no. 8, pp. 126–131, Aug. 2003.
[18] Not the Usual Channels, The Economist, Jul. 1, 2004.
[19] E. Klarreich, “Pushing the limit,” Science News Online, vol. 168, no. 19, p. 296, Nov. 5, 2005.
[20] D. Mackenzie, “Take it to the limit,” New Scientist, pp. 38–41, Jul. 9, 2005.
[21] G. D. Forney, Concatenated Codes. Cambridge, MA: MIT Press, 1966.

[22] A. J. Viterbi and J. K. Omura, Principles of Digital Communications and Coding. New York: McGraw-Hill, 1979.
[23] J. Pearl, Probabilistic Reasoning in Intelligent Systems: Networks of Plausible Inference. San Mateo, CA: Morgan Kaufmann, 1988.
[24] R. J. McEliece, D. J. C. MacKay, and J.-F. Cheng, “Turbo decoding as an instance of Pearl’s ‘belief propagation’ algorithm,” IEEE J. Select. Areas Commun., vol. 16, no. 2, pp. 140–152, Feb. 1998.
[25] L. Bahl, J. Cocke, F. Jelinek, and J. Raviv, “Optimal decoding of linear codes for minimizing symbol error rate,” IEEE Trans. Inform. Theory, vol. IT-20, pp. 284–287, Mar. 1974.
[26] J. Erfanian, S. Pasupathy, and G. Gulak, “Reduced complexity symbol detectors with parallel structures on ISI channels,” IEEE Trans. Commun., vol. 42, no. 2/3/4, pp. 1661–1671, Feb./Mar./Apr. 1994.
[27] J. Hagenauer and P. Hoeher, “A Viterbi algorithm with soft-decision outputs and its applications,” in Proc. Global Telecommunications Conf. (Globecom’89), Dallas, TX, Nov. 1989, pp. 1680–1686.
[28] M. C. Valenti and J. Sun, “The UMTS turbo code and an efficient decoder implementation suitable for software-defined radios,” Proc. Int. J. Wireless Information Networks, vol. 8, no. 4, pp. 203–215, Oct. 2001.
[29] P. Robertson, E. Villebrun, and P. Hoeher, “A comparison of optimal and sub-optimal MAP decoding algorithms operating in the log domain,” in Proc. Int. Conf. Communications (ICC ’95), Seattle, WA, Jun. 1995, pp. 1009–1013.
[30] P. Robertson, P. Hoeher, and E. Villebrun, “Optimal and suboptimal maximum a posteriori algorithms suitable for turbo decoding,” Eur. Trans. Telecommun., vol. 8, pp. 119–125, Mar.–Apr. 1997.
[31] K. Gracie, S. Crozier, and A. Hunt, “Performance of a low-complexity turbo decoder with a simple early stopping criterion implemented on a SHARC processor,” in Proc. Int. Mobile Satellite Conf. (IMSC’99), Ottawa, ON, Canada, Jun. 16–19, 1999, pp. 281–286.
[32] J. Vogt and A. Finger, “Improving the max-log-MAP turbo decoder,” IEE Electron. Lett., vol. 36, pp. 1937–1939, Nov. 2000.
[33] D. Divsalar and F. Pollara, “Multiple turbo codes,” in Proc. Military Commun. Conf. (MILCOM ’95), San Diego, CA, Nov. 5–8, 1995, pp. 279–285.
[34] S. Crozier and P. Guinand, “Distance upper bounds and true minimum distance results for Turbo-codes designed with DRP interleavers,” in Proc. Third Int. Symp. Turbo Codes and Related Topics, Brest, France, Sep. 1–5, 2003, Ecole Nationale Supérieure des Télécommunications de Bretagne.
[35] C. Berrou, Y. Saouter, C. Douillard, S. Kerouédan, and M. Jézéquel, “Designing good permutations for turbo codes: Towards a single model,” in Proc. IEEE Int. Conf. Commun. (ICC 2004), Paris, France, Jun. 20–24, 2004.
[36] C. Berrou and M. Jézéquel, “Non binary convolutional codes for turbo coding,” Electron. Lett., vol. 35, no. 1, pp. 39–40, Jan. 1999.
[37] C. Berrou, M. Jézéquel, C. Douillard, and S. Kerouédan, “The advantages of non-binary turbo codes,” in Proc. IEEE Information Theory Workshop, Cairns, Australia, Sep. 2–7, 2001, pp. 61–63.
[38] P. Jung and J. Plechinger, “Performance of rate compatible punctured turbo-codes for mobile radio applications,” IEE Electron. Lett., vol. 33, no. 25, pp. 2102–2103, Dec. 1997.
[39] I. Land and P. Hoeher, “Partially systematic rate 1/2 turbo codes,” in Proc. Second Int. Symp. Turbo Codes and Related Topics, Brest, France, Sep. 4–7, 2000, pp. 287–290.
[40] R. Kerr, K. Gracie, and S. Crozier, “Performance of a 4-state turbo code with data puncturing and a BCH outer code,” in Proc. 23rd Biennial Symp. Communications, Kingston, Canada, May 29–Jun. 1, 2006, Queen’s Univ.
[41] K. Gracie and S. Crozier, “Improving the performance of 4-state turbo codes with the correction impulse method and data puncturing,” in Proc. 23rd Biennial Symp. Commun., Kingston, Canada, May 29–Jun. 1, 2006, Queen’s Univ.
[42] I. A. Chatzigeorgiou, M. R. D. Rodrigues, I. J. Wassell, and R. Carrasco, “Can punctured rate-1/2 turbo codes achieve a lower error floor than their rate-1/3 parent codes?” in Proc. IEEE Information Theory Workshop, Chengdu, China, Oct. 22–26, 2006.
[43] D. Costello, Jr., A. Banerjee, F. Vatta, and B. Scanavino, “On the convergence of nonsystematic turbo codes,” in Proc. 15th Int. Symp. Mathematical Theory of Networks and Systems, South Bend, IN, Aug. 12–16, 2002, Univ. Notre Dame.
[44] S. Crozier, P. Guinand, and A. Hunt, “On designing turbo-codes with data puncturing,” in Proc. 9th Canadian Workshop Information Theory (CWIT 2005), Montréal, QC, Canada, Jun. 5–8, 2005.
[45] G. P. Calzolari, M. Chiani, F. Chiaraluce, and R. Garello, “Some remarks about a possible new standard for telemetry channel coding with high spectral efficiency,” in Proc. Space Operations Conf. (SpaceOps 2002), Houston, TX, Oct. 2002.
[46] F. Chiaraluce and R. Garello, “Design and comparison of turbo codes under frame-length and code-rate constraints,” Int. J. Satellite Communications and Networking, vol. 24, no. 3, pp. 241–259, May–Jun. 2006.
[47] Turbo8: An 8-State Turbo Code Design and Simulation Tool. (Jun. 2006). Communications Research Centre Canada. [Online]. Available: www.crc.ca/fec
[48] P. Elias, “Error-free coding,” IEEE Trans. Inform. Theory, vol. IT-4, pp. 29–37, Sep. 1954.
[49] J. Hagenauer, E. Offer, and L. Papke, “Iterative decoding of binary block and convolutional codes,” IEEE Trans. Inform. Theory, vol. 42, no. 2, pp. 429–445, Mar. 1996.
[50] J. Fang, F. Buda, and E. Lemois, “Turbo product code: A well suitable solution to wireless packet transmission for very low error rates,” in Proc. 2nd Int. Symp. Turbo Codes and Related Topics, Brest, France, Sep. 2000, pp. 101–111.
[51] J. Lodge and R. Kerr, “Vector soft-in-soft-out decoding of linear block codes,” in Proc. 22nd Biennial Symp. Communications, Kingston, Canada, May 31–Jun. 3, 2004, pp. 373–375, Queen’s Univ.
[52] R. Kerr and J. Lodge, “Near ML performance for linear block codes using an iterative vector SISO decoder,” in Proc. 4th Int. Symp. Turbo Codes and Related Topics, Munich, Germany, Apr. 3–7, 2006.
[53] D. Chase, “A class of algorithms for decoding block codes with channel measurements information,” IEEE Trans. Inform. Theory, vol. IT-18, no. 1, pp. 170–182, Jan. 1972.
[54] S. A. Hirst, B. Honary, and G. Markarian, “Fast chase algorithm with an application in turbo decoding,” IEEE Trans. Commun., vol. 49, no. 10, pp. 1693–1699, Oct. 2001.
[55] D. J. C. MacKay, S. T. Wilson, and M. C. Davey, “Comparison of constructions of irregular Gallager codes,” IEEE Trans. Commun., vol. 6, no. 4, pp. 1449–1454, Oct. 1999.
[56] M. G. Luby, M. Mitzenmacher, M. A. Shokrollahi, and D. A. Spielman, “Improved low-density parity-check codes using irregular graphs and belief propagation,” in Proc. IEEE Int. Symp. Information Theory, Allerton, IL, Aug. 16–21, 1998, p. 117.
[57] D. J. C. MacKay, “Good error-correcting codes based on sparse matrices,” IEEE Trans. Inform. Theory, vol. 45, no. 2, pp. 399–431, Mar. 1999.
[58] T. J. Richardson and R. L. Urbanke, “Efficient encoding of low-density parity-check codes,” IEEE Trans. Inform. Theory, vol. 47, pp. 638–656, Feb. 2001.
[59] T. Richardson and R. Urbanke, “The capacity of low-density parity-check codes under message-passing decoding,” IEEE Trans. Inform. Theory, vol. 47, no. 2, pp. 599–618, Feb. 2001.
[60] T. Richardson, M. A. Shokrollahi, and R. Urbanke, “Design of capacity-approaching irregular low-density parity-check codes,” IEEE Trans. Inform. Theory, vol. 47, no. 2, pp. 619–637, Feb. 2001.
[61] S.-Y. Chung, G. D. Forney, T. J. Richardson, and R. Urbanke, “On the design of low-density parity-check codes within 0.0045 dB of the Shannon limit,” IEEE Commun. Lett., vol. 5, no. 2, pp. 58–60, Feb. 2001.
[62] M. Eroz, F.-W. Sun, and L.-N. Lee, “DVB-S2 low density parity check codes with near Shannon limit performance,” Int. J. Satellite Communications Networking, vol. 22, no. 3, pp. 269–279, May–Jun. 2004.
[63] D. E. Hocevar, “Efficient encoding for a family of quasi-cyclic LDPC codes,” in Proc. Global Telecommunications Conf. (Globecom 2003), San Francisco, CA, Dec. 2003, pp. 3996–4000.
[64] M. P. C. Fossorier, “Quasi-cyclic low-density parity-check codes from circulant permutation matrices,” IEEE Trans. Inform. Theory, vol. 50, no. 8, pp. 1788–1793, Aug. 2004.
[65] R. Echard and S.-C. Chang, “Deterministic π-rotation low-density parity-check codes,” IEEE Commun. Lett., vol. 9, no. 5, pp. 447–449, May 2005.
[66] J.-B. Doré, M.-H. Hamon, and P. Pénard, “A structured LDPC code construction for efficient encoder design,” in Proc. IEEE Int. Conf. Communications (ICC 2006), Istanbul, Turkey, Jun. 11–15, 2006.
[67] J.-B. Doré, M.-H. Hamon, and P. Pénard, “Design and decoding of a serial concatenated code structure based on quasi-cyclic LDPC codes,” in Proc. 4th Int. Symp. Turbo Codes and Related Topics, Munich, Germany, Apr. 3–7, 2006.
[68] J. Zhang and M. P. C. Fossorier, “A modified weighted bit-flipping decoding of low-density parity-check codes,” IEEE Commun. Lett., vol. 8, no. 3, pp. 165–167, Mar. 2004.
[69] F. R. Kschischang, B. J. Frey, and H.-A. Loeliger, “Factor graphs and the sum-product algorithm,” IEEE Trans. Inform. Theory, vol. 47, no. 2, pp. 498–519, Feb. 2001.
[70] M. P. C. Fossorier, M. Mihaljevic, and H. Imai, “Reduced complexity iterative decoding of low-density parity-check codes based on belief propagation,” IEEE Trans. Commun., vol. 47, pp. 673–680, May 1999.
[71] A. Hunt, “Hyper-Codes: High-Performance low-complexity error-correcting codes,” Master’s thesis, Carleton Univ., Ottawa, Canada, 1998.
[72] J. Chen and M. P. C. Fossorier, “Near-optimum universal belief propagation based decoding of low-density parity-check codes,” IEEE Trans. Commun., vol. 50, no. 3, pp. 406–414, Mar. 2002.
[73] N. Kim and H. Park, “Modified UMP-BP decoding algorithm based on mean square error,” IEEE Electron. Lett., vol. 40, no. 13, pp. 816–817, Jun. 2004.
[74] F. Guillou, E. Boutillon, and J.-L. Danger, “λ-min decoding algorithm of regular and irregular LDPC codes,” in Proc. 3rd Int. Symp. Turbo Codes and Related Topics, Brest, France, Sep. 1–5, 2003, Ecole Nationale Supérieure des Télécommunications de Bretagne.
[75] C. Jones, E. Valles, M. Smith, and J. Villasenor, “Approximate-min constraint node updating for LDPC code decoding,” in Proc. Military Communications Conf. (MILCOM 2003), Boston, MA, Oct. 13–16, 2003, pp. 158–162.
[76] S. Howard, C. Schlegel, and V. C. Gaudet, “Degree-matched check node decoding for regular and irregular LDPCs,” IEEE Trans. Circuits Syst. II: Express Briefs, vol. 53, no. 10, pp. 1054–1058, Oct. 2006.
[77] M. M. Mansour and N. R. Shanbhag, “High-throughput LDPC decoders,” IEEE Trans. Very Large Scale Integrated Systems, vol. 11, no. 6, pp. 976–996, Dec. 2003.
[78] J. Zhang and M. P. C. Fossorier, “Shuffled iterative decoding,” IEEE Trans. Commun., vol. 53, no. 2, pp. 209–213, Feb. 2005.
[79] H. Xiao and A. H. Banihashemi, “Graph-based message-passing schedules for decoding LDPC codes,” IEEE Trans. Commun., vol. 52, no. 12, pp. 2098–2105, Dec. 2004.
[80] Vector-LDPC for mobile broadband communications, Flarion Technologies, Nov. 2003.
[81] A. Morello and V. Mignone, “DVB-S2: The second generation standard for satellite broad-band services,” Proc. IEEE, vol. 94, no. 1, pp. 210–227, Jan. 2006.
[82] Amendment to IEEE Standard for Local and Metropolitan Area Networks - Part 16: Air Interface for Fixed Broadband Wireless Access Systems - Physical and Medium Access Control Layers for Combined Fixed and Mobile Operation in Licensed Bands, Inst. Electrical and Electronics Engineers (IEEE), Std. IEEE Project P802.16e, 2005.
[83] K. Chugg, P. Thiennviboon, G. Dimou, P. Gray, and J. Melzer, “A new class of turbo-like codes with universally good performance and high-speed decoding,” in Proc. Military Communications Conf. (MILCOM 2005), Atlantic City, NJ, Oct. 17–20, 2005.
[84] J. Lodge, A. Hunt, and P. Guinand, “High code rate iteratively-decodable FEC codes for applications requiring low packet error rates,” in Proc. 2nd Int. Symp. Turbo Codes and Related Topics, Ecole Nationale Supérieure des Télécommunications de Bretagne, Brest, France, Sep. 4–7, 2000, pp. 117–120.
[85] D. Divsalar, H. Jin, and R. McEliece, “Coding theorems for ‘Turbo-like’ codes,” in Proc. 36th Annu. Allerton Conf. Communications, Control, and Computing, Allerton, IL, Sep. 23–25, 1998, pp. 201–210.
[86] H. Jin, A. Khandekar, and R. McEliece, “Irregular repeat-accumulate codes,” in Proc. 2nd Int. Symp. Turbo Codes and Related Topics, Brest, France, Sep. 4–7, 2000, pp. 1–8.
[87] A. J. Felström and K. S. Zigangirov, “Time-varying periodic convolutional codes with low-density parity-check matrices,” IEEE Trans. Inform. Theory, vol. 45, no. 6, pp. 2181–2191, Sep. 1999.
[88] R. M. Tanner, D. Sridhara, A. Sridharan, T. E. Fuja, and D. J. Costello, Jr., “LDPC block and convolutional codes based on circulant matrices,” IEEE Trans. Inform. Theory, vol. 50, no. 12, pp. 2966–2984, Dec. 2004.
[89] D. J. Costello, Jr., A. E. Pusane, S. Bates, and K. S. Zigangirov, “A comparison between LDPC block and convolutional codes,” in Proc. Workshop Information Theory Its Applications, Univ. California, San Diego, La Jolla, CA, Feb. 6–10, 2006.
[90] J. Boutros, O. Pothier, and G. Zemor, “Generalized low density (Tanner) codes,” in Proc. Int. Conf. Communications (ICC ’99), Vancouver, BC, Canada, Jun. 9–10, 1999, pp. 441–445.
[91] N. Miladinovic and M. Fossorier, “Generalized LDPC codes with Reed-Solomon and BCH codes as component codes for binary channels,” in Proc. Global Telecommunications Conf. (Globecom 2005), St. Louis, MO, Nov. 28–Dec. 2, 2005.
[92] I. Andriyanova, J.-P. Tillich, and J.-C. Carlach, “A new family of codes with high iterative decoding performances,” in Proc. IEEE Int. Conf. Communications (ICC 2006), Istanbul, Turkey, Jun. 11–15, 2006.

[93] C. Schlegel, Trellis and Turbo Coding. New York: IEEE/Wiley, 2005.
[94] S. Benedetto, D. Divsalar, G. Montorsi, and F. Pollara, “Serial concatenation of interleaved codes: Performance analysis, design, and iterative decoding,” IEEE Trans. Inform. Theory, vol. 44, no. 3, pp. 909–926, May 1998.
[95] S. Benedetto, D. Divsalar, G. Montorsi, and F. Pollara, “Analysis, design and iterative decoding of double serially concatenated codes with interleavers,” IEEE J. Select. Areas Commun., vol. 16, no. 2, pp. 231–244, Feb. 1998.
[96] S. Benedetto, R. Garello, G. Montorsi, C. Berrou, C. Douillard, D. Giancristofaro, A. Ginesi, L. Giugno, and M. Luise, “MHOMS: High-speed ACM modem for satellite applications,” IEEE Wireless Commun., vol. 12, no. 2, pp. 66–77, Apr. 2005.
[97] P. Moqvist and T. M. Aulin, “Serially concatenated continuous phase modulation with iterative decoding,” IEEE Trans. Commun., vol. 49, no. 11, pp. 1901–1915, Nov. 2001.
[98] S. ten Brink, “Convergence of iterative decoding,” IEE Electron. Lett., vol. 35, no. 13, pp. 1117–1119, Jun. 1999.
[99] FlexiCodes: A highly flexible FEC solution. (2004, Apr. 6). TrellisWare Technologies, Inc. [Online]. Available: www.trellisware.com
[100] L. Perez, J. Seghers, and D. Costello, Jr., “A distance spectrum interpretation of turbo codes,” IEEE Trans. Inform. Theory, vol. 42, no. 6, pp. 1698–1709, Nov. 1996.
[101] P. Guinand and J. Lodge, “Trellis termination for turbo encoders,” in Proc. 17th Biennial Symp. Communications, Kingston, Canada, May 30–Jun. 1, 1994, pp. 389–392, Queen’s Univ.
[102] S. Crozier, P. Guinand, J. Lodge, and A. Hunt, “Construction and performance of new tail-biting turbo codes,” in Proc. 6th Int. Workshop on Digital Signal Processing Techniques for Space Applications (DSP ’98), Noordwijk, The Netherlands, Sep. 23–25, 1998, ESTEC.
[103] C. Berrou and M. Jézéquel, “Frame-oriented convolutional turbo codes,” IEE Electron. Lett., vol. 32, no. 15, pp. 1362–1364, Jul. 1996.
[104] K. Gracie, S. Crozier, and P. Guinand, “Performance of an MLSE-based early stopping technique for turbo codes,” in Proc. 60th IEEE Vehicular Technology Conf. 2004-Fall (VTC 2004-Fall), Los Angeles, CA, Sep. 26–29, 2004.
[105] S. ten Brink, “Convergence behaviour of iteratively decoded parallel concatenated codes,” IEEE Trans. Commun., vol. 49, no. 10, pp. 1727–1737, Oct. 2001.
[106] D. Divsalar, S. Dolinar, and F. Pollara, “Iterative turbo decoder analysis based on density evolution,” IEEE J. Select. Areas Commun., vol. 19, no. 5, pp. 891–907, May 2001.
[107] S. Dolinar, D. Divsalar, and F. Pollara, “Code performance as a function of block size,” Jet Propulsion Laboratory, Tech. Rep. TMO Progress Rep. 42-133, May 15, 1998.
[108] C. Schlegel and L. Pérez, “On error bounds and turbo-codes,” IEEE Commun. Lett., vol. 3, no. 7, pp. 205–207, Jul. 1999.
[109] R. Garello, P. Pierleoni, and S. Benedetto, “Computing the free distance of turbo codes and serially concatenated codes with interleavers: Algorithms and applications,” IEEE J. Select. Areas Commun., vol. 19, no. 5, pp. 800–812, May 2001.

[110] S. Crozier, P. Guinand, and A. Hunt, “Estimating the minimum distance of turbo-codes using double and triple impulse methods,” IEEE Commun. Lett., vol. 9, no. 7, pp. 631–633, Jul. 2005.
[111] S. Crozier, P. Guinand, and A. Hunt, “Estimating the minimum distance of large-block turbo codes using iterative multiple-impulse methods,” in Proc. 4th Int. Symp. Turbo Codes and Related Topics, Munich, Germany, Apr. 3–7, 2006.
[112] S. Crozier and P. Guinand, “Distance upper bounds and true minimum distance results for Turbo-codes designed with DRP interleavers,” Annals Telecommun., Special Issue on Turbo Codes, vol. 60, no. 1–2, pp. 10–28, Jan./Feb. 2005.
[113] E. Rosnes and O. Ytrehus, “An efficient algorithm for tailbiting turbo code weight distribution calculation,” in Proc. 3rd Int. Symp. Turbo Codes and Related Topics, Ecole Nationale Supérieure des Télécommunications de Bretagne, Brest, France, Sep. 1–5, 2003.
[114] R. Garello, F. Chiaraluce, P. Pierleoni, M. Scaloni, and S. Benedetto, “On error floor and free distance of turbo codes,” in Proc. IEEE Int. Conf. Communications (ICC 2001), Helsinki, Finland, Jun. 11–15, 2001, pp. 45–49.
[115] J. D. Andersen, “Turbo codes extended with outer BCH codes,” IEE Electron. Lett., vol. 32, no. 22, pp. 2059–2060, Oct. 1996.
[116] O. Y. Takeshita, O. M. Collins, P. C. Massey, and D. J. Costello, Jr., “On the frame-error rate of concatenated turbo codes,” IEEE Trans. Commun., vol. 49, no. 4, pp. 602–608, Apr. 2001.
[117] K. R. Narayanan and G. L. Stüber, “Selective serial concatenation of turbo codes,” IEEE Commun. Lett., vol. 1, no. 5, pp. 136–139, Sep. 1997.
[118] H. Pishro-Nik and F. Fekri, “Improved decoding algorithms for low-density parity-check codes,” in Proc. 3rd Int. Symp. Turbo Codes and Related Topics, Ecole Nationale Supérieure des Télécommunications de Bretagne, Brest, France, Sep. 1–5, 2003.
[119] E. Papagiannis, M. A. Ambroze, and M. Tomlinson, “Approaching the ML performance with iterative decoding,” in Proc. Int. Zurich Seminar Communications (IZS 2004), Zurich, Switzerland, Feb. 18–20, 2004, pp. 220–223.
[120] Y. Ould-Cheikh-Mouhamedou, S. Crozier, K. Gracie, P. Guinand, and P. Kabal, “A method for lowering Turbo code error flare using correction impulses and repeated decoding,” in Proc. 4th Int. Symp. Turbo Codes and Related Topics, Munich, Germany, Apr. 3–7, 2006.
[121] A. Matache, S. Dolinar, and F. Pollara, “Stopping rules for turbo decoders,” Jet Propulsion Laboratory, Tech. Rep. TMO Progress Rep. 42-142, Aug. 15, 2000.
[122] Z. Ma, W. H. Mow, and P. Fan, “On the complexity reduction of turbo decoding for wideband CDMA,” IEEE Trans. Wireless Commun., vol. 4, no. 1, pp. 353–356, Mar. 2005.
[123] J. G. Harrison, “Implementation of a 3GPP turbo decoder on a programmable DSP core,” in Conf. Commun. Des., San Jose, CA, Oct. 2, 2001.
[124] Communications Research Centre Canada. [Online]. Available: www.crc.ca/fec
[125] K. Andrews, V. Stanton, S. Dolinar, V. Chen, J. Berner, and F. Pollara, “Turbo-decoder implementation for the deep space network,” Jet Propulsion Laboratory, Tech. Rep. IPN Progress Rep. 42-148, Feb. 15, 2002.
[126] Using the turbo code coprocessor (TCOP) of the MSC8126 DSP device, Jul. 2004, Freescale Semiconductor Application Note.
[127] “Decoding convolutional and turbo codes in 3G wireless,” Jan. 2005, Texas Instruments Application Rep.
[128] P. Hoeher, “Optimal subblock-by-subblock detection,” IEEE Trans. Commun., vol. 43, no. 2/3/4, pp. 714–717, Feb./Mar./Apr. 1995.
[129] A. Hunt, S. Crozier, M. Richards, and K. Gracie, “Performance degradation as a function of overlap depth when using sub-block processing in the decoding of turbo codes,” in Proc. Int. Mobile Satellite Conf. (IMSC’99), Ottawa, ON, Canada, Jun. 16–19, 1999, pp. 276–280.
[130] G. Masera, G. Piccinini, M. R. Roch, and M. Zamboni, “VLSI architectures for turbo codes,” IEEE Trans. Very Large Scale Integration (VLSI) Systems, vol. 7, no. 3, pp. 369–379, Jun. 1999.
[131] A. Nimbalker, T. K. Blankenship, B. Classon, T. E. Fuja, and D. J. Costello, Jr., “Inter-window shuffle interleavers for high throughput turbo decoding,” in Proc. 3rd Int. Symp. Turbo Codes and Related Topics, Ecole Nationale Supérieure des Télécommunications de Bretagne, Brest, France, Sep. 1–5, 2003.
[132] R. Dobkin, M. Peleg, and R. Ginosar, “Parallel interleaver design and VLSI architecture for low-latency MAP decoders,” IEEE Trans. Very Large Scale Integration (VLSI) Syst., vol. 13, no. 4, pp. 427–438, Apr. 2005.
[133] Technical Specification Group Radio Access Network; Multiplexing and Channel Coding (FDD) (Release 5), 3rd Generation Partnership Project Std. 3GPP TS 25.212 v5.4.0, 2003.
[134] Physical Layer Standard for cdma2000 Spread Spectrum Systems - Release C, 3rd Generation Partnership Project 2 Std. 3GPP2 C.S0002-C v1.0, 2002.
[135] Recommendation for Space Data System Standards: Telemetry Channel Coding, Std. 101.0-B-6, Consultative Committee for Space Data Systems (CCSDS), 2002, Blue Book.
[136] Digital Video Broadcasting (DVB): Interaction Channel for Satellite Distribution Systems, Eur. Telecommun.
Standards Institute Std. ETSI EN 301 790 v1.3.1, 2003. S. Crozier, A. Hunt, and J. Lodge, BMethod of enhanced max-log-a posteriori probability processing,[ U.S. Patent 6 145 114, Nov. 2000. I. Chatzigeorgiou, M. R. D. Rodrigues, I. J. Wassell, and R. Carrasco, BPunctured binary turbo-codes with optimized performance,[ in Proc. 62nd Semiannual

[139]

[140]

[141]

[142]

[143]

[144]

[145]

[146]

[147]

[148]

[149]

[150]

[151]

[152]

Vehicular Technol. Conf. (VTC Fall 2005), Dallas, TX, Sep. 25–28, 2005. S. Hong and W. Stark, BPower consumption versus decoding performance relationship of VLSI decoders for low energy wireless communication system design,[ in Proc. IEEE Int. Symp. Electronics, Circuits, and Systems (ISECS ’99), Pafos, Cyprus, Sep. 1999, pp. 1593–1596. G. Masera, M. Mazza, G. Piccinini, F. Viglione, and M. Zamboni, BArchitectural strategies for low-power VLSI turbo decoders,[ IEEE Trans. Very Large-Scale Integration (VLSI) Syst., vol. 10, no. 3, pp. 279–285, Jun. 2002. A. J. Blanksby and C. J. Howland, BA 690-mW 1-Gb/s 1024-bit, rate-1/2 low-density parity-check code decoder,[ IEEE J. Solid-State Circuits, vol. 37, no. 3, pp. 404–412, Mar. 2002. M. M. Mansour and N. R. Shanbhag, BA 640-Mb/s 2048-bit programmable LDPC decoder chip,[ IEEE J. Solid-State Circuits, vol. 41, no. 3, pp. 684–698, Mar. 2006. K. Gracie, A. Hunt, and S. Crozier, BPerformance of turbo codes using MLSE-based early stopping and path ambiguity checking for inputs quantized to 4 bits,[ in Proc. 4th Int. Symp. Turbo Codes Related Topics, Munich, Germany, Apr. 3–7, 2006. Spectra Licensing Group, TCLP worldwide agent. [Online]. Available: www. spectralicensing.com S. Crozier and P. Guinand, BHigh-performance low-memory interleaver banks for turbo-codes,[ U.S. Patent 6 857 087, Feb. 2005. W. Thesling, BBlock decoding with soft output information,[ U.S. Patent 5 930 272, Jul. 27, 1999. M. Eroz, F.-W. Sun, and L.-N. Lee, BMethod and system for decoding low density parity check (LDPC) codes,[ U.S. Patent Application 20050166133, Jul. 28, 2005. T. Richardson and V. Novichkov, BMethods and apparatus for decoding LDPC codes,[ U.S. Patent 6 633 856, Oct. 2003. CTWS Working Group 1 (WG1), Multiplexing and Channel Coding, China Wireless Telecommunication Standard (CTWS) Group, Std. TS C103 v3.1.0, 2000. B. Li, D. Xie, S. Cheng, J. Chen, P. Zhang, W. Zhu, and B. Li, BRecent advances on TD-SCDMA in China,[ IEEE Commun. 
Mag., vol. 43, no. 1, pp. 30–37, Jan. 2005. W. Tai, BCross support implementation plan and status: NASA deep space domain,[ NASA, Tech. Rep., Dec. 6, 2004. F. Chiaraluce, E. Gambi, R. Garello, P. Pierleoni, G. P. Calzolari, and E. Vassallo, BOn the new CCSDS standard for space telemetry: Turbo codes and symbol synchronization,[ in Proc. Int. Conf.

[153]

[154]

[155]

[156]

[157]

[158]

[159]

[160]

[161]

[162]

[163] [164]

[165] [166]

[167]

[168]

Communications (ICC 2000), New Orleans, LA, May 28–31, 2000. S. Dolinar, D. Divsalar, and F. Pollara, BTurbo codes and space communications,[ in Proc. Space Operations Conf. (SpaceOps ’98), Tokyo, Japan, Jun. 1–5, 1998. E. Guizzo, BClosing in on the perfect code,[ IEEE Spectrum Mag., vol. 41, no. 3, pp. 36–42, Mar. 2004. CCSDS Gives SMART-1’s data transmissions a turbo boost at the moon, Nov. 24, 2004, CCSDS Press Release. Interaction Channel for Satellite Distribution Systems: Guidelines for Use of en301 790, Jul. 2005, DVB Document A063 Rev. 2. Digital Video Broadcasting (DVB): Second Generation Framing Structure, Channel Coding and Modulation Systems for Broadcasting, Interactive Services, News Gathering and Other Broadband Satellite Applications, Eur. Telecommunication Standards Inst., Std. ETSI EN 302 307 v1.1.1, 2005. IEEE Standard for Local and Metropolitan Area NetworksVPart 16: Air Interface for Fixed Broadband Wireless Access Systems, Inst. Electrical and Electronics Engineers (IEEE), Std. IEEE P802.16-REVd, 2004. Digital Video Broadcasting: Interaction Channel for Digital Terrestrial Television (RCT) Incorporating Multiple Access OFDM, Eur. Telecommun. Standards Inst., Std. ETSI EN 301 958, 2001. ¨ lc¸er, and H. Sadjadpour, E. Eleftheriou, S. O BApplication of capacity-approaching coding techniques to digital subscriber lines,[ IEEE Commun. Mag., vol. 42, no. 4, pp. 88–94, Apr. 2004. DF Raptor FEC for mobile multimedia. (2005, Jun. 21). [Online]. Available: www. digitalfountain.com/solutions/ mobileMultimedia.cfm HomePlug 1.0VTechnology White Paper, HomePlug Powerline Alliance. [Online]. Available: www.homeplug.org BMediaFLO Field Test Rep. (80-t1021-1) rev. b,[ QUALCOMM, Nov. 8, 2006. IP Datacast Over DVB-H: Content Delivery Protocols (CDP), Eur. Telecommun. Standards Inst., Std. DVB Document A101, 2005. C. McLean, BBandwidth hunger hits final frontier,[ Globe and Mail, Jul. 8, 2006. M. 
Moher, BAn iterative multiuser decoder for near-capacity communications,[ IEEE Trans. Commun., vol. 46, no. 7, pp. 870–880, Jul. 1998. H. V. Poor, BTurbo multiuser detection: A primer,[ IEEE J. Commun. Networks, vol. 3, no. 3, pp. 196–201, Sep. 2001. C. Schlegel, Turbo Code Applications: A Journey From a Paper to Realization, 2nd ed. New York: Springer, 2005.

ABOUT THE AUTHORS

Ken Gracie received the B.Sc. and M.Sc. degrees in electrical engineering from Queen’s University, Kingston, Canada, in 1995 and 1997, respectively. He has been a Research Engineer with the Communications and Signal Processing group, Communications Research Centre (CRC), Ottawa, Canada, since 1996, and a licensed Professional Engineer (P.Eng.) in the province of Ontario since 1999. His research interests include modulation, channel coding, synchronization, and signal processing. He also has experience in the efficient implementation of modems and error-control codecs.

Marie-Hélène Hamon received the Engineering Diploma from Ecole Supérieure d’Electricité, France, and the M.S. degree in electrical engineering from Ohio State University, Columbus, in 2002. She has been working on error-correcting codes in France Telecom’s Division R&D, Rennes, since 2002. Her research interests are iterative techniques for channel coding: turbo codes and low-density parity-check codes. She is interested in the design of codes taking into account implementation constraints and integration into a complete system.