
ISSN No: 2309-4893, International Journal of Advanced Engineering and Global Technology, Vol-2, Issue-2, February 2014
www.ijaegt.com

A New Approach for Parallel CRC Generation for High Speed Application

Manu G¹, Mohammed Abdul Haque²

¹,² Department of Electronics and Communications Engineering, Aurora's Scientific Technological and Research Academy, Hyderabad, Andhra Pradesh

Abstract

Cyclic redundancy check (CRC) is commonly used in data communication and in other fields such as data storage and data compression, as a vital method for dealing with data errors. Usually the hardware implementation of CRC computations is based on linear feedback shift registers (LFSRs), which handle the data serially. However, serial calculation of CRC codes cannot achieve a high throughput. In contrast, parallel CRC calculation can significantly increase the throughput of CRC computations.

Variants of CRCs are used in applications such as CRC-16 in BISYNC protocols, CRC-32 in Ethernet for error detection, CRC-8 in ATM, and CRC-CCITT in X.25 protocols, disk storage, SDLC and XMODEM. This paper presents a 64-bit parallel CRC architecture based on the F matrix, with a generator polynomial of order 32. It is hardware efficient and requires 50% fewer cycles to generate the CRC for a generator polynomial of the same order. In this architecture w = 64 bits are processed in parallel and the order of the generator polynomial is m = 32. If 32 bits are processed in parallel, CRC-32 is generated after (k+m)/w cycles, where k is the number of data bits and m is the order of the generator polynomial. Increasing the number of bits processed in parallel reduces the number of cycles required to calculate the CRC.

1. Introduction

A digital communication system is used to transport an information-bearing signal from the source to a user destination via a communication channel. The information signal is processed in a digital communication system to form discrete messages, which makes the information more reliable for transmission. Channel coding is an important signal processing operation for the efficient transmission of digital information over the channel. It was introduced by Claude E. Shannon in 1948, using channel capacity as an important parameter for error-free transmission. In channel coding, the number of symbols in the source-encoded message is increased in a controlled manner in order to facilitate two basic objectives at the receiver: error detection and error correction.

Error detection and error correction to achieve good communication are also employed in electronic devices, where they are used to reduce the effect of noise and interference in the electronic medium. The amount of error detection and correction required, and its effectiveness, depend on the signal-to-noise ratio (SNR).

In source coding, the encoder maps the digital signal generated at the source output into another signal in digital form. The objective is to eliminate or reduce redundancy so as to provide an efficient representation of the source output. Since the source encoder mapping is one-to-one, the source decoder at the other end simply performs the inverse mapping, thereby delivering to the user a reproduction of the original digital source output. The primary benefit gained from source coding is a reduced bandwidth requirement.

In channel coding, the objective of the encoder is to map the incoming digital signal into a channel input, and that of the decoder is to map the channel output into an output signal in such a way that the effect of channel noise is minimized. That is, the combined role of the channel encoder and decoder is to provide reliable communication over a noisy channel. This is achieved by introducing redundancy in a prescribed fashion in the channel encoder and exploiting it in the decoder to reconstruct the original encoder input as accurately as possible. Thus in source coding redundant bits are removed, whereas in channel coding redundancy is introduced in a controlled manner.

Then modulation is performed for the efficient transmission of the signal over the channel. Various digital modulation techniques can be applied, such as Amplitude-Shift Keying (ASK), Frequency-Shift Keying (FSK) or Phase-Shift Keying (PSK). The addition of redundancy in the coded messages implies the need for increased transmission bandwidth. Moreover, the use of coding adds complexity to the system, especially for the implementation of decoding operations in the receiver. Thus bandwidth and system complexity have to be considered as design trade-offs when using error-control coding to achieve acceptable error performance. Different error-correcting codes can be used depending on the properties of the system and the application in which error correction is to be introduced. Generally, error-correcting codes are classified into block codes and convolutional codes. The distinguishing feature of this classification is the presence or absence of memory in the encoder. To generate a block code, the incoming information stream is divided into blocks and each block is processed individually by adding redundancy in accordance with a prescribed algorithm. The decoder processes each block individually and corrects errors by exploiting the redundancy. Many of the important block codes used for error detection are cyclic codes; used for this purpose, they are called cyclic redundancy check (CRC) codes.

In a convolutional code, the encoding operation may be viewed as the discrete-time convolution of the input sequence with the impulse response of the encoder. The duration of the impulse response equals the memory of the encoder. Accordingly, the encoder for a convolutional code operates on the incoming message sequence using a "sliding window" equal in duration to its own memory. Hence, in a convolutional code, unlike a block code where codewords are produced on a block-by-block basis, the channel encoder accepts message bits as a continuous sequence and thereby generates a continuous sequence of encoded bits at a higher rate.

2. Error Correcting Codes:

This chapter explains error-correcting codes. It also gives the definitions of error detection and error correction and describes the types of error-detecting and error-correcting codes. The general idea for achieving error detection and correction is to add some redundancy (i.e., some extra data) to a message, which receivers can use to check the consistency of the delivered message and to recover data determined to be erroneous. Error-detection and correction schemes can be either systematic or non-systematic. In a systematic scheme, the transmitter sends the original data and attaches a fixed number of check bits (or parity data), which are derived from the data bits by some deterministic algorithm. If only error detection is required, a receiver can simply apply the same algorithm to the received data bits and compare its output with the received check bits; if the values do not match, an error has occurred at some point during the transmission. In a system that uses a non-systematic code, the original message is transformed into an encoded message that has at least as many bits as the original message.

Good error-control performance requires the scheme to be selected based on the characteristics of the communication channel. Common channel models include memoryless models, where errors occur randomly and with a certain probability, and dynamic models, where errors occur primarily in bursts. Consequently, error-detecting and correcting codes can generally be distinguished as random-error-detecting/correcting and burst-error-detecting/correcting. Some codes are also suitable for a mixture of random errors and burst errors.

If the channel capacity cannot be determined, or is highly variable, an error-detection scheme may be combined with a system for retransmission of erroneous data. This is known as Automatic Repeat Request (ARQ) and is most notably used in the Internet. An alternative approach to error control is hybrid automatic repeat request (HARQ), which is a combination of ARQ and error-correction coding.

2.1. ECC (Error Correcting Codes):

ECC stands for "Error Correction Code". ECC is used to verify data transmissions by locating and correcting transmission errors. It is commonly used by RAM chips that include Forward Error Correction (FEC), which ensures that all the data being sent to and from the RAM is transmitted correctly.
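To make "locating and correcting" concrete, here is a minimal sketch of a single-error-correcting Hamming(7,4) code. It is an illustrative assumption rather than the scheme of any particular RAM chip (ECC memory in practice typically uses wider, extended Hamming "SECDED" codes), and the function names `hamming74_encode` and `hamming74_decode` are hypothetical. Three parity checks form a syndrome whose binary value is the position of a single flipped bit.

```python
def hamming74_encode(d: list[int]) -> list[int]:
    """Encode 4 data bits as the 7-bit codeword [p1, p2, d1, p3, d2, d3, d4]."""
    d1, d2, d3, d4 = d
    p1 = d1 ^ d2 ^ d4          # parity over positions 1, 3, 5, 7
    p2 = d1 ^ d3 ^ d4          # parity over positions 2, 3, 6, 7
    p3 = d2 ^ d3 ^ d4          # parity over positions 4, 5, 6, 7
    return [p1, p2, d1, p3, d2, d3, d4]


def hamming74_decode(c: list[int]) -> tuple[list[int], int]:
    """Correct up to one flipped bit; return (data bits, error position or 0)."""
    c = list(c)
    s1 = c[0] ^ c[2] ^ c[4] ^ c[6]
    s2 = c[1] ^ c[2] ^ c[5] ^ c[6]
    s3 = c[3] ^ c[4] ^ c[5] ^ c[6]
    pos = s1 + 2 * s2 + 4 * s3  # syndrome = 1-based position of the error
    if pos:
        c[pos - 1] ^= 1         # locate and correct the flipped bit
    return [c[2], c[4], c[5], c[6]], pos


if __name__ == "__main__":
    sent = hamming74_encode([1, 0, 1, 1])
    received = list(sent)
    received[4] ^= 1            # the channel flips the bit at position 5
    print(hamming74_decode(received))   # ([1, 0, 1, 1], 5)
```

With two flipped bits the syndrome points at the wrong position and the decoder miscorrects, which is why SECDED variants add one overall parity bit to at least detect double errors.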

ECC RAM is similar to parity RAM, which includes a parity bit that validates the data being sent. The parity bit is a redundant binary value of 1 or 0 that is sent along with the data. If the parity bit does not match the value of the data it represents, it indicates an error in the transmission and the data may need to be resent. ECC works in a similar way, but uses a more advanced error-correction scheme that can correct data transmission errors on the fly.

Since ECC memory requires more processing, it can be slower than non-ECC RAM and basic parity RAM. However, ECC RAM provides more reliable data transfers, which results in greater system stability. Therefore, high-end servers and workstations may use ECC memory to minimize crashes and system downtime.

2.2. Types of Error-Correcting Codes: An error-correcting code is a code designed for channel coding, i.e., for encoding information so that a decoder can correct, with a high probability of success, any errors caused in the signal by an intervening noisy channel. Such codes are either block codes or convolutional codes, and in either case they are employed in a forward error-correction system. The most common error-correcting block codes are the Hamming codes, Bose-Chaudhuri-Hocquenghem (BCH) codes, Reed-Solomon (RS) codes, simplex codes, and the Golay code.

Since errors may be corrected by detecting them and requesting retransmission, the process of error correction is sometimes taken to include backward error-correction systems and hence error-detecting codes.

2.3. Error-Correction: Error correction is the process of correcting errors in data that may have been corrupted during transmission or in storage. Data transmissions are always subject to corruption due to errors, but in video transmissions, error correction needs to deal with the errors without retransmitting the corrupted data. Video errors are corrected using forward error correction in the encoder or through error-concealment techniques in the decoder.

2.4. Error Detection and Correction: In information theory and coding theory, with applications in computer science and telecommunication, error detection and correction (or error control) are techniques that enable reliable delivery of digital data over unreliable communication channels. Many communication channels are subject to channel noise, and thus errors may be introduced during transmission from the source to a receiver. Error-detection techniques allow such errors to be detected, while error correction enables reconstruction of the original data.

The general definitions of the terms are as follows:

2.4.1. Error detection: the detection of errors caused by noise or other impairments during transmission from the transmitter to the receiver.

2.4.2. Error correction: the detection of errors and reconstruction of the original, error-free data. Error correction may generally be realized in two different ways:

2.4.2.1. Automatic Repeat Request (ARQ), sometimes also referred to as backward error correction: an error-control technique whereby an error-detection scheme is combined with requests for retransmission of erroneous data. Every block of data received is checked using the error-detection code, and if the check fails, retransmission of the data is requested; this may be done repeatedly until the data can be verified.

2.4.2.2. Forward Error Correction (FEC): the sender encodes the data using an error-correcting code (ECC) prior to transmission. The additional information (redundancy) added by the code is used by the receiver to recover the original data. In general, the reconstructed data is what is deemed the "most likely" original data. ARQ and FEC may be combined, such that minor errors are corrected without retransmission and major errors are corrected via a request for retransmission; this is called Hybrid Automatic Repeat Request (HARQ).

2.5. Error Detection Schemes:

Error detection is most commonly realized using a suitable hash function (or checksum algorithm). A hash function adds a fixed-length tag to a message, which enables receivers to verify the delivered message by recomputing the tag and comparing it with the one provided. There exists a vast variety of hash function designs. However, some are particularly widespread because of either their simplicity or their suitability for detecting certain kinds of errors (e.g., the cyclic redundancy check's performance in detecting burst errors).

Random-error-correcting codes based on minimum-distance coding can provide a suitable alternative to hash functions when a strict guarantee on the minimum number of errors to be detected is desired. Repetition codes, described below, are special cases of error-correcting codes: although rather inefficient, they find applications for both error correction and detection due to their simplicity.
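The minimum-distance guarantee can be checked by brute force for small codes. The sketch below (all helper names hypothetical) computes the minimum Hamming distance d of a code given as a list of equal-length bit strings, reports the d - 1 detection guarantee used in the text, and also the standard floor((d - 1)/2) correction bound. The demo uses the "repeat each 4-bit block three times" code from the repetition example and a small even-parity code.

```python
from itertools import combinations


def hamming_distance(a: str, b: str) -> int:
    """Number of positions in which two equal-length words differ."""
    if len(a) != len(b):
        raise ValueError("codewords must have equal length")
    return sum(x != y for x, y in zip(a, b))


def min_distance(code: list[str]) -> int:
    """Minimum Hamming distance d over all pairs of distinct codewords."""
    if len(code) < 2:
        raise ValueError("need at least two codewords")
    return min(hamming_distance(a, b) for a, b in combinations(code, 2))


def detection_guarantee(code: list[str]) -> int:
    """Every pattern of up to d - 1 bit errors is detected."""
    return min_distance(code) - 1


def correction_guarantee(code: list[str]) -> int:
    """Every pattern of up to floor((d - 1) / 2) bit errors can be corrected."""
    return (min_distance(code) - 1) // 2


if __name__ == "__main__":
    repeat3 = [format(n, "04b") * 3 for n in range(16)]   # e.g. "101110111011"
    even_parity3 = ["000", "011", "101", "110"]            # 2 data bits + parity
    for name, code in [("repetition x3", repeat3), ("even parity", even_parity3)]:
        print(name, min_distance(code),
              detection_guarantee(code), correction_guarantee(code))
```

The parity code has d = 2, so it detects single errors but corrects none, matching the later remark that codes with d = 2 are degenerate single-error-detecting codes.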

2.5.1. Repetition Codes: A repetition code is a coding scheme that repeats the bits across a channel to achieve error-free communication. Given a stream of data to be transmitted, the data is divided into blocks of bits, and each block is transmitted some predetermined number of times. For example, to send the bit pattern "1011", the four-bit block can be repeated three times, producing "1011 1011 1011". If this twelve-bit pattern is received as "1010 1011 1011", where the first block is unlike the other two, it can be determined that an error has occurred.

Repetition codes are not very efficient, and can be susceptible to problems if the error occurs in exactly the same place in each group (e.g., "1010 1010 1010" in the previous example would be accepted as correct). The advantage of repetition codes is that they are extremely simple, and they are in fact used in some transmissions of numbers stations.

2.5.2. Parity Bits:

A parity bit is a bit that is added to a group of source bits to ensure that the number of set bits (i.e., bits with value 1) in the outcome is even or odd. It is a very simple scheme that can be used to detect a single error or any other odd number (i.e., three, five, etc.) of errors in the output. An even number of flipped bits will make the parity bit appear correct even though the data is erroneous. Extensions and variations on the parity-bit mechanism are horizontal redundancy checks, vertical redundancy checks, and "double," "dual," or "diagonal" parity.

2.5.3. Checksum: A checksum of a message is a modular arithmetic sum of message code words of a fixed word length (e.g., byte values). The sum may be negated by means of a one's complement prior to transmission to detect errors resulting in all-zero messages. Checksum schemes include parity bits, check digits, and longitudinal redundancy checks. Some checksum schemes, such as the Luhn algorithm and the Verhoeff algorithm, are specially designed to detect errors commonly introduced by humans in writing down or remembering identification numbers.

2.5.4. Cyclic Redundancy Checks (CRCs): A cyclic redundancy check (CRC) is a single-burst-error-detecting cyclic code and non-secure hash function designed to detect accidental changes to digital data in computer networks. It is characterized by the specification of a so-called generator polynomial, which is used as the divisor in a polynomial long division over a finite field, taking the input data as the dividend; the remainder becomes the result. Cyclic codes have favorable properties in that they are well suited for detecting burst errors. CRCs are particularly easy to implement in hardware, and are therefore commonly used in digital networks and storage devices such as hard disk drives. Even parity is a special case of a cyclic redundancy check, where the single-bit CRC is generated by the divisor x+1.

2.6. Error-Correcting Codes: Any error-correcting code can be used for error detection. A code with minimum Hamming distance d can detect up to d-1 errors in a code word. Using minimum-distance-based error-correcting codes for error detection can be suitable if a strict limit on the minimum number of errors to be detected is desired.

Codes with minimum Hamming distance d = 2 are degenerate cases of error-correcting codes, and can be used to detect single errors. The parity bit is an example of a single-error-detecting code.

The Berger code is an early example of a unidirectional error(-correcting) code that can detect any number of errors on an asymmetric channel, provided that only transitions of cleared bits to set bits, or of set bits to cleared bits, can occur.

2.6.1. Automatic Repeat Request: Automatic Repeat Request (ARQ) is an error-control method for data transmission that makes use of error-detection codes, acknowledgment and/or negative-acknowledgment messages, and timeouts to achieve reliable data transmission. An acknowledgment is a message sent by the receiver to indicate that it has correctly received a data frame. Usually, when the transmitter does not receive the acknowledgment before the timeout occurs (i.e., within a reasonable amount of time after sending the data frame), it retransmits the frame until it is either correctly received or the error persists beyond a predetermined number of retransmissions. Three types of ARQ protocols are Stop-and-Wait ARQ, Go-Back-N ARQ, and Selective Repeat ARQ.

ARQ is appropriate if the communication channel has varying or unknown capacity, as is the case on the Internet. However, ARQ requires the availability of a back channel, possibly results in increased latency due to retransmissions, and requires the maintenance of buffers and timers for retransmissions, which in the case of network congestion can put a strain on the server and overall network capacity.
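Stop-and-Wait ARQ combined with a CRC can be sketched in a few lines. This is a deliberately simplified model under stated assumptions: the acknowledgment path is error-free, there are no sequence numbers, CRC-32 from Python's standard `zlib.crc32` serves as the frame check sequence, and `make_flaky_channel` is a hypothetical deterministic channel that corrupts one bit in each of the first N transmissions.

```python
import zlib
from typing import Callable, Optional


def make_frame(payload: bytes) -> bytes:
    """Append a 4-byte big-endian CRC-32 frame check sequence to the payload."""
    return payload + zlib.crc32(payload).to_bytes(4, "big")


def check_frame(frame: bytes) -> Optional[bytes]:
    """Return the payload if its CRC-32 matches the trailing FCS, else None."""
    if len(frame) < 4:
        return None
    payload, fcs = frame[:-4], frame[-4:]
    return payload if zlib.crc32(payload).to_bytes(4, "big") == fcs else None


def stop_and_wait_send(payload: bytes,
                       channel: Callable[[bytes], bytes],
                       max_transmissions: int = 5) -> tuple[bytes, int]:
    """Send one frame, retransmitting until the receiver accepts it.

    Returns (payload accepted by the receiver, transmissions used).
    Assumes the ACK/NAK reverse path never fails (a simplification)."""
    frame = make_frame(payload)
    for attempt in range(1, max_transmissions + 1):
        accepted = check_frame(channel(frame))
        if accepted is not None:   # CRC matches: receiver sends ACK
            return accepted, attempt
        # CRC mismatch: receiver sends NAK (or stays silent until timeout)
    raise TimeoutError(f"frame not acknowledged after {max_transmissions} tries")


def make_flaky_channel(corrupt_first: int) -> Callable[[bytes], bytes]:
    """Hypothetical channel: flips one bit in the first `corrupt_first` frames."""
    sent = 0

    def channel(frame: bytes) -> bytes:
        nonlocal sent
        sent += 1
        if sent <= corrupt_first:
            return bytes([frame[0] ^ 0x01]) + frame[1:]
        return frame

    return channel


if __name__ == "__main__":
    data, tries = stop_and_wait_send(b"hello", make_flaky_channel(2))
    print(data, tries)
```

Because CRC-32 detects every single-bit error, each corrupted copy is rejected and the third transmission is accepted. Go-Back-N and Selective Repeat would keep several frames in flight, which this sketch does not model.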

2.6.2. Forward Error Correction: An error-correcting code (ECC) or forward error correction (FEC) code is a system of adding redundant data, or parity data, to a message, such that it can be recovered by a receiver even when a number of errors (up to the capability of the code being used) were introduced, either during transmission or in storage. Since the receiver does not have to ask the sender for retransmission of the data, a back channel is not required in forward error correction, and it is therefore suitable for simplex communication such as broadcasting. Error-correcting codes are frequently used in lower-layer communication, as well as for reliable storage in media such as CDs, DVDs, hard disks, and RAM. Error-correcting codes are usually distinguished between convolutional codes and block codes. Convolutional codes are processed on a bit-by-bit basis. They are particularly suitable for implementation in hardware, and the Viterbi decoder allows optimal decoding. Block codes are processed on a block-by-block basis. Early examples of block codes are repetition codes, Hamming codes and multidimensional parity-check codes. They were followed by a number of efficient codes, Reed-Solomon codes being the most notable due to their current widespread use. Turbo codes and low-density parity-check (LDPC) codes are relatively new constructions that can provide almost optimal efficiency.

Shannon's theorem is an important theorem in forward error correction, and describes the maximum information rate at which reliable communication is possible over a channel that has a certain error probability or signal-to-noise ratio (SNR). This strict upper limit is expressed in terms of the channel capacity. More specifically, the theorem says that there exist codes such that, with increasing encoding length, the probability of error on a discrete memoryless channel can be made arbitrarily small, provided that the code rate is smaller than the channel capacity. The code rate is defined as the fraction k/n of k source symbols and n encoded symbols.

The actual maximum code rate allowed depends on the error-correcting code used, and may be lower. This is because Shannon's proof was only existential in nature, and did not show how to construct codes that are both optimal and have efficient encoding and decoding algorithms.

2.6.3. Hybrid Schemes: Hybrid ARQ is a combination of ARQ and forward error correction. There are two basic approaches:

• Messages are always transmitted with FEC parity data (and error-detection redundancy). A receiver decodes a message using the parity information, and requests retransmission using ARQ only if the parity data was not sufficient for successful decoding (identified through a failed integrity check).
• Messages are transmitted without parity data (only with error-detection information). If a receiver detects an error, it requests FEC information from the transmitter using ARQ, and uses it to reconstruct the original message.

The latter approach is particularly attractive on an erasure channel when using a rateless erasure code.

2.6. Applications: Applications that require low latency (such as telephone conversations) cannot use Automatic Repeat Request (ARQ); they must use Forward Error Correction (FEC). By the time an ARQ system discovers an error and retransmits the data, the re-sent data will arrive too late to be useful. Applications where the transmitter immediately forgets the information as soon as it is sent (such as most television cameras) cannot use ARQ; they must use FEC because when an error occurs, the original data is no longer available. (This is also why FEC is used in data storage systems such as RAID and distributed data storage.) Applications that use ARQ must have a return channel; applications that have no return channel cannot use ARQ. Applications that require extremely low error rates (such as digital money transfers) must use ARQ.

2.6.1. The Internet: In a typical TCP/IP stack, error control is performed at multiple levels:

• Each Ethernet frame carries a CRC-32 checksum. Frames received with incorrect checksums are discarded by the receiver hardware.
• The IPv4 header contains a checksum protecting the contents of the header. Packets with mismatching checksums are dropped within the network or at the receiver.
• The checksum was omitted from the IPv6 header in order to minimize processing costs in network routing and because current link-layer technology is assumed to provide sufficient error detection.
• UDP has an optional checksum covering the payload and addressing information from the UDP and IP headers. Packets with incorrect checksums are discarded by the operating system network stack. The checksum is optional only under IPv4, because the IP-layer checksums may already provide the desired level of error protection.
• TCP provides a checksum for protecting the payload and addressing information from the TCP and IP headers. Packets with incorrect checksums are discarded within the network stack, and eventually get retransmitted using ARQ, either explicitly or implicitly due to a timeout.
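The IPv4, UDP and TCP checksums listed above all use the 16-bit one's-complement "Internet checksum" described in RFC 1071. A minimal sketch follows; the function name is our own, and the header bytes in the demo are a sample IPv4 header with its checksum field (bytes 10 and 11) zeroed before computing.

```python
def internet_checksum(data: bytes) -> int:
    """16-bit one's-complement sum of 16-bit words, then complemented."""
    if len(data) % 2:
        data += b"\x00"                          # pad odd-length input with a zero byte
    total = 0
    for i in range(0, len(data), 2):
        total += (data[i] << 8) | data[i + 1]    # big-endian 16-bit word
        total = (total & 0xFFFF) + (total >> 16)  # end-around carry
    return ~total & 0xFFFF


if __name__ == "__main__":
    header = bytearray.fromhex("450000730000400040110000c0a80001c0a800c7")
    csum = internet_checksum(bytes(header))
    print(hex(csum))                             # 0xb861
    header[10:12] = csum.to_bytes(2, "big")      # sender fills in the field
    print(internet_checksum(bytes(header)))      # 0: receiver's check passes
```

A receiver verifies by running the same sum over the header including the stored checksum; a result of 0 means no error was detected. Like any simple sum, it misses errors that reorder 16-bit words, which is one reason Ethernet adds its own CRC-32.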

2.6.2. Deep-Space Telecommunications: Development of error-correction codes was tightly coupled with the history of deep-space missions, due to the extreme dilution of signal power over interplanetary distances and the limited power available aboard space probes. Whereas early missions sent their data uncoded, starting from 1968 digital error correction was implemented in the form of convolutional codes and Reed-Muller codes. The Reed-Muller code was well suited to the noise the spacecraft was subject to, and was implemented on the Mariner spacecraft for missions between 1969 and 1977.

The Voyager 1 and Voyager 2 missions, which started in 1977, were designed to deliver color imaging and scientific information from Jupiter and Saturn. This resulted in increased coding requirements, and thus the spacecraft were supported by (optimally Viterbi-decoded) convolutional codes that could be concatenated with an outer Golay (24,12,8) code. The Voyager 2 probe additionally supported an implementation of a Reed-Solomon code: the concatenated Reed-Solomon-Viterbi (RSV) code allowed for very powerful error correction, and enabled the spacecraft's extended journey to Uranus and Neptune. The CCSDS currently recommends usage of error-correction codes with performance similar to the Voyager 2 RSV code as a minimum. Concatenated codes are increasingly falling out of favour with space missions, and are being replaced by more powerful codes such as Turbo codes or LDPC codes.

The different kinds of deep-space and orbital missions that are conducted suggest that trying to find a "one size fits all" error-correction system will be an ongoing problem for some time to come. For missions close to Earth, the nature of the channel noise is different from that which a spacecraft on an interplanetary mission experiences. Additionally, as a spacecraft increases its distance from Earth, the problem of correcting for noise gets larger.

2.6.3. Satellite Broadcasting: The demand for satellite transponder bandwidth continues to grow, fueled by the desire to deliver television (including new channels and High Definition TV) and IP data. Transponder availability and bandwidth constraints have limited this growth, because transponder capacity is determined by the selected modulation scheme and forward error correction (FEC) rate.

Overview:
• QPSK coupled with traditional Reed-Solomon and Viterbi codes has been used for nearly 20 years for the delivery of digital satellite TV.
• Higher-order modulation schemes such as 8PSK, 16QAM and 32QAM have enabled the satellite industry to increase transponder efficiency by several orders of magnitude.
• This increase in the information rate in a transponder comes at the expense of an increase in the carrier power needed to meet the threshold requirement for existing antennas.
• Tests conducted using the latest chipsets demonstrate that the performance achieved by using Turbo codes may be even lower than the 0.8 dB figure assumed in early designs.

2.6.4. Data Storage: Error detection and correction codes are often used to improve the reliability of data storage media. A "parity track" was present on the first magnetic tape data storage in 1951. The "Optimal Rectangular Code" used in group code recording tapes not only detects but also corrects single-bit errors. Some file formats, particularly archive formats, include a checksum (most often CRC32) to detect corruption and truncation, and can employ redundancy and/or parity files to recover portions of corrupted data. Reed-Solomon codes are used in compact discs to correct errors caused by scratches. Modern hard drives use CRC codes to detect, and Reed-Solomon codes to correct, minor errors in sector reads, and to recover data from sectors that have "gone bad" and store that data in the spare sectors. RAID systems use a variety of error-correction techniques to correct errors when a hard drive completely fails.

3. Cyclic Redundancy Check (CRC): Cyclic redundancy check (CRC) is widely used to detect errors in data communication and storage devices. When high-speed data transmission is required, the general serial implementation cannot meet the speed requirement. Since parallel processing is a very efficient way to increase the throughput rate, parallel CRC implementations have been discussed extensively in the past decade. Although parallel processing increases the number of message bits that can be processed in one clock cycle, it can also lead to a long critical path (CP); thus, the increase in throughput achieved by parallel processing is reduced by the decrease in circuit speed. Another issue is the increase in hardware cost caused by parallel processing, which needs to be controlled. These are the two main issues in parallel CRC implementations. Recursive formulas have been developed for parallel CRC hardware computation based on mathematical deduction; they have identical critical paths. One such parallel CRC algorithm processes an 8-bit message in … clock cycles, where … is the order of the generator polynomial and … is the level of parallelism. However, … message bits can be processed in … clock cycles. High-speed architectures for parallel long encoders, which are based on multiplication and division computations on the generator polynomial, are efficient in terms of speeding up parallel linear feedback shift register (LFSR) structures.
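The throughput argument has a familiar software analogue: a bit-serial loop performs one shift/XOR per message bit, whereas a lookup table that precomputes the effect of 8 bits lets the register advance one byte per step. The sketch below is the common table-driven CRC-32 in its bit-reflected (IEEE 802.3) form, checked against Python's `zlib.crc32`; it is offered only as an illustration of processing several bits per step, not as the hardware architecture proposed in this paper, and the function names are our own.

```python
import zlib

POLY_REFLECTED = 0xEDB88320   # CRC-32 generator 0x04C11DB7 with its bits reversed


def crc32_bitwise(data: bytes) -> int:
    """Bit-serial CRC-32: one shift/XOR step per message bit."""
    crc = 0xFFFFFFFF
    for byte in data:
        crc ^= byte
        for _ in range(8):
            crc = (crc >> 1) ^ POLY_REFLECTED if crc & 1 else crc >> 1
    return crc ^ 0xFFFFFFFF


def make_table() -> list[int]:
    """Effect of clocking 8 bits through the register, for every byte value."""
    table = []
    for n in range(256):
        c = n
        for _ in range(8):
            c = (c >> 1) ^ POLY_REFLECTED if c & 1 else c >> 1
        table.append(c)
    return table


TABLE = make_table()


def crc32_bytewise(data: bytes) -> int:
    """Eight message bits per step: one table lookup, one shift, one XOR."""
    crc = 0xFFFFFFFF
    for byte in data:
        crc = (crc >> 8) ^ TABLE[(crc ^ byte) & 0xFF]
    return crc ^ 0xFFFFFFFF


if __name__ == "__main__":
    msg = b"123456789"
    # all three print 0xcbf43926, the standard CRC-32 check value
    print(hex(crc32_bitwise(msg)), hex(crc32_bytewise(msg)), hex(zlib.crc32(msg)))
```

The trade-off mirrors the hardware one described above: eight times fewer steps, paid for with a 256-entry table, just as a w-bit parallel circuit pays with wider XOR networks.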

the stream must be bit-serial. This means that n clock cycles required calculating the CRC values for an n-bit data stream. In many high speed data networking applications where data frames need to be processed at high speeds, this latency is intolerable and hence, implementation of CRC generation and checking on a parallel stream of data becomes desirable.

They can also be generally used for the LFSR of any generator polynomial. However, their hardware cost is high. We show that our proposed design can achieve shorter critical path, which leads to a parallel CRC circuit with higher processing speed, and can control or reduce the hardware cost of parallel processing, with regard to the most commonly used CRC generator polynomials.

3.2 Literature survey

CRC architectures for the generator polynomial are developed using DSP algorithms such as pipelining, unfolding and retiming. The architectures are first pipelined to reduce the iteration bound by using novel look-ahead techniques and then unfolded and retimed to design high speed parallel circuits. Secondly, a new scheme is the use of a recursive formula for realizing the parallel version. In this scheme number of bits processed in parallel can be different from the degree of the polynomial generator.

Choosing a good generator polynomial is vital for detecting errors in traffic workloads. Unfortunately, most standard CRCs do not perform well for the short messages that are commonly sent in embedded system applications, and in general are not designed with short messages. If a receiving node finds a mismatch it can be certain that the message has been corrupted while transmission. And the other case is that if the receiver doesn’t find any mismatch in the received message then it has only a probability assurance that the message is uncorrupted. Since there are many standard polynomials it’s the art of picking up a polynomial for optimal error detection process

3.1 Objective and goal of the thesis Error correction codes provides a mean to detect and correct errors introduced by the transmission channel. Two main categories of codes exist: block codes and convolution codes. They both introduce redundancy by adding parity symbols to the message data. Cyclic redundancy check (CRC) codes are the subset of the cyclic codes that are also a subset of linear block codes.

CRC is a very powerful and easily implemented technique for obtaining data reliability. Even though error-correcting codes exist, their use is limited to cases such as simplex channels, where retransmissions cannot be requested. Most often, error detection followed by retransmission is preferred because it is more efficient. The CRC technique is used to verify the integrity of blocks of data called frames. Using this technique, the transmitter appends an extra n-bit sequence, called the Frame Check Sequence (FCS), to every frame. The FCS holds redundant information about the frame that helps the receiver detect errors in the frame.

The hardware implementation of a bit-wide CRC is a simple linear feedback shift register. While such a circuit is simple and can run at very high clock speeds, it suffers from the limitation that it processes only one bit per clock cycle, which limits its throughput.

Table: Commonly Used Generator Polynomials

CRC-12            $y^{12}+y^{11}+y^3+y^2+y+1$
CRC-16            $y^{16}+y^{15}+y^2+1$
SDLC (CCITT)      $y^{16}+y^{12}+y^5+1$
CRC-16 REVERSE    $y^{16}+y^{14}+y+1$
SDLC REVERSE      $y^{16}+y^{11}+y^4+1$
CRC-32            $y^{32}+y^{26}+y^{23}+y^{22}+y^{16}+y^{12}+y^{11}+y^{10}+y^8+y^7+y^5+y^4+y^2+y+1$
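To relate two of these polynomials to widely deployed software, the following is a minimal bit-serial sketch, not part of the proposed architecture. The parameter choices are standard conventions assumed here rather than taken from this paper: CRC-32 as used by Ethernet and zlib shifts bits in LSB-first with the reflected constant 0xEDB88320, an all-ones initial value and a final inversion, while the CCITT polynomial is shown in its XMODEM form (MSB-first, constant 0x1021, initial value 0). The function names are hypothetical; the standard-library functions `zlib.crc32` and `binascii.crc_hqx` serve only as cross-checks.

```python
import binascii
import zlib


def crc32_ethernet(data: bytes) -> int:
    """Bit-serial CRC-32 using the reflected form of y^32+y^26+...+y+1."""
    crc = 0xFFFFFFFF                          # register preset to all 1's
    for byte in data:
        crc ^= byte                           # bits enter LSB-first
        for _ in range(8):
            if crc & 1:
                crc = (crc >> 1) ^ 0xEDB88320  # bit-reversed 0x04C11DB7
            else:
                crc >>= 1
    return crc ^ 0xFFFFFFFF                   # final inversion


def crc16_xmodem(data: bytes) -> int:
    """Bit-serial CRC-CCITT (y^16+y^12+y^5+1), MSB-first, initial value 0."""
    crc = 0x0000
    for byte in data:
        crc ^= byte << 8                      # align the byte with the top of the register
        for _ in range(8):
            if crc & 0x8000:
                crc = ((crc << 1) ^ 0x1021) & 0xFFFF
            else:
                crc = (crc << 1) & 0xFFFF
    return crc


msg = b"123456789"                            # conventional check string
print(hex(crc32_ethernet(msg)), hex(zlib.crc32(msg)))         # 0xcbf43926 0xcbf43926
print(hex(crc16_xmodem(msg)), hex(binascii.crc_hqx(msg, 0)))  # 0x31c3 0x31c3
```

Because these are one-bit-per-iteration loops, they mirror the throughput limitation of the serial LFSR described above.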


www.ijaegt.com


4.6 Proposed System

We will show that our proposed design achieves a shorter critical path (CP), which leads to a parallel CRC circuit with higher processing speed, and that it controls or reduces the hardware cost of parallel processing for the most commonly used CRC generator polynomials.

3.3 Serial Implementation

The CRC generation circuit can be implemented with a linear feedback shift register (LFSR). The following figure shows the LFSR representation of a CRC with generator polynomial $G(y)=1+y+y^3+y^5$.
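The serial circuit can be modelled in a few lines of software. The sketch below is an illustration under assumed conventions (message bits shifted in MSB-first, m = 5 zeros appended, register initialised to 0, bit i of `poly` holding the coefficient of $y^i$); each loop iteration stands for one clock of the LFSR. The function names are hypothetical.

```python
def lfsr_crc_serial(bits, poly=0b101011, m=5):
    """Shift the bits in one per clock through an m-stage LFSR that divides by poly.

    poly = 0b101011 encodes G(y) = y^5 + y^3 + y + 1 (bit i = coefficient of y^i).
    Returns the m-bit register contents, i.e. the remainder.
    """
    mask = (1 << m) - 1
    taps = poly & mask                    # feedback taps: coefficients below y^m
    reg = 0
    for b in bits:
        feedback = (reg >> (m - 1)) & 1   # bit leaving the last stage
        reg = ((reg << 1) | b) & mask     # shift; the new bit enters stage 0
        if feedback:
            reg ^= taps                   # XOR gates at the tap positions
    return reg


def crc_serial(message_bits, poly=0b101011, m=5):
    # Append m zeros so that the register ends up holding message * y^m mod G(y).
    return lfsr_crc_serial(list(message_bits) + [0] * m, poly, m)


print(format(crc_serial([1, 0, 1]), "05b"))   # 01100
```

For the 3-bit message 101, `crc_serial([1, 0, 1])` returns 0b01100, and shifting the message followed by those five check bits through the same register leaves it at 0.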

The proposed design starts from the LFSR, which is generally used for serial CRC. An unfolding algorithm is used to realize parallel processing. However, direct application of unfolding may lead to a parallel CRC circuit with a long iteration bound, which is the lowest achievable CP. Two novel look-ahead pipelining methods are developed to reduce the iteration bound of the original serial LFSR CRC structures; the unfolding algorithm is then applied to obtain a parallel CRC structure with a low iteration bound. Finally, the retiming algorithm is applied to obtain the lowest achievable CP.

Fig: Architectures for the polynomial $G(y)=1+y+y^3+y^5$

CRC codes have been used for years to detect data errors on interfaces, and their operation and capabilities are well understood. Two codes that have found wide use are CRC-16 and CRC-32. As the names imply, CRC-16 uses a 16-bit LFSR, while CRC-32 uses a 32-bit LFSR.

3.4 Motivation for the Parallel Implementation

Cyclic redundancy check (CRC) is widely used to detect errors in data communication and storage devices. When high-speed data transmission is required, the general serial implementation cannot meet the speed requirement. Since parallel processing is a very efficient way to increase the throughput rate, parallel CRC implementations have been discussed extensively in the past decade.

Algorithms for pipelining, unfolding and retiming are discussed in Chapter 3.

5. Detecting Errors in Digital Data

The Cyclic Redundancy Check (CRC) is a technique for detecting errors in digital data, but not for correcting them. It is used primarily in data transmission. In the CRC method, a certain number of check bits, often called a checksum, are appended to the message being transmitted. The receiver can determine whether or not the check bits agree with the data, and thereby ascertain with a certain degree of probability whether an error occurred in transmission. If an error occurred, the receiver sends a “negative acknowledgement” back to the sender, requesting that the message be retransmitted. The technique is also sometimes applied to data storage devices, such as a disk drive. In this situation, each block on the disk has check bits, and the hardware might automatically initiate a reread of the block when an error is detected, or it might report the error to software. The material that follows speaks in terms of a “sender” and a “receiver” of a “message,” but it should be understood that it applies to storage writing and reading as well.

Although parallel processing increases the number of message bits that can be processed in one clock cycle, it can also lead to a long critical path (CP); thus, the increase in throughput achieved by parallel processing is offset by the decrease in circuit speed. Another issue is the increase in hardware cost caused by parallel processing, which needs to be controlled. This paper addresses these two issues of parallel CRC implementations.

3.5 Literature Survey and Existing Systems

In the past, recursive formulas have been developed for parallel CRC hardware computation based on mathematical deduction. They have identical CPs. One such parallel CRC algorithm processes an m-bit message in (m+k)/L clock cycles, where k is the order of the generator polynomial and L is the level of parallelism (note that in this paragraph m denotes the message length and k the polynomial order). However, in another approach, m message bits can be processed in m/L clock cycles.

5.1 Background

High-speed architectures for parallel long encoders, which are based on multiplication and division computations on the generator polynomial, are efficient in speeding up parallel linear feedback shift register (LFSR) structures. They can also be used in general for the LFSR of any generator polynomial. However, their hardware cost is high.

There are several techniques for generating check bits that can be added to a message. Perhaps the simplest is to append a single bit, called the “parity bit,” which makes the total number of 1-bits in the code vector (message with parity bit appended) even (or odd). If a single bit is altered in transmission, the parity changes from even to odd (or the reverse). The sender generates the parity bit by simply summing the message bits modulo 2, that is, by exclusive-or'ing them together. It then appends the parity bit (or its complement) to the message. The receiver can check the message by summing all the message bits modulo 2 and checking that the sum agrees with the parity bit. Equivalently, the receiver can sum all the bits (message and parity) and check that the result is 0 (if even parity is being used).

The polynomials used in CRC computation have coefficients that are 0 or 1. Addition and subtraction are done modulo 2, i.e., they are both the same as the exclusive-or operator. For example, the sum of the polynomials $x^3+x+1$ and $x^4+x^3+x^2+x$ is $x^4+x^2+1$, as is their difference. These polynomials are not usually written with minus signs, but they could be, as a coefficient of -1 is equivalent to a coefficient of 1.
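Storing a GF(2) polynomial as an integer whose bit i holds the coefficient of $x^i$ (a convention assumed for this illustration) turns addition and subtraction into a single XOR. The sketch reproduces the example above; the function name is hypothetical.

```python
def gf2_add(a: int, b: int) -> int:
    """Add (equivalently, subtract) two GF(2) polynomials stored as bit masks."""
    return a ^ b


p = 0b1011    # x^3 + x + 1
q = 0b11110   # x^4 + x^3 + x^2 + x
print(bin(gf2_add(p, q)))   # 0b10101, i.e. x^4 + x^2 + 1
```

Because XOR is its own inverse, adding q back to the sum recovers p, which is the sense in which sum and difference coincide.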

This simple parity technique is often said to detect 1-bit errors. Actually it detects errors in any odd number of bits (including the parity bit), but it is a small comfort to know you are detecting 3-bit errors if you are missing 2-bit errors.
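A short sketch of even parity makes the point concrete: one flipped bit in the code vector is caught, two are not. The helper names and the sample message are illustrative only.

```python
from functools import reduce
from operator import xor


def parity(bits):
    """Sum of the bits modulo 2, i.e. the XOR of all bits."""
    return reduce(xor, bits, 0)


def encode_even(message):
    # Append the parity bit so that the code vector has an even number of 1s.
    return message + [parity(message)]


def check_even(code_vector):
    return parity(code_vector) == 0


code = encode_even([1, 1, 0, 0, 1, 0, 0, 1])
one_error = code.copy()
one_error[2] ^= 1           # single-bit error
two_errors = one_error.copy()
two_errors[5] ^= 1          # a second error restores even parity
print(check_even(code), check_even(one_error), check_even(two_errors))
# True False True  (the 2-bit error goes unnoticed)
```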

Multiplication of such polynomials is straightforward. The product of one coefficient by another is the same as their combination by the logical and operator, and the partial products are summed using exclusive or. Multiplication is not needed to compute the CRC checksum.
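Under the same kind of integer encoding (bit i = coefficient of $x^i$, an assumption of this sketch), multiplication becomes a shift-and-XOR loop with no carries: each set bit of one factor selects a shifted copy of the other (the AND of coefficients), and the partial products are combined with XOR. The function name is hypothetical.

```python
def gf2_mul(a: int, b: int) -> int:
    """Carry-less product of two GF(2) polynomials stored as bit masks."""
    product = 0
    while b:
        if b & 1:
            product ^= a    # add the shifted partial product, no carries
        a <<= 1
        b >>= 1
    return product


print(bin(gf2_mul(0b1011, 0b11)))   # (x^3+x+1)(x+1) = x^4+x^3+x^2+1 -> 0b11101
```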

For bit-serial sending and receiving, the hardware to generate and check a single parity bit is very simple. It consists of a single exclusive-or gate together with some control circuitry. For bit-parallel transmission, an exclusive-or tree may be used, as illustrated in the figure.
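The following sketch models an exclusive-or tree in software, one gate level per iteration, to show that it computes the same parity as a serial chain while needing only about $\lceil \log_2 n \rceil$ levels for n inputs. It is an illustration, not a gate-level netlist, and the function name is hypothetical.

```python
def xor_tree(bits):
    """Reduce bits pairwise, one tree level per iteration.

    Returns (parity, levels), where levels is the number of XOR gate levels used.
    """
    layer = list(bits)
    levels = 0
    while len(layer) > 1:
        nxt = [layer[i] ^ layer[i + 1] for i in range(0, len(layer) - 1, 2)]
        if len(layer) % 2:          # an unpaired input passes straight to the next level
            nxt.append(layer[-1])
        layer = nxt
        levels += 1
    return (layer[0] if layer else 0), levels


print(xor_tree([1, 1, 0, 0, 1, 0, 0, 1]))   # (0, 3): 8 inputs need 3 levels
```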

Division of polynomials can be done in much the same way as long division of polynomials over the integers, with each subtraction replaced by an exclusive-or.
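As an illustration, the sketch below performs GF(2) long division on integers whose bit i holds the coefficient of $x^i$ (an assumed encoding). The sample dividend is the message 11001001 followed by three 0-bits, i.e. $x^{10}+x^9+x^6+x^3$, and the divisor $x^3+1$ is chosen only for illustration. The function name is hypothetical.

```python
from typing import Tuple


def gf2_divmod(dividend: int, divisor: int) -> Tuple[int, int]:
    """Long division of GF(2) polynomials stored as ints (bit i = coefficient of x^i)."""
    if divisor == 0:
        raise ZeroDivisionError("division by the zero polynomial")
    deg = divisor.bit_length() - 1
    quotient, remainder = 0, dividend
    while remainder.bit_length() - 1 >= deg:
        shift = remainder.bit_length() - 1 - deg
        quotient |= 1 << shift           # next quotient term x^shift
        remainder ^= divisor << shift    # subtraction is XOR in GF(2)
    return quotient, remainder


q, r = gf2_divmod(0b11001001000, 0b1001)
print(bin(q), bin(r))   # 0b11010011 0b11
```

The result can be checked by multiplying back: $(x^7+x^6+x^4+x+1)(x^3+1) + (x+1) = x^{10}+x^9+x^6+x^3$.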

The CRC method treats the message as a polynomial over GF(2). For example, the message 11001001, where the order of transmission is from left to right (110…), is treated as a representation of the polynomial $x^7+x^6+x^3+1$. The sender and receiver agree on a certain fixed polynomial called the generator polynomial.

Fig: Exclusive-or tree

Other techniques for computing a checksum are to form the exclusive or of all the bytes in the message, or to compute a sum with end-around carry of all the bytes. In the latter method, the carry from each 8-bit sum is added into the least significant bit of the accumulator. This is believed to be more likely to detect errors than the simple exclusive or, or the sum of the bytes with the carry discarded.
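A minimal sketch of the two byte-oriented checksums just mentioned; the function names are hypothetical. The sample input shows a case where the plain exclusive or and the sum with carry discarded both give 0, while the end-around-carry sum does not.

```python
def xor_checksum(data: bytes) -> int:
    """Exclusive or of all the bytes."""
    acc = 0
    for byte in data:
        acc ^= byte
    return acc


def end_around_carry_sum(data: bytes) -> int:
    """8-bit sum in which each carry out is added back into the least significant bit."""
    acc = 0
    for byte in data:
        acc += byte
        acc = (acc & 0xFF) + (acc >> 8)   # fold the carry back in; the result stays <= 0xFF
    return acc


data = b"\x80\x80"
print(xor_checksum(data), sum(data) & 0xFF, end_around_carry_sum(data))   # 0 0 1
```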

To compute an r-bit CRC checksum, the generator polynomial must be of degree r. The sender appends r 0-bits to the m-bit message and divides the resulting polynomial of degree m+r-1 by the generator polynomial. This produces a remainder polynomial of degree r-1 (or less). The remainder polynomial has r coefficients, which are the checksum. The quotient polynomial is discarded. The data transmitted (the code vector) is the original m-bit message followed by the r-bit checksum.
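The procedure above fits in a few lines. This sketch uses an assumed integer encoding (bit i = coefficient of $x^i$) and an illustrative generator $x^3+1$ (r = 3) applied to the message 11001001; this generator is not one of the standard ones from the table, and the function names are hypothetical.

```python
def crc_checksum(message: int, generator: int) -> int:
    """Remainder of message * x^r divided by the generator, over GF(2)."""
    r = generator.bit_length() - 1            # degree of the generator
    rem = message << r                        # append r 0-bits
    while rem.bit_length() - 1 >= r:
        rem ^= generator << (rem.bit_length() - 1 - r)   # subtract (XOR) the aligned divisor
    return rem                                # degree r - 1 or less: the r-bit checksum


def code_vector(message: int, generator: int) -> int:
    """Original message followed by its r-bit checksum."""
    r = generator.bit_length() - 1
    return (message << r) | crc_checksum(message, generator)


msg, gen = 0b11001001, 0b1001                 # x^7+x^6+x^3+1 and x^3+1
print(format(crc_checksum(msg, gen), "03b"))  # 011
print(bin(code_vector(msg, gen)))             # 0b11001001011
```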

5.2 Theory

The CRC is based on polynomial arithmetic, in particular on computing the remainder of dividing one polynomial by another. It is a little like treating the message as a very large binary number and computing the remainder on dividing it by a fairly large prime; intuitively, one would expect this to give a reliable checksum. The polynomials involved are polynomials in a single variable x.

There are two ways for the receiver to assess the correctness of the transmission. It can compute the checksum from the first m bits of the received data and verify that it agrees with the last r received bits. Alternatively, following usual practice, the receiver can divide all the m+r received bits by the generator polynomial and check that the r-bit remainder is 0. To see that the remainder must be 0, let M be the polynomial representation of the message, and let R be the polynomial representation of the remainder that was computed by the sender. Then the transmitted data corresponds to the polynomial $Mx^r - R$ (or, equivalently, $Mx^r + R$).

In data transmission, a message typically consists of many bytes, while the generator polynomial and the desired parallelism consist of a few nibbles. In the final circuit that we obtain, the sequence s1 plus the zeros is sent to the circuit in blocks of w bits each. After (k+m)/w clock periods, the FF outputs give the desired FCS.

By the way R was computed, we know that $Mx^r = QG + R$, where G is the generator polynomial and Q is the quotient (which was discarded). Therefore the transmitted data, $Mx^r - R$, is equal to $QG$, which is clearly a multiple of G. If the receiver is built as nearly as possible like the sender, it will append r 0-bits to the received data as it computes the remainder R. But the received data with 0-bits appended is still a multiple of G, so the computed remainder is still 0.

5.3 LFSR (Linear Feedback Shift Register)

A linear feedback shift register (LFSR) is a shift register whose input bit is a linear function of its previous state. The only linear functions of single bits are XOR and inverse-XOR; thus it is a shift register whose input bit is driven by the exclusive-or (XOR) of some bits of the overall shift register value.
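The zero-remainder argument can be checked numerically. The sketch assumes the illustrative generator $G = x^3+1$, for which the message 11001001 has checksum 011, so the code vector is 11001001011; bit i of each integer is the coefficient of $x^i$, and the helper name is hypothetical.

```python
def gf2_mod(value: int, generator: int) -> int:
    """Remainder of value divided by generator, over GF(2)."""
    r = generator.bit_length() - 1
    while value.bit_length() - 1 >= r:
        value ^= generator << (value.bit_length() - 1 - r)
    return value


G = 0b1001                   # x^3 + 1
codeword = 0b11001001011     # message 11001001 followed by its checksum 011
print(gf2_mod(codeword, G))              # 0: the code vector is a multiple of G
print(gf2_mod(codeword << 3, G))         # 0: still 0 after appending r = 3 zeros
print(gf2_mod(codeword ^ (1 << 4), G))   # 2: a single-bit error leaves a nonzero remainder
```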

where $p_i$ are the bits of the divisor p (i.e., the coefficients of the generator polynomial). When coincides with substitution of the operators,

Fig: Serial-input hardware realization of CRC-32

where X(0) is the initial state of the FFs. Considering that the system is time-invariant, we obtain a recursive formula:

Fig: Basic LFSR architecture

The initial value of the LFSR is called the seed, and because the operation of the register is deterministic, the sequence of values produced by the register is completely determined by its current (or previous) state. Likewise, because the register has a finite number of possible states, it must eventually enter a repeating cycle. However, an LFSR with a well-chosen feedback function can produce a sequence of bits that appears random and has a very long cycle.
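These properties are easy to observe in a 4-bit software model. The sketch assumes a right-shifting Galois-form LFSR, which may be arranged differently from the register in the figure; the tap mask 0xC corresponds to $x^4+x^3+1$, a primitive polynomial, and 0xA to $x^4+x^2+1$, which is not primitive. The function name is hypothetical.

```python
def galois_lfsr_period(seed: int, taps: int) -> int:
    """Number of clocks until a right-shifting Galois LFSR returns to its seed."""
    n = taps.bit_length()                # register length implied by the tap mask
    state = seed
    for step in range(1, (1 << n) + 1):  # at most 2^n states exist
        lsb = state & 1
        state >>= 1
        if lsb:
            state ^= taps                # feedback XORed into the tapped stages
        if state == seed:
            return step
    raise ValueError("seed does not lie on a cycle")


print(galois_lfsr_period(1, 0xC), galois_lfsr_period(1, 0xA))   # 15 6
```

With seed 1, taps 0xC visit all 15 nonzero states before repeating, whereas taps 0xA return after only 6 clocks, which illustrates why the choice of feedback polynomial matters.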

This result implies that it is possible to calculate the m bits of the FCS by sending the k+m bits of the message s1 plus the zeros in blocks of w bits each. So, after (k+m)/w clock periods, the FF outputs give the desired FCS. Now, it is important to evaluate the matrix $F^w$. There are several options, but it is easy to show that the matrix can be constructed recursively as i ranges from 2 to w, for instance as $F^i = F \otimes F^{i-1}$.
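A minimal sketch of this construction, under conventions of our own that are not taken from the paper: the state X is stored as an m-bit integer with bit i holding $x_i$; one serial clock maps X to $F \otimes X \oplus d\,e_0$ (the division register fed with the augmented message); and a matrix is stored as a list of column bit masks. $F^w$ is built recursively as $F^i = F \otimes F^{i-1}$, and the check confirms that one parallel step $F^w \otimes X \oplus D$, with the w input bits packed MSB-first into D, equals w serial clocks when w ≤ m. All function names are hypothetical.

```python
import random


def serial_step(x, bit, poly, m):
    """One clock of the division LFSR: X' = F X xor bit * e_0."""
    mask = (1 << m) - 1
    feedback = (x >> (m - 1)) & 1
    x = ((x << 1) | bit) & mask
    return x ^ (poly & mask) if feedback else x


def companion_matrix(poly, m):
    # Column j of F is the image of e_j after one zero-input clock.
    return [serial_step(1 << j, 0, poly, m) for j in range(m)]


def mat_vec(cols, x):
    """GF(2) matrix-vector product: XOR of the columns selected by the 1-bits of x."""
    y = 0
    for j, col in enumerate(cols):
        if (x >> j) & 1:
            y ^= col
    return y


def matrix_power(poly, m, w):
    """F^w built recursively: F^i = F (x) F^(i-1) for i = 2..w."""
    f = companion_matrix(poly, m)
    fw = f
    for _ in range(2, w + 1):
        fw = [mat_vec(f, col) for col in fw]
    return fw


CRC32, M, W = 0x104C11DB7, 32, 16
FW = matrix_power(CRC32, M, W)

rng = random.Random(2014)
x = rng.getrandbits(M)
block = [rng.getrandbits(1) for _ in range(W)]
serial = x
for b in block:
    serial = serial_step(serial, b, CRC32, M)
d = int("".join(map(str, block)), 2)   # the first bit lands in position w-1
print(mat_vec(FW, x) ^ d == serial)    # True: one parallel step equals w serial clocks
```

For the small divisor $x^4+x+1$ (0b10011, m = 4), the same code gives $F^4$ the columns 0b0011, 0b0110, 0b1100 and 0b1011, which are $x^4, x^5, x^6, x^7$ reduced modulo the divisor.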

On this basis, we have developed our parallel implementation of the CRC. In the following, we assume that the degree of the generator polynomial (m) and the length of the message to be processed (k) are both multiples of the number of bits to be processed in parallel (w). This is typical in data transmission.



Let us suppose, for example, that $p = \{1, 0, 0, 1, 1\}$. The corresponding matrices $F$ and $F^w$ then follow from the definitions above.

Fig: Block diagram of CRC-32 for 32-bit data

A parallel implementation of the CRC can be derived from the above considerations. Again, it consists of a special register. In this case, the inputs of the FFs are the exclusive sum of some FF outputs and inputs. The figure shows a possible implementation. More precisely, each coefficient is equal to the value at the i-th row and j-th column of $F^w$. Even in this case, if the divisor is fixed, the AND gates are unnecessary. Furthermore, the number of FFs remains unchanged. We recall that if w < m, then m - w of the inputs are not needed. The inputs are the bits of the dividend, sent in groups of w bits each. As to the realization of LFSR2, we obtain a very similar circuit, in which the inputs are XORed with the FF outputs and the results are fed back.
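When the divisor is fixed, the AND gates disappear and each next-state bit is simply the XOR of a fixed set of FF outputs and one input. The sketch below derives those sets from $F^w$ for a small case. The conventions are assumptions for illustration: bit i of the state is $x_i$, $d_i$ denotes bit i of the input word D (packed MSB-first, so it feeds bit i directly when w = m), and the example divisor is $x^4+x+1$ with w = 4. Function names are hypothetical.

```python
def serial_step(x, bit, poly, m):
    """One clock of the division LFSR (shift in `bit`, XOR the taps on feedback)."""
    mask = (1 << m) - 1
    feedback = (x >> (m - 1)) & 1
    x = ((x << 1) | bit) & mask
    return x ^ (poly & mask) if feedback else x


def fw_columns(poly, m, w):
    """Column j of F^w: the state reached from e_j after w zero-input clocks."""
    cols = []
    for j in range(m):
        x = 1 << j
        for _ in range(w):
            x = serial_step(x, 0, poly, m)
        cols.append(x)
    return cols


def xor_equations(poly, m, w):
    """For each output bit i, the state bits x_j that are XORed (with d_i) to form x_i'."""
    cols = fw_columns(poly, m, w)
    return [[j for j in range(m) if (cols[j] >> i) & 1] for i in range(m)]


for i, terms in enumerate(xor_equations(0b10011, 4, 4)):
    rhs = " ^ ".join(f"x{j}" for j in terms)
    print(f"x{i}' = {rhs} ^ d{i}")
# x0' = x0 ^ x3 ^ d0
# x1' = x0 ^ x1 ^ x3 ^ d1
# x2' = x1 ^ x2 ^ d2
# x3' = x2 ^ x3 ^ d3
```

A hardware generator would emit these equations as the XOR network in front of the FFs; the number of FFs stays equal to m, as stated above.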



The properties of the $F^w$ matrix, together with the fact that the update can be regarded as a recursive calculation of the next state $X'$ from the matrix $F^w$, the current state $X$ and the parallel input $D$, make the 32-bit parallel input vector suitable for messages of any length, not only multiples of 32 bits. Remember that the length of the message is byte-based. If the length of the message is not a multiple of 32, then after a sequence of 32-bit parallel calculations the number of remaining message bits can be 8, 16 or 24. In each of these situations, one additional parallel calculation with w = 8, 16 or 24 is needed, using the corresponding $F^w$. Since these $F^w$ are obtained along the way when $F^{32}$ is constructed recursively, the calculation can be performed using Equation (8) within the same circuit as the 32-bit parallel calculation; the only difference is the $F^w$ matrix. For example, let w = 4 and let the data bits be 11011101011, so that the message length is not a multiple of w. Then the last two bits (D(3)) must be processed after X(12) has been obtained. Therefore, $F^2$ must be used (it is obtained as an intermediate step in the recursive construction of $F^4$), and the extra two bits are stored in the least significant bits of the input vector D. Equation (8) can then be applied to calculate the final state X(14), which is the CRC code. Thus only one extra cycle is needed for the extra bits when the message length is not a multiple of w, the number of bits processed in parallel. It is worth noting that in the CRC-32 algorithm the initial state of the shift registers is preset to all 1's, i.e., X(0) = 0xFFFFFFFF. However, the initial state X(0) does not affect the correctness of the design. For better understanding, the initial state X(0) is set to 0x00000000 when the circuit is implemented.

In the proposed architecture, w = 64 bits are processed in parallel and the order of the generator polynomial is m = 32, as shown in the figure. As discussed in Section 3, if 32 bits are processed in parallel, CRC-32 is generated after (k+m)/w cycles; increasing the number of bits processed in parallel reduces the number of cycles required to calculate the CRC. The proposed architecture can be realized by the following equations:

$$X_{temp} = F^{32} \otimes D_{0\ldots31} \oplus D_{32\ldots63}$$

$$X' = F^{64} \otimes X \oplus X_{temp}$$

In the proposed architecture, $d_i$ is the parallel input and $F(i)(j)$ is the element of the $F^{32}$ matrix located at the i-th row and j-th column. As shown in figure 3, the input data bits $d_0 \ldots d_{31}$ are ANDed with each row of the $F^{32}$ matrix, and each result is XORed individually with $d_{32}, d_{33}, \ldots, d_{63}$. Each of these results is then XORed with the corresponding bit of $F^{64} \otimes X$ to form $X'(i)$. Finally, X holds the CRC after (k+m)/w cycles, where w = 64.
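The two-level update, including the one extra cycle for a short final block, can be checked in software against the bit-serial LFSR. This is a minimal sketch under our own assumptions: raw CRC-32 division (initial state 0, message followed by 32 zeros, no bit reflection or final XOR, so the result is not the Ethernet FCS value), matrices stored as column bit masks, and $D_{0\ldots31}$ taken to be the first 32 bits of each 64-bit word, packed MSB-first. Each cycle applies $X' = F^{r} \otimes X \oplus F^{32} \otimes D_{hi} \oplus D_{lo}$ for a block of r ≤ 64 bits (for r = 64 this is the equation above; for r ≤ 32 it reduces to $F^r \otimes X \oplus D$). All function names are hypothetical.

```python
import random

POLY, M = 0x104C11DB7, 32            # CRC-32 generator polynomial, degree m = 32
MASK = (1 << M) - 1


def serial_step(x, bit):
    """One clock of the bit-serial division LFSR."""
    feedback = (x >> (M - 1)) & 1
    x = ((x << 1) | bit) & MASK
    return x ^ (POLY & MASK) if feedback else x


def mat_vec(cols, x):
    """GF(2) matrix-vector product with the matrix stored as column masks."""
    y = 0
    for j, col in enumerate(cols):
        if (x >> j) & 1:
            y ^= col
    return y


def build_powers(max_w):
    """F[i] = F^i for i = 1..max_w, built recursively as F^i = F (x) F^(i-1)."""
    f = [serial_step(1 << j, 0) for j in range(M)]
    powers = {1: f}
    for i in range(2, max_w + 1):
        powers[i] = [mat_vec(f, col) for col in powers[i - 1]]
    return powers


F = build_powers(64)


def pack(bits):
    """Pack bits MSB-first into an integer (the first bit becomes the highest)."""
    return int("".join(map(str, bits)), 2) if bits else 0


def parallel_step(x, bits):
    """One clock for a block of r = len(bits) <= 64 bits."""
    r = len(bits)
    if r <= M:
        return mat_vec(F[r], x) ^ pack(bits)
    d_hi, d_lo = bits[: r - M], bits[r - M :]
    return mat_vec(F[r], x) ^ mat_vec(F[M], pack(d_hi)) ^ pack(d_lo)


def crc_parallel(message_bits, w=64):
    """Process the augmented message (k + m bits) in w-bit blocks, w <= 64."""
    bits = list(message_bits) + [0] * M
    x, cycles = 0, 0
    for start in range(0, len(bits), w):   # the final block may be shorter than w
        x = parallel_step(x, bits[start : start + w])
        cycles += 1
    return x, cycles


def crc_serial(message_bits):
    x = 0
    for b in list(message_bits) + [0] * M:
        x = serial_step(x, b)
    return x


rng = random.Random(2014)
msg = [rng.getrandbits(1) for _ in range(8 * 13)]   # 104-bit message
crc, cycles = crc_parallel(msg)
print(crc == crc_serial(msg), cycles)                # True 3  (136 bits in blocks of 64, 64, 8)
```

For a 480-bit message, for instance, `crc_parallel` takes 8 cycles with w = 64 and 16 cycles with w = 32, in line with the (k+m)/w cycle count.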

REFERENCES

[1] G. Campobello, G. Patane, and M. Russo, "Parallel CRC realization," IEEE Transactions on Computers, vol. 52, no. 10, pp. 1312-1319, Oct. 2003.
[2] G. Albertengo and R. Sisto, "Parallel CRC generation," IEEE Micro, vol. 10, no. 5, pp. 63-71, Oct. 1990.
[3] M. D. Shieh et al., "A systematic approach for parallel CRC computations," Journal of Information Science and Engineering, May 2001.
[4] F. Braun and M. Waldvogel, "Fast incremental CRC updates for IP over ATM networks," in Proc. IEEE Workshop on High Performance Switching and Routing, 2001, pp. 48-52.
[5] W. Lu and S. Wong, "A fast CRC update implementation," in Proc. IEEE Workshop on High Performance Switching and Routing, Oct. 2003, pp. 113-120.
[6] S. R. Ruckmani and P. Anbalagan, "High speed cyclic redundancy check for USB," DSP Journal, vol. 6, no. 1, Sept. 2006.
[7] Y. Sun and M. S. Kim, "A pipelined CRC calculation using lookup tables," in Proc. 7th IEEE Consumer Communications and Networking Conference (CCNC), Jan. 2010, pp. 1-2.
[8] M. Sprachmann, "Automatic generation of parallel CRC circuits," IEEE Design & Test of Computers, vol. 18, no. 3, pp. 108-114, May 2001.

Mohammed Abdul Haque, B.Tech. (ECE), MS (U.K.), MIEEE (USA), MSCE (KSA), AM-IETE (India), MIPA (India), is a Sr. Assistant Professor in the Dept. of ECE, Aurora's Scientific, Technological & Research Academy (ASTRA), Bandlaguda, Hyderabad, with 2 years of experience. His area of interest is electromagnetics.

About Author

Manu G completed his B.Tech. in Electronics and Communications Engineering at DVR Engineering College, Sangareddy, in 2004. He is presently pursuing his M.Tech. in Electronics and Communications Engineering at Aurora's Scientific Tech & Research Academy, Hyderabad. His area of interest is communications.
