
A COMMUNICATION LABORATORY COURSE BASED ON THE TMS320C6711 DSK *

Michalis D. Galanis (1) and Evangelos Zigouris (2)

(1) VLSI Design Lab., Electrical & Computer Engineering Dept., University of Patras, 265 00, Greece, [email protected]
(2) Electronics Lab., Electronics & Computer Div., Physics Dept., University of Patras, 265 00, Greece, [email protected]

ABSTRACT

In this paper, a laboratory course on communication systems design and real-time implementation is proposed. The Texas Instruments TMS320C6711 DSP Starter Kit is used for executing and demonstrating the communication transmitters and receivers developed for the purposes of this course. Three communication systems that are used in International Telecommunication Union standards are presented, together with issues and design hints concerning their implementation on a DSP processor. The considered communication systems have been implemented by the authors and their proper operation has been tested on the TMS320C6711 DSP Starter Kit module. Main emphasis is given to the V.34 receiver design, since it is the most challenging one in this paper. The execution time and the memory requirements of the three designs are also given. The results on the TMS320C6711 illustrate that the real-time requirements of the presented designs have been met.

1. INTRODUCTION

The establishment of a hardware-supported communication laboratory is a necessity in teaching communication systems. This necessity is reinforced by the fact that contemporary communications rely on Digital Signal Processing (DSP) algorithms. A programmable DSP processor is usually used for executing the communication systems implemented in such laboratories. In this type of laboratory, students understand the theoretical aspects of communication systems better, because they have to implement and execute communication algorithms on a real-world DSP processor. They also become familiar with issues of embedded real-time systems design, such as debugging, testing and efficient software development.

This paper presents a laboratory course for the design and real-time implementation of communication applications on the TMS320C6711 DSP Starter Kit (DSK). The course is given in the 3rd semester of the Master's degree course in Electronics and Computers at the Department of Physics, University of Patras, Greece. The graduate students attending this laboratory have become familiar with designing and implementing fundamental DSP algorithms on the Texas Instruments TMS320C6711 DSK in previous semester courses [1]. Examples of such algorithms are:

• Finite Impulse Response (FIR) and Infinite Impulse Response (IIR) filters
• Tone Generation and Detection
• Fast Fourier Transforms (FFT)
• Multi-rate signal processing
• Adaptive filters

These algorithms are usually part of modern communications standards. For example, the FFT is part of the baseband processing of Orthogonal Frequency Division Multiplex (OFDM) modems in Digital Video (Audio) Broadcasting and in Wireless Local Area Networks (LAN) [2]. Thus, the students have already designed parts of communication systems and can move on to designing complete communication systems, such as the ones presented in this paper. Additionally, the students have gained experience in writing software for the TMS320C6711 DSP processor: graduate students of previous years have implemented the algorithms listed above as C or mixed assembly-C programs.

Three communication systems, which have been implemented by the authors, are presented in this paper. The proper operation of these implementations has been thoroughly checked. The first design is a binary Differential Phase Shift Keying (DPSK) transmitter/receiver. The bit rate is 600 bits/s with a carrier frequency of 1200Hz. This kind of system is used during Phase 2, for the exchange of INFO sequences, in the ITU-T V.34 [3] and V.90 modem Recommendations. Next, the implementation of a 300 bits/s binary Frequency Shift Keying (FSK) transmitter/receiver is presented. The mean carrier frequency is set to 1080Hz and the frequency deviation is ±100Hz.
This kind of system is actually the low-band channel of the ITU-T V.21 Recommendation. Finally, the main part of this paper presents the design and implementation aspects of a V.34 transmitter/receiver [3].

*This work was partially funded by the Alexander S. Onassis Public Benefit Foundation.

2. COMMUNICATION LABORATORY ENVIRONMENT

As mentioned earlier, the physical layers of contemporary communication standards consist of computationally and data intensive DSP algorithms. Hence, a laboratory on real-time implementation of DSP communication algorithms is effectively a prerequisite in undergraduate or graduate degrees targeting DSP.

For the purposes of the communication laboratory course, ten workstations are used. Each workstation is equipped with a Windows-based PC, a TMS320C6711 DSK module, an oscilloscope and a function generator. Every PC also has the Code Composer Studio™ software environment from Texas Instruments installed, which accompanies each DSK module.

The main purpose of this laboratory course is real-time DSP software development for different types of communication transmitters and receivers. The types differ in the modulation-demodulation scheme used in each case. For testing the correctness of an implementation, especially that of the receiver, a system composed of two TMS320C6711 DSKs is used. The first DSK implements the transmitter and the communication channel. The communication channel distorts the transmitter's output signal by filtering it (e.g. with an FIR filter). The second DSK implements the receiver. Each DSK module is controlled by its own host computer (PC). In the future, one PC will control both DSKs through a parallel port switch.

The analog signals can be observed on an oscilloscope for testing and debugging the operation of the transmitter and the receiver. The digital signals can either be viewed within the Texas Instruments Code Composer™ environment or be sent to the D/A converter and observed as analog signals.
For example, to check the proper operation of the receiver, which is usually more complex than the transmitter, it is necessary to compare the input signal (information) of the transmitter with the output signal of the receiver. In Fig. 1, the magnitude response of a 512-point complex radix-2 FFT executed on the TMS320C6711 is given. This response was obtained with the Graph viewing utility of Code Composer Studio™. This environment provides various debugging capabilities that are useful in testing the behavior of the developed software, such as viewing the memory contents, the call stack and variables. Also, the Code Composer profiler measures the machine clock cycles required to execute an algorithm, so the computationally intensive parts of an algorithm or an application can be identified and optimized to reduce the execution time.

Fig. 1. Real-time 512-point complex FFT magnitude response.

The TMS320C6711 DSK module (Fig. 2) was chosen because it provides a low-cost gateway into real-time implementation of DSP algorithms. The module has the following features: a 150MHz TMS320C6711 DSP capable of executing 1200 Million Instructions Per Second (MIPS), 4M-bytes of 100MHz SDRAM, 128K-bytes of flash memory, a 16-bit audio codec, a parallel port interface to the standard parallel port of a host PC, host port interface (HPI) access to all DSP memory via the parallel port, embedded JTAG emulation via the parallel port, and expansion memory and peripheral connectors for daughterboard support. The TMS320C6711 DSK module is accompanied by the Code Composer Studio IDE software, developed by Texas Instruments.

[Fig. 2 block diagram of the TMS320C6711 DSK. Blocks shown: Texas Instruments TMS320C6711 DSP, 2Mx16 SDRAM, 128x8 Flash EEPROM, 16-bit audio codec TLC320AD535 (attached through a McBSP), and the HPI connection to the PC parallel port.]

Fig. 2. TMS320C6711 DSK block diagram.

The TMS320C6711 is a 32-bit floating-point DSP [4]. It provides a two-level memory architecture for the internal program and data buses. The first-level memory for each of the internal program and data buses is a 4K-byte cache. The second-level memory is a 64K-byte memory block that is shared by the program and data buses. The device also includes two serial ports (McBSP) for full duplex asynchronous communications, an Enhanced Direct Memory Access (EDMA) controller used for block data or peripheral-initiated transfers, a Host Port Interface (HPI), an external memory interface (EMIF) used by the CPU to access off-chip memory, two 32-bit general purpose timers and an expansion bus used by the CPU to access off-chip peripherals, FIFOs and PCI interface chips.

3. DPSK TRANSMITTER/RECEIVER DESIGN

In this section, an implementation of a DSP-based 600 bits/s binary DPSK transmitter and receiver with a carrier frequency of 1200Hz is presented. This kind of transmitter/receiver is used during Phase 2, for the exchange of INFO sequences, in the ITU-T V.34 [3] and V.90 Recommendations. The sampling frequency of the system is set to 9600Hz, so there are 16 samples per transmitted bit. Because the codec of the TMS320C6711 DSK module has a sampling frequency of 8000Hz, a sampling rate converter (e.g. the one presented in [1]) has to be employed to change the sampling frequency of the transmitted signal from 9600Hz to 8000Hz and that of the received signal from 8000Hz to 9600Hz.

The transmitted signal is a single-frequency tone with 180-degree phase reversals, according to the following differential coding: the transmit point is rotated 180 degrees from the previous point if the transmit bit is 1, and 0 degrees from the previous point if the transmit bit is 0. The tones are generated with a Look Up Table (LUT) technique to reduce the implementation complexity.

For the implementation of the receiver, a Discrete Fourier Transform (DFT) method is chosen to identify a phase reversal in the input tone, which indicates the transmission of a binary 1. When there is no phase reversal, the transmitted bit is binary 0. The DFT is applied every 16 samples of the received DPSK signal. The transmitted signal is filtered with a bandpass filter centered at the carrier frequency. In this implementation two filters are designed, one centered at 1200Hz and the other at 2400Hz. These filters are in accordance with Figure 13/V.34 of the V.34 Recommendation [3]. The correctness and the real-time constraints of the implementation have been checked.
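The differential coding, the LUT tone generation and the block DFT detection described above can be illustrated with a minimal sketch. It is not the authors' code: it assumes an ideal channel, perfect bit alignment and an initial reference symbol sent before the data (that reference symbol and all function names are assumptions made for illustration), and it omits the bandpass filters and the sampling rate conversion.

```python
import cmath
import math

FS = 9600              # sampling frequency in Hz (from the text)
FC = 1200              # carrier frequency in Hz (from the text)
SAMPLES_PER_BIT = 16   # 9600 Hz / 600 bits/s

# One carrier period is FS / FC = 8 samples, so an 8-entry table is enough.
COS_LUT = [math.cos(2.0 * math.pi * k / 8) for k in range(8)]


def dpsk_modulate(bits):
    """Return samples for a reference symbol followed by the coded bits."""
    sign = 1.0
    # Assumption: one unmodulated reference symbol precedes the data.
    samples = [sign * COS_LUT[n % 8] for n in range(SAMPLES_PER_BIT)]
    for bit in bits:
        if bit == 1:
            sign = -sign   # 180-degree rotation for a binary 1
        samples.extend(sign * COS_LUT[n % 8] for n in range(SAMPLES_PER_BIT))
    return samples


def dft_bin(block, freq, fs=FS):
    """Single-frequency DFT of one block of samples."""
    w = -2j * math.pi * freq / fs
    return sum(x * cmath.exp(w * n) for n, x in enumerate(block))


def dpsk_demodulate(samples):
    """Detect a phase reversal between consecutive 16-sample blocks."""
    blocks = [samples[i:i + SAMPLES_PER_BIT]
              for i in range(0, len(samples), SAMPLES_PER_BIT)]
    previous = dft_bin(blocks[0], FC)
    bits = []
    for block in blocks[1:]:
        current = dft_bin(block, FC)
        # A negative real part means the two phases differ by about 180 degrees.
        bits.append(1 if (current * previous.conjugate()).real < 0 else 0)
        previous = current
    return bits
```

Calling `dpsk_demodulate(dpsk_modulate(bits))` on a list of 0/1 values returns the same list under these idealized assumptions. Only one DFT bin is evaluated per bit, which keeps the per-bit cost at 16 complex multiply-accumulate operations.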
Another design of the DPSK transmitter/receiver, which the graduate students can also implement on the TMS320C6711, uses a noncoherent DPSK demodulator such as the one described in [5].

4. FSK TRANSMITTER/RECEIVER DESIGN

This section presents an implementation of a DSP-based 300 bits/s binary FSK transmitter and receiver. In this implementation, the mean carrier frequency is 1080Hz and the frequency deviation is ±100Hz. The higher frequency corresponds to the transmission of binary 0. This kind of FSK transmitter/receiver is the low-band channel of the ITU-T V.21 Recommendation. The sampling frequency of the FSK transmitter/receiver is set to 9600Hz, as in the case of the presented DPSK transmitter/receiver. Thus, there are 32 samples per transmitted bit in this case. Since the codec of the TMS320C6711 DSK module has a sampling frequency of 8000Hz, a sampling rate converter [1] is employed again to change the sampling frequency of the transmitted signal from 9600Hz to 8000Hz and that of the received signal from 8000Hz to 9600Hz.

For the receiver implementation, a DFT of the input FSK signal samples is employed. The DFT values are computed at 980Hz and 1180Hz, and the larger DFT value selects the transmitted bit. For example, if the DFT value at 980Hz is larger than the DFT value at 1180Hz, the transmitted bit is binary 1. The DFT function is the same as in the DPSK implementation. Alternatively, an FSK receiver design that the graduate students can also implement uses a bank of several resonator-type filters operating in parallel. If sufficient energy is present at the various frequencies for the correct amount of time, the appropriate bits are detected.

5. V.34 TRANSMITTER/RECEIVER DESIGN

In this section, the architectures of the V.34 transmitter and receiver that were considered for implementation are presented. The most challenging task is the receiver's architecture, since it is not defined in the V.34 Recommendation. Various types of V.34 receivers were simulated with MATLAB™, since a variety of them have been proposed in the literature. It was concluded that an effective receiver structure for DSP software implementation is the one given in sub-section 5.2. In the following, main emphasis is given to the V.34 receiver architecture. The graduate students of the proposed laboratory course are encouraged to implement different receiver structures, so as to evaluate their performance and effectiveness in a real-world environment, the TMS320C6711 DSK.

5.1. V.34 transmitter

The V.34 Recommendation [3] explicitly defines the structure of the transmitter, so for a V.34 compliant modem an engineer must follow the Recommendation's specifications. The transmitter consists of the V.34 encoder and the Quadrature Amplitude Modulation (QAM) unit [5], [6].
V.34 supports six symbol rates, in contrast to previous modem Recommendations such as V.32. Three of the symbol rates (2400, 3000 and 3200 Hz) are mandatory, while the remaining three (2743, 2800 and 3429 Hz) are optional for the design of a V.34 modem.

The V.34 encoder consists of three units that correspond to the stages through which the data are encoded for transmission. The first of these units, called parse, accepts a stream of binary input data, scrambles it, and then partitions the scrambled bits into different groups to be passed to the next unit. The second logical unit, point-select, uses the parsed bits to select signal points from a constellation of 2-Dimensional (2D) points that has been specified for use in V.34. The third logical unit, precode, applies a precoding filter to the signal points to compensate for the noise enhancement caused by the linear adaptive equalizer in the V.34 receiver. This unit also contains the trellis encoder [6], connected in a feedback configuration, which ensures that the transmitted points correspond to a proper trellis sequence.

The goal of the V.34 encoder is to map binary input data to an output sequence of 2D signal points. These points are then modulated using QAM at a specified carrier frequency for transmission over an analog channel, as shown in the general V.34 transmitter diagram in Fig. 3. The pulse that modulates the carrier signal has an infinitely wide frequency spectrum. Thus, if it is transmitted as it is, InterSymbol Interference (ISI) is caused [6]. For this reason, a pulse shaping lowpass filter (Square Root Raised Cosine) is inserted to shape the pulses so that QAM demodulation is possible without ISI [5], [6].
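The role of the Square Root Raised Cosine filter can be checked numerically with a minimal sketch. The roll-off factor 0.35, the truncation to 6 symbols on each side and the function names are assumptions made for illustration; the paper does not state the filter parameters. The sketch cascades two identical Square Root Raised Cosine filters and measures the residual ISI at symbol-spaced offsets from the main peak.

```python
import math


def srrc_taps(beta, sps, span):
    """Square Root Raised Cosine taps with the symbol period normalized to 1.

    beta: roll-off factor, sps: samples per symbol, span: symbols per side.
    """
    taps = []
    for n in range(-span * sps, span * sps + 1):
        t = n / sps
        if n == 0:
            h = 1.0 - beta + 4.0 * beta / math.pi
        elif abs(abs(4.0 * beta * t) - 1.0) < 1e-9:
            # Limit value where the general formula is 0/0.
            h = (beta / math.sqrt(2.0)) * (
                (1.0 + 2.0 / math.pi) * math.sin(math.pi / (4.0 * beta))
                + (1.0 - 2.0 / math.pi) * math.cos(math.pi / (4.0 * beta)))
        else:
            num = (math.sin(math.pi * t * (1.0 - beta))
                   + 4.0 * beta * t * math.cos(math.pi * t * (1.0 + beta)))
            den = math.pi * t * (1.0 - (4.0 * beta * t) ** 2)
            h = num / den
        taps.append(h)
    return taps


def convolve(a, b):
    """Direct-form linear convolution of two real sequences."""
    out = [0.0] * (len(a) + len(b) - 1)
    for i, x in enumerate(a):
        for j, y in enumerate(b):
            out[i + j] += x * y
    return out


SPS = 4                                         # 4 samples per symbol
taps = srrc_taps(beta=0.35, sps=SPS, span=6)    # assumed parameters
cascade = convolve(taps, taps)                  # shaping filter + identical second filter
center = len(cascade) // 2
# Cascade response at 1, 2 and 3 symbols from the peak, relative to the peak.
isi = [abs(cascade[center + k * SPS]) / cascade[center] for k in range(1, 4)]
```

The values in `isi` are small compared with the peak because the cascade approximates a Raised Cosine response, whose symbol-spaced samples away from the peak are zero; the residue comes from truncating the filter.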

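[Fig. 3 block diagram. Blocks shown: V.34 encoder, signal mapping, one pulse shaping filter and one multiplier for the real part I(n) with carrier cos(2πfcn), one pulse shaping filter and one multiplier for the imaginary part Q(n) with carrier sin(2πfcn), and an adder. Signal labels: input bits, encoded bits, modulated output.]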

Fig. 3. V.34 transmitter diagram.

5.2. V.34 receiver

The structure of the V.34 receiver is not defined by the Recommendation, so it is the designer's responsibility to select a structure, and appropriate algorithms for it, that recover the transmitted data.

[Fig. 4 block diagram. Signal path shown: analog signal, QAM demodulator, adaptive equalizer, Viterbi decoder, inverse precoder, inverse mapper, output bits. The blocks are grouped under the labels "Demodulator" and "Decoder".]

Fig. 4. V.34 receiver diagram.

A telecommunication receiver mainly performs the inverse operations of the corresponding transmitter, with additional operations that combat the impairments introduced by the channel: amplitude distortion, phase distortion and noise. The proposed and implemented V.34 receiver consists of the QAM demodulator and the decoder. The general diagram of the implemented receiver architecture is illustrated in Fig. 4.

The QAM demodulator performs the inverse operation of a QAM modulator. Nevertheless, it is a much more complex system for two basic reasons: the phase of the carrier signals must be recovered, and proper synchronization is needed for the symbol recovery. In Fig. 5, the QAM demodulator architecture used in the proposed V.34 receiver is given. It is a coherent demodulator [6], since the received QAM signal is multiplied by the carrier signals cos(2πfcn) and sin(2πfcn), where fc is the carrier frequency. At the input of the QAM demodulator the signal is sampled at Fs = k·Fsym, where k is a small integer and Fsym is the symbol rate (e.g. 3200 symbols/s). For the proposed V.34 implementation, k equals 4. At the output of the demodulator, the sampling rate is reduced to Fsym; thus the equalizer unit operates at this rate. Note that the received QAM signal at the input of the demodulator is sampled at k·Fsym after a sampling rate conversion from 8000Hz, which is the sampling frequency of the codec on the TMS320C6711 DSK module.

The matched filter in Fig. 5 is a Square Root Raised Cosine filter, as in the V.34 transmitter, so their overall response is Raised Cosine, which helps to minimize the ISI [5], [6]. The sampler uses the information from the symbol timing recovery unit to reduce the sampling rate from Fs down to the symbol rate Fsym.

The phase recovery unit adjusts the phases of the carrier signals. A mismatch between the phases of the carriers cos(2πfcn) and sin(2πfcn) generated in the transmitter and in the receiver reduces the Signal-to-Noise Ratio (SNR) in the receiver, so more errors can occur. For the phase correction, a Phase-Locked Loop (PLL) [6] was implemented with a second-order loop filter and a Numerically Controlled Oscillator (NCO) that outputs the corrected carrier frequency.

The symbol timing recovery system is directly related to the structure of the adaptive equalizer. Since a complex symbol-spaced baseband equalizer has been chosen (the adaptive equalizer unit follows the QAM demodulator block in Fig. 4), a proper timing recovery system has to be implemented.
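The mixing step of a coherent demodulator, and the effect of a carrier phase mismatch, can be illustrated with a minimal sketch. The carrier frequency of 1920 Hz, the sign convention of the modulator, the use of plain averaging in place of the matched filter, and all function names are assumptions made for illustration; only Fsym = 3200 and k = 4 come from the text.

```python
import math

FSYM = 3200        # symbol rate in symbols/s (from the text)
K = 4              # samples per symbol (from the text)
FS = K * FSYM      # demodulator input rate: 12800 Hz
FC = 1920          # carrier frequency in Hz (assumed for this sketch)


def qam_modulate(i_val, q_val, num_samples):
    """Passband samples for one constant baseband point (no pulse shaping)."""
    return [i_val * math.cos(2 * math.pi * FC * n / FS)
            - q_val * math.sin(2 * math.pi * FC * n / FS)
            for n in range(num_samples)]


def qam_demodulate(samples, phase_error=0.0):
    """Multiply by the local carriers, then average as a crude lowpass filter."""
    i_sum = 0.0
    q_sum = 0.0
    for n, s in enumerate(samples):
        theta = 2 * math.pi * FC * n / FS + phase_error
        i_sum += 2.0 * s * math.cos(theta)
        q_sum += -2.0 * s * math.sin(theta)
    return i_sum / len(samples), q_sum / len(samples)


# 40 samples hold exactly 6 carrier cycles, so the double-frequency terms average out.
passband = qam_modulate(0.5, -0.25, 40)
i_ok, q_ok = qam_demodulate(passband)
i_bad, q_bad = qam_demodulate(passband, phase_error=0.3)
```

With a zero phase error the pair (0.5, -0.25) is recovered. With a phase error φ the in-phase output becomes I·cos(φ) + Q·sin(φ), so part of the quadrature component leaks into the in-phase branch; this is the degradation that the phase recovery unit is there to remove.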

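[Fig. 5 block diagram. The input signal, sampled at Fs = kFsym, feeds two branches. One branch is multiplied by cos(2πfcn) and the other, through a π/2 phase shift, by sin(2πfcn). Each branch contains a matched filter and a sampler, giving the real part I'(n) and the imaginary part Q'(n) at rate Fsym for the equalizer. Phase recovery and timing recovery blocks are also shown.]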

Fig. 5. QAM coherent demodulator.

In a digital receiver, the goal is to strobe the sampled signal at the peak of the symbol, which corresponds to the maximum eye opening and reduces the ISI [6]. However, the sampled signal values in a digital receiver are shifted in time relative to the samples of the transmitter. This shift can be constant, or it can vary because of a mismatch between the sampling clocks of the transmitter and the receiver. The original samples can be recovered either by adjusting the local clock phase of the receiver or by digital interpolation of the signal [7]. In the proposed V.34 receiver architecture, a solution has been chosen that uses a fixed sampling clock and interpolates the received signal samples to calculate the intermediate values. In Fig. 6, the symbol timing recovery scheme implemented in the V.34 receiver is shown. The interpolation filter is a cubic filter with the Farrow structure [8]. The symbol timing recovery unit outputs samples at the symbol rate (i.e. received symbols), which are input to the adaptive equalizer unit.
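A cubic interpolator in Farrow form can be sketched as follows. The sketch assumes cubic Lagrange coefficients, which are a common choice but are not confirmed by the paper; the function names and the sample indexing are also assumptions. The four neighboring samples are combined into polynomial coefficients once, and the fractional delay is then applied with Horner's rule.

```python
def farrow_cubic(x_m1, x_0, x_p1, x_p2, mu):
    """Cubic Lagrange interpolation between x_0 and x_p1 for 0 <= mu < 1.

    The c-values are the Farrow branch outputs; mu enters only in Horner's rule.
    """
    c0 = x_0
    c1 = -x_m1 / 3.0 - x_0 / 2.0 + x_p1 - x_p2 / 6.0
    c2 = x_m1 / 2.0 - x_0 + x_p1 / 2.0
    c3 = -x_m1 / 6.0 + x_0 / 2.0 - x_p1 / 2.0 + x_p2 / 6.0
    return ((c3 * mu + c2) * mu + c1) * mu + c0


def interpolate(samples, m, mu):
    """Value at position m + mu, where m is the basepoint index."""
    return farrow_cubic(samples[m - 1], samples[m],
                        samples[m + 1], samples[m + 2], mu)
```

For example, `interpolate(samples, 2, 0.3)` estimates the signal three tenths of a sample period after `samples[2]`. Because the coefficients do not depend on the fractional delay, the delay can change on every output sample without recomputing a filter.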

[Fig. 6 block diagram. Blocks shown: A/D converter driven by a fixed clock, filter, digital interpolator filter (controlled by the fractional delay µ), variable decimator (1-of-K, integer delay m plus decimation), timing error detector, loop filter and integrator. The input is sampled at Fs = kFsym and the output to the equalizer unit is at Fs = Fsym.]

Fig. 6. Symbol timing recovery diagram.

The control value for the interpolation consists of an integer part m (the basepoint index) and a fractional part µ. The basepoint index determines, for each symbol, which sample passes to the output of the variable decimator (sampler). The fractional delay parameter is used in the interpolation filter. It determines the point between two samples at which the interpolated value must be calculated. The proposed digital symbol timing recovery method has several advantages. A fixed crystal oscillator is used instead of an expensive Voltage Controlled Crystal Oscillator (VCXO), which is not provided on the TMS320C6711 DSK. Also, this approach provides high flexibility, since different symbol rates (which is the case in V.34) can be supported.

The equalizer corrects the phase and amplitude distortion caused by the communication channel. Its structure was chosen to be a Complex Symbol Spaced Equalizer (CSSE) in baseband, as shown in Fig. 7. The decision device is a memory containing the training signal TRN [3] in training mode, and the Viterbi decoder in data mode. The equalizer's coefficients are adapted by the Least Mean Square (LMS) algorithm [6].

[Fig. 7 block diagram. The real and imaginary parts of the input are each filtered by the coefficient sets {Re(hn)} and {Im(hn)}, and the four filter outputs are combined by adders and a subtractor into the equalized real part and the equalized imaginary part.]

Fig. 7. Complex baseband equalizer structure.

In Fig. 8, the real and imaginary error signals are displayed. The error signals show that the equalizer converges satisfactorily and that the steady-state error is small, so the equalizer compensates well for the channel distortion.

The output of the equalizer is a sequence of noise-corrupted 2D points. The first step of the decoder unit is to decide to which ideal constellation points these distorted points correspond. This task is accomplished by the Viterbi decoder. Next, groups of eight points (i.e., one mapping frame) are passed through the inverse precoder and the inverse mapper to be decoded into the output data stream. The key concept behind the operation of the Viterbi algorithm is the idea of a trellis sequence that is generated in the transmitter by the trellis encoder [9].
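The LMS adaptation of a complex symbol-spaced equalizer in training mode, mentioned earlier in this section, can be illustrated with a minimal sketch. It is not the authors' implementation: the two-tap channel, the 4-point training constellation (used here in place of the TRN signal), the number of taps, the step size and all names are hypothetical, and no noise is added.

```python
import random


def lms_equalize(received, desired, num_taps=11, step=0.03):
    """Train a complex symbol-spaced FIR equalizer with LMS.

    Returns the final weights and the error sequence e = d - y.
    """
    weights = [0j] * num_taps
    weights[0] = 1 + 0j                      # start as a pass-through filter
    delay_line = [0j] * num_taps
    errors = []
    for x, d in zip(received, desired):
        delay_line = [x] + delay_line[:-1]   # newest sample first
        y = sum(w * v for w, v in zip(weights, delay_line))
        e = d - y
        # Complex LMS update: w <- w + step * e * conj(x)
        weights = [w + step * e * v.conjugate()
                   for w, v in zip(weights, delay_line)]
        errors.append(e)
    return weights, errors


rng = random.Random(0)
scale = 2 ** -0.5
# Hypothetical unit-power 4-point training sequence of 1500 symbols.
training = [complex(rng.choice((-1, 1)), rng.choice((-1, 1))) * scale
            for _ in range(1500)]
channel = [1 + 0j, 0.3 + 0.2j]               # hypothetical mild channel
received = [sum(h * training[n - k] for k, h in enumerate(channel) if n - k >= 0)
            for n in range(len(training))]
weights, errors = lms_equalize(received, training)
initial_mse = sum(abs(e) ** 2 for e in errors[:100]) / 100
final_mse = sum(abs(e) ** 2 for e in errors[-100:]) / 100
```

The real and imaginary parts of `errors` play the role of the two error signals plotted in Fig. 8: they start large and decay as the coefficients converge. In data mode the desired symbol would come from the decision device instead of a stored training sequence.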

[Fig. 8 plots. Two panels titled "Error signal - real part" and "Error signal - imag part"; horizontal axis from 0 to 1500, vertical axis from -1 to 1 with a scale factor of x 10^4.]

Fig. 8. Error signals of the CSSE baseband equalizer.

The outputs of the Viterbi decoder correspond to the estimated trellis sequence that was transmitted. However, because of the transmitter's nonlinear precoder, this trellis sequence is not necessarily the actual sequence of points that was selected by the transmitter's point-select unit. To account for this possible discrepancy, the trellis sequence from the Viterbi decoder must be inversely precoded. Ideally, the input to the inverse mapper is the exact sequence of points that was output from the point-select stage of the transmitter. To retrieve the original sequence of bits from these points, it is necessary to invert the operation of the: a) differential encoder, b) shell mapper, c) simple mapper, d) scrambler, and e) parser. Undoing these operations is very similar to the encoding methods themselves.

6. IMPLEMENTATION RESULTS

The presented communication systems have been implemented by the authors and executed on the TMS320C6711 DSK module. Their proper operation has been checked using the debugging capabilities of Code Composer Studio™ and by monitoring and saving signals at the various stages of the considered transmitters/receivers. The programs were written in ANSI C to shorten the development time. Since the TMS320C6x platform has a powerful optimizing compiler, developing the programs in a high-level language is not a disadvantage. For computationally intensive parts, like the FIR filtering, we have used assembly-optimized functions provided by Texas Instruments [10]. The designs were compiled using the options -o3 -op2 -pmm, which correspond to compiling for high performance.

Table 1. Execution time and memory requirements

Design       Cycles (x10^6)   Time (ms)   Memory req. (K-bytes)
Dpsk_trans   1.885            12.57       61.2 (DPSK transmitter and receiver)
Dpsk_rcv     0.347             2.31
Fsk_trans    1.300             8.67       60.5 (FSK transmitter and receiver)
Fsk_rcv      0.372             2.48
V34_trans    2.906            19.38       102.9 (V.34 transmitter and receiver)
V34_rcv      4.072            27.16
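The time column of Table 1 is consistent with the cycle counts at the 150MHz clock of the DSK, since time = cycles / clock frequency. A minimal sketch of that arithmetic follows; the dictionary and function names are illustrative only.

```python
CLOCK_HZ = 150e6   # TMS320C6711 clock frequency on the DSK (from the text)

# Cycle counts from Table 1, in millions of cycles.
cycles_millions = {
    "Dpsk_trans": 1.885,
    "Dpsk_rcv": 0.347,
    "Fsk_trans": 1.300,
    "Fsk_rcv": 0.372,
    "V34_trans": 2.906,
    "V34_rcv": 4.072,
}


def execution_time_ms(cycles_in_millions, clock_hz=CLOCK_HZ):
    """Convert a cycle count given in millions of cycles to milliseconds."""
    return cycles_in_millions * 1e6 / clock_hz * 1e3


times_ms = {name: execution_time_ms(c) for name, c in cycles_millions.items()}
```

Each computed value agrees with the corresponding entry of Table 1 to within about 0.02 ms.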

In Table 1, the 150MHz 'C6711™ CPU cycles, the execution time and the memory requirements (program and data memory) for the implementation of the DPSK, FSK and V.34 transmitters/receivers are presented. The memory requirements are given for the transmitter and receiver of each design together, since they have been compiled together. For the V.34, the results in Table 1 are for the symbol rate of 3200Hz. The cycles and execution time results were obtained under the assumption that the program is stored in external memory and the data are stored in the internal memory of the processor. Table 1 shows that the real-time requirements are satisfied.

7. CONCLUSIONS - FUTURE ACTIVITIES

A real-time DSP communication course based on the Texas Instruments TMS320C6711 DSK has been proposed in this paper. Three communication systems that are part of international telecom standards have been presented. They have been implemented by the authors and executed on the TMS320C6711 DSK module. The results on the TMS320C6711 show that the real-time requirements of these designs are met, even in the case of the V.34 receiver, which is the most computationally intensive system in this paper. The students attending the proposed laboratory course will study the presented architectures and will either implement them more efficiently or consider different architectures. In the future, more contemporary communication systems than the ones presented in this paper, like the V.90, will be a topic of implementation.

8. REFERENCES

[1] M. D. Galanis, A. Papazacharias, E. Zigouris, "A DSP Course For Real Time Systems Design and Implementation Based On the TMS320C6211 DSK", Proc. of 14th IEEE Intl. Conf. on Digital Signal Processing (DSP2002), Santorini, Greece, vol. 2, pp. 853-856, 2002.
[2] IEEE Working group for Wireless LANs, http://grouper.ieee.org/groups/802/11/, 2004.
[3] ITU-T Recommendation V.34, "A modem operating at data signalling rates of up to 33600 bit/s for use on the general switched telephone network and on leased point-to-point 2-wire telephone-type circuits", 1998.
[4] Texas Instruments, "TMS320C6000 CPU and Instruction Set", 2004.
[5] J. B. Anderson, Digital Transmission Engineering, IEEE Press, 1997.
[6] J. G. Proakis, M. Salehi, Communication System Engineering, Prentice Hall International Editions, 1994.
[7] F. Gardner, "Interpolation in Digital Modems - Part I: Fundamentals", IEEE Trans. on Communications, vol. 41, no. 3, pp. 501-507, March 1993.
[8] C. W. Farrow, "A continuously variable digital delay element", Proc. of IEEE Int. Symp. Circuits & Systems (ISCAS), Espoo, Finland, pp. 2641-2645, June 1988.
[9] L.-F. Wei, "Trellis-Coded Modulation Using Multidimensional Constellations", IEEE Trans. on Information Theory, vol. IT-33, pp. 483-501, July 1987.
[10] Texas Instruments Inc., "TMS320C62x/C67x Programmer's Guide", 2004.