A Software Radio Development System for Wireless Multimedia Systems
C. Gnudi (*), P. Antognoni (**), S. Cacopardi (***), F. Frescura (***), M. Vagheggini (*)
(*) Digilab2000 - Foligno (PG) - Italy
(**) CNIT - Unità di Ricerca di Perugia - DIEI, Università di Perugia - Perugia - Italy
(***) DIEI, Università di Perugia - Perugia - Italy
Abstract - In this paper a software radio development system and its DSP-based hardware and software architecture are presented. The proposed hardware configuration and the software analysis are intended to evaluate the capability and the costs, in terms of hardware and software, of integrating different wireless standards in one common hardware platform based on programmable DSPs.
1. Introduction
Software radio (SWR) [1][2] is a key technology in the wireless communications industry. One of the main challenges in this market is the integration of multiple systems and applications on a single terminal. Existing wireless communication standards are mainly adopted on a regional scale, so problems arise when dealing with roaming users or different markets. Software radio allows the development of transceivers that operate with several standards and in several frequency bands on a common hardware platform. This technology is expected to play a key role in several emerging application scenarios of wireless communications, including not only cellular telephony but also wireless networking and the new digital broadcasting and interactive services.
In this paper we describe DSP-based software and hardware architectures for a software radio development system, i.e. a completely DSP-based digital radio transceiver in which software radio algorithms can be tested. A similar idea is proposed in [3], but it is mainly optimized for military voice and data applications. Although the project includes hardware design, the main interest is focused on software issues. In this framework, the studies target the integration of GSM and UMTS in the cellular telephony market; IEEE 802.11, HIPERLAN and Bluetooth in the WLAN market; and DAB, DVB and DTV in the broadcasting area.
2. Hardware configuration
Since the above-mentioned application areas require different processing capabilities and power consumption, the analysis is carried out by developing a software radio board based on both C5000 and C6000 devices in a “double signal path” architecture, as shown in Fig. 1. According to this architecture, and depending on the required computational complexity, the transceiver is made up of a number of standard modules called BPUs (Basic Processing Units). Each BPU contains a C5000 and a C6000 device, each with a proper amount of dedicated external RAM. Every BPU can perform one or more processing functions (e.g. source coding/decoding, channel coding/decoding, modulation/demodulation), and these functions are performed exclusively by either the C5000 or the C6000 device, depending on the multiplexer configuration. BPU operation and synchronization can be managed by the microcontroller and monitored through the JTAG connectors. This modular hardware configuration makes it possible to design and test a wide range of software radio algorithms for different standards simply by varying the number of BPUs on both the transmitter and the receiver side and updating the microcontroller software. Thus, the whole complexity of the transceiver design is moved to the software side.
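The BPU chain described above can be pictured with a small data model. The following is only a minimal sketch: the class and function names (`Dsp`, `Bpu`, `build_chain`, `describe`) are hypothetical, and the assignment of functions to devices in the example is illustrative rather than taken from the paper.

```python
from dataclasses import dataclass
from enum import Enum
from typing import List, Sequence, Tuple


class Dsp(Enum):
    """The two DSP families present in every BPU."""
    C5000 = "C5000"
    C6000 = "C6000"


@dataclass
class Bpu:
    """One Basic Processing Unit; the MUX setting selects the active device."""
    functions: List[str]  # processing functions assigned to this BPU
    mux: Dsp              # device that currently sits in the signal path


def build_chain(plan: Sequence[Tuple[Sequence[str], Dsp]]) -> List[Bpu]:
    """Build an ordered chain of BPUs from (functions, device) pairs."""
    return [Bpu(list(funcs), dev) for funcs, dev in plan]


def describe(chain: Sequence[Bpu]) -> List[str]:
    """Return one human-readable line per BPU in the chain."""
    return [
        f"BPU{i}: {'+'.join(b.functions)} on {b.mux.value}"
        for i, b in enumerate(chain)
    ]


# Illustrative transmitter plan (an assumption, not a measured configuration).
tx = build_chain([
    (["source coding"], Dsp.C5000),
    (["channel coding", "modulation"], Dsp.C6000),
])
print("\n".join(describe(tx)))
```

Changing the standard then amounts to passing a different plan, which mirrors the idea that the design complexity moves to the software side.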
[Block diagram: transmitter (Data in to IF out, with Digital Modulator) and receiver (IF in to Data out, with Digital Demodulator), each built from a chain of BPUs (C5000 and C6000 with RAM, MUX, FIFO and JTAG) and controlled by an MCU with RAM/ROM over a shared bus.]
Fig. 1 - SWR development system for standard applications
In the framework of the integration of a pool of standards in one common hardware platform, the following issues are explored in the software area:
• Which algorithm configuration is more efficient in terms of code size and speed.
• Which standard is more critical in terms of computational cost.
• Which functional blocks of different standards may be efficiently integrated.
• Which family (C5000 or C6000) is more efficient in a specific design.
3. Architecture Optimization for Handheld and Portable Applications
Battery-powered handheld and portable devices (e.g. a GSM/UMTS/Bluetooth terminal) have critical power and space constraints, so their SWR architecture must be optimized with respect to these issues. For battery-powered applications, a commonly accepted SWR architecture adopts an RF receiver based on either a pass-band superheterodyne configuration with signal digitization at IF or a direct-conversion (zero-IF or homodyne) configuration, and all the digital processing is carried out by a flexible ASIC-DSP module (Fig. 2). In this architecture the ASIC implements functions with very high processing demand and a simple algorithm structure (e.g. correlators, high speed FIR filters), while the DSP implements the more complex but less processing-demanding functions (e.g. source/channel coding/decoding, interleaving/deinterleaving). Moreover, since the DSP accesses the ASIC resources, it is able to manage the ASIC configuration and operation for the different telecommunications standards implemented.
Fig.2 - SWR architecture for low power applications
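The ASIC/DSP split of Fig. 2 (high processing demand with a simple structure on the ASIC, more complex but less demanding functions on the DSP) can be expressed as a simple rule. The sketch below is an illustration only: the threshold `RATE_THRESHOLD_MSPS`, the sample rates and the `simple` flags are placeholder assumptions, not figures from the paper.

```python
# Hypothetical threshold: blocks running at or above this sample rate are
# treated as "very high processing demand". The value is an assumption.
RATE_THRESHOLD_MSPS = 10.0

# (name, sample rate in Msample/s, simple regular structure?)
# The rates are placeholders chosen only to exercise the rule.
BLOCKS = [
    ("correlator", 40.0, True),
    ("high speed FIR filter", 40.0, True),
    ("channel coding", 1.0, False),
    ("interleaving", 1.0, False),
]


def assign(rate_msps: float, simple_structure: bool) -> str:
    """Return 'ASIC' for high-rate, structurally simple blocks, else 'DSP'."""
    if rate_msps >= RATE_THRESHOLD_MSPS and simple_structure:
        return "ASIC"
    return "DSP"


partition = {name: assign(rate, simple) for name, rate, simple in BLOCKS}
for name, target in partition.items():
    print(f"{name:22s} -> {target}")
```

With these placeholder inputs the rule reproduces the examples given in the text: correlators and FIR filters go to the ASIC, coding and interleaving to the DSP.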
While this configuration optimizes performance with respect to power consumption and space constraints, it is only an approximation of an ideal SWR design. To obtain the required flexibility, the ASIC must either be designed to perform different (but fixed) functions that are selected and configured by the DSP core, or be designed as a specialized programmable processor (a “coprocessor”). The hardware and software architecture described in the previous section can address these issues. In particular, the proposed modular hardware structure allows the integration of an FPGA-based BPU with proper interfaces to the DSP modules. The FPGA-based BPU can be considered a valid prototype of a new ASIC layout, so different low-power SWR configurations can be designed and evaluated. For low-power applications the proposed SWR development system may be configured as shown in Fig. 3.
Fig.3 – SWR development system with DSP and FPGA based BPUs, optimized for low power applications
In order to optimize the performance of the SWR architecture with respect to space and power consumption constraints, a design process is carried out that compares optimal (with regard to a specified reference hardware structure) and suboptimal implementations. In particular, we take the following issues into account:
• The number of bits in the ADC and DAC converters and in the ASIC. ADCs show a near exponential relationship between increased resolution and dissipated power. ASIC area and power consumption are also a polynomial function of the number of bits. This means that the choice of the ADC/DAC resolution must trade off the requirements of the different telecommunications standards implemented against the complexity of the power control and AGC subsystems.
• The splitting of processing functions between the ASIC and the DSP. A key aspect of the optimization is the proper assignment of the processing functions to the ASIC and the DSP. Implementing functions on an ASIC that can be configured at run time to support a range of telecommunications standards does not always help to reduce space and power consumption, since different functions correspond to increased ASIC area. Moreover, the number of functions performed by the DSP should be maximized in order to maintain a high degree of flexibility.
• The impact of different suboptimal algorithms on the radio performance. In portable applications all the power for transmission and signal processing is provided by the batteries, so both consumption requirements must be minimized. This means that any suboptimal algorithm must be carefully evaluated in terms of performance degradation with respect to the ideal case, because this degradation leads directly to an increased power requirement for transmission and/or to a reduced receiver sensitivity.
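The resolution trade-off in the first item above can be made concrete with a short calculation. This is a hedged sketch: the ideal quantization SNR of a full-scale sine wave, 6.02 N + 1.76 dB, is a standard result, but the power model (dissipation doubling with every added bit) is only one possible reading of the "near exponential" relationship, and the per-standard SNR requirements are hypothetical placeholders.

```python
def ideal_snr_db(bits: int) -> float:
    """Ideal quantization SNR of an N-bit converter for a full-scale sine wave."""
    return 6.02 * bits + 1.76


def relative_power(bits: int, ref_bits: int = 8) -> float:
    """Power relative to a ref_bits converter, ASSUMING power doubles per bit."""
    return 2.0 ** (bits - ref_bits)


def min_bits_for_snr(required_snr_db: float) -> int:
    """Smallest resolution whose ideal SNR meets the requirement."""
    bits = 1
    while ideal_snr_db(bits) < required_snr_db:
        bits += 1
    return bits


# Hypothetical SNR requirements per supported standard (placeholders).
requirements_db = {"standard A": 50.0, "standard B": 60.0}

# A multi-standard terminal must satisfy the most demanding standard.
bits_needed = max(min_bits_for_snr(snr) for snr in requirements_db.values())
print(bits_needed, ideal_snr_db(bits_needed), relative_power(bits_needed))
```

Under these assumptions, supporting the more demanding standard forces every standard to pay for the higher resolution, which is the cost that better AGC or power control can reduce.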
[Flowchart. Boxes recovered from the figure: Algorithms Definition & Specification; Optimal design; Suboptimal design; Radio Performance (in both branches); DSP / ASIC (in both branches); DSP; Cost function F minimization (yes/no decision, in both branches); END. Cost function: F(power consumption, chip area, computational complexity, cost of components, …)]
Fig. 4 - Development method for SWR low power applications
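The cost function F of Fig. 4 can be sketched as a weighted sum of parameters normalized to a reference design. This is only one possible form: the paper does not give the expression of F, so the weights, the normalization, the field names and all numeric values below are assumptions made for illustration.

```python
# Hypothetical weights (they sum to 1 so that the reference design scores 1.0).
WEIGHTS = {"power": 0.4, "area": 0.3, "mips": 0.2, "cost": 0.1}

# Reference (optimal) design used for normalization; values are placeholders.
REFERENCE = {"power": 100.0, "area": 50.0, "mips": 200.0, "cost": 10.0}

# Candidate suboptimal designs with their radio performance loss in dB.
CANDIDATES = [
    {"name": "A", "power": 80.0, "area": 50.0, "mips": 150.0, "cost": 10.0,
     "loss_db": 0.5},
    {"name": "B", "power": 60.0, "area": 40.0, "mips": 120.0, "cost": 9.0,
     "loss_db": 2.5},
]

MAX_LOSS_DB = 1.0  # assumed limit on acceptable performance degradation


def cost_function(design, reference=REFERENCE, weights=WEIGHTS):
    """Weighted sum of the design parameters, each normalized to the reference."""
    return sum(w * design[k] / reference[k] for k, w in weights.items())


# First check radio performance, then minimize F among acceptable designs.
acceptable = [d for d in CANDIDATES if d["loss_db"] <= MAX_LOSS_DB]
best = min(acceptable, key=cost_function)
print(best["name"], round(cost_function(best), 3))
```

Design B has the lower raw cost but is discarded by the performance check, which reflects the ordering in the flowchart: radio performance is evaluated before the cost function is minimized.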
The comparison of optimal and suboptimal designs is carried out according to the development guidelines shown in the flowchart of Fig. 4. Following these criteria, we define a cost function F that combines different parameters (with proper weights), such as overall power consumption (processing + transmission), chip area (estimated from the FPGA area used by a given implementation), computational complexity (in the DSP), cost of components and other aspects. This cost function may be used to compare different approaches after an evaluation of the radio performance and of the splitting of processing functions between the ASIC and the DSP. In particular, for suboptimal algorithms, the analysis for the DSP devices covers the number of MIPS required on both the C5000 and C6000 families, the number and type of peripherals involved, and the memory utilization. The suboptimal design may be optimized by refining both the algorithms and the ASIC-DSP function assignment.
4. An application example: GMBS
In this section we present an example of a software radio application, evaluating the complexity requirements of a software defined radio transceiver for the Global Mobile Broadband System (GMBS), to be implemented in the framework of the SUITED project [4]. In particular, the GMBS terminal will be able to interact with terrestrial standards such as UMTS, IEEE 802.11 and GPRS.
The implementation of the different telecommunication standards can be reduced to the implementation and optimization of some basic processing blocks. Although the transmission parameters differ, owing to the large variation in useful throughput and bit stream protection, common algorithms are adopted for tasks such as encoding and interleaving. Thus, with a modular design architecture in mind, the physical layer of each standard can be segmented into basic processing units, each able to supply the MIPS required by its block. Fig. 5 compares the physical layers of IEEE 802.11, UMTS and GPRS at the transmitter side. A preliminary examination made it possible to estimate the computational requirements and to split the transmission chain into several pieces, each implemented by a single DSP-based processing unit. However, since operations such as symbol shaping or spreading are performed on a high sampling rate stream, FPGA units or ASIC devices are also required.
[Block diagram; blocks recovered per standard. The Higher Layers Interface and BB - IF Interface blocks also appear in the figure.
IEEE 802.11: Scrambling; Convolutional Encoder rate 1/2, K = 7; Puncturing rate 2/3, 3/4; Block Bit Interleaving; Mapping BPSK, QPSK, (16/64)-QAM; IFFT + Guard Band Addition; Symbol Shaping.
UMTS: CRC + Segmentation; Convolutional Encoder rate 1/2, 1/3; Rate Matching; Block Bit Interleaving; Segmentation + Multiplexing; Block Bit Interleaving; Spreading + Amplification.
GPRS: Block Encoder; Precoding and Tail; Convolutional Encoder rate 1/2, K = 4; Puncturing rate 2/3, 3/4; Block Interleaving.
Legend: DSP BPU; ASIC / FPGA BPU.]
Fig. 5: Splitting of the algorithms between DSP BPUs and ASIC/FPGA BPUs, for the selected standards, at the transmitter side.
Some remarks can be made for each standard:
IEEE 802.11: because of the high throughput of this W-LAN standard (up to 54 Mbps) [5], a dedicated unit has to perform OFDM modulation, i.e. an IFFT operation. Symbol shaping is left to FPGA or ASIC devices. This is the most computationally demanding standard. Estimated computational complexity: 1000 MIPS.
UMTS: according to the 3GPP drafts [6], only spreading/amplification requires an ASIC/FPGA device, while encoding, interleaving and rate matching are carried out by DSP units. Estimated computational complexity: 1000 MIPS.
GPRS: a single DSP unit is sufficient. Its similarities with GSM make it quite easy to implement [7]. Estimated computational complexity: 100 MIPS.
The algorithms have been developed in C, optimized with the DSP development tools, mainly because programming times are shorter. However, some functional blocks have been implemented in parallel assembly code on the TI C62x to optimize performance. The results achieved so far are listed in Tab. 1. For each implemented algorithm, Tab. 1 reports the number of CPU clock cycles, the length of the processed data block in bits, the number of cycles per processed bit, the core on which the algorithm has been developed, the programming language and, for the C62x architecture, the average number of instructions executed per clock cycle.

Function                 CPU clocks   Block (bits)   CPU clocks/bit   DSP core   Language       Avg. instr./clock cycle
Scrambling               28           127            0.220            C55x       C              -
Convolutional Encoder    104          16             6.5              C55x       C              -
Puncturing               14           16             0.875            C55x       C              -
Time Interleaving        17920        17920          1                C62x       Parallel Asm   6.5
256 Point IFFT/FFT (*)   222375       -              -                C62x       Parallel Asm   7.25
(*) includes bit reverse ordering
Tab. 1. Computational complexity of basic processing functions for the selected standards.

Conclusions
The proposed hardware configuration and the software analysis are intended to evaluate the capability and the costs, in terms of hardware and software, of integrating different wireless standards in one common hardware platform based on programmable DSPs. An example application concerning a multi-standard transceiver has been presented.

References
[1] J. Mitola, "The software radio architecture," IEEE Commun. Mag., vol. 33, no. 5, May 1995.
[2] V. Bose et al., "Virtual Radios," IEEE JSAC, April 1999.
[3] S. Reichhart, B. Youmans, R. Dygert, "The Software Radio Development System," IEEE Personal Communications, vol. 6, no. 4, August 1999.
[4] P. Conforto, C. Tocci, G. Losquadro, M. Luglio, R. Sheriff, "SUITED/GMBS System Architecture," IST Mobile Communications Summit, 2000.
[5] IEEE, IEEE Std 802.11, IEEE Std 802.11-a, IEEE Std 802.11-b, 1997/1999.
[6] 3GPP, TS 25.212 V 3.2.0, March 2000.
[7] C. Bettstetter, H. Voegel, J. Eberspaecher, "GSM Phase 2+ - General Packet Radio Service GPRS: Architecture, Protocols, and Air Interface," IEEE Communications Surveys, vol. 2, no. 3, 3rd quarter 1999.