
Reconfigurable Computing Systems Design: Issues at System-Level Architectures

K. Solomon Raju, M. V. Kartikeyan, Senior Member, IEEE, R. C. Joshi and Chandra Shekhar

Abstract— Reconfigurable computing systems (RCSs) are emerging as an important new paradigm of system design for meeting the performance and flexibility demands of present and future applications. In this paper, we discuss the issues involved in the design space of reconfigurable computing systems. We identify nine key steps in RCS design: application analysis; system partitioning into hardware (HW) and software (SW); architectural design space analysis; mapping of the design library onto the architecture; partitioning of the HW part into fixed HW and reconfigurable logic units (RLUs); the reconfiguration process; HW and SW synthesis; compilation and scheduling of tasks; and integration of all the components. We briefly describe the different models, architectures, compilation and scheduling of tasks, reconfiguration methods, optimal mapping of the design library onto the RLU, and the state of the art of RCSs. Finally, we explain how we plan to address some of these issues in our system design.

Index Terms— RCS, ASIP, Reconfiguration and SystemC

I. INTRODUCTION

Conventional computing offers primarily two methods for executing algorithms. The first is von Neumann computing, in which microprocessors or microcontrollers are programmed in software. The second is to build application-specific processors or integrated circuits. The first method is more flexible but sacrifices performance. In the latter method the system is designed for a particular application, so it may not be cost-effective to modify it or to add features.

Fig. 1. Makimoto’s wave, as modified by Prof. Reiner Hartenstein [2].

To fill the gap between the HW (ASIC/ASIP) and SW (microprocessor) approaches, one can use reconfigurable computing systems (RCSs). A general definition of reconfigurable computing (RC) is “computing via a post-fabrication and spatially and temporally programmed connection of processing elements” [1]. Prof. Reiner Hartenstein has used different terminology, such as Morphware, Configware and Flowware, in connection with the development of RCSs [2]. Makimoto’s wave, which shows how the system design paradigm has shifted over time to meet application and user demands, is shown in Fig. 1 as modified by Prof. Hartenstein.

To design an RCS for any application, one must be aware of the technology of the configurable device and must answer several questions. How does one design an algorithm with that technology? What are the different models and architectures? How is a design compiled before execution? Answering these questions is the aim of this paper.

The design process involves several steps, starting from executable specifications. We have identified nine key steps in RCS design: application analysis; system partitioning into hardware (HW) and software (SW); architectural design space analysis; mapping of the design library onto the architecture; partitioning of the HW part into fixed HW and RLUs; the reconfiguration process; HW and SW synthesis; compilation and scheduling of tasks; and integration of all the components. At present no single tool performs all of these steps automatically, so EDA tools still need to incorporate them. The process of on-the-fly reconfiguration is known as run-time reconfiguration (RTR). Architectures that implement RTR may not be well served by present architectural design techniques.

Section II provides an overview of RCSs, including their characteristics, classification and different models. Section III discusses related work, the scope for future research and open issues in reconfigurable computing systems. Section IV presents our methodology for obtaining an optimized architecture for a given application and for mapping the design library onto the RLU. Section V gives the conclusion and future work. References are given in the last section.

K. Solomon Raju, M. V. Kartikeyan and R. C. Joshi are with Electronics and Computer Engineering Department, Indian Institute of Technology, Roorkee – 247 667, India. E-mails: [email protected], [email protected], [email protected] Chandra Shekhar is with Central Electronics Engineering Research Institute (CEERI) Pilani-333031, India – 221 005, India. E-mail: [email protected]
II. RECONFIGURABLE COMPUTING SYSTEMS

Conventional computing systems implement a system with fixed HW and a variable algorithm (SW), whereas in an RCS both the HW and the algorithm (SW) are variable. With this technique, most of the future computational demands of consumer as well as scientific supercomputing applications can be met. In this section we briefly explain the characteristics, classification and design patterns of RCSs.

Characteristics of RCS

A reconfigurable computing system normally consists of a matrix of programmable computational units with a programmable interconnection network superimposed on the computational matrix. The main characteristics of an RCS are [1]: 1. spatial computation, 2. configurable datapath, 3. distributed control and 4. distributed resources.

Classification of reconfigurable architectures (RAs)

A large number of reconfigurable architectures have been developed over the years by researchers and industry. Reconfigurable architectures can be classified on the basis of several different parameters. The classification taxonomy presented in [3] uses the following criteria:
• Granularity: fine grain (e.g., Xilinx 6200 series; Altera FLEX 10K, which is fine grained but coarser than the 6200; 1-4 bits), coarse grain (e.g., Chameleon, RaPiD architecture; 4-32 bits) and medium grain (e.g., Garp, CHESS; 2-16 bits).
• Host coupling (degree of coupling): loose system-level coupling (e.g., Splash), loose chip-level coupling (e.g., systems where an FPGA is used as a coprocessor) and tight on-chip coupling (e.g., Virtex-II Pro with an embedded PowerPC 405 processor).
• Type of interconnect network: a fixed external network for communication between the host and the RLU, and a reconfigurable external network for communication among configurable logic blocks and functional blocks along with the host.
• Type of configuration allowed: static configuration, dynamic configuration, and partial or run-time configuration.

Reconfigurable Computing Models

The following models, given in [1] and [3], can be used for simulating and architecting an RCS: 1. single context, 2. multi-context, 3. partially reconfigurable, 4. pipeline reconfigurable, 5. mesh model and 6. hybrid system architecture model.

Configuration

Because run-time reconfigurable systems involve reconfiguration during program execution, the reconfiguration must be done as efficiently and as quickly as possible. This ensures that the overhead of the reconfiguration does not eclipse the benefit gained by hardware acceleration.

A number of tactics exist for reducing the configuration overhead: configuration prefetching, configuration compression, configuration caching, relocation and defragmentation, and partial reconfiguration [3].
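The overhead argument can be made concrete with a small sketch. The Python below is illustrative only and is not taken from [3]: it simulates a hypothetical configuration cache with least-recently-used replacement and checks whether the time saved by hardware acceleration still outweighs the time spent loading configurations. The function names, the LRU policy and all timing values are assumptions chosen for the example.

```python
from collections import OrderedDict


def reconfiguration_overhead(task_sequence, load_time, cache_slots):
    """Total time spent loading configurations for a sequence of tasks.

    A configuration that is already resident in the (hypothetical) cache
    costs nothing; a miss costs load_time and evicts the least recently
    used configuration when the cache is full.
    """
    cache = OrderedDict()  # keys are configuration names, order = recency
    overhead = 0.0
    for config in task_sequence:
        if config in cache:
            cache.move_to_end(config)      # hit: refresh recency, no reload
            continue
        overhead += load_time              # miss: pay the reconfiguration cost
        if len(cache) >= cache_slots:
            cache.popitem(last=False)      # evict least recently used entry
        cache[config] = True
    return overhead


def net_gain(task_sequence, sw_time, hw_time, load_time, cache_slots):
    """Time saved by hardware execution minus reconfiguration overhead."""
    saved = len(task_sequence) * (sw_time - hw_time)
    return saved - reconfiguration_overhead(task_sequence, load_time, cache_slots)


if __name__ == "__main__":
    # Assumed workload: three kernels invoked in an interleaved pattern.
    sequence = ["fir", "fft", "fir", "viterbi", "fir", "fft"]
    for slots in (1, 2, 3):
        gain = net_gain(sequence, sw_time=5.0, hw_time=1.0,
                        load_time=6.0, cache_slots=slots)
        print(f"cache slots={slots}: net gain={gain:.1f}")
```

With these assumed numbers, a single cache slot makes reconfiguration cost more than the acceleration saves, while three slots leave a positive net gain. Prefetching or compression would change the effective `load_time` rather than the number of misses.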

III. RELATED WORK AND SCOPE FOR RESEARCH IN RCS

The main tasks in RCS design are HW-SW partitioning at the specification level, design space exploration for architectural modeling, HW-SW partitioning at the architectural level, scheduling of the partitioned blocks or functional units, and mapping of the HW components onto the reconfigurable logic units (RLUs). The entire process can be divided into the nine key steps mentioned in Section I. The advantages of reconfigurable computing systems (RCSs) can be realized only with a proper environment and CAD tools that cover these nine design steps. Building these tasks into EDA tools leaves considerable scope for research. This section explains some of the contributions made so far, along with their limitations. The scope for research in a few of these areas is as follows:

1. Design of architectures for RCS
2. CAD Synthesis Tools and Compilers for Dynamically Reconfigurable Hardware
3. Task Scheduling for Reconfigurable Hardware
4. HW/SW Codesign Techniques for Dynamically Reconfigurable Architectures

Design of architectures for RCS

The architectural aspects of any system are of great importance to the system designer, and to date all architectures for RCSs are either application oriented or built around a fixed target processor and RLU. The investigation of architectural aspects of RCSs is therefore an active topic for investigators. RCSs can be classified as fine-grain and/or coarse-grain architectures. Some of the architectures developed at the research level are [4]: 1. Datapath FPGA (DP-FPGA); 2. Colt; 3. Garp; 4. Reconfigurable Architecture Workstation (RAW); 5. MorphoSys (Morphoing System), which has a MIPS-like "TinyRISC" processor; 6. the Reconfigurable Pipelined Datapath (RaPiD), which aims at speeding up highly regular, computation-intensive tasks by deep pipelines on its 1-D RA. None of the architectures in this list targets general applications, and none can be used without its own software. Most of them are available for academic purposes only and are not feasible for commercial systems. Technology now offers partially reconfigurable SRAM-based field programmable gate arrays (FPGAs), such as the Virtex-II Pro and Virtex-4 from Xilinx and the Stratix II from Altera, which provide partial reconfiguration with almost SoC-level gate capacity per chip.

CAD Synthesis Tools and Compilers for Dynamically Reconfigurable Hardware

Compilers for reconfigurable computing usually borrow the techniques commonly used for parallelism extraction. They take advantage of research on SIMD and MIMD parallel machines or VLIW [5]. Most of them are developed for a particular architecture (commonly coarse grain), although the CAMERON [6] and DEFACTO [7] projects target a general reconfigurable architecture. All these compilers share a very important feature: they use C or very similar notations as the input format.
Some compilers [8], [5], [7] use the Stanford University Intermediate Format (SUIF) compiler infrastructure [9], which provides a general environment for exploiting machine-independent optimizations. The temporal partitioning problem is formulated as an integer linear programming (ILP) model and solved with an ILP solver. Present tools do not address important issues such as incremental reconfiguration and synthesis of the control part only. Compilation for RCSs currently takes place in at least two environments, a C/C++ environment and a standard HDL simulation environment, which makes design and testing difficult and time-consuming and introduces interface problems. That is, multiple languages and multiple environments on multiple platforms are required to complete the design cycle. These tools are also not suitable for the latest high-end chips such as the Virtex-II Pro, Virtex-4 and Stratix II.

Task Scheduling for Reconfigurable Hardware

The scheduling problem in reconfigurable computing is relatively new. Most approaches are versions of existing high-level synthesis (HLS) techniques, extended to consider specific features of reconfigurable systems such as the reconfiguration time. HW/SW scheduling can be classified as static or dynamic. A scheduling policy is said to be static when tasks are executed in a fixed order determined offline, and dynamic when the order of execution is decided online. A strategy for the mixed implementation of dynamic real-time schedulers in HW/SW is presented in [10]. A review of several approaches to control-dominated and dataflow-dominated software scheduling is presented in [11]. None of the above work raises the architectural issues involved in RCSs. A heuristic technique [12] and level-based [13], loop fission [15] and ILP [14] scheduling algorithms have been used for scheduling and temporal partitioning of a task graph to reduce configuration overhead while meeting the other constraints. The works [10]-[15] do not consider architectural issues. The only work so far that explicitly considers mapping is [7], which develops a mathematical model for mapping the design onto the reconfigurable logic unit (RLU) at the architectural level; however, it again assumes a von Neumann architecture, which may not be suitable for reconfigurable computing systems.

HW/SW Codesign Techniques for Dynamically Reconfigurable Architectures

The flexibility of dynamically reconfigurable logic (DRL) architectures requires the development of new methodologies and algorithms.
Earlier approaches to HW/SW codesign model the system on a template of a CPU and an ASIC [16]. HW/SW partitioning and scheduling techniques can be differentiated in several ways. For instance, partitioning can be classified as fine-grained (if it partitions the system specification at the basic-block level) or coarse-grained (if the system specification is partitioned at the process or task level). These previous approaches address the problem of reconfiguration latency minimization, but they do not address HW/SW partitioning and scheduling. A more recent work [17] presents a fine-grained HW/SW partitioning algorithm (at loop level). Both previous approaches are similar to [18], which takes the reconfiguration time into account when performing the partitioning, but they do not consider the effects of configuration prefetching for latency minimization. The work in [19] contributes a HW/SW codesign methodology with dynamic scheduling for discrete event systems using dynamically reconfigurable architectures, an approach to dynamic DRL multicontext scheduling, and a HW/SW partitioning algorithm for dynamically reconfigurable architectures. However, this is simulation work only: it addresses the HW-SW codesign aspect of an RCS, but it considers neither a practical target architecture nor a practical environment for the remaining system design stages. The recent work [20] exploits real-time operating systems (RTOSs) to provide run-time support for heterogeneous multitasking on reconfigurable SoC architectures. None of the above works has been applied directly to the latest partially reconfigurable devices, such as the Virtex-II Pro chip from Xilinx, or has used SystemC as its CAD environment. Our approach, discussed in the next section, combines the advantages of SystemC and the Virtex-II Pro device for designing reconfigurable computing systems.

Issues in Reconfigurable System Design

Because reconfigurable computing system design is a new paradigm with many advantages, investigators have been actively working in this area of research since 1995. Initially they faced many problems because technology for reconfiguration, or on-the-fly configuration, was not available. Recently, Xilinx’s Virtex-II Pro and Virtex-4 platforms and Altera’s Stratix II have opened considerable scope for realizing most of the advantages of RCSs. The following issues or problems are not yet completely solved [21]:

Architectures
• Exploration of hybrid architectural models at system level
• Exploration of pipeline and parallel architectures and their software development environment
• Targeting tiled architectures in design exploration
• Dependability (data as well as control) analysis for run-time reconfiguration
• Techniques for low-power and high-performance architectures

Memory Communication Architecture (hot research topic in embedded systems)
• Storage context transformations
• For low power applications
• For high performance
• Startups provide memory IP or generators

Memory Architectures for
• High Performance Embedded Memory Architectures
• High Performance Memory Communication Architectures
• Custom Memory Management Methodology
• Data Reuse Transformations
• Data Reuse Exploration

Design Methods
• System-level modeling of dynamically reconfigurable hardware
• Efficient reconfigurable computing design methodologies
• Methods to tackle change management of HW/SW resources in a run-time / dynamic configuration environment
• Efficient methods for a seamless interface between different environments
• Finding metrics for generalized reconfigurable architecture characterization
• Reducing overheads in reconfiguration
• Automatic partitioning
• Co-design never ending

Need of tools development, i.e., tools should have provision for
• Automated synthesized architecture for a given application and constraints
• Logic design mapping onto the FPGAs / RLU (Reconfigurable Logic Unit) at partial and run-time reconfiguration
• Automated RTR temporal partitioning for reconfigurable embedded real-time system design
• HW-SW codesign partitioning

New techniques and algorithms are needed for
• Fast routing and placement of FPGAs
• HW/SW partitioning and co-design
• Mapping algorithms, to map design library functions on the RLU in run-time reconfigurable architectures
• Methods for effective utilization and reconfiguration of a multiprocessor environment
• Designing an operating system for a heterogeneous reconfigurable SoC

IV. OUR APPROACH

We define our problem as “Exploration of various architectural solutions to be implemented on heterogeneous reconfigurable architectures (Reconfigurable SoC) in order to select the most efficient architecture for one or several given applications. Run-time mapping of the design libraries onto the RLUs using partial reconfiguration and/or dynamic reconfiguration while considering design constraints such as performance and cost.”

Proposed environment for simulation

We have chosen SystemC as the simulation language since it supports simulation and verification of both HW and SW components in a single environment. On December 12th, 2005, the IEEE recognized this language as a standard for simulation and verification at the system level; the IEEE standard number for SystemC is 1666. The proposed simulation flow in SystemC is shown in Fig. 2. The nine steps of system design are grouped into the following seven steps.

1. Modeling different architectural choices for a given application, which will be optimized in terms of performance against either the given constraints or default design constraints after the application analysis.
2. Proposing the optimized reconfigurable architecture for a given application by exploring the design space of reconfigurable architectures; here we consider one or more reconfigurable FPGAs, for example Virtex-II Pro chips or an SoC board that may contain one processor plus a Virtex-II Pro.
3. Translating the application onto a DFG/CDFG or hybrid architecture, depending on the application requirement.
4. Partitioning the application using hardware (HW)-software (SW) partitioning methods and algorithms; here we may use the best existing HW-SW partitioning methods and algorithms at two levels: one level for the basic partitioning into HW and SW tasks, and a second level for partitioning the HW into reconfigurable logic blocks (RLBs) and fixed HW (F-HW).
5. Design and implementation of optimized algorithms for mapping the design library onto the proposed reconfigurable architecture.
6. Design and implementation of optimized algorithms for scheduling the reconfigurable tasks (RTs), which will be implemented in RLBs, before mapping the design onto the reconfigurable architecture (RA).
7. Implementation of a prototype of the complete system; this involves integrating all the modules, using the designed algorithms for scheduling the RTs and mapping them onto the proposed RA for the given application.

V. CONCLUSION AND FUTURE WORK

Section II gave an overview of RCSs, which is essential background for RCS design. Section III briefly explained current research in this area. Section IV presented our approach to architectural design space exploration for a given application. In the future we plan to implement the proposed approach for the case of software-defined radios. Future work will cover optimization and exploration of the architecture for a given application or group of applications. All the design templates will be held in memory, and the design entry will be the output of an application analyzer program, which reports how many integer units, floating-point operations and special operations are required and the repetition rate of the special instructions. Based on this analysis, our algorithm will construct an architecture using these parameters and the available resources. To assess the optimality of the architecture, we will consider how many functions can be implemented through reconfiguration, how many are implemented as fixed hardware functions, and what functionality can be achieved through software if a processor is incorporated in the architecture.
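As a rough illustration of the kind of profile such an application analyzer might report, the Python sketch below counts integer, floating-point and special operations in a flat list of operations and computes the repetition rate of each special operation. The operation names, the two category sets and the `profile_application` function are hypothetical and are not part of the proposed tool.

```python
from collections import Counter

# Assumed operation categories; a real analyzer would derive these from
# the application's intermediate representation.
INTEGER_OPS = {"add", "sub", "mul", "shift"}
FLOAT_OPS = {"fadd", "fmul", "fdiv"}


def profile_application(operations):
    """Summarise an operation trace by category.

    Any operation that is neither an integer nor a floating-point
    operation is treated as a "special" operation. The repetition rate
    of a special operation is its share of the whole trace.
    """
    counts = Counter(operations)
    total = len(operations)
    integer = sum(n for op, n in counts.items() if op in INTEGER_OPS)
    floating = sum(n for op, n in counts.items() if op in FLOAT_OPS)
    special = {op: n for op, n in counts.items()
               if op not in INTEGER_OPS and op not in FLOAT_OPS}
    rates = {op: n / total for op, n in special.items()} if total else {}
    return {
        "integer": integer,
        "floating_point": floating,
        "special": special,
        "special_rate": rates,
    }


if __name__ == "__main__":
    # Hypothetical trace of a small signal-processing kernel.
    trace = ["add", "mul", "fmul", "fft8", "add", "fft8", "crc", "fadd"]
    report = profile_application(trace)
    print(report["integer"], report["floating_point"], report["special"])
    print(report["special_rate"])
```

Operations with a high repetition rate would be natural candidates for reconfigurable or fixed hardware, while rare ones could remain in software; that allocation decision is outside the scope of this sketch.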

Fig. 2. Proposed simulation environment flow of RCS design methodology using SystemC environment.

REFERENCES

[1] K. Bondalapati and V. Prasanna, “Reconfigurable computing systems,” Proc. IEEE, vol. 90, no. 7, July 2002, pp. 1201-1217.
[2] R. Hartenstein and H. Grünbacher (Editors), The Roadmap to Reconfigurable Computing, Proc. FPL2000, Aug. 27-30, 2000; LNCS, Springer-Verlag, 2000.
[3] K. Compton and S. Hauck, “Reconfigurable computing: A survey of systems and software,” ACM Computing Surveys, vol. 34, no. 2, June 2002, pp. 171-210.
[4] R. Hartenstein (invited embedded tutorial), “Coarse grain reconfigurable architectures,” ASP-DAC'01, Yokohama, Japan, Jan. 30 - Feb. 2, 2001.
[5] T. J. Callahan and J. Wawrzynek, “Instruction-level parallelism for reconfigurable computing,” in Proc. FPL’98 Field-Programmable Logic and Applications 8th Int. Workshop, Tallinn, Estonia, Sept. 1998.
[6] W. J. Najjar, B. Draper, A. P. W. Böhm, and R. Beveridge, Cameron Project. [Online]. Available: http://www.cs.colostate.edu/Cameron
[7] K. Bondalapati, P. Diniz, P. Duncan, J. Granacki, M. Hall, R. Jain, and H. Ziegler, “DEFACTO: A design environment for adaptative computing technology,” in Proc. 6th Reconfigurable Architectures Workshop (RAW’99), San Juan, Puerto Rico, Apr. 1999.
[8] W. Lee, R. Barua, M. Frank, D. Srikrishna, J. Babb, V. Sarkar, and S. Amarasinghe, “Space-time scheduling of instruction-level parallelism on a raw machine,” in Proc. 8th Int. Conf. Architectural Support for Programming Languages and Operating Systems (ASPLOS-VIII), San Jose, CA, Oct. 1998.
[9] R. P. Wilson, R. S. French, C. S. Wilson, S. P. Amarasinghe, J. M. Anderson, S. W. K. Tjiang, S. W. Liao, C. W. Tseng, M. W. Hall, M. S. Lam, and J. L. Hennessy, “SUIF: An infrastructure for research on parallelizing and optimizing compilers,” ACM SIGPLAN Notices, vol. 29, no. 12, Dec. 1994.
[10] V. Mooney and G. De Micheli, “Real time analysis and priority scheduler generation for hardware-software systems with a synthesized run-time system,” in Proc. Int. Conf. Computer-Aided Design (ICCAD’97), San Jose, CA, Nov. 1997, pp. 605–612.
[11] F. Balarin, L. Lavagno, P. Murthy, and A. S. Vincentelli, “Scheduling for embedded real-time systems,” IEEE Design and Test, Jan.–Mar. 1998.
[12] “Scheduling for dynamically reconfigurable FPGAs,” in Proc. Int. Workshop Logic and Architecture Synthesis, Grenoble, France, Dec. 1995, pp. 328–336.
[13] K. M. GajjalaPurna and D. Bhatia, “Temporal partitioning and scheduling for reconfigurable computing,” Proc. IEEE Symp. FPGAs for Custom Computing Machines, pp. 329–330, 1998.
[14] M. Kaul and R. Vemuri, “Optimal temporal partitioning and synthesis for reconfigurable architectures,” in Proc. Design, Automation, and Test in Eur. (DATE), Paris, France, Feb. 1998, pp. 389–396.
[15] M. Kaul, R. Vemuri, S. Govindarajan, and I. Ouaiss, “An automated temporal partitioning and loop fission approach for FPGA based reconfigurable synthesis of DSP application,” in Proc. Design Automation Conf. (DAC), Atlanta, GA, Oct. 1999, pp. 616–622.
[16] R. Ernst, J. Henkel, and T. Benner, “Hardware–software cosynthesis for microcontrollers,” IEEE Design Test Comput., vol. 10, pp. 64–75, Dec. 1993.
[17] K. Chatta and R. Vemuri, “Hardware–software codesign for dynamically reconfigurable architectures,” in Proc. of FPL’99, Glasgow, Scotland, Sept. 1999.
[18] J. Fleischman et al., “A hardware/software prototyping environment for dynamically reconfigurable embedded systems,” in Proc. CODES’98, Seattle, WA, Mar. 1998.
[19] J. Noguera and R. M. Badia, “HW/SW codesign techniques for dynamically reconfigurable architectures,” IEEE Transactions on VLSI Systems, vol. 10, no. 4, August 2002, pp. 399-415.
[20] E. Caspi, A. DeHon, and J. Wawrzynek, “A streaming multithreaded model,” in Proceedings of the Third Workshop on Media and Stream Processors, in conjunction with MICRO-34, 2001.
[21] André DeHon and John Wawrzynek, “Reconfigurable computing: What, why, and implications for design automation,” technical report.