Des Autom Embed Syst (2007) 11: 285–311 DOI 10.1007/s10617-007-9007-6

ARTS: A SystemC-based framework for multiprocessor Systems-on-Chip modelling

Shankar Mahadevan · Kashif Virk · Jan Madsen

Received: 9 February 2006 / Accepted: 22 October 2007 / Published online: 22 November 2007 © Springer Science+Business Media, LLC 2007

Abstract One of the challenges of designing a heterogeneous multiprocessor SoC is to find the right partitioning of the application for the target platform architecture. The right partitioning is dependent on the characteristics of the processors and the network connecting them as well as the application. We present an abstract system-level modelling and simulation framework (ARTS) which allows for cross-layer modelling and analysis covering the application layer, middleware layer, and hardware layer. ARTS allows MPSoC designers to explore and analyze the network performance under different traffic and load conditions, consequences of different task mappings to processors (software or hardware) including memory and power usage, and effects of RTOS selection, including scheduling, synchronization and resource allocation policies. We present the application and platform models of ARTS as well as their implementation in SystemC. We present the usage of the ARTS framework as seen from platform developers’ point of view, where new components may be created and integrated into the framework, and from application designers’ point of view, where existing components are used to explore possible implementations. The latter is illustrated through a case study of a real-time, smart phone application consisting of 5 applications with a total of 114 tasks mapped onto different platforms. Finally, we discuss the simulation performance of the ARTS framework in relation to scalability.

Keywords Heterogeneous MPSoC · NoC · Abstract RTOS model · Task graph · Scheduler · Allocator · Synchronizer · OCP

This work has been partially funded by ARTIST2 (IST-004527).

S. Mahadevan · K. Virk · J. Madsen
Computer Science & Engineering Section, Informatics & Mathematical Modeling, Technical University of Denmark, Lyngby, Denmark


S. Mahadevan et al.

1 Introduction

The design of modern, highly complex embedded computing systems on a single chip, System-on-Chip (SoC), is a challenging endeavor. Given the high development cost and often short time-to-market demands, these systems are developed as domain-specific heterogeneous multiprocessor platforms (MPSoC) which can be programmed and/or configured to fit a particular application or set of applications. They are typically designed under rigorous resource and performance constraints such as speed, size and power. The large amount of embedded software, as well as the large number of interacting components, requires that hardware/software partitioning, hardware/software interaction, and communication planning, which are crucial issues of the design process, are properly addressed.

We present ARTS, a SystemC-based abstract system-level modelling and simulation framework, which allows MPSoC designers to model and analyze the different layers, i.e., application software, middleware and platform architecture, and their interaction prior to implementation. In particular, we focus on streaming applications as found in many multimedia processing applications. ARTS provides a simulation engine that captures cross-layer properties, such as the impact of OS scheduling policies on memory and communication performance, or of communication topology and protocol on deadline misses. Hence, ARTS can be used to explore and trade off different design choices.

An ARTS system model (Fig. 1c) of an embedded computing system is formed by mapping components of an ARTS application model onto computing components of an ARTS platform model. The application model is represented as a set of task graphs (Fig. 1a), where each task represents a sub-element of the application which is considered an atomic unit during mapping. The platform model is composed of computing components interconnected through communication components (Fig. 1b), where each computing component takes care of executing the tasks mapped to it. Thus, a computing component may be a programmable processor, a dedicated hardware accelerator or an operating system executing on a processor. All components interact through an event-driven model.

Fig. 1 ARTS example: (a) application model, (b) platform model, (c) mapped system model

The ARTS framework is implemented in SystemC, which has several advantages. Besides being an industry standard for transaction-level modelling and system-level design, SystemC offers the possibility to co-simulate hardware and software, hence bridging different layers, and to co-simulate system components at different abstraction levels. In the ARTS framework, we have separated the encapsulation of task/application properties, such as execution time, memory requirements, power usage, deadline constraints, etc., from the properties of the platform instance, i.e., the number/type of processing elements, and the selection of RTOS and interconnect.

Figure 2 shows a design flow using the ARTS modelling framework. A user has to provide the application model, described as a set of task graphs (each representing a specific application), and the platform model, which consists of platform components, i.e., processing elements (PE) and communication networks, described in one file, and a platform instance described in another file. These descriptions are expressed in a simple language called the ARTS scripting language. The user then has to provide a mapping of the application onto the platform, also in terms of an ARTS script. Prior to mapping, a characterization of each task mapped onto each processing element has to be done. This characterization tells whether a task $\tau_i$ can execute on a processing element $PE_k$ and, if so, how many cycles it will take to execute, how much memory is required, etc. When loaded into the ARTS framework, a SystemC model of the complete system is created and simulated. The output from simulation is a set of files providing run-time profiles and system characteristics, such as task execution profiles, bus contentions, memory and energy profiles, etc. This allows the user to investigate the merits of the solution and to explore alternative solutions by applying different mappings or by changing the platform and/or the application, as outlined in Fig. 2.

Fig. 2 Using the ARTS framework

Previously, in [17, 19] and [18], we have introduced some of the components of the ARTS framework. In this paper, we present the ARTS framework as a whole, i.e., the application and platform models, with a focus on their SystemC implementation. We also explore the design flow when using ARTS, through a case study of a hand-held multimedia application running 5 applications with a total of 114 tasks, and discuss SystemC simulation performance.

The rest of the paper is organized as follows. In Sect. 2 we discuss related work and, in particular, work which has been built in SystemC. Sections 3 and 4 present the basic application and platform models along with implementation details. Section 5 discusses the usage of ARTS from an application designer's and a platform designer's point of view, whereas Sect. 6 illustrates the usage of the ARTS framework through a case study. Section 7 discusses simulation performance in relation to scalability of the ARTS model. Finally, Sect. 8 concludes the paper.
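The per-task, per-PE characterization in the design flow above can be pictured as a lookup table from (task, processing element) pairs to cycle counts. The following C++ sketch is a hypothetical illustration only. It is not the ARTS scripting language or its internal data structures, and `Characterization`, `Mapping`, `mappingIsFeasible` and `executionTimeUs` are names introduced here. Execution time is derived by dividing the cycle count by the PE clock frequency, as in Eq. (1) of Sect. 3.2.

```cpp
#include <cassert>
#include <map>
#include <optional>
#include <string>
#include <utility>

// Hypothetical characterization table: maps a (task, PE) pair to the number
// of clock cycles the task needs on that PE. A missing pair means the task
// cannot execute on that PE.
class Characterization {
public:
    void setCycles(const std::string& task, const std::string& pe, unsigned long cycles) {
        cycles_[std::make_pair(task, pe)] = cycles;
    }

    bool canExecute(const std::string& task, const std::string& pe) const {
        return cycles_.count(std::make_pair(task, pe)) != 0;
    }

    std::optional<unsigned long> cyclesOn(const std::string& task, const std::string& pe) const {
        auto it = cycles_.find(std::make_pair(task, pe));
        if (it == cycles_.end()) return std::nullopt;
        return it->second;
    }

private:
    std::map<std::pair<std::string, std::string>, unsigned long> cycles_;
};

// Hypothetical mapping type: assigns each task (key) to one PE (value).
using Mapping = std::map<std::string, std::string>;

// A mapping is feasible only if every task is characterized on its PE.
bool mappingIsFeasible(const Characterization& c, const Mapping& m) {
    for (const auto& [task, pe] : m) {
        if (!c.canExecute(task, pe)) return false;
    }
    return true;
}

// Execution time in microseconds for a PE clocked at freqKHz:
// cycles / (freqKHz * 1e3 Hz) seconds == cycles * 1e3 / freqKHz microseconds.
std::optional<double> executionTimeUs(const Characterization& c, const std::string& task,
                                      const std::string& pe, double freqKHz) {
    std::optional<unsigned long> n = c.cyclesOn(task, pe);
    if (!n) return std::nullopt;
    return static_cast<double>(*n) * 1000.0 / freqKHz;
}
```

For example, consider a task characterized at 41 580 cycles on a PE clocked at 25 000 kHz. For it, `executionTimeUs` returns about 1663.2 μs. This matches the worked example given with Eq. (1).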

2 Related work

Within the realm of simulation-based approaches for MPSoC design, SystemC is a de facto language for system modelling. SystemC allows designers to address MPSoC design problems at many abstraction levels, with varying detail of the MPSoC layers (i.e., application, operating system and hardware). Other C/C++-based languages have been proposed, such as SpecC [9] and SoCOS [6] of OCAPI. In [14, 21], C/C++ libraries are implemented that provide APIs to model and access RTOS functionality for the overlying applications. Additional mechanisms to interface with an HDL simulator are also discussed in [14]. In [10], SpecC is the basis of an RTOS-centric MPSoC model. A methodology based on abstract modelling has been presented in [29]; the method is based on SpecC and is limited to modelling preemptive schedulers for single-processor systems. SystemC has adopted and enhanced many of the features of SpecC, such as channel-based communication and support for modelling hardware behavior. SystemC, with its built-in communication and synchronization mechanisms, has diminished the need to build cumbersome custom libraries to model inter-processor interactions. However, support for RTOS requirements, such as preemption, is still missing. We have contributed in this direction by creating an event-based model to handle preemption.

Transaction-level modelling (TLM) has been widely used for MPSoC design, e.g. [3, 7, 11, 21, 22]. The use of SystemC in the TLMs discussed in [21, 22] is primarily motivated by the ease of describing MPSoC models for on-chip buses at different abstraction levels. In [2], the issue of co-simulation and emulation of an MPSoC model described in SystemC for cycle-true simulation is addressed. Cycle-true models based on SystemC have been proposed in [16] (MPARM) and [8]. The MPARM cycle-true model embeds an ISS within a SystemC simulator. For the OS, a port of RTEMS [1] is available in the framework. This enables a detailed RTOS scheduling analysis, including preemption. The cycle-true representation, while very accurate for RTOS performance evaluation, impacts the simulation speed and the scalability of the system. Further, the time required to investigate the performance impact of relatively minor changes in systems modelled this way is often inflated, first by the implementation time and then by a considerable simulation time. System-level MPSoC models for design-space exploration of real-time applications have been proposed in [10, 13, 14, 20, 23]. In [20], the RTOS functionality has been integrated into task execution.
Preemption is modelled via method calls in the task SystemC class. The motivation for this approach is to reduce the simulation overhead of communicating with an independent OS layer, an overhead that does not scale well to a large number of tasks. The drawback of this approach, however, is reduced flexibility in realizing different OS scheduling schemes, as they need to be carefully incorporated into task execution. A SystemC POSIX library which models preemption is proposed in [23]. Here, for hardware interrupts, instead of spontaneously suspending the task, a delay equal to the time between the preempt and resume events is tracked and appended at the end of task execution, thereby maintaining timing consistency in the model. The approach followed by [26] describes a methodology based on the principle of composition to model real-time systems, although the approach is not extended to modelling real-time systems implemented on heterogeneous multiprocessor platforms. Another approach, followed by VEST [27], is based on the functional description of multiprocessor real-time systems, providing modelling capabilities and automatic generation of different components, including the operating system. However, both of these approaches focus on a level of abstraction lower than that of our approach. In [4], a high-level performance model for multi-threaded, multiprocessor systems is presented. This approach is based on modelling a layer of schedulers in an abstract manner, which resembles the aim of our approach. The ARTS framework presented here combines many of the features of the models described above. It provides a layer of RTOS APIs on top of SystemC, similar to the SpecC-based model in [10], and uses the principles of decomposition described in [26] to realize the RTOS features. However, our ARTS modelling framework differs significantly both from a purely behavioral encapsulation of application code and from detailed cycle-true simulators.
The focus is on leveraging model decomposition and on implementing a preemptive capability in SystemC. Such a preemptive capability is crucial to capture the complex interaction between the RTOS and the application. We now detail the workings of ARTS, which demonstrate this difference.
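The delay-accounting scheme attributed to [23] above does not model the task as actually suspended. Instead, it records the preempt-to-resume time and appends that time to the end of execution. It can be sketched as follows. This is a hypothetical C++ illustration of the idea as summarized here, not the POSIX library of [23]. `DeferredPreemptionTask` and its members are names introduced here, and times are plain integers in an unspecified unit.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical task that is never actually suspended by an interrupt.
// It accumulates the preempt-to-resume delay and appends the total to its
// completion time, which keeps the model's timing consistent.
class DeferredPreemptionTask {
public:
    DeferredPreemptionTask(std::uint64_t start, std::uint64_t execTime)
        : start_(start), execTime_(execTime) {}

    // Record an interrupt that preempts the task at time t.
    void preempt(std::uint64_t t) {
        assert(!preempted_);
        preemptedAt_ = t;
        preempted_ = true;
    }

    // Record the matching resume at time t; the gap is added to the delay.
    void resume(std::uint64_t t) {
        assert(preempted_ && t >= preemptedAt_);
        accumulatedDelay_ += t - preemptedAt_;
        preempted_ = false;
    }

    // Finish time = start + nominal execution time + all tracked delays.
    std::uint64_t finishTime() const { return start_ + execTime_ + accumulatedDelay_; }

private:
    std::uint64_t start_;
    std::uint64_t execTime_;
    std::uint64_t accumulatedDelay_ = 0;
    std::uint64_t preemptedAt_ = 0;
    bool preempted_ = false;
};
```

For instance, take a task starting at 0 with an execution time of 100. If it is preempted during [20, 35) and [50, 60), it reports a finish time of 125. In this sketch, the preempt/resume calls only shift the reported finish time. They do not change what the task does in between.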


3 ARTS application model

A complex embedded application, such as a multimedia application, is often composed of a set of specific applications. Each application, $a_i$, may be modelled as a task graph or application graph, i.e., a directed acyclic graph (DAG) $G_a = (V_a, E_a)$, where the vertices in $V_a$ represent tasks, i.e., computations which contain instructions that execute sequentially. The edges in $E_a$ represent data dependencies among the tasks, with the direction of an edge indicating the direction of dataflow. Thus, an edge $e_{i,j} \in E_a$ represents a data transfer from task $\tau_i$ to task $\tau_j$ ($\tau_i \prec \tau_j$). Each edge is associated with a weight indicating the amount of data which has to be exchanged between the corresponding tasks. Each task graph is characterized by an end-to-end deadline constraint.

3.1 Generic Task Model

A task, $\tau_i$, represents an atomic unit of computation and is characterized by a set of parameters which are used to determine the timed execution of the task (see Fig. 3a), $\{r_k, o_i, s_k, BCET_i, WCET_i, d_k, T_i, csw_i\}$, where $r_k$ is the release time, $o_i$ is the release-time offset, $s_k$ is the start time, $BCET_i$ is the best-case execution time, $WCET_i$ is the worst-case execution time, $d_k$ is the deadline, $T_i$ is the period (if periodic), and $csw_i$ is the context-switch time. Additionally, a task may have resource requirements and precedence constraints.

The behavior of a task is modelled as a finite-state machine (FSM) which can be represented by a quintuple, $T_{FSM} = \langle Q, \Sigma, \delta, q_0, F \rangle$, where $Q$ is a set of four states, $Q = \{idle, ready, running, preempted\}$, $\Sigma$ is the input alphabet, $\Sigma = \{c_{period}, run, preempt, resume, c_{running}\}$, $\delta$ is the state transition function, $\delta : Q \times \Sigma \to Q$, $q_0$ is the initial state and $F$ is the set of final states, $F = \{idle\} = \{q_0\}$ (as indicated in Fig. 3b). The state transition function, $\delta$, is described as follows: upon initialization, each task starts in the idle state.
If the task’s offset value is zero and all its parent tasks have finished execution and communicated data (which is available at its input), it transits to the ready state and issues a ready message to the RTOS scheduler. The task remains in the ready state until it receives a run command from the RTOS scheduler, upon which it transits to the running state. When the task has finished its execution, it issues a finished message to the RTOS scheduler and transits back to the idle state. At any time during its execution, a task may be preempted by the scheduler; it then enters the preempted state, where it waits until it receives a resume command from the RTOS scheduler, which enables it to re-enter the running state.

Fig. 3 (a) Task timing model. (b) Task behavioral model

The vertices are strict with respect to both their inputs and their outputs. This means that a task associated with a vertex cannot begin execution until all its input data have been communicated to it through communication message transfers, and that no output data is available until the computation has finished, at which point all output data become available for communication simultaneously.

3.2 Task mapping and specific task models

In order to be executed, the application tasks have to be mapped onto an execution platform. Tasks are mapped onto processing elements, while the edges between tasks mapped to different processors are decomposed into three dependent tasks (two I/O tasks and a message task) in order to take cross-layer communication issues into account. The introduction of I/O and message tasks immediately modifies the task graph topology, i.e., $\tau_i \prec \tau_{io} \prec \tau_{mx} \prec \tau_{io} \prec \tau_j$. This results in three types of tasks:

1. Computation tasks: The computation tasks are executed under the direct control of the RTOS(s) and can accordingly be periodic, aperiodic, or sporadic.
2. Message tasks: The message tasks represent the transmission of inter-processing-element communication through the on-chip communication network, and they execute on the communication processor modelling the on-chip communication network.
3. I/O tasks: The I/O tasks represent the usage of the I/O devices on a multiprocessor SoC platform, e.g., the communication network interfaces. These tasks form a link between the RTOS model and the on-chip communication network model, with which they are interfaced using specific interface protocols (e.g., OCP TL0/1, etc.).

The vertices and edges of the original task graph are associated with weights which determine the execution and communication times when mapped to a platform.
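The task behavior of Sect. 3.1 can be sketched as a plain transition function. It uses the states idle, ready, running and preempted, driven by the inputs $c_{period}$, run, preempt, resume and $c_{running}$. This is a hypothetical C++ illustration, not the ARTS `abs_task` class of Fig. 4. We assume that $c_{period}$ stands for the release condition (offset/period elapsed and inputs available) and $c_{running}$ for completion of the task's execution. We also assume that inputs with no defined transition leave the state unchanged. All type and function names are introduced here.

```cpp
#include <cassert>
#include <optional>

// States Q of the (hypothetical) task FSM.
enum class State { Idle, Ready, Running, Preempted };

// Input alphabet Sigma. Assumed meanings, not confirmed by the text:
//   CPeriod  - release condition met (offset/period elapsed, inputs available)
//   CRunning - the task's execution has completed
enum class Input { CPeriod, Run, Preempt, Resume, CRunning };

// Messages the task sends to the RTOS scheduler on some transitions.
enum class Message { ReadyMsg, FinishedMsg };

struct Step {
    State next;
    std::optional<Message> toScheduler;
};

// Transition function delta: Q x Sigma -> Q. Undefined (state, input)
// pairs leave the state unchanged and emit no message.
Step delta(State q, Input a) {
    switch (q) {
    case State::Idle:
        if (a == Input::CPeriod) return {State::Ready, Message::ReadyMsg};
        break;
    case State::Ready:
        if (a == Input::Run) return {State::Running, std::nullopt};
        break;
    case State::Running:
        if (a == Input::Preempt) return {State::Preempted, std::nullopt};
        if (a == Input::CRunning) return {State::Idle, Message::FinishedMsg};
        break;
    case State::Preempted:
        if (a == Input::Resume) return {State::Running, std::nullopt};
        break;
    }
    return {q, std::nullopt};
}
```

For example, the input trace CPeriod, Run, Preempt, Resume, CRunning takes a task from idle through ready, running, preempted and running, and back to idle, the single final state. The task emits a ready message on release and a finished message on completion.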
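The decomposition of an inter-processor edge into two I/O tasks and one message task, $\tau_i \prec \tau_{io} \prec \tau_{mx} \prec \tau_{io} \prec \tau_j$, can be sketched as a graph rewrite. In this hypothetical C++ illustration, the `Edge` and `Decomposed` types, the `decomposeEdges` function and the task naming scheme are introduced here. Edges between tasks on the same PE are kept as-is, since Sect. 3.2 only decomposes edges between tasks mapped to different processors.

```cpp
#include <cassert>
#include <map>
#include <string>
#include <utility>
#include <vector>

// An edge of the application task graph, weighted by the data it carries.
struct Edge {
    std::string from;
    std::string to;
    unsigned long bytes;
};

// Result of the rewrite: precedence edges, plus the weight (data volume)
// that each newly created message task inherits from its original edge.
struct Decomposed {
    std::vector<std::pair<std::string, std::string>> precedence;
    std::map<std::string, unsigned long> messageBytes;
};

// Edges between tasks on the same PE are kept unchanged. Each edge between
// tasks on different PEs becomes from -> ioOut -> msg -> ioIn -> to.
// The names "io_out0", "msg0", "io_in0", ... are a hypothetical scheme.
Decomposed decomposeEdges(const std::vector<Edge>& edges,
                          const std::map<std::string, std::string>& peOf) {
    Decomposed d;
    int next = 0;
    for (const Edge& e : edges) {
        if (peOf.at(e.from) == peOf.at(e.to)) {
            d.precedence.emplace_back(e.from, e.to);
            continue;
        }
        const std::string id = std::to_string(next++);
        const std::string ioOut = "io_out" + id;
        const std::string msg = "msg" + id;
        const std::string ioIn = "io_in" + id;
        d.precedence.emplace_back(e.from, ioOut);
        d.precedence.emplace_back(ioOut, msg);
        d.precedence.emplace_back(msg, ioIn);
        d.precedence.emplace_back(ioIn, e.to);
        d.messageBytes[msg] = e.bytes;  // edge weight becomes message-task weight
    }
    return d;
}
```

The data weight is attached only to the message task, in line with the statement in Sect. 3.2 that the edge weight is transformed into a weight for the message task. Leaving the I/O tasks unweighted is a simplification made for this sketch.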
The positive weight $w(v_i)$ associated with vertex $v_i \in V$ represents the number of clock cycles needed to execute the task on a given processing element; i.e., there exists a weight for each possible mapping, $w(v_i, PE_k)$. Hence, the actual execution time is calculated as

$$t(v_i, PE_k) = \frac{w(v_i, PE_k)}{f(PE_k)} \tag{1}$$

where $f(PE_k)$ is a function which returns the clock frequency of the processing element $PE_k$. As an example, consider a task which consumes 41 580 clock cycles ($w(v_i, PE_k)$) on a processor running at a clock frequency of 25 000 kHz ($f(PE_k)$). For this task, $t(v_i, PE_k) = 41\,580 / (25\,000\ \text{kHz}) = 1.6632\ \text{ms} \approx 1663\ \mu\text{s}$. The weight associated with an edge represents the amount of data to be transferred. This weight is transformed into a weight for the vertex representing the communication, i.e., the message task. Similar to the execution time, the actual communication time is based on the bandwidth of the link. Access to the link, however, is determined dynamically based on the current state of the system (thus accounting for any arbitration delays). A detailed account of the various communication characteristics may be found in [15].

3.3 Task model implementation in SystemC

The abstract SystemC base class and its functions implement the task model, including the actual state transitions shown in Fig. 3b. This is demonstrated in Fig. 4 for periodic


class PerTask : public abs_task {
public:
  SC_HAS_PROCESS(PerTask);
  PerTask(sc_module_name name_, //
          unsigned int taskID_, unsigned int appID_, //
          unsigned int period_, unsigned int deadline_, //
          unsigned int offset_) //
  {
    SC_METHOD(state_machine);
    sensitive_pos