2016 11th International Design & Test Symposium (IDT)

A Comparison and Performance Evaluation of FPGA Soft-cores for Embedded Multi-core Systems

Mariem Makni, LAMIH, University of Valenciennes, France

Mouna Baklouti, National Engineering School of Sfax, Tunisia

Smail Niar, LAMIH, University of Valenciennes, France

Mohamed Wassim Jmal and Mohamed Abid, National Engineering School of Sfax, Tunisia

Abstract: Field Programmable Gate Arrays (FPGAs) are increasingly considered an effective way to meet the performance requirements of embedded systems, owing to their reconfigurability, scalability and lower cost. The growing configurable logic capacity of FPGAs has enabled designers to integrate a large number of soft-core processors into a single device. Evaluating the performance of existing soft-cores in order to select the most efficient and suitable one for a specific software application remains a major challenge for designers. This paper presents an overview of soft-core processors used in embedded systems. We compare different open-source and commercial soft-cores, such as OpenFire, LEON3 and MicroBlaze, based on their major architectural features. We also evaluate the impact of the selected soft-core processors on the total execution time and the FPGA area consumption using different applications.

I. INTRODUCTION

New applications, such as massively data-parallel applications [1], are increasingly embedded (e.g. signal processing, multimedia). They demand complex multi-core System-on-Chip (SoC) designs to meet their real-time processing requirements while respecting other critical embedded design constraints, such as low energy consumption and reduced implementation size. The Multi-core Parallel Processing System-on-Chip (MPPSoC) has been proposed as a promising solution to these requirements. MPPSoC is an intellectual property (IP) based multi-core parallel architecture [2]. It is widely used to exploit data-intensive parallelism in numerous data-parallel applications. The MPPSoC architecture is composed of a main processor and several processing elements (PEs) that compute the critical parts of the massively parallel program.

A soft-core processor is a hardware description language (HDL) model of a specific processor that can be customized for a given application and synthesized for an Application-Specific Integrated Circuit (ASIC) or FPGA target [3]. Soft-core processors have two advantages over their hard-core counterparts: lower cost and the flexibility to reconfigure the same core as the application changes.

978-1-5090-4900-4/16/$31.00 ©2016 IEEE

The increasing capacity and complexity of recent FPGA devices, such as the Xilinx Zynq-7000 All Programmable SoCs [4], make the selection of an efficient FPGA soft-core a challenging and time-consuming task for designers. The large number of existing soft-cores makes the selection process even more difficult.

A hard-core processor has dedicated silicon area on the FPGA. This allows it to operate at a core frequency similar to that of a discrete microprocessor. However, a hard-core processor cannot be reconfigured to better meet the needs of the application, nor does it allow a processor to be added to an existing SoC design to provide more processing capability [3]. To address these limitations, the soft-core processor, which is implemented entirely in the FPGA fabric, has been proposed as a promising solution. The soft-core/MPPSoC approach has the advantage of using lower-cost FPGA parts and allowing a custom number of soft-core processors. These advantages can reduce area consumption and improve embedded system performance by reducing the total execution time and the energy consumption for specific embedded applications.

Homogeneous MPPSoCs are Symmetric Multi-Processing (SMP) systems in which identical PEs with the same Instruction Set Architecture (ISA) are used. Since homogeneous MPPSoCs replicate identical tiles, they are scalable and easy to program. Several vendors offer soft-core processors that designers can implement on a standard FPGA: Altera [5] offers the NIOS soft-core, Xilinx offers the PicoBlaze and MicroBlaze soft-cores [6], OpenCores offers the OpenRISC soft-cores [7] and Gaisler Research offers the LEON soft-core [8]. Each processor has its own set of parameters and characteristics, which makes the choice a significant challenge. The soft-core's features also significantly influence the performance of the whole system.
The main challenge for designers is to select a suitable soft-core architecture for implementing complex embedded applications on the most efficient MPPSoC. However, the lack of precise specifications and documentation for soft-core processors can cause problems in the design of recent MPPSoCs, such as increased complexity and time-to-market (TTM). The main goal of this paper is to compare and evaluate the performance of different commercial and open-source soft-core architectures in order to select the appropriate ones for implementing MPPSoC designs. Based on major architectural features and significant aspects such as stability and usability, we choose two soft-cores: the commercial MicroBlaze [9] and the open-source LEON3 [10]. The selected soft-core processors are compared, implemented and discussed in this paper. The major contributions of this paper are:

• An overview of open-source and commercial soft-core processors and their classification.
• A comparison of the different soft-cores based on major architectural characteristics and parameters.
• The implementation and evaluation of the selected soft-cores (LEON3 and MicroBlaze), which offer high stability and usability, using two different MPPSoC implementations.

The remainder of this paper is organized as follows. Section II presents an overview of soft-core processors and some existing MPPSoC designs. Section III compares the selected soft-cores. Section IV presents the MPPSoC architectures proposed in this work. Section V discusses the experimental results obtained by implementing the selected processors in the different MPPSoC designs. Section VI discusses the findings, and Section VII concludes the paper with a brief outlook on future work.

II. RELATED WORKS: A SURVEY OF SOFT-CORE PROCESSORS AND MPPSoC DESIGNS

In this section, we survey the available soft-core processors provided by commercial vendors and open-source communities. We also examine some existing MPPSoC architectures that are based on reconfigurable logic such as FPGA platforms.

A. An overview of existing MPPSoCs

MPPSoC has become the latest trend in embedded systems to obtain maximal utilization of the billions of transistors on a chip.
This trend has been followed by several processor vendors in different embedded domains: the Aeroflex Gaisler GR712RC [11] implements a dual-core LEON3 processor for the space domain, and the Freescale MPC5510 [12] implements a dual-core processor for the automotive and avionics domains. In [13], the authors present an architecture with four MB Lite soft-cores connected by a Network-on-Chip (NoC), implemented on a Xilinx Virtex-5. Among recent smartphones, Apple's iPhone 5 integrates two processors while Samsung's Galaxy S3 implements four processors in its MPPSoC architecture [14]. Homogeneous soft-cores are all exactly the same: equivalent frequencies, functions, cache sizes, and so on. For example, ARM's MPCore [15] contains four identical ARM11 processors with the same ISA, connected to a shared memory. Another example of a homogeneous MPPSoC architecture is IBM's Power7 chip [16], which integrates eight processor cores in a SoC design.

B. An overview of available soft-core processors

Soft-core processors are widely used to execute numerous embedded applications because of their cost effectiveness and platform independence. When implementing an MPPSoC design that integrates multiple soft-core processors, it can be difficult to decide which soft-core processor to use. To help designers with this decision, we present a comparison of the features of several soft-core processors. Many aspects determine the usability and stability of a soft-core, such as the compiler and assembler, the design documentation and the tool chains. Table I compares the main characteristics of the open-source and commercial soft-core processors that we have inspected. The first column lists the features across which the soft-core processors are compared. The subsequent columns show the features available in each soft-core processor. Each core surveyed has different performance characteristics and architectural parameters that suit specific applications. Designers should therefore select a soft-core based on the requirements and performance constraints of their particular application. MicroBlaze [9] and PicoBlaze [17] are the leading soft-core processors provided by Xilinx [6]. On the other hand, OpenFire [18], OpenRISC1200 [19], MB Lite [20], AeMB [21] and LEON3 [10] are popular open-source soft-cores; most are distributed through OpenCores [7], while LEON3 is provided by Gaisler Research [8].

1) PicoBlaze soft-core: PicoBlaze [17] is a fully embedded 8-bit Reduced Instruction Set Computer (RISC) microcontroller core developed by Xilinx that can be synthesized in some FPGA families. The PicoBlaze processor is a small, cost-effective soft-core that is useful for simple data processing applications with minimal peripherals. However, many experiments show the limits of PicoBlaze, among them the lack of design documentation, incorrect simulation results, the absence of a C compiler and the small size of the instruction memory.
All these limitations make implementation difficult for a designer. Furthermore, its architecture is quite complex, so customizations and modifications are hard to make.

2) OpenFire soft-core: OpenFire [18] is a simple MicroBlaze clone that supports the MicroBlaze ISA and compiler tool chain. It has FSL ports (FIFO ports) and a 3-stage pipeline. It was developed for configurable processor research and provides the same flexibility as its open-source counterparts. Because of its simplicity, it does not require a large area. Moreover, it can be configured with only 4 KB of instruction memory and 4 KB of data memory in order to limit the on-chip area. However, no stable compiler or design methodology is available for this soft-core. Furthermore, the documentation describing the processor architecture and the verification process is shallow. Designers therefore need more effort and time to understand the processor and add it to their embedded designs.

3) OpenRISC soft-core: The OpenRISC1200 (OR1200) [19] is a synthesisable core provided by OpenCores. Its Verilog RTL description is released under the GNU Lesser General Public License (LGPL). This soft-core features a 32-bit and 64-bit RISC architecture suitable for numerous applications, including networking, telecom and automotive applications. OR1200 can efficiently run any modern operating system. However, few FPGA development boards support the OR1200 soft-core. In addition, its debug solutions are complicated, which makes implementation difficult for designers. Furthermore, many IP blocks in the open-source code of the OR1200 processor are not maintained, and the Wishbone bus is somewhat outdated.

4) MB Lite soft-core: The MB Lite [20] processor is also considered a MicroBlaze clone. Like the aeMB and OR1200 architectures, the MB Lite processor supports the Wishbone bus. It has a Harvard architecture with a 5-stage pipeline. Unfortunately, its usability is very low because compiler and assembler support for building an embedded system is lacking. It is also difficult for a designer to compile an application program with the MB Lite tools, since using the compiler is complex even for a simple embedded system. The core is also relatively complex to customize for exploratory research projects.

5) aeMB soft-core: The aeMB soft-core [21] is compatible with the MicroBlaze architecture for most instructions. It uses the Wishbone interface for both the data and the instruction memory bus. aeMB has a Harvard architecture with separate 32-bit instruction and data buses. The address space of each bus can be configured separately through core parameters. It has a 3-stage pipeline that is capable of executing one instruction per clock cycle. It supports hardware multiplier and barrel shifter configurations. The main limitations of this soft-core are its complexity and the absence of detailed design documentation and development tools. In addition, aeMB provides an incomplete and unstable open-source implementation of the MicroBlaze. Moreover, it has neither peripherals nor interrupt controllers, although support for external interrupts is provided. The aeMB soft-core has the lowest clock frequency (136 MHz).

TABLE I
FEATURES COMPARISON OF SOFT-CORES

Category                | MicroBlaze         | LEON3               | OpenRISC1200        | OpenFire   | AeMB       | MB Lite
Maximum Frequency (MHz) | 250                | 400/183 (ASIC/FPGA) | 300/185 (ASIC/FPGA) | 198        | 136        | 229
Interface               | FSL, OPB, PLB, LMB | AMBA 2.0            | Wishbone            | OPB, FSL   | Wishbone   | Wishbone
Pipeline (Stages)       | 7-stages           | 7-stages            | 5-stages            | 3-stages   | 3-stages   | 5-stages
Architecture            | MicroBlaze         | SPARC V8            | ORBIS               | MicroBlaze | MicroBlaze | MicroBlaze
Language                | VHDL               | VHDL                | Verilog             | Verilog    | Verilog    | VHDL
Implementation          | FPGA               | FPGA/ASIC           | FPGA/ASIC           | FPGA       | FPGA       | FPGA
Address/Data Bus        | 32-bits            | 32-bits             | 32/64-bits          | 32-bits    | 32-bits    | 32-bits
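To make the comparison easier to query, the rows of Table I can be encoded as plain records and filtered by requirement. The following is a minimal sketch and not part of the original study: the `SoftCore` record, the `select` helper and their parameter names are hypothetical, the values are transcribed from Table I (using the FPGA figure where the table gives both ASIC and FPGA frequencies), and the filter covers only the tabulated features, not the usability and stability aspects discussed above.

```python
from dataclasses import dataclass
from typing import List, Optional, Tuple


@dataclass(frozen=True)
class SoftCore:
    """One column of Table I (hypothetical record layout)."""
    name: str
    fpga_mhz: int                  # maximum frequency on FPGA, in MHz
    interfaces: Tuple[str, ...]
    pipeline_stages: int
    isa: str
    hdl: str
    targets: Tuple[str, ...]


# Values transcribed from Table I.
TABLE_I = [
    SoftCore("MicroBlaze", 250, ("FSL", "OPB", "PLB", "LMB"), 7, "MicroBlaze", "VHDL", ("FPGA",)),
    SoftCore("LEON3", 183, ("AMBA 2.0",), 7, "SPARC V8", "VHDL", ("FPGA", "ASIC")),
    SoftCore("OpenRISC1200", 185, ("Wishbone",), 5, "ORBIS", "Verilog", ("FPGA", "ASIC")),
    SoftCore("OpenFire", 198, ("OPB", "FSL"), 3, "MicroBlaze", "Verilog", ("FPGA",)),
    SoftCore("AeMB", 136, ("Wishbone",), 3, "MicroBlaze", "Verilog", ("FPGA",)),
    SoftCore("MB Lite", 229, ("Wishbone",), 5, "MicroBlaze", "VHDL", ("FPGA",)),
]


def select(
    cores: List[SoftCore],
    min_fpga_mhz: int = 0,
    interface: Optional[str] = None,
    hdl: Optional[str] = None,
) -> List[SoftCore]:
    """Return the cores that satisfy every requirement that was given."""
    result = []
    for core in cores:
        if core.fpga_mhz < min_fpga_mhz:
            continue
        if interface is not None and interface not in core.interfaces:
            continue
        if hdl is not None and core.hdl != hdl:
            continue
        result.append(core)
    return result


if __name__ == "__main__":
    # Example query: cores that reach at least 190 MHz on FPGA.
    for core in select(TABLE_I, min_fpga_mhz=190):
        print(core.name, core.fpga_mhz)
```

A filter of this kind only narrows the candidates. The final choice in this paper also rests on documentation quality, tool support and stability, which are not captured by the table.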

III. MICROBLAZE VS LEON3

A comparison of the features of the soft-cores showed that the Xilinx MicroBlaze and the LEON3 soft-cores offer the highest performance and stability. The MicroBlaze has a RISC architecture and has been designed for easy implementation and reliable embedded systems. Because of its high performance, the Xilinx MicroBlaze has been selected as the central node or main control unit in many embedded designs. LEON3, with its 7-stage pipeline, achieves the highest frequency among the open-source soft-cores. The maximum frequency of an open-source soft-core is proportionate to its pipeline depth. For implementing a massively parallel system based on small PEs that can be integrated in large quantities on an FPGA, the LEON3 and MicroBlaze soft-cores are the best candidates owing to their efficiency and high performance. The portability and efficiency of PicoBlaze, OR1200, MB Lite, aeMB and OpenFire are very limited compared with those of the LEON3 and MicroBlaze soft-cores.

Most of the current state of the art investigates the use of soft-core processors as PEs for developing multi-core system applications [22] [23]. However, few studies have evaluated soft-core performance on MPPSoC implementations. The authors in [24] evaluated three soft-cores, namely LEON2, MicroBlaze and OpenRISC, measuring execution time and area consumption using the Dhrystone and Stanford benchmarks. In [25], the authors present performance evaluation techniques for soft-cores. However, they only evaluate different configurations of a single MicroBlaze soft-core using two synthetic benchmarks, Dhrystone and Whetstone. In addition, they did not compare the features and performance of existing soft-cores to validate their choice. Based on the main aspects of usability and stability, MicroBlaze and LEON3 are selected to be implemented and compared in this paper. Table II compares the MicroBlaze and LEON3 soft-core processors.

TABLE II
MICROBLAZE VS LEON3

Features      | LEON3                                                      | MicroBlaze
Advantages    | - No licences are required for research and education use. | - Can be integrated in all Xilinx FPGA devices.
              | - All RTL source code is available.                        | - Includes a lot of configuration options.
              | - Fast support.                                            | - Uses the AXI standard bus.
              | - Linux and RTOS can be installed.                         |
Disadvantages | - Not all FPGA development boards are supported.           | - Can be used only in Xilinx FPGAs.
              | - Not in widespread use.                                   | - Development tools need a licence.
              |                                                            | - Source code not available.


In the next section, we evaluate two different MPPSoC designs. The first, presented in Figure 1, is a quad-core MicroBlaze system. The second is a simple MPPSoC composed of four LEON3 soft-cores (Figure 2). The stability of these two soft-cores helps ensure the performance of the whole SoC system.

IV. PROPOSED MPPSoC DESIGNS

MPPSoC architectures are widely used in embedded and mobile systems. These architectures are primarily intended to meet the required constraints in terms of area, performance and energy consumption. In this section, we present two different MPPSoC designs used to evaluate the selected soft-core processors.

A. A quad-core MicroBlaze architecture

The MicroBlaze soft-core [9] has a Harvard architecture with separate 32-bit instruction and data buses to execute programs and access data from both on-chip and external memory. Figure 1 shows a simple quad-core MicroBlaze architecture. The proposed design is a master-slave system in which one MicroBlaze processor acts as the master, controlling the behavior of the other MicroBlaze soft-cores, which act as slave PEs. The master is also responsible for mapping the parallel program onto the different PEs and gathering the results.

Fig. 1. MicroBlaze MPPSoC design

On-chip memory is accessed by the MicroBlaze through a Local Memory Bus (LMB). The MicroBlaze soft-core has many configuration options, allowing the designer to customize it for a specific application with relative ease. The peripherals are connected via the Processor Local Bus (PLB). In addition, each MicroBlaze is attached to a PLB and fetches instructions and data from a 64 KB dual-ported local memory (BRAM) that stores its program. In this asymmetric multiprocessing (AMP) system, each MicroBlaze has its own independent peripherals and memories. The processors are also connected to a 64 KB shared memory for sharing larger amounts of data. All the reconfigurable logic, including processors and buses, is driven by a 100 MHz clock. To improve the performance of the MicroBlaze, the designer can enable a number of features through configuration parameters, such as the integer multiplier unit (mul), floating-point unit (FPU), barrel shifter unit (BS) and integer divider unit (ID).

B. A quad-core LEON3 architecture

LEON3 [10] is a synthesisable VHDL model of a 32-bit processor compliant with the SPARC V8 architecture, developed by Aeroflex Gaisler [8]. The model is highly configurable and suitable for many embedded systems. The main objective of the proposed architecture (Figure 2) is to provide an open, portable and non-proprietary processor design that meets requirements for performance, software compatibility and low system cost. LEON3 is among the most advanced open-source processors and offers an easily customizable core for many embedded systems. It supports both symmetric and asymmetric multiprocessing, as well as embedded and real-time operating systems such as Linux and RTEMS. It also has optional features such as a debug unit, timers, UARTs, interrupt controllers and a hardware floating-point unit for high-performance applications.

Fig. 2. LEON3 MPPSoC design

Figure 2 shows our proposed MPPSoC design, composed of four LEON3 soft-cores running at 100 MHz. All processors are connected through an Advanced Microcontroller Bus Architecture (AMBA) shared bus. The AMBA 2.0 AHB/APB bus was selected as the common on-chip bus because of its market dominance, because it is well documented and because it can be used for free without licence restrictions. The LEON3 can be easily configured using graphical interfaces. The proposed architecture is a scalable system with a centralized 64 KB shared memory. Each processor has its own local memory and peripherals. LEON3 uses the AMBA Advanced High-performance Bus (AHB) as the main on-chip communication bus for IP access. The AMBA Advanced Peripheral Bus (APB) is designed for low-bandwidth control accesses such as register interfaces on system peripherals.

V. EXPERIMENTAL RESULTS

The MPPSoC architectures presented in Section IV are implemented on a Xilinx Virtex-5 using the corresponding Xilinx EDK tools. All hardware and software configurations were kept as similar as possible to provide comparable application scores. The experimental results are compared and discussed in this section. To evaluate the performance of the proposed MPPSoC designs, we estimate the hardware area consumption and the total execution time in order to choose the most efficient configuration, that is, the one with the minimum execution time and the smallest hardware area.
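One way to check that two setups really are "as similar as possible" is to write both configurations down as records and list the fields in which they differ. The sketch below is illustrative only: the `MppsocConfig` record, its field names and the `differing_fields` helper are hypothetical, and the values are taken from the descriptions in Section IV and from the statement that the area results were obtained with both the FPU and the cache enabled. The LEON3 local memory size is left as `None` because the text does not give it.

```python
from dataclasses import dataclass, fields
from typing import Dict, Optional, Tuple


@dataclass(frozen=True)
class MppsocConfig:
    """Hypothetical summary of one quad-core design from Section IV."""
    soft_core: str
    cores: int
    clock_mhz: int
    shared_memory_kb: int
    local_memory_kb: Optional[int]  # None where the paper gives no figure
    on_chip_bus: str
    fpu_enabled: bool
    cache_enabled: bool


MICROBLAZE_QUAD = MppsocConfig("MicroBlaze", 4, 100, 64, 64, "PLB/LMB", True, True)
LEON3_QUAD = MppsocConfig("LEON3", 4, 100, 64, None, "AMBA 2.0 AHB/APB", True, True)


def differing_fields(a: MppsocConfig, b: MppsocConfig) -> Dict[str, Tuple[object, object]]:
    """Map each field name whose values differ to the pair (value in a, value in b)."""
    diff = {}
    for f in fields(a):
        left, right = getattr(a, f.name), getattr(b, f.name)
        if left != right:
            diff[f.name] = (left, right)
    return diff


if __name__ == "__main__":
    for name, (mb, leon) in differing_fields(MICROBLAZE_QUAD, LEON3_QUAD).items():
        print(f"{name}: MicroBlaze={mb!r}, LEON3={leon!r}")
```

With these values, the two designs share the core count, clock frequency, shared memory size and FPU/cache settings, and differ only in the soft-core itself, its bus and the local memory figure that the text leaves unspecified for LEON3.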

A. Synthesis Results

Area consumption is one of the metrics in the choice of embedded systems, which require an optimal area. In this paper, we measure the total execution time and the area consumption of the various configurations using JPEG decoder/encoder applications. JPEG (Joint Photographic Experts Group) is a widely used still-image compression standard, very popular in embedded systems, especially in multimedia devices and digital cameras. The JPEG standard includes encoding and decoding algorithms. The MicroBlaze and LEON3 soft-cores include a large set of configurable parameters that can be set at design time. For open-source soft-cores, the FPGA resource utilization depends on the architectural parameters. Figure 3 shows the area results of implementing the proposed MPPSoC systems with both the FPU and the cache enabled. The main criteria for area usage are the Look-Up Tables (LUTs), the Block RAMs (BRAMs) and the slice registers (FFs) of the FPGA. Occupation indicates the percentage of FPGA resources used by the architecture. As illustrated in Figure 3, the configuration with four LEON3 processors consumes 78% of the available LUTs, 37% of the FFs and 56% of the available BRAMs within the FPGA.

Fig. 3. Slice percentage occupation of the proposed MPPSoC systems measured on the Xilinx Virtex-5

As shown in Figure 3, the configuration with four LEON3 processors consumes considerably more LUTs than the configuration with four MicroBlaze processors. The LEON3 uses more LUT resources than the MicroBlaze when the FPU is used. Here, the LUT resources limit the number of LEON3 processors. BRAM and FF occupations are almost the same in both MPPSoC architectures. Both implementations need a significant share of the BRAM resources (about 60%) to implement the applications. For an open-source soft-core, the deeper the pipeline, the higher the maximum frequency that can be achieved. A LEON3 configured with a larger cache memory needs more FPGA BRAM resources. As a result, such configurations use most of the memory blocks.

B. Timing Performance

In this subsection, the efficiency of the selected soft-cores is measured using the JPEG encoder/decoder applications executed on the different multi-core architectures. The proposed multi-core architectures aim to exploit the data-level parallelism in the JPEG application. Communication and synchronization between the cores take place through a shared memory. When the master processor finishes writing an image into the shared memory, a slave processor is activated by setting its flag. That processor then executes its program and stores the resulting image in the shared memory. All processors execute the JPEG application on different images. For the performance evaluation of the quad-core LEON3 architecture, we use the Linux distribution recommended by Gaisler. The execution times include writing the input data into the shared memory and collecting the resulting data, both of which are done by the master processor. As illustrated in Figure 4, the JPEG decoder/encoder applications were partitioned over the four MicroBlaze processors. The total execution time of a given application is calculated by multiplying the clock period by the total instruction cycle count. In Figures 4 and 5, the execution time is expressed in seconds. Figure 4 shows the benefits of the Xilinx MicroBlaze soft-core for both the JPEG encoder and decoder applications: implementing this soft-core in a multi-core system reduces the total execution time of the two applications. Figure 4 also demonstrates the advantages of configuring the MicroBlaze soft-core when evaluating embedded applications under execution time and area constraints.

Fig. 4. Execution time in seconds of JPEG encoder/decoder applications measured on the quad-core MicroBlaze architecture
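The flag-based handshake through shared memory and the cycle-count timing described in this subsection can be sketched in software as follows. This is a minimal illustration under stated assumptions, not the authors' implementation: Python threads stand in for the slave cores, the `SharedMemory` class, the `done` flags and all function names are hypothetical, three slaves are assumed, and a trivial pixel inversion replaces the JPEG kernel. Only the 100 MHz clock and the ordering (the master writes the image, then sets the slave's flag; the slave stores its result in shared memory) come from the text.

```python
import threading
from typing import Callable, List, Optional

CLOCK_HZ = 100_000_000  # 100 MHz system clock, as stated for both designs


def execution_time_s(cycle_count: int, clock_hz: int = CLOCK_HZ) -> float:
    """Execution time = cycle count x clock period, with period = 1 / clock_hz."""
    return cycle_count / clock_hz


class SharedMemory:
    """Hypothetical model of the shared memory: one slot and one flag pair per slave."""

    def __init__(self, n_slaves: int) -> None:
        self.inputs: List[Optional[list]] = [None] * n_slaves
        self.outputs: List[Optional[list]] = [None] * n_slaves
        # 'start' is set by the master; 'done' is an assumed completion flag.
        self.start = [threading.Event() for _ in range(n_slaves)]
        self.done = [threading.Event() for _ in range(n_slaves)]


def slave(mem: SharedMemory, idx: int, kernel: Callable[[list], list]) -> None:
    mem.start[idx].wait()                       # idle until the master sets this slave's flag
    mem.outputs[idx] = kernel(mem.inputs[idx])  # process the image, store the result
    mem.done[idx].set()


def run_master(images: List[list], kernel: Callable[[list], list]) -> List[list]:
    mem = SharedMemory(len(images))
    workers = [
        threading.Thread(target=slave, args=(mem, i, kernel))
        for i in range(len(images))
    ]
    for worker in workers:
        worker.start()
    for i, image in enumerate(images):
        mem.inputs[i] = image  # first write the image into shared memory...
        mem.start[i].set()     # ...then activate the slave by setting its flag
    for i, worker in enumerate(workers):
        mem.done[i].wait()     # gather results once each slave has finished
        worker.join()
    return list(mem.outputs)


def invert(image: list) -> list:
    """Placeholder kernel standing in for JPEG encoding or decoding."""
    return [255 - pixel for pixel in image]


if __name__ == "__main__":
    print(run_master([[0, 128, 255], [10, 20], [255]], invert))
    print(execution_time_s(1_600_000_000), "s")
```

Writing the input before setting the flag matters: a slave that observes its flag must be able to assume that its image is already complete in shared memory. On the timing side, at 100 MHz an execution time of about 16 s corresponds to roughly 1.6 × 10^9 clock cycles.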

On the other hand, Figure 5 presents the total execution time of the applications implemented on the quad-core LEON3 design. The implementation with four MicroBlaze processors achieves about a 3-fold speedup compared to the configuration with four LEON3 processors. For the JPEG encoder application (Figure 4), the total execution time for the configuration with a single MicroBlaze core is about 16 seconds. The quad-core MicroBlaze configuration reduces the execution time by about 6% compared to the single MicroBlaze configuration.

Fig. 5. Execution time in seconds of JPEG encoder/decoder applications measured on the quad-core LEON3 architecture

VI. DISCUSSION

Despite its complexity, the experimental results confirm that LEON3 provides a performance level close to that of MicroBlaze, making it an interesting alternative to a commercial processor (Figure 5). LEON3 provides the visibility, flexibility and portability expected of an open-source hardware design. LEON3 uses more LUT resources than the MicroBlaze processor when the FPU option is enabled. The FPU used within the LEON3 is designed to be a more capable hardware unit than the MicroBlaze FPU, hence the extra resources needed. For FF and BRAM utilization, both processors consume about the same amount of resources. The MicroBlaze soft-core has a higher clock frequency than the LEON3 soft-core, but LEON3 still offers a high level of flexibility, which balances computing performance and area cost to meet embedded system requirements. A great advantage of the LEON3 processor is its structured organization of folders and VHDL source files, which has a positive influence on usability and scalability. This processor is very reliable and is therefore used in a large number of military and space applications. Commercial soft-cores, including MicroBlaze, have the advantages of configurability and optimization thanks to the development tools available from FPGA vendors. For applications that require customization, the open-source processor is a good candidate for improving the performance and reducing the power consumption of the embedded system.

VII. CONCLUSION

MPPSoC architectures have been proposed as a possible solution to tackle the complexity of forthcoming embedded applications. These new systems are very complex to design, since they must run various complex applications (e.g. video processing or multimedia) while meeting several additional design constraints, such as real-time performance and energy consumption. To achieve the required performance, soft-core processors appear to be a promising solution. They offer the flexibility to configure several MPPSoC designs and therefore reduce the TTM. This paper presents an overview of several open-source and commercial soft-cores and a selection of suitable soft-cores. Experimental results demonstrate that the choice of a suitable soft-core architecture has a significant impact on system performance. As future work, we intend to propose an FPGA-based multi-cluster heterogeneous MPPSoC architecture that adopts a hybrid interconnect, composed of both bus-based and NoC architectures, for inter-cluster communication. This architecture will support several PEs and different hardware accelerators.

REFERENCES

[1] Meilander, W. C., Baker, J. W. and Jin, M. 2003. Importance of SIMD Computation Reconsidered. In International Parallel and Distributed Processing Symposium, 2003.
[2] M. Baklouti, Ph. Marquet, M. Abid and JL. Dekeyser. A design and an implementation of a parallel based SIMD architecture for SoC on FPGA. In Proc. DASIP, 2008.
[3] P. Yiannacouras, J. G. Steffan and J. Rose, "Exploration and customization of FPGA-based soft processors", IEEE Trans. Comput.-Aided Design Integr. Circuits Syst., vol. 26, no. 2, pp. 266-277, 2007.
[4] Xilinx. Zynq-7000 All Programmable SoC. www.xilinx.com.
[5] Altera Corp. http://www.altera.com. available at 2012.
[6] Xilinx Inc. http://www.xilinx.com. available at 2012.
[7] Opencores.org Website, www.opencores.org, June 2006.
[8] Gaisler Research Website, www.gaisler.com, June 2006.
[9] Xilinx. Microblaze processor reference guide embedded development kit edk 7.1i. 2012.
[10] LEON3 Processor Users Manual XT Edition, Gaisler Research, October, 2008.
[11] Aeroflex Gaisler AB. GR712RC Dual-Core LEON3FT SPARC V8 Processor. www.gaisler.com/index.php/products/components/gr712rc. Accessed: 2014-01-10.
[12] Freescale: http://www.freescale.com/files/32bit/doc/fact sheet/MPC5510FS.pdf.
[13] Tamar Kranenburg and Rene van Leuken, MB-LITE: A robust, lightweight soft-core implementation of the MicroBlaze architecture, 2010.
[14] G. Martin, Multi-processor SoC-based design methodologies using configurable and extensible processors. J. Signal Process. Syst. 53(1-2), 113-127 (2008).
[15] J. Goodacre, A. Sloss, Parallelism and the ARM instruction set architecture. Computer 38, 42-50, 2005.
[16] Ron Kalla, Balaram Sinharoy, William J. Starke, and Michael Floyd. 2010. Power7: IBM's Next-Generation Server Processor. IEEE Micro 30, 2 (March-April 2010), 7-15. DOI:http://dx.doi.org/10.1109/MM.2010.38.
[17] Picoblaze. [Online]. Available: http://www.picoblaze.info/cores.html.
[18] OpenFire. [Online]. Available: http://www.opencores.org/project/openfire/.
[19] D. Lampret, OpenRISC1200 IP Core Specification, 2001.
[20] MB-Lite. [Online]. Available: http://www.opencores.org/project/mblite/.
[21] aeMB. [Online]. Available: http://www.opencores.org/project/aemb/.
[22] Berkeley Design Technology Inc., "Evaluating DSP Processor Performance", 2002.
[23] T. Stolze, Kramer, Fengler: Benchmarking: Classic DSPs vs. Microcontrollers, Paper IMCIC, 2010, Orlando.
[24] Lizy Kurian John, Performance evaluation: Techniques, tools and benchmarks, Electrical and computer engineering department, The university of Texas at Austin.
[25] I. Mhadhbi, N. Rejeb, S. Ben Othmen, S. Ben Saoud, "Performance Evaluation of FPGA Soft Cores Configurations Case of Xilinx MicroBlaze", International Journal of Computer Science, Communication & Information Technology (CSCIT), 2014.
