A Massively Parallel Virtual Machine for SIMD Architectures

Advanced Studies in Theoretical Physics Vol. 9, 2015, no. 5, 237 - 243 HIKARI Ltd, www.m-hikari.com http://dx.doi.org/10.12988/astp.2015.519

M. Youssfi and O. Bouattane
Lab. SSDIA, ENSET, University Hassan II of Casablanca, Morocco

M. O. Bensalah
University Mohammed V of Rabat, Faculty of Science, BP 1014 Rabat, Morocco

Copyright © 2015 M. Youssfi, O. Bouattane and M. O. Bensalah. This is an open access article distributed under the Creative Commons Attribution License, which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.

Abstract

In this paper, we present a new model of massively parallel Single Instruction Multiple Data (SIMD) machines in a distributed system. The modeled machines include linear, 2D and 3D meshes, pyramidal structures and GPU structures. All these computers are physically based on a multitude of fine-grained processing elements (PEs) arranged and coupled according to their associated topological pattern. In this model, the host is represented by a distributed agent. Each virtual host agent, deployed on a physical computer, manages a local parallel virtual computer composed of a set of virtual processing elements (VPEs). Each VPE is represented by a self-threaded object. The distributed virtual host agents are interconnected through a multi-agent system platform. The developed software is based on a hard kernel of a parallel virtual machine into which we translate all the physical properties of its different components. This kernel is based on an abstract layer which can easily be extended to other topological structures. To implement a parallel program on the proposed platform, we have developed a new parallel programming language based on XML, together with its compiler, which allow editing, compiling and running parallel programs. To illustrate the performance of this model, we present an example of a parallel program implementation for the edge detection of a brain MRI image presenting a pathology.

Keywords: Parallel Programming, SIMD, Emulation, Parallel Virtual Machine

1. Introduction

In the field of high performance computing, computationally intensive applications that use high volumes of data require huge computing power and data storage. Today, parallel and distributed computing is more necessary than ever. New massively parallel architectures have emerged and are being heavily exploited. This is the case of the Mesh Connected Computer (MCC) [1], the Reconfigurable Mesh Computer (RMC) [2, 3] and the graphics processing unit (GPU) [4]. However, the performance of these architectures remains limited. Theorists keep innovating by imagining new parallel machines whose topologies are well adapted to specific problems. These innovations lead to new algorithms for high performance computation. Unfortunately, technology is not able to keep up with the scientists' imagination. Because real parallel machines of this kind are unavailable or very costly, creating parallel virtual architectures is a good first way to validate and execute parallel algorithms on serial machines.

In this context, the realization of an emulator for SIMD parallel machines was the first model exploited in our research work [5, 6, 7, 8]. This emulator has been of great use for elaborating, validating and testing new parallel algorithms without needing the real parallel machines. However, emulating a parallel machine on a single-processor machine does not bring any performance gain at runtime. It is therefore necessary to look for other ways to run these emulated parallel applications in a real environment that offers a high performance gain.

In this paper, we present a new model to describe and emulate polymorphic massively parallel Single Instruction Multiple Data (SIMD) machines in a distributed system. The modeled machines include linear, 2D and 3D meshes, pyramidal structures and GPU structures. All these computers are physically based on a multitude of fine-grained processing elements (PEs) arranged and coupled according to their associated topological pattern. In this model, the host is represented by a distributed agent. Each virtual host agent, deployed on a physical computer, manages a local parallel virtual computer composed of a set of virtual processing elements (VPEs). Each VPE is represented by a self-threaded object. The distributed virtual host agents are interconnected through a multi-agent system platform. This feature allows the performance of a set of physical computers to be combined to build a high performance massively parallel virtual machine. The developed software is based on a hard kernel of a parallel virtual machine into which we translate all the physical properties of its different components. This kernel can easily be extended to the other mentioned topological structures. To implement a parallel program on the proposed platform, we have developed a new parallel programming language based on XML, together with its compiler, which allow editing, compiling and running parallel programs.

This paper is organized as follows. Section 2 presents the proposed massively parallel computational model. Section 3 illustrates the performance of this model with an example of a parallel program implementation for the edge detection of a brain MRI image presenting a pathology. Finally, Section 4 gives some concluding remarks and perspectives.

2. The Proposed Parallel Computational Model

In this work, our virtual Reconfigurable Parallel Computer (PRC) is based on the reconfigurable 2D mesh, which is easily extensible to the other mentioned architectures. A 2D Reconfigurable Mesh Computer is a massively parallel machine having M × N processing elements (PEs) arranged in a 2D matrix. It is a Single Instruction Multiple Data (SIMD) structure in which each PE is located at Cartesian coordinates (i, j). Each PE can carry out arithmetic and logical operations using its Arithmetic and Logic Unit (ALU). The PEs can also carry out reconfiguration operations to exchange data over the mesh, and they can use a shared memory. Each PE is a self-threaded autonomous component. All the PEs are managed by a virtual host manager, represented by a distributed agent, that is designed to load the parallel program and the global data and to distribute them over the mesh of PEs for execution. A virtual host agent can communicate with other host agents deployed in the distributed system. This feature makes it possible to combine the performance of a set of distributed physical computers in order to build a massively parallel virtual machine.
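The relationship between a host and its self-threaded PEs can be sketched as follows. This is only a minimal illustration under stated assumptions, not the authors' implementation: the names `VirtualPE`, `VirtualHost`, `broadcast` and `shutdown`, the per-PE instruction queue, and the register count are all hypothetical, and the distributed multi-agent layer is left out.

```python
import queue
import threading


class VirtualPE(threading.Thread):
    """A self-threaded virtual processing element at mesh position (i, j)."""

    def __init__(self, i, j, n_registers=12):
        super().__init__(daemon=True)
        self.i, self.j = i, j
        self.reg = [0] * n_registers   # local data registers (count is an assumption)
        self.inbox = queue.Queue()     # instructions sent by the host

    def run(self):
        while True:
            instruction = self.inbox.get()
            if instruction is None:    # shutdown signal from the host
                self.inbox.task_done()
                break
            instruction(self)          # same instruction, applied to local data
            self.inbox.task_done()


class VirtualHost:
    """Hypothetical host that owns a rows x cols mesh of VPEs."""

    def __init__(self, rows, cols):
        self.pes = [[VirtualPE(i, j) for j in range(cols)] for i in range(rows)]
        for pe in self.all_pes():
            pe.start()

    def all_pes(self):
        return [pe for row in self.pes for pe in row]

    def broadcast(self, instruction):
        """Send one instruction to every PE, then wait until all have run it."""
        for pe in self.all_pes():
            pe.inbox.put(instruction)
        for pe in self.all_pes():
            pe.inbox.join()

    def shutdown(self):
        for pe in self.all_pes():
            pe.inbox.put(None)
        for pe in self.all_pes():
            pe.join()


def load_coordinates(pe):
    """Example instruction: each PE stores a value derived from its position."""
    pe.reg[0] = 10 * pe.i + pe.j


host = VirtualHost(2, 3)
host.broadcast(load_coordinates)
values = [[pe.reg[0] for pe in row] for row in host.pes]
host.shutdown()
```

Waiting for every PE after each broadcast keeps the PEs in lockstep, which is what makes the sketch SIMD-like even though each PE runs in its own thread.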

Figure 1: The UML class diagram of the main components of the parallel virtual computer. [Classes shown: Thread, Agent, ALU, AbstractVirtualPE, AbstractParallelVirtualComputer, Mesh3, GPU, Other, VPEImpl, Port, OtherVPEImpl.]

The proposed parallel virtual machine is represented by a set of distributed agents. Each agent represents the host of its local virtual machine. Each host has a set of virtual processors arranged according to the topology of the virtual machine (2D, 3D, pyramidal, etc.). The virtual PE is a self-threaded object which is ready to run the basic instructions of the parallel program broadcast by the virtual host. In the proposed model, we have defined an abstract layer of the virtual machine which represents the core of this framework. We have defined some concrete implementations of the virtual machine, but the programmer can add other implementations that extend the features of the model. The UML class diagram of Figure 1 shows the main components of the proposed model. In the realized framework, parallel programmers must write a parallel program, or open an existing one, and compile it before its execution. In the developed emulator, we have proposed a new parallel programming language based on XML, defined by a specifically developed XML schema.
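The idea of an abstract layer with interchangeable topologies can be illustrated with a short sketch. The class name `AbstractParallelVirtualComputer` is taken from Figure 1, but everything else is an assumption made for illustration: the methods `positions` and `neighbours`, the classes `LinearArray` and `Mesh2D`, the helper `link_count`, and the choice of an 8-neighbour 2D mesh (suggested by the neighbour exchange used in Section 3).

```python
from abc import ABC, abstractmethod
from itertools import product


class AbstractParallelVirtualComputer(ABC):
    """Abstract layer: a topology declares its PE positions and their links."""

    @abstractmethod
    def positions(self):
        """Return the coordinates of every PE."""

    @abstractmethod
    def neighbours(self, pos):
        """Return the coordinates of the PEs linked to the PE at pos."""


class LinearArray(AbstractParallelVirtualComputer):
    """Hypothetical linear topology: each PE is linked to its left and right PE."""

    def __init__(self, n):
        self.n = n

    def positions(self):
        return [(i,) for i in range(self.n)]

    def neighbours(self, pos):
        (i,) = pos
        return [(k,) for k in (i - 1, i + 1) if 0 <= k < self.n]


class Mesh2D(AbstractParallelVirtualComputer):
    """Hypothetical 2D mesh with 8-neighbour links (an assumption)."""

    def __init__(self, rows, cols):
        self.rows, self.cols = rows, cols

    def positions(self):
        return list(product(range(self.rows), range(self.cols)))

    def neighbours(self, pos):
        i, j = pos
        candidates = [(i + di, j + dj)
                      for di, dj in product((-1, 0, 1), repeat=2)
                      if (di, dj) != (0, 0)]
        # Keep only the candidates that fall inside the mesh.
        return [(a, b) for a, b in candidates
                if 0 <= a < self.rows and 0 <= b < self.cols]


def link_count(machine):
    """Topology-independent code: works with any concrete implementation."""
    return sum(len(machine.neighbours(p)) for p in machine.positions())
```

Code written against the abstract class, such as `link_count`, does not change when a new topology is added; a further structure would only need to provide its own `positions` and `neighbours`.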

3. Application

In this section, we present an example of a parallel algorithm implementation for contour detection in a gray-level image using the Sobel operator [9]. The operator uses two 3×3 kernels which are convolved with the original image to calculate approximations of the derivatives: one for horizontal changes and one for vertical changes. If we define A as the source image, and Gx and Gy as two images which at each point contain the horizontal and vertical derivative approximations, the computations are as follows [10]:

$$G_x = \begin{bmatrix} -1 & 0 & +1 \\ -2 & 0 & +2 \\ -1 & 0 & +1 \end{bmatrix} * A \quad \text{and} \quad G_y = \begin{bmatrix} +1 & +2 & +1 \\ 0 & 0 & 0 \\ -1 & -2 & -1 \end{bmatrix} * A$$

where * denotes the 2-dimensional convolution operation. The x-coordinate is defined here as increasing to the right, and the y-coordinate as increasing downward. At each point in the image, the resulting gradient approximations can be combined to give the gradient magnitude:

$$G = \sqrt{G_x^2 + G_y^2}$$

The sequential algorithm has complexity O(N), where N is the pixel count of the image. We will show that the parallel implementation runs in Θ(1) time. The structure of a parallel program implemented in the defined XML language is described below:

0: 1: 2: 3: 4: 5: 6: 7:

• Instruction 1: Loading the image into the massively parallel virtual machine. As shown in Figure 2, in this step the gray level of each pixel of the image is loaded into register 0 of the corresponding PE of the virtual computer.
• Instruction 2: This statement is used by the host agent to mark all the virtual PEs that are designated to participate in the parallel computation.
• Instruction 3: As shown in Figure 3, each PE exchanges the data stored in register 0 with its 8 neighbours, if they exist. The received data are stored in the registers reg[1], reg[2], reg[3], reg[4], reg[5], reg[6], reg[7] and reg[8].
• Instructions 4, 5 and 6: Each PE computes the Sobel operator values Gx, Gy and G, then saves them, respectively, in reg[9], reg[10] and reg[1].
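The load, exchange and compute steps listed above can be emulated sequentially in a few lines of Python. This is a sketch under stated assumptions rather than the paper's XML program: the function names are hypothetical, the order in which neighbours are mapped to reg[1] to reg[8] is assumed (row by row), every PE is taken to be selected (Instruction 2), and the kernels are applied as element-wise products, so a strict convolution would flip the signs of Gx and Gy while leaving G unchanged. The 3×3 sample patch uses the values that appear in Figures 2 and 3, in the arrangement they seem to have there.

```python
import math

# Offsets of the 8 neighbours; neighbour k is stored in reg[k + 1].
# This register order is an assumption: the text does not spell it out.
OFFSETS = [(-1, -1), (-1, 0), (-1, 1),
           (0, -1),           (0, 1),
           (1, -1),  (1, 0),  (1, 1)]


def load_image(image):
    """Instruction 1: one PE per pixel, gray level stored in reg[0]."""
    return [[[pixel] + [0] * 10 for pixel in row] for row in image]


def exchange(mesh):
    """Instruction 3: copy reg[0] of each existing neighbour into reg[1..8]."""
    rows, cols = len(mesh), len(mesh[0])
    for i in range(rows):
        for j in range(cols):
            for k, (di, dj) in enumerate(OFFSETS):
                ni, nj = i + di, j + dj
                if 0 <= ni < rows and 0 <= nj < cols:
                    mesh[i][j][k + 1] = mesh[ni][nj][0]


def sobel(mesh):
    """Instructions 4 to 6: each PE computes Gx, Gy and G from its own registers."""
    for row in mesh:
        for reg in row:
            nw, n, ne, w, e, sw, s, se = reg[1:9]
            reg[9] = (ne + 2 * e + se) - (nw + 2 * w + sw)    # Gx
            reg[10] = (nw + 2 * n + ne) - (sw + 2 * s + se)   # Gy
            reg[1] = math.hypot(reg[9], reg[10])              # G, kept in reg[1] as in the text


patch = [[10, 2, 22],
         [11, 20, 44],
         [33, 4, 8]]
mesh = load_image(patch)
exchange(mesh)
neighbours_of_centre = mesh[1][1][1:9]
sobel(mesh)
gx, gy, g = mesh[1][1][9], mesh[1][1][10], mesh[1][1][1]
```

For the centre PE of this patch the sketch gives Gx = 53, Gy = -13 and G ≈ 54.57. Each loop iteration stands for the work one PE does on its own registers; since that work does not depend on the image size, running all PEs at once takes a constant number of steps, whereas this sequential emulation takes time proportional to the pixel count.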

Figure 2: Data registers contents of 9 PEs after loading the image. [Panels: PE[i-1,j-1] to PE[i+1,j+1]; reg[0] values appearing in the figure: 10, 2, 22, 11, 20, 44, 33, 4, 8; all other registers contain 0.]

Figure 3: Data registers contents of 9 PEs after the exchanging data operation. [Panels: PE[i-1,j-1] to PE[i+1,j+1]; for example, PE[i,j] holds 10 2 22 / 11 20 44 / 33 4 8, and registers for missing neighbours contain 0.]

Figure 4 shows the result of the parallel program for contour detection in a gray-level image using the Sobel operator. Figure 4.a shows the input image, a brain magnetic resonance image. The output image produced by the parallel Sobel algorithm is shown in Figure 4.b.

Figure 4: Results of the parallel program for contour detection using the parallel Sobel operator: a) input image; b) output image.

4. Conclusion

In this paper, we have presented a new model of polymorphic massively parallel Single Instruction Multiple Data (SIMD) machines in a distributed system. In this model, the host is represented by a distributed agent. Each virtual host agent, deployed on a physical computer, manages a local parallel virtual computer composed of a set of virtual processing elements (VPEs). Each VPE is represented by a self-threaded object. The resulting parallel virtual machine and the compiler of its programming language can be used as a high performance computing system. This platform can also be used to solve massively parallel problems in other domains. In this model, we have proposed an abstract layer representing the core of the framework, which allows other developers to add extensions for new parallel virtual machine structures. A parallel programming language based on XML is a good solution for this platform. However, we are working on other modules that will allow parallel programs to be written in scientific programming languages such as C++, Java or Fortran.

References

[1] R. Miller et al., "Geometric algorithms for digitized pictures on a mesh connected computer," IEEE Transactions on PAMI, Vol. 7, No. 2, pp. 216–228, 1985. http://dx.doi.org/10.1109/tpami.1985.4767645

[2] T. Hayashi, K. Nakano, and S. Olariu, "An O((log log n)²) time algorithm to compute the convex hull of sorted points on re-configurable meshes," IEEE Transactions on Parallel and Distributed Systems, Vol. 9, No. 12, pp. 1167–1179, 1998. http://dx.doi.org/10.1109/71.737694

[3] R. Miller, V. K. Prasanna-Kumar, D. I. Reisis, and Q. F. Stout, "Parallel computation on re-configurable meshes," IEEE Transactions on Computers, Vol. 42, No. 6, pp. 678–692, 1993. http://dx.doi.org/10.1109/12.277290

[4] Y. Zhao and F. C. M. Lau, "Implementation of Decoders for LDPC Block Codes and LDPC Convolutional Codes Based on GPUs," IEEE Transactions on Parallel and Distributed Systems, Vol. 25, No. 3, pp. 663–672, March 2014. doi:10.1109/TPDS.2013.52

[5] M. Youssfi, O. Bouattane, and M. O. Bensalah, "A Massively Parallel Re-Configurable Mesh Computer Emulator: Design, Modeling and Realization," J. Software Engineering & Applications, Vol. 3, pp. 11–26, 2010. doi:10.4236/jsea.2010.31002

[6] O. Bouattane, B. Cherradi, M. Youssfi, and M. O. Bensalah, "Parallel c-means algorithm for image segmentation on a reconfigurable mesh computer," Parallel Computing, Elsevier, Vol. 37, pp. 230–243, 2011. http://dx.doi.org/10.1016/j.parco.2011.03.001

[7] M. Youssfi, O. Bouattane, and M. O. Bensalah, "On the Object Modelling of the Massively Parallel Architecture Computers," Proceedings of the IASTED Inter. Conf. Software Engineering, February 16-18, 2010, Innsbruck, Austria, pp. 71–78.

[8] M. Youssfi, O. Bouattane, and M. O. Bensalah, "Parallelization of the local image processing operators. Application on the emulating framework of the reconfigurable mesh connected computer," Proceeding of scientists meeting in Information Technology and Communication JOSTIC, Rabat, November 3-4, 2008, pp. 81–83.

[9] I. Sobel, "An Isotropic 3×3 Gradient Operator," in Machine Vision for Three Dimensional Scenes, H. Freeman (ed.), Academic Press, NY, pp. 376–379, 1990.

[10] R. Gonzalez and R. Woods, Digital Image Processing, Addison Wesley, 1992, pp. 414–428.

Received: February 4, 2015; Published: March 9, 2015