Coarse Grain Parallelization of Evolutionary Algorithms on GPGPU Cards with EASEA

Ogier Maitre (Université de Strasbourg, LSIIT, FDBT, France)
Laurent A. Baumes (Instituto de Tecnologia Quimica, UPV-CSIC Valencia, Spain)
Nicolas Lachiche (Université de Strasbourg, LSIIT, FDBT, France)
Avelino Corma (Instituto de Tecnologia Quimica, UPV-CSIC Valencia, Spain)
Pierre Collet (Université de Strasbourg, LSIIT, FDBT, France)

ABSTRACT


This paper presents a straightforward implementation of a standard evolutionary algorithm that evaluates its population in parallel on a GPGPU card. Tests done on a benchmark and a real-world problem, using an old NVidia 8800GTX card and a newer but not top-of-the-range GTX260 card, show a roughly 30× (resp. 100×) speedup for the whole algorithm compared to the same algorithm running on a standard 3.6GHz PC. Knowing that much faster hardware is already available, this opens new horizons for evolutionary computation, as search spaces can now be explored 2 or 3 orders of magnitude faster, depending on the number of GPGPU cards used. Since these cards remain very difficult to program, the know-how has been integrated into the old EASEA language, which can now output code for GPGPU cards (-cuda option).

Categories and Subject Descriptors
G.1.6 [Mathematics of Computing]: Numerical Analysis—Optimization

General Terms
Performance

Keywords
Parallelization, evolutionary computation, genetic algorithms, GPGPU, Graphic Processing Unit, EASEA

1. INTRODUCTION

Ever since GPGPU (General Purpose Graphic Processing Unit) cards appeared on the market a few years ago, researchers have been interested in using them for evolutionary computation, due to the inherent parallelism of these algorithms. Surprisingly enough, however, even though many papers have been published on the challenging implementation of Genetic Programming on these cards, an extensive search turned up only three (different) papers [9, 8, 12] that addressed the implementation of standard evolutionary algorithms (with a fixed genome size and a common evaluation function for all individuals), and these rely on overly complex implementation decisions, with results that, given the work involved, do not encourage following in their footsteps. The aim of this paper is to test the basic idea of running the evolutionary algorithm on the host CPU and evaluating all individuals in parallel on the GPGPU card. The EASEA language [4] has been revived in order to help non-expert GPGPU programmers obtain results comparable to those presented in this paper.

The paper starts by examining the state of the art. Then comes a description of what GPGPUs are, pointing out some of their more and less desirable characteristics, followed by a section that briefly recalls the history of the EASEA language and its functionalities. Finally, results are presented on a standard benchmark and a real-world problem, followed by a discussion of the presented work.


2. STATE OF THE ART

For some reason, it seems that most efforts to efficiently use GPGPUs in the domain of evolutionary computation have been made in the field of Genetic Programming (GP), even though GP hardly satisfies the intrinsic programming constraints of GPGPUs. As will be briefly discussed later on, GPGPU cards are in fact very powerful massively parallel computers with (among others) one main drawback: all the elementary processors on the card are organised into larger multi-processors that must all execute the same program (SPMD model, for Single Program Multiple Data). Inside each multi-processor, all elementary processors must execute the same instruction at the same time, although possibly on different data (SIMD model, for Single Instruction Multiple Data).
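To make this constraint concrete, here is a hypothetical CUDA kernel (an added illustration, not code from the paper) in which a data-dependent branch forces the hardware to serialise the two sides of the test within a multi-processor:

    /* Assumed illustration, not from the paper: when threads of a same
       multi-processor disagree on the test below, the hardware serialises
       the two branches, masking out the inactive threads, so divergent
       code runs at a fraction of the nominal speed. */
    __global__ void divergent(float *data, int n) {
        int i = blockIdx.x * blockDim.x + threadIdx.x;
        if (i >= n) return;
        if (data[i] > 0.0f)
            data[i] = sqrtf(data[i]);   /* these threads execute first...       */
        else
            data[i] = -data[i];         /* ...while these wait, then swap roles */
    }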

The obvious parallel part in evolutionary algorithms is the evaluation of the population, but in the case of GP, the code to be executed to evaluate the individuals is potentially different for each of them, because all individuals are different from each other and the evaluation of a GP individual is obtained by running the individual on some data (potentially identical among the population). So, to sum up, where GP wants to execute different programs (individuals) on identical data (the learning set), GPGPUs are designed to do exactly the opposite, i.e. execute identical programs on different data. A lot of imagination is therefore required from researchers to get these cards to work against their design, but GP being extremely CPU-greedy, it is very understandable that these courageous attempts should be made.

Conversely, standard evolutionary algorithms need to run an identical evaluation function on different individuals (which can be considered as different data), meaning that this is exactly what GPGPUs have been designed to deal with. However, for some strange reason, very few researchers have gone this obvious way, and when they did, they made strange choices, with over-complicated implementations [12, 8, 9].

The most basic idea that comes to mind when one wants to parallelize an evolutionary algorithm is to run the evolution engine in a sequential way on some kind of master CPU (potentially the host computer CPU) and, when a new generation of children has been created, to evaluate all children in parallel on a massively parallel computer. This may sound like a bad idea because, at each generation, it is necessary to transfer the whole population to the parallel computer and get the results back. Maybe this feared transfer overhead is what stopped everyone from trying this simplistic idea, but no paper could be found that tried this simple route. Even though it seems such a trivial thing to do, this is exactly what has been tested in this paper, based on the principle that one should always explore the obvious to make sure it is really not good enough before spending a lot of time and energy on optimisations.

Probably in order to adopt a more refined technique, [12] implement a fine-grained algorithm with a 2D toroidal population structure stored as a set of 2D textures, with the complete algorithm running on the GPGPU (which poses a serious problem since these cards do not have a random number generator, so before moving to the GPGPU, they create a matrix of random numbers that is stored in GPGPU memory for future reference). A ×10 speedup is obtained, but on a gigantic population of 512² individuals.

[8] find that standard genetic algorithms are ill-suited to GPGPUs because of operators such as crossover (which would slow down execution when run on the GPGPU) and therefore choose to implement a crossover-less Evolutionary Programming algorithm, here again entirely on the GPGPU card. The obtained speedup of their parallel EP "ranges from 1.25 to 5.02 when the population size is large enough."

[9] implement a fine-grained parallel genetic algorithm, once again on the GPGPU, to "avoid massive data transfer." Strangely, they implement a binary GA even though GPGPUs have no bit-operators, and therefore go to a lot of trouble to implement a single-point crossover.

So, probably for fear of being too slow or non-optimal, these teams of researchers seem to have skipped the most basic implementation, which will be explored in this paper.
In an attempt to simplify the use of GPGPUs, the EASEA language has been revived for the occasion. It allows both replicability and ease of use for programmers who are not GPGPU experts and who would like to try their algorithms on a GPGPU with minimal effort, for the price of a GPGPU card.
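As an added sketch of what this coarse-grained scheme can look like in CUDA (all names, sizes and the placeholder sphere evaluator below are illustrative assumptions, not the code EASEA actually generates):

    #include <cuda_runtime.h>

    #define GENOME_SIZE 128   /* illustrative fixed genome size (assumption) */
    #define POP_SIZE    4096  /* illustrative population size (assumption)   */

    /* One thread evaluates one individual: the same evaluation code runs
       on different data, which is exactly the SIMD-friendly case described
       above. The sphere function is only a placeholder for the user's
       evaluation function. */
    __device__ float evaluate(const float *genome) {
        float f = 0.0f;
        for (int i = 0; i < GENOME_SIZE; i++)
            f += genome[i] * genome[i];
        return f;
    }

    __global__ void evaluatePop(const float *pop, float *fitness, int popSize) {
        int id = blockIdx.x * blockDim.x + threadIdx.x;
        if (id < popSize)
            fitness[id] = evaluate(pop + id * GENOME_SIZE);
    }

    /* Host side: the evolution engine stays sequential on the CPU. At each
       generation the children are shipped to the card, evaluated in
       parallel, and the fitness values are brought back. */
    void evaluateGeneration(const float *hostPop, float *hostFitness) {
        float *dPop, *dFit;
        size_t popBytes = (size_t)POP_SIZE * GENOME_SIZE * sizeof(float);
        cudaMalloc(&dPop, popBytes);
        cudaMalloc(&dFit, POP_SIZE * sizeof(float));
        cudaMemcpy(dPop, hostPop, popBytes, cudaMemcpyHostToDevice);
        evaluatePop<<<(POP_SIZE + 255) / 256, 256>>>(dPop, dFit, POP_SIZE);
        cudaMemcpy(hostFitness, dFit, POP_SIZE * sizeof(float),
                   cudaMemcpyDeviceToHost);
        cudaFree(dPop);
        cudaFree(dFit);
    }

The two cudaMemcpy calls and the kernel launch are the only per-generation interactions with the card, so the feared transfer overhead is paid once per generation and is amortised as soon as the evaluation function is expensive enough.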

3. PRESENTATION OF EASEA

Back in 1998, an INRIA "Cooperative Research Action" (ARC) called EVOLAB started between four French research laboratories, with the aim of coming up with a software platform that would help non-expert programmers try out evolutionary algorithms on their applied problems. A first prototype of the EASEA [4] (EAsy Specification for Evolutionary Algorithms, pronounced [i:zi:]) language was demonstrated at the EA'99 conference in Dunkerque, and new versions regularly came out on its web-based Sourceforge software repository until 2003¹. In the meantime, the EASEA language (along with GUIDE [5], its dedicated Graphic User Interface) was used for many research projects and at least 4 PhDs, as well as for teaching evolutionary computation in several universities.

Before 2000, the output of the EASEA compiler was a C++ source file that used either the GALib or the EO library for their implementations of the evolutionary operators. By 2000, the DREAM [1] (Distributed Resource Evolutionary Algorithm Machine) European Project was starting, and it needed a programming language. EASEA became the programming language of the project and was modified so as to output Java source code for the DREAM while feeding on the same input files, for this was another nice feature of the EASEA language: the same .ez file could be compiled to create code for GALib, EO [10] or the DREAM, depending on a single option flag. This functionality allowed replicability across platforms and computers, as a published .ez program could be recompiled and executed with minimal effort on a Mac, a PC running Windows or Linux, or any other machine where one of the target GALib / EO / DREAM libraries was installed.

Development of the EASEA language stopped with the end of the DREAM project in 2003, but its ability to generate fully functional and compilable source code from a simple C-like description of the evolutionary operators needed for a particular problem (namely the initialiser, evaluator, crossover operator and mutator) made it look like a wonderful solution to allow non-expert programmers to use GPGPUs. Due to their origins and history, these cards take a long time to understand and are quite difficult to program. So the idea behind the revival of EASEA was that, by compiling a possibly old .ez file with the -cuda flag on the command line, the compiler would produce code that runs directly on the GPGPU card.

Brief overview of EASEA

The idea behind EASEA was to allow virtually any basic programmer to try out an evolutionary algorithm by just typing the code that is specific to the problem to be solved. The code for the GPGPU implementation of the algorithm that tries to minimise the Weierstrass test function presented below therefore does not contain much more than the following lines:²

¹ http://sourceforge.net/projects/easea/
² For the sake of replicability, all the .ez programs used in this paper are available on


\User classes :
  GenomeClass { float x[N]; }

\GenomeClass::mutator :
  for (int i=0; i<N; i++)
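The evaluation code itself is cut off in this excerpt. As an assumed illustration only, a CUDA evaluator for one common formulation of the Weierstrass function (a = 0.5, b = 3, kmax = 20, a form chosen so that f(0) = 0) could look as follows; the exact formulation, constants and genome size used in the paper may differ:

    /* Assumed illustration: evaluator for one common form of the
       Weierstrass function,
         f(x) = sum_i sum_k a^k cos(2*pi*b^k*(x[i]+0.5)) - N * sum_k a^k cos(pi*b^k)
       with a = 0.5, b = 3, kmax = 20. The genome size N matches the
       float x[N] declaration above; its value here is an assumption. */
    #define N     10
    #define KMAX  20
    #define PI_F  3.14159265f

    __device__ float weierstrass(const float *x) {
        const float a = 0.5f, b = 3.0f;
        float sum = 0.0f, bias = 0.0f;
        for (int k = 0; k <= KMAX; k++) {
            float ak = powf(a, (float)k);
            float bk = powf(b, (float)k);
            for (int i = 0; i < N; i++)
                sum += ak * cosf(2.0f * PI_F * bk * (x[i] + 0.5f));
            bias += ak * cosf(PI_F * bk);
        }
        return sum - N * bias;   /* zero when every x[i] is 0 */
    }

    /* Same coarse-grained pattern as before: one thread per individual. */
    __global__ void evalWeierstrass(const float *pop, float *fitness, int popSize) {
        int id = blockIdx.x * blockDim.x + threadIdx.x;
        if (id < popSize)
            fitness[id] = weierstrass(pop + id * N);
    }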