
An Evolutionary Algorithm for the Multi-objective Optimisation of VLSI Primitive Operator Filters

Robert Thomson and Tughrul Arslan
Department of Electronics and Electrical Engineering
The University of Edinburgh, Scotland
{Robert.Thomson, Tughrul.Arslan}@ee.ed.ac.uk

Abstract—This paper introduces an evolvable hardware system for the generation of optimised FIR filter designs. This system converts the frequency domain specification of a filter directly into a circuit netlist. The filter designs are optimised with respect to an accurate model of the silicon area and latency.

I. INTRODUCTION
Evolvable Hardware is the application of Evolutionary Algorithms (EAs) to the creation of novel circuit designs. This paper is concerned with the creation of digital circuits using high-level components, such as adders and subtracters. Evolvable Hardware systems perform two main types of work: circuit creation and circuit optimisation. Circuit creation is the construction of a circuit design that meets a functional specification. Circuit optimisation is concerned with the properties of the design other than functionality, such as circuit area, power consumption, speed of operation, fault tolerance, and testability. The EA systems described in this paper only attempt to optimise the silicon area and the latency of a circuit.

Many evolvable hardware systems are purely concerned with circuit creation and do not attempt any optimisation. There are also many systems that perform optimisation but not circuit creation. If the size of the chromosome is limited, the area and latency of the circuit will be indirectly limited. Systems that specify a topology for the final circuit, such as FPGA-based systems, will also tend to place limits on the area and delay of the circuit. While limiting the area and delay of the circuit will prevent the creation of extremely poor-quality solutions, it is not a substitute for circuit optimisation.

There are several common approaches to combining circuit creation with circuit optimisation. One simple approach is to evolve circuits for functionality, and then to provide extra rewards for circuits that are both functionally correct and efficient [1]. A weakness of this approach is that optimisation is performed only after a working circuit design has been found, rather than alongside circuit creation.
This could lead to early design decisions being taken without regard to circuit optimisation, possibly tying the system into the use of a non-optimal design. The advantage of this approach is that, because optimisation is not performed all of the time, the EA is likely to run faster than systems that optimise throughout evolution.

Robert Thomson is supported by EPSRC studentship number 0030467X.

0-7803-7282-4/02/$10.00 ©2002 IEEE

Among systems that perform both circuit creation and optimisation throughout the course of evolution, the commonest approach is to construct a weighted sum of several different performance measures. This is equivalent to alternating between different selection criteria during the course of evolution. This method has the disadvantage that it involves adding together quantities that are not equivalent: the designer must choose the relative worth of the different metrics and set the weights accordingly. This type of optimisation works, but will tend to converge towards a single optimal design.

During multi-objective optimisation, the best individuals in the population will form a Pareto surface. Unless the system is provided with extra information about the properties of the desired circuit, there is no reason to prefer one Pareto point over another. The Pareto-optimal set of individuals can be extracted from the final population, and the designer can then choose the design that is most appropriate for a particular situation [2].

A Pareto surface based selection scheme is mentioned by Goldberg [3]. In this scheme, the Pareto-optimal points are labelled with a value of 1 and removed from the population. The new Pareto-optimal points are labelled with a value of 2 and also removed. Successive Pareto surfaces are labelled and removed in this way until every individual has a label. This ranks the individuals according to their status within the population, as shown in figure 1. The resulting fitness measure tends not to be biased towards any particular optimum. It is not always necessary for an EA to discover the entire Pareto surface in order to cover the designs that are of interest to the circuit designer.
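Goldberg's ranking can be sketched in a few lines of Python. This is a minimal illustration, not the authors' code: it assumes every objective is minimised and that each circuit is scored as a tuple such as (functionality error, area, latency). The names `dominates`, `pareto_surface` and `pareto_rank` are hypothetical.

```python
from typing import List, Sequence, Tuple

# Assumed score layout: one tuple per circuit, every objective minimised,
# e.g. (functionality_error, area_nand_gates, latency_ns).
Score = Tuple[float, ...]


def dominates(a: Score, b: Score) -> bool:
    """a dominates b if it is no worse in every objective and better in at least one."""
    return all(x <= y for x, y in zip(a, b)) and any(x < y for x, y in zip(a, b))


def pareto_surface(scores: Sequence[Score], candidates: List[int]) -> List[int]:
    """Return the candidate indices that no other candidate dominates."""
    return [
        i for i in candidates
        if not any(dominates(scores[j], scores[i]) for j in candidates if j != i)
    ]


def pareto_rank(scores: Sequence[Score]) -> List[int]:
    """Label surface 1, remove it, label the next surface 2, and so on."""
    ranks = [0] * len(scores)
    remaining = list(range(len(scores)))
    rank = 1
    while remaining:  # a finite non-empty set always has a non-dominated member
        surface = set(pareto_surface(scores, remaining))
        for i in surface:
            ranks[i] = rank
        remaining = [i for i in remaining if i not in surface]
        rank += 1
    return ranks


if __name__ == "__main__":
    # Two objectives for readability; the paper uses three.
    population = [(1, 5), (2, 2), (5, 1), (3, 3), (4, 4), (6, 6)]
    print(pareto_rank(population))  # prints [1, 1, 1, 2, 3, 4]
```

Identical scores do not dominate each other, so duplicates share a rank. The pairwise comparison is quadratic in the number of remaining individuals, which is modest for a pool of 200 circuits.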
If the designer provides feedback during the course of evolution, the search can be focused on a single part of the Pareto surface [4].

Some systems design digital filters by evolving filter coefficients that meet a particular filter specification, and some of these perform multi-objective optimisation [5], [6]. This paper takes a lower-level approach and describes the evolution of circuit netlists. Primitive Operator Filters (POFs) perform multiplication using only primitive operators such as addition, subtraction, and bitwise shifts. These operators have lower resource requirements than dedicated multipliers, so a POF circuit will generally be more efficient than an equivalent system that uses multipliers. There are algorithms for the creation of efficient constant-factor multipliers using primitive operators [7]. The problem of POF creation is NP-complete [8], so it is not usually practical to create an optimal circuit from a filter specification. There are several suboptimal heuristic schemes for the creation of POFs [8], [9]. EAs have also been used to create POFs [10], [11], and are often combined with heuristic methods to increase computational efficiency.

[Figure: individuals on two unlabelled axes, each marked with its rank; key: individual, Pareto surface, non-optimal surfaces.]
Fig. 1. A population, where individuals are labelled according to rank. The axes are unlabelled because this could apply to any minimisation problem.

II. THE GENETIC ALGORITHM
This EA uses a (μ + λ) system. An initial population of 100 parents is extended by the addition of 100 child chromosomes. The new chromosomes are created by applying mutation and crossover to randomly selected members of the parent population. The population size is then reduced to 100 survivors: up to 10 of these are chosen through elitism, and the rest are chosen by size-2 tournament selection.

The speed, silicon area, and functional correctness are found for each circuit. These three fitness measures are combined using a three-dimensional implementation of the Pareto selection scheme described by Goldberg [3], so size-2 tournament selection favours individuals that are close to the Pareto surface. The elitism operator preserves one individual from each Pareto point in the population. If this would result in the preservation of more than ten individuals, only the ten Pareto points with the highest fitnesses are preserved. This filter creation problem is highly epistatic: the majority of changes to the chromosome are destructive. The use of elitism combined with a (μ + λ) system ensures that the best individuals are not eliminated from the population, which is essential for the successful operation of the EA.

III. THE CHROMOSOME
The chromosome contains two types of information: descriptions of modules, and descriptions of the filter coefficients. The chromosome is shown in figure 2.

[Figure: the chromosome divided into operations (op 1 to op 4) and taps (tap 1 to tap 3); field labels include input 1, input 2, shift, source and sign.]
Fig. 2. A breakdown of the contents of the chromosome.

Every operation in the chromosome is performed by an adder. Either input to an adder can be shifted left by a specified amount. Each adder input can be connected to the result of any adder, or to the input of the entire circuit. The description of a coefficient specifies the source of the value, the amount to shift the value, and whether the coefficient should be negated. The source of a coefficient can be the result of any adder, or the input to the whole circuit. A coefficient can also be set to a constant value of zero.

A fixed-size chromosome has been used. Not all of the components present in the chromosome are actually used in the circuit: only the components that the circuit output depends upon are incorporated into the circuit. Feedback loops can be formed in these circuits. This is a problem, as asynchronous feedback can lead to instability and to overheating, so the EA always ranks a circuit that contains a feedback loop lower than any circuit that does not.

IV. GENETIC OPERATORS
A variety of genetic operators were used; these are shown in table I. There are three types of genetic operator: conventional mutation, heuristic mutation, and crossover.

TABLE I
THE GENETIC OPERATORS

Operator                 Probability
crossover                1/2
conventional mutation    1/8
scale value              1/16
add operation            1/16
remove operation         1/16
duplicate operation      1/16
swap operations          1/16
swap inputs              1/16
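The operator probabilities in Table I and the (μ + λ) generation step of section II can be combined into a rough Python sketch of one generation. Everything here is an assumption for illustration: genomes are plain integer lists with a single hypothetical gene range, the heuristic operators are placeholders because they depend on decoding the chromosome into a circuit, and elitism is simplified to keeping up to ten rank-1 individuals rather than one individual per Pareto point. The function names are hypothetical.

```python
import random
from typing import Callable, List, Sequence

Genome = List[int]  # integer genes: connection indices and shift amounts

# Probabilities copied from Table I; they sum to 1.
OPERATOR_PROBS = {
    "crossover": 1 / 2,
    "conventional mutation": 1 / 8,
    "scale value": 1 / 16,
    "add operation": 1 / 16,
    "remove operation": 1 / 16,
    "duplicate operation": 1 / 16,
    "swap operations": 1 / 16,
    "swap inputs": 1 / 16,
}


def choose_operator(rng: random.Random) -> str:
    names = list(OPERATOR_PROBS)
    return rng.choices(names, weights=[OPERATOR_PROBS[n] for n in names])[0]


def two_point_crossover(a: Genome, b: Genome, rng: random.Random) -> Genome:
    """Copy genes [i, j) from b and all other genes from a."""
    i, j = sorted(rng.sample(range(len(a) + 1), 2))
    return a[:i] + b[i:j] + a[j:]


def conventional_mutation(g: Genome, gene_range: int, rng: random.Random) -> Genome:
    """Replace one integer gene with a new random value (one shared range is assumed)."""
    child = list(g)
    child[rng.randrange(len(child))] = rng.randrange(gene_range)
    return child


def next_generation(
    parents: List[Genome],
    rank_fn: Callable[[Sequence[Genome]], List[int]],
    rng: random.Random,
    gene_range: int = 32,
    n_children: int = 100,
    max_elites: int = 10,
) -> List[Genome]:
    """One (mu + lambda) step: parents and children compete for len(parents) places."""
    children: List[Genome] = []
    while len(children) < n_children:
        op = choose_operator(rng)
        a = rng.choice(parents)
        if op == "crossover":
            children.append(two_point_crossover(a, rng.choice(parents), rng))
        elif op == "conventional mutation":
            children.append(conventional_mutation(a, gene_range, rng))
        else:
            # Placeholder: the heuristic operators need the circuit decoder.
            children.append(list(a))
    pool = parents + children
    ranks = rank_fn(pool)  # lower rank = closer to the Pareto surface
    # Simplified elitism: keep up to max_elites rank-1 individuals.
    survivors = [g for g, r in zip(pool, ranks) if r == 1][:max_elites]
    while len(survivors) < len(parents):
        x, y = rng.sample(range(len(pool)), 2)  # size-2 tournament
        survivors.append(pool[x] if ranks[x] <= ranks[y] else pool[y])
    return survivors
```

Because parents and children share one pool before selection, an elite parent can only be displaced by something at least as good, which is the property the text relies on when it says the best individuals are never eliminated.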

Crossover and conventional mutation work on the numeric values in a chromosome. The operators do not work on the individual bits within the chromosome; instead, integers are used as genes. These values represent connection destinations and left-shift magnitudes. Conventional mutation replaces one of the values in a chromosome with a new value. This corresponds either to changing the magnitude of a shift, or to breaking a connection and reconnecting the end to a random point in the circuit. Two-point crossover is used: depending on its position in the chromosome, a value in a child individual will come from one or other of the parent chromosomes.

The heuristic mutation operators attempt to make small changes to the phenotype, possibly making drastic changes to the genotype. They improve the performance of the algorithm by increasing the likelihood that useful changes are made to the chromosome. The ‘add operation’ and ‘scale value’ operators both insert an extra adder into the circuit. The ‘remove operation’ operator attempts to remove an adder while minimising the damage to the circuit. The ‘swap operations’ operator swaps the positions of two operations within the chromosome, and the ‘swap inputs’ operator swaps the inputs to an adder. These two operators do not change the functionality of the chromosome, but they introduce extra representations of a circuit into the population; it is hoped that this will increase the likelihood that crossover produces novel circuits. The ‘duplicate operation’ operator clones one of the adders in a circuit. This operator is useful when a single adder has been used for two conflicting purposes.

V. CIRCUIT SIMULATION
The functional correctness of a circuit is measured by comparing the response of the circuit with the specification. It is actually a measure of error, so a circuit that meets the specification will have a functionality value of 0, while an incorrect circuit will have a higher value. The functionality of a circuit is evaluated in the frequency domain.
This allows the user to input a filter specification directly into the system, and the EA system chooses the filter coefficients. The EA will tend to choose filter coefficients that minimise the area and latency of the whole filter. The functional correctness is calculated by adding up error values in 64 different frequency bands. The error value at a particular frequency is the absolute difference, in decibels, between the filter response and the closest value that lies within the specification. This is illustrated in figure 3. If the response is entirely within the specification, the functionality value will be 0.

[Figure: gain (dB) against frequency.]
Fig. 3. The calculation of a functionality value. The functionality score is equal to the total area drawn in black. The curved line is a filter response, and the white area specifies the desired filter response.

Area and delay are estimated according to the properties of the modules that are used in the circuit; the effects of the interconnections on area and delay have not been taken into account. The adders used by these systems are only as wide as necessary. The maximum magnitude possible at a node is used when finding the most significant bit position for an adder. If both inputs to an adder will always have a power of two as a common factor, then some bits can be eliminated from the least significant end of the adder. The widths of all of the adders in the circuit are found before the area and latency are calculated.

TABLE II
COMPONENT COSTS

Component          Area (NAND gates)    Latency (ns)
16-bit adder       196.85               10.77
16-bit register    85.28                n/a
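The functionality measure and the adder-width rule described above might be sketched as follows. This is a hedged illustration, not the authors' implementation. It assumes that the 64 bands are evenly spaced normalised frequencies in [0, 0.5), that the specification is given as a lower and an upper gain bound per band, and that each adder computes an integer multiple of the circuit input (as in a multiplier block). The Table II adder costs are scaled linearly with width, as the next paragraph of the text describes. All function names are hypothetical.

```python
import cmath
import math
from typing import List, Sequence, Tuple

# Table II costs for a 16-bit adder.
ADDER16_AREA_NAND = 196.85
ADDER16_LATENCY_NS = 10.77


def fir_response_db(coeffs: Sequence[float], n_bands: int = 64) -> List[float]:
    """Gain in dB at n_bands evenly spaced normalised frequencies in [0, 0.5)."""
    gains = []
    for k in range(n_bands):
        f = 0.5 * k / n_bands  # cycles per sample (assumed normalisation)
        h = sum(c * cmath.exp(-2j * math.pi * f * n) for n, c in enumerate(coeffs))
        gains.append(20 * math.log10(max(abs(h), 1e-12)))  # floor avoids log(0)
    return gains


def functionality_error(
    resp_db: Sequence[float], lower_db: Sequence[float], upper_db: Sequence[float]
) -> float:
    """Sum over bands of the dB distance to the nearest in-specification value."""
    err = 0.0
    for r, lo, hi in zip(resp_db, lower_db, upper_db):
        if r < lo:
            err += lo - r
        elif r > hi:
            err += r - hi
    return err


def trailing_zeros(g: int) -> int:
    return (g & -g).bit_length() - 1


def adder_width(gain_a: int, gain_b: int, input_bits: int = 16) -> int:
    """Width of an adder computing (gain_a + gain_b) * x for a signed input x."""
    assert gain_a != 0 and gain_b != 0
    max_mag = abs(gain_a + gain_b) << (input_bits - 1)  # largest possible |result|
    msb_bits = max_mag.bit_length() + 1  # +1 for the sign bit (conservative)
    dropped = min(trailing_zeros(gain_a), trailing_zeros(gain_b))  # shared power of two
    return max(1, msb_bits - dropped)


def adder_cost(width: int) -> Tuple[float, float]:
    """(area in NAND gates, latency in ns), scaled linearly from the 16-bit adder."""
    scale = width / 16
    return ADDER16_AREA_NAND * scale, ADDER16_LATENCY_NS * scale
```

For example, an adder combining x and x<<2 needs 19 bits under these assumptions, while one combining x<<1 and x<<2 needs only 18, because both inputs share a factor of two.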

Both the area and the delay of an adder are scaled in direct proportion to the number of bits in that adder. This is approximately accurate for ripple-carry adders, but less accurate for other types of adder. It would be straightforward to replace this scheme with a more accurate model, such as a look-up table. The component costs are derived from the values listed in table II.

VI. FINDING THE PARETO SURFACE
The Pareto surface is found by taking the entire set of circuits and eliminating the circuits that are not Pareto points. A circuit is not on the Pareto surface if it is dominated by another circuit. One circuit dominates another if the former circuit is always a better choice, regardless of the circumstances. This is illustrated in figure 4. The dominant circuit must be at least equal to the dominated circuit in every dimension, and must also be superior in at least one dimension.

[Figure: circuits A to I plotted on axes of area against delay.]
Fig. 4. Pareto dominance. Circuit A dominates circuits C, D and F. Circuit A is dominated by circuits E, G and H. Circuits B and I do not dominate circuit A, and are not dominated by circuit A.

To find the Pareto surface, all of the circuits in the population are placed in a set of potential Pareto points, and the circuits in this set are compared against each other. Whenever one circuit is found to dominate another, the dominated circuit is removed from the set. When every circuit in the set has been compared with every other circuit in the set, all of the points in the set are non-dominated, and the set describes the Pareto surface.

The combination of the area, latency, and functionality metrics into a single fitness measure is an extension of the system for finding the Pareto surface. In that case, the Pareto surface is repeatedly found and removed from the set of circuits. The individuals in each successive surface are assigned a worse fitness value. The algorithm stops when all individuals have been assigned a fitness value (figure 1).

VII. CIRCUIT SYNTHESIS
This Evolvable Hardware system produces Verilog netlists as output. The netlists instantiate adders and registers that come from a library of core blocks. The components in the library are precharacterised: the area and latency of each component are known. The parameters of the actual hardware components are used when estimating the circuit area and latency during the course of evolution. As a result, the circuits that are produced are optimised with respect to a specific hardware technology. The EA creates Verilog netlists that are fully simulatable and synthesisable.

VIII. RESULTS
The system was tested by creating filters with attenuations of 30, 40 and 50 decibels, as shown in figures 5 and 6. This was done for both the high-pass and the low-pass cases, with a total of 20 runs of each system.

[Figure: gain (dB) against normalised frequency (ticks at 0, 0.125 and 0.25), showing the pass band and stop band; labels 3dB, +20dB, and 30, 40, or 50dB.]
Fig. 5. Low-pass filter specifications.

[Figure: gain (dB) against normalised frequency (ticks at 0, 0.125 and 0.25), showing the stop band and pass band; labels 3dB, +20dB, and 30, 40, or 50dB.]
Fig. 6. High-pass filter specifications.

Figures 7 and 8 show the times taken for functionally correct circuits to arise in the populations. Note that no correct high-pass filters with an attenuation of -50dB were discovered. It is possible that the limitation on the maximum number of filter taps prevented the creation of correct filters in some cases.

[Figure: number of populations (0 to 20) against generation (0 to 20000), one curve per attenuation.]
Fig. 7. The number of populations containing a functionally correct low-pass filter, by generation.

[Figure: number of populations (0 to 20) against generation (0 to 20000), one curve per attenuation.]
Fig. 8. The number of populations containing a functionally correct high-pass filter, by generation.

A filter circuit that was generated by this system is shown in figure 9. This is a 10th order low-pass FIR filter with 40dB attenuation in the stop band. The input to this circuit is a 16-bit bus. The widths of the components and other buses are greater, because they are set according to the maximum possible magnitude of the data.

[Figure: circuit diagram from input to output; labels include shift multipliers such as *1, *2, *4, *8, *16, *64, *−1 and *−16.]
Fig. 9. A filter which was generated by this system.

The system of Pareto optimisation was compared with a weighted-sum multidimensional optimisation system. The fitness value for the weighted-sum system is computed by scaling the circuit area, latency, and functionality measures, and then adding these weighted values together. Two different elitism operators were used with the weighted-sum system: one rewarded the most functional chromosomes, while the other benefited the chromosomes with the highest fitness. Both of these elitism operators preserved 10 members of the population. The two weighted-sum systems and the Pareto surface system were tested on the -30dB attenuation low-pass filter described in figure 5. All three systems were tested for 20 runs of 20000 generations.

The weights used by the weighted-sum systems in these experiments were set to arbitrarily chosen values. The values of the weights indicate the relative importance of the different dimensions to the designer, so there are no ‘ideal’ values for the weights. The values used were 1/400 for functionality, 1/4000 for area, and 1/20 for latency. The differences between these values reflect the fact that the different metrics have different magnitudes.

The EA performed very poorly when elitism was based on the weighted-sum fitness value. None of the twenty tests produced circuits that met the specification. The EA tended to choose circuits with a low silicon area and latency, rather than functionally correct circuits. The most functional designs were repeatedly forgotten and rediscovered, which cannot happen with the two other systems tested here. The weighted-sum selection system worked much more effectively when it was combined with elitism that preserved the most functional individuals.

The functionally correct, Pareto-optimal circuits produced over 20 runs of 20000 generations by the two better EA systems were collected. The properties of these circuits are shown in table III. It can be seen that the circuits made using the Pareto system are smaller and faster.

TABLE III
FUNCTIONALLY CORRECT PARETO-OPTIMAL FILTERS

System                                              Area (NAND gates)    Latency (ns)
Pareto selection and Pareto elitism                 2442                 18.17
                                                    2619                 17.50
                                                    3455                 16.83
Weighted-sum selection and functionality elitism    3576                 45.10
                                                    4109                 35.68
                                                    4496                 34.33
                                                    5229                 25.58

The Pareto points from individual runs of the EAs are shown in figure 10. This shows that individual runs of the Pareto system always produced results that are better than the results from the weighted-sum system with functional elitism.

[Figure: latency (ns, 0 to 70) against silicon area (NAND gates, 0 to 7000); legend: Pareto System, Weighted-Sum System.]
Fig. 10. Functionally correct Pareto optimal points from individual EA runs.
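The contrast between the two fitness schemes can be illustrated by applying both to the final designs, using the Table III values and the weights quoted above. This is a hypothetical sketch rather than either EA: it assumes all three measures are minimised, that a lower weighted sum is better, and that every Table III design has a functionality error of 0. The names are illustrative only.

```python
from typing import List, Sequence, Tuple

Score = Tuple[float, float, float]  # (functionality_error, area_nand_gates, latency_ns)

# Functionally correct circuits from Table III (functionality error 0).
PARETO_SYSTEM: List[Score] = [(0, 2442, 18.17), (0, 2619, 17.50), (0, 3455, 16.83)]
WEIGHTED_SYSTEM: List[Score] = [
    (0, 3576, 45.10), (0, 4109, 35.68), (0, 4496, 34.33), (0, 5229, 25.58),
]

# Weights quoted in the text: functionality, area, latency.
WEIGHTS = (1 / 400, 1 / 4000, 1 / 20)


def weighted_sum(s: Score) -> float:
    """Single scalar fitness; lower is assumed to be better."""
    return sum(w * v for w, v in zip(WEIGHTS, s))


def dominates(a: Score, b: Score) -> bool:
    """No worse in every measure and strictly better in at least one."""
    return all(x <= y for x, y in zip(a, b)) and any(x < y for x, y in zip(a, b))


def non_dominated(scores: Sequence[Score]) -> List[Score]:
    # A score never dominates itself, so no self-exclusion is needed.
    return [s for s in scores if not any(dominates(t, s) for t in scores)]


if __name__ == "__main__":
    everything = PARETO_SYSTEM + WEIGHTED_SYSTEM
    print(min(everything, key=weighted_sum))  # the single design a weighted sum favours
    print(non_dominated(everything))          # the whole trade-off surface
```

The weighted sum collapses the seven designs to a single choice, whereas the non-dominated filter keeps all three Pareto-system designs for the designer; each of the four weighted-sum designs is dominated by the 3455-gate Pareto design.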

The effect of the choice of selection and elitism scheme on population diversity was also investigated. The Pareto-optimal circuits were found from each EA run, and the mean numbers of Pareto-optimal circuits per run are shown in table IV. One column gives the figures for functionally correct circuits, while the other gives the figures for circuits that are not necessarily correct. The Pareto system produced correct circuits in 17 out of 20 runs, whereas the weighted-sum system with functional elitism produced correct circuits in only 13 out of 20 runs. When the weight given to the functionality measure was increased by a factor of 10 to 1/40, the weighted-sum system with functional elitism produced correct designs in only 10 out of 20 runs. Regardless of the values used for the weights, the weighted-sum systems always produced filters that are inferior to the filters produced by the Pareto system.

TABLE IV
MEAN NUMBERS OF PARETO POINTS PER RUN

Selection    Elitism          Mean    Mean correct
Pareto       Pareto           13.5    1.65
Weighted     Functionality    3.35    0.65
Weighted     Weighted         1.3     0

IX. CONCLUSIONS
This system uses real component area and longest-path delay information during circuit optimisation, so each finished circuit design is targeted towards a specific hardware technology. The use of an accurate model of the circuit allows the EA to make good choices when searching for an optimal design. The use of Pareto surfaces in an EA provides an advantage in terms of increased population diversity. This allows the EA to make a more thorough search of the problem space, possibly resulting in the discovery of better quality solutions. If the population is more diverse, the EA is also less likely to become stuck at a local optimum during the search. It is common for the final population to include several alternative optimal designs, each with different advantages and disadvantages, and a human designer can choose the design that is most appropriate for the situation.

REFERENCES

[1] T. Kalganova and J. Miller, “Evolving more efficient digital circuits by allowing circuit layout evolution and multi-objective fitness,” in Proceedings of the First NASA/DoD Workshop on Evolvable Hardware, A. Stoica, D. Keymeulen, and J. Lohn, Eds., July 1999, pp. 54–63.
[2] M. S. Bright and T. Arslan, “Multi-objective design strategy for high-level low power design of DSP systems,” in Proceedings of the 1999 IEEE International Symposium on Circuits and Systems, May 1999, pp. 80–83.
[3] D. E. Goldberg, Genetic Algorithms in Search, Optimization, and Machine Learning, chapter 5, pp. 197–201. Addison-Wesley, 1989.
[4] C. M. Fonseca and P. J. Fleming, “Multiobjective genetic algorithms,” in IEE Colloquium on Genetic Algorithms for Control Systems Engineering, 1993, pp. 6/1–6/5.
[5] T. Schnier, X. Yao, and P. Liu, “Digital filter design using multiple Pareto fronts,” in The Third NASA/DoD Workshop on Evolvable Hardware, D. Keymeulen, A. Stoica, J. Lohn, and R. Zebulum, Eds., July 2001, pp. 136–145.
[6] S. P. Harris and E. C. Ifeachor, “Nonlinear FIR filter design by Genetic Algorithm,” in First Online Conference on Soft Computing, Aug. 1996.
[7] A. G. Dempster and M. D. Macleod, “General algorithms for reduced-adder integer multiplier design,” Electronics Letters, vol. 31, no. 21, pp. 1800–1802, Oct. 1995.
[8] D. R. Bull and D. H. Horrocks, “Primitive operator digital filters,” IEE Proceedings - Circuits, Devices and Systems, vol. 138, no. 3, pp. 401–412, June 1991.
[9] A. G. Dempster and M. D. Macleod, “Use of minimum-adder multiplier blocks in FIR digital filters,” IEEE Transactions on Circuits and Systems II: Analog and Digital Signal Processing, vol. 42, no. 9, pp. 569–577, Sept. 1995.
[10] G. Wade, A. Roberts, and G. Williams, “Multiplier-less FIR filter design using a genetic algorithm,” in IEE Proceedings - Vision, Image and Signal Processing, June 1994, pp. 175–180.
[11] D. W. Redmill, D. R. Bull, and E. Dagless, “Genetic synthesis of reduced complexity filters and filter banks using primitive operator directed graphs,” in IEE Proceedings - Circuits, Devices and Systems, Oct. 2000, pp. 303–310.