ISSRL, APRIL 2009


Approximating the Pareto Front Using an Extension of the Kuhn-Munkres and MOEA Algorithms in Multi-Objective Problems

Jesús Burbano, Germán Hernández, and Dipankar Dasgupta

Abstract—To compute any measure of the quality of solutions to multi-objective problems, such as coverage, proximity (convergence), or diversity, a known Pareto Front (PF) or an approximation of it is required. For problems over a discrete space this PF is very hard to calculate. This paper examines the sensitivity to some characteristics of the search-space dataset using an extension of the Kuhn-Munkres algorithm, NSGA-II, and NSGA-II with Informed Initialization as approximators of the PF, applied to small instances of an MOP, the Sailor Assignment Problem.

Index Terms—Pareto Front approximation, multiobjective optimization, weighted sum method, adaptive algorithms.

I. INTRODUCTION

The goal of multi-objective optimization, unlike single-objective optimization, which looks for the solutions with the unique optimum objective value, is to obtain either a single solution that dominates every other point in the multi-objective space, or a set of optimal solutions, called the Pareto Optimal Set (POS), whose images in the multi-objective space are mutually non-dominated: the Pareto Front (PF). Formally [7],

x ≺ y  ⇔  ∀i ∈ {1, ..., d}: x_i ≤ y_i  ∧  ∃j ∈ {1, ..., d}: x_j < y_j,     (1)

where x and y are d-dimensional objective vectors. The PF of an objective space Y ⊂ R^d is the set

{y ∈ Y | ¬∃ x ∈ Y : x ≺ y}.     (2)
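As an illustration only (this code is ours, not part of the paper's implementation), the following Python sketch filters a finite set of objective vectors down to its non-dominated subset by applying the dominance relation of eq. (1) under minimization; names such as pareto_front are our own.

import numpy as np

def dominates(x, y):
    # x dominates y (eq. 1, minimization): x <= y componentwise and x < y somewhere
    return np.all(x <= y) and np.any(x < y)

def pareto_front(points):
    # Return the non-dominated subset of a finite objective set (eq. 2), O(n^2) scan
    points = np.asarray(points, dtype=float)
    keep = []
    for i, y in enumerate(points):
        if not any(dominates(x, y) for j, x in enumerate(points) if j != i):
            keep.append(i)
    return points[keep]

if __name__ == "__main__":
    Y = [(1.0, 4.0), (2.0, 2.0), (3.0, 3.0), (4.0, 1.0)]
    print(pareto_front(Y))   # (3, 3) is dominated by (2, 2); the other three points remain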

Approximating the PF in discrete multi-objective problems commonly requires exponential time (as the size of the problem increases linearly, the time to solve it increases exponentially); in a continuous space it may not be as hard, because it could be possible to determine a polynomial equation describing the PF. This problem in discrete spaces is therefore regarded as NP (nondeterministic polynomial time):
• there is no known algorithm for it that executes in polynomial time on a conventional computer;
• it can be solved in polynomial time using a non-deterministic computer (one with inherently unlimited parallelism).

J. Burbano and G. Hernández are with the Department of Systems and Industrial Engineering, National University of Colombia, e-mail: [email protected], [email protected]. D. Dasgupta is with the Computer Science Department, The University of Memphis, e-mail: [email protected]. Manuscript received April XX, 2009; revised Month dd, year.

Intuitively, an extension of the Kuhn-Munkres algorithm could be used to calculate an approximation of the PF for problems like the QAP and the SAP [3], but there is no measure of the confidence of this method. The Kuhn-Munkres algorithm fully solves the linear assignment problem (LAP) in polynomial time (O(n^3)), and a multi-objective problem can be divided into multiple single-objective problems, i.e., many LAPs, by scalarizing the multiple goals into a single goal with weight vectors. Therefore, by applying the algorithm to a (not necessarily wide) range r of scalarizations of the problem (r = Ω(Cardinality(A^n_m))^1), its PF could be partially or completely mapped. Sometimes this number of scalarizations can be small; it seems to be very sensitive to the structure of the input dataset, in the worst case demanding a meticulous scan of scalarizations. If the KM mapping of the approximation to the PF is partial, the measures of quality of the solutions to the MOP calculated with this method are biased; by how much?

Density estimation is the construction of an estimate of a density function from observed data, when one has a set of observed data points assumed to be a sample from an unknown probability density function. This density estimation will be used as an 'ingredient' in other statistical procedures for the SAP. In this particular problem the data may not be drawn from one of the known parametric families of distributions, like the normal; or they could be, but when we combine factors, such as conflicting objectives, the result may not follow a parametric distribution. Additionally, the data live in the search space and are transformed to the solution space; therefore, only loose assumptions can be made about the distribution of the observed data, although it will be assumed that the distribution has a probability density p.

A. Is solutions(EKM for n different evenly distributed vectors) ⊂ solutions(HMOEAs with n iterations)?

Given that a subset of solutions found with an Extended Kuhn-Munkres (EKM) algorithm can feed a multi-objective evolutionary algorithm to form a Hybrid Multi-Objective Evolutionary Algorithm (HMOEA), it is clear that those solutions are a subset of the whole set of solutions found by the HMOEA. But when only a portion of the EKM solutions is used to feed the HMOEA, and a bigger set of EKM solutions is used to evaluate the solutions that the HMOEA finds with respect to the PF, one needs to know how close that set of EKM solutions is to the PF. If an insufficient number of solutions is used, the quality of the solutions that the HMOEA finds could be underestimated.

^1 Cardinality(A^n_m): the maximum number of possible assignments, ∈ {1, ..., P^n_m}.
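The paper does not specify how the evenly distributed weight vectors are generated; a common choice is a simplex-lattice grid, sketched below in Python (our illustration, not the authors' Octave code). With d = 4 objectives, H = 100 divisions gives C(103, 3) = 176,851 vectors and H = 38 gives 10,660, matching the counts reported in Section IV, which suggests, but does not prove, such a design.

from itertools import combinations
from math import comb

def simplex_lattice(d, H):
    # Evenly distributed weight vectors on the d-simplex with H divisions:
    # all w_i >= 0 in steps of 1/H, summing to 1. Yields C(H + d - 1, d - 1) vectors.
    # Stars-and-bars: choose d - 1 "bar" positions among H + d - 1 slots.
    for cuts in combinations(range(H + d - 1), d - 1):
        prev, weights = -1, []
        for c in cuts:
            weights.append((c - prev - 1) / H)
            prev = c
        weights.append((H + d - 1 - prev - 1) / H)
        yield tuple(weights)

if __name__ == "__main__":
    d, H = 4, 100
    print(comb(H + d - 1, d - 1))        # 176851, the count used in Section IV
    small = list(simplex_lattice(4, 2))  # tiny example: 10 vectors in steps of 0.5
    print(len(small), small[:3])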


II. PROBLEM AND DATASETS

A. The Sailor Assignment Problem (SAP)

This is the problem of finding a set of assignments of sailors to jobs that keeps the sailors happy and maintains fleet readiness, while minimizing the cost of implementing those assignments. Many factors go into determining a good set of assignments: objectives such as maximizing training score, minimizing permanent change of station (PCS) costs, and maximizing sailor and commander preferences. If the assignment is overly focused on only one of the objectives, the others usually suffer [5], [8]. Formally, the SAP may be defined as:

Optimize   Σ_{i=1}^{N} Σ_{j=1}^{M} F_{i,j} d_{i,j}

subject to the constraints:

Σ_{i=1}^{N} d_{i,j} ≤ 1    ∀j ∈ {1, 2, ..., M}
Σ_{j=1}^{M} d_{i,j} ≤ 1    ∀i ∈ {1, 2, ..., N}
d_{i,j} ∈ {0, 1}           ∀i ∈ {1, 2, ..., N} and ∀j ∈ {1, 2, ..., M}

where F_{i,j} denotes the fitness of assigning sailor i to job j, and d_{i,j} = 1 when sailor i is assigned to job j and 0 otherwise. The fitness measure F_{i,j} encapsulates all information relevant to determining the desirability of the match. The SAP is a problem with multiple objectives, represented in an "objective vector":

Optimize F(x) = ( max TS(x), min PCS(x), max SR(x), max CR(x) ) = min ( −TS(x), PCS(x), −SR(x), −CR(x) )

The constraints on d_{i,j} ensure that at most one job is assigned to any sailor and that no job is assigned to multiple sailors. Note that both constraints are inequalities, thus allowing for the possibility that a given sailor is not assigned a job.
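To make the formulation concrete, here is a small sketch of our own (the function names and the dense N x M data layout are illustrative assumptions; real instances are sparse, as noted in Section III-A) that checks the assignment constraints and evaluates the four-component objective vector in the minimization form shown above.

import numpy as np

def is_feasible(d):
    # SAP constraints: d is an N x M 0/1 matrix,
    # each sailor gets at most one job and each job at most one sailor.
    d = np.asarray(d)
    return (np.isin(d, (0, 1)).all()
            and (d.sum(axis=1) <= 1).all()     # per sailor
            and (d.sum(axis=0) <= 1).all())    # per job

def objective_vector(d, TS, PCS, SR, CR):
    # Return (-TS, PCS, -SR, -CR) totals so that all four objectives are minimized.
    # TS, PCS, SR, CR are hypothetical N x M score matrices for each sailor/job pair.
    d = np.asarray(d, dtype=float)
    return np.array([-(TS * d).sum(), (PCS * d).sum(), -(SR * d).sum(), -(CR * d).sum()])

if __name__ == "__main__":
    rng = np.random.default_rng(0)
    N, M = 10, 12
    TS, PCS, SR, CR = (rng.random((N, M)) for _ in range(4))
    d = np.zeros((N, M), dtype=int)
    d[np.arange(N), np.arange(N)] = 1          # sailor i -> job i, a trivially feasible assignment
    print(is_feasible(d), objective_vector(d, TS, PCS, SR, CR))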

B. SAP Instances/Datasets

Each dataset in our study of the SAP is a complete representation of a problem instance: the relations between a set of sailors and the set of preferred/available jobs for them, each pair carrying measures of training score, PCS cost, sailor ranking, and command ranking. One might think that instances/datasets of similar size and with similar mean and standard deviation in the distribution of jobs and in the values of the measures would have a similar number of solutions, but some experimental results using the EKM search method show that these characteristics do not have a direct incidence on the number of solutions found by the method. Each dataset (instance of the problem) thus has a well-defined set of solutions of its own, hence a characteristic Cardinality(PF), and this could imply a characteristic shape of the front.

III. EKM, MOEAs AND HMOEAs ALGORITHMS

Each type of algorithm proposed to address MOPs has its own merits and drawbacks with respect to two main properties: quality of the solutions and execution time. An exhaustive search algorithm will obtain the solution, but it requires generating all possible mappings, which in most cases, for problems of usual size, takes a very long time.

A. An Extension of the Kuhn-Munkres Algorithm

We use essentially the extension of the KM algorithm proposed in [4], which deals with three problems. First, the SAP cannot be represented as a complete bipartite graph. The algorithm in its canonical form assumes that any sailor can perform any job, but this assumption does not hold for the SAP, since each sailor is qualified for only a limited number of jobs, depending on the various factors of the problem. Thus the classical algorithm must be modified to ensure that only qualified sailors are assigned to each job: a sparse matrix representation stores only those values corresponding to feasible sailor/job combinations. Second, the SAP is a multi-objective problem, whereas the Kuhn-Munkres algorithm is applicable only to a single-objective assignment problem such as the LAP: single-objective instances of the SAP are obtained using weight vectors (see eq. 3, with w_i ∈ [0, 1]). Finally, sometimes no complete matching exists in a SAP instance, and the Kuhn-Munkres algorithm cannot be applied to such problems without producing even worse solutions. To handle this, dummy jobs are added for each sailor, with low Training Score, high PCS cost, low Command Rank and low Sailor Rank, each guaranteed to be worse than any feasible sailor/job match, which can be chosen as a last resort.

w_1 × TS + w_2 × PCS + w_3 × SR + w_4 × CR     (3)

A parallel implementation could be introduced in response to slow performance when facing large problem sizes (Jobs > Sailors > 1000). If the running time of the algorithm in a single thread takes T(n) units, where n is the problem size, using P threads to divide the original serial computational task among themselves would mean a performance speedup of at most P, since the total execution time is reduced to an ideal T(n)/P units, assuming, of course, that the management and communication overhead between processors in a cluster is zero and that the problem can be divided into P tasks. Theoretically, however, O(T(n)/P) = O(T(n)), even though in practice it can be an improvement depending on n; in the specific case of the SAP, T(n) = n log_2(m) log(n) [2].
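As a sketch of how one weighted-sum scalarization (eq. 3) can be handed to a standard LAP solver, the following Python fragment builds the scalarized cost matrix with dummy jobs and solves it with SciPy's Hungarian-method implementation. It is an illustration under our own assumptions (dense random data, a large penalty value for unqualified pairs), not the authors' Octave code.

import numpy as np
from scipy.optimize import linear_sum_assignment

def solve_scalarization(TS, PCS, SR, CR, feasible, w, penalty=1e6):
    # Solve one weighted-sum scalarization of the SAP as a LAP.
    # All inputs are N x M matrices; feasible[i, j] marks qualified sailor/job pairs.
    # Returns (sailor, job) pairs, excluding dummy and infeasible assignments.
    N, M = TS.shape
    w1, w2, w3, w4 = w
    # eq. (3): maximize TS, SR, CR and minimize PCS -> minimize the combined cost
    cost = w1 * (-TS) + w2 * PCS + w3 * (-SR) + w4 * (-CR)
    cost = np.where(feasible, cost, penalty)   # forbid unqualified pairs
    dummy = np.full((N, N), penalty / 2)       # one dummy job per sailor, worse than any real match
    rows, cols = linear_sum_assignment(np.hstack([cost, dummy]))
    return [(i, j) for i, j in zip(rows, cols) if j < M and feasible[i, j]]

if __name__ == "__main__":
    rng = np.random.default_rng(1)
    N, M = 10, 12
    TS, PCS, SR, CR = (rng.random((N, M)) for _ in range(4))
    feasible = rng.random((N, M)) < 0.5
    print(solve_scalarization(TS, PCS, SR, CR, feasible, (0.25, 0.25, 0.25, 0.25)))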


B. MOEAs and HMOEAs

Several MOEAs have been proposed in the literature with varying degrees of success. Of these algorithms, the Non-dominated Sorting Genetic Algorithm of Deb et al. (NSGA-II) and the Strength Pareto Evolutionary Algorithm (SPEA2) of Zitzler et al. have been widely studied and found to be effective across a range of common test functions as well as combinatorial optimization problems [1], [6]. Any multiobjective evolutionary algorithm has two basic goals: first, to push the population in the direction of Pareto optimal solutions; second, to maintain a diverse set of non-dominated solutions. Ideally, the algorithm terminates with a set of non-dominated solutions such that no solution on the Pareto front dominates another. In practice, to find these solutions the algorithms must routinely improve the quality of previous solutions until some stopping criterion is met (usually a large number of evaluations) [2], [7], [3], but this cannot be taken as a guarantee, even though it is the best that can be done when the true Pareto front is unknown.

In this paper the hybrid MOEA incorporates validations of the feasibility of each solution (chromosome) for the SAP, and five non-dominated solutions (the solutions for the weight vectors [1, 0.0001, 0.0001, 0.0001], [0.0001, 1, 0.0001, 0.0001], [0.0001, 0.0001, 1, 0.0001], [0.0001, 0.0001, 0.0001, 1], and [0.25, 0.25, 0.25, 0.25]) obtained from the Kuhn-Munkres algorithm are fed into the initial population of NSGA-II. This provides the MOEA with extreme solutions along the Pareto front, from which the evolutionary operators may work to fill in gaps along the front; the remainder of the population is initialized uniformly at random (a sketch of this seeding step is given after Table I). The details of the procedure are described in [3].

IV. EXPERIMENTS AND RESULTS

For the experiments, due to the unavailability of real sailor data, a problem generator was used to allow testing on a variety of problem instances of different sizes and difficulties. These simulated instances contain values for all the objectives along with information about the jobs each sailor can perform. SAP instances with (10 sailors, 12 jobs) and (100 sailors, 100 jobs) were studied. In small problems (Sailors < Jobs < 12), EKM with a large number of vectors (176,851 evenly distributed weight vectors; approx. 350 hours on an Intel Core 2 Quad machine running Ubuntu 8.1, using Octave and 0.5 GB of memory) shows a growing number of solutions until it meets its upper bound. But the exhaustive search shows that there are some solutions that EKM cannot find (see Table I), which could imply that the Pareto front for this kind of problem is not convex.

TABLE I
NUMBER OF SOLUTIONS IN THE PF OF 10 DIFFERENT DATASETS WITH MEAN 0.7 AND DEVIATION 0.3, FOR 176,851 EVENLY DISTRIBUTED VECTORS (EACH)

Dataset            1    2    3    4    5    6    7    8    9   10
KM Solutions       4    7    4    2    6    2    4    3    3    3
Exhaustive S.     10   14    4    2    8    5    4    8    7    6
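The seeding step described in Section III-B can be sketched as follows (Python code of our own; the actual experiments used NSGA-II in the Java-based jMetal platform [3], and the permutation-style chromosome shown here is only a hypothetical encoding): a few EKM solutions obtained from the extreme and uniform weight vectors are copied into the initial population, and the rest is generated at random.

import random

# The five weight vectors used for informed initialization (Section III-B).
SEED_WEIGHTS = [
    (1, 0.0001, 0.0001, 0.0001),
    (0.0001, 1, 0.0001, 0.0001),
    (0.0001, 0.0001, 1, 0.0001),
    (0.0001, 0.0001, 0.0001, 1),
    (0.25, 0.25, 0.25, 0.25),
]

def informed_initial_population(pop_size, n_sailors, n_jobs, ekm_solver, rng=random):
    # Build an initial population: EKM solutions for the seed weights first,
    # then random job permutations for the remaining individuals.
    population = [ekm_solver(w) for w in SEED_WEIGHTS]        # informed part
    while len(population) < pop_size:
        population.append(rng.sample(range(n_jobs), n_sailors))   # random part
    return population[:pop_size]

if __name__ == "__main__":
    stub = lambda w: list(range(10))   # stand-in for the EKM solver of Section III-A
    pop = informed_initial_population(100, n_sailors=10, n_jobs=12, ekm_solver=stub)
    print(len(pop), pop[0], pop[5])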

The HMOEA applied to the same instances shows better results (with the remainder of the population randomly initialized; approx. 18 hours on an Intel Core 2 Quad machine running Ubuntu 8.1, using Java and 0.7 GB of memory): 500,000 evaluations with a population size of 100 for NSGA-II + Informed Initialization, crossover probability 0.95 and mutation probability 0.01 (see Table II). These experiments were carried out using the jMetal platform [3].

TABLE II
NUMBER OF SOLUTIONS IN THE PF OF 10 DIFFERENT DATASETS WITH MEAN 0.7 AND DEVIATION 0.3, FOR 500,000 EVALUATIONS (EACH)

Dataset            1    2    3    4    5    6    7    8    9   10
NSGA-II + InfIn   10   14    4    2    7    5    4    7    6    6
Exhaustive S.     10   14    4    2    8    5    4    8    7    6

The results for 100 sailors and 100 jobs require a long time if a large number of vectors is used, so the number of vectors was reduced, accepting the risk of hurting the results (10,660 evenly distributed weight vectors; approx. 350 hours on an Intel Core 2 Quad machine running Ubuntu 8.1, using Octave and 0.8 GB of memory). NSGA-II + Informed Initialization applied to the same instances yields more (probably non-dominated) solutions (with the remainder of the population randomly initialized; approx. 25 hours on an Intel Core 2 Quad machine running Ubuntu 8.1, using Java and 0.9 GB of memory): 50,000 evaluations with a population size of 100, crossover probability 0.95 and mutation probability 0.01 (see Table III).

TABLE III
NUMBER OF SOLUTIONS IN THE PF OF 10 DIFFERENT DATASETS FOR 100 SAILORS AND 100 JOBS WITH MEAN 0.7 AND DEVIATION 0.3. A: EKM, B: NSGA-II + INFORMED INITIALIZATION, C: EXHAUSTIVE SEARCH

Dset    1     2     3     4     5     6     7     8     9    10
A     175   170    84    44    97   137   149   156   150   107
B     307   292   130    79   130   162   239   196   283   203
C     N/A   N/A   N/A   N/A   N/A   N/A   N/A   N/A   N/A   N/A

V. CONCLUSION

The KM algorithm is an exact solver for the LAP (with only one single objective function), and it works well when the Pareto front is convex, given the different single-objective problems composed from the original one using the Weighted Sum Method (which scalarizes a set of objectives into a single objective by premultiplying each objective by a weight). But in most nonlinear MOOPs, different weight vectors need not lead to different Pareto optimal solutions, and our EKM chooses only one of the (possibly many) minimizers of each scalarized problem; therefore it cannot find all minimum solutions for a weight vector, and some Pareto optimal solutions cannot be found. This is a weakness of the EKM algorithm applied to the SAP for approximating the Pareto front: even with a computer with unlimited parallelization, this EKM will only find a subset of the solutions and will never be able to find them all.
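A tiny numerical illustration of this limitation (our example, not taken from the paper): a point in a non-convex region of the front is Pareto optimal, yet it never minimizes any weighted sum, so no weight vector can make a LAP-style solver return it.

import numpy as np

# Three non-dominated points (minimization); (0.8, 0.8) lies in a non-convex region of the front.
front = np.array([[0.0, 1.0], [0.8, 0.8], [1.0, 0.0]])

for w1 in np.linspace(0.0, 1.0, 101):
    w = np.array([w1, 1.0 - w1])
    winner = np.argmin(front @ w)   # index of the weighted-sum minimizer
    assert winner != 1              # the middle point is never selected
print("(0.8, 0.8) is Pareto optimal but unreachable by any weighted sum")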


The HMOEA (NSGA-II with informed initialization) has better diversity and presumably better coverage (true points on the Pareto front), but when the randomly initialized part of the population is used we cannot be sure that the population is producing a point on the PF unless the solution of the problem is already known. For small examples the solution can be found by exhaustive search, and the comparison with Informed Initialization looks good given a considerable number of iterations, but one cannot guarantee that the solutions found using NSGA-II + Informed Initialization will reach the solutions on the PF.

ACKNOWLEDGMENT

The authors would like to thank...

REFERENCES

[1] Mourad Baiou and Michel Balinski. Many-to-many matching: stable polyandrous polygamy (or polygamous polyandry). Discrete Applied Mathematics, 101:1-12, 2000.
[2] Indraneel Das. Multi-Objective optimization. http://wwwfp.mcs.anl.gov/otc/Guide/OptWeb/multiobj/, 1997.
[3] Dipankar Dasgupta, German Hernandez, Deon Garrett, Pavan Kalyan Vejandla, Aishwarya Kaushal, Ramjee Yerneni, and Denise Monette Ferebee. GenoSAP-II final report. Technical report, The University of Memphis, 2008.
[4] Dipankar Dasgupta, German Hernandez, Deon Garrett, Pavan Kalyan Vejandla, Aishwarya Kaushal, Ramjee Yerneni, and James Simien. A comparison of multiobjective evolutionary algorithms with informed initialization and Kuhn-Munkres algorithm for the sailor assignment problem. In GECCO '08: Proceedings of the 2008 GECCO Conference Companion on Genetic and Evolutionary Computation, pages 2129-2134, New York, NY, USA, 2008. ACM.
[5] Deon Garrett, Joseph Vannucci, Rodrigo Silva, Dipankar Dasgupta, and James Simien. Genetic algorithms for the sailor assignment problem. In Proceedings of the 2005 Conference on Genetic and Evolutionary Computation, pages 1921-1928, Washington, DC, USA, 2005. ACM.
[6] A. Holder. Navy personnel planning and the optimal partition. Operations Research, 53(1):77-89, February 2005.
[7] J. Knowles and D. Corne. Quantifying the effects of objective space dimension in evolutionary multiobjective optimization. In Evolutionary Multi-Criterion Optimization, LNCS 4403, pages 757-771, 2007.
[8] Ibrahim Korkmaz, Hadi Gokcen, and Tahsin Cetinyokus. An analytic hierarchy process and two-sided matching based decision support system for military personnel assignment. Information Sciences, 178(14):2915-2927, 2008.