École Polytechnique, CNRS, LIX
Abstract:The non-dominated sorting genetic algorithm II (NSGA-II) is the most popular multi-objective optimization heuristic. Recent mathematical runtime analyses have detected two shortcomings in discrete search spaces, namely, that the NSGA-II has difficulties with more than two objectives and that it is very sensitive to the choice of the population size. To overcome these difficulties, we analyze a simple tie-breaking rule in the selection of the next population. Similar rules have been proposed before, but have found only little acceptance. We prove the effectiveness of our tie-breaking rule via mathematical runtime analyses on the classic OneMinMax, LeadingOnesTrailingZeros, and OneJumpZeroJump benchmarks. We prove that this modified NSGA-II can optimize the three benchmarks efficiently even for many objectives, in contrast to the exponential lower runtime bound previously shown for OneMinMax with three or more objectives. For the bi-objective problems, we show runtime guarantees that do not increase when moderately increasing the population size over the minimum admissible size. For example, for the OneJumpZeroJump problem with representation length $n$ and gap parameter $k$, we show a runtime guarantee of $O(\max\{n^{k+1},Nn\})$ function evaluations when the population size is at least four times the size of the Pareto front. For population sizes larger than the minimal choice $N = \Theta(n)$, this result improves considerably over the $\Theta(Nn^k)$ runtime of the classic NSGA-II.
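As a worked comparison of the two OneJumpZeroJump bounds stated in this abstract, assume for illustration a population size of $N = n^2$ and a gap parameter $k \ge 2$. Both choices are hypothetical and not taken from the abstract; the bounds are used at face value.

```latex
\begin{align*}
\text{NSGA-II with tie-breaking:}\quad
  & O(\max\{n^{k+1}, N n\}) = O(\max\{n^{k+1}, n^{3}\}) = O(n^{k+1}) \quad (k \ge 2),\\
\text{classic NSGA-II:}\quad
  & \Theta(N n^{k}) = \Theta(n^{2} \cdot n^{k}) = \Theta(n^{k+2}).
\end{align*}
```

Under these assumptions the guarantee is smaller by a factor of order $n$. More generally, for any $N \le n^k$ the first bound stays at $O(n^{k+1})$, whereas the second grows linearly in $N$.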
Abstract:Randomized search heuristics have been applied successfully to a plethora of problems. This success is complemented by a large body of theoretical results. Unfortunately, the vast majority of these results regard problems with binary or continuous decision variables -- the theoretical analysis of randomized search heuristics for unbounded integer domains is almost nonexistent. To resolve this shortcoming, we start the runtime analysis of multi-objective evolutionary algorithms, which are among the most successful randomized search heuristics, for unbounded integer search spaces. We analyze single- and full-dimensional mutation operators with three different mutation strengths, namely changes by plus/minus one (unit strength), random changes following a law with exponential tails, and random changes following a power-law. The performance guarantees we prove on a recently proposed natural benchmark problem suggest that unit mutation strengths can be slow when the initial solutions are far from the Pareto front. When setting the expected change right (depending on the benchmark parameter and the distance of the initial solutions), the mutation strength with exponential tails yields the best runtime guarantees in our results -- however, with a wrong choice of this expectation, the performance guarantees quickly become highly uninteresting. With power-law mutation, which is an essentially parameter-less mutation operator, we obtain good results uniformly over all problem parameters and starting points. We complement our mathematical findings with experimental results that suggest that our bounds are not always tight. Most prominently, our experiments indicate that power-law mutation outperforms the one with exponential tails even when the latter uses a near-optimal parametrization. Hence, we suggest to favor power-law mutation for unknown problems in integer spaces.
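The three mutation strengths named in this abstract can be sketched roughly as follows. This is a minimal illustration under stated assumptions, not the operators as defined in the paper: the exponential-tail law is modeled by a geometric distribution with a hypothetical parameter `q`, the power law by a zeta distribution with a hypothetical exponent `beta > 1`, the sign of each change is chosen uniformly at random, and only a single-dimensional mutation (one uniformly chosen coordinate) is shown. All function and parameter names are invented for this example.

```python
import numpy as np

def random_sign(rng):
    """Return +1 or -1 with equal probability."""
    return int(rng.choice([-1, 1]))

def unit_step(rng):
    """Unit strength: change by plus or minus one."""
    return random_sign(rng)

def exponential_tail_step(rng, q):
    """Assumed model: magnitude ~ Geometric(q) on {1, 2, ...}.
    The tail decays exponentially; the expected magnitude is 1/q."""
    return random_sign(rng) * int(rng.geometric(q))

def power_law_step(rng, beta):
    """Assumed model: magnitude k with probability proportional
    to k**(-beta), i.e. a zeta distribution (requires beta > 1)."""
    return random_sign(rng) * int(rng.zipf(beta))

def mutate_single_dimension(x, step, rng):
    """Change one uniformly chosen coordinate of the integer vector x
    by the value returned from step(); x itself is left unmodified."""
    y = list(x)
    i = int(rng.integers(len(y)))
    y[i] += step()
    return y

if __name__ == "__main__":
    rng = np.random.default_rng(1)
    x = [100, -50, 0]
    print(mutate_single_dimension(x, lambda: unit_step(rng), rng))
    print(mutate_single_dimension(x, lambda: exponential_tail_step(rng, 0.1), rng))
    print(mutate_single_dimension(x, lambda: power_law_step(rng, 1.5), rng))
```

In this sketch the geometric variant needs `q` tuned to the expected distance to be covered, while the zeta variant has only the exponent, which mirrors the abstract's remark that power-law mutation is essentially parameter-less.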
Abstract:The NSGA-II is the most prominent multi-objective evolutionary algorithm (cited more than 50,000 times). Very recently, a mathematical runtime analysis has proven that this algorithm can have enormous difficulties when the number of objectives is larger than two (Zheng, Doerr. IEEE Transactions on Evolutionary Computation (2024)). However, this result was shown only for the OneMinMax benchmark problem, which has the particularity that all solutions are on the Pareto front, a fact heavily exploited in the proof of this result. In this work, we show a comparable result for the LeadingOnesTrailingZeroes benchmark. This popular benchmark problem appears more natural in that most of its solutions are not on the Pareto front. With a careful analysis of the population dynamics of the NSGA-II optimizing this benchmark, we manage to show that when the population grows on the Pareto front, then it does so much faster by creating known Pareto optima than by spreading out on the Pareto front. Consequently, already when a constant fraction of the Pareto front is still unexplored, the crowding distance becomes the crucial selection mechanism, and thus the same problems arise as in the optimization of OneMinMax. With these and some further arguments, we show that the NSGA-II, with a population size larger than the Pareto front by at most a constant factor, cannot compute the Pareto front in less than exponential time.
Abstract:Randomized search heuristics (RSHs) are generally believed to be robust to noise. However, almost all mathematical analyses on how RSHs cope with a noisy access to the objective function assume that each solution is re-evaluated whenever it is compared to others. This is unfortunate, both because it wastes computational resources and because it requires the user to foresee that noise is present (as in a noise-free setting, one would never re-evaluate solutions). In this work, we show that the need for re-evaluations could be overestimated and that re-evaluations can, in fact, be detrimental. For the classic benchmark problem of how the $(1+1)$ evolutionary algorithm optimizes the LeadingOnes benchmark, we show that without re-evaluations up to constant noise rates can be tolerated, much more than the $O(n^{-2} \log n)$ noise rates that can be tolerated when re-evaluating solutions. This first runtime analysis of an evolutionary algorithm solving a single-objective noisy problem without re-evaluations could indicate that such algorithms cope with noise much better than previously thought, and without the need to foresee the presence of noise.
Abstract:The NSGA-II is proven to encounter difficulties for more than two objectives, and the deduced reason is that the crowding distance is computed by regarding the different objectives independently. The recently proven efficiency of the NSGA-III and the SMS-EMOA supports this reasoning, as both algorithms take the dependencies between objectives into account in the second criterion after the non-dominated sorting, but at the price of a complicated structure or a difficult computation. However, it remains open whether a simple modification of the original crowding distance can help. This paper proposes such a variant, called truthful crowding distance. This variant inherits the simple structure of summing one component per objective. For each objective, it first sorts the set of solutions in order of descending objective values, and uses as the component the smallest normalized L1 distance between the current solution and the solutions in earlier positions of the sorted list. Summing up all components gives the value of the truthful crowding distance. We call NSGA-II-T the NSGA-II variant that replaces the original crowding distance with the truthful one and that sequentially updates the crowding distance values after each removal. We prove that the NSGA-II-T can efficiently cover the full Pareto front for many-objective mOneMinMax and mOJZJ, in contrast to the exponential runtime of the original NSGA-II. We also prove that it theoretically achieves a slightly better approximation of the Pareto front for OneMinMax than the original NSGA-II with sequential survival selection. Moreover, it is the first NSGA-II variant with a simple structure that performs well for many objectives with theoretical guarantees.
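The verbal description of the truthful crowding distance in this abstract can be read roughly as the sketch below. It is an interpretation, not the paper's definition. Three details are assumptions made here: each objective is normalized by its range in the current set, the first solution in each sorted list (which has no earlier solutions) receives an infinite component, and ties in the sort are broken by list position. The function name is hypothetical, and the sequential update after each removal used by the NSGA-II-T is not shown.

```python
import math

def truthful_crowding_distance(points):
    """points: list of objective vectors (tuples of equal length).
    Returns one distance value per point, in the same order."""
    n = len(points)
    m = len(points[0])

    # Assumption: normalize each objective by its range in this set
    # (a range of zero is replaced by 1 to avoid division by zero).
    ranges = []
    for j in range(m):
        vals = [p[j] for p in points]
        r = max(vals) - min(vals)
        ranges.append(r if r > 0 else 1.0)

    def norm_l1(p, q):
        return sum(abs(a - b) / r for a, b, r in zip(p, q, ranges))

    dist = [0.0] * n
    for j in range(m):
        # Sort by descending value of objective j; Python's sort is
        # stable, so ties keep their list order (an assumption).
        order = sorted(range(n), key=lambda i: -points[i][j])
        for pos, i in enumerate(order):
            if pos == 0:
                # Assumption: no earlier solution, infinite component.
                dist[i] = math.inf
            else:
                dist[i] += min(norm_l1(points[i], points[e])
                               for e in order[:pos])
    return dist

if __name__ == "__main__":
    # Four objective vectors with one duplicate, (2, 2).
    print(truthful_crowding_distance([(0, 4), (2, 2), (2, 2), (4, 0)]))
    # -> [inf, 2.0, 0.0, inf]
```

In this example the second copy of `(2, 2)` receives distance 0 because an identical vector precedes it in every sorted list, so a removal based on the smallest distance would discard the duplicate first.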
Abstract:In recent work, Lissovoi, Oliveto, and Warwicker (Artificial Intelligence (2023)) proved that the Move Acceptance Hyper-Heuristic (MAHH) leaves the local optimum of the multimodal CLIFF benchmark with remarkable efficiency. The $O(n^3)$ runtime of the MAHH, for almost all cliff widths $d\ge 2,$ is significantly better than the $\Theta(n^d)$ runtime of simple elitist evolutionary algorithms (EAs) on CLIFF. In this work, we first show that this advantage is specific to the CLIFF problem and does not extend to the JUMP benchmark, the most prominent multi-modal benchmark in the theory of randomized search heuristics. We prove that for any choice of the MAHH selection parameter $p$, the expected runtime of the MAHH on a JUMP function with gap size $m = O(n^{1/2})$ is at least $\Omega(n^{2m-1} / (2m-1)!)$. This is significantly slower than the $O(n^m)$ runtime of simple elitist EAs. Encouragingly, we also show that replacing the local one-bit mutation operator in the MAHH with the global bit-wise mutation operator, commonly used in EAs, yields a runtime of $\min\{1, O(\frac{e\ln(n)}{m})^m\} \, O(n^m)$ on JUMP functions. This is at least as good as the runtime of simple elitist EAs. For larger values of $m$, this result proves an asymptotic performance gain over simple EAs. As our proofs reveal, the MAHH profits from its ability to walk through the valley of lower objective values in moderate-size steps, always accepting inferior solutions. This is the first time that such an optimization behavior is proven via mathematical means. Generally, our result shows that combining two ways of coping with local optima, global mutation and accepting inferior solutions, can lead to considerable performance gains.
Abstract:The decomposition-based multi-objective evolutionary algorithm (MOEA/D) does not directly optimize a given multi-objective function $f$, but instead optimizes $N + 1$ single-objective subproblems of $f$ in a co-evolutionary manner. It maintains an archive of all non-dominated solutions found and outputs it as an approximation to the Pareto front. Once the MOEA/D has found all optima of the subproblems (the $g$-optima), it may still miss Pareto optima of $f$. The algorithm is then tasked to find the remaining Pareto optima directly by mutating the $g$-optima. In this work, we analyze for the first time how the MOEA/D with only standard mutation operators computes the whole Pareto front of the OneMinMax benchmark when the $g$-optima are a strict subset of the Pareto front. For standard bit mutation, we prove an expected runtime of $O(n N \log n + n^{n/(2N)} N \log n)$ function evaluations. Especially for the second, more interesting phase when the algorithm starts with all $g$-optima, we prove an $\Omega(n^{(1/2)(n/N + 1)} \sqrt{N} 2^{-n/N})$ expected runtime. This runtime is super-polynomial if $N = o(n)$, since this leaves large gaps between the $g$-optima, which require costly mutations to cover. For power-law mutation with exponent $\beta \in (1, 2)$, we prove an expected runtime of $O\left(n N \log n + n^{\beta} \log n\right)$ function evaluations. The $O\left(n^{\beta} \log n\right)$ term stems from the second phase of starting with all $g$-optima, and it is independent of the number of subproblems $N$. This leads to a huge speedup compared to the lower bound for standard bit mutation. In general, our overall bound for power-law mutation suggests that the MOEA/D performs best for $N = O(n^{\beta - 1})$, resulting in an $O(n^\beta \log n)$ bound. In contrast to standard bit mutation, smaller values of $N$ are better for power-law mutation, as it is capable of easily creating missing solutions.
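The two asymptotic claims in this abstract follow from short calculations, sketched here using only the bounds as stated. The rewriting of the lower bound and the threshold $n \ge 16$ are steps added for illustration.

```latex
% Power-law mutation: with N = O(n^{\beta-1}) the first term is
% dominated by the second one.
\begin{align*}
n N \log n = O\!\left(n \cdot n^{\beta-1} \log n\right) = O\!\left(n^{\beta} \log n\right)
\;\Longrightarrow\;
O\!\left(n N \log n + n^{\beta} \log n\right) = O\!\left(n^{\beta} \log n\right).
\end{align*}

% Standard bit mutation: rewrite the lower bound and use
% \sqrt{n}/2 \ge n^{1/4} for n \ge 16.
\begin{align*}
n^{(1/2)(n/N + 1)} \sqrt{N}\, 2^{-n/N}
  = \sqrt{nN} \left(\frac{\sqrt{n}}{2}\right)^{n/N}
  \ge \sqrt{nN}\; n^{n/(4N)},
\end{align*}
% which is n^{\omega(1)}, i.e. super-polynomial, whenever
% n/N = \omega(1), that is, N = o(n).
```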
Abstract:Despite significant progress in the field of mathematical runtime analysis of multi-objective evolutionary algorithms (MOEAs), the performance of MOEAs on discrete many-objective problems is little understood. In particular, the few existing bounds for the SEMO, global SEMO, and SMS-EMOA algorithms on classic benchmarks are all roughly quadratic in the size of the Pareto front. In this work, we prove near-tight runtime guarantees for these three algorithms on the four most common benchmark problems OneMinMax, CountingOnesCountingZeros, LeadingOnesTrailingZeros, and OneJumpZeroJump, and this for arbitrary numbers of objectives. Our bounds depend only linearly on the Pareto front size, showing that these MOEAs on these benchmarks cope much better with many objectives than what previous works suggested. Our bounds are tight apart from small polynomial factors in the number of objectives and length of bitstrings. This is the first time that such tight bounds are proven for many-objective uses of these MOEAs. While it is known that such results cannot hold for the NSGA-II, we do show that our bounds, via a recent structural result, transfer to the NSGA-III algorithm.
Abstract:We consider whether conditions exist under which block-coordinate descent is asymptotically efficient in evolutionary multi-objective optimization, addressing an open problem. Block-coordinate descent, where an optimization problem is decomposed into $k$ blocks of decision variables and each of the blocks is optimized in sequence (with the others fixed), is a technique used in some large-scale optimization problems such as airline scheduling; however, its use in multi-objective optimization is less studied. We propose a block-coordinate version of GSEMO and compare its running time to the standard GSEMO algorithm. Theoretical and empirical results on a bi-objective test function, a variant of LOTZ, demonstrate the existence of cases where block-coordinate descent is faster. The result may yield wider insights into this class of algorithms.
Abstract:Experience shows that typical evolutionary algorithms can cope well with stochastic disturbances such as noisy function evaluations. In this first mathematical runtime analysis of the $(1+\lambda)$ and $(1,\lambda)$ evolutionary algorithms in the presence of prior bit-wise noise, we show that both algorithms can tolerate constant noise probabilities without increasing the asymptotic runtime on the OneMax benchmark. For this, a population size $\lambda$ suffices that is at least logarithmic in the problem size $n$. The only previous result in this direction regarded the less realistic one-bit noise model, required a population size super-linear in the problem size, and proved a runtime guarantee roughly cubic in the noiseless runtime for the OneMax benchmark. Our significantly stronger results are based on the novel proof argument that the noiseless offspring can be seen as a biased uniform crossover between the parent and the noisy offspring. We are optimistic that the technical lemmas resulting from this insight will find applications also in future mathematical runtime analyses of evolutionary algorithms.