Abstract: The integration of Large Language Models (LLMs) with evolutionary computation (EC) has introduced a promising paradigm for automating the design of metaheuristic algorithms. However, existing frameworks, such as the Large Language Model Evolutionary Algorithm (LLaMEA), often lack precise control over mutation mechanisms, leading to inefficiencies in solution space exploration and potentially suboptimal convergence. This paper introduces a novel approach to mutation control within LLM-driven evolutionary frameworks, inspired by the theory of genetic algorithms. Specifically, we propose dynamic mutation prompts that adaptively regulate mutation rates, leveraging a heavy-tailed power-law distribution to balance exploration and exploitation. Experiments using the GPT-3.5-turbo and GPT-4o models demonstrate that GPT-3.5-turbo fails to adhere to the specific mutation instructions, while GPT-4o adapts its mutation behaviour according to the engineered dynamic prompts. Further experiments show that introducing these dynamic rates can improve the convergence speed and adaptability of LLaMEA when using GPT-4o. This work establishes a starting point for better-controlled LLM-based mutations in code optimization tasks, paving the way for further advancements in automated metaheuristic design.
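As an illustration of how such a dynamic mutation prompt could be assembled, the sketch below samples the number of code lines to mutate from a truncated power law and embeds it in the prompt text; the exponent value, function names, and prompt wording are illustrative assumptions rather than LLaMEA's actual interface.

```python
import random

def sample_mutation_size(n_lines: int, beta: float = 1.5) -> int:
    """Sample how many code lines to mutate from a truncated power law,
    favouring small edits while occasionally allowing large jumps."""
    n_lines = max(1, n_lines)
    weights = [k ** (-beta) for k in range(1, n_lines + 1)]
    return random.choices(range(1, n_lines + 1), weights=weights, k=1)[0]

def build_mutation_prompt(code: str, beta: float = 1.5) -> str:
    """Embed the sampled mutation size in the prompt sent to the LLM
    (hypothetical prompt template, not the one used in the paper)."""
    n_lines = len(code.splitlines())
    k = sample_mutation_size(n_lines, beta)
    return (
        f"Refine the metaheuristic below by changing exactly {k} of its "
        f"{n_lines} lines, keeping the remaining lines unchanged.\n\n{code}"
    )
```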
Abstract: Benchmarking anomaly detection approaches for multivariate time series is challenging due to the lack of high-quality datasets. Current publicly available datasets are too small, insufficiently diverse, and feature only trivial anomalies, which hinders measurable progress in this research area. We propose a solution: a diverse, extensive, and non-trivial dataset generated via state-of-the-art simulation tools that reflects the realistic behaviour of an automotive powertrain, including its multivariate, dynamic and variable-state properties. To cater for unsupervised and semi-supervised anomaly detection settings, as well as time series generation and forecasting, we make different versions of the dataset available, in which training and test subsets are offered in contaminated and clean versions, depending on the task. We also provide baseline results from a small selection of approaches based on deterministic and variational autoencoders, as well as a non-parametric approach. As expected, the baseline experiments show that the approaches trained on the semi-supervised version of the dataset outperform their unsupervised counterparts, highlighting the need for approaches that are more robust to contaminated training data.
Abstract: Traditional approaches to training agents have generally involved a single, deterministic environment of minimal complexity to solve various tasks such as robot locomotion or computer vision. However, agents trained in static environments lack generalization capabilities, limiting their potential in broader scenarios. Thus, recent benchmarks frequently rely on multiple environments, for instance, by providing stochastic noise, simple permutations, or altogether different settings. In practice, such collections result mainly from costly human-designed processes or the liberal use of random number generators. In this work, we introduce AMaze, a novel benchmark generator in which embodied agents must navigate a maze by interpreting visual signs of arbitrary complexity and deceptiveness. This generator promotes human interaction through the easy generation of feature-specific mazes and an intuitive understanding of the resulting agents' strategies. As a proof of concept, we demonstrate the capabilities of the generator in a simple, fully discrete case with limited deceptiveness. Agents were trained under three different regimes (one-shot, scaffolding, interactive), and the results showed that the latter two regimes outperform direct one-shot training in terms of generalization capabilities. Indeed, depending on the combination of generalization metric, training regime, and algorithm, the median gain ranged from 50% to 100%, and maximal performance was achieved through interactive training, thereby demonstrating the benefits of a controllable human-in-the-loop benchmark generator.
Abstract: This work addresses the critical challenge of optimal filter selection for a novel trace gas measurement device. This device uses photonic crystal filters to retrieve trace gas concentrations from measurements affected by photon and read noise. The filter selection directly influences the accuracy and precision of the gas retrieval and is therefore a crucial performance driver. We formulate the problem as a stochastic combinatorial optimization problem and develop a simulator mimicking gas retrieval under noise. The objective function for selecting filters that reduce the retrieval error is minimized by the employed metaheuristics, which represent various families of optimizers. We aim to improve the top-performing algorithms found using our novel distance-driven extensions, which employ metrics on the space of filter selections. This leads to a novel adaptation of the UMDA algorithm, which we call UMDA-U-PLS-Dist; equipped with one of the proposed distance metrics, it is the most efficient and robust solver among those considered. Analysis of the filter sets produced by this method reveals that filters with relatively smooth transmission profiles but high contrast improve the device performance. Moreover, the top-performing solution shows significant improvement compared to the baseline.
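The proposed distance metrics are not spelled out in this abstract; as a hedged illustration, one natural metric on filter selections (subsets of the available filter indices) is the Jaccard distance between index sets, sketched below with hypothetical names.

```python
def selection_distance(sel_a: set[int], sel_b: set[int]) -> float:
    """Jaccard distance between two filter selections:
    size of the symmetric difference divided by the size of the union."""
    union = sel_a | sel_b
    if not union:
        return 0.0
    return len(sel_a ^ sel_b) / len(union)

# Example: two selections of four filters from a larger filter library.
print(selection_distance({0, 3, 7, 12}, {0, 3, 9, 15}))  # 0.666...
```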
Abstract: The Covariance Matrix Adaptation Evolution Strategy (CMA-ES) is one of the most successful examples of a derandomized evolution strategy. However, it still relies on randomly sampling offspring, which can be done by drawing from a uniform distribution and subsequently transforming the samples into the required Gaussian. Previous work has shown that replacing this uniform sampling with a low-discrepancy sampler, such as Halton or Sobol sequences, can improve performance over a wide set of problems. We show that iterating through small, fixed sets of low-discrepancy points can still perform better than the default uniform distribution. Moreover, using only 128 points throughout the search is sufficient to closely approximate the empirical performance of using the complete pseudorandom sequence up to dimensionality 40 on the BBOB benchmark. For lower dimensionalities (below 10), we find that using as few as 32 unique low-discrepancy points performs similarly to or better than uniform sampling. In 2D, for which we have highly optimized low-discrepancy samples available, we demonstrate that using these points yields the highest empirical performance and requires only 16 samples to improve over uniform sampling. Overall, we establish a clear relation between the $L_2$ discrepancy of the used point set and the empirical performance of the CMA-ES.
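A minimal sketch of the sampling idea, assuming SciPy's Sobol generator: a fixed set of 128 low-discrepancy points is cycled through and mapped to standard-normal samples via the inverse CDF, which the CMA-ES would then rescale with its step size and covariance (integration with an actual CMA-ES implementation is omitted).

```python
import numpy as np
from scipy.stats import norm, qmc

class CyclingSobolGaussian:
    """Cycle through a small, fixed set of low-discrepancy points and map
    each one to a standard-normal vector via the inverse normal CDF."""

    def __init__(self, dim: int, n_points: int = 128, seed: int = 0):
        sobol = qmc.Sobol(d=dim, scramble=True, seed=seed)
        # Clip away exact 0 and 1 so the inverse CDF stays finite.
        self.points = np.clip(sobol.random(n_points), 1e-12, 1 - 1e-12)
        self.idx = 0

    def __call__(self) -> np.ndarray:
        u = self.points[self.idx % len(self.points)]
        self.idx += 1
        return norm.ppf(u)  # base sample, to be rescaled by the CMA-ES

sampler = CyclingSobolGaussian(dim=10)
z = sampler()  # one candidate's standard-normal base vector
```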
Abstract: Time-series anomaly detection plays an important role in engineering processes, such as development, manufacturing and other operations involving dynamic systems. These processes can greatly benefit from advances in the field, as state-of-the-art approaches may aid in cases involving, for example, high-dimensional data. To provide the reader with an understanding of the terminology, this survey introduces a novel taxonomy in which a distinction is made between online and offline, and between training and inference. Additionally, it presents the most popular data sets and evaluation metrics used in the literature, together with a detailed analysis of both. Furthermore, this survey provides an extensive overview of the state-of-the-art model-based online semi- and unsupervised anomaly detection approaches for multivariate time-series data, categorising them into different model families and other properties. The biggest research challenge revolves around benchmarking, as there is currently no reliable way to compare different approaches against one another. This problem is two-fold: on the one hand, public data sets suffer from at least one fundamental flaw, while on the other hand, there is a lack of intuitive and representative evaluation metrics in the field. Moreover, the way most publications choose a detection threshold disregards real-world conditions, which hinders application in the real world. To allow for tangible advances in the field, these issues must be addressed in future work.
Abstract: We introduce here the concept of Artificial General Creatures (AGC), which encompasses "robotic or virtual agents with a wide enough range of capabilities to ensure their continued survival". With this in mind, we propose a research line aimed at incrementally building both the technology and the trustworthiness of AGC. The core element in this approach is that trust can only be built over time, through demonstrably mutually beneficial interactions. To this end, we advocate starting from unobtrusive, non-threatening artificial agents that would explicitly collaborate with humans, similarly to what domestic animals do. By combining multiple research fields, from Evolutionary Robotics to Neuroscience, from Ethics to Human-Machine Interaction, we aim to create embodied, self-sustaining Artificial General Creatures that would form social and emotional connections with humans. Although they would not be able to play competitive online games or generate poems, we argue that creatures akin to artificial pets would be invaluable stepping stones toward symbiotic Artificial General Intelligence.
Abstract: As attention to recorded data grows in the realm of automotive testing and manual evaluation reaches its limits, there is a growing need for automatic online anomaly detection. This real-world data is complex in many ways and requires the modelling of testee behaviour. To address this, we propose a temporal variational autoencoder (TeVAE) that can detect anomalies with minimal false positives when trained on unlabelled data. Our approach also avoids the bypass phenomenon and introduces a new method to remap individual windows to a continuous time series. Furthermore, we propose metrics to evaluate the detection delay and root-cause capability of our approach and present results from experiments on a real-world industrial data set. When properly configured, TeVAE raises false alarms only 6% of the time and detects 65% of the anomalies present. It also has the potential to perform well with a smaller training and validation subset, but requires a more sophisticated threshold estimation method.
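The remapping method itself is not detailed in this abstract; a common baseline, shown here purely as an assumption, averages the reconstructions of overlapping sliding windows back onto the original time axis.

```python
import numpy as np

def remap_windows(windows: np.ndarray, stride: int) -> np.ndarray:
    """Average overlapping window reconstructions of shape
    (n_windows, win_len, n_features) back onto a continuous series."""
    n_windows, win_len, n_features = windows.shape
    series_len = (n_windows - 1) * stride + win_len
    acc = np.zeros((series_len, n_features))
    counts = np.zeros((series_len, 1))
    for i in range(n_windows):
        start = i * stride
        acc[start:start + win_len] += windows[i]
        counts[start:start + win_len] += 1
    # Guard against gaps when stride exceeds the window length.
    return acc / np.maximum(counts, 1)
```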
Abstract:Na\"ive restarts of global optimization solvers when operating on multimodal search landscapes may resemble the Coupon's Collector Problem, with a potential to waste significant function evaluations budget on revisiting the same basins of attractions. In this paper, we assess the degree to which such ``duplicate restarts'' occur on standard multimodal benchmark functions, which defines the \textit{redundancy potential} of each particular landscape. We then propose a repelling mechanism to avoid such wasted restarts with the CMA-ES and investigate its efficacy on test cases with high redundancy potential compared to the standard restart mechanism.
Abstract: To guide the design of better iterative optimisation heuristics, it is imperative to understand how inherent structural biases within algorithm components affect performance on a wide variety of search landscapes. This study explores the impact of structural bias in the modular Covariance Matrix Adaptation Evolution Strategy (modCMA), focusing on the roles of the various modules within the algorithm. Through an extensive investigation involving 435,456 configurations of modCMA, we identified key modules that significantly influence structural bias of various classes. Our analysis utilized the Deep-BIAS toolbox for structural bias detection and classification, complemented by SHAP analysis for quantifying module contributions. The performance of these configurations was tested on a sequence of affine-recombined functions, maintaining fixed optimum locations while gradually varying the landscape features. Our results demonstrate an interplay between module-induced structural bias and algorithm performance across different landscape characteristics.
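A minimal sketch of an affine-recombined test function with a fixed optimum location, assuming two component functions whose minima lie at the origin are blended with weight alpha after a common shift; the exact construction used in the study may differ.

```python
import numpy as np

def affine_combination(f1, f2, x_opt, alpha):
    """Blend two functions with minima at the origin into one function
    whose optimum stays fixed at x_opt while landscape features vary
    with the weight alpha."""
    def f(x):
        z = np.asarray(x, float) - x_opt
        return alpha * f1(z) + (1.0 - alpha) * f2(z)
    return f

# Example: blend sphere and Rastrigin landscapes, optimum fixed at x_opt.
sphere = lambda z: float(np.sum(z ** 2))
rastrigin = lambda z: float(10 * z.size + np.sum(z ** 2 - 10 * np.cos(2 * np.pi * z)))
f = affine_combination(sphere, rastrigin, x_opt=np.zeros(5), alpha=0.3)
print(f(np.zeros(5)))  # 0.0 at the fixed optimum
```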