Franz Rothlauf

Was Tournament Selection All We Ever Needed? A Critical Reflection on Lexicase Selection

Feb 25, 2025

Alina Geiger, Martin Briesch, Dominik Sobania, Franz Rothlauf

Abstract: The success of lexicase selection has led to various extensions, including its combination with down-sampling, which further increased performance. However, recent work found that down-sampling also leads to significant improvements in the performance of tournament selection. This raises the question of whether tournament selection combined with down-sampling is the better choice, given its faster running times. To address this question, we run a set of experiments comparing epsilon-lexicase and tournament selection with different down-sampling techniques on synthetic problems of varying noise levels and problem sizes as well as real-world symbolic regression problems. Overall, we find that down-sampling improves generalization and performance even when compared over the same number of generations. This means that down-sampling is beneficial even with far fewer fitness evaluations. Additionally, down-sampling successfully reduces code growth. We observe that population diversity increases for tournament selection when combined with down-sampling. Further, we find that tournament selection and epsilon-lexicase selection with down-sampling perform similarly, while tournament selection is significantly faster. We conclude that tournament selection should be further analyzed and improved in future work instead of only focusing on the improvement of lexicase variants.
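To make the combination described in this abstract concrete, here is a minimal sketch of tournament selection on a random down-sample of the training cases. The function names, the `errors[i][j]` layout (error of individual i on case j), and the use of summed error as fitness are illustrative assumptions, not the authors' implementation.

```python
import random


def down_sample(n_cases, rate, rng):
    """Draw a random subset of training-case indices (hypothetical helper)."""
    k = max(1, int(n_cases * rate))
    return rng.sample(range(n_cases), k)


def tournament_select(errors, case_ids, tournament_size, rng):
    """Pick the contestant with the lowest summed error on the sampled cases.

    errors[i][j] is assumed to hold the error of individual i on case j.
    """
    contestants = rng.sample(range(len(errors)), tournament_size)
    return min(contestants, key=lambda i: sum(errors[i][j] for j in case_ids))


if __name__ == "__main__":
    rng = random.Random(42)
    # Toy population: 4 individuals, 10 training cases each.
    errors = [[rng.random() for _ in range(10)] for _ in range(4)]
    cases = down_sample(n_cases=10, rate=0.3, rng=rng)  # new sample each generation
    parent = tournament_select(errors, cases, tournament_size=2, rng=rng)
    print("sampled cases:", cases, "selected parent:", parent)
```

Because only the sampled cases are evaluated, each generation needs fewer fitness evaluations than evaluating every individual on the full training set.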

Transformer Semantic Genetic Programming for Symbolic Regression

Jan 30, 2025

Philipp Anthes, Dominik Sobania, Franz Rothlauf

Abstract: In standard genetic programming (stdGP), solutions are varied by modifying their syntax, with uncertain effects on their semantics. Geometric-semantic genetic programming (GSGP), a popular variant of GP, effectively searches the semantic solution space using variation operations based on linear combinations, although it results in significantly larger solutions. This paper presents Transformer Semantic Genetic Programming (TSGP), a novel and flexible semantic approach that uses a generative transformer model as search operator. The transformer is trained on synthetic test problems and learns semantic similarities between solutions. Once the model is trained, it can be used to create offspring solutions with high semantic similarity, even for unseen and unknown problems. Experiments on several symbolic regression problems show that TSGP generates solutions with comparable or even significantly better prediction quality than stdGP, SLIM_GSGP, DSR, and DAE-GP. Like SLIM_GSGP, TSGP is able to create new solutions that are semantically similar without creating solutions of large size. An analysis of the search dynamics reveals that the solutions generated by TSGP are semantically more similar than the solutions generated by the benchmark approaches, allowing a better exploration of the semantic solution space.

ComfyGI: Automatic Improvement of Image Generation Workflows

Nov 21, 2024

Dominik Sobania, Martin Briesch, Franz Rothlauf

Abstract: Automatic image generation is no longer just of interest to researchers, but also to practitioners. However, current models are sensitive to the settings used, and automatic optimization methods often require human involvement. To bridge this gap, we introduce ComfyGI, a novel approach, driven by techniques from genetic improvement, that automatically improves workflows for image generation without the need for human intervention. This enables image generation with significantly higher quality in terms of the alignment with the given description and the perceived aesthetics. On the performance side, we find that overall, the images generated with an optimized workflow are about 50% better compared to the initial workflow in terms of the median ImageReward score. These already good results are even surpassed in our human evaluation, as the participants preferred the images improved by ComfyGI in around 90% of the cases.

Robust Statistical Scaling of Outlier Scores: Improving the Quality of Outlier Probabilities for Outliers (Extended Version)

Aug 30, 2024

Philipp Röchner, Henrique O. Marques, Ricardo J. G. B. Campello, Arthur Zimek, Franz Rothlauf

Abstract: Outlier detection algorithms typically assign an outlier score to each observation in a dataset, indicating the degree to which an observation is an outlier. However, these scores are often not comparable across algorithms and can be difficult for humans to interpret. Statistical scaling addresses this problem by transforming outlier scores into outlier probabilities without using ground-truth labels, thereby improving interpretability and comparability across algorithms. However, the quality of this transformation can be different for outliers and inliers. Missing outliers in scenarios where they are of particular interest, such as healthcare, finance, or engineering, can be costly or dangerous. Thus, ensuring good probabilities for outliers is essential. This paper argues that statistical scaling, as commonly used in the literature, does not produce equally good probabilities for outliers as for inliers. Therefore, we propose robust statistical scaling, which uses robust estimators to improve the probabilities for outliers. We evaluate several variants of our method against other outlier score transformations for real-world datasets and outlier detection algorithms, where it can improve the probabilities for outliers.
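The idea of replacing standard estimators with robust ones can be illustrated with a small sketch. It assumes Gaussian scaling (an error-function transform of the standardized score) as the statistical scaling, and the median and scaled median absolute deviation (MAD) as the robust estimators; the abstract does not name the specific variants the paper evaluates, so treat these choices and the function name as assumptions.

```python
import math
import statistics


def gaussian_scaling(scores, robust=False):
    """Map outlier scores to [0, 1] with a Gaussian (erf-based) scaling.

    robust=False uses mean and standard deviation; robust=True uses the
    median and the scaled MAD (assumed robust estimators for illustration).
    """
    if robust:
        center = statistics.median(scores)
        mad = statistics.median([abs(s - center) for s in scores])
        spread = 1.4826 * mad  # makes MAD comparable to a std. dev. under normality
    else:
        center = statistics.fmean(scores)
        spread = statistics.pstdev(scores)
    if spread == 0:
        return [0.0 for _ in scores]
    return [
        max(0.0, math.erf((s - center) / (spread * math.sqrt(2))))
        for s in scores
    ]


if __name__ == "__main__":
    scores = [1.0, 1.1, 0.9, 1.0, 1.2, 0.8, 10.0]  # last score is a clear outlier
    print("standard:", gaussian_scaling(scores)[-1])
    print("robust:  ", gaussian_scaling(scores, robust=True)[-1])
```

In this toy example the single extreme score inflates the mean and standard deviation, which lowers its own probability under the standard scaling; the median and MAD are barely affected by it.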

* 15 pages, 4 figures, extended version of an original article accepted for publication in SISAP 2024 by Springer Nature

Lexicase-based Selection Methods with Down-sampling for Symbolic Regression Problems: Overview and Benchmark

Jul 31, 2024

Alina Geiger, Dominik Sobania, Franz Rothlauf

Abstract: In recent years, several new lexicase-based selection variants have emerged due to the success of standard lexicase selection in various application domains. For symbolic regression problems, variants that use an epsilon-threshold or batches of training cases, among others, have led to performance improvements. Lately, variants that combine lexicase selection and down-sampling strategies have received particular attention. This paper evaluates random as well as informed down-sampling in combination with the relevant lexicase-based selection methods on a wide range of symbolic regression problems. In contrast to most work, we not only compare the methods over a given evaluation budget, but also over a given time, as time is usually limited in practice. We find that for a given evaluation budget, epsilon-lexicase selection in combination with random or informed down-sampling outperforms all other methods. Only for a rather long running time of 24h is the best performing method tournament selection in combination with informed down-sampling. If the given running time is very short, lexicase variants using batches of training cases perform best.

Large Language Models Suffer From Their Own Output: An Analysis of the Self-Consuming Training Loop

Nov 28, 2023

Martin Briesch, Dominik Sobania, Franz Rothlauf

Abstract: Large language models (LLMs) have become state of the art in many benchmarks, and conversational LLM applications like ChatGPT are now widely used by the public. Those LLMs can be used to generate large amounts of content, which is posted on the internet to various platforms. As LLMs are trained on datasets usually collected from the internet, this LLM-generated content might be used to train the next generation of LLMs. Therefore, a self-consuming training loop emerges in which new LLM generations are trained on the output from the previous generations. We empirically study this self-consuming training loop using a novel dataset to analytically and accurately measure quality and diversity of generated outputs. We find that this self-consuming training loop initially improves both quality and diversity. However, after a few generations the output inevitably degenerates in diversity. We find that the rate of degeneration depends on the proportion of real and generated data.

Do You Trust ChatGPT? -- Perceived Credibility of Human and AI-Generated Content

Sep 05, 2023

Martin Huschens, Martin Briesch, Dominik Sobania, Franz Rothlauf

Abstract: This paper examines how individuals perceive the credibility of content originating from human authors versus content generated by large language models, like the GPT language model family that powers ChatGPT, in different user interface versions. Surprisingly, our results demonstrate that regardless of the user interface presentation, participants tend to attribute similar levels of credibility. While participants also do not report any different perceptions of competence and trustworthiness between human and AI-generated content, they rate AI-generated content as being clearer and more engaging. The findings from this study serve as a call for a more discerning approach to evaluating information sources, encouraging users to exercise caution and critical thinking when engaging with content generated by AI systems.

Down-Sampled Epsilon-Lexicase Selection for Real-World Symbolic Regression Problems

Feb 08, 2023

Alina Geiger, Dominik Sobania, Franz Rothlauf

Abstract: Epsilon-lexicase selection is a parent selection method in genetic programming that has been successfully applied to symbolic regression problems. Recently, the combination of random subsampling with lexicase selection significantly improved performance in other genetic programming domains such as program synthesis. However, the influence of subsampling on the solution quality of real-world symbolic regression problems has not yet been studied. In this paper, we propose down-sampled epsilon-lexicase selection, which combines epsilon-lexicase selection with random subsampling to improve the performance in the domain of symbolic regression. To this end, we compare down-sampled epsilon-lexicase with traditional selection methods on common real-world symbolic regression problems and analyze its influence on the properties of the population over a genetic programming run. We find that the diversity is reduced by using down-sampled epsilon-lexicase selection compared to standard epsilon-lexicase selection. This comes along with the high hyperselection rates we observe for down-sampled epsilon-lexicase selection. Further, we find that down-sampled epsilon-lexicase selection outperforms the traditional selection methods on all studied problems. Overall, with down-sampled epsilon-lexicase selection we observe an improvement of the solution quality of up to 85% in comparison to standard epsilon-lexicase selection.
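A minimal sketch of down-sampled epsilon-lexicase selection follows. It assumes one common epsilon-lexicase variant in which epsilon is the median absolute deviation of the population's errors on a case and candidates survive if they are within epsilon of the best remaining candidate; the paper may use a different variant, and all names and the `errors[i][j]` layout are illustrative.

```python
import random
import statistics


def mad(values):
    """Median absolute deviation of a list of numbers."""
    med = statistics.median(values)
    return statistics.median([abs(v - med) for v in values])


def epsilon_lexicase_select(errors, case_ids, rng):
    """Select one parent index using only the training cases in case_ids.

    errors[i][j] is assumed to hold the error of individual i on case j.
    """
    candidates = list(range(len(errors)))
    cases = list(case_ids)
    rng.shuffle(cases)  # cases are considered in random order
    for j in cases:
        eps = mad([errors[i][j] for i in range(len(errors))])
        best = min(errors[i][j] for i in candidates)
        candidates = [i for i in candidates if errors[i][j] <= best + eps]
        if len(candidates) == 1:
            break
    return rng.choice(candidates)


if __name__ == "__main__":
    rng = random.Random(1)
    errors = [[rng.random() for _ in range(20)] for _ in range(6)]
    # Random down-sample: 25% of the cases, redrawn every generation.
    sample = rng.sample(range(20), 5)
    print("selected parent:", epsilon_lexicase_select(errors, sample, rng))
```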

MTGP: Combining Metamorphic Testing and Genetic Programming

Jan 20, 2023

Dominik Sobania, Martin Briesch, Philipp Röchner, Franz Rothlauf

Abstract: Genetic programming is an evolutionary approach known for its performance in program synthesis. However, it is not yet mature enough for practical use in real-world software development, since usually many training cases are required to generate programs that generalize to unseen test cases. As the training cases have to be expensively hand-labeled by the user in practice, we need an approach to check the program behavior with a lower number of training cases. Metamorphic testing needs no labeled input/output examples. Instead, the program is executed multiple times, first on a given (randomly generated) input, followed by related inputs to check whether certain user-defined relations between the observed outputs hold. In this work, we propose MTGP, which combines metamorphic testing and genetic programming, and study its performance and the generalizability of the generated programs. Further, we analyze how the generalizability depends on the number of given labeled training cases. We find that using metamorphic testing combined with labeled training cases leads to a higher generalization rate than the use of labeled training cases alone in almost all studied configurations. Consequently, we recommend that researchers use metamorphic testing in their systems if the labeling of the training data is expensive.
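The metamorphic testing step described in this abstract can be sketched as follows. The example relation (the maximum of a list is unchanged when the list is shuffled) and every name here are hypothetical illustrations, not taken from MTGP itself.

```python
import random


def check_metamorphic_relation(program, make_input, transform, relation,
                               trials=50, seed=0):
    """Run `program` on random inputs and on related inputs, without labels.

    Returns True if `relation` holds between both outputs in every trial.
    """
    rng = random.Random(seed)
    for _ in range(trials):
        x = make_input(rng)
        follow_up = transform(x, rng)
        if not relation(program(x), program(follow_up)):
            return False
    return True


def random_list(rng):
    """Generate five distinct integers (hypothetical input generator)."""
    return rng.sample(range(100), 5)


def shuffled(xs, rng):
    """Related input: the same elements in a random order."""
    ys = list(xs)
    rng.shuffle(ys)
    return ys


def same_output(a, b):
    """Relation: shuffling the input must not change the output."""
    return a == b


if __name__ == "__main__":
    correct_candidate = max
    buggy_candidate = lambda xs: xs[0]  # depends on element order
    print(check_metamorphic_relation(correct_candidate, random_list, shuffled, same_output))
    print(check_metamorphic_relation(buggy_candidate, random_list, shuffled, same_output))
```

No expected outputs are supplied anywhere: the check only compares the program's behavior on an input with its behavior on a related input, which is why it can complement a small set of labeled training cases.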

Informed Down-Sampled Lexicase Selection: Identifying productive training cases for efficient problem solving

Jan 04, 2023

Ryan Boldi, Martin Briesch, Dominik Sobania, Alexander Lalejini, Thomas Helmuth, Franz Rothlauf, Charles Ofria, Lee Spector

Abstract: Genetic Programming (GP) often uses large training sets and requires all individuals to be evaluated on all training cases during selection. Random down-sampled lexicase selection evaluates individuals on only a random subset of the training cases, allowing for more individuals to be explored with the same amount of program executions. However, creating a down-sample randomly might exclude important cases from the current down-sample for a number of generations, while cases that measure the same behavior (synonymous cases) may be overused despite their redundancy. In this work, we introduce Informed Down-Sampled Lexicase Selection. This method leverages population statistics to build down-samples that contain more distinct and therefore informative training cases. Through an empirical investigation across two different GP systems (PushGP and Grammar-Guided GP), we find that informed down-sampling significantly outperforms random down-sampling on a set of contemporary program synthesis benchmark problems. Through an analysis of the created down-samples, we find that important training cases are included in the down-sample consistently across independent evolutionary runs and systems. We hypothesize that this improvement can be attributed to the ability of Informed Down-Sampled Lexicase Selection to maintain more specialist individuals over the course of evolution, while also benefiting from reduced per-evaluation costs.
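One way to build a down-sample of distinct cases from population statistics is sketched below. It assumes that the distance between two training cases is the number of individuals that solve one but not the other, and that cases are picked by farthest-first traversal; the abstract does not spell out these details, so both are assumptions, as are the function name and the `solved[i][j]` layout.

```python
import random


def informed_down_sample(solved, k, rng):
    """Choose up to k mutually distinct training cases.

    solved[i][j] is assumed to be True if individual i solves case j.
    """
    n_cases = len(solved[0])

    def dist(a, b):
        # Number of individuals whose pass/fail result differs on cases a and b.
        return sum(row[a] != row[b] for row in solved)

    chosen = [rng.randrange(n_cases)]  # start from a random case
    while len(chosen) < min(k, n_cases):
        remaining = [c for c in range(n_cases) if c not in chosen]
        # Farthest-first: take the case farthest from everything chosen so far.
        nxt = max(remaining, key=lambda c: min(dist(c, s) for s in chosen))
        chosen.append(nxt)
    return chosen


if __name__ == "__main__":
    # Cases 0 and 1 are synonymous (identical columns); case 2 differs.
    solved = [
        [True, True, False],
        [False, False, True],
        [True, True, True],
    ]
    print(informed_down_sample(solved, k=2, rng=random.Random(0)))
```

With these toy data the sample always contains case 2 and only one of the two synonymous cases, regardless of which case is drawn first.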
