Get our free extension to see links to code for papers anywhere online!Free add-on: code for papers everywhere!Free add-on: See code for papers anywhere!

Add to Chrome

Add to Firefox

Add to Edge

Stefan Coors

AMLB: an AutoML Benchmark

Jul 25, 2022

Pieter Gijsbers, Marcos L. P. Bueno, Stefan Coors, Erin LeDell, Sébastien Poirier, Janek Thomas, Bernd Bischl, Joaquin Vanschoren

Abstract:Comparing different AutoML frameworks is notoriously challenging and often done incorrectly. We introduce an open and extensible benchmark that follows best practices and avoids common mistakes when comparing AutoML frameworks. We conduct a thorough comparison of 9 well-known AutoML frameworks across 71 classification and 33 regression tasks. The differences between the AutoML frameworks are explored with a multi-faceted analysis, evaluating model accuracy, its trade-offs with inference time, and framework failures. We also use Bradley-Terry trees to discover subsets of tasks where the relative AutoML framework rankings differ. The benchmark comes with an open-source tool that integrates with many AutoML frameworks and automates the empirical evaluation process end-to-end: from framework installation and resource allocation to in-depth evaluation. The benchmark uses public data sets, can be easily extended with other AutoML frameworks and tasks, and has a website with up-to-date results.

* Submitted to JMLR

Via

Access Paper or Ask Questions

Multi-Objective Hyperparameter Optimization -- An Overview

Jun 15, 2022

Florian Karl, Tobias Pielok, Julia Moosbauer, Florian Pfisterer, Stefan Coors, Martin Binder, Lennart Schneider, Janek Thomas, Jakob Richter, Michel Lang(+3 more)

Figure 1 for Multi-Objective Hyperparameter Optimization -- An Overview

Figure 2 for Multi-Objective Hyperparameter Optimization -- An Overview

Figure 3 for Multi-Objective Hyperparameter Optimization -- An Overview

Figure 4 for Multi-Objective Hyperparameter Optimization -- An Overview

Abstract:Hyperparameter optimization constitutes a large part of typical modern machine learning workflows. This arises from the fact that machine learning methods and corresponding preprocessing steps often only yield optimal performance when hyperparameters are properly tuned. But in many applications, we are not only interested in optimizing ML pipelines solely for predictive accuracy; additional metrics or constraints must be considered when determining an optimal configuration, resulting in a multi-objective optimization problem. This is often neglected in practice, due to a lack of knowledge and readily available software implementations for multi-objective hyperparameter optimization. In this work, we introduce the reader to the basics of multi- objective hyperparameter optimization and motivate its usefulness in applied ML. Furthermore, we provide an extensive survey of existing optimization strategies, both from the domain of evolutionary algorithms and Bayesian optimization. We illustrate the utility of MOO in several specific ML applications, considering objectives such as operating conditions, prediction time, sparseness, fairness, interpretability and robustness.

* 56 pages, 11 figures, submitted to ACM TELO

Via

Access Paper or Ask Questions

Automatic Componentwise Boosting: An Interpretable AutoML System

Sep 12, 2021

Stefan Coors, Daniel Schalk, Bernd Bischl, David Rügamer

Figure 1 for Automatic Componentwise Boosting: An Interpretable AutoML System

Figure 2 for Automatic Componentwise Boosting: An Interpretable AutoML System

Figure 3 for Automatic Componentwise Boosting: An Interpretable AutoML System

Figure 4 for Automatic Componentwise Boosting: An Interpretable AutoML System

Abstract:In practice, machine learning (ML) workflows require various different steps, from data preprocessing, missing value imputation, model selection, to model tuning as well as model evaluation. Many of these steps rely on human ML experts. AutoML - the field of automating these ML pipelines - tries to help practitioners to apply ML off-the-shelf without any expert knowledge. Most modern AutoML systems like auto-sklearn, H20-AutoML or TPOT aim for high predictive performance, thereby generating ensembles that consist almost exclusively of black-box models. This, in turn, makes the interpretation for the layperson more intricate and adds another layer of opacity for users. We propose an AutoML system that constructs an interpretable additive model that can be fitted using a highly scalable componentwise boosting algorithm. Our system provides tools for easy model interpretation such as visualizing partial effects and pairwise interactions, allows for a straightforward calculation of feature importance, and gives insights into the required model complexity to fit the given task. We introduce the general framework and outline its implementation autocompboost. To demonstrate the frameworks efficacy, we compare autocompboost to other existing systems based on the OpenML AutoML-Benchmark. Despite its restriction to an interpretable model space, our system is competitive in terms of predictive performance on most data sets while being more user-friendly and transparent.

* 6 pages, 4 figures, ECML-PKDD Workshop on Automating Data Science 2021

Via

Access Paper or Ask Questions

Hyperparameter Optimization: Foundations, Algorithms, Best Practices and Open Challenges

Jul 14, 2021

Bernd Bischl, Martin Binder, Michel Lang, Tobias Pielok, Jakob Richter, Stefan Coors, Janek Thomas, Theresa Ullmann, Marc Becker, Anne-Laure Boulesteix(+2 more)

Figure 1 for Hyperparameter Optimization: Foundations, Algorithms, Best Practices and Open Challenges

Figure 2 for Hyperparameter Optimization: Foundations, Algorithms, Best Practices and Open Challenges

Figure 3 for Hyperparameter Optimization: Foundations, Algorithms, Best Practices and Open Challenges

Figure 4 for Hyperparameter Optimization: Foundations, Algorithms, Best Practices and Open Challenges

Abstract:Most machine learning algorithms are configured by one or several hyperparameters that must be carefully chosen and often considerably impact performance. To avoid a time consuming and unreproducible manual trial-and-error process to find well-performing hyperparameter configurations, various automatic hyperparameter optimization (HPO) methods, e.g., based on resampling error estimation for supervised machine learning, can be employed. After introducing HPO from a general perspective, this paper reviews important HPO methods such as grid or random search, evolutionary algorithms, Bayesian optimization, Hyperband and racing. It gives practical recommendations regarding important choices to be made when conducting HPO, including the HPO algorithms themselves, performance evaluation, how to combine HPO with ML pipelines, runtime improvements, and parallelization.

Via

Access Paper or Ask Questions

Multi-Objective Automatic Machine Learning with AutoxgboostMC

Aug 28, 2019

Florian Pfisterer, Stefan Coors, Janek Thomas, Bernd Bischl

Figure 1 for Multi-Objective Automatic Machine Learning with AutoxgboostMC

Figure 2 for Multi-Objective Automatic Machine Learning with AutoxgboostMC

Figure 3 for Multi-Objective Automatic Machine Learning with AutoxgboostMC

Figure 4 for Multi-Objective Automatic Machine Learning with AutoxgboostMC

Abstract:AutoML systems are currently rising in popularity, as they can build powerful models without human oversight. They often combine techniques from many different sub-fields of machine learning in order to find a model or set of models that optimize a user-supplied criterion, such as predictive performance. The ultimate goal of such systems is to reduce the amount of time spent on menial tasks, or tasks that can be solved better by algorithms while leaving decisions that require human intelligence to the end-user. In recent years, the importance of other criteria, such as fairness and interpretability, and many others have become more and more apparent. Current AutoML frameworks either do not allow to optimize such secondary criteria or only do so by limiting the system's choice of models and preprocessing steps. We propose to optimize additional criteria defined by the user directly to guide the search towards an optimal machine learning pipeline. In order to demonstrate the need and usefulness of our approach, we provide a simple multi-criteria AutoML system and showcase an exemplary application.

* Accepted at ECMLPKDD WORKSHOP ON AUTOMATING DATA SCIENCE

Via

Access Paper or Ask Questions

Automatic Gradient Boosting

Jul 13, 2018

Janek Thomas, Stefan Coors, Bernd Bischl

Figure 1 for Automatic Gradient Boosting

Figure 2 for Automatic Gradient Boosting

Figure 3 for Automatic Gradient Boosting

Abstract:Automatic machine learning performs predictive modeling with high performing machine learning tools without human interference. This is achieved by making machine learning applications parameter-free, i.e. only a dataset is provided while the complete model selection and model building process is handled internally through (often meta) optimization. Projects like Auto-WEKA and auto-sklearn aim to solve the Combined Algorithm Selection and Hyperparameter optimization (CASH) problem resulting in huge configuration spaces. However, for most real-world applications, the optimization over only a few different key learning algorithms can not only be sufficient, but also potentially beneficial. The latter becomes apparent when one considers that models have to be validated, explained, deployed and maintained. Here, less complex model are often preferred, for validation or efficiency reasons, or even a strict requirement. Automatic gradient boosting simplifies this idea one step further, using only gradient boosting as a single learning algorithm in combination with model-based hyperparameter tuning, threshold optimization and encoding of categorical features. We introduce this general framework as well as a concrete implementation called autoxgboost. It is compared to current AutoML projects on 16 datasets and despite its simplicity is able to achieve comparable results on about half of the datasets as well as performing best on two.

* 6 pages, 1 figure, ICML 2018 AutoML Workshop

Via

Access Paper or Ask Questions