Get our free extension to see links to code for papers anywhere online!Free add-on: code for papers everywhere!Free add-on: See code for papers anywhere!

Add to Chrome

Add to Firefox

Add to Edge

Shuhei Watanabe

Derivation of Output Correlation Inferences for Multi-Output (aka Multi-Task) Gaussian Process

Jan 14, 2025

Shuhei Watanabe

Abstract:Gaussian process (GP) is arguably one of the most widely used machine learning algorithms in practice. One of its prominent applications is Bayesian optimization (BO). Although the vanilla GP itself is already a powerful tool for BO, it is often beneficial to be able to consider the dependencies of multiple outputs. To do so, Multi-task GP (MTGP) is formulated, but it is not trivial to fully understand the derivations of its formulations and their gradients from the previous literature. This paper serves friendly derivations of the MTGP formulations and their gradients.

Via

Access Paper or Ask Questions

Derivation of Closed Form of Expected Improvement for Gaussian Process Trained on Log-Transformed Objective

Nov 27, 2024

Shuhei Watanabe

Abstract:Expected Improvement (EI) is arguably the most widely used acquisition function in Bayesian optimization. However, it is often challenging to enhance the performance with EI due to its sensitivity to numerical precision. Previously, Hutter et al. (2009) tackled this problem by using Gaussian process trained on the log-transformed objective function and it was reported that this trick improves the predictive accuracy of GP, leading to substantially better performance. Although Hutter et al. (2009) offered the closed form of their EI, its intermediate derivation has not been provided so far. In this paper, we give a friendly derivation of their proposition.

Via

Access Paper or Ask Questions

Fast Benchmarking of Asynchronous Multi-Fidelity Optimization on Zero-Cost Benchmarks

Mar 04, 2024

Shuhei Watanabe, Neeratyoy Mallik, Edward Bergman, Frank Hutter

Abstract:While deep learning has celebrated many successes, its results often hinge on the meticulous selection of hyperparameters (HPs). However, the time-consuming nature of deep learning training makes HP optimization (HPO) a costly endeavor, slowing down the development of efficient HPO tools. While zero-cost benchmarks, which provide performance and runtime without actual training, offer a solution for non-parallel setups, they fall short in parallel setups as each worker must communicate its queried runtime to return its evaluation in the exact order. This work addresses this challenge by introducing a user-friendly Python package that facilitates efficient parallel HPO with zero-cost benchmarks. Our approach calculates the exact return order based on the information stored in file system, eliminating the need for long waiting times and enabling much faster HPO evaluations. We first verify the correctness of our approach through extensive testing and the experiments with 6 popular HPO libraries show its applicability to diverse libraries and its ability to achieve over 1000x speedup compared to a traditional approach. Our package can be installed via pip install mfhpo-simulator.

* Submitted to AutoML Conference 2024 ABCD Track

Via

Access Paper or Ask Questions

Python Wrapper for Simulating Multi-Fidelity Optimization on HPO Benchmarks without Any Wait

May 27, 2023

Shuhei Watanabe

Abstract:Hyperparameter (HP) optimization of deep learning (DL) is essential for high performance. As DL often requires several hours to days for its training, HP optimization (HPO) of DL is often prohibitively expensive. This boosted the emergence of tabular or surrogate benchmarks, which enable querying the (predictive) performance of DL with a specific HP configuration in a fraction. However, since actual runtimes of a DL training are significantly different from query response times, in a naive implementation, simulators of an asynchronous HPO, e.g. multi-fidelity optimization, must wait for the actual runtimes at each iteration; otherwise, the evaluation order in the simulator does not match with the real experiment. To ease this issue, we develop a Python wrapper to force each worker to wait in order to match the evaluation order with the real experiment and describe the usage. Our implementation reduces the waiting time to 0.01 seconds and it is available at https://github.com/nabenabe0928/mfhpo-simulator/.

Via

Access Paper or Ask Questions

Python Tool for Visualizing Variability of Pareto Fronts over Multiple Runs

May 15, 2023

Shuhei Watanabe

Abstract:Hyperparameter optimization is crucial to achieving high performance in deep learning. On top of the performance, other criteria such as inference time or memory requirement often need to be optimized due to some practical reasons. This motivates research on multi-objective optimization (MOO). However, Pareto fronts of MOO methods are often shown without considering the variability caused by random seeds and this makes the performance stability evaluation difficult. Although there is a concept named empirical attainment surface to enable the visualization with uncertainty over multiple runs, there is no major Python package for empirical attainment surface. We, therefore, develop a Python package for this purpose and describe the usage. The package is available at https://github.com/nabenabe0928/empirical-attainment-func.

* Submitted to AutoML workshop

Via

Access Paper or Ask Questions

Tree-structured Parzen estimator: Understanding its algorithm components and their roles for better empirical performance

Apr 25, 2023

Shuhei Watanabe

Abstract:Recent advances in many domains require more and more complicated experiment design. Such complicated experiments often have many parameters, which necessitate parameter tuning. Tree-structured Parzen estimator (TPE), a Bayesian optimization method, is widely used in recent parameter tuning frameworks. Despite its popularity, the roles of each control parameter and the algorithm intuition have not been discussed so far. In this tutorial, we will identify the roles of each control parameter and their impacts on hyperparameter optimization using a diverse set of benchmarks. We compare our recommended setting drawn from the ablation study with baseline methods and demonstrate that our recommended setting improves the performance of TPE. Our TPE implementation is available at https://github.com/nabenabe0928/tpe/tree/single-opt.

Via

Access Paper or Ask Questions

PED-ANOVA: Efficiently Quantifying Hyperparameter Importance in Arbitrary Subspaces

Apr 20, 2023

Shuhei Watanabe, Archit Bansal, Frank Hutter

Abstract:The recent rise in popularity of Hyperparameter Optimization (HPO) for deep learning has highlighted the role that good hyperparameter (HP) space design can play in training strong models. In turn, designing a good HP space is critically dependent on understanding the role of different HPs. This motivates research on HP Importance (HPI), e.g., with the popular method of functional ANOVA (f-ANOVA). However, the original f-ANOVA formulation is inapplicable to the subspaces most relevant to algorithm designers, such as those defined by top performance. To overcome this problem, we derive a novel formulation of f-ANOVA for arbitrary subspaces and propose an algorithm that uses Pearson divergence (PED) to enable a closed-form computation of HPI. We demonstrate that this new algorithm, dubbed PED-ANOVA, is able to successfully identify important HPs in different subspaces while also being extremely computationally efficient.

* Accepted by IJCAI2023

Via

Access Paper or Ask Questions

Multi-objective Tree-structured Parzen Estimator Meets Meta-learning

Dec 13, 2022

Shuhei Watanabe, Noow Awad, Masaki Onishi, Frank Hutter

Abstract:Hyperparameter optimization (HPO) is essential for the better performance of deep learning, and practitioners often need to consider the trade-off between multiple metrics, such as error rate, latency, memory requirements, robustness, and algorithmic fairness. Due to this demand and the heavy computation of deep learning, the acceleration of multi-objective (MO) optimization becomes ever more important. Although meta-learning has been extensively studied to speedup HPO, existing methods are not applicable to the MO tree-structured parzen estimator (MO-TPE), a simple yet powerful MO-HPO algorithm. In this paper, we extend TPE's acquisition function to the meta-learning setting, using a task similarity defined by the overlap in promising domains of each task. In a comprehensive set of experiments, we demonstrate that our method accelerates MO-TPE on tabular HPO benchmarks and yields state-of-the-art performance. Our method was also validated externally by winning the AutoML 2022 competition on "Multiobjective Hyperparameter Optimization for Transformers".

* Meta-learning workshop on NeurIPS 2022

Via

Access Paper or Ask Questions

c-TPE: Generalizing Tree-structured Parzen Estimator with Inequality Constraints for Continuous and Categorical Hyperparameter Optimization

Nov 26, 2022

Shuhei Watanabe, Frank Hutter

Abstract:Hyperparameter optimization (HPO) is crucial for strong performance of deep learning algorithms. A widely-used versatile HPO method is a variant of Bayesian optimization called tree-structured Parzen estimator (TPE), which splits data into good and bad groups and uses the density ratio of those groups as an acquisition function (AF). However, real-world applications often have some constraints, such as memory requirements, or latency. In this paper, we present an extension of TPE to constrained optimization (c-TPE) via simple factorization of AFs. The experiments demonstrate c-TPE is robust to various constraint levels and exhibits the best average rank performance among existing methods with statistical significance on search spaces with categorical parameters on 81 settings.

Via

Access Paper or Ask Questions

Warm Starting CMA-ES for Hyperparameter Optimization

Dec 13, 2020

Masahiro Nomura, Shuhei Watanabe, Youhei Akimoto, Yoshihiko Ozaki, Masaki Onishi

Figure 1 for Warm Starting CMA-ES for Hyperparameter Optimization

Figure 2 for Warm Starting CMA-ES for Hyperparameter Optimization

Figure 3 for Warm Starting CMA-ES for Hyperparameter Optimization

Figure 4 for Warm Starting CMA-ES for Hyperparameter Optimization

Abstract:Hyperparameter optimization (HPO), formulated as black-box optimization (BBO), is recognized as essential for automation and high performance of machine learning approaches. The CMA-ES is a promising BBO approach with a high degree of parallelism, and has been applied to HPO tasks, often under parallel implementation, and shown superior performance to other approaches including Bayesian optimization (BO). However, if the budget of hyperparameter evaluations is severely limited, which is often the case for end users who do not deserve parallel computing, the CMA-ES exhausts the budget without improving the performance due to its long adaptation phase, resulting in being outperformed by BO approaches. To address this issue, we propose to transfer prior knowledge on similar HPO tasks through the initialization of the CMA-ES, leading to significantly shortening the adaptation time. The knowledge transfer is designed based on the novel definition of task similarity, with which the correlation of the performance of the proposed approach is confirmed on synthetic problems. The proposed warm starting CMA-ES, called WS-CMA-ES, is applied to different HPO tasks where some prior knowledge is available, showing its superior performance over the original CMA-ES as well as BO approaches with or without using the prior knowledge.

* accepted at AAAI2021

Via

Access Paper or Ask Questions