Mathieu Dagréou

A Near-Optimal Algorithm for Bilevel Empirical Risk Minimization

Feb 17, 2023

Mathieu Dagréou, Thomas Moreau, Samuel Vaiter, Pierre Ablin

Abstract: Bilevel optimization problems, in which two optimization problems are nested, have a growing number of applications in machine learning. In many practical cases, the upper and the lower objectives correspond to empirical risk minimization problems and therefore have a sum structure. In this context, we propose a bilevel extension of the celebrated SARAH algorithm. We demonstrate that the algorithm requires $\mathcal{O}((n+m)^{\frac12}\varepsilon^{-1})$ gradient computations to achieve $\varepsilon$-stationarity, where $n+m$ is the total number of samples, which improves over all previous bilevel algorithms. Moreover, we provide a lower bound on the number of oracle calls required to get an approximate stationary point of the objective function of the bilevel problem. This lower bound is attained by our algorithm, which is therefore optimal in terms of sample complexity.

Benchopt: Reproducible, efficient and collaborative optimization benchmarks

Jun 28, 2022

Thomas Moreau, Mathurin Massias, Alexandre Gramfort, Pierre Ablin, Pierre-Antoine Bannier, Benjamin Charlier, Mathieu Dagréou, Tom Dupré la Tour, Ghislain Durif, Cassio F. Dantas (+11 more)

Abstract: Numerical validation is at the core of machine learning research, as it makes it possible to assess the actual impact of new methods and to confirm the agreement between theory and practice. Yet, the rapid development of the field poses several challenges: researchers are confronted with a profusion of methods to compare, limited transparency and consensus on best practices, as well as tedious re-implementation work. As a result, validation is often very partial, which can lead to wrong conclusions that slow down the progress of research. We propose Benchopt, a collaborative framework to automate, reproduce and publish optimization benchmarks in machine learning across programming languages and hardware architectures. Benchopt simplifies benchmarking for the community by providing an off-the-shelf tool for running, sharing and extending experiments. To demonstrate its broad usability, we showcase benchmarks on three standard learning tasks: $\ell_2$-regularized logistic regression, Lasso, and ResNet18 training for image classification. These benchmarks highlight key practical findings that give a more nuanced view of the state of the art for these problems, showing that for practical evaluation, the devil is in the details. We hope that Benchopt will foster collaborative work in the community, thereby improving the reproducibility of research findings.

A framework for bilevel optimization that enables stochastic and global variance reduction algorithms

Jan 31, 2022

Mathieu Dagréou, Pierre Ablin, Samuel Vaiter, Thomas Moreau

Abstract: Bilevel optimization, the problem of minimizing a value function which involves the arg-minimum of another function, appears in many areas of machine learning. In a large-scale setting where the number of samples is huge, it is crucial to develop stochastic methods, which only use a few samples at a time to progress. However, computing the gradient of the value function involves solving a linear system, which makes it difficult to derive unbiased stochastic estimates. To overcome this problem, we introduce a novel framework in which the solution of the inner problem, the solution of the linear system, and the main variable evolve at the same time. These directions are written as a sum, making it straightforward to derive unbiased estimates. The simplicity of our approach allows us to develop global variance reduction algorithms, where the dynamics of all variables are subject to variance reduction. We demonstrate that SABA, an adaptation of the celebrated SAGA algorithm in our framework, has an $O(\frac1T)$ convergence rate, and that it achieves linear convergence under a Polyak-Łojasiewicz assumption. This is the first stochastic algorithm for bilevel optimization that verifies either of these properties. Numerical experiments validate the usefulness of our method.
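
The idea of letting the inner solution, the linear-system solution, and the main variable evolve together can be illustrated on a toy problem. The following is a minimal sketch, not the paper's implementation: it assumes a scalar inner variable `z`, a linear-system variable `v`, and an outer variable `x`, with quadratic per-sample objectives chosen here purely for illustration. The data `a` and `b`, the step sizes `rho` and `gamma`, and all function names are hypothetical. Because each direction is an average over samples, drawing one lower-level and one upper-level sample gives an unbiased estimate of it.

```python
import numpy as np

# Toy data (illustrative assumption): lower-level samples a_i, upper-level samples b_j.
a = np.array([0.5, 1.0, 1.5])
b = np.array([1.0, 2.0, 3.0, 6.0])


def sampled_directions(z, v, x, a_i, b_j):
    """Update directions built from one lower sample a_i and one upper sample b_j.

    Assumed per-sample objectives:
      G_i(z, x) = 0.5 * (z - a_i * x)**2   (lower level)
      F_j(z, x) = 0.5 * (z - b_j)**2       (upper level)
    """
    d_z = z - a_i * x      # grad_z G_i
    d_v = v + (z - b_j)    # (hess_zz G_i) * v + grad_z F_j, where hess_zz G_i = 1
    d_x = -a_i * v         # (d/dx grad_z G_i) * v + grad_x F_j, where grad_x F_j = 0
    return np.array([d_z, d_v, d_x])


def full_directions(z, v, x):
    """The same three directions computed from the averaged objectives."""
    d_z = z - a.mean() * x
    d_v = v + (z - b.mean())
    d_x = -a.mean() * v
    return np.array([d_z, d_v, d_x])


def run(n_iter=2000, rho=0.5, gamma=0.05, rng=None):
    """Move z, v and x simultaneously; sample one (i, j) pair per step if rng is given."""
    z, v, x = 0.0, 0.0, 0.0
    for _ in range(n_iter):
        if rng is None:
            d = full_directions(z, v, x)
        else:
            d = sampled_directions(z, v, x, rng.choice(a), rng.choice(b))
        z, v, x = z - rho * d[0], v - rho * d[1], x - gamma * d[2]
    return z, v, x


# Full-batch run: x should approach b.mean() / a.mean(), the minimizer of the value function.
z, v, x = run()
print(z, v, x)
```

Passing `rng=np.random.default_rng(0)` to `run` switches to the sampled directions; with constant step sizes the iterates then fluctuate around the solution rather than settling on it, which is the behavior variance reduction is meant to address.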
