David Martínez-Rubio

On the necessity of adaptive regularisation: Optimal anytime online learning on $\ell_p$-balls

Jun 24, 2025

Emmeran Johnson, David Martínez-Rubio, Ciara Pike-Burke, Patrick Rebeschini

Abstract: We study online convex optimization on $\ell_p$-balls in $\mathbb{R}^d$ for $p > 2$. While always sub-linear, the optimal regret exhibits a shift between the high-dimensional setting ($d > T$), where the dimension $d$ exceeds the time horizon $T$, and the low-dimensional setting ($d \leq T$). We show that Follow-the-Regularised-Leader (FTRL) with time-varying regularisation that adapts to the dimension regime is anytime optimal in all dimension regimes. Motivated by this, we ask whether it is possible to obtain anytime optimality of FTRL with fixed non-adaptive regularisation. Our main result establishes that for separable regularisers, adaptivity in the regulariser is necessary, and that any fixed regulariser will be sub-optimal in one of the two dimension regimes. Finally, we provide lower bounds which rule out sub-linear regret bounds for the linear bandit problem in sufficiently high dimension for all $\ell_p$-balls with $p \geq 1$.

Linear Convergence of the Frank-Wolfe Algorithm over Product Polytopes

May 16, 2025

Gabriele Iommazzo, David Martínez-Rubio, Francisco Criado, Elias Wirth, Sebastian Pokutta

Abstract: We study the linear convergence of Frank-Wolfe algorithms over product polytopes. We analyze two condition numbers for the product polytope, namely the pyramidal width and the vertex-facet distance, based on the condition numbers of individual polytope components. As a result, for convex objectives that are $\mu$-Polyak-Łojasiewicz, we show linear convergence rates quantified in terms of the resulting condition numbers. We apply our results to the problem of approximately finding a feasible point in a polytope intersection in high dimensions, and demonstrate the practical efficiency of our algorithms through empirical results.

Beyond Short Steps in Frank-Wolfe Algorithms

Jan 30, 2025

David Martínez-Rubio, Sebastian Pokutta

Abstract: We introduce novel techniques to enhance Frank-Wolfe algorithms by leveraging function smoothness beyond traditional short steps. Our study focuses on Frank-Wolfe algorithms with step sizes that incorporate primal-dual guarantees, offering practical stopping criteria. We present a new Frank-Wolfe algorithm utilizing an optimistic framework and provide a primal-dual convergence proof. Additionally, we propose a generalized short-step strategy aimed at optimizing a computable primal-dual gap. Interestingly, this new generalized short-step strategy is also applicable to gradient descent algorithms beyond Frank-Wolfe methods. As a byproduct, our work revisits and refines primal-dual techniques for analyzing Frank-Wolfe algorithms, achieving tighter primal-dual convergence rates. Empirical results demonstrate that our optimistic algorithm outperforms existing methods, highlighting its practical advantages.
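For readers unfamiliar with the baseline this abstract refers to, the following is a minimal sketch of the classical short-step Frank-Wolfe method with the Frank-Wolfe gap as a stopping criterion. It is not the optimistic algorithm or the generalized short-step rule proposed in the paper. The function names, the quadratic objective, the probability-simplex feasible set, and the smoothness constant of 1 are all assumptions chosen only for illustration.

```python
def fw_gap_and_vertex(grad, lmo, x):
    """Return the Frank-Wolfe gap <grad f(x), x - v> and the LMO vertex v."""
    g = grad(x)
    v = lmo(g)
    gap = sum(gi * (xi - vi) for gi, xi, vi in zip(g, x, v))
    return gap, v


def frank_wolfe_short_step(grad, lmo, x0, smoothness, tol=1e-8, max_iters=1000):
    """Classical short-step Frank-Wolfe (illustrative baseline only).

    The returned gap is computed at the returned iterate and, for convex f,
    upper-bounds f(x) - min f, so it doubles as a stopping criterion.
    """
    x = list(x0)
    for _ in range(max_iters):
        gap, v = fw_gap_and_vertex(grad, lmo, x)
        if gap <= tol:
            return x, gap
        d = [vi - xi for vi, xi in zip(v, x)]
        dist_sq = sum(di * di for di in d)
        # Short step: minimise the smoothness upper bound along the segment [x, v].
        step = min(1.0, gap / (smoothness * dist_sq))
        x = [xi + step * di for xi, di in zip(x, d)]
    gap, _ = fw_gap_and_vertex(grad, lmo, x)
    return x, gap


# Hypothetical test problem: f(x) = 0.5 * ||x - b||^2 over the probability simplex.
b = [0.2, 0.3, 0.5]


def objective(x):
    return 0.5 * sum((xi - bi) ** 2 for xi, bi in zip(x, b))


def gradient(x):
    return [xi - bi for xi, bi in zip(x, b)]


def simplex_lmo(g):
    """Linear minimisation over the simplex: the vertex with the smallest gradient entry."""
    i = min(range(len(g)), key=lambda k: g[k])
    return [1.0 if k == i else 0.0 for k in range(len(g))]


x_final, final_gap = frank_wolfe_short_step(gradient, simplex_lmo, [1.0, 0.0, 0.0], smoothness=1.0)
print(x_final, final_gap)
```

Because `b` lies in the simplex, the optimal value here is 0, so the printed gap directly bounds the objective value at the returned point.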

Implicit Riemannian Optimism with Applications to Min-Max Problems

Jan 30, 2025

Christophe Roux, David Martínez-Rubio, Sebastian Pokutta

Abstract: We introduce a Riemannian optimistic online learning algorithm for Hadamard manifolds based on inexact implicit updates. Unlike prior work, our method can handle in-manifold constraints, and matches the best known regret bounds in the Euclidean setting with no dependence on geometric constants, like the minimum curvature. Building on this, we develop algorithms for g-convex, g-concave smooth min-max problems on Hadamard manifolds. Notably, one method nearly matches the gradient oracle complexity of the lower bound for Euclidean problems, for the first time.

Black-Box Uniform Stability for Non-Euclidean Empirical Risk Minimization

Dec 20, 2024

Simon Vary, David Martínez-Rubio, Patrick Rebeschini

Abstract: We study first-order algorithms that are uniformly stable for empirical risk minimization (ERM) problems that are convex and smooth with respect to $p$-norms, $p \geq 1$. We propose a black-box reduction method that, by employing properties of uniformly convex regularizers, turns an optimization algorithm for Hölder smooth convex losses into a uniformly stable learning algorithm with optimal statistical risk bounds on the excess risk, up to a constant factor depending on $p$. Achieving a black-box reduction for uniform stability was posed as an open question by Attia and Koren (2022), who had solved the Euclidean case $p=2$. We explore applications that leverage non-Euclidean geometry in addressing binary classification problems.

* 33 pages, no figures

Non-Euclidean High-Order Smooth Convex Optimization

Nov 13, 2024

Juan Pablo Contreras, Cristóbal Guzmán, David Martínez-Rubio

Abstract: We develop algorithms for the optimization of convex objectives that have Hölder continuous $q$-th derivatives with respect to a $p$-norm by using a $q$-th order oracle, for $p, q \geq 1$. We can also optimize other structured functions. We do this by developing a non-Euclidean inexact accelerated proximal point method that makes use of an inexact uniformly convex regularizer. We also provide nearly matching lower bounds for any deterministic algorithm that interacts with the function via a local oracle.

Accelerated Methods for Riemannian Min-Max Optimization Ensuring Bounded Geometric Penalties

May 25, 2023

David Martínez-Rubio, Christophe Roux, Christopher Criscitiello, Sebastian Pokutta

Abstract: In this work, we study optimization problems of the form $\min_x \max_y f(x, y)$, where $f(x, y)$ is defined on a product Riemannian manifold $\mathcal{M} \times \mathcal{N}$ and is $\mu_x$-strongly geodesically convex (g-convex) in $x$ and $\mu_y$-strongly g-concave in $y$, for $\mu_x, \mu_y \geq 0$. We design accelerated methods when $f$ is $(L_x, L_y, L_{xy})$-smooth and $\mathcal{M}$, $\mathcal{N}$ are Hadamard. To that aim we introduce new g-convex optimization results, of independent interest: we show global linear convergence for metric-projected Riemannian gradient descent and improve existing accelerated methods by reducing geometric constants. Additionally, we complete the analysis of two previous works applying to the Riemannian min-max case by removing an assumption about iterates staying in a pre-specified compact set.

Accelerated Riemannian Optimization: Handling Constraints with a Prox to Bound Geometric Penalties

Nov 26, 2022

David Martínez-Rubio, Sebastian Pokutta

Abstract: We propose a globally accelerated, first-order method for the optimization of smooth and (strongly or not) geodesically convex functions in a wide class of Hadamard manifolds. We achieve the same convergence rates as Nesterov's accelerated gradient descent, up to a multiplicative geometric penalty and log factors. Crucially, we can force our method to stay within a compact set we define. Prior fully accelerated works resort to assuming that the iterates of their algorithms stay in some pre-specified compact set, except for two previous methods of limited applicability. For our manifolds, this solves the open question in [KY22] about obtaining global general acceleration without assuming that the iterates stay in the feasible set.

* arXiv submission, first circulated in May 2022

Acceleration in Hyperbolic and Spherical Spaces

Dec 16, 2020

David Martínez-Rubio

Abstract: We advance research on the acceleration phenomenon on Riemannian manifolds by introducing the first global first-order method that achieves the same rates as accelerated gradient descent in the Euclidean space for the optimization of smooth and geodesically convex (g-convex) or strongly g-convex functions defined on the hyperbolic space or a subset of the sphere, up to constants and log factors. To the best of our knowledge, this is the first method proved to achieve these rates globally on functions defined on a Riemannian manifold $\mathcal{M}$ other than the Euclidean space. Additionally, for any Riemannian manifold of bounded sectional curvature, we provide reductions from optimization methods for smooth and g-convex functions to methods for smooth and strongly g-convex functions and vice versa. As a proxy, we solve a constrained non-convex Euclidean problem, under a condition between convexity and quasar-convexity.

Neural networks are a priori biased towards Boolean functions with low entropy

Sep 29, 2019

Chris Mingard, Joar Skalse, Guillermo Valle-Pérez, David Martínez-Rubio, Vladimir Mikulik, Ard A. Louis

Abstract: Understanding the inductive bias of neural networks is critical to explaining their ability to generalise. Here, for one of the simplest neural networks -- a single-layer perceptron with $n$ input neurons, one output neuron, and no threshold bias term -- we prove that upon random initialisation of weights, the a priori probability $P(t)$ that it represents a Boolean function that classifies $t$ points in $\{0,1\}^n$ as $1$ has a remarkably simple form: $P(t) = 2^{-n}$ for $0 \leq t < 2^n$. Since a perceptron can express far fewer Boolean functions with small or large values of $t$ (low "entropy") than with intermediate values of $t$ (high "entropy"), there is, on average, a strong intrinsic a priori bias towards individual functions with low entropy. Furthermore, within a class of functions with fixed $t$, we often observe a further intrinsic bias towards functions of lower complexity. Finally, we prove that, regardless of the distribution of inputs, the bias towards low entropy becomes monotonically stronger upon adding ReLU layers, and empirically show that increasing the variance of the bias term has a similar effect.

* Under review as a conference paper at ICLR 2020
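The closed form $P(t) = 2^{-n}$ in the abstract above can be checked numerically on a small case. The sketch below is an illustration, not the authors' proof. It assumes the perceptron outputs 1 exactly when $w \cdot x > 0$ (so the all-zeros input is always classified as 0), and it uses hypothetical weight magnitudes chosen so that no input lands exactly on the decision boundary. It counts $t$ for every sign pattern of a fixed magnitude vector. If each value $0, \dots, 2^n - 1$ appears exactly once, then any weight distribution that is symmetric under flipping the sign of individual weights assigns probability $2^{-n}$ to each $t$.

```python
from itertools import product


def count_ones(weights):
    """Number of inputs x in {0,1}^n with w . x > 0 (assumed threshold convention)."""
    n = len(weights)
    return sum(
        1
        for x in product((0, 1), repeat=n)
        if sum(w * xi for w, xi in zip(weights, x)) > 0
    )


def t_values_over_sign_flips(magnitudes):
    """Sorted list of t for all 2^n sign patterns applied to fixed magnitudes."""
    ts = []
    for signs in product((1, -1), repeat=len(magnitudes)):
        ts.append(count_ones([s * m for s, m in zip(signs, magnitudes)]))
    return sorted(ts)


# Hypothetical magnitudes; no signed subset sum of these values is zero.
magnitudes = [0.7, 1.9, 2.3, 5.1]
t_values = t_values_over_sign_flips(magnitudes)
is_uniform = t_values == list(range(2 ** len(magnitudes)))
print(t_values, is_uniform)
```

Enumeration is exponential in $n$ twice over (inputs and sign patterns), so this check is only practical for very small $n$.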
