Get our free extension to see links to code for papers anywhere online!Free add-on: code for papers everywhere!Free add-on: See code for papers anywhere!

Add to Chrome

Add to Firefox

Add to Edge

Luiz F. O. Chamon

Learning (Approximately) Equivariant Networks via Constrained Optimization

May 19, 2025

Andrei Manolache, Luiz F. O. Chamon, Mathias Niepert

Abstract:Equivariant neural networks are designed to respect symmetries through their architecture, boosting generalization and sample efficiency when those symmetries are present in the data distribution. Real-world data, however, often departs from perfect symmetry because of noise, structural variation, measurement bias, or other symmetry-breaking effects. Strictly equivariant models may struggle to fit the data, while unconstrained models lack a principled way to leverage partial symmetries. Even when the data is fully symmetric, enforcing equivariance can hurt training by limiting the model to a restricted region of the parameter space. Guided by homotopy principles, where an optimization problem is solved by gradually transforming a simpler problem into a complex one, we introduce Adaptive Constrained Equivariance (ACE), a constrained optimization approach that starts with a flexible, non-equivariant model and gradually reduces its deviation from equivariance. This gradual tightening smooths training early on and settles the model at a data-driven equilibrium, balancing between equivariance and non-equivariance. Across multiple architectures and tasks, our method consistently improves performance metrics, sample efficiency, and robustness to input perturbations compared with strictly equivariant models and heuristic equivariance relaxations.

Via

Access Paper or Ask Questions

Benign Overfitting and the Geometry of the Ridge Regression Solution in Binary Classification

Mar 11, 2025

Alexander Tsigler, Luiz F. O. Chamon, Spencer Frei, Peter L. Bartlett

Abstract:In this work, we investigate the behavior of ridge regression in an overparameterized binary classification task. We assume examples are drawn from (anisotropic) class-conditional cluster distributions with opposing means and we allow for the training labels to have a constant level of label-flipping noise. We characterize the classification error achieved by ridge regression under the assumption that the covariance matrix of the cluster distribution has a high effective rank in the tail. We show that ridge regression has qualitatively different behavior depending on the scale of the cluster mean vector and its interaction with the covariance matrix of the cluster distributions. In regimes where the scale is very large, the conditions that allow for benign overfitting turn out to be the same as those for the regression task. We additionally provide insights into how the introduction of label noise affects the behavior of the minimum norm interpolator (MNI). The optimal classifier in this setting is a linear transformation of the cluster mean vector and in the noiseless setting the MNI approximately learns this transformation. On the other hand, the introduction of label noise can significantly change the geometry of the solution while preserving the same qualitative behavior.

* 115 pages, 2 figures

Via

Access Paper or Ask Questions

Constrained Sampling with Primal-Dual Langevin Monte Carlo

Nov 01, 2024

Luiz F. O. Chamon, Mohammad Reza Karimi, Anna Korba

Abstract:This work considers the problem of sampling from a probability distribution known up to a normalization constant while satisfying a set of statistical constraints specified by the expected values of general nonlinear functions. This problem finds applications in, e.g., Bayesian inference, where it can constrain moments to evaluate counterfactual scenarios or enforce desiderata such as prediction fairness. Methods developed to handle support constraints, such as those based on mirror maps, barriers, and penalties, are not suited for this task. This work therefore relies on gradient descent-ascent dynamics in Wasserstein space to put forward a discrete-time primal-dual Langevin Monte Carlo algorithm (PD-LMC) that simultaneously constrains the target distribution and samples from it. We analyze the convergence of PD-LMC under standard assumptions on the target distribution and constraints, namely (strong) convexity and log-Sobolev inequalities. To do so, we bring classical optimization arguments for saddle-point algorithms to the geometry of Wasserstein space. We illustrate the relevance and effectiveness of PD-LMC in several applications.

* 39 pages, 14 figures. Published at NeurIPS 2024

Via

Access Paper or Ask Questions

Solving Differential Equations with Constrained Learning

Oct 30, 2024

Viggo Moro, Luiz F. O. Chamon

Figure 1 for Solving Differential Equations with Constrained Learning

Figure 2 for Solving Differential Equations with Constrained Learning

Figure 3 for Solving Differential Equations with Constrained Learning

Figure 4 for Solving Differential Equations with Constrained Learning

Abstract:(Partial) differential equations (PDEs) are fundamental tools for describing natural phenomena, making their solution crucial in science and engineering. While traditional methods, such as the finite element method, provide reliable solutions, their accuracy is often tied to the use of computationally intensive fine meshes. Moreover, they do not naturally account for measurements or prior solutions, and any change in the problem parameters requires results to be fully recomputed. Neural network-based approaches, such as physics-informed neural networks and neural operators, offer a mesh-free alternative by directly fitting those models to the PDE solution. They can also integrate prior knowledge and tackle entire families of PDEs by simply aggregating additional training losses. Nevertheless, they are highly sensitive to hyperparameters such as collocation points and the weights associated with each loss. This paper addresses these challenges by developing a science-constrained learning (SCL) framework. It demonstrates that finding a (weak) solution of a PDE is equivalent to solving a constrained learning problem with worst-case losses. This explains the limitations of previous methods that minimize the expected value of aggregated losses. SCL also organically integrates structural constraints (e.g., invariances) and (partial) measurements or known solutions. The resulting constrained learning problems can be tackled using a practical algorithm that yields accurate solutions across a variety of PDEs, neural network architectures, and prior knowledge levels without extensive hyperparameter tuning and sometimes even at a lower computational cost.

Via

Access Paper or Ask Questions

Near-Optimal Solutions of Constrained Learning Problems

Mar 18, 2024

Juan Elenter, Luiz F. O. Chamon, Alejandro Ribeiro

Abstract:With the widespread adoption of machine learning systems, the need to curtail their behavior has become increasingly apparent. This is evidenced by recent advancements towards developing models that satisfy robustness, safety, and fairness requirements. These requirements can be imposed (with generalization guarantees) by formulating constrained learning problems that can then be tackled by dual ascent algorithms. Yet, though these algorithms converge in objective value, even in non-convex settings, they cannot guarantee that their outcome is feasible. Doing so requires randomizing over all iterates, which is impractical in virtually any modern applications. Still, final iterates have been observed to perform well in practice. In this work, we address this gap between theory and practice by characterizing the constraint violation of Lagrangian minimizers associated with optimal dual variables, despite lack of convexity. To do this, we leverage the fact that non-convex, finite-dimensional constrained learning problems can be seen as parametrizations of convex, functional problems. Our results show that rich parametrizations effectively mitigate the issue of feasibility in dual methods, shedding light on prior empirical successes of dual learning. We illustrate our findings in fair learning tasks.

Via

Access Paper or Ask Questions

Reply to 'Comments on Graphon Signal Processing' [arXiv:2310.14683]

Jan 05, 2024

Luana Ruiz, Luiz F. O. Chamon, Alejandro Ribeiro

Abstract:This technical note addresses an issue [arXiv:2310.14683] with the proof (but not the statement) of [arXiv:2003.05030, Proposition 4]. The statement of the proposition is correct, but the proof as written in [arXiv:2003.05030] is not and due to a typo in the manuscript, a reference to the correct proof is effectively missing. In the sequel, we present [arXiv:2003.05030, Proposition 4] and its proof. The proof follows from results in [2] that we reproduce here for clarity of exposition. Since the statement of the proposition remains correct, no change in the results of [arXiv:2003.05030] are required. In particular, Lemma 3 and Lemma 4 showing spectral convergence of graphs to graphons, Theorem 1 showing convergence of the GFT to the WFT, and Theorems 3 and 4 showing convergence of graph to graphon filters, remain valid.

* Reply to 'Comments on Graphon Signal Processing'

Via

Access Paper or Ask Questions

Distributed Universal Adaptive Networks

Jul 07, 2023

Cassio G. Lopes, Vítor H. Nascimento, Luiz F. O. Chamon

Abstract:Adaptive networks (ANs) are effective real time techniques to process and track events observed by sensor networks and, more recently, to equip Internet of Things (IoT) applications. ANs operate over nodes equipped with collaborative adaptive filters that solve distributively an estimation problem common to the whole network. However, they do not guarantee that nodes do not lose from cooperation, as compared to its non-cooperative operation; that poor nodes are rejected and exceptional nodes estimates reach the entire network; and that performance is uniform over all nodes. In order to enforce such properties, this work introduces the concept of distributed universal estimation, which encompasses the new concepts of local universality, global universality and universality with respect to the non-cooperative operation. We then construct a new cooperation protocol that is proven to be distributively universal, outperforming direct competitors from the literature, as shown by several simulations. Mean and mean-square analytical models are developed, with good agreement between theory and simulations.

* IEEE Transactions on Signal Processing, v. 71, p. 1817-1832, 2023

Via

Access Paper or Ask Questions

Resilient Constrained Learning

Jun 04, 2023

Ignacio Hounie, Alejandro Ribeiro, Luiz F. O. Chamon

Abstract:When deploying machine learning solutions, they must satisfy multiple requirements beyond accuracy, such as fairness, robustness, or safety. These requirements are imposed during training either implicitly, using penalties, or explicitly, using constrained optimization methods based on Lagrangian duality. Either way, specifying requirements is hindered by the presence of compromises and limited prior knowledge about the data. Furthermore, their impact on performance can often only be evaluated by actually solving the learning problem. This paper presents a constrained learning approach that adapts the requirements while simultaneously solving the learning task. To do so, it relaxes the learning constraints in a way that contemplates how much they affect the task at hand by balancing the performance gains obtained from the relaxation against a user-defined cost of that relaxation. We call this approach resilient constrained learning after the term used to describe ecological systems that adapt to disruptions by modifying their operation. We show conditions under which this balance can be achieved and introduce a practical algorithm to compute it, for which we derive approximation and generalization guarantees. We showcase the advantages of this resilient learning method in image classification tasks involving multiple potential invariances and in heterogeneous federated learning.

Via

Access Paper or Ask Questions

Automatic Data Augmentation via Invariance-Constrained Learning

Sep 29, 2022

Ignacio Hounie, Luiz F. O. Chamon, Alejandro Ribeiro

Figure 1 for Automatic Data Augmentation via Invariance-Constrained Learning

Figure 2 for Automatic Data Augmentation via Invariance-Constrained Learning

Figure 3 for Automatic Data Augmentation via Invariance-Constrained Learning

Figure 4 for Automatic Data Augmentation via Invariance-Constrained Learning

Abstract:Underlying data structures, such as symmetries or invariances to transformations, are often exploited to improve the solution of learning tasks. However, embedding these properties in models or learning algorithms can be challenging and computationally intensive. Data augmentation, on the other hand, induces these symmetries during training by applying multiple transformations to the input data. Despite its ubiquity, its effectiveness depends on the choices of which transformations to apply, when to do so, and how often. In fact, there is both empirical and theoretical evidence that the indiscriminate use of data augmentation can introduce biases that outweigh its benefits. This work tackles these issues by automatically adapting the data augmentation while solving the learning task. To do so, it formulates data augmentation as an invariance-constrained learning problem and leverages Monte Carlo Markov Chain (MCMC) sampling to solve it. The result is a practical algorithm that not only does away with a priori searches for augmentation distributions, but also dynamically controls if and when data augmentation is applied. Our experiments illustrate the performance of this method, which achieves state-of-the-art results in automatic data augmentation benchmarks for CIFAR datasets. Furthermore, this approach can be used to gather insights on the actual symmetries underlying a learning task.

Via

Access Paper or Ask Questions

Probabilistically Robust Learning: Balancing Average- and Worst-case Performance

Feb 02, 2022

Alexander Robey, Luiz F. O. Chamon, George J. Pappas, Hamed Hassani

Figure 1 for Probabilistically Robust Learning: Balancing Average- and Worst-case Performance

Figure 2 for Probabilistically Robust Learning: Balancing Average- and Worst-case Performance

Figure 3 for Probabilistically Robust Learning: Balancing Average- and Worst-case Performance

Figure 4 for Probabilistically Robust Learning: Balancing Average- and Worst-case Performance

Abstract:Many of the successes of machine learning are based on minimizing an averaged loss function. However, it is well-known that this paradigm suffers from robustness issues that hinder its applicability in safety-critical domains. These issues are often addressed by training against worst-case perturbations of data, a technique known as adversarial training. Although empirically effective, adversarial training can be overly conservative, leading to unfavorable trade-offs between nominal performance and robustness. To this end, in this paper we propose a framework called probabilistic robustness that bridges the gap between the accurate, yet brittle average case and the robust, yet conservative worst case by enforcing robustness to most rather than to all perturbations. From a theoretical point of view, this framework overcomes the trade-offs between the performance and the sample-complexity of worst-case and average-case learning. From a practical point of view, we propose a novel algorithm based on risk-aware optimization that effectively balances average- and worst-case performance at a considerably lower computational cost relative to adversarial training. Our results on MNIST, CIFAR-10, and SVHN illustrate the advantages of this framework on the spectrum from average- to worst-case robustness.

Via

Access Paper or Ask Questions