Abstract: Motivated by applications in conditional sampling, given a probability measure $\mu$ and a diffeomorphism $\varphi$, we consider the problem of simultaneously approximating $\varphi$ and the pushforward $\varphi_{\#}\mu$ by means of the flow of a continuity equation whose velocity field is a perceptron neural network with piecewise constant weights. We provide an explicit construction based on a polar-like decomposition of the Lagrange interpolant of $\varphi$. The latter involves a compressible component, given by the gradient of a particular convex function, which can be realized exactly, and an incompressible component which, after being approximated by permutations, can be implemented through shear flows intrinsic to the continuity equation. For more regular maps $\varphi$, such as the Knöthe-Rosenblatt rearrangement, we provide an alternative, probabilistic construction inspired by the Maurey empirical method, in which the number of discontinuities in the weights does not scale inversely with the ambient dimension.
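To fix notation (a schematic sketch only; the precise width, activation, and time parametrization are those specified in the paper), the flow in question is generated by the continuity equation
\[
\partial_t \mu_t + \nabla\cdot\big(v_{\theta(t)}\,\mu_t\big) = 0, \qquad \mu_0 = \mu, \qquad v_{\theta}(x) = w\,\sigma(a^\top x + b),
\]
with piecewise constant weights $\theta(t) = (w(t), a(t), b(t))$. The polar-like decomposition writes the Lagrange interpolant of $\varphi$ schematically as $(\nabla\psi)\circ s$, with $\psi$ convex (the compressible part, realized exactly) and $s$ measure-preserving (the incompressible part, approximated by permutations and implemented through shear flows).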
Abstract: The forward pass of a Transformer can be seen as an interacting particle system on the unit sphere: time plays the role of layers, particles that of token embeddings, and the unit sphere idealizes layer normalization. For some choices of weights, the system can even be seen as a gradient flow for an explicit energy, and one can make sense of the infinite context length (mean-field) limit thanks to Wasserstein gradient flows. In this paper we study the effect of the perceptron block in this setting, and show that critical points are generically atomic and localized on subsets of the sphere.
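For orientation (a commonly used idealization from this line of work, not necessarily the exact normalization adopted in the paper), the particle system with identity weight matrices reads
\[
\dot x_i(t) = P_{x_i(t)}\!\left( \frac{1}{Z_i(t)} \sum_{j=1}^{n} e^{\beta \langle x_i(t),\, x_j(t)\rangle}\, x_j(t) \right), \qquad x_i(t) \in \mathbb{S}^{d-1},
\]
where $P_x = \mathrm{Id} - x x^\top$ projects onto the tangent space at $x$ (the idealization of layer normalization), $\beta>0$ is an inverse-temperature parameter, and $Z_i(t)=\sum_{j=1}^n e^{\beta\langle x_i(t),\, x_j(t)\rangle}$; variants without the softmax normalization also appear in this setting.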
Abstract: We study an approximate controllability problem for the continuity equation and its application to constructing transport maps with normalizing flows. Specifically, we construct time-dependent controls $\theta=(w, a, b)$ in the vector field $w(a^\top x + b)_+$ to approximately transport a known base density $\rho_{\mathrm{B}}$ to a target density $\rho_*$. The approximation error is measured in relative entropy, the controls $\theta$ are piecewise constant, and we provide bounds on the number of switches. Our main result relies on an assumption on the relative tail decay of $\rho_*$ and $\rho_{\mathrm{B}}$, and provides hints toward characterizing the reachable space of the continuity equation in relative entropy.
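Concretely (a sketch; the orientation of the relative entropy and the exact admissible class of controls follow the conventions of the paper), the controlled dynamics is
\[
\partial_t \rho_t(x) + \nabla\cdot\Big(\rho_t(x)\, w(t)\,\big(a(t)^\top x + b(t)\big)_+\Big) = 0, \qquad \rho_0 = \rho_{\mathrm{B}},
\]
and the goal is to choose piecewise constant $\theta=(w,a,b)$ so that the relative entropy between the terminal density $\rho_T$ and the target $\rho_*$, e.g. $D_{\mathrm{KL}}(\rho_* \,\|\, \rho_T) = \int \rho_* \log(\rho_*/\rho_T)\,\mathrm{d}x$, falls below any prescribed tolerance.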
Abstract: Transformers are deep neural network architectures that underpin the recent successes of large language models. Unlike more classical architectures that can be viewed as point-to-point maps, a Transformer acts as a measure-to-measure map implemented as a specific interacting particle system on the unit sphere: the input is the empirical measure of the tokens in a prompt, and its evolution is governed by the continuity equation. In fact, Transformers are not limited to empirical measures and can in principle process any input measure. As the nature of data processed by Transformers is expanding rapidly, it is important to investigate their expressive power as maps from an arbitrary measure to another arbitrary measure. To that end, we provide an explicit choice of parameters that allows a single Transformer to match $N$ arbitrary input measures to $N$ arbitrary target measures, under the minimal assumption that every input-target pair of measures can be matched by some transport map.
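Schematically (the notation below is illustrative; the precise attention kernel and normalization are those specified in the paper), the measure-to-measure map is the time-$T$ flow of a continuity equation
\[
\partial_t \mu_t + \nabla\cdot\big(\mathcal{X}[\mu_t]\,\mu_t\big) = 0, \qquad \mu_0 = \text{input measure},
\]
driven by a self-consistent, attention-induced velocity field of the form
\[
\mathcal{X}[\mu](x) = P_x\!\left( \frac{\int e^{\beta\langle Q x,\, K y\rangle}\, V y \, \mathrm{d}\mu(y)}{\int e^{\beta\langle Q x,\, K y\rangle}\, \mathrm{d}\mu(y)} \right),
\]
where $P_x$ denotes the projection onto the tangent space of the sphere at $x$ and $(Q,K,V)$ are (possibly time-dependent) attention matrices; the expressivity statement asks for a single choice of such parameters matching all $N$ prescribed input-target pairs.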