Abstract:Sliced Wasserstein distances preserve properties of classic Wasserstein distances while being more scalable for computation and estimation in high dimensions. The goal of this work is to quantify this scalability from three key aspects: (i) empirical convergence rates; (ii) robustness to data contamination; and (iii) efficient computational methods. For empirical convergence, we derive fast rates with explicit dependence of constants on dimension, subject to log-concavity of the population distributions. For robustness, we characterize minimax optimal, dimension-free robust estimation risks, and show an equivalence between robust sliced 1-Wasserstein estimation and robust mean estimation. This enables lifting statistical and algorithmic guarantees available for the latter to the sliced 1-Wasserstein setting. Moving on to computational aspects, we analyze the Monte Carlo estimator for the average-sliced distance, demonstrating that larger dimension can result in faster convergence of the numerical integration error. For the max-sliced distance, we focus on a subgradient-based local optimization algorithm that is frequently used in practice, albeit without formal guarantees, and establish an $O(\epsilon^{-4})$ computational complexity bound for it. Our theory is validated by numerical experiments, which altogether provide a comprehensive quantitative account of the scalability question.
Abstract:The smooth 1-Wasserstein distance (SWD) $W_1^\sigma$ was recently proposed as a means to mitigate the curse of dimensionality in empirical approximation while preserving the Wasserstein structure. Indeed, SWD exhibits parametric convergence rates and inherits the metric and topological structure of the classic Wasserstein distance. Motivated by the above, this work conducts a thorough statistical study of the SWD, including a high-dimensional limit distribution result for empirical $W_1^\sigma$, bootstrap consistency, concentration inequalities, and Berry-Esseen type bounds. The derived nondegenerate limit stands in sharp contrast with the classic empirical $W_1$, for which a similar result is known only in the one-dimensional case. We also explore asymptotics and characterize the limit distribution when the smoothing parameter $\sigma$ is scaled with $n$, converging to $0$ at a sufficiently slow rate. The dimensionality of the sampled distribution enters empirical SWD convergence bounds only through the prefactor (i.e., the constant). We provide a sharp characterization of this prefactor's dependence on the smoothing parameter and the intrinsic dimension. This result is then used to derive new empirical convergence rates for classic $W_1$ in terms of the intrinsic dimension. As applications of the limit distribution theory, we study two-sample testing and minimum distance estimation (MDE) under $W_1^\sigma$. We establish asymptotic validity of SWD testing, while for MDE, we prove measurability, almost sure convergence, and limit distributions for optimal estimators and their corresponding $W_1^\sigma$ error. Our results suggest that the SWD is well suited for high-dimensional statistical learning and inference.
Abstract:Statistical distances, i.e., discrepancy measures between probability distributions, are ubiquitous in probability theory, statistics and machine learning. To combat the curse of dimensionality when estimating these distances from data, recent work has proposed smoothing out local irregularities in the measured distributions via convolution with a Gaussian kernel. Motivated by the scalability of the smooth framework to high dimensions, we conduct an in-depth study of the structural and statistical behavior of the Gaussian-smoothed $p$-Wasserstein distance $\mathsf{W}_p^{(\sigma)}$, for arbitrary $p\geq 1$. We start by showing that $\mathsf{W}_p^{(\sigma)}$ admits a metric structure that is topologically equivalent to classic $\mathsf{W}_p$ and is stable with respect to perturbations in $\sigma$. Moving to statistical questions, we explore the asymptotic properties of $\mathsf{W}_p^{(\sigma)}(\hat{\mu}_n,\mu)$, where $\hat{\mu}_n$ is the empirical distribution of $n$ i.i.d. samples from $\mu$. To that end, we prove that $\mathsf{W}_p^{(\sigma)}$ is controlled by a $p$th order smooth dual Sobolev norm $\mathsf{d}_p^{(\sigma)}$. Since $\mathsf{d}_p^{(\sigma)}(\hat{\mu}_n,\mu)$ coincides with the supremum of an empirical process indexed by Gaussian-smoothed Sobolev functions, it lends itself well to analysis via empirical process theory. We derive the limit distribution of $\sqrt{n}\mathsf{d}_p^{(\sigma)}(\hat{\mu}_n,\mu)$ in all dimensions $d$, when $\mu$ is sub-Gaussian. Through the aforementioned bound, this implies a parametric empirical convergence rate of $n^{-1/2}$ for $\mathsf{W}_p^{(\sigma)}$, contrasting the $n^{-1/d}$ rate for unsmoothed $\mathsf{W}_p$ when $d \geq 3$. As applications, we provide asymptotic guarantees for two-sample testing and minimum distance estimation. When $p=2$, we further show that $\mathsf{d}_2^{(\sigma)}$ can be expressed as a maximum mean discrepancy.