Abstract:We study the problem of quantifying how far an empirical distribution deviates from Gaussianity under the framework of optimal transport. By exploiting the cone geometry of the relative translation invariant quadratic Wasserstein space, we introduce two novel geometric quantities, the relative Wasserstein angle and the orthogonal projection distance, which provide meaningful measures of non-Gaussianity. We prove that the filling cone generated by any two rays in this space is flat, ensuring that angles, projections, and inner products are rigorously well-defined. This geometric viewpoint recasts Gaussian approximation as a projection problem onto the Gaussian cone and reveals that the commonly used moment-matching Gaussian can \emph{not} be the \(W_2\)-nearest Gaussian for a given empirical distribution. In one dimension, we derive closed-form expressions for the proposed quantities and extend them to several classical distribution families, including uniform, Laplace, and logistic distributions; while in high dimensions, we develop an efficient stochastic manifold optimization algorithm based on a semi-discrete dual formulation. Experiments on synthetic data and real-world feature distributions demonstrate that the relative Wasserstein angle is more robust than the Wasserstein distance and that the proposed nearest Gaussian provides a better approximation than moment matching in the evaluation of Fréchet Inception Distance (FID) scores.




Abstract:We introduce a new family of distances, relative-translation invariant Wasserstein distances ($RW_p$), for measuring the similarity of two probability distributions under distribution shift. Generalizing it from the classical optimal transport model, we show that $RW_p$ distances are also real distance metrics defined on the quotient set $\mathcal{P}_p(\mathbb{R}^n)/\sim$ and invariant to distribution translations. When $p=2$, the $RW_2$ distance enjoys more exciting properties, including decomposability of the optimal transport model, translation-invariance of the $RW_2$ distance, and a Pythagorean relationship between $RW_2$ and the classical quadratic Wasserstein distance ($W_2$). Based on these properties, we show that a distribution shift, measured by $W_2$ distance, can be explained in the bias-variance perspective. In addition, we propose a variant of the Sinkhorn algorithm, named $RW_2$ Sinkhorn algorithm, for efficiently calculating $RW_2$ distance, coupling solutions, as well as $W_2$ distance. We also provide the analysis of numerical stability and time complexity for the proposed algorithm. Finally, we validate the $RW_2$ distance metric and the algorithm performance with three experiments. We conduct one numerical validation for the $RW_2$ Sinkhorn algorithm and show two real-world applications demonstrating the effectiveness of using $RW_2$ under distribution shift: digits recognition and similar thunderstorm detection. The experimental results report that our proposed algorithm significantly improves the computational efficiency of Sinkhorn in certain practical applications, and the $RW_2$ distance is robust to distribution translations compared with baselines.