Abstract: We propose a novel class of statistical divergences called \textit{Relaxed Wasserstein} (RW) divergence. RW divergence generalizes Wasserstein divergence and is parametrized by a class of strictly convex and differentiable functions. We establish several probabilistic properties of RW divergence that are critical to the success of Wasserstein divergence. In particular, we show that RW divergence is dominated by the Total Variation (TV) and Wasserstein-$L^2$ divergences, and that RW divergence admits continuity, differentiability, and a duality representation. Finally, we provide a nonasymptotic moment estimate and a concentration inequality for RW divergence. Our experiments on the image generation task demonstrate that RW divergence is a suitable choice for GANs. Indeed, the performance of RWGANs with the Kullback-Leibler (KL) divergence is very competitive with other state-of-the-art GAN approaches. Furthermore, RWGANs exhibit better convergence properties than existing WGANs while achieving competitive inception scores. To the best of our knowledge, our conceptual framework is the first to provide not only the flexibility to design effective GAN schemes but also the possibility of studying different loss functions within a unified mathematical framework.
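To make the parametrization concrete, the following sketch spells out the standard Bregman construction that underlies a divergence "parametrized by strictly convex and differentiable functions"; the assumption here, hedged, is that RW divergence is an optimal-transport cost built from a Bregman divergence, which the paper's details may refine:

```latex
% Bregman divergence generated by a strictly convex, differentiable \phi:
D_\phi(x, y) \;=\; \phi(x) - \phi(y) - \langle \nabla\phi(y),\, x - y \rangle .
% A transport-type divergence with Bregman cost (sketch; couplings \Pi(\mu,\nu)):
\mathrm{RW}_\phi(\mu, \nu) \;=\; \inf_{\pi \in \Pi(\mu,\nu)}
  \mathbb{E}_{(X,Y)\sim\pi}\!\left[ D_\phi(X, Y) \right].
```

With $\phi(x) = \|x\|^2$, $D_\phi(x,y) = \|x - y\|^2$, so this construction recovers the Wasserstein-$L^2$ cost as a special case, consistent with the abstract's claim that RW divergence generalizes Wasserstein divergence.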
Abstract: The construction of ambiguity sets in robust optimization relies on the choice of divergence between probability distributions. In distribution learning, choosing appropriate probability distributions based on observed data is critical for approximating the true distribution. To improve the performance of machine learning models, there has recently been interest in designing objective functions based on the $L^p$-Wasserstein distance rather than the classical Kullback-Leibler (KL) divergence. In this paper, we derive concentration and asymptotic results using the Bregman divergence. We propose a novel asymmetric statistical divergence, called the Wasserstein-Bregman divergence, as a generalization of the $L^2$-Wasserstein distance. We discuss how these results can be applied to the construction of ambiguity sets in robust optimization.
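A short numeric sketch of the Bregman divergence itself may help fix ideas: choosing the generator $\phi(x) = \|x\|^2$ recovers the squared Euclidean ($L^2$) cost, while the negative-entropy generator recovers the KL divergence on probability vectors. The function names below are illustrative, not from the paper:

```python
import numpy as np

def bregman(phi, grad_phi, x, y):
    """Bregman divergence D_phi(x, y) = phi(x) - phi(y) - <grad phi(y), x - y>."""
    return phi(x) - phi(y) - np.dot(grad_phi(y), x - y)

# Generator phi(x) = ||x||^2: the Bregman divergence is the squared L2 distance.
sq = lambda x: np.dot(x, x)
sq_grad = lambda x: 2.0 * x

x = np.array([1.0, 2.0])
y = np.array([0.0, 1.0])
print(bregman(sq, sq_grad, x, y))  # equals ||x - y||^2 = 2.0

# Generator phi(x) = sum_i x_i log x_i (negative entropy): on probability
# vectors the Bregman divergence reduces to the KL divergence.
negent = lambda x: np.sum(x * np.log(x))
negent_grad = lambda x: np.log(x) + 1.0

p = np.array([0.3, 0.7])
q = np.array([0.5, 0.5])
print(bregman(negent, negent_grad, p, q))  # equals sum_i p_i log(p_i / q_i)
```

The asymmetry mentioned in the abstract is visible here: `bregman(negent, negent_grad, p, q)` and `bregman(negent, negent_grad, q, p)` generally differ, unlike a Wasserstein distance.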