Abstract:This paper approaches the unsupervised learning problem by minimizing the second-order Wasserstein loss (the $W_2$ loss). The minimization is characterized by a distribution-dependent ordinary differential equation (ODE), whose dynamics involves the Kantorovich potential between a current estimated distribution and the true data distribution. A main result shows that the time-marginal law of the ODE converges exponentially to the true data distribution. To prove that the ODE has a unique solution, we first construct explicitly a solution to the associated nonlinear Fokker-Planck equation and show that it coincides with the unique gradient flow for the $W_2$ loss. Based on this, a unique solution to the ODE is built from Trevisan's superposition principle and the exponential convergence results. An Euler scheme is proposed for the distribution-dependent ODE and it is shown to correctly recover the gradient flow for the $W_2$ loss in the limit. An algorithm is designed by following the scheme and applying persistent training, which is natural in our gradient-flow framework. In both low- and high-dimensional experiments, our algorithm converges much faster than and outperforms Wasserstein generative adversarial networks, by increasing the level of persistent training appropriately.
Abstract:We propose a new theoretical lens to view Wasserstein generative adversarial networks (WGANs). In our framework, we define a discretization inspired by a distribution-dependent ordinary differential equation (ODE). We show that such a discretization is convergent and propose a viable class of adversarial training methods to implement this discretization, which we call W1 Forward Euler (W1-FE). In particular, the ODE framework allows us to implement persistent training, a novel training technique that cannot be applied to typical WGAN algorithms without the ODE interpretation. Remarkably, when we do not implement persistent training, we prove that our algorithms simplify to existing WGAN algorithms; when we increase the level of persistent training appropriately, our algorithms outperform existing WGAN algorithms in both low- and high-dimensional examples.
Abstract:This paper approaches the unsupervised learning problem by gradient descent in the space of probability density functions. Our main result shows that along the gradient flow induced by a distribution-dependent ordinary differential equation (ODE), the unknown data distribution emerges as the long-time limit of this flow of densities. That is, one can uncover the data distribution by simulating the distribution-dependent ODE. Intriguingly, we find that the simulation of the ODE is equivalent to the training of generative adversarial networks (GANs). The GAN framework, by definition a non-cooperative game between a generator and a discriminator, can therefore be viewed alternatively as a cooperative game between a navigator and a calibrator (in collaboration to simulate the ODE). At the theoretic level, this new perspective simplifies the analysis of GANs and gives new insight into their performance. To construct a solution to the distribution-dependent ODE, we first show that the associated nonlinear Fokker-Planck equation has a unique weak solution, using the Crandall-Liggett theorem for differential equations in Banach spaces. From this solution to the Fokker-Planck equation, we construct a unique solution to the ODE, relying on Trevisan's superposition principle. The convergence of the induced gradient flow to the data distribution is obtained by analyzing the Fokker-Planck equation.