Abstract:Variational inference (VI) is a computationally efficient and scalable methodology for approximate Bayesian inference. It strikes a balance between accuracy of uncertainty quantification and practical tractability. It excels at generative modelling and inversion tasks due to its built-in Bayesian regularisation and flexibility, essential qualities for physics related problems. Deriving the central learning objective for VI must often be tailored to new learning tasks where the nature of the problems dictates the conditional dependence between variables of interest, such as arising in physics problems. In this paper, we provide an accessible and thorough technical introduction to VI for forward and inverse problems, guiding the reader through standard derivations of the VI framework and how it can best be realized through deep learning. We then review and unify recent literature exemplifying the creative flexibility allowed by VI. This paper is designed for a general scientific audience looking to solve physics-based problems with an emphasis on uncertainty quantification.
Abstract:This paper introduces and analyses interacting underdamped Langevin algorithms, termed Kinetic Interacting Particle Langevin Monte Carlo (KIPLMC) methods, for statistical inference in latent variable models. We propose a diffusion process that evolves jointly in the space of parameters and latent variables and exploit the fact that the stationary distribution of this diffusion concentrates around the maximum marginal likelihood estimate of the parameters. We then provide two explicit discretisations of this diffusion as practical algorithms to estimate parameters of statistical models. For each algorithm, we obtain nonasymptotic rates of convergence for the case where the joint log-likelihood is strongly concave with respect to latent variables and parameters. In particular, we provide convergence analysis for the diffusion together with the discretisation error, providing convergence rate estimates for the algorithms in Wasserstein-2 distance. To demonstrate the utility of the introduced methodology, we provide numerical experiments that demonstrate the effectiveness of the proposed diffusion for statistical inference and the stability of the numerical integrators utilised for discretisation. Our setting covers a broad number of applications, including unsupervised learning, statistical inference, and inverse problems.
Abstract:We introduce a class of algorithms, termed Proximal Interacting Particle Langevin Algorithms (PIPLA), for inference and learning in latent variable models whose joint probability density is non-differentiable. Leveraging proximal Markov chain Monte Carlo (MCMC) techniques and the recently introduced interacting particle Langevin algorithm (IPLA), we propose several variants within the novel proximal IPLA family, tailored to the problem of estimating parameters in a non-differentiable statistical model. We prove nonasymptotic bounds for the parameter estimates produced by multiple algorithms in the strongly log-concave setting and provide comprehensive numerical experiments on various models to demonstrate the effectiveness of the proposed methods. In particular, we demonstrate the utility of the proposed family of algorithms on a toy hierarchical example where our assumptions can be checked, as well as on the problems of sparse Bayesian logistic regression, sparse Bayesian neural network, and sparse matrix completion. Our theory and experiments together show that PIPLA family can be the de facto choice for parameter estimation problems in latent variable models for non-differentiable models.
Abstract:Bayesian inversion is central to the quantification of uncertainty within problems arising from numerous applications in science and engineering. To formulate the approach, four ingredients are required: a forward model mapping the unknown parameter to an element of a solution space, often the solution space for a differential equation; an observation operator mapping an element of the solution space to the data space; a noise model describing how noise pollutes the observations; and a prior model describing knowledge about the unknown parameter before the data is acquired. This paper is concerned with learning the prior model from data; in particular, learning the prior from multiple realizations of indirect data obtained through the noisy observation process. The prior is represented, using a generative model, as the pushforward of a Gaussian in a latent space; the pushforward map is learned by minimizing an appropriate loss function. A metric that is well-defined under empirical approximation is used to define the loss function for the pushforward map to make an implementable methodology. Furthermore, an efficient residual-based neural operator approximation of the forward model is proposed and it is shown that this may be learned concurrently with the pushforward map, using a bilevel optimization formulation of the problem; this use of neural operator approximation has the potential to make prior learning from indirect data more computationally efficient, especially when the observation process is expensive, non-smooth or not known. The ideas are illustrated with the Darcy flow inverse problem of finding permeability from piezometric head measurements.