Abstract: We study the problem of estimating the score function of an unknown probability distribution $\rho^*$ from $n$ independent and identically distributed observations in $d$ dimensions. Assuming that $\rho^*$ is subgaussian and has a Lipschitz-continuous score function $s^*$, we establish the optimal rate of $\tilde \Theta(n^{-\frac{2}{d+4}})$ for this estimation problem under the loss function $\|\hat s - s^*\|^2_{L^2(\rho^*)}$ that is commonly used in the score matching literature, highlighting the curse of dimensionality: the sample complexity for accurate score estimation grows exponentially with the dimension $d$. Leveraging key insights from empirical Bayes theory as well as a new convergence rate for the smoothed empirical distribution in Hellinger distance, we show that a regularized score estimator based on a Gaussian kernel attains this rate, which is shown to be optimal by a matching minimax lower bound. We also discuss the implications of our theory for the sample complexity of score-based generative models.
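To make the Gaussian-kernel construction concrete, the sketch below evaluates the score of the Gaussian-smoothed empirical distribution at a query point in closed form. The additive constant `eps` stands in for regularization and is an illustrative assumption, as are the function name and default values; this is not the paper's exact estimator.

```python
import numpy as np

def gaussian_kernel_score(data, x, bandwidth, eps=1e-6):
    """Score of the Gaussian-smoothed empirical distribution at x (a sketch).

    data: (n, d) array of i.i.d. samples; x: (d,) query point;
    bandwidth: smoothing level sigma; eps: illustrative regularizer
    (an assumption, not the paper's exact regularization).
    """
    diffs = data - x                                   # (n, d), rows x_i - x
    sq_dists = np.sum(diffs ** 2, axis=1)              # (n,)
    # Gaussian kernel weights; subtracting the min is a numerical-stability
    # trick and cancels in the ratio below.
    weights = np.exp(-(sq_dists - sq_dists.min()) / (2 * bandwidth ** 2))
    # grad log rho_sigma(x) = sum_i w_i (x_i - x) / (sigma^2 * sum_i w_i)
    numer = weights @ diffs                            # (d,)
    denom = bandwidth ** 2 * (weights.sum() + eps)
    return numer / denom
```

In practice the bandwidth would be tuned as a function of $n$ and $d$; the specific choice and regularization achieving the stated rate are the paper's and are not reproduced here.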
Abstract: We study the Inexact Langevin Algorithm (ILA) for sampling with an estimated score function when the target distribution satisfies a log-Sobolev inequality (LSI), motivated by Score-based Generative Modeling (SGM). We prove long-term convergence in Kullback-Leibler (KL) divergence under the sufficient assumption that the error of the score estimator has a bounded Moment Generating Function (MGF). Our assumption is weaker than an $L^\infty$ error bound (which is too strong to hold in practice) and stronger than an $L^2$ error bound, which we show is not sufficient to guarantee convergence in general. Under the $L^\infty$ error assumption, we additionally prove convergence in R\'enyi divergence, which is stronger than KL divergence. We then study how to obtain a provably accurate score estimator satisfying the bounded MGF assumption for LSI target distributions, using an estimator based on kernel density estimation. Together with the convergence results, this yields the first end-to-end convergence guarantee for ILA at the population level. Finally, we generalize our convergence analysis to SGM and derive a complexity guarantee in KL divergence for data satisfying LSI under an MGF-accurate score estimator.
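The ILA iteration itself is the unadjusted Langevin algorithm with the true score replaced by the estimate, $x_{k+1} = x_k + h\,\hat s(x_k) + \sqrt{2h}\,z_k$ with $z_k \sim \mathcal N(0, I)$. The following is a minimal sketch under assumed names and a fixed step size; the abstract's analysis concerns how the score error (in the MGF, $L^\infty$, or $L^2$ sense) propagates through these iterates, which the sketch does not capture.

```python
import numpy as np

def inexact_langevin(score_hat, x0, step_size, n_steps, rng=None):
    """Inexact Langevin Algorithm: Langevin iterates driven by an
    estimated score `score_hat` instead of the true score.

    Iteration: x_{k+1} = x_k + h * s_hat(x_k) + sqrt(2h) * z_k,
    with z_k ~ N(0, I).  Names and fixed step size are illustrative.
    """
    rng = np.random.default_rng() if rng is None else rng
    x = np.array(x0, dtype=float)
    for _ in range(n_steps):
        noise = rng.standard_normal(x.shape)
        x = x + step_size * score_hat(x) + np.sqrt(2 * step_size) * noise
    return x
```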