Abstract: We study the problem of learning mixtures of Gaussians with approximate differential privacy. We prove that roughly $kd^2 + k^{1.5} d^{1.75} + k^2 d$ samples suffice to learn a mixture of $k$ arbitrary $d$-dimensional Gaussians up to low total variation distance, with differential privacy. Our work improves over the previous best result [AAL24b] (which required roughly $k^2 d^4$ samples) and is provably optimal when $d$ is much larger than $k^2$. Moreover, we give the first optimal bound for privately learning mixtures of $k$ univariate (i.e., $1$-dimensional) Gaussians. Importantly, we show that the sample complexity for privately learning mixtures of univariate Gaussians is linear in the number of components $k$, whereas the previous best sample complexity [AAL21] was quadratic in $k$. Our algorithms utilize various techniques, including the inverse sensitivity mechanism [AD20b, AD20a, HKMN23], sample compression for distributions [ABDH+20], and methods for bounding volumes of sumsets.
Abstract: We study differentially private (DP) mean estimation in the case where each person holds multiple samples. Commonly referred to as the "user-level" setting, DP here requires the usual notion of distributional stability when all of a person's datapoints can be modified. Informally, if $n$ people each have $m$ samples from an unknown $d$-dimensional distribution with bounded $k$-th moments, we show that \[n = \tilde \Theta\left(\frac{d}{\alpha^2 m} + \frac{d}{\alpha m^{1/2} \varepsilon} + \frac{d}{\alpha^{k/(k-1)} m \varepsilon} + \frac{d}{\varepsilon}\right)\] people are necessary and sufficient to estimate the mean up to distance $\alpha$ in $\ell_2$-norm under $\varepsilon$-differential privacy (and its common relaxations). In the multivariate setting, we give computationally efficient algorithms under approximate DP (with slightly degraded sample complexity) and computationally inefficient algorithms under pure DP, and our nearly matching lower bounds hold for the most permissive case of approximate DP. Our computationally efficient estimators are based on the well-known noisy-clipped-mean approach, but the analysis for our setting requires new bounds on the tails of sums of independent, vector-valued, bounded-moments random variables, and a new argument for bounding the bias introduced by clipping.
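The noisy-clipped-mean approach named in this abstract can be illustrated with a minimal sketch. This is not the paper's estimator: the function names, the fixed clipping `center` and `radius` (assumed to be chosen independently of the data), and the classical Gaussian-mechanism noise calibration (valid for $\varepsilon < 1$) are all assumptions made for illustration. Each person's $m$ samples are averaged, the per-person average is projected onto an $\ell_2$ ball, and Gaussian noise scaled to the resulting sensitivity is added to the average over people.

```python
import math
import random


def clip_to_ball(v, center, radius):
    """Project v onto the l2 ball of the given radius around center."""
    diff = [a - c for a, c in zip(v, center)]
    norm = math.sqrt(sum(x * x for x in diff))
    if norm <= radius:
        return list(v)
    scale = radius / norm
    return [c + scale * x for c, x in zip(center, diff)]


def noisy_clipped_mean(user_samples, center, radius, eps, delta, rng=None):
    """User-level noisy clipped mean (illustrative sketch, hypothetical API).

    user_samples: list of n users, each a list of d-dimensional points.
    center, radius: clipping ball, assumed fixed independently of the data.
    eps, delta: approximate-DP parameters; calibration assumes eps < 1.
    """
    rng = rng or random.Random()
    n = len(user_samples)
    d = len(center)

    # Average each user's own samples first.
    user_means = [
        [sum(point[j] for point in samples) / len(samples) for j in range(d)]
        for samples in user_samples
    ]

    # Clip each per-user mean so one user can move the sum by at most 2 * radius.
    clipped = [clip_to_ball(u, center, radius) for u in user_means]
    avg = [sum(u[j] for u in clipped) / n for j in range(d)]

    # Replacing all of one user's data changes the average by at most 2R / n in l2.
    sensitivity = 2.0 * radius / n
    sigma = sensitivity * math.sqrt(2.0 * math.log(1.25 / delta)) / eps

    return [a + rng.gauss(0.0, sigma) for a in avg]
```

Clipping trades variance for bias: a smaller `radius` means less noise but distorts per-user means that fall outside the ball, which is the bias the abstract says requires a new argument to bound.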
Abstract: We study the relationship between adversarial robustness and differential privacy in high-dimensional algorithmic statistics. We give the first black-box reduction from privacy to robustness which can produce private estimators with optimal tradeoffs among sample complexity, accuracy, and privacy for a wide range of fundamental high-dimensional parameter estimation problems, including mean and covariance estimation. We show that this reduction can be implemented in polynomial time in some important special cases. In particular, using nearly optimal polynomial-time robust estimators for the mean and covariance of high-dimensional Gaussians which are based on the Sum-of-Squares method, we design the first polynomial-time private estimators for these problems with nearly optimal samples-accuracy-privacy tradeoffs. Our algorithms are also robust to a constant fraction of adversarially corrupted samples.
Abstract: We give the first polynomial-time algorithm to estimate the mean of a $d$-variate probability distribution with bounded covariance from $\tilde{O}(d)$ independent samples subject to pure differential privacy. Prior algorithms for this problem either incur exponential running time, require $\Omega(d^{1.5})$ samples, or satisfy only the weaker concentrated or approximate differential privacy conditions. In particular, all prior polynomial-time algorithms require $d^{1+\Omega(1)}$ samples to guarantee small privacy loss with "cryptographically" high probability, $1-2^{-d^{\Omega(1)}}$, while our algorithm retains $\tilde{O}(d)$ sample complexity even in this stringent setting. Our main technique is a new approach to using the powerful Sum of Squares method (SoS) to design differentially private algorithms. The SoS "proofs-to-algorithms" paradigm is a key theme in numerous recent works in high-dimensional algorithmic statistics: estimators which apparently require exponential running time but whose analysis can be captured by low-degree Sum of Squares proofs can be automatically turned into polynomial-time algorithms with the same provable guarantees. We demonstrate a similar proofs-to-private-algorithms phenomenon: instances of the workhorse exponential mechanism which apparently require exponential time but which can be analyzed with low-degree SoS proofs can be automatically turned into polynomial-time differentially private algorithms. We prove a meta-theorem capturing this phenomenon, which we expect to be of broad use in private algorithm design. Our techniques also draw new connections between differentially private and robust statistics in high dimensions. In particular, viewed through our proofs-to-private-algorithms lens, several well-studied SoS proofs from recent works in algorithmic robust statistics directly yield key components of our differentially private mean estimation algorithm.
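For readers unfamiliar with the exponential mechanism referred to in this abstract, the following is a minimal sketch of its textbook form over an explicit finite candidate set. It is not the paper's SoS-based algorithm; the function name, the `score` callback, and the `sensitivity` parameter (an assumed upper bound on how much any candidate's score can change between neighboring datasets) are hypothetical. Enumerating candidates as done here is exactly what becomes exponentially expensive when the candidate set is a fine cover of a high-dimensional space.

```python
import math
import random


def exponential_mechanism(candidates, score, sensitivity, eps, rng=None):
    """Sample a candidate with probability proportional to
    exp(eps * score / (2 * sensitivity)). Illustrative sketch only."""
    rng = rng or random.Random()
    scores = [score(c) for c in candidates]

    # Subtract the maximum score for numerical stability; the sampling
    # distribution is unchanged because the shift cancels on normalization.
    top = max(scores)
    weights = [math.exp(eps * (s - top) / (2.0 * sensitivity)) for s in scores]

    return rng.choices(candidates, weights=weights, k=1)[0]
```

With a score of sensitivity at most `sensitivity`, this sampling rule satisfies pure `eps`-differential privacy; the running time is linear in the number of candidates.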