Abstract: Finding anonymization mechanisms to protect personal data is at the heart of machine learning research. Here we consider the consequences of local differential privacy constraints on goodness-of-fit testing, i.e. the statistical problem of assessing whether sample points are generated from a fixed density $f_0$ or not. The observations are hidden and replaced by a stochastic transformation satisfying the local differential privacy constraint. In this setting, we propose a new testing procedure based on an estimate of the quadratic distance between the density $f$ of the unobserved sample and $f_0$. We establish the minimax separation rates of our test over Besov balls. We also provide a matching lower bound, proving the optimality of our result. To the best of our knowledge, this is the first minimax-optimal test, together with its associated private transformation, under a local differential privacy constraint, and it quantifies the price to pay for data privacy.
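The abstract above does not spell out the private transformation or the test statistic, so here is a minimal illustrative sketch of one standard way to realize this setup: the data are binned into a histogram, each bin-indicator vector is released through a non-interactive Laplace channel satisfying $\alpha$-local differential privacy, and a U-statistic built from the privatized vectors estimates the squared $L_2$ distance between the bin probabilities of $f$ and those of $f_0$. The function names, the histogram discretization, and the omission of the rejection threshold are assumptions made for illustration; this is not the paper's exact procedure.

```python
import numpy as np

rng = np.random.default_rng(0)

def privatize(x, bins, alpha):
    """Non-interactive alpha-LDP channel (illustrative): release the bin-indicator
    vector of x perturbed by Laplace noise. The indicator has L1 sensitivity 2,
    so Laplace(2/alpha) noise per coordinate yields alpha-local differential privacy."""
    z = np.zeros(len(bins) - 1)
    j = np.searchsorted(bins, x, side="right") - 1
    j = np.clip(j, 0, len(bins) - 2)
    z[j] = 1.0
    return z + rng.laplace(scale=2.0 / alpha, size=z.shape)

def quadratic_test_stat(Z, p0):
    """U-statistic estimating ||p - p0||_2^2 from the privatized vectors Z_i,
    where p collects the bin probabilities of the unobserved sample.
    Since the noise is centered, E[Z_i] = p and the pairwise products are unbiased."""
    D = Z - p0                      # E[D_i] = p - p0
    S = D.sum(axis=0)
    n = D.shape[0]
    # sum over ordered pairs i != j of <D_i, D_j>, normalized by n(n-1)
    return (S @ S - np.einsum("ij,ij->", D, D)) / (n * (n - 1))

# Toy usage: test H0 "f = Uniform(0,1)" from alpha-LDP observations only.
n, d, alpha = 2000, 16, 1.0
bins = np.linspace(0.0, 1.0, d + 1)
p0 = np.full(d, 1.0 / d)                       # bin probabilities under f_0
x = rng.beta(2.0, 2.0, size=n)                 # true f is not uniform
Z = np.stack([privatize(xi, bins, alpha) for xi in x])
print("estimated squared distance:", quadratic_test_stat(Z, p0))
```

In practice the test would reject $H_0$ when this estimate exceeds a threshold calibrated to the noise level (hence to $\alpha$) and to the resolution $d$; that calibration, and the choice of resolution driving the Besov-ball rates, is precisely what the paper's analysis provides and is not reproduced here.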
Abstract: We consider the closeness testing (or two-sample testing) problem in the Poisson vector model, which is known to be asymptotically equivalent to the model of multinomial distributions. The goal is to distinguish whether two data samples are drawn from the same unspecified distribution, or whether their respective distributions are separated in $L_1$-norm. In this paper, we focus on adapting the rate to the shape of the underlying distributions, i.e. we consider a local minimax setting. We provide, to the best of our knowledge, the first local minimax rate for the separation distance, up to logarithmic factors, together with a test that achieves it. In view of this rate, closeness testing turns out to be substantially harder than the related one-sample testing problem over a wide range of cases.
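To make the Poisson vector model concrete, the sketch below simulates coordinate counts $X_j \sim \mathrm{Poisson}(m\, p_j)$ and $Y_j \sim \mathrm{Poisson}(m\, q_j)$ and evaluates a classical chi-square-type closeness statistic in the style of Chan, Diakonikolas, Valiant and Valiant (2014). This is an assumed, standard baseline chosen for illustration of the model, not the locally minimax adaptive test proposed in the paper; the function names and the toy distributions are likewise illustrative.

```python
import numpy as np

rng = np.random.default_rng(1)

def poisson_samples(p, m):
    """Poisson vector model: independent coordinate counts X_j ~ Poisson(m * p_j)."""
    return rng.poisson(m * np.asarray(p))

def closeness_stat(X, Y):
    """Chi-square-type closeness statistic (classical baseline):
    E[(X_j - Y_j)^2 - X_j - Y_j] = m^2 (p_j - q_j)^2 under the Poisson model,
    and dividing by X_j + Y_j damps the contribution of heavy coordinates."""
    X = np.asarray(X, float)
    Y = np.asarray(Y, float)
    num = (X - Y) ** 2 - X - Y
    den = X + Y
    return np.sum(np.divide(num, den, out=np.zeros_like(num), where=den > 0))

# Toy usage: p and q are separated in L1-norm, observed only through Poisson counts.
d, m = 500, 2000
p = np.full(d, 1.0 / d)
q = p.copy()
q[: d // 2] *= 1.3
q[d // 2:] *= 0.7
X, Y = poisson_samples(p, m), poisson_samples(q, m)
print("closeness statistic (large values favour p != q):", closeness_stat(X, Y))
```

A fixed statistic of this kind attains only a global worst-case rate; the local minimax viewpoint of the abstract instead lets the threshold and the attainable separation distance depend on the shape of the underlying distributions, which is where the gap to one-sample testing appears.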