Abstract:LeCam's two-point testing method yields perhaps the simplest lower bound for estimating the mean of a distribution: roughly, if it is impossible to well-distinguish a distribution centered at $\mu$ from the same distribution centered at $\mu+\Delta$, then it is impossible to estimate the mean by better than $\Delta/2$. It is setting-dependent whether or not a nearly matching upper bound is attainable. We study the conditions under which the two-point testing lower bound can be attained for univariate mean estimation; both in the setting of location estimation (where the distribution is known up to translation) and adaptive location estimation (unknown distribution). Roughly, we will say an estimate nearly attains the two-point testing lower bound if it incurs error that is at most polylogarithmically larger than the Hellinger modulus of continuity for $\tilde{\Omega}(n)$ samples. Adaptive location estimation is particularly interesting as some distributions admit much better guarantees than sub-Gaussian rates (e.g. $\operatorname{Unif}(\mu-1,\mu+1)$ permits error $\Theta(\frac{1}{n})$, while the sub-Gaussian rate is $\Theta(\frac{1}{\sqrt{n}})$), yet it is not obvious whether these rates may be adaptively attained by one unified approach. Our main result designs an algorithm that nearly attains the two-point testing rate for mixtures of symmetric, log-concave distributions with a common mean. Moreover, this algorithm runs in near-linear time and is parameter-free. In contrast, we show the two-point testing rate is not nearly attainable even for symmetric, unimodal distributions. We complement this with results for location estimation, showing the two-point testing rate is nearly attainable for unimodal distributions, but unattainable for symmetric distributions.
Abstract:Given data drawn from a collection of Gaussian variables with a common mean but different and unknown variances, what is the best algorithm for estimating their common mean? We present an intuitive and efficient algorithm for this task. As different closed-form guarantees can be hard to compare, the Subset-of-Signals model serves as a benchmark for heteroskedastic mean estimation: given $n$ Gaussian variables with an unknown subset of $m$ variables having variance bounded by 1, what is the optimal estimation error as a function of $n$ and $m$? Our algorithm resolves this open question up to logarithmic factors, improving upon the previous best known estimation error by polynomial factors when $m = n^c$ for all $0<c<1$. Of particular note, we obtain error $o(1)$ with $m = \tilde{O}(n^{1/4})$ variance-bounded samples, whereas previous work required $m = \tilde{\Omega}(n^{1/2})$. Finally, we show that in the multi-dimensional setting, even for $d=2$, our techniques enable rates comparable to knowing the variance of each sample.
Abstract:Entropic causal inference is a framework for inferring the causal direction between two categorical variables from observational data. The central assumption is that the amount of unobserved randomness in the system is not too large. This unobserved randomness is measured by the entropy of the exogenous variable in the underlying structural causal model, which governs the causal relation between the observed variables. Kocaoglu et al. conjectured that the causal direction is identifiable when the entropy of the exogenous variable is not too large. In this paper, we prove a variant of their conjecture. Namely, we show that for almost all causal models where the exogenous variable has entropy that does not scale with the number of states of the observed variables, the causal direction is identifiable from observational data. We also consider the minimum entropy coupling-based algorithmic approach presented by Kocaoglu et al., and for the first time demonstrate algorithmic identifiability guarantees using a finite number of samples. We conduct extensive experiments to evaluate the robustness of the method to relaxing some of the assumptions in our theory and demonstrate that both the constant-entropy exogenous variable and the no latent confounder assumptions can be relaxed in practice. We also empirically characterize the number of observational samples needed for causal identification. Finally, we apply the algorithm on Tuebingen cause-effect pairs dataset.