Abstract: We introduce the problems of goodness-of-fit and two-sample testing of the latent community structure in a 2-community, symmetric stochastic block model (SBM), in the regime where recovery of the structure is difficult. The latter problem may be described as follows: let $x,y$ be two latent community partitions. Given graphs $G,H$ drawn according to SBMs with partitions $x,y$, respectively, we wish to test the hypothesis $x = y$ against $d(x,y) \ge s$, for a given Hamming distortion parameter $s \ll n$. Prior work showed that 'partial' recovery of these partitions up to distortion $s$ with vanishing error probability requires the signal-to-noise ratio (SNR) to satisfy $\mathrm{SNR} \gtrsim C \log (n/s)$. By constructing simple schemes, we prove that if $s \gg \sqrt{n \log n}$, then these testing problems can be solved even if $\mathrm{SNR} = O(1)$. For $s = o(\sqrt{n})$ and constant-order degrees, we show via an information-theoretic lower bound that both testing problems require $\mathrm{SNR} = \Omega(\log n)$; thus, at this scale, the naïve scheme of learning the communities and comparing them is minimax optimal up to constant factors. These results are augmented by simulations of goodness-of-fit and two-sample testing for standard SBMs as well as for Gaussian Markov random fields with underlying SBM structure.
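The naïve two-sample scheme mentioned above (learn both partitions, then compare them) can be illustrated with a small sketch. This is a hypothetical example, not the schemes constructed in the work: the edge probabilities $a/n$ and $b/n$, the centred spectral estimator, the helper names, and the rejection threshold $s/2$ are all assumptions chosen for simplicity. The estimator assumes an assortative model ($a > b$) with balanced communities, and $d(x,y)$ is taken to be the Hamming distance minimized over a global label swap, since a partition is only defined up to relabeling.

```python
import numpy as np


def sample_sbm(x, a, b, rng):
    """Sample a symmetric 2-community SBM adjacency matrix.

    x: array of +1/-1 community labels.
    Assumed parametrization: edge probability a/n within a community,
    b/n across communities, independent edges, no self-loops.
    """
    n = len(x)
    same = np.equal.outer(x, x)
    p = np.where(same, a / n, b / n)
    upper = np.triu(rng.random((n, n)) < p, k=1)  # sample each pair once
    return (upper | upper.T).astype(float)         # symmetrize


def spectral_estimate(A):
    """Estimate +/-1 labels from the leading eigenvector of the centred
    adjacency matrix (assumes assortative a > b, balanced communities)."""
    M = A - A.mean()              # suppress the all-ones (density) direction
    _, vecs = np.linalg.eigh(M)   # eigenvalues returned in ascending order
    v = vecs[:, -1]               # eigenvector of the largest eigenvalue
    return np.where(v >= 0, 1, -1)


def hamming_mod_flip(x, y):
    """Hamming distance between two +/-1 labelings, minimized over a
    global label swap."""
    d = int(np.sum(x != y))
    return min(d, len(x) - d)


def naive_two_sample_test(A, B, s):
    """Learn both partitions, then reject x = y if the estimates differ
    in at least s/2 positions (hypothetical threshold)."""
    return hamming_mod_flip(spectral_estimate(A), spectral_estimate(B)) >= s / 2


rng = np.random.default_rng(0)
n, s = 100, 40
x = np.repeat([1, -1], n // 2)
y = x.copy()
y[:20] = -1    # move 20 nodes out of community +1
y[50:70] = 1   # and 20 nodes into it, keeping y balanced
G = sample_sbm(x, 90, 10, rng)
H_null = sample_sbm(x, 90, 10, rng)   # same partition as G
H_alt = sample_sbm(y, 90, 10, rng)    # partition at distance 40 from x
print(naive_two_sample_test(G, H_null, s), naive_two_sample_test(G, H_alt, s))
```

The parameters in the usage lines are deliberately dense (edge probabilities 0.9 and 0.1) so that recovery is easy; this is far from the sparse, low-SNR regime the abstract is concerned with, where learning the partitions first is exactly what becomes costly.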
Abstract: The change detection problem is to determine whether the Markov network structures of two Markov random fields differ from one another, given two sets of samples drawn from the respective underlying distributions. We study the trade-off between the sample sizes and the reliability of change detection, measured as a minimax risk, for two important cases, Ising models and Gaussian Markov random fields, restricted to models whose network structures have $p$ nodes and degree at most $d$, and we obtain information-theoretic lower bounds for reliable change detection over these models. We show that for the Ising model, $\Omega\left(\frac{d^2}{(\log d)^2}\log p\right)$ samples are required from each dataset to detect even the sparsest possible changes, and that for the Gaussian model, $\Omega\left( \gamma^{-2} \log p\right)$ samples are required from each dataset to detect a change, where $\gamma$ is the smallest ratio of off-diagonal to diagonal terms in the precision matrices of the distributions. These bounds are compared to the corresponding results in structure learning and closely match them under mild conditions on the model parameters. Thus, our change detection bounds inherit partial tightness from the structure learning schemes in the previous literature, demonstrating that in certain parameter regimes the naive structure-learning-based approach to change detection is minimax optimal up to constant factors.
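To make the two sample-size scalings concrete, here is a small sketch. It reads "smallest ratio of off-diagonal to diagonal terms" as $\min |\Theta_{ij}| / \sqrt{\Theta_{ii}\Theta_{jj}}$ over the nonzero off-diagonal entries of both precision matrices; that normalization, the helper names, and the omission of all constants are assumptions, so the functions return orders of magnitude only, not the bounds themselves.

```python
import numpy as np


def min_offdiag_ratio(*precisions, tol=1e-12):
    """Smallest normalized off-diagonal magnitude over the edges of the given
    precision matrices: min |Theta_ij| / sqrt(Theta_ii * Theta_jj) over
    nonzero off-diagonal entries (assumed normalization)."""
    ratios = []
    for theta in precisions:
        theta = np.asarray(theta, dtype=float)
        diag = np.sqrt(np.diag(theta))
        scaled = np.abs(theta) / np.outer(diag, diag)
        off = ~np.eye(theta.shape[0], dtype=bool)
        edges = off & (np.abs(theta) > tol)   # edges of the Markov graph
        ratios.extend(scaled[edges].tolist())
    return min(ratios)


def gaussian_rate(gamma, p):
    """Order of the Gaussian lower bound, gamma^-2 log p (constants omitted)."""
    return np.log(p) / gamma**2


def ising_rate(d, p):
    """Order of the Ising lower bound, d^2 / (log d)^2 * log p
    (constants omitted; requires d > 1 so that log d > 0)."""
    return d**2 / np.log(d) ** 2 * np.log(p)


# Two hypothetical 3-node precision matrices; the second drops edge (2, 3).
T1 = np.array([[2.0, -0.5, 0.0], [-0.5, 2.0, 0.2], [0.0, 0.2, 1.0]])
T2 = np.array([[2.0, -0.5, 0.0], [-0.5, 2.0, 0.0], [0.0, 0.0, 1.0]])
gamma = min_offdiag_ratio(T1, T2)
print(gamma, gaussian_rate(gamma, p=3))
```

Here the weak edge $(2,3)$ that is present in one model and absent in the other sets $\gamma$, which matches the intuition behind the bound: the smaller the weakest edge, the more samples are needed to tell the two structures apart.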