Abstract:This paper considers the problem of community detection on multiple potentially correlated graphs from an information-theoretical perspective. We first put forth a random graph model, called the multi-view stochastic block model (MVSBM), designed to generate correlated graphs on the same set of nodes (with cardinality $n$). The $n$ nodes are partitioned into two disjoint communities of equal size. The presence or absence of edges in the graphs for each pair of nodes depends on whether the two nodes belong to the same community or not. The objective for the learner is to recover the hidden communities with observed graphs. Our technical contributions are two-fold: (i) We establish an information-theoretic upper bound (Theorem~1) showing that exact recovery of community is achievable when the model parameters of MVSBM exceed a certain threshold. (ii) Conversely, we derive an information-theoretic lower bound (Theorem~2) showing that when the model parameters of MVSBM fall below the aforementioned threshold, then for any estimator, the expected number of misclassified nodes will always be greater than one. Our results for the MVSBM recover several prior results for community detection in the standard SBM as well as in multiple independent SBMs as special cases.
Abstract:This paper considers the problem of completing a rating matrix based on sub-sampled matrix entries as well as observed social graphs and hypergraphs. We show that there exists a \emph{sharp threshold} on the sample probability for the task of exactly completing the rating matrix -- the task is achievable when the sample probability is above the threshold, and is impossible otherwise -- demonstrating a phase transition phenomenon. The threshold can be expressed as a function of the ``quality'' of hypergraphs, enabling us to \emph{quantify} the amount of reduction in sample probability due to the exploitation of hypergraphs. This also highlights the usefulness of hypergraphs in the matrix completion problem. En route to discovering the sharp threshold, we develop a computationally efficient matrix completion algorithm that effectively exploits the observed graphs and hypergraphs. Theoretical analyses show that our algorithm succeeds with high probability as long as the sample probability exceeds the aforementioned threshold, and this theoretical result is further validated by synthetic experiments. Moreover, our experiments on a real social network dataset (with both graphs and hypergraphs) show that our algorithm outperforms other state-of-the-art matrix completion algorithms.