Abstract:We study the problem of semi-supervised learning on graphs in the regime where data labels are scarce or possibly corrupted. We propose an approach called $p$-conductance learning that generalizes the $p$-Laplace and Poisson learning methods by introducing an objective reminiscent of $p$-Laplacian regularization and an affine relaxation of the label constraints. This leads to a family of probability measure mincut programs that balance sparse edge removal with accurate distribution separation. Our theoretical analysis connects these programs to well-known variational and probabilistic problems on graphs (including randomized cuts, effective resistance, and Wasserstein distance) and provides motivation for robustness when labels are diffused via the heat kernel. Computationally, we develop a semismooth Newton-conjugate gradient algorithm and extend it to incorporate class-size estimates when converting the continuous solutions into label assignments. Empirical results on computer vision and citation datasets demonstrate that our approach achieves state-of-the-art accuracy in low label-rate, corrupted-label, and partial-label regimes.
Abstract:We consider graphs where edges and their signs are added independently at random from among all pairs of nodes. We establish strong concentration inequalities for adjacency and Laplacian matrices obtained from this family of random graph models. Then, we apply our results to study graphs sampled from the signed stochastic block model. Namely, we take a two-community setting where edges within the communities have positive signs and edges between the communities have negative signs and apply a random sign perturbation with probability $0< s <1/2$. In this setting, our findings include: first, the spectral gap of the corresponding signed Laplacian matrix concentrates near $2s$ with high probability; and second, the sign of the first eigenvector of the Laplacian matrix defines a weakly consistent estimator for the balanced community detection problem, or equivalently, the $\pm 1$ synchronization problem. We supplement our theoretical contributions with experimental data obtained from the models under consideration.