Pradeep Kr. Banerjee

Oversquashing in GNNs through the lens of information contraction and graph expansion

Aug 06, 2022

Pradeep Kr. Banerjee, Kedar Karhadkar, Yu Guang Wang, Uri Alon, Guido Montúfar

Abstract: The quality of signal propagation in message-passing graph neural networks (GNNs) strongly influences their expressivity, as has been observed in recent works. In particular, for prediction tasks relying on long-range interactions, recursive aggregation of node features can lead to an undesired phenomenon called "oversquashing". We present a framework for analyzing oversquashing based on information contraction. Our analysis is guided by a model of reliable computation due to von Neumann that lends a new insight into oversquashing as signal quenching in noisy computation graphs. Building on this, we propose a graph rewiring algorithm aimed at alleviating oversquashing. Our algorithm employs a random local edge flip primitive motivated by an expander graph construction. We compare the spectral expansion properties of our algorithm with those of an existing curvature-based non-local rewiring strategy. Synthetic experiments show that while our algorithm in general has a slower rate of expansion, it is overall computationally cheaper, preserves the node degrees exactly and never disconnects the graph.

* 8 pages, 5 figures; Accepted at the 58th Annual Allerton Conference on Communication, Control, and Computing
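The abstract does not spell out the edge flip primitive, so what follows is only a sketch under an assumption: the classic degree-preserving "flip" on a path a-b-c-d, which swaps edges {a,b} and {c,d} for {a,c} and {b,d}. All names (`random_edge_flip`, `is_connected`, `spectral_gap`, `dumbbell`) are hypothetical, and this is not the authors' algorithm. Each endpoint loses one edge and gains one, so degrees are preserved exactly; and since the middle edge {b,c} survives, any old path through {a,b} or {c,d} can be rerouted through it, so a flip never disconnects the graph. Expansion is measured here by the second-smallest eigenvalue of the normalized Laplacian.

```python
import random
from collections import deque

import numpy as np


def random_edge_flip(adj, rng):
    """Try one flip on a simple undirected graph adj: dict[int, set[int]].

    Picks an oriented edge (b, c), a neighbor a of b and a neighbor d of c,
    then replaces edges {a,b}, {c,d} by {a,c}, {b,d}. Returns True on success.
    """
    edges = sorted((u, v) for u in adj for v in adj[u] if u < v)
    b, c = rng.choice(edges)
    if rng.random() < 0.5:
        b, c = c, b  # random orientation of the middle edge
    a_opts = sorted(x for x in adj[b] if x != c)
    d_opts = sorted(x for x in adj[c] if x != b)
    if not a_opts or not d_opts:
        return False
    a, d = rng.choice(a_opts), rng.choice(d_opts)
    # Reject if the four nodes are not distinct or a new edge already exists.
    if a == d or c in adj[a] or d in adj[b]:
        return False
    for u, v in ((a, b), (c, d)):
        adj[u].discard(v)
        adj[v].discard(u)
    for u, v in ((a, c), (b, d)):
        adj[u].add(v)
        adj[v].add(u)
    return True


def is_connected(adj):
    """Breadth-first search from an arbitrary node."""
    start = next(iter(adj))
    seen, queue = {start}, deque([start])
    while queue:
        u = queue.popleft()
        for v in adj[u]:
            if v not in seen:
                seen.add(v)
                queue.append(v)
    return len(seen) == len(adj)


def spectral_gap(adj):
    """Second-smallest eigenvalue of I - D^{-1/2} A D^{-1/2} (no isolated nodes)."""
    nodes = sorted(adj)
    idx = {v: i for i, v in enumerate(nodes)}
    A = np.zeros((len(nodes), len(nodes)))
    for u in nodes:
        for v in adj[u]:
            A[idx[u], idx[v]] = 1.0
    d_inv_sqrt = 1.0 / np.sqrt(A.sum(axis=1))
    L = np.eye(len(nodes)) - d_inv_sqrt[:, None] * A * d_inv_sqrt[None, :]
    return float(np.linalg.eigvalsh(L)[1])


def dumbbell(k):
    """Two k-cliques joined by a single bridge edge: a poorly expanding graph."""
    adj = {v: set() for v in range(2 * k)}
    for off in (0, k):
        for i in range(off, off + k):
            for j in range(i + 1, off + k):
                adj[i].add(j)
                adj[j].add(i)
    adj[k - 1].add(k)
    adj[k].add(k - 1)
    return adj


if __name__ == "__main__":
    rng = random.Random(0)
    g = dumbbell(5)
    gap_before = spectral_gap(g)
    accepted = sum(random_edge_flip(g, rng) for _ in range(200))
    print(accepted, gap_before, spectral_gap(g), is_connected(g))
```

Proposals that would create a self-loop or a duplicate edge are rejected rather than repaired, which keeps the graph simple at the cost of some wasted draws.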

Learning curves for Gaussian process regression with power-law priors and targets

Oct 23, 2021

Hui Jin, Pradeep Kr. Banerjee, Guido Montúfar

Abstract: We study the power-law asymptotics of learning curves for Gaussian process regression (GPR). When the eigenspectrum of the prior decays with rate $\alpha$ and the eigenexpansion coefficients of the target function decay with rate $\beta$, we show that the generalization error behaves as $\tilde O(n^{\max\{\frac{1}{\alpha}-1, \frac{1-2\beta}{\alpha}\}})$ with high probability over the draw of $n$ input samples. Under similar assumptions, we show that the generalization error of kernel ridge regression (KRR) has the same asymptotics. Infinitely wide neural networks can be related to KRR with respect to the neural tangent kernel (NTK), which in several cases is known to have a power-law spectrum. Hence our methods can be applied to study the generalization error of infinitely wide neural networks. We present toy experiments demonstrating the theory.

* 51 pages, 1 table, 1 figure
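The rate in the abstract can be evaluated directly. The helpers below (hypothetical names) compute the exponent $e$ in $\tilde O(n^{e})$ and, for comparison, the least-squares slope of measured errors on a log-log scale. Comparing the two terms, $\frac{1-2\beta}{\alpha} \ge \frac{1}{\alpha}-1$ exactly when $\alpha \ge 2\beta$, so the second term governs the rate in that regime and the first otherwise. Because $\tilde O$ hides constants and logarithmic factors, a fitted slope is only a rough empirical check.

```python
import numpy as np


def predicted_exponent(alpha, beta):
    """Exponent e in the stated rate O~(n^e) for the GPR/KRR generalization error.

    alpha: decay rate of the prior eigenspectrum.
    beta:  decay rate of the target's eigenexpansion coefficients.
    The second term is the larger one exactly when alpha >= 2 * beta.
    """
    return max(1.0 / alpha - 1.0, (1.0 - 2.0 * beta) / alpha)


def fitted_exponent(ns, errors):
    """Least-squares slope of log(error) against log(n)."""
    slope, _intercept = np.polyfit(np.log(ns), np.log(errors), 1)
    return float(slope)


if __name__ == "__main__":
    e = predicted_exponent(2.0, 0.75)
    ns = np.array([100, 200, 400, 800, 1600], dtype=float)
    errors = 3.0 * ns ** e  # exact power law, so the fitted slope matches e
    print(e, fitted_exponent(ns, errors))
```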

Information Complexity and Generalization Bounds

May 04, 2021

Pradeep Kr. Banerjee, Guido Montúfar

Abstract: We present a unifying picture of PAC-Bayesian and mutual information-based upper bounds on the generalization error of randomized learning algorithms. As we show, Tong Zhang's information exponential inequality (IEI) gives a general recipe for constructing bounds of both flavors. We show that several important results in the literature can be obtained as simple corollaries of the IEI under different assumptions on the loss function. Moreover, we obtain new bounds for data-dependent priors and unbounded loss functions. Optimizing the bounds gives rise to variants of the Gibbs algorithm, for which we discuss two practical examples for learning with neural networks, namely, Entropy- and PAC-Bayes-SGD. Further, we use an Occam's factor argument to show a PAC-Bayesian bound that incorporates second-order curvature information of the training loss.

* Accepted for presentation at 2021 IEEE International Symposium on Information Theory (ISIT); 22 pages
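As an illustration of how optimizing such bounds leads to a Gibbs algorithm, the sketch below uses a generic PAC-Bayes-style objective, $\mathbb{E}_\rho[\hat L] + \mathrm{KL}(\rho\,\|\,\pi)/\lambda$, over a finite hypothesis set; this objective is an assumption for illustration, not the specific bound derived from the IEI. Its minimizer is the Gibbs posterior $\rho(h) \propto \pi(h)\,e^{-\lambda \hat L(h)}$, and the minimum equals $-\frac{1}{\lambda}\log \mathbb{E}_\pi[e^{-\lambda \hat L}]$ (the Donsker-Varadhan variational formula). Function names are hypothetical.

```python
import numpy as np


def gibbs_posterior(prior, emp_risk, lam):
    """rho(h) proportional to prior(h) * exp(-lam * emp_risk(h)), finite hypothesis set."""
    logw = np.log(prior) - lam * emp_risk
    logw -= logw.max()  # shift before exponentiating to avoid overflow/underflow
    w = np.exp(logw)
    return w / w.sum()


def pac_bayes_objective(rho, prior, emp_risk, lam):
    """E_rho[emp_risk] + KL(rho || prior) / lam, with 0 * log 0 taken as 0."""
    mask = rho > 0
    kl = np.sum(rho[mask] * (np.log(rho[mask]) - np.log(prior[mask])))
    return float(rho @ emp_risk + kl / lam)


def gibbs_free_energy(prior, emp_risk, lam):
    """Closed-form minimum: -(1/lam) * log E_prior[exp(-lam * emp_risk)]."""
    return float(-np.logaddexp.reduce(np.log(prior) - lam * emp_risk) / lam)


if __name__ == "__main__":
    prior = np.full(5, 0.2)  # uniform prior over 5 hypotheses
    emp_risk = np.array([0.1, 0.3, 0.2, 0.5, 0.4])
    rho = gibbs_posterior(prior, emp_risk, lam=10.0)
    print(rho)
    print(pac_bayes_objective(rho, prior, emp_risk, 10.0),
          gibbs_free_energy(prior, emp_risk, 10.0))
```

Larger $\lambda$ concentrates $\rho$ on low-risk hypotheses, while $\lambda \to 0$ returns the prior.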

The Variational Deficiency Bottleneck

Oct 27, 2018

Pradeep Kr. Banerjee, Guido Montúfar

Abstract: We introduce a bottleneck method for learning data representations based on channel deficiency, rather than the more traditional information sufficiency. A variational upper bound allows us to implement this method efficiently. The bound itself is bounded above by the variational information bottleneck objective, and the two methods coincide in the regime of single-shot Monte Carlo approximations. The notion of deficiency provides a principled way of approximating complicated channels by relatively simpler ones. The deficiency of one channel w.r.t. another has an operational interpretation in terms of the optimal risk gap of decision problems, capturing classification as a special case. Unsupervised generalizations are possible, such as the deficiency autoencoder, which can also be formulated in a variational form. Experiments demonstrate that the deficiency bottleneck can provide advantages in terms of minimal sufficiency as measured by information bottleneck curves, while retaining a good test performance in classification and reconstruction tasks.

* 20 pages, 7 figures
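For finite alphabets, one KL-based form of deficiency (an assumption here; the paper itself optimizes a variational upper bound, which this sketch does not reproduce) is $\delta^\pi(\mu,\kappa) = \min_\xi \sum_x \pi(x)\,\mathrm{KL}(\mu(\cdot|x)\,\|\,(\xi\circ\kappa)(\cdot|x))$, where the garbling $\xi(y|z)$ post-processes the output of $\kappa$. The objective is convex in $\xi$ and has the form of a latent-variable log-likelihood, so the sketch minimizes it with EM-style multiplicative updates; `kl_deficiency` is a hypothetical name. The deficiency is zero when $\mu$ can be simulated exactly from $\kappa$, and when $\kappa$ carries no information about $x$ it reduces to the mutual information $I(X;Y)$ under $\pi(x)\mu(y|x)$.

```python
import numpy as np


def kl_deficiency(mu, kappa, pi_x, n_iter=500, seed=0):
    """Approximate min over xi of sum_x pi(x) KL(mu(.|x) || (xi o kappa)(.|x)).

    mu:    (nx, ny) target channel mu(y|x); rows sum to 1
    kappa: (nx, nz) channel kappa(z|x) being compared; rows sum to 1
    pi_x:  (nx,) input distribution
    Returns (deficiency, xi), where xi (nz, ny) is the fitted garbling xi(y|z).
    """
    rng = np.random.default_rng(seed)
    nz, ny = kappa.shape[1], mu.shape[1]
    xi = rng.dirichlet(np.ones(ny), size=nz)  # strictly positive random start
    w = pi_x[:, None] * mu                    # weights pi(x) mu(y|x)
    for _ in range(n_iter):
        comp = kappa @ xi                     # composed channel (xi o kappa)(y|x)
        ratio = np.divide(w, comp, out=np.zeros_like(w), where=comp > 0)
        # E-step responsibilities kappa(z|x) xi(y|z) / comp(y|x), summed against w,
        # give the M-step numerator for each (z, y).
        num = xi * (kappa.T @ ratio)
        row = num.sum(axis=1, keepdims=True)
        xi = np.divide(num, row, out=np.full_like(num, 1.0 / ny), where=row > 0)
    comp = kappa @ xi
    mask = w > 0
    deficiency = float(np.sum(w[mask] * np.log(mu[mask] / comp[mask])))
    return deficiency, xi


if __name__ == "__main__":
    mu = np.array([[0.7, 0.2, 0.1], [0.1, 0.8, 0.1], [0.2, 0.2, 0.6]])
    pi_x = np.array([0.5, 0.3, 0.2])
    kappa = np.array([[0.9, 0.1], [0.5, 0.5], [0.2, 0.8]])  # lossy 2-symbol channel
    print(kl_deficiency(mu, kappa, pi_x)[0])
```

Because each update is an EM step, the objective is non-increasing across iterations; the fixed iteration count is an arbitrary choice.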
