Artur Back de Luca

Learning to Execute Graph Algorithms Exactly with Graph Neural Networks

Jan 30, 2026

Muhammad Fetrat Qharabagh, Artur Back de Luca, George Giapitzakis, Kimon Fountoulakis

Abstract: Understanding what graph neural networks can learn, especially their ability to learn to execute algorithms, remains a central theoretical challenge. In this work, we prove exact learnability results for graph algorithms under bounded-degree and finite-precision constraints. Our approach follows a two-step process. First, we train an ensemble of multi-layer perceptrons (MLPs) to execute the local instructions of a single node. Second, during inference, we use the trained MLP ensemble as the update function within a graph neural network (GNN). Leveraging Neural Tangent Kernel (NTK) theory, we show that local instructions can be learned from a small training set, enabling the complete graph algorithm to be executed during inference without error and with high probability. To illustrate the learning power of our setting, we establish a rigorous learnability result for the LOCAL model of distributed computation. We further demonstrate positive learnability results for widely studied algorithms such as message flooding, breadth-first and depth-first search, and Bellman-Ford.
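As a rough illustration of the "local instruction plus message passing" structure described in this abstract, the sketch below runs Bellman-Ford as synchronous rounds in which every node applies the same local update to the messages it receives. The update here is hand-written (`local_update` is a hypothetical stand-in for the trained MLP ensemble), and the function names and edge-list format are assumptions, not taken from the paper.

```python
import math


def local_update(own_dist, neighbor_messages):
    """Local instruction for a single node: keep the smallest distance seen.

    Hypothetical stand-in for the learned update function.
    """
    return min([own_dist] + neighbor_messages)


def bellman_ford_message_passing(num_nodes, edges, source):
    """Run Bellman-Ford as synchronous message-passing rounds.

    edges: list of directed (u, v, weight) triples.
    """
    dist = [math.inf] * num_nodes
    dist[source] = 0.0
    for _ in range(num_nodes - 1):
        # Each node u sends dist[u] + w along every outgoing edge (u, v, w).
        inbox = [[] for _ in range(num_nodes)]
        for u, v, w in edges:
            inbox[v].append(dist[u] + w)
        # Every node applies the same local update to its own inbox.
        new_dist = [local_update(dist[v], inbox[v]) for v in range(num_nodes)]
        if new_dist == dist:  # no node changed, so later rounds are no-ops
            break
        dist = new_dist
    return dist
```

The global algorithm never inspects the graph beyond delivering messages along edges; all of the decision logic sits in the per-node update, which is the part the abstract proposes to learn.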

Exact Learning of Permutations for Nonzero Binary Inputs with Logarithmic Training Size and Quadratic Ensemble Complexity

Feb 24, 2025

George Giapitzakis, Artur Back de Luca, Kimon Fountoulakis

Abstract: The ability of an architecture to realize permutations is quite fundamental. For example, Large Language Models need to be able to correctly copy (and perhaps rearrange) parts of the input prompt into the output. Classical universal approximation theorems guarantee the existence of parameter configurations that solve this task but offer no insights into whether gradient-based algorithms can find them. In this paper, we address this gap by focusing on two-layer fully connected feed-forward neural networks and the task of learning permutations on nonzero binary inputs. We show that in the infinite width Neural Tangent Kernel (NTK) regime, an ensemble of such networks independently trained with gradient descent on only the $k$ standard basis vectors out of $2^k - 1$ possible inputs successfully learns any fixed permutation of length $k$ with arbitrarily high probability. By analyzing the exact training dynamics, we prove that the network's output converges to a Gaussian process whose mean captures the ground truth permutation via sign-based features. We then demonstrate how averaging these runs (an "ensemble" method) and applying a simple rounding step yields an arbitrarily accurate prediction on any possible input unseen during training. Notably, the number of models needed to achieve exact learning with high probability (which we refer to as ensemble complexity) exhibits a linearithmic dependence on the input size $k$ for a single test input and a quadratic dependence when considering all test inputs simultaneously.
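The averaging-then-rounding step mentioned in this abstract can be sketched with a toy model. This is not the NTK analysis from the paper: each "trained network" is simulated as the true permuted output plus independent Gaussian noise, and that noise model, the 0.5 rounding threshold, and all names below are assumptions made purely for illustration.

```python
import random


def apply_permutation(x, perm):
    """Ground truth: output position i copies input position perm[i]."""
    return [x[p] for p in perm]


def noisy_model_output(x, perm, rng, sigma):
    """Toy stand-in for one trained network: truth plus Gaussian noise."""
    return [y + rng.gauss(0.0, sigma) for y in apply_permutation(x, perm)]


def ensemble_predict(x, perm, num_models, sigma=0.5, seed=0):
    """Average the outputs of many noisy models, then round each coordinate."""
    rng = random.Random(seed)
    totals = [0.0] * len(x)
    for _ in range(num_models):
        out = noisy_model_output(x, perm, rng, sigma)
        totals = [t + o for t, o in zip(totals, out)]
    means = [t / num_models for t in totals]
    # Rounding step: threshold the ensemble mean at 0.5 to get a binary output.
    return [1 if m > 0.5 else 0 for m in means]
```

In this toy setting the standard deviation of the ensemble mean shrinks like `sigma / sqrt(num_models)`, so a large enough ensemble pushes every coordinate to the correct side of the threshold; the paper's ensemble-complexity bounds quantify the analogous effect for actual networks.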

* 21 pages, 1 figure

Positional Attention: Out-of-Distribution Generalization and Expressivity for Neural Algorithmic Reasoning

Oct 02, 2024

Artur Back de Luca, George Giapitzakis, Shenghao Yang, Petar Veličković, Kimon Fountoulakis

Abstract: There has been a growing interest in the ability of neural networks to solve algorithmic tasks, such as arithmetic, summary statistics, and sorting. While state-of-the-art models like Transformers have demonstrated good generalization performance on in-distribution tasks, their out-of-distribution (OOD) performance is poor when trained end-to-end. In this paper, we focus on value generalization, a common instance of OOD generalization where the test distribution has the same input sequence length as the training distribution, but the value ranges in the training and test distributions do not necessarily overlap. To address this issue, we propose that using fixed positional encodings to determine attention weights (referred to as positional attention) enhances empirical OOD performance while maintaining expressivity. We support our claim about expressivity by proving that Transformers with positional attention can effectively simulate parallel algorithms.
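A minimal sketch of the idea in this abstract, assuming a single head with no learned projections: the attention weights are computed from fixed positional encodings alone, and the input values are only mixed by those weights. The function names and the scaled dot-product scoring are assumptions; the paper's architecture may differ in its details.

```python
import math


def softmax(row):
    """Numerically stable softmax over one list of scores."""
    m = max(row)
    exps = [math.exp(s - m) for s in row]
    total = sum(exps)
    return [e / total for e in exps]


def positional_attention(values, pos_enc):
    """Single-head attention whose weights ignore the input values.

    values:  n x d list of input rows (only mixed, never scored).
    pos_enc: n x p list of fixed positional encodings (queries and keys).
    Returns (output, weights).
    """
    n = len(values)
    d = len(values[0])
    scale = math.sqrt(len(pos_enc[0]))
    weights = [
        softmax([
            sum(a * b for a, b in zip(pos_enc[i], pos_enc[j])) / scale
            for j in range(n)
        ])
        for i in range(n)
    ]
    output = [
        [sum(weights[i][j] * values[j][k] for j in range(n)) for k in range(d)]
        for i in range(n)
    ]
    return output, weights
```

Because the weights never see the values, rescaling the inputs leaves the attention pattern unchanged and simply rescales the output, which gives some intuition for why such a layer might behave more predictably when test values fall outside the training range.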

* 37 pages, 22 figures

Simulation of Graph Algorithms with Looped Transformers

Feb 02, 2024

Artur Back de Luca, Kimon Fountoulakis

Abstract: The execution of graph algorithms using neural networks has recently attracted significant interest due to promising empirical progress. This motivates further understanding of how neural networks can replicate reasoning steps with relational data. In this work, we study the ability of transformer networks to simulate algorithms on graphs from a theoretical perspective. The architecture that we utilize is a looped transformer with extra attention heads that interact with the graph. We prove by construction that this architecture can simulate algorithms such as Dijkstra's shortest path algorithm, Breadth- and Depth-First Search, and Kosaraju's strongly connected components algorithm. The width of the network does not increase with the size of the input graph, which implies that the network can simulate the above algorithms for any graph. Despite this property, we show that there is a limit to simulation in our solution due to finite precision. Finally, we show a Turing Completeness result with constant width when the extra attention heads are utilized.

* 45 pages, 2 figures

Local Graph Clustering with Noisy Labels

Oct 12, 2023

Artur Back de Luca, Kimon Fountoulakis, Shenghao Yang

Abstract: The growing interest in machine learning problems over graphs with additional node information such as texts, images, or labels has popularized methods that require the costly operation of processing the entire graph. Yet, little effort has been devoted to the development of fast local methods (i.e., without accessing the entire graph) that extract useful information from such data. To that end, we propose a study of local graph clustering using noisy node labels as a proxy for additional node information. In this setting, nodes receive initial binary labels based on cluster affiliation: 1 if they belong to the target cluster and 0 otherwise. Subsequently, a fraction of these labels is flipped. We investigate the benefits of incorporating noisy labels for local graph clustering. By constructing a weighted graph with such labels, we study the performance of a graph diffusion-based local clustering method on both the original and the weighted graphs. From a theoretical perspective, we consider recovering an unknown target cluster with a single seed node in a random graph with independent noisy node labels. We provide sufficient conditions on the label noise under which, with high probability, using diffusion in the weighted graph yields a more accurate recovery of the target cluster. This approach proves more effective than using the given labels alone or using diffusion in the label-free original graph. Empirically, we show that reliable node labels can be obtained with just a few samples from an attributed graph. Moreover, utilizing these labels via diffusion in the weighted graph leads to significantly better local clustering performance across several real-world datasets, improving F1 scores by up to 13%.
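To make the "weighted graph plus diffusion" idea concrete, here is a small sketch under explicit assumptions. The abstract does not state the weighting rule or the diffusion used, so both are hypothetical choices here: an edge keeps weight 1 when its endpoint labels agree and is down-weighted to `eps` otherwise, and the diffusion is personalized PageRank computed by power iteration.

```python
def weighted_adjacency(edges, labels, eps=0.1):
    """Build an undirected weighted adjacency structure from node labels.

    Hypothetical rule: weight 1.0 if the endpoint labels agree, eps otherwise.
    """
    adj = [dict() for _ in labels]
    for u, v in edges:
        w = 1.0 if labels[u] == labels[v] else eps
        adj[u][v] = w
        adj[v][u] = w
    return adj


def personalized_pagerank(adj, seed, alpha=0.15, iters=200):
    """Power iteration for personalized PageRank from a single seed node.

    Assumes every node has at least one incident edge.
    """
    n = len(adj)
    p = [0.0] * n
    p[seed] = 1.0
    for _ in range(iters):
        nxt = [0.0] * n
        nxt[seed] = alpha  # teleport mass returns to the seed
        for u in range(n):
            degree = sum(adj[u].values())
            for v, w in adj[u].items():
                nxt[v] += (1.0 - alpha) * p[u] * w / degree
        p = nxt
    return p
```

On a graph made of two triangles joined by one edge, with labels that match the triangles, down-weighting the cross edge keeps more of the diffusion mass inside the seed's triangle than diffusion on the unweighted graph does. With flipped labels the benefit depends on the noise level, which is what the paper's sufficient conditions address.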

* 26 pages, 5 figures, 14 tables

Mitigating Data Heterogeneity in Federated Learning with Data Augmentation

Jun 20, 2022

Artur Back de Luca, Guojun Zhang, Xi Chen, Yaoliang Yu

Abstract: Federated Learning (FL) is a prominent framework that enables training a centralized model while securing user privacy by fusing local, decentralized models. In this setting, one major obstacle is data heterogeneity, i.e., each client having non-identically and independently distributed (non-IID) data. This is analogous to the context of Domain Generalization (DG), where each client can be treated as a different domain. However, while many approaches in DG tackle data heterogeneity from the algorithmic perspective, recent evidence suggests that data augmentation can induce equal or greater performance. Motivated by this connection, we present federated versions of popular DG algorithms, and show that by applying appropriate data augmentation, we can mitigate data heterogeneity in the federated setting, and obtain higher accuracy on unseen clients. Equipped with data augmentation, we can achieve state-of-the-art performance using even the most basic Federated Averaging algorithm, with much sparser communication.
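For readers unfamiliar with the Federated Averaging step named in this abstract, the sketch below shows the standard server-side aggregation: client parameter vectors are averaged with weights proportional to each client's number of training samples. Local training and data augmentation are omitted, and the flat-list parameter format and function name are assumptions for illustration.

```python
def federated_average(client_params, client_sizes):
    """Aggregate client models by a sample-size-weighted average.

    client_params: list of equal-length parameter lists, one per client.
    client_sizes:  number of local training samples for each client.
    """
    if len(client_params) != len(client_sizes) or not client_params:
        raise ValueError("need one size per client and at least one client")
    total = sum(client_sizes)
    num_params = len(client_params[0])
    averaged = [0.0] * num_params
    for params, size in zip(client_params, client_sizes):
        for i in range(num_params):
            # Each client contributes in proportion to its share of the data.
            averaged[i] += params[i] * size / total
    return averaged
```

In a full training loop the server would broadcast the averaged parameters back to the clients, each client would train locally (with augmentation, in the setting of this paper), and the aggregation would repeat each communication round.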

* 18 pages, 5 figures
