Graph Neural Networks (GNNs) are indispensable for learning from graph-structured data, yet their rising computational costs, especially on massively connected graphs, pose significant challenges to execution performance. To tackle this, distributed-memory solutions, such as partitioning the graph to concurrently train multiple replicas of a GNN, are in common use. However, approaches requiring a partitioned graph usually suffer from communication overhead and load imbalance, even under optimal partitioning and communication strategies, owing to the irregularity of neighborhood minibatch sampling. This paper proposes practical trade-offs for reducing the sampling and communication overheads of representation learning on distributed graphs (using the popular GraphSAGE architecture) by developing a parameterized continuous prefetch and eviction scheme on top of the state-of-the-art Amazon DistDGL distributed GNN framework, demonstrating an approximately 15-40% improvement in end-to-end training performance on the National Energy Research Scientific Computing Center's (NERSC) Perlmutter supercomputer for various OGB datasets.
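To make the general idea concrete, the following is a minimal, self-contained sketch of a parameterized prefetch-and-eviction cache for remote node features, of the kind the abstract alludes to. All names here (`FeatureCache`, `prefetch_budget`, `fake_remote_fetch`) and the LRU eviction policy are illustrative assumptions, not the paper's actual scheme or DistDGL's API.

```python
# Hypothetical sketch: a parameterized prefetch/eviction cache for remote
# node features in distributed minibatch GNN training. Not DistDGL code.
from collections import OrderedDict
import torch


class FeatureCache:
    """LRU-style cache of remote node features with a prefetch budget."""

    def __init__(self, capacity, feat_dim, prefetch_budget=1024):
        self.capacity = capacity                # max number of cached nodes
        self.prefetch_budget = prefetch_budget  # nodes prefetched per step
        self.feat_dim = feat_dim
        self.cache = OrderedDict()              # node_id -> feature tensor

    def _evict_if_full(self):
        # Evict least-recently-used entries once capacity is exceeded.
        while len(self.cache) > self.capacity:
            self.cache.popitem(last=False)

    def lookup(self, node_ids, fetch_fn):
        """Return features for node_ids, fetching cache misses via fetch_fn."""
        feats, misses = {}, []
        for nid in node_ids:
            if nid in self.cache:
                self.cache.move_to_end(nid)     # refresh LRU order on a hit
                feats[nid] = self.cache[nid]
            else:
                misses.append(nid)
        if misses:
            fetched = fetch_fn(misses)          # one batched remote fetch
            for nid, f in zip(misses, fetched):
                self.cache[nid] = f
                feats[nid] = f
            self._evict_if_full()
        return torch.stack([feats[nid] for nid in node_ids])

    def prefetch(self, predicted_ids, fetch_fn):
        """Prefetch features for nodes expected in upcoming minibatches."""
        todo = [nid for nid in predicted_ids if nid not in self.cache]
        todo = todo[: self.prefetch_budget]
        if todo:
            for nid, f in zip(todo, fetch_fn(todo)):
                self.cache[nid] = f
            self._evict_if_full()


# Example usage with a stand-in for the remote feature fetch (assumption).
def fake_remote_fetch(node_ids):
    return [torch.randn(16) for _ in node_ids]


cache = FeatureCache(capacity=4096, feat_dim=16, prefetch_budget=256)
cache.prefetch(range(256), fake_remote_fetch)            # warm the cache
batch_feats = cache.lookup([3, 7, 300, 9000], fake_remote_fetch)
print(batch_feats.shape)  # torch.Size([4, 16])
```

The `capacity` and `prefetch_budget` parameters stand in for the tunable knobs that trade local memory for reduced remote communication; the batched fetch on misses reflects the goal of amortizing network round trips caused by irregular neighborhood sampling.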