Title: Staleness-Alleviated Distributed GNN Training via Online Dynamic-Embedding Prediction

Aug 25, 2023

Guangji Bai, Ziyang Yu, Zheng Chai, Yue Cheng, Liang Zhao

Abstract: Despite the recent success of Graph Neural Networks (GNNs), it remains challenging to train GNNs on large-scale graphs due to neighbor explosion. As a remedy, distributed computing has become a promising solution by leveraging abundant computing resources (e.g., GPUs). However, the node dependency of graph data makes it difficult to achieve high concurrency in distributed GNN training, which suffers from massive communication overhead. To address this, historical value approximation is deemed a promising class of distributed training techniques. It uses an offline memory to cache historical information (e.g., node embeddings) as an affordable approximation of the exact values and achieves high concurrency. However, such benefits come at the cost of involving dated training information, leading to staleness, imprecision, and convergence issues. To overcome these challenges, this paper proposes SAT (Staleness-Alleviated Training), a novel and scalable distributed GNN training framework that adaptively reduces embedding staleness. The key idea of SAT is to model the GNN's embedding evolution as a temporal graph and build a model upon it to predict future embeddings, which effectively alleviates the staleness of the cached historical embeddings. We propose an online algorithm to train the embedding predictor and the distributed GNN alternately and further provide a convergence analysis. Empirically, we demonstrate that SAT can effectively reduce embedding staleness and thus achieve better performance and convergence speed on multiple large-scale graph datasets.
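To make historical value approximation concrete, here is a minimal Python sketch. It assumes a single cache keyed by node ID that stores the last few (training step, embedding) snapshots pushed by the worker that owns each node. The class and method names (`HistoricalEmbeddingCache`, `push`, `stale`, `staleness`, `predicted`) are hypothetical. The `predicted` method uses plain linear extrapolation from the last two snapshots as a stand-in. SAT itself learns its predictor online by modeling embedding evolution as a temporal graph, and this sketch does not attempt to reproduce that.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Tuple

Vector = List[float]


@dataclass
class HistoricalEmbeddingCache:
    """Hypothetical cache of remote-node embeddings for distributed GNN training.

    Each node keeps its last `max_history` (step, embedding) snapshots.
    Assumes snapshots for a node are pushed with strictly increasing steps.
    """
    max_history: int = 2
    _snapshots: Dict[int, List[Tuple[int, Vector]]] = field(default_factory=dict)

    def push(self, node: int, embedding: Vector, step: int) -> None:
        """Record a freshly computed embedding for `node` at training `step`."""
        snaps = self._snapshots.setdefault(node, [])
        snaps.append((step, list(embedding)))
        if len(snaps) > self.max_history:
            snaps.pop(0)  # drop the oldest snapshot

    def stale(self, node: int) -> Vector:
        """Plain historical value approximation: reuse the last cached embedding."""
        return list(self._snapshots[node][-1][1])

    def staleness(self, node: int, current_step: int) -> int:
        """Number of training steps since the cached embedding was written."""
        return current_step - self._snapshots[node][-1][0]

    def predicted(self, node: int, current_step: int) -> Vector:
        """Stand-in predictor: linearly extrapolate the last two snapshots.

        This is NOT SAT's learned temporal-graph predictor; it only marks
        where a predictor would replace the stale value.
        """
        snaps = self._snapshots[node]
        if len(snaps) < 2:
            return self.stale(node)  # not enough history to extrapolate
        (s_prev, v_prev), (s_last, v_last) = snaps[-2], snaps[-1]
        gap = current_step - s_last
        return [
            last + (last - prev) / (s_last - s_prev) * gap
            for prev, last in zip(v_prev, v_last)
        ]


if __name__ == "__main__":
    cache = HistoricalEmbeddingCache()
    cache.push(node=0, embedding=[0.0, 1.0], step=0)
    cache.push(node=0, embedding=[1.0, 1.0], step=1)
    print(cache.stale(0))         # [1.0, 1.0]
    print(cache.staleness(0, 3))  # 2
    print(cache.predicted(0, 3))  # [3.0, 1.0]
```

In this sketch, a training step would read `predicted(...)` instead of `stale(...)` for remote neighbors. Only the predictor changes, so a learned model could replace the extrapolation without touching the caching logic.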

* Preprint. Do not distribute. arXiv admin note: text overlap with arXiv:2206.00057
