Yuchen Zhong

GNNFlow: A Distributed Framework for Continuous Temporal GNN Learning on Dynamic Graphs

Nov 30, 2023

Yuchen Zhong, Guangming Sheng, Tianzuo Qin, Minjie Wang, Quan Gan, Chuan Wu

Abstract: Graph Neural Networks (GNNs) play a crucial role in various fields. However, most existing deep graph learning frameworks assume pre-stored static graphs and do not support training on graph streams. In contrast, many real-world graphs are dynamic and contain time domain information. We introduce GNNFlow, a distributed framework that enables efficient continuous temporal graph representation learning on dynamic graphs on multi-GPU machines. GNNFlow introduces an adaptive time-indexed block-based data structure that effectively balances memory usage with graph update and sampling operation efficiency. It features a hybrid GPU-CPU graph data placement for rapid GPU-based temporal neighborhood sampling and kernel optimizations for enhanced sampling processes. A dynamic GPU cache for node and edge features is developed to maximize cache hit rates through reuse and restoration strategies. GNNFlow supports distributed training across multiple machines with static scheduling to ensure load balance. We implement GNNFlow based on DGL and PyTorch. Our experimental results show that GNNFlow provides up to 21.1x faster continuous learning than existing systems.
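
As a rough illustration of temporal neighborhood sampling, the sketch below keeps each node's edges sorted by timestamp and samples only from edges that precede a query time. It is a toy, CPU-only example in plain Python. The class `TemporalGraph` and its methods are hypothetical names, not GNNFlow's API, and the structure is much simpler than the adaptive time-indexed block-based design the abstract describes.

```python
import bisect
import random
from collections import defaultdict


class TemporalGraph:
    """Toy dynamic graph (hypothetical): per-node edge lists sorted by timestamp."""

    def __init__(self):
        # node -> parallel lists of edge timestamps and neighbor ids
        self._times = defaultdict(list)
        self._nbrs = defaultdict(list)

    def add_edge(self, src, dst, ts):
        # Insert in timestamp order so out-of-order arrivals stay sorted.
        i = bisect.bisect_right(self._times[src], ts)
        self._times[src].insert(i, ts)
        self._nbrs[src].insert(i, dst)

    def sample_neighbors(self, node, query_ts, k, rng=random):
        """Return up to k (neighbor, timestamp) pairs with timestamp < query_ts."""
        times = self._times.get(node, [])
        nbrs = self._nbrs.get(node, [])
        # Binary search: edges in [0, end) happened strictly before query_ts.
        end = bisect.bisect_left(times, query_ts)
        candidates = list(zip(nbrs[:end], times[:end]))
        if len(candidates) <= k:
            return candidates
        return rng.sample(candidates, k)


g = TemporalGraph()
g.add_edge(0, 1, 1.0)
g.add_edge(0, 2, 3.0)
g.add_edge(0, 3, 2.0)  # arrives late, still stored in time order
earlier = g.sample_neighbors(0, query_ts=2.5, k=10)
```

Restricting candidates to edges before the query time is what keeps a temporal model from seeing future interactions. Uniform sampling is used here only for brevity.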

dPRO: A Generic Profiling and Optimization System for Expediting Distributed DNN Training

May 18, 2022

Hanpeng Hu, Chenyu Jiang, Yuchen Zhong, Yanghua Peng, Chuan Wu, Yibo Zhu, Haibin Lin, Chuanxiong Guo

Abstract: Distributed training using multiple devices (e.g., GPUs) has been widely adopted for learning DNN models over large datasets. However, the performance of large-scale distributed training tends to be far from linear speed-up in practice. Given the complexity of distributed systems, it is challenging to identify the root cause(s) of inefficiency and exercise effective performance optimizations when unexpectedly low training speed occurs. To date, there exists no software tool that diagnoses performance issues and helps expedite distributed DNN training across different deep learning frameworks. This paper proposes dPRO, a toolkit that includes: (1) an efficient profiler that collects runtime traces of distributed DNN training across multiple frameworks, especially fine-grained communication traces, and constructs global data flow graphs including detailed communication operations for accurate replay; (2) an optimizer that effectively identifies performance bottlenecks and explores optimization strategies (from computation, communication, and memory aspects) for training acceleration. We implement dPRO on multiple deep learning frameworks (TensorFlow, MXNet) and representative communication schemes (AllReduce and Parameter Server). Extensive experiments show that dPRO predicts the performance of distributed training in various settings with less than 5% error in most cases and finds optimization strategies with up to 3.48x speed-up over the baselines.
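
To make the idea of replaying a data flow graph concrete, here is a minimal sketch that estimates when each operation finishes, given per-operation durations and dependencies. It is not dPRO's replayer: the function `replay`, the operation names, and the durations are all made up for illustration, and the model assumes an operation starts as soon as its predecessors finish, ignoring contention for devices and links.

```python
from graphlib import TopologicalSorter  # standard library, Python 3.9+


def replay(durations, deps):
    """Hypothetical helper: finish time of each op in a dependency DAG.

    durations: op name -> run time
    deps:      op name -> list of ops that must finish first
    """
    finish = {}
    # static_order() yields every op after all of its predecessors.
    for op in TopologicalSorter(deps).static_order():
        start = max((finish[p] for p in deps.get(op, ())), default=0.0)
        finish[op] = start + durations[op]
    return finish


# Illustrative two-layer iteration: backward passes overlap with communication.
durations = {"fw": 1.0, "bw2": 2.0, "bw1": 2.0,
             "comm2": 3.0, "comm1": 1.0, "update": 0.5}
deps = {
    "bw2": ["fw"],
    "bw1": ["bw2"],
    "comm2": ["bw2"],
    "comm1": ["bw1", "comm2"],  # assumed: communication ops run one at a time
    "update": ["comm1"],
}
finish = replay(durations, deps)
iteration_time = max(finish.values())
```

In this toy graph the critical path runs through `comm2`, so shortening the backward pass of layer 1 alone would not shorten the iteration. Locating that kind of bottleneck is the purpose of replay.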

* Accepted by MLSys 2022

Compressed Communication for Distributed Training: Adaptive Methods and System

May 17, 2021

Yuchen Zhong, Cong Xie, Shuai Zheng, Haibin Lin

Abstract: Communication overhead severely hinders the scalability of distributed machine learning systems. Recently, there has been a growing interest in using gradient compression to reduce the communication overhead of distributed training. However, there is little understanding of applying gradient compression to adaptive gradient methods. Moreover, its performance benefits are often limited by the non-negligible compression overhead. In this paper, we first introduce a novel adaptive gradient method with gradient compression. We show that the proposed method has a convergence rate of $\mathcal{O}(1/\sqrt{T})$ for non-convex problems. In addition, we develop a scalable system called BytePS-Compress for two-way compression, where the gradients are compressed in both directions between workers and parameter servers. BytePS-Compress pipelines the compression and decompression on CPUs and achieves a high degree of parallelism. Empirical evaluations show that we improve the training time of ResNet50, VGG16, and BERT-base by 5.0%, 58.1%, and 23.3%, respectively, without any accuracy loss with 25 Gb/s networking. Furthermore, for training the BERT models, we achieve a compression rate of 333x compared to mixed-precision training.
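
For readers unfamiliar with gradient compression, the following sketch shows one common scheme, top-k sparsification with error feedback, in plain Python. The abstract does not say which compressor the paper uses, so this is a generic technique chosen for illustration and not the paper's method. The function names `topk_compress` and `decompress` are hypothetical.

```python
def topk_compress(grad, residual, k):
    """Keep the k largest-magnitude entries of grad + residual (illustrative)."""
    # Add back what earlier rounds failed to send.
    corrected = [g + r for g, r in zip(grad, residual)]
    # Indices of the k entries with the largest absolute value.
    keep = sorted(range(len(corrected)),
                  key=lambda i: abs(corrected[i]), reverse=True)[:k]
    sparse = {i: corrected[i] for i in keep}
    # Error feedback: unsent entries become the residual for the next round.
    new_residual = [0.0 if i in sparse else v for i, v in enumerate(corrected)]
    return sparse, new_residual


def decompress(sparse, size):
    """Rebuild a dense gradient from the index -> value map."""
    dense = [0.0] * size
    for i, v in sparse.items():
        dense[i] = v
    return dense


residual = [0.0, 0.0, 0.0, 0.0]
sparse, residual = topk_compress([0.25, -2.0, 0.5, 3.0], residual, k=2)
dense = decompress(sparse, size=4)
# Next round: the 0.5 left behind is added to the new gradient entry.
sparse2, residual2 = topk_compress([0.0, 0.0, 0.75, 0.0], residual, k=1)
```

Only the selected index-value pairs would cross the network. In a two-way setup like the one the abstract mentions, a parameter server would compress the aggregated result again before sending it back to workers.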

Thin Structure Estimation with Curvature Regularization

Sep 16, 2015

Dmitrii Marin, Yuri Boykov, Yuchen Zhong

Abstract: Many applications in vision require estimation of thin structures such as boundary edges, surfaces, roads, blood vessels, neurons, etc. Unlike most previous approaches, we simultaneously detect and delineate thin structures with sub-pixel localization and real-valued orientation estimation. This is an ill-posed problem that requires regularization. We propose an objective function combining detection likelihoods with a prior minimizing curvature of the center-lines or surfaces. Unlike simple block-coordinate descent, we develop a novel algorithm that is able to perform joint optimization of location and detection variables more effectively. Our lower bound optimization algorithm applies to quadratic or absolute curvature. The proposed early vision framework is sufficiently general and it can be used in many higher-level applications. We illustrate the advantage of our approach on a range of 2D and 3D examples.

* The IEEE International Conference on Computer Vision (ICCV), 2015, pp. 397-405
* D. Marin, Y. Zhong, M. Drangova, Y. Boykov. Thin Structure Estimation with Curvature Regularization. International Conference on Computer Vision (ICCV), Santiago, Chile, December 2015, to appear
