Get our free extension to see links to code for papers anywhere online!Free add-on: code for papers everywhere!Free add-on: See code for papers anywhere!

Add to Chrome

Add to Firefox

Add to Edge

Jou-An Chen

BitGNN: Unleashing the Performance Potential of Binary Graph Neural Networks on GPUs

May 04, 2023

Jou-An Chen, Hsin-Hsuan Sung, Xipeng Shen, Sutanay Choudhury, Ang Li

Figure 1 for BitGNN: Unleashing the Performance Potential of Binary Graph Neural Networks on GPUs

Figure 2 for BitGNN: Unleashing the Performance Potential of Binary Graph Neural Networks on GPUs

Figure 3 for BitGNN: Unleashing the Performance Potential of Binary Graph Neural Networks on GPUs

Figure 4 for BitGNN: Unleashing the Performance Potential of Binary Graph Neural Networks on GPUs

Abstract:Recent studies have shown that Binary Graph Neural Networks (GNNs) are promising for saving computations of GNNs through binarized tensors. Prior work, however, mainly focused on algorithm designs or training techniques, leaving it open to how to materialize the performance potential on accelerator hardware fully. This work redesigns the binary GNN inference backend from the efficiency perspective. It fills the gap by proposing a series of abstractions and techniques to map binary GNNs and their computations best to fit the nature of bit manipulations on GPUs. Results on real-world graphs with GCNs, GraphSAGE, and GraphSAINT show that the proposed techniques outperform state-of-the-art binary GNN implementations by 8-22X with the same accuracy maintained. BitGNN code is publicly available.

* To appear in the International Conference on Supercomputing (ICS '23)

Via

Access Paper or Ask Questions

Survey: Exploiting Data Redundancy for Optimization of Deep Learning

Aug 29, 2022

Jou-An Chen, Wei Niu, Bin Ren, Yanzhi Wang, Xipeng Shen

Figure 1 for Survey: Exploiting Data Redundancy for Optimization of Deep Learning

Figure 2 for Survey: Exploiting Data Redundancy for Optimization of Deep Learning

Figure 3 for Survey: Exploiting Data Redundancy for Optimization of Deep Learning

Figure 4 for Survey: Exploiting Data Redundancy for Optimization of Deep Learning

Abstract:Data redundancy is ubiquitous in the inputs and intermediate results of Deep Neural Networks (DNN). It offers many significant opportunities for improving DNN performance and efficiency and has been explored in a large body of work. These studies have scattered in many venues across several years. The targets they focus on range from images to videos and texts, and the techniques they use to detect and exploit data redundancy also vary in many aspects. There is not yet a systematic examination and summary of the many efforts, making it difficult for researchers to get a comprehensive view of the prior work, the state of the art, differences and shared principles, and the areas and directions yet to explore. This article tries to fill the void. It surveys hundreds of recent papers on the topic, introduces a novel taxonomy to put the various techniques into a single categorization framework, offers a comprehensive description of the main methods used for exploiting data redundancy in improving multiple kinds of DNNs on data, and points out a set of research opportunities for future to explore.

Via

Access Paper or Ask Questions

Achieving Real-Time Execution of 3D Convolutional Neural Networks on Mobile Devices

Jul 20, 2020

Wei Niu, Mengshu Sun, Zhengang Li, Jou-An Chen, Jiexiong Guan, Xipeng Shen, Yanzhi Wang, Xue Lin, Bin Ren

Figure 1 for Achieving Real-Time Execution of 3D Convolutional Neural Networks on Mobile Devices

Figure 2 for Achieving Real-Time Execution of 3D Convolutional Neural Networks on Mobile Devices

Figure 3 for Achieving Real-Time Execution of 3D Convolutional Neural Networks on Mobile Devices

Figure 4 for Achieving Real-Time Execution of 3D Convolutional Neural Networks on Mobile Devices

Abstract:Mobile devices are becoming an important carrier for deep learning tasks, as they are being equipped with powerful, high-end mobile CPUs and GPUs. However, it is still a challenging task to execute 3D Convolutional Neural Networks (CNNs) targeting for real-time performance, besides high inference accuracy. The reason is more complex model structure and higher model dimensionality overwhelm the available computation/storage resources on mobile devices. A natural way may be turning to deep learning weight pruning techniques. However, the direct generalization of existing 2D CNN weight pruning methods to 3D CNNs is not ideal for fully exploiting mobile parallelism while achieving high inference accuracy. This paper proposes RT3D, a model compression and mobile acceleration framework for 3D CNNs, seamlessly integrating neural network weight pruning and compiler code generation techniques. We propose and investigate two structured sparsity schemes i.e., the vanilla structured sparsity and kernel group structured (KGS) sparsity that are mobile acceleration friendly. The vanilla sparsity removes whole kernel groups, while KGS sparsity is a more fine-grained structured sparsity that enjoys higher flexibility while exploiting full on-device parallelism. We propose a reweighted regularization pruning algorithm to achieve the proposed sparsity schemes. The inference time speedup due to sparsity is approaching the pruning rate of the whole model FLOPs (floating point operations). RT3D demonstrates up to 29.1$\times$ speedup in end-to-end inference time comparing with current mobile frameworks supporting 3D CNNs, with moderate 1%-1.5% accuracy loss. The end-to-end inference time for 16 video frames could be within 150 ms, when executing representative C3D and R(2+1)D models on a cellphone. For the first time, real-time execution of 3D CNNs is achieved on off-the-shelf mobiles.

Via

Access Paper or Ask Questions