Xianhang Cheng

oneDNN Graph Compiler: A Hybrid Approach for High-Performance Deep Learning Compilation

Jan 03, 2023

Jianhui Li, Zhennan Qin, Yijie Mei, Jingze Cui, Yunfei Song, Ciyong Chen, Yifei Zhang, Longsheng Du, Xianhang Cheng, Baihui Jin (+3 more)

Abstract: With the rapid development of deep learning models and hardware support for dense computing, deep learning (DL) workload characteristics have changed significantly, from a few hot spots on compute-intensive operations to a broad range of operations scattered across the models. Accelerating a few compute-intensive operations using expert-tuned implementations of primitives does not fully exploit the performance potential of AI hardware. Various efforts have been made to compile a full deep neural network (DNN) graph. One of the biggest challenges is to achieve end-to-end compilation by generating expert-level performance code for the dense compute-intensive operations and applying compilation optimization at the scope of the DNN computation graph across multiple compute-intensive operations. We present oneDNN Graph Compiler, a tensor compiler that employs a hybrid approach, using techniques from both compiler optimization and expert-tuned kernels for high-performance code generation of the deep neural network graph. oneDNN Graph Compiler addresses unique optimization challenges in the deep learning domain, such as low-precision computation, aggressive fusion, optimization for static tensor shapes and memory layout, constant weight optimization, and memory buffer reuse. Experimental results demonstrate up to 2x performance gains over primitives-based optimization for performance-critical DNN computation graph patterns on Intel Xeon Scalable Processors.

* 12 pages excluding references, 8 figures, 1 table. Concurrently submitted to OSDI 2023

Predicate correlation learning for scene graph generation

Jul 06, 2021

Leitian Tao, Li Mi, Nannan Li, Xianhang Cheng, Yaosi Hu, Zhenzhong Chen

Abstract: For a typical Scene Graph Generation (SGG) method, there is often a large gap between the performance on the predicates' head classes and on their tail classes. This phenomenon is mainly caused by the semantic overlap between different predicates as well as the long-tailed data distribution. In this paper, a Predicate Correlation Learning (PCL) method for SGG is proposed to address the above two problems by taking the correlation between predicates into consideration. To describe the semantic overlap between strongly correlated predicate classes, a Predicate Correlation Matrix (PCM) is defined to quantify the relationship between predicate pairs, which is dynamically updated to remove the matrix's long-tailed bias. In addition, PCM is integrated into a Predicate Correlation Loss function ($L_{PC}$) to reduce discouraging gradients of unannotated classes. The proposed method is evaluated on the Visual Genome benchmark, where the performance of the tail classes is significantly improved when built on the existing methods.

Multiple Video Frame Interpolation via Enhanced Deformable Separable Convolution

Jun 15, 2020

Xianhang Cheng, Zhenzhong Chen

Abstract: Generating non-existent frames from a consecutive video sequence has been an interesting and challenging problem in the video processing field. Recent kernel-based interpolation methods predict pixels with a single convolution process that convolves source frames with spatially adaptive local kernels. However, when scene motion is larger than the pre-defined kernel size, these methods tend to yield less plausible results, and they cannot directly generate a frame at an arbitrary temporal position because the learned kernels are tied to the midpoint in time between the input frames. In this paper, we try to solve these problems and propose a novel approach that we refer to as enhanced deformable separable convolution (EDSC), which estimates not only adaptive kernels but also offsets, masks and biases so that the network can gather information from a non-local neighborhood. During the learning process, different intermediate time steps can be included as a control variable by means of the coord-conv trick, allowing the estimated components to vary with different input temporal information. This makes our method capable of producing multiple in-between frames. Furthermore, we investigate the relationships between our method and other typical kernel- and flow-based methods. Experimental results show that our method performs favorably against the state-of-the-art methods across a broad range of datasets. Code will be publicly available at https://github.com/Xianhang/EDSC-pytorch.
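
The abstract's time-step control variable can be illustrated with a minimal sketch. It is not the authors' implementation: the function name `append_time_channel`, the plain nested-list representation of feature maps, and the assumption that the time step t is normalized to [0, 1] are all hypothetical choices made for illustration. The idea shown is the general coord-conv pattern of appending an extra constant-valued channel, here filled with t, so that later layers can condition on the target temporal position.

```python
from typing import List

Channel = List[List[float]]  # one H x W feature map


def append_time_channel(features: List[Channel], t: float) -> List[Channel]:
    """Return the feature maps plus one extra H x W channel filled with t.

    Hypothetical helper: assumes t is normalized to [0, 1], where 0 and 1
    correspond to the two input frames.
    """
    if not 0.0 <= t <= 1.0:
        raise ValueError("t must lie in [0, 1]")
    height = len(features[0])
    width = len(features[0][0])
    time_channel = [[t] * width for _ in range(height)]
    # Build a new list so the caller's feature list is left unchanged.
    return features + [time_channel]


# One 2 x 2 feature channel, conditioned on three intermediate time steps.
features = [[[0.1, 0.2], [0.3, 0.4]]]
for t in (0.25, 0.5, 0.75):
    augmented = append_time_channel(features, t)
    print(t, len(augmented), augmented[-1])
```

Under this reading, requesting several in-between frames amounts to running the same network with different values of t, which is consistent with the abstract's claim that the estimated components vary with the input temporal information.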
