Yunfei Song

oneDNN Graph Compiler: A Hybrid Approach for High-Performance Deep Learning Compilation

Jan 03, 2023

Jianhui Li, Zhennan Qin, Yijie Mei, Jingze Cui, Yunfei Song, Ciyong Chen, Yifei Zhang, Longsheng Du, Xianhang Cheng, Baihui Jin (+3 more)

Abstract: With the rapid development of deep learning models and hardware support for dense computing, the characteristics of deep learning (DL) workloads have changed significantly, from a few hot spots on compute-intensive operations to a broad range of operations scattered across the models. Accelerating a few compute-intensive operations using expert-tuned implementations of primitives does not fully exploit the performance potential of AI hardware. Various efforts have been made to compile a full deep neural network (DNN) graph. One of the biggest challenges is to achieve end-to-end compilation by generating expert-level performance code for the dense compute-intensive operations and applying compilation optimization at the scope of the DNN computation graph across multiple compute-intensive operations. We present oneDNN Graph Compiler, a tensor compiler that employs a hybrid approach, using techniques from both compiler optimization and expert-tuned kernels for high-performance code generation of the deep neural network graph. oneDNN Graph Compiler addresses unique optimization challenges in the deep learning domain, such as low-precision computation, aggressive fusion, optimization for static tensor shapes and memory layout, constant weight optimization, and memory buffer reuse. Experimental results demonstrate up to 2x performance gains over primitives-based optimization for performance-critical DNN computation graph patterns on Intel Xeon Scalable Processors.

* 12 pages excluding references, 8 figures, 1 table. Concurrently submitted to OSDI 2023

FDA3: Federated Defense Against Adversarial Attacks for Cloud-Based IIoT Applications

Jun 28, 2020

Yunfei Song, Tian Liu, Tongquan Wei, Xiangfeng Wang, Zhe Tao, Mingsong Chen

Abstract: Along with the proliferation of Artificial Intelligence (AI) and Internet of Things (IoT) techniques, various kinds of adversarial attacks are increasingly emerging to fool Deep Neural Networks (DNNs) used by Industrial IoT (IIoT) applications. Due to biased training data or vulnerable underlying models, imperceptible modifications to inputs made by adversarial attacks may result in devastating consequences. Although existing methods are promising in defending against such malicious attacks, most of them can only deal with a limited set of existing attack types, which makes the deployment of large-scale IIoT devices a great challenge. To address this problem, we present an effective federated defense approach named FDA3 that can aggregate defense knowledge against adversarial examples from different sources. Inspired by federated learning, our proposed cloud-based architecture enables the sharing of defense capabilities against different attacks among IIoT devices. Comprehensive experimental results show that the DNNs generated by our approach can not only resist more malicious attacks than existing attack-specific adversarial training methods, but can also protect IIoT applications from new attacks.

* IEEE Transactions on Industrial Informatics, 2020
