Get our free extension to see links to code for papers anywhere online!Free add-on: code for papers everywhere!Free add-on: See code for papers anywhere!

Add to Chrome

Add to Firefox

Add to Edge

Kuba Kaszyk

School of Informatics, University of Edinburgh, UK

Performance Aware Convolutional Neural Network Channel Pruning for Embedded GPUs

Feb 20, 2020

Valentin Radu, Kuba Kaszyk, Yuan Wen, Jack Turner, Jose Cano, Elliot J. Crowley, Bjorn Franke, Amos Storkey, Michael O'Boyle

Figure 1 for Performance Aware Convolutional Neural Network Channel Pruning for Embedded GPUs

Figure 2 for Performance Aware Convolutional Neural Network Channel Pruning for Embedded GPUs

Figure 3 for Performance Aware Convolutional Neural Network Channel Pruning for Embedded GPUs

Figure 4 for Performance Aware Convolutional Neural Network Channel Pruning for Embedded GPUs

Abstract:Convolutional Neural Networks (CNN) are becoming a common presence in many applications and services, due to their superior recognition accuracy. They are increasingly being used on mobile devices, many times just by porting large models designed for server space, although several model compression techniques have been considered. One model compression technique intended to reduce computations is channel pruning. Mobile and embedded systems now have GPUs which are ideal for the parallel computations of neural networks and for their lower energy cost per operation. Specialized libraries perform these neural network computations through highly optimized routines. As we find in our experiments, these libraries are optimized for the most common network shapes, making uninstructed channel pruning inefficient. We evaluate higher level libraries, which analyze the input characteristics of a convolutional layer, based on which they produce optimized OpenCL (Arm Compute Library and TVM) and CUDA (cuDNN) code. However, in reality, these characteristics and subsequent choices intended for optimization can have the opposite effect. We show that a reduction in the number of convolutional channels, pruning 12% of the initial size, is in some cases detrimental to performance, leading to 2x slowdown. On the other hand, we also find examples where performance-aware pruning achieves the intended results, with performance speedups of 3x with cuDNN and above 10x with Arm Compute Library and TVM. Our findings expose the need for hardware-instructed neural network pruning.

* A copy of this was published in IISWC'19

Via

Access Paper or Ask Questions

Navigating the Landscape for Real-time Localisation and Mapping for Robotics and Virtual and Augmented Reality

Aug 20, 2018

Sajad Saeedi, Bruno Bodin, Harry Wagstaff, Andy Nisbet, Luigi Nardi, John Mawer, Nicolas Melot, Oscar Palomar, Emanuele Vespa, Tom Spink(+16 more)

Figure 1 for Navigating the Landscape for Real-time Localisation and Mapping for Robotics and Virtual and Augmented Reality

Figure 2 for Navigating the Landscape for Real-time Localisation and Mapping for Robotics and Virtual and Augmented Reality

Figure 3 for Navigating the Landscape for Real-time Localisation and Mapping for Robotics and Virtual and Augmented Reality

Figure 4 for Navigating the Landscape for Real-time Localisation and Mapping for Robotics and Virtual and Augmented Reality

Abstract:Visual understanding of 3D environments in real-time, at low power, is a huge computational challenge. Often referred to as SLAM (Simultaneous Localisation and Mapping), it is central to applications spanning domestic and industrial robotics, autonomous vehicles, virtual and augmented reality. This paper describes the results of a major research effort to assemble the algorithms, architectures, tools, and systems software needed to enable delivery of SLAM, by supporting applications specialists in selecting and configuring the appropriate algorithm and the appropriate hardware, and compilation pathway, to meet their performance, accuracy, and energy consumption goals. The major contributions we present are (1) tools and methodology for systematic quantitative evaluation of SLAM algorithms, (2) automated, machine-learning-guided exploration of the algorithmic and implementation design space with respect to multiple objectives, (3) end-to-end simulation tools to enable optimisation of heterogeneous, accelerated architectures for the specific algorithmic requirements of the various SLAM algorithmic approaches, and (4) tools for delivering, where appropriate, accelerated, adaptive SLAM solutions in a managed, JIT-compiled, adaptive runtime context.

* Proceedings of the IEEE 2018

Via

Access Paper or Ask Questions