Jack Turner

Neural Architecture Search as Program Transformation Exploration

Feb 12, 2021

Jack Turner, Elliot J. Crowley, Michael O'Boyle

Abstract: Improving the performance of deep neural networks (DNNs) is important to both the compiler and neural architecture search (NAS) communities. Compilers apply program transformations in order to exploit hardware parallelism and memory hierarchy. However, legality concerns mean they fail to exploit the natural robustness of neural networks. In contrast, NAS techniques mutate networks by operations such as the grouping or bottlenecking of convolutions, exploiting the resilience of DNNs. In this work, we express such neural architecture operations as program transformations whose legality depends on a notion of representational capacity. This allows them to be combined with existing transformations into a unified optimization framework. This unification allows us to express existing NAS operations as combinations of simpler transformations. Crucially, it allows us to generate and explore new tensor convolutions. We prototyped the combined framework in TVM and were able to find optimizations across different DNNs that significantly reduce inference time: over 3× in the majority of cases. Furthermore, our scheme dramatically reduces NAS search time. Code is available at https://github.com/jack-willturner/nas-as-program-transformation-exploration.
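
To make the idea of "grouping as a program transformation" concrete, here is a minimal sketch in plain Python, not the paper's TVM implementation. It writes a convolution as an explicit loop nest and treats grouping as a restriction of the input-channel loop to a per-group slice. The function name `conv2d` and the nested-list layout are assumptions made for illustration only.

```python
def conv2d(x, w, groups=1):
    """Naive 2-D convolution (stride 1, no padding) as an explicit loop nest.

    x: input as nested lists, shape [C_in][H][W]
    w: weights as nested lists, shape [C_out][C_in // groups][K][K]
    With groups=1 this is a standard convolution; with groups>1 each output
    channel only reads the input channels belonging to its own group.
    """
    c_in, h, width = len(x), len(x[0]), len(x[0][0])
    c_out, k = len(w), len(w[0][0])
    assert c_in % groups == 0 and c_out % groups == 0
    cig, cog = c_in // groups, c_out // groups  # channels per group
    out_h, out_w = h - k + 1, width - k + 1
    y = [[[0.0] * out_w for _ in range(out_h)] for _ in range(c_out)]
    for g in range(groups):
        for co in range(g * cog, (g + 1) * cog):
            for ci in range(cig):  # the transformed (shortened) channel loop
                for i in range(out_h):
                    for j in range(out_w):
                        for ki in range(k):
                            for kj in range(k):
                                y[co][i][j] += (
                                    w[co][ci][ki][kj]
                                    * x[g * cig + ci][i + ki][j + kj]
                                )
    return y


# Two input channels, 2x2 images, 1x1 kernels (toy values).
x = [[[1, 2], [3, 4]], [[10, 20], [30, 40]]]
w_full = [[[[1]], [[1]]], [[[2]], [[0]]]]   # shape [2][2][1][1]
w_grouped = [[[[1]]], [[[2]]]]              # shape [2][1][1][1]

y_full = conv2d(x, w_full, groups=1)
y_grouped = conv2d(x, w_grouped, groups=2)
```

The grouped variant computes a different function from the standard one, which is why a classical compiler would reject it as illegal; the abstract's point is that such a change can still be acceptable when judged by representational capacity rather than exact equivalence.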

Optimizing Grouped Convolutions on Edge Devices

Jun 17, 2020

Perry Gibson, José Cano, Jack Turner, Elliot J. Crowley, Michael O'Boyle, Amos Storkey

Abstract: When deploying a deep neural network on constrained hardware, it is possible to replace the network's standard convolutions with grouped convolutions. This allows for substantial memory savings with minimal loss of accuracy. However, current implementations of grouped convolutions in modern deep learning frameworks are far from performing optimally in terms of speed. In this paper we propose Grouped Spatial Pack Convolutions (GSPC), a new implementation of grouped convolutions that outperforms existing solutions. We implement GSPC in TVM, which provides state-of-the-art performance on edge devices. We analyze a set of networks utilizing different types of grouped convolutions and evaluate their performance in terms of inference time on several edge devices. We observe that our new implementation scales well with the number of groups and provides the best inference times in all settings, improving the existing implementations of grouped convolutions in TVM, PyTorch and TensorFlow Lite by 3.4x, 8x and 4x on average respectively. Code is available at https://github.com/gecLAB/tvm-GSPC/

* Camera ready version to be published at ASAP 2020 - The 31st IEEE International Conference on Application-specific Systems, Architectures and Processors. 8 pages, 6 figures
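
The memory saving mentioned in the abstract follows from simple arithmetic: a grouped convolution with g groups stores 1/g of the weights of the standard layer. The sketch below illustrates that count only; it says nothing about GSPC itself, and the helper name `conv_params` and the layer sizes are assumptions chosen for the example (bias terms are ignored).

```python
def conv_params(c_in, c_out, kernel_size, groups=1):
    """Number of weights in a 2-D convolution layer, excluding biases.

    Each of the c_out filters sees only c_in // groups input channels,
    so the count shrinks by a factor of `groups`.
    """
    if c_in % groups or c_out % groups:
        raise ValueError("groups must divide both c_in and c_out")
    return (c_in // groups) * c_out * kernel_size * kernel_size


# Hypothetical layer: 64 input channels, 128 output channels, 3x3 kernels.
standard = conv_params(64, 128, 3)            # 73728 weights
grouped_4 = conv_params(64, 128, 3, groups=4)  # 18432 weights
grouped_64 = conv_params(64, 128, 3, groups=64)  # 1152 weights
```

Fewer weights does not automatically mean faster inference, which is the gap between parameter count and runtime that the paper addresses.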

Neural Architecture Search without Training

Jun 08, 2020

Joseph Mellor, Jack Turner, Amos Storkey, Elliot J. Crowley

Abstract: The time and effort involved in hand-designing deep neural networks is immense. This has prompted the development of Neural Architecture Search (NAS) techniques to automate this design. However, NAS algorithms tend to be extremely slow and expensive; they need to train vast numbers of candidate networks to inform the search process. This could be remedied if we could infer a network's trained accuracy from its initial state. In this work, we examine how the linear maps induced by data points correlate for untrained network architectures in the NAS-Bench-201 search space, and motivate how this can be used to give a measure of modelling flexibility which is highly indicative of a network's trained performance. We incorporate this measure into a simple algorithm that allows us to search for powerful networks without any training in a matter of seconds on a single GPU. Code to reproduce our experiments is available at https://github.com/BayesWatch/nas-without-training.

Performance Aware Convolutional Neural Network Channel Pruning for Embedded GPUs

Feb 20, 2020

Valentin Radu, Kuba Kaszyk, Yuan Wen, Jack Turner, Jose Cano, Elliot J. Crowley, Bjorn Franke, Amos Storkey, Michael O'Boyle

Abstract: Convolutional Neural Networks (CNNs) are becoming a common presence in many applications and services, due to their superior recognition accuracy. They are increasingly being used on mobile devices, often simply by porting large models designed for server space, although several model compression techniques have been considered. One model compression technique intended to reduce computations is channel pruning. Mobile and embedded systems now have GPUs which are ideal for the parallel computations of neural networks and for their lower energy cost per operation. Specialized libraries perform these neural network computations through highly optimized routines. As we find in our experiments, these libraries are optimized for the most common network shapes, making uninstructed channel pruning inefficient. We evaluate higher level libraries, which analyze the input characteristics of a convolutional layer, based on which they produce optimized OpenCL (Arm Compute Library and TVM) and CUDA (cuDNN) code. However, in reality, these characteristics and subsequent choices intended for optimization can have the opposite effect. We show that a reduction in the number of convolutional channels, pruning 12% of the initial size, is in some cases detrimental to performance, leading to a 2x slowdown. On the other hand, we also find examples where performance-aware pruning achieves the intended results, with performance speedups of 3x with cuDNN and above 10x with Arm Compute Library and TVM. Our findings expose the need for hardware-instructed neural network pruning.

* A copy of this was published in IISWC'19

Deep Kernel Transfer in Gaussian Processes for Few-shot Learning

Oct 11, 2019

Massimiliano Patacchiola, Jack Turner, Elliot J. Crowley, Amos Storkey

Abstract: Humans tackle new problems by making inferences that go far beyond the information available, reusing what they have previously learned, and weighing different alternatives in the face of uncertainty. Incorporating these abilities in an artificial system is a major objective in machine learning. Towards this goal, we introduce a Bayesian method based on Gaussian Processes (GPs) that can learn efficiently from a limited amount of data and generalize across new tasks and domains. We frame few-shot learning as a model selection problem by learning a deep kernel across tasks, and then using this kernel as a covariance function in a GP prior for Bayesian inference. This probabilistic treatment allows for cross-domain flexibility and uncertainty quantification. We provide substantial experimental evidence, showing that the proposed method is better than several state-of-the-art algorithms in few-shot regression and cross-domain classification.

BlockSwap: Fisher-guided Block Substitution for Network Compression

Jun 10, 2019

Jack Turner, Elliot J. Crowley, Gavin Gray, Amos Storkey, Michael O'Boyle

Abstract: The desire to run neural networks on low-capacity edge devices has led to the development of a wealth of compression techniques. Moonshine is a simple and powerful example of this: one takes a large pre-trained network and substitutes each of its convolutional blocks with a selected cheap alternative block, then distills the resultant network with the original. However, not all blocks are created equal; for a required parameter budget there may exist a potent combination of many different cheap blocks. In this work, we find these by developing BlockSwap: an algorithm for choosing networks with interleaved block types by passing a single minibatch of training data through randomly initialised networks and gauging their Fisher potential. We show that block-wise cheapening yields more accurate networks than single block-type networks across a spectrum of parameter budgets. Code is available at https://github.com/BayesWatch/pytorch-blockswap.

HAKD: Hardware Aware Knowledge Distillation

Oct 24, 2018

Jack Turner, Elliot J. Crowley, Valentin Radu, José Cano, Amos Storkey, Michael O'Boyle

Abstract: Despite recent developments, deploying deep neural networks on resource-constrained general purpose hardware remains a significant challenge. There has been much work in developing methods for reshaping neural networks, usually with a focus on minimising total parameter count. These methods are typically developed in a hardware-agnostic manner and do not exploit hardware behaviour. In this paper we propose a new approach, Hardware Aware Knowledge Distillation (HAKD), which uses empirical observations of hardware behaviour to design efficient student networks which are then trained with knowledge distillation. This allows the trade-off between accuracy and performance to be managed explicitly. We have applied this approach across three platforms and evaluated it on two networks, MobileNet and DenseNet, on CIFAR-10. We show that HAKD outperforms Deep Compression and Fisher pruning in terms of size, accuracy and performance.
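
The abstract does not spell out the training objective, so the following is only a sketch of the commonly used temperature-scaled knowledge distillation loss, not necessarily the exact loss used in HAKD. The function names and the default `temperature` and `alpha` values are assumptions for illustration.

```python
import math


def softmax(logits, temperature=1.0):
    """Numerically stable softmax over a list of logits."""
    scaled = [z / temperature for z in logits]
    m = max(scaled)
    exps = [math.exp(z - m) for z in scaled]
    total = sum(exps)
    return [e / total for e in exps]


def distillation_loss(student_logits, teacher_logits, label,
                      temperature=4.0, alpha=0.9):
    """Weighted sum of a soft-target term and a hard-label term.

    Soft term: KL(teacher || student) on temperature-softened outputs,
    scaled by temperature**2 as is conventional.
    Hard term: cross-entropy of the student against the true label.
    """
    p_teacher = softmax(teacher_logits, temperature)
    p_student = softmax(student_logits, temperature)
    kl = sum(pt * math.log(pt / ps)
             for pt, ps in zip(p_teacher, p_student) if pt > 0)
    ce = -math.log(softmax(student_logits)[label])
    return alpha * temperature ** 2 * kl + (1 - alpha) * ce


# Toy logits for a 3-class problem (made-up values).
loss_mismatch = distillation_loss([0.5, 1.5, 0.0], [3.0, 1.0, 0.2], label=0)
loss_match = distillation_loss([3.0, 1.0, 0.2], [3.0, 1.0, 0.2], label=0,
                               alpha=1.0)
```

With `alpha=1.0` and identical student and teacher logits the soft term vanishes, so the loss is zero; any mismatch between the two distributions makes it positive.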

Pruning neural networks: is it time to nip it in the bud?

Oct 10, 2018

Elliot J. Crowley, Jack Turner, Amos Storkey, Michael O'Boyle

Abstract: Pruning is a popular technique for compressing a neural network: a large pre-trained network is fine-tuned while connections are successively removed. However, the value of pruning has largely evaded scrutiny. In this extended abstract, we examine residual networks obtained through Fisher-pruning and make two interesting observations. First, when time-constrained, it is better to train a simple, smaller network from scratch than prune a large network. Second, it is the architectures obtained through the pruning process, not the learnt weights, that prove valuable. Such architectures are powerful when trained from scratch. Furthermore, these architectures are easy to approximate without any further pruning: we can prune once and obtain a family of new, scalable network architectures for different memory requirements.

* Extended Abstract

Characterising Across-Stack Optimisations for Deep Convolutional Neural Networks

Sep 19, 2018

Jack Turner, José Cano, Valentin Radu, Elliot J. Crowley, Michael O'Boyle, Amos Storkey

Abstract: Convolutional Neural Networks (CNNs) are extremely computationally demanding, presenting a large barrier to their deployment on resource-constrained devices. Since such systems are where some of their most useful applications lie (e.g. obstacle detection for mobile robots, vision-based medical assistive technology), significant bodies of work from both machine learning and systems communities have attempted to provide optimisations that will make CNNs available to edge devices. In this paper we unify the two viewpoints in a Deep Learning Inference Stack and take an across-stack approach by implementing and evaluating the most common neural network compression techniques (weight pruning, channel pruning, and quantisation) and optimising their parallel execution with a range of programming approaches (OpenMP, OpenCL) and hardware architectures (CPU, GPU). We provide comprehensive Pareto curves to instruct trade-offs under constraints of accuracy, execution time, and memory space.

* IISWC 2018
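
As a rough illustration of what a Pareto curve over accuracy, execution time, and memory captures, the sketch below filters a set of configurations down to the non-dominated ones. The configuration names and all numbers are invented placeholders, not results from the paper, and the helper names `dominates` and `pareto_front` are assumptions.

```python
def dominates(a, b):
    """True if a is no worse than b on every axis and strictly better on one.

    Higher accuracy is better; lower time and lower memory are better.
    """
    no_worse = (a["accuracy"] >= b["accuracy"]
                and a["time_ms"] <= b["time_ms"]
                and a["memory_mb"] <= b["memory_mb"])
    strictly_better = (a["accuracy"] > b["accuracy"]
                       or a["time_ms"] < b["time_ms"]
                       or a["memory_mb"] < b["memory_mb"])
    return no_worse and strictly_better


def pareto_front(configs):
    """Keep only configurations that no other configuration dominates."""
    return [c for c in configs
            if not any(dominates(other, c) for other in configs)]


# Placeholder measurements for illustration only.
configs = [
    {"name": "baseline", "accuracy": 0.93, "time_ms": 120.0, "memory_mb": 40.0},
    {"name": "channel-pruned", "accuracy": 0.91, "time_ms": 60.0, "memory_mb": 20.0},
    {"name": "quantised", "accuracy": 0.90, "time_ms": 80.0, "memory_mb": 25.0},
    {"name": "weight-pruned", "accuracy": 0.92, "time_ms": 110.0, "memory_mb": 15.0},
]

front_names = [c["name"] for c in pareto_front(configs)]
```

In this toy data the "quantised" entry is dropped because "channel-pruned" is at least as good on all three axes; the remaining entries each represent a different trade-off.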
