Sebastian Vogel

Scaling Up Quantization-Aware Neural Architecture Search for Efficient Deep Learning on the Edge

Jan 22, 2024

Yao Lu, Hiram Rayo Torres Rodriguez, Sebastian Vogel, Nick van de Waterlaat, Pavol Jancura

Abstract: Neural Architecture Search (NAS) has become the de facto approach for designing accurate and efficient networks for edge devices. Since models are typically quantized for edge deployment, recent work has investigated quantization-aware NAS (QA-NAS) to search for highly accurate and efficient quantized models. However, existing QA-NAS approaches, particularly few-bit mixed-precision (FB-MP) methods, do not scale to larger tasks. Consequently, QA-NAS has mostly been limited to low-scale tasks and tiny networks. In this work, we present an approach to enable QA-NAS (INT8 and FB-MP) on large-scale tasks by leveraging the block-wise formulation introduced by block-wise NAS. We demonstrate strong results for the semantic segmentation task on the Cityscapes dataset, finding FB-MP models 33% smaller and INT8 models 17.6% faster than DeepLabV3 (INT8) without compromising task performance.

* Accepted at Workshop on Compilers, Deployment, and Tooling for Edge AI (CODAI '23), September 21, 2023, Hamburg, Germany

BOMP-NAS: Bayesian Optimization Mixed Precision NAS

Jan 27, 2023

David van Son, Floran de Putter, Sebastian Vogel, Henk Corporaal

Abstract: Bayesian Optimization Mixed-Precision Neural Architecture Search (BOMP-NAS) is an approach to quantization-aware neural architecture search (QA-NAS) that leverages both Bayesian optimization (BO) and mixed-precision quantization (MP) to efficiently search for compact, high-performance deep neural networks. The results show that integrating quantization-aware fine-tuning (QAFT) into the NAS loop is a necessary step to find networks that perform well under low-precision quantization: integrating it allows a model size reduction of nearly 50% on the CIFAR-10 dataset. BOMP-NAS is able to find neural networks that achieve state-of-the-art performance at much lower design costs. This study shows that BOMP-NAS can find these neural networks at a 6x shorter search time compared to the closest related work.
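
To make the model-size effect of mixed-precision quantization concrete, here is a minimal sketch. It is not the BOMP-NAS implementation: it assumes uniform symmetric quantization and one bit-width per layer, and the function names, layer sizes, and bit-widths are hypothetical.

```python
def quantize_symmetric(values, bits):
    """Uniformly quantize values to signed `bits`-bit levels, then map back to floats.

    Assumption: one symmetric scale per tensor, derived from the largest magnitude.
    """
    if bits < 2:
        raise ValueError("this sketch needs at least 2 bits")
    qmax = 2 ** (bits - 1) - 1            # e.g. 127 for 8 bits, 1 for 2 bits
    max_abs = max(abs(v) for v in values)
    if max_abs == 0.0:
        return list(values)               # nothing to scale
    scale = max_abs / qmax
    # Round to the nearest integer level, clamp, and dequantize.
    return [max(-qmax, min(qmax, round(v / scale))) * scale for v in values]


def model_size_bytes(layer_param_counts, layer_bits):
    """Weight storage in bytes when layer i stores its parameters with layer_bits[i] bits."""
    total_bits = sum(n * b for n, b in zip(layer_param_counts, layer_bits))
    return total_bits / 8


# Hypothetical two-layer network: all INT8 versus 8-bit and 4-bit layers.
params = [1000, 2000]
int8_size = model_size_bytes(params, [8, 8])
mixed_size = model_size_bytes(params, [8, 4])
print(int8_size, mixed_size)
```

Lowering the bit-width of the larger layer shrinks storage the most, which is why a search over per-layer bit-widths can trade accuracy against size.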

An End-to-End HW/SW Co-Design Methodology to Design Efficient Deep Neural Network Systems using Virtual Models

Nov 18, 2019

Michael J. Klaiber, Sebastian Vogel, Axel Acosta, Robert Korn, Leonardo Ecco, Kristine Back, Andre Guntoro, Ingo Feldner

Abstract: End-to-end performance estimation and measurement of deep neural network (DNN) systems become more important with the increasing complexity of DNN systems consisting of hardware and software components. The methodology proposed in this paper aims at a reduced turn-around time for evaluating different design choices of hardware and software components of DNN systems. This reduction is achieved by moving the performance estimation from the implementation phase to the concept phase by employing virtual hardware models instead of gathering measurement results from physical prototypes. Deep learning compilers introduce hardware-specific transformations and are, therefore, considered a part of the design flow of virtual system models to extract end-to-end performance estimations. To validate the run-time accuracy of the proposed methodology, a system processing the DilatedVGG DNN is realized both as a virtual system model and as a hardware implementation. The results show that up to 92% accuracy can be reached in predicting the processing time of the DNN inference.

* Embedded Systems Week 2019, INTelligent Embedded Systems Architectures and Applications Workshop 2019

Automated design of error-resilient and hardware-efficient deep neural networks

Sep 30, 2019

Christoph Schorn, Thomas Elsken, Sebastian Vogel, Armin Runge, Andre Guntoro, Gerd Ascheid

Abstract: Applying deep neural networks (DNNs) in mobile and safety-critical systems, such as autonomous vehicles, demands a reliable and efficient execution on hardware. Optimized dedicated hardware accelerators are being developed to achieve this. However, the design of efficient and reliable hardware has become increasingly difficult, due to the increased complexity of modern integrated circuit technology and its sensitivity to hardware faults, such as random bit-flips. It is thus desirable to exploit optimization potential for error resilience and efficiency on the algorithmic side as well, e.g., by optimizing the architecture of the DNN. Since there are numerous design choices for the architecture of DNNs, with partially opposing effects on the preferred characteristics (such as small error rates at low latency), multi-objective optimization strategies are necessary. In this paper, we develop an evolutionary optimization technique for the automated design of hardware-optimized DNN architectures. For this purpose, we derive a set of easily computable objective functions, which enable the fast evaluation of DNN architectures with respect to their hardware efficiency and error resilience solely based on the network topology. We observe a strong correlation between predicted error resilience and actual measurements obtained from fault injection simulations. Furthermore, we analyze two different quantization schemes for efficient DNN computation and find significant differences regarding their effect on error resilience.
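
The need for multi-objective optimization can be illustrated with a Pareto-front filter, a building block commonly used in evolutionary multi-objective search. The sketch below is not the paper's method: the candidates are hypothetical (error rate, latency) pairs, both objectives are assumed to be minimized, and the function names are illustrative.

```python
def dominates(a, b):
    """True if candidate a is no worse than b in every objective and strictly better in one.

    Assumption: every objective is minimized.
    """
    no_worse = all(x <= y for x, y in zip(a, b))
    strictly_better = any(x < y for x, y in zip(a, b))
    return no_worse and strictly_better


def pareto_front(candidates):
    """Keep only the candidates that no other candidate dominates."""
    return [
        c for c in candidates
        if not any(dominates(other, c) for other in candidates if other is not c)
    ]


# Hypothetical architectures scored as (error_rate, latency_ms).
population = [(0.10, 5.0), (0.08, 7.0), (0.12, 6.0), (0.08, 9.0)]
print(pareto_front(population))
```

Neither surviving candidate is better in both objectives, so the choice between them is left to the designer rather than to a single scalar score.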

Efficient Stochastic Inference of Bitwise Deep Neural Networks

Nov 20, 2016

Sebastian Vogel, Christoph Schorn, Andre Guntoro, Gerd Ascheid

Abstract: Recently published methods enable training of bitwise neural networks, which allow representations reduced down to a single bit per weight. We present a method that exploits ensemble decisions based on multiple stochastically sampled network models to increase performance figures of bitwise neural networks in terms of classification accuracy at inference. Our experiments with the CIFAR-10 and GTSRB datasets show that the performance of such network ensembles surpasses the performance of the high-precision base model. With this technique, we achieve a best classification error of 5.81% on the CIFAR-10 test set using bitwise networks. Concerning inference on embedded systems, we evaluate these bitwise networks using a hardware-efficient stochastic rounding procedure. Our work contributes to efficient embedded bitwise neural networks.

* 6 pages, 3 figures, Workshop on Efficient Methods for Deep Neural Networks at Neural Information Processing Systems Conference 2016, NIPS 2016, EMDNN 2016
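
The ensemble idea described in the abstract above can be sketched on a toy linear classifier. This is an illustration under assumptions, not the authors' code: weights are clipped to [-1, 1], each weight is sampled to +1 with probability (w + 1) / 2 and to -1 otherwise, and the ensemble averages the class scores of several sampled networks. All names and numbers are hypothetical.

```python
import random


def stochastic_binarize(weights, rng):
    """Sample each weight to +1 or -1 so that its expected value equals the clipped weight."""
    sampled = []
    for w in weights:
        w = max(-1.0, min(1.0, w))        # clip to [-1, 1]
        p_plus = (w + 1.0) / 2.0          # probability of drawing +1
        sampled.append(1 if rng.random() < p_plus else -1)
    return sampled


def linear_scores(weight_rows, x):
    """One score per class: dot product of that class's weight row with the input."""
    return [sum(w * xi for w, xi in zip(row, x)) for row in weight_rows]


def ensemble_predict(weight_rows, x, num_samples, rng):
    """Average the scores of `num_samples` sampled bitwise networks, then take the argmax."""
    totals = [0.0] * len(weight_rows)
    for _ in range(num_samples):
        sampled_rows = [stochastic_binarize(row, rng) for row in weight_rows]
        scores = linear_scores(sampled_rows, x)
        totals = [t + s for t, s in zip(totals, scores)]
    averaged = [t / num_samples for t in totals]
    predicted = max(range(len(averaged)), key=averaged.__getitem__)
    return predicted, averaged


rng = random.Random(0)
high_precision = [[0.9, -0.8, 0.1], [-0.7, 0.6, 0.2]]   # hypothetical 2-class weights
sample_input = [1.0, -1.0, 0.5]
print(ensemble_predict(high_precision, sample_input, 500, rng))
```

Because each sampled weight is unbiased, the averaged scores approach those of the high-precision weights as the number of sampled networks grows, while every individual network stores only one bit per weight.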
