Yury Pisarchyk

Efficient Memory Management for Deep Neural Net Inference

Feb 16, 2020

Yury Pisarchyk, Juhyun Lee

Abstract: While deep neural net inference was once considered a task for servers only, recent advances in technology allow inference to be moved to mobile and embedded devices, which is desirable for reasons ranging from latency to privacy. These devices are limited not only by their compute power and battery, but also by their smaller physical memory and cache, and thus an efficient memory manager becomes a crucial component for deep neural net inference at the edge. We explore various strategies to smartly share memory buffers among intermediate tensors in deep neural nets. Employing these can result in an up to 11% smaller memory footprint than the state of the art.

* 6 pages, 6 figures, MLSys 2020 Workshop on Resource-Constrained Machine Learning (ReCoML 2020)
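
To make the idea of sharing memory buffers among intermediate tensors concrete, here is a minimal Python sketch of one possible greedy strategy. It is an illustration under stated assumptions, not the algorithm from the paper: the names `Tensor`, `SharedBuffer`, `assign_buffers`, and `footprint` are hypothetical, and each tensor is assumed to be live from the op that produces it (`first_op`) through the last op that reads it (`last_op`). Two tensors may share a buffer only if their live ranges do not overlap.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class Tensor:
    name: str
    size: int      # size in bytes
    first_op: int  # index of the op that produces the tensor
    last_op: int   # index of the last op that reads the tensor


@dataclass
class SharedBuffer:
    size: int
    tensors: List[Tensor] = field(default_factory=list)


def overlaps(a: Tensor, b: Tensor) -> bool:
    """True if the live ranges of a and b share at least one op index."""
    return a.first_op <= b.last_op and b.first_op <= a.last_op


def assign_buffers(tensors: List[Tensor]) -> List[SharedBuffer]:
    """Greedy assignment (hypothetical strategy): visit tensors from largest
    to smallest and reuse the first buffer whose tensors are all dead while
    the current tensor is live; otherwise allocate a new buffer."""
    buffers: List[SharedBuffer] = []
    for t in sorted(tensors, key=lambda x: x.size, reverse=True):
        for buf in buffers:
            if not any(overlaps(t, other) for other in buf.tensors):
                # Existing buffers were sized by larger-or-equal tensors,
                # so t always fits without growing the buffer.
                buf.tensors.append(t)
                break
        else:
            buffers.append(SharedBuffer(size=t.size, tensors=[t]))
    return buffers


def footprint(buffers: List[SharedBuffer]) -> int:
    """Total memory needed for all shared buffers."""
    return sum(b.size for b in buffers)


# A toy chain of ops: each tensor is produced by op i and read by op i + 1.
tensors = [
    Tensor("t0", size=32, first_op=0, last_op=1),
    Tensor("t1", size=64, first_op=1, last_op=2),
    Tensor("t2", size=32, first_op=2, last_op=3),
    Tensor("t3", size=16, first_op=3, last_op=4),
]
buffers = assign_buffers(tensors)
naive = sum(t.size for t in tensors)
print(f"naive: {naive} bytes, shared: {footprint(buffers)} bytes")
```

In this toy chain, allocating every tensor separately would need 144 bytes, while the sketch packs the four tensors into two buffers totalling 96 bytes. Real networks have branching graphs and more varied live ranges, so the saving depends heavily on the model and on the strategy chosen.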

On-Device Neural Net Inference with Mobile GPUs

Jul 03, 2019

Juhyun Lee, Nikolay Chirkov, Ekaterina Ignasheva, Yury Pisarchyk, Mogan Shieh, Fabio Riccardi, Raman Sarokin, Andrei Kulik, Matthias Grundmann

Abstract: On-device inference of machine learning models for mobile phones is desirable due to its lower latency and increased privacy. Running such a compute-intensive task solely on the mobile CPU, however, can be difficult due to limited computing power, thermal constraints, and energy consumption. App developers and researchers have begun exploiting hardware accelerators to overcome these challenges. Recently, device manufacturers have begun adding neural processing units to high-end phones for on-device inference, but these account for only a small fraction of hand-held devices. In this paper, we present how we leverage the mobile GPU, a ubiquitous hardware accelerator on virtually every phone, to run inference of deep neural networks in real time on both Android and iOS devices. In describing our architecture, we also discuss how to design networks that are mobile GPU-friendly. Our state-of-the-art mobile GPU inference engine is integrated into the open-source project TensorFlow Lite and publicly available at https://tensorflow.org/lite.

* Computer Vision and Pattern Recognition Workshop: Efficient Deep Learning for Computer Vision 2019
