Abstract: Runtime-tunable, context-dependent network compression would make mobile deep learning adaptable to frequently changing resource availability, input "difficulty", or user needs. Existing compression techniques significantly reduce the memory, processing, and energy tax of deep learning, yet the resulting models tend to be permanently impaired, sacrificing inference power for reduced resource usage. Existing tunable compression approaches, on the other hand, require expensive re-training, seldom provide mobile-ready implementations, and do not support arbitrary strategies for adapting the compression. In this paper we present Mobiprox, a framework enabling flexible-accuracy on-device deep learning. Mobiprox implements tunable approximations of tensor operations and enables runtime adaptation of individual network layers. A profiler and a tuner included with Mobiprox identify the most promising neural network approximation configurations, i.e., those that deliver the desired inference quality with the minimal use of resources. Furthermore, we develop control strategies that, depending on contextual factors such as the input data difficulty, dynamically adjust the approximation level of a model. We implement Mobiprox in Android OS and, through experiments in diverse mobile domains including human activity recognition and spoken keyword detection, demonstrate that it can save up to 15% of system-wide energy with minimal impact on inference accuracy.
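To make the notion of a context-dependent control strategy concrete, the minimal Kotlin sketch below illustrates one possible shape such a strategy could take: a hypothetical controller that moves between per-layer approximation configurations based on the confidence of recent predictions. All names (ApproxConfig, ConfidenceController) and thresholds are illustrative assumptions for exposition, not the actual Mobiprox API.

```kotlin
/** Hypothetical approximation configuration: one approximation knob value per network layer. */
data class ApproxConfig(val perLayerKnobs: List<Int>)

/**
 * Illustrative control strategy: confident predictions on "easy" inputs allow a
 * cheaper, more approximate configuration; uncertain predictions on "hard" inputs
 * trigger a fall-back toward a more accurate one.
 */
class ConfidenceController(
    // Configurations ordered from most accurate (index 0) to most approximate.
    private val configs: List<ApproxConfig>,
    private val lowConfidence: Float = 0.6f,   // assumed threshold for "hard" inputs
    private val highConfidence: Float = 0.9f   // assumed threshold for "easy" inputs
) {
    private var level = 0

    /** Called after each inference with the top softmax score; returns the next configuration. */
    fun onInference(topScore: Float): ApproxConfig {
        when {
            topScore < lowConfidence && level > 0 -> level--               // back off to higher accuracy
            topScore > highConfidence && level < configs.lastIndex -> level++  // push toward lower cost
        }
        return configs[level]
    }
}
```

Under these assumptions, the controller trades a small amount of accuracy headroom on easy inputs for reduced resource usage, which is the kind of runtime adaptation the abstract describes.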