Chuo-Ling Chang

Speed Is All You Need: On-Device Acceleration of Large Diffusion Models via GPU-Aware Optimizations

Apr 21, 2023

Yu-Hui Chen, Raman Sarokin, Juhyun Lee, Jiuqiang Tang, Chuo-Ling Chang, Andrei Kulik, Matthias Grundmann

Abstract: The rapid development and application of foundation models have revolutionized the field of artificial intelligence. Large diffusion models have gained significant attention for their ability to generate photorealistic images and support various tasks. On-device deployment of these models provides benefits such as lower server costs, offline functionality, and improved user privacy. However, common large diffusion models have over 1 billion parameters and pose challenges due to restricted computational and memory resources on devices. We present a series of implementation optimizations for large diffusion models that achieve the fastest reported inference latency to date (under 12 seconds for Stable Diffusion 1.4 without int8 quantization on Samsung S23 Ultra for a 512x512 image with 20 iterations) on GPU-equipped mobile devices. These enhancements broaden the applicability of generative AI and improve the overall user experience across a wide range of devices.

* 4 pages (not including references), 2 figures, 2 tables. Accepted to Efficient Deep Learning for Computer Vision workshop 2023

On-device Real-time Hand Gesture Recognition

Oct 29, 2021

George Sung, Kanstantsin Sokal, Esha Uboweja, Valentin Bazarevsky, Jonathan Baccash, Eduard Gabriel Bazavan, Chuo-Ling Chang, Matthias Grundmann

Abstract: We present an on-device real-time hand gesture recognition (HGR) system, which detects a set of predefined static gestures from a single RGB camera. The system consists of two parts: a hand skeleton tracker and a gesture classifier. We use MediaPipe Hands as the basis of the hand skeleton tracker, improve the keypoint accuracy, and add the estimation of 3D keypoints in a world metric space. We create two different gesture classifiers, one based on heuristics and the other using neural networks (NN).

* 5 pages, 6 figures; ICCV Workshop on Computer Vision for Augmented and Virtual Reality, Montreal, Canada, 2021

MediaPipe Hands: On-device Real-time Hand Tracking

Jun 18, 2020

Fan Zhang, Valentin Bazarevsky, Andrey Vakunov, Andrei Tkachenka, George Sung, Chuo-Ling Chang, Matthias Grundmann

Abstract: We present a real-time on-device hand tracking pipeline that predicts a hand skeleton from a single RGB camera for AR/VR applications. The pipeline consists of two models: 1) a palm detector, 2) a hand landmark model. It's implemented via MediaPipe, a framework for building cross-platform ML solutions. The proposed model and pipeline architecture demonstrate real-time inference speed on mobile GPUs and high prediction quality. MediaPipe Hands is open sourced at https://mediapipe.dev.

* 5 pages, 7 figures; CVPR Workshop on Computer Vision for Augmented and Virtual Reality, Seattle, WA, USA, 2020
