
Alok Verma

Communication Compression for Tensor Parallel LLM Inference

Nov 14, 2024

Jan Hansen-Palmus, Michael Truong-Le, Oliver Hausdörfer, Alok Verma

Abstract: Large Language Models (LLMs) have pushed the frontier of artificial intelligence, but they comprise hundreds of billions of parameters and operations. To reduce inference latency, LLMs are deployed across multiple hardware accelerators using various Model Parallelism strategies. Our paper examines one such strategy, Tensor Parallelism, in detail and proposes to reduce latency by compressing inter-accelerator communication. We leverage fine-grained quantization techniques to compress selected activations by 3.5-4.5x. Our proposed method yields up to a 2x reduction in time-to-first-token (TTFT) with negligible degradation in model performance.
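The abstract does not specify the quantization scheme. As an illustration only, the sketch below assumes symmetric block-wise (per-group) integer quantization of FP16 activations with one FP16 scale per block. The function names `quantize_blockwise`, `dequantize_blockwise` and `compression_ratio`, as well as the block size and bit widths, are hypothetical choices and do not describe the paper's configuration. The sketch shows how a fine-grained scheme trades scale overhead against compression: with 32-element blocks, 4-bit codes give 16 / (4 + 16/32) ≈ 3.56x and 3-bit codes give 16 / (3 + 16/32) ≈ 4.57x.

```python
from typing import List, Tuple


def quantize_blockwise(x: List[float], bits: int, block: int) -> Tuple[List[int], List[float]]:
    """Hypothetical symmetric per-block quantizer: each run of `block` values gets its own scale."""
    qmax = 2 ** (bits - 1) - 1  # largest signed code, e.g. 7 for 4-bit
    codes: List[int] = []
    scales: List[float] = []
    for start in range(0, len(x), block):
        chunk = x[start:start + block]
        amax = max(abs(v) for v in chunk)
        scale = amax / qmax if amax > 0 else 1.0  # avoid division by zero for all-zero blocks
        scales.append(scale)
        # Round to the nearest code and clamp to the signed range [-qmax, qmax].
        codes.extend(max(-qmax, min(qmax, round(v / scale))) for v in chunk)
    return codes, scales


def dequantize_blockwise(codes: List[int], scales: List[float], block: int) -> List[float]:
    """Reconstruct approximate values by multiplying each code by its block's scale."""
    return [c * scales[i // block] for i, c in enumerate(codes)]


def compression_ratio(bits: int, block: int, src_bits: int = 16, scale_bits: int = 16) -> float:
    """Bits of the original FP16 value divided by bits per element after quantization (code + amortized scale)."""
    return src_bits / (bits + scale_bits / block)


if __name__ == "__main__":
    activations = [0.1 * i - 1.6 for i in range(64)]  # toy stand-in for one rank's activations
    codes, scales = quantize_blockwise(activations, bits=4, block=32)
    restored = dequantize_blockwise(codes, scales, block=32)
    print(max(abs(a - r) for a, r in zip(activations, restored)))  # worst-case rounding error
    print(f"{compression_ratio(4, 32):.2f}")  # prints 3.56
    print(f"{compression_ratio(3, 32):.2f}")  # prints 4.57
```

In a tensor-parallel layer, a scheme like this would presumably be applied to each rank's partial activations just before they are exchanged and reversed on receipt; because integer codes with different per-block scales cannot be summed directly, a reduction would need to dequantize before accumulating.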


Photometric Depth Super-Resolution

Sep 26, 2018

Bjoern Haefner, Songyou Peng, Alok Verma, Yvain Quéau, Daniel Cremers


Abstract: This study explores the use of photometric techniques (shape-from-shading and uncalibrated photometric stereo) for upsampling the low-resolution depth map from an RGB-D sensor to the higher resolution of the companion RGB image. A single-shot variational approach is first put forward, which is effective as long as the target's reflectance is piecewise-constant. It is then shown that this dependency on a specific reflectance model can be relaxed by focusing on a specific class of objects (e.g., faces) and delegating reflectance estimation to a deep neural network. Finally, a multi-shot strategy based on randomly varying lighting conditions is discussed. It requires no training or prior on the reflectance, but this comes at the price of a dedicated acquisition setup. Both quantitative and qualitative evaluations illustrate the effectiveness of the proposed methods in synthetic and real-world scenarios.

* 14-page main paper and 16-page supplementary material. The first three authors contributed equally.
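Shape-from-shading ties the depth map to the RGB image through an image formation model. The abstract does not state the exact model, so the sketch below assumes the simplest textbook case: an orthographic camera, unit pixel spacing, Lambertian reflectance, and a single distant light source. The paper's actual lighting model may differ. The helper names `normals_from_depth` and `lambertian_shading` are hypothetical. The sketch computes per-pixel normals from depth gradients, `n ∝ (-∂z/∂x, -∂z/∂y, 1)`, and renders intensity as `albedo · max(0, n · l)`. A per-pixel `albedo` map is where a piecewise-constant reflectance assumption would enter.

```python
import math
from typing import List, Tuple

Vec3 = Tuple[float, float, float]


def normals_from_depth(z: List[List[float]]) -> List[List[Vec3]]:
    """Unit normals from a depth map (assumes orthographic camera, unit pixel spacing, at least 2x2 pixels)."""
    h, w = len(z), len(z[0])
    normals: List[List[Vec3]] = []
    for i in range(h):
        row: List[Vec3] = []
        for j in range(w):
            # Forward difference, falling back to a backward difference on the last row/column.
            jn = j + 1 if j + 1 < w else j - 1
            i_n = i + 1 if i + 1 < h else i - 1
            dzdx = (z[i][jn] - z[i][j]) / (jn - j)
            dzdy = (z[i_n][j] - z[i][j]) / (i_n - i)
            nx, ny, nz = -dzdx, -dzdy, 1.0
            norm = math.sqrt(nx * nx + ny * ny + nz * nz)
            row.append((nx / norm, ny / norm, nz / norm))
        normals.append(row)
    return normals


def lambertian_shading(normals: List[List[Vec3]], albedo: List[List[float]], light: Vec3) -> List[List[float]]:
    """Render I = albedo * max(0, n . l) for a distant light; `light` should be a unit vector."""
    lx, ly, lz = light
    return [
        [albedo[i][j] * max(0.0, nx * lx + ny * ly + nz * lz) for j, (nx, ny, nz) in enumerate(row)]
        for i, row in enumerate(normals)
    ]


if __name__ == "__main__":
    # Toy scene: a plane tilted along x (z = 0.5 * x) with constant albedo, lit head-on.
    depth = [[0.5 * j for j in range(4)] for _ in range(4)]
    albedo = [[0.8] * 4 for _ in range(4)]
    image = lambertian_shading(normals_from_depth(depth), albedo, (0.0, 0.0, 1.0))
    print(image[0])
```

In a super-resolution setting, a model of this kind would be inverted rather than evaluated: the high-resolution depth is sought such that its rendered shading matches the observed RGB intensities while staying close to the upsampled sensor depth.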
