Lorenz K. Müller

SSSD: Simply-Scalable Speculative Decoding

Nov 08, 2024

Michele Marzollo, Jiawei Zhuang, Niklas Roemer, Lorenz K. Müller, Lukas Cavigelli

Abstract: Over the past year, Speculative Decoding has gained popularity as a technique for accelerating Large Language Model inference. While several methods have been introduced, most struggle to deliver satisfactory performance at batch sizes typical for data centers ($\geq 8$) and often involve significant deployment complexities. In this work, we offer a theoretical explanation of how Speculative Decoding can be effectively utilized with larger batch sizes. We also introduce a method that integrates seamlessly into existing systems without additional training or the complexity of deploying a small LLM. In a continuous batching setting, we achieve a 4x increase in throughput without any latency impact for short context generation, and a 1.7-2x improvement in both latency and throughput for longer contexts.

* 14 pages, 7 figures
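
For readers unfamiliar with the technique named in the abstract above, the following is a minimal sketch of the generic draft-and-verify step behind speculative decoding with greedy acceptance. It is not the SSSD method itself, which the abstract does not detail. The names `verify_draft` and `toy_target` are hypothetical, and the "target model" is a toy function standing in for an LLM. A real system would score all draft positions in a single batched forward pass rather than in a Python loop.

```python
from typing import Callable, List, Sequence

Token = int
# Hypothetical interface: given a context, return the target model's greedy next token.
NextTokenFn = Callable[[Sequence[Token]], Token]


def verify_draft(
    prefix: Sequence[Token],
    draft: Sequence[Token],
    target_next: NextTokenFn,
) -> List[Token]:
    """Return the tokens emitted by one greedy verification step.

    Draft tokens are accepted while they match the target's own greedy
    choice. At the first mismatch the target's token is emitted instead
    and the rest of the draft is discarded. If every draft token matches,
    one extra token from the target is appended.
    """
    context = list(prefix)
    emitted: List[Token] = []
    for tok in draft:
        expected = target_next(context)
        if tok != expected:
            emitted.append(expected)  # correction from the target model
            return emitted
        emitted.append(tok)
        context.append(tok)
    emitted.append(target_next(context))  # bonus token after full acceptance
    return emitted


def toy_target(context: Sequence[Token]) -> Token:
    """Toy stand-in for an LLM (assumption): next token is (last + 1) mod 10."""
    return (context[-1] + 1) % 10


if __name__ == "__main__":
    # Two draft tokens match the toy target, the third does not.
    print(verify_draft([1, 2], [3, 4, 9], toy_target))
```

Because the output always equals what the target would have produced greedily on its own, the draft source only affects speed, never the generated text. The draft can come from any cheap predictor.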

RL-based Stateful Neural Adaptive Sampling and Denoising for Real-Time Path Tracing

Oct 05, 2023

Antoine Scardigli, Lukas Cavigelli, Lorenz K. Müller

Abstract: Monte-Carlo path tracing is a powerful technique for realistic image synthesis but suffers from high levels of noise at low sample counts, limiting its use in real-time applications. To address this, we propose a framework with end-to-end training of a sampling importance network, a latent space encoder network, and a denoiser network. Our approach uses reinforcement learning to optimize the sampling importance network, thus avoiding explicit numerically approximated gradients. Our method does not aggregate the sampled values per pixel by averaging but keeps all sampled values, which are then fed into the latent space encoder. The encoder replaces handcrafted spatiotemporal heuristics with learned representations in a latent space. Finally, a neural denoiser is trained to refine the output image. Our approach increases visual quality on several challenging datasets and reduces rendering times for equal quality by a factor of 1.6 compared to the previous state-of-the-art, making it a promising solution for real-time applications.

* Submitted to NeurIPS. https://openreview.net/forum?id=xNyR7DXUzJ
