Title: SEED: Accelerating Reasoning Tree Construction via Scheduled Speculative Decoding

Jun 26, 2024

Zhenglin Wang, Jialong Wu, Yilong Lai, Congzhi Zhang, Deyu Zhou

Figure 1 for SEED: Accelerating Reasoning Tree Construction via Scheduled Speculative Decoding

Figure 2 for SEED: Accelerating Reasoning Tree Construction via Scheduled Speculative Decoding

Figure 3 for SEED: Accelerating Reasoning Tree Construction via Scheduled Speculative Decoding

Figure 4 for SEED: Accelerating Reasoning Tree Construction via Scheduled Speculative Decoding

Abstract: Large Language Models (LLMs) demonstrate remarkable emergent abilities across various tasks, yet fall short on complex reasoning and planning tasks. Tree-search-based reasoning methods address this by encouraging exploration of intermediate steps, surpassing the capabilities of chain-of-thought prompting. However, such methods introduce significant inference latency due to the systematic exploration and evaluation of multiple thought paths. This paper introduces SeeD, a novel and efficient inference framework that optimizes runtime speed and GPU memory management concurrently. By employing scheduled speculative execution, SeeD efficiently handles multiple iterations of thought generation and state evaluation, leveraging a rounds-scheduled strategy to manage draft model dispatching. Extensive experimental evaluations on three reasoning datasets demonstrate the superior speedup performance of SeeD, providing a viable path for batched inference in training-free speculative decoding.
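To make the idea of scheduled speculative execution more concrete, here is a minimal sketch. It is not the authors' implementation. It assumes toy deterministic stand-ins for the models (`target_next`, `draft_next`), greedy verification, and a simple first-come-first-served queue that simulates sequentially how several draft branches might take turns on one shared target model. All function names and parameters are hypothetical.

```python
from collections import deque


def target_next(prefix):
    """Toy stand-in for the large target model (assumption): next token id."""
    return (sum(prefix) + 1) % 7


def draft_next(prefix):
    """Toy stand-in for a small draft model (assumption).

    Agrees with the target except at every fourth position.
    """
    tok = target_next(prefix)
    return tok if len(prefix) % 4 != 3 else (tok + 1) % 7


def draft_propose(prefix, k):
    """Let the draft model speculate k tokens autoregressively."""
    cur, proposal = list(prefix), []
    for _ in range(k):
        tok = draft_next(cur)
        proposal.append(tok)
        cur.append(tok)
    return proposal


def verify(prefix, proposal):
    """Greedy verification: keep draft tokens while they match the target.

    On the first mismatch the target's own token is used instead; if every
    draft token is accepted, the target contributes one extra token.
    """
    cur = list(prefix)
    for tok in proposal:
        expected = target_next(cur)
        if tok != expected:
            cur.append(expected)
            return cur
        cur.append(tok)
    cur.append(target_next(cur))
    return cur


def rounds_scheduled_decode(prefixes, k=3, max_len=8):
    """Serve several branches (e.g. sibling thoughts) with one target model.

    Branches wait in a FIFO queue; each turn is one draft-then-verify round,
    and unfinished branches rejoin the back of the queue.
    """
    seqs = [list(p) for p in prefixes]
    queue = deque(range(len(seqs)))
    while queue:
        i = queue.popleft()
        proposal = draft_propose(seqs[i], k)
        seqs[i] = verify(seqs[i], proposal)[:max_len]
        if len(seqs[i]) < max_len:
            queue.append(i)
    return seqs


def greedy_decode(prefix, max_len):
    """Baseline: plain token-by-token decoding with the target model only."""
    seq = list(prefix)
    while len(seq) < max_len:
        seq.append(target_next(seq))
    return seq
```

In this sketch, every branch ends with exactly the tokens that target-only greedy decoding would produce, so speculation changes only how many target rounds are needed, not the result. A real system would run the draft models concurrently on the GPU and verify proposals with a single batched forward pass of the target model.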
