Zining Liu

Speculative Decoding and Beyond: An In-Depth Review of Techniques

Feb 27, 2025

Yunhai Hu, Zining Liu, Zhenyuan Dong, Tianfan Peng, Bradley McDanel, Sai Qian Zhang

Abstract: Sequential dependencies present a fundamental bottleneck in deploying large-scale autoregressive models, particularly for real-time applications. While traditional optimization approaches like pruning and quantization often compromise model quality, recent advances in generation-refinement frameworks demonstrate that this trade-off can be significantly mitigated. This survey presents a comprehensive taxonomy of generation-refinement frameworks, analyzing methods across autoregressive sequence tasks. We categorize methods based on their generation strategies (from simple n-gram prediction to sophisticated draft models) and refinement mechanisms (including single-pass verification and iterative approaches). Through systematic analysis of both algorithmic innovations and system-level implementations, we examine deployment strategies across computing environments and explore applications spanning text, images, and speech generation. This systematic examination of both theoretical frameworks and practical implementations provides a foundation for future research in efficient autoregressive decoding.
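The generation-refinement idea in this abstract can be illustrated with a minimal sketch of greedy draft-and-verify decoding. This is not code from the survey: the function names, the toy next-token callables, and the block size `k` are all assumptions made for illustration, and real systems verify the drafted tokens in one batched forward pass of the target model rather than one call per token.

```python
from typing import Callable, List

Token = int
NextToken = Callable[[List[Token]], Token]  # hypothetical "model": context -> next token


def greedy_generate(target: NextToken, prompt: List[Token], max_new_tokens: int) -> List[Token]:
    """Plain autoregressive baseline: one target call per generated token."""
    out = list(prompt)
    for _ in range(max_new_tokens):
        out.append(target(out))
    return out


def speculative_generate(
    target: NextToken,
    draft: NextToken,
    prompt: List[Token],
    max_new_tokens: int,
    k: int = 4,  # assumed draft block size
) -> List[Token]:
    """Greedy draft-and-verify loop; output matches greedy_generate(target, ...)."""
    out = list(prompt)
    limit = len(prompt) + max_new_tokens
    while len(out) < limit:
        # Generation: the cheap draft model proposes up to k tokens.
        proposal: List[Token] = []
        for _ in range(min(k, limit - len(out))):
            proposal.append(draft(out + proposal))

        # Refinement (single-pass verification): keep the longest prefix the
        # target agrees with, and replace the first mismatch with the target's token.
        accepted: List[Token] = []
        for tok in proposal:
            expected = target(out + accepted)
            accepted.append(expected)
            if expected != tok:
                break
        out.extend(accepted)
    return out
```

Because every accepted token is one the target model would have produced itself, the output is identical to plain greedy decoding with the target; the potential speed-up comes only from how many drafted tokens are accepted per verification step.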

Which Channel to Ask My Question? Personalized Customer Service Request Stream Routing using Deep Reinforcement Learning

Nov 24, 2019

Zining Liu, Chong Long, Xiaolu Lu, Zehong Hu, Jie Zhang, Yafang Wang

Abstract: Customer service is critical to every company because it may directly affect brand reputation. Because they serve a large number of customers, e-commerce companies often employ multiple communication channels, such as chatbots and hotlines, to answer customers' questions. On the one hand, each channel has limited capacity to respond to customers' requests; on the other hand, customers have different preferences over these channels. Current production systems are mainly built on business rules, which merely consider tradeoffs between resources and customers' satisfaction. To achieve the optimal tradeoff between resources and customers' satisfaction, we propose a new framework based on deep reinforcement learning, which directly takes both resources and the user model into account. In addition to the framework, we also propose a new deep-reinforcement-learning-based routing method: double dueling deep Q-learning with prioritized experience replay (PER-DoDDQN). We evaluate our proposed framework and method using both synthetic data and real customer service log data from a large financial technology company. We show that our proposed deep-reinforcement-learning-based framework is superior to the existing production system. Moreover, we show that PER-DoDDQN outperforms all other deep Q-learning variants in practice, yielding a better routing plan. These observations suggest that our proposed method can find a trade-off in which both channel resources and customers' satisfaction are optimal.
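The three ingredients named in this abstract (dueling value decomposition, the double Q-learning target, and prioritized experience replay) can be sketched in their standard textbook forms. This is not the authors' PER-DoDDQN implementation: the function names, the default `alpha` and `eps` values, and the reading of each action as a service channel are assumptions made for illustration.

```python
from typing import List


def dueling_q(value: float, advantages: List[float]) -> List[float]:
    """Dueling aggregation: Q(s, a) = V(s) + A(s, a) - mean_a A(s, a)."""
    mean_adv = sum(advantages) / len(advantages)
    return [value + a - mean_adv for a in advantages]


def double_dqn_target(
    reward: float,
    gamma: float,
    next_q_online: List[float],  # online network's Q-values for the next state
    next_q_target: List[float],  # target network's Q-values for the next state
    done: bool,
) -> float:
    """Double Q-learning target: the online net picks the action, the target net scores it."""
    if done:
        return reward
    best_action = max(range(len(next_q_online)), key=next_q_online.__getitem__)
    return reward + gamma * next_q_target[best_action]


def per_priority(td_error: float, alpha: float = 0.6, eps: float = 1e-6) -> float:
    """Prioritized replay: priority grows with |TD error|; eps keeps it strictly positive."""
    return (abs(td_error) + eps) ** alpha


# Hypothetical two-channel example (say, action 0 = chatbot, action 1 = hotline).
q_now = dueling_q(value=1.0, advantages=[0.0, 2.0])
target = double_dqn_target(1.0, 0.5, next_q_online=[0.1, 0.9], next_q_target=[4.0, 2.0], done=False)
priority = per_priority(target - q_now[1])
```

In the example the online network prefers action 1, so the target uses the target network's value for action 1 (2.0) even though the target network itself rates action 0 higher; decoupling selection from evaluation in this way is what the double Q-learning target is meant to achieve.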

* 13 pages, 7 figures
