Title: Chatterbox: Robust Transport for LLM Token Streaming under Unstable Network

Jan 23, 2024

Hanchen Li, Yuhan Liu, Yihua Cheng, Siddhant Ray, Kuntai Du, Junchen Jiang

Abstract: To render each generated token in real time, an LLM server generates response tokens one by one and streams each token (or a small group of tokens) over the network to the user as soon as it is generated. We refer to this as LLM token streaming. Under unstable network conditions, however, the token streaming experience can suffer greatly from stalls, because a single packet loss can block the rendering of tokens contained in subsequent packets even if those packets arrive on time. With a real-world measurement study, we show that current applications, including ChatGPT, Claude, and Bard, all suffer from increased stalls under unstable networks. For this emerging token streaming problem in LLM chatbots, we propose a novel transport-layer scheme called Chatterbox, which puts newly generated tokens as well as currently unacknowledged tokens in the next outgoing packet. This ensures that each packet contains some new tokens and can be rendered independently when received, thus avoiding the stalls caused by missing packets. Through simulation under various network conditions, we show that Chatterbox reduces the stall ratio (the proportion of token rendering wait time) by 71.0% compared to the token streaming method commonly used by real chatbot applications, and by 31.6% compared to a custom packet duplication scheme. By tailoring Chatterbox to the token-by-token generation of LLMs, we enable chatbots to respond like an eloquent speaker, so that users can better enjoy pervasive AI.
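The packing rule described in the abstract (each outgoing packet carries the new tokens plus every token not yet acknowledged) can be illustrated with a minimal Python sketch. This is not the authors' implementation: the names `Packet`, `Sender`, and `Receiver`, the cumulative-acknowledgement model, and the in-memory "network" are all assumptions made for illustration.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class Packet:
    start: int         # index (in the full response) of the first token carried
    tokens: List[str]  # unacknowledged tokens followed by newly generated ones


class Sender:
    """Hypothetical sender: resends every unacknowledged token in each packet."""

    def __init__(self) -> None:
        self.tokens: List[str] = []  # all tokens generated so far
        self.acked = 0               # count of tokens acknowledged (assumed cumulative)

    def send(self, new_tokens: List[str]) -> Packet:
        self.tokens.extend(new_tokens)
        # Carry everything from the first unacknowledged token onward.
        return Packet(self.acked, self.tokens[self.acked:])

    def on_ack(self, count: int) -> None:
        self.acked = max(self.acked, count)


class Receiver:
    """Hypothetical receiver: renders any tokens it has not rendered yet."""

    def __init__(self) -> None:
        self.rendered: List[str] = []

    def receive(self, packet: Packet) -> int:
        offset = len(self.rendered) - packet.start
        if offset >= 0:
            # Skip tokens already rendered; append the rest in order.
            self.rendered.extend(packet.tokens[offset:])
        # Return the cumulative token count to acknowledge.
        return len(self.rendered)


sender, receiver = Sender(), Receiver()

p1 = sender.send(["Hello"])
sender.on_ack(receiver.receive(p1))  # delivered and acknowledged

p2 = sender.send([","])              # assume this packet is lost in transit

p3 = sender.send([" world"])         # carries "," again plus the new token
receiver.receive(p3)                 # renders both without waiting for p2

print("".join(receiver.rendered))
```

In this sketch the loss of `p2` causes no stall, because `p3` is self-sufficient: it repeats the unacknowledged token alongside the new one. The cost is extra bytes per packet while acknowledgements are outstanding.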
