Get our free extension to see links to code for papers anywhere online!Free add-on: code for papers everywhere!Free add-on: See code for papers anywhere!

Add to Chrome

Add to Firefox

Add to Edge

Gan Luo

Optimizing LLM Inference: Fluid-Guided Online Scheduling with Memory Constraints

Apr 15, 2025

Ruicheng Ao, Gan Luo, David Simchi-Levi, Xinshang Wang

Abstract:Large Language Models (LLMs) are indispensable in today's applications, but their inference procedure -- generating responses by processing text in segments and using a memory-heavy Key-Value (KV) cache -- demands significant computational resources, particularly under memory constraints. This paper formulates LLM inference optimization as a multi-stage online scheduling problem where sequential prompt arrivals and KV cache growth render conventional scheduling ineffective. We develop a fluid dynamics approximation to provide a tractable benchmark that guides algorithm design. Building on this, we propose the Waiting for Accumulated Inference Threshold (WAIT) algorithm, which uses multiple thresholds to schedule incoming prompts optimally when output lengths are known, and extend it to Nested WAIT for cases with unknown output lengths. Theoretical analysis shows that both algorithms achieve near-optimal performance against the fluid benchmark in heavy traffic conditions, balancing throughput, latency, and Time to First Token (TTFT). Experiments with the Llama-7B model on an A100 GPU using both synthetic and real-world datasets demonstrate improved throughput and reduced latency relative to established baselines like vLLM and Sarathi. This work bridges operations research and machine learning, offering a rigorous framework for the efficient deployment of LLMs under memory constraints.

* 42 pages, 18 figures

Via

Access Paper or Ask Questions

Course Concept Expansion in MOOCs with External Knowledge and Interactive Game

Sep 17, 2019

Jifan Yu, Chenyu Wang, Gan Luo, Lei Hou, Juanzi Li, Jie Tang, Zhiyuan Liu

Figure 1 for Course Concept Expansion in MOOCs with External Knowledge and Interactive Game

Figure 2 for Course Concept Expansion in MOOCs with External Knowledge and Interactive Game

Figure 3 for Course Concept Expansion in MOOCs with External Knowledge and Interactive Game

Figure 4 for Course Concept Expansion in MOOCs with External Knowledge and Interactive Game

Abstract:As Massive Open Online Courses (MOOCs) become increasingly popular, it is promising to automatically provide extracurricular knowledge for MOOC users. Suffering from semantic drifts and lack of knowledge guidance, existing methods can not effectively expand course concepts in complex MOOC environments. In this paper, we first build a novel boundary during searching for new concepts via external knowledge base and then utilize heterogeneous features to verify the high-quality results. In addition, to involve human efforts in our model, we design an interactive optimization mechanism based on a game. Our experiments on the four datasets from Coursera and XuetangX show that the proposed method achieves significant improvements(+0.19 by MAP) over existing methods. The source code and datasets have been published.

Via

Access Paper or Ask Questions