Rongxin Cheng

KunServe: Elastic and Efficient Large Language Model Serving with Parameter-centric Memory Management

Dec 24, 2024
Via arXiv

Characterizing the Dilemma of Performance and Index Size in Billion-Scale Vector Search and Breaking It with Second-Tier Memory

May 07, 2024
Via arXiv