Get our free extension to see links to code for papers anywhere online!Free add-on: code for papers everywhere!Free add-on: See code for papers anywhere!

Add to Chrome

Add to Firefox

Add to Edge

Rui Kang

WarmServe: Enabling One-for-Many GPU Prewarming for Multi-LLM Serving

Dec 10, 2025

Chiheng Lou, Sheng Qi, Rui Kang, Yong Zhang, Chen Sun, Pengcheng Wang, Bingyang Liu, Xuanzhe Liu, Xin Jin

Figure 1 for WarmServe: Enabling One-for-Many GPU Prewarming for Multi-LLM Serving

Figure 2 for WarmServe: Enabling One-for-Many GPU Prewarming for Multi-LLM Serving

Figure 3 for WarmServe: Enabling One-for-Many GPU Prewarming for Multi-LLM Serving

Figure 4 for WarmServe: Enabling One-for-Many GPU Prewarming for Multi-LLM Serving

Abstract:Deploying multiple models within shared GPU clusters is promising for improving resource efficiency in large language model (LLM) serving. Existing multi-LLM serving systems optimize GPU utilization at the cost of worse inference performance, especially time-to-first-token (TTFT). We identify the root cause of such compromise as their unawareness of future workload characteristics. In contrast, recent analysis on real-world traces has shown the high periodicity and long-term predictability of LLM serving workloads. We propose universal GPU workers to enable one-for-many GPU prewarming that loads models with knowledge of future workloads. Based on universal GPU workers, we design and build WarmServe, a multi-LLM serving system that (1) mitigates cluster-wide prewarming interference by adopting an evict-aware model placement strategy, (2) prepares universal GPU workers in advance by proactive prewarming, and (3) manages GPU memory with a zero-overhead memory switching mechanism. Evaluation under real-world datasets shows that WarmServe improves TTFT by up to 50.8$\times$ compared to the state-of-the-art autoscaling-based system, while being capable of serving up to 2.5$\times$ more requests compared to the GPU-sharing system.

Via

Access Paper or Ask Questions

CNN-Based Automatic Urinary Particles Recognition

Mar 06, 2018

Rui Kang, Yixiong Liang, Chunyan Lian, Yuan Mao

Figure 1 for CNN-Based Automatic Urinary Particles Recognition

Figure 2 for CNN-Based Automatic Urinary Particles Recognition

Figure 3 for CNN-Based Automatic Urinary Particles Recognition

Figure 4 for CNN-Based Automatic Urinary Particles Recognition

Abstract:The urine sediment analysis of particles in microscopic images can assist physicians in evaluating patients with renal and urinary tract diseases. Manual urine sediment examination is labor-intensive, subjective and time-consuming, and the traditional automatic algorithms often extract the hand-crafted features for recognition. Instead of using the hand-crafted features, in this paper, we exploit CNN to learn features in an end-to-end manner to recognize the urine particles. We treat the urine particles recognition as object detection and exploit two state-of-the-art CNN-based object detection methods, Faster R-CNN and SSD, as well as their variants for urine particles recognition. We further investigate different factors involving these CNN-based object detection methods for urine particles recognition. We comprehensively evaluate these methods on a dataset consisting of 5,376 annotated images corresponding to 7 categories of urine particles, i.e., erythrocyte, leukocyte, epithelial cell, crystal, cast, mycete, epithelial nuclei, and obtain a best mAP (mean average precision) of 84.1% while taking only 72 ms per image on a NVIDIA Titan X GPU.

* The manuscript has been submitted to Journal of Medical Systems on Jul 02. 2017

Via

Access Paper or Ask Questions