
Haiying Shen

HACK: Homomorphic Acceleration via Compression of the Key-Value Cache for Disaggregated LLM Inference

Feb 05, 2025

Towards Efficient Large Multimodal Model Serving

Feb 02, 2025

Ensuring Fair LLM Serving Amid Diverse Applications

Nov 24, 2024

Zero-Delay QKV Compression for Mitigating KV Cache and Network Bottlenecks in LLM Inference

Aug 07, 2024