Jiangang Kong

InstCache: A Predictive Cache for LLM Serving

Nov 21, 2024
Via arXiv

CQIL: Inference Latency Optimization with Concurrent Computation of Quasi-Independent Layers

Apr 10, 2024
Via arXiv