Jiangang Kong

InstCache: A Predictive Cache for LLM Serving

Nov 21, 2024

CQIL: Inference Latency Optimization with Concurrent Computation of Quasi-Independent Layers

Apr 10, 2024