
Yongwei Wu

From Prefix Cache to Fusion RAG Cache: Accelerating LLM Inference in Retrieval-Augmented Generation

Jan 19, 2026

Seer: Online Context Learning for Fast Synchronous LLM Reinforcement Learning

Nov 18, 2025

Mooncake: A KVCache-centric Disaggregated Architecture for LLM Serving

Jul 02, 2024

Efficient and Economic Large Language Model Inference with Attention Offloading

May 03, 2024