Pengfei Zuo

Serving Large Language Models on Huawei CloudMatrix384

Jun 15, 2025

Efficient Unified Caching for Accelerating Heterogeneous AI Workloads

Jun 14, 2025

Injecting Adrenaline into LLM Serving: Boosting Resource Utilization and Throughput via Attention Disaggregation

Mar 26, 2025

AdaSkip: Adaptive Sublayer Skipping for Accelerating Long-Context LLM Inference

Jan 04, 2025

AttentionStore: Cost-effective Attention Reuse across Multi-turn Conversations in Large Language Model Serving

Mar 23, 2024

A Scalable Learned Index Scheme in Storage Systems

May 08, 2019