Xingkun Yang

RelayGR: Scaling Long-Sequence Generative Recommendation via Cross-Stage Relay-Race Inference

Jan 05, 2026

Serving Large Language Models on Huawei CloudMatrix384

Jun 15, 2025

AttentionStore: Cost-effective Attention Reuse across Multi-turn Conversations in Large Language Model Serving

Mar 23, 2024