Yiyuan Ma

FlexPrefill: A Context-Aware Sparse Attention Mechanism for Efficient Long-Sequence Inference

Feb 28, 2025
Via arXiv