Varun Yerram

HiRE: High Recall Approximate Top-$k$ Estimation for Efficient LLM Inference

Feb 14, 2024
Via arXiv