Zhengang Wang

Speculative MoE: Communication Efficient Parallel MoE Inference with Speculative Token and Expert Pre-scheduling

Mar 07, 2025
Via arXiv