Gyouk Chu

Every Expert Matters: Towards Effective Knowledge Distillation for Mixture-of-Experts Language Models

Feb 18, 2025
Via arXiv