Zijie Yan

Llama 3 Meets MoE: Efficient Upcycling

Dec 13, 2024
Via arXiv

Upcycling Large Language Models into Mixture of Experts

Oct 10, 2024
Via arXiv

Gradient Sparification for Asynchronous Distributed Training

Oct 24, 2019
Via arXiv