Zijie Yan

Llama 3 Meets MoE: Efficient Upcycling

Dec 13, 2024
Via arXiv

Upcycling Large Language Models into Mixture of Experts

Oct 10, 2024
Via arXiv

Gradient Sparification for Asynchronous Distributed Training

Oct 24, 2019
Via arXiv