
Michał Krutul

Joint MoE Scaling Laws: Mixture of Experts Can Be Memory Efficient

Feb 07, 2025

Scaling Laws for Fine-Grained Mixture of Experts

Feb 12, 2024

Mixture of Tokens: Efficient LLMs through Cross-Example Aggregation

Oct 24, 2023