Mogens Henrik From

FlexDeMo: Decoupled Momentum Optimization for Fully and Hybrid Sharded Training

Feb 10, 2025
Via arXiv