Ladder-residual: parallelism-aware architecture for accelerating large model inference with communication overlapping

Jan 11, 2025