Ladder-residual: parallelism-aware architecture for accelerating large model inference with communication overlapping

Jan 11, 2025
Figures 1-4 for Ladder-residual: parallelism-aware architecture for accelerating large model inference with communication overlapping

View paper on arXiv