Title: EE-LLM: Large-Scale Training and Inference of Early-Exit Large Language Models with 3D Parallelism

Dec 08, 2023

Yanxi Chen, Xuchen Pan, Yaliang Li, Bolin Ding, Jingren Zhou

Abstract: We present EE-LLM, a framework for large-scale training and inference of early-exit large language models (LLMs). While recent works have shown preliminary evidence that early exiting can accelerate LLM inference, EE-LLM takes a foundational step towards scaling up early-exit LLMs by supporting their training and inference with massive 3D parallelism. Built upon Megatron-LM, EE-LLM implements a variety of algorithmic innovations and performance optimizations tailored to early exiting. These include a lightweight method that facilitates backpropagation for the early-exit training objective under pipeline parallelism, techniques for using idle resources in the original pipeline schedule for computation related to early-exit layers, and two approaches to early-exit inference that are compatible with KV caching for autoregressive generation. Our analytical and empirical study shows that EE-LLM achieves great training efficiency with negligible computational overhead compared to standard LLM training, as well as outstanding inference speedup without compromising output quality. To facilitate further research and adoption, we release EE-LLM at https://github.com/pan-x-c/EE-LLM.

* We will continuously update the codebase and the arXiv version.
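
For readers new to early exiting, the general idea behind the abstract can be illustrated with a toy sketch. This is not EE-LLM's implementation and uses none of its APIs. The function `early_exit_predict`, the confidence-threshold exit rule, and the toy layers and heads are all assumptions made for illustration. The sketch also ignores KV caching and parallelism, which are the parts the paper actually addresses.

```python
import math
from typing import Callable, Dict, List, Sequence, Tuple

Vector = List[float]


def softmax(logits: Sequence[float]) -> Vector:
    """Numerically stable softmax over a list of logits."""
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def early_exit_predict(
    hidden: Vector,
    layers: Sequence[Callable[[Vector], Vector]],
    exit_heads: Dict[int, Callable[[Vector], Vector]],
    threshold: float = 0.9,
) -> Tuple[int, int]:
    """Return (token_id, index of the layer where the model exited).

    Hypothetical exit rule (an assumption, not taken from the paper):
    stop at the first exit head whose top probability reaches `threshold`;
    the head on the final layer always produces an answer.
    """
    if not layers:
        raise ValueError("layers must be non-empty")
    last = len(layers) - 1
    if last not in exit_heads:
        raise ValueError("the final layer needs an output head")
    for i, layer in enumerate(layers):
        hidden = layer(hidden)
        head = exit_heads.get(i)
        if head is None:
            continue  # no exit attached to this layer
        probs = softmax(head(hidden))
        confidence = max(probs)
        if confidence >= threshold or i == last:
            return probs.index(confidence), i
    raise AssertionError("unreachable: the final head always returns")


# Toy model: each layer doubles the hidden state, so the exit heads
# (identity maps here) become more confident with depth.
toy_layers = [lambda h: [2.0 * x for x in h]] * 3
toy_heads = {0: lambda h: h, 1: lambda h: h, 2: lambda h: h}

token, exit_layer = early_exit_predict([1.0, 0.0], toy_layers, toy_heads)
print(token, exit_layer)
```

In this toy setup, a lower `threshold` makes the model exit earlier and skip more layers, while a threshold above 1 disables early exiting and always runs the full stack.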
