Title: Practical offloading for fine-tuning LLM on commodity GPU via learned subspace projectors

Jun 14, 2024

Siyuan Chen, Zelong Guan, Yudong Liu, Phillip B. Gibbons

Abstract: Fine-tuning large language models (LLMs) requires significant memory, often exceeding the capacity of a single GPU. A common solution to this memory challenge is offloading compute and data from the GPU to the CPU. However, this approach is hampered by the limited bandwidth of commodity hardware, which constrains communication between the CPU and GPU. In this paper, we present an offloading framework, LSP_Offload, that enables near-native speed LLM fine-tuning on commodity hardware through learned subspace projectors. Our data-driven approach learns an efficient sparse compressor that reduces communication volume with minimal loss of precision. Additionally, we introduce a novel layer-wise communication schedule to maximize the overlap between communication and computation. As a result, our framework can fine-tune a 1.3-billion-parameter model on a 4GB laptop GPU and a 7-billion-parameter model on an NVIDIA RTX 4090 GPU with 24GB of memory, with only a 31% slowdown compared to fine-tuning with unlimited memory. Compared to state-of-the-art offloading frameworks, our approach increases fine-tuning throughput by up to 3.33 times and reduces end-to-end fine-tuning time by 33.1% to 62.5% when converging to the same accuracy.
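To make the idea of compressing GPU-to-CPU traffic with subspace projectors more concrete, the following is a minimal NumPy sketch. It is not the LSP_Offload implementation: the abstract describes learned, sparse projectors, whereas this sketch substitutes fixed random orthonormal projectors as a stand-in, and the names `make_projector`, `compress`, and `decompress` are hypothetical. It only illustrates how projecting a gradient matrix onto a low-dimensional subspace shrinks what would need to cross the CPU-GPU link.

```python
import numpy as np

def make_projector(dim, rank, rng):
    # ASSUMPTION: a random orthonormal basis stands in for the learned,
    # sparse projector described in the abstract.
    basis, _ = np.linalg.qr(rng.standard_normal((dim, rank)))
    return basis  # shape (dim, rank), orthonormal columns

def compress(grad, p, q):
    # Project an (m, n) gradient down to a (rank, rank) matrix.
    return p.T @ grad @ q

def decompress(small, p, q):
    # Map the (rank, rank) matrix back to the original (m, n) shape.
    return p @ small @ q.T

rng = np.random.default_rng(0)
m, n, rank = 64, 48, 8
p = make_projector(m, rank, rng)
q = make_projector(n, rank, rng)

# A gradient that lies exactly in the projected subspace round-trips exactly;
# a general gradient would be approximated, not recovered.
core = rng.standard_normal((rank, rank))
grad_in_subspace = p @ core @ q.T

small = compress(grad_in_subspace, p, q)
restored = decompress(small, p, q)

# Ratio of values in the full gradient to values actually transferred.
ratio = grad_in_subspace.size / small.size
```

In this toy setting only `small` (8 x 8 values) would be transferred instead of the full 64 x 48 gradient. How well a real gradient is approximated depends on how the projectors are chosen, which is the part the paper addresses by learning them from data.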
