Get our free extension to see links to code for papers anywhere online!Free add-on: code for papers everywhere!Free add-on: See code for papers anywhere!

Add to Chrome

Add to Firefox

Add to Edge

Xiaoqun Wang

ProTrain: Efficient LLM Training via Memory-Aware Techniques

Jun 12, 2024

Hanmei Yang, Jin Zhou, Yao Fu, Xiaoqun Wang, Ramine Roane, Hui Guan, Tongping Liu

Figure 1 for ProTrain: Efficient LLM Training via Memory-Aware Techniques

Figure 2 for ProTrain: Efficient LLM Training via Memory-Aware Techniques

Figure 3 for ProTrain: Efficient LLM Training via Memory-Aware Techniques

Figure 4 for ProTrain: Efficient LLM Training via Memory-Aware Techniques

Abstract:It is extremely memory-hungry to train Large Language Models (LLM). To solve this problem, existing work exploits the combination of CPU and GPU for the training process, such as ZeRO-Offload. Such a technique largely democratizes billion-scale model training, making it possible to train with few consumer graphics cards. However, based on our observation, existing frameworks often provide coarse-grained memory management and require experienced experts in configuration tuning, leading to suboptimal hardware utilization and performance. This paper proposes ProTrain, a novel training system that intelligently balances memory usage and performance by coordinating memory, computation, and IO. ProTrain achieves adaptive memory management through Chunk-Based Model State Management and Block-Wise Activation Management, guided by a Memory-Aware Runtime Profiler without user intervention. ProTrain does not change the training algorithm and thus does not compromise accuracy. Experiments show that ProTrain improves training throughput by 1.43$\times$ to 2.71$\times$ compared to the SOTA training systems.

Via

Access Paper or Ask Questions

Convergence analysis of a quasi-Monte Carlo-based deep learning algorithm for solving partial differential equations

Oct 28, 2022

Fengjiang Fu, Xiaoqun Wang

Abstract:Deep learning methods have achieved great success in solving partial differential equations (PDEs), where the loss is often defined as an integral. The accuracy and efficiency of these algorithms depend greatly on the quadrature method. We propose to apply quasi-Monte Carlo (QMC) methods to the Deep Ritz Method (DRM) for solving the Neumann problems for the Poisson equation and the static Schr\"{o}dinger equation. For error estimation, we decompose the error of using the deep learning algorithm to solve PDEs into the generalization error, the approximation error and the training error. We establish the upper bounds and prove that QMC-based DRM achieves an asymptotically smaller error bound than DRM. Numerical experiments show that the proposed method converges faster in all cases and the variances of the gradient estimators of randomized QMC-based DRM are much smaller than those of DRM, which illustrates the superiority of QMC in deep learning over MC.

* 27 pages, 4 figures, 2 tables

Via

Access Paper or Ask Questions