Get our free extension to see links to code for papers anywhere online!Free add-on: code for papers everywhere!Free add-on: See code for papers anywhere!

Add to Chrome

Add to Firefox

Add to Edge

Tongping Liu

Understanding and Alleviating Memory Consumption in RLHF for LLMs

Oct 21, 2024

Jin Zhou, Hanmei Yang, Steven, Tang, Mingcan Xiang, Hui Guan, Tongping Liu

Figure 1 for Understanding and Alleviating Memory Consumption in RLHF for LLMs

Figure 2 for Understanding and Alleviating Memory Consumption in RLHF for LLMs

Figure 3 for Understanding and Alleviating Memory Consumption in RLHF for LLMs

Abstract:Fine-tuning with Reinforcement Learning with Human Feedback (RLHF) is essential for aligning large language models (LLMs). However, RLHF often encounters significant memory challenges. This study is the first to examine memory usage in the RLHF context, exploring various memory management strategies and unveiling the reasons behind excessive memory consumption. Additionally, we introduce a simple yet effective approach that substantially reduces the memory required for RLHF fine-tuning.

Via

Access Paper or Ask Questions

AdapMTL: Adaptive Pruning Framework for Multitask Learning Model

Aug 07, 2024

Mingcan Xiang, Steven Jiaxun Tang, Qizheng Yang, Hui Guan, Tongping Liu

Abstract:In the domain of multimedia and multimodal processing, the efficient handling of diverse data streams such as images, video, and sensor data is paramount. Model compression and multitask learning (MTL) are crucial in this field, offering the potential to address the resource-intensive demands of processing and interpreting multiple forms of media simultaneously. However, effectively compressing a multitask model presents significant challenges due to the complexities of balancing sparsity allocation and accuracy performance across multiple tasks. To tackle these challenges, we propose AdapMTL, an adaptive pruning framework for MTL models. AdapMTL leverages multiple learnable soft thresholds independently assigned to the shared backbone and the task-specific heads to capture the nuances in different components' sensitivity to pruning. During training, it co-optimizes the soft thresholds and MTL model weights to automatically determine the suitable sparsity level at each component to achieve both high task accuracy and high overall sparsity. It further incorporates an adaptive weighting mechanism that dynamically adjusts the importance of task-specific losses based on each task's robustness to pruning. We demonstrate the effectiveness of AdapMTL through comprehensive experiments on popular multitask datasets, namely NYU-v2 and Tiny-Taskonomy, with different architectures, showcasing superior performance compared to state-of-the-art pruning methods.

* 13 pages, 9 figures, Published at ACM Multimedia (ACM MM) 2024

Via

Access Paper or Ask Questions

ProTrain: Efficient LLM Training via Memory-Aware Techniques

Jun 12, 2024

Hanmei Yang, Jin Zhou, Yao Fu, Xiaoqun Wang, Ramine Roane, Hui Guan, Tongping Liu

Figure 1 for ProTrain: Efficient LLM Training via Memory-Aware Techniques

Figure 2 for ProTrain: Efficient LLM Training via Memory-Aware Techniques

Figure 3 for ProTrain: Efficient LLM Training via Memory-Aware Techniques

Figure 4 for ProTrain: Efficient LLM Training via Memory-Aware Techniques

Abstract:It is extremely memory-hungry to train Large Language Models (LLM). To solve this problem, existing work exploits the combination of CPU and GPU for the training process, such as ZeRO-Offload. Such a technique largely democratizes billion-scale model training, making it possible to train with few consumer graphics cards. However, based on our observation, existing frameworks often provide coarse-grained memory management and require experienced experts in configuration tuning, leading to suboptimal hardware utilization and performance. This paper proposes ProTrain, a novel training system that intelligently balances memory usage and performance by coordinating memory, computation, and IO. ProTrain achieves adaptive memory management through Chunk-Based Model State Management and Block-Wise Activation Management, guided by a Memory-Aware Runtime Profiler without user intervention. ProTrain does not change the training algorithm and thus does not compromise accuracy. Experiments show that ProTrain improves training throughput by 1.43$\times$ to 2.71$\times$ compared to the SOTA training systems.

Via

Access Paper or Ask Questions