Get our free extension to see links to code for papers anywhere online!Free add-on: code for papers everywhere!Free add-on: See code for papers anywhere!

Add to Chrome

Add to Firefox

Add to Edge

Ziqi Yuan

Cost-Efficient LLM Training with Lifetime-Aware Tensor Offloading via GPUDirect Storage

Jun 06, 2025

Ziqi Yuan, Haoyang Zhang, Yirui Eric Zhou, Apoorve Mohan, I-Hsin Chung, Seetharami Seelam, Jian Huang

Figure 1 for Cost-Efficient LLM Training with Lifetime-Aware Tensor Offloading via GPUDirect Storage

Figure 2 for Cost-Efficient LLM Training with Lifetime-Aware Tensor Offloading via GPUDirect Storage

Figure 3 for Cost-Efficient LLM Training with Lifetime-Aware Tensor Offloading via GPUDirect Storage

Figure 4 for Cost-Efficient LLM Training with Lifetime-Aware Tensor Offloading via GPUDirect Storage

Abstract:We present the design and implementation of a new lifetime-aware tensor offloading framework for GPU memory expansion using low-cost PCIe-based solid-state drives (SSDs). Our framework, TERAIO, is developed explicitly for large language model (LLM) training with multiple GPUs and multiple SSDs. Its design is driven by our observation that the active tensors take only a small fraction (1.7% on average) of allocated GPU memory in each LLM training iteration, the inactive tensors are usually large and will not be used for a long period of time, creating ample opportunities for offloading/prefetching tensors to/from slow SSDs without stalling the GPU training process. TERAIO accurately estimates the lifetime (active period of time in GPU memory) of each tensor with the profiling of the first few iterations in the training process. With the tensor lifetime analysis, TERAIO will generate an optimized tensor offloading/prefetching plan and integrate it into the compiled LLM program via PyTorch. TERAIO has a runtime tensor migration engine to execute the offloading/prefetching plan via GPUDirect storage, which allows direct tensor migration between GPUs and SSDs for alleviating the CPU bottleneck and maximizing the SSD bandwidth utilization. In comparison with state-of-the-art studies such as ZeRO-Offload and ZeRO-Infinity, we show that TERAIO improves the training performance of various LLMs by 1.47x on average, and achieves 80.7% of the ideal performance assuming unlimited GPU memory.

Via

Access Paper or Ask Questions

Decomposing weather forecasting into advection and convection with neural networks

May 10, 2024

Mengxuan Chen, Ziqi Yuan, Jinxiao Zhang, Runmin Dong, Haohuan Fu

Abstract:Operational weather forecasting models have advanced for decades on both the explicit numerical solvers and the empirical physical parameterization schemes. However, the involved high computational costs and uncertainties in these existing schemes are requiring potential improvements through alternative machine learning methods. Previous works use a unified model to learn the dynamics and physics of the atmospheric model. Contrarily, we propose a simple yet effective machine learning model that learns the horizontal movement in the dynamical core and vertical movement in the physical parameterization separately. By replacing the advection with a graph attention network and the convection with a multi-layer perceptron, our model provides a new and efficient perspective to simulate the transition of variables in atmospheric models. We also assess the model's performance over a 5-day iterative forecasting. Under the same input variables and training methods, our model outperforms existing data-driven methods with a significantly-reduced number of parameters with a resolution of 5.625 deg. Overall, this work aims to contribute to the ongoing efforts that leverage machine learning techniques for improving both the accuracy and efficiency of global weather forecasting.

Via

Access Paper or Ask Questions

PhoGAD: Graph-based Anomaly Behavior Detection with Persistent Homology Optimization

Jan 19, 2024

Ziqi Yuan, Haoyi Zhou, Tianyu Chen, Jianxin Li

Figure 1 for PhoGAD: Graph-based Anomaly Behavior Detection with Persistent Homology Optimization

Figure 2 for PhoGAD: Graph-based Anomaly Behavior Detection with Persistent Homology Optimization

Figure 3 for PhoGAD: Graph-based Anomaly Behavior Detection with Persistent Homology Optimization

Figure 4 for PhoGAD: Graph-based Anomaly Behavior Detection with Persistent Homology Optimization

Abstract:A multitude of toxic online behaviors, ranging from network attacks to anonymous traffic and spam, have severely disrupted the smooth operation of networks. Due to the inherent sender-receiver nature of network behaviors, graph-based frameworks are commonly used for detecting anomalous behaviors. However, in real-world scenarios, the boundary between normal and anomalous behaviors tends to be ambiguous. The local heterophily of graphs interferes with the detection, and existing methods based on nodes or edges introduce unwanted noise into representation results, thereby impacting the effectiveness of detection. To address these issues, we propose PhoGAD, a graph-based anomaly detection framework. PhoGAD leverages persistent homology optimization to clarify behavioral boundaries. Building upon this, the weights of adjacent edges are designed to mitigate the effects of local heterophily. Subsequently, to tackle the noise problem, we conduct a formal analysis and propose a disentangled representation-based explicit embedding method, ultimately achieving anomaly behavior detection. Experiments on intrusion, traffic, and spam datasets verify that PhoGAD has surpassed the performance of state-of-the-art (SOTA) frameworks in detection efficacy. Notably, PhoGAD demonstrates robust detection even with diminished anomaly proportions, highlighting its applicability to real-world scenarios. The analysis of persistent homology demonstrates its effectiveness in capturing the topological structure formed by normal edge features. Additionally, ablation experiments validate the effectiveness of the innovative mechanisms integrated within PhoGAD.

* Accepted by WSDM 2024

Via

Access Paper or Ask Questions

M-SENA: An Integrated Platform for Multimodal Sentiment Analysis

Mar 23, 2022

Huisheng Mao, Ziqi Yuan, Hua Xu, Wenmeng Yu, Yihe Liu, Kai Gao

Figure 1 for M-SENA: An Integrated Platform for Multimodal Sentiment Analysis

Figure 2 for M-SENA: An Integrated Platform for Multimodal Sentiment Analysis

Figure 3 for M-SENA: An Integrated Platform for Multimodal Sentiment Analysis

Figure 4 for M-SENA: An Integrated Platform for Multimodal Sentiment Analysis

Abstract:M-SENA is an open-sourced platform for Multimodal Sentiment Analysis. It aims to facilitate advanced research by providing flexible toolkits, reliable benchmarks, and intuitive demonstrations. The platform features a fully modular video sentiment analysis framework consisting of data management, feature extraction, model training, and result analysis modules. In this paper, we first illustrate the overall architecture of the M-SENA platform and then introduce features of the core modules. Reliable baseline results of different modality features and MSA benchmarks are also reported. Moreover, we use model evaluation and analysis tools provided by M-SENA to present intermediate representation visualization, on-the-fly instance test, and generalization ability test results. The source code of the platform is publicly available at https://github.com/thuiar/M-SENA.

* 11 pages, 4 figures, to be published in ACL 2022 System Demonstration Track

Via

Access Paper or Ask Questions

Learning Modality-Specific Representations with Self-Supervised Multi-Task Learning for Multimodal Sentiment Analysis

Feb 09, 2021

Wenmeng Yu, Hua Xu, Ziqi Yuan, Jiele Wu

Figure 1 for Learning Modality-Specific Representations with Self-Supervised Multi-Task Learning for Multimodal Sentiment Analysis

Figure 2 for Learning Modality-Specific Representations with Self-Supervised Multi-Task Learning for Multimodal Sentiment Analysis

Figure 3 for Learning Modality-Specific Representations with Self-Supervised Multi-Task Learning for Multimodal Sentiment Analysis

Figure 4 for Learning Modality-Specific Representations with Self-Supervised Multi-Task Learning for Multimodal Sentiment Analysis

Abstract:Representation Learning is a significant and challenging task in multimodal learning. Effective modality representations should contain two parts of characteristics: the consistency and the difference. Due to the unified multimodal annotation, existing methods are restricted in capturing differentiated information. However, additional uni-modal annotations are high time- and labor-cost. In this paper, we design a label generation module based on the self-supervised learning strategy to acquire independent unimodal supervisions. Then, joint training the multi-modal and uni-modal tasks to learn the consistency and difference, respectively. Moreover, during the training stage, we design a weight-adjustment strategy to balance the learning progress among different subtasks. That is to guide the subtasks to focus on samples with a larger difference between modality supervisions. Last, we conduct extensive experiments on three public multimodal baseline datasets. The experimental results validate the reliability and stability of auto-generated unimodal supervisions. On MOSI and MOSEI datasets, our method surpasses the current state-of-the-art methods. On the SIMS dataset, our method achieves comparable performance than human-annotated unimodal labels. The full codes are available at https://github.com/thuiar/Self-MM.

* Accepted by AAAI2021

Via

Access Paper or Ask Questions