Abstract: Large Language Models (LLMs) have demonstrated exceptional performance across diverse tasks, yet their training remains highly resource-intensive and susceptible to critical challenges such as training instability. A predominant source of this instability is gradient and loss spikes, which disrupt the learning process and often force costly interventions such as checkpoint recovery and experiment restarts, further amplifying inefficiencies. This paper presents a comprehensive investigation of gradient spikes observed during LLM training, revealing their prevalence across multiple architectures and datasets. Our analysis shows that these spikes can be up to $1000\times$ larger than typical gradients, substantially deteriorating model performance. To address this issue, we propose Spike-Aware Adam with Momentum Reset (SPAM), a novel optimizer designed to counteract gradient spikes through momentum reset and spike-aware gradient clipping. Extensive experiments, covering both pre-training and fine-tuning, demonstrate that SPAM consistently surpasses Adam and its variants across various tasks, including (1) LLM pre-training from 60M to 1B parameters, (2) 4-bit LLM pre-training, (3) reinforcement learning, and (4) time series forecasting. Additionally, SPAM facilitates memory-efficient training by enabling sparse momentum, in which only a subset of momentum terms is maintained and updated. Under memory constraints, SPAM outperforms state-of-the-art memory-efficient optimizers such as GaLore and Adam-Mini. Our work underscores the importance of mitigating gradient spikes in LLM training and introduces an effective optimization strategy that enhances both training stability and resource efficiency at scale. Code is available at https://github.com/TianjinYellow/SPAM-Optimizer.git
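To make the two mechanisms named in the abstract concrete, here is a minimal sketch of a SPAM-style update step, assuming a per-entry spike test against the Adam second-moment estimate and a fixed reset interval. The names and values `spike_theta` and `reset_interval` are illustrative assumptions, not the paper's settings; the authors' actual implementation is in the linked repository.

```python
import torch

def spam_like_step(param, exp_avg, exp_avg_sq, step, lr=1e-3,
                   betas=(0.9, 0.999), eps=1e-8,
                   spike_theta=50.0, reset_interval=500):
    """One Adam-style update with spike-aware clipping and momentum reset.

    `step` is the 0-based global step; `exp_avg` / `exp_avg_sq` are the
    Adam first/second-moment buffers for `param`. Hypothetical sketch,
    not the authors' code.
    """
    grad = param.grad.detach()
    beta1, beta2 = betas

    # Momentum reset: periodically discard moment estimates that may have
    # been contaminated by earlier gradient spikes.
    if step % reset_interval == 0:
        exp_avg.zero_()
        exp_avg_sq.zero_()
    else:
        # Spike-aware clipping (assumed form): entries whose squared
        # gradient far exceeds the running second moment are treated as
        # spikes and rescaled down to the threshold.
        threshold = spike_theta * exp_avg_sq
        spike = grad.pow(2) > threshold
        grad = torch.where(spike, grad.sign() * threshold.sqrt(), grad)

    # Standard Adam moment updates, with bias correction counted from the
    # most recent reset.
    exp_avg.mul_(beta1).add_(grad, alpha=1 - beta1)
    exp_avg_sq.mul_(beta2).addcmul_(grad, grad, value=1 - beta2)
    t = step % reset_interval + 1
    denom = (exp_avg_sq / (1 - beta2 ** t)).sqrt().add_(eps)
    param.data.addcdiv_(exp_avg / (1 - beta1 ** t), denom, value=-lr)
```

The sparse-momentum variant mentioned in the abstract would additionally restrict `exp_avg` and `exp_avg_sq` to a subset of coordinates; that bookkeeping is omitted here for brevity.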
Abstract: This paper proposes TBDLNet, a novel deep-learning model that automatically classifies CT images as multidrug-resistant or drug-sensitive tuberculosis. A pre-trained ResNet50 is used to extract features, and three randomized neural networks (RNNs) are trained on them to alleviate overfitting; an ensemble of the three RNNs boosts robustness via majority voting. The proposed model is evaluated by five-fold cross-validation on five indexes: accuracy, sensitivity, precision, F1-score, and specificity. TBDLNet achieves 0.9822 accuracy, 0.9829 sensitivity, 0.9823 precision, 0.9826 F1-score, and 0.9815 specificity, making it suitable for classifying multidrug-resistant and drug-sensitive tuberculosis. By detecting multidrug-resistant pulmonary tuberculosis as early as possible, it helps clinicians adjust treatment plans in time and improve treatment outcomes.
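For concreteness, the following is a hypothetical sketch of the pipeline the abstract describes: frozen pre-trained ResNet50 features feeding three randomized networks whose predictions are combined by majority voting. The abstract does not specify the RNN variants or hyperparameters, so `RandomizedNN` (an ELM-style network), `n_hidden`, and the ridge regularizer below are all assumptions.

```python
import numpy as np
import torch
from torchvision.models import resnet50, ResNet50_Weights

# Frozen, pre-trained ResNet50 as a feature extractor (final FC removed).
backbone = resnet50(weights=ResNet50_Weights.DEFAULT)
backbone.fc = torch.nn.Identity()
backbone.eval()

@torch.no_grad()
def extract_features(images):            # images: (N, 3, 224, 224) tensor
    return backbone(images).numpy()      # (N, 2048) feature matrix

class RandomizedNN:
    """ELM-style randomized network: fixed random hidden layer,
    output weights solved in closed form (assumed RNN variant)."""
    def __init__(self, n_hidden=1000, seed=0):
        self.rng = np.random.default_rng(seed)
        self.n_hidden = n_hidden

    def fit(self, X, y):                 # y: 0 = DS-TB, 1 = MDR-TB
        self.W = self.rng.standard_normal((X.shape[1], self.n_hidden))
        H = np.tanh(X @ self.W)          # random hidden features
        # Ridge-regularized least squares for the output weights.
        self.beta = np.linalg.solve(
            H.T @ H + 1e-2 * np.eye(self.n_hidden), H.T @ y)
        return self

    def predict(self, X):
        return (np.tanh(X @ self.W) @ self.beta > 0.5).astype(int)

def majority_vote(models, X):
    votes = np.stack([m.predict(X) for m in models])  # (3, N)
    return (votes.sum(axis=0) >= 2).astype(int)       # 2-of-3 vote

# Usage sketch: three differently seeded RNNs on the same features.
# feats, labels = extract_features(train_images), train_labels
# ensemble = [RandomizedNN(seed=s).fit(feats, labels) for s in (0, 1, 2)]
# preds = majority_vote(ensemble, extract_features(test_images))
```

Using three differently seeded randomized networks keeps training cheap (each output layer is a single linear solve) while the 2-of-3 vote smooths over the variance introduced by the random hidden weights.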