Emre Onal

Gaussian Stochastic Weight Averaging for Bayesian Low-Rank Adaptation of Large Language Models

May 06, 2024

Emre Onal, Klemens Flöge, Emma Caldwell, Arsen Sheverdin, Vincent Fortuin

Abstract: Fine-tuned Large Language Models (LLMs) often suffer from overconfidence and poor calibration, particularly when fine-tuned on small datasets. To address these challenges, we propose a simple combination of Low-Rank Adaptation (LoRA) with Gaussian Stochastic Weight Averaging (SWAG), facilitating approximate Bayesian inference in LLMs. Through extensive testing across several Natural Language Processing (NLP) benchmarks, we demonstrate that our straightforward and computationally efficient approach improves model generalization and calibration. We further show that our method exhibits greater robustness against distribution shift, as reflected in its performance on out-of-distribution tasks.

* 14 pages, 1 figure, 2 tables
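
The abstract does not spell out how SWAG is applied, so the following is only a minimal sketch of the general idea, not the authors' implementation. It assumes the trainable (for example, LoRA) parameters have been flattened into a plain list of floats, and it keeps only the diagonal part of the SWAG covariance; the low-rank covariance term of full SWAG is omitted. The class name `DiagonalSwag` and its methods are hypothetical.

```python
import math
import random


class DiagonalSwag:
    """Running first and second moments of a flat parameter vector."""

    def __init__(self, num_params):
        self.n = 0  # number of snapshots collected so far
        self.mean = [0.0] * num_params
        self.sq_mean = [0.0] * num_params

    def collect(self, params):
        """Fold one parameter snapshot into the running moments."""
        self.n += 1
        for i, p in enumerate(params):
            self.mean[i] += (p - self.mean[i]) / self.n
            self.sq_mean[i] += (p * p - self.sq_mean[i]) / self.n

    def variance(self):
        # Clamp at zero to guard against tiny negative values from rounding.
        return [max(s - m * m, 0.0) for m, s in zip(self.mean, self.sq_mean)]

    def sample(self, rng, scale=1.0):
        """Draw one parameter vector from N(mean, scale * diag(variance))."""
        return [
            rng.gauss(m, math.sqrt(scale * v))
            for m, v in zip(self.mean, self.variance())
        ]


# Toy usage: three "snapshots" of a two-parameter model.
swag = DiagonalSwag(num_params=2)
for snapshot in ([1.0, 0.0], [3.0, 0.0], [2.0, 0.0]):
    swag.collect(snapshot)

draw = swag.sample(random.Random(0))
```

In a setup like this, snapshots would typically be collected at intervals late in fine-tuning, and predictions would be averaged over several sampled parameter vectors to approximate a Bayesian model average.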

Neural Collapse in the Intermediate Hidden Layers of Classification Neural Networks

Aug 05, 2023

Liam Parker, Emre Onal, Anton Stengel, Jake Intrater

Abstract: Neural Collapse (NC) gives a precise description of the representations of classes in the final hidden layer of classification neural networks. This description provides insights into how these networks learn features and generalize well when trained past zero training error. However, to date, NC has only been studied in the final layer of these networks. In the present paper, we provide the first comprehensive empirical analysis of the emergence of NC in the intermediate hidden layers of these classifiers. We examine a variety of network architectures, activations, and datasets, and demonstrate that some degree of NC emerges in most of the intermediate hidden layers of the network, where the degree of collapse in any given layer is typically positively correlated with the depth of that layer in the neural network. Moreover, we remark that: (1) almost all of the reduction in intra-class variance in the samples occurs in the shallower layers of the networks, (2) the angular separation between class means increases consistently with hidden layer depth, and (3) simple datasets require only the shallower layers of the networks to fully learn them, whereas more difficult ones require the entire network. Ultimately, these results provide granular insights into the structural propagation of features through classification neural networks.
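
The two quantities named in remarks (1) and (2) can be illustrated with a small sketch. This is an assumption about how such measurements might be computed, not the paper's own metric definitions: it takes the activations of one layer as a list of feature vectors with integer labels, reports the average squared distance to the class mean, and reports the average cosine between class means after subtracting the mean of the class means. All function names are hypothetical.

```python
import math


def class_means(features, labels):
    """Mean feature vector per class."""
    sums, counts = {}, {}
    for x, y in zip(features, labels):
        acc = sums.setdefault(y, [0.0] * len(x))
        for i, v in enumerate(x):
            acc[i] += v
        counts[y] = counts.get(y, 0) + 1
    return {y: [v / counts[y] for v in acc] for y, acc in sums.items()}


def within_class_variance(features, labels):
    """Average squared distance of each sample to its class mean."""
    means = class_means(features, labels)
    total = 0.0
    for x, y in zip(features, labels):
        total += sum((a - b) ** 2 for a, b in zip(x, means[y]))
    return total / len(features)


def mean_cosine_between_classes(features, labels):
    """Average cosine between centred class means, over all class pairs."""
    means = list(class_means(features, labels).values())
    dim = len(features[0])
    # Centre on the mean of the class means (equals the global feature
    # mean when classes are balanced).
    centre = [sum(m[i] for m in means) / len(means) for i in range(dim)]
    centred = [[a - c for a, c in zip(m, centre)] for m in means]
    cosines = []
    for i in range(len(centred)):
        for j in range(i + 1, len(centred)):
            dot = sum(a * b for a, b in zip(centred[i], centred[j]))
            norm = math.hypot(*centred[i]) * math.hypot(*centred[j])
            cosines.append(dot / norm)  # assumes no centred mean is zero
    return sum(cosines) / len(cosines)


# Toy layer output: two fully collapsed, maximally separated classes.
feats = [[1.0, 0.0], [1.0, 0.0], [-1.0, 0.0], [-1.0, 0.0]]
labs = [0, 0, 1, 1]
variance = within_class_variance(feats, labs)
cosine = mean_cosine_between_classes(feats, labs)
```

Evaluating both functions on the activations of each hidden layer in turn would give a per-layer profile of the kind the abstract describes; for K classes at full collapse, the pairwise cosine between centred class means approaches -1/(K-1).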
