Aleksandra I. Nowak

Towards Optimal Adapter Placement for Efficient Transfer Learning

Oct 21, 2024

Aleksandra I. Nowak, Otniel-Bogdan Mercea, Anurag Arnab, Jonas Pfeiffer, Yann Dauphin, Utku Evci

Abstract: Parameter-efficient transfer learning (PETL) aims to adapt pre-trained models to new downstream tasks while minimizing the number of fine-tuned parameters. Adapters, a popular approach in PETL, inject additional capacity into existing networks by incorporating low-rank projections, achieving performance comparable to full fine-tuning with significantly fewer parameters. This paper investigates the relationship between the placement of an adapter and its performance. We observe that adapter location within a network significantly impacts its effectiveness, and that the optimal placement is task-dependent. To exploit this observation, we introduce an extended search space of adapter connections, including long-range and recurrent adapters. We demonstrate that even randomly selected adapter placements from this expanded space yield improved results, and that high-performing placements often correlate with high gradient rank. Our findings reveal that a small number of strategically placed adapters can match or exceed the performance of the common baseline of adding adapters in every block, opening a new avenue for research into optimal adapter placement strategies.
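
The "low-rank projections" mentioned in the abstract can be illustrated with a minimal NumPy sketch of a generic bottleneck adapter. This is not the paper's implementation: the function names, the ReLU nonlinearity, the zero-initialized up-projection, and the sizes (`d_model=768`, `rank=8`) are all assumptions chosen for illustration.

```python
import numpy as np

def init_adapter(d_model, rank, seed=0):
    """Create a hypothetical bottleneck adapter: d_model -> rank -> d_model."""
    rng = np.random.default_rng(seed)
    return {
        # Small random down-projection into the low-rank bottleneck.
        "down": rng.normal(0.0, 0.02, size=(d_model, rank)),
        # Zero up-projection (an assumption): the adapter starts as the identity.
        "up": np.zeros((rank, d_model)),
    }

def apply_adapter(params, h):
    """Residual adapter: h + relu(h @ down) @ up."""
    z = np.maximum(h @ params["down"], 0.0)
    return h + z @ params["up"]

def num_params(params):
    """Count the trainable parameters of the adapter."""
    return sum(p.size for p in params.values())

adapter = init_adapter(d_model=768, rank=8)
h = np.ones((4, 768))            # a batch of 4 hidden states
out = apply_adapter(adapter, h)  # equals h at initialization
```

With these assumed sizes the adapter holds 2 × 768 × 8 = 12,288 parameters, compared with 768 × 768 = 589,824 for a single dense projection of the same width. That gap is what makes the number and placement of adapters a meaningful budget question.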

Fantastic Weights and How to Find Them: Where to Prune in Dynamic Sparse Training

Jun 21, 2023

Aleksandra I. Nowak, Bram Grooten, Decebal Constantin Mocanu, Jacek Tabor

Abstract: Dynamic Sparse Training (DST) is a rapidly evolving area of research that seeks to optimize the sparse initialization of a neural network by adapting its topology during training. It has been shown that under specific conditions, DST is able to outperform dense models. The key components of this framework are the pruning and growing criteria, which are repeatedly applied during the training process to adjust the network's sparse connectivity. While the growing criterion's impact on DST performance is relatively well studied, the influence of the pruning criterion remains overlooked. To address this issue, we design and perform an extensive empirical analysis of various pruning criteria to better understand their effect on the dynamics of DST solutions. Surprisingly, we find that most of the studied methods yield similar results. The differences become more significant in the low-density regime, where the best performance is predominantly given by the simplest technique: magnitude-based pruning. The code is provided at https://github.com/alooow/fantastic_weights_paper
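
The prune-and-grow cycle described in the abstract can be sketched as a single topology update on one weight matrix. This is a hedged illustration, not the code from the linked repository: the function names are hypothetical, and random growth is an assumed stand-in for the growing criterion, which the abstract does not specify.

```python
import numpy as np

def magnitude_prune(weights, mask, fraction):
    """Deactivate the given fraction of active weights with the smallest |w|."""
    flat = mask.ravel().copy()
    active = np.flatnonzero(flat)
    k = int(fraction * active.size)
    if k > 0:
        # Sort active weights by magnitude and drop the k smallest.
        order = np.argsort(np.abs(weights.ravel()[active]))
        flat[active[order[:k]]] = False
    return flat.reshape(mask.shape)

def prune_and_grow(weights, mask, fraction, rng):
    """One topology update: magnitude pruning, then random growth (assumed)."""
    pruned = magnitude_prune(weights, mask, fraction)
    n_removed = int(mask.sum() - pruned.sum())
    # Grow only at positions that were inactive before pruning, so the
    # connections that were just removed are not immediately restored.
    candidates = np.flatnonzero(~mask.ravel())
    n_grow = min(n_removed, candidates.size)
    chosen = rng.choice(candidates, size=n_grow, replace=False)
    flat = pruned.ravel().copy()
    flat[chosen] = True
    return flat.reshape(mask.shape)

weights = np.array([[0.1, -2.0, 0.05, 3.0],
                    [1.5, -0.2, 0.7, -0.9]])
mask = np.array([[True, True, True, True],
                 [False, False, False, False]])

pruned = magnitude_prune(weights, mask, fraction=0.5)
new_mask = prune_and_grow(weights, mask, fraction=0.5,
                          rng=np.random.default_rng(0))
```

In this toy example the two smallest-magnitude active weights (0.05 and 0.1) are pruned, and two previously inactive positions are activated in their place, so the overall density stays constant across the update.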
