Title: ScaLearn: Simple and Highly Parameter-Efficient Task Transfer by Learning to Scale

Oct 02, 2023

Markus Frohmann, Carolin Holtermann, Shahed Masoudian, Anne Lauscher, Navid Rekabsaz

Abstract: Multi-task learning (MTL) has shown considerable practical benefits, particularly when using pre-trained language models (PLMs). While this is commonly achieved by simultaneously learning $n$ tasks under a joint optimization procedure, recent methods such as AdapterFusion structure the problem into two distinct stages: (i) task learning, where knowledge specific to a task is encapsulated within sets of parameters (e.g., adapters), and (ii) transfer, where this already learned knowledge is leveraged for a target task. This separation of concerns provides numerous benefits, such as promoting reusability and addressing cases involving data privacy and societal concerns; on the flip side, current two-stage MTL methods come at the cost of introducing a substantial number of additional parameters. In this work, we address this issue by leveraging the usefulness of linearly scaling the output representations of source adapters for transfer learning. We introduce ScaLearn, a simple and highly parameter-efficient two-stage MTL method that capitalizes on the knowledge of the source tasks by learning a minimal set of scaling parameters that enable effective knowledge transfer to a target task. Our experiments on three benchmarks (GLUE, SuperGLUE, and HumSet) show that ScaLearn, in addition to facilitating the benefits of two-stage MTL, consistently outperforms strong baselines with only a small number of transfer parameters, roughly 0.35% of those of AdapterFusion. Remarkably, we observe that ScaLearn maintains its strong abilities even when further reducing parameters through uniform scaling and layer-sharing, achieving similarly competitive results with only $8$ transfer parameters for each target task. Our proposed approach thus demonstrates the power of simple scaling as a promise for more efficient task transfer.
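
The transfer stage described in the abstract, linearly scaling the output representations of frozen source adapters and combining them for a target task, can be illustrated with a minimal sketch. This is not the authors' implementation: the function names, the plain-list representation of vectors, summation as the combination rule, and the example sizes (12 layers, hidden size 768, 8 source tasks) are all assumptions made for illustration. The parameter counts it prints belong to this sketch only and are not figures reported in the paper.

```python
from typing import List

Vector = List[float]


def scale_and_combine(adapter_outputs: List[Vector], scales: List[Vector]) -> Vector:
    """Scale each source adapter's output element-wise, then sum the results.

    adapter_outputs[t] is the output of the (frozen) adapter for source task t;
    scales[t] holds the learned scaling coefficients for that task.
    Summation as the combination rule is an assumption of this sketch.
    """
    hidden = len(adapter_outputs[0])
    combined = [0.0] * hidden
    for output, scale in zip(adapter_outputs, scales):
        for i in range(hidden):
            combined[i] += scale[i] * output[i]
    return combined


def uniform_scales(task_weights: List[float], hidden: int) -> List[Vector]:
    """Uniform scaling: a single scalar per source task, broadcast over the hidden dimension."""
    return [[weight] * hidden for weight in task_weights]


def count_transfer_params(num_tasks: int, num_layers: int, hidden: int,
                          uniform: bool, share_layers: bool) -> int:
    """Count scaling parameters for one target task under the sketch's assumptions."""
    per_task = 1 if uniform else hidden            # one scalar vs. one coefficient per dimension
    layers = 1 if share_layers else num_layers     # one shared set vs. one set per layer
    return num_tasks * per_task * layers


if __name__ == "__main__":
    # Two hypothetical source adapters with a hidden size of 2.
    outputs = [[1.0, 2.0], [3.0, 4.0]]
    scales = uniform_scales([0.5, 0.25], hidden=2)
    print(scale_and_combine(outputs, scales))  # [1.25, 2.0]

    # Hypothetical sizes: 8 source tasks, 12 layers, hidden size 768.
    print(count_transfer_params(8, 12, 768, uniform=False, share_layers=False))  # 73728
    print(count_transfer_params(8, 12, 768, uniform=True, share_layers=True))    # 8
```

Under these assumptions, combining uniform scaling with layer-sharing leaves one scalar per source task, which would be consistent with the 8 transfer parameters per target task mentioned in the abstract if 8 source tasks are used.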
