Huiyi Wang

Self-Expansion of Pre-trained Models with Mixture of Adapters for Continual Learning

Mar 27, 2024

Huiyi Wang, Haodong Lu, Lina Yao, Dong Gong

Abstract: Continual learning aims to learn from a stream of continuously arriving data with minimal forgetting of previously learned knowledge. While previous works have explored the effectiveness of leveraging the generalizable knowledge from pre-trained models in continual learning, existing parameter-efficient fine-tuning approaches focus on the use of a predetermined or task-wise set of adapters or prompts. However, these approaches still suffer from forgetting, due to either task interference on jointly used parameters or restricted flexibility. Because the scale and distribution of incoming data are unpredictable in continual learning, reliance on a static model architecture may lead to the allocation of excessive, non-essential parameters or, conversely, to inadequate adaptation for downstream tasks. We propose Self-Expansion of pre-trained models with Modularized Adaptation (SEMA), a novel fine-tuning approach that automatically decides whether to reuse or add adapter modules on demand in continual learning, depending on whether a drastic distribution shift that cannot be handled by existing modules is detected at different representation levels. We design each adapter module to consist of an adapter and a representation descriptor; the descriptor is implemented as an autoencoder. The representation descriptor functions as a distribution shift indicator during training and triggers adapter expansion. To make better use of the adapters, an expandable weighting router is learned jointly to mix the adapter outputs. By comparing with vision-transformer-based continual learning adaptation methods, we demonstrate that the proposed framework outperforms the state-of-the-art without memory rehearsal.
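The reuse-or-expand decision described in the abstract can be illustrated with a toy sketch. This is not the authors' implementation, and every name in it (`MeanDescriptor`, `ExpandableLayer`, `threshold`, `gaussian_batch`) is hypothetical. To stay dependency-free, the autoencoder descriptor is swapped for a much simpler stand-in that "reconstructs" every input as the mean of the features it was fitted on, and a shift is assumed to be flagged when the reconstruction-error z-score exceeds a fixed cut-off. Adapters are reduced to bias vectors, and the router to a softmax over one logit per module; no training is shown.

```python
import math
import random
from dataclasses import dataclass, field
from statistics import mean, pstdev


@dataclass
class MeanDescriptor:
    """Stand-in for an autoencoder descriptor: reconstructs any input as the stored mean."""
    center: list
    err_mean: float
    err_std: float

    @classmethod
    def fit(cls, feats):
        dim = len(feats[0])
        center = [mean(f[i] for f in feats) for i in range(dim)]
        errs = [math.dist(f, center) for f in feats]  # "reconstruction" errors
        return cls(center, mean(errs), pstdev(errs) or 1e-8)

    def z_score(self, feats):
        """How unusual the new features' mean error is relative to the fitted errors."""
        errs = [math.dist(f, self.center) for f in feats]
        return (mean(errs) - self.err_mean) / self.err_std


@dataclass
class ExpandableLayer:
    threshold: float = 3.0  # assumed z-score cut-off, not taken from the paper
    descriptors: list = field(default_factory=list)
    adapters: list = field(default_factory=list)  # one bias vector per module
    router_logits: list = field(default_factory=list)

    def observe_task(self, feats):
        """Add a module only if every existing descriptor flags a shift."""
        shifted = all(d.z_score(feats) > self.threshold for d in self.descriptors)
        if shifted:  # also true for the very first task (no descriptors yet)
            self.descriptors.append(MeanDescriptor.fit(feats))
            self.adapters.append([0.0] * len(feats[0]))
            self.router_logits.append(0.0)  # router grows by one entry
        return shifted

    def forward(self, x):
        """Softmax-weighted mixture of the adapter outputs (x + bias)."""
        top = max(self.router_logits)
        exps = [math.exp(l - top) for l in self.router_logits]
        weights = [e / sum(exps) for e in exps]
        return [
            sum(w * (xi + a[i]) for w, a in zip(weights, self.adapters))
            for i, xi in enumerate(x)
        ]


def gaussian_batch(rng, center, n=200, scale=0.1):
    """Synthetic features drawn around a centre, standing in for one task's data."""
    return [[rng.gauss(c, scale) for c in center] for _ in range(n)]


rng = random.Random(0)
layer = ExpandableLayer()
first = layer.observe_task(gaussian_batch(rng, [0.0, 0.0]))   # first task: expand
second = layer.observe_task(gaussian_batch(rng, [0.0, 0.0]))  # same distribution: reuse
third = layer.observe_task(gaussian_batch(rng, [5.0, 5.0]))   # distant distribution: expand
print(first, second, third, len(layer.adapters))
```

In this sketch a single layer makes the decision; the abstract describes detection at different representation levels, which would correspond to one such check per adapted layer.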
