

Title: Concrete Subspace Learning based Interference Elimination for Multi-task Model Fusion

Dec 11, 2023

Anke Tang, Li Shen, Yong Luo, Liang Ding, Han Hu, Bo Du, Dacheng Tao



Abstract: Merging models that are fine-tuned from a common, extensively pre-trained large model but specialized for different tasks has been shown to be a cheap and scalable strategy for constructing a multi-task model that performs well across diverse tasks. Recent research, exemplified by task arithmetic, highlights that such a multi-task model can be derived through arithmetic operations on task vectors. Nevertheless, current merging techniques frequently resolve potential conflicts among the parameters of task-specific models by evaluating individual attributes, such as a parameter's magnitude or sign, and overlook their collective impact on the overall functionality of the model. In this work, we propose the CONtinuous relaxation of disCRETE (Concrete) subspace learning method, which identifies a common low-dimensional subspace and uses its shared information to tackle the interference problem without sacrificing much performance. Specifically, we formulate the problem as a bi-level optimization problem and introduce a meta-learning framework that finds the Concrete subspace mask through gradient-based techniques. At the upper level, we learn a shared Concrete mask to identify the subspace, while at the inner level, model merging is performed to maximize the performance of the merged model. We conduct extensive experiments in both the vision and language domains, and the results demonstrate the effectiveness of our method. The code is available at https://github.com/tanganke/subspace_fusion
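The abstract combines two ideas: task arithmetic (adding scaled task vectors tau_i = theta_i - theta_0 to the pre-trained weights theta_0) and a Concrete, i.e. continuously relaxed binary, mask that restricts the merge to a shared subspace. The following is a minimal NumPy sketch of how these pieces could fit together, assuming flattened parameter vectors and a single mask shared by all task vectors. It is not the authors' implementation (see the linked repository). The function names, the `scale` coefficient, and the toy values are hypothetical, and the bi-level meta-learning that would learn the mask logits by gradient descent is omitted.

```python
import numpy as np


def task_vectors(pretrained, finetuned_list):
    """Return one task vector tau_i = theta_i - theta_0 per fine-tuned model."""
    return [theta - pretrained for theta in finetuned_list]


def sample_concrete_mask(logits, temperature, rng):
    """Draw a binary Concrete (relaxed Bernoulli) sample with values in [0, 1].

    Logistic noise log(u) - log(1 - u) is added to the logits and the result
    is passed through a tempered sigmoid. Lower temperatures push samples
    toward hard {0, 1} values; the sample is a smooth function of the logits.
    """
    u = rng.uniform(low=1e-6, high=1.0 - 1e-6, size=np.shape(logits))
    noise = np.log(u) - np.log1p(-u)
    z = (logits + noise) / temperature
    # sigmoid(z) written via tanh so large |z| never overflows np.exp
    return 0.5 * (1.0 + np.tanh(0.5 * z))


def merge_with_mask(pretrained, taus, mask, scale):
    """Task arithmetic restricted by a shared mask (hypothetical form):
    theta = theta_0 + scale * sum_i (mask * tau_i)."""
    return pretrained + scale * sum(mask * tau for tau in taus)


# Toy usage with 4-parameter "models" (all values are illustrative).
rng = np.random.default_rng(0)
theta0 = np.zeros(4)
finetuned = [np.array([1.0, 0.0, 2.0, -1.0]),
             np.array([0.0, 1.0, -2.0, -1.0])]
taus = task_vectors(theta0, finetuned)

# Hypothetical mask logits; in the method these would be learned at the upper level.
logits = np.array([5.0, 5.0, -5.0, 5.0])
mask = sample_concrete_mask(logits, temperature=0.5, rng=rng)
merged = merge_with_mask(theta0, taus, mask, scale=0.5)
print("mask:  ", np.round(mask, 3))
print("merged:", np.round(merged, 3))
```

With a mask of all ones, `merge_with_mask` reduces to plain task arithmetic; a mask near zero on a coordinate drops that coordinate from every task vector at once, which is the sense in which the mask is shared across tasks.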
