Title: DePT: Decomposed Prompt Tuning for Parameter-Efficient Fine-tuning

Sep 11, 2023

Zhengxiang Shi, Aldo Lipani

Abstract: Prompt tuning (PT), where a small number of trainable soft (continuous) prompt vectors are affixed to the input of language models (LMs), has shown promising results across various tasks and models for parameter-efficient fine-tuning (PEFT). PT stands out from other PEFT approaches because it maintains competitive performance with fewer trainable parameters and does not drastically scale up its parameters as the model size expands. However, PT introduces additional soft prompt tokens, leading to longer input sequences, which significantly increases training and inference time and memory usage due to the Transformer's quadratic complexity. This is particularly concerning for Large Language Models (LLMs) that face heavy daily querying. To address this issue, we propose Decomposed Prompt Tuning (DePT), which decomposes the soft prompt into a shorter soft prompt and a pair of low-rank matrices that are then optimised with two different learning rates. This allows DePT to achieve better performance while saving over 20% memory and time costs compared to vanilla PT and its variants, without changing trainable parameter sizes. Through extensive experiments on 23 natural language processing (NLP) and vision-language (VL) tasks, we demonstrate that DePT outperforms state-of-the-art PEFT approaches, including the full fine-tuning baseline in some scenarios. Additionally, we empirically show that DePT grows more efficient as the model size increases. Our further study reveals that DePT integrates seamlessly with parameter-efficient transfer learning in the few-shot learning setting and highlights its adaptability to various model architectures and sizes.

* Code is available at https://github.com/ZhengxiangShi/DePT
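
The decomposition described in the abstract can be illustrated with a rough NumPy sketch. It is not the authors' implementation (see the repository above for that). It assumes, beyond what the abstract states, that the low-rank pair is applied as an additive update to the frozen input token embeddings; all names, dimensions, and learning rates below are hypothetical and chosen only so that both variants have the same number of trainable parameters.

```python
import numpy as np

def vanilla_pt_input(prompt, token_emb):
    """Vanilla PT: prepend a soft prompt of length l to the token embeddings."""
    return np.concatenate([prompt, token_emb], axis=0)

def dept_input(short_prompt, A, B, token_emb):
    """Decomposed variant (assumed form): a shorter prompt of length m, plus a
    low-rank update A @ B added to the frozen token embeddings."""
    updated = token_emb + A @ B
    return np.concatenate([short_prompt, updated], axis=0)

def sgd_step(params, grads, lrs):
    """Plain SGD with a separate learning rate per parameter group."""
    return {k: params[k] - lrs[k] * grads[k] for k in params}

# Hypothetical sizes: hidden size d, input length s, prompt lengths l and m, rank r.
d, s = 768, 256
l = 100
m, r = 40, 45

rng = np.random.default_rng(0)
token_emb = rng.normal(size=(s, d))      # frozen input embeddings
prompt = rng.normal(size=(l, d))         # vanilla soft prompt
params = {
    "short_prompt": rng.normal(size=(m, d)),
    "A": np.zeros((s, r)),               # zero init: the update starts as a no-op
    "B": rng.normal(size=(r, d)),
}

pt_params = prompt.size
dept_params = sum(p.size for p in params.values())
pt_len = vanilla_pt_input(prompt, token_emb).shape[0]
dept_len = dept_input(params["short_prompt"], params["A"], params["B"], token_emb).shape[0]

# Two learning rates (hypothetical values): one for the short prompt,
# another for the low-rank pair.
lrs = {"short_prompt": 3e-1, "A": 1e-4, "B": 1e-4}
grads = {k: np.ones_like(v) for k, v in params.items()}  # placeholder gradients
stepped = sgd_step(params, grads, lrs)
```

With these sizes both variants have 76,800 trainable parameters, while the input sequence shrinks from 356 to 296 positions. Because self-attention cost grows quadratically with sequence length, a shorter sequence is where the memory and time savings would come from.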
