Title: Supervised Fine-Tuning: An Activation Pattern Optimization Process for Attention Heads

Sep 24, 2024

Yang Zhao, Li Du, Xiao Ding, Kai Xiong, Ting Liu, Bing Qin

Abstract: Despite their promise, LLMs still perform unsatisfactorily on complex tasks such as advanced mathematics and complex disease diagnosis. A key issue is that current LLMs learn in a data-driven manner, while instruction data for these complex tasks is both scarce and hard to collect or construct. By contrast, LLMs learn simpler tasks quickly when adequate prior knowledge was captured during pretraining. If the prerequisites and mechanism of this rapid generalization could be elucidated, they could make LLMs learn complex tasks more efficiently and effectively. In this paper, we use a gradient-based method to dissect how supervised fine-tuning (SFT) adapts LLMs to downstream tasks, from the perspective of attention patterns. We find that: (1) LLMs selectively activate task-specific attention heads during SFT; (2) activation patterns for complex tasks are combinations of basic task patterns; and (3) changes to a few parameters can significantly alter activation patterns after SFT on a small number of samples. Based on these insights, we conduct experiments to examine whether these findings can improve the efficiency and effectiveness of SFT, particularly on complex tasks and when instruction data is scarce. Our research not only uncovers the reasons behind LLMs' rapid learning and generalization but also provides practical solutions to data challenges in complex and specialized tasks.
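
Finding (2) can be made concrete with a small sketch. The abstract does not say how activation patterns are represented or combined, so everything here is an assumption for illustration: each pattern is taken to be a vector of per-head activation scores, the combination is taken to be linear, and the function name `fit_combination` and all numbers are hypothetical. This is not the authors' method.

```python
from typing import List, Sequence


def fit_combination(basic: Sequence[Sequence[float]],
                    target: Sequence[float]) -> List[float]:
    """Least-squares weights w so that sum_k w[k] * basic[k] approximates target.

    Assumption (not from the paper): a pattern is a vector with one
    activation score per attention head, and patterns combine linearly.
    """
    k = len(basic)
    # Normal equations (B B^T) w = B t, where the rows of B are the basic patterns.
    gram = [[sum(a * b for a, b in zip(basic[i], basic[j])) for j in range(k)]
            for i in range(k)]
    rhs = [sum(a * t for a, t in zip(basic[i], target)) for i in range(k)]
    aug = [row + [r] for row, r in zip(gram, rhs)]

    # Gauss-Jordan elimination with partial pivoting on the augmented matrix.
    for col in range(k):
        pivot = max(range(col, k), key=lambda r: abs(aug[r][col]))
        if abs(aug[pivot][col]) < 1e-12:
            raise ValueError("basic patterns are linearly dependent")
        aug[col], aug[pivot] = aug[pivot], aug[col]
        for r in range(k):
            if r != col:
                factor = aug[r][col] / aug[col][col]
                aug[r] = [x - factor * y for x, y in zip(aug[r], aug[col])]
    return [aug[i][k] / aug[i][i] for i in range(k)]


# Hypothetical per-head activation scores for two basic tasks (4 heads).
basic_a = [1.0, 0.0, 0.5, 0.0]
basic_b = [0.0, 1.0, 0.0, 0.5]
# Hypothetical complex-task pattern, built here as 0.7 * A + 0.3 * B.
complex_pattern = [0.7, 0.3, 0.35, 0.15]

weights = fit_combination([basic_a, basic_b], complex_pattern)
print(weights)  # weights close to 0.7 and 0.3
```

If the fitted weights reconstruct the complex-task pattern with small residual error, that is consistent with the pattern being a combination of the basic ones under these assumptions; a large residual would suggest the linear picture is too simple.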

* in review
