Jun-gyu Jin

SEAL: Scaling to Emphasize Attention for Long-Context Retrieval

Jan 25, 2025

Changhun Lee, Jun-gyu Jin, Younghyun Cho, Eunhyeok Park

Figures 1-4 for SEAL: Scaling to Emphasize Attention for Long-Context Retrieval

Abstract: In this work, we introduce a novel approach called Scaling to Emphasize Attention for Long-context retrieval (SEAL), which enhances the retrieval performance of large language models (LLMs) over extended contexts. Previous studies have shown that each attention head in LLMs has a unique functionality and collectively contributes to the overall behavior of the model. Similarly, we observe that specific heads are closely tied to long-context retrieval, showing positive or negative correlation with retrieval scores. Built on this insight, we propose a learning-based mechanism using zero-shot generated data to emphasize these heads, improving the model's performance in long-context retrieval tasks. By applying SEAL, we can achieve significant improvements in in-domain retrieval performance, including document QA tasks from LongBench, and considerable improvements in out-of-domain cases. Additionally, when combined with existing training-free context extension techniques, SEAL extends the context limits of LLMs while maintaining highly reliable outputs, opening new avenues for research in this field.

* 15 pages
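
The SEAL abstract describes emphasizing individual attention heads but gives no implementation details. As a rough illustration only, the sketch below assumes that "emphasis" means multiplying each head's output by its own scalar before the heads are concatenated. The function names (`attention_head`, `scaled_multi_head`) and the `head_scales` parameter are hypothetical and are not taken from the paper. The scales here are set by hand, whereas the paper states they are learned from zero-shot generated data.

```python
import math

def softmax(xs):
    """Numerically stable softmax over a list of floats."""
    m = max(xs)
    exps = [math.exp(x - m) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]

def attention_head(q, keys, values):
    """Scaled dot-product attention for one query vector and one head."""
    d = len(q)
    scores = [sum(qi * ki for qi, ki in zip(q, k)) / math.sqrt(d) for k in keys]
    weights = softmax(scores)
    dim = len(values[0])
    return [sum(w * v[j] for w, v in zip(weights, values)) for j in range(dim)]

def scaled_multi_head(queries, keys, values, head_scales):
    """Concatenate per-head outputs, each multiplied by its own scale.

    ASSUMPTION: a single scalar per head stands in for the paper's
    learned emphasis; a scale of 1.0 leaves that head unchanged.
    """
    out = []
    for q, k, v, s in zip(queries, keys, values, head_scales):
        out.extend(s * x for x in attention_head(q, k, v))
    return out

# Two toy heads, each attending over two positions.
queries = [[1.0, 0.0], [0.0, 1.0]]
keys = [[[1.0, 0.0], [0.0, 1.0]], [[1.0, 0.0], [0.0, 1.0]]]
values = [[[1.0, 0.0], [0.0, 1.0]], [[1.0, 0.0], [0.0, 1.0]]]

base = scaled_multi_head(queries, keys, values, [1.0, 1.0])
boosted = scaled_multi_head(queries, keys, values, [2.0, 0.0])
```

With all scales at 1.0 the function reduces to ordinary multi-head attention (without the output projection), so this form of scaling can amplify or suppress a head without changing the rest of the computation.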


QEFT: Quantization for Efficient Fine-Tuning of LLMs

Oct 11, 2024

Changhun Lee, Jun-gyu Jin, Younghyun Cho, Eunhyeok Park

Abstract: With the rapid growth in the use of fine-tuning for large language models (LLMs), optimizing fine-tuning while keeping inference efficient has become highly important. However, this is a challenging task as it requires improvements in all aspects, including inference speed, fine-tuning speed, memory consumption, and, most importantly, model quality. Previous studies have attempted to achieve this by combining quantization with fine-tuning, but they have failed to enhance all four aspects simultaneously. In this study, we propose a new lightweight technique called Quantization for Efficient Fine-Tuning (QEFT). QEFT accelerates both inference and fine-tuning, is supported by robust theoretical foundations, offers high flexibility, and maintains good hardware compatibility. Our extensive experiments demonstrate that QEFT matches the quality and versatility of full-precision parameter-efficient fine-tuning, while using fewer resources. Our code is available at https://github.com/xvyaward/qeft.

* Accepted at Findings of EMNLP 2024
