Title: Improving Domain Adaptation through Extended-Text Reading Comprehension

Jan 18, 2024

Ting Jiang, Shaohan Huang, Shengyue Luo, Zihan Zhang, Haizhen Huang, Furu Wei, Weiwei Deng, Feng Sun, Qi Zhang, Deqing Wang(+1 more)

Abstract: To enhance the domain-specific capabilities of large language models, continued pre-training on a domain-specific corpus is a prevalent method. Recent work demonstrates that adapting models using reading comprehension data formatted by regex-based patterns can significantly improve performance on domain-specific tasks. However, regex-based patterns are incapable of parsing raw corpora using domain-specific knowledge. Furthermore, because the question-answer pairs are extracted directly from the corpus in predefined formats, they offer limited context. To address these limitations, we improve reading comprehension via an LLM and clustering. The LLM leverages domain knowledge within the corpus to refine the comprehension stage, while clustering supplies relevant knowledge by extending the context to enrich the reading stage. Additionally, our method incorporates parameter-efficient fine-tuning to improve the efficiency of domain adaptation. In comparison to AdaptLLM, our method achieves an improvement exceeding 5% in domain-specific tasks. Our code will be available at https://github.com/microsoft/LMOps.
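
The abstract does not say how the clustering step is implemented, so the following is only a minimal sketch of the general idea of extending the reading context by grouping related documents. It assumes bag-of-words cosine similarity in place of a real text embedding and a greedy single-pass clustering in place of whatever algorithm the authors use. The function names (`bag_of_words`, `cluster_documents`, `extend_contexts`) and the `threshold` parameter are hypothetical and do not come from the paper or its repository.

```python
import math
import re
from collections import Counter


def bag_of_words(text):
    """Lower-cased token counts; a crude stand-in for a real text embedding."""
    return Counter(re.findall(r"[a-z0-9]+", text.lower()))


def cosine(a, b):
    """Cosine similarity between two sparse token-count vectors."""
    dot = sum(count * b[token] for token, count in a.items())
    norm_a = math.sqrt(sum(c * c for c in a.values()))
    norm_b = math.sqrt(sum(c * c for c in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def cluster_documents(docs, threshold=0.2):
    """Greedy single-pass clustering (an assumption, not the paper's method).

    Each document joins the first cluster whose seed document is at least
    `threshold` similar to it; otherwise it starts a new cluster.
    Returns a list of clusters, each a list of indices into `docs`.
    """
    vectors = [bag_of_words(d) for d in docs]
    clusters = []  # the first index in each cluster is its seed
    for i, vec in enumerate(vectors):
        for members in clusters:
            if cosine(vec, vectors[members[0]]) >= threshold:
                members.append(i)
                break
        else:
            clusters.append([i])
    return clusters


def extend_contexts(docs, threshold=0.2, separator="\n\n"):
    """Concatenate each cluster's documents into one extended reading passage."""
    return [
        separator.join(docs[i] for i in members)
        for members in cluster_documents(docs, threshold)
    ]
```

Calling `extend_contexts` on a list of raw domain documents returns one longer passage per cluster. In a pipeline like the one the abstract outlines, such passages would then be paired with LLM-generated question-answer pairs, a step this sketch leaves out.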

* Work in Progress
