
Title: Taking a Deep Breath: Enhancing Language Modeling of Large Language Models with Sentinel Tokens

Jun 16, 2024

Weiyao Luo, Suncong Zheng, Heming Xia, Weikang Wang, Yan Lei, Tianyu Liu, Shuang Chen, Zhifang Sui


Abstract: Large language models (LLMs) have shown promising efficacy across various tasks and have become powerful tools in many aspects of human life. However, Transformer-based LLMs suffer performance degradation when modeling long-term contexts because they discard some information to reduce computational overhead. In this work, we propose a simple yet effective method that enables LLMs to take a deep breath, encouraging them to summarize the information contained within discrete text chunks. Specifically, we segment the text into multiple chunks and insert a special token <SR> at the end of each chunk. We then modify the attention mask to integrate each chunk's information into its corresponding <SR> token. This enables LLMs to draw on information not only from individual historical tokens but also from the <SR> tokens, which aggregate the semantic information of their chunks. Experiments on language modeling and out-of-domain downstream tasks validate the superiority of our approach.
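The chunk-and-sentinel idea described in the abstract can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the authors' implementation: the abstract does not say how chunks are formed or exactly which positions each token may attend to, so the sketch assumes fixed-size chunks, treats tokens as plain strings rather than vocabulary IDs, and assumes that ordinary tokens keep a standard causal mask while each <SR> token attends only to the tokens of its own chunk and to itself. The helper names `insert_sentinels` and `build_attention_mask` are hypothetical.

```python
from typing import List

SR = "<SR>"  # sentinel token string, named after the token in the abstract


def insert_sentinels(tokens: List[str], chunk_size: int) -> List[str]:
    """Split tokens into chunks and append an <SR> token after each chunk.

    Assumption: chunks have a fixed size; the last chunk may be shorter.
    """
    if chunk_size <= 0:
        raise ValueError("chunk_size must be positive")
    out: List[str] = []
    for start in range(0, len(tokens), chunk_size):
        out.extend(tokens[start:start + chunk_size])
        out.append(SR)  # sentinel closes the chunk
    return out


def build_attention_mask(seq: List[str]) -> List[List[bool]]:
    """Return mask where mask[i][j] is True if position i may attend to j.

    Assumed design (not confirmed by the abstract):
    - ordinary tokens use a causal mask, so they see every earlier token,
      including earlier <SR> tokens;
    - each <SR> token sees only the tokens of its own chunk and itself,
      so it acts as a summary slot for that chunk.
    """
    n = len(seq)
    mask = [[False] * n for _ in range(n)]
    chunk_start = 0  # index of the first token in the current chunk
    for i, tok in enumerate(seq):
        if tok == SR:
            # Sentinel: attend to its own chunk only (positions chunk_start..i).
            for j in range(chunk_start, i + 1):
                mask[i][j] = True
            chunk_start = i + 1  # the next chunk begins after this sentinel
        else:
            # Ordinary token: standard causal attention over positions 0..i.
            for j in range(i + 1):
                mask[i][j] = True
    return mask


if __name__ == "__main__":
    seq = insert_sentinels(["a", "b", "c", "d", "e"], chunk_size=2)
    print(seq)  # ['a', 'b', '<SR>', 'c', 'd', '<SR>', 'e', '<SR>']
    # One row per position: "1" = may attend, "." = masked out.
    for tok, row in zip(seq, build_attention_mask(seq)):
        print(f"{tok:>4} " + "".join("1" if x else "." for x in row))
```

In the printed mask, each <SR> row is open only over its own chunk, while the ordinary token `e` can see every earlier position, including both earlier <SR> tokens. In a real model, a boolean mask like this would typically be converted into an additive mask (0 where attention is allowed, a large negative value where it is blocked) before the softmax.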
