Title: YUAN 2.0: A Large Language Model with Localized Filtering-based Attention

Dec 04, 2023

Shaohua Wu, Xudong Zhao, Shenling Wang, Jiangang Luo, Lingjun Li, Xi Chen, Bing Zhao, Wei Wang, Tong Yu, Rongguo Zhang, and 2 others

Abstract: In this work, we develop and release Yuan 2.0, a series of large language models with 2.1 billion to 102.6 billion parameters. Localized Filtering-based Attention (LFA) is introduced to incorporate prior knowledge of the local dependencies of natural language into attention. A data filtering and generation system is presented to build high-quality pre-training and fine-tuning datasets. A distributed training method combining non-uniform pipeline parallelism, data parallelism, and optimizer parallelism is proposed; it greatly reduces the bandwidth requirements of intra-node communication and achieves good performance in large-scale distributed training. Compared with existing models, Yuan 2.0 models show impressive capabilities in code generation, math problem solving, and chat. The latest version of Yuan 2.0, including model weights and source code, is available on GitHub.
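The abstract states only that LFA injects a prior about local dependencies into attention. As a rough, hedged illustration, the pure-Python sketch below makes several assumptions not stated above. It assumes the filter is a pair of stacked causal convolutions with kernel size 2 over the token sequence, so each position sees itself and at most two predecessors. It also assumes that queries and keys are computed from the filtered sequence while values come from the unfiltered input. It uses hypothetical depthwise (per-channel) kernels and drops learned projections, normalization, and multi-head splitting. All function names are illustrative; this is not the authors' implementation, which ships with the released source code.

```python
import math
from typing import List

Vector = List[float]
Matrix = List[Vector]  # rows = sequence positions, columns = channels


def causal_conv_k2(x: Matrix, w_prev: Vector, w_curr: Vector) -> Matrix:
    """Depthwise causal 1D convolution with kernel size 2 along the sequence.

    Position t sees only positions t-1 and t; position 0 is zero-padded on
    the left, so no future token leaks into the output. (Assumed design.)
    """
    out: Matrix = []
    for t, row in enumerate(x):
        prev = x[t - 1] if t > 0 else [0.0] * len(row)
        out.append([wp * p + wc * c for wp, wc, p, c in zip(w_prev, w_curr, prev, row)])
    return out


def localized_filter(x: Matrix, w1_prev: Vector, w1_curr: Vector,
                     w2_prev: Vector, w2_curr: Vector) -> Matrix:
    """Hypothetical LFA-style filter: two stacked causal convs plus a residual."""
    h = causal_conv_k2(x, w1_prev, w1_curr)
    h = causal_conv_k2(h, w2_prev, w2_curr)
    return [[a + b for a, b in zip(hr, xr)] for hr, xr in zip(h, x)]


def softmax(scores: Vector) -> Vector:
    """Numerically stable softmax over a non-empty list of scores."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def causal_attention(q: Matrix, k: Matrix, v: Matrix) -> Matrix:
    """Single-head causal scaled dot-product attention (no learned projections)."""
    d = len(q[0])
    out: Matrix = []
    for t in range(len(q)):
        # Position t attends only to positions 0..t.
        scores = [sum(a * b for a, b in zip(q[t], k[s])) / math.sqrt(d) for s in range(t + 1)]
        weights = softmax(scores)
        out.append([sum(w * v[s][c] for s, w in enumerate(weights)) for c in range(len(v[0]))])
    return out


def lfa_sketch(x: Matrix, w1_prev: Vector, w1_curr: Vector,
               w2_prev: Vector, w2_curr: Vector) -> Matrix:
    """Queries/keys from the locally filtered sequence, values from the raw input (assumed)."""
    filtered = localized_filter(x, w1_prev, w1_curr, w2_prev, w2_curr)
    return causal_attention(filtered, filtered, x)


if __name__ == "__main__":
    x = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
    w = [0.5, 0.5]
    print(lfa_sketch(x, w, w, w, w))
```

Both convolutions are causal and position t attends only to positions at or before t. As a result, editing the last token leaves every earlier output unchanged, which is the property an autoregressive decoder needs. The filter biases the attention scores toward local context without restricting which positions can be attended to.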