
Title: Language-Specific Neurons: The Key to Multilingual Capabilities in Large Language Models

Feb 26, 2024

Tianyi Tang, Wenyang Luo, Haoyang Huang, Dongdong Zhang, Xiaolei Wang, Xin Zhao, Furu Wei, Ji-Rong Wen



Abstract: Large language models (LLMs) demonstrate remarkable multilingual capabilities without being pre-trained on specially curated multilingual parallel corpora. It remains a challenging problem to explain the underlying mechanisms by which LLMs process multilingual texts. In this paper, we delve into the composition of Transformer architectures in LLMs to pinpoint language-specific regions. Specifically, we propose a novel detection method, language activation probability entropy (LAPE), to identify language-specific neurons within LLMs. Based on LAPE, we conduct comprehensive experiments on two representative LLMs, namely LLaMA-2 and BLOOM. Our findings indicate that LLMs' proficiency in processing a particular language is predominantly due to a small subset of neurons, primarily situated in the models' top and bottom layers. Furthermore, we showcase the feasibility of "steering" the output language of LLMs by selectively activating or deactivating language-specific neurons. Our research provides important evidence for understanding and exploring the multilingual capabilities of LLMs.
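The abstract names LAPE but does not spell out how it is computed. The following is a minimal sketch of one plausible reading of the name, not the authors' implementation: for each neuron, estimate how often it fires on text in each language, normalize those activation probabilities across languages, and take the entropy of the result. A low score then suggests a neuron that fires mostly for one language. The function names, the "activation > 0" firing criterion, and the toy data are all assumptions made for illustration.

```python
import math


def activation_probabilities(activations):
    """Fraction of tokens on which each neuron fires.

    `activations` is a list of per-token rows, one value per neuron.
    Treating "fires" as activation > 0 is an assumption of this sketch.
    """
    n_tokens = len(activations)
    n_neurons = len(activations[0])
    return [
        sum(1 for row in activations if row[j] > 0) / n_tokens
        for j in range(n_neurons)
    ]


def lape_scores(probs_by_language):
    """Entropy of each neuron's activation probabilities across languages.

    `probs_by_language` maps a language code to a list of per-neuron
    activation probabilities. Lower entropy = more language-specific.
    """
    languages = list(probs_by_language)
    n_neurons = len(probs_by_language[languages[0]])
    scores = []
    for j in range(n_neurons):
        p = [probs_by_language[lang][j] for lang in languages]
        total = sum(p)
        if total == 0:
            # A neuron that never fires carries no language signal.
            scores.append(float("inf"))
            continue
        dist = [x / total for x in p]
        scores.append(-sum(q * math.log(q) for q in dist if q > 0))
    return scores


# Toy data (hypothetical): 2 tokens per language, 3 neurons.
toy = {
    "en": [[1.0, 0.5, -1.0], [0.2, -0.3, -1.0]],
    "fr": [[-1.0, 0.4, -1.0], [-0.5, 0.1, -1.0]],
}
probs = {lang: activation_probabilities(acts) for lang, acts in toy.items()}
scores = lape_scores(probs)
# Neuron 0 fires only on English, so it gets the lowest entropy.
most_specific = min(range(len(scores)), key=scores.__getitem__)
```

In practice the activations would come from the model's feed-forward layers over a large corpus per language, and a threshold on the scores would select the candidate language-specific neurons; both of those choices are left open here.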
