Title: M2Lingual: Enhancing Multilingual, Multi-Turn Instruction Alignment in Large Language Models

Jun 24, 2024

Rishabh Maheshwary, Vikas Yadav, Hoang Nguyen, Khyati Mahajan, Sathwik Tejaswi Madhusudhan

Abstract: Instruction finetuning (IFT) is critical for aligning Large Language Models (LLMs) to follow instructions. Numerous effective IFT datasets have been proposed in the recent past, but most focus on high-resource languages such as English. In this work, we propose a fully synthetic, novel taxonomy (Evol) guided Multilingual, Multi-turn instruction finetuning dataset, called M2Lingual, to better align LLMs on a diverse set of languages and tasks. M2Lingual contains a total of 182K IFT pairs that are built upon diverse seeds, covering 70 languages, 17 NLP tasks and general instruction-response pairs. LLMs finetuned with M2Lingual substantially outperform those finetuned with the majority of existing multilingual IFT datasets. Importantly, LLMs trained with M2Lingual consistently achieve competitive results across a wide variety of evaluation benchmarks compared to existing multilingual IFT datasets. Specifically, LLMs finetuned with M2Lingual achieve strong performance on our translated multilingual, multi-turn evaluation benchmark as well as a wide variety of multilingual tasks. Thus we contribute M2Lingual and the 2-step Evol taxonomy used for its creation. M2Lingual repository: https://huggingface.co/datasets/ServiceNow-AI/M2Lingual

* 39 pages
