
Title: Multi-Agent Collaborative Data Selection for Efficient LLM Pretraining

Oct 10, 2024

Tianyi Bai, Ling Yang, Zhen Hao Wong, Jiahui Peng, Xinlin Zhuang, Chi Zhang, Lijun Wu, Qiu Jiantao, Wentao Zhang, Binhang Yuan(+1 more)

[Figures 1-4]


Abstract: Efficient data selection is crucial for accelerating the pretraining of large language models (LLMs). While various methods have been proposed to improve data efficiency, little research has addressed the inherent conflicts among these approaches, which stand in the way of optimal data selection for LLM pretraining. To tackle this problem, we propose a novel multi-agent collaborative data selection mechanism. In this framework, each data selection method serves as an independent agent, and an agent console dynamically integrates the information from all agents throughout the LLM training process. We conduct extensive empirical studies to evaluate our multi-agent framework. The experimental results demonstrate that our approach significantly improves data efficiency, accelerates convergence in LLM training, and achieves an average performance gain of 10.5% across multiple language model benchmarks compared with state-of-the-art methods.
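The abstract does not specify how the agent console merges the agents' signals. As a purely illustrative sketch, assume each agent returns one score per document, and the console combines min-max normalized scores with a weighted sum whose weights are adjusted by a simple multiplicative update driven by a per-agent reward. `AgentConsole`, `normalize`, the two toy agents, and the `rewards` signal are all hypothetical names introduced here. This is not the authors' algorithm, only one way to let several selection criteria compete during training.

```python
from typing import Callable, Dict, List

# Hypothetical agent type: maps a batch of documents to one score per document.
Agent = Callable[[List[str]], List[float]]


def normalize(scores: List[float]) -> List[float]:
    """Min-max normalize scores to [0, 1] so different agents are comparable."""
    lo, hi = min(scores), max(scores)
    if hi == lo:
        return [0.0 for _ in scores]
    return [(s - lo) / (hi - lo) for s in scores]


class AgentConsole:
    """Hypothetical console that merges per-agent scores using adaptive weights."""

    def __init__(self, agents: Dict[str, Agent], lr: float = 0.1):
        self.agents = agents
        self.lr = lr
        # Start with equal trust in every agent.
        self.weights = {name: 1.0 / len(agents) for name in agents}

    def score(self, docs: List[str]) -> List[float]:
        """Weighted sum of each agent's normalized scores, per document."""
        combined = [0.0] * len(docs)
        for name, agent in self.agents.items():
            for i, s in enumerate(normalize(agent(docs))):
                combined[i] += self.weights[name] * s
        return combined

    def select(self, docs: List[str], k: int) -> List[str]:
        """Return the k documents with the highest combined score."""
        scores = self.score(docs)
        order = sorted(range(len(docs)), key=lambda i: scores[i], reverse=True)
        return [docs[i] for i in order[:k]]

    def update(self, rewards: Dict[str, float]) -> None:
        """Shift weight toward agents with higher reward, then renormalize.

        `rewards` is an assumed per-agent feedback signal (for example, how much
        training on that agent's preferred data lowered a held-out loss).
        """
        for name, r in rewards.items():
            # Floor keeps weights positive even for large negative rewards.
            self.weights[name] = max(1e-6, self.weights[name] * (1.0 + self.lr * r))
        total = sum(self.weights.values())
        self.weights = {n: w / total for n, w in self.weights.items()}


# Usage with two toy (hypothetical) agents.
agents: Dict[str, Agent] = {
    "length": lambda docs: [float(len(d)) for d in docs],
    "diversity": lambda docs: [float(len(set(d.split()))) for d in docs],
}
console = AgentConsole(agents)
docs = ["a a a a", "the quick brown fox", "hi"]
selected = console.select(docs, k=2)  # ['the quick brown fox', 'a a a a']
console.update({"length": -0.5, "diversity": 1.0})
```

Weighted-sum scoring with multiplicative reweighting is chosen here only for simplicity; the paper's console may integrate agent information in an entirely different way.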
