
Wang Qun

Xmodel-2 Technical Report

Dec 27, 2024

Wang Qun, Liu Yang, Lin Qingquan, Qu Zhijiu, Jiang Ling

Abstract: Xmodel-2 is a 1.2-billion-parameter large language model designed specifically for reasoning tasks. Its architecture lets different model scales share a unified set of hyperparameters, so extensive experiments can be run on smaller models and the optimal configurations transferred directly to larger ones. To maximize training efficiency and stability, Xmodel-2 employs the Warmup-Stable-Decay (WSD) learning rate scheduler from MiniCPM. Pretrained on 1.5 trillion tokens from diverse sources, Xmodel-2 achieves state-of-the-art performance in complex reasoning and agent-based tasks while keeping training costs low. These results highlight the potential of efficient model design and training strategies for advancing reasoning capabilities. Model checkpoints and code are publicly available on GitHub at https://github.com/XiaoduoAILab/Xmodel-2
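The WSD scheduler named in the abstract splits training into three phases: a warmup, a long stable phase at the peak learning rate, and a final decay. The function below is a minimal sketch of that shape, not the Xmodel-2 or MiniCPM implementation. The parameter names (`peak_lr`, `warmup_steps`, `stable_steps`, `decay_steps`, `min_lr_ratio`) are hypothetical, and the linear decay to a floor is an assumption, since implementations differ in the decay curve they use.

```python
def wsd_lr(step: int, peak_lr: float, warmup_steps: int, stable_steps: int,
           decay_steps: int, min_lr_ratio: float = 0.1) -> float:
    """Warmup-Stable-Decay learning rate (illustrative sketch; hypothetical parameters)."""
    if step < warmup_steps:
        # Warmup: ramp linearly toward peak_lr, reaching it at step warmup_steps - 1.
        return peak_lr * (step + 1) / warmup_steps
    if step < warmup_steps + stable_steps:
        # Stable: hold the peak learning rate constant.
        return peak_lr
    # Decay (assumed linear here): anneal from peak_lr to min_lr_ratio * peak_lr,
    # then stay at that floor for any steps beyond the decay window.
    progress = min((step - warmup_steps - stable_steps) / decay_steps, 1.0)
    return peak_lr * (1.0 - (1.0 - min_lr_ratio) * progress)


if __name__ == "__main__":
    # Hypothetical run: 100 warmup, 800 stable, and 100 decay steps.
    for s in (0, 99, 500, 950, 1000):
        print(s, wsd_lr(s, peak_lr=0.01, warmup_steps=100, stable_steps=800, decay_steps=100))
```

One practical appeal of this shape is that the stable phase has no built-in end point. Training can continue at the peak rate, and the short decay can be applied whenever a checkpoint needs to be finalized.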


Xmodel-1.5: An 1B-scale Multilingual LLM

Nov 15, 2024

Wang Qun, Liu Yang, Lin Qingquan, Jiang Ling


Abstract: We introduce Xmodel-1.5, a novel 1-billion-parameter multilingual large model pretrained on approximately 2 trillion tokens. The model demonstrates strong performance across several languages, with particularly notable results in Thai, Arabic, and French, alongside its effectiveness in Chinese and English. In addition, we contribute to the research community by releasing a Thai evaluation dataset, which includes hundreds of questions annotated by students from Chulalongkorn University's School of Integrated Innovation. While the results are promising, we acknowledge that there is still room for improvement. We hope this work advances ongoing efforts in multilingual AI research and promotes better cross-linguistic understanding in various natural language processing tasks. Our models and code are publicly available on GitHub at https://github.com/XiaoduoAILab/XmodelLM.
