Xiaomin Zhuang

ChuXin: 1.6B Technical Report

May 08, 2024

Xiaomin Zhuang, Yufan Jiang, Qiaozhi He, Zhihua Wu

Abstract: In this report, we present ChuXin, an entirely open-source language model with a size of 1.6 billion parameters. Unlike the majority of works that only open-sourced the model weights and architecture, we have made everything needed to train a model available, including the training data, the training process, and the evaluation code. Our goal is to empower and strengthen the open research community, fostering transparency and enabling a new wave of innovation in the field of language modeling. Furthermore, we extend the context length to 1M tokens through lightweight continual pretraining and demonstrate strong needle-in-a-haystack retrieval performance. The weights for both models are available at Hugging Face to download and use.

* Technical Report
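
The ChuXin abstract mentions needle-in-a-haystack retrieval without describing the test setup. As a rough illustration of the general idea only, the sketch below builds a long filler context, hides a "needle" sentence at a chosen relative depth, and checks whether a model's answer contains the expected fact. The function names, the word-based length budget (a stand-in for tokens), and the substring scoring rule are all assumptions for illustration, not the authors' evaluation code.

```python
from typing import List


def build_haystack(filler: str, needle: str, depth: float, total_words: int) -> str:
    """Repeat `filler` up to `total_words` words and insert `needle` at a
    fractional `depth` (0.0 = start, 1.0 = end). Words approximate tokens here."""
    if not 0.0 <= depth <= 1.0:
        raise ValueError("depth must be in [0, 1]")
    filler_words: List[str] = filler.split()
    if not filler_words:
        raise ValueError("filler must contain at least one word")
    # Tile the filler text until the word budget is reached.
    repeats = total_words // len(filler_words) + 1
    words = (filler_words * repeats)[:total_words]
    # Insert the needle at the requested relative position.
    position = int(round(depth * len(words)))
    words[position:position] = needle.split()
    return " ".join(words)


def retrieved(answer: str, expected: str) -> bool:
    """Hypothetical scoring rule: the answer counts as correct if it contains the fact."""
    return expected.lower() in answer.lower()


if __name__ == "__main__":
    context = build_haystack(
        filler="The sky was clear and the streets were quiet.",
        needle="The secret passcode is 7421.",
        depth=0.5,
        total_words=200,
    )
    prompt = context + "\n\nQuestion: What is the secret passcode?"
    print(len(prompt.split()), "words in prompt")
```

A real evaluation would sweep both the context length and the needle depth and send each prompt to the model; only the prompt construction and scoring are sketched here.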

Code Comparison Tuning for Code Large Language Models

Mar 28, 2024

Yufan Jiang, Qiaozhi He, Xiaomin Zhuang, Zhihua Wu

Abstract: We present Code Comparison Tuning (CCT), a simple and effective tuning method for code large language models (Code LLMs) to better handle subtle code errors. Specifically, we integrate the concept of comparison into instruction tuning, both at the token and sequence levels, enabling the model to discern even the slightest deviations in code. To compare the original code with an erroneous version containing manually added code errors, we use token-level preference loss for detailed token-level comparisons. Additionally, we combine code segments to create a new instruction tuning sample for sequence-level comparisons, enhancing the model's bug-fixing capability. Experimental results on the HumanEvalFix benchmark show that CCT surpasses instruction tuning in pass@1 scores by up to 4 points across diverse code LLMs, and extensive analysis demonstrates the effectiveness of our method.

* Preprint
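
The CCT abstract reports pass@1 scores but does not say how they were computed. For readers unfamiliar with the metric, the sketch below implements the unbiased pass@k estimator commonly used with HumanEval-style benchmarks: with n samples per problem, of which c pass the tests, pass@k = 1 - C(n-c, k) / C(n, k). Whether the paper uses this estimator or plain greedy-decoding accuracy is an assumption here; the function name is illustrative.

```python
from math import comb


def pass_at_k(n: int, c: int, k: int) -> float:
    """Unbiased pass@k estimate for one problem.

    n: number of generated samples, c: number of samples that pass, k: budget.
    """
    if not 0 <= c <= n:
        raise ValueError("c must satisfy 0 <= c <= n")
    if not 1 <= k <= n:
        raise ValueError("k must satisfy 1 <= k <= n")
    # If fewer than k samples fail, every size-k subset contains a passing sample.
    if n - c < k:
        return 1.0
    # Probability that a random size-k subset contains no passing sample.
    return 1.0 - comb(n - c, k) / comb(n, k)


if __name__ == "__main__":
    # For k = 1 the estimator reduces to the fraction of passing samples, c / n.
    print(pass_at_k(n=10, c=3, k=1))
```

A benchmark score is the mean of this value over all problems, so a gain of "4 points" corresponds to a 0.04 increase in that mean.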

RecycleGPT: An Autoregressive Language Model with Recyclable Module

Aug 08, 2023

Yufan Jiang, Qiaozhi He, Xiaomin Zhuang, Zhihua Wu, Kunpeng Wang, Wenlai Zhao, Guangwen Yang

Abstract: Existing large language models have to run K times to generate a sequence of K tokens. In this paper, we present RecycleGPT, a generative language model with fast decoding speed by recycling pre-generated model states without running the whole model in multiple steps. Our approach relies on the observation that adjacent tokens in a sequence usually have strong correlations and the next token in a sequence can be reasonably guessed or inferred based on the preceding ones. Experiments and analysis demonstrate the effectiveness of our approach in lowering inference latency, achieving up to 1.4x speedup while preserving high performance.

* Technical Report
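
The RecycleGPT abstract describes reusing already-computed model states so that the full model need not run for every token. The toy loop below illustrates that idea under loud assumptions: `full_model` and `recycle_module` are hypothetical stand-ins operating on integers, and the strict alternation between them is one possible schedule, not necessarily the one the paper uses. It only counts full-model calls; it makes no claim about real latency.

```python
from typing import List, Optional, Tuple

VOCAB = 100  # toy vocabulary size (hypothetical)


def full_model(tokens: List[int]) -> Tuple[int, int]:
    """Toy stand-in for a full forward pass: returns (next_token, cached_state)."""
    state = sum(tokens)
    return state % VOCAB, state


def recycle_module(state: int, last_token: int) -> int:
    """Toy stand-in for a lightweight module that guesses the following token
    from the cached state instead of rerunning the full model."""
    return (state + last_token) % VOCAB


def generate(prompt: List[int], k: int) -> Tuple[List[int], int]:
    """Generate k tokens, alternating full-model and recycle steps (assumed schedule).

    Returns the full token sequence and the number of full-model calls.
    """
    tokens = list(prompt)
    full_calls = 0
    state: Optional[int] = None
    for _ in range(k):
        if state is None:
            # No cached state available: run the expensive full model.
            token, state = full_model(tokens)
            full_calls += 1
        else:
            # Reuse the cached state for a cheap guess, then discard it.
            token = recycle_module(state, tokens[-1])
            state = None
        tokens.append(token)
    return tokens, full_calls


if __name__ == "__main__":
    sequence, calls = generate([1, 2, 3], k=8)
    print(f"generated 8 tokens with {calls} full-model calls")
```

Under this schedule, K tokens need about K/2 full-model calls rather than K. The actual speedup depends on the cost of the lightweight module and on other overheads, which is consistent with the abstract reporting up to 1.4x rather than 2x.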
