Xuancheng Huang

Don't Half-listen: Capturing Key-part Information in Continual Instruction Tuning

Mar 15, 2024

Yongquan He, Xuancheng Huang, Minghao Tang, Lingxun Meng, Xiang Li, Wei Lin, Wenyuan Zhang, Yifu Gao

Abstract: Instruction tuning for large language models (LLMs) can drive them to produce results consistent with human goals in specific downstream tasks. However, continual instruction tuning (CIT) of LLMs may cause catastrophic forgetting (CF), in which previously learned abilities degrade. Recent methods try to alleviate CF by modifying models or replaying data, but they may only remember the surface-level patterns of instructions and become confused on held-out tasks. In this paper, we propose a novel continual instruction tuning method based on Key-part Information Gain (KPIG). Our method computes the information gain on masked parts to dynamically replay data and refine the training objective, which enables LLMs to capture task-aware information relevant to the correct response and alleviates overfitting to general descriptions in instructions. In addition, we propose two metrics, P-score and V-score, to measure the generalization and instruction-following abilities of LLMs. Experiments demonstrate that our method achieves superior performance on both seen and held-out tasks.

* 18 pages, 4 figures

Prompt Gating: A Parameter Efficient Tuning Method for Zero-Shot Multi-Source Translation

Dec 19, 2022

Xuancheng Huang, Zijun Liu, Peng Li, Maosong Sun, Yang Liu

Abstract: Multi-source translation (MST), which typically receives multiple source sentences of the same meaning in different languages, has been shown to be superior to single-source translation. As the quantity of multi-source parallel data is limited, taking full advantage of single-source data and limited multi-source data so that models perform well when receiving as many sources as possible remains a challenge. Unlike previous work, which is mostly devoted to supervised scenarios, we focus on zero-shot MST: expecting models to be able to process unseen combinations of multiple sources, e.g., unseen language combinations, during inference. We propose a simple yet effective parameter-efficient method, named Prompt Gating, which appends prompts to the model inputs and attaches gates on the extended hidden states for each encoder layer. It shows strong zero-shot transferability (up to +9.0 BLEU points) and remarkable compositionality (up to +15.6 BLEU points) on MST, and also outperforms baselines on lexically constrained translation.
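As a rough illustration of the idea of appending prompts and gating the extended hidden states, the following minimal sketch may help. It is not the authors' implementation: the function name `prompt_gate_layer`, the element-wise sigmoid gate, and the plain-list representation are all assumptions made for illustration, since the abstract does not specify the form of the gate.

```python
import math


def sigmoid(x: float) -> float:
    """Standard logistic function."""
    return 1.0 / (1.0 + math.exp(-x))


def prompt_gate_layer(hidden, prompts, gate_logits):
    """Append prompt vectors to the hidden states, then gate every position.

    hidden:      list of vectors, one per input token
    prompts:     list of prompt vectors appended after the tokens
    gate_logits: one logit vector per extended position (hypothetical
                 trainable parameters; the multiplicative sigmoid form
                 is an assumption, not taken from the paper)
    """
    extended = hidden + prompts  # sequence extended by the prompts
    if len(gate_logits) != len(extended):
        raise ValueError("need one gate vector per extended position")
    return [
        [sigmoid(g) * h for g, h in zip(gate_row, vec)]
        for gate_row, vec in zip(gate_logits, extended)
    ]


# One token vector, one prompt vector, and neutral gates (logit 0 -> 0.5).
gated = prompt_gate_layer(
    hidden=[[1.0, 2.0]],
    prompts=[[0.5, 0.5]],
    gate_logits=[[0.0, 0.0], [0.0, 0.0]],
)
```

In a real encoder the gate parameters would be learned per layer while the pretrained weights stay frozen, which is what would make such a method parameter-efficient.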

CUGE: A Chinese Language Understanding and Generation Evaluation Benchmark

Dec 27, 2021

Yuan Yao, Qingxiu Dong, Jian Guan, Boxi Cao, Zhengyan Zhang, Chaojun Xiao, Xiaozhi Wang, Fanchao Qi, Junwei Bao, Jinran Nie (+25 more)

Abstract: Realizing general-purpose language intelligence has been a longstanding goal for natural language processing, where standard evaluation benchmarks play a fundamental and guiding role. We argue that for general-purpose language intelligence evaluation, the benchmark itself needs to be comprehensive and systematic. To this end, we propose CUGE, a Chinese Language Understanding and Generation Evaluation benchmark with the following features: (1) a hierarchical benchmark framework, where datasets are selected and organized in a principled way following a language capability-task-dataset hierarchy; (2) a multi-level scoring strategy, where different levels of model performance are provided based on the hierarchical framework. To facilitate CUGE, we provide a public leaderboard that can be customized to support flexible model judging criteria. Evaluation results on representative pre-trained language models indicate ample room for improvement towards general-purpose language intelligence. CUGE is publicly available at cuge.baai.ac.cn.
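To make the capability-task-dataset hierarchy concrete, here is a minimal sketch of how multi-level scores could be rolled up. The unweighted averaging at each level, the function name `hierarchical_scores`, and all capability, task, and dataset names below are assumptions for illustration; the abstract does not state how CUGE actually aggregates scores.

```python
from statistics import mean


def hierarchical_scores(results):
    """Aggregate dataset scores up a capability -> task -> dataset hierarchy.

    results: {capability: {task: {dataset: score}}}
    Returns (task_scores, capability_scores, overall_score).
    Plain unweighted means are used at every level (an assumption).
    """
    task_scores = {
        cap: {task: mean(list(datasets.values())) for task, datasets in tasks.items()}
        for cap, tasks in results.items()
    }
    capability_scores = {
        cap: mean(list(tasks.values())) for cap, tasks in task_scores.items()
    }
    overall = mean(list(capability_scores.values()))
    return task_scores, capability_scores, overall


# Hypothetical scores; none of these names or numbers come from CUGE.
example = {
    "understanding": {"task_1": {"dataset_a": 80.0, "dataset_b": 60.0}},
    "generation": {
        "task_2": {"dataset_c": 50.0},
        "task_3": {"dataset_d": 30.0},
    },
}
task_level, capability_level, overall_level = hierarchical_scores(example)
```

Reporting all three levels lets a reader compare models on a single dataset, on a task, or on a whole capability, which is the point of a multi-level scoring strategy.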

Transfer Learning for Sequence Generation: from Single-source to Multi-source

May 31, 2021

Xuancheng Huang, Jingfang Xu, Maosong Sun, Yang Liu

Abstract: Multi-source sequence generation (MSG) is an important class of sequence generation tasks that take multiple sources, including automatic post-editing, multi-source translation, and multi-document summarization. As MSG tasks suffer from data scarcity and recent pretrained models have proven effective for low-resource downstream tasks, transferring pretrained sequence-to-sequence models to MSG tasks is essential. Although directly finetuning pretrained models on MSG tasks and concatenating multiple sources into a single long sequence is regarded as a simple way to transfer pretrained models to MSG tasks, we conjecture that direct finetuning leads to catastrophic forgetting and that relying solely on pretrained self-attention layers to capture cross-source information is not sufficient. Therefore, we propose a two-stage finetuning method to alleviate the pretrain-finetune discrepancy and introduce a novel MSG model with a fine encoder to learn better representations in MSG tasks. Experiments show that our approach achieves new state-of-the-art results on the WMT17 APE task and on the multi-source translation task using the WMT14 test set. When adapted to document-level translation, our framework significantly outperforms strong baselines.

* ACL 2021 main track long paper

Neural Machine Translation: A Review of Methods, Resources, and Tools

Dec 31, 2020

Zhixing Tan, Shuo Wang, Zonghan Yang, Gang Chen, Xuancheng Huang, Maosong Sun, Yang Liu

Abstract: Machine translation (MT) is an important sub-field of natural language processing that aims to translate natural languages using computers. In recent years, end-to-end neural machine translation (NMT) has achieved great success and has become the new mainstream method in practical MT systems. In this article, we first provide a broad review of the methods for NMT and focus on methods relating to architectures, decoding, and data augmentation. Then we summarize the resources and tools that are useful for researchers. Finally, we conclude with a discussion of possible future research directions.

* Accepted by AI Open

Modeling Voting for System Combination in Machine Translation

Jul 14, 2020

Xuancheng Huang, Jiacheng Zhang, Zhixing Tan, Derek F. Wong, Huanbo Luan, Jingfang Xu, Maosong Sun, Yang Liu

Abstract: System combination is an important technique for combining the hypotheses of different machine translation systems to improve translation performance. Although early statistical approaches to system combination have proven effective in analyzing the consensus between hypotheses, they suffer from error propagation due to the use of pipelines. While this problem has recently been alleviated by end-to-end training of multi-source sequence-to-sequence models, these neural models do not explicitly analyze the relations between hypotheses and fail to capture their agreement, because the attention to a word in a hypothesis is calculated independently, ignoring the fact that the word might occur in multiple hypotheses. In this work, we propose an approach to modeling voting for system combination in machine translation. The basic idea is to enable words in hypotheses from different systems to vote on words that are representative and should be involved in the generation process. This can be done by quantifying the influence of each voter and its preference for each candidate. Our approach combines the advantages of statistical and neural methods, since it not only analyzes the relations between hypotheses but also allows for end-to-end training. Experiments show that our approach takes better advantage of the consensus between hypotheses and achieves significant improvements over state-of-the-art baselines on Chinese-English and English-German machine translation tasks.
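The voting idea, in which each voter has an influence and a preference for each candidate, can be pictured with a toy sketch. This is only an illustration and not the paper's neural model: the function `vote_scores`, the hand-set influence weights, and the exact-match preference function are assumptions standing in for quantities that the paper learns end to end.

```python
def exact_match(voter: str, candidate: str) -> float:
    """Toy preference: a voter fully supports identical words only."""
    return 1.0 if voter == candidate else 0.0


def vote_scores(hypotheses, influence, preference=exact_match):
    """Let every word occurrence vote on every distinct candidate word.

    hypotheses: list of token lists, one per MT system
    influence:  matching list of per-token weights (hypothetical values)
    preference: function(voter, candidate) -> non-negative support
    Returns normalized scores per candidate word.
    """
    voters = [
        (word, weight)
        for hyp, weights in zip(hypotheses, influence)
        for word, weight in zip(hyp, weights)
    ]
    candidates = {word for word, _ in voters}
    raw = {
        cand: sum(weight * preference(word, cand) for word, weight in voters)
        for cand in candidates
    }
    total = sum(raw.values())
    if total == 0:
        raise ValueError("all votes are zero; cannot normalize")
    return {cand: score / total for cand, score in raw.items()}


# Two system outputs that agree on "the" and disagree on the noun.
scores = vote_scores(
    hypotheses=[["the", "cat"], ["the", "dog"]],
    influence=[[1.0, 1.0], [1.0, 1.0]],
)
```

Words that occur in several hypotheses collect votes from several voters, so agreement between systems raises a word's score, which is the consensus effect the abstract describes.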

Learning to Copy for Automatic Post-Editing

Nov 09, 2019

Xuancheng Huang, Yang Liu, Huanbo Luan, Jingfang Xu, Maosong Sun

Abstract: Automatic post-editing (APE), which aims to correct errors in the output of machine translation systems in a post-processing step, is an important task in natural language processing. While recent work has achieved considerable performance gains by using neural networks, how to model the copying mechanism for APE remains a challenge. In this work, we propose a new method for modeling copying for APE. To better identify translation errors, our method learns the representations of source sentences and system outputs in an interactive way. These representations are used to explicitly indicate which words in the system outputs should be copied, which helps CopyNet (Gu et al., 2016) generate better post-edited translations. Experiments on the datasets of the WMT 2016-2017 APE shared tasks show that our approach outperforms the best published results.

* EMNLP 2019
