Xinwei Geng

Towards Higher Pareto Frontier in Multilingual Machine Translation

May 25, 2023

Yichong Huang, Xiaocheng Feng, Xinwei Geng, Baohang Li, Bing Qin

Abstract: Multilingual neural machine translation has witnessed remarkable progress in recent years. However, the long-tailed distribution of multilingual corpora poses a challenge of Pareto optimization, i.e., optimizing for some languages may come at the cost of degrading the performance of others. Existing balancing training strategies are equivalent to a series of Pareto optimal solutions, which trade off on a Pareto frontier. In this work, we propose a new training framework, Pareto Mutual Distillation (Pareto-MD), towards pushing the Pareto frontier outwards rather than making trade-offs. Specifically, Pareto-MD collaboratively trains two Pareto optimal solutions that favor different languages and allows them to learn from the strengths of each other via knowledge distillation. Furthermore, we introduce a novel strategy to enable stronger communication between Pareto optimal solutions and broaden the applicability of our approach. Experimental results on the widely-used WMT and TED datasets show that our method significantly pushes the Pareto frontier and outperforms baselines by up to +2.46 BLEU.

* Accepted by ACL 2023
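
The mutual-distillation idea in the abstract above can be illustrated with a small sketch. This is not the paper's objective: the abstract does not give the loss, so the mixing weight `alpha`, the function names, and the choice of a KL term toward the peer model's prediction are all assumptions made for illustration. Each of the two models combines its own cross-entropy with a term that pulls it toward the other model's output distribution.

```python
import math

def softmax(logits):
    # Numerically stable softmax over a list of floats.
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]

def cross_entropy(probs, target_index):
    # Negative log-likelihood of the reference token.
    return -math.log(probs[target_index])

def kl_divergence(p, q):
    # KL(p || q) for two discrete distributions; q is assumed to have full support.
    return sum(pi * math.log(pi / qi) for pi, qi in zip(p, q) if pi > 0)

def mutual_distillation_losses(logits_a, logits_b, target_index, alpha=0.5):
    # Hypothetical objective: each model mixes its own cross-entropy with a
    # KL term toward the peer model's prediction, treated as a fixed teacher.
    p_a, p_b = softmax(logits_a), softmax(logits_b)
    loss_a = (1 - alpha) * cross_entropy(p_a, target_index) + alpha * kl_divergence(p_b, p_a)
    loss_b = (1 - alpha) * cross_entropy(p_b, target_index) + alpha * kl_divergence(p_a, p_b)
    return loss_a, loss_b
```

When both models already agree, the KL terms vanish and each loss reduces to its weighted cross-entropy, so the distillation signal only acts where the two models differ.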

OmniKnight: Multilingual Neural Machine Translation with Language-Specific Self-Distillation

May 03, 2022

Yichong Huang, Xiaocheng Feng, Xinwei Geng, Bing Qin

Abstract: Although all-in-one-model multilingual neural machine translation (MNMT) has achieved remarkable progress in recent years, its selected best overall checkpoint fails to achieve the best performance simultaneously in all language pairs. This is because the best checkpoints for individual language pairs (i.e., language-specific best checkpoints) are scattered across different epochs. In this paper, we present a novel training strategy dubbed Language-Specific Self-Distillation (LSSD) for bridging the gap between language-specific best checkpoints and the overall best checkpoint. In detail, we regard each language-specific best checkpoint as a teacher to distill the overall best checkpoint. Moreover, we systematically explore three variants of our LSSD, which perform distillation statically, selectively, and adaptively. Experimental results on two widely-used benchmarks show that LSSD obtains consistent improvements across all language pairs and achieves the state-of-the-art
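
To make the gap between the overall best checkpoint and the language-specific best checkpoints concrete, here is a minimal sketch. The function name, the use of an unweighted average to pick the overall best epoch, and the validation scores are all made up for illustration and are not taken from the paper.

```python
def best_checkpoints(scores):
    # scores maps epoch -> {language_pair: validation score, higher is better}.
    epochs = sorted(scores)
    pairs = sorted(scores[epochs[0]])
    # Overall best: highest average score across language pairs (an assumption).
    overall_best = max(epochs, key=lambda e: sum(scores[e][p] for p in pairs) / len(pairs))
    # Language-specific best: highest score for each pair on its own.
    per_pair_best = {p: max(epochs, key=lambda e: scores[e][p]) for p in pairs}
    return overall_best, per_pair_best

# Made-up validation scores, for illustration only.
toy_scores = {
    1: {"en-de": 20.0, "en-fr": 30.0},
    2: {"en-de": 24.0, "en-fr": 31.0},
    3: {"en-de": 23.0, "en-fr": 33.0},
}
overall, per_pair = best_checkpoints(toy_scores)
```

In this toy run the overall best epoch is 3, while `en-de` peaks at epoch 2. In the setting the abstract describes, that epoch-2 checkpoint would be the one acting as a teacher for `en-de`.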

Dialogue Discourse-Aware Graph Convolutional Networks for Abstractive Meeting Summarization

Dec 07, 2020

Xiachong Feng, Xiaocheng Feng, Bing Qin, Xinwei Geng, Ting Liu

Abstract: Sequence-to-sequence methods have achieved promising results for textual abstractive meeting summarization. Unlike documents such as news and scientific papers, a meeting is naturally full of dialogue-specific structural information. However, previous works model a meeting in a sequential manner, ignoring this rich structural information. In this paper, we develop Dialogue Discourse-Aware Graph Convolutional Networks (DDA-GCN) for meeting summarization by utilizing dialogue discourse, a dialogue-specific structure that provides pre-defined semantic relationships between utterances. We first transform the entire meeting text with dialogue discourse relations into a discourse graph and then use DDA-GCN to encode the semantic representation of the graph. Finally, we employ a Recurrent Neural Network to generate the summary. In addition, we utilize the question-answer discourse relation to construct a pseudo-summarization corpus, which can be used to pre-train our model. Experimental results on the AMI dataset show that our model outperforms various baselines and can achieve state-of-the-art performance.
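
As a rough illustration of encoding a discourse graph, the sketch below builds an undirected graph from (head, tail, label) relation triples and applies one parameter-free propagation step. It is deliberately simplified and is not DDA-GCN: it ignores relation labels and has no learned weights, and the function names and the `"QA"` label are assumptions.

```python
def build_adjacency(num_utterances, relations):
    # relations: (head, tail, label) triples; labels are ignored in this sketch.
    neighbors = {i: {i} for i in range(num_utterances)}  # include self-loops
    for head, tail, _label in relations:
        neighbors[head].add(tail)
        neighbors[tail].add(head)
    return neighbors

def graph_conv(features, neighbors):
    # One propagation step: each utterance vector becomes the mean of its own
    # vector and the vectors of the utterances it is linked to.
    dim = len(features[0])
    out = []
    for i in range(len(features)):
        group = sorted(neighbors[i])
        out.append([sum(features[j][d] for j in group) / len(group) for d in range(dim)])
    return out

# Three utterances; utterances 0 and 1 are linked by a hypothetical "QA" relation.
graph = build_adjacency(3, [(0, 1, "QA")])
encoded = graph_conv([[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]], graph)
```

Linked utterances end up with shared information after the step, while the unlinked third utterance keeps its original vector.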

How Does Selective Mechanism Improve Self-Attention Networks?

May 03, 2020

Xinwei Geng, Longyue Wang, Xing Wang, Bing Qin, Ting Liu, Zhaopeng Tu

Abstract: Self-attention networks (SANs) with a selective mechanism have produced substantial improvements in various NLP tasks by concentrating on a subset of input words. However, the underlying reasons for their strong performance have not been well explained. In this paper, we bridge the gap by assessing the strengths of selective SANs (SSANs), which are implemented with a flexible and universal Gumbel-Softmax. Experimental results on several representative NLP tasks, including natural language inference, semantic role labelling, and machine translation, show that SSANs consistently outperform the standard SANs. Through well-designed probing experiments, we empirically validate that the improvement of SSANs can be attributed in part to mitigating two commonly-cited weaknesses of SANs: word order encoding and structure modeling. Specifically, the selective mechanism improves SANs by paying more attention to content words that contribute to the meaning of the sentence. The code and data are released at https://github.com/xwgeng/SSAN.

* ACL 2020
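
For readers unfamiliar with Gumbel-Softmax, here is a minimal standard-library sketch of the generic relaxation: add Gumbel noise to the logits, divide by a temperature, and apply softmax. How SSANs wire this into self-attention is not described in the abstract, so the function name, the `rng` parameter, and the example logits are assumptions for illustration only.

```python
import math
import random

def gumbel_softmax(logits, temperature=1.0, rng=random):
    # Draw a relaxed one-hot sample: softmax((logits + Gumbel noise) / temperature).
    noisy = []
    for x in logits:
        u = rng.random()
        # Clamp away from 0 and 1 so the double log is always defined.
        u = min(max(u, 1e-12), 1.0 - 1e-12)
        gumbel = -math.log(-math.log(u))
        noisy.append((x + gumbel) / temperature)
    # Numerically stable softmax over the perturbed, scaled logits.
    m = max(noisy)
    exps = [math.exp(v - m) for v in noisy]
    total = sum(exps)
    return [e / total for e in exps]

random.seed(0)
weights = gumbel_softmax([2.0, 0.5, -1.0], temperature=0.5)
```

Lower temperatures push the sample closer to a one-hot selection, while higher temperatures spread the weight more evenly.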

Learning to Refine Source Representations for Neural Machine Translation

Dec 26, 2018

Xinwei Geng, Longyue Wang, Xing Wang, Bing Qin, Ting Liu, Zhaopeng Tu

Abstract: Neural machine translation (NMT) models generally adopt an encoder-decoder architecture for modeling the entire translation process. The encoder summarizes the representation of the input sentence from scratch, which is potentially a problem if the sentence is ambiguous. When translating a text, humans often create an initial understanding of the source sentence and then incrementally refine it along the translation on the target side. Starting from this intuition, we propose a novel encoder-refiner-decoder framework, which dynamically refines the source representations based on the generated target-side information at each decoding step. Since the refining operations are time-consuming, we propose a strategy, leveraging the power of reinforcement learning models, to decide when to refine at specific decoding steps. Experimental results on both Chinese-English and English-German translation tasks show that the proposed approach significantly and consistently improves translation performance over the standard encoder-decoder framework. Furthermore, when the refining strategy is applied, results still show reasonable improvement over the baseline without much decrease in decoding speed.

A Planning based Framework for Essay Generation

Jan 06, 2016

Bing Qin, Duyu Tang, Xinwei Geng, Dandan Ning, Jiahao Liu, Ting Liu

Abstract: Generating an article automatically with a computer program is a challenging task in artificial intelligence and natural language processing. In this paper, we target essay generation, which takes a topic word as input and generates an organized article on the theme of that topic. We follow the idea of text planning (Reiter, 1997) and develop an essay generation framework. The framework consists of three components: topic understanding, sentence extraction, and sentence reordering. For each component, we studied several statistical algorithms and compared them empirically through qualitative or quantitative analysis. Although we run experiments on a Chinese corpus, the method is language independent and can be easily adapted to other languages. We lay out the remaining challenges and suggest avenues for future research.
