Abstract: There has been great progress in unifying various table-to-text tasks using a single encoder-decoder model trained via multi-task learning (Xie et al., 2022). However, existing methods typically encode task information with a simple dataset name as a prefix to the encoder. This not only limits the effectiveness of multi-task learning, but also hinders the model's ability to generalize to new domains or tasks that were not seen during training, which is crucial for real-world applications. In this paper, we propose compositional task configurations, a set of prompts prepended to the encoder input to improve cross-task generalization of unified models. We design the task configurations to explicitly specify the task type, as well as its input and output types. We show that this not only allows the model to better learn shared knowledge across different tasks at training time, but also allows us to control the model by composing new configurations that apply novel input-output combinations in a zero-shot manner. We demonstrate via experiments over ten table-to-text tasks that our method outperforms the UnifiedSKG baseline by noticeable margins in both in-domain and zero-shot settings, with average improvements of +0.5 and +12.6, respectively, when using a T5-large backbone.
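To make the idea concrete, below is a minimal sketch of how a compositional task configuration could be composed and prepended to the encoder input, assuming a T5-style text-to-text model. The field names, prompt phrasing, and the build_config/build_encoder_input helpers are illustrative assumptions, not the paper's exact prompts.

```python
# Sketch only: configuration fields and their phrasing are assumptions.

def build_config(task_type: str, input_type: str, output_type: str) -> str:
    """Compose a task configuration prefix from its three components."""
    return f"[TASK] {task_type} [INPUT] {input_type} [OUTPUT] {output_type}"

def build_encoder_input(config: str, linearized_table: str, question: str = "") -> str:
    """Prepend the configuration to the task input fed to the encoder."""
    parts = [config, question, linearized_table]
    return " ".join(p for p in parts if p)

# Seen at training time: table question answering.
qa_config = build_config("question answering", "table", "answer")

# Composed at test time: a new input-output combination used zero-shot.
zero_shot_config = build_config("summarization", "table", "summary")

print(build_encoder_input(qa_config,
                          "col: name | age row 1: alice | 30",
                          "How old is Alice?"))
```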
Abstract: Multilingual information retrieval is challenging due to the lack of training datasets for many low-resource languages. We present an effective method that leverages parallel and non-parallel corpora to improve the cross-lingual transfer ability of pretrained multilingual language models for information retrieval. We design a semantic contrastive loss, a standard contrastive objective that improves the cross-lingual alignment of parallel sentence pairs, and we propose a new contrastive loss, the language contrastive loss, which leverages both parallel and non-parallel corpora to further improve multilingual representation learning. We train our model on an English information retrieval dataset and test its zero-shot transfer ability to other languages. Our experimental results show that our method significantly improves retrieval performance over prior work while requiring much less computational effort. Our model works well even with a small amount of parallel data, and it can be used as an add-on module with any backbone and for other tasks. Our code is available at: https://github.com/xiyanghu/multilingualIR.
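As a rough illustration of the semantic contrastive loss described above, the sketch below implements a standard in-batch InfoNCE objective over parallel sentence pairs in PyTorch. The temperature value and the semantic_contrastive_loss helper are assumptions, and the paper's language contrastive loss over non-parallel corpora is not reproduced here.

```python
import torch
import torch.nn.functional as F

def semantic_contrastive_loss(src_emb: torch.Tensor,
                              tgt_emb: torch.Tensor,
                              temperature: float = 0.05) -> torch.Tensor:
    """In-batch InfoNCE: row i of src_emb is the translation of row i of
    tgt_emb; the other rows in the batch act as negatives."""
    src = F.normalize(src_emb, dim=-1)
    tgt = F.normalize(tgt_emb, dim=-1)
    logits = src @ tgt.T / temperature      # (B, B) cosine-similarity matrix
    labels = torch.arange(src.size(0))      # positives sit on the diagonal
    return F.cross_entropy(logits, labels)

# Toy usage with random stand-ins for multilingual sentence embeddings.
src = torch.randn(8, 768, requires_grad=True)   # assumed embedding size 768
loss = semantic_contrastive_loss(src, torch.randn(8, 768))
loss.backward()
print(float(loss))
```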
Abstract: Large language models have achieved high performance on various question answering (QA) benchmarks, but the explainability of their output remains elusive. Structured explanations, called entailment trees, were recently suggested as a way to explain and inspect a QA system's answer. In order to better generate such entailment trees, we propose an architecture called Iterative Retrieval-Generation Reasoner (IRGR). Our model is able to explain a given hypothesis by systematically generating a step-by-step explanation from textual premises. The IRGR model iteratively searches for suitable premises, constructing a single entailment step at a time. Contrary to previous approaches, our method combines generation steps and retrieval of premises, allowing the model to leverage intermediate conclusions and mitigating the input size limit of baseline encoder-decoder models. We conduct experiments using the EntailmentBank dataset, where we outperform existing benchmarks on premise retrieval and entailment tree generation, with around 300% gain in overall correctness.
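The sketch below outlines the iterative retrieve-then-generate loop at a schematic level: retrieval and one-step generation alternate, each step conditioning on the partial tree built so far. The retrieve and generate_step stubs are toy stand-ins for the paper's retriever and encoder-decoder generator, not its actual components.

```python
# Schematic sketch only: all functions below are toy stubs.

def retrieve(hypothesis: str, partial_tree: list, corpus: list, k: int = 2) -> list:
    """Stub retriever: rank premises by word overlap with the hypothesis
    (a real retriever would also condition on the partial tree)."""
    return sorted(corpus,
                  key=lambda p: len(set(p.split()) & set(hypothesis.split())),
                  reverse=True)[:k]

def generate_step(premises: list, partial_tree: list) -> str:
    """Stub generator: an encoder-decoder would produce one entailment step."""
    return " & ".join(premises) + " -> intermediate conclusion"

def build_entailment_tree(hypothesis: str, corpus: list, max_steps: int = 3) -> list:
    tree = []
    for _ in range(max_steps):
        premises = retrieve(hypothesis, tree, corpus)   # search for suitable premises
        tree.append(generate_step(premises, tree))      # one entailment step at a time
    return tree

corpus = ["iron is a metal", "metals conduct electricity", "wood is an insulator"]
for step in build_entailment_tree("an iron nail conducts electricity", corpus):
    print(step)
```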
Abstract: Recent progress in pretrained Transformer-based language models has shown great success in learning contextual representations of text. However, due to the quadratic complexity of self-attention, most pretrained Transformer models can only handle relatively short text, and modeling very long documents remains a challenge. In this work, we propose to use a graph attention network on top of an available pretrained Transformer model to learn document embeddings. The graph attention network allows us to leverage the high-level semantic structure of the document. In addition, based on our graph document model, we design a simple contrastive learning strategy to pretrain our models on a large amount of unlabeled data. Empirically, we demonstrate the effectiveness of our approach on document classification and document retrieval tasks.
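As a rough sketch of attending over a document's sentence-level graph, the PyTorch snippet below masks a single attention layer with an adjacency matrix over precomputed sentence embeddings. The chain-shaped adjacency, the use of nn.MultiheadAttention, and mean pooling are simplifying assumptions rather than the paper's exact graph attention network.

```python
import torch
import torch.nn as nn

class SentenceGraphAttention(nn.Module):
    """Toy graph-masked attention over sentence nodes of one document."""
    def __init__(self, dim: int, heads: int = 4):
        super().__init__()
        self.attn = nn.MultiheadAttention(dim, heads, batch_first=True)

    def forward(self, sent_emb: torch.Tensor, adj: torch.Tensor) -> torch.Tensor:
        """sent_emb: (num_sentences, dim); adj: (num_sentences, num_sentences) 0/1."""
        x = sent_emb.unsqueeze(0)          # add a batch dimension
        mask = adj == 0                    # block attention to non-neighbours
        out, _ = self.attn(x, x, x, attn_mask=mask)
        return out.squeeze(0).mean(dim=0)  # pooled document embedding

# Toy usage: 6 sentences connected in a chain (adjacent sentences are neighbours),
# with embeddings assumed to come from a pretrained Transformer (dim 768).
n, dim = 6, 768
adj = torch.eye(n) + torch.diag(torch.ones(n - 1), 1) + torch.diag(torch.ones(n - 1), -1)
doc_emb = SentenceGraphAttention(dim)(torch.randn(n, dim), adj)
print(doc_emb.shape)  # torch.Size([768])
```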
Abstract: Semantic role labeling (SRL) involves extracting propositions (i.e., predicates and their typed arguments) from natural language sentences. State-of-the-art SRL models rely on powerful encoders (e.g., LSTMs) and do not model non-local interactions between arguments. We propose a new approach to modeling these interactions while maintaining efficient inference. Specifically, we use Capsule Networks: each proposition is encoded as a tuple of \textit{capsules}, one capsule per argument type (i.e., role). These tuples serve as embeddings of entire propositions. In every network layer, the capsules interact with each other and with representations of the words in the sentence. Each iteration results in updated proposition embeddings and updated predictions about the SRL structure. Our model substantially outperforms the non-refinement baseline on all 7 CoNLL-2009 languages and achieves state-of-the-art results on 5 languages (including English) for dependency SRL. We analyze the types of mistakes corrected by the refinement procedure. For example, each role is typically (but not always) filled with at most one argument. Whereas enforcing this approximate constraint as a hard rule is not useful with a modern SRL system, the iterative procedure captures this intuition in a flexible, context-sensitive way and corrects the corresponding mistakes.
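The snippet below sketches the iterative refinement idea in PyTorch: one vector ("capsule") per role repeatedly attends to the word representations, and the per-role argument scores are re-estimated after every round. The GRU-based update, the dot-product scorer, and all dimensions are assumptions, not the paper's capsule network.

```python
import torch
import torch.nn as nn

class IterativeRoleRefiner(nn.Module):
    """Toy refinement loop: role capsules interact with word states."""
    def __init__(self, num_roles: int, dim: int, iterations: int = 3):
        super().__init__()
        self.role_capsules = nn.Parameter(torch.randn(num_roles, dim))
        self.update = nn.GRUCell(dim, dim)   # refines each capsule with attended evidence
        self.iterations = iterations

    def forward(self, word_repr: torch.Tensor) -> torch.Tensor:
        """word_repr: (seq_len, dim) encoder states -> (num_roles, seq_len) scores."""
        capsules = self.role_capsules
        for _ in range(self.iterations):
            attn = torch.softmax(capsules @ word_repr.T, dim=-1)  # role-word affinities
            evidence = attn @ word_repr                           # per-role sentence summary
            capsules = self.update(evidence, capsules)            # refined proposition embedding
        return capsules @ word_repr.T                             # final per-role argument scores

refiner = IterativeRoleRefiner(num_roles=5, dim=128)
print(refiner(torch.randn(12, 128)).shape)  # torch.Size([5, 12])
```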
Abstract: Multi-criteria Chinese word segmentation (CWS) is a promising but challenging task, which exploits several different segmentation criteria and mines their common underlying knowledge. In this paper, we propose a flexible multi-criteria learning framework for Chinese word segmentation. A segmentation criterion can usually be decomposed into multiple sub-criteria, which are shareable with other segmentation criteria, so that word segmentation becomes a routing among these sub-criteria. From this perspective, we present Switch-LSTMs for word segmentation, which consist of several long short-term memory networks (LSTMs) and a switcher that automatically routes among these LSTMs. With these auto-switched LSTMs, our model provides a more flexible solution for multi-criteria CWS and also makes it easy to transfer the learned knowledge to new criteria. Experiments show that our model obtains significant improvements on eight corpora with heterogeneous segmentation criteria, compared to previous methods and single-criterion learning.
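A minimal PyTorch sketch of the routing idea follows: several shared LSTMs run in parallel and a per-criterion softmax gate (the "switcher") mixes their outputs. The gating scheme and all sizes are illustrative assumptions rather than the exact Switch-LSTM formulation.

```python
import torch
import torch.nn as nn

class SwitchLSTM(nn.Module):
    """Toy switcher: a soft routing distribution over shared LSTMs per criterion."""
    def __init__(self, input_dim: int, hidden_dim: int, num_lstms: int, num_criteria: int):
        super().__init__()
        self.lstms = nn.ModuleList(
            nn.LSTM(input_dim, hidden_dim, batch_first=True) for _ in range(num_lstms))
        # One routing distribution over the shared LSTMs per segmentation criterion.
        self.switcher = nn.Parameter(torch.zeros(num_criteria, num_lstms))

    def forward(self, chars: torch.Tensor, criterion: int) -> torch.Tensor:
        """chars: (batch, seq_len, input_dim) -> (batch, seq_len, hidden_dim)."""
        outputs = torch.stack([lstm(chars)[0] for lstm in self.lstms], dim=0)
        weights = torch.softmax(self.switcher[criterion], dim=-1)   # routing weights
        return torch.einsum("k,kbld->bld", weights, outputs)        # weighted mixture

model = SwitchLSTM(input_dim=64, hidden_dim=128, num_lstms=4, num_criteria=8)
print(model(torch.randn(2, 10, 64), criterion=3).shape)  # torch.Size([2, 10, 128])
```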
Abstract: Designing a shared neural architecture plays an important role in multi-task learning. The challenge is that finding an optimal sharing scheme relies heavily on expert knowledge and is not scalable to a large number of diverse tasks. Inspired by the promising work on neural architecture search (NAS), we apply reinforcement learning to automatically find a possible shared architecture for multi-task learning. Specifically, we use a controller to select from a set of shareable modules and assemble a task-specific architecture, and we repeat the same procedure for the other tasks. The controller is trained with reinforcement learning to maximize the expected accuracies for all tasks. We conduct extensive experiments on two types of tasks, text classification and sequence labeling, which demonstrate the benefits of our approach.
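The sketch below illustrates the controller idea: for a given task it samples which shared module to place at each layer, and a REINFORCE-style update pushes the controller toward choices that obtained higher validation accuracy. The ArchitectureController class, the scalar reward, and the update rule are toy assumptions.

```python
import torch
import torch.nn as nn

class ArchitectureController(nn.Module):
    """Toy controller: per-task logits over shareable modules at each layer."""
    def __init__(self, num_tasks: int, num_layers: int, num_modules: int):
        super().__init__()
        self.logits = nn.Parameter(torch.zeros(num_tasks, num_layers, num_modules))

    def sample(self, task: int):
        dist = torch.distributions.Categorical(logits=self.logits[task])
        choice = dist.sample()                     # one module index per layer
        return choice, dist.log_prob(choice).sum()

controller = ArchitectureController(num_tasks=3, num_layers=4, num_modules=5)
opt = torch.optim.Adam(controller.parameters(), lr=1e-2)

choice, log_prob = controller.sample(task=0)
reward = 0.81                     # stand-in for the sampled architecture's validation accuracy
loss = -(reward * log_prob)       # REINFORCE objective
opt.zero_grad()
loss.backward()
opt.step()
print(choice.tolist())            # selected module per layer for task 0
```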
Abstract: Text generation is a crucial task in NLP. Recently, several adversarial generative models have been proposed to alleviate the exposure bias problem in text generation. Although these models have achieved great success, they still suffer from reward sparsity and mode collapse. To address these two problems, in this paper we employ inverse reinforcement learning (IRL) for text generation. Specifically, the IRL framework learns a reward function on the training data and then an optimal policy that maximizes the expected total reward. As in the adversarial models, the reward function and the policy in IRL are optimized alternately. Our method has two advantages: (1) the reward function produces denser reward signals, and (2) the generation policy, trained with an entropy-regularized policy gradient, is encouraged to generate more diverse texts. Experimental results demonstrate that our proposed method generates higher-quality texts than previous methods.
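As a rough illustration of the training signal described above, the snippet below performs a single entropy-regularized policy-gradient step with a learned per-token reward in PyTorch. The toy policy head, reward model, and entropy coefficient are assumptions; the full IRL alternation between reward and policy updates is omitted.

```python
import torch
import torch.nn as nn

vocab, dim = 100, 32
policy = nn.Linear(dim, vocab)        # toy next-token policy head
reward_model = nn.Linear(dim, 1)      # learned reward: one score per generated token
opt = torch.optim.Adam(policy.parameters(), lr=1e-3)

states = torch.randn(16, dim)         # toy decoder states for 16 generated tokens
dist = torch.distributions.Categorical(logits=policy(states))
tokens = dist.sample()

rewards = reward_model(states).squeeze(-1).detach()  # dense per-token reward signal
beta = 0.01                                          # entropy bonus encourages diversity
loss = -(rewards * dist.log_prob(tokens)).mean() - beta * dist.entropy().mean()
opt.zero_grad()
loss.backward()
opt.step()
print(float(loss))
```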
Abstract: Recently, neural network models for natural language processing tasks have received increasing attention for their ability to alleviate the burden of manual feature engineering. However, previous neural models cannot extract the complicated feature compositions that traditional methods with discrete features can. In this work, we propose a feature-enriched neural model for the joint Chinese word segmentation and part-of-speech tagging task. Specifically, to simulate the feature templates of traditional discrete-feature models, we use different filters to model complex compositional features with convolutional and pooling layers, and then capture long-distance dependency information with a recurrent layer. Experimental results on five different datasets show the effectiveness of our proposed model.
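The sketch below shows one plausible form of the feature-enriched encoder: convolutional filters of several widths emulate discrete feature templates over character windows, and a bidirectional recurrent layer adds long-distance context before joint tagging. Layer sizes, filter widths, and the tag inventory are illustrative assumptions.

```python
import torch
import torch.nn as nn

class ConvRecurrentTagger(nn.Module):
    """Toy joint segmentation + POS tagger: multi-width convolutions, then a BiLSTM."""
    def __init__(self, emb_dim=64, hidden=128, num_tags=20, widths=(1, 3, 5)):
        super().__init__()
        self.convs = nn.ModuleList(
            nn.Conv1d(emb_dim, hidden, w, padding=w // 2) for w in widths)
        self.rnn = nn.LSTM(hidden * len(widths), hidden, batch_first=True, bidirectional=True)
        self.out = nn.Linear(2 * hidden, num_tags)   # joint segmentation + POS tag scores

    def forward(self, char_emb: torch.Tensor) -> torch.Tensor:
        """char_emb: (batch, seq_len, emb_dim) -> (batch, seq_len, num_tags)."""
        x = char_emb.transpose(1, 2)                             # Conv1d expects channels first
        feats = torch.cat([torch.relu(c(x)) for c in self.convs], dim=1)
        h, _ = self.rnn(feats.transpose(1, 2))                   # long-distance dependencies
        return self.out(h)

tagger = ConvRecurrentTagger()
print(tagger(torch.randn(2, 15, 64)).shape)  # torch.Size([2, 15, 20])
```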
Abstract: Neural word segmentation has attracted growing research interest for its ability to alleviate the effort of feature engineering and to utilize external resources through pre-trained character or word embeddings. In this paper, we propose a new neural model that incorporates word-level information for Chinese word segmentation. Unlike previous word-based models, our model still adopts the framework of character-based sequence labeling, which has advantages in both effectiveness and efficiency at inference time. To utilize the word-level information, we also propose a new long short-term memory (LSTM) architecture over a directed acyclic graph (DAG). Experimental results demonstrate that our model leads to better performance than the baseline models.
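A schematic PyTorch sketch of running an LSTM over a word DAG follows: every candidate word ending at a character contributes an incoming state, and the incoming states are pooled before the recurrent update. The mean pooling, the lexicon-derived edges argument, and the SimpleDAGLSTM cell are simplifying assumptions, not the paper's exact DAG-LSTM.

```python
import torch
import torch.nn as nn

class SimpleDAGLSTM(nn.Module):
    """Toy DAG-LSTM: pool hidden states over incoming word edges, then step."""
    def __init__(self, emb_dim=64, hidden=128):
        super().__init__()
        self.cell = nn.LSTMCell(emb_dim, hidden)
        self.hidden = hidden

    def forward(self, char_emb: torch.Tensor, edges: list) -> torch.Tensor:
        """char_emb: (seq_len, emb_dim); edges[t] lists start positions of
        candidate words ending at character t (e.g. from a lexicon match)."""
        seq_len = char_emb.size(0)
        h = [torch.zeros(1, self.hidden) for _ in range(seq_len + 1)]
        c = [torch.zeros(1, self.hidden) for _ in range(seq_len + 1)]
        for t in range(seq_len):
            starts = edges[t] or [t]                            # fall back to the single-character edge
            h_in = torch.stack([h[s] for s in starts]).mean(0)  # pool states over incoming edges
            c_in = torch.stack([c[s] for s in starts]).mean(0)
            h[t + 1], c[t + 1] = self.cell(char_emb[t : t + 1], (h_in, c_in))
        return torch.cat(h[1:], dim=0)                          # (seq_len, hidden) character states

model = SimpleDAGLSTM()
states = model(torch.randn(6, 64), edges=[[0], [0], [1, 2], [3], [2, 3], [5]])
print(states.shape)  # torch.Size([6, 128])
```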