Meng Lin

CoT-MAE v2: Contextual Masked Auto-Encoder with Multi-view Modeling for Passage Retrieval

Apr 05, 2023

Xing Wu, Guangyuan Ma, Peng Wang, Meng Lin, Zijia Lin, Fuzheng Zhang, Songlin Hu

Abstract: A growing number of techniques have emerged to improve the performance of passage retrieval. As an effective representation bottleneck pre-training technique, the contextual masked auto-encoder uses contextual embeddings to assist in the reconstruction of passages. However, it relies on a single auto-encoding pre-training task for dense representation pre-training. This study brings multi-view modeling to the contextual masked auto-encoder. First, multi-view representation uses both dense and sparse vectors, aiming to capture sentence semantics from different aspects. Second, the multi-view decoding paradigm uses both auto-encoding and auto-regressive decoders in representation bottleneck pre-training, aiming to provide both reconstructive and generative signals for better contextual representation pre-training. We refer to this multi-view pre-training method as CoT-MAE v2. Through extensive experiments, we show that CoT-MAE v2 is effective and robust on large-scale passage retrieval benchmarks and out-of-domain zero-shot benchmarks.

* work in progress
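
As a rough illustration of the multi-view representation idea in the CoT-MAE v2 abstract above, the Python sketch below scores passages against a query using both a dense vector and a sparse term-weight vector. The fusion rule (dense inner product plus an `alpha`-weighted sparse inner product) and all function names are assumptions made for illustration, not the paper's method; in practice both views would come from a pre-trained encoder rather than being written by hand.

```python
from typing import Dict, List, Sequence, Tuple

DenseVec = List[float]
SparseVec = Dict[str, float]  # term -> weight


def dense_score(q: DenseVec, p: DenseVec) -> float:
    """Inner product of two dense vectors of equal length."""
    if len(q) != len(p):
        raise ValueError("dense vectors must have the same dimension")
    return sum(a * b for a, b in zip(q, p))


def sparse_score(q: SparseVec, p: SparseVec) -> float:
    """Inner product of two sparse term-weight vectors; only shared terms contribute."""
    return sum(w * p[t] for t, w in q.items() if t in p)


def multi_view_score(q_dense: DenseVec, p_dense: DenseVec,
                     q_sparse: SparseVec, p_sparse: SparseVec,
                     alpha: float = 1.0) -> float:
    """Hypothetical fusion rule: dense score plus alpha-weighted sparse score."""
    return dense_score(q_dense, p_dense) + alpha * sparse_score(q_sparse, p_sparse)


def rank(query: Tuple[DenseVec, SparseVec],
         passages: Sequence[Tuple[DenseVec, SparseVec]],
         alpha: float = 1.0) -> List[int]:
    """Return passage indices sorted by fused score, best first (ties keep input order)."""
    q_dense, q_sparse = query
    scores = [multi_view_score(q_dense, d, q_sparse, s, alpha) for d, s in passages]
    return sorted(range(len(passages)), key=lambda i: scores[i], reverse=True)


# Toy example: fused scores are 2.0, 1.0 and 1.5.
query = ([1.0, 0.0], {"apple": 1.0})
passages = [
    ([0.0, 1.0], {"apple": 2.0}),
    ([1.0, 0.0], {}),
    ([0.5, 0.0], {"apple": 1.0}),
]
print(rank(query, passages))  # [0, 2, 1]
```

The first passage wins on lexical overlap alone even though its dense vector is orthogonal to the query, which is the kind of complementary signal a second (sparse) view is meant to contribute.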

ConTextual Mask Auto-Encoder for Dense Passage Retrieval

Aug 16, 2022

Xing Wu, Guangyuan Ma, Meng Lin, Zijia Lin, Zhongyuan Wang, Songlin Hu

Abstract: Dense passage retrieval aims to retrieve the passages relevant to a query from a large corpus based on dense representations (i.e., vectors) of the query and the passages. Recent studies have explored improving pre-trained language models to boost dense retrieval performance. This paper proposes CoT-MAE (ConTextual Masked Auto-Encoder), a simple yet effective generative pre-training method for dense passage retrieval. CoT-MAE employs an asymmetric encoder-decoder architecture that learns to compress the sentence semantics into a dense vector through self-supervised and context-supervised masked auto-encoding. Specifically, self-supervised masked auto-encoding learns to model the semantics of the tokens inside a text span, and context-supervised masked auto-encoding learns to model the semantic correlation between text spans. We conduct experiments on large-scale passage retrieval benchmarks and show considerable improvements over strong baselines, demonstrating the high efficiency of CoT-MAE.

* 11 pages
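
To make the self-supervised and context-supervised masked auto-encoding in the CoT-MAE abstract above more concrete, here is a minimal Python sketch of how one training example might be assembled from two adjacent text spans. The mask rates (`enc_rate`, `dec_rate`), the `[MASK]` string, and the helper names are hypothetical choices for illustration. The sketch only builds inputs and reconstruction targets; it leaves out the encoder, the decoder, and the step where the decoder is conditioned on the encoder's dense vector.

```python
import random
from typing import Dict, List, Tuple

MASK = "[MASK]"  # placeholder mask symbol (assumed, not tied to a specific tokenizer)


def mask_tokens(tokens: List[str], rate: float,
                rng: random.Random) -> Tuple[List[str], List[int], List[str]]:
    """Mask floor(rate * len(tokens)) randomly chosen positions.

    Returns the masked sequence, the sorted masked positions, and the original
    tokens at those positions (the reconstruction targets).
    """
    n = int(rate * len(tokens))
    positions = sorted(rng.sample(range(len(tokens)), n))
    masked = list(tokens)
    for i in positions:
        masked[i] = MASK
    labels = [tokens[i] for i in positions]
    return masked, positions, labels


def build_cotmae_example(span: List[str], context_span: List[str],
                         enc_rate: float = 0.3, dec_rate: float = 0.45,
                         seed: int = 0) -> Dict[str, list]:
    """Assemble inputs and targets for one (span, adjacent span) pair.

    Encoder side: self-supervised masked auto-encoding of `span`.
    Decoder side: context-supervised reconstruction of the adjacent `context_span`.
    In a real model the decoder would also receive the encoder's dense vector;
    that conditioning is not modeled here.
    """
    rng = random.Random(seed)  # seeded so the example is reproducible
    enc_in, enc_pos, enc_labels = mask_tokens(span, enc_rate, rng)
    dec_in, dec_pos, dec_labels = mask_tokens(context_span, dec_rate, rng)
    return {
        "encoder_input": enc_in, "encoder_positions": enc_pos, "encoder_labels": enc_labels,
        "decoder_input": dec_in, "decoder_positions": dec_pos, "decoder_labels": dec_labels,
    }


tokens = "dense retrieval maps queries and passages to vectors so that neighbours can be found fast".split()
half = len(tokens) // 2
example = build_cotmae_example(tokens[:half], tokens[half:])
print(example["encoder_input"])
print(example["decoder_input"])
```

Giving the decoder side a higher mask rate than the encoder side is one way to force the decoder to lean on information passed from the encoder, which is the bottleneck idea the abstract describes; the specific rates above are placeholders.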

Text Smoothing: Enhance Various Data Augmentation Methods on Text Classification Tasks

Feb 28, 2022

Xing Wu, Chaochen Gao, Meng Lin, Liangjun Zang, Zhongyuan Wang, Songlin Hu

Abstract: Before being fed into a neural network, a token is generally converted to its one-hot representation, which is a discrete distribution over the vocabulary. A smoothed representation is the probability distribution over candidate tokens obtained from a pre-trained masked language model, which can be seen as a more informative substitute for the one-hot representation. We propose an efficient data augmentation method, termed text smoothing, which converts a sentence from its one-hot representation to a controllable smoothed representation. We evaluate text smoothing on different benchmarks in a low-resource regime. Experimental results show that text smoothing outperforms various mainstream data augmentation methods by a substantial margin. Moreover, text smoothing can be combined with those data augmentation methods to achieve better performance.

* Accepted to the ACL 2022 Main Conference
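
The Text Smoothing abstract above describes turning a one-hot token into a "controllable smoothed representation". Below is a minimal Python sketch, assuming that the control is a linear interpolation weight `lam` between the one-hot vector and the masked language model's distribution, and that the resulting distribution enters the network as a probability-weighted average of embedding rows. The MLM output is replaced by a hand-written distribution, and all names are illustrative rather than taken from the paper's code.

```python
from typing import List, Sequence


def one_hot(token_id: int, vocab_size: int) -> List[float]:
    """Discrete distribution that puts all probability mass on token_id."""
    return [1.0 if i == token_id else 0.0 for i in range(vocab_size)]


def smooth_token(token_id: int, mlm_probs: Sequence[float], lam: float = 0.5) -> List[float]:
    """Assumed smoothing rule: lam * one_hot + (1 - lam) * mlm_probs.

    lam = 1.0 keeps the original one-hot token; lam = 0.0 uses the MLM distribution alone.
    """
    if not 0.0 <= lam <= 1.0:
        raise ValueError("lam must be in [0, 1]")
    oh = one_hot(token_id, len(mlm_probs))
    return [lam * o + (1.0 - lam) * p for o, p in zip(oh, mlm_probs)]


def soft_embed(probs: Sequence[float], embedding: Sequence[Sequence[float]]) -> List[float]:
    """Probability-weighted average of embedding rows (a 'soft' embedding lookup)."""
    dim = len(embedding[0])
    return [sum(p * row[j] for p, row in zip(probs, embedding)) for j in range(dim)]


# Toy vocabulary of 3 tokens; the "MLM" distribution is made up for illustration.
mlm_probs = [0.2, 0.6, 0.2]
smoothed = smooth_token(1, mlm_probs, lam=0.5)
print([round(x, 3) for x in smoothed])  # [0.1, 0.8, 0.1]

embedding = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
print([round(x, 3) for x in soft_embed(smoothed, embedding)])  # [0.2, 0.9]
```

Because a one-hot vector times the embedding matrix is just an ordinary lookup, feeding the smoothed vector through the same product lets the augmented input reuse the model's existing embedding layer.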

Imbalanced Sentiment Classification Enhanced with Discourse Marker

Mar 28, 2019

Tao Zhang, Xing Wu, Meng Lin, Jizhong Han, Songlin Hu

Abstract: Imbalanced data is common in the real world, especially in sentiment-related corpora, making it difficult to train a classifier to distinguish latent sentiment in text data. We observe that humans often express a transition in emotion between two adjacent discourses with discourse markers such as "but", "though", and "while", and that the head discourse and the tail discourse usually indicate opposite emotional tendencies. Based on this observation, we propose a novel plug-and-play method, which first samples discourses according to transitional discourse markers and then validates their sentiment polarities with the help of a pre-trained attention-based model. Our method increases sample diversity in the first place and can serve as an upstream preprocessing step in data augmentation. We conduct experiments on three public sentiment datasets with several frequently used algorithms. The results show that our method is consistently effective, even in highly imbalanced scenarios, and can easily be integrated with oversampling methods to boost performance on imbalanced sentiment classification.

* 12 pages, 1 figure
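
The sampling step in the abstract above (split at a transitional discourse marker, then keep the pieces only if a model confirms opposite polarities) can be sketched in Python as follows. The regex-based splitting, the first-marker rule, and the function names are assumptions for illustration; the `classify` argument is a hypothetical stand-in for the pre-trained attention-based model, and the keyword classifier in the usage example is only a toy.

```python
import re
from typing import Callable, List, Optional, Tuple

# Transitional markers named in the abstract; a real system would likely use a longer list.
MARKERS = ("but", "though", "while")
_MARKER_RE = re.compile(r"\b(" + "|".join(MARKERS) + r")\b", re.IGNORECASE)


def split_on_marker(sentence: str) -> Optional[Tuple[str, str, str]]:
    """Split at the first marker with non-empty text on both sides.

    Returns (head, marker, tail), or None when no usable split exists.
    """
    for m in _MARKER_RE.finditer(sentence):
        head = sentence[:m.start()].strip(" ,;")
        tail = sentence[m.end():].strip(" ,;.!?")
        if head and tail:
            return head, m.group(1).lower(), tail
    return None


def sample_discourse_pairs(sentences: List[str],
                           classify: Callable[[str], str]) -> List[Tuple[str, str]]:
    """Collect (discourse, label) samples from sentences whose head and tail disagree.

    `classify` stands in for the pre-trained polarity model used for validation.
    """
    samples: List[Tuple[str, str]] = []
    for sentence in sentences:
        split = split_on_marker(sentence)
        if split is None:
            continue
        head, _, tail = split
        head_label, tail_label = classify(head), classify(tail)
        if head_label != tail_label:  # keep only validated opposite-polarity pairs
            samples.append((head, head_label))
            samples.append((tail, tail_label))
    return samples


# Toy keyword classifier, for illustration only.
def toy_classify(text: str) -> str:
    return "neg" if any(w in text for w in ("dull", "boring")) else "pos"


corpus = ["The plot was dull, but the acting was superb.", "Great film."]
print(sample_discourse_pairs(corpus, toy_classify))
# [('The plot was dull', 'neg'), ('the acting was superb', 'pos')]
```

The extracted discourses could then be added to the minority sentiment class before oversampling, in line with the abstract's description of the method as an upstream preprocessing step.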
