Fréderic Godin

Zero-Shot Cross-Lingual Sentiment Classification under Distribution Shift: an Exploratory Study

Nov 11, 2023

Maarten De Raedt, Semere Kiros Bitew, Fréderic Godin, Thomas Demeester, Chris Develder

Abstract: The brittleness of finetuned language model performance on out-of-distribution (OOD) test samples in unseen domains has been well studied for English, yet remains unexplored for multilingual models. Therefore, we study generalization to OOD test data specifically in zero-shot cross-lingual transfer settings, analyzing the performance impact of both language and domain shifts between train and test data. We further assess the effectiveness of counterfactually augmented data (CAD) in improving OOD generalization for the cross-lingual setting, since CAD has been shown to be beneficial in a monolingual English setting. Finally, we propose two new approaches for OOD generalization that avoid the costly annotation process associated with CAD by exploiting the power of recent large language models (LLMs). We experiment with three multilingual models, LaBSE, mBERT, and XLM-R, trained on English IMDb movie reviews, and evaluate them on OOD test sets in 13 languages drawn from Amazon product reviews, Tweets, and restaurant reviews. Results echo the OOD performance decline observed in the monolingual English setting. Further, (i) counterfactuals from the original high-resource language do improve OOD generalization in the low-resource language, and (ii) our newly proposed cost-effective approaches reach similar or up to +3.1% better accuracy than CAD for Amazon and restaurant reviews.

* The 3rd Workshop on Multilingual Representation Learning (MRL@EMNLP2023)

IDAS: Intent Discovery with Abstractive Summarization

May 31, 2023

Maarten De Raedt, Fréderic Godin, Thomas Demeester, Chris Develder

Abstract: Intent discovery is the task of inferring latent intents from a set of unlabeled utterances, and is a useful step toward the efficient creation of new conversational agents. We show that recent competitive methods in intent discovery can be outperformed by clustering utterances based on abstractive summaries, i.e., "labels", that retain the core elements while removing non-essential information. We contribute the IDAS approach, which collects a set of descriptive utterance labels by prompting a large language model (LLM): starting from a well-chosen seed set of prototypical utterances, it bootstraps an in-context learning procedure that generates labels for the non-prototypical utterances. The utterances and their resulting noisy labels are then encoded by a frozen pre-trained encoder and subsequently clustered to recover the latent intents. For the unsupervised task (without any intent labels), IDAS outperforms the state of the art by up to +7.42% in standard cluster metrics on the Banking, StackOverflow, and Transport datasets. For the semi-supervised task (with labels for a subset of intents), IDAS surpasses two recent methods on the CLINC benchmark without even using labeled data.

* The 5th Workshop on NLP for Conversational AI (NLP4ConvAI@ACL)
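
The final IDAS step, encoding each utterance and its generated label with a frozen encoder and then clustering, can be sketched as follows. This is a minimal illustration under stated assumptions, not the authors' implementation: toy 2-d vectors replace real encoder outputs, the utterance and label embeddings are fused by a hypothetical weighted average (`combine`), plain k-means stands in for whichever clustering algorithm the paper uses, and the LLM labelling step is not shown.

```python
import random


def combine(utterance_vec, label_vec, weight=0.5):
    """Hypothetical fusion: weighted average of utterance and label embeddings."""
    return [weight * u + (1.0 - weight) * l for u, l in zip(utterance_vec, label_vec)]


def squared_distance(a, b):
    return sum((x - y) ** 2 for x, y in zip(a, b))


def centroid(points):
    return [sum(coords) / len(points) for coords in zip(*points)]


def kmeans(points, k, iterations=20, seed=0):
    """Plain Lloyd's k-means; returns one cluster index per point."""
    rng = random.Random(seed)
    centroids = [list(p) for p in rng.sample(points, k)]
    assignment = [0] * len(points)
    for _ in range(iterations):
        # Assign every point to its nearest centroid.
        assignment = [
            min(range(k), key=lambda c: squared_distance(p, centroids[c]))
            for p in points
        ]
        # Recompute each centroid from its members (keep it if the cluster is empty).
        for c in range(k):
            members = [p for p, a in zip(points, assignment) if a == c]
            if members:
                centroids[c] = centroid(members)
    return assignment


# Toy 2-d vectors standing in for frozen-encoder embeddings.
utterances = [[0.9, 0.1], [0.8, 0.3], [0.1, 0.9], [0.2, 0.7]]
labels = [[1.0, 0.0], [1.0, 0.0], [0.0, 1.0], [0.0, 1.0]]  # e.g. "check balance" vs "card lost"
points = [combine(u, l) for u, l in zip(utterances, labels)]
clusters = kmeans(points, k=2)
print(clusters)
```

On this toy data the first two and the last two utterances end up in the same cluster; the cluster indices themselves are arbitrary. The point of the fusion is that two utterances with different surface wording but the same summary label are pulled closer together before clustering.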

A Simple Geometric Method for Cross-Lingual Linguistic Transformations with Pre-trained Autoencoders

Apr 08, 2021

Maarten De Raedt, Fréderic Godin, Pieter Buteneers, Chris Develder, Thomas Demeester

Abstract: Powerful sentence encoders trained for multiple languages are on the rise. These systems are capable of embedding a wide range of linguistic properties into vector representations. While explicit probing tasks can be used to verify the presence of specific linguistic properties, it is unclear whether the vector representations can be manipulated to indirectly steer such properties. We investigate the use of a geometric mapping in embedding space to transform linguistic properties, without any tuning of the pre-trained sentence encoder or decoder. We validate our approach on three linguistic properties using a pre-trained multilingual autoencoder and analyze the results in both monolingual and cross-lingual settings.
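
To make "a geometric mapping in embedding space" concrete, the sketch below uses one of the simplest such mappings: translate an embedding by the difference between the centroids of sentences without and with a property. This is an assumption chosen for illustration; the paper's exact mapping may differ. The 3-d vectors and the tense example are hypothetical, and the frozen encoder and decoder are represented only by comments.

```python
def mean(vectors):
    return [sum(coords) / len(vectors) for coords in zip(*vectors)]


def offset_vector(source_group, target_group):
    """Difference between the centroids of two groups of sentence embeddings."""
    return [t - s for s, t in zip(mean(source_group), mean(target_group))]


def translate(embedding, offset, scale=1.0):
    """Move an embedding along the offset; the encoder and decoder stay frozen."""
    return [x + scale * o for x, o in zip(embedding, offset)]


def squared_distance(a, b):
    return sum((x - y) ** 2 for x, y in zip(a, b))


# Hypothetical 3-d embeddings of sentences without / with a linguistic property
# (for example present vs. past tense), as a frozen encoder might produce them.
present = [[1.0, 0.0, 0.5], [0.8, 0.2, 0.4]]
past = [[0.0, 1.0, 0.5], [0.2, 0.8, 0.6]]

shift = offset_vector(present, past)
query = [0.9, 0.15, 0.45]  # a new "present tense" embedding
moved = translate(query, shift)
# In the real setting, `moved` would be passed to the frozen decoder.
print(moved)
```

Because no weights are tuned, the only learned object is the offset itself. In a cross-lingual variant of this sketch, the offset would be estimated from sentences in one language and applied to embeddings of sentences in another, which only works if the multilingual encoder places the property along a similar direction across languages.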

Learning When Not to Answer: A Ternary Reward Structure for Reinforcement Learning based Question Answering

Apr 03, 2019

Fréderic Godin, Anjishnu Kumar, Arpit Mittal

Abstract: In this paper, we investigate the challenges of using reinforcement learning agents for question answering over knowledge graphs in real-world applications. We examine the performance metrics used by state-of-the-art systems and determine that they are inadequate for such settings. More specifically, they do not evaluate systems correctly when no answer is available, and thus agents optimized for these metrics are poor at modeling confidence. We introduce a simple new performance metric for evaluating question-answering agents that is more representative of practical usage conditions, and optimize for this metric by extending the binary reward structure used in prior work to a ternary reward structure that also rewards an agent for not answering a question rather than giving an incorrect answer. We show that this can drastically improve the precision of answered questions while declining to answer only a limited number of questions that were previously answered correctly. Employing a supervised learning strategy that uses depth-first-search paths to bootstrap the reinforcement learning algorithm further improves performance.

* Accepted at NAACL 2019. Version 1 was presented at the NIPS 2018 Workshop on Relational Representation Learning
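
The difference between the binary and the ternary reward structure can be shown in a few lines. In this sketch the reward values (+1 correct, 0 no answer, -1 incorrect) are illustrative assumptions rather than the paper's settings, `NO_ANSWER` is a hypothetical marker for abstaining, and `answered_precision` computes the precision of answered questions mentioned in the abstract, not the paper's new metric.

```python
NO_ANSWER = None  # hypothetical marker: the agent abstains


def binary_reward(prediction, gold):
    """Binary structure: 1 for a correct answer, 0 otherwise (abstaining included)."""
    return 1.0 if prediction is not NO_ANSWER and prediction == gold else 0.0


def ternary_reward(prediction, gold, correct=1.0, abstain=0.0, wrong=-1.0):
    """Three outcomes: correct answer, no answer, incorrect answer.
    The default values are illustrative assumptions, not the paper's settings."""
    if prediction is NO_ANSWER:
        return abstain
    return correct if prediction == gold else wrong


def answered_precision(predictions, golds):
    """Fraction of answered questions that were answered correctly."""
    answered = [(p, g) for p, g in zip(predictions, golds) if p is not NO_ANSWER]
    if not answered:
        return 0.0
    return sum(p == g for p, g in answered) / len(answered)


predictions = ["Paris", NO_ANSWER, "Rome", "Oslo"]
golds = ["Paris", "Berlin", "Madrid", "Oslo"]
print(sum(binary_reward(p, g) for p, g in zip(predictions, golds)))   # 2.0
print(sum(ternary_reward(p, g) for p, g in zip(predictions, golds)))  # 1.0
print(answered_precision(predictions, golds))
```

Under the binary reward, a wrong answer and an abstention both earn 0, so an agent gains nothing by abstaining; the ternary reward makes a wrong answer strictly worse than no answer, which is what pushes the agent to abstain when it is unsure.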

Explaining Character-Aware Neural Networks for Word-Level Prediction: Do They Discover Linguistic Rules?

Aug 28, 2018

Fréderic Godin, Kris Demuynck, Joni Dambre, Wesley De Neve, Thomas Demeester

Abstract: Character-level features are currently used in various neural network-based natural language processing algorithms. However, little is known about the character-level patterns these models learn. Moreover, models are often compared only quantitatively, while a qualitative analysis is missing. In this paper, we investigate which character-level patterns neural networks learn and whether those patterns coincide with manually defined word segmentations and annotations. To that end, we extend the contextual decomposition technique (Murdoch et al., 2018) to convolutional neural networks, which allows us to compare convolutional neural networks and bidirectional long short-term memory networks. We evaluate and compare these models on the task of morphological tagging for three morphologically different languages and show that these models implicitly discover understandable linguistic rules. Our implementation can be found at https://github.com/FredericGodin/ContextualDecomposition-NLP.

* Accepted at EMNLP 2018
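
Contextual decomposition relies on the fact that the output of a linear operation can be split exactly into a part due to a chosen set of inputs (beta) and a part due to everything else (gamma). The sketch below shows only that linear step for a single 1-d convolutional filter over character embeddings. It is not the released implementation linked above: the handling of nonlinearities such as ReLU and max-pooling is omitted, assigning the whole bias to gamma is an assumption, and the function names and toy values are hypothetical.

```python
def dot(u, v):
    return sum(a * b for a, b in zip(u, v))


def conv1d(chars, kernel, bias):
    """Valid 1-d convolution of one filter over a sequence of character embeddings."""
    width = len(kernel)
    return [
        bias + sum(dot(kernel[k], chars[t + k]) for k in range(width))
        for t in range(len(chars) - width + 1)
    ]


def conv1d_decomposed(chars, kernel, bias, relevant):
    """Split each pre-activation into (beta, gamma): the contribution of the characters
    whose indices are in `relevant` and the contribution of everything else.
    Assumption: the bias is assigned entirely to gamma."""
    width = len(kernel)
    parts = []
    for t in range(len(chars) - width + 1):
        beta, gamma = 0.0, bias
        for k in range(width):
            contribution = dot(kernel[k], chars[t + k])
            if t + k in relevant:
                beta += contribution
            else:
                gamma += contribution
        parts.append((beta, gamma))
    return parts


# Toy 2-d character embeddings for a 4-character word; characters 2 and 3
# play the role of a suffix whose contribution we want to isolate.
chars = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0], [0.5, 0.5]]
kernel = [[1.0, 2.0], [0.5, -1.0]]  # filter width 2
full = conv1d(chars, kernel, bias=0.1)
parts = conv1d_decomposed(chars, kernel, bias=0.1, relevant={2, 3})
for out, (beta, gamma) in zip(full, parts):
    print(f"{out:.2f} = {beta:.2f} (suffix) + {gamma:.2f} (rest)")
```

For every window, beta plus gamma reproduces the ordinary convolution output; a window that contains no suffix character gets a beta of zero. The decomposition becomes non-trivial only once a nonlinearity has to distribute its output between beta and gamma, which is the part this sketch leaves out.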

Predefined Sparseness in Recurrent Sequence Models

Aug 27, 2018

Thomas Demeester, Johannes Deleu, Fréderic Godin, Chris Develder

Abstract: Inducing sparseness while training neural networks has been shown to yield models with a lower memory footprint but similar effectiveness to dense models. However, sparseness is typically induced starting from a dense model, so this advantage does not hold during training. We propose techniques to enforce sparseness upfront in recurrent sequence models for NLP applications, so that training benefits as well. First, in language modeling, we show how to increase hidden state sizes in recurrent layers without increasing the number of parameters, leading to more expressive models. Second, for sequence labeling, we show that word embeddings with predefined sparseness lead to performance similar to that of dense embeddings, at a fraction of the number of trainable parameters.

* The SIGNLL Conference on Computational Natural Language Learning (CoNLL 2018)
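
One way to grow the hidden state without adding recurrent parameters is to fix the hidden-to-hidden matrix to a block-diagonal pattern, which is equivalent to running several smaller RNNs side by side. With k blocks, a hidden size h' has h'^2 / k trainable recurrent weights, so matching a dense layer of size h allows h' = h * sqrt(k); for k = 4 the hidden state doubles. The sketch below checks this count; the block-diagonal pattern is an illustrative assumption rather than a description of the paper's exact sparsity scheme, and only the hidden-to-hidden matrix is counted (the input-to-hidden weights also grow with h').

```python
def block_diagonal_mask(hidden_size, num_blocks):
    """1 where a recurrent weight may be nonzero, 0 where it is fixed at zero.
    Equivalent to running `num_blocks` independent smaller RNNs side by side."""
    if hidden_size % num_blocks != 0:
        raise ValueError("hidden_size must be divisible by num_blocks")
    block = hidden_size // num_blocks
    return [
        [1 if i // block == j // block else 0 for j in range(hidden_size)]
        for i in range(hidden_size)
    ]


def trainable_recurrent_weights(mask):
    return sum(sum(row) for row in mask)


dense = trainable_recurrent_weights(block_diagonal_mask(256, 1))
sparse = trainable_recurrent_weights(block_diagonal_mask(512, 4))
print(dense, sparse)  # 65536 65536
```

Because the zero pattern is fixed before training starts, the saving applies during training as well, which is the advantage over pruning a dense model afterwards.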

Dual Rectified Linear Units: A Replacement for Tanh Activation Functions in Quasi-Recurrent Neural Networks

Oct 31, 2017

Fréderic Godin, Jonas Degrave, Joni Dambre, Wesley De Neve

Abstract: In this paper, we introduce a novel type of Rectified Linear Unit (ReLU), called a Dual Rectified Linear Unit (DReLU). A DReLU, which has an unbounded positive and negative image, can be used as a drop-in replacement for a tanh activation function in the recurrent step of Quasi-Recurrent Neural Networks (QRNNs) (Bradbury et al., 2017). Similar to ReLUs, DReLUs are less prone to the vanishing gradient problem, are robust to noise, and induce sparse activations. We independently reproduce the QRNN experiments of Bradbury et al. (2017) and compare our DReLU-based QRNNs with the original tanh-based QRNNs and Long Short-Term Memory networks (LSTMs) on sentiment classification and word-level language modeling. Additionally, we evaluate on character-level language modeling, showing that we are able to stack up to eight QRNN layers with DReLUs, thus making it possible to improve the current state of the art in character-level language modeling over shallow architectures based on LSTMs.
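
A DReLU takes two pre-activations and returns max(0, a) - max(0, b), so, unlike a single ReLU, its output can be negative as well as positive, which is what lets it stand in for tanh. The sketch below plugs it into a single-unit QRNN recurrence with f-pooling, h_t = f_t * h_{t-1} + (1 - f_t) * z_t. The pre-activations are passed in directly instead of being produced by the masked convolutions of a real QRNN, and the function names are hypothetical.

```python
import math


def drelu(a, b):
    """Dual ReLU: max(0, a) - max(0, b); its image is unbounded in both directions."""
    return max(0.0, a) - max(0.0, b)


def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))


def qrnn_f_pooling(z_pre_a, z_pre_b, f_pre, h0=0.0):
    """Single-unit QRNN recurrence with f-pooling, using DReLU instead of tanh for
    the candidate. The pre-activations would normally come from masked
    convolutions over the input sequence; here they are given directly."""
    h = h0
    states = []
    for a, b, fp in zip(z_pre_a, z_pre_b, f_pre):
        z = drelu(a, b)  # candidate value (tanh(.) in the original QRNN)
        f = sigmoid(fp)  # forget gate
        h = f * h + (1.0 - f) * z
        states.append(h)
    return states


print(qrnn_f_pooling([2.0, 0.0, -1.0], [0.0, 0.0, 3.0], [0.0, 0.0, 0.0]))  # [1.0, 0.5, -1.25]
```

The recurrent step itself stays a cheap elementwise operation; only the activation producing the candidate changes. When both inputs are negative the DReLU outputs exactly zero, which is where the sparse activations come from.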

Improving Language Modeling using Densely Connected Recurrent Neural Networks

Jul 19, 2017

Fréderic Godin, Joni Dambre, Wesley De Neve

Abstract: In this paper, we introduce the novel concept of densely connected layers into recurrent neural networks. We evaluate our proposed architecture on the Penn Treebank language modeling task. We show that we can obtain similar perplexity scores with six times fewer parameters compared to a standard stacked 2-layer LSTM model trained with dropout (Zaremba et al., 2014). In contrast with the current usage of skip connections, we show that densely connecting only a few stacked layers with skip connections already yields significant perplexity reductions.

* Accepted at the Workshop on Representation Learning, ACL 2017
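
In a densely connected stack, each recurrent layer receives the input concatenated with the outputs of all layers below it, and the features passed on for prediction concatenate every layer's output. The sketch below shows that wiring for one forward pass. It is a hedged illustration, not the paper's model: plain tanh RNN cells stand in for LSTMs, the weights are random and untrained, including the input embedding in the final concatenation is an assumption, and all names are hypothetical.

```python
import math
import random


def make_cell(input_dim, hidden_dim, rng, scale=0.1):
    """Plain tanh RNN cell (a stand-in for an LSTM) with small random weights."""
    w_x = [[rng.uniform(-scale, scale) for _ in range(input_dim)] for _ in range(hidden_dim)]
    w_h = [[rng.uniform(-scale, scale) for _ in range(hidden_dim)] for _ in range(hidden_dim)]
    bias = [0.0] * hidden_dim
    return w_x, w_h, bias


def cell_step(cell, x, h):
    w_x, w_h, bias = cell
    return [
        math.tanh(
            sum(w * xi for w, xi in zip(w_x[u], x))
            + sum(w * hj for w, hj in zip(w_h[u], h))
            + bias[u]
        )
        for u in range(len(bias))
    ]


def densely_connected_rnn(inputs, input_dim, hidden_dim, num_layers, seed=0):
    """Layer i receives the input concatenated with the outputs of layers 0..i-1;
    the returned features concatenate the input and every layer's output."""
    rng = random.Random(seed)
    cells = [
        make_cell(input_dim + layer * hidden_dim, hidden_dim, rng)
        for layer in range(num_layers)
    ]
    states = [[0.0] * hidden_dim for _ in range(num_layers)]
    outputs = []
    for x in inputs:
        features = list(x)
        for layer, cell in enumerate(cells):
            states[layer] = cell_step(cell, features, states[layer])
            features = features + states[layer]  # dense connection to all later layers
        outputs.append(features)
    return outputs


sequence = [[1.0, 0.0, 0.0], [0.0, 1.0, 0.0]]  # two toy word embeddings
out = densely_connected_rnn(sequence, input_dim=3, hidden_dim=4, num_layers=2)
print(len(out), len(out[0]))  # 2 11
```

The feature size grows by `hidden_dim` per layer (3 + 2 * 4 = 11 here), so the shortcut paths make it possible to keep each layer small, which is consistent with the abstract's point that only a few densely connected layers are needed.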
