Abstract: Pre-trained Transformer-based models are achieving state-of-the-art results on a variety of Natural Language Processing data sets. However, the size of these models is often a drawback for their deployment in real production applications. In the case of multilingual models, most of the parameters are located in the embeddings layer. Therefore, reducing the vocabulary size should have an important impact on the total number of parameters. In this paper, we propose to generate smaller models that handle fewer languages, according to the targeted corpora. We present an evaluation of smaller versions of multilingual BERT on the XNLI data set, but we believe that this method may be applied to other multilingual transformers. The obtained results confirm that we can generate smaller models that maintain comparable results while reducing the total number of parameters by up to 45%. We compared our models with DistilmBERT (a distilled version of multilingual BERT) and showed that, unlike language reduction, distillation induced a 1.7% to 6% drop in overall accuracy on the XNLI data set. The presented models and code are publicly available.
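A minimal sketch of the vocabulary-reduction idea described above, assuming the HuggingFace {\tt transformers} library; the corpus path is a placeholder, and the tokenizer's vocabulary file would also have to be rewritten to match the new token ids (omitted for brevity). This illustrates the technique, not the paper's exact implementation.
\begin{verbatim}
from collections import Counter

import torch
from transformers import BertModel, BertTokenizer

tokenizer = BertTokenizer.from_pretrained("bert-base-multilingual-cased")
model = BertModel.from_pretrained("bert-base-multilingual-cased")

# 1. Count sub-word frequencies over the target-language corpora.
counts = Counter()
with open("target_corpus.txt", encoding="utf-8") as f:  # placeholder path
    for line in f:
        counts.update(tokenizer.tokenize(line))

# 2. Keep the special tokens plus every sub-word seen in the corpora.
kept = list(tokenizer.all_special_tokens)
kept += [t for t in counts if t not in kept]
kept_ids = tokenizer.convert_tokens_to_ids(kept)

# 3. Slice the embedding matrix down to the kept rows; the rest of the
#    encoder weights are untouched, which is why accuracy is preserved.
old_emb = model.get_input_embeddings().weight.data
new_emb = torch.nn.Embedding(len(kept_ids), old_emb.size(1))
new_emb.weight.data.copy_(old_emb[kept_ids])
model.set_input_embeddings(new_emb)
model.config.vocab_size = len(kept_ids)
\end{verbatim}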
Abstract: Discourse markers ({\it by contrast}, {\it happily}, etc.) are words or phrases used to signal semantic and/or pragmatic relationships between clauses or sentences. Recent work has fruitfully explored the prediction of discourse markers between sentence pairs in order to learn accurate sentence representations that are useful in various classification tasks. In this work, we take another perspective: using a model trained to predict discourse markers between sentence pairs, we predict plausible markers between sentence pairs with a known semantic relation (provided by existing classification datasets). These predictions allow us to study the link between discourse markers and the semantic relations annotated in classification datasets. Handcrafted mappings between markers and discourse relations have been proposed for a limited set of markers and a limited set of categories, but there exist hundreds of discourse markers expressing a wide variety of relations, and there is no consensus on the taxonomy of relations between competing discourse theories (which are largely built in a top-down fashion). By using an automatic prediction method over existing semantically annotated datasets, we provide a bottom-up characterization of discourse markers in English. The resulting dataset, named DiscSense, is publicly available.
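As an illustration of the tallying step, a minimal sketch: run an already-trained discourse-marker classifier over sentence pairs that carry gold semantic labels and count which markers are predicted for which relation. The classifier interface ({\tt predict\_proba}) and the marker list are assumed names, not the paper's actual code.
\begin{verbatim}
from collections import Counter, defaultdict

MARKERS = ["by contrast", "happily", "for example"]  # illustrative subset

def top_marker(sent1, sent2, marker_model):
    """Return the most plausible marker between two sentences."""
    probs = marker_model.predict_proba(sent1, sent2)  # assumed interface
    return MARKERS[max(range(len(MARKERS)), key=probs.__getitem__)]

def marker_relation_counts(dataset, marker_model):
    """Tally predicted markers per gold semantic relation (e.g. NLI label)."""
    counts = defaultdict(Counter)
    for sent1, sent2, label in dataset:
        counts[label][top_marker(sent1, sent2, marker_model)] += 1
    return counts
\end{verbatim}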
Abstract: We introduce DiscEval, a compilation of $11$ evaluation datasets with a focus on discourse, that can be used for the evaluation of English Natural Language Understanding when considering meaning as use. We make the case that evaluation with discourse tasks is overlooked and that Natural Language Inference (NLI) pretraining may not lead to the learning of truly universal representations. DiscEval can also be used as supplementary training data for multi-task learning-based systems, and is publicly available, alongside the code for gathering and preprocessing the datasets.
Abstract: Various NLP problems -- such as the prediction of sentence similarity, entailment, and discourse relations -- are all instances of the same general task: the modeling of semantic relations between a pair of textual elements. A popular approach to such problems is to embed sentences into fixed-size vectors and use composition functions of those vectors (e.g. concatenation or sum) as features for the prediction. At the same time, composition of embeddings has been a main focus within the field of Statistical Relational Learning (SRL), whose goal is to predict relations between entities (typically from knowledge base triples). In this article, we show that previous work on relation prediction between texts implicitly uses compositions from baseline SRL models. We show that such compositions are not expressive enough for several tasks (e.g. natural language inference). We build on recent SRL models to address textual relational problems, showing that they are more expressive and can alleviate issues stemming from simpler compositions. The resulting models significantly improve the state of the art in both transferable sentence representation learning and relation prediction.
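A sketch contrasting the common concatenation-based composition with an SRL-style multiplicative one (here a ComplEx-inspired bilinear score), in PyTorch; the dimensions and class names are illustrative, not the paper's exact architecture.
\begin{verbatim}
import torch
import torch.nn as nn

class ConcatComposition(nn.Module):
    """Baseline: classify relations from [u; v; |u-v|; u*v] features."""
    def __init__(self, dim, n_relations):
        super().__init__()
        self.clf = nn.Linear(4 * dim, n_relations)

    def forward(self, u, v):
        feats = torch.cat([u, v, (u - v).abs(), u * v], dim=-1)
        return self.clf(feats)

class ComplExComposition(nn.Module):
    """SRL-style: score each relation r by Re(<u, w_r, conj(v)>)."""
    def __init__(self, dim, n_relations):
        super().__init__()
        assert dim % 2 == 0  # real and imaginary halves
        self.rel = nn.Parameter(torch.randn(n_relations, dim))

    def forward(self, u, v):
        u_re, u_im = u.chunk(2, dim=-1)
        v_re, v_im = v.chunk(2, dim=-1)
        r_re, r_im = self.rel.chunk(2, dim=-1)
        # Re(<u, r, conj(v)>) expanded into four real-valued terms.
        return ((u_re * v_re) @ r_re.T + (u_im * v_im) @ r_re.T
                + (u_re * v_im) @ r_im.T - (u_im * v_re) @ r_im.T)
\end{verbatim}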
Abstract: Current state-of-the-art systems in NLP heavily rely on manually annotated datasets, which are expensive to construct. Very little work adequately exploits unannotated data -- such as discourse markers between sentences -- mainly because of data sparseness and ineffective extraction methods. In the present work, we propose a method to automatically discover sentence pairs with relevant discourse markers, and apply it to massive amounts of data. Our resulting dataset contains 174 discourse markers with at least 10k examples each, even for rare markers such as {\it coincidentally} or {\it amazingly}. We use the resulting data as supervision for learning transferable sentence embeddings. In addition, we show that even though sentence representation learning through prediction of discourse markers yields state-of-the-art results across different transfer tasks, it is not clear that our models made use of the semantic relation between sentences, thus leaving room for further improvements. Our datasets are publicly available (https://github.com/synapse-developpement/Discovery).
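A minimal sketch of the kind of extraction heuristic described above: keep adjacent sentence pairs where the second sentence opens with a known discourse marker followed by a comma, and strip the marker so it can serve as the supervision label. The marker list is an illustrative subset of the 174 used in the paper, and the exact filtering may differ.
\begin{verbatim}
import re

MARKERS = ["coincidentally", "amazingly", "by contrast", "for example"]
PATTERN = re.compile(
    r"^(%s),\s+" % "|".join(re.escape(m) for m in MARKERS), re.IGNORECASE
)

def extract_pairs(sentences):
    """Yield (sent1, sent2, marker) triples from consecutive sentences."""
    for s1, s2 in zip(sentences, sentences[1:]):
        match = PATTERN.match(s2)
        if match:
            yield s1, s2[match.end():], match.group(1).lower()

pairs = list(extract_pairs([
    "The forecast called for rain.",
    "Coincidentally, we had left our umbrellas at home.",
]))
# -> [("The forecast called for rain.",
#      "we had left our umbrellas at home.", "coincidentally")]
\end{verbatim}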
Abstract: We present our system for the CAp 2017 NER challenge, which is about named entity recognition on French tweets. Our system leverages unsupervised learning on a larger dataset of French tweets to learn features feeding a CRF model. It was ranked first, without using any gazetteer or structured external data, with an F-measure of 58.89\%. To the best of our knowledge, it is the first system to use fastText embeddings (which include subword representations) and an embedding-based sentence representation for NER.
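A minimal sketch of this pipeline, assuming the {\tt fasttext} and {\tt sklearn-crfsuite} packages and a fastText model pre-trained without supervision on French tweets ({\tt tweets.bin} is a placeholder name); this is not the exact challenge system.
\begin{verbatim}
import fasttext
import sklearn_crfsuite

ft = fasttext.load_model("tweets.bin")  # unsupervised fastText model

def token_features(tokens, i):
    """Turn the fastText vector of token i into a CRF feature dict."""
    vec = ft.get_word_vector(tokens[i])
    feats = {"v%d" % d: float(x) for d, x in enumerate(vec)}
    feats["word.lower"] = tokens[i].lower()
    return feats

def sent_features(tokens):
    return [token_features(tokens, i) for i in range(len(tokens))]

# X: list of tokenized tweets; y: list of BIO tag sequences.
crf = sklearn_crfsuite.CRF(algorithm="lbfgs", max_iterations=100)
# crf.fit([sent_features(t) for t in X], y)
\end{verbatim}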