Carlos Ramisch

LIS, TALEP

Injecting Wiktionary to improve token-level contextual representations using contrastive learning

Feb 12, 2024

Anna Mosolova, Marie Candito, Carlos Ramisch

Abstract: While static word embeddings are blind to context, for lexical semantics tasks context is rather too present in contextual word embeddings, vectors of same-meaning occurrences being too different (Ethayarajh, 2019). Fine-tuning pre-trained language models (PLMs) using contrastive learning was proposed, leveraging automatically self-augmented examples (Liu et al., 2021b). In this paper, we investigate how to inject a lexicon as an alternative source of supervision, using the English Wiktionary. We also test how dimensionality reduction impacts the resulting contextual word embeddings. We evaluate our approach on the Word-In-Context (WiC) task, in the unsupervised setting (not using the training set). We achieve a new SoTA result on the original WiC test set. We also propose two new WiC test sets for which we show that our fine-tuning method achieves substantial improvements. We also observe improvements, although modest, for the semantic frame induction task. Although we experimented on English to allow comparison with related work, our method is adaptable to the many languages for which large Wiktionaries exist.

* Accepted to EACL 2024 (Main)

AMU-EURANOVA at CASE 2021 Task 1: Assessing the stability of multilingual BERT

Jun 10, 2021

Léo Bouscarrat, Antoine Bonnefoy, Cécile Capponi, Carlos Ramisch

Abstract: This paper describes our participation in Task 1 of the CASE 2021 shared task, which addresses multilingual event extraction from news. We focused on sub-task 4, event information extraction. This sub-task has a small training dataset, and we fine-tuned a multilingual BERT model to solve it. We studied the instability problem on the dataset and tried to mitigate it.

* Proceedings of the 4th Workshop on Challenges and Applications of Automated Extraction of Socio-political Events from Text (CASE 2021), Aug 2021, Online

To Be or Not To Be a Verbal Multiword Expression: A Quest for Discriminating Features

Jul 22, 2020

Caroline Pasquer, Agata Savary, Jean-Yves Antoine, Carlos Ramisch, Nicolas Labroche, Arnaud Giacometti

Abstract: Automatic identification of multiword expressions (MWEs) is a prerequisite for semantically-oriented downstream applications. This task is challenging because MWEs, especially verbal ones (VMWEs), exhibit surface variability. However, this variability is usually more restricted than in regular (non-VMWE) constructions, which leads to various variability profiles. We use this fact to determine the optimal set of features which could be used in a supervised classification setting to solve a subproblem of VMWE identification: the identification of occurrences of previously seen VMWEs. Surprisingly, a simple custom frequency-based feature selection method proves more efficient than other standard methods such as the Chi-squared test, information gain or decision trees. An SVM classifier using the optimal set of only 6 features outperforms the best systems from a recent shared task on the French seen data.

Multilingual enrichment of disease biomedical ontologies

Apr 07, 2020

Léo Bouscarrat, Antoine Bonnefoy, Cécile Capponi, Carlos Ramisch

Abstract: Translating biomedical ontologies is an important challenge, but doing it manually requires much time and money. We study the possibility of using open-source knowledge bases to translate biomedical ontologies. We focus on two aspects: coverage and quality. We look at the coverage of two biomedical ontologies focusing on diseases with respect to Wikidata for 9 European languages (Czech, Dutch, English, French, German, Italian, Polish, Portuguese and Spanish) for both ontologies, plus Arabic, Chinese and Russian for the second one. We first use direct links between Wikidata and the studied ontologies and then use second-order links by going through other intermediate ontologies. We then compare the quality of the translations obtained from Wikidata with that of a commercial machine translation tool, here Google Cloud Translation.

* 2nd workshop on MultilingualBIO: Multilingual Biomedical Text Processing, May 2020, Marseille, France
