Fabian M. Suchanek

Corporate Greenwashing Detection in Text - a Survey

Feb 11, 2025

Tom Calamai, Oana Balalau, Théo Le Guenedal, Fabian M. Suchanek

Abstract:Greenwashing is an effort to mislead the public about the environmental impact of an entity, such as a state or company. We provide a comprehensive survey of the scientific literature addressing natural language processing methods to identify potentially misleading climate-related corporate communications, indicative of greenwashing. We break the detection of greenwashing into intermediate tasks, and review the state-of-the-art approaches for each of them. We discuss datasets, methods, and results, as well as limitations and open challenges. We also provide an overview of how far the field has come as a whole, and point out future research directions.

* 35 pages, 1 figure, 21 pages (appendix), working paper

Do Language Models Enjoy Their Own Stories? Prompting Large Language Models for Automatic Story Evaluation

May 22, 2024

Cyril Chhun, Fabian M. Suchanek, Chloé Clavel

Abstract:Storytelling is an integral part of human experience and plays a crucial role in social interactions. Thus, Automatic Story Evaluation (ASE) and Generation (ASG) could benefit society in multiple ways, but they are challenging tasks which require high-level human abilities such as creativity, reasoning and deep understanding. Meanwhile, Large Language Models (LLMs) now achieve state-of-the-art performance on many NLP tasks. In this paper, we study whether LLMs can be used as substitutes for human annotators for ASE. We perform an extensive analysis of the correlations between LLM ratings, other automatic measures, and human annotations, and we explore the influence of prompting on the results and the explainability of LLM behaviour. Most notably, we find that LLMs outperform current automatic measures for system-level evaluation but still struggle to provide satisfactory explanations for their answers.

* TACL, pre-MIT Press publication version

Reconfidencing LLMs from the Grouping Loss Perspective

Feb 07, 2024

Lihu Chen, Alexandre Perez-Lebel, Fabian M. Suchanek, Gaël Varoquaux

Abstract:Large Language Models (LLMs), including ChatGPT and LLaMA, are susceptible to generating hallucinated answers in a confident tone. While efforts to elicit and calibrate confidence scores have proven useful, recent findings show that controlling uncertainty must go beyond calibration: predicted scores may deviate significantly from the actual posterior probabilities due to the impact of grouping loss. In this work, we construct a new evaluation dataset derived from a knowledge base to assess confidence scores given to answers of Mistral and LLaMA. Experiments show that they tend to be overconfident. Further, we show that they are more overconfident on some answers than others, e.g., depending on the nationality of the person in the query. In uncertainty-quantification theory, this is grouping loss. To address this, we propose a solution to reconfidence LLMs, canceling not only calibration error but also grouping loss. After the reconfidencing process, the LLMs show confidence that is better aligned with the accuracy of their responses.
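
The distinction between calibration and grouping loss in the abstract above can be illustrated with a small sketch. This is not the authors' method or dataset: the records, the `group` field, and both helper functions are hypothetical, chosen only to show how an aggregate confidence gap of zero can hide opposite gaps inside subgroups.

```python
from collections import defaultdict


def confidence_gap(records):
    """Mean stated confidence minus accuracy; positive means overconfident."""
    mean_confidence = sum(r["confidence"] for r in records) / len(records)
    accuracy = sum(r["correct"] for r in records) / len(records)
    return mean_confidence - accuracy


def gap_by_group(records, key):
    """Compute the confidence gap separately for each value of `key`."""
    groups = defaultdict(list)
    for r in records:
        groups[r[key]].append(r)
    return {name: confidence_gap(rows) for name, rows in groups.items()}


# Made-up toy records (assumption): every answer carries confidence 0.75.
# Group A answers are all correct, group B answers are correct half the time.
records = (
    [{"group": "A", "confidence": 0.75, "correct": 1} for _ in range(4)]
    + [{"group": "B", "confidence": 0.75, "correct": 1} for _ in range(2)]
    + [{"group": "B", "confidence": 0.75, "correct": 0} for _ in range(2)]
)

overall = confidence_gap(records)            # aggregate looks calibrated
per_group = gap_by_group(records, "group")   # subgroups do not

print(overall)
print(per_group)
```

In this toy setting the aggregate gap is zero while group A is underconfident and group B is overconfident by the same amount, which is the kind of subgroup discrepancy that a calibration check alone would miss.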

Learning High-Quality and General-Purpose Phrase Representations

Jan 18, 2024

Lihu Chen, Gaël Varoquaux, Fabian M. Suchanek

Abstract:Phrase representations play an important role in data science and natural language processing, benefiting various tasks like Entity Alignment, Record Linkage, Fuzzy Joins, and Paraphrase Classification. The current state-of-the-art method involves fine-tuning pre-trained language models for phrasal embeddings using contrastive learning. However, we have identified areas for improvement. First, these pre-trained models tend to be unnecessarily complex and must be pre-trained on a corpus with context sentences. Second, leveraging the phrase type and morphology gives phrase representations that are both more precise and more flexible. We propose an improved framework to learn phrase representations in a context-free fashion. The framework employs phrase type classification as an auxiliary task and incorporates character-level information more effectively into the phrase representation. Furthermore, we design three granularities of data augmentation to increase the diversity of training samples. Our experiments across a wide range of tasks show that our approach generates superior phrase embeddings compared to previous methods while requiring a smaller model size. The code is available at https://github.com/tigerchen52/PEARL

* Findings of EACL 2024

The Locality and Symmetry of Positional Encodings

Oct 19, 2023

Lihu Chen, Gaël Varoquaux, Fabian M. Suchanek

Abstract:Positional Encodings (PEs) are used to inject word-order information into transformer-based language models. While they can significantly enhance the quality of sentence representations, their specific contribution to language models is not fully understood, especially given recent findings that various positional encodings are insensitive to word order. In this work, we conduct a systematic study of positional encodings in Bidirectional Masked Language Models (BERT-style), which complements existing work in three aspects: (1) We uncover the core function of PEs by identifying two common properties, Locality and Symmetry; (2) We show that the two properties are closely correlated with performance on downstream tasks; (3) We quantify the weakness of current PEs by introducing two new probing tasks, on which current PEs perform poorly. We believe that these results are the basis for developing better PEs for transformer-based language models. The code is available at https://github.com/tigerchen52/locality_symmetry

* Long Paper in Findings of EMNLP 2023

GLADIS: A General and Large Acronym Disambiguation Benchmark

Feb 03, 2023

Lihu Chen, Gaël Varoquaux, Fabian M. Suchanek

Abstract:Acronym Disambiguation (AD) is crucial for natural language understanding on various sources, including biomedical reports, scientific papers, and search engine queries. However, existing acronym disambiguation benchmarks and tools are limited to specific domains, and the size of prior benchmarks is rather small. To accelerate the research on acronym disambiguation, we construct a new benchmark named GLADIS with three components: (1) a much larger acronym dictionary with 1.5M acronyms and 6.4M long forms; (2) a pre-training corpus with 160 million sentences; (3) three datasets that cover the general, scientific, and biomedical domains. We then pre-train a language model, AcroBERT, on our constructed corpus for general acronym disambiguation, and show the challenges and values of our new benchmark.

* EACL 2023

Of Human Criteria and Automatic Metrics: A Benchmark of the Evaluation of Story Generation

Aug 25, 2022

Cyril Chhun, Pierre Colombo, Chloé Clavel, Fabian M. Suchanek

Abstract:Research on Automatic Story Generation (ASG) relies heavily on human and automatic evaluation. However, there is no consensus on which human evaluation criteria to use, and no analysis of how well automatic criteria correlate with them. In this paper, we propose to re-evaluate ASG evaluation. We introduce a set of 6 orthogonal and comprehensive human criteria, carefully motivated by the social sciences literature. We also present HANNA, an annotated dataset of 1,056 stories produced by 10 different ASG systems. HANNA allows us to quantitatively evaluate the correlations of 72 automatic metrics with human criteria. Our analysis highlights the weaknesses of current metrics for ASG and allows us to formulate practical recommendations for ASG evaluation.

* 43 pages, 38 figures. To appear in Proceedings of the 29th International Conference on Computational Linguistics (COLING 2022)
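
As a rough illustration of what correlating an automatic metric with a human criterion involves, the sketch below computes a rank correlation between per-system scores. The choice of Kendall's tau (the simple tau-a variant without tie correction), the function name, and the toy numbers are all assumptions made for this example; they are not taken from the paper or from HANNA.

```python
from itertools import combinations


def kendall_tau_a(xs, ys):
    """Kendall's tau-a: (concordant - discordant) pairs over all pairs."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two equally long sequences with >= 2 items")
    concordant = discordant = 0
    for (x1, y1), (x2, y2) in combinations(zip(xs, ys), 2):
        sign = (x1 - x2) * (y1 - y2)
        if sign > 0:
            concordant += 1
        elif sign < 0:
            discordant += 1
        # Tied pairs (sign == 0) count as neither concordant nor discordant.
    n_pairs = len(xs) * (len(xs) - 1) // 2
    return (concordant - discordant) / n_pairs


# Hypothetical per-system averages: one human criterion and one metric.
human_scores = [1, 2, 3, 4, 5]
metric_scores = [0.1, 0.3, 0.2, 0.5, 0.4]

tau = kendall_tau_a(human_scores, metric_scores)
print(tau)
```

Here 8 of the 10 system pairs are ordered the same way by both scores and 2 are swapped, giving a tau of 0.6. A tie-corrected variant would be preferable when many systems share the same score.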

Imputing Out-of-Vocabulary Embeddings with LOVE Makes Language Models Robust with Little Cost

Mar 21, 2022

Lihu Chen, Gaël Varoquaux, Fabian M. Suchanek

Abstract:State-of-the-art NLP systems represent inputs with word embeddings, but these are brittle when faced with Out-of-Vocabulary (OOV) words. To address this issue, we follow the principle of mimick-like models to generate vectors for unseen words, by learning the behavior of pre-trained embeddings using only the surface form of words. We present a simple contrastive learning framework, LOVE, which extends the word representation of an existing pre-trained language model (such as BERT), and makes it robust to OOV words with few additional parameters. Extensive evaluations demonstrate that our lightweight model achieves similar or even better performance than prior competitors, both on original datasets and on corrupted variants. Moreover, it can be used in a plug-and-play fashion with FastText and BERT, where it significantly improves their robustness.

* Long paper accepted by ACL main conference. 17 pages

A Lightweight Neural Model for Biomedical Entity Linking

Dec 16, 2020

Lihu Chen, Gaël Varoquaux, Fabian M. Suchanek

Abstract:Biomedical entity linking aims to map biomedical mentions, such as diseases and drugs, to standard entities in a given knowledge base. The specific challenge in this context is that the same biomedical entity can have a wide range of names, including synonyms, morphological variations, and names with different word orderings. Recently, BERT-based methods have advanced the state-of-the-art by allowing for rich representations of word sequences. However, they often have hundreds of millions of parameters and require heavy computing resources, which limits their applications in resource-limited scenarios. Here, we propose a lightweight neural method for biomedical entity linking, which needs just a fraction of the parameters of a BERT model and far fewer computing resources. Our method uses a simple alignment layer with attention mechanisms to capture the variations between mention and entity names. Yet, we show that our model is competitive with previous work on standard evaluation benchmarks.
