Abstract: State-of-the-art NLP models can adopt shallow heuristics that limit their generalization capability (McCoy et al., 2019). Such heuristics include lexical overlap with the training set in Named-Entity Recognition (Taill\'e et al., 2020) and Event or Type heuristics in Relation Extraction (Rosenman et al., 2020). In the more realistic end-to-end RE setting, we can expect yet another heuristic: the mere retention of training relation triples. In this paper, we propose several experiments confirming that retention of known facts is a key factor in performance on standard benchmarks. Furthermore, one experiment suggests that a pipeline model able to use intermediate type representations is less prone to over-reliance on retention.
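To make the retention analysis concrete, the following minimal sketch (hypothetical helper names, not the paper's exact protocol) splits gold test triples into those already seen in training and those that are novel, and reports recall on each partition; a large gap between the two would indicate over-reliance on retention.

```python
def recall(gold, predicted):
    """Fraction of gold triples that appear among the predictions."""
    return len(gold & predicted) / len(gold) if gold else 0.0

def retention_breakdown(train_triples, gold_test_triples, predicted_triples):
    """Triples are (head, relation, tail) tuples stored in Python sets."""
    seen = gold_test_triples & train_triples    # facts retained from training
    unseen = gold_test_triples - train_triples  # facts requiring generalization
    return {"seen_recall": recall(seen, predicted_triples),
            "unseen_recall": recall(unseen, predicted_triples)}
```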
Abstract: In this paper, we explore how QuestEval, a Text-vs-Text metric, can be adapted for the evaluation of Data-to-Text Generation systems. QuestEval is a reference-less metric that compares the predictions directly to the structured input data by automatically asking and answering questions. Its adaptation to Data-to-Text is not straightforward, as it requires multi-modal Question Generation and Answering (QG \& QA) systems. To this purpose, we propose to build synthetic multi-modal corpora that enable the training of multi-modal QG/QA models. The resulting metric is reference-less and multi-modal; it obtains state-of-the-art correlations with human judgment on the E2E and WebNLG benchmarks.
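As an illustration only, one direction of a QuestEval-style Data-to-Text score (questions generated from the structured data and answered on the generated text) could be sketched as below; `data_qg`, `text_qa`, and `answer_similarity` are hypothetical stand-ins for the multi-modal QG/QA components trained on the synthetic corpora, not a released API.

```python
def questeval_d2t_score(structured_data, generated_text,
                        data_qg, text_qa, answer_similarity):
    """Average answerability of data-grounded questions against the text."""
    scores = []
    for question, gold_answer in data_qg(structured_data):
        predicted_answer = text_qa(question, generated_text)
        scores.append(answer_similarity(predicted_answer, gold_answer))
    return sum(scores) / len(scores) if scores else 0.0
```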
Abstract: Data-to-Text Generation (DTG) is a subfield of Natural Language Generation aiming at transcribing structured data into natural language descriptions. The field has recently been boosted by the use of neural-based generators which, on the one hand, exhibit great syntactic skills without the need for hand-crafted pipelines; on the other hand, the quality of the generated text reflects the quality of the training data, which in realistic settings only offers imperfectly aligned structure-text pairs. Consequently, state-of-the-art neural models include misleading statements - usually called hallucinations - in their outputs. The control of this phenomenon is today a major challenge for DTG, and is the problem addressed in this paper. Previous work deals with this issue at the instance level, using an alignment score for each table-reference pair. In contrast, we propose a finer-grained approach, arguing that hallucinations should rather be treated at the word level. Specifically, we propose a Multi-Branch Decoder which is able to leverage word-level labels to learn the relevant parts of each training instance. These labels are obtained through a simple and efficient scoring procedure based on co-occurrence analysis and dependency parsing. Extensive evaluations, via automated metrics and human judgment on the standard WikiBio benchmark, show the accuracy of our alignment labels and the effectiveness of the proposed Multi-Branch Decoder. Our model is able to reduce and control hallucinations while preserving fluency and coherence in the generated texts. Further experiments on a degraded version of ToTTo show that our model can be successfully used in very noisy settings.
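The word-level labels can be illustrated with the following simplified sketch (hypothetical names; the actual procedure also propagates scores along dependency edges): each reference token is marked as supported by the table or as a likely divergence, based on a co-occurrence statistic.

```python
def word_level_labels(reference_tokens, table_values, cooc_counts, threshold=0.5):
    """cooc_counts[(token, value)] is a corpus-level co-occurrence statistic."""
    labels = []
    for token in reference_tokens:
        if any(token in value for value in table_values):
            labels.append(1)  # token directly supported by a table value
        else:
            score = max((cooc_counts.get((token, value), 0.0)
                         for value in table_values), default=0.0)
            labels.append(1 if score >= threshold else 0)  # 0 = likely hallucination
    return labels
```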
Abstract: In language generation models conditioned on structured data, classical training via maximum likelihood almost always leads models to pick up on dataset divergences (i.e., hallucinations or omissions) and to incorporate them erroneously in their own generations at inference time. In this work, we build on top of previous Reinforcement Learning based approaches and show that a model-agnostic framework relying on the recently introduced PARENT metric is effective at reducing both hallucinations and omissions. Evaluations on the widely used WikiBIO and WebNLG benchmarks demonstrate the effectiveness of this framework compared to state-of-the-art models.
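A minimal sketch of the idea, assuming a self-critical REINFORCE setup: a sampled output is rewarded by its PARENT score relative to a greedy baseline. The `model` interface and `parent_score` function are placeholders, not a specific library API.

```python
def self_critical_loss(model, table, reference, parent_score):
    """REINFORCE with a self-critical baseline and a PARENT reward."""
    sampled = model.sample(table)    # stochastic decode (exploration)
    baseline = model.greedy(table)   # greedy decode (baseline)
    advantage = (parent_score(sampled, table, reference)
                 - parent_score(baseline, table, reference))
    # Increase the likelihood of sampled sequences whose PARENT score
    # (which penalizes hallucinations and omissions) beats the baseline.
    return -advantage * model.log_prob(sampled, table)
```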
Abstract: Despite efforts to distinguish three different evaluation setups (Bekoulis et al., 2018), numerous end-to-end Relation Extraction (RE) articles present unreliable performance comparisons with previous work. In this paper, we first identify several patterns of invalid comparisons in published papers and describe them to avoid their propagation. We then propose a small empirical study to quantify the impact of the most common mistake and estimate that it leads to overestimating the final RE performance by around 5% on ACE05. We also seize this opportunity to study the unexplored ablations of two recent developments: the use of language model pretraining (specifically BERT) and span-level NER. This meta-analysis emphasizes the need for rigor in reporting both the evaluation setting and the dataset statistics, and we call for unifying the evaluation setting in end-to-end RE.
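For concreteness, a common source of confusion is the contrast between a "Boundaries" setting (argument spans and relation type must match) and a "Strict" setting (entity types must match as well). The following simplified sketch, with a hypothetical data representation, shows the difference when scoring a single predicted relation.

```python
def relation_correct(pred, gold, setting="strict"):
    """Entities are (start, end, ent_type); relations are (head, tail, rel_type)."""
    (p_head, p_tail, p_rel), (g_head, g_tail, g_rel) = pred, gold
    if p_rel != g_rel:
        return False
    if setting == "boundaries":
        # compare argument spans only, ignoring entity types
        same_head = p_head[:2] == g_head[:2]
        same_tail = p_tail[:2] == g_tail[:2]
    else:  # "strict": spans and entity types must both match
        same_head = p_head == g_head
        same_tail = p_tail == g_tail
    return same_head and same_tail
```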
Abstract: Transcribing structured data into natural language descriptions has emerged as a challenging task, referred to as "data-to-text". These structures generally group multiple elements together with their attributes. Most attempts rely on translation-style encoder-decoder methods which linearize elements into a sequence; this, however, loses most of the structure contained in the data. In this work, we propose to overcome this limitation with a hierarchical model that encodes the data structure at both the element level and the structure level. Evaluations on RotoWire show the effectiveness of our model on both qualitative and quantitative metrics.
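As an illustration of the two-level encoding (a minimal PyTorch sketch, not the paper's exact architecture), attribute embeddings are first encoded within each element, and the resulting element representations are then encoded together at the structure level.

```python
import torch.nn as nn

class HierarchicalEncoder(nn.Module):
    def __init__(self, dim, heads=4):
        super().__init__()
        layer = lambda: nn.TransformerEncoderLayer(dim, heads, batch_first=True)
        self.element_encoder = nn.TransformerEncoder(layer(), num_layers=2)
        self.structure_encoder = nn.TransformerEncoder(layer(), num_layers=2)

    def forward(self, x):
        # x: (batch, n_elements, n_attributes, dim) attribute embeddings
        b, e, a, d = x.shape
        # element level: encode attributes within each element independently
        attrs = self.element_encoder(x.view(b * e, a, d))
        elements = attrs.mean(dim=1).view(b, e, d)  # pool attributes per element
        # structure level: encode interactions between elements
        return self.structure_encoder(elements)
```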