Niklas Friedrich

X-SCITLDR: Cross-Lingual Extreme Summarization of Scholarly Documents

May 30, 2022

Sotaro Takeshita, Tommaso Green, Niklas Friedrich, Kai Eckert, Simone Paolo Ponzetto

Abstract: The number of scientific publications is increasing rapidly, causing information overload for researchers and making it hard for scholars to keep up to date with current trends and lines of work. Consequently, recent work on applying text mining technologies to scholarly publications has investigated the application of automatic text summarization technologies, including extreme summarization, to this domain. However, previous work has concentrated only on monolingual settings, primarily in English. In this paper, we fill this research gap and present an abstractive cross-lingual summarization dataset for four different languages in the scholarly domain, which enables us to train and evaluate models that process English papers and generate summaries in German, Italian, Chinese and Japanese. We present our new X-SCITLDR dataset for multilingual summarization and thoroughly benchmark different models based on a state-of-the-art multilingual pre-trained model, including a two-stage 'summarize and translate' approach and a direct cross-lingual model. We additionally explore the benefits of intermediate-stage training using English monolingual summarization and machine translation as intermediate tasks and analyze performance in zero- and few-shot scenarios.

* JCDL2022
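
The abstract contrasts a two-stage 'summarize and translate' approach with a direct cross-lingual model. The sketch below shows only how the two set-ups are wired; the type aliases, function names and the toy stand-ins (a first-sentence "summarizer" and a one-entry lookup "translator") are hypothetical and are not the models benchmarked in the paper.

```python
from typing import Callable

# Hypothetical signatures (assumptions, not from the paper).
Summarizer = Callable[[str], str]              # English paper -> English TLDR
Translator = Callable[[str, str], str]         # (English text, lang) -> text in lang
CrossLingualModel = Callable[[str, str], str]  # (English paper, lang) -> TLDR in lang


def summarize_then_translate(paper: str, lang: str,
                             summarize: Summarizer,
                             translate: Translator) -> str:
    """Two-stage pipeline: summarize in English, then translate the summary."""
    english_tldr = summarize(paper)
    return translate(english_tldr, lang)


def direct_cross_lingual(paper: str, lang: str, model: CrossLingualModel) -> str:
    """A single model reads the English paper and writes the summary in `lang`."""
    return model(paper, lang)


# Toy stand-ins for illustration only; real systems would be trained models.
def first_sentence(paper: str) -> str:
    return paper.split(". ")[0].rstrip(".") + "."


TOY_MT = {("We propose a dataset.", "de"): "Wir schlagen einen Datensatz vor."}


def toy_translate(text: str, lang: str) -> str:
    # Falls back to the untranslated text when the toy table has no entry.
    return TOY_MT.get((text, lang), text)


print(summarize_then_translate(
    "We propose a dataset. It covers four target languages.",
    "de", first_sentence, toy_translate))
```

The pipeline can reuse existing English summarizers and MT systems, but any error in the English summary is carried into the translation; the direct model skips that intermediate step at the cost of needing cross-lingual training pairs, which is the kind of data the dataset provides.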

AnnIE: An Annotation Platform for Constructing Complete Open Information Extraction Benchmark

Sep 15, 2021

Niklas Friedrich, Kiril Gashteovski, Mingying Yu, Bhushan Kotnis, Carolin Lawrence, Mathias Niepert, Goran Glavaš

Abstract: Open Information Extraction (OIE) is the task of extracting facts from sentences in the form of relations and their corresponding arguments in a schema-free manner. The intrinsic performance of OIE systems is difficult to measure due to the incompleteness of existing OIE benchmarks: the ground-truth extractions do not group all acceptable surface realizations of the same fact that can be extracted from a sentence. To measure the performance of OIE systems more realistically, it is necessary to manually annotate complete facts (i.e., clusters of all acceptable surface realizations of the same fact) from input sentences. We propose AnnIE: an interactive annotation platform that facilitates such challenging annotation tasks and supports the creation of complete fact-oriented OIE evaluation benchmarks. AnnIE is modular and flexible in order to support different use case scenarios (i.e., benchmarks covering different types of facts). We use AnnIE to build two complete OIE benchmarks: one with verb-mediated facts and another with facts encompassing named entities. Finally, we evaluate several OIE systems on our complete benchmarks created with AnnIE. Our results suggest that existing incomplete benchmarks are overly lenient, and that OIE systems are not as robust as previously reported. We publicly release AnnIE under a non-restrictive license.
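
To make "complete facts" concrete: each gold fact is a cluster of acceptable (subject, relation, object) realizations, and an extraction counts as correct if it matches any realization in any cluster. The sketch below is a simplified scoring rule under stated assumptions (exact match after lowercasing and whitespace normalization; recall as the fraction of clusters hit); the function names are hypothetical and this is not AnnIE's or any benchmark's official scorer.

```python
Triple = tuple[str, str, str]  # (subject, relation, object)


def normalize(t: Triple) -> tuple[str, ...]:
    # Assumption: lowercase and collapse whitespace before exact matching.
    return tuple(" ".join(part.lower().split()) for part in t)


def score(extractions: list[Triple],
          clusters: list[list[Triple]]) -> tuple[float, float]:
    """Precision and recall against fact clusters (simplified, hypothetical)."""
    cluster_sets = [{normalize(t) for t in c} for c in clusters]
    matched: set[int] = set()
    correct = 0
    for ext in extractions:
        e = normalize(ext)
        hits = [i for i, c in enumerate(cluster_sets) if e in c]
        if hits:
            correct += 1          # matches some acceptable realization
            matched.update(hits)  # the fact(s) it realizes are now covered
    precision = correct / len(extractions) if extractions else 0.0
    recall = len(matched) / len(clusters) if clusters else 0.0
    return precision, recall


# One gold fact with two acceptable realizations. Without the second one in
# the cluster, an exact matcher would reject the valid "born in" extraction.
gold = [[("Obama", "was born in", "Hawaii"), ("Obama", "born in", "Hawaii")]]
print(score([("obama", "born  in", "Hawaii"), ("Hawaii", "is", "a state")], gold))
```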

DebIE: A Platform for Implicit and Explicit Debiasing of Word Embedding Spaces

Mar 11, 2021

Niklas Friedrich, Anne Lauscher, Simone Paolo Ponzetto, Goran Glavaš

Abstract: Recent research efforts in NLP have demonstrated that distributional word vector spaces often encode stereotypical human biases, such as racism and sexism. With word representations ubiquitously used in NLP models and pipelines, this raises ethical issues and jeopardizes the fairness of language technologies. While there exists a large body of work on bias measures and debiasing methods, to date, there is no platform that would unify these research efforts and make bias measuring and debiasing of representation spaces widely accessible. In this work, we present DebIE, the first integrated platform for (1) measuring and (2) mitigating bias in word embeddings. Given (i) an embedding space (users can choose among predefined spaces or upload their own) and (ii) a bias specification (users can choose among existing bias specifications or create their own), DebIE can (1) compute several measures of implicit and explicit bias and (2) modify the embedding space by executing two (mutually composable) debiasing models. DebIE's functionality can be accessed through four different interfaces: (a) a web application, (b) a desktop application, (c) a RESTful API, and (d) a command-line application. DebIE is available at: debie.informatik.uni-mannheim.de.

* Accepted as EACL21 Demo
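
As an illustration of "measuring" and "mitigating" bias given an embedding space and a bias specification (target sets X, Y and attribute sets A, B), the sketch below computes a WEAT-style effect size (Word Embedding Association Test, a widely used implicit-bias measure) and then removes a single linear bias direction from the target vectors. It is a generic sketch under assumptions: 2-D toy vectors, the bias direction taken as the difference of attribute centroids, and the sample standard deviation (implementations differ). It is not DebIE's code, and the projection step is not one of its two debiasing models.

```python
import math
from statistics import mean, stdev

Vec = list[float]


def cos(u: Vec, v: Vec) -> float:
    # Assumes non-zero vectors.
    dot = sum(a * b for a, b in zip(u, v))
    return dot / (math.hypot(*u) * math.hypot(*v))


def assoc(w: Vec, A: list[Vec], B: list[Vec]) -> float:
    """s(w, A, B): mean cosine to A minus mean cosine to B."""
    return mean(cos(w, a) for a in A) - mean(cos(w, b) for b in B)


def weat_effect_size(X, Y, A, B) -> float:
    sx = [assoc(x, A, B) for x in X]
    sy = [assoc(y, A, B) for y in Y]
    sd = stdev(sx + sy)  # sample std over X and Y (an assumption)
    if sd < 1e-12:       # no spread in associations: report no effect
        return 0.0
    return (mean(sx) - mean(sy)) / sd


def project_out(vectors, A, B):
    """Remove the component along the normalized A-centroid minus B-centroid."""
    dim = len(A[0])
    d = [mean(a[i] for a in A) - mean(b[i] for b in B) for i in range(dim)]
    n = math.hypot(*d)
    d = [c / n for c in d]
    out = []
    for v in vectors:
        p = sum(vi * di for vi, di in zip(v, d))
        out.append([vi - p * di for vi, di in zip(v, d)])
    return out


# Hypothetical toy bias specification in 2-D.
X = [[1.0, 2.0], [2.0, 2.0]]
Y = [[-1.0, 2.0], [-2.0, 3.0]]
A = [[1.0, 1.0]]
B = [[-1.0, 1.0]]

before = weat_effect_size(X, Y, A, B)
after = weat_effect_size(project_out(X, A, B), project_out(Y, A, B), A, B)
print(before, after)
```

On this toy data the effect size is large and positive before the projection and drops to zero afterwards, because the only direction separating X from Y is the one removed; real embeddings rarely behave this cleanly.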
