Hinrich Schuetze

Xerox PARC and Stanford University

Collapse of Dense Retrievers: Short, Early, and Literal Biases Outranking Factual Evidence

Mar 06, 2025

Mohsen Fayyaz, Ali Modarressi, Hinrich Schuetze, Nanyun Peng

Abstract: Dense retrieval models are commonly used in Information Retrieval (IR) applications, such as Retrieval-Augmented Generation (RAG). Since they often serve as the first step in these systems, their robustness is critical to avoid failures. In this work, by repurposing a relation extraction dataset (e.g., Re-DocRED), we design controlled experiments to quantify the impact of heuristic biases, such as favoring shorter documents, in retrievers like Dragon+ and Contriever. Our findings reveal significant vulnerabilities: retrievers often rely on superficial patterns such as over-prioritizing document beginnings, shorter documents, repeated entities, and literal matches. Additionally, they tend to overlook whether the document contains the answer to the query, revealing a lack of deep semantic understanding. Notably, when multiple biases combine, models exhibit catastrophic performance degradation, selecting the answer-containing document over a biased document without the answer in less than 3% of cases. Furthermore, we show that these biases have direct consequences for downstream applications like RAG, where retrieval-preferred documents can mislead LLMs, resulting in a 34% performance drop compared to not providing any documents at all.
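
The headline number above (the answer-containing document chosen over a biased one in less than 3% of cases) is a paired comparison. A minimal sketch of such a metric follows, assuming dot-product relevance scores over precomputed query and document embeddings; the function name `pairwise_win_rate` and the one-triple-per-row array layout are illustrative assumptions, not the paper's code.

```python
import numpy as np


def pairwise_win_rate(query_embs: np.ndarray,
                      answer_doc_embs: np.ndarray,
                      biased_doc_embs: np.ndarray) -> float:
    """Fraction of triples where the answer-containing document scores
    strictly higher than the biased document without the answer.

    Row i of each array belongs to the i-th (query, answer doc, biased doc)
    triple. Scores are assumed to be plain dot products; ties count as losses.
    """
    # Row-wise dot products: score of each query against its own documents.
    answer_scores = np.einsum("ij,ij->i", query_embs, answer_doc_embs)
    biased_scores = np.einsum("ij,ij->i", query_embs, biased_doc_embs)
    return float(np.mean(answer_scores > biased_scores))
```

A value near 0.5 would mean the retriever is indifferent between the two documents; values far below 0.5 mean it systematically prefers the biased one.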

Problem Solving Through Human-AI Preference-Based Cooperation

Aug 15, 2024

Subhabrata Dutta, Timo Kaufmann, Goran Glavaš, Ivan Habernal, Kristian Kersting, Frauke Kreuter, Mira Mezini, Iryna Gurevych, Eyke Hüllermeier, Hinrich Schuetze

Abstract: While there is a widespread belief that artificial general intelligence (AGI), or even superhuman AI, is imminent, complex problems in expert domains are far from being solved. We argue that such problems require human-AI cooperation and that the current state of the art in generative AI is unable to play the role of a reliable partner due to a multitude of shortcomings, including an inability to keep track of a complex solution artifact (e.g., a software program), limited support for versatile expression of human preferences, and a failure to adapt to human preferences in an interactive setting. To address these challenges, we propose HAI-Co2, a novel human-AI co-construction framework. We formalize HAI-Co2 and discuss the difficult open research problems that it faces. Finally, we present a case study of HAI-Co2 and demonstrate its efficacy compared to monolithic generative AI models.

* 16 pages (excluding references)

Listening to Affected Communities to Define Extreme Speech: Dataset and Experiments

Mar 22, 2022

Antonis Maronikolakis, Axel Wisiorek, Leah Nann, Haris Jabbar, Sahana Udupa, Hinrich Schuetze

Abstract: Building on current work on multilingual hate speech (e.g., Ousidhoum et al. (2019)) and hate speech reduction (e.g., Sap et al. (2020)), we present XTREMESPEECH, a new hate speech dataset containing 20,297 social media passages from Brazil, Germany, India and Kenya. The key novelty is that we directly involve the affected communities in collecting and annotating the data, as opposed to giving companies and governments control over defining and combatting hate speech. This inclusive approach results in datasets more representative of actually occurring online speech and is likely to facilitate the removal of the social media content that marginalized communities view as causing the most harm. Based on XTREMESPEECH, we establish novel tasks with accompanying baselines, provide evidence that cross-country training is generally not feasible due to cultural differences between countries, and perform an interpretability analysis of BERT's predictions.

* Accepted to ACL 2022 Findings

Differentiable Multi-Agent Actor-Critic for Multi-Step Radiology Report Summarization

Mar 15, 2022

Sanjeev Kumar Karn, Ning Liu, Hinrich Schuetze, Oladimeji Farri

Abstract: The IMPRESSIONS section of a radiology report about an imaging study is a summary of the radiologist's reasoning and conclusions, and it also aids the referring physician in confirming or excluding certain diagnoses. A cascade of tasks is required to automatically generate an abstractive summary of the typical information-rich radiology report. These tasks include acquisition of salient content from the report and generation of a concise, easily consumable IMPRESSIONS section. Prior research on radiology report summarization has focused on single-step end-to-end models, which subsume the task of salient content acquisition. To fully explore the cascade structure and explainability of radiology report summarization, we introduce two innovations. First, we design a two-step approach: extractive summarization followed by abstractive summarization. Second, we further break down the extractive part into two independent tasks: extraction of salient (1) sentences and (2) keywords. Experiments on a publicly available radiology report dataset show that our novel approach leads to a more precise summary than both single-step baselines and two-step baselines with a single extractive process, with an overall improvement in F1 score of 3-4%.

* Accepted at the 60th Annual Meeting of the Association for Computational Linguistics (ACL 2022), Main Conference, Dublin, Ireland
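
The two-step design described above separates what is extracted from how it is rewritten, with sentence and keyword extraction running independently. The skeleton below is a hedged sketch of that data flow only: `two_step_summarize`, `ExtractiveOutput`, and the three callables are hypothetical stand-ins for trained models, not the paper's components.

```python
from dataclasses import dataclass
from typing import Callable, List


@dataclass
class ExtractiveOutput:
    sentences: List[str]  # salient sentences picked from the report
    keywords: List[str]   # salient keywords picked from the report


def two_step_summarize(report: str,
                       extract_sentences: Callable[[str], List[str]],
                       extract_keywords: Callable[[str], List[str]],
                       abstract: Callable[[ExtractiveOutput], str]) -> str:
    # Step 1: two independent extractive tasks run on the same full report.
    extracted = ExtractiveOutput(
        sentences=extract_sentences(report),
        keywords=extract_keywords(report),
    )
    # Step 2: the abstractive generator sees only the extracted content.
    return abstract(extracted)
```

In practice each callable would wrap a trained model; passing simple lambdas is enough to exercise the plumbing and to inspect the intermediate `ExtractiveOutput`, which is where the explainability of the cascade comes from.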

Few-Shot Learning of an Interleaved Text Summarization Model by Pretraining with Synthetic Data

Mar 08, 2021

Sanjeev Kumar Karn, Francine Chen, Yan-Ying Chen, Ulli Waltinger, Hinrich Schuetze

Abstract: Interleaved texts, in which posts belonging to different threads occur in a single sequence, are common in online chats, which can make it time-consuming to obtain a quick overview of the discussions. Existing systems first disentangle the posts by thread and then extract summaries from those threads. A major issue with such systems is error propagation from the disentanglement component. While an end-to-end trainable summarization system could obviate explicit disentanglement, such systems require a large amount of labeled data. To address this, we propose to pretrain an end-to-end trainable hierarchical encoder-decoder system using synthetic interleaved texts. We show that, after fine-tuning on a real-world meeting dataset (AMI), such a system outperforms a traditional two-step system by 22%. We also compare against transformer models and observe that pretraining both the encoder and the decoder with synthetic data outperforms the BertSumExtAbs transformer model, which pretrains only the encoder on a large dataset.

* Adapt-NLP: The Second Workshop on Domain Adaptation for NLP
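
Pretraining on synthetic interleaved texts requires a way to merge separate threads into one stream. One simple generator is sketched below, assuming each thread is an ordered list of posts and that a uniformly random interleaving is acceptable; the paper's actual sampling scheme is not described here, and `interleave_threads` is a hypothetical name. Each post keeps its thread label so it can serve as supervision.

```python
import random
from typing import List, Optional, Tuple


def interleave_threads(threads: List[List[str]],
                       seed: Optional[int] = None) -> List[Tuple[int, str]]:
    """Merge several threads into one stream of (thread_id, post) pairs.

    Posts keep their original order within each thread. Shuffling one slot
    per post and then filling the slots in order gives a uniformly random
    interleaving among all order-preserving merges.
    """
    rng = random.Random(seed)
    positions = [0] * len(threads)  # next unread post index per thread
    slots = [tid for tid, posts in enumerate(threads) for _ in posts]
    rng.shuffle(slots)
    stream: List[Tuple[int, str]] = []
    for tid in slots:
        stream.append((tid, threads[tid][positions[tid]]))
        positions[tid] += 1
    return stream
```

Keeping the within-thread order intact matters because the summarizer must still be able to recover each discussion's progression from the mixed stream.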

Nonsymbolic Text Representation

May 01, 2017

Hinrich Schuetze, Heike Adel, Ehsaneddin Asgari

Abstract: We introduce the first generic text representation model that is completely nonsymbolic, i.e., it does not require the availability of a segmentation or tokenization method that attempts to identify words or other symbolic units in text. This applies to training the parameters of the model on a training corpus as well as to applying it when computing the representation of a new text. We show that our model performs better than prior work on an information extraction and a text denoising task.
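
To make "nonsymbolic" concrete: a representation can be computed directly from the raw character sequence without ever deciding where words begin and end. The snippet below is a deliberately simple illustration of that property using character n-gram counts (`char_ngram_counts` is a hypothetical helper); it is not the model introduced in the paper, which learns its parameters from a training corpus.

```python
from collections import Counter


def char_ngram_counts(text: str, n: int = 3) -> Counter:
    """Count every character n-gram in the raw string, spaces included.

    No tokenizer or word segmenter is involved: an n-gram is read off at
    every character position, so the same code applies unchanged to
    languages written without whitespace.
    """
    return Counter(text[i:i + n] for i in range(len(text) - n + 1))
```

Because n-grams that span a space (for example `"o w"` in `"hello world"`) are counted like any other, no decision about word boundaries is ever made.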

Two SVDs produce more focal deep learning representations

May 11, 2013

Hinrich Schuetze, Christian Scheible

Abstract: A key characteristic of work on deep learning and neural networks in general is that it relies on representations of the input that support generalization, robust inference, domain adaptation and other desirable functionalities. Much recent progress in the field has focused on efficient and effective methods for computing representations. In this paper, we propose an alternative method that is more efficient than prior work and produces representations that have a property we call focality, a property we hypothesize to be important for neural network representations. The method consists of a simple application of two consecutive SVDs and is inspired by Anandkumar (2012).

Cutting Recursive Autoencoder Trees

Apr 26, 2013

Christian Scheible, Hinrich Schuetze

Abstract: Deep learning models enjoy considerable success in Natural Language Processing. While deep architectures produce useful representations that lead to improvements in various tasks, they are often difficult to interpret, which makes the analysis of learned structures particularly difficult. In this paper, we rely on empirical tests to see whether a particular structure makes sense. We present an analysis of the Semi-Supervised Recursive Autoencoder, a well-known model that produces structural representations of text. We show that for certain tasks, the structure of the autoencoder can be significantly reduced without loss of classification accuracy, and we evaluate the produced structures using human judgment.

Automatic Detection of Text Genre

Jul 08, 1997

Brett Kessler, Geoffrey Nunberg, Hinrich Schuetze

Abstract: As the text databases available to users become larger and more heterogeneous, genre becomes increasingly important for computational linguistics as a complement to topical and structural principles of classification. We propose a theory of genres as bundles of facets, which correlate with various surface cues, and argue that genre detection based on surface cues is as successful as detection based on deeper structural properties.

* Proceedings ACL/EACL 1997, Madrid, pp. 32-38
* 7 pages
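
Surface cues are features that can be counted without parsing or deeper structural analysis. The function below computes a few such cues for a single document as a hedged illustration; the cue names and formulas in `surface_cues` are assumptions chosen for clarity, not the facet cues used in the paper. A classifier for genre facets would take vectors like these as input.

```python
import re
from typing import Dict

FIRST_PERSON = {"i", "we", "me", "us"}  # illustrative cue vocabulary


def surface_cues(text: str) -> Dict[str, float]:
    """Compute a few shallow, easily counted cues for one document."""
    words = re.findall(r"[A-Za-z']+", text)
    # Sentences are approximated by splitting on terminal punctuation.
    sentences = [s for s in re.split(r"[.!?]+", text) if s.strip()]
    n_words = max(len(words), 1)  # avoid division by zero on empty input
    return {
        "words_per_sentence": len(words) / max(len(sentences), 1),
        "type_token_ratio": len({w.lower() for w in words}) / n_words,
        "question_marks_per_word": text.count("?") / n_words,
        "first_person_rate": sum(w.lower() in FIRST_PERSON for w in words) / n_words,
    }
```

Cues like these are cheap to compute over large, heterogeneous collections, which is the practical appeal of surface-cue genre detection.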

Distributional Part-of-Speech Tagging

Mar 08, 1995

Hinrich Schuetze

Abstract: This paper presents an algorithm for tagging words whose part-of-speech properties are unknown. Unlike previous work, the algorithm categorizes word tokens in context instead of word types. The algorithm is evaluated on the Brown Corpus.

* EACL 95
* 8 pages
