Stefan F. Schouten

Truth-value judgment in language models: belief directions are context sensitive

Apr 29, 2024

Stefan F. Schouten, Peter Bloem, Ilia Markov, Piek Vossen

Abstract: Recent work has demonstrated that the latent spaces of large language models (LLMs) contain directions predictive of the truth of sentences. Multiple methods recover such directions and build probes that are described as getting at a model's "knowledge" or "beliefs". We investigate this phenomenon, looking closely at the impact of context on the probes. Our experiments establish where in the LLM the probe's predictions can be described as being conditional on the preceding (related) sentences. Specifically, we quantify the responsiveness of the probes to the presence of (negated) supporting and contradicting sentences, and score the probes on their consistency. We also perform a causal intervention experiment, investigating whether moving the representation of a premise along these belief directions influences the position of the hypothesis along that same direction. We find that the probes we test are generally context sensitive, but that contexts which should not affect the truth often still impact the probe outputs. Our experiments show that the type of errors depends on the layer, the (type of) model, and the kind of data. Finally, our results suggest that belief directions are (one of the) causal mediators in the inference process that incorporates in-context information.
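
To make the idea of a "direction predictive of truth" concrete, here is a minimal sketch of one simple way such a direction could be fitted: the difference between the mean activation of true sentences and the mean activation of false ones. This is an illustrative assumption, not necessarily one of the probing methods tested in the paper. The two-dimensional activations and the names `fit_direction` and `probe_score` are hypothetical toy stand-ins for real hidden states.

```python
from statistics import fmean


def mean_vector(vectors):
    """Component-wise mean of a list of equal-length vectors."""
    return [fmean(column) for column in zip(*vectors)]


def fit_direction(true_acts, false_acts):
    """Fit a difference-of-means direction and the midpoint between classes."""
    mu_true = mean_vector(true_acts)
    mu_false = mean_vector(false_acts)
    direction = [t - f for t, f in zip(mu_true, mu_false)]
    midpoint = [(t + f) / 2 for t, f in zip(mu_true, mu_false)]
    return direction, midpoint


def probe_score(activation, direction, midpoint):
    """Project the centred activation onto the direction (> 0 reads as 'true')."""
    return sum((a - m) * d for a, m, d in zip(activation, midpoint, direction))


# Toy activations (assumed data): stand-ins for hidden states of labelled sentences.
true_acts = [[1.0, 0.2], [0.8, -0.1], [1.2, 0.0]]
false_acts = [[-1.0, 0.1], [-0.9, -0.2], [-1.1, 0.3]]
direction, midpoint = fit_direction(true_acts, false_acts)

# The same hypothesis encoded without context and after a contradicting premise.
hypothesis_alone = [0.1, 0.0]
hypothesis_after_contradiction = [-0.7, 0.0]

score_alone = probe_score(hypothesis_alone, direction, midpoint)
score_contradicted = probe_score(hypothesis_after_contradiction, direction, midpoint)
print(score_alone > 0, score_contradicted > 0)
```

In this toy setup, a context-sensitive probe is one whose score for the hypothesis changes sign when a contradicting sentence precedes it.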

Reasoning about Ambiguous Definite Descriptions

Oct 23, 2023

Stefan F. Schouten, Peter Bloem, Ilia Markov, Piek Vossen

Abstract: Natural language reasoning plays an increasingly important role in improving language models' ability to solve complex language understanding tasks. An interesting use case for reasoning is the resolution of context-dependent ambiguity. But no resources exist to evaluate how well Large Language Models can use explicit reasoning to resolve ambiguity in language. We propose to use ambiguous definite descriptions for this purpose and create and publish the first benchmark dataset consisting of such phrases. Our method includes all information required to resolve the ambiguity in the prompt, which means a model does not require anything but reasoning to do well. We find this to be a challenging task for recent LLMs. Code and data available at: https://github.com/sfschouten/exploiting-ambiguity

* EMNLP 2023 Findings

Cross-Domain Toxic Spans Detection

Jun 16, 2023

Stefan F. Schouten, Baran Barbarestani, Wondimagegnhue Tufa, Piek Vossen, Ilia Markov

Abstract: Given the dynamic nature of toxic language use, automated methods for detecting toxic spans are likely to encounter distributional shift. To explore this phenomenon, we evaluate three approaches for detecting toxic spans under cross-domain conditions: lexicon-based, rationale extraction, and fine-tuned language models. Our findings indicate that a simple method using off-the-shelf lexicons performs best in the cross-domain setup. The cross-domain error analysis suggests that (1) rationale extraction methods are prone to false negatives, while (2) language models, despite performing best for the in-domain case, recall fewer explicitly toxic words than lexicons and are prone to certain types of false positives. Our code is publicly available at: https://github.com/sfschouten/toxic-cross-domain.

* NLDB 2023
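
As a rough illustration of what a lexicon-based approach to toxic span detection can look like, the sketch below marks every word that appears in a word list and returns its character offsets. It is an assumption-laden toy, not the implementation from the linked repository: the function name `lexicon_toxic_spans` and the two-word `TOY_LEXICON` are hypothetical, and a real system would load an off-the-shelf lexicon.

```python
import re


def lexicon_toxic_spans(text, lexicon):
    """Return (start, end) character offsets of words found in the lexicon."""
    spans = []
    for match in re.finditer(r"\w+", text):
        # Case-insensitive lookup of each word token in the lexicon.
        if match.group().lower() in lexicon:
            spans.append((match.start(), match.end()))
    return spans


# Hypothetical toy lexicon, used only to keep the example self-contained.
TOY_LEXICON = {"idiot", "stupid"}

example = "What a stupid idea, you idiot!"
spans = lexicon_toxic_spans(example, TOY_LEXICON)
flagged = [example[start:end] for start, end in spans]
print(flagged)
```

Because the lookup relies only on the word list, it does not depend on the domain of any training data, which is one plausible reading of why such a method can hold up under distributional shift.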

A Song of (Dis)agreement: Evaluating the Evaluation of Explainable Artificial Intelligence in Natural Language Processing

May 09, 2022

Michael Neely, Stefan F. Schouten, Maurits Bleeker, Ana Lucic

Abstract: There has been significant debate in the NLP community about whether or not attention weights can be used as an explanation - a mechanism for interpreting how important each input token is for a particular prediction. The validity of "attention as explanation" has so far been evaluated by computing the rank correlation between attention-based explanations and existing feature attribution explanations using LSTM-based models. In our work, we (i) compare the rank correlation between five more recent feature attribution methods and two attention-based methods, on two types of NLP tasks, and (ii) extend this analysis to also include transformer-based models. We find that attention-based explanations do not correlate strongly with any recent feature attribution methods, regardless of the model or task. Furthermore, we find that none of the tested explanations correlate strongly with one another for the transformer-based model, leading us to question the underlying assumption that we should measure the validity of attention-based explanations based on how well they correlate with existing feature attribution explanation methods. After conducting experiments on five datasets using two different models, we argue that the community should stop using rank correlation as an evaluation metric for attention-based explanations. We suggest that researchers and practitioners should instead test various explanation methods and employ a human-in-the-loop process to determine if the explanations align with human intuition for the particular use case at hand.

Order in the Court: Explainable AI Methods Prone to Disagreement

May 07, 2021

Michael Neely, Stefan F. Schouten, Maurits J. R. Bleeker, Ana Lucic

Abstract: In Natural Language Processing, feature-additive explanation methods quantify the independent contribution of each input token towards a model's decision. By computing the rank correlation between attention weights and the scores produced by a small sample of these methods, previous analyses have sought to either invalidate or support the role of attention-based explanations as a faithful and plausible measure of salience. To investigate what measures of rank correlation can reliably conclude, we comprehensively compare feature-additive methods, including attention-based explanations, across several neural architectures and tasks. In most cases, we find that none of our chosen methods agree. Therefore, we argue that rank correlation is largely uninformative and does not measure the quality of feature-additive methods. Additionally, the range of conclusions a practitioner may draw from a single explainability algorithm is limited.
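
The rank correlation discussed above can be sketched with Kendall's tau, which compares how two methods order the same input tokens. The abstract does not state which rank correlation coefficient is used, so the choice of the simple tau-a variant here is an assumption, and the per-token scores are made-up toy numbers rather than results from the paper.

```python
from itertools import combinations


def kendall_tau(scores_a, scores_b):
    """Kendall's tau-a between two score lists over the same tokens.

    Tied pairs count as neither concordant nor discordant.
    """
    if len(scores_a) != len(scores_b) or len(scores_a) < 2:
        raise ValueError("need two score lists of equal length >= 2")
    concordant = discordant = 0
    for i, j in combinations(range(len(scores_a)), 2):
        product = (scores_a[i] - scores_a[j]) * (scores_b[i] - scores_b[j])
        if product > 0:
            concordant += 1
        elif product < 0:
            discordant += 1
    n_pairs = len(scores_a) * (len(scores_a) - 1) // 2
    return (concordant - discordant) / n_pairs


# Toy per-token importance scores for a four-token input (assumed data).
attention_scores = [0.40, 0.10, 0.30, 0.20]
attribution_scores = [0.35, 0.05, 0.20, 0.40]

tau = kendall_tau(attention_scores, attribution_scores)
print(round(tau, 3))
```

A value near 1 means the two methods rank tokens almost identically and a value near -1 means the rankings are reversed. Values around 0, as the abstract reports for most method pairs, indicate little agreement.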
