Daniel Braun

Position: Editing Large Language Models Poses Serious Safety Risks

Feb 05, 2025

Paul Youssef, Zhixue Zhao, Daniel Braun, Jörg Schlötterer, Christin Seifert

Abstract: Large Language Models (LLMs) contain large amounts of facts about the world. These facts can become outdated over time, which has led to the development of knowledge editing methods (KEs) that can change specific facts in LLMs with limited side effects. This position paper argues that editing LLMs poses serious safety risks that have been largely overlooked. First, we note that the wide availability, low computational cost, high performance, and stealthiness of KEs make them an attractive tool for malicious actors. Second, we discuss malicious use cases of KEs, showing how KEs can be easily adapted for a variety of malicious purposes. Third, we highlight vulnerabilities in the AI ecosystem that allow unrestricted uploading and downloading of updated models without verification. Fourth, we argue that a lack of social and institutional awareness exacerbates this risk, and discuss the implications for different stakeholders. We call on the community to (i) research tamper-resistant models and countermeasures against malicious model editing, and (ii) actively engage in securing the AI ecosystem.

Leveraging Annotator Disagreement for Text Classification

Sep 26, 2024

Jin Xu, Mariët Theune, Daniel Braun

Abstract: It is common practice in text classification to only use one majority label for model training even if a dataset has been annotated by multiple annotators. Doing so can remove valuable nuances and diverse perspectives inherent in the annotators' assessments. This paper proposes and compares three different strategies to leverage annotator disagreement for text classification: a probability-based multi-label method, an ensemble system, and instruction tuning. All three approaches are evaluated on the tasks of hate speech and abusive conversation detection, which inherently entail a high degree of subjectivity. Moreover, to evaluate the effectiveness of embracing annotation disagreements for model training, we conduct an online survey that compares the performance of the multi-label model against a baseline model, which is trained with the majority label. The results show that in hate speech detection, the multi-label method outperforms the other two approaches, while in abusive conversation detection, instruction tuning achieves the best performance. The results of the survey also show that the outputs from the multi-label models are considered a better representation of the texts than the single-label model.
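
The difference between a single majority label and a probability-based label can be sketched as follows. This is a minimal illustration, not the paper's implementation: the class names, the votes, and the helper functions `soft_label`, `majority_label`, and `cross_entropy` are all assumptions made up for this example.

```python
from collections import Counter
import math

def soft_label(votes, classes):
    """Turn one item's annotator votes into a probability per class."""
    counts = Counter(votes)
    return [counts[c] / len(votes) for c in classes]

def majority_label(votes):
    """Baseline: keep only the most frequent vote and discard the rest."""
    return Counter(votes).most_common(1)[0][0]

def cross_entropy(target, predicted, eps=1e-12):
    """Cross-entropy between a soft target and a predicted distribution."""
    return -sum(t * math.log(p + eps) for t, p in zip(target, predicted))

# Hypothetical item labeled by five annotators who disagree.
classes = ["hate", "not_hate"]
votes = ["hate", "not_hate", "hate", "hate", "not_hate"]

target = soft_label(votes, classes)   # keeps the 3-to-2 split
baseline = majority_label(votes)      # collapses the split to one label

# A prediction that mirrors the disagreement is penalized less than an
# overconfident one when the soft label is used as the training target.
calibrated_loss = cross_entropy(target, [0.6, 0.4])
overconfident_loss = cross_entropy(target, [0.99, 0.01])
```

With a soft target, a model is rewarded for reproducing the annotators' split rather than for committing fully to the majority class.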

Beware of Validation by Eye: Visual Validation of Linear Trends in Scatterplots

Jul 16, 2024

Daniel Braun, Remco Chang, Michael Gleicher, Tatiana von Landesberger

Abstract: Visual validation of regression models in scatterplots is a common practice for assessing model quality, yet its efficacy remains unquantified. We conducted two empirical experiments to investigate individuals' ability to visually validate linear regression models (linear trends) and to examine the impact of common visualization designs on validation quality. The first experiment showed that the level of accuracy for visual estimation of slope (i.e., fitting a line to data) is higher than for visual validation of slope (i.e., accepting a shown line). Notably, we found bias toward slopes that are "too steep" in both cases. This led to the novel insight that participants naturally assessed regression with orthogonal distances between the points and the line (i.e., orthogonal distance regression, ODR) rather than the common vertical distances (ordinary least squares, OLS). In the second experiment, we investigated whether incorporating common designs for regression visualization (error lines, bounding boxes, and confidence intervals) would improve visual validation. Even though error lines reduced validation bias, results failed to show the desired improvements in accuracy for any design. Overall, our findings suggest caution in using visual model validation for linear trends in scatterplots.

* Preprint and Author Version of a Full Paper, accepted to the 2024 IEEE Visualization Conference (VIS)
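
The abstract above contrasts fits that minimize vertical distances (OLS) with fits that minimize orthogonal distances (ODR). A minimal sketch of both slopes follows, assuming equal error variance in x and y for the orthogonal fit. The data points and function names are invented for illustration and are not taken from the paper.

```python
import math

def centered_moments(xs, ys):
    """Sums of squares and cross-products around the means."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    return sxx, syy, sxy

def ols_slope(xs, ys):
    """Slope that minimizes squared vertical distances to the line."""
    sxx, _, sxy = centered_moments(xs, ys)
    return sxy / sxx

def orthogonal_slope(xs, ys):
    """Slope that minimizes squared perpendicular distances to the line.

    Closed form for a straight line with equal error variance in x and y;
    it requires a non-zero cross-product term sxy.
    """
    sxx, syy, sxy = centered_moments(xs, ys)
    return (syy - sxx + math.sqrt((syy - sxx) ** 2 + 4 * sxy ** 2)) / (2 * sxy)

# Hypothetical noisy scatterplot data with a positive trend.
xs = [1, 2, 3, 4, 5, 6]
ys = [1.2, 1.9, 3.6, 3.4, 5.5, 5.4]

slope_ols = ols_slope(xs, ys)
slope_odr = orthogonal_slope(xs, ys)
```

For noisy data such as this, the orthogonal slope comes out steeper than the OLS slope, which is the same direction as the bias the abstract reports.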

AGB-DE: A Corpus for the Automated Legal Assessment of Clauses in German Consumer Contracts

Jun 10, 2024

Daniel Braun, Florian Matthes

Abstract: Legal tasks and datasets are often used as benchmarks for the capabilities of language models. However, openly available annotated datasets are rare. In this paper, we introduce AGB-DE, a corpus of 3,764 clauses from German consumer contracts that have been annotated and legally assessed by legal experts. Together with the data, we present a first baseline for the task of detecting potentially void clauses, comparing the performance of an SVM baseline with three fine-tuned open language models and the performance of GPT-3.5. Our results show the challenging nature of the task, with no approach exceeding an F1-score of 0.54. While the fine-tuned models often performed better with regard to precision, GPT-3.5 outperformed the other approaches with regard to recall. An analysis of the errors indicates that one of the main challenges could be the correct interpretation of complex clauses, rather than the decision boundaries of what is permissible and what is not.

Efficient Black-Box Adversarial Attacks on Neural Text Detectors

Nov 03, 2023

Vitalii Fishchuk, Daniel Braun

Abstract: Neural text detectors are models trained to detect whether a given text was generated by a language model or written by a human. In this paper, we investigate three simple and resource-efficient strategies (parameter tweaking, prompt engineering, and character-level mutations) to alter texts generated by GPT-3.5 in ways that are unsuspicious or unnoticeable to humans but cause misclassification by neural text detectors. The results show that parameter tweaking and character-level mutations in particular are effective strategies.

* Accepted at ICNLSP 2023

Visual Validation versus Visual Estimation: A Study on the Average Value in Scatterplots

Jul 18, 2023

Daniel Braun, Ashley Suh, Remco Chang, Michael Gleicher, Tatiana von Landesberger

Abstract: We investigate the ability of individuals to visually validate statistical models in terms of their fit to the data. While visual model estimation has been studied extensively, visual model validation remains under-investigated. It is unknown how well people are able to visually validate models, and how their performance compares to visual and computational estimation. As a starting point, we conducted a study across two populations (crowdsourced and volunteers). Participants had to both visually estimate (i.e., draw) and visually validate (i.e., accept or reject) the frequently studied model of averages. Across both populations, the level of accuracy of the models that were considered valid was lower than the accuracy of the estimated models. We find that participants' validation and estimation were unbiased. Moreover, their natural critical point between accepting and rejecting a given mean value is close to the boundary of its 95% confidence interval, indicating that the visually perceived confidence interval corresponds to a common statistical standard. Our work contributes to the understanding of visual model validation and opens new research opportunities.

* Preprint and Author Version of a Short Paper, accepted to the 2023 IEEE Visualization Conference (VIS)
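
The 95% confidence interval that the abstract uses as a reference point can be computed as in the sketch below. It assumes a normal approximation (critical value 1.96) and models an idealized validator who accepts a shown mean exactly when it falls inside the interval. The sample values and the helpers `mean_ci95` and `accepts` are hypothetical and not taken from the study.

```python
import math
import statistics

def mean_ci95(values):
    """Approximate 95% confidence interval for the mean (normal approximation)."""
    n = len(values)
    mean = statistics.fmean(values)
    standard_error = statistics.stdev(values) / math.sqrt(n)
    half_width = 1.96 * standard_error
    return mean - half_width, mean + half_width

def accepts(shown_mean, values):
    """Idealized validator: accept a shown mean line if it lies inside the interval."""
    low, high = mean_ci95(values)
    return low <= shown_mean <= high

# Hypothetical y-values of the points in one scatterplot.
values = [4.1, 5.3, 4.8, 5.9, 5.0, 4.6, 5.4, 4.9]

low, high = mean_ci95(values)
near_line_accepted = accepts(5.2, values)   # close to the sample mean
far_line_accepted = accepts(6.0, values)    # well outside the interval
```

For small samples a t-distribution critical value would be more appropriate than 1.96; the normal approximation is used here only to keep the sketch short.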

Reclaiming the Horizon: Novel Visualization Designs for Time-Series Data with Large Value Ranges

Jul 18, 2023

Daniel Braun, Rita Borgo, Max Sondag, Tatiana von Landesberger

Abstract: We introduce two novel visualization designs to support practitioners in performing identification and discrimination tasks on large value ranges (i.e., several orders of magnitude) in time-series data: (1) the order of magnitude horizon graph, which extends the classic horizon graph; and (2) the order of magnitude line chart, which adapts the log-line chart. These new visualization designs visualize large value ranges by explicitly splitting the mantissa m and exponent e of a value v = m · 10^e. We evaluate our novel designs against the most relevant state-of-the-art visualizations in an empirical user study. It focuses on four main tasks commonly employed in the analysis of time-series and large value ranges visualization: identification, discrimination, estimation, and trend detection. For each task we analyse error, confidence, and response time. The new order of magnitude horizon graph performs better than or equal to all other designs in identification, discrimination, and estimation tasks. Only for trend detection tasks did the more traditional horizon graphs show better performance. Our results are domain-independent, only requiring time-series data with large value ranges.

* Preprint and Author Version of a Full Paper, accepted to the 2023 IEEE Visualization Conference (VIS)
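
The mantissa and exponent split described in the abstract above can be sketched in a few lines. The normalization used here (1 <= |m| < 10, as in scientific notation) is an assumption, as is the function name `split_order_of_magnitude`; the paper may use a different convention.

```python
import math

def split_order_of_magnitude(v):
    """Return (m, e) such that v is approximately m * 10**e and 1 <= |m| < 10.

    Zero has no order of magnitude, so it is returned as (0.0, 0).
    """
    if v == 0:
        return 0.0, 0
    e = math.floor(math.log10(abs(v)))
    m = v / 10 ** e
    # Guard against floating-point rounding right at a power of ten.
    if abs(m) >= 10:
        m, e = m / 10, e + 1
    elif abs(m) < 1:
        m, e = m * 10, e - 1
    return m, e

# Hypothetical time series spanning several orders of magnitude.
series = [0.0035, 4200.0, 7.0, -560000.0]
parts = [split_order_of_magnitude(v) for v in series]
```

Each value then yields two quantities that can be encoded separately, for example the exponent as a band or color and the mantissa as a height within that band.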

Challenges in Domain-Specific Abstractive Summarization and How to Overcome them

Jul 03, 2023

Anum Afzal, Juraj Vladika, Daniel Braun, Florian Matthes

Abstract: Large Language Models work quite well with general-purpose data and many tasks in Natural Language Processing. However, they show several limitations when used for a task such as domain-specific abstractive text summarization. This paper identifies three of those limitations as research problems in the context of abstractive text summarization: 1) Quadratic complexity of transformer-based models with respect to the input text length; 2) Model Hallucination, which is a model's tendency to generate factually incorrect text; and 3) Domain Shift, which happens when the distribution of the model's training and test corpus is not the same. Along with a discussion of the open research questions, this paper also provides an assessment of existing state-of-the-art techniques relevant to domain-specific text summarization to address the research gaps.

Investigating Conversational Search Behavior For Domain Exploration

Jan 10, 2023

Phillip Schneider, Anum Afzal, Juraj Vladika, Daniel Braun, Florian Matthes

Abstract: Conversational search has evolved as a new information retrieval paradigm, marking a shift from traditional search systems towards interactive dialogues with intelligent search agents. This change especially affects exploratory information-seeking contexts, where conversational search systems can guide the discovery of unfamiliar domains. In these scenarios, users often find it difficult to express their information goals due to insufficient background knowledge. Conversational interfaces can provide assistance by eliciting information needs and narrowing down the search space. However, due to the complexity of information-seeking behavior, the design of conversational interfaces for retrieving information remains a great challenge. Although prior work has employed user studies to empirically ground the system design, most existing studies are limited to well-defined search tasks or known domains, thus being less exploratory in nature. Therefore, we conducted a laboratory study to investigate open-ended search behavior for navigation through unknown information landscapes. The study comprised 26 participants who were restricted in their search to a text chat interface. Based on the collected dialogue transcripts, we applied statistical analyses and process mining techniques to uncover general information-seeking patterns across five different domains. We not only identify core dialogue acts and their interrelations that enable users to discover domain knowledge, but also derive design suggestions for conversational search systems.

* Accepted to ECIR 2023

Evaluating Unsupervised Text Classification: Zero-shot and Similarity-based Approaches

Nov 29, 2022

Tim Schopf, Daniel Braun, Florian Matthes

Abstract: Text classification of unseen classes is a challenging Natural Language Processing task and is mainly attempted using two different types of approaches. Similarity-based approaches attempt to classify instances based on similarities between text document representations and class description representations. Zero-shot text classification approaches aim to generalize knowledge gained from a training task by assigning appropriate labels of unknown classes to text documents. Although existing studies have already investigated individual approaches to these categories, the experiments in the literature do not provide a consistent comparison. This paper addresses this gap by conducting a systematic evaluation of different similarity-based and zero-shot approaches for text classification of unseen classes. Different state-of-the-art approaches are benchmarked on four text classification datasets, including a new dataset from the medical domain. Additionally, novel SimCSE and SBERT-based baselines are proposed, as other baselines used in existing work yield weak classification results and are easily outperformed. Finally, the novel similarity-based Lbl2TransformerVec approach is presented, which outperforms previous state-of-the-art approaches in unsupervised text classification. Our experiments show that similarity-based approaches significantly outperform zero-shot approaches in most cases. Additionally, using SimCSE or SBERT embeddings instead of simpler text representations increases similarity-based classification results even further.

* Accepted to 6th International Conference on Natural Language Processing and Information Retrieval (NLPIR '22)
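
The similarity-based idea in the abstract above, assigning a document to the class whose description is most similar to it, can be sketched with cosine similarity. The three-dimensional vectors below are toy stand-ins for real sentence embeddings such as those from SimCSE or SBERT, and the class names and the helpers `cosine` and `classify` are hypothetical. This is not the Lbl2TransformerVec implementation.

```python
import math

def cosine(a, b):
    """Cosine similarity between two equal-length, non-zero vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)

def classify(doc_vec, label_vecs):
    """Pick the label whose description vector is most similar to the document."""
    return max(label_vecs, key=lambda label: cosine(doc_vec, label_vecs[label]))

# Toy embeddings of two class descriptions and one unlabeled document.
label_vecs = {
    "cardiology": [0.9, 0.1, 0.0],
    "oncology": [0.1, 0.9, 0.2],
}
doc_vec = [0.8, 0.3, 0.1]

predicted = classify(doc_vec, label_vecs)
```

No labeled training documents are needed in this setup; only a textual description of each class has to be embedded.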
