Abstract:In this work, we evaluate annotator disagreement in Word-in-Context (WiC) tasks, exploring the relationship between contextual meaning and disagreement as part of the CoMeDi shared task competition. While prior studies have modeled disagreement by analyzing annotator attributes with single-sentence inputs, this shared task incorporates WiC to bridge the gap between sentence-level semantic representation and annotator judgment variability. We describe the three methods we developed for the shared task: a feature enrichment approach that extends contextual embedding representations with concatenation, element-wise differences, products, and cosine similarity as well as Euclidean and Manhattan distances; a transformation by Adapter blocks to obtain task-specific representations of contextual embeddings; and classifiers of varying complexity, including ensembles. The comparison of our methods demonstrates improved performance for methods that include enriched and task-specific features. While our method falls short of the best system in subtask 1 (OGWiC), it is competitive with the official evaluation results in subtask 2 (DisWiC).
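A minimal sketch of the kind of feature enrichment described in this abstract, assuming two pre-computed contextual embeddings of the target word (one per sentence); the function name, the use of a signed difference, and the dimensionality are illustrative assumptions, not the shared-task code.

```python
import numpy as np

def enrich_pair(u: np.ndarray, v: np.ndarray) -> np.ndarray:
    """Combine two contextual embeddings into one enriched feature vector."""
    cos = float(np.dot(u, v) / (np.linalg.norm(u) * np.linalg.norm(v) + 1e-8))
    euclidean = float(np.linalg.norm(u - v))
    manhattan = float(np.abs(u - v).sum())
    # Concatenation, element-wise difference, element-wise product,
    # plus the three scalar similarity/distance measures.
    return np.concatenate([u, v, u - v, u * v, [cos, euclidean, manhattan]])

# Example with 768-dimensional embeddings: 4 * 768 + 3 = 3075 features per pair,
# which can then be fed to a downstream classifier or ensemble.
u, v = np.random.randn(768), np.random.randn(768)
assert enrich_pair(u, v).shape == (3075,)
```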
Abstract:Conceptual operationalizations of empathy in NLP are varied, with some having specific behaviors and properties, while others are more abstract. How these variations relate to one another and capture properties of empathy observable in text remains unclear. To provide insight into this, we analyze the transfer performance of empathy models adapted to empathy tasks with different theoretical groundings. We study (1) the dimensionality of empathy definitions, (2) the correspondence between the defined dimensions and measured/observed properties, and (3) the conduciveness of the data to represent them, finding that they have a significant impact on performance compared to other features of the transfer setting. Characterizing the theoretical grounding of empathy tasks as direct, abstract, or adjacent further indicates that tasks that directly predict specified empathy components have higher transferability. Our work provides empirical evidence for the need for precise and multidimensional empathy operationalizations.
Abstract:While preliminary findings indicate that multilingual LLMs exhibit reduced bias compared to monolingual ones, a comprehensive understanding of the effect of multilingual training on bias mitigation is lacking. This study addresses this gap by systematically training six LLMs of identical size (2.6B parameters) and architecture: five monolingual models (English, German, French, Italian, and Spanish) and one multilingual model trained on an equal distribution of data across these languages, all using publicly available data. To ensure robust evaluation, standard bias benchmarks were automatically translated into the five target languages and verified for both translation quality and bias preservation by human annotators. Our results consistently demonstrate that multilingual training effectively mitigates bias. Moreover, we observe that multilingual models achieve not only lower bias but also superior prediction accuracy when compared to monolingual models with the same amount of training data, model architecture, and size.
Abstract:Recent trends in natural language processing research and annotation tasks affirm a paradigm shift from the traditional reliance on a single ground truth to a focus on individual perspectives, particularly in subjective tasks. In scenarios where annotation tasks are meant to encompass diversity, models that solely rely on the majority class labels may inadvertently disregard valuable minority perspectives. This oversight could result in the omission of crucial information and, in a broader context, risk disrupting the balance within larger ecosystems. As the landscape of annotator modeling unfolds with diverse representation techniques, it becomes imperative to investigate their effectiveness in light of the fine-grained features of the datasets. This study systematically explores various annotator modeling techniques and compares their performance across seven corpora. We find that the commonly used user token model consistently outperforms more complex models. We introduce a composite embedding approach and show distinct differences in which model performs best as a function of annotator agreement within a given dataset. Our findings shed light on the relationship between corpus statistics and annotator modeling performance, which informs future work on corpus construction and perspectivist NLP.
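A minimal sketch of one common way to implement the user token model referred to above, assuming a HuggingFace-style encoder; the checkpoint, token format, and helper names are illustrative assumptions rather than the paper's exact setup.

```python
from transformers import AutoTokenizer, AutoModelForSequenceClassification

# One special token per annotator, added to the vocabulary so the classifier
# can condition on who produced a given label.
annotators = [f"[ANN_{i}]" for i in range(100)]
tokenizer = AutoTokenizer.from_pretrained("roberta-base")
tokenizer.add_special_tokens({"additional_special_tokens": annotators})

model = AutoModelForSequenceClassification.from_pretrained("roberta-base", num_labels=2)
model.resize_token_embeddings(len(tokenizer))  # make room for the new annotator tokens

def encode(text: str, annotator_id: int):
    """Prepend the annotator token so each training example is annotator-specific."""
    return tokenizer(f"[ANN_{annotator_id}] {text}", return_tensors="pt", truncation=True)
```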
Abstract:Recent language models have been improved by the addition of external memory. Nearest neighbor language models retrieve similar contexts to assist in word prediction. The addition of locality levels allows a model to learn how to weight neighbors based on their location relative to the current text in source documents, and has been shown to further improve model performance. Nearest neighbor models have been explored for controllable generation, but the use of locality levels in this setting has not been examined. We present a novel approach for this purpose and evaluate it using automatic and human evaluation on textual data annotated for politeness, formality, supportiveness, and toxicity. We find that our model is successfully able to control style and provides a better fluency-style trade-off than previous work.
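A rough sketch of how locality-level weights can be folded into nearest-neighbor next-token prediction; the tensor names, weighting scheme, and hyperparameters are assumptions for illustration, not the paper's implementation.

```python
import torch

def knn_lm_probs(lm_probs, neighbor_tokens, neighbor_dists, neighbor_levels,
                 level_weights, vocab_size, lam=0.25, temperature=1.0):
    """Interpolate base LM probabilities with a locality-weighted kNN distribution."""
    # Softmax over negative distances gives per-neighbor retrieval scores.
    scores = torch.softmax(-neighbor_dists / temperature, dim=-1)
    # Scale each neighbor by the learned weight of its locality level
    # (e.g. same paragraph, same document, other document).
    scores = scores * level_weights[neighbor_levels]
    scores = scores / scores.sum()
    # Scatter neighbor scores onto their target tokens, then interpolate.
    knn_probs = torch.zeros(vocab_size).scatter_add(0, neighbor_tokens, scores)
    return (1 - lam) * lm_probs + lam * knn_probs

# Example locality weights for 3 levels; in practice these would be learned.
level_weights = torch.tensor([1.5, 1.0, 0.5])
```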
Abstract:The potential to provide patients with faster information access while allowing medical specialists to concentrate on critical tasks makes medical domain dialog agents appealing. However, the integration of large language models (LLMs) into these agents presents certain limitations that may result in serious consequences. This paper investigates the challenges and risks of using GPT-3-based models for medical question-answering (MedQA). We perform several evaluations contextualized in terms of standard medical principles. We provide a procedure for manually designing patient queries to stress-test high-risk limitations of LLMs in MedQA systems. Our analysis reveals that LLMs fail to respond adequately to these queries, generating erroneous medical information, unsafe recommendations, and content that may be considered offensive.
Abstract:Instead of using a single ground truth for language processing tasks, several recent studies have examined how to represent annotators and predict their individual labels. However, often little or no information about annotators is known, or the set of annotators is small. In this work, we examine a corpus of social media posts about conflict, with 210k judgements of social norms from a set of 13k annotators. We provide a novel experimental setup that applies personalization methods to the modeling of annotators and compare their effectiveness for predicting the perception of social norms. We further provide an analysis of performance across subsets of social situations that vary by the closeness of the relationship between parties in conflict, and assess where personalization helps the most.
Abstract:We review the state of research on empathy in natural language processing and identify the following issues: (1) empathy definitions are absent or abstract, which (2) leads to low construct validity and reproducibility. Moreover, (3) emotional empathy is overemphasized, skewing our focus to a narrow subset of simplified tasks. We believe these issues hinder research progress and argue that current directions will benefit from a clear conceptualization that includes operationalizing cognitive empathy components. Our main objectives are to provide insight and guidance on empathy conceptualization for NLP research objectives and to encourage researchers to pursue the overlooked opportunities in this area, which are highly relevant, e.g., for the clinical and educational sectors.
Abstract:Recent language modeling performance has been greatly improved by the use of external memory. This memory encodes the context so that similar contexts can be recalled during decoding. This similarity depends on how the model learns to encode context, which can be altered to include other attributes, such as style. We construct and evaluate an architecture for this purpose, using corpora annotated for politeness, formality, and toxicity. Through extensive experiments and human evaluation we demonstrate the potential of our method to generate text while controlling style. We find that style-specific datastores improve generation performance, though results vary greatly across styles, and the effect of pretraining data and specific styles should be explored in future work.
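A simplified sketch of the style-specific datastore idea described above: one retrieval index per style, with context encodings as keys and next tokens as values, so generation can query the datastore matching the desired style. The class, the FAISS index type, and the style names are illustrative assumptions, not the paper's code.

```python
import faiss
import numpy as np

class StyleDatastore:
    def __init__(self, dim: int):
        self.index = faiss.IndexFlatL2(dim)  # exact L2 search over context keys
        self.values = []                     # next-token id stored for each key

    def add(self, keys: np.ndarray, next_tokens: list[int]):
        """Store (context encoding, next token) pairs from a style-annotated corpus."""
        self.index.add(keys.astype(np.float32))
        self.values.extend(next_tokens)

    def query(self, key: np.ndarray, k: int = 8):
        """Return the k nearest stored contexts as (next_token, distance) pairs."""
        dists, idx = self.index.search(key.astype(np.float32)[None, :], k)
        return [(self.values[i], float(d)) for i, d in zip(idx[0], dists[0])]

# One datastore per annotated style; at decoding time the model retrieves from
# the store matching the target style and interpolates with the base LM as usual.
stores = {style: StyleDatastore(dim=768) for style in ["polite", "formal", "non-toxic"]}
```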
Abstract:Studies on interpersonal conflict have a long history and contain many suggestions for conflict typology. We use this as the basis of a novel annotation scheme and release a new dataset of situations and conflict aspect annotations. We then build a classifier to predict whether someone will perceive the actions of one individual as right or wrong in a given situation, outperforming previous work on this task. Our analyses cover conflict aspects as well as generated, human-validated clusters, and show differences in conflict content based on the relationship of the participants to the author. Our findings have important implications for understanding conflict and social norms.