Christopher Bagdon

Which Demographics do LLMs Default to During Annotation?

Oct 11, 2024

Christopher Bagdon, Aidan Combs, Lynn Greschner, Roman Klinger, Jiahui Li, Sean Papay, Nadine Probol, Yarik Menchaca Resendiz, Johannes Schäfer, Aswathy Velutharambath(+2 more)

Abstract: Demographics and cultural background of annotators influence the labels they assign in text annotation. For instance, an elderly woman might find it offensive to read a message addressed to a "bro", but a male teenager might find it appropriate. It is therefore important to acknowledge label variations so as not to under-represent members of a society. Two research directions developed out of this observation in the context of using large language models (LLMs) for data annotation, namely (1) studying biases and inherent knowledge of LLMs and (2) injecting diversity into the output by manipulating the prompt with demographic information. We combine these two strands of research and ask which demographics an LLM resorts to when no demographic information is given. To answer this question, we evaluate which attributes of human annotators LLMs inherently mimic. Furthermore, we compare prompts without demographic conditioning and placebo-conditioned prompts (e.g., "you are an annotator who lives in house number 5") to demographics-conditioned prompts ("You are a 45 year old man and an expert on politeness annotation. How do you rate {instance}"). We study these questions for politeness and offensiveness annotations on the POPQUORN data set, a corpus created in a controlled manner to investigate human label variations based on demographics, which has not been used for LLM-based analyses so far. We observe notable influences related to gender, race, and age in demographic prompting, which contrasts with previous studies that found no such effects.
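The three prompt conditions named in the abstract can be illustrated with a minimal Python sketch. The persona and placebo strings are the examples quoted in the abstract; the function name `build_conditions`, the condition keys, and the exact way the persona is joined to the task question are assumptions made for illustration and are not taken from the paper.

```python
from typing import Dict

# Example strings quoted in the abstract; everything else here is hypothetical.
DEMOGRAPHIC_PERSONA = (
    "You are a 45 year old man and an expert on politeness annotation."
)
PLACEBO_PERSONA = "You are an annotator who lives in house number 5."


def build_conditions(instance: str) -> Dict[str, str]:
    """Return one prompt per condition for a single text instance."""
    task = "How do you rate " + instance
    return {
        # No persona at all: the model falls back on its default behaviour.
        "none": task,
        # Persona with no demographic content, to control for prompt length.
        "placebo": PLACEBO_PERSONA + " " + task,
        # Persona carrying demographic attributes.
        "demographic": DEMOGRAPHIC_PERSONA + " " + task,
    }


prompts = build_conditions('"Thanks a lot, bro."')
for name, text in prompts.items():
    print(name, "->", text)
```

Comparing the labels returned for the `none` and `placebo` prompts against those for each demographic persona is one way to see which persona the unconditioned model most resembles.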

"You are an expert annotator": Automatic Best-Worst-Scaling Annotations for Emotion Intensity Modeling

Mar 26, 2024

Christopher Bagdon, Prathamesh Karmalker, Harsha Gurulingappa, Roman Klinger

Abstract: Labeling corpora constitutes a bottleneck in creating models for new tasks or domains. Large language models mitigate the issue with automatic corpus labeling methods, particularly for categorical annotations. Some NLP tasks such as emotion intensity prediction, however, require text regression, but there is no work on automating annotations for continuous label assignments. Regression is considered more challenging than classification: the fact that humans perform worse when tasked to choose values from a rating scale led to comparative annotation methods, including best-worst scaling. This raises the question of whether large language model-based annotation methods show similar patterns, namely that they perform worse on rating scale annotation tasks than on comparative annotation tasks. To study this, we automate emotion intensity predictions and compare direct rating scale predictions, pairwise comparisons, and best-worst scaling. We find that the latter shows the highest reliability. A transformer regressor fine-tuned on these data performs nearly on par with a model trained on the original manual annotations.

* accepted for publication in NAACL 2024
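The abstract does not say how best-worst judgments are turned into continuous scores. A common counting procedure, shown here as a sketch under that assumption, scores each item as the number of times it was chosen as best minus the number of times it was chosen as worst, divided by the number of tuples it appeared in. The function name `bws_scores` and the toy data are hypothetical.

```python
from collections import Counter
from typing import Dict, Iterable, Sequence, Tuple

# One annotation: (items shown together, item picked as best, item picked as worst)
Annotation = Tuple[Sequence[str], str, str]


def bws_scores(annotations: Iterable[Annotation]) -> Dict[str, float]:
    """Counting-based best-worst scaling: (best - worst) / appearances."""
    best: Counter = Counter()
    worst: Counter = Counter()
    seen: Counter = Counter()
    for items, best_item, worst_item in annotations:
        seen.update(items)  # every shown item counts as one appearance
        best[best_item] += 1
        worst[worst_item] += 1
    # Scores fall in [-1, 1]; items never picked either way get 0.0.
    return {item: (best[item] - worst[item]) / seen[item] for item in seen}


toy = [
    (("a", "b", "c", "d"), "a", "d"),
    (("a", "b", "c", "e"), "a", "c"),
]
scores = bws_scores(toy)
print(scores)
```

In this toy data, item `a` is always chosen as best and ends up at 1.0, while `d` is always chosen as worst and ends up at -1.0.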
