Ji-Ung Lee

B-cos LM: Efficiently Transforming Pre-trained Language Models for Improved Explainability

Feb 18, 2025

Yifan Wang, Sukrut Rao, Ji-Ung Lee, Mayank Jobanputra, Vera Demberg

Abstract: Post-hoc explanation methods for black-box models often struggle with faithfulness and human interpretability due to the lack of explainability in current neural models. Meanwhile, B-cos networks have been introduced to improve model explainability through architectural and computational adaptations, but their application has so far been limited to computer vision models and their associated training pipelines. In this work, we introduce B-cos LMs, i.e., B-cos networks empowered for NLP tasks. Our approach directly transforms pre-trained language models into B-cos LMs by combining B-cos conversion and task fine-tuning, improving efficiency compared to previous B-cos methods. Our automatic and human evaluation results demonstrate that B-cos LMs produce more faithful and human-interpretable explanations than post-hoc methods, while maintaining task performance comparable to conventional fine-tuning. Our in-depth analysis explores how B-cos LMs differ from conventionally fine-tuned models in their learning processes and explanation patterns. Finally, we provide practical guidelines for effectively building B-cos LMs based on our findings. Our code is available at https://anonymous.4open.science/r/bcos_lm.

* 20 pages, 15 figures

Constrained C-Test Generation via Mixed-Integer Programming

Apr 12, 2024

Ji-Ung Lee, Marc E. Pfetsch, Iryna Gurevych

Abstract: This work proposes a novel method to generate C-Tests, a variant of cloze tests (gap-filling exercises) in which only the last part of a word is turned into a gap. In contrast to previous works that only consider varying the gap size or gap placement to achieve locally optimal solutions, we propose a mixed-integer programming (MIP) approach. This allows us to consider gap size and placement simultaneously, achieving globally optimal solutions, and to directly integrate state-of-the-art models for gap difficulty prediction into the optimization problem. A user study with 40 participants across four C-Test generation strategies (including GPT-4) shows that our approach (MIP) significantly outperforms two of the baseline strategies (based on gap placement and GPT-4) and performs on par with the third (based on gap size). Our analysis shows that GPT-4 still struggles to fulfill explicit constraints during generation and that MIP produces C-Tests that correlate best with the perceived difficulty. We publish our code, model, and collected data consisting of 32 English C-Tests with 20 gaps each (totaling 3,200 individual gap responses) under an open source license.

* GitHub: https://github.com/UKPLab/arxiv2024-constrained-ctest-generation

Surveying (Dis)Parities and Concerns of Compute Hungry NLP Research

Jun 29, 2023

Ji-Ung Lee, Haritz Puerto, Betty van Aken, Yuki Arase, Jessica Zosa Forde, Leon Derczynski, Andreas Rücklé, Iryna Gurevych, Roy Schwartz, Emma Strubell (+1 more)

Abstract: Many recent improvements in NLP stem from the development and use of large pre-trained language models (PLMs) with billions of parameters. Large model sizes make computational cost one of the main limiting factors for training and evaluating such models, and have raised severe concerns about the sustainability, reproducibility, and inclusiveness of research on PLMs. These concerns are often based on personal experiences and observations. However, there have not been any large-scale surveys that investigate them. In this work, we provide a first attempt to quantify these concerns regarding three topics, namely, environmental impact, equity, and impact on peer reviewing. By conducting a survey with 312 participants from the NLP community, we capture existing (dis)parities between and within different groups with respect to seniority, academia, and industry, and their impact on the peer reviewing process. For each topic, we provide an analysis and devise recommendations to mitigate the disparities found, some of which have already been successfully implemented. Finally, we discuss additional concerns raised by many participants in free-text responses.

Lessons Learned from a Citizen Science Project for Natural Language Processing

Apr 25, 2023

Jan-Christoph Klie, Ji-Ung Lee, Kevin Stowe, Gözde Gül Şahin, Nafise Sadat Moosavi, Luke Bates, Dominic Petrak, Richard Eckart de Castilho, Iryna Gurevych

Abstract: Many Natural Language Processing (NLP) systems use annotated corpora for training and evaluation. However, labeled data is often costly to obtain and scaling annotation projects is difficult, which is why annotation tasks are often outsourced to paid crowdworkers. Citizen Science is an alternative to crowdsourcing that is relatively unexplored in the context of NLP. To investigate whether and how well Citizen Science can be applied in this setting, we conduct an exploratory study into engaging different groups of volunteers in Citizen Science for NLP by re-annotating parts of a pre-existing crowdsourced dataset. Our results show that this can yield high-quality annotations and attract motivated volunteers, but also requires considering factors such as scalability, participation over time, and legal and ethical issues. We summarize lessons learned in the form of guidelines and provide our code and data to aid future work on Citizen Science.

* Accepted to EACL 2023. Code will be published on GitHub: https://github.com/UKPLab/eacl2023-citizen-science-lessons-learned

Transformers with Learnable Activation Functions

Sep 01, 2022

Haishuo Fang, Ji-Ung Lee, Nafise Sadat Moosavi, Iryna Gurevych

Abstract: Activation functions can have a significant impact on reducing the topological complexity of input data and therefore improve the performance of the model. Selecting a suitable activation function is an essential step in neural model design. However, the choice of activation function is seldom discussed or explored in Transformer-based language models. Their activation functions are chosen beforehand and then remain fixed from pre-training to fine-tuning. As a result, the inductive biases they impose on models cannot be adjusted during this long life cycle. Moreover, subsequently developed models (e.g., RoBERTa, BART, and GPT-3) often follow prior work (e.g., BERT) in using the same activation function without justification. In this paper, we investigate the effectiveness of using the Rational Activation Function (RAF), a learnable activation function, in the Transformer architecture. In contrast to conventional, predefined activation functions, RAFs can adaptively learn optimal activation functions during training according to input data. Our experiments show the RAF-based Transformer (RAFT) achieves a lower validation perplexity than a vanilla BERT with the GELU function. We further evaluate RAFT on downstream tasks in low- and full-data settings. Our results show that RAFT outperforms the counterpart model across the majority of tasks and settings. For instance, RAFT outperforms vanilla BERT on the GLUE benchmark by 5.71 points on average in the low-data scenario (where 100 training examples are available) and by 2.05 points on SQuAD in the full-data setting. Analysis of the shapes of learned RAFs further reveals that they vary substantially between different layers of the pre-trained model and mostly look very different from conventional activation functions. RAFT opens a new research direction for analyzing and interpreting pre-trained models according to the learned activation functions.

TexPrax: A Messaging Application for Ethical, Real-time Data Collection and Annotation

Aug 16, 2022

Lorenz Stangier, Ji-Ung Lee, Yuxi Wang, Marvin Müller, Nicholas Frick, Joachim Metternich, Iryna Gurevych

Abstract: Collecting and annotating task-oriented dialog data is difficult, especially for highly specific domains that require expert knowledge. At the same time, informal communication channels such as instant messengers are increasingly being used at work. As a result, a lot of work-relevant information is disseminated through those channels and needs to be post-processed manually by the employees. To alleviate this problem, we present TexPrax, a messaging system to collect and annotate problems, causes, and solutions that occur in work-related chats. TexPrax uses a chatbot to directly engage the employees to provide lightweight annotations on their conversation and ease their documentation work. To comply with data privacy and security regulations, we use end-to-end message encryption and give our users full control over their data, which has various advantages over conventional annotation tools. We evaluate TexPrax in a user study with German factory employees who ask their colleagues for solutions to problems that arise during their daily work. Overall, we collect 201 task-oriented German dialogues containing 1,027 sentences with sentence-level expert annotations. Our data analysis also reveals that real-world conversations frequently contain instances of code-switching, varying abbreviations for the same entity, and dialects, which NLP systems should be able to handle.

* Code and data: https://github.com/UKPLab/TexPrax

Annotation Curricula to Implicitly Train Non-Expert Annotators

Jun 09, 2021

Ji-Ung Lee, Jan-Christoph Klie, Iryna Gurevych

Abstract: Annotation studies often require annotators to familiarize themselves with the task, its annotation scheme, and the data domain. This can be overwhelming in the beginning, mentally taxing, and introduce errors into the resulting annotations, especially in citizen science or crowdsourcing scenarios where domain expertise is not required and only annotation guidelines are provided. To alleviate these issues, we propose annotation curricula, a novel approach to implicitly train annotators. Our goal is to gradually introduce annotators to the task by ordering instances that are annotated according to a learning curriculum. To do so, we first formalize annotation curricula for sentence- and paragraph-level annotation tasks, define an ordering strategy, and identify well-performing heuristics and interactively trained models on three existing English datasets. We then conduct a user study with 40 voluntary participants who are asked to identify the most fitting misconception for English tweets about the Covid-19 pandemic. Our results show that using a simple heuristic to order instances can already significantly reduce the total annotation time while preserving a high annotation quality. Annotation curricula thus can provide a novel way to improve data collection. To facilitate future research, we further share our code and data consisting of 2,400 annotations.

Investigating label suggestions for opinion mining in German Covid-19 social media

Jun 08, 2021

Tilman Beck, Ji-Ung Lee, Christina Viehmann, Marcus Maurer, Oliver Quiring, Iryna Gurevych

Abstract: This work investigates the use of interactively updated label suggestions to improve the efficiency of gathering annotations on the task of opinion mining in German Covid-19 social media data. We develop guidelines to conduct a controlled annotation study with social science students and find that suggestions from a model trained on a small, expert-annotated dataset already lead to a substantial improvement in terms of inter-annotator agreement (+.14 Fleiss' $\kappa$) and annotation quality compared to students who do not receive any label suggestions. We further find that label suggestions from interactively trained models do not lead to an improvement over suggestions from a static model. Nonetheless, our analysis of suggestion bias shows that annotators remain capable of reflecting upon the suggested label in general. Finally, we confirm the quality of the annotated data in transfer learning experiments between different annotator groups. To facilitate further research in opinion mining on social media data, we release our collected data consisting of 200 expert and 2,785 student annotations.

* To appear at ACL 2021

Empowering Active Learning to Jointly Optimize System and User Demands

May 12, 2020

Ji-Ung Lee, Christian M. Meyer, Iryna Gurevych

Abstract: Existing approaches to active learning maximize the system performance by sampling unlabeled instances for annotation that yield the most efficient training. However, when active learning is integrated with an end-user application, this can lead to frustration for participating users, as they spend time labeling instances that they would not otherwise be interested in reading. In this paper, we propose a new active learning approach that jointly optimizes the seemingly counteracting objectives of the active learning system (training efficiently) and the user (receiving useful instances). We study our approach in an educational application, which particularly benefits from this technique as the system needs to rapidly learn to predict the appropriateness of an exercise to a particular user, while the users should receive only exercises that match their skills. We evaluate multiple learning strategies and user types with data from real users and find that our joint approach better satisfies both objectives when alternative methods lead to many unsuitable exercises for end users.

* To appear as a long paper in Proceedings of the 58th Annual Meeting of the Association for Computational Linguistics (ACL 2020). Download our code and simulated user models at GitHub: https://github.com/UKPLab/acl2020-empowering-active-learning

Manipulating the Difficulty of C-Tests

Jul 02, 2019

Ji-Ung Lee, Erik Schwan, Christian M. Meyer

Abstract: We propose two novel manipulation strategies for increasing and decreasing the difficulty of C-tests automatically. This is a crucial step towards generating learner-adaptive exercises for self-directed language learning and preparing language assessment tests. To reach the desired difficulty level, we manipulate the size and the distribution of gaps based on absolute and relative gap difficulty predictions. We evaluate our approach in corpus-based experiments and in a user study with 60 participants. We find that both strategies are able to generate C-tests with the desired difficulty level.

* To appear as a long paper in Proceedings of the 57th Annual Meeting of the Association for Computational Linguistics (ACL 2019). Download our code and data from the user study at GitHub: https://github.com/UKPLab/acl2019-ctest-difficulty-manipulation
