Fredrik Olsson

Have LLMs Made Active Learning Obsolete? Surveying the NLP Community

Mar 12, 2025

Julia Romberg, Christopher Schröder, Julius Gonsior, Katrin Tomanek, Fredrik Olsson

Abstract: Supervised learning relies on annotated data, which is expensive to obtain. A longstanding strategy to reduce annotation costs is active learning, an iterative process in which a human annotates only data instances deemed informative by a model. Large language models (LLMs) have pushed the effectiveness of active learning, but have also improved methods such as few- or zero-shot learning and text synthesis, thereby introducing potential alternatives. This raises the question: has active learning become obsolete? To answer this fully, we must look beyond literature to practical experiences. We conduct an online survey in the NLP community to collect previously intangible insights on the perceived relevance of data annotation, particularly focusing on active learning, including best practices, obstacles and expected future developments. Our findings show that annotated data remains a key factor, and active learning continues to be relevant. While the majority of active learning users find it effective, a comparison with a community survey from over a decade ago reveals persistent challenges: setup complexity, estimation of cost reduction, and tooling. We publish an anonymized version of the collected dataset

Text Annotation Handbook: A Practical Guide for Machine Learning Projects

Oct 18, 2023

Felix Stollenwerk, Joey Öhman, Danila Petrelli, Emma Wallerö, Fredrik Olsson, Camilla Bengtsson, Andreas Horndahl, Gabriela Zarzar Gandler

Abstract: This handbook is a hands-on guide on how to approach text annotation tasks. It provides a gentle introduction to the topic, an overview of theoretical concepts as well as practical advice. The topics covered are mostly technical, but business, ethical and regulatory issues are also touched upon. The focus lies on readability and conciseness rather than completeness and scientific rigor. Experience with annotation and knowledge of machine learning are useful but not required. The document may serve as a primer or reference book for a wide range of professions such as team leaders, project managers, IT architects, software developers and machine learning engineers.

* 30 pages, white paper

We Need to Talk About Data: The Importance of Data Readiness in Natural Language Processing

Oct 11, 2021

Fredrik Olsson, Magnus Sahlgren

Abstract: In this paper, we identify the state of data as being an important reason for failure in applied Natural Language Processing (NLP) projects. We argue that there is a gap between academic research in NLP and its application to problems outside academia, and that this gap is rooted in poor mutual understanding between academic researchers and their non-academic peers who seek to apply research results to their operations. To foster transfer of research results from academia to non-academic settings, and the corresponding influx of requirements back to academia, we propose a method for improving the communication between researchers and external stakeholders regarding the accessibility, validity, and utility of data based on Data Readiness Levels (Lawrence, 2017). While still in its infancy, the method has been iterated on and applied in multiple innovation and research projects carried out with stakeholders in both the private and public sectors. Finally, we invite researchers and practitioners to share their experiences, thereby contributing to a body of work aimed at raising awareness of the importance of data readiness for NLP.

Exploring the Assessment List for Trustworthy AI in the Context of Advanced Driver-Assistance Systems

Mar 04, 2021

Markus Borg, Joshua Bronson, Linus Christensson, Fredrik Olsson, Olof Lennartsson, Elias Sonnsjö, Hamid Ebabi, Martin Karsberg

Abstract: Artificial Intelligence (AI) is increasingly used in critical applications. Thus, the need for dependable AI systems is rapidly growing. In 2018, the European Commission appointed experts to a High-Level Expert Group on AI (AI-HLEG). AI-HLEG defined Trustworthy AI as 1) lawful, 2) ethical, and 3) robust, and specified seven corresponding key requirements. To help development organizations, AI-HLEG recently published the Assessment List for Trustworthy AI (ALTAI). We present an illustrative case study from applying ALTAI to an ongoing development project of an Advanced Driver-Assistance System (ADAS) that relies on Machine Learning (ML). Our experience shows that ALTAI is largely applicable to ADAS development, but specific parts related to human agency and transparency can be disregarded. Moreover, bigger questions related to societal and environmental impact cannot be tackled by an ADAS supplier in isolation. We present how we plan to develop the ADAS to ensure ALTAI compliance. Finally, we provide three recommendations for the next revision of ALTAI, i.e., life-cycle variants, domain-specific adaptations, and removed redundancy.

* Accepted for publication in the Proc. of the 2nd Workshop on Ethics in Software Engineering Research and Practice

Data Readiness for Natural Language Processing

Sep 30, 2020

Fredrik Olsson, Magnus Sahlgren

Abstract: This document concerns data readiness in the context of machine learning and Natural Language Processing. It describes how an organization may proceed to identify, make available, validate, and prepare data to facilitate automated analysis methods. The contents of the document are based on the practical challenges and frequently asked questions we have encountered in our work as an applied research institute helping organizations and companies, in both the public and private sectors, use data in their business processes.

Joint axis estimation for fast and slow movements using weighted gyroscope and acceleration constraints

Mar 18, 2019

Fredrik Olsson, Thomas Seel, Dustin Lehmann, Kjartan Halvorsen

Abstract: Sensor-to-segment calibration is a crucial step in inertial motion tracking. When two segments are connected by a hinge joint, for example in human knee and finger joints as well as in many robotic limbs, the joint axis vector must be identified in the intrinsic sensor coordinate systems. There exist methods that identify these coordinates by solving an optimization problem that is based on kinematic joint constraints, which involve either the measured accelerations or the measured angular rates. In the current paper we demonstrate that using only one of these constraints leads to inaccurate estimates at either fast or slow motions. We propose a novel method based on a cost function that combines both constraints. The restrictive assumption of a homogeneous magnetic field is avoided by using only accelerometer and gyroscope readings. To combine the advantages of both sensor types, the residual weights are adjusted automatically based on the estimated signal variances and a nonlinear weighting of the acceleration norm difference. The method is evaluated using real data from nine different motions of an upper limb exoskeleton. Results show that, unlike previous approaches, the proposed method yields accurate joint axis estimation after only five seconds for all fast and slow motions.

* 8 pages, 4 figures, 1 table
