Get our free extension to see links to code for papers anywhere online!Free add-on: code for papers everywhere!Free add-on: See code for papers anywhere!

Add to Chrome

Add to Firefox

Add to Edge

Rasmus Hvingelby

Towards Reducing Data Acquisition and Labeling for Defect Detection using Simulated Data

Jun 27, 2024

Lukas Malte Kemeter, Rasmus Hvingelby, Paulina Sierak, Tobias Schön, Bishwajit Gosswam

Figure 1 for Towards Reducing Data Acquisition and Labeling for Defect Detection using Simulated Data

Figure 2 for Towards Reducing Data Acquisition and Labeling for Defect Detection using Simulated Data

Figure 3 for Towards Reducing Data Acquisition and Labeling for Defect Detection using Simulated Data

Figure 4 for Towards Reducing Data Acquisition and Labeling for Defect Detection using Simulated Data

Abstract:In many manufacturing settings, annotating data for machine learning and computer vision is costly, but synthetic data can be generated at significantly lower cost. Substituting the real-world data with synthetic data is therefore appealing for many machine learning applications that require large amounts of training data. However, relying solely on synthetic data is frequently inadequate for effectively training models that perform well on real-world data, primarily due to domain shifts between the synthetic and real-world data. We discuss approaches for dealing with such a domain shift when detecting defects in X-ray scans of aluminium wheels. Using both simulated and real-world X-ray images, we train an object detection model with different strategies to identify the training approach that generates the best detection results while minimising the demand for annotated real-world training samples. Our preliminary findings suggest that the sim-2-real domain adaptation approach is more cost-efficient than a fully supervised oracle - if the total number of available annotated samples is fixed. Given a certain number of labeled real-world samples, training on a mix of synthetic and unlabeled real-world data achieved comparable or even better detection results at significantly lower cost. We argue that future research into the cost-efficiency of different training strategies is important for a better understanding of how to allocate budget in applied machine learning projects.

Via

Access Paper or Ask Questions

How To Overcome Confirmation Bias in Semi-Supervised Image Classification By Active Learning

Aug 16, 2023

Sandra Gilhuber, Rasmus Hvingelby, Mang Ling Ada Fok, Thomas Seidl

Abstract:Do we need active learning? The rise of strong deep semi-supervised methods raises doubt about the usability of active learning in limited labeled data settings. This is caused by results showing that combining semi-supervised learning (SSL) methods with a random selection for labeling can outperform existing active learning (AL) techniques. However, these results are obtained from experiments on well-established benchmark datasets that can overestimate the external validity. However, the literature lacks sufficient research on the performance of active semi-supervised learning methods in realistic data scenarios, leaving a notable gap in our understanding. Therefore we present three data challenges common in real-world applications: between-class imbalance, within-class imbalance, and between-class similarity. These challenges can hurt SSL performance due to confirmation bias. We conduct experiments with SSL and AL on simulated data challenges and find that random sampling does not mitigate confirmation bias and, in some cases, leads to worse performance than supervised learning. In contrast, we demonstrate that AL can overcome confirmation bias in SSL in these realistic settings. Our results provide insights into the potential of combining active and semi-supervised learning in the presence of common real-world challenges, which is a promising direction for robust methods when learning with limited labeled data in real-world applications.

* Accepted @ ECML PKDD 2023. This is the author's version of the work. The definitive Version of Record will be published in the Proceedings of ECML PKDD 2023

Via

Access Paper or Ask Questions

Multimodal Deep Learning

Jan 12, 2023

Cem Akkus, Luyang Chu, Vladana Djakovic, Steffen Jauch-Walser, Philipp Koch, Giacomo Loss, Christopher Marquardt, Marco Moldovan, Nadja Sauter, Maximilian Schneider(+7 more)

Abstract:This book is the result of a seminar in which we reviewed multimodal approaches and attempted to create a solid overview of the field, starting with the current state-of-the-art approaches in the two subfields of Deep Learning individually. Further, modeling frameworks are discussed where one modality is transformed into the other, as well as models in which one modality is utilized to enhance representation learning for the other. To conclude the second part, architectures with a focus on handling both modalities simultaneously are introduced. Finally, we also cover other modalities as well as general-purpose multi-modal models, which are able to handle different tasks on different modalities within one unified architecture. One interesting application (Generative Art) eventually caps off this booklet.

Via

Access Paper or Ask Questions

Deep Semi-Supervised Learning for Time Series Classification

Feb 06, 2021

Jann Goschenhofer, Rasmus Hvingelby, David Rügamer, Janek Thomas, Moritz Wagner, Bernd Bischl

Figure 1 for Deep Semi-Supervised Learning for Time Series Classification

Figure 2 for Deep Semi-Supervised Learning for Time Series Classification

Figure 3 for Deep Semi-Supervised Learning for Time Series Classification

Abstract:While Semi-supervised learning has gained much attention in computer vision on image data, yet limited research exists on its applicability in the time series domain. In this work, we investigate the transferability of state-of-the-art deep semi-supervised models from image to time series classification. We discuss the necessary model adaptations, in particular an appropriate model backbone architecture and the use of tailored data augmentation strategies. Based on these adaptations, we explore the potential of deep semi-supervised learning in the context of time series classification by evaluating our methods on large public time series classification problems with varying amounts of labelled samples. We perform extensive comparisons under a decidedly realistic and appropriate evaluation scheme with a unified reimplementation of all algorithms considered, which is yet lacking in the field. We find that these transferred semi-supervised models show significant performance gains over strong supervised, semi-supervised and self-supervised alternatives, especially for scenarios with very few labelled samples.

Via

Access Paper or Ask Questions

Type B Reflexivization as an Unambiguous Testbed for Multilingual Multi-Task Gender Bias

Sep 28, 2020

Ana Valeria Gonzalez, Maria Barrett, Rasmus Hvingelby, Kellie Webster, Anders Søgaard

Figure 1 for Type B Reflexivization as an Unambiguous Testbed for Multilingual Multi-Task Gender Bias

Figure 2 for Type B Reflexivization as an Unambiguous Testbed for Multilingual Multi-Task Gender Bias

Figure 3 for Type B Reflexivization as an Unambiguous Testbed for Multilingual Multi-Task Gender Bias

Figure 4 for Type B Reflexivization as an Unambiguous Testbed for Multilingual Multi-Task Gender Bias

Abstract:The one-sided focus on English in previous studies of gender bias in NLP misses out on opportunities in other languages: English challenge datasets such as GAP and WinoGender highlight model preferences that are "hallucinatory", e.g., disambiguating gender-ambiguous occurrences of 'doctor' as male doctors. We show that for languages with type B reflexivization, e.g., Swedish and Russian, we can construct multi-task challenge datasets for detecting gender bias that lead to unambiguously wrong model predictions: In these languages, the direct translation of 'the doctor removed his mask' is not ambiguous between a coreferential reading and a disjoint reading. Instead, the coreferential reading requires a non-gendered pronoun, and the gendered, possessive pronouns are anti-reflexive. We present a multilingual, multi-task challenge dataset, which spans four languages and four NLP tasks and focuses only on this phenomenon. We find evidence for gender bias across all task-language combinations and correlate model bias with national labor market statistics.

* To appear in EMNLP 2020

Via

Access Paper or Ask Questions

The Danish Gigaword Project

May 08, 2020

Leon Strømberg-Derczynski, Rebekah Baglini, Morten H. Christiansen, Manuel R. Ciosici, Jacob Aarup Dalsgaard, Riccardo Fusaroli, Peter Juel Henrichsen, Rasmus Hvingelby, Andreas Kirkedal, Alex Speed Kjeldsen(+5 more)

Figure 1 for The Danish Gigaword Project

Abstract:Danish is a North Germanic/Scandinavian language spoken primarily in Denmark, a country with a tradition of technological and scientific innovation. However, from a technological perspective, the Danish language has received relatively little attention and, as a result, Danish language technology is hard to develop, in part due to a lack of large or broad-coverage Danish corpora. This paper describes the Danish Gigaword project, which aims to construct a freely-available one billion word corpus of Danish text that represents the breadth of the written language.

Via

Access Paper or Ask Questions