Rainer Kiko

Is one annotation enough? A data-centric image classification benchmark for noisy and ambiguous label estimation

Jul 13, 2022

Lars Schmarje, Vasco Grossmann, Claudius Zelenka, Sabine Dippel, Rainer Kiko, Mariusz Oszust, Matti Pastell, Jenny Stracke, Anna Valros, Nina Volkmann (+1 more)

Abstract: High-quality data is necessary for modern machine learning. However, such data is difficult to acquire because human annotations are noisy and ambiguous. Aggregating these annotations to determine the label of an image leads to lower data quality. We propose a data-centric image classification benchmark with nine real-world datasets and multiple annotations per image to investigate and quantify the impact of such data quality issues. We take a data-centric perspective by asking how data quality could be improved. Across thousands of experiments, we show that multiple annotations allow a better approximation of the real underlying class distribution. We find that hard labels cannot capture the ambiguity of the data, which might lead to the common issue of overconfident models. Based on the presented datasets, benchmark baselines, and analysis, we open up multiple research opportunities for the future.

* Data, supplementary material, and source code will be released soon
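
The difference between a hard label and a distribution estimated from multiple annotations can be sketched in a few lines of Python. This is a minimal illustration under our own assumptions, not code from the benchmark: the class names and votes are hypothetical, and the soft label is simply the empirical distribution of annotator votes.

```python
from collections import Counter


def soft_label(annotations, classes):
    """Empirical class distribution from several annotations of one image."""
    counts = Counter(annotations)
    return [counts[c] / len(annotations) for c in classes]


def hard_label(annotations, classes):
    """Majority vote as a one-hot vector (ties go to the first-seen class)."""
    winner = Counter(annotations).most_common(1)[0][0]
    return [1.0 if c == winner else 0.0 for c in classes]


# Hypothetical example: five annotators label the same image.
classes = ["cat", "dog"]
votes = ["cat", "dog", "cat", "cat", "dog"]

soft = soft_label(votes, classes)  # [0.6, 0.4]: the disagreement is kept
hard = hard_label(votes, classes)  # [1.0, 0.0]: the disagreement is discarded

# Total variation distance: probability mass the hard label puts in the wrong place.
tv = 0.5 * sum(abs(s - h) for s, h in zip(soft, hard))
```

Training on `soft` rather than `hard` keeps the 40% of votes for "dog" visible to the model, which is the kind of ambiguity a single hard label cannot represent.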

Fuzzy Overclustering: Semi-Supervised Classification of Fuzzy Labels with Overclustering and Inverse Cross-Entropy

Oct 13, 2021

Lars Schmarje, Johannes Brünger, Monty Santarossa, Simon-Martin Schröder, Rainer Kiko, Reinhard Koch

Abstract: Deep learning has been successfully applied to many classification problems, including underwater challenges. However, a long-standing issue with deep learning is the need for large and consistently labeled datasets. Although current approaches in semi-supervised learning can reduce the required amount of annotated data by a factor of 10 or even more, this line of research still assumes distinct classes. For underwater classification, and for uncurated real-world datasets in general, clean class boundaries often cannot be drawn because of the limited information content of the images and the transitional stages of the depicted objects. As a result, different experts hold different opinions and produce fuzzy labels, which could also be considered ambiguous or divergent. We propose a novel framework for the semi-supervised classification of such fuzzy labels. It is based on the idea of overclustering to detect substructures in these fuzzy labels. We propose a novel loss that improves the overclustering capability of our framework and show the benefit of overclustering for fuzzy labels. We show that our framework outperforms previous state-of-the-art semi-supervised methods when applied to real-world plankton data with fuzzy labels. Moreover, we obtain 5 to 10% more consistent predictions of substructures.

* Sensors 2021, 21(19), 6661
* Source code: https://github.com/Emprime/FuzzyOverclustering Datasets: https://doi.org/10.5281/zenodo.5550918. arXiv admin note: substantial text overlap with arXiv:2012.01768
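
Overclustering means asking a model for more clusters than there are annotated classes, so that one class can split into several substructures. The sketch below shows only a common evaluation step: mapping each overcluster to the majority class of its labeled members and reporting how pure that cluster is. It is not the authors' framework or their proposed loss; the function name, cluster assignments, and labels are hypothetical.

```python
from collections import Counter, defaultdict


def map_overclusters(cluster_ids, labels):
    """Map each overcluster to (majority class, purity) using its labeled members."""
    members = defaultdict(list)
    for cid, label in zip(cluster_ids, labels):
        members[cid].append(label)
    mapping = {}
    for cid, member_labels in members.items():
        cls, count = Counter(member_labels).most_common(1)[0]
        mapping[cid] = (cls, count / len(member_labels))
    return mapping


# Hypothetical: 2 annotated classes, but the model was asked for 4 overclusters.
cluster_ids = [0, 0, 1, 1, 1, 2, 2, 3, 3]
labels = ["a", "a", "a", "b", "b", "b", "b", "b", "b"]

mapping = map_overclusters(cluster_ids, labels)
# Cluster 1 mixes both classes (purity 2/3): a candidate fuzzy substructure.
```

Here class "b" is spread over clusters 1, 2, and 3, which illustrates how overclustering can expose substructures inside a single annotated class, while a low purity flags clusters where annotations disagree.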

S2C2 - An orthogonal method for Semi-Supervised Learning on fuzzy labels

Jun 30, 2021

Lars Schmarje, Monty Santarossa, Simon-Martin Schröder, Claudius Zelenka, Rainer Kiko, Jenny Stracke, Nina Volkmann, Reinhard Koch

Abstract: Semi-Supervised Learning (SSL) can reduce the amount of labeled image data required and thus the cost of deep learning. Most SSL methods assume a clear distinction between classes, but in many real-world datasets this clear distinction does not exist due to intra- or interobserver variability. This variability can lead to different annotations for the same image. Thus, many images have ambiguous annotations, and their labels need to be considered "fuzzy". This fuzziness must be addressed because it limits the performance of SSL and of deep learning in general. We propose Semi-Supervised Classification & Clustering (S2C2), which can extend many deep SSL algorithms. S2C2 can estimate the fuzziness of a label; it applies SSL as a classification to certainly labeled data while creating distinct clusters for images with similar but fuzzy labels. We show that S2C2 yields a median 7.4% better F1-score for classification and a 5.4% lower inner distance of clusters across multiple SSL algorithms and datasets, while being more interpretable due to its fuzziness estimation. Overall, combining SSL with S2C2 leads to better handling of fuzzy labels and thus of real-world datasets.
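
The split described above, classification for certainly labeled images and clustering for fuzzy ones, can be illustrated with a toy routing function. The fuzziness score used here (one minus the largest class probability) and the threshold of 0.3 are hypothetical choices for illustration only; the abstract does not state how S2C2 estimates fuzziness.

```python
def fuzziness(dist):
    """Hypothetical fuzziness score: 0 for a one-hot distribution, larger when mass is spread."""
    return 1.0 - max(dist)


def route_samples(dists, threshold=0.3):
    """Return indices to treat as classification targets vs. clustering targets."""
    certain, fuzzy = [], []
    for i, dist in enumerate(dists):
        if fuzziness(dist) > threshold:
            fuzzy.append(i)    # ambiguous: group with similar images instead of forcing a class
        else:
            certain.append(i)  # clear: train as ordinary (semi-)supervised classification
    return certain, fuzzy


# Hypothetical class distributions (e.g. from annotator votes or model predictions).
dists = [[1.0, 0.0], [0.9, 0.1], [0.5, 0.5], [0.6, 0.4]]
certain, fuzzy = route_samples(dists)  # certain = [0, 1], fuzzy = [2, 3]
```

A per-image score like this also makes the routing decision easy to inspect, since each image carries an explicit estimate of how fuzzy its label is.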

Beyond Cats and Dogs: Semi-supervised Classification of fuzzy labels with overclustering

Dec 03, 2020

Lars Schmarje, Johannes Brünger, Monty Santarossa, Simon-Martin Schröder, Rainer Kiko, Reinhard Koch

Abstract: A long-standing issue with deep learning is the need for large and consistently labeled datasets. Although current research in semi-supervised learning can reduce the required amount of annotated data by a factor of 10 or even more, this line of research still uses distinct classes like cats and dogs. However, in the real world we often encounter problems where different experts have different opinions, thus producing fuzzy labels. We propose a novel framework for the semi-supervised classification of such fuzzy labels. Our framework is based on the idea of overclustering to detect substructures in these fuzzy labels. We propose a novel loss that improves the overclustering capability of our framework and show on the common image classification dataset STL-10 that it is faster and achieves better overclustering performance than previous work. On a real-world plankton dataset, we illustrate the benefit of overclustering for fuzzy labels and show that we outperform previous state-of-the-art semi-supervised methods. Moreover, we obtain 5 to 10% more consistent predictions of substructures.
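
The abstract reports more consistent predictions of substructures but does not define the measure here. One plausible way to quantify such consistency, shown purely as an assumption, is the fraction of images whose most likely overcluster is the same for two augmented views of each image; the function names and probability values below are hypothetical.

```python
def argmax(probs):
    """Index of the largest probability (first one on ties)."""
    return max(range(len(probs)), key=probs.__getitem__)


def overcluster_consistency(view_a, view_b):
    """Fraction of images assigned to the same overcluster in both views."""
    if len(view_a) != len(view_b) or not view_a:
        raise ValueError("both views must contain the same, non-zero number of images")
    agree = sum(argmax(p) == argmax(q) for p, q in zip(view_a, view_b))
    return agree / len(view_a)


# Hypothetical probabilities over 4 overclusters for 4 images, two augmented views each.
view_a = [[0.7, 0.1, 0.1, 0.1], [0.1, 0.1, 0.7, 0.1], [0.2, 0.5, 0.2, 0.1], [0.1, 0.1, 0.1, 0.7]]
view_b = [[0.6, 0.2, 0.1, 0.1], [0.1, 0.2, 0.6, 0.1], [0.2, 0.3, 0.1, 0.4], [0.1, 0.1, 0.2, 0.6]]

score = overcluster_consistency(view_a, view_b)  # 0.75: the third image switches cluster 1 -> 3
```

Because overclusters are finer than classes, an image can keep its class but move between substructures; a measure at the overcluster level captures that instability, whereas class accuracy would not.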

MorphoCluster: Efficient Annotation of Plankton images by Clustering

May 04, 2020

Simon-Martin Schröder, Rainer Kiko, Reinhard Koch

Abstract: In this work, we present MorphoCluster, a software tool for data-driven, fast, and accurate annotation of large image data sets. The volume of marine image data has already surpassed the annotation rate of human experts, and its volume and complexity will continue to increase in the coming years. Still, this data requires interpretation. MorphoCluster augments the human ability to discover patterns and perform object classification in large amounts of data by embedding unsupervised clustering in an interactive process. By aggregating similar images into clusters, our novel approach to image annotation increases consistency, multiplies the throughput of an annotator, and allows experts to adapt the granularity of their sorting scheme to the structure in the data. We sorted a set of 1.2M objects into 280 data-driven classes in 71 hours (16k objects per hour), with 90% of these classes having a precision of 0.889 or higher. This shows that MorphoCluster is at once fast, accurate, and consistent, provides a fine-grained and data-driven classification, and enables novelty detection. MorphoCluster is available as open-source software at https://github.com/morphocluster.

* 27 pages, 11 figures. Submitted to MDPI Sensors
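
The reported throughput follows directly from the numbers in the abstract, and the reason cluster-based annotation multiplies it can be shown with a toy example in which a single expert decision names a whole cluster. The helper functions and the plankton class names are hypothetical; this does not use MorphoCluster's code or API.

```python
def throughput(n_objects, hours):
    """Objects annotated per hour."""
    return n_objects / hours


def label_by_cluster(cluster_ids, cluster_names):
    """Give every object the name the expert assigned to its cluster."""
    return [cluster_names[cid] for cid in cluster_ids]


# Numbers from the abstract: 1.2M objects in 71 hours.
rate = throughput(1_200_000, 71)  # about 16,901 objects per hour, i.e. the quoted "16k"

# Hypothetical toy data: 6 objects in 3 clusters need only 3 naming decisions.
cluster_ids = [0, 0, 0, 1, 1, 2]
cluster_names = {0: "copepod", 1: "diatom", 2: "detritus"}
labels = label_by_cluster(cluster_ids, cluster_names)
decisions = len(cluster_names)
speedup = len(labels) / decisions  # 2.0 objects labeled per decision
```

In practice the gain depends on cluster sizes and on how much time the expert spends validating or splitting clusters; the toy example only shows why one decision can cover many objects.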
