Ulyana Tkachenko

ObjectLab: Automated Diagnosis of Mislabeled Images in Object Detection Data

Sep 02, 2023

Ulyana Tkachenko, Aditya Thyagarajan, Jonas Mueller

Abstract: Despite powering sensitive systems like autonomous vehicles, object detection remains fairly brittle, in part due to annotation errors that plague most real-world training datasets. We propose ObjectLab, a straightforward algorithm to detect diverse errors in object detection labels, including overlooked bounding boxes, badly located boxes, and incorrect class label assignments. ObjectLab utilizes any trained object detection model to score the label quality of each image, so that mislabeled images can be automatically prioritized for label review and correction. Properly handling erroneous data enables training a better version of the same object detection model without any change to existing modeling code. Across different object detection datasets (including COCO) and different models (including Detectron-X101 and Faster-RCNN), ObjectLab consistently detects annotation errors with much better precision/recall than other label quality scores.

* ICML Workshop on Data-centric Machine Learning Research
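
To make the idea of a per-image label quality score concrete, here is a minimal Python sketch covering only one of the error types named in the abstract, overlooked bounding boxes. It is not ObjectLab's actual scoring rule, which the abstract does not spell out. The function names (`iou`, `overlooked_score`), the IoU threshold of 0.5, and the "1 minus highest unmatched confidence" formula are all assumptions made for illustration.

```python
from typing import List, Tuple

Box = Tuple[float, float, float, float]  # (x1, y1, x2, y2)


def iou(a: Box, b: Box) -> float:
    """Intersection over union of two axis-aligned boxes."""
    ix1, iy1 = max(a[0], b[0]), max(a[1], b[1])
    ix2, iy2 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0.0, ix2 - ix1) * max(0.0, iy2 - iy1)
    area_a = (a[2] - a[0]) * (a[3] - a[1])
    area_b = (b[2] - b[0]) * (b[3] - b[1])
    union = area_a + area_b - inter
    return inter / union if union > 0 else 0.0


def overlooked_score(
    annotated: List[Tuple[Box, str]],
    predictions: List[Tuple[Box, str, float]],
    iou_threshold: float = 0.5,  # assumed value, not taken from the paper
) -> float:
    """Hypothetical score in [0, 1]; lower means an annotation is more likely missing.

    A confident prediction with no same-class annotated box overlapping it
    suggests the annotators overlooked an object.
    """
    worst = 0.0
    for box, label, confidence in predictions:
        matched = any(
            ann_label == label and iou(box, ann_box) >= iou_threshold
            for ann_box, ann_label in annotated
        )
        if not matched:
            worst = max(worst, confidence)
    return 1.0 - worst


# Toy data: image "a" has a confident "person" prediction with no annotation.
images = {
    "a": (
        [((0, 0, 10, 10), "car")],
        [((0, 0, 10, 10), "car", 0.9), ((20, 20, 30, 30), "person", 0.8)],
    ),
    "b": (
        [((0, 0, 10, 10), "car")],
        [((1, 1, 10, 10), "car", 0.95)],
    ),
}
scores = {name: overlooked_score(ann, pred) for name, (ann, pred) in images.items()}
review_order = sorted(scores, key=scores.get)  # lowest score is reviewed first
```

Sorting images by ascending score mirrors the prioritization described in the abstract: the images most likely to contain an annotation error reach a reviewer first.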

Utilizing supervised models to infer consensus labels and their quality from data with multiple annotators

Oct 13, 2022

Hui Wen Goh, Ulyana Tkachenko, Jonas Mueller

Abstract: Real-world data for classification is often labeled by multiple annotators. For analyzing such data, we introduce CROWDLAB, a straightforward approach to estimate: (1) a consensus label for each example that aggregates the individual annotations (more accurately than aggregation via majority vote or other algorithms used in crowdsourcing); (2) a confidence score for how likely each consensus label is to be correct (via well-calibrated estimates that account for the number of annotations for each example and their agreement, the prediction confidence of a trained classifier, and the trustworthiness of each annotator relative to the classifier); (3) a rating for each annotator quantifying the overall correctness of their labels. While many algorithms have been proposed to estimate related quantities in crowdsourcing, these often rely on sophisticated generative models with iterative inference schemes, whereas CROWDLAB is based on simple weighted ensembling. Many algorithms also rely solely on annotator statistics, ignoring the features of the examples from which the annotations derive. CROWDLAB, in contrast, utilizes any classifier model trained on these features, which can generalize between examples with similar features. In evaluations on real-world multi-annotator image data, our proposed method provides better estimates of (1)-(3) than many alternative algorithms.
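
The weighted-ensembling idea can be sketched in a few lines of Python. This is an illustrative simplification, not the CROWDLAB estimator itself: the function name `consensus_weighted_ensemble`, the use of `-1` for a missing annotation, and the weights passed in are all assumptions. In particular, the abstract does not say how annotator and classifier weights are estimated, so here they are simply given as inputs.

```python
import numpy as np


def consensus_weighted_ensemble(annotations, pred_probs, annotator_weights, classifier_weight):
    """Combine annotator votes and classifier probabilities by weighted averaging.

    annotations: int array (n_examples, n_annotators), -1 where an annotator gave no label.
    pred_probs: float array (n_examples, n_classes) from any trained classifier.
    annotator_weights, classifier_weight: hypothetical trust weights (assumed given).
    Returns (consensus_labels, consensus_confidences).
    """
    annotations = np.asarray(annotations)
    n_examples, n_annotators = annotations.shape
    # Start from the classifier's weighted class probabilities.
    scores = classifier_weight * np.asarray(pred_probs, dtype=float)
    total_weight = np.full(n_examples, float(classifier_weight))
    for j in range(n_annotators):
        rows = np.flatnonzero(annotations[:, j] >= 0)  # examples annotator j labeled
        # Each annotation acts as a one-hot vote scaled by the annotator's weight.
        scores[rows, annotations[rows, j]] += annotator_weights[j]
        total_weight[rows] += annotator_weights[j]
    probs = scores / total_weight[:, None]  # normalize so each row sums to 1
    return probs.argmax(axis=1), probs.max(axis=1)


annotations = np.array([[0, 0, 1],     # three annotators, two agree on class 0
                        [1, -1, -1]])  # a single annotation of class 1
pred_probs = np.array([[0.6, 0.4],
                       [0.9, 0.1]])
labels, confidences = consensus_weighted_ensemble(
    annotations, pred_probs, annotator_weights=[1.0, 1.0, 1.0], classifier_weight=2.0
)
```

In the second toy example, a confident classifier outweighs a lone annotator, which is the behavior that majority vote and other methods relying only on annotator statistics cannot reproduce.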
