Abstract:In recent years, performance on existing anomaly detection benchmarks like MVTec AD and VisA has started to saturate in terms of segmentation AU-PRO, with state-of-the-art models often competing in the range of less than one percentage point. This lack of discriminatory power prevents a meaningful comparison of models and thus hinders progress of the field, especially when considering the inherent stochastic nature of machine learning results. We present MVTec AD 2, a collection of eight anomaly detection scenarios with more than 8000 high-resolution images. It comprises challenging and highly relevant industrial inspection use cases that have not been considered in previous datasets, including transparent and overlapping objects, dark-field and back light illumination, objects with high variance in the normal data, and extremely small defects. We provide comprehensive evaluations of state-of-the-art methods and show that their performance remains below 60% average AU-PRO. Additionally, our dataset provides test scenarios with lighting condition changes to assess the robustness of methods under real-world distribution shifts. We host a publicly accessible evaluation server that holds the pixel-precise ground truth of the test set (https://benchmark.mvtec.com/). All image data is available at https://www.mvtec.com/company/research/datasets/mvtec-ad-2.
Abstract:Zero-shot detection (ZSD), i.e., detection on classes not seen during training, is essential for real world detection use-cases, but remains a difficult task. Recent research attempts ZSD with detection models that output embeddings instead of direct class labels. To this aim, the output of the detection model must be aligned to a learned embedding space such as CLIP. However, this alignment is hindered by detection data sets which are expensive to produce compared to image classification annotations, and the resulting lack of category diversity in the training data. We address this challenge by leveraging the CLIP embedding space in combination with image labels from ImageNet. Our results show that image labels are able to better align the detector output to the embedding space and thus have a high potential for ZSD. Compared to only training on detection data, we see a significant gain by adding image label data of 3.3 mAP for the 65/15 split on COCO on the unseen classes, i.e., we more than double the gain of related work.
Abstract:The usability of Reinforcement Learning is restricted by the large computation times it requires. Curriculum Reinforcement Learning speeds up learning by defining a helpful order in which an agent encounters tasks, i.e. from simple to hard. Curricula based on Absolute Learning Progress (ALP) have proven successful in different environments, but waste computation on repeating already learned behaviour in new tasks. We solve this problem by introducing a new regularization method based on Self-Paced (Deep) Learning, called Self-Paced Absolute Learning Progress (SPALP). We evaluate our method in three different environments. Our method achieves performance comparable to original ALP in all cases, and reaches it quicker than ALP in two of them. We illustrate possibilities to further improve the efficiency and performance of SPALP.