Title:Zero-Shot In-Distribution Detection in Multi-Object Settings Using Vision-Language Foundation Models

Apr 10, 2023

Atsuyuki Miyai, Qing Yu, Go Irie, Kiyoharu Aizawa

Figures 1 to 4 (images not reproduced)

Abstract: Removing out-of-distribution (OOD) images from noisy images scraped from the Internet is an important preprocessing step for constructing datasets, and it can be addressed by zero-shot OOD detection with vision-language foundation models such as CLIP. The existing zero-shot OOD detection setting does not consider the realistic case where an image contains both in-distribution (ID) objects and OOD objects. However, it is important to identify such images as ID images when collecting images of rare classes or ethically inappropriate classes that must not be missed. In this paper, we propose a novel problem setting called in-distribution (ID) detection, in which images containing ID objects are identified as ID images even if they also contain OOD objects, and images lacking ID objects are identified as OOD images. To solve this problem, we present a new approach, Global-Local Maximum Concept Matching (GL-MCM), based on both global and local visual-text alignments of CLIP features, which can identify any image containing ID objects as an ID image. Extensive experiments demonstrate that GL-MCM outperforms comparison methods on both multi-object datasets and single-object ImageNet benchmarks.
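The abstract does not give GL-MCM's exact formula, so the following is only a rough sketch of how a global-plus-local score could be computed. It assumes a maximum-concept-matching style score: a softmax over cosine similarities between an image (or image region) embedding and the ID class text embeddings, keeping the largest class probability. It further assumes that the global score and the best local (per-region) score are summed with a hypothetical weight `lam`, and that the CLIP embeddings are already computed. The function names, `temperature`, `lam`, and `threshold` are illustrative assumptions, not the paper's definitions.

```python
import math
from typing import Sequence

Vector = Sequence[float]


def cosine(a: Vector, b: Vector) -> float:
    """Cosine similarity between two non-zero vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


def mcm_score(feature: Vector, text_embeds: Sequence[Vector], temperature: float = 1.0) -> float:
    """Max softmax probability over the ID class text embeddings (assumed MCM-style score)."""
    logits = [cosine(feature, t) / temperature for t in text_embeds]
    m = max(logits)
    exps = [math.exp(logit - m) for logit in logits]  # subtract max for numerical stability
    return max(exps) / sum(exps)


def gl_mcm_score(
    global_feat: Vector,
    local_feats: Sequence[Vector],
    text_embeds: Sequence[Vector],
    temperature: float = 1.0,
    lam: float = 1.0,  # hypothetical weight on the local term
) -> float:
    """Global score plus lam times the best score over local (region) features."""
    global_part = mcm_score(global_feat, text_embeds, temperature)
    local_part = max(mcm_score(f, text_embeds, temperature) for f in local_feats)
    return global_part + lam * local_part


def is_id_image(score: float, threshold: float) -> bool:
    """Hypothetical decision rule: a high score means the image contains an ID object."""
    return score >= threshold


# Toy usage: two ID classes; the global feature matches neither class,
# but one local region matches the first class.
text_embeds = [[1.0, 0.0, 0.0], [0.0, 1.0, 0.0]]
global_feat = [0.0, 0.0, 1.0]
score_ood_only = gl_mcm_score(global_feat, [[0.0, 0.0, 1.0]], text_embeds)
score_with_id_region = gl_mcm_score(global_feat, [[0.0, 0.0, 1.0], [1.0, 0.0, 0.1]], text_embeds)
```

In this toy case the local term raises the score when one region matches an ID class, even though the global feature does not. That is the multi-object behavior the abstract motivates, where an image that also contains OOD content should still count as ID.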
