Get our free extension to see links to code for papers anywhere online!Free add-on: code for papers everywhere!Free add-on: See code for papers anywhere!

Add to Chrome

Add to Firefox

Add to Edge

Title:Vocabulary-free Image Classification and Semantic Segmentation

Apr 16, 2024

Alessandro Conti, Enrico Fini, Massimiliano Mancini, Paolo Rota, Yiming Wang, Elisa Ricci

Figure 1 for Vocabulary-free Image Classification and Semantic Segmentation

Figure 2 for Vocabulary-free Image Classification and Semantic Segmentation

Figure 3 for Vocabulary-free Image Classification and Semantic Segmentation

Figure 4 for Vocabulary-free Image Classification and Semantic Segmentation

Share this with someone who'll enjoy it:

Abstract:Large vision-language models revolutionized image classification and semantic segmentation paradigms. However, they typically assume a pre-defined set of categories, or vocabulary, at test time for composing textual prompts. This assumption is impractical in scenarios with unknown or evolving semantic context. Here, we address this issue and introduce the Vocabulary-free Image Classification (VIC) task, which aims to assign a class from an unconstrained language-induced semantic space to an input image without needing a known vocabulary. VIC is challenging due to the vastness of the semantic space, which contains millions of concepts, including fine-grained categories. To address VIC, we propose Category Search from External Databases (CaSED), a training-free method that leverages a pre-trained vision-language model and an external database. CaSED first extracts the set of candidate categories from the most semantically similar captions in the database and then assigns the image to the best-matching candidate category according to the same vision-language model. Furthermore, we demonstrate that CaSED can be applied locally to generate a coarse segmentation mask that classifies image regions, introducing the task of Vocabulary-free Semantic Segmentation. CaSED and its variants outperform other more complex vision-language models, on classification and semantic segmentation benchmarks, while using much fewer parameters.

* Under review, 22 pages, 10 figures, code is available at https://github.com/altndrr/vicss. arXiv admin note: text overlap with arXiv:2306.00917

View paper on

Share this with someone who'll enjoy it:

Title:Vocabulary-free Image Classification and Semantic Segmentation

Paper and Code