Title: Multi-modal Transformers Excel at Class-agnostic Object Detection

Nov 22, 2021

Muhammad Maaz, Hanoona Rasheed, Salman Khan, Fahad Shahbaz Khan, Rao Muhammad Anwer, Ming-Hsuan Yang

Abstract: What constitutes an object? This has been a long-standing question in computer vision. Towards this goal, numerous learning-free and learning-based approaches have been developed to score objectness. However, they generally do not scale well across new domains and for unseen objects. In this paper, we advocate that existing methods lack a top-down supervision signal governed by human-understandable semantics. To bridge this gap, we explore recent Multi-modal Vision Transformers (MViT) that have been trained with aligned image-text pairs. Our extensive experiments across various domains and novel objects show the state-of-the-art performance of MViTs to localize generic objects in images. Based on these findings, we develop an efficient and flexible MViT architecture using multi-scale feature processing and deformable self-attention that can adaptively generate proposals given a specific language query. We show the significance of MViT proposals in a diverse range of applications including open-world object detection, salient and camouflage object detection, supervised and self-supervised detection tasks. Further, MViTs offer enhanced interactability with intelligible text queries. Code: https://git.io/J1HPY.
