Title: Real-time Transformer-based Open-Vocabulary Detection with Efficient Fusion Head

Mar 11, 2024

Tiancheng Zhao, Peng Liu, Xuan He, Lu Zhang, Kyusong Lee

Abstract: End-to-end transformer-based detectors (DETRs) have shown exceptional performance in both closed-set and open-vocabulary object detection (OVD) tasks through the integration of language modalities. However, their demanding computational requirements have hindered their practical application in real-time object detection (OD) scenarios. In this paper, we scrutinize the limitations of two leading models in the OVDEval benchmark, OmDet and Grounding-DINO, and introduce OmDet-Turbo. This novel transformer-based real-time OVD model features an innovative Efficient Fusion Head (EFH) module designed to alleviate the bottlenecks observed in OmDet and Grounding-DINO. OmDet-Turbo-Base achieves 100.2 frames per second (FPS) with TensorRT and language cache techniques applied. In zero-shot scenarios on the COCO and LVIS datasets, OmDet-Turbo achieves performance levels nearly on par with current state-of-the-art supervised models. Furthermore, it establishes new state-of-the-art results on ODinW and OVDEval, with an AP of 30.1 and an NMS-AP of 26.86, respectively. The practicality of OmDet-Turbo in industrial applications is underscored by its exceptional performance on benchmark datasets and superior inference speed, positioning it as a compelling choice for real-time object detection tasks. Code: https://github.com/om-ai-lab/OmDet
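
The abstract names a "language cache" without describing it. One plausible reading, offered here only as an assumption, is that text embeddings for class names are memoized so the text encoder runs once per distinct prompt rather than once per frame. The sketch below illustrates that idea in plain Python. `LanguageCache` and `toy_encoder` are hypothetical names, and the encoder is a cheap stand-in for a real text model, not OmDet-Turbo's implementation.

```python
from typing import Callable, Dict, List, Tuple

Embedding = Tuple[float, ...]


class LanguageCache:
    """Memoize text embeddings so each distinct prompt is encoded once."""

    def __init__(self, encode: Callable[[str], Embedding]) -> None:
        self._encode = encode
        self._store: Dict[str, Embedding] = {}
        self.misses = 0  # number of times the encoder actually ran

    def get(self, prompt: str) -> Embedding:
        # Assumption: prompts differing only in case or outer whitespace
        # may share one embedding.
        key = prompt.strip().lower()
        if key not in self._store:
            self.misses += 1
            self._store[key] = self._encode(key)
        return self._store[key]

    def get_many(self, prompts: List[str]) -> List[Embedding]:
        return [self.get(p) for p in prompts]


def toy_encoder(text: str) -> Embedding:
    # Hypothetical stand-in for an expensive text encoder.
    return (float(len(text)), float(sum(map(ord, text)) % 97))


cache = LanguageCache(toy_encoder)
for _ in range(3):  # three "frames" queried with the same vocabulary
    embeddings = cache.get_many(["person", "Dog", "traffic light"])
```

With a fixed vocabulary, the encoder runs only on the first frame. Every later frame reuses the stored embeddings, so under this reading the text branch adds almost no per-frame cost.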

* Preprint
