Title: Visually-Prompted Language Model for Fine-Grained Scene Graph Generation in an Open World

Mar 23, 2023

Qifan Yu, Juncheng Li, Yu Wu, Siliang Tang, Wei Ji, Yueting Zhuang

Abstract: Scene Graph Generation (SGG) aims to extract <subject, predicate, object> relationships from images for visual understanding. Although recent works have made steady progress on SGG, they still suffer from long-tail distribution issues: tail predicates are more costly to train and harder to distinguish than frequent predicates because they have far less annotated data. Existing re-balancing strategies try to handle this via prior rules but remain confined to pre-defined conditions, which do not scale across models and datasets. In this paper, we propose a Cross-modal prediCate boosting (CaCao) framework, where a visually-prompted language model is learned to generate diverse fine-grained predicates in a low-resource way. The proposed CaCao can be applied in a plug-and-play fashion and automatically strengthens existing SGG models to tackle the long-tailed problem. Based on that, we further introduce a novel Entangled cross-modal prompt approach for open-world predicate scene graph generation (Epic), where models can generalize to unseen predicates in a zero-shot manner. Comprehensive experiments on three benchmark datasets show that CaCao consistently boosts the performance of multiple scene graph generation models in a model-agnostic way. Moreover, our Epic achieves competitive performance on open-world predicate prediction.

* 21 pages, 16 figures
