Title: Improving Scene Graph Generation with Relation Words' Debiasing in Vision-Language Models

Mar 24, 2024

Yuxuan Wang, Xiaoyuan Liu



Abstract: Scene Graph Generation (SGG) provides a basic language representation of visual scenes, requiring models to grasp complex and diverse semantics between objects. However, this complexity and diversity also lead to underrepresentation: some test triplets are rare or even unseen during training, resulting in imprecise predictions. To tackle this, we propose combining SGG models with pretrained vision-language models (VLMs) to enhance representation. However, due to the gap between pretraining and SGG, directly ensembling the pretrained VLMs leads to severe biases across relation words. Thus, we introduce LM Estimation to approximate the word distribution underlying the pretraining language sets, and then use that distribution for debiasing. After that, we ensemble the VLMs with SGG models to enhance representation. Considering that each model may represent some samples better than others, we use a certainty-aware indicator to score each sample and dynamically adjust the ensemble weights. Our method effectively addresses the word biases, enhances SGG's representation, and achieves remarkable performance improvements. It is training-free and integrates well with existing SGG models.
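
The abstract does not give the formulas behind LM Estimation or the certainty-aware indicator, so the following is only a minimal sketch of the general idea under stated assumptions, not the authors' method. It assumes that debiasing can be approximated by subtracting an estimated log-prior over relation words from the VLM's log-probabilities, and it uses the maximum class probability as a stand-in certainty score. All function names, the three relation words, and the numbers are hypothetical.

```python
import math

def softmax(logits):
    """Numerically stable softmax over a list of floats."""
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]

def debias(vlm_probs, prior_probs):
    """Assumed debiasing step: subtract the estimated log-prior of each
    relation word from the VLM log-probability, then renormalize."""
    adjusted = [math.log(p) - math.log(q) for p, q in zip(vlm_probs, prior_probs)]
    return softmax(adjusted)

def certainty(probs):
    """Stand-in certainty score (an assumption): the top class probability."""
    return max(probs)

def ensemble(sgg_probs, vlm_probs):
    """Mix the two distributions with a per-sample weight that grows with
    the VLM's certainty relative to the SGG model's certainty."""
    c_sgg, c_vlm = certainty(sgg_probs), certainty(vlm_probs)
    w = c_vlm / (c_sgg + c_vlm)
    mixed = [(1 - w) * p + w * q for p, q in zip(sgg_probs, vlm_probs)]
    return mixed, w

# Hypothetical relation words and illustrative numbers only.
relations = ["on", "standing on", "riding"]
vlm_raw = [0.70, 0.20, 0.10]    # VLM output, skewed toward the frequent word "on"
prior = [0.80, 0.15, 0.05]      # assumed estimate of the pretraining word prior
sgg = [0.50, 0.30, 0.20]        # output of an existing SGG model

vlm_debiased = debias(vlm_raw, prior)
mixed, weight = ensemble(sgg, vlm_debiased)
for name, p in zip(relations, mixed):
    print(f"{name}: {p:.3f}")
```

In this toy setup the prior correction moves probability mass away from the frequent word and toward the rarer ones before the two models are combined, and no training step is involved, which matches the training-free framing of the abstract.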
