Title: Improving Cross-Modal Understanding in Visual Dialog via Contrastive Learning

Apr 15, 2022

Feilong Chen, Xiuyi Chen, Shuang Xu, Bo Xu

Abstract: Visual Dialog is a challenging vision-language task because the visual dialog agent must answer a series of questions by reasoning over both the image content and the dialog history. Although existing methods try to address cross-modal understanding in visual dialog, they still fall short when ranking candidate answers based on their understanding of the visual and textual contexts. In this paper, we analyze cross-modal understanding in visual dialog based on the vision-language pre-training model VD-BERT and propose a novel approach to improve it, named ICMU. ICMU enhances cross-modal understanding by distinguishing different pulled inputs (i.e., pulled images, questions, or answers) based on four-way contrastive learning. In addition, ICMU exploits single-turn visual question answering to strengthen the visual dialog model's cross-modal understanding for handling multi-turn, visually grounded conversations. Experiments show that the proposed approach improves the visual dialog model's cross-modal understanding and brings satisfactory gains on the VisDial dataset.
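The abstract names four-way contrastive learning without spelling out its mechanics. What follows is a minimal sketch of one plausible reading, not ICMU's published implementation. It assumes that each training example yields four candidates: the matched image/question/answer triple, plus three copies in which the image, the question, or the answer has been replaced by the corresponding field of another sample in the batch. It further assumes that the model's four scores are trained with softmax cross-entropy toward the matched candidate. The function names `build_four_way_candidates` and `four_way_contrastive_loss`, the swap-within-batch construction, and the omission of dialog history are all hypothetical choices made for illustration.

```python
import math
import random
from typing import List, Sequence, Tuple

# Hypothetical sample layout: (image_id, question, answer).
# Dialog history is omitted here for brevity (an assumption, not the paper's setup).
Sample = Tuple[str, str, str]


def build_four_way_candidates(
    batch: Sequence[Sample], idx: int, rng: random.Random
) -> List[Sample]:
    """Return [matched, image-replaced, question-replaced, answer-replaced] for batch[idx].

    Assumption: each "pulled" input is made by swapping one field with the
    same field of a different sample drawn from the batch.
    """
    if len(batch) < 2:
        raise ValueError("need at least two samples to draw replacements from")
    image, question, answer = batch[idx]
    others = [j for j in range(len(batch)) if j != idx]
    img_j, q_j, a_j = (rng.choice(others) for _ in range(3))
    return [
        (image, question, answer),            # class 0: matched triple
        (batch[img_j][0], question, answer),  # class 1: image replaced
        (image, batch[q_j][1], answer),       # class 2: question replaced
        (image, question, batch[a_j][2]),     # class 3: answer replaced
    ]


def four_way_contrastive_loss(scores: Sequence[float], target: int = 0) -> float:
    """Softmax cross-entropy over four candidate scores; `target` marks the matched input."""
    if len(scores) != 4:
        raise ValueError("expected exactly four scores")
    m = max(scores)  # subtract the max before exponentiating, for numerical stability
    log_sum_exp = m + math.log(sum(math.exp(s - m) for s in scores))
    return log_sum_exp - scores[target]


# Usage: uniform scores give the chance-level loss log(4).
print(round(four_way_contrastive_loss([0.0, 0.0, 0.0, 0.0]), 3))  # prints 1.386
```

Under this setup, equal scores give the chance-level loss log 4 ≈ 1.386. The loss approaches zero as the matched candidate's score dominates the three replaced ones, which pushes the model to notice whichever modality has been altered.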

* ICASSP 2022
