Get our free extension to see links to code for papers anywhere online!Free add-on: code for papers everywhere!Free add-on: See code for papers anywhere!

Add to Chrome

Add to Firefox

Add to Edge

Title:SPRING: Situated Conversation Agent Pretrained with Multimodal Questions from Incremental Layout Graph

Jan 05, 2023

Yuxing Long, Binyuan Hui, Fulong Ye, Yanyang Li, Zhuoxin Han, Caixia Yuan, Yongbin Li, Xiaojie Wang

Figure 1 for SPRING: Situated Conversation Agent Pretrained with Multimodal Questions from Incremental Layout Graph

Figure 2 for SPRING: Situated Conversation Agent Pretrained with Multimodal Questions from Incremental Layout Graph

Figure 3 for SPRING: Situated Conversation Agent Pretrained with Multimodal Questions from Incremental Layout Graph

Figure 4 for SPRING: Situated Conversation Agent Pretrained with Multimodal Questions from Incremental Layout Graph

Share this with someone who'll enjoy it:

Abstract:Existing multimodal conversation agents have shown impressive abilities to locate absolute positions or retrieve attributes in simple scenarios, but they fail to perform well when complex relative positions and information alignments are involved, which poses a bottleneck in response quality. In this paper, we propose a Situated Conversation Agent Petrained with Multimodal Questions from INcremental Layout Graph (SPRING) with abilities of reasoning multi-hops spatial relations and connecting them with visual attributes in crowded situated scenarios. Specifically, we design two types of Multimodal Question Answering (MQA) tasks to pretrain the agent. All QA pairs utilized during pretraining are generated from novel Incremental Layout Graphs (ILG). QA pair difficulty labels automatically annotated by ILG are used to promote MQA-based Curriculum Learning. Experimental results verify the SPRING's effectiveness, showing that it significantly outperforms state-of-the-art approaches on both SIMMC 1.0 and SIMMC 2.0 datasets.

* AAAI 2023

View paper on

Share this with someone who'll enjoy it:

Title:SPRING: Situated Conversation Agent Pretrained with Multimodal Questions from Incremental Layout Graph

Paper and Code