Title: Resolving References in Visually-Grounded Dialogue via Text Generation

Sep 23, 2023

Bram Willemsen, Livia Qian, Gabriel Skantze

Abstract: Vision-language models (VLMs) have been shown to be effective at image retrieval based on simple text queries, but text-image retrieval based on conversational input remains a challenge. Consequently, if we want to use VLMs for reference resolution in visually-grounded dialogue, the discourse processing capabilities of these models need to be augmented. To address this issue, we propose fine-tuning a causal large language model (LLM) to generate definite descriptions that summarize coreferential information found in the linguistic context of references. We then use a pretrained VLM to identify referents based on the generated descriptions, zero-shot. We evaluate our approach on a manually annotated dataset of visually-grounded dialogues and achieve results that, on average, exceed the performance of the baselines we compare against. Furthermore, we find that using referent descriptions based on larger context windows has the potential to yield higher returns.

* Published at SIGDIAL 2023
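
The second stage described in the abstract, picking a referent from a generated description, can be illustrated with a minimal sketch. It assumes the description and each candidate image have already been embedded into a shared vector space by some pretrained VLM; the embedding vectors below are made-up toy values, and the function names `cosine` and `resolve_referent` are hypothetical, not taken from the paper. Ranking by cosine similarity is a common choice for this kind of retrieval, not necessarily the authors' exact scoring.

```python
import math
from typing import Sequence


def cosine(u: Sequence[float], v: Sequence[float]) -> float:
    """Cosine similarity between two vectors (0.0 if either has zero norm)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0


def resolve_referent(description_vec: Sequence[float],
                     image_vecs: Sequence[Sequence[float]]) -> int:
    """Return the index of the candidate image most similar to the description."""
    scores = [cosine(description_vec, img) for img in image_vecs]
    return max(range(len(scores)), key=scores.__getitem__)


# Toy embeddings (assumption): in practice these would come from a VLM's
# text encoder (for the LLM-generated description) and image encoder.
description_vec = [0.9, 0.1, 0.0]
image_vecs = [
    [0.0, 1.0, 0.0],
    [1.0, 0.2, 0.0],
    [0.0, 0.0, 1.0],
]

best = resolve_referent(description_vec, image_vecs)
print(f"Predicted referent: candidate {best}")
```

Because the VLM is only used to score a single self-contained description against the candidates, it needs no dialogue-specific training, which is what makes the zero-shot setup in the abstract possible.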
