Title: I Was Blind but Now I See: Implementing Vision-Enabled Dialogue in Social Robots

Nov 15, 2023

Giulio Antonio Abbo, Tony Belpaeme

Figures 1-3 for I Was Blind but Now I See: Implementing Vision-Enabled Dialogue in Social Robots

Abstract: In the rapidly evolving landscape of human-computer interaction, the integration of vision capabilities into conversational agents stands as a crucial advancement. This paper presents an initial implementation of a dialogue manager that leverages the latest progress in Large Language Models (e.g., GPT-4, IDEFICS) to enhance the traditional text-based prompts with real-time visual input. LLMs are used to interpret both textual prompts and visual stimuli, creating a more contextually aware conversational agent. The system's prompt engineering, incorporating dialogue with summarisation of the images, ensures a balance between context preservation and computational efficiency. Six interactions with a Furhat robot powered by this system are reported, illustrating and discussing the results obtained. By implementing this vision-enabled dialogue system, the paper envisions a future where conversational agents seamlessly blend textual and visual modalities, enabling richer, more context-aware dialogues.

* 8 pages, 3 figures
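The abstract mentions prompt engineering that combines dialogue with summarisation of the images, but does not say how this is done. The sketch below is one hypothetical way such a context could be kept bounded, not the authors' implementation: recent image descriptions stay verbatim, while older ones are folded into a running summary. The class `VisionDialogueContext`, the `summarise` callable (standing in for an LLM call), the `max_recent_images` parameter, and the prompt layout are all assumptions made for illustration. As a simplification, images are listed before the dialogue turns rather than interleaved with them.

```python
from dataclasses import dataclass, field
from typing import Callable, List


@dataclass
class VisionDialogueContext:
    """Hypothetical prompt context mixing dialogue turns and image descriptions."""

    # Assumed stand-in for an LLM summarisation call: list of texts -> one text.
    summarise: Callable[[List[str]], str]
    # How many image descriptions to keep verbatim (illustrative default).
    max_recent_images: int = 2
    image_summary: str = ""
    recent_images: List[str] = field(default_factory=list)
    turns: List[str] = field(default_factory=list)

    def add_turn(self, speaker: str, text: str) -> None:
        """Record one dialogue turn verbatim."""
        self.turns.append(f"{speaker}: {text}")

    def add_image(self, description: str) -> None:
        """Record a new image description, summarising older ones if needed."""
        self.recent_images.append(description)
        cut = len(self.recent_images) - self.max_recent_images
        if cut > 0:
            overflow = self.recent_images[:cut]
            self.recent_images = self.recent_images[cut:]
            # Fold the previous summary and the overflow into a new summary.
            parts = ([self.image_summary] if self.image_summary else []) + overflow
            self.image_summary = self.summarise(parts)

    def build_prompt(self) -> str:
        """Assemble the text prompt: summary, recent images, then dialogue."""
        lines: List[str] = []
        if self.image_summary:
            lines.append(f"[Earlier scene summary] {self.image_summary}")
        lines.extend(f"[Image] {d}" for d in self.recent_images)
        lines.extend(self.turns)
        return "\n".join(lines)


def join_summariser(descriptions: List[str]) -> str:
    """Trivial placeholder summariser; a real system would call a model here."""
    return "; ".join(descriptions)
```

With `join_summariser` plugged in, each call to `add_image` beyond the limit moves the oldest description into the summary, so the prompt grows with the dialogue but not with every frame the robot has seen.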
