Title: Multi-Turn Multi-Modal Question Clarification for Enhanced Conversational Understanding

Feb 17, 2025

Kimia Ramezan, Alireza Amiri Bavandpour, Yifei Yuan, Clemencia Siro, Mohammad Aliannejadi

Abstract: Conversational query clarification enables users to refine their search queries through interactive dialogue, improving search effectiveness. Traditional approaches rely on text-based clarifying questions, which often fail to capture complex user preferences, particularly those involving visual attributes. While recent work has explored single-turn multi-modal clarification with images alongside text, such methods do not fully support the progressive nature of user intent refinement over multiple turns. Motivated by this, we introduce the Multi-turn Multi-modal Clarifying Questions (MMCQ) task, which combines text and visual modalities to refine user queries in a multi-turn conversation. To facilitate this task, we create a large-scale dataset named ClariMM comprising over 13k multi-turn interactions and 33k question-answer pairs containing multi-modal clarifying questions. We propose Mario, a retrieval framework that employs a two-phase ranking strategy: initial retrieval with BM25, followed by a multi-modal generative re-ranking model that integrates textual and visual information from conversational history. Our experiments show that multi-turn multi-modal clarification outperforms uni-modal and single-turn approaches, improving MRR by 12.88%. The gains are most significant in longer interactions, demonstrating the value of progressive refinement for complex queries.
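
The two-phase ranking strategy and the MRR metric mentioned in the abstract can be illustrated with a minimal sketch. This is not the authors' implementation: the BM25 scorer is a plain Okapi BM25 written from scratch, the `k1` and `b` defaults are conventional values assumed here, and `rerank_fn` is a hypothetical stand-in for the multi-modal generative re-ranker, which in the paper also uses images and conversational history. The function names and the toy documents are likewise assumptions for illustration.

```python
import math
from collections import Counter


def tokenize(text):
    # Naive whitespace tokenizer; a real system would use a proper analyzer.
    return text.lower().split()


def bm25_scores(query, docs, k1=1.5, b=0.75):
    """Score every document against the query with Okapi BM25."""
    tokenized = [tokenize(d) for d in docs]
    n_docs = len(tokenized)
    avg_len = sum(len(t) for t in tokenized) / n_docs
    # Document frequency: number of documents containing each term.
    df = Counter(term for tokens in tokenized for term in set(tokens))
    scores = []
    for tokens in tokenized:
        tf = Counter(tokens)
        score = 0.0
        for term in tokenize(query):
            if term not in tf:
                continue
            idf = math.log(1 + (n_docs - df[term] + 0.5) / (df[term] + 0.5))
            norm = tf[term] + k1 * (1 - b + b * len(tokens) / avg_len)
            score += idf * tf[term] * (k1 + 1) / norm
        scores.append(score)
    return scores


def two_phase_rank(query, docs, rerank_fn, top_k=3):
    """Phase 1: keep the top_k BM25 candidates. Phase 2: reorder them with rerank_fn."""
    first = bm25_scores(query, docs)
    candidates = sorted(range(len(docs)), key=lambda i: first[i], reverse=True)[:top_k]
    # rerank_fn(query, doc) -> float is a hypothetical placeholder for a learned re-ranker.
    return sorted(candidates, key=lambda i: rerank_fn(query, docs[i]), reverse=True)


def mean_reciprocal_rank(rankings, relevant):
    """rankings: one ranked list of doc ids per query; relevant: the correct doc id per query."""
    total = 0.0
    for ranked, rel in zip(rankings, relevant):
        if rel in ranked:
            total += 1.0 / (ranked.index(rel) + 1)
    return total / len(rankings)


docs = [
    "red running shoes for men",
    "blue running shoes",
    "red leather jacket",
    "kitchen table",
]
# Toy re-ranker that simply prefers documents mentioning "jacket".
toy_reranker = lambda query, doc: 1.0 if "jacket" in doc else 0.0
ranking = two_phase_rank("red shoes", docs, toy_reranker, top_k=3)
```

In this sketch the first phase only narrows the pool cheaply, so the more expensive second-phase scorer sees `top_k` candidates rather than the whole collection. MRR then averages the reciprocal rank of the correct item across queries, with a query contributing zero when its correct item is missing from the ranked list.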
