Title: Enhancing Intent Understanding for Ambiguous Prompts through Human-Machine Co-Adaptation

Jan 25, 2025

Yangfan He, Jianhui Wang, Kun Li, Yijin Wang, Li Sun, Jun Yin, Miao Zhang, Xueqian Wang

Abstract: Modern image generation systems can produce high-quality visuals, yet user prompts often contain ambiguities, requiring multiple revisions. Existing methods struggle to address the nuanced needs of non-expert users. We propose Visual Co-Adaptation (VCA), a novel framework that iteratively refines prompts and aligns generated images with user preferences. VCA employs a fine-tuned language model with reinforcement learning and multi-turn dialogues for prompt disambiguation. Key components include the Incremental Context-Enhanced Dialogue Block for interactive clarification, the Semantic Exploration and Disambiguation Module (SESD) leveraging Retrieval-Augmented Generation (RAG) and CLIP scoring, and the Pixel Precision and Consistency Optimization Module (PPCO) for refining image details using Proximal Policy Optimization (PPO). A human-in-the-loop feedback mechanism further improves performance. Experiments show that VCA surpasses models like DALL-E 3 and Stable Diffusion, reducing dialogue rounds to 4.3, achieving a CLIP score of 0.92, and enhancing user satisfaction to 4.73/5. Additionally, we introduce a novel multi-round dialogue dataset with prompt-image pairs and user intent annotations.
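The abstract mentions CLIP scoring as one ingredient of prompt disambiguation. As a rough illustration only, the sketch below ranks candidate readings of an ambiguous prompt by cosine similarity between an image embedding and each candidate's text embedding, which is the general idea behind CLIP-style scoring. It is not the authors' SESD module: the function names, the example prompts, and the three-dimensional vectors are all hypothetical placeholders, and a real system would obtain the embeddings from a CLIP model.

```python
import math


def cosine_similarity(a, b):
    """Cosine similarity between two equal-length vectors."""
    if len(a) != len(b):
        raise ValueError("vectors must have the same length")
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        raise ValueError("a zero vector has no direction")
    return dot / (norm_a * norm_b)


def rank_interpretations(image_embedding, text_embeddings):
    """Return (label, score) pairs sorted from best to worst match."""
    scored = [
        (label, cosine_similarity(image_embedding, embedding))
        for label, embedding in text_embeddings.items()
    ]
    return sorted(scored, key=lambda pair: pair[1], reverse=True)


# Hypothetical toy embeddings standing in for real CLIP outputs.
image_embedding = [0.9, 0.1, 0.2]
candidates = {
    "a jaguar (the animal) in a jungle": [0.8, 0.2, 0.1],
    "a Jaguar (the car) on a road": [0.1, 0.9, 0.3],
}

ranking = rank_interpretations(image_embedding, candidates)
best_label, best_score = ranking[0]
print(f"best match: {best_label} (score {best_score:.3f})")
```

In a dialogue loop, a low top score or a small gap between the top two candidates could be one signal for asking the user a clarifying question, though the abstract does not say how VCA makes that decision.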
