
Title: KAHANI: Culturally-Nuanced Visual Storytelling Pipeline for Non-Western Cultures

Oct 28, 2024

Hamna, Deepthi Sudharsan, Agrima Seth, Ritvik Budhiraja, Deepika Khullar, Vyshak Jain, Kalika Bali, Aditya Vashistha, Sameer Segal


Abstract: Large Language Models (LLMs) and Text-To-Image (T2I) models have demonstrated the ability to generate compelling text and visual stories. However, their outputs are predominantly aligned with the sensibilities of the Global North, often resulting in an outsider's gaze on other cultures. As a result, non-Western communities have to put extra effort into generating culturally specific stories. To address this challenge, we developed a visual storytelling pipeline called KAHANI that generates culturally grounded visual stories for non-Western cultures. Our pipeline leverages the off-the-shelf models GPT-4 Turbo and Stable Diffusion XL (SDXL). Using Chain of Thought (CoT) and T2I prompting techniques, we capture the cultural context from the user's prompt and generate vivid descriptions of the characters and scene compositions. To evaluate the effectiveness of KAHANI, we conducted a comparative user study against ChatGPT-4 (with DALL-E 3), in which participants from different regions of India compared the cultural relevance of stories generated by the two tools. Qualitative and quantitative analyses of the user study showed that KAHANI captured and incorporated more Culturally Specific Items (CSIs) than ChatGPT-4. In terms of both cultural competence and visual story generation quality, our pipeline outperformed ChatGPT-4 in 27 of the 36 comparisons.
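The abstract describes the pipeline only at a high level: a CoT step extracts cultural context from the user's prompt, an LLM writes character descriptions and scene compositions, and each scene becomes a T2I prompt. The Python below is a minimal sketch of that flow under stated assumptions. The function names, the prompt wording, and the `llm`/`t2i` callables are hypothetical placeholders, not KAHANI's actual prompts or code; in the paper those two roles are filled by GPT-4 Turbo and SDXL.

```python
from dataclasses import dataclass, field
from typing import Callable, List

# Hypothetical model interfaces. In KAHANI these roles are played by GPT-4 Turbo
# (text) and Stable Diffusion XL (images); plain callables keep the sketch offline.
TextModel = Callable[[str], str]
ImageModel = Callable[[str], bytes]


@dataclass
class VisualStory:
    cultural_context: str
    characters: str
    scenes: List[str]
    images: List[bytes] = field(default_factory=list)


def extract_cultural_context(llm: TextModel, user_prompt: str) -> str:
    # Chain-of-Thought style instruction (assumed wording): ask the model to reason
    # step by step about culturally specific items before summarising them.
    return llm(
        "Think step by step. Identify the region, community, festivals, food, "
        "clothing and other culturally specific items implied by this story "
        f"request, then summarise them.\nRequest: {user_prompt}"
    )


def describe_characters(llm: TextModel, user_prompt: str, context: str) -> str:
    return llm(
        "Describe each character's appearance, attire and age vividly, "
        f"consistent with this cultural context:\n{context}\nStory: {user_prompt}"
    )


def compose_scenes(llm: TextModel, user_prompt: str, context: str,
                   characters: str, n_scenes: int) -> List[str]:
    raw = llm(
        f"Split the story into {n_scenes} scenes, one per line.\n"
        f"Context: {context}\nCharacters: {characters}\nStory: {user_prompt}"
    )
    # Keep non-empty lines only and cap at the requested number of scenes.
    scenes = [line.strip() for line in raw.splitlines() if line.strip()]
    return scenes[:n_scenes]


def to_t2i_prompt(scene: str, characters: str, context: str) -> str:
    # Repeating the character descriptions in every prompt is one simple way
    # to keep characters' appearance consistent across images (an assumption).
    return f"{scene}. Characters: {characters}. Cultural details: {context}"


def generate_story(llm: TextModel, t2i: ImageModel, user_prompt: str,
                   n_scenes: int = 4) -> VisualStory:
    context = extract_cultural_context(llm, user_prompt)
    characters = describe_characters(llm, user_prompt, context)
    scenes = compose_scenes(llm, user_prompt, context, characters, n_scenes)
    story = VisualStory(context, characters, scenes)
    story.images = [t2i(to_t2i_prompt(s, characters, context)) for s in scenes]
    return story
```

Injecting the models as callables keeps the sketch runnable without network access; a real deployment would wrap an LLM client and an SDXL pipeline behind the same two signatures.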

* Under review
