Generating a short story from an image is a challenging task. Unlike image captioning, story generation from an image poses multiple challenges: maintaining story coherence, appropriately assessing the quality of the story, steering the generated story toward a desired style, and coping with the scarcity of paired image-story reference datasets, which limits supervision during training. In this work, we introduce Plug-and-Play Story Teller (PPST) and improve image-to-story generation by: 1) alleviating the data scarcity problem by incorporating large pre-trained models, namely CLIP and GPT-2, to facilitate fluent image-to-text generation with minimal supervision, and 2) enabling more style-relevant generation by incorporating stylistic adapters to control the style of the generated story. We conduct image-to-story generation experiments with non-styled, romance-styled, and action-styled PPST approaches and compare our generated stories with those of previous work over three aspects, namely story coherence, image-story relevance, and style fitness, using both automatic and human evaluation. The results show that PPST improves story coherence and achieves better image-story relevance, but the generated stories are not yet adequately stylistic.
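For intuition, the following is a minimal sketch of the kind of plug-and-play pipeline described above: a frozen CLIP image encoder is connected to a frozen GPT-2 decoder through a small trainable mapping network that turns the image embedding into a textual prefix. The mapping network, prefix length, and greedy decoding loop are illustrative assumptions for exposition, not the exact PPST implementation, and the stylistic adapters inside GPT-2 are omitted for brevity.

```python
import torch
from PIL import Image
from transformers import (CLIPModel, CLIPProcessor,
                          GPT2LMHeadModel, GPT2Tokenizer)

# Frozen pre-trained backbones: CLIP for vision, GPT-2 for language.
clip = CLIPModel.from_pretrained("openai/clip-vit-base-patch32").eval()
clip_proc = CLIPProcessor.from_pretrained("openai/clip-vit-base-patch32")
gpt2 = GPT2LMHeadModel.from_pretrained("gpt2").eval()
tok = GPT2Tokenizer.from_pretrained("gpt2")

# Hypothetical mapping network: projects the CLIP image embedding into a
# short sequence of GPT-2 input embeddings (a "prefix"). In this sketch it
# is the only trainable component; CLIP and GPT-2 stay frozen.
prefix_len = 10
mapper = torch.nn.Linear(clip.config.projection_dim,
                         prefix_len * gpt2.config.n_embd)

@torch.no_grad()
def tell_story(image: Image.Image, max_new_tokens: int = 80) -> str:
    pixels = clip_proc(images=image, return_tensors="pt")
    img_emb = clip.get_image_features(**pixels)               # (1, 512)
    embeds = mapper(img_emb).view(1, prefix_len, -1)          # (1, 10, 768)

    story_ids, past = [], None
    for _ in range(max_new_tokens):                           # greedy decoding
        out = gpt2(inputs_embeds=embeds, past_key_values=past, use_cache=True)
        past = out.past_key_values
        next_id = out.logits[:, -1, :].argmax(dim=-1)         # (1,)
        if next_id.item() == tok.eos_token_id:
            break
        story_ids.append(next_id.item())
        embeds = gpt2.transformer.wte(next_id).unsqueeze(1)   # next input step
    return tok.decode(story_ids)

# Example usage (with an untrained mapper the output is not meaningful):
# print(tell_story(Image.open("photo.jpg")))
```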