Get our free extension to see links to code for papers anywhere online!Free add-on: code for papers everywhere!Free add-on: See code for papers anywhere!

Add to Chrome

Add to Firefox

Add to Edge

Kyeongman Park

MultiActor-Audiobook: Zero-Shot Audiobook Generation with Faces and Voices of Multiple Speakers

May 19, 2025

Kyeongman Park, Seongho Joo, Kyomin Jung

Figure 1 for MultiActor-Audiobook: Zero-Shot Audiobook Generation with Faces and Voices of Multiple Speakers

Figure 2 for MultiActor-Audiobook: Zero-Shot Audiobook Generation with Faces and Voices of Multiple Speakers

Figure 3 for MultiActor-Audiobook: Zero-Shot Audiobook Generation with Faces and Voices of Multiple Speakers

Abstract:We introduce MultiActor-Audiobook, a zero-shot approach for generating audiobooks that automatically produces consistent, expressive, and speaker-appropriate prosody, including intonation and emotion. Previous audiobook systems have several limitations: they require users to manually configure the speaker's prosody, read each sentence with a monotonic tone compared to voice actors, or rely on costly training. However, our MultiActor-Audiobook addresses these issues by introducing two novel processes: (1) MSP (**Multimodal Speaker Persona Generation**) and (2) LSI (**LLM-based Script Instruction Generation**). With these two processes, MultiActor-Audiobook can generate more emotionally expressive audiobooks with a consistent speaker prosody without additional training. We compare our system with commercial products, through human and MLLM evaluations, achieving competitive results. Furthermore, we demonstrate the effectiveness of MSP and LSI through ablation studies.

Via

Access Paper or Ask Questions

A Character-Centric Creative Story Generation via Imagination

Sep 25, 2024

Kyeongman Park, Minbeom Kim, Kyomin Jung

Figure 1 for A Character-Centric Creative Story Generation via Imagination

Figure 2 for A Character-Centric Creative Story Generation via Imagination

Figure 3 for A Character-Centric Creative Story Generation via Imagination

Figure 4 for A Character-Centric Creative Story Generation via Imagination

Abstract:Creative story generation with diverse and detailed story elements is a long-standing goal for large language models. While existing methodologies generate long and coherent stories, they fall significantly short of human capabilities in terms of diversity and character detail. To address this, we introduce a novel story generation framework called CCI (Character-centric Creative story generation via Imagination). CCI features two innovative modules for creative story generation: IG (Image-Guided Imagination) and MW (Multi-Writer model). In the IG module, we utilize DALL-E 3 to create visual representations of key story elements. The IG generates more novel and concrete characters, backgrounds, and main plots than text-only methods. The MW module uses these story elements created by IG to generate multiple description candidates for the protagonist and select the best one. This method incorporates vivid and rich character descriptions into the story. We compared the stories generated by CCI and baseline models through human evaluation and statistical analysis. The results showed significant improvements in the creativity. Furthermore, by enabling interactive multi-modal story generation with users, we have opened up possibilities for human-LLM integration in cultural development.

Via

Access Paper or Ask Questions

LongStory: Coherent, Complete and Length Controlled Long story Generation

Nov 26, 2023

Kyeongman Park, Nakyeong Yang, Kyomin Jung

Figure 1 for LongStory: Coherent, Complete and Length Controlled Long story Generation

Figure 2 for LongStory: Coherent, Complete and Length Controlled Long story Generation

Figure 3 for LongStory: Coherent, Complete and Length Controlled Long story Generation

Figure 4 for LongStory: Coherent, Complete and Length Controlled Long story Generation

Abstract:A human author can write any length of story without losing coherence. Also, they always bring the story to a proper ending, an ability that current language models lack. In this work, we present the LongStory for coherent, complete, and length-controlled long story generation. LongStory introduces two novel methodologies: (1) the long and short-term contexts weight calibrator (CWC) and (2) long story structural positions (LSP). The CWC adjusts weights for long-term context Memory and short-term context Cheating, acknowledging their distinct roles. The LSP employs discourse tokens to convey the structural positions of a long story. Trained on three datasets with varied average story lengths, LongStory outperforms other baselines, including the strong story generator Plotmachine, in coherence, completeness, relevance, and repetitiveness. We also perform zero-shot tests on each dataset to assess the model's ability to predict outcomes beyond its training data and validate our methodology by comparing its performance with variants of our model.

Via

Access Paper or Ask Questions