Get our free extension to see links to code for papers anywhere online!Free add-on: code for papers everywhere!Free add-on: See code for papers anywhere!

Add to Chrome

Add to Firefox

Add to Edge

Jingbin Hu

FleSpeech: Flexibly Controllable Speech Generation with Various Prompts

Jan 08, 2025

Hanzhao Li, Yuke Li, Xinsheng Wang, Jingbin Hu, Qicong Xie, Shan Yang, Lei Xie

Figure 1 for FleSpeech: Flexibly Controllable Speech Generation with Various Prompts

Figure 2 for FleSpeech: Flexibly Controllable Speech Generation with Various Prompts

Figure 3 for FleSpeech: Flexibly Controllable Speech Generation with Various Prompts

Figure 4 for FleSpeech: Flexibly Controllable Speech Generation with Various Prompts

Abstract:Controllable speech generation methods typically rely on single or fixed prompts, hindering creativity and flexibility. These limitations make it difficult to meet specific user needs in certain scenarios, such as adjusting the style while preserving a selected speaker's timbre, or choosing a style and generating a voice that matches a character's visual appearance. To overcome these challenges, we propose \textit{FleSpeech}, a novel multi-stage speech generation framework that allows for more flexible manipulation of speech attributes by integrating various forms of control. FleSpeech employs a multimodal prompt encoder that processes and unifies different text, audio, and visual prompts into a cohesive representation. This approach enhances the adaptability of speech synthesis and supports creative and precise control over the generated speech. Additionally, we develop a data collection pipeline for multimodal datasets to facilitate further research and applications in this field. Comprehensive subjective and objective experiments demonstrate the effectiveness of FleSpeech. Audio samples are available at https://kkksuper.github.io/FleSpeech/

* 14 pages, 3 figures

Via

Access Paper or Ask Questions

R-ODE: Ricci Curvature Tells When You Will be Informed

May 27, 2024

Li Sun, Jingbin Hu, Mengjie Li, Hao Peng

Figure 1 for R-ODE: Ricci Curvature Tells When You Will be Informed

Figure 2 for R-ODE: Ricci Curvature Tells When You Will be Informed

Figure 3 for R-ODE: Ricci Curvature Tells When You Will be Informed

Abstract:Information diffusion prediction is fundamental to understand the structure and organization of the online social networks, and plays a crucial role to blocking rumor spread, influence maximization, political propaganda, etc. So far, most existing solutions primarily predict the next user who will be informed with historical cascades, but ignore an important factor in the diffusion process - the time. Such limitation motivates us to pose the problem of the time-aware personalized information diffusion prediction for the first time, telling the time when the target user will be informed. In this paper, we address this problem from a fresh geometric perspective of Ricci curvature, and propose a novel Ricci-curvature regulated Ordinary Differential Equation (R-ODE). In the diffusion process, R-ODE considers that the inter-correlated users are organized in a dynamic system in the representation space, and the cascades give the observations sampled from the continuous realm. At each infection time, the message diffuses along the largest Ricci curvature, signifying less transportation effort. In the continuous realm, the message triggers users' movement, whose trajectory in the space is parameterized by an ODE with graph neural network. Consequently, R-ODE predicts the infection time of a target user by the movement trajectory learnt from the observations. Extensive experiments evaluate the personalized time prediction ability of R-ODE, and show R-ODE outperforms the state-of-the-art baselines.

* Accepted by SIGIR 2024

Via

Access Paper or Ask Questions