Title: Factor-Conditioned Speaking-Style Captioning

Jun 27, 2024

Atsushi Ando, Takafumi Moriya, Shota Horiguchi, Ryo Masumura

Abstract: This paper presents a novel speaking-style captioning method that generates diverse descriptions while accurately predicting speaking-style information. Conventional learning criteria directly use the original captions, which contain not only speaking-style factor terms but also syntax words; this hinders the learning of speaking-style information. To solve this problem, we introduce factor-conditioned captioning (FCC), which first outputs a phrase representing speaking-style factors (e.g., gender, pitch) and then generates a caption, ensuring that the model explicitly learns speaking-style factors. We also propose greedy-then-sampling (GtS) decoding, which first predicts speaking-style factors deterministically to guarantee semantic accuracy and then generates a caption by factor-conditioned sampling to ensure diversity. Experiments show that FCC outperforms training on the original captions and that, with GtS, it generates more diverse captions while maintaining style prediction performance.

* Accepted to Interspeech 2024
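
The greedy-then-sampling idea described in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: it assumes the factor phrase and the caption are emitted as one token sequence divided by a separator token, and it replaces the captioning model with a toy scoring function. The names `greedy_then_sampling`, `toy_next_logits`, `VOCAB`, `SEP`, and `EOS` are all hypothetical.

```python
import math
import random
from typing import Callable, List, Optional, Sequence

# Hypothetical toy vocabulary: two special tokens, two factor terms, four caption words.
VOCAB = ["<sep>", "<eos>", "female", "high-pitch", "a", "woman", "speaks", "brightly"]
SEP, EOS = 0, 1
FACTOR_SCRIPT = [2, 3, SEP]  # toy factor phrase: "female high-pitch <sep>"


def toy_next_logits(prefix: Sequence[int]) -> List[float]:
    """Stand-in for a captioning model (assumption): returns next-token logits."""
    logits = [0.0] * len(VOCAB)
    if SEP not in prefix:
        # Factor phase: one clearly preferred token per position.
        logits[FACTOR_SCRIPT[len(prefix)]] = 5.0
    else:
        # Caption phase: caption words are equally likely, factor tokens are blocked.
        for tok in (SEP, 2, 3):
            logits[tok] = -1e9
        for tok in (4, 5, 6, 7):
            logits[tok] = 1.0
        logits[EOS] = float(len(prefix) - 4)  # ending becomes likelier with length
    return logits


def softmax(logits: Sequence[float], temperature: float = 1.0) -> List[float]:
    """Numerically stable softmax with temperature scaling."""
    scaled = [x / temperature for x in logits]
    peak = max(scaled)
    exps = [math.exp(x - peak) for x in scaled]
    total = sum(exps)
    return [e / total for e in exps]


def greedy_then_sampling(
    next_logits: Callable[[Sequence[int]], Sequence[float]],
    sep_id: int,
    eos_id: int,
    max_len: int = 32,
    temperature: float = 1.0,
    rng: Optional[random.Random] = None,
) -> List[int]:
    """Decode greedily until the separator, then sample the remaining tokens."""
    rng = rng or random.Random()
    tokens: List[int] = []
    in_factor_phase = True
    for _ in range(max_len):
        logits = next_logits(tokens)
        if in_factor_phase:
            # Deterministic choice: argmax over the logits.
            tok = max(range(len(logits)), key=logits.__getitem__)
        else:
            # Stochastic choice, conditioned on the already fixed factor phrase.
            probs = softmax(logits, temperature)
            tok = rng.choices(range(len(logits)), weights=probs, k=1)[0]
        tokens.append(tok)
        if tok == eos_id:
            break
        if tok == sep_id:
            in_factor_phase = False
    return tokens


if __name__ == "__main__":
    for seed in (0, 1, 2):
        ids = greedy_then_sampling(toy_next_logits, SEP, EOS, rng=random.Random(seed))
        print(" ".join(VOCAB[i] for i in ids))
```

In this sketch every run shares the same factor prefix, since that part is chosen by argmax, while the words after the separator vary with the random seed. That mirrors the stated goal of fixing the style factors deterministically and drawing diversity only from the caption.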