Jonathan E. Avila

Towards cross-language prosody transfer for dialog

Jul 09, 2023

Jonathan E. Avila, Nigel G. Ward

Abstract: Speech-to-speech translation systems today do not adequately support use for dialog purposes. In particular, nuances of speaker intent and stance can be lost due to improper prosody transfer. We present an exploration of what needs to be done to overcome this. First, we developed a data collection protocol in which bilingual speakers re-enact utterances from an earlier conversation in their other language, and used this to collect an English-Spanish corpus, so far comprising 1871 matched utterance pairs. Second, we developed a simple prosodic dissimilarity metric based on Euclidean distance over a broad set of prosodic features. We then used these to investigate cross-language prosodic differences, measure the likely utility of three simple baseline models, and identify phenomena which will require more powerful modeling. Our findings should inform future research on cross-language prosody and the design of speech-to-speech translation systems capable of effective prosody transfer.

* Accepted to Interspeech 2023
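
The abstract above describes a prosodic dissimilarity metric based on Euclidean distance over prosodic features. As a rough illustration only, the sketch below computes that distance between two feature vectors. The function name, the three-element vectors, and the assumption that features are already normalized to comparable scales are all hypothetical; the listing does not state the paper's actual feature set or normalization.

```python
import math
from typing import Sequence


def prosodic_dissimilarity(a: Sequence[float], b: Sequence[float]) -> float:
    """Euclidean distance between two equal-length prosodic feature vectors.

    Assumes (hypothetically) that each feature has already been normalized,
    so that no single feature dominates the distance.
    """
    if len(a) != len(b):
        raise ValueError("feature vectors must have the same length")
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


# Made-up feature values for one matched utterance pair (illustration only).
english = [0.0, 3.0, 1.0]
spanish = [4.0, 0.0, 1.0]
distance = prosodic_dissimilarity(english, spanish)
print(distance)
```

Under this sketch, a smaller distance would suggest that an utterance and its re-enactment are prosodically closer, and identical vectors give a distance of zero.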

Dialogs Re-enacted Across Languages

Nov 18, 2022

Nigel G. Ward, Jonathan E. Avila, Emilia Rivas

Abstract: To support machine learning of cross-language prosodic mappings and other ways to improve speech-to-speech translation, we present a protocol for collecting closely matched pairs of utterances across languages, a description of the resulting data collection, and some observations and musings. This report is intended for 1) people using the corpus, 2) people extending the corpus, and 3) people designing similar collections of bilingual dialog data.