Title: Leveraging Timestamp Information for Serialized Joint Streaming Recognition and Translation

Oct 23, 2023

Sara Papi, Peidong Wang, Junkun Chen, Jian Xue, Naoyuki Kanda, Jinyu Li, Yashesh Gaur

Abstract: The growing need for instant spoken language transcription and translation is driven by increased global communication and cross-lingual interactions. This has made offering translations in multiple languages essential for user applications. Traditional approaches to automatic speech recognition (ASR) and speech translation (ST) have often relied on separate systems, leading to inefficient use of computational resources and increased synchronization complexity in real time. In this paper, we propose a streaming Transformer-Transducer (T-T) model able to jointly produce many-to-one and one-to-many transcription and translation using a single decoder. We introduce a novel method for joint token-level serialized output training based on timestamp information to effectively produce ASR and ST outputs in the streaming setting. Experiments on {it,es,de}->en demonstrate the effectiveness of our approach, enabling the generation of one-to-many joint outputs with a single decoder for the first time.

* © 2024 IEEE. Personal use of this material is permitted. Permission from IEEE must be obtained for all other uses, in any current or future media, including reprinting/republishing this material for advertising or promotional purposes, creating new collective works, for resale or redistribution to servers or lists, or reuse of any copyrighted component of this work in other works.
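
The abstract describes joint token-level serialized output training based on timestamp information. As a rough illustration only, the Python sketch below shows one way two token streams could be merged into a single target sequence ordered by time. The abstract does not specify the procedure, so everything here is an assumption: the function name `serialize_by_timestamp`, the task tags `<asr>` and `<st>`, the per-token timestamps in seconds, and the rule that ASR tokens come first on ties are all hypothetical and should not be read as the authors' method.

```python
from typing import List, Tuple

# (token text, emission time in seconds); the timestamps are assumed to come
# from some external alignment step that is not modeled here.
Token = Tuple[str, float]


def serialize_by_timestamp(
    asr: List[Token],
    st: List[Token],
    asr_tag: str = "<asr>",  # hypothetical task marker
    st_tag: str = "<st>",    # hypothetical task marker
) -> List[str]:
    """Interleave ASR and ST tokens into one sequence ordered by timestamp.

    A task tag is emitted whenever the output switches between tasks.
    On equal timestamps, ASR tokens are placed first (an arbitrary choice).
    """
    tagged = [(t, 0, i, asr_tag, tok) for i, (tok, t) in enumerate(asr)]
    tagged += [(t, 1, i, st_tag, tok) for i, (tok, t) in enumerate(st)]
    # Sort by time, then task (ASR before ST), then original position.
    tagged.sort(key=lambda item: item[:3])

    output: List[str] = []
    current_tag = None
    for _, _, _, tag, tok in tagged:
        if tag != current_tag:
            output.append(tag)
            current_tag = tag
        output.append(tok)
    return output


if __name__ == "__main__":
    asr_tokens = [("ciao", 0.4), ("mondo", 0.9)]
    st_tokens = [("hello", 0.6), ("world", 1.1)]
    print(" ".join(serialize_by_timestamp(asr_tokens, st_tokens)))
```

With these made-up timestamps, the two streams alternate token by token. When several tokens of one task precede the next token of the other, they are grouped under a single tag.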
