Title: TokenVerse: Unifying Speech and NLP Tasks via Transducer-based ASR

Jul 05, 2024

Shashi Kumar, Srikanth Madikeri, Juan Zuluaga-Gomez, Iuliia Nigmatulina, Esaú Villatoro-Tello, Sergio Burdisso, Petr Motlicek, Karthik Pandia, Aravind Ganapathiraju

Abstract: Traditional conversational intelligence from speech relies on a cascaded pipeline: voice activity detection, diarization, and transcription, followed by separate NLP models for tasks such as semantic endpointing and named entity recognition (NER). Our paper introduces TokenVerse, a single Transducer-based model designed to handle multiple tasks. This is achieved by integrating task-specific tokens into the reference text during ASR model training, which streamlines inference and eliminates the need for separate NLP models. In addition to ASR, we conduct experiments on three tasks: speaker change detection, endpointing, and NER. Our experiments on a public and a private dataset show that the proposed method improves ASR by up to 7.7% in relative WER while outperforming the cascaded pipeline approach in individual task performance. Additionally, we present task transfer learning to a new task within an existing TokenVerse model.
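The core idea in the abstract, placing task-specific tokens directly in the ASR reference text, can be illustrated with a minimal sketch. The token spellings (`[SCD]`, `[ENDP]`, `[NE]`, `[/NE]`), the segment dictionary layout, and both function names below are assumptions made for illustration; they are not taken from the paper's code. The sketch only shows how an annotated training reference might be built and how a plain transcript could be recovered from tagged output.

```python
# Hypothetical token spellings, chosen for illustration only.
SCD = "[SCD]"        # speaker change detection
ENDP = "[ENDP]"      # semantic endpoint
NE_OPEN = "[NE]"     # start of a named entity
NE_CLOSE = "[/NE]"   # end of a named entity
TASK_TOKENS = (SCD, ENDP, NE_OPEN, NE_CLOSE)


def build_reference(segments):
    """Interleave task tokens with the words of a conversation.

    Each segment is a dict (assumed layout) with:
      speaker  - speaker label
      words    - list of transcript words
      entities - optional list of (start, end) word-index spans, end exclusive
      endpoint - optional bool, True if the segment ends an utterance
    """
    out = []
    prev_speaker = None
    for seg in segments:
        # Mark a speaker change between consecutive segments.
        if prev_speaker is not None and seg["speaker"] != prev_speaker:
            out.append(SCD)
        spans = seg.get("entities", [])
        starts = {s for s, _ in spans}
        ends = {e for _, e in spans}
        for i, word in enumerate(seg["words"]):
            if i in starts:
                out.append(NE_OPEN)
            out.append(word)
            if i + 1 in ends:
                out.append(NE_CLOSE)
        if seg.get("endpoint"):
            out.append(ENDP)
        prev_speaker = seg["speaker"]
    return " ".join(out)


def strip_task_tokens(text):
    """Drop task tokens to recover the plain transcript."""
    return " ".join(t for t in text.split() if t not in TASK_TOKENS)


segments = [
    {"speaker": "A", "words": ["call", "john", "smith"],
     "entities": [(1, 3)], "endpoint": True},
    {"speaker": "B", "words": ["okay"], "endpoint": True},
]
tagged = build_reference(segments)
print(tagged)
print(strip_task_tokens(tagged))
```

Under these assumptions, a single model trained on such references would emit the task tokens inline with the words, so the downstream tasks reduce to reading token positions in the hypothesis rather than running separate models.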

* 5 pages, double column
