Title: Leveraging Broadcast Media Subtitle Transcripts for Automatic Speech Recognition and Subtitling

Feb 05, 2025

Jakob Poncelet, Hugo Van hamme

Abstract: Recent advances in speech recognition technology have been driven by large-scale datasets and attention-based architectures, but many challenges remain, especially for low-resource languages and dialects. This paper explores the integration of weakly supervised transcripts from TV subtitles into automatic speech recognition (ASR) systems, aiming to improve both verbatim transcriptions and automatically generated subtitles. To this end, verbatim data and subtitles are regarded as different domains or languages, due to their distinct characteristics. We propose and compare several end-to-end architectures that are designed to jointly model both modalities with separate or shared encoders and decoders. The proposed methods are able to jointly generate a verbatim transcription and a subtitle. Evaluation on Flemish (Belgian Dutch) demonstrates that a model with cascaded encoders and separate decoders represents the differences between the two data types most efficiently while improving results in both domains. Despite differences in domain and linguistic variations, combining verbatim transcripts with subtitle data leads to notable ASR improvements without the need for extensive preprocessing. Additionally, experiments with a large-scale subtitle dataset show the scalability of the proposed approach. The methods not only improve ASR accuracy but also generate subtitles that closely match standard written text, offering several potential applications.
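
The data flow of a model with cascaded encoders and separate decoders might be sketched as follows. This is a toy illustration only, not the authors' implementation: the class name, layer sizes, the ordering of the two encoders (verbatim first, subtitle stacked on top), and the use of plain per-frame linear projections in place of attention-based encoders and decoders are all assumptions made for the sake of a small runnable example.

```python
import random


def linear(in_dim, out_dim, rng):
    """Build a toy dense layer (no bias) and return a function over frame sequences."""
    weights = [[rng.uniform(-0.1, 0.1) for _ in range(in_dim)] for _ in range(out_dim)]

    def apply(frames):
        # Project every frame independently: one output value per weight row.
        return [[sum(w * x for w, x in zip(row, frame)) for row in weights]
                for frame in frames]

    return apply


class CascadedTwoDecoderModel:
    """Hypothetical sketch: two stacked encoders, each feeding its own decoder."""

    def __init__(self, feat_dim=8, hidden=4, vocab=5, seed=0):
        rng = random.Random(seed)
        # Assumption: the first encoder serves the verbatim branch.
        self.verbatim_encoder = linear(feat_dim, hidden, rng)
        # Assumption: the second encoder is cascaded on the first one's output.
        self.subtitle_encoder = linear(hidden, hidden, rng)
        # Separate decoders; stand-ins that map each frame to vocabulary scores.
        self.verbatim_decoder = linear(hidden, vocab, rng)
        self.subtitle_decoder = linear(hidden, vocab, rng)

    def forward(self, frames):
        h_verbatim = self.verbatim_encoder(frames)
        h_subtitle = self.subtitle_encoder(h_verbatim)
        # Both outputs come from a single pass over the same audio features.
        return {
            "verbatim": self.verbatim_decoder(h_verbatim),
            "subtitle": self.subtitle_decoder(h_subtitle),
        }


model = CascadedTwoDecoderModel()
frames = [[0.1] * 8 for _ in range(3)]  # three dummy acoustic frames
outputs = model.forward(frames)
```

Here `outputs` holds one score sequence per output type, which mirrors the idea of producing a verbatim transcription and a subtitle jointly from shared lower-level representations.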

* Preprint
