Théo Johannet

ASR Benchmarking: Need for a More Representative Conversational Dataset

Sep 18, 2024

Gaurav Maheshwari, Dmitry Ivanov, Théo Johannet, Kevin El Haddad

Abstract: Automatic Speech Recognition (ASR) systems have achieved remarkable performance on widely used benchmarks such as LibriSpeech and Fleurs. However, these benchmarks do not adequately reflect the complexities of real-world conversational environments, where speech is often unstructured and contains disfluencies such as pauses, interruptions, and diverse accents. In this study, we introduce a multilingual conversational dataset, derived from TalkBank, consisting of unstructured phone conversations between adults. Our results show a significant performance drop across various state-of-the-art ASR models when tested in conversational settings. Furthermore, we observe a correlation between Word Error Rate and the presence of speech disfluencies, highlighting the critical need for more realistic, conversational ASR benchmarks.
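
For readers unfamiliar with the metric named in the abstract, Word Error Rate is conventionally the word-level edit distance (substitutions, deletions, insertions) between a reference transcript and the ASR output, divided by the number of reference words. The following is a minimal sketch of that standard definition, not the authors' evaluation code: the function name `word_error_rate`, the whitespace tokenization, and the example sentences are illustrative assumptions, and real evaluations usually normalize text (casing, punctuation) first.

```python
def word_error_rate(reference: str, hypothesis: str) -> float:
    """Word-level Levenshtein distance divided by the reference length.

    Assumption: tokens are obtained by plain whitespace splitting.
    """
    ref = reference.split()
    hyp = hypothesis.split()
    if not ref:
        raise ValueError("reference transcript must contain at least one word")

    # prev[j] holds the edit distance between the reference words seen so far
    # and the first j hypothesis words.
    prev = list(range(len(hyp) + 1))
    for i, ref_word in enumerate(ref, start=1):
        curr = [i] + [0] * len(hyp)
        for j, hyp_word in enumerate(hyp, start=1):
            substitution = prev[j - 1] + (0 if ref_word == hyp_word else 1)
            deletion = prev[j] + 1       # reference word missing from hypothesis
            insertion = curr[j - 1] + 1  # extra word in hypothesis
            curr[j] = min(substitution, deletion, insertion)
        prev = curr
    return prev[-1] / len(ref)


# Hypothetical example: the reference keeps a filler and a repetition,
# and the hypothesis drops both, giving two deletions over seven words.
wer = word_error_rate("i uh i think we should go", "i think we should go")
print(round(wer, 3))
```

Because disfluencies such as fillers and repetitions appear in verbatim reference transcripts, a system that omits them is charged with deletions, which is one way a link between WER and disfluency rate can show up.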
