Adam Janin

Toward Zero Oracle Word Error Rate on the Switchboard Benchmark

Jun 13, 2022

Arlo Faria, Adam Janin, Korbinian Riedhammer, Sidhi Adkoli

Abstract: The "Switchboard benchmark" is a very well-known test set in automatic speech recognition (ASR) research, used to establish record-setting performance for systems that claim human-level transcription accuracy. This work highlights lesser-known practical considerations of this evaluation, demonstrating major improvements in word error rate (WER) by correcting the reference transcriptions and deviating from the official scoring methodology. In this more detailed and reproducible scheme, even commercial ASR systems can score below 5% WER, and the established record for a research system is lowered to 2.3%. An alternative metric of transcript precision is proposed, which does not penalize deletions and appears to be more discriminating for human vs. machine performance. While commercial ASR systems are still below this threshold, a research system is shown to clearly surpass the accuracy of commercial human speech recognition. This work also explores using standardized scoring tools to compute oracle WER by selecting the best among a list of alternatives. A phrase alternatives representation is compared to utterance-level N-best lists and word-level data structures; using dense lattices and adding out-of-vocabulary words, this achieves an oracle WER of 0.18%.
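
The quantities named in this abstract can be illustrated with a small sketch. It is not the authors' scoring pipeline: the official Switchboard evaluation is typically run with NIST's sclite plus transcript normalization rules, which this omits, and it assumes whitespace-tokenized, already-normalized transcripts. It computes WER as (S + D + I) / N from a minimum-edit-distance alignment, and oracle WER by picking the lowest-error hypothesis per utterance from an N-best list. The precision formula, C divided by the number of hypothesis words, is an assumption about what a deletion-insensitive metric looks like; the abstract does not define it. All function names are hypothetical.

```python
from typing import List, Sequence, Tuple


def align_counts(ref: Sequence[str], hyp: Sequence[str]) -> Tuple[int, int, int, int]:
    """Return (correct, substitutions, deletions, insertions) for a
    minimum-edit-distance alignment of hyp against ref (unit costs)."""
    n, m = len(ref), len(hyp)
    cost = [[0] * (m + 1) for _ in range(n + 1)]
    for i in range(1, n + 1):
        cost[i][0] = i  # delete every reference word
    for j in range(1, m + 1):
        cost[0][j] = j  # insert every hypothesis word
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            diag = cost[i - 1][j - 1] + (ref[i - 1] != hyp[j - 1])
            cost[i][j] = min(diag, cost[i - 1][j] + 1, cost[i][j - 1] + 1)

    # Backtrace one optimal path and count each edit type.
    c = s = d = ins = 0
    i, j = n, m
    while i > 0 or j > 0:
        if i > 0 and j > 0 and cost[i][j] == cost[i - 1][j - 1] + (ref[i - 1] != hyp[j - 1]):
            if ref[i - 1] == hyp[j - 1]:
                c += 1
            else:
                s += 1
            i, j = i - 1, j - 1
        elif i > 0 and cost[i][j] == cost[i - 1][j] + 1:
            d += 1
            i -= 1
        else:
            ins += 1
            j -= 1
    return c, s, d, ins


def wer(ref: Sequence[str], hyp: Sequence[str]) -> float:
    """Word error rate: (S + D + I) / number of reference words."""
    _, s, d, ins = align_counts(ref, hyp)
    return (s + d + ins) / len(ref)


def precision(ref: Sequence[str], hyp: Sequence[str]) -> float:
    """ASSUMED precision: correct words / hypothesis words.
    Deletions never appear in the denominator, so they are not penalized."""
    if not hyp:
        return 1.0  # assumed convention: an empty output contains no errors
    c, _, _, _ = align_counts(ref, hyp)
    return c / len(hyp)


def oracle_wer(refs: List[Sequence[str]], nbest_lists: List[List[Sequence[str]]]) -> float:
    """Corpus-level oracle WER: for each utterance keep the hypothesis
    with the fewest errors, then divide total errors by total reference words."""
    errors = 0
    words = 0
    for ref, nbest in zip(refs, nbest_lists):
        errors += min(sum(align_counts(ref, h)[1:]) for h in nbest)
        words += len(ref)
    return errors / words


ref = "uh i think so".split()
hyp = "i think that so".split()
print(align_counts(ref, hyp))         # (3, 0, 1, 1)
print(wer(ref, hyp))                  # 0.5
print(precision(ref, hyp))            # 0.75
print(oracle_wer([ref], [[hyp, ref]]))  # 0.0
```

Because the oracle picks the best entry per utterance, its value can only fall as the list of alternatives grows, which is why richer structures such as dense lattices can drive it toward zero.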

* Submitted to Interspeech 2022

The YLI-MED Corpus: Characteristics, Procedures, and Plans

Mar 13, 2015

Julia Bernd, Damian Borth, Benjamin Elizalde, Gerald Friedland, Heather Gallagher, Luke Gottlieb, Adam Janin, Sara Karabashlieva, Jocelyn Takahashi, Jennifer Won

Abstract: The YLI Multimedia Event Detection corpus is a public-domain index of videos with annotations and computed features, specialized for research in multimedia event detection (MED), i.e., automatically identifying what is happening in a video by analyzing its audio and visual content. The videos indexed in the YLI-MED corpus are a subset of the larger YLI feature corpus, which is being developed by the International Computer Science Institute and Lawrence Livermore National Laboratory based on the Yahoo Flickr Creative Commons 100 Million (YFCC100M) dataset. The videos in YLI-MED are categorized as depicting one of ten target events, or no target event, and are annotated for additional attributes such as the language spoken and whether the video has a musical score. The annotations also include the degree of annotator agreement and the average annotator confidence score for the event categorization of each video. Version 1.0 of YLI-MED includes 1,823 "positive" videos that depict the target events and 48,138 "negative" videos, as well as 177 supplementary videos that are similar to event videos but are not positive examples. Our goal in producing YLI-MED is to be as open about our data and procedures as possible. This report describes the procedures used to collect the corpus; gives detailed descriptive statistics about the corpus makeup (and how video attributes affected annotators' judgments); discusses possible biases in the corpus introduced by our procedural choices and compares it with the most similar existing dataset, TRECVID MED's HAVIC corpus; and gives an overview of our future plans for expanding the annotation effort.

* 47 pages; 3 figures; 25 tables. Also published as ICSI Technical Report TR-15-001