Hendrik Schreiber

Jam-ALT: A Formatting-Aware Lyrics Transcription Benchmark

Nov 23, 2023

Ondřej Cífka, Constantinos Dimitriou, Cheng-i Wang, Hendrik Schreiber, Luke Miner, Fabian-Robert Stöter

Abstract: Current automatic lyrics transcription (ALT) benchmarks focus exclusively on word content and ignore the finer nuances of written lyrics including formatting and punctuation, which leads to a potential misalignment with the creative products of musicians and songwriters as well as listeners' experiences. For example, line breaks are important in conveying information about rhythm, emotional emphasis, rhyme, and high-level structure. To address this issue, we introduce Jam-ALT, a new lyrics transcription benchmark based on the JamendoLyrics dataset. Our contribution is twofold. Firstly, a complete revision of the transcripts, geared specifically towards ALT evaluation by following a newly created annotation guide that unifies the music industry's guidelines, covering aspects such as punctuation, line breaks, spelling, background vocals, and non-word sounds. Secondly, a suite of evaluation metrics designed, unlike the traditional word error rate, to capture such phenomena. We hope that the proposed benchmark contributes to the ALT task, enabling more precise and reliable assessments of transcription systems and enhancing the user experience in lyrics applications such as subtitle renderings for live captioning or karaoke.

* 6 pages (3 pages main content); website: https://audioshake.github.io/jam-alt/; data: https://huggingface.co/datasets/audioshake/jam-alt; code: https://github.com/audioshake/alt-eval/

Musical Tempo and Key Estimation using Convolutional Neural Networks with Directional Filters

Mar 26, 2019

Hendrik Schreiber, Meinard Müller

Abstract: In this article we explore how the different semantics of spectrograms' time and frequency axes can be exploited for musical tempo and key estimation using Convolutional Neural Networks (CNN). By addressing both tasks with the same network architectures ranging from shallow, domain-specific approaches to deep variants with directional filters, we show that axis-aligned architectures perform similarly well as common VGG-style networks developed for computer vision, while being less vulnerable to confounding factors and requiring fewer model parameters.

* Sound & Music Computing Conference (SMC), Málaga, Spain, May 2019
