
Christof Weiß

Soft Dynamic Time Warping for Multi-Pitch Estimation and Beyond

Apr 11, 2023

Michael Krause, Christof Weiß, Meinard Müller

Abstract: Many tasks in music information retrieval (MIR) involve weakly aligned data, where exact temporal correspondences are unknown. The connectionist temporal classification (CTC) loss is a standard technique to learn feature representations based on weakly aligned training data. However, CTC is limited to discrete-valued target sequences and can be difficult to extend to multi-label problems. In this article, we show how soft dynamic time warping (SoftDTW), a differentiable variant of classical DTW, can be used as an alternative to CTC. Using multi-pitch estimation as an example scenario, we show that SoftDTW yields results on par with a state-of-the-art multi-label extension of CTC. In addition to being more elegant in terms of its algorithmic formulation, SoftDTW naturally extends to real-valued target sequences.

* Accepted at ICASSP 2023
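
To make the idea concrete: the SoftDTW value can be computed with the usual DTW recursion, with the hard minimum replaced by a smoothed minimum controlled by a parameter gamma. The following is a minimal NumPy sketch of the forward pass only. It assumes a squared Euclidean local cost between feature vectors (for example, network outputs and multi-hot pitch targets). The function names and the cost choice are illustrative assumptions, not the authors' implementation. For training, the same recursion would be written in an autodiff framework so that gradients reach the network outputs.

```python
import numpy as np


def soft_min(values, gamma):
    """Smoothed minimum: -gamma * log(sum(exp(-v / gamma))), computed stably."""
    v = np.asarray(values, dtype=float)
    # logaddexp.reduce handles +inf entries (they become -inf and drop out)
    return -gamma * np.logaddexp.reduce(-v / gamma)


def soft_dtw(X, Y, gamma=1.0):
    """SoftDTW value between X (N x d) and Y (M x d), squared Euclidean cost (assumed)."""
    X = np.asarray(X, dtype=float)
    Y = np.asarray(Y, dtype=float)
    N, M = len(X), len(Y)
    # Local cost matrix C[n, m] = ||X[n] - Y[m]||^2
    C = ((X[:, None, :] - Y[None, :, :]) ** 2).sum(axis=-1)
    # Accumulated cost matrix; the infinite border enforces the boundary condition
    R = np.full((N + 1, M + 1), np.inf)
    R[0, 0] = 0.0
    for n in range(1, N + 1):
        for m in range(1, M + 1):
            # Same three predecessors as classical DTW, combined with a soft minimum
            R[n, m] = C[n - 1, m - 1] + soft_min(
                (R[n - 1, m - 1], R[n - 1, m], R[n, m - 1]), gamma
            )
    return R[N, M]


X = [[0.0], [1.0], [2.0]]
Y = [[0.0], [2.0]]
print(soft_dtw(X, Y, gamma=1e-3))  # close to the classical DTW cost of 1.0
print(soft_dtw(X, Y, gamma=1.0))   # smaller, since the soft minimum is <= the minimum
```

As gamma approaches 0, the value approaches the classical DTW cost; larger values of gamma give a smoother loss. The local cost accepts arbitrary real-valued vectors, so nothing in the recursion requires discrete target labels. This is the property the abstract contrasts with CTC.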

Deep-Learning Architectures for Multi-Pitch Estimation: Towards Reliable Evaluation

Feb 18, 2022

Christof Weiß, Geoffroy Peeters

Abstract: Extracting pitch information from music recordings is a challenging but important problem in music signal processing. Frame-wise transcription, or multi-pitch estimation, aims at detecting the simultaneous activity of pitches in polyphonic music recordings and has recently seen major improvements thanks to deep-learning techniques, with a variety of proposed network architectures. In this paper, we realize different architectures based on CNNs, the U-net structure, and self-attention components. We propose several modifications to these architectures, including self-attention modules for skip connections, recurrent layers to replace the self-attention, and a multi-task strategy with simultaneous prediction of the degree of polyphony. We compare variants of these architectures in different sizes for multi-pitch estimation, focusing on Western classical music beyond the piano-solo scenario using the MusicNet and Schubert Winterreise datasets. Our experiments indicate that most architectures yield competitive results and that larger model variants seem to be beneficial. However, we find that these results substantially depend on randomization effects and the particular choice of the training-test split, which calls into question claims of superiority for particular architectures that rest on only small improvements. We therefore investigate the influence of dataset splits in the presence of several movements of a work cycle (cross-version evaluation) and propose a best-practice splitting strategy for MusicNet, which weakens the influence of individual test tracks and suppresses overfitting to specific works and recording conditions. A final evaluation on a mixed dataset suggests that improvements on one specific dataset do not necessarily generalize to other scenarios, thus emphasizing the need for further high-quality multi-pitch datasets in order to reliably measure progress in music transcription tasks.
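
The point about dataset splits can be illustrated with a split that keeps all tracks of the same work on the same side. With such a split, the test set never contains a work that was seen during training. The sketch below is a hypothetical, generic work-disjoint split. The `(track_id, work_id)` metadata format, the function name `split_by_work`, and its parameters are assumptions for illustration. It does not reproduce the specific best-practice MusicNet split proposed in the paper.

```python
import random
from collections import defaultdict


def split_by_work(tracks, test_fraction=0.2, seed=0):
    """Hypothetical work-disjoint split.

    tracks: iterable of (track_id, work_id) pairs (assumed metadata format).
    Returns (train_ids, test_ids) such that no work_id appears in both.
    """
    # Group track ids by the work they belong to
    by_work = defaultdict(list)
    for track_id, work_id in tracks:
        by_work[work_id].append(track_id)

    # Shuffle works (not tracks) so that whole works move between splits
    works = sorted(by_work)
    random.Random(seed).shuffle(works)
    n_test = max(1, round(test_fraction * len(works)))
    test_works = set(works[:n_test])

    train_ids = [t for w in works if w not in test_works for t in by_work[w]]
    test_ids = [t for w in works if w in test_works for t in by_work[w]]
    return train_ids, test_ids


tracks = [("a1", "A"), ("a2", "A"), ("b1", "B"), ("c1", "C"), ("d1", "D"), ("e1", "E")]
train, test = split_by_work(tracks, test_fraction=0.2, seed=0)
print(train, test)
```

One way to expose the randomization effects mentioned in the abstract is to repeat such a split with several seeds and report the spread of the resulting scores instead of a single number.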
