Title: Mel-RoFormer for Vocal Separation and Vocal Melody Transcription

Sep 07, 2024

Ju-Chiang Wang, Wei-Tsung Lu, Jitong Chen


Abstract: Developing a versatile deep neural network to model music audio is crucial in music information retrieval (MIR). This task is challenging because of the intricate spectral variations inherent in music signals, which convey the melody, harmonics, and timbres of diverse instruments. In this paper, we introduce Mel-RoFormer, a spectrogram-based model featuring two key designs: a novel Mel-band Projection module at the front-end to enhance the model's capability to capture informative features across multiple frequency bands, and interleaved RoPE Transformers to explicitly model the frequency and time dimensions as two separate sequences. We apply Mel-RoFormer to two essential MIR tasks: vocal separation, which isolates singing voices from audio mixtures, and vocal melody transcription, which transcribes their lead melodies. Although both tasks focus on singing signals, they have distinct optimization objectives. Instead of training a unified model, we adopt a two-step approach: we first train a vocal separation model, which then serves as a foundation model that is fine-tuned for vocal melody transcription. Through extensive experiments on benchmark datasets, we show that our models achieve state-of-the-art performance in both vocal separation and melody transcription, underscoring the efficacy and versatility of Mel-RoFormer in modeling complex music audio signals.
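To make the idea of grouping a spectrogram into mel-scaled frequency bands more concrete, here is a minimal sketch in plain Python. It is not the authors' Mel-band Projection module: the function names, the parameter values (`n_bands=8`, `n_fft=512`, `sample_rate=16000`), and the choice to let neighbouring bands overlap are all assumptions borrowed from a conventional mel filterbank layout. It only shows how FFT bins could be assigned to bands that are narrow at low frequencies and wide at high frequencies.

```python
import math


def hz_to_mel(f_hz):
    # Common (HTK-style) mel formula; an assumption, not taken from the paper.
    return 2595.0 * math.log10(1.0 + f_hz / 700.0)


def mel_to_hz(m):
    # Inverse of hz_to_mel.
    return 700.0 * (10.0 ** (m / 2595.0) - 1.0)


def mel_band_bins(n_bands, n_fft, sample_rate):
    """Return, for each band, the list of FFT bin indices it covers.

    Hypothetical helper: each band spans two adjacent intervals on the
    mel axis, so neighbouring bands share some bins.
    """
    n_bins = n_fft // 2 + 1
    max_mel = hz_to_mel(sample_rate / 2.0)
    # n_bands + 2 points equally spaced on the mel axis, converted to Hz.
    edges_hz = [mel_to_hz(max_mel * i / (n_bands + 1)) for i in range(n_bands + 2)]
    bin_hz = sample_rate / n_fft  # frequency spacing between FFT bins
    bands = []
    for b in range(n_bands):
        lo, hi = edges_hz[b], edges_hz[b + 2]
        bands.append([k for k in range(n_bins) if lo <= k * bin_hz <= hi])
    return bands


# Illustrative values only.
bands = mel_band_bins(n_bands=8, n_fft=512, sample_rate=16000)
for i, bins in enumerate(bands):
    print(f"band {i}: bins {bins[0]}..{bins[-1]} ({len(bins)} bins)")
```

In a model, each band's bins would typically be fed to a band-specific learned projection to produce one feature vector per band and time frame; that step is omitted here because the abstract does not specify it.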

* Accepted to appear in ISMIR 2024
