Jinglin Bai

A Two-Stage Band-Split Mamba-2 Network for Music Separation

Sep 10, 2024

Jinglin Bai, Yuan Fang, Jiajie Wang, Xueliang Zhang

Abstract: Music source separation (MSS) aims to separate mixed music into its distinct tracks, such as vocals, bass, and drums. MSS is considered a challenging audio separation task because of the complexity of music signals. Although RNN and Transformer architectures are not perfect, they are commonly used to model the music sequence for MSS. Recently, Mamba-2 has demonstrated high efficiency in various sequence modeling tasks, but its superiority has not been investigated in MSS. This paper applies Mamba-2 with a two-stage strategy that introduces residual mapping based on the mask method, compensating for details absent from the mask and further improving separation performance. Experiments confirm the superiority of bidirectional Mamba-2 and the effectiveness of the two-stage network in MSS. The source code is publicly accessible at https://github.com/baijinglin/TS-BSmamba2.

Attention-Based Beamformer For Multi-Channel Speech Enhancement

Sep 10, 2024

Jinglin Bai, Hao Li, Xueliang Zhang, Fei Chen

Abstract: Minimum Variance Distortionless Response (MVDR) is a classical adaptive beamformer that theoretically ensures the distortionless transmission of signals from the target direction. In practice, its noise-reduction performance depends on the accuracy of the noise spatial covariance matrix (SCM) estimate. Although deep learning has recently shown remarkable performance in multi-channel speech enhancement, the distortionless-response property still makes MVDR highly popular in real applications. In this paper, we propose an attention-based mechanism to calculate the speech and noise SCMs and then apply MVDR to obtain the enhanced speech. Moreover, a deep learning architecture using the inplace convolution operator and frequency-independent LSTM proves effective in facilitating SCM estimation. The model is optimized in an end-to-end manner. Experimental results indicate that the proposed method is extremely effective in tracking moving or stationary speakers under non-causal and causal conditions, outperforming other baselines. It is worth mentioning that our model has only 0.35 million parameters, making it easy to deploy on edge devices.
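
The MVDR step described in the abstract can be illustrated with a minimal NumPy sketch. It assumes the common reference-channel formulation, w = (Φ_n⁻¹ Φ_s) u / trace(Φ_n⁻¹ Φ_s), which may differ from the exact form used in the paper. The function names `mvdr_weights` and `apply_beamformer` are hypothetical, and the attention-based SCM estimation itself is not shown: the speech and noise SCMs are taken as given inputs for a single frequency bin.

```python
import numpy as np


def mvdr_weights(speech_scm, noise_scm, ref_channel=0):
    """MVDR weights for one frequency bin (reference-channel form, an assumption).

    speech_scm, noise_scm: (channels, channels) complex Hermitian matrices.
    Returns a (channels,) complex weight vector.
    """
    # Solve noise_scm @ X = speech_scm instead of forming an explicit inverse.
    numerator = np.linalg.solve(noise_scm, speech_scm)
    # Normalizing by the trace enforces the distortionless constraint
    # with respect to the chosen reference channel.
    return numerator[:, ref_channel] / np.trace(numerator)


def apply_beamformer(weights, mixture):
    """Compute w^H x for one time-frequency bin; np.vdot conjugates `weights`."""
    return np.vdot(weights, mixture)


# Toy check with a rank-1 speech SCM built from a made-up steering vector.
steering = np.array([1.0 + 0.0j, 0.5 + 0.5j, -0.3 + 0.8j])
mixing = np.array([[1.0, 0.2j, 0.1], [0.3, 1.0, 0.2], [0.1j, 0.4, 1.0]], dtype=complex)
noise_scm = mixing @ mixing.conj().T + 0.1 * np.eye(3)  # Hermitian, positive definite
speech_scm = 2.0 * np.outer(steering, steering.conj())

weights = mvdr_weights(speech_scm, noise_scm, ref_channel=0)
# The target component at the reference channel passes through unchanged.
target_out = apply_beamformer(weights, steering)
```

With a rank-1 speech SCM, `target_out` equals the steering vector's reference-channel entry up to floating-point error, which is the distortionless property the abstract refers to. The quality of the noise SCM estimate only affects how much residual noise remains, not this constraint.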
