Mateusz Matuszewski

Efficient Low-Latency Speech Enhancement with Mobile Audio Streaming Networks

Aug 17, 2020

Michał Romaniuk, Piotr Masztalski, Karol Piaskowski, Mateusz Matuszewski

Abstract: We propose Mobile Audio Streaming Networks (MASnet) for efficient low-latency speech enhancement, which is particularly suitable for mobile devices and other applications where computational capacity is a limitation. MASnet processes linear-scale spectrograms, transforming successive noisy frames into complex-valued ratio masks which are then applied to the respective noisy frames. MASnet can operate in a low-latency incremental inference mode which matches the complexity of layer-by-layer batch mode. Compared to a similar fully-convolutional architecture, MASnet incorporates depthwise and pointwise convolutions for a large reduction in fused multiply-accumulate operations per second (FMA/s), at the cost of some reduction in SNR.
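As a rough illustration of two ideas named in the abstract, the sketch below shows (a) how a complex-valued ratio mask can be applied to one noisy spectrogram frame by per-bin complex multiplication, and (b) why replacing a standard convolution with a depthwise plus pointwise pair lowers the multiply-accumulate count. This is not the MASnet implementation: the function names and the layer sizes (`K`, `C_IN`, `C_OUT`) are hypothetical and chosen only for the example, and the counts are per output position for a generic 1-D convolution.

```python
def apply_complex_mask(noisy_frame, mask):
    """Multiply each complex frequency bin of a frame by its complex mask value."""
    if len(noisy_frame) != len(mask):
        raise ValueError("frame and mask must have the same number of bins")
    return [m * x for m, x in zip(mask, noisy_frame)]


def standard_conv_macs(kernel_size, in_channels, out_channels):
    """Multiply-accumulates per output position for a standard 1-D convolution."""
    return kernel_size * in_channels * out_channels


def separable_conv_macs(kernel_size, in_channels, out_channels):
    """Multiply-accumulates for a depthwise conv followed by a pointwise (1x1) conv."""
    depthwise = kernel_size * in_channels   # one filter per input channel
    pointwise = in_channels * out_channels  # 1x1 mixing across channels
    return depthwise + pointwise


# Hypothetical layer sizes, for illustration only (not taken from the paper).
K, C_IN, C_OUT = 3, 64, 64
print(standard_conv_macs(K, C_IN, C_OUT))   # 12288
print(separable_conv_macs(K, C_IN, C_OUT))  # 4288

# Toy two-bin frame and mask, also hypothetical.
print(apply_complex_mask([1 + 1j, 2 + 0j], [0.5 + 0j, 0 + 1j]))
```

With these assumed sizes the separable pair needs roughly a third of the multiply-accumulates of the standard layer; the actual savings reported for MASnet depend on its real architecture.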

* Accepted for INTERSPEECH 2020

StoRIR: Stochastic Room Impulse Response Generation for Audio Data Augmentation

Aug 17, 2020

Piotr Masztalski, Mateusz Matuszewski, Karol Piaskowski, Michał Romaniuk

Abstract: In this paper we introduce StoRIR, a stochastic room impulse response generation method dedicated to audio data augmentation in machine learning applications. In contrast to geometrical methods such as image-source or ray tracing, this technique does not require prior definition of room geometry, absorption coefficients, or microphone and source placement, and depends solely on the acoustic parameters of the room. The method is intuitive, easy to implement, and can generate RIRs of very complicated enclosures. We show that StoRIR, when used for audio data augmentation in a speech enhancement task, allows deep learning models to achieve better results on a wide range of metrics than the conventional image-source method, improving many of them by more than 5%. We publish a Python implementation of StoRIR online.
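To make the general idea of a geometry-free, parameter-driven impulse response concrete, here is a minimal sketch of a common textbook approach: Gaussian noise shaped by an exponential envelope that decays by 60 dB over a chosen reverberation time (RT60). This is not the StoRIR algorithm and not the authors' published code; the function name `decaying_noise_rir` and its parameters are hypothetical, and RT60 is only assumed to be one of the "acoustic parameters" a method of this kind might use.

```python
import math
import random


def decaying_noise_rir(rt60_s, sample_rate=16000, seed=None):
    """Return a toy RIR: Gaussian noise with a 60 dB decay over rt60_s seconds."""
    if rt60_s <= 0:
        raise ValueError("rt60_s must be positive")
    rng = random.Random(seed)
    n_samples = max(1, int(rt60_s * sample_rate))
    # Amplitude envelope 10 ** (-3 * t / rt60) reaches -60 dB at t = rt60.
    decay = 3.0 * math.log(10.0) / rt60_s
    rir = [
        rng.gauss(0.0, 1.0) * math.exp(-decay * i / sample_rate)
        for i in range(n_samples)
    ]
    rir[0] = 1.0  # crude stand-in for the direct sound
    peak = max(abs(v) for v in rir)
    return [v / peak for v in rir]  # normalize so the largest sample has magnitude 1


# Example: a 0.5 s reverberation time at 16 kHz (values chosen arbitrarily).
rir = decaying_noise_rir(0.5, sample_rate=16000, seed=0)
print(len(rir))
```

For augmentation, a response like this would typically be convolved with clean speech to simulate reverberation; drawing `rt60_s` at random per training example is one simple way to vary the simulated rooms.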

* Accepted for INTERSPEECH 2020
