Title: On Batching Variable Size Inputs for Training End-to-End Speech Enhancement Systems

Jan 25, 2023

Philippe Gonzalez, Tommy Sonne Alstrøm, Tobias May

Abstract: The performance of neural network-based speech enhancement systems is primarily influenced by the model architecture, whereas training times and computational resource utilization are primarily affected by training parameters such as the batch size. Since noisy and reverberant speech mixtures can have different durations, a batching strategy is required to handle variable size inputs during training, in particular for state-of-the-art end-to-end systems. Such strategies usually strike a compromise between zero-padding and data randomization, and can be combined with a dynamic batch size for a more consistent amount of data in each batch. However, the effect of these practices on resource utilization and, more importantly, network performance is not well documented. This paper is an empirical study of the effect of different batching strategies and batch sizes on the training statistics and speech enhancement performance of a Conv-TasNet, evaluated in both matched and mismatched conditions. We find that using a small batch size during training improves performance in both conditions for all batching strategies. Moreover, using sorted or bucket batching with a dynamic batch size allows for reduced training time and GPU memory usage while achieving similar performance compared to random batching with a fixed batch size.
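
To make the trade-off between zero-padding and randomization concrete, the following is a minimal sketch of bucket batching with a dynamic batch size. It is not the authors' implementation: the function names (`bucket_batches`, `padding_fraction`), the equal-count bucket split, and the definition of the dynamic batch size as a cap on the padded batch size (longest input times number of inputs) are all assumptions made for illustration.

```python
import random


def bucket_batches(lengths, num_buckets, max_padded_size, seed=0):
    """Group input indices into batches of similar length.

    Hypothetical helper, not taken from the paper. `lengths[i]` is the
    duration of input i (in samples or seconds). `max_padded_size` caps
    the padded size of a batch: longest input times number of inputs.
    """
    if not lengths:
        return []
    rng = random.Random(seed)
    # Sort indices by length, then cut the sorted list into contiguous buckets.
    order = sorted(range(len(lengths)), key=lambda i: lengths[i])
    bucket_size = -(-len(order) // num_buckets)  # ceiling division
    batches = []
    for start in range(0, len(order), bucket_size):
        bucket = order[start:start + bucket_size]
        rng.shuffle(bucket)  # randomize only among inputs of similar length
        batch, longest = [], 0
        for i in bucket:
            new_longest = max(longest, lengths[i])
            # Close the current batch if adding input i would exceed the cap.
            if batch and new_longest * (len(batch) + 1) > max_padded_size:
                batches.append(batch)
                batch, new_longest = [], lengths[i]
            batch.append(i)
            longest = new_longest
        if batch:
            batches.append(batch)
    rng.shuffle(batches)  # randomize the order in which batches are seen
    return batches


def padding_fraction(lengths, batches):
    """Fraction of the padded batches that consists of zero-padding."""
    padded = sum(max(lengths[i] for i in b) * len(b) for b in batches)
    real = sum(lengths[i] for b in batches for i in b)
    return 1.0 - real / padded if padded else 0.0
```

In this sketch, setting `num_buckets=1` shuffles all inputs together, which approximates random batching, while a large `num_buckets` approaches sorted batching: less padding, but less randomization. An input longer than `max_padded_size` is placed in a batch of its own rather than dropped.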
