In recent neural-network-based monaural speech separation, it has become common to train with the permutation invariant training (PIT) loss. However, the ordinary PIT requires trying all $N!$ permutations between the $N$ ground truths and the $N$ estimates. Since this factorial complexity grows very rapidly with $N$, PIT-based training is practical only when the number of source signals is small, such as $N = 2$ or $3$. To overcome this limitation, this paper proposes SinkPIT, a novel variant of the PIT loss, which is much more efficient than the ordinary PIT loss when $N$ is large. SinkPIT is based on Sinkhorn's matrix balancing algorithm, which efficiently finds a doubly stochastic matrix that approximates the best permutation in a differentiable manner. The author conducted an experiment in which a neural network model was trained with SinkPIT to decompose a single-channel mixture into 10 sources, and obtained promising results.
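To make the idea concrete, the following is a minimal PyTorch-style sketch of a Sinkhorn-based relaxation of the PIT loss, not the paper's actual implementation; the function name, the inverse-temperature parameter `beta`, and the iteration count are illustrative assumptions.

```python
import torch

def sinkhorn_pit_loss(C: torch.Tensor, beta: float = 10.0, n_iters: int = 50) -> torch.Tensor:
    """Soft permutation-invariant loss via Sinkhorn balancing.

    C is an (N, N) pairwise loss matrix where C[i, j] is the loss
    between ground-truth source i and estimated source j
    (e.g., a negative SI-SDR). All steps are differentiable.
    """
    log_Q = -beta * C
    for _ in range(n_iters):
        # Alternately normalize rows and columns in the log domain;
        # exp(log_Q) converges toward a doubly stochastic matrix.
        log_Q = log_Q - torch.logsumexp(log_Q, dim=1, keepdim=True)
        log_Q = log_Q - torch.logsumexp(log_Q, dim=0, keepdim=True)
    Q = log_Q.exp()
    # Expected loss under the soft assignment Q. As beta grows, Q
    # approaches the loss-minimizing permutation matrix, so this value
    # approaches the ordinary PIT loss, at O(N^2) cost per iteration
    # rather than O(N!) enumeration.
    return (Q * C).sum()
```

For example, with $N = 10$ one would build the $10 \times 10$ pairwise loss matrix between ground truths and estimates and backpropagate through `sinkhorn_pit_loss` directly, avoiding the $10! \approx 3.6 \times 10^6$ permutations that exact PIT would enumerate.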