Abstract: We introduce LMCodec, a causal neural speech codec that provides high-quality audio at very low bitrates. The backbone of the system is a causal convolutional codec that encodes audio into a hierarchy of coarse-to-fine tokens using residual vector quantization. LMCodec trains a Transformer language model to predict the fine tokens from the coarse ones in a generative fashion, allowing for the transmission of fewer codes. A second Transformer predicts the uncertainty of the next codes given the past transmitted codes, and is used to perform conditional entropy coding. A MUSHRA subjective test shows that the quality of LMCodec is comparable to that of reference codecs operating at higher bitrates. Example audio is available at https://mjenrungrot.github.io/chrome-media-audio-papers/publications/lmcodec.
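The coarse-to-fine hierarchy comes from residual vector quantization, where each stage quantizes the residual left by the previous stage. Below is a minimal NumPy sketch of that mechanism only; the codebooks, sizes, and the `rvq_encode`/`rvq_decode` names are illustrative placeholders rather than the paper's implementation (in the real codec the codebooks are learned jointly with the encoder and decoder).

```python
import numpy as np

rng = np.random.default_rng(0)
num_stages, codebook_size, dim = 4, 256, 64
# Random placeholder codebooks; they only illustrate the data flow,
# not actual compression quality.
codebooks = rng.standard_normal((num_stages, codebook_size, dim))

def rvq_encode(x, codebooks):
    """Quantize x in stages; each stage encodes the residual of the last."""
    residual, tokens = x, []
    for cb in codebooks:
        idx = int(np.argmin(np.linalg.norm(cb - residual, axis=1)))
        tokens.append(idx)
        residual = residual - cb[idx]
    return tokens  # tokens[0] is the coarsest code; later entries refine it

def rvq_decode(tokens, codebooks):
    """Sum the chosen codewords; truncating tokens gives a coarser decode."""
    return sum(cb[i] for cb, i in zip(codebooks, tokens))

x = rng.standard_normal(dim)                 # stand-in for one encoder frame
tokens = rvq_encode(x, codebooks)
coarse = rvq_decode(tokens[:2], codebooks)   # only coarse tokens transmitted
full = rvq_decode(tokens, codebooks)         # fine tokens regenerated at the receiver
print(tokens, np.linalg.norm(x - coarse), np.linalg.norm(x - full))
```

In LMCodec, only the coarse tokens need to cross the channel: the Transformer language model generates the fine tokens at the receiver, and the second Transformer's predictive distributions over the next codes drive the conditional entropy coder.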
Abstract: Given a multi-microphone recording of an unknown number of speakers talking concurrently, we simultaneously localize the sources and separate the individual speakers. At the core of our method is a deep network, in the waveform domain, which isolates sources within an angular region $\theta \pm w/2$, given an angle of interest $\theta$ and angular window size $w$. By exponentially decreasing $w$, we can perform a binary search to localize and separate all sources in logarithmic time. Our algorithm allows for an arbitrary number of potentially moving speakers at test time, including more speakers than seen during training. Experiments demonstrate state-of-the-art performance for both source separation and source localization, particularly in high levels of background noise.
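The binary search over angular windows can be sketched as a short recursion. The sketch below is illustrative only: `separate_region` is a hypothetical stand-in for the paper's waveform-domain separation network, and `TRUE_ANGLES` is toy data used so the recursion runs end to end.

```python
TRUE_ANGLES = [40.0, 200.0, 310.0]  # toy ground-truth source angles (assumption)

def separate_region(mixture, theta, w):
    """Toy stand-in for the separation network: report a signal and its
    energy only if a source falls inside theta +/- w/2."""
    lo, hi = theta - w / 2.0, theta + w / 2.0
    hit = any(lo <= a < hi for a in TRUE_ANGLES)
    return (f"signal@{theta:.1f}" if hit else None), float(hit)

def localize(mixture, theta=180.0, w=360.0, min_w=2.0, thresh=0.5):
    """Binary search: halve the window theta +/- w/2 and recurse into
    each half, pruning regions with no active source."""
    signal, energy = separate_region(mixture, theta, w)
    if energy < thresh:
        return []                  # no source in this region: prune
    if w <= min_w:
        return [(theta, signal)]   # window narrow enough: report a source
    half = w / 2.0
    return (localize(mixture, theta - half / 2.0, half, min_w, thresh)
            + localize(mixture, theta + half / 2.0, half, min_w, thresh))

# mixture is unused in this toy; it marks where the recorded waveforms go.
print([round(t, 1) for t, _ in localize(mixture=None)])
```

Because each halving prunes empty regions, the number of network invocations grows logarithmically with the angular resolution, which is what lets the method handle an arbitrary, unknown number of speakers.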