Title: ESC: Efficient Speech Coding with Cross-Scale Residual Vector Quantized Transformers

Apr 30, 2024

Yuzhe Gu, Enmao Diao

Figures 1-4 for ESC: Efficient Speech Coding with Cross-Scale Residual Vector Quantized Transformers

Abstract: Existing neural audio codecs usually trade higher computational complexity for audio quality. They build their feature transformation layers mainly from convolutional blocks, which are not inherently suited to capturing the local redundancies of audio signals. To compensate, they require either adversarial losses from a discriminator or a large number of model parameters to improve the codec. To address this, we propose the Efficient Speech Codec (ESC), a lightweight, parameter-efficient codec built on cross-scale residual vector quantization and transformers. Our model uses mirrored hierarchical window-attention transformer blocks and performs step-wise decoding from coarse to fine feature representations. To improve codebook utilization, we design a learning paradigm that adds a pre-training stage to assist codec training. Extensive experiments show that ESC achieves high audio quality at much lower complexity, making it a promising alternative to existing codecs.
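The abstract does not spell out how ESC's cross-scale variant distributes quantizers across the scales of its transformer hierarchy, so the sketch below illustrates only the generic residual vector quantization (RVQ) building block it names: each stage quantizes the error left by the previous stages, and decoding sums the selected codewords. This is a minimal pure-Python sketch under stated assumptions, not the authors' implementation; the function names and the random, shrinking-scale codebooks are hypothetical (real codecs learn their codebooks during training).

```python
import random
from typing import List

Vector = List[float]
Codebook = List[Vector]


def squared_distance(a: Vector, b: Vector) -> float:
    """Squared Euclidean distance between two equal-length vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def nearest_code(x: Vector, codebook: Codebook) -> int:
    """Index of the codeword closest to x (ties go to the lowest index)."""
    return min(range(len(codebook)), key=lambda k: squared_distance(x, codebook[k]))


def rvq_encode(x: Vector, codebooks: List[Codebook]) -> List[int]:
    """Greedy residual VQ: each stage quantizes what earlier stages missed."""
    residual = list(x)
    indices = []
    for codebook in codebooks:
        k = nearest_code(residual, codebook)
        indices.append(k)
        # Subtract the chosen codeword; the next stage sees only the leftover error.
        residual = [r - c for r, c in zip(residual, codebook[k])]
    return indices


def rvq_decode(indices: List[int], codebooks: List[Codebook]) -> Vector:
    """Reconstruction is the sum of the selected codewords across stages."""
    dim = len(codebooks[0][0])
    out = [0.0] * dim
    for k, codebook in zip(indices, codebooks):
        out = [o + c for o, c in zip(out, codebook[k])]
    return out


def make_codebooks(dim: int, size: int, stages: int, seed: int = 0) -> List[Codebook]:
    """Hypothetical random codebooks whose scale halves at each stage.

    Each codebook contains the zero vector, so no stage can increase the error.
    """
    rng = random.Random(seed)
    books, scale = [], 1.0
    for _ in range(stages):
        book = [[0.0] * dim]
        book += [[rng.gauss(0.0, scale) for _ in range(dim)] for _ in range(size - 1)]
        books.append(book)
        scale *= 0.5
    return books


# Usage: quantize one 4-dimensional vector with 1, 2, and 3 stages.
codebooks = make_codebooks(dim=4, size=16, stages=3)
rng = random.Random(1)
x = [rng.gauss(0.0, 1.0) for _ in range(4)]
errors = [
    squared_distance(x, rvq_decode(rvq_encode(x, codebooks[:n]), codebooks[:n]))
    for n in range(1, len(codebooks) + 1)
]
print(errors)  # reconstruction error after each added stage
```

Because encoding is greedy, the first n indices are the same whether or not later stages are used, which is what lets an RVQ codec trade bitrate for quality by dropping trailing stages. Including a zero codeword in every codebook is a simple design choice that guarantees the printed errors never increase from one stage to the next.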

* Preprint