Title: Are discrete units necessary for Spoken Language Modeling?

Mar 11, 2022

Tu Anh Nguyen, Benoit Sagot, Emmanuel Dupoux


Abstract: Recent work in spoken language modeling shows the possibility of learning a language unsupervisedly from raw audio without any text labels. The approach relies first on transforming the audio into a sequence of discrete units (or pseudo-text) and then training a language model directly on such pseudo-text. Is such a discrete bottleneck necessary, potentially introducing irreversible errors in the encoding of the speech signal, or could we learn a language model without discrete units at all? In this work, we show that discretization is indeed essential for good results in spoken language modeling, but that we can omit the discrete bottleneck if we use discrete target features from a higher level than the input features. We also show that an end-to-end model trained with discrete targets, like HuBERT, achieves results similar to those of the best language model trained on pseudo-text on a set of zero-shot spoken language modeling metrics from the Zero Resource Speech Challenge 2021.
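
To make the "discrete units" step concrete, here is a minimal sketch of how a sequence of continuous frame-level features could be turned into pseudo-text by assigning each frame to its nearest centroid. This is an illustration only: the abstract does not say how the units are obtained, so nearest-centroid quantization, the function names `squared_distance` and `quantize`, and the toy centroids and frames are all assumptions. In practice the centroids would typically come from a clustering step such as k-means over features from a speech encoder.

```python
from typing import List, Sequence


def squared_distance(a: Sequence[float], b: Sequence[float]) -> float:
    """Squared Euclidean distance between two feature vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def quantize(frames: Sequence[Sequence[float]],
             centroids: Sequence[Sequence[float]]) -> List[int]:
    """Map each continuous frame to the index of its nearest centroid.

    The returned list of integers plays the role of the pseudo-text
    on which a language model could then be trained.
    """
    units = []
    for frame in frames:
        distances = [squared_distance(frame, c) for c in centroids]
        units.append(distances.index(min(distances)))
    return units


# Hypothetical toy data: three 2-dimensional centroids and four frames.
centroids = [[0.0, 0.0], [1.0, 1.0], [0.0, 2.0]]
frames = [[0.1, -0.1], [0.9, 1.2], [0.2, 1.9], [0.8, 0.9]]

units = quantize(frames, centroids)
print(units)  # [0, 1, 2, 1]
```

The mapping is lossy: the second and fourth frames differ but receive the same unit, which is the kind of irreversible encoding error the abstract asks about.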
