Get our free extension to see links to code for papers anywhere online!Free add-on: code for papers everywhere!Free add-on: See code for papers anywhere!

Add to Chrome

Add to Firefox

Add to Edge

Title:CCC-wav2vec 2.0: Clustering aided Cross Contrastive Self-supervised learning of speech representations

Oct 05, 2022

Vasista Sai Lodagala, Sreyan Ghosh, S. Umesh

Figure 1 for CCC-wav2vec 2.0: Clustering aided Cross Contrastive Self-supervised learning of speech representations

Figure 2 for CCC-wav2vec 2.0: Clustering aided Cross Contrastive Self-supervised learning of speech representations

Figure 3 for CCC-wav2vec 2.0: Clustering aided Cross Contrastive Self-supervised learning of speech representations

Figure 4 for CCC-wav2vec 2.0: Clustering aided Cross Contrastive Self-supervised learning of speech representations

Share this with someone who'll enjoy it:

Abstract:While Self-Supervised Learning has helped reap the benefit of the scale from the available unlabeled data, the learning paradigms are continuously being bettered. We present a new pre-training strategy named ccc-wav2vec 2.0, which uses clustering and an augmentation-based cross-contrastive loss as its self-supervised objective. Through the clustering module, we scale down the influence of those negative examples that are highly similar to the positive. The Cross-Contrastive loss is computed between the encoder output of the original sample and the quantizer output of its augmentation and vice-versa, bringing robustness to the pre-training strategy. ccc-wav2vec 2.0 achieves up to 15.6% and 12.7% relative WER improvement over the baseline wav2vec 2.0 on the test-clean and test-other sets, respectively, of LibriSpeech, without the use of any language model. The proposed method also achieves up to 14.9% relative WER improvement over the baseline wav2vec 2.0 when fine-tuned on Switchboard data. We make all our codes publicly available on GitHub.

* To appear at IEEE SLT 2022

View paper on

Share this with someone who'll enjoy it:

Title:CCC-wav2vec 2.0: Clustering aided Cross Contrastive Self-supervised learning of speech representations

Paper and Code