Jamie Dougherty

An Empirical Analysis of Speech Self-Supervised Learning at Multiple Resolutions

Oct 31, 2024

Theo Clark, Benedetta Cevoli, Eloy de Jong, Timofey Abramski, Jamie Dougherty

Abstract: Self-supervised learning (SSL) models have become crucial in speech processing, with recent advancements concentrating on developing architectures that capture representations across multiple timescales. The primary goal of these multi-scale architectures is to exploit the hierarchical nature of speech, where lower-resolution components aim to capture representations that align with increasingly abstract concepts (e.g., from phones to words to sentences). Although multi-scale approaches have demonstrated some improvements over single-scale models, the precise reasons for these enhancements have poor empirical support. In this study, we present an initial analysis of layer-wise representations in multi-scale architectures, with a focus on Canonical Correlation Analysis (CCA) and Mutual Information (MI). We apply this analysis to Multi-Resolution HuBERT (MR-HuBERT) and find that (1) the improved performance on SUPERB tasks is primarily due to the auxiliary low-resolution loss rather than the downsampling itself, and (2) downsampling to lower resolutions neither improves downstream performance nor correlates with higher-level information (e.g., words), though it does improve computational efficiency. These findings challenge assumptions about the multi-scale nature of MR-HuBERT and motivate the importance of disentangling computational efficiency from learning better representations.
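
As a rough illustration of the kind of layer-wise comparison the abstract mentions, the sketch below computes plain canonical correlations between two feature matrices using NumPy. It is not the paper's analysis pipeline: the function name `canonical_correlations`, the random stand-in "layer" features, and the choice of the QR-plus-SVD formulation are all assumptions made for this example, and it assumes both matrices have full column rank.

```python
import numpy as np


def canonical_correlations(x, y):
    """Canonical correlations between two (frames x features) matrices.

    Hypothetical helper for illustration; assumes full column rank.
    """
    # Center each feature dimension.
    x = x - x.mean(axis=0)
    y = y - y.mean(axis=0)
    # Orthonormal bases for the column spaces of x and y.
    qx, _ = np.linalg.qr(x)
    qy, _ = np.linalg.qr(y)
    # Singular values of qx^T qy are the canonical correlations.
    s = np.linalg.svd(qx.T @ qy, compute_uv=False)
    return np.clip(s, 0.0, 1.0)


rng = np.random.default_rng(0)
layer_a = rng.normal(size=(200, 4))           # stand-in for one layer's features
mixing = rng.normal(size=(4, 4))              # arbitrary linear map (assumed invertible)
layer_b = layer_a @ mixing                    # same information, different basis
unrelated = rng.normal(size=(200, 4))         # independent features

corr_same = canonical_correlations(layer_a, layer_b)
corr_unrelated = canonical_correlations(layer_a, unrelated)
print(corr_same, corr_unrelated)
```

Because CCA is invariant to invertible linear maps, the linearly mixed copy yields correlations close to 1, while the independent features yield much lower values. That invariance is one reason CCA-style measures are often used to compare representations across layers.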

Hierarchical Quantized Autoencoders

Feb 19, 2020

Will Williams, Sam Ringer, Tom Ash, John Hughes, David MacLeod, Jamie Dougherty

Abstract: Despite progress in training neural networks for lossy image compression, current approaches fail to maintain both perceptual quality and high-level features at very low bitrates. Encouraged by recent success in learning discrete representations with Vector Quantized Variational AutoEncoders (VQ-VAEs), we motivate the use of a hierarchy of VQ-VAEs to attain high factors of compression. We show that the combination of quantization and hierarchical latent structure aids likelihood-based image compression. This leads us to introduce a more probabilistic framing of the VQ-VAE, of which previous work is a limiting case. Our hierarchy produces a Markovian series of latent variables that reconstruct high-quality images which retain semantically meaningful features. These latents can then be further used to generate realistic samples. We provide qualitative and quantitative evaluations of reconstructions and samples on the CelebA and MNIST datasets.
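
For readers unfamiliar with the quantization step in a VQ-VAE, here is a minimal sketch of deterministic nearest-neighbour codebook lookup in NumPy. It shows only the standard hard-assignment step, not the probabilistic framing or the hierarchy the abstract describes; the function name `quantize`, the toy codebook, and the toy latents are assumptions chosen for illustration.

```python
import numpy as np


def quantize(latents, codebook):
    """Map each latent vector to its nearest codebook entry (Euclidean distance).

    Hypothetical helper for illustration, not the paper's implementation.
    """
    # Pairwise squared distances, shape (num_latents, num_codes).
    dists = ((latents[:, None, :] - codebook[None, :, :]) ** 2).sum(axis=-1)
    # Index of the closest code for each latent.
    indices = dists.argmin(axis=1)
    return indices, codebook[indices]


# Toy codebook with three 2-D codes and three encoder outputs (assumed values).
codebook = np.array([[0.0, 0.0], [1.0, 1.0], [-1.0, 1.0]])
latents = np.array([[0.9, 1.2], [-0.1, 0.2], [-0.8, 0.7]])

indices, quantized = quantize(latents, codebook)
print(indices)    # code index chosen for each latent
print(quantized)  # the corresponding codebook vectors
```

Only the integer indices need to be stored or transmitted, which is what makes the discrete latent attractive for compression: each latent costs at most log2(number of codes) bits before any entropy coding.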
