Luca Zampierin

SKILL: Similarity-aware Knowledge distILLation for Speech Self-Supervised Learning

Feb 26, 2024

Luca Zampierin, Ghouthi Boukli Hacene, Bac Nguyen, Mirco Ravanelli

Abstract: Self-supervised learning (SSL) has achieved remarkable success across various speech-processing tasks. To improve its efficiency, previous works often rely on compression techniques. A notable recent attempt is DPHuBERT, which applies joint knowledge distillation (KD) and structured pruning to learn a significantly smaller SSL model. In this paper, we contribute to this research domain by introducing SKILL, a novel method that distills groups of layers rather than individual, arbitrarily selected layers of the teacher network. The layers to distill are identified through a hierarchical clustering procedure applied to layer similarity measures. Extensive experiments demonstrate that our distilled version of WavLM Base+ not only outperforms DPHuBERT but also achieves state-of-the-art results in the 30M-parameter model class across several SUPERB tasks.
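
The layer-grouping idea described in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the abstract does not say which similarity measure or linkage is used, so the choice of linear CKA, average linkage, the function names (`linear_cka`, `group_layers`, `group_targets`), and the use of a per-group mean as the distillation target are all assumptions made for illustration, and the teacher activations are synthetic.

```python
import numpy as np
from scipy.cluster.hierarchy import fcluster, linkage
from scipy.spatial.distance import squareform


def linear_cka(x, y):
    """Linear CKA between two (frames, features) activation matrices.

    Assumption: CKA stands in for the paper's unspecified similarity measure.
    """
    x = x - x.mean(axis=0, keepdims=True)
    y = y - y.mean(axis=0, keepdims=True)
    cross = np.linalg.norm(x.T @ y, "fro") ** 2
    norm = np.linalg.norm(x.T @ x, "fro") * np.linalg.norm(y.T @ y, "fro")
    return cross / norm


def group_layers(layer_outputs, num_groups):
    """Cluster teacher layers by pairwise similarity; returns one label per layer."""
    n = len(layer_outputs)
    sim = np.eye(n)
    for i in range(n):
        for j in range(i + 1, n):
            sim[i, j] = sim[j, i] = linear_cka(layer_outputs[i], layer_outputs[j])
    # Turn similarity into a dissimilarity with an exactly zero diagonal.
    dist = np.clip(1.0 - sim, 0.0, None)
    np.fill_diagonal(dist, 0.0)
    # Agglomerative clustering (average linkage is an assumption).
    tree = linkage(squareform(dist, checks=False), method="average")
    return fcluster(tree, t=num_groups, criterion="maxclust")


def group_targets(layer_outputs, labels):
    """Hypothetical distillation targets: mean activation of each layer group."""
    return {
        int(g): np.mean([h for h, l in zip(layer_outputs, labels) if l == g], axis=0)
        for g in np.unique(labels)
    }


# Synthetic stand-in for 12 teacher layers: each layer drifts slightly from
# the previous one, so neighbouring layers tend to be more similar.
rng = np.random.default_rng(0)
hidden = rng.standard_normal((200, 16))
layers = []
for _ in range(12):
    hidden = hidden + 0.3 * rng.standard_normal(hidden.shape)
    layers.append(hidden.copy())

labels = group_layers(layers, num_groups=4)
targets = group_targets(layers, labels)
print("group label per layer:", labels.tolist())
```

In a real setting, `layers` would hold hidden states extracted from the teacher (for example WavLM Base+) on a batch of speech, and the student would be trained to match one target per group instead of a hand-picked set of individual layers.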

* Accepted at the Self-supervision in Audio, Speech and Beyond (SASB) Workshop at ICASSP 2024