Title: Scaling up masked audio encoder learning for general audio classification

Jun 11, 2024

Heinrich Dinkel, Zhiyong Yan, Yongqing Wang, Junbo Zhang, Yujun Wang, Bin Wang

Abstract: Despite progress in audio classification, a generalization gap remains between speech and other sound domains, such as environmental sounds and music: models trained for speech tasks often perform poorly on environmental or musical audio tasks, and vice versa. Self-supervised learning (SSL) of audio representations offers an alternative, but scaling both model and dataset size for SSL-based general audio classification has received limited exploration. We introduce Dasheng, a simple SSL audio encoder based on the efficient masked autoencoder framework. With 1.2 billion parameters and trained on 272,356 hours of diverse audio, Dasheng achieves significant performance gains on the HEAR benchmark. It outperforms prior work on CREMA-D, LibriCount, Speech Commands, and VoxLingua, and is competitive in music and environmental sound classification. Nearest-neighbor classification experiments show that Dasheng's features inherently contain rich speech, music, and environmental information. Code is available at https://github.com/richermans/dasheng/.
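The abstract names the masked autoencoder (MAE) framework without describing it. As a rough, hedged illustration of the generic MAE idea (this is not Dasheng's code, which lives in the linked repository), the sketch below splits a spectrogram into non-overlapping patches, randomly hides most of them, lets the encoder see only the visible ones, re-inserts a shared mask token for the decoder, and scores reconstruction on the masked patches only, as in the original image MAE recipe. The patch size, the 0.75 mask ratio (borrowed from the image MAE paper), the spectrogram shape, and the helper names `patchify`, `random_masking`, `reassemble`, and `masked_mse` are all hypothetical assumptions, and plain Python lists stand in for tensors.

```python
import random
from typing import List, Optional, Sequence, Tuple

Patch = List[float]


def patchify(spec: Sequence[Sequence[float]], patch_h: int, patch_w: int) -> List[Patch]:
    """Split a (frequency x time) spectrogram into non-overlapping patches.

    Patches are emitted band by band along frequency, then along time;
    each patch is flattened in row-major order.
    """
    n_freq, n_time = len(spec), len(spec[0])
    if n_freq % patch_h or n_time % patch_w:
        raise ValueError("spectrogram shape must be divisible by the patch size")
    patches: List[Patch] = []
    for f0 in range(0, n_freq, patch_h):
        for t0 in range(0, n_time, patch_w):
            patches.append([spec[f][t]
                            for f in range(f0, f0 + patch_h)
                            for t in range(t0, t0 + patch_w)])
    return patches


def random_masking(num_patches: int, mask_ratio: float,
                   seed: int = 0) -> Tuple[List[int], List[int]]:
    """Randomly split patch indices into visible (kept) and masked sets."""
    if not 0.0 <= mask_ratio < 1.0:
        raise ValueError("mask_ratio must be in [0, 1)")
    rng = random.Random(seed)
    order = list(range(num_patches))
    rng.shuffle(order)  # a uniformly random permutation of patch indices
    num_keep = int(num_patches * (1.0 - mask_ratio))
    return sorted(order[:num_keep]), sorted(order[num_keep:])


def reassemble(visible: List[Patch], keep_ids: List[int], num_patches: int,
               mask_token: Optional[Patch]) -> List[Optional[Patch]]:
    """Place encoded visible tokens back at their original positions and
    fill every masked position with one shared mask token (decoder input)."""
    seq: List[Optional[Patch]] = [mask_token] * num_patches
    for idx, tok in zip(keep_ids, visible):
        seq[idx] = tok
    return seq


def masked_mse(pred: List[Patch], target: List[Patch], mask_ids: List[int]) -> float:
    """Mean squared reconstruction error over the masked patches only."""
    if not mask_ids:
        raise ValueError("no masked patches to score")
    total, count = 0.0, 0
    for i in mask_ids:
        for p, t in zip(pred[i], target[i]):
            total += (p - t) ** 2
            count += 1
    return total / count


if __name__ == "__main__":
    # Hypothetical 64-bin x 160-frame spectrogram filled with dummy values.
    spec = [[float(f * 160 + t) for t in range(160)] for f in range(64)]
    patches = patchify(spec, patch_h=16, patch_w=16)       # 4 x 10 = 40 patches
    keep_ids, mask_ids = random_masking(len(patches), mask_ratio=0.75)
    visible = [patches[i] for i in keep_ids]                # encoder sees only these
    print(len(patches), len(visible), len(mask_ids))        # 40 10 30
```

Because the encoder only processes the visible quarter of the patches, most of the compute in this style of pretraining is spent on a short sequence, which is what makes the MAE framework comparatively efficient to scale.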

* Interspeech 2024
