Title: Fine-tune the pretrained ATST model for sound event detection

Sep 15, 2023

Nian Shao, Xian Li, Xiaofei Li

Abstract: Sound event detection (SED) often suffers from data deficiency. The recent baseline system of DCASE2023 challenge task 4 leverages large pretrained self-supervised learning (SelfSL) models to mitigate this limitation, as the pretrained models help produce more discriminative features for SED. However, the challenge baseline system and most challenge submissions treat the pretrained models as frozen feature extractors, and fine-tuning them has rarely been studied. In this work, we study how to fine-tune pretrained models for SED. We first introduce ATST-Frame, our newly proposed SelfSL model, into the SED system. ATST-Frame was designed specifically for learning frame-level representations of audio signals and obtained state-of-the-art (SOTA) performance on a series of downstream tasks. We then propose a fine-tuning method for ATST-Frame that uses both (in-domain) unlabelled and labelled SED data. Our experiments show that the proposed method overcomes the overfitting problem when fine-tuning the large pretrained network, and our SED system obtains new SOTA results of 0.587/0.812 PSDS1/PSDS2 scores on the DCASE challenge task 4 dataset.

* 5 pages, 3 figures; submitted to ICASSP 2024
