Get our free extension to see links to code for papers anywhere online!Free add-on: code for papers everywhere!Free add-on: See code for papers anywhere!

Add to Chrome

Add to Firefox

Add to Edge

Ilyass Moummad

Can Masked Autoencoders Also Listen to Birds?

Apr 17, 2025

Lukas Rauch, Ilyass Moummad, René Heinrich, Alexis Joly, Bernhard Sick, Christoph Scholz

Abstract:Masked Autoencoders (MAEs) pretrained on AudioSet fail to capture the fine-grained acoustic characteristics of specialized domains such as bioacoustic monitoring. Bird sound classification is critical for assessing environmental health, yet general-purpose models inadequately address its unique acoustic challenges. To address this, we introduce Bird-MAE, a domain-specialized MAE pretrained on the large-scale BirdSet dataset. We explore adjustments to pretraining, fine-tuning and utilizing frozen representations. Bird-MAE achieves state-of-the-art results across all BirdSet downstream tasks, substantially improving multi-label classification performance compared to the general-purpose Audio-MAE baseline. Additionally, we propose prototypical probing, a parameter-efficient method for leveraging MAEs' frozen representations. Bird-MAE's prototypical probes outperform linear probing by up to 37\% in MAP and narrow the gap to fine-tuning to approximately 3\% on average on BirdSet.

Via

Access Paper or Ask Questions

Domain-Invariant Representation Learning of Bird Sounds

Sep 16, 2024

Ilyass Moummad, Romain Serizel, Emmanouil Benetos, Nicolas Farrugia

Figure 1 for Domain-Invariant Representation Learning of Bird Sounds

Figure 2 for Domain-Invariant Representation Learning of Bird Sounds

Abstract:Passive acoustic monitoring (PAM) is crucial for bioacoustic research, enabling non-invasive species tracking and biodiversity monitoring. Citizen science platforms like Xeno-Canto provide large annotated datasets from focal recordings, where the target species is intentionally recorded. However, PAM requires monitoring in passive soundscapes, creating a domain shift between focal and passive recordings, which challenges deep learning models trained on focal recordings. To address this, we leverage supervised contrastive learning to improve domain generalization in bird sound classification, enforcing domain invariance across same-class examples from different domains. We also propose ProtoCLR (Prototypical Contrastive Learning of Representations), which reduces the computational complexity of the SupCon loss by comparing examples to class prototypes instead of pairwise comparisons. Additionally, we present a new few-shot classification benchmark based on BirdSet, a large-scale bird sound dataset, and demonstrate the effectiveness of our approach in achieving strong transfer performance.

Via

Access Paper or Ask Questions

Acoustic identification of individual animals with hierarchical contrastive learning

Sep 13, 2024

Ines Nolasco, Ilyass Moummad, Dan Stowell, Emmanouil Benetos

Figure 1 for Acoustic identification of individual animals with hierarchical contrastive learning

Figure 2 for Acoustic identification of individual animals with hierarchical contrastive learning

Figure 3 for Acoustic identification of individual animals with hierarchical contrastive learning

Figure 4 for Acoustic identification of individual animals with hierarchical contrastive learning

Abstract:Acoustic identification of individual animals (AIID) is closely related to audio-based species classification but requires a finer level of detail to distinguish between individual animals within the same species. In this work, we frame AIID as a hierarchical multi-label classification task and propose the use of hierarchy-aware loss functions to learn robust representations of individual identities that maintain the hierarchical relationships among species and taxa. Our results demonstrate that hierarchical embeddings not only enhance identification accuracy at the individual level but also at higher taxonomic levels, effectively preserving the hierarchical structure in the learned representations. By comparing our approach with non-hierarchical models, we highlight the advantage of enforcing this structure in the embedding space. Additionally, we extend the evaluation to the classification of novel individual classes, demonstrating the potential of our method in open-set classification scenarios.

* Under review; Submitted to ICASSP 2025

Via

Access Paper or Ask Questions

Mixture of Mixups for Multi-label Classification of Rare Anuran Sounds

Mar 14, 2024

Ilyass Moummad, Nicolas Farrugia, Romain Serizel, Jeremy Froidevaux, Vincent Lostanlen

Abstract:Multi-label imbalanced classification poses a significant challenge in machine learning, particularly evident in bioacoustics where animal sounds often co-occur, and certain sounds are much less frequent than others. This paper focuses on the specific case of classifying anuran species sounds using the dataset AnuraSet, that contains both class imbalance and multi-label examples. To address these challenges, we introduce Mixture of Mixups (Mix2), a framework that leverages mixing regularization methods Mixup, Manifold Mixup, and MultiMix. Experimental results show that these methods, individually, may lead to suboptimal results; however, when applied randomly, with one selected at each training iteration, they prove effective in addressing the mentioned challenges, particularly for rare classes with few occurrences. Further analysis reveals that Mix2 is also proficient in classifying sounds across various levels of class co-occurrences.

Via

Access Paper or Ask Questions

Self-Supervised Learning for Few-Shot Bird Sound Classification

Jan 16, 2024

Ilyass Moummad, Romain Serizel, Nicolas Farrugia

Abstract:Self-supervised learning (SSL) in audio holds significant potential across various domains, particularly in situations where abundant, unlabeled data is readily available at no cost. This is particularly pertinent in bioacoustics, where biologists routinely collect extensive sound datasets from the natural environment. In this study, we demonstrate that SSL is capable of acquiring meaningful representations of bird sounds from audio recordings without the need for annotations. Our experiments showcase that these learned representations exhibit the capacity to generalize to new bird species in few-shot learning (FSL) scenarios. Additionally, we show that selecting windows with high bird activation for self-supervised learning, using a pretrained audio neural network, significantly enhances the quality of the learned representations.

Via

Access Paper or Ask Questions

Regularized Contrastive Pre-training for Few-shot Bioacoustic Sound Detection

Sep 16, 2023

Ilyass Moummad, Romain Serizel, Nicolas Farrugia

Abstract:Bioacoustic sound event detection allows for better understanding of animal behavior and for better monitoring biodiversity using audio. Deep learning systems can help achieve this goal, however it is difficult to acquire sufficient annotated data to train these systems from scratch. To address this limitation, the Detection and Classification of Acoustic Scenes and Events (DCASE) community has recasted the problem within the framework of few-shot learning and organize an annual challenge for learning to detect animal sounds from only five annotated examples. In this work, we regularize supervised contrastive pre-training to learn features that can transfer well on new target tasks with animal sounds unseen during training, achieving a high F-score of 61.52%(0.48) when no feature adaptation is applied, and an F-score of 68.19%(0.75) when we further adapt the learned features for each new target task. This work aims to lower the entry bar to few-shot bioacoustic sound event detection by proposing a simple and yet effective framework for this task, by also providing open-source code.

Via

Access Paper or Ask Questions

Pretraining Representations for Bioacoustic Few-shot Detection using Supervised Contrastive Learning

Sep 02, 2023

Ilyass Moummad, Romain Serizel, Nicolas Farrugia

Figure 1 for Pretraining Representations for Bioacoustic Few-shot Detection using Supervised Contrastive Learning

Figure 2 for Pretraining Representations for Bioacoustic Few-shot Detection using Supervised Contrastive Learning

Figure 3 for Pretraining Representations for Bioacoustic Few-shot Detection using Supervised Contrastive Learning

Abstract:Deep learning has been widely used recently for sound event detection and classification. Its success is linked to the availability of sufficiently large datasets, possibly with corresponding annotations when supervised learning is considered. In bioacoustic applications, most tasks come with few labelled training data, because annotating long recordings is time consuming and costly. Therefore supervised learning is not the best suited approach to solve bioacoustic tasks. The bioacoustic community recasted the problem of sound event detection within the framework of few-shot learning, i.e. training a system with only few labeled examples. The few-shot bioacoustic sound event detection task in the DCASE challenge focuses on detecting events in long audio recordings given only five annotated examples for each class of interest. In this paper, we show that learning a rich feature extractor from scratch can be achieved by leveraging data augmentation using a supervised contrastive learning framework. We highlight the ability of this framework to transfer well for five-shot event detection on previously unseen classes in the training data. We obtain an F-score of 63.46\% on the validation set and 42.7\% on the test set, ranking second in the DCASE challenge. We provide an ablation study for the critical choices of data augmentation techniques as well as for the learning strategy applied on the training set.

Via

Access Paper or Ask Questions

Supervised Contrastive Learning for Respiratory Sound Classification

Oct 27, 2022

Ilyass Moummad, Nicolas Farrugia

Figure 1 for Supervised Contrastive Learning for Respiratory Sound Classification

Figure 2 for Supervised Contrastive Learning for Respiratory Sound Classification

Figure 3 for Supervised Contrastive Learning for Respiratory Sound Classification

Figure 4 for Supervised Contrastive Learning for Respiratory Sound Classification

Abstract:Automatic respiratory sound classification using machine learning is a challenging task, due to large biological variability, imbalanced datasets, as well as a diversity in recording techniques used to capture the respiration signal. While datasets with annotated respiration cycles have been proposed, methods based on supervised learning using annotations only may be limited in their generalization capability. In this study, we address this issue using supervised contrastive learning, relying both on respiration cycle annotations and a spectrogram frequency and temporal masking method SpecAugment to generate augmented samples for representation learning with a contrastive loss. We demonstrate that such an approach can outperform supervised learning using experiments on a convolutional neural network trained from scratch, achieving the new state of the art. Our work shows the potential of supervised contrastive learning in imbalanced and noisy settings. Our code is released at https://github.com/ilyassmoummad/scl_icbhi2017

Via

Access Paper or Ask Questions