Get our free extension to see links to code for papers anywhere online!Free add-on: code for papers everywhere!Free add-on: See code for papers anywhere!

Add to Chrome

Add to Firefox

Add to Edge

Stephan Chalup

Diffusion in Zero-Shot Learning for Environmental Audio

Dec 04, 2024

Ysobel Sims, Stephan Chalup, Alexandre Mendes

Abstract:Zero-shot learning enables models to generalize to unseen classes by leveraging semantic information, bridging the gap between training and testing sets with non-overlapping classes. While much research has focused on zero-shot learning in computer vision, the application of these methods to environmental audio remains underexplored, with poor performance in existing studies. Generative methods, which have demonstrated success in computer vision, are notably absent from environmental audio zero-shot learning, where classification-based approaches dominate. To address this gap, this work investigates generative methods for zero-shot learning in environmental audio. Two successful generative models from computer vision are adapted: a cross-aligned and distribution-aligned variational autoencoder (CADA-VAE) and a leveraging invariant side generative adversarial network (LisGAN). Additionally, a novel diffusion model conditioned on class auxiliary data is introduced. The diffusion model generates synthetic data for unseen classes, which is combined with seen-class data to train a classifier. Experiments are conducted on two environmental audio datasets, ESC-50 and FSC22. Results show that the diffusion model significantly outperforms all baseline methods, achieving more than 25% higher accuracy on the ESC-50 test partition. This work establishes the diffusion model as a promising generative approach for zero-shot learning and introduces the first benchmark of generative methods for environmental audio zero-shot learning, providing a foundation for future research in the field. Code is provided at https://github.com/ysims/ZeroDiffusion for the novel ZeroDiffusion method.

* This work has been submitted to the IEEE for possible publication

Via

Access Paper or Ask Questions

Synthetic Data Generation and Deep Learning for the Topological Analysis of 3D Data

Sep 29, 2023

Dylan Peek, Matt P. Skerritt, Stephan Chalup

Abstract:This research uses deep learning to estimate the topology of manifolds represented by sparse, unordered point cloud scenes in 3D. A new labelled dataset was synthesised to train neural networks and evaluate their ability to estimate the genus of these manifolds. This data used random homeomorphic deformations to provoke the learning of visual topological features. We demonstrate that deep learning models could extract these features and discuss some advantages over existing topological data analysis tools that are based on persistent homology. Semantic segmentation was used to provide additional geometric information in conjunction with topological labels. Common point cloud multi-layer perceptron and transformer networks were both used to compare the viability of these methods. The experimental results of this pilot study support the hypothesis that, with the aid of sophisticated synthetic data generation, neural networks can perform segmentation-based topological data analysis. While our study focused on simulated data, the accuracy achieved suggests a potential for future applications using real data.

* 8 pages, 7 figures, Dicta 2023

Via

Access Paper or Ask Questions

Topology Estimation of Simulated 4D Image Data by Combining Downscaling and Convolutional Neural Networks

Jun 26, 2023

Khalil Mathieu Hannouch, Stephan Chalup

Abstract:Four-dimensional image-type data can quickly become prohibitively large, and it may not be feasible to directly apply methods, such as persistent homology or convolutional neural networks, to determine the topological characteristics of these data because they can encounter complexity issues. This study aims to determine the Betti numbers of large four-dimensional image-type data. The experiments use synthetic data, and demonstrate that it is possible to circumvent these issues by applying downscaling methods to the data prior to training a convolutional neural network, even when persistent homology software indicates that downscaling can significantly alter the homology of the training data. When provided with downscaled test data, the neural network can estimate the Betti numbers of the original samples with reasonable accuracy.

* 22 pages, 4 figures, 7 tables, 1 appendix

Via

Access Paper or Ask Questions