Elia Formisano

Audio-Language Datasets of Scenes and Events: A Survey

Jul 09, 2024

Gijs Wijngaard, Elia Formisano, Michele Esposito, Michel Dumontier

Abstract: Audio-language models (ALMs) process sounds to provide a linguistic description of sound-producing events and scenes. Recent advances in computing power and dataset creation have led to significant progress in this domain. This paper surveys existing datasets used for training audio-language models, emphasizing the recent trend towards using large, diverse datasets to enhance model performance. Key sources of these datasets include the Freesound platform and AudioSet, which have contributed to the field's rapid growth. Although prior surveys primarily address techniques and training details, this survey categorizes and evaluates a wide array of datasets, addressing their origins, characteristics, and use cases. It also performs a data leak analysis to ensure dataset integrity and mitigate bias between datasets. This survey was conducted by analyzing research papers up to and including December 2023, and does not contain any papers after that period.

ACES: Evaluating Automated Audio Captioning Models on the Semantics of Sounds

Mar 27, 2024

Gijs Wijngaard, Elia Formisano, Bruno L. Giordano, Michel Dumontier

Abstract: Automated Audio Captioning is a multimodal task that aims to convert audio content into natural language. The assessment of audio captioning systems is typically based on quantitative metrics applied to text data. Previous studies have employed metrics derived from machine translation and image captioning to evaluate the quality of generated audio captions. Drawing inspiration from auditory cognitive neuroscience research, we introduce a novel metric approach -- Audio Captioning Evaluation on Semantics of Sound (ACES). ACES takes into account how human listeners parse semantic information from sounds, providing a novel and comprehensive evaluation perspective for automated audio captioning systems. ACES combines semantic similarities and semantic entity labeling. ACES outperforms similar automated audio captioning metrics on the Clotho-Eval FENSE benchmark in two evaluation categories.

How can deep learning advance computational modeling of sensory information processing?

Sep 25, 2018

Jessica A. F. Thompson, Yoshua Bengio, Elia Formisano, Marc Schönwiesner

Abstract: Deep learning, computational neuroscience, and cognitive science have overlapping goals related to understanding intelligence such that perception and behaviour can be simulated in computational systems. In neuroimaging, machine learning methods have been used to test computational models of sensory information processing. Recently, these model comparison techniques have been used to evaluate deep neural networks (DNNs) as models of sensory information processing. However, the interpretation of such model evaluations is muddied by imprecise statistical conclusions. Here, we make explicit the types of conclusions that can be drawn from these existing model comparison techniques and how these conclusions change when the model in question is a DNN. We discuss how DNNs are amenable to new model comparison techniques that allow for stronger conclusions to be made about the computational mechanisms underlying sensory information processing.

* Presented at MLINI-2016 workshop, 2016 (arXiv:1701.01437)
