Get our free extension to see links to code for papers anywhere online!Free add-on: code for papers everywhere!Free add-on: See code for papers anywhere!

Add to Chrome

Add to Firefox

Add to Edge

Michele Esposito

AudSemThinker: Enhancing Audio-Language Models through Reasoning over Semantics of Sound

May 20, 2025

Gijs Wijngaard, Elia Formisano, Michele Esposito, Michel Dumontier

Abstract:Audio-language models have shown promising results in various sound understanding tasks, yet they remain limited in their ability to reason over the fine-grained semantics of sound. In this paper, we present AudSemThinker, a model whose reasoning is structured around a framework of auditory semantics inspired by human cognition. To support this, we introduce AudSem, a novel dataset specifically curated for semantic descriptor reasoning in audio-language models. AudSem addresses the persistent challenge of data contamination in zero-shot evaluations by providing a carefully filtered collection of audio samples paired with captions generated through a robust multi-stage pipeline. Our experiments demonstrate that AudSemThinker outperforms state-of-the-art models across multiple training settings, highlighting its strength in semantic audio reasoning. Both AudSemThinker and the AudSem dataset are released publicly.

Via

Access Paper or Ask Questions

Audio-Language Datasets of Scenes and Events: A Survey

Jul 09, 2024

Gijs Wijngaard, Elia Formisano, Michele Esposito, Michel Dumontier

Figure 1 for Audio-Language Datasets of Scenes and Events: A Survey

Figure 2 for Audio-Language Datasets of Scenes and Events: A Survey

Figure 3 for Audio-Language Datasets of Scenes and Events: A Survey

Figure 4 for Audio-Language Datasets of Scenes and Events: A Survey

Abstract:Audio-language models (ALMs) process sounds to provide a linguistic description of sound-producing events and scenes. Recent advances in computing power and dataset creation have led to significant progress in this domain. This paper surveys existing datasets used for training audio-language models, emphasizing the recent trend towards using large, diverse datasets to enhance model performance. Key sources of these datasets include the Freesound platform and AudioSet that have contributed to the field's rapid growth. Although prior surveys primarily address techniques and training details, this survey categorizes and evaluates a wide array of datasets, addressing their origins, characteristics, and use cases. It also performs a data leak analysis to ensure dataset integrity and mitigate bias between datasets. This survey was conducted by analyzing research papers up to and including December 2023, and does not contain any papers after that period.

Via

Access Paper or Ask Questions

Hyper Association Graph Matching with Uncertainty Quantification for Coronary Artery Semantic Labeling

Aug 20, 2023

Chen Zhao, Michele Esposito, Zhihui Xu, Weihua Zhou

Figure 1 for Hyper Association Graph Matching with Uncertainty Quantification for Coronary Artery Semantic Labeling

Figure 2 for Hyper Association Graph Matching with Uncertainty Quantification for Coronary Artery Semantic Labeling

Figure 3 for Hyper Association Graph Matching with Uncertainty Quantification for Coronary Artery Semantic Labeling

Figure 4 for Hyper Association Graph Matching with Uncertainty Quantification for Coronary Artery Semantic Labeling

Abstract:Coronary artery disease (CAD) is one of the primary causes leading to death worldwide. Accurate extraction of individual arterial branches on invasive coronary angiograms (ICA) is important for stenosis detection and CAD diagnosis. However, deep learning-based models face challenges in generating semantic segmentation for coronary arteries due to the morphological similarity among different types of coronary arteries. To address this challenge, we propose an innovative approach using the hyper association graph-matching neural network with uncertainty quantification (HAGMN-UQ) for coronary artery semantic labeling on ICAs. The graph-matching procedure maps the arterial branches between two individual graphs, so that the unlabeled arterial segments are classified by the labeled segments, and the coronary artery semantic labeling is achieved. By incorporating the anatomical structural loss and uncertainty, our model achieved an accuracy of 0.9345 for coronary artery semantic labeling with a fast inference speed, leading to an effective and efficient prediction in real-time clinical decision-making scenarios.

* 8 pages, 2 figures

Via

Access Paper or Ask Questions

AGMN: Association Graph-based Graph Matching Network for Coronary Artery Semantic Labeling on Invasive Coronary Angiograms

Jan 11, 2023

Chen Zhao, Zhihui Xu, Jingfeng Jiang, Michele Esposito, Drew Pienta, Guang-Uei Hung, Weihua Zhou

Figure 1 for AGMN: Association Graph-based Graph Matching Network for Coronary Artery Semantic Labeling on Invasive Coronary Angiograms

Figure 2 for AGMN: Association Graph-based Graph Matching Network for Coronary Artery Semantic Labeling on Invasive Coronary Angiograms

Figure 3 for AGMN: Association Graph-based Graph Matching Network for Coronary Artery Semantic Labeling on Invasive Coronary Angiograms

Figure 4 for AGMN: Association Graph-based Graph Matching Network for Coronary Artery Semantic Labeling on Invasive Coronary Angiograms

Abstract:Semantic labeling of coronary arterial segments in invasive coronary angiography (ICA) is important for automated assessment and report generation of coronary artery stenosis in the computer-aided diagnosis of coronary artery disease (CAD). Inspired by the training procedure of interventional cardiologists for interpreting the structure of coronary arteries, we propose an association graph-based graph matching network (AGMN) for coronary arterial semantic labeling. We first extract the vascular tree from invasive coronary angiography (ICA) and convert it into multiple individual graphs. Then, an association graph is constructed from two individual graphs where each vertex represents the relationship between two arterial segments. Using the association graph, the AGMN extracts the vertex features by the embedding module, aggregates the features from adjacent vertices and edges by graph convolution network, and decodes the features to generate the semantic mappings between arteries. By learning the mapping of arterial branches between two individual graphs, the unlabeled arterial segments are classified by the labeled segments to achieve semantic labeling. A dataset containing 263 ICAs was employed to train and validate the proposed model, and a five-fold cross-validation scheme was performed. Our AGMN model achieved an average accuracy of 0.8264, an average precision of 0.8276, an average recall of 0.8264, and an average F1-score of 0.8262, which significantly outperformed existing coronary artery semantic labeling methods. In conclusion, we have developed and validated a new algorithm with high accuracy, interpretability, and robustness for coronary artery semantic labeling on ICAs.

* 26 pages, 7 figures

Via

Access Paper or Ask Questions