Title: AuD-Former: A Hierarchical Transformer Network for Multimodal Audio-Based Disease Prediction

Oct 11, 2024

Jinjin Cai, Ruiqi Wang, Dezhong Zhao, Ziqin Yuan, Victoria McKenna, Aaron Friedman, Rachel Foot, Susan Storey, Ryan Boente, Sudip Vhaduri (+1 more)

Abstract: Audio-based disease prediction is emerging as a promising supplement to traditional medical diagnosis methods, facilitating early, convenient, and non-invasive disease detection and prevention. Multimodal fusion, which integrates features from various domains within or across bio-acoustic modalities, has proven effective in enhancing diagnostic performance. However, most existing methods in the field employ unilateral fusion strategies that focus solely on either intra-modal or inter-modal fusion. This approach limits the full exploitation of the complementary nature of diverse acoustic feature domains and bio-acoustic modalities. Additionally, the inadequate and isolated exploration of latent dependencies within modality-specific and modality-shared spaces curtails their capacity to manage the inherent heterogeneity in multimodal data. To fill these gaps, we propose AuD-Former, a hierarchical transformer network designed for general multimodal audio-based disease prediction. Specifically, we seamlessly integrate intra-modal and inter-modal fusion in a hierarchical manner and proficiently encode the necessary intra-modal and inter-modal complementary correlations, respectively. Comprehensive experiments demonstrate that AuD-Former achieves state-of-the-art performance in predicting three diseases: COVID-19, Parkinson's disease, and pathological dysarthria, showcasing its promising potential in a broad context of audio-based disease prediction tasks. Additionally, extensive ablation studies and qualitative analyses highlight the significant benefits of each main component within our model.
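To make the two-stage idea in the abstract concrete, here is a minimal NumPy sketch of hierarchical fusion: feature domains are first fused within each modality (intra-modal), and the resulting modality representations are then fused across modalities (inter-modal). This is an illustration under stated assumptions, not the AuD-Former architecture. The function names, the modality and domain names, the use of plain scaled dot-product attention without learned projections, and the mean pooling are all hypothetical choices made for the example.

```python
import numpy as np

def softmax(x, axis=-1):
    # Numerically stable softmax.
    x = x - x.max(axis=axis, keepdims=True)
    e = np.exp(x)
    return e / e.sum(axis=axis, keepdims=True)

def attention(query, context):
    # Scaled dot-product attention. Assumption: no learned Q/K/V
    # projections, so this is a simplification of a transformer layer.
    d = query.shape[-1]
    weights = softmax(query @ context.T / np.sqrt(d), axis=-1)
    return weights @ context

def intra_modal_fusion(domains):
    # Stage 1 (hypothetical): stack the tokens of every feature domain of
    # ONE modality and let them attend to each other (self-attention).
    tokens = np.concatenate(domains, axis=0)      # (sum of T_i, d)
    return tokens + attention(tokens, tokens)     # residual connection

def inter_modal_fusion(modalities):
    # Stage 2 (hypothetical): each modality attends to the tokens of all
    # OTHER modalities (cross-attention), then is mean-pooled over time.
    # Requires at least two modalities.
    pooled = []
    for i, tokens in enumerate(modalities):
        others = np.concatenate(
            [m for j, m in enumerate(modalities) if j != i], axis=0)
        pooled.append((tokens + attention(tokens, others)).mean(axis=0))
    return np.concatenate(pooled)                 # (num_modalities * d,)

def hierarchical_fusion(sample):
    # sample maps a modality name to a list of (T_i, d) domain feature arrays.
    per_modality = [intra_modal_fusion(doms) for doms in sample.values()]
    return inter_modal_fusion(per_modality)

# Usage with random stand-in features (d = 8); the names are illustrative only.
rng = np.random.default_rng(0)
sample = {
    "modality_a": [rng.normal(size=(5, 8)), rng.normal(size=(7, 8))],
    "modality_b": [rng.normal(size=(4, 8)), rng.normal(size=(6, 8))],
}
embedding = hierarchical_fusion(sample)
print(embedding.shape)
```

In a trained model the fused vector would feed a classification head. The sketch only shows the order of operations: intra-modal fusion first, inter-modal fusion second.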
