Bohang Sun

School of Information and Software Engineering, University of Electronic Science and Technology of China, Chengdu, China

Multi-Head Explainer: A General Framework to Improve Explainability in CNNs and Transformers

Jan 02, 2025

Bohang Sun, Pietro Liò

Abstract: In this study, we introduce the Multi-Head Explainer (MHEX), a versatile and modular framework that enhances both the explainability and accuracy of Convolutional Neural Networks (CNNs) and Transformer-based models. MHEX consists of three core components: an Attention Gate that dynamically highlights task-relevant features, Deep Supervision that guides early layers to capture fine-grained details pertinent to the target class, and an Equivalent Matrix that unifies refined local and global representations to generate comprehensive saliency maps. Our approach demonstrates superior compatibility, enabling effortless integration into existing residual networks like ResNet and Transformer architectures such as BERT with minimal modifications. Extensive experiments on benchmark datasets in medical imaging and text classification show that MHEX not only improves classification accuracy but also produces highly interpretable and detailed saliency scores.

FilterViT and DropoutViT: Lightweight Vision Transformer Models for Efficient Attention Mechanisms

Oct 30, 2024

Bohang Sun

Abstract: In this study, we introduce FilterViT, an enhanced version of MobileViT, which leverages an attention-based mechanism for early-stage downsampling. Traditional QKV operations on high-resolution feature maps are computationally intensive due to the abundance of tokens. To address this, we propose a filter attention mechanism using a convolutional neural network (CNN) to generate an importance mask, focusing attention on key image regions. The method significantly reduces computational complexity while maintaining interpretability, as it highlights essential image areas. Experimental results show that FilterViT achieves substantial gains in both efficiency and accuracy compared to other models. We also introduce DropoutViT, a variant that uses a stochastic approach for pixel selection, further enhancing robustness.
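
The token-selection idea in this abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the function names `select_tokens` and `attention_cost`, the plain list of scores standing in for the CNN-generated importance mask, and the uniform random sampling used for the stochastic variant are all assumptions made for illustration.

```python
import random


def select_tokens(scores, keep, stochastic=False, rng=None):
    """Return the sorted indices of the tokens to keep.

    scores: one importance value per token (a hypothetical stand-in
            for the output of an importance mask).
    keep:   number of tokens to retain.
    stochastic=False keeps the `keep` highest-scoring tokens.
    stochastic=True  keeps a uniform random subset instead (an assumed
                     stand-in for a stochastic selection scheme).
    """
    if not 0 < keep <= len(scores):
        raise ValueError("keep must be between 1 and len(scores)")
    indices = range(len(scores))
    if stochastic:
        rng = rng or random.Random()
        chosen = rng.sample(indices, keep)
    else:
        # Rank token indices by score, highest first, and keep the top ones.
        chosen = sorted(indices, key=lambda i: scores[i], reverse=True)[:keep]
    return sorted(chosen)


def attention_cost(num_tokens):
    """Number of pairwise scores in full self-attention (quadratic in tokens)."""
    return num_tokens * num_tokens


if __name__ == "__main__":
    scores = [0.1, 0.9, 0.4, 0.8]
    kept = select_tokens(scores, keep=2)
    print(kept, attention_cost(len(scores)), attention_cost(len(kept)))
```

Because full self-attention computes one score per token pair, halving the number of tokens in this toy setting cuts the pairwise work to a quarter, which is the kind of saving the abstract attributes to attending only to key regions.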

Audio-to-Score Conversion Model Based on Whisper methodology

Oct 22, 2024

Hongyao Zhang, Bohang Sun

Abstract: This thesis develops a Transformer model based on Whisper, which extracts melodies and chords from music audio and records them into ABC notation. A comprehensive data processing workflow is customized for ABC notation, including data cleansing, formatting, and conversion, and a mutation mechanism is implemented to increase the diversity and quality of training data. This thesis innovatively introduces the "Orpheus' Score", a custom notation system that converts music information into tokens, designs a custom vocabulary library, and trains a corresponding custom tokenizer. Experiments show that compared to traditional algorithms, the model has significantly improved accuracy and performance. While providing a convenient audio-to-score tool for music enthusiasts, this work also provides new ideas and tools for research in music information processing.

* 5 pages, 7 figures
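
To make the idea of turning notation into tokens concrete, here is a rough sketch that splits a small subset of ABC notation into tokens and assigns each distinct token an integer ID. It does not reproduce the "Orpheus' Score" system or the thesis's tokenizer, whose details are not given here; the names `TOKEN_RE`, `tokenize_abc`, and `build_vocab`, and the choice of which ABC constructs to support, are assumptions for illustration only.

```python
import re

# Simplified patterns for a small subset of ABC notation (illustrative only,
# not the vocabulary described in the abstract).
TOKEN_RE = re.compile(r'''
    (?P<chord>"[^"]*")                          # chord symbol in quotes
  | (?P<bar>\|\]|\|\||\|:|:\||\|)               # bar lines and repeats
  | (?P<note>[=^_]*[A-Ga-g][,']*\d*(?:/\d*)?)   # accidental, pitch, octave, length
  | (?P<rest>z\d*(?:/\d*)?)                     # rest with optional length
''', re.VERBOSE)


def tokenize_abc(body):
    """Split an ABC tune body into (kind, text) pairs, skipping whitespace."""
    tokens = []
    pos = 0
    while pos < len(body):
        if body[pos].isspace():
            pos += 1
            continue
        match = TOKEN_RE.match(body, pos)
        if match is None:
            raise ValueError(
                f"unsupported ABC syntax at position {pos}: {body[pos]!r}"
            )
        tokens.append((match.lastgroup, match.group()))
        pos = match.end()
    return tokens


def build_vocab(tokens):
    """Assign each distinct token text an integer ID in order of first use."""
    vocab = {}
    for _, text in tokens:
        vocab.setdefault(text, len(vocab))
    return vocab


if __name__ == "__main__":
    tokens = tokenize_abc('"Am"A2 ^c/2 z | e\'')
    print(tokens)
    print(build_vocab(tokens))
```

A tokenizer trained for a real model would need to cover far more of the notation (headers, ties, tuplets, decorations); the sketch only shows the general shape of mapping score text to a token sequence and a vocabulary.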
