Abstract:Explanations obtained from transformer-based architectures in the form of raw attention, can be seen as a class-agnostic saliency map. Additionally, attention-based pooling serves as a form of masking the in feature space. Motivated by this observation, we design an attention-based pooling mechanism intended to replace Global Average Pooling (GAP) at inference. This mechanism, called Cross-Attention Stream (CA-Stream), comprises a stream of cross attention blocks interacting with features at different network depths. CA-Stream enhances interpretability in models, while preserving recognition performance.
Abstract:Methods based on class activation maps (CAM) provide a simple mechanism to interpret predictions of convolutional neural networks by using linear combinations of feature maps as saliency maps. By contrast, masking-based methods optimize a saliency map directly in the image space or learn it by training another network on additional data. In this work we introduce Opti-CAM, combining ideas from CAM-based and masking-based approaches. Our saliency map is a linear combination of feature maps, where weights are optimized per image such that the logit of the masked image for a given class is maximized. We also fix a fundamental flaw in two of the most common evaluation metrics of attribution methods. On several datasets, Opti-CAM largely outperforms other CAM-based approaches according to the most relevant classification metrics. We provide empirical evidence supporting that localization and classifier interpretability are not necessarily aligned.
Abstract:Bone Age Assessment (BAA) is a task performed by radiologists to diagnose abnormal growth in a child. In manual approaches, radiologists take into account different identity markers when calculating bone age, i.e., chronological age and gender. However, the current automated Bone Age Assessment methods do not completely exploit the information present in the patient's metadata. With this lack of available methods as motivation, we present SIMBA: Specific Identity Markers for Bone Age Assessment. SIMBA is a novel approach for the task of BAA based on the use of identity markers. For this purpose, we build upon the state-of-the-art model, fusing the information present in the identity markers with the visual features created from the original hand radiograph. We then use this robust representation to estimate the patient's relative bone age: the difference between chronological age and bone age. We validate SIMBA on the Radiological Hand Pose Estimation dataset and find that it outperforms previous state-of-the-art methods. SIMBA sets a trend of a new wave of Computer-aided Diagnosis methods that incorporate all of the data that is available regarding a patient. To promote further research in this area and ensure reproducibility we will provide the source code as well as the pre-trained models of SIMBA.