Abstract: Deep learning-based Alzheimer's disease (AD) diagnosis has recently made significant advances, particularly in predicting the conversion of mild cognitive impairment (MCI) to AD from MRI images, yet the group robustness of such diagnosis remains largely unstudied. Although numerous studies have pointed out that deep learning-based classifiers may perform poorly on certain groups by relying on unimportant attributes, this issue has been largely overlooked in the early diagnosis of MCI-to-AD conversion. In this paper, we present the first comprehensive investigation of group robustness in the early diagnosis of MCI-to-AD conversion using MRI images, focusing on disparities in accuracy between groups, specifically stable MCI (sMCI) and progressive MCI (pMCI) individuals divided by age. Our experiments reveal that standard classifiers consistently underperform for certain groups across different architectures, highlighting the need for more tailored approaches. To address this, we propose a novel method, dubbed DEAL (DEcoupled classifier with Adaptive Linear modulation), comprising two key components: (1) a linear modulation of features from the penultimate layer that incorporates easily obtainable age and cognition-indicative tabular features, and (2) a decoupled classifier that provides a more tailored decision boundary for each group, further improving performance. Through extensive experiments and evaluations across different architectures, we demonstrate the efficacy of DEAL in improving the group robustness of MCI-to-AD conversion prediction.
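The two components named in the abstract can be illustrated with a minimal NumPy sketch. It is not the authors' implementation. The scale-and-shift form of the modulation, the use of one linear head per age group, and all names (`film_modulate`, `decoupled_logits`, `w_gamma`, `w_beta`, `heads_w`, `heads_b`) are assumptions made for illustration, and random arrays stand in for real MRI features and tabular data.

```python
import numpy as np

def film_modulate(feats, tab, w_gamma, w_beta):
    """Scale and shift penultimate-layer features using tabular features.

    Assumed form: gamma = 1 + tab @ w_gamma, beta = tab @ w_beta.
    With all-zero tabular input the features pass through unchanged.
    """
    gamma = 1.0 + tab @ w_gamma   # (n, d) per-sample scale
    beta = tab @ w_beta           # (n, d) per-sample shift
    return gamma * feats + beta

def decoupled_logits(feats, group_ids, heads_w, heads_b):
    """Apply a separate linear head to each sample, chosen by its group id."""
    w = heads_w[group_ids]        # (n, d, C): one weight matrix per sample
    b = heads_b[group_ids]        # (n, C)
    return np.einsum("nd,ndc->nc", feats, w) + b

# Hypothetical sizes: 6 subjects, 8-dim features, 3 tabular inputs,
# 2 age groups, 2 classes (e.g. sMCI vs. pMCI).
rng = np.random.default_rng(0)
n, d, k, n_groups, n_classes = 6, 8, 3, 2, 2
feats = rng.normal(size=(n, d))            # stand-in for backbone features
tab = rng.normal(size=(n, k))              # stand-in for age / cognitive scores
group_ids = np.array([0, 1, 0, 1, 1, 0])   # stand-in for age-group labels

w_gamma = rng.normal(scale=0.1, size=(k, d))
w_beta = rng.normal(scale=0.1, size=(k, d))
heads_w = rng.normal(size=(n_groups, d, n_classes))
heads_b = np.zeros((n_groups, n_classes))

modulated = film_modulate(feats, tab, w_gamma, w_beta)
logits = decoupled_logits(modulated, group_ids, heads_w, heads_b)
pred = logits.argmax(axis=1)
```

In this sketch the parameters are random. In practice they would be learned jointly with, or on top of, the image backbone.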
Abstract: A classifier may depend on incidental features when those features are strongly correlated with the classification target in the training dataset. Recently, Last Layer Retraining (LLR) with group-balanced datasets has been shown to efficiently mitigate such spurious correlations in classifiers. However, acquiring group-balanced datasets is costly, which limits the applicability of LLR. In this work, we propose to perform LLR for a general image classifier using text datasets built with large language models. We demonstrate that text can serve as a proxy for its corresponding image beyond image-text joint embedding spaces such as CLIP. Based on this, we use generated texts to train the final layer in the embedding space of an arbitrary image classifier. In addition, we propose a method for filtering the generated words to remove noisy, imprecise ones, which reduces the effort of inspecting each word. We dub these procedures TLDR (Text-based Last layer retraining for Debiasing image classifieRs) and show that our method achieves performance comparable to that of LLR methods that use a group-balanced image dataset for retraining. Furthermore, TLDR outperforms other baselines that train the last linear layer without a group-annotated dataset.
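The retraining step described above can be sketched as softmax regression fitted on a group-balanced set of embeddings. This is a minimal illustration, not the TLDR implementation. It assumes the text embeddings have already been mapped into the image classifier's embedding space, and it replaces them with synthetic vectors. The function name `retrain_last_layer` and all hyperparameters are hypothetical.

```python
import numpy as np

def softmax(z):
    z = z - z.max(axis=1, keepdims=True)   # subtract row max for stability
    e = np.exp(z)
    return e / e.sum(axis=1, keepdims=True)

def retrain_last_layer(emb, labels, n_classes, lr=0.5, steps=200):
    """Fit a fresh linear layer on fixed embeddings with gradient descent."""
    n, d = emb.shape
    W = np.zeros((d, n_classes))
    b = np.zeros(n_classes)
    onehot = np.eye(n_classes)[labels]
    for _ in range(steps):
        grad = softmax(emb @ W + b) - onehot   # d(cross-entropy)/d(logits)
        W -= lr * emb.T @ grad / n
        b -= lr * grad.mean(axis=0)
    return W, b

# Synthetic stand-in for projected text embeddings: four (class, attribute)
# groups of 25 samples each, so the spurious attribute is balanced.
rng = np.random.default_rng(0)
labels = np.repeat([0, 0, 1, 1], 25)
spurious = np.repeat([0, 1, 0, 1], 25)
emb = rng.normal(scale=0.1, size=(100, 4))
emb[:, 0] += 2.0 * (2 * labels - 1)     # dimension 0 carries the class signal
emb[:, 1] += 1.0 * (2 * spurious - 1)   # dimension 1 carries the spurious attribute

W, b = retrain_last_layer(emb, labels, n_classes=2)
acc = float((np.argmax(emb @ W + b, axis=1) == labels).mean())
core_weight = abs(W[0, 1] - W[0, 0])
spurious_weight = abs(W[1, 1] - W[1, 0])
```

Because the groups are balanced, the spurious dimension carries no information about the label, so in this toy setting the retrained layer should weight the class-bearing dimension more heavily.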