School of Biomedical Engineering and Imaging Sciences, King's College London, U.K.
Abstract: The difference between the chronological and biological brain age of a subject can be an important biomarker for neurodegenerative diseases; brain age estimation can therefore be crucial in clinical settings. One way to incorporate multimodal information into this estimation is through population graphs, which combine various types of imaging data and capture the associations among individuals within a population. In medical imaging, population graphs have demonstrated promising results, mostly for classification tasks. In most cases, the graph structure is pre-defined and remains static during training. However, extracting population graphs is a non-trivial task and can significantly impact the performance of Graph Neural Networks (GNNs), which are sensitive to the graph structure. In this work, we highlight the importance of meaningful graph construction and experiment with different population-graph construction methods and their effect on GNN performance in brain age estimation. We use the homophily metric and graph visualizations to gain valuable quantitative and qualitative insights into the extracted graph structures. For the experimental evaluation, we leverage the UK Biobank dataset, which offers many imaging and non-imaging phenotypes. Our results indicate that architectures highly sensitive to the graph structure, such as the Graph Convolutional Network (GCN) and Graph Attention Network (GAT), struggle with low-homophily graphs, while other architectures, such as GraphSAGE and Chebyshev networks, are more robust across different homophily ratios. We conclude that static graph construction approaches are potentially insufficient for the task of brain age estimation and make recommendations for alternative research directions.
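To make the homophily metric concrete, below is a minimal sketch of how an edge-homophily ratio can be computed for a population graph. Binning the continuous ages into discrete groups, and the 5-year bin width, are illustrative assumptions, not the paper's exact protocol.

```python
import numpy as np

def edge_homophily(edges, labels):
    """Fraction of edges whose endpoints share a label.

    edges  : (E, 2) integer array of node-index pairs
    labels : (N,) array of node labels (e.g. discretized age bins)
    """
    same = labels[edges[:, 0]] == labels[edges[:, 1]]
    return same.mean()

# Illustrative usage: ages binned into 5-year groups (an assumption,
# not the binning used in the paper).
ages = np.array([63.0, 58.2, 71.5, 66.9, 59.8])
bins = (ages // 5).astype(int)
edges = np.array([[0, 3], [1, 4], [2, 3], [0, 1]])
print(edge_homophily(edges, bins))
```

A ratio near 1 means connected subjects tend to share an age group; per the results above, GCN and GAT degrade as this ratio drops, while GraphSAGE and Chebyshev networks are more robust.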
Abstract: Brain age estimation is clinically important, as it can provide valuable information in the context of neurodegenerative diseases such as Alzheimer's. Population graphs, which include multimodal imaging information about the subjects along with the relationships among the population, have been used in the literature with Graph Convolutional Networks (GCNs) and have proved beneficial for a variety of medical imaging tasks. A population graph is usually static and constructed manually using non-imaging information. However, graph construction is not a trivial task and can significantly affect the performance of the GCN, which is inherently very sensitive to the graph structure. In this work, we propose a framework that learns a population graph structure optimized for the downstream task. An attention mechanism assigns weights to a set of imaging and non-imaging features (phenotypes), which are then used for edge extraction. The resulting graph is used to train the GCN. The entire pipeline can be trained end-to-end. Additionally, by visualizing the attention weights that were the most important for the graph construction, we increase the interpretability of the graph. We use the UK Biobank, which provides a large variety of neuroimaging and non-imaging phenotypes, to evaluate our method on brain age regression and classification. The proposed method outperforms competing static graph approaches and other state-of-the-art adaptive methods. We further show that the assigned attention scores indicate that there are both imaging and non-imaging phenotypes that are informative for brain age estimation, in agreement with the relevant literature.
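The attention-based edge extraction can be sketched roughly as follows: a learnable weight per phenotype rescales the features, and pairwise similarities between the weighted feature vectors define candidate edges. This is a minimal PyTorch sketch under assumed design choices (softmax attention, k-nearest-neighbour edges with an illustrative k); the authors' exact architecture may differ, and end-to-end training would additionally require a differentiable relaxation of the discrete edge selection, which is omitted here.

```python
import torch
import torch.nn as nn

class PhenotypeAttention(nn.Module):
    """Learns one attention weight per phenotype; the weighted features
    then define pairwise similarities used for edge extraction.
    A minimal sketch, not the authors' exact architecture."""

    def __init__(self, num_phenotypes):
        super().__init__()
        self.logits = nn.Parameter(torch.zeros(num_phenotypes))

    def forward(self, x):                      # x: (N, num_phenotypes)
        a = torch.softmax(self.logits, dim=0)  # attention over phenotypes
        xw = x * a                             # rescale each phenotype
        sim = -torch.cdist(xw, xw)             # negative distances as similarity
        return sim, a

def knn_edges(sim, k=5):
    """Keep each node's k most similar neighbours (k is illustrative)."""
    idx = sim.topk(k + 1, dim=1).indices[:, 1:]   # drop the self-match
    src = torch.arange(sim.size(0)).repeat_interleave(k)
    return torch.stack([src, idx.reshape(-1)])    # (2, N*k) edge index
```

Visualizing the learned attention vector `a` then indicates which phenotypes drove the graph construction, which is the interpretability mechanism the abstract refers to.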
Abstract: Convolutional neural networks (CNNs) are increasingly being used to automate the segmentation of brain structures in magnetic resonance (MR) images for research studies. In other applications, CNN models have been shown to exhibit bias against certain demographic groups when they are under-represented in the training sets. In this work, we investigate whether CNN models for brain MR segmentation have the potential to contain sex or race bias when trained with imbalanced training sets. We train multiple instances of the FastSurferCNN model using different levels of sex imbalance in white subjects. We evaluate the performance of these models separately for white male and white female test sets to assess sex bias, and furthermore evaluate them on black male and black female test sets to assess potential racial bias. We find significant sex and race bias effects in segmentation model performance. The biases have a strong spatial component, with some brain regions exhibiting much stronger bias than others. Overall, our results suggest that race bias is more significant than sex bias. Our study demonstrates the importance of considering race and sex balance when forming training sets for CNN-based brain MR segmentation, to avoid maintaining or even exacerbating existing health inequalities through biased research study findings.
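The core of such a bias analysis is evaluating segmentation quality separately per demographic group. A minimal sketch of a group-wise Dice evaluation is given below; the function names and grouping labels are illustrative, not taken from the paper's code.

```python
import numpy as np

def dice(pred, gt, eps=1e-8):
    """Dice Similarity Coefficient between two binary masks."""
    inter = np.logical_and(pred, gt).sum()
    return 2.0 * inter / (pred.sum() + gt.sum() + eps)

def groupwise_dice(preds, gts, groups):
    """Mean DSC per demographic group (e.g. 'white female').

    preds, gts : lists of binary masks
    groups     : list of group labels, one per subject
    """
    scores = {}
    for p, g, grp in zip(preds, gts, groups):
        scores.setdefault(grp, []).append(dice(p, g))
    return {grp: float(np.mean(s)) for grp, s in scores.items()}
```

Running this per brain region, rather than over whole-brain masks, is what exposes the spatial component of the bias described above.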
Abstract: Brain aging, and more specifically the difference between the chronological and the biological age of a person, may be a promising biomarker for identifying neurodegenerative diseases. For this purpose, accurate prediction is important, but the localisation of the areas that play a significant role in the prediction is also crucial in order to gain clinicians' trust in the performance of a prediction model. Most interpretability methods are focused on classification tasks and cannot be directly transferred to regression tasks. In this study, we focus on the task of brain age regression from 3D brain Magnetic Resonance (MR) images using a Convolutional Neural Network, termed the prediction model. We interpret its predictions by extracting importance maps, which reveal the parts of the brain that are the most important for brain age prediction. To do so, we assume that voxels that are not useful for the regression are resilient to noise addition. We implement a noise model which aims to add as much noise as possible to the input without harming the performance of the prediction model. We average the importance maps of the subjects to obtain a population-based importance map, which displays the regions of the brain that are influential for the task. We test our method on 13,750 3D brain MR images from the UK Biobank, and our findings are consistent with the existing neuropathology literature, highlighting the hippocampus and the ventricles as the most relevant regions for brain aging.
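One plausible way to formalise the noise-model objective described above is as a trade-off between maximising the injected noise and keeping the (frozen) prediction model's output stable; voxels that tolerate little noise are then the important ones. The sketch below is an assumed formulation (including the trade-off weight `lam`), not the paper's exact loss.

```python
import torch

def noise_model_loss(predictor, x, sigma, lam=0.1):
    """One step of an assumed noise-model objective: add as much noise
    as possible without hurting the frozen age predictor.

    x     : batch of 3D brain MR volumes, shape (B, 1, D, H, W)
    sigma : learnable per-voxel noise scale (broadcastable to x)
    lam   : trade-off weight (illustrative value)
    """
    noisy = x + torch.randn_like(x) * sigma
    # Penalise drift of the frozen predictor's output under noise
    drift = (predictor(noisy) - predictor(x)).abs().mean()
    # Minimising this rewards large sigma while keeping predictions stable
    return drift - lam * sigma.abs().mean()
```

Under this reading, the per-voxel importance map would be derived from how small the learned `sigma` has to stay at each voxel, averaged over subjects to yield the population-based map.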
Abstract: Brain age estimation from Magnetic Resonance Images (MRI) derives the difference between a subject's biological brain age and their chronological age. This is a potential biomarker for neurodegeneration, e.g. as part of Alzheimer's disease. Early detection of neurodegeneration manifesting as a higher brain age can potentially facilitate better medical care and planning for affected individuals. Many methods have been proposed for the prediction of chronological age from brain MRI using machine learning and, specifically, deep learning techniques. Contrary to most studies, which use the whole brain volume, in this study we develop a new deep learning approach that uses 3D patches of the brain together with convolutional neural networks (CNNs) to build a localised brain age estimator. In this way, we can obtain a visualization of the regions that play the most important role in estimating brain age, leading to more anatomically driven and interpretable results, and thus confirming relevant literature which suggests that the ventricles and the hippocampus are the most informative areas. In addition, we leverage this knowledge to improve the overall performance on the task of age estimation by combining the results of different patches using an ensemble method, such as averaging or linear regression. The network is trained on the UK Biobank dataset, and the method achieves state-of-the-art results with a Mean Absolute Error of 2.46 years for purely regional estimates, 2.13 years for an ensemble of patches before bias correction, and 1.96 years after bias correction.
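A sketch of the ensembling step: per-patch age predictions are combined by simple averaging, followed by a linear bias correction. The abstract names averaging and linear regression as ensemble options; the specific bias-correction form below (regressing predictions on true age and inverting the fit, a common convention in brain-age studies) is an assumption.

```python
import numpy as np
from sklearn.linear_model import LinearRegression

def ensemble_and_correct(patch_preds, ages):
    """Combine patch-wise predictions and apply linear bias correction.

    patch_preds : (num_subjects, num_patches) per-patch age predictions
    ages        : (num_subjects,) chronological ages
    """
    combined = patch_preds.mean(axis=1)          # simple averaging
    # Assumed bias correction: fit combined ~ a*age + b, then invert
    reg = LinearRegression().fit(ages.reshape(-1, 1), combined)
    corrected = (combined - reg.intercept_) / reg.coef_[0]
    return combined, corrected
```

In practice, the correction parameters would be estimated on a held-out validation set rather than on the test subjects being evaluated.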
Abstract: Medical imaging is a domain which suffers from a paucity of manually annotated data for the training of learning algorithms. Manually delineating pathological regions at a pixel level is a time-consuming process, especially in 3D images, and often requires the time of a trained expert. As a result, supervised machine learning solutions must make do with small amounts of labelled data, despite there often being additional unlabelled data available. Whilst of less value than labelled images, these unlabelled images can contain potentially useful information. In this paper, we propose combining both labelled and unlabelled data within a GAN framework, before using the resulting network to produce images for use when training a segmentation network. We explore the task of deep grey matter multi-class segmentation in an Alzheimer's disease (AD) dataset and show that the proposed method leads to a significant improvement in segmentation results, particularly in cases where the amount of labelled data is restricted. We show that this improvement is largely driven by a greater ability to segment the structures known to be the most affected by AD, thereby demonstrating the benefits of exposing the system to more examples of pathological anatomical variation. We also show how a shift in the domain of the training data from young and healthy towards older and more pathological examples leads to better segmentations of the latter cases, and that this leads to a significant improvement in the ability of the computed segmentations to stratify cases of AD.
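One standard way to combine labelled and unlabelled data in a GAN is the semi-supervised "K real classes + 1 fake class" discriminator formulation. The sketch below illustrates that idea only; whether it matches this paper's exact design is an assumption, and the discriminator `d` and tensor shapes are hypothetical.

```python
import torch
import torch.nn.functional as F

def discriminator_loss(d, x_lab, y_lab, x_unlab, x_fake, num_classes):
    """Semi-supervised GAN discriminator loss (K real classes + 1 'fake').
    A sketch of one standard formulation, assumed here; d returns
    logits of shape (B, num_classes + 1), where the last index is 'fake'."""
    fake_idx = num_classes
    # Labelled real data: predict the correct class
    loss_lab = F.cross_entropy(d(x_lab), y_lab)
    # Unlabelled real data: any real class, i.e. not 'fake'
    p_fake_u = F.softmax(d(x_unlab), dim=1)[:, fake_idx]
    loss_unlab = -torch.log(1.0 - p_fake_u + 1e-8).mean()
    # Generated data: should be classified as 'fake'
    y_fake = torch.full((x_fake.size(0),), fake_idx, dtype=torch.long)
    loss_fake = F.cross_entropy(d(x_fake), y_fake)
    return loss_lab + loss_unlab + loss_fake
```

The appeal of this family of objectives is that the unlabelled images contribute a training signal ("real, but class unknown") without requiring any manual delineation.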
Abstract: One of the biggest issues facing the use of machine learning in medical imaging is the lack of availability of large, labelled datasets. The annotation of medical images is not only expensive and time-consuming but also highly dependent on the availability of expert observers. The limited amount of training data can inhibit the performance of supervised machine learning algorithms, which often need very large quantities of training data to avoid overfitting. So far, much effort has been directed at extracting as much information as possible from the data that are available. Generative Adversarial Networks (GANs) offer a novel way to unlock additional information from a dataset by generating synthetic samples with the appearance of real images. This paper demonstrates the feasibility of introducing GAN-derived synthetic data into the training datasets in two brain segmentation tasks, leading to improvements in Dice Similarity Coefficient (DSC) of between 1 and 5 percentage points under different conditions, with the strongest effects seen when fewer than ten training image stacks are available.
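Conceptually, the augmentation step amounts to concatenating GAN-generated image/label pairs onto the real training set. The sketch below assumes a generator that jointly outputs an image and its label map (one plausible design; the abstract does not specify it), with an illustrative latent dimension.

```python
import torch
from torch.utils.data import ConcatDataset, TensorDataset

def augment_with_synthetic(real_ds, generator, n_synth, latent_dim=128):
    """Extend a segmentation training set with GAN-generated samples.
    A minimal sketch; assumes the generator jointly outputs an image
    and its label map (an assumption, not confirmed by the abstract),
    and latent_dim is illustrative."""
    with torch.no_grad():
        z = torch.randn(n_synth, latent_dim)
        synth_imgs, synth_labels = generator(z)
    synth_ds = TensorDataset(synth_imgs, synth_labels)
    return ConcatDataset([real_ds, synth_ds])
```

The segmentation network is then trained on the combined dataset exactly as it would be on real data alone, which is where the reported 1-5 percentage-point DSC gains are measured.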