Abstract:An increasing array of biomedical and computer vision applications requires the predictive modeling of complex data, for example images and shapes. The main challenge when predicting such objects lies in the fact that they do not comply to the assumptions of Euclidean geometry. Rather, they occupy non-linear spaces, a.k.a. manifolds, where it is difficult to define concepts such as coordinates, vectors and expected values. In this work, we construct a non-parametric predictive methodology for manifold-valued objects, based on a distance modification of the Random Forest algorithm. Our method is versatile and can be applied both in cases where the response space is a well-defined manifold, but also when such knowledge is not available. Model fitting and prediction phases only require the definition of a suitable distance function for the observed responses. We validate our methodology using simulations and apply it on a series of illustrative image completion applications, showcasing superior predictive performance, compared to various established regression methods.
Abstract:Machine learning analysis of neuroimaging data can accurately predict chronological age in healthy people and deviations from healthy brain ageing have been associated with cognitive impairment and disease. Here we sought to further establish the credentials of "brain-predicted age" as a biomarker of individual differences in the brain ageing process, using a predictive modelling approach based on deep learning, and specifically convolutional neural networks (CNN), and applied to both pre-processed and raw T1-weighted MRI data. Firstly, we aimed to demonstrate the accuracy of CNN brain-predicted age using a large dataset of healthy adults (N = 2001). Next, we sought to establish the heritability of brain-predicted age using a sample of monozygotic and dizygotic female twins (N = 62). Thirdly, we examined the test-retest and multi-centre reliability of brain-predicted age using two samples (within-scanner N = 20; between-scanner N = 11). CNN brain-predicted ages were generated and compared to a Gaussian Process Regression (GPR) approach, on all datasets. Input data were grey matter (GM) or white matter (WM) volumetric maps generated by Statistical Parametric Mapping (SPM) or raw data. Brain-predicted age represents an accurate, highly reliable and genetically-valid phenotype, that has potential to be used as a biomarker of brain ageing. Moreover, age predictions can be accurately generated on raw T1-MRI data, substantially reducing computation time for novel data, bringing the process closer to giving real-time information on brain health in clinical settings.
Abstract:We propose a non-parametric regression methodology, Random Forests on Distance Matrices (RFDM), for detecting genetic variants associated to quantitative phenotypes representing the human brain's structure or function, and obtained using neuroimaging techniques. RFDM, which is an extension of decision forests, requires a distance matrix as response that encodes all pair-wise phenotypic distances in the random sample. We discuss ways to learn such distances directly from the data using manifold learning techniques, and how to define such distances when the phenotypes are non-vectorial objects such as brain connectivity networks. We also describe an extension of RFDM to detect espistatic effects while keeping the computational complexity low. Extensive simulation results and an application to an imaging genetics study of Alzheimer's Disease are presented and discussed.